1
0
Fork 0
forked from eden-emu/eden
Commit graph

254 commits

Author SHA1 Message Date
Lioncash
f1443d2b41 shader_bytecode: Make Matcher constexpr capable
Greatly shrinks the amount of generated code for GetDecodeTable().

Collapses an assembly output of 9000+ lines down to ~3621 with Clang,
and 6513 down to ~2616 with GCC, given it's now allowed to construct all
the entries as a sequence of constant data.
2019-10-24 01:10:10 -04:00
bunnei
6deb6d2b10 Merge pull request #2869 from ReinUsesLisp/suld
shader/image: Implement SULD and fix SUATOM
2019-09-23 21:47:03 -04:00
Rodrigo Locatti
e33b9e3e6f Merge pull request #2878 from FernandoS27/icmp
shader_ir: Implement ICMP
2019-09-21 18:06:07 -03:00
ReinUsesLisp
79a7463f4c gl_shader_decompiler: Use uint for images and fix SUATOM
In the process remove implementation of SUATOM.MIN and SUATOM.MAX as
these require a distinction between U32 and S32. These have to be
implemented with imageCompSwap loop.
2019-09-21 17:33:52 -03:00
ReinUsesLisp
331d140bb4 shader/image: Implement SULD and remove irrelevant code
* Implement SULD as float.
* Remove conditional declaration of GL_ARB_shader_viewport_layer_array.
2019-09-21 17:32:48 -03:00
ReinUsesLisp
dfe69a7f19 shader_bytecode: Add SULD encoding 2019-09-21 17:31:46 -03:00
Fernando Sahmkow
f02b9d37f0 Shader_IR: ICMP corrections and fixes 2019-09-21 14:28:03 -04:00
Fernando Sahmkow
01b8a78a8a Shader_IR: Implement ICMP. 2019-09-19 20:56:29 -04:00
ReinUsesLisp
42815d1d24 shader_ir/warp: Implement SHFL 2019-09-17 17:44:07 -03:00
ReinUsesLisp
2e6bebb3d2 shader/image: Implement SUATOM and fix SUST 2019-09-10 20:22:31 -03:00
ReinUsesLisp
9b001821d9 shader/shift: Implement SHR wrapped and clamped variants
Nvidia defaults to wrapped shifts, but this is undefined behaviour on
OpenGL's spec. Explicitly mask/clamp according to what the guest shader
requires.
2019-09-04 01:55:24 -03:00
bunnei
3df0f440fd Merge pull request #2812 from ReinUsesLisp/f2i-selector
shader_ir/conversion: Implement F2I and F2F F16 selector
2019-09-03 22:35:33 -04:00
bunnei
4ae7f81090 Merge pull request #2811 from ReinUsesLisp/fsetp-fix
float_set_predicate: Add missing negation bit for the second operand
2019-09-03 22:34:34 -04:00
ReinUsesLisp
6f134adf2a shader_ir/conversion: Split int and float selector and implement F2F H1 2019-08-28 16:09:33 -03:00
ReinUsesLisp
d9ad389777 shader_ir/conversion: Implement F2I F16 Ra.H1 2019-08-27 23:40:40 -03:00
ReinUsesLisp
d490cc5285 float_set_predicate: Add missing negation bit for the second operand 2019-08-27 21:57:43 -03:00
ReinUsesLisp
67f47b2f6a shader_ir: Implement VOTE
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics

Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.

To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:

* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true

ballotARB, also known as "uint64_t(activeThreadsNV())", emits

VOTE.ANY Rd, PT, PT;

on nouveau's compiler. This doesn't match exactly to Nvidia's code

VOTE.ALL Rd, PT, PT;

Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
2019-08-21 14:50:38 -03:00
bunnei
0d754d7a75 Merge pull request #2753 from FernandoS27/float-convert
Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
2019-08-21 10:27:57 -04:00
ReinUsesLisp
b6272eb8e2 shader_ir: Implement NOP 2019-08-04 03:02:55 -03:00
Fernando Sahmkow
9a0fa90be2 Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
This commit takes care of implementing the F16 Variants of the 
conversion instructions and makes sure conversions are done.
2019-07-20 17:38:25 -04:00
ReinUsesLisp
edc43b2509 shader/half_set_predicate: Implement missing HSETP2 variants 2019-07-19 22:20:47 -03:00
Fernando Sahmkow
e221290cb7 Merge pull request #2695 from ReinUsesLisp/layer-viewport
gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shaders
2019-07-15 16:28:07 -04:00
Fernando Sahmkow
43662e376e Merge pull request #2692 from ReinUsesLisp/tlds-f16
shader/texture: Add F16 support for TLDS
2019-07-14 08:44:38 -04:00
Fernando Sahmkow
d5d4cc30ec shader_ir: Implement BRX & BRA.CC 2019-07-09 08:14:37 -04:00
ReinUsesLisp
a650406899 gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shaders
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.

In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
2019-07-07 20:42:55 -03:00
ReinUsesLisp
48d485d6df shader/texture: Add F16 support for TLDS 2019-07-07 16:05:56 -03:00
ReinUsesLisp
7eed876cfb shader_bytecode: Include missing <array> 2019-06-24 01:51:02 -03:00
ReinUsesLisp
224e4e174d shader: Decode SUST and implement backing image functionality 2019-06-20 21:38:33 -03:00
ReinUsesLisp
27cd63a05a shader: Implement texture buffers 2019-06-20 21:36:12 -03:00
Fernando Sahmkow
a8250f511b shader_bytecode: Mark EXIT as flow instruction 2019-06-04 12:18:35 -04:00
ReinUsesLisp
68af52d525 shader/memory: Implement ST (generic memory) 2019-05-20 22:41:53 -03:00
ReinUsesLisp
71ded7da4e shader/memory: Implement LD (generic memory) 2019-05-20 22:38:59 -03:00
ReinUsesLisp
5bf7324068 shader_ir/other: Implement IPA.IDX 2019-05-02 21:46:37 -03:00
ReinUsesLisp
f96020b2ae shader_ir/memory: Implement physical input attributes 2019-05-02 21:46:25 -03:00
ReinUsesLisp
9a9902214e shader_bytecode: Add AL2P decoding 2019-05-02 21:46:25 -03:00
bunnei
7fc67a06bb Merge pull request #2407 from FernandoS27/f2f
Do some corrections in conversion shader instructions.
2019-04-20 00:42:34 -04:00
bunnei
c1c43bde80 Merge pull request #2348 from FernandoS27/guest-bindless
Implement Bindless Textures on Shader Decompiler and GL backend
2019-04-17 20:59:49 -04:00
bunnei
d4b42f6bc6 Merge pull request #2315 from ReinUsesLisp/severity-decompiler
shader_ir/decode: Reduce the severity of common assertions
2019-04-16 22:21:19 -04:00
Fernando Sahmkow
73f925a949 Do some corrections in conversion shader instructions.
Corrects encodings for I2F, F2F, I2I and F2I
Implements Immediate variants of all four conversion types.
Add assertions to unimplemented stuffs.
2019-04-15 19:16:27 -04:00
ReinUsesLisp
79e7fb6d6f shader_ir: Implement STG, keep track of global memory usage and flush 2019-04-14 00:25:32 -03:00
bunnei
dd5989d907 Merge pull request #2366 from FernandoS27/xmad-fix
Correct XMAD mode, psl and high_b on different encodings.
2019-04-09 19:15:01 -04:00
Fernando Sahmkow
25e6fb72eb Correct LOP_IMN encoding 2019-04-08 13:39:12 -04:00
Fernando Sahmkow
34b15b69df Correct XMAD mode, psl and high_b on different encodings. 2019-04-08 13:01:17 -04:00
Fernando Sahmkow
f5792ffeab Move ConstBufferAccessor to Maxwell3d, correct mistakes and clang format. 2019-04-08 11:36:11 -04:00
Fernando Sahmkow
2f456841b0 Implement TXQ_B 2019-04-08 11:29:52 -04:00
Fernando Sahmkow
8bb9877b70 Corrections to TEX_B 2019-04-08 11:28:44 -04:00
Fernando Sahmkow
ee9b2e3cdc Implement Bindless Samplers and TEX_B in the IR. 2019-04-08 11:23:42 -04:00
ReinUsesLisp
f725007975 shader_ir/memory: Reduce severity of LD_L cache management and log it 2019-04-03 17:12:44 -03:00
ReinUsesLisp
c2ea1d5263 shader_ir/memory: Reduce severity of ST_L cache management and log it 2019-04-03 17:12:44 -03:00
bunnei
11ac277646 Merge pull request #2147 from ReinUsesLisp/texture-clean
shader_ir: Remove "extras" from the MetaTexture
2019-03-10 17:28:36 -04:00