nv2a: Support signed textures #36
Conversation
What happens when R8 or something has signed channels? Will G and B be affected by R signedness? Will A still be 1.0?
hw/xbox/nv2a/nv2a_pgraph.c (review on an outdated diff)
@@ -4089,31 +4089,113 @@ static uint8_t* convert_texture_data(const TextureShape s,
                                     unsigned int slice_pitch)
{
    //FIXME: Handle all formats
    if (s.color_format == NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_A8R8G8B8) {
    if ((s.color_format == NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_A8R8G8B8) ||
FIXME: Explain what this code path does
Note to self: This flips the sign bit.
Normally this is two's complement. So (raw byte = signed value = unsigned value):

0x80 = -128 = 128
0xFF =   -1 = 255
0x00 =    0 =   0
0x7F =  127 = 127

Note the discontinuity between 0xFF and 0x00 when interpreting them as unsigned values. By flipping the sign bit and interpreting as unsigned, we achieve the following:

0x80 will be 0x00 =    0 =   0
0xFF will be 0x7F =  127 = 127
0x00 will be 0x80 = -128 = 128
0x7F will be 0xFF =   -1 = 255

So we have gotten rid of the discontinuity.
In the shader, we scale and add a bias to bring it back into the -128 to 127 range.
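To make this concrete, here is a minimal standalone C sketch (not code from this PR) that applies the same sign-bit flip and prints the mapping from the table above:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not the PR's code): flip the sign bit so a
 * two's-complement byte becomes a monotonically increasing unsigned byte. */
static uint8_t flip_sign_bit(uint8_t raw)
{
    return raw ^ 0x80;
}

int main(void)
{
    const uint8_t samples[] = { 0x80, 0xFF, 0x00, 0x7F };
    for (unsigned i = 0; i < sizeof(samples); i++) {
        uint8_t raw = samples[i];
        uint8_t flipped = flip_sign_bit(raw);
        printf("0x%02X (signed %4d) -> 0x%02X (unsigned %3u)\n",
               raw, (int8_t)raw, flipped, (unsigned)flipped);
    }
    return 0;
}
```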
return NULL;
}

// Convert XOR mask to be re-usable for 8 or 16bpp
FIXME: Add comment that this is a hack / incomplete
How are 1-bit values signed? Like A1R5G5B5?
@@ -4086,6 +4086,38 @@ static void convert_yuy2_to_rgb(const uint8_t *line, unsigned int ix,
*b = cliptobyte((298 * c + 516 * d + 128) >> 8);
}

static const uint32_t* get_gl_sign_bits(GLenum gl_format, GLenum gl_type) {
Explain what this does
@@ -4086,6 +4086,38 @@ static void convert_yuy2_to_rgb(const uint8_t *line, unsigned int ix,
*b = cliptobyte((298 * c + 516 * d + 128) >> 8);
}

static const uint32_t* get_gl_sign_bits(GLenum gl_format, GLenum gl_type) {
if ((gl_format == GL_BGRA) && (gl_type == GL_UNSIGNED_INT_8_8_8_8_REV)) {
This is also used with GL_RGBA, GL_UNSIGNED_INT_8_8_8_8.
So I assume that a bias is wrong, but 8 bits hides that problem. This is a design issue that I'll try to solve. But as a temporary hack we can probably fall back to the current solution, too (as the normalized result will be "good enough" for most applications).
@@ -1055,6 +1055,8 @@
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_R5G6B5 0x11
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_A8R8G8B8 0x12
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_Y8 0x13
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_R8B8 0x16
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_G8B8 0x17
Split this commit into a separate PR
qstring_append(final, "\n");
qstring_append(final, "vec4 signed_texture(vec4 sample, bvec4 mask) {\n");
qstring_append(final, " vec4 signed_sample = sample * 2.0 - 1.0;\n");
qstring_append(final, " return mix(sample, signed_sample, mask);\n");
// Hack so interpolation will work properly
uint8_biased = uint8 xor 0x80;
// Normalization step
//FIXME: Will GL use / 255.0, or / 15.0 etc.?
uint8_biased_normalized = uint8_biased / 254;
// Unbias to recover intended value
sint8 = (uint8_biased_normalized * 2 - 1) - 2/254

Equivalent (unconfirmed):

// Normalization step
uint8_biased_normalized = uint8_biased / 255; // Assumed as part of GL
// Unbias to recover intended value
sint8 = (uint8_biased_normalized * 255 - 1) / 127 - 1.0

Equivalent (unconfirmed):

// Normalization step
uint8_biased_normalized = uint8_biased / 255; // Assumed as part of GL
// Unbias to recover intended value
const scale = 255 / 127
const bias_prime = -1.0 - 1/127
sint8 = uint8_biased_normalized * scale + bias_prime

This is closer to the values shown for SNORM8 at https://github.com/nlguillemot/SNormTest
(Clamping not shown here)
Also see equation 2.3 (page 11) of https://www.khronos.org/registry/OpenGL/specs/gl/glspec33.core.pdf
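As a sanity check on the pseudocode above, here is a small standalone C sketch (not part of the PR) that evaluates all three unbias variants for every byte value, each paired with its stated normalization step, and compares them against the reference sint8 / 127.0:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    for (int raw = 0; raw < 256; raw++) {
        int8_t sint8 = (int8_t)raw;           /* two's-complement interpretation */
        uint8_t biased = (uint8_t)raw ^ 0x80; /* sign bit flipped before upload */
        double reference = sint8 / 127.0;     /* unclamped SNORM-style value */

        /* Variant 1: normalize by 254 */
        double n1 = biased / 254.0;
        double v1 = (n1 * 2.0 - 1.0) - 2.0 / 254.0;

        /* Variants 2 and 3: normalize by 255 (assumed GL behaviour) */
        double n2 = biased / 255.0;
        double v2 = (n2 * 255.0 - 1.0) / 127.0 - 1.0;

        double scale = 255.0 / 127.0;
        double bias_prime = -1.0 - 1.0 / 127.0;
        double v3 = n2 * scale + bias_prime;

        assert(fabs(v1 - reference) < 1e-9);
        assert(fabs(v2 - reference) < 1e-9);
        assert(fabs(v3 - reference) < 1e-9);
    }
    printf("All three variants match sint8 / 127.0 for every byte value.\n");
    return 0;
}
```

All three variants reduce to sint8 / 127.0 given their own normalization step; whether GL actually normalizes with / 255.0 (as the second and third variants assume) remains the open FIXME above.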
bool gl_component_signed[4] = { false, false, false, false };
for(int i = 0; i < 4; i++) {

// Use GL swizzle mask to figure out which GL component to use
I don't remember what I meant by this and the comment doesn't help.
I should improve this comment.
@@ -4080,6 +4127,116 @@ static uint8_t* convert_texture_data(const TextureShape s,
unsigned int row_pitch,
unsigned int slice_pitch)
{
//FIXME: Handle all formats
if (true) {
Should probably happen after format conversion
// Get texture format information
ColorFormatInfo f = kelvin_color_format_map[s.color_format];

//FIXME: Extend for other formats in the main texture table
This comment was probably added when get_gl_sign_bits wasn't a function yet. It can be removed.
GLenum gl_swizzle = f.gl_swizzle_mask[i];

//FIXME: What happens with these?
if (gl_swizzle == GL_ZERO) {
This should be a switch-case?
gl_component_used[j] = true;

// Apply XOR mask for signedness, and mark as signed
if (s.signed_rgba[i]) {
Can't this be checked at the start of the loop?
}

if (gl_swizzle == GL_RED) {
    j = 0;
j should be called something like gl_component_index
uint32_t gl_xor_mask = 0;
bool gl_component_used[4] = { false, false, false, false };
bool gl_component_signed[4] = { false, false, false, false };
for(int i = 0; i < 4; i++) {
i should be called something like nv2a_component_index
This PR adds a bit of a hack for signed textures. Consider this merely a stub.
Xbox allows per-texture, per-component signed flags. This flag decides whether a texture component is interpreted as an unsigned value in the range 0.0 to 1.0 (UNORM), or as a signed two's-complement value in the range -1.0 to 1.0 (SNORM).
In hardware, the output of this sample is probably a sign-extended fixed-point integer value.
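For illustration (not from this PR), here is a minimal C sketch of the two interpretations of the same raw byte. The mappings assume GL-style conventions (UNORM: c / 255, SNORM: c / 127 with the most negative value clamped to -1.0), which may differ slightly from the real hardware:

```c
#include <stdint.h>
#include <stdio.h>

/* UNORM8: raw byte mapped to 0.0 .. 1.0 */
static float unorm8_to_float(uint8_t raw)
{
    return raw / 255.0f;
}

/* SNORM8: raw byte reinterpreted as two's complement, mapped to -1.0 .. 1.0
 * (GL-style: divide by 127 and clamp the most negative value to -1.0) */
static float snorm8_to_float(uint8_t raw)
{
    float v = (int8_t)raw / 127.0f;
    return v < -1.0f ? -1.0f : v;
}

int main(void)
{
    const uint8_t samples[] = { 0x00, 0x7F, 0x80, 0xFF };
    for (unsigned i = 0; i < sizeof(samples); i++) {
        printf("0x%02X -> UNORM %.4f, SNORM %+.4f\n",
               samples[i],
               unorm8_to_float(samples[i]),
               snorm8_to_float(samples[i]));
    }
    return 0;
}
```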
There are different solutions we could use to solve this; this PR implements a limited variant of the third proposal (which has many issues):
1. Two's complement conversion in shader
The value is sampled as UNORM by OpenGL (mapped to the range 0.0 to 1.0), but then mapped to SNORM in the range -1.0 to 0.0 or 0.0 to 1.0, depending on the highest bit (by checking whether the UNORM value is above 0.5).
This approach has an interpolation issue: with unsigned textures (as OpenGL samples them), the raw 8-bit values 127 and 128 would both be sampled as roughly 0.5. Going from 127 (just below 0.5) to 128 (just above 0.5) is a slow value increase, so the interpolated value goes up as the sample point moves closer to raw value 128.
For signed textures (two's complement), the sampled values would be 1.0 (127) and -1.0 (-128) respectively. Going from 127 (1.0) to 128 (-1.0), we should expect a rapid decrease in the interpolated value.
However, as OpenGL doesn't interpolate in this direction, the interpolated values are wrong in such transition areas.
Also see nlguillemot/SNormTest#2 (comment)
2. SNORM textures
OpenGL supports signed (SNORM) textures itself. These aren't signed per-component (as on Xbox), so we'd have to sample different textures (or texture views in OpenGL 4.3 and above), then mix the results in the shader (see the sketch after this list).
3. Software decoding on CPU
We could just decode all affected formats on the CPU and upload the expected results to the GPU.
This PR implements this approach, but doesn't support DXT textures.
4. Software interpolation in shader
We can also use point-filtering and do all interpolation in the shader.
This approach could also be extended with integer lookups / software DXT decompression.
This means it should work for all formats and could optionally provide raw integer access; but we can also continue to use the host GPU's format decoder / normalization.
We could also integrate the software unswizzling into the shader, so the CPU rarely has to touch resources. We could move this into a compute(-like) shader if the performance impact is too high.
As we'd also control the texture address, we could even accurately represent the texture address fraction bits for the interpolation (so effects like banding are emulated).
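As a rough illustration of the second proposal (not code from this PR), the same 8-bit RGBA texel data could be uploaded twice, once as UNORM and once as SNORM, with the shader selecting per component via mix() as in the diff above. The helper name, the GL header, and the use of plain glTexImage2D below are assumptions; GL_RGBA8_SNORM needs OpenGL 3.1+, and the texture views mentioned above need 4.3:

```c
#include <epoxy/gl.h>  /* assumption: use whatever GL header the project already uses */

/* Hypothetical helper: upload the same 8-bit RGBA texel data twice, once as
 * UNORM and once as SNORM. The fragment shader could then pick per component,
 * e.g. mix(texture(tex_unorm, uv), texture(tex_snorm, uv), bvec4(signed_rgba)). */
static void upload_unorm_and_snorm(const void *texels, int width, int height,
                                   GLuint *tex_unorm, GLuint *tex_snorm)
{
    glGenTextures(1, tex_unorm);
    glBindTexture(GL_TEXTURE_2D, *tex_unorm);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, texels);

    glGenTextures(1, tex_snorm);
    glBindTexture(GL_TEXTURE_2D, *tex_snorm);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8_SNORM, width, height, 0,
                 GL_RGBA, GL_BYTE, texels);
}
```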
This PR strongly interacts with texture / surface caching and probably dotmapping.
TODO: