
nv2a: Support signed textures #36

Open · wants to merge 5 commits into master
Conversation

JayFoxRox (Owner) commented Jun 23, 2019

This PR adds a bit of a hack for signed textures. Consider this merely a stub.

Xbox allows per-texture, per-component signed flags. This flag decides whether a texture component is interpreted as an unsigned value in the range 0.0 to 1.0 (UNORM), or as a signed two's complement value in the range -1.0 to 1.0 (SNORM).

In hardware, the output of such a sample is probably a sign-extended fixed-point integer value.
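For illustration, a minimal C sketch of the two interpretations of a single 8-bit component (assuming GL/D3D-style SNORM8 scaling by 127 with -128 clamped to -1.0; whether Xbox matches this exactly is something this PR still has to verify):

#include <stdint.h>

/* UNORM8: raw byte mapped to 0.0 .. 1.0 */
static float unorm8_to_float(uint8_t raw)
{
    return raw / 255.0f;
}

/* SNORM8: raw byte read as two's complement, mapped to -1.0 .. 1.0 */
static float snorm8_to_float(uint8_t raw)
{
    int s = (raw < 128) ? raw : (int)raw - 256;  /* 0x80 -> -128, 0xFF -> -1 */
    float v = s / 127.0f;
    return (v < -1.0f) ? -1.0f : v;              /* -128 clamps to -1.0 */
}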


There are different solutions we could use to solve this; this PR implements a limited variant of the third proposal (which has many issues):

1. Two's complement conversion in shader

The value is sampled as UNORM by OpenGL (values mapped to the range 0.0 to 1.0), and is then remapped to SNORM, either into -1.0 to 0.0 or into 0.0 to 1.0, depending on the highest bit (by checking whether the UNORM value is above 0.5).

  • This breaks when interpolating between 8-bit values at or below 0x7F and values at or above 0x80 (equivalent issues happen for other bit widths).
  • No access to raw integer values.

The interpolation issue happens because, with unsigned textures (as OpenGL samples them), both of those 8-bit values end up close to 0.5 (127 and 128). Going from 127 (just below 0.5) to 128 (just above 0.5) is a small increase, so the interpolated value rises slowly as the sample point moves closer to raw value 128.
For signed textures (two's complement), the sampled values would be 1.0 (for 127) and -1.0 (for 0x80 = -128) instead, so when going from 127 to 128 we should expect the interpolated value to drop rapidly.
However, as OpenGL doesn't interpolate in this direction, the interpolated values are wrong in such transition areas.

Also see nlguillemot/SNormTest#2 (comment)
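For clarity, a rough per-component sketch of that mapping, written as plain C (the real thing would run per component in the fragment shader; the function name is made up). Note that the branch operates on the already-filtered UNORM value, which is exactly where the interpolation problem above comes from:

/* Proposal 1 sketch: reinterpret a filtered UNORM sample (0.0 .. 1.0)
 * as an 8-bit two's complement (SNORM-style) value. */
static float unorm_sample_to_signed(float u)
{
    float raw = u * 255.0f;          /* undo the UNORM normalization */
    if (raw > 127.5f) {              /* highest bit set: negative half */
        raw -= 256.0f;               /* 128..255 -> -128..-1 */
    }
    float v = raw / 127.0f;
    return (v < -1.0f) ? -1.0f : v;  /* clamp -128 to -1.0 */
}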

2. SNORM textures

OpenGL has signed textures itself. However, the signedness isn't per-component (as it is on Xbox), so we'd have to sample two different textures (or texture views, in OpenGL 4.3 and above), then mix the results in the shader (see the sketch after this list).

  • Multiple texture lookups.
  • Only works for formats with 8-bit components (R8, RG8, RGB8 and RGBA8); internal storage for RGB565 etc. will have to be changed (likely software decoding in the driver).
  • Will not work for DXT at all, so we have to decode these manually.
  • Will not emulate texture address accuracy of Xbox.
  • Will not provide access to raw integer values (might be necessary for dotmapping or some games).
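A rough sketch of what the texture-view variant could look like on the host side (hypothetical; width, height and pixels are assumed to exist, and glTextureView requires a GL 4.3 context and immutable glTexStorage-allocated storage). The shader would then bind both views and mix per component, much like the signed_texture helper shown further down:

GLuint tex_unorm, tex_snorm;

/* Immutable UNORM storage (texture views require glTexStorage) */
glGenTextures(1, &tex_unorm);
glBindTexture(GL_TEXTURE_2D, tex_unorm);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, width, height);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, pixels);

/* SNORM view over the same storage (GL_RGBA8 and GL_RGBA8_SNORM share a view class) */
glGenTextures(1, &tex_snorm);
glTextureView(tex_snorm, GL_TEXTURE_2D, tex_unorm, GL_RGBA8_SNORM,
              0, 1, 0, 1);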

3. Software decoding on CPU

We could just decode all affected formats on the CPU and upload the expected results to the GPU.

  • Overhead for each texture upload (although we already do this for unswizzling).
  • A lot of code complexity for decoding each texture format, including DXT.
  • Will not emulate texture address accuracy of Xbox.

This PR implements this approach, but doesn't support DXT textures (a rough sketch of the general idea follows below).
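For reference, a minimal sketch of the general idea for a plain 8-bit channel (hypothetical, and not the code in this PR; the PR only flips the sign bit during conversion and fixes the value up in the shader, as the review comments below explain):

#include <stddef.h>
#include <stdint.h>

/* Proposal 3 sketch: fully decode a signed channel on the CPU and hand the
 * GPU ready-made float data, so it never has to know about the signedness. */
static void decode_snorm8_channel(const uint8_t *src, float *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int s = (src[i] < 128) ? src[i] : (int)src[i] - 256;
        float v = s / 127.0f;
        dst[i] = (v < -1.0f) ? -1.0f : v;
    }
}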

4. Software interpolation in shader

We can also use point-filtering and do all interpolation in the shader.

  • Multiple texture lookups.
  • Complicates shader logic, and will probably negatively affect performance.

This approach could also be extended with integer lookups / software DXT decompression.
This means it should work for all formats and optionally provide raw integer access; but we could also continue to use the host GPU's format decoder / normalization.

We could also integrate the software unswizzling into the shader, so the CPU rarely has to touch resources. We could move this into a compute(-like) shader if the performance impact is too high.

As we'd also control the texture address, we could even accurately represent the texture address fraction bits for the interpolation (so effects like banding are emulated); see the sketch below.
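A rough sketch of the shader-side interpolation, again written as C for illustration (fetch_texel() is a made-up stand-in for a texelFetch-style lookup returning a UNORM value, unorm_sample_to_signed() is the helper from the proposal 1 sketch, and wrapping/clamping of the texel coordinates is omitted):

#include <math.h>

/* Proposal 4 sketch: point-sample the four neighbours, convert each one to a
 * signed value first, and only then blend, so the interpolation happens on
 * the already sign-corrected values. */
static float sample_signed_bilinear(float u, float v, int w, int h)
{
    float x = u * w - 0.5f, y = v * h - 0.5f;
    int x0 = (int)floorf(x), y0 = (int)floorf(y);
    float fx = x - x0, fy = y - y0;

    float s00 = unorm_sample_to_signed(fetch_texel(x0,     y0    ));
    float s10 = unorm_sample_to_signed(fetch_texel(x0 + 1, y0    ));
    float s01 = unorm_sample_to_signed(fetch_texel(x0,     y0 + 1));
    float s11 = unorm_sample_to_signed(fetch_texel(x0 + 1, y0 + 1));

    float top    = s00 + (s10 - s00) * fx;
    float bottom = s01 + (s11 - s01) * fx;
    return top + (bottom - top) * fy;
}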


This PR strongly interacts with texture / surface caching and probably dotmapping.


TODO:

  • Add more bias, so that the values match Xbox
  • PR description

JayFoxRox (Owner Author) commented:

What happens when R8 or something has signed channels?

Will G and B be affected by R signedness? Will A still be 1.0?

JayFoxRox changed the title from "nv2a: Support signed textures (breaks if interpolation is active)" to "nv2a: Support signed textures" on Jun 25, 2019
@@ -4089,31 +4089,113 @@ static uint8_t* convert_texture_data(const TextureShape s,
unsigned int slice_pitch)
{
//FIXME: Handle all formats
if (s.color_format == NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_A8R8G8B8) {
if ((s.color_format == NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_A8R8G8B8) ||
JayFoxRox (Owner Author) commented:

FIXME: Explain what this code path does

Note to self: This flips the sign bit.

Normally this is two's complement. So:

  • 0x80 = -128 signed = 128 unsigned
  • 0xFF = -1 signed = 255 unsigned
  • 0x00 = 0 signed = 0 unsigned
  • 0x7F = 127 signed = 127 unsigned

Note the discontinuity between 0xFF and 0x00 when interpreting the bytes as unsigned values. By flipping the sign bit and interpreting the result as unsigned, we achieve the following:

  • 0x80 will be 0x00 = 0 signed = 0 unsigned
  • 0xFF will be 0x7F = 127 signed = 127 unsigned
  • 0x00 will be 0x80 = -128 signed = 128 unsigned
  • 0x7F will be 0xFF = -1 signed = 255 unsigned

So we have gotten rid of the discontinuity: the unsigned interpretation now increases monotonically with the original signed value. In the shader, we scale and add a bias to bring it back into the -128 to 127 range.
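A compact sketch of that trick (not the PR's actual code; the /127 scaling assumes SNORM8-style normalization, and matching the exact Xbox bias is still listed as a TODO above):

#include <stdint.h>

/* CPU side, at texture conversion time: flip the sign bit so the byte
 * values become monotonic (offset binary). */
static uint8_t flip_sign_bit(uint8_t raw)
{
    return raw ^ 0x80;                             /* 0x80 -> 0x00, 0x7F -> 0xFF */
}

/* Shader side, after the UNORM sample: scale and bias back. */
static float unbias(float u)
{
    float v = (u * 255.0f - 128.0f) / 127.0f;      /* 0x00..0xFF -> -128/127 .. 1.0 */
    return (v < -1.0f) ? -1.0f : v;                /* clamp the -128 case */
}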

return NULL;
}

// Convert XOR mask to be re-usable for 8 or 16bpp
JayFoxRox (Owner Author) commented:

FIXME: Add comment that this is a hack / incomplete

JayFoxRox (Owner Author) commented:

How are 1-bit values signed? Like A1R5G5B5?

@@ -4086,6 +4086,38 @@ static void convert_yuy2_to_rgb(const uint8_t *line, unsigned int ix,
*b = cliptobyte((298 * c + 516 * d + 128) >> 8);
}

static const uint32_t* get_gl_sign_bits(GLenum gl_format, GLenum gl_type) {
JayFoxRox (Owner Author) commented:

Explain what this does

if ((gl_format == GL_BGRA) && (gl_type == GL_UNSIGNED_INT_8_8_8_8_REV)) {
JayFoxRox (Owner Author) commented:

Also using GL_RGBA, GL_UNSIGNED_INT_8_8_8_8

JayFoxRox (Owner Author) commented:

  • Banding is wrong for RGBA4444
  • RGBA5551 alpha is wrong when signed

So I assume that the bias is wrong, but 8 bits of precision hides that problem.

This is a design issue that I'll try to solve. But as a temporary hack we can probably fall back to the current solution, too (as the normalized result will be "good enough" for most applications).

@@ -1055,6 +1055,8 @@
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_R5G6B5 0x11
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_A8R8G8B8 0x12
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_Y8 0x13
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_R8B8 0x16
# define NV097_SET_TEXTURE_FORMAT_COLOR_LU_IMAGE_G8B8 0x17
JayFoxRox (Owner Author) commented:

Split this commit into a separate PR

qstring_append(final, "\n");
qstring_append(final, "vec4 signed_texture(vec4 sample, bvec4 mask) {\n");
qstring_append(final, " vec4 signed_sample = sample * 2.0 - 1.0;\n");
qstring_append(final, " return mix(sample, signed_sample, mask);\n");
JayFoxRox (Owner Author) commented Jun 25, 2019:

// Hack so interpolation will work properly
uint8_biased = uint8 xor 0x80;

// Normalization step
//FIXME: GL will use / 255.0, or / 15.0 etc.?
uint8_biased_normalized = uint8_biased/254;

// Unbias to recover intended value
sint8 = (uint8_biased_normalized*2-1)-2/254

Equivalent (unconfirmed):

// Normalization step
uint8_biased_normalized = uint8_biased/255; // Assumed as part of GL

// Unbias to recover intended value
sint8 = (uint8_biased_normalized*255-1)/127-1.0

Equivalent (unconfirmed):

// Normalization step
uint8_biased_normalized = uint8_biased/255; // Assumed as part of GL

// Unbias to recover intended value
const scale = 255/127
const bias_prime = -1.0-1/127
sint8 = uint8_biased_normalized * scale + bias_prime

This is closer to the values shown for SNORM8 at https://github.com/nlguillemot/SNormTest

(Clamping not shown here)
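For what it's worth, a throwaway brute-force check over all 256 byte values (assuming SNORM8 means value/127 with -128 clamped to -1.0, keeping the first formulation's /254 normalization as written above, and GL's /255 for the other two) prints nothing if all three formulations agree:

#include <math.h>
#include <stdio.h>

int main(void)
{
    for (int raw = 0; raw < 256; raw++) {
        int s = (raw < 128) ? raw : raw - 256;                     /* two's complement value */
        float ref = fmaxf(s / 127.0f, -1.0f);                      /* SNORM8 reference */

        int biased = raw ^ 0x80;                                   /* sign bit flipped */
        float n = biased / 255.0f;                                 /* GL UNORM normalization */

        float a = (biased / 254.0f) * 2.0f - 1.0f - 2.0f / 254.0f; /* 1st formulation */
        float b = (n * 255.0f - 1.0f) / 127.0f - 1.0f;             /* 2nd formulation */
        float c = n * (255.0f / 127.0f) + (-1.0f - 1.0f / 127.0f); /* 3rd formulation */

        if (fabsf(fmaxf(a, -1.0f) - ref) > 1e-5f ||
            fabsf(fmaxf(b, -1.0f) - ref) > 1e-5f ||
            fabsf(fmaxf(c, -1.0f) - ref) > 1e-5f) {
            printf("mismatch at 0x%02X\n", raw);
        }
    }
    return 0;
}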

JayFoxRox added the wip label on Jun 27, 2019
bool gl_component_signed[4] = { false, false, false, false };
for(int i = 0; i < 4; i++) {

// Use GL swizzle mask to figure out which GL component to use
JayFoxRox (Owner Author) commented:

I don't remember what I meant by this and the comment doesn't help.
I should improve this comment.

@@ -4080,6 +4127,116 @@ static uint8_t* convert_texture_data(const TextureShape s,
unsigned int row_pitch,
unsigned int slice_pitch)
{
//FIXME: Handle all formats
if (true) {
JayFoxRox (Owner Author) commented:

Should probably happen after format conversion

// Get texture format information
ColorFormatInfo f = kelvin_color_format_map[s.color_format];

//FIXME: Extend for other formats in the main texture table
JayFoxRox (Owner Author) commented:

This comment was probably added when get_gl_sign_bits wasn't a function yet.
It can be removed.

GLenum gl_swizzle = f.gl_swizzle_mask[i];

//FIXME: What happens with these?
if (gl_swizzle == GL_ZERO) {
JayFoxRox (Owner Author) commented:

This should be a switch-case?

gl_component_used[j] = true;

// Apply XOR mask for signedness, and mark as signed
if (s.signed_rgba[i]) {
JayFoxRox (Owner Author) commented:

Can't this be checked at the start of the loop?

}

if (gl_swizzle == GL_RED) {
j = 0;
JayFoxRox (Owner Author) commented:

j should be called something like gl_component_index

uint32_t gl_xor_mask = 0;
bool gl_component_used[4] = { false, false, false, false };
bool gl_component_signed[4] = { false, false, false, false };
for(int i = 0; i < 4; i++) {
JayFoxRox (Owner Author) commented:

i should be called something like nv2a_component_index
