Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a union struct for an optimal PythonConstant #235

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

atifaziz
Copy link
Contributor

@atifaziz atifaziz commented Oct 6, 2024

This PR is an alternative design to the refactoring proposed in PR #234. Only one of the two should be merged.

In contrast to PR #234, which uses a type hierarchy of subclasses to represent a union of all PythonConstant types, this uses a single struct as a union. It has the following benefits with respect to the current implementation and the one proposed in PR #234:

  • It uses a single storage for all scalar value types (bool, long and double) and another for strings, instead of the sum of storage needed for all value types.
  • Being a value type, it requires no additional heap allocations itself.
  • In its default state (default(PythonConstant)), it represents None.
  • Various properties provide strong-typed access to the underlying value (union of scalars or string) and designed to be friendly for pattern-matching (see ArgumentReflection.ArgumentSyntax for application). In fact, PythonConstant.ConstantType can even be made private since it's no longer used outside of PythonConstant.

The main motivation for accepting this design over PR #234 is zero heap footprint.

@tonybaloney
Copy link
Owner

The main motivation for accepting this design over PR #234 is zero heap footprint.

thanks. The memory consumption of the parser for the source generation is low-priority.

I'm having a hard time understanding the code on this one and prefer the other PR. Granted, my C# knowledge is intermediate (a way from advanced).

Conflicts resolved:

- src/CSnakes.SourceGeneration/Parser/PythonParser.Constants.cs
- src/CSnakes.SourceGeneration/Parser/Types/PythonConstant.cs
- src/CSnakes.SourceGeneration/Reflection/ArgumentReflection.cs
- src/CSnakes.Tests/TokenizerTests.cs
@atifaziz
Copy link
Contributor Author

atifaziz commented Oct 7, 2024

The memory consumption of the parser for the source generation is low-priority.

I understand.

I'm having a hard time understanding the code on this one and prefer the other PR. Granted, my C# knowledge is intermediate (a way from advanced).

FWIW, the overall design here is two-fold (and thus non-obvious). First, to optimise the storage space via Union:

[StructLayout(LayoutKind.Explicit)]
private readonly struct Union
{
    [FieldOffset(0)] public readonly bool Boolean;
    [FieldOffset(0)] public readonly long Int64;
    [FieldOffset(0)] public readonly double Double;
}

This would be akin to a union in C:

union Data {
    int flag; // bool
    long lval
    double dval;
    char* pstr;
};

The [FieldOffset(0)] attributes ensure that all fields of the Union share the same memory address. Unlike in C, however, you can't overlap scalar/value and reference types (like string) and so Union only contains scalars and that lives alongside the string as a regular field in PythonConstant:

public readonly struct PythonConstant
{
    readonly Union union; // union storage for value primitives (bool, long & double)
    readonly string? str; // storage for string reference
}

This means that the size of PythonConstant will be constant (size of Union will be its largest field that's double and size of PythonConstant will be size of Union + string reference or 8 bytes for a 64-bit pointer + ConstantType) irrespective of how many scalars or references it needs to represent (today and tomorrow) in a mutually exclusive manner.

The None member of PythonConstant.ConstantType is moved to first position to receive a compiler-assigned value of 0. Since PythonConstant is a struct that always has a default constructor, this ensures that the default representation is always None. In other words, if someone creates an array of PythonConstant, the elements will naturally appear as None in their default state as opposed to being undefined (which would introduce an extra edge to have to handle).

The PythonConstant.ConstantType is still needed as a discriminator. The properties that provide access to the constant's value are all typed as being nullable. So a property will only yield a value if it happens to be of the request value type (based on PythonConstant.Type) and returns null otherwise. This allows use of pattern matching and specifically the empty property pattern { } to effectively mean non-null. While C# has the not null pattern, { } goes one step further and allows binding to a variable. That's what's going in the switch block in ArgumentReflection.ArgumentSyntax:

switch (parameter.DefaultValue)
{
    case null: /* ... */ break;
    case { HexidecimalIntegerValue: { } v }: /* ... */ break;
    case { BinaryIntegerValue: { } v }: /* ... */ break;
    case { IntegerValue: { } v and >= int.MinValue and <= int.MaxValue }: /* ... */ break;
    case { IntegerValue: { } v }: /* ... */ break;
    case { StringValue: { } v }: /* ... */ break;
    case { FloatValue: { } v }: /* ... */ break;
    case { BoolValue: true }: /* ... */ break;
    case { BoolValue: false }: /* ... */ break;
    case { IsNone: true }: /* ... */ break;
}

Whether this PR is approved or not, I've expanded on the design if it can help with the decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants