Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline ASTNode bindings dependencies and observers #15098

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ggiraldez
Copy link
Contributor

Every ASTNode contains two arrays used for type inference and checking: dependencies and observers. By default, these are created lazily, but most active (ie. effectively typed) ASTNodes end up creating them. Furthermore, on average both these lists contain less than 2 elements each.

This PR replaces the both Array(ASTNode)? references in ASTNode by inlined structs that can hold two elements and a tail array for the cases where more links are needed. This reduces number of allocations, bytes allocated, number of instructions executed and running time.

Some numbers from compiling the Crystal compiler itself without running codegen (since type binding occurs in the semantic phase anyway).

  • Running time (measured with hyperfine running with GC_DONT_GC=1): ~6% reduction
    Before:

    Benchmark 1: ./self-semantic-only.sh
      Time (mean ± σ):      3.398 s ±  0.152 s    [User: 2.264 s, System: 0.470 s]
      Range (min … max):    3.029 s …  3.575 s    10 runs
    

    After:

    Benchmark 1: ./self-semantic-only.sh
      Time (mean ± σ):      3.180 s ±  0.095 s    [User: 2.153 s, System: 0.445 s]
      Range (min … max):    3.046 s …  3.345 s    10 runs
    
  • Memory (as reported by the compiler itself, with GC): ~9.6% reduction
    Before:

    Parse:                             00:00:00.000038590 (   1.05MB)
    Semantic (top level):              00:00:00.483357706 ( 174.13MB)
    Semantic (new):                    00:00:00.002156811 ( 174.13MB)
    Semantic (type declarations):      00:00:00.038313066 ( 174.13MB)
    Semantic (abstract def check):     00:00:00.014283169 ( 190.13MB)
    Semantic (restrictions augmenter): 00:00:00.010672651 ( 206.13MB)
    Semantic (ivars initializers):     00:00:04.660611385 (1250.07MB)
    Semantic (cvars initializers):     00:00:00.008343907 (1250.07MB)
    Semantic (main):                   00:00:00.780627942 (1346.07MB)
    Semantic (cleanup):                00:00:00.000961914 (1346.07MB)
    Semantic (recursive struct check): 00:00:00.001121766 (1346.07MB)
    

    After:

    Parse:                             00:00:00.000044417 (   1.05MB)
    Semantic (top level):              00:00:00.546445955 ( 190.03MB)
    Semantic (new):                    00:00:00.002488975 ( 190.03MB)
    Semantic (type declarations):      00:00:00.040234541 ( 206.03MB)
    Semantic (abstract def check):     00:00:00.015473723 ( 222.03MB)
    Semantic (restrictions augmenter): 00:00:00.010828366 ( 222.03MB)
    Semantic (ivars initializers):     00:00:03.324639987 (1135.96MB)
    Semantic (cvars initializers):     00:00:00.007359853 (1135.96MB)
    Semantic (main):                   00:00:01.806822202 (1217.96MB)
    Semantic (cleanup):                00:00:00.000626975 (1217.96MB)
    Semantic (recursive struct check): 00:00:00.001435494 (1217.96MB)
    
  • Callgrind stats:

    • Instruction refs: 17,477,865,704 -> 16,712,610,033 (~4.4% reduction)
    • Estimated cycles: 26,835,733,874 -> 26,154,926,143 (~2.5% reduction)
    • GC_malloc_kind call count: 35,161,616 -> 25,684,997 (~27% reduction)

This allows big savings in number and bytes allocated since most
ASTNodes will have on average less than 2 elements in each of these
lists.
@ggiraldez
Copy link
Contributor Author

Damn... I wished I saw this before I even started 😬

The CI is failing anyway, and maybe the reason is the same as the last time. Will investigate further.

@straight-shoota
Copy link
Member

Ah, I assumed you were aware of that. 🙈
I had definitely mentioned there had been a previous attempt that looked nice but didn't work out.

Anyway the issue here must be something relatively obvious. I presume it shouldn't be hard to fix. The interpreter failure was more subtle.

Comment on lines 2 to 11
# Specialized container for ASTNodes to use for bindings tracking.
#
# The average number of elements in both dependencies and observers is below 2
# for ASTNodes. This struct inlines the first two elements saving up 4
# allocations per node (two arrays, with a header and buffer for each) but we
# need to pay a slight extra cost in memory upfront: a total of 6 pointers (48
# bytes) vs 2 pointers (16 bytes). The other downside is that since this is a
# struct, we need to be careful with mutation.
struct SmallNodeList
include Indexable(ASTNode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: It might be worth noting that this collection type implements a subset of Array's features, but the missing bits are not relevant to the use case.
I'm also wondering if this actually needs to implement Indexable. Enumerable should actually be sufficient.
The only place were indexed access is needed is for a deconstruction first, second = nodes in type_merge and that's easy to refactor. This would reduce overall code by removing unsafe_fetch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks! I'll make the change.

@ggiraldez
Copy link
Contributor Author

Ah, I assumed you were aware of that. 🙈 I had definitely mentioned there had been a previous attempt that looked nice but didn't work out.

No worries. It was pretty straightforward to implement. I know that you mentioned it, but it didn't occur to me that there was a trail to follow.

Anyway the issue here must be something relatively obvious. I presume it shouldn't be hard to fix. The interpreter failure was more subtle.

Not sure it's obvious 😬 So far, I spent more time trying to figure out what's broken than actually implementing the optimisation. The good news is that this doesn't reproduce #12769.

@straight-shoota
Copy link
Member

Turning SmallNodeList into a class seems to fix the failure. So it's a pass-by-value issue.
We can dig through the source code to finde where we pass an instance. Or we could keep it a class, and maybe inline the allocation with ReferenceStorage 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize allocation for node dependencies and observers in compiler
3 participants