-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inline ASTNode
bindings dependencies and observers
#15098
base: master
Are you sure you want to change the base?
Inline ASTNode
bindings dependencies and observers
#15098
Conversation
This allows big savings in number and bytes allocated since most ASTNodes will have on average less than 2 elements in each of these lists.
Damn... I wished I saw this before I even started 😬 The CI is failing anyway, and maybe the reason is the same as the last time. Will investigate further. |
Ah, I assumed you were aware of that. 🙈 Anyway the issue here must be something relatively obvious. I presume it shouldn't be hard to fix. The interpreter failure was more subtle. |
# Specialized container for ASTNodes to use for bindings tracking. | ||
# | ||
# The average number of elements in both dependencies and observers is below 2 | ||
# for ASTNodes. This struct inlines the first two elements saving up 4 | ||
# allocations per node (two arrays, with a header and buffer for each) but we | ||
# need to pay a slight extra cost in memory upfront: a total of 6 pointers (48 | ||
# bytes) vs 2 pointers (16 bytes). The other downside is that since this is a | ||
# struct, we need to be careful with mutation. | ||
struct SmallNodeList | ||
include Indexable(ASTNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: It might be worth noting that this collection type implements a subset of Array
's features, but the missing bits are not relevant to the use case.
I'm also wondering if this actually needs to implement Indexable
. Enumerable
should actually be sufficient.
The only place were indexed access is needed is for a deconstruction first, second = nodes
in type_merge
and that's easy to refactor. This would reduce overall code by removing unsafe_fetch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks! I'll make the change.
No worries. It was pretty straightforward to implement. I know that you mentioned it, but it didn't occur to me that there was a trail to follow.
Not sure it's obvious 😬 So far, I spent more time trying to figure out what's broken than actually implementing the optimisation. The good news is that this doesn't reproduce #12769. |
Turning |
Every
ASTNode
contains two arrays used for type inference and checking: dependencies and observers. By default, these are created lazily, but most active (ie. effectively typed)ASTNode
s end up creating them. Furthermore, on average both these lists contain less than 2 elements each.This PR replaces the both
Array(ASTNode)?
references inASTNode
by inlined structs that can hold two elements and a tail array for the cases where more links are needed. This reduces number of allocations, bytes allocated, number of instructions executed and running time.Some numbers from compiling the Crystal compiler itself without running codegen (since type binding occurs in the semantic phase anyway).
Running time (measured with hyperfine running with
GC_DONT_GC=1
): ~6% reductionBefore:
After:
Memory (as reported by the compiler itself, with GC): ~9.6% reduction
Before:
After:
Callgrind stats:
GC_malloc_kind
call count: 35,161,616 -> 25,684,997 (~27% reduction)