Compiler: don't always use Array for node dependencies and observers #12405
Conversation
Note: I'm still not convinced we should merge this. Maybe others can run benchmarks on their machines, with their projects, to see if there's any noticeable difference.
For fun I decided to give this a try: This is compiling my app with the compiler on master (1.6.0.dev), no warm cache:
With a warm cache:
This branch with no warm cache:
This branch with warm cache:
So it looks like it compiles with about 500 MB less memory, and about 1 minute quicker?
In your tests, two of the macro runs were fresh only in the first compilation.
I managed to introduce an abstraction for this concept of "zero, one or many." I had trouble doing that at first. We'll see if CI passes. Doing that refactor actually caught a case I had missed. If all goes well, making call args, or in general most AST nodes that hold an array of other nodes, use this type might give a really big memory save, and maybe also a performance boost. But I'd leave that for a later PR if we go forward with this one.
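The "zero, one or many" idea can be sketched like this. All names here are hypothetical, not the actual compiler code; this is a class for simplicity (an allocation-free value type would need copy-on-write semantics):

```crystal
# Holds nothing, a single element inline, or an Array, so the Array
# is only allocated once a second element shows up.
class ZeroOneOrMany(T)
  include Enumerable(T)

  @value : T | Array(T) | Nil

  def initialize
    @value = nil
  end

  def push(element : T) : Nil
    value = @value
    if value.nil?
      @value = element          # zero -> one: no Array allocated
    elsif value.is_a?(Array(T))
      value << element          # many: append to the existing Array
    else
      @value = [value, element] # one -> many: allocate the Array now
    end
  end

  def size : Int32
    value = @value
    if value.nil?
      0
    elsif value.is_a?(Array(T))
      value.size
    else
      1
    end
  end

  def each(& : T ->)
    value = @value
    if value.nil?
      # nothing to yield
    elsif value.is_a?(Array(T))
      value.each { |element| yield element }
    else
      yield value
    end
  end
end
```

Usage would look like `deps = ZeroOneOrMany(Int32).new; deps.push 1; deps.push 2; deps.to_a`, with `Enumerable` providing `to_a` and friends on top of `each`.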
With this new change, here's compiling the compiler without cache:
That's 200MB less than before! And about 3 seconds faster too. With a fully warm cache:
About 1 second less than before. And given that it was taking about 11 seconds before and now it's 10 seconds, that's like a 10% speed improvement!
I continued changing some code from
I'm sure there are many other places where "just one" is more common than "a lot", so I'll see if I can find more, but for now this change is already pretty good. Here are some stats comparing the compiler before and after this PR. All stats are with a warm cache (which makes sense because these changes mostly affect the semantic phase):
So it seems that memory goes down by about 20%, and the time goes down by about 10%. Every small win counts! @jwoertink I'd love to see your stats with the latest changes in this PR 💙
I've just tried this PR on an M1 using Mint, and I'm getting results close to @asterite's: overall about 10% faster compile time and 20% less memory. Crystal 1.5.0:
This PR:
🚀 🚀 🚀
This is with warm cache:
For me on Debian Linux, using LLVM 12, I noticed the opposite: it uses a little more RAM (~40 MB) but seems a bit faster (~1%); however, that is within the margin of error. I ran
Here are the results I got: master
opt/less-arrays
Edit: I ran them multiple times and these results are the most representative.
@j8r Thanks! I find it suspicious that the memory goes up. Just in case, here are the steps to reproduce the results:
Are you sure you ran the steps above?
What if instead of
The main issue with using
That's why I chose the name
I'll try once more to rename them to
Regarding generalizing with
If we go forward with this, I'd prefer a separate
Do you think we'll merge this before 1.6.0? And if not... ever? I have a lot more optimization PRs after this one (I already sent two of them) but I need all PRs to be merged to continue, otherwise I have to duplicate code and effort on every new optimization I try. Just asking to get an answer, not to rush anyone.
@straight-shoota Thanks for the approval! What do you think about #12426? Options:
Oh, I forgot about that one. I think it looks good, we can probably merge it into this PR before merging into master.
This PR changes how bindings and observers are stored in code.
What are bindings and observers?
When you have code like this:
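The original snippet didn't come through; a minimal reconstruction matching the bullets below, where `something` is an assumed condition:

```crystal
a = 1       # `a` is bound to the literal `1`
if something
  b = a     # in this branch, `b` is bound to `a`
else
  b = 0     # in the other branch, `b` is bound to the literal `0`
end
```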
What Crystal does is to bind nodes between each other to propagate types. In the example above:
- `a` is bound to `1`
- `b` is bound to `a` in one branch, and to `0` in another

The example above is very simple, but the idea of binding is that if a node's type changes (and that can happen in Crystal if we are inside a loop) then the types are recomputed across nodes.
These are stored bi-directionally:
- the dependencies of `b` are `a` and `0`
- one of the observers of `a` is `b`
We store dependencies and observers as arrays.
What did I find
It turns out, most of the time the dependencies and observers arrays only have one element. For example, when compiling the compiler, we get this:
That means we are allocating a lot of arrays of size 1 when we could avoid allocating all that memory. Allocating memory itself requires memory, and it also puts more pressure on the GC.
What's the change
We change `dependencies` and `observers` from `Array(ASTNode)?` to `Nil | ASTNode | Array(ASTNode)`. The middle type is the case of a single bound node. It slightly complicates the compiler's logic, but not that much.

Benchmark
This is compiling the compiler with the old compiler, no warm cache:
With a warm cache:
So around 2.3GB of memory, 22.9s without cache and 11.34s with cache.
With this PR, compiling the compiler, no warm cache:
With a warm cache:
So 2.111GB of memory, and about the same time.
I hoped the times would go down a bit more, but at least 200MB less for compiling the compiler seems good.