Compiler: don't always use Array for node dependencies and observers #12405

asterite · 2022-08-19T16:01:03Z

This PR changes how bindings and observers are stored in code.

What are bindings and observers?

When you have code like this:

a = 1
if rand < 0.5
  b = a
else
  b = 0
end

What Crystal does is to bind nodes between each other to propagate types. In the example above:

a is bound to 1
b is bound to a in one branch, and to 0 in another

The example above is very simple, but the idea of binding is that if a node's type changes (and that can happen in Crystal if we are inside a loop) then the types are recomputed across nodes.

These are stored bi-directionally:

dependencies is used to compute the type of a node. In this case the dependencies of b are a and 0
observers are used to know which nodes to propagate types in case a node's type changes. For example the observer of a is b

We store dependencies and observers as arrays.

What did I find

It turns out, most of the time the dependencies and observers arrays only have one element. For example, when compiling the compiler, we get this:

dependencies 
------------
array size: occurrences
1: 1278339
2: 137905
3: 6840
4: 1272
5: 644
6: 785
7: 1574
8: 263
...

observers
---------
array size: occurrences
1: 589519
2: 54349
3: 123616
4: 6272
5: 1004
6: 331
7: 162
8: 62

That means we are allocating a lot of arrays of size 1 when we could avoid allocating all that memory. Allocating memory well, requires memory, and also puts more pressure into the GC.

What's the change

We change dependencies and observers from Array(ASTNode)? to Nil | ASTNode | Array(ASTNode). The middle type is the case of a single bound node. It slightly complicates the compiler's logic, but not that much.

Benchmark

This is compiling the compiler with the old compiler, no warm cache:

✗ time bin/crystal build -s src/compiler/crystal.cr
Parse:                             00:00:00.000038250 (   0.83MB)
Semantic (top level):              00:00:00.317475667 ( 143.70MB)
Semantic (new):                    00:00:00.002673166 ( 143.70MB)
Semantic (type declarations):      00:00:00.040537083 ( 143.70MB)
Semantic (abstract def check):     00:00:00.015006458 ( 159.70MB)
Semantic (restrictions augmenter): 00:00:00.009682959 ( 159.70MB)
Semantic (ivars initializers):     00:00:04.800375875 (1815.61MB)
Semantic (cvars initializers):     00:00:00.005825667 (1815.61MB)
Semantic (main):                   00:00:09.014618542 (2111.61MB)
Semantic (cleanup):                00:00:00.000723083 (2111.61MB)
Semantic (recursive struct check): 00:00:00.001276250 (2111.61MB)
Codegen (crystal):                 00:00:03.229728375 (2319.61MB)
Codegen (bc+obj):                  00:00:04.390755792 (2319.61MB)
Codegen (linking):                 00:00:00.644151084 (2319.61MB)
dsymutil:                          00:00:00.382296125 (2319.61MB)

bin/crystal build -s src/compiler/crystal.cr  38.90s user 2.33s system 179% cpu 22.973 total

With a warm cache:

$ time bin/crystal build -s src/compiler/crystal.cr
Parse:                             00:00:00.000038417 (   0.83MB)
Semantic (top level):              00:00:00.315096292 ( 143.70MB)
Semantic (new):                    00:00:00.002657209 ( 143.70MB)
Semantic (type declarations):      00:00:00.041264208 ( 143.70MB)
Semantic (abstract def check):     00:00:00.015403083 ( 159.70MB)
Semantic (restrictions augmenter): 00:00:00.009669917 ( 159.70MB)
Semantic (ivars initializers):     00:00:04.605003917 (1815.61MB)
Semantic (cvars initializers):     00:00:00.005939417 (1815.61MB)
Semantic (main):                   00:00:00.824864833 (2023.61MB)
Semantic (cleanup):                00:00:00.000550917 (2023.61MB)
Semantic (recursive struct check): 00:00:00.001286000 (2023.61MB)
Codegen (crystal):                 00:00:03.887214208 (2303.61MB)
Codegen (bc+obj):                  00:00:00.500315541 (2303.61MB)
Codegen (linking):                 00:00:00.636580958 (2303.61MB)
dsymutil:                          00:00:00.378992750 (2303.61MB)
bin/crystal build -s src/compiler/crystal.cr  17.08s user 1.44s system 163% cpu 11.342 total

So around 2.3GB of memory, 22.9s without cache and 11.34s with cache.

With this PR, compiling the compiler, no warm cache:

$ time bin/crystal build -s src/compiler/crystal.cr
Using compiled compiler at .build/crystal
Parse:                             00:00:00.000299458 (   0.83MB)
Semantic (top level):              00:00:00.321059583 ( 143.78MB)
Semantic (new):                    00:00:00.002321625 ( 143.78MB)
Semantic (type declarations):      00:00:00.039610125 ( 143.78MB)
Semantic (abstract def check):     00:00:00.014762666 ( 159.78MB)
Semantic (restrictions augmenter): 00:00:00.009603708 ( 159.78MB)
Semantic (ivars initializers):     00:00:04.492760667 (1559.70MB)
Semantic (cvars initializers):     00:00:00.005612375 (1559.70MB)
Semantic (main):                   00:00:07.867504833 (1807.70MB)
Semantic (cleanup):                00:00:00.000605042 (1807.70MB)
Semantic (recursive struct check): 00:00:00.001187625 (1807.70MB)
Codegen (crystal):                 00:00:03.365013917 (2111.70MB)
Codegen (bc+obj):                  00:00:04.369402625 (2111.70MB)
Codegen (linking):                 00:00:00.651541375 (2111.70MB)
dsymutil:                          00:00:00.374684958 (2111.70MB)
bin/crystal build -s src/compiler/crystal.cr  37.83s user 2.28s system 174% cpu 22.933 total

With a warm cache:

$ time bin/crystal build -s src/compiler/crystal.cr
Using compiled compiler at .build/crystal
Parse:                             00:00:00.000036541 (   0.83MB)
Semantic (top level):              00:00:00.312257667 ( 143.78MB)
Semantic (new):                    00:00:00.002320750 ( 143.78MB)
Semantic (type declarations):      00:00:00.039073250 ( 143.78MB)
Semantic (abstract def check):     00:00:00.014990125 ( 159.78MB)
Semantic (restrictions augmenter): 00:00:00.009610959 ( 159.78MB)
Semantic (ivars initializers):     00:00:04.531297042 (1559.70MB)
Semantic (cvars initializers):     00:00:00.005627000 (1559.70MB)
Semantic (main):                   00:00:00.801419375 (1735.70MB)
Semantic (cleanup):                00:00:00.000553834 (1735.70MB)
Semantic (recursive struct check): 00:00:00.001289375 (1735.70MB)
Codegen (crystal):                 00:00:03.953488458 (2111.70MB)
Codegen (bc+obj):                  00:00:00.495159166 (2111.70MB)
Codegen (linking):                 00:00:00.632672083 (2111.70MB)
dsymutil:                          00:00:00.376506500 (2111.70MB)
bin/crystal build -s src/compiler/crystal.cr  16.43s user 1.42s system 158% cpu 11.293 total

So 2.111GB of memory, and about the same time.

I hoped the times would go down a bit more, but at least 200MB less for compiling the compiler seems good.

asterite · 2022-08-19T18:57:23Z

Note: I'm still not convinced we should merge this.

Maybe others can run benchmarks on their machines, with their projects, to see if there's any noticeable difference.

jwoertink · 2022-08-19T22:51:00Z

For fun I decided to give this a try:

This is compiling my app with the compiler on master (1.6.0.dev), no warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server

Parse:                             00:00:00.000037160 (   0.76MB)
Semantic (top level):              00:00:21.296049102 ( 652.70MB)
Semantic (new):                    00:00:00.006644233 ( 652.70MB)
Semantic (type declarations):      00:00:00.115471755 ( 652.70MB)
Semantic (abstract def check):     00:00:00.036184495 ( 684.70MB)
Semantic (restrictions augmenter): 00:00:00.030057488 ( 700.70MB)
Semantic (ivars initializers):     00:00:00.132122853 ( 780.70MB)
Semantic (cvars initializers):     00:00:00.018864900 ( 796.70MB)
Semantic (main):                   00:00:07.086655243 (2283.70MB)
Semantic (cleanup):                00:00:00.001547965 (2283.70MB)
Semantic (recursive struct check): 00:00:00.003476881 (2283.70MB)
Codegen (crystal):                 00:00:07.836129589 (3019.70MB)
Codegen (bc+obj):                  00:00:08.676941735 (3019.70MB)
Codegen (linking):                 00:00:04.986041171 (3019.70MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: 00:00:09.341495853
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: 00:00:10.235055927
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002888857)
real	0m50.348s
user	1m21.199s
sys	0m5.408s

With a warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server

Parse:                             00:00:00.000027830 (   0.76MB)
Semantic (top level):              00:00:01.079584170 ( 555.76MB)
Semantic (new):                    00:00:00.007232593 ( 555.76MB)
Semantic (type declarations):      00:00:00.112843284 ( 555.76MB)
Semantic (abstract def check):     00:00:00.034558802 ( 587.76MB)
Semantic (restrictions augmenter): 00:00:00.029456914 ( 587.76MB)
Semantic (ivars initializers):     00:00:00.133077860 ( 683.76MB)
Semantic (cvars initializers):     00:00:00.018820298 ( 683.76MB)
Semantic (main):                   00:00:06.719366274 (2124.69MB)
Semantic (cleanup):                00:00:00.001525072 (2124.69MB)
Semantic (recursive struct check): 00:00:00.003323244 (2124.69MB)
Codegen (crystal):                 00:00:10.018644245 (2815.69MB)
Codegen (bc+obj):                  00:00:04.566887130 (2815.69MB)
Codegen (linking):                 00:00:04.789592386 (2815.69MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: reused previous compilation (00:00:00.002541069)
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: reused previous compilation (00:00:00.002758590)
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002699890)

Codegen (bc+obj):
 - 4269/4412 .o files were reused

These modules were not reused:
 - a very long list of my code lol

real	0m27.632s
user	0m35.284s
sys	0m4.272s

This branch with no warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server

Parse:                             00:00:00.000033191 (   0.76MB)
Semantic (top level):              00:00:01.111511339 ( 554.45MB)
Semantic (new):                    00:00:00.007689575 ( 554.45MB)
Semantic (type declarations):      00:00:00.114648171 ( 554.45MB)
Semantic (abstract def check):     00:00:00.033110647 ( 570.45MB)
Semantic (restrictions augmenter): 00:00:00.029217100 ( 586.45MB)
Semantic (ivars initializers):     00:00:00.131192539 ( 666.45MB)
Semantic (cvars initializers):     00:00:00.018006393 ( 666.45MB)
Semantic (main):                   00:00:07.057628712 (1851.38MB)
Semantic (cleanup):                00:00:00.001507374 (1851.38MB)
Semantic (recursive struct check): 00:00:00.003049429 (1851.38MB)
Codegen (crystal):                 00:00:09.843128251 (2574.38MB)
Codegen (bc+obj):                  00:00:04.726382542 (2574.38MB)
Codegen (linking):                 00:00:04.785858078 (2574.38MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: reused previous compilation (00:00:00.002809027)
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: reused previous compilation (00:00:00.002745616)
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002840708)
real	0m27.988s
user	0m32.999s
sys	0m4.424s

This branch with warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server

Parse:                             00:00:00.000030140 (   0.76MB)
Semantic (top level):              00:00:01.083887540 ( 555.63MB)
Semantic (new):                    00:00:00.007279427 ( 555.63MB)
Semantic (type declarations):      00:00:00.114083008 ( 555.63MB)
Semantic (abstract def check):     00:00:00.033601663 ( 571.63MB)
Semantic (restrictions augmenter): 00:00:00.028796460 ( 587.63MB)
Semantic (ivars initializers):     00:00:00.128038875 ( 667.63MB)
Semantic (cvars initializers):     00:00:00.017660649 ( 667.63MB)
Semantic (main):                   00:00:06.503724956 (1852.56MB)
Semantic (cleanup):                00:00:00.001476434 (1852.56MB)
Semantic (recursive struct check): 00:00:00.002905585 (1852.56MB)
Codegen (crystal):                 00:00:09.960081052 (2591.56MB)
Codegen (bc+obj):                  00:00:00.618658292 (2591.56MB)
Codegen (linking):                 00:00:04.913773358 (2591.56MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: reused previous compilation (00:00:00.002599683)
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: reused previous compilation (00:00:00.004127867)
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002720205)

Codegen (bc+obj):
 - all previous .o files were reused

real	0m23.521s
user	0m28.763s
sys	0m3.914s

So looks like it compiles with about 500mb less, and in about 1min quicker?

HertzDevil · 2022-08-19T23:01:29Z

In your tests two of the macro runs were fresh only in the first compilation.

asterite · 2022-08-20T19:15:05Z

I managed to introduce an abstraction for this concept of "zero, one or many." I had trouble doing that at first because making push and concat work would be impossible if it's a struct. So I made the choice of explicitly asking to reassign the value if it's mutated, and named the methods with and without to make that explicit.

We'll see if CI passes.

Doing that refactor actually caught a case where @dependencies was always created with an array of a single element, so in that case there's no allocation anymore. And that case was pretty frequent too! So that reduced the memory usage a bit more.

If all goes well, making a call args, or in general most of ASTNodes that hold an array of other nodes use this type, might give a really big memory save, and maybe also a performance boost. But I'd leave that for a later PR if we go forward with this one.

asterite · 2022-08-20T19:32:04Z

With this new change, here's compiling the compiler without cache:

Parse:                             00:00:00.000235917 (   0.83MB)
Semantic (top level):              00:00:00.344559792 ( 143.53MB)
Semantic (new):                    00:00:00.002485083 ( 143.53MB)
Semantic (type declarations):      00:00:00.040520083 ( 143.53MB)
Semantic (abstract def check):     00:00:00.015642417 ( 159.53MB)
Semantic (restrictions augmenter): 00:00:00.009304917 ( 159.53MB)
Semantic (ivars initializers):     00:00:04.121719666 (1415.45MB)
Semantic (cvars initializers):     00:00:00.005593833 (1415.45MB)
Semantic (main):                   00:00:05.605343458 (1655.45MB)
Semantic (cleanup):                00:00:00.000618042 (1655.45MB)
Semantic (recursive struct check): 00:00:00.001130084 (1655.45MB)
Codegen (crystal):                 00:00:03.241560791 (1911.45MB)
Codegen (bc+obj):                  00:00:04.368373875 (1911.45MB)
Codegen (linking):                 00:00:00.636823125 (1911.45MB)
dsymutil:                          00:00:00.370766208 (1911.45MB)
bin/crystal build -s src/compiler/crystal.cr  34.58s user 2.26s system 182% cpu 20.145 total

That's 200MB less than before! And about 3 seconds faster too.

With a fully warm cache:

Parse:                             00:00:00.000032125 (   0.83MB)
Semantic (top level):              00:00:00.313381959 ( 143.53MB)
Semantic (new):                    00:00:00.002431584 ( 143.53MB)
Semantic (type declarations):      00:00:00.040245291 ( 143.53MB)
Semantic (abstract def check):     00:00:00.015619792 ( 159.53MB)
Semantic (restrictions augmenter): 00:00:00.009290666 ( 159.53MB)
Semantic (ivars initializers):     00:00:04.119975833 (1415.45MB)
Semantic (cvars initializers):     00:00:00.005418167 (1415.45MB)
Semantic (main):                   00:00:00.764760042 (1591.45MB)
Semantic (cleanup):                00:00:00.000504583 (1591.45MB)
Semantic (recursive struct check): 00:00:00.001178584 (1591.45MB)
Codegen (crystal):                 00:00:03.177530417 (1975.45MB)
Codegen (bc+obj):                  00:00:00.495698875 (1975.45MB)
Codegen (linking):                 00:00:00.626489041 (1975.45MB)
dsymutil:                          00:00:00.389252333 (1975.45MB)
bin/crystal build -s src/compiler/crystal.cr  14.94s user 1.37s system 162% cpu 10.066 total

About 1 second less than before. And given that it was taking about 11 seconds before and now it's 10 seconds, that's like a 10% speed improvement!

src/compiler/crystal/zero_one_or_many.cr

asterite · 2022-08-22T12:02:35Z

I continued changing some code from Array(T) to ZeroOneOrMany(T), specifically in these cases:

The set of methods a call resolves to (target_defs) is usually of size one. Multidispatch is possible, but it's way less frequent than no multidispatch.
Likewise, when looking up methods, it's likely that the answer will be a single method, not multiple ones.

I'm sure there are many other places where "just one" is more common than "a lot", so I'll see if I can find more, but for now this change is already pretty good.

Here are some stats comparing the compiler before and after this PR. All stats are with a warm cache (which makes sense because these changes mostly affect the semantic phase):

Compiler:
- Before: 2367.61MB, 11.330s
- After: 1927.77MB, 10.116s
All spec:
- Before: 4511.62MB, 29.386s
- After: 3583.64MB, 25.225s
Mint:
- Before: 1047.61MB, 7.367s
- After: 851.64MB, 6.749s

So it seems that memory goes down by about 20%, and the time goes down by about 10%. Every small win counts!

@jwoertink I'd love to see your stats with the latest changes in this PR 💙

sdogruyol · 2022-08-22T12:49:19Z

I've just tried this PR with an M1 using Mint and I can say that I'm getting close results to @asterite, overall of 10% faster compile time and 20% less memory

Crystal 1.5.0:

Memory: 1031.61 MB
Time: 7.927 seconds

This PR:

Memory: 851.78 MB
Time: 7.016 seconds

🚀 🚀 🚀

jwoertink · 2022-08-22T18:09:45Z

@asterite

This is with warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server
Using compiled compiler at /home/jeremy/Development/crystal/lang/.build/crystal
Parse:                             00:00:00.000028701 (   0.76MB)
Semantic (top level):              00:00:01.162010325 ( 555.73MB)
Semantic (new):                    00:00:00.007236840 ( 555.73MB)
Semantic (type declarations):      00:00:00.123850789 ( 555.73MB)
Semantic (abstract def check):     00:00:00.039450360 ( 587.73MB)
Semantic (restrictions augmenter): 00:00:00.032601763 ( 587.73MB)
Semantic (ivars initializers):     00:00:00.127918452 ( 667.73MB)
Semantic (cvars initializers):     00:00:00.018243774 ( 667.73MB)
Semantic (main):                   00:00:07.484193000 (1836.66MB)
Semantic (cleanup):                00:00:00.001705514 (1836.66MB)
Semantic (recursive struct check): 00:00:00.003096386 (1836.66MB)
Codegen (crystal):                 00:00:09.974950788 (2575.66MB)
Codegen (bc+obj):                  00:00:04.800466018 (2575.66MB)
Codegen (linking):                 00:00:05.414338341 (2575.66MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: reused previous compilation (00:00:00.002753663)
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: reused previous compilation (00:00:00.002784373)
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002700702)

Codegen (bc+obj):
 - 4269/4413 .o files were reused

These modules were not reused:
 - _main (_main.bc)
 - Process (P-rocess.bc)
 - Crystal::Signal (C-rystal5858S-ignal.bc)
 - Log::AsyncDispatcher (L-og5858A-syncD-ispatcher.bc)
 - Hash(Int32, PG::Decoders::Decoder) (H-ash40I-nt324432P-G-5858D-ecoders5858D-ecoder41.bc)
 - ASN1::BER::Length (A-S-N-15858B-E-R-5858L-ength.bc)
 - Lucky::Params (L-ucky5858P-arams.bc)
 - DB::Database (D-B-5858D-atabase.bc)
 - Avram::QueryBuilder (A-vram5858Q-ueryB-uilder.bc)
 - a lot more of my code

real	0m29.340s
user	0m35.442s
sys	0m5.911s

j8r · 2022-08-23T22:13:05Z

For me on Debian Linux, using LLVM12, I noticed the opposite: use a little more RAM (~40MB) but seems a bit faster (~1%), however it is within margin of error.

I ran rm -rf ~/.cache/crystal/; time bin/crystal build -s src/compiler/crystal.cr and used the master branch then the upstream/opt/less-arrays one.

j8r · 2022-08-23T22:22:10Z

Here are the results I got:

master

Using compiled compiler at .build/crystal
Parse:                             00:00:00.000033980 (   0.76MB)
Semantic (top level):              00:00:00.539207291 ( 140.77MB)
Semantic (new):                    00:00:00.002808200 ( 140.77MB)
Semantic (type declarations):      00:00:00.055299521 ( 156.77MB)
Semantic (abstract def check):     00:00:00.049569422 ( 156.77MB)
Semantic (restrictions augmenter): 00:00:00.018224674 ( 156.77MB)
Semantic (ivars initializers):     00:00:10.066043345 (1764.70MB)
Semantic (cvars initializers):     00:00:00.012062183 (1780.70MB)
Semantic (main):                   00:00:11.557960834 (2100.70MB)
Semantic (cleanup):                00:00:00.001043465 (2100.70MB)
Semantic (recursive struct check): 00:00:00.002456341 (2100.70MB)
Codegen (crystal):                 00:00:09.732771761 (2508.70MB)
Codegen (bc+obj):                  00:00:12.532001324 (2508.70MB)
Codegen (linking):                 00:00:05.949288418 (2508.70MB)

Macro runs:
 - /home/jrei/Desktop/Projects/crystal/crystal/src/ecr/process.cr: 00:00:08.687054383

Codegen (bc+obj):
 - no previous .o files were reused
bin/crystal build -s src/compiler/crystal.cr  87.03s user 8.29s system 187% cpu 50.767 total

opt/less-arrays

Using compiled compiler at .build/crystal
Parse:                             00:00:00.000068820 (   0.76MB)
Semantic (top level):              00:00:00.577303302 ( 140.55MB)
Semantic (new):                    00:00:00.003032252 ( 140.55MB)
Semantic (type declarations):      00:00:00.059802349 ( 156.55MB)
Semantic (abstract def check):     00:00:00.051617765 ( 156.55MB)
Semantic (restrictions augmenter): 00:00:00.020554056 ( 156.55MB)
Semantic (ivars initializers):     00:00:09.877500313 (1780.48MB)
Semantic (cvars initializers):     00:00:00.012239180 (1796.48MB)
Semantic (main):                   00:00:11.417950642 (2116.48MB)
Semantic (cleanup):                00:00:00.000847293 (2116.48MB)
Semantic (recursive struct check): 00:00:00.002358350 (2116.48MB)
Codegen (crystal):                 00:00:09.996958760 (2524.48MB)
Codegen (bc+obj):                  00:00:12.531236392 (2540.48MB)
Codegen (linking):                 00:00:05.675753104 (2540.48MB)

Macro runs:
 - /home/jrei/Desktop/Projects/crystal/crystal/src/ecr/process.cr: 00:00:08.648373085

Codegen (bc+obj):
 - no previous .o files were reused
bin/crystal build -s src/compiler/crystal.cr  88.21s user 8.35s system 191% cpu 50.469 total

Edit: I ran them multiple times and this results are the most representative.

asterite · 2022-08-24T11:56:57Z

@j8r Thanks!

I find it suspicious that the memory goes up.

Just in case, here are the steps to reproduce the results:

Checkout this PR's branch
Run bin/crystal build -s src/compiler/crystal.cr many times until the output says that all previous .o files were reused (in both your outputs that's not the case)

This output shows the times before this PR. (why does it say "using compiled compiler" in your case? That shouldn't be the case)

Run make clean crystal release=1
Run point 2 again

This output shows the times with this PR (in this case it should say "using compiled compiler")

Are you sure you ran the steps above?

Fryguy · 2022-08-29T18:41:28Z

What if instead of + it's <<? To me, the shovel operator implies that the receiver is mutated, whereas + implies that a copy is being returned. += thus implies that a copy is returned and the receiver is replaced with the copy (different than a mutation).

asterite · 2022-08-29T18:43:24Z

The main issue with using << or push is that when updating code that uses Array(T) to ZeroOneOrMany(T), code might still compile but it might not do what you expected, mainly because the type is a struct.

What's why I chose the name with to make it very explicitly that something uncommon is happening there.

+ doesn't have that much of an issue because that's expected to return a new value, even if it mutates the original one.

asterite · 2022-08-29T18:48:14Z

I'll try once more to rename them to push, concat and reject! and see how it goes. Maybe even if this is a struct it'll work fine in most cases.

straight-shoota · 2022-08-29T19:30:31Z

Regarding generalizing with OneOrMany(T) I suppose it might be an option to keep the special handling for Nil values (meaning no value as opposed to a nil value), but let it be configurable via the type parameter: OneOrMany(T) means there must be at least one value, OneOrMany(T | Nil) means there can be zero values (and the wrapper array is of type Array(T)). Not sure if this is too much of a hack or another brilliant idea 😆

asterite · 2022-09-01T18:59:21Z

Regarding generalizing with OneOrMany(T)

If we go forward with this, I'd prefer a separate OneOrMany type.

asterite · 2022-09-06T09:26:06Z

Do you think we'll merge this before 1.6.0? And if not... ever? I have a lot more optimization PRs after this one (I already sent two of them) but I need all PRs to be merged to continue, otherwise I have to duplicate code and effort on every new optimization I try.

Just asking to get an answer, not to rush anyone.

src/compiler/crystal/zero_one_or_many.cr

… is T

…o opt/less-arrays

asterite · 2022-09-06T10:56:12Z

@straight-shoota Thanks for the approval!

What do you think about #12426 ? Options:

Close that PR
Merge that PR into this one, and then merge this one
Merge this PR first, then see if we merge the other one

straight-shoota · 2022-09-06T11:04:09Z

Oh, I forgot about that one. I think it looks good, we can probably merge it into this PR before merging into master.

…es and observers (#12849)

straight-shoota · 2022-12-17T13:17:30Z

This was eventually reverted in #12849 because if broke the interpreter (#12769).

We probably stil want to bring this back with fixed behaviour. That's tracked in #12851.

…e dependencies and observers (crystal-lang#12849)

Ary Borenszweig added 3 commits August 19, 2022 12:16

Compiler: don't always use Array for node dependencies

d3a23ee

Compiler: don't always use array for observers

f50db86

Don't use splat in arrays

d656361

Blacksmoke16 added kind:refactor topic:compiler:semantic labels Aug 19, 2022

asterite added the performance label Aug 19, 2022

asterite marked this pull request as ready for review August 19, 2022 18:56

Ary Borenszweig added 5 commits August 20, 2022 16:02

Introduce the ZeroOneOrMany(T) type

6e3f4ae

Refactor: extract type_from_dependencies

bea2465

Fix while that would always be true

c910356

Remove unused method

675f266

More fixes

d1db38f

Don't use same? in ZeroOneOrMany

744554e

Ary Borenszweig added 3 commits August 20, 2022 17:23

Put condition back

9875c1d

Use ZeroOneOrMany for Call#@target_defs

4f5dc0e

Use ZeroOneOrMany in lookup_matches

881226f

HertzDevil reviewed Aug 22, 2022

View reviewed changes

src/compiler/crystal/zero_one_or_many.cr Outdated Show resolved Hide resolved

Put ZeroOneOrMany in the Crystal module

db1aae9

asterite force-pushed the opt/less-arrays branch from 41b86d0 to db1aae9 Compare August 22, 2022 11:35

asterite mentioned this pull request Aug 29, 2022

Let ZeroOneOrMany be mutable #12426

Merged

Merge branch 'master' into opt/less-arrays

471a54b

straight-shoota reviewed Sep 6, 2022

View reviewed changes

src/compiler/crystal/zero_one_or_many.cr Show resolved Hide resolved

Ary Borenszweig added 3 commits September 6, 2022 06:41

Raise if calling ZeroOneOrMany#unsafe_fetch with index != 0 and value…

8d4ca38

… is T

Merge branch 'opt/less-arrays' of github.com:crystal-lang/crystal int…

ecc33cc

…o opt/less-arrays

Raise IndexError

3d96d3e

straight-shoota approved these changes Sep 6, 2022

View reviewed changes

straight-shoota added this to the 1.6.0 milestone Sep 6, 2022

Let ZeroOneOrMany be mutable (#12426)

14170ac

asterite mentioned this pull request Sep 6, 2022

Compiler refactor: extract type_from_dependencies #12437

Merged

asterite force-pushed the opt/less-arrays branch from 2edabc5 to 14170ac Compare September 6, 2022 12:47

Merge branch 'master' into opt/less-arrays

d9d0cff

asterite merged commit 4abeeb1 into master Sep 7, 2022

asterite deleted the opt/less-arrays branch September 7, 2022 14:48

ukd1 mentioned this pull request Oct 8, 2022

Issues with HEAD and Crystal 1.6 elbywan/crystalline#46

Closed

caspiano mentioned this pull request Nov 3, 2022

BUG: Typed local variable in block raises nil assertion #7037

Closed

HertzDevil mentioned this pull request Nov 21, 2022

Interpreter broken on master when exiting naturally #12769

Closed

beta-ziliani mentioned this pull request Dec 16, 2022

Reverting #12405 Compiler: don't always use Array for node dependencies and observers #12849

Merged

straight-shoota pushed a commit that referenced this pull request Dec 17, 2022

Reverting #12405 Compiler: don't always use Array for node dependenci…

8825501

…es and observers (#12849)

straight-shoota mentioned this pull request Dec 17, 2022

Optimize allocation for node dependencies and observers in compiler #12851

Open

carlhoerberg pushed a commit to carlhoerberg/crystal that referenced this pull request Dec 21, 2022

Reverting crystal-lang#12405 Compiler: don't always use Array for nod…

a4cd354

…e dependencies and observers (crystal-lang#12849)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compiler: don't always use Array for node dependencies and observers #12405

Compiler: don't always use Array for node dependencies and observers #12405

asterite commented Aug 19, 2022 •

edited

Loading

asterite commented Aug 19, 2022

jwoertink commented Aug 19, 2022

HertzDevil commented Aug 19, 2022

asterite commented Aug 20, 2022

asterite commented Aug 20, 2022

asterite commented Aug 22, 2022

sdogruyol commented Aug 22, 2022 •

edited

Loading

jwoertink commented Aug 22, 2022

j8r commented Aug 23, 2022

j8r commented Aug 23, 2022 •

edited

Loading

asterite commented Aug 24, 2022

Fryguy commented Aug 29, 2022

asterite commented Aug 29, 2022

asterite commented Aug 29, 2022

straight-shoota commented Aug 29, 2022

asterite commented Sep 1, 2022

asterite commented Sep 6, 2022

asterite commented Sep 6, 2022

straight-shoota commented Sep 6, 2022

straight-shoota commented Dec 17, 2022

Compiler: don't always use Array for node dependencies and observers #12405

Compiler: don't always use Array for node dependencies and observers #12405

Conversation

asterite commented Aug 19, 2022 • edited Loading

What are bindings and observers?

What did I find

What's the change

Benchmark

asterite commented Aug 19, 2022

jwoertink commented Aug 19, 2022

HertzDevil commented Aug 19, 2022

asterite commented Aug 20, 2022

asterite commented Aug 20, 2022

asterite commented Aug 22, 2022

sdogruyol commented Aug 22, 2022 • edited Loading

🚀 🚀 🚀

jwoertink commented Aug 22, 2022

j8r commented Aug 23, 2022

j8r commented Aug 23, 2022 • edited Loading

asterite commented Aug 24, 2022

Fryguy commented Aug 29, 2022

asterite commented Aug 29, 2022

asterite commented Aug 29, 2022

straight-shoota commented Aug 29, 2022

asterite commented Sep 1, 2022

asterite commented Sep 6, 2022

asterite commented Sep 6, 2022

straight-shoota commented Sep 6, 2022

straight-shoota commented Dec 17, 2022

asterite commented Aug 19, 2022 •

edited

Loading

sdogruyol commented Aug 22, 2022 •

edited

Loading

j8r commented Aug 23, 2022 •

edited

Loading