Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler: don't always use Array for node dependencies and observers #12405

Merged
merged 24 commits into from
Sep 7, 2022

Conversation

asterite
Copy link
Member

@asterite asterite commented Aug 19, 2022

This PR changes how bindings and observers are stored in code.

What are bindings and observers?

When you have code like this:

a = 1
if rand < 0.5
  b = a
else
  b = 0
end

What Crystal does is to bind nodes between each other to propagate types. In the example above:

  • a is bound to 1
  • b is bound to a in one branch, and to 0 in another

The example above is very simple, but the idea of binding is that if a node's type changes (and that can happen in Crystal if we are inside a loop) then the types are recomputed across nodes.

These are stored bi-directionally:

  • dependencies is used to compute the type of a node. In this case the dependencies of b are a and 0
  • observers are used to know which nodes to propagate types in case a node's type changes. For example the observer of a is b

We store dependencies and observers as arrays.

What did I find

It turns out, most of the time the dependencies and observers arrays only have one element. For example, when compiling the compiler, we get this:

dependencies 
------------
array size: occurrences
1: 1278339
2: 137905
3: 6840
4: 1272
5: 644
6: 785
7: 1574
8: 263
...

observers
---------
array size: occurrences
1: 589519
2: 54349
3: 123616
4: 6272
5: 1004
6: 331
7: 162
8: 62

That means we are allocating a lot of arrays of size 1 when we could avoid allocating all that memory. Allocating memory well, requires memory, and also puts more pressure into the GC.

What's the change

We change dependencies and observers from Array(ASTNode)? to Nil | ASTNode | Array(ASTNode). The middle type is the case of a single bound node. It slightly complicates the compiler's logic, but not that much.

Benchmark

This is compiling the compiler with the old compiler, no warm cache:

✗ time bin/crystal build -s src/compiler/crystal.cr
Parse:                             00:00:00.000038250 (   0.83MB)
Semantic (top level):              00:00:00.317475667 ( 143.70MB)
Semantic (new):                    00:00:00.002673166 ( 143.70MB)
Semantic (type declarations):      00:00:00.040537083 ( 143.70MB)
Semantic (abstract def check):     00:00:00.015006458 ( 159.70MB)
Semantic (restrictions augmenter): 00:00:00.009682959 ( 159.70MB)
Semantic (ivars initializers):     00:00:04.800375875 (1815.61MB)
Semantic (cvars initializers):     00:00:00.005825667 (1815.61MB)
Semantic (main):                   00:00:09.014618542 (2111.61MB)
Semantic (cleanup):                00:00:00.000723083 (2111.61MB)
Semantic (recursive struct check): 00:00:00.001276250 (2111.61MB)
Codegen (crystal):                 00:00:03.229728375 (2319.61MB)
Codegen (bc+obj):                  00:00:04.390755792 (2319.61MB)
Codegen (linking):                 00:00:00.644151084 (2319.61MB)
dsymutil:                          00:00:00.382296125 (2319.61MB)

bin/crystal build -s src/compiler/crystal.cr  38.90s user 2.33s system 179% cpu 22.973 total

With a warm cache:

$ time bin/crystal build -s src/compiler/crystal.cr
Parse:                             00:00:00.000038417 (   0.83MB)
Semantic (top level):              00:00:00.315096292 ( 143.70MB)
Semantic (new):                    00:00:00.002657209 ( 143.70MB)
Semantic (type declarations):      00:00:00.041264208 ( 143.70MB)
Semantic (abstract def check):     00:00:00.015403083 ( 159.70MB)
Semantic (restrictions augmenter): 00:00:00.009669917 ( 159.70MB)
Semantic (ivars initializers):     00:00:04.605003917 (1815.61MB)
Semantic (cvars initializers):     00:00:00.005939417 (1815.61MB)
Semantic (main):                   00:00:00.824864833 (2023.61MB)
Semantic (cleanup):                00:00:00.000550917 (2023.61MB)
Semantic (recursive struct check): 00:00:00.001286000 (2023.61MB)
Codegen (crystal):                 00:00:03.887214208 (2303.61MB)
Codegen (bc+obj):                  00:00:00.500315541 (2303.61MB)
Codegen (linking):                 00:00:00.636580958 (2303.61MB)
dsymutil:                          00:00:00.378992750 (2303.61MB)
bin/crystal build -s src/compiler/crystal.cr  17.08s user 1.44s system 163% cpu 11.342 total

So around 2.3GB of memory, 22.9s without cache and 11.34s with cache.

With this PR, compiling the compiler, no warm cache:

$ time bin/crystal build -s src/compiler/crystal.cr
Using compiled compiler at .build/crystal
Parse:                             00:00:00.000299458 (   0.83MB)
Semantic (top level):              00:00:00.321059583 ( 143.78MB)
Semantic (new):                    00:00:00.002321625 ( 143.78MB)
Semantic (type declarations):      00:00:00.039610125 ( 143.78MB)
Semantic (abstract def check):     00:00:00.014762666 ( 159.78MB)
Semantic (restrictions augmenter): 00:00:00.009603708 ( 159.78MB)
Semantic (ivars initializers):     00:00:04.492760667 (1559.70MB)
Semantic (cvars initializers):     00:00:00.005612375 (1559.70MB)
Semantic (main):                   00:00:07.867504833 (1807.70MB)
Semantic (cleanup):                00:00:00.000605042 (1807.70MB)
Semantic (recursive struct check): 00:00:00.001187625 (1807.70MB)
Codegen (crystal):                 00:00:03.365013917 (2111.70MB)
Codegen (bc+obj):                  00:00:04.369402625 (2111.70MB)
Codegen (linking):                 00:00:00.651541375 (2111.70MB)
dsymutil:                          00:00:00.374684958 (2111.70MB)
bin/crystal build -s src/compiler/crystal.cr  37.83s user 2.28s system 174% cpu 22.933 total

With a warm cache:

$ time bin/crystal build -s src/compiler/crystal.cr
Using compiled compiler at .build/crystal
Parse:                             00:00:00.000036541 (   0.83MB)
Semantic (top level):              00:00:00.312257667 ( 143.78MB)
Semantic (new):                    00:00:00.002320750 ( 143.78MB)
Semantic (type declarations):      00:00:00.039073250 ( 143.78MB)
Semantic (abstract def check):     00:00:00.014990125 ( 159.78MB)
Semantic (restrictions augmenter): 00:00:00.009610959 ( 159.78MB)
Semantic (ivars initializers):     00:00:04.531297042 (1559.70MB)
Semantic (cvars initializers):     00:00:00.005627000 (1559.70MB)
Semantic (main):                   00:00:00.801419375 (1735.70MB)
Semantic (cleanup):                00:00:00.000553834 (1735.70MB)
Semantic (recursive struct check): 00:00:00.001289375 (1735.70MB)
Codegen (crystal):                 00:00:03.953488458 (2111.70MB)
Codegen (bc+obj):                  00:00:00.495159166 (2111.70MB)
Codegen (linking):                 00:00:00.632672083 (2111.70MB)
dsymutil:                          00:00:00.376506500 (2111.70MB)
bin/crystal build -s src/compiler/crystal.cr  16.43s user 1.42s system 158% cpu 11.293 total

So 2.111GB of memory, and about the same time.

I hoped the times would go down a bit more, but at least 200MB less for compiling the compiler seems good.

@asterite
Copy link
Member Author

Note: I'm still not convinced we should merge this.

Maybe others can run benchmarks on their machines, with their projects, to see if there's any noticeable difference.

@jwoertink
Copy link
Contributor

For fun I decided to give this a try:

This is compiling my app with the compiler on master (1.6.0.dev), no warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server

Parse:                             00:00:00.000037160 (   0.76MB)
Semantic (top level):              00:00:21.296049102 ( 652.70MB)
Semantic (new):                    00:00:00.006644233 ( 652.70MB)
Semantic (type declarations):      00:00:00.115471755 ( 652.70MB)
Semantic (abstract def check):     00:00:00.036184495 ( 684.70MB)
Semantic (restrictions augmenter): 00:00:00.030057488 ( 700.70MB)
Semantic (ivars initializers):     00:00:00.132122853 ( 780.70MB)
Semantic (cvars initializers):     00:00:00.018864900 ( 796.70MB)
Semantic (main):                   00:00:07.086655243 (2283.70MB)
Semantic (cleanup):                00:00:00.001547965 (2283.70MB)
Semantic (recursive struct check): 00:00:00.003476881 (2283.70MB)
Codegen (crystal):                 00:00:07.836129589 (3019.70MB)
Codegen (bc+obj):                  00:00:08.676941735 (3019.70MB)
Codegen (linking):                 00:00:04.986041171 (3019.70MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: 00:00:09.341495853
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: 00:00:10.235055927
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002888857)
real	0m50.348s
user	1m21.199s
sys	0m5.408s

With a warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server

Parse:                             00:00:00.000027830 (   0.76MB)
Semantic (top level):              00:00:01.079584170 ( 555.76MB)
Semantic (new):                    00:00:00.007232593 ( 555.76MB)
Semantic (type declarations):      00:00:00.112843284 ( 555.76MB)
Semantic (abstract def check):     00:00:00.034558802 ( 587.76MB)
Semantic (restrictions augmenter): 00:00:00.029456914 ( 587.76MB)
Semantic (ivars initializers):     00:00:00.133077860 ( 683.76MB)
Semantic (cvars initializers):     00:00:00.018820298 ( 683.76MB)
Semantic (main):                   00:00:06.719366274 (2124.69MB)
Semantic (cleanup):                00:00:00.001525072 (2124.69MB)
Semantic (recursive struct check): 00:00:00.003323244 (2124.69MB)
Codegen (crystal):                 00:00:10.018644245 (2815.69MB)
Codegen (bc+obj):                  00:00:04.566887130 (2815.69MB)
Codegen (linking):                 00:00:04.789592386 (2815.69MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: reused previous compilation (00:00:00.002541069)
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: reused previous compilation (00:00:00.002758590)
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002699890)

Codegen (bc+obj):
 - 4269/4412 .o files were reused

These modules were not reused:
 - a very long list of my code lol

real	0m27.632s
user	0m35.284s
sys	0m4.272s

This branch with no warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server

Parse:                             00:00:00.000033191 (   0.76MB)
Semantic (top level):              00:00:01.111511339 ( 554.45MB)
Semantic (new):                    00:00:00.007689575 ( 554.45MB)
Semantic (type declarations):      00:00:00.114648171 ( 554.45MB)
Semantic (abstract def check):     00:00:00.033110647 ( 570.45MB)
Semantic (restrictions augmenter): 00:00:00.029217100 ( 586.45MB)
Semantic (ivars initializers):     00:00:00.131192539 ( 666.45MB)
Semantic (cvars initializers):     00:00:00.018006393 ( 666.45MB)
Semantic (main):                   00:00:07.057628712 (1851.38MB)
Semantic (cleanup):                00:00:00.001507374 (1851.38MB)
Semantic (recursive struct check): 00:00:00.003049429 (1851.38MB)
Codegen (crystal):                 00:00:09.843128251 (2574.38MB)
Codegen (bc+obj):                  00:00:04.726382542 (2574.38MB)
Codegen (linking):                 00:00:04.785858078 (2574.38MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: reused previous compilation (00:00:00.002809027)
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: reused previous compilation (00:00:00.002745616)
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002840708)
real	0m27.988s
user	0m32.999s
sys	0m4.424s

This branch with warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server

Parse:                             00:00:00.000030140 (   0.76MB)
Semantic (top level):              00:00:01.083887540 ( 555.63MB)
Semantic (new):                    00:00:00.007279427 ( 555.63MB)
Semantic (type declarations):      00:00:00.114083008 ( 555.63MB)
Semantic (abstract def check):     00:00:00.033601663 ( 571.63MB)
Semantic (restrictions augmenter): 00:00:00.028796460 ( 587.63MB)
Semantic (ivars initializers):     00:00:00.128038875 ( 667.63MB)
Semantic (cvars initializers):     00:00:00.017660649 ( 667.63MB)
Semantic (main):                   00:00:06.503724956 (1852.56MB)
Semantic (cleanup):                00:00:00.001476434 (1852.56MB)
Semantic (recursive struct check): 00:00:00.002905585 (1852.56MB)
Codegen (crystal):                 00:00:09.960081052 (2591.56MB)
Codegen (bc+obj):                  00:00:00.618658292 (2591.56MB)
Codegen (linking):                 00:00:04.913773358 (2591.56MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: reused previous compilation (00:00:00.002599683)
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: reused previous compilation (00:00:00.004127867)
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002720205)

Codegen (bc+obj):
 - all previous .o files were reused

real	0m23.521s
user	0m28.763s
sys	0m3.914s

So looks like it compiles with about 500mb less, and in about 1min quicker?

@HertzDevil
Copy link
Contributor

In your tests two of the macro runs were fresh only in the first compilation.

@asterite
Copy link
Member Author

I managed to introduce an abstraction for this concept of "zero, one or many." I had trouble doing that at first because making push and concat work would be impossible if it's a struct. So I made the choice of explicitly asking to reassign the value if it's mutated, and named the methods with and without to make that explicit.

We'll see if CI passes.

Doing that refactor actually caught a case where @dependencies was always created with an array of a single element, so in that case there's no allocation anymore. And that case was pretty frequent too! So that reduced the memory usage a bit more.

If all goes well, making a call args, or in general most of ASTNodes that hold an array of other nodes use this type, might give a really big memory save, and maybe also a performance boost. But I'd leave that for a later PR if we go forward with this one.

@asterite
Copy link
Member Author

With this new change, here's compiling the compiler without cache:

Parse:                             00:00:00.000235917 (   0.83MB)
Semantic (top level):              00:00:00.344559792 ( 143.53MB)
Semantic (new):                    00:00:00.002485083 ( 143.53MB)
Semantic (type declarations):      00:00:00.040520083 ( 143.53MB)
Semantic (abstract def check):     00:00:00.015642417 ( 159.53MB)
Semantic (restrictions augmenter): 00:00:00.009304917 ( 159.53MB)
Semantic (ivars initializers):     00:00:04.121719666 (1415.45MB)
Semantic (cvars initializers):     00:00:00.005593833 (1415.45MB)
Semantic (main):                   00:00:05.605343458 (1655.45MB)
Semantic (cleanup):                00:00:00.000618042 (1655.45MB)
Semantic (recursive struct check): 00:00:00.001130084 (1655.45MB)
Codegen (crystal):                 00:00:03.241560791 (1911.45MB)
Codegen (bc+obj):                  00:00:04.368373875 (1911.45MB)
Codegen (linking):                 00:00:00.636823125 (1911.45MB)
dsymutil:                          00:00:00.370766208 (1911.45MB)
bin/crystal build -s src/compiler/crystal.cr  34.58s user 2.26s system 182% cpu 20.145 total

That's 200MB less than before! And about 3 seconds faster too.

With a fully warm cache:

Parse:                             00:00:00.000032125 (   0.83MB)
Semantic (top level):              00:00:00.313381959 ( 143.53MB)
Semantic (new):                    00:00:00.002431584 ( 143.53MB)
Semantic (type declarations):      00:00:00.040245291 ( 143.53MB)
Semantic (abstract def check):     00:00:00.015619792 ( 159.53MB)
Semantic (restrictions augmenter): 00:00:00.009290666 ( 159.53MB)
Semantic (ivars initializers):     00:00:04.119975833 (1415.45MB)
Semantic (cvars initializers):     00:00:00.005418167 (1415.45MB)
Semantic (main):                   00:00:00.764760042 (1591.45MB)
Semantic (cleanup):                00:00:00.000504583 (1591.45MB)
Semantic (recursive struct check): 00:00:00.001178584 (1591.45MB)
Codegen (crystal):                 00:00:03.177530417 (1975.45MB)
Codegen (bc+obj):                  00:00:00.495698875 (1975.45MB)
Codegen (linking):                 00:00:00.626489041 (1975.45MB)
dsymutil:                          00:00:00.389252333 (1975.45MB)
bin/crystal build -s src/compiler/crystal.cr  14.94s user 1.37s system 162% cpu 10.066 total

About 1 second less than before. And given that it was taking about 11 seconds before and now it's 10 seconds, that's like a 10% speed improvement!

@asterite
Copy link
Member Author

I continued changing some code from Array(T) to ZeroOneOrMany(T), specifically in these cases:

  • The set of methods a call resolves to (target_defs) is usually of size one. Multidispatch is possible, but it's way less frequent than no multidispatch.
  • Likewise, when looking up methods, it's likely that the answer will be a single method, not multiple ones.

I'm sure there are many other places where "just one" is more common than "a lot", so I'll see if I can find more, but for now this change is already pretty good.

Here are some stats comparing the compiler before and after this PR. All stats are with a warm cache (which makes sense because these changes mostly affect the semantic phase):

  • Compiler:
    • Before: 2367.61MB, 11.330s
    • After: 1927.77MB, 10.116s
  • All spec:
    • Before: 4511.62MB, 29.386s
    • After: 3583.64MB, 25.225s
  • Mint:
    • Before: 1047.61MB, 7.367s
    • After: 851.64MB, 6.749s

So it seems that memory goes down by about 20%, and the time goes down by about 10%. Every small win counts!

@jwoertink I'd love to see your stats with the latest changes in this PR 💙

@sdogruyol
Copy link
Member

sdogruyol commented Aug 22, 2022

I've just tried this PR with an M1 using Mint and I can say that I'm getting close results to @asterite, overall of 10% faster compile time and 20% less memory

Crystal 1.5.0:

  • Memory: 1031.61 MB
  • Time: 7.927 seconds

This PR:

  • Memory: 851.78 MB
  • Time: 7.016 seconds

🚀 🚀 🚀

@jwoertink
Copy link
Contributor

@asterite

This is with warm cache:

❯ time /home/jeremy/Development/crystal/lang/bin/crystal build -s src/start_server.cr -o bin/start_server
Using compiled compiler at /home/jeremy/Development/crystal/lang/.build/crystal
Parse:                             00:00:00.000028701 (   0.76MB)
Semantic (top level):              00:00:01.162010325 ( 555.73MB)
Semantic (new):                    00:00:00.007236840 ( 555.73MB)
Semantic (type declarations):      00:00:00.123850789 ( 555.73MB)
Semantic (abstract def check):     00:00:00.039450360 ( 587.73MB)
Semantic (restrictions augmenter): 00:00:00.032601763 ( 587.73MB)
Semantic (ivars initializers):     00:00:00.127918452 ( 667.73MB)
Semantic (cvars initializers):     00:00:00.018243774 ( 667.73MB)
Semantic (main):                   00:00:07.484193000 (1836.66MB)
Semantic (cleanup):                00:00:00.001705514 (1836.66MB)
Semantic (recursive struct check): 00:00:00.003096386 (1836.66MB)
Codegen (crystal):                 00:00:09.974950788 (2575.66MB)
Codegen (bc+obj):                  00:00:04.800466018 (2575.66MB)
Codegen (linking):                 00:00:05.414338341 (2575.66MB)

Macro runs:
 - /home/jeremy/Sites/joysticktv/lib/teeplate/src/lib/file_tree/macros/directory.cr: reused previous compilation (00:00:00.002753663)
 - /home/jeremy/Sites/joysticktv/lib/lucky/src/run_macros/generate_asset_helpers.cr: reused previous compilation (00:00:00.002784373)
 - /home/jeremy/Development/crystal/lang/src/ecr/process.cr: reused previous compilation (00:00:00.002700702)

Codegen (bc+obj):
 - 4269/4413 .o files were reused

These modules were not reused:
 - _main (_main.bc)
 - Process (P-rocess.bc)
 - Crystal::Signal (C-rystal5858S-ignal.bc)
 - Log::AsyncDispatcher (L-og5858A-syncD-ispatcher.bc)
 - Hash(Int32, PG::Decoders::Decoder) (H-ash40I-nt324432P-G-5858D-ecoders5858D-ecoder41.bc)
 - ASN1::BER::Length (A-S-N-15858B-E-R-5858L-ength.bc)
 - Lucky::Params (L-ucky5858P-arams.bc)
 - DB::Database (D-B-5858D-atabase.bc)
 - Avram::QueryBuilder (A-vram5858Q-ueryB-uilder.bc)
 - a lot more of my code

real	0m29.340s
user	0m35.442s
sys	0m5.911s

@j8r
Copy link
Contributor

j8r commented Aug 23, 2022

For me on Debian Linux, using LLVM12, I noticed the opposite: use a little more RAM (~40MB) but seems a bit faster (~1%), however it is within margin of error.

I ran rm -rf ~/.cache/crystal/; time bin/crystal build -s src/compiler/crystal.cr and used the master branch then the upstream/opt/less-arrays one.

@j8r
Copy link
Contributor

j8r commented Aug 23, 2022

Here are the results I got:

master
Using compiled compiler at .build/crystal
Parse:                             00:00:00.000033980 (   0.76MB)
Semantic (top level):              00:00:00.539207291 ( 140.77MB)
Semantic (new):                    00:00:00.002808200 ( 140.77MB)
Semantic (type declarations):      00:00:00.055299521 ( 156.77MB)
Semantic (abstract def check):     00:00:00.049569422 ( 156.77MB)
Semantic (restrictions augmenter): 00:00:00.018224674 ( 156.77MB)
Semantic (ivars initializers):     00:00:10.066043345 (1764.70MB)
Semantic (cvars initializers):     00:00:00.012062183 (1780.70MB)
Semantic (main):                   00:00:11.557960834 (2100.70MB)
Semantic (cleanup):                00:00:00.001043465 (2100.70MB)
Semantic (recursive struct check): 00:00:00.002456341 (2100.70MB)
Codegen (crystal):                 00:00:09.732771761 (2508.70MB)
Codegen (bc+obj):                  00:00:12.532001324 (2508.70MB)
Codegen (linking):                 00:00:05.949288418 (2508.70MB)

Macro runs:
 - /home/jrei/Desktop/Projects/crystal/crystal/src/ecr/process.cr: 00:00:08.687054383

Codegen (bc+obj):
 - no previous .o files were reused
bin/crystal build -s src/compiler/crystal.cr  87.03s user 8.29s system 187% cpu 50.767 total
opt/less-arrays
Using compiled compiler at .build/crystal
Parse:                             00:00:00.000068820 (   0.76MB)
Semantic (top level):              00:00:00.577303302 ( 140.55MB)
Semantic (new):                    00:00:00.003032252 ( 140.55MB)
Semantic (type declarations):      00:00:00.059802349 ( 156.55MB)
Semantic (abstract def check):     00:00:00.051617765 ( 156.55MB)
Semantic (restrictions augmenter): 00:00:00.020554056 ( 156.55MB)
Semantic (ivars initializers):     00:00:09.877500313 (1780.48MB)
Semantic (cvars initializers):     00:00:00.012239180 (1796.48MB)
Semantic (main):                   00:00:11.417950642 (2116.48MB)
Semantic (cleanup):                00:00:00.000847293 (2116.48MB)
Semantic (recursive struct check): 00:00:00.002358350 (2116.48MB)
Codegen (crystal):                 00:00:09.996958760 (2524.48MB)
Codegen (bc+obj):                  00:00:12.531236392 (2540.48MB)
Codegen (linking):                 00:00:05.675753104 (2540.48MB)

Macro runs:
 - /home/jrei/Desktop/Projects/crystal/crystal/src/ecr/process.cr: 00:00:08.648373085

Codegen (bc+obj):
 - no previous .o files were reused
bin/crystal build -s src/compiler/crystal.cr  88.21s user 8.35s system 191% cpu 50.469 total

Edit: I ran them multiple times and this results are the most representative.

@asterite
Copy link
Member Author

@j8r Thanks!

I find it suspicious that the memory goes up.

Just in case, here are the steps to reproduce the results:

  1. Checkout this PR's branch
  2. Run bin/crystal build -s src/compiler/crystal.cr many times until the output says that all previous .o files were reused (in both your outputs that's not the case)
  • This output shows the times before this PR. (why does it say "using compiled compiler" in your case? That shouldn't be the case)
  1. Run make clean crystal release=1
  2. Run point 2 again
  • This output shows the times with this PR (in this case it should say "using compiled compiler")

Are you sure you ran the steps above?

@Fryguy
Copy link
Contributor

Fryguy commented Aug 29, 2022

What if instead of + it's <<? To me, the shovel operator implies that the receiver is mutated, whereas + implies that a copy is being returned. += thus implies that a copy is returned and the receiver is replaced with the copy (different than a mutation).

@asterite
Copy link
Member Author

The main issue with using << or push is that when updating code that uses Array(T) to ZeroOneOrMany(T), code might still compile but it might not do what you expected, mainly because the type is a struct.

What's why I chose the name with to make it very explicitly that something uncommon is happening there.

+ doesn't have that much of an issue because that's expected to return a new value, even if it mutates the original one.

@asterite
Copy link
Member Author

I'll try once more to rename them to push, concat and reject! and see how it goes. Maybe even if this is a struct it'll work fine in most cases.

@straight-shoota
Copy link
Member

Regarding generalizing with OneOrMany(T) I suppose it might be an option to keep the special handling for Nil values (meaning no value as opposed to a nil value), but let it be configurable via the type parameter: OneOrMany(T) means there must be at least one value, OneOrMany(T | Nil) means there can be zero values (and the wrapper array is of type Array(T)). Not sure if this is too much of a hack or another brilliant idea 😆

@asterite
Copy link
Member Author

asterite commented Sep 1, 2022

Regarding generalizing with OneOrMany(T)

If we go forward with this, I'd prefer a separate OneOrMany type.

@asterite
Copy link
Member Author

asterite commented Sep 6, 2022

Do you think we'll merge this before 1.6.0? And if not... ever? I have a lot more optimization PRs after this one (I already sent two of them) but I need all PRs to be merged to continue, otherwise I have to duplicate code and effort on every new optimization I try.

Just asking to get an answer, not to rush anyone.

@straight-shoota straight-shoota added this to the 1.6.0 milestone Sep 6, 2022
@asterite
Copy link
Member Author

asterite commented Sep 6, 2022

@straight-shoota Thanks for the approval!

What do you think about #12426 ? Options:

  1. Close that PR
  2. Merge that PR into this one, and then merge this one
  3. Merge this PR first, then see if we merge the other one

@straight-shoota
Copy link
Member

Oh, I forgot about that one. I think it looks good, we can probably merge it into this PR before merging into master.

@straight-shoota
Copy link
Member

This was eventually reverted in #12849 because if broke the interpreter (#12769).

We probably stil want to bring this back with fixed behaviour. That's tracked in #12851.

carlhoerberg pushed a commit to carlhoerberg/crystal that referenced this pull request Dec 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants