Skip to content

Latest commit

 

History

History
138 lines (100 loc) · 5.9 KB

readme.md

File metadata and controls

138 lines (100 loc) · 5.9 KB

GHC's field selector optimisation

While looking at implementing GHC ticket #17991 I learned about the what and why of the field selector optimisation.

What are selectors?

When compiling GHC's intermediary language STG, selectors are identified by expressions/closures that look like this:

-- STG code
case variable of
  Constructor ... x ... -> x

So selectors are case expressions with one free variable that match one constructor selecting just one field of that constructor.

Haskell expressions of that same form will most likely be compiled to similar looking STG but other source Haskell forms like

-- Haskell code, fst as defined in the Prelude
fst :: (a,b) -> a
fst (x,_) = x

Will also be compiled to STG code that matches the is selector criteria. These selector expressions are identified and compiled specially in the first equation of GHC.StgToCmm.Bind.mkRhsClosure.

Compilation of field selector closures

GHC compiles closures to various special pieces of data and code in the final executable:

  • Header
    • Closure type
    • Closure info table, describing the layout of the payload
  • Payload, pointers and values
  • Code to evaluate the closure

(This is described better and in more detail on the GHC wiki about heap objects.)

Selectors get their own special closure type called THUNK_SELECTOR (defined in includes/rts/storage/ClosureTypes.h and their own special info table and code built in handcrafted Cmm.

The Cmm code lives in rts/StgStdThunks.cmm written there by a weird amalgamation of C-preprocessor macros and the GHC specific Cmm language.

The code starts with INFO_TABLE_SELECTOR, a special directive to the Cmm parser that makes the compiler emit a thunk with the special closure type and closure layout and following it is the Cmm code that will evaluate the selectee if necessary and then pick out the required field from it. Finally it jumps to stg_ap_0_fast(field), presumably to evaluate the field contents as needed.

The Cmm code comes in two flavours upd and noupd (probably meaning update), I do not yet know what the significance of this is.

From the code in StgStdThunks.cmm we can see that the special selector thunks are only available for fields one through 16 (the GHC issue is about extending this to larger numbers). For field numbers larger than 16 GHC will currently compile each selector to a regular StgThunk closure and emit code that only applies to that closure.

Field selectors during GC

All this special machinery has a purpose! During garbage collection the RTS will recognise these special thunks and if possible evaluate them. This happens in rts/sm/Evac.c:eval_thunk_selector.

For example if the GC finds a field selector thunk like stg_sel_2_upd(evaluated_data) it will leave just the value of field 2 in place of the selector thunk once it is done. But it is even more powerful:

If there is a chain of selector thunks like stg_sel_2_upd(stg_sel_3_upd(stg_sel_4_upd(some_data))) it will leave behind just the second field of the third field of the 4th field of some_data (if there were no thunks blocking selecting).

This means that during program evaluation we might not have to run the code associated with selecting a field, seems like a big win if it works out! Pretty neat.

Limitations and solutions

As the GHC ticket and the code we've seen the special casing only kicks in if we are interested in selecting fields 1-16. The suggestion is the introduce an additional special info table that has an additional payload naming the field to select. This seems like a neat generalisation since we get just 1 (or 2, for the upd and noupd variants) extra piece of code and eliminate all "normal" thunks for all field selectors as well as potentially enabling the GC to do its magic.

The suggestion is to generate code like stg_sel_n_upd(some_data, field_offset). To implement this some things are needed:

  1. A new special INFO_TABLE_SELECTOR thunk and code to implement the stg_sel_n_* thunks. This seems quite simple: the code will be much the same as the existing selector thunks, juts grabbing the offset from the heap instead of it being hardcoded.

  2. Generate code for this new selector thunk. This is mostly about grappling with the StgToCmm part of the GHC compiler which is large but probably understandable with some time spent.

  3. Teach the RTS and GC about this new variant of the selector thunks. Seems a bit trickier, currently a StgSelector stores it's offset hardcoded in the info table and only has 1 payload. But I think it can be approached as follows:

    1. Reuse the same info table as the hardcoded selectors but put a special value in the offset field. Perhaps just a value that is bigger than the current max of 16.
    2. Extend the StgSelector payload to be an array, first element is the selectee and a potential second element is the large offset.
    3. Teach the GC about the new layout.

This seems a bit hacky but may not require too many code changes. Another possibility is to introduce a new closure type THUNK_SELECTOR_N with accompanying StgSelectorN and info table. This seems slightly more invasive but perhaps cleaner.