Reading list

Here is a brief list of relevant readings about GHC internals and WebAssembly suited for newcomers.

  • GHC documentation regarding the GHC API: a good read for anyone who wants to use the GHC API.

  • GHC commentary: a wiki containing a lot of additional knowledge about GHC's implementation. Keep in mind that some of the content is outdated, though. Some useful entries for this project:

    • Building guide. A tl;dr for this section is our CI scripts.
    • Overview of pipeline: we use the Hooks mechanism (specifically, runPhaseHook) to replace the default pipeline with our own, to enable manipulation of in-memory IRs.
    • How STG works: a nice tutorial with several examples of compiled code, illustrating how the generated code works under the hood.
    • The Cmm types: it's outdated and the types don't exactly match the GHC codebase now, but the explanations still shed some light on how the current Cmm types work.
    • The runtime system: content regarding the runtime system.
  • Understanding the Stack: a blog post explaining how generated code works at the assembly level. See also its sequel, Understanding the RealWorld.

  • The WebAssembly spec: a useful reference regarding what's already present in WebAssembly.

  • The binaryen C API: binaryen handles WebAssembly code generation. There are a few differences between the binaryen AST and the WebAssembly AST; the most notable ones are:

    • binaryen uses a recursive BinaryenExpression which is side-effectful. The original WebAssembly standard instead uses a stack-based model and manipulates the operand stack with instructions.

    • binaryen contains a "Relooper" which can recover high-level structured control flow from a CFG. However, the relooper doesn't handle jumps to unknown labels (aka computed goto), so we don't use it to handle tail calls.
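
The difference between a recursive expression tree and a stack-based instruction sequence can be made concrete with a small Haskell sketch. Everything in it is hypothetical and made up for illustration: the Expr and Instr types and the flatten, evalExpr and runStack functions are not part of binaryen or of any WebAssembly library, only a few arithmetic operations are modelled, and side effects are left out entirely.

```haskell
import Data.Int (Int32)

-- | A recursive, tree-shaped expression (hypothetical type, loosely in the
-- spirit of a nested expression AST; not binaryen's actual API).
data Expr
  = Const Int32
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Show, Eq)

-- | Stack-machine instructions, named after the corresponding
-- WebAssembly opcodes (hypothetical type).
data Instr
  = I32Const Int32
  | I32Add
  | I32Mul
  deriving (Show, Eq)

-- | Post-order traversal: emit both operands first, then the operator.
flatten :: Expr -> [Instr]
flatten (Const n) = [I32Const n]
flatten (Add a b) = flatten a ++ flatten b ++ [I32Add]
flatten (Mul a b) = flatten a ++ flatten b ++ [I32Mul]

-- | Evaluate the tree directly by recursion.
evalExpr :: Expr -> Int32
evalExpr (Const n) = n
evalExpr (Add a b) = evalExpr a + evalExpr b
evalExpr (Mul a b) = evalExpr a * evalExpr b

-- | Run instructions against an explicit operand stack. Returns Nothing on
-- stack underflow, or if the final stack does not hold exactly one value.
runStack :: [Instr] -> Maybe Int32
runStack = go []
  where
    go [x] [] = Just x
    go _ [] = Nothing
    go st (I32Const n : is) = go (n : st) is
    go (y : x : st) (I32Add : is) = go (x + y : st) is
    go (y : x : st) (I32Mul : is) = go (x * y : st) is
    go _ _ = Nothing

-- | (1 + 2) * 3 as a tree.
example :: Expr
example = Mul (Add (Const 1) (Const 2)) (Const 3)
```

Loading this in GHCi and evaluating flatten example shows the same computation as a flat instruction list, and runStack (flatten example) should agree with evalExpr example.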

The following entries are papers which take much more time to read, but are still quite useful for newcomers:

Finally, the GHC codebase itself is also a must-read, but since it's huge we only need to check relevant parts when unsure about its behavior. Tips on reading GHC code:

  • There are a lot of insightful and up-to-date comments which all begin with "Note [xxx]". It's a pity the notes are collected neither into the sphinx-generated documentation nor into the haddock docs of the GHC API.

  • When writing the build configuration (mk/build.mk in the make-based build system) for compiling GHC, add HADDOCK_DOCS = YES to ensure the haddock docs of the GHC API are built, and EXTRA_HADDOCK_OPTS += --quickjump --hyperlinked-source to enable symbol hyperlinks in the source pages. This will save you a lot of time otherwise spent grepping the GHC codebase.

  • Grepping is still unavoidable in some cases, since a lot of CPP is involved and haddock doesn't handle it well.