WebAssembly as a Haskell compilation target
There are a few issues to address when compiling Cmm to WebAssembly.
Implementing Haskell Stack/Heap
The Haskell runtime maintains a TSO(Thread State Object) for each Haskell thread, and each TSO contains a separate stack for the STG machine. The WebAssembly platform has its own "stack" concept though; the execution of WebAssembly is based on a stack machine model, where instructions consume operands on the stack and push new values onto it.
We use the linear memory to simulate Haskell stack/heap. Popping/pushing the Haskell stack only involves loading/storing on the linear memory. Heap allocation only involves bumping the heap pointer. Running out of space will trigger a WebAssembly trap, instead of doing GC.
All discussions in the documentation use the term "stack" for the Haskell stack, unless explicitly stated otherwise.
Implementing STG machine registers
The Haskell runtime makes use of "virtual registers" like Sp, Hp or R1 to
implement the STG machine. The NCG(Native Code Generator) tries to map some of
the virtual registers to real registers when generating assembly code. However,
WebAssembly doesn't have language constructs that map to real registers, so we
simply implement Cmm local registers as WebAssembly locals, and global
registers as fields of StgRegTable.
Handling control flow
WebAssembly currently enforces structured control flow, which prohibits arbitrary branching. Also, explicit tail calls are missing.
The Cmm control flow mainly involves two forms of branching: in-function or
cross-function. Each function consists of a map from hoopl labels to basic
blocks and an entry label. Branching happens at the end of each basic block.
In-function branching is relatively easier to handle. binaryen provides a
"relooper" which can recover WebAssembly instructions with structured control
flow from a control-flow graph. Note that we're using our own relooper though,
see issue #22 for relevant
discussion.
Cross-function branching (CmmCall) is tricky. WebAssembly lacks explicit tail
calls, and the relooper can't be easily used in this case since there's a
computed goto, and potential targets include all Cmm blocks involved in
linking. There are multiple possible ways to handle this situation:
-
Collect all Cmm blocks into one function, additionally add a "dispatcher" block. All
CmmCalls save the callee to a register and branch to the "dispatcher" block, and the "dispatcher" usesbr_tableor a binary decision tree to branch to the entry block of callee. -
One WebAssembly function for one
CmmProc, and uponCmmCallthe function returns the function id of callee. A mini-interpreter function at the top level repeatedly invoke the functions usingcall_indirect. This approach is actually used by the unregisterised mode ofghc.
We're using the latter approach: every CmmProc marshals to one WebAssembly
function. This choice is tightly coupled with some other functionalities (e.g.
debug mode) and it'll take quite some effort to switch away.
Handling relocations
When producing a WebAssembly binary, we need to map CLabels to the precise
linear memory locations for CmmStatics or the precise table ids for
CmmProcs. They are unknown when compiling individual modules, so binaryen
is invoked only when linking, and during compiling we only convert CLabels to
some serializable representation.
Currently WebAssembly community has a
proposal
for linkable object format, and it's prototyped by lld. We'll probably turn
to that format and use lld some day, but right now we'll simply stick to our
own format for simplicity.
The word size story
Although wasm64 is scheduled, currently only wasm32 is implemented.
However, we are running 64-bit ghc, and there are several places which need
extra care:
- The load/store instructions operate on 64-bit addresses, yet
wasm32useuint32when indexing into the linear memory. - The
CmmSwitchlabels are 64-bit.CmmCondBranchalso checks a 64-bit condition.br_if/br_tableoperates onuint32. - Only
i32/i64is supported bywasm32value types, but in Cmm we also need arithmetic on 8-bit/16-bit integers.
We insert instructions for converting between 32/64-bits in the codegen. The
binaryen validator also helps checking bit lengths.
As for booleans: there's no native boolean type in either WebAssembly or Cmm.
As a convention we use uint32.
Pages and addresses
The WebAssembly linear memory has a hard-coded page size of 64KB. There are several places which operate in units of pages rather than raw bytes:
CurrentMemory/GrowMemoryMemorycomponent of aModule
When performing final linking, we layout static data segments to the linear
memory. We ensure the memory size is always divisible by MBLOCK_SIZE, so it's
easy to allocate new mega blocks and calculate required page count.
The first 8 bytes of linear memory (from 0x0 to 0x7) are uninitialized. 0x0 is treated as null pointer, and loading/storing on null pointer or other uninitialized regions is prohibited. In debug mode the program immediately aborts.