Asterius is a Haskell to WebAssembly compiler based on GHC. It
compiles simple Haskell source files or Cabal executable targets to
WebAssembly+JavaScript code which can be run in node.js or browsers.
It features seamless JavaScript interop (lightweight Async FFI with
Promise
support) and small output code (~600KB hello.wasm
for a
Hello World). A
lot of common Haskell packages like lens
are already supported. The
project is actively maintained by Tweag I/O.
Contributors
Asterius is maintained by Tweag I/O.
Have questions? Need help? Tweet at @tweagio.
Overview
Asterius compiles Haskell code to WebAssembly (Wasm). Its frontend is based on GHC.
The Asterius pipeline provides everything to create a Wasm instance which
exports the foreign exported functions (e.g. main
) that can be called from
JavaScript to execute the main Haskell program.
Asterius pipeline
Using prebuilt container images
We host prebuilt container images on Docker Hub under the
terrorjack/asterius
repository. The images work with podman
or
docker
.
About versioning
Whenever the master
branch gets a new commit, we trigger an image build on our
infrastructure. After the build completes, we push to the
terrorjack/asterius:latest
tag. When trying asterius
locally, it's
recommended to use terrorjack/asterius:latest
since it follows master
closely.
The images are built with the gitrev
label to indicate the exact asterius
repository revision. Use docker inspect terrorjack/asterius | grep "gitrev"
to
find out the revision info.
You may want to stick with a specific version of the prebuilt image for some
time for more reproducibility in e.g. CI builds. In that case, browse the
tags
page and use an image with a specific tag, e.g. terrorjack/asterius:200520
. We
always push a versioned tag first before we update the latest
tag.
Using the image
We recommend podman
for running containers from our prebuilt images. The
following commands are compatible with docker
as well; simply change podman
to docker
.
The images can be used interactively. Navigate to the project directory and use
the following command to start an interactive bash
session, mounting the
current directory to /workspace
. In the bash
session we can use tools like
ahc-cabal
, ahc-dist
or ahc-link
to compile the Haskell sources.
terrorjack@hostname:/project$ podman run -it --rm -v $(pwd):/workspace -w /workspace terrorjack/asterius
root@hostname:/workspace#
It's also possible to use the images in a non-interactive manner:
terrorjack@hostname:/project$ podman run --rm -v $(pwd):/workspace -w /workspace terrorjack/asterius ahc-link --input-hs example.hs
Check the reference of the docker run
command for
details. podman run
accepts most arguments of docker run
and has its own extensions.
podman
-specific tips
When using the prebuilt image with podman
, things should work out of the box
with the default configuration. Check the official installation
guide on how to install
podman
in your environment. It's likely that you'd like to use podman
with a
non-root user, in which case make sure to check the official
tutorial
for non-root users before usage.
docker
-specific tips
When using the prebuilt image with docker
, there's a file permission problem
with the default configuration: the default user in the container is root
, and
the processes will be run with the host root
user as well. So programs like
ahc-link
will create output files owned by root
in the host file system,
which is a source of annoyance. Things still work fine as long as you don't mind
manually calling chown
to fix the permissions.
The proper solution is remapping the root
user inside the container to the
current non-root user. See the docker official
userns-remap
guide and
this blog post
for further explanation.
Building guide
Building and using asterius
locally
Asterius is organized as a stack
project at the moment. The reason is mainly
historical: stack
has built-in support for managing different sandboxed GHC
installations, and we used to require a custom GHC fork to build, so using
stack
has been more convenient.
In principle, building with cabal
should also work, but this hasn't been
tested on CI yet. Some additional work is needed (checking in generated .cabal
files, setting up a cabal
project, etc) and PRs are welcome.
System dependencies
In addition to regular GHC dependencies, these dependencies are needed in the local environment:
- git
- binaryen (at least version_98)
- automake, autoconf (required by ahc-boot)
- cabal (at least v3.0.0.0)
- node, npm (at least v12)
- python3
- stack
- wasi-sdk (the WASI_SDK_PREFIX environment variable must point to the installation)
Preparing the source tree
After checking out, one needs to run a script to generate the in-tree private GHC API packages required by Asterius.
$ mkdir lib
$ pushd lib
$ ../utils/make-packages.py
$ rm -rf ghc
$ popd
The make-packages.py
script will check out our custom GHC
fork, run hadrian
to generate some
autogen files, and generate several Haskell packages in lib
. A run takes ~5min
on CI. This script only needs to be run once. After that, Asterius can be built
using vanilla GHC.
If it's inconvenient to run make-packages.py
, it's also possible to download
the generated packages from the CI artifacts. Check the CI log of a recent
commit, and one of the artifacts is named lib
. Download and unzip it in the
project root directory.
Building asterius
After checking out and running make-packages.py
, simply run stack build asterius
to build it.
After the asterius
package is built, run stack exec ahc-boot
to perform
booting. This will compile the standard libraries to WebAssembly and populate
the asterius
global package database. Some packages are compiled using
ahc-cabal
in the boot process, so an internet connection is required at least for the first
boot.
Calling executables of asterius
After the booting process completes, it's possible to use stack exec
to call
executables of asterius
, e.g. ahc-link
or ahc-cabal
. Although it's
possible to use stack install asterius
to install the executables to somewhere
in PATH
and directly call them later, this is not recommended, since the
asterius
executables rely on certain components in the PATH
set up by stack exec
.
If direnv
is enabled, then the shell session can
automatically set up the correct PATH
when navigating into the asterius
project directory. Thus it's possible to directly call ahc-boot
for booting,
ahc-link
for compiling, etc.
For trying small examples, it's convenient to put them in the test
directory
under the project root directory, since it's a .gitignore
item, so they won't
be tracked by git
.
Building and using asterius
with Docker
Using the prebuilt Docker image
The recommended way of trying asterius
is using our prebuilt Docker image on
Docker Hub. The image is updated
regularly upon new master
branch commits, and also ships ~2k prebuilt
packages from a recent stackage
snapshot, so it's convenient to test simple examples which use common
dependencies without needing to set up a cabal
project.
To use the image, mount the working directory containing the Haskell source code
as a Docker shared volume, then use the ahc-link
program:
username@hostname:~/project$ docker run --rm -it -v $(pwd):/project -w /project terrorjack/asterius
asterius@hostname:/project$ ahc-link --input-hs main.hs
Check the official
reference of docker run
to learn more about the command given in the example above. The example
opens an interactive bash
session for exploration, but it's also possible to
use docker run
to invoke the Asterius compiler on local Haskell source files.
Note that podman
can be used instead of docker
here.
Building the Docker images
The prebuilt Docker image can be reproduced by building from the in-tree
Dockerfile
s.
base.Dockerfile
can be used for building the base image. The base image
contains an out-of-the-box installation of asterius
, but doesn't come with the
additional stackage packages. There's very aggressive trimming logic in
base.Dockerfile
to make the image slimmer, so in the resulting base image,
there isn't a complete stack
project directory for asterius
, and it's not
possible to modify the Haskell logic of asterius
and partially rebuild/reboot
it given a base image.
stackage.Dockerfile
can be used for building the image containing additional
stackage packages upon the base image. Modify lts.sh
for adding/removing
packages to be built into the final image, and
ghc-toolkit/boot-libs/cabal.config
for modifying the package version
constraints. All the stackage packages are installed into the asterius
global
package database, so they can be directly used by ahc-link
, but this shouldn't
affect ahc-cabal
for installing other versions of those packages elsewhere.
The image for VSCode remote containers
dev.Dockerfile
is used to build terrorjack/asterius:dev
, which is the image
for VSCode remote containers.
Cabal support
Asterius now has preliminary Cabal support. By substituting toolchain
executables like ghc
/ghc-pkg
and supplying some other configure options,
Cabal can build static libraries and "executables" using Asterius. The
"executables" can be quickly converted to node/web artifacts using ahc-dist
.
We also provide ahc-cabal
which is a wrapper for cabal
. ahc-cabal
works
with typical nix-style commands like new-update
/new-build
, etc. The legacy
commands with v1
prefix may also work.
Using ahc-link
/ahc-dist
ahc-link
is the frontend program of Asterius. It takes a Haskell Main
module
and optionally an ES6 "entry" module as input, then emits a .wasm
WebAssembly
binary module and companion JavaScript files, which can then be run in
environments like Node.js or browsers.
ahc-dist
works similarly, except it takes the pseudo-executable file generated
from ahc-cabal
as input. All command-line arguments are the same as
ahc-link
, except ahc-link
takes --input-hs
, while ahc-dist
takes
--input-exe
.
Quick examples
Compiling a Haskell file, running the result with node
immediately: ahc-link --input-hs hello.hs --run
Compiling for browsers, bundling JavaScript modules to a single script:
ahc-link --input-hs hello.hs --browser --bundle
Compiling a Cabal executable target: ahc-cabal new-install --installdir . hello && ahc-dist --input-exe hello --run
Reference
--input-hs ARG
The Haskell Main
module's file path. This option is mandatory; all others are
optional. Works only for ahc-link
.
The Main
module may reference other local modules, as well as packages in the
asterius
global package database.
--input-exe ARG
The pseudo-executable file path. A pseudo-executable is produced by using
ahc-cabal
to compile a Cabal executable target. This works only for
ahc-dist
, and is also mandatory.
--input-mjs ARG
The ES6 "entry" module's file path. If not specified, a default entry module
will be generated, e.g. xxx.hs
's entry script will be xxx.mjs
. The entry
module can either be run by node
, or included in a <script>
tag, depending
on the target supplied at link time.
It's possible to override the default behavior by specifying your own entry module. The easiest way to write a custom entry module is to modify the default one:
import * as rts from "./rts.mjs";
import module from "./xxx.wasm.mjs";
import req from "./xxx.req.mjs";
module
.then(m => rts.newAsteriusInstance(Object.assign(req, { module: m })))
.then(i => {
i.exports.main();
});
xxx.wasm.mjs
and xxx.req.mjs
are generated at link-time. xxx.wasm.mjs
exports a default value, which is a Promise
resolving to a
WebAssembly.Module
value. xxx.req.mjs
exports the "request object"
containing app-specific data required to initialize an instance. After adding
the module
field to the request object, the result can be used as the input to
newAsteriusInstance
exported by rts.mjs
.
newAsteriusInstance
will eventually resolve to an Asterius instance object.
Using the instance object, one can call the exported Haskell functions.
--output-directory ARG
Specifies the output directory. Defaults to the same directory as --input-hs
.
--output-prefix
ARG
Specifies the prefix of the output files. Defaults to the base filename of
--input-hs
, so for xxx.hs
, we generate xxx.wasm
, xxx.req.mjs
, etc.
--verbose-err
This flag will enable more verbose runtime error messages. By default, the data segments related to runtime messages and the function name section are stripped in the output WebAssembly module for smaller binary size.
When reporting a runtime error in the asterius
issue tracker, it is
recommended to compile and run the example with --verbose-err
so there's more
info available.
--no-main
This is useful for compiling and linking a non-Main
module. This will pass
-no-hs-main
to GHC when linking, and the usual i.exports.main()
main
function won't be available.
Note that the default entry script won't work for such modules, since there
isn't an exported main
function, but it's still possible to export other
Haskell functions and call them from JavaScript; do not forget to use
--export-function=..
to specify those functions.
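For illustration, here is a hedged sketch of such a non-Main module; the module name and the mult binding are made up for this example, and the export name reuses the one from the example above:
module MyLib where

mult :: Int -> Int -> Int
mult = (*)

-- Exported to JavaScript; remember to pass --export-function=mult_hs at link time.
foreign export javascript "mult_hs" mult :: Int -> Int -> Int
It could then be compiled with something like ahc-link --input-hs MyLib.hs --no-main --export-function=mult_hs.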
--browser
Indicates the output code is targeting the browser environment. By default, the target is Node.js.
Since the runtime contains platform-specific modules, the compiled
WebAssembly/JavaScript code only works on a single specific platform. The
pseudo-executable generated by ahc
or ahc-cabal
is platform-independent
though; it's possible to compile Haskell to a pseudo-executable, and later use
ahc-dist
to generate code for different platforms.
--bundle
Instead of generating a bunch of ES6 modules in the target directory, generate a
self-contained xxx.js
script, and running xxx.js
has the same effect as
running the entry module. Only works for the browser target for now.
--bundle
is backed by webpack
under the hood and
performs minification on the bundled JavaScript file. It's likely beneficial
since it reduces the total size of scripts and doesn't require multiple requests
for fetching them.
--tail-calls
Enable the WebAssembly tail call opcodes. This requires Node.js/Chromium to be
called with the --experimental-wasm-return-call
flag.
See the "Using experimental WebAssembly features" section for more details.
--optimize-level=N
Set the optimize level of binaryen
. Valid values are 0
to 4
. The default
value is 4
.
Check the relevant source code in binaryen
for the passes enabled for
different optimize/shrink levels
here.
--shrink-level=N
Set the shrink level of binaryen
. Valid values are 0
to 2
. The default
value is 2
.
--ghc-option ARG
Specify additional ghc options. The {-# OPTIONS_GHC #-}
pragma also works.
--run
Runs the output code using node
. Ignored for browser targets.
--debug
Switch on the debug mode. The memory trap will be enabled, which replaces all load/store instructions in WebAssembly with load/store functions in JavaScript, performing aggressive validity checks on the addresses.
--yolo
Switch on the yolo mode. Garbage collection will never occur; instead, the storage manager will simply allocate more memory upon heap overflows. This is mainly used for debugging potential gc-related runtime errors.
--gc-threshold=N
Set the gc threshold value to N
MBs. The default value is 64
. The storage
manager won't perform actual garbage collection if the size of active heap
region is below the threshold.
--no-gc-sections
Do not run dead code elimination.
--export-function ARG
For each foreign export javascript
function f
that will be called, a
--export-function=f
link-time flag is mandatory.
--extra-root-symbol ARG
Specify a symbol to be added to the "root symbol set". Root symbols and their transitive dependencies will survive dead code elimination.
--output-ir
Output Wasm IRs of compiled Haskell modules and the resulting module. The IRs
aren't intended to be consumed by external tools like binaryen
/wabt
.
--console-history
The stdout
/stderr
of the runtime will preserve the already written content.
The UTF-8 decoded history content can be fetched via
i.stdio.stdout()
/i.stdio.stderr()
. These functions will also clear the
history when called.
This flag can be useful when writing headless Node.js or browser tests and the
stdout
/stderr
contents need to be compared against a file.
JavaScript FFI
Asterius implements JSFFI, which enables importing sync/async JavaScript code, and exporting static/dynamic Haskell functions. The JSFFI syntax and semantics are inspired by JSFFI in GHCJS, but they differ in certain ways.
Marshaling data between Haskell and JavaScript
Directly marshalable value types
There are mainly 3 kinds of marshalable value types which can be directly used as function arguments and return values in either JSFFI imports or exports:
- Regular Haskell value types like Int, Ptr, StablePtr, etc. When the MagicHash and UnliftedFFITypes extensions are enabled, some unboxed types like Int# are also supported.
- The JSVal type and its newtypes.
- The Any type.
The JSVal
type is exported by
Asterius.Types
.
It represents an opaque JavaScript value in the Haskell world; one can use JSFFI
imports to obtain JSVal
values, pass them across Haskell/JavaScript, store
them in Haskell data structures like ordinary Haskell values. JSVal
s are
garbage collected, but it's also possible to call freeJSVal
to explicitly free
them in the runtime.
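As a small hedged sketch (the JavaScript expression and the js_mkObj name are made up; freeJSVal is assumed to be in scope from the standard library alongside JSVal):
import Asterius.Types

foreign import javascript unsafe "({ answer: 42 })" js_mkObj :: IO JSVal

main :: IO ()
main = do
  v <- js_mkObj
  -- ... pass v around or store it like an ordinary Haskell value ...
  freeJSVal v -- explicitly drop the reference instead of waiting for GC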
The Any
type in GHC.Exts
represents a boxed Haskell value, which is a
managed pointer into the heap. This is only intended to be used by power users.
Just like regular ccall
imports/exports, the result type of javascript
imports/exports can be wrapped in IO
or not.
The JSVal
family of types
Other than JSVal
, Asterius.Types
additionally exports these types:
JSArray
JSFunction
JSObject
JSString
JSUint8Array
They are newtype
s of JSVal
and can be directly used as argument or result
types as well. The runtime doesn't perform type-checking on the JavaScript side,
e.g. it won't check if typeof $1 === "string"
when $1
is declared as a
JSString
. It's up to the users to guarantee the runtime invariants about such
JSVal
wrapper types.
User-defined newtype
s of JSVal
can also be used as marshalable value types,
as long as the newtype
constructor is available in scope.
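For example, a hedged sketch of a user-defined JSVal newtype used directly in import signatures (JSDate, js_now and js_year are made-up names; the source texts are plain JavaScript):
import Asterius.Types

newtype JSDate = JSDate JSVal

foreign import javascript unsafe "new Date()" js_now :: IO JSDate
foreign import javascript unsafe "$1.getFullYear()" js_year :: JSDate -> IO Int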
Marshaling structured data
Given the ability to pass simple value types, one can implement one's own utilities for passing a piece of structured data either from JavaScript to Haskell, or vice versa.
To build a Haskell data structure from a JavaScript value, usually we write a builder function which recursively traverses the substructure of the JavaScript value (sequence, tree, etc) and build up the Haskell structure, passing one cell at a time. Similarly, to pass a Haskell data structure to JavaScript, we traverse the Haskell data structure and build up the JavaScript value.
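As a minimal hedged sketch of such a builder for JavaScript arrays (the js_length and js_index import names are made up; their source texts are plain JavaScript):
import Asterius.Types

foreign import javascript unsafe "$1.length" js_length :: JSVal -> IO Int
foreign import javascript unsafe "$1[$2]" js_index :: JSVal -> Int -> IO JSVal

-- Build a Haskell list from a JavaScript array, one element at a time.
fromJSArrayLike :: JSVal -> IO [JSVal]
fromJSArrayLike arr = do
  n <- js_length arr
  mapM (js_index arr) [0 .. n - 1]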
The Asterius standard library provides functions for common marshaling purposes:
import Asterius.Aeson
import Asterius.ByteString
import Asterius.Text
import Asterius.Types
fromJSArray :: JSArray -> [JSVal]
toJSArray :: [JSVal] -> JSArray
fromJSString :: JSString -> String
toJSString :: String -> JSString
byteStringFromJSUint8Array :: JSUint8Array -> ByteString
byteStringToJSUint8Array :: ByteString -> JSUint8Array
textFromJSString :: JSString -> Text
textToJSString :: Text -> JSString
jsonToJSVal :: ToJSON a => a -> JSVal
jsonFromJSVal :: FromJSON a => JSVal -> Either String a
jsonFromJSVal' :: FromJSON a => JSVal -> a
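A short hedged usage sketch of these helpers (the Point type is made up; aeson's generic deriving is assumed):
{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}
import Asterius.Aeson
import Asterius.Types
import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)

data Point = Point { x :: Double, y :: Double }
  deriving (Generic, ToJSON, FromJSON)

roundTrip :: Point -> Either String Point
roundTrip = jsonFromJSVal . jsonToJSVal

main :: IO ()
main = do
  putStrLn (fromJSString (toJSString "hello")) -- String <-> JSString
  print (either (const 0) x (roundTrip (Point 1 2))) -- Haskell value <-> JSVal via aeson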
The 64-bit integer precision problem
Keep in mind that when passing 64-bit integers via Int
, Word
, etc, precision
can be lost, since they're represented by number
s on the JavaScript side. In
the future, we may consider using bigint
s instead of number
s as the
JavaScript representations of 64-bit integers to solve this issue.
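A hedged sketch of the problem (the identity round-trip import is made up): values beyond JavaScript's safe integer range may not survive the trip.
foreign import javascript unsafe "$1" js_roundTrip :: Int -> IO Int

main :: IO ()
main = do
  r <- js_roundTrip (2 ^ 60 + 1)
  print (r == 2 ^ 60 + 1) -- may print False: the value exceeds 2^53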
JSFFI imports
JSFFI import syntax
import Asterius.Types
foreign import javascript unsafe "new Date()" current_time :: IO JSVal
foreign import javascript interruptible "fetch($1)" fetch :: JSString -> IO JSVal
The source text of foreign import javascript
should be a single valid
JavaScript expression, using $n
to refer to the n
-th argument (starting from
1
). It's possible to use an IIFE (Immediately Invoked Function Expression) in the
source text, so more advanced JavaScript constructs can be used.
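For example, a hedged sketch of an IIFE import (the Haskell name is made up; the source text is plain JavaScript):
foreign import javascript unsafe "(() => { const d = new Date(); return d.getFullYear() * 100 + d.getMonth(); })()" js_yearMonth :: IO Int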
Sync/async JSFFI imports
The safety level in a foreign import javascript
declaration indicates whether
the JavaScript logic is asynchronous. When omitted, the default is unsafe
,
which means the JavaScript code will return the result synchronously. When
calling an unsafe
import, the whole runtime blocks until the result is
returned from JavaScript.
The safe
and interruptible
levels mean the JavaScript code should return a
Promise
which later resolves with the result. The current thread will be
suspended when such an import function is called, and resumed when the Promise
resolves or rejects. Other threads may continue execution when a thread is
blocked by a call to an async import.
Error handling in JSFFI imports
When calling a JSFFI import function, the JavaScript code may synchronously
throw exceptions or reject the Promise
with errors. They are wrapped as
JSException
s and thrown in the calling thread, and the JSException
s can be
handled like regular synchronous exceptions in Haskell. JSException
is also
exported by Asterius.Types
; it contains both a JSVal
reference to the
original JavaScript exception/rejection value, and a String
representation of
the error, possibly including a JavaScript stack trace.
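A hedged sketch of handling such a failure (the throwing import is made up; JSException is assumed to be an Exception instance, as implied above):
{-# LANGUAGE ScopedTypeVariables #-}
import Asterius.Types
import Control.Exception

foreign import javascript unsafe "(() => { throw new Error(\"boom\"); })()" js_boom :: IO ()

main :: IO ()
main = js_boom `catch` \(_ :: JSException) -> putStrLn "caught a JSException from the import"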
Accessing the asterius instance object
In the source text of a foreign import javascript
declaration, one can access
everything in the global scope and the function arguments. Additionally, there
is an __asterius_jsffi
binding which represents the Asterius instance object.
__asterius_jsffi
exposes certain interfaces for power users, e.g.
__asterius_jsffi.exposeMemory()
which exposes a memory region as a JavaScript
typed array. The interfaces are largely undocumented and not likely to be useful
to regular users.
There is one usage of __asterius_jsffi
which may be useful to regular users
though. Say that we'd like the JSFFI import code to call some 3rd-party library
code, but we don't want to pollute the global scope; we can assign the library
functions as additional fields of the Asterius instance object after it's
returned by newAsteriusInstance()
, then access them using __asterius_jsffi
in the JSFFI import code.
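A hedged sketch of that pattern (myLib and frobnicate are made-up names; it assumes the JavaScript side attached the library to the instance object, e.g. i.myLib = someLibrary, after newAsteriusInstance() returned):
-- Reaches a library attached to the Asterius instance object, without touching the global scope.
foreign import javascript unsafe "__asterius_jsffi.myLib.frobnicate($1)" js_frobnicate :: Int -> IO Int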
JSFFI exports
JSFFI static exports
foreign export javascript "mult_hs" (*) :: Int -> Int -> Int
The foreign export javascript
syntax can be used for exporting a static
top-level Haskell function to JavaScript. The source text is the export
function name, which must be globally unique. The supported export function
types are the same as for JSFFI imports.
For the exported functions we need to call in JavaScript, at link-time, each
exported function needs an additional --export-function
flag to be passed to
ahc-link
/ahc-dist
, e.g. --export-function=mult_hs
.
In JavaScript, after newAsteriusInstance()
returns the Asterius instance
object, one can access the exported functions in the exports
field:
const r = await i.exports.mult_hs(6, 7);
Note that all exported Haskell functions are async JavaScript functions. The
returned Promise
resolves with the result when the thread successfully
returns; otherwise it may reject with a JavaScript string, which is the
serialized form of the Haskell exception if present.
It's safe to call a JSFFI export function multiple times, or call another JSFFI
export function before a previous call resolves/rejects. The export functions
can be passed around as first-class JavaScript values, called as ordinary
JavaScript functions or indirectly as JavaScript callbacks. They can even be
imported back to Haskell as JSVal
s and called in Haskell.
JSFFI dynamic exports
import Asterius.Types
foreign import javascript "wrapper" makeCallback :: (JSVal -> IO ()) -> IO JSFunction
foreign import javascript "wrapper oneshot" makeOneshotCallback :: (JSVal -> IO ()) -> IO JSFunction
freeHaskellCallback :: JSFunction -> IO ()
The foreign import javascript "wrapper"
syntax can be used for exporting a
Haskell function closure to a JavaScript function dynamically. The type
signature must be of the form Fun -> IO JSVal
, where Fun
represents a
marshalable JSFFI function type in either JSFFI imports or static exports, and
the result can be JSVal
or its newtype
.
After declaring the "wrapper" function, one can pass a Haskell function closure
to it and obtain the JSVal
reference of the exported JavaScript function. The
exported function can be used in the same way as the JSFFI static exports.
When a JSFFI dynamic export is no longer useful, call freeHaskellCallback
to
free it. The JSVal
reference of the JavaScript callback as well as the
StablePtr
of the Haskell closure will be freed.
Sometimes, we expect a JSFFI dynamic export to be one-shot, called
only once. For such one-shot exports, use foreign import javascript "wrapper oneshot"
. The runtime will automatically free the resources once the exported
JavaScript function is invoked, and there'll be no need to manually call
freeHaskellCallback
for one-shot exports.
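A hedged usage sketch tying the pieces together (the js_setTimeout import is made up; makeCallback is the "wrapper" import declared above):
import Asterius.Types

foreign import javascript "wrapper" makeCallback :: (JSVal -> IO ()) -> IO JSFunction
foreign import javascript unsafe "setTimeout($1, 1000)" js_setTimeout :: JSFunction -> IO ()

main :: IO ()
main = do
  cb <- makeCallback (\_ -> putStrLn "tick")
  js_setTimeout cb
  -- when the callback is no longer needed: freeHaskellCallback cb
  -- (or declare a "wrapper oneshot" import and let the runtime free it)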
Template Haskell
We added
hooks
to these iserv
-related functions:
startIServ
stopIServ
iservCall
readIServ
writeIServ
The hook of hscCompileCoreExpr
is also used. The implementations of the hooks
are in
Asterius.GHCi.Internals
Normally, startIServ
and stopIServ
start/stop the current iserv
process.
We don't use the normal iserv
library for iserv
though; we use
inline-js-core
to start a node process. inline-js-core
has its own mechanism
of message passing between host/node, which is used for sending JavaScript code
to node for execution and getting results. In the case of TH, the linked
JavaScript and WebAssembly code is sent. Additionally, we create POSIX pipes and
pass the file descriptors as environment variables to the sent code; so most TH
messages are still passed via the pipes, like normal iserv
processes.
The iservCall
function is used for sending a Message
to iserv
and
synchronously getting the result. The sent messages are related to linking, like
loading archives and objects. Normally, linking is handled by the iserv
process, since it's linked with GHC's own runtime linker. In our case, porting
GHC's runtime linker to WebAssembly is going to be a huge project, so we still
perform TH linking in the host ahc
process. The linking messages aren't sent
to node at all; using the hooked iservCall
, we maintain our own in-memory
linker state which records information like the loaded archives and objects.
When splices are executed, GHC first emits a RunTH
message, then repeatedly
queries the response message from iserv
; if it's a RunTHDone
, then the dust
settles and GHC reads the execution result. The response message may also be a
query to GHC, then GHC sends back the query result and repeats the loop. In our
case, we don't send the RunTH
message itself to node; RunTH
indicates
execution has begun, so we perform linking, and use inline-js-core
to load the
linked JavaScript and WebAssembly code, then create and initialize the Asterius
instance object. The splice's closure address is known at link time, so we can
apply the TH runner's function closure to the splice closure, and kick off
evaluation from there. The TH runner function creates a fresh IORef QState
, a
Pipe
from the passed in pipe file descriptors, and uses ghci
library's own
runTH
function to run the splice. During execution, the Quasi
class methods
may be called, and on the node side, they are turned to THMessage
s sent back
to the host via the Pipe
, and the responses are then fetched.
Our function signatures of readIServ
and writeIServ
are modified. Normal GHC
simply uses Get
and Put
in the binary
library for reading/writing via the
Pipe
, but we simply read/write a polymorphic type variable a
, with Binary
and Typeable
constraints. Having a Binary
constraint allows fetching the
needed get
and put
functions, and Typeable
allows us to inspect the
message pre-serialization. This is important, since we need to catch RunTH
or
RunModFinalizer
messages. As mentioned before, these messages aren't sent to
node, and we have special logic to handle them.
As for hscCompileCoreExpr
: it's used for compiling the CoreExpr
of a splice
and getting the resulting RemoteHValue
. We don't support GHC bytecode, so we
overload it and go through the regular pipeline, compile it down to Cmm, then
WebAssembly, finally performing linking, using the closures of the TH runner
function and the splice as "root symbols". The resulting RemoteHValue
is not
"remote" though; it's simply the static address of the splice's closure, and the
TH runner function will need to encapsulate it as a RemoteRef
before feeding
to runTH
.
TH WIP branch:
asterius-TH
GitHub Project with relevant issues
Invoking RTS API in JavaScript
For the brave souls who prefer to play with raw pointers instead of syntactic sugar, it's possible to invoke RTS API directly in JavaScript. This grants us the ability to:
- Allocate memory, create and inspect Haskell closures on the heap.
- Trigger Haskell evaluation, then retrieve the results back into JavaScript.
- Use raw Cmm symbols to summon any function, not limited to the "foreign exported" ones.
Here is a simple example. Suppose we have a Main.fact
function:
fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n - 1)
The first step is ensuring fact
is actually contained in the final
WebAssembly binary produced by ahc-link
. ahc-link
performs aggressive
dead-code elimination (or more precisely, live-code discovery) by starting from
a set of "root symbols" (usually Main_main_closure
which corresponds to
Main.main
), repeatedly traversing ASTs and including any discovered symbols.
So if Main.main
does not have a transitive dependency on fact
, fact
won't
be included into the binary. In order to include fact
, either use it in some
way in main
, or supply --extra-root-symbol=Main_fact_closure
flag to
ahc-link
when compiling.
The next step is locating the pointer of fact
. The "Asterius instance" type
we mentioned before contains two "symbol map" fields: staticsSymbolMap
maps
static data symbols to linear memory absolute addresses, and
functionSymbolMap
maps function symbols to WebAssembly function table
indices. In this case, we can use i.staticsSymbolMap.Main_fact_closure
as the
pointer value of Main_fact_closure
. For a Haskell top-level function,
there're also pointers to the info table/entry function, but we don't need
those two in this example.
Since we'd like to call fact
, we need to apply it to an argument, build a
thunk representing the result, then evaluate the thunk to WHNF and retrieve the
result. Assuming we're passing --asterius-instance-callback=i=>{ ... }
to
ahc-link
, in the callback body, we can use RTS API like this:
const argument = i.exports.rts_mkInt(5);
const thunk = i.exports.rts_apply(i.staticsSymbolMap.Main_fact_closure, argument);
const tid = i.exports.rts_eval(thunk);
console.log(i.exports.rts_getInt(i.exports.getTSOret(tid)));
A line-by-line explanation follows:
- Assuming we'd like to calculate fact 5, we need to build an Int object whose value is 5. We can't directly pass the JavaScript 5; instead we should call rts_mkInt, which properly allocates a heap object and sets up the info pointer of an Int value. When we need to pass a value of a basic type (e.g. Int, StablePtr, etc), we should always call rts_mk* and use the returned pointers to the allocated heap objects.
- Then we can apply fact to 5 by using rts_apply. It builds a thunk without triggering evaluation. If we are dealing with a curried multiple-argument function, we should chain rts_apply repeatedly until we get a thunk representing the final result.
- Finally, we call rts_eval, which enters the runtime and performs all the evaluation for us. There are different types of evaluation functions: rts_eval evaluates a thunk of type a to WHNF. rts_evalIO evaluates the result of IO a to WHNF. rts_evalLazyIO evaluates IO a, without forcing the result to WHNF. It is also the default evaluator used by the runtime to run Main.main.
- All rts_eval* functions initiate a new Haskell thread for evaluation, and they return a thread ID. The thread ID is useful for inspecting whether or not evaluation succeeded and what the result is.
- If we need to retrieve the result back to JavaScript, we must pick an evaluator function which forces the result to WHNF. The rts_get* functions assume the objects are evaluated and won't trigger evaluation.
- Assuming we stored the thread ID in tid, we can use getTSOret(tid) to retrieve the result. The result is always a pointer into the Haskell heap, so additionally we need to use rts_getInt to retrieve the unboxed Int content to JavaScript.
Most users probably don't need to use RTS API manually, since the foreign import
/export
syntactic sugar and the makeHaskellCallback
interface should
be sufficient for typical use cases of Haskell/JavaScript interaction. Still, it won't hurt to know what's hidden beneath the syntactic sugar: foreign import/export is implemented by automatically generating stub WebAssembly functions which call the RTS API for you.
IR types and transformation passes
This section explains various IR types in Asterius, and hopefully presents a
clear picture of how information flows from Haskell to WebAssembly. (There's a
similar section in jsffi.md
which explains implementation details of JSFFI)
Cmm IR
Everything starts from Cmm, or more specifically, "raw" Cmm which satisfies:
-
All calls are tail calls, parameters are passed by global registers like R1 or on the stack.
-
All info tables are converted to binary data segments.
Check Cmm
module in ghc
package to get started on Cmm.
Asterius obtains in-memory raw Cmm via:
-
cmmToRawCmmHook
in our custom GHC fork. This allows us to get our hands on the Cmm generated by compiling either Haskell modules or .cmm
files (which are in rts
) -
There is some abstraction in
ghc-toolkit
, the compiler logic is actually in theCompiler
datatype as some callbacks, andghc-toolkit
converts them to hooks, frontend plugins andghc
executable wrappers.
There is one minor annoyance with the Cmm types in GHC (or any other GHC IR type): it's very hard to serialize/deserialize them without setting up complicated contexts related to package databases, etc. To experiment with new backends, it's reasonable to marshal to a custom serializable IR first.
Pre-linking expression IR
We then marshal raw Cmm to an expression IR defined in Asterius.Types
. Each
compilation unit (Haskell module or .cmm
file) maps to one AsteriusModule
,
and each AsteriusModule
is serialized to a .asterius_o
object file which
will be deserialized at link time. Since we serialize/deserialize a structured
expression IR faithfully, it's possible to perform aggressive LTO by
traversing/rewriting IR at link time, and that's what we're doing right now.
The expression IR is mostly a Haskell modeling of a subset of binaryen
's
expression IR, with some additions:
- Unresolved-related variants, which allow us to use a symbol as an expression. At link time, the symbols are rewritten to absolute addresses.
- Unresolved locals/globals. At link time, unresolved locals are laid out to Wasm locals, and unresolved globals (which are really just Cmm global regs) become fields in the global Capability's StgRegTable.
- EmitErrorMessage, as a placeholder for emitting a string error message then trapping. At link time, such error messages are collected into an "error message pool", and the Wasm code is just "calling some error message reporting function with an array index".
- Null. We're civilized, educated functional programmers and should really be using Maybe Expression in some fields instead of adding a Null constructor, but this is just handy. Blame me.
It's possible to encounter things we can't handle in Cmm (unsupported primops,
etc). So AsteriusModule
also contains compile-time error messages when
something isn't supported, but the errors are not reported; instead they are
deferred to runtime error messages. (Ideally link-time, but it turns out to be
hard)
The symbols are simply converted to Z-encoded strings that also contain module prefixes, and they are assumed to be unique across different compilation units.
The store
There's an AsteriusStore
type in Asterius.Types
. It's an immutable data
structure that maps symbols to underlying entities in the expression IR for
every single module, and is a critical component of the linker.
Modeling the store as a self-contained data structure makes it pleasant to
write linker logic, at the cost of exploding RAM usage. So we implemented a
poor man's KV store in Asterius.Store
which performs lazy-loading of modules:
when initializing the store, we only load the symbols, but not the actual
modules; only when a module is "requested" for the first time, we perform
deserialization for that module.
AsteriusStore
supports merging. It's a handy operation, since we can first
initialize a "global" store that represents the standard libraries, then make
another store based on compiling user input, simply merge the two and we can
start linking from the output store.
Post-linking expression IR
At link time, we take AsteriusStore
which contains everything (standard
libraries and user input code), then perform live-code discovery: starting
from a "root symbol set" (something like Main_main_closure
), iteratively
fetch the entity from the store, traverse the AST and collect new symbols. When
we reach a fixpoint, that fixpoint is the outcome of dependency analysis,
representing a self-contained Wasm module.
We then do some rewriting work on the self contained module: making symbol
tables, rewriting symbols to absolute addresses, using our own relooper to
convert from control-flow graphs to structured control flow, etc. Most of the
logic is in Asterius.Resolve
.
The output of the linker is a Module
. It differs from AsteriusModule
, and
although it shares quite a few datatypes with AsteriusModule
(for example,
Expression
), it guarantees that some variants will not appear (for example,
Unresolved*
). A Module
is ready to be fed to a backend which emits real
Wasm binary code.
There are some useful linker byproducts. For example, there's LinkReport
which contains mappings from symbols to addresses which will be lost in Wasm
binary code, but is still useful for debugging.
Generating binary code via binaryen
Once we have a Module
(which is essentially just Haskell modeling of binaryen
C API), we can invoke binaryen to validate it and generate Wasm binary code.
The low-level bindings are maintained in the binaryen
package, and
Asterius.Marshal
contains the logic to call the imported functions to do
actual work.
Generating binary code via wasm-toolkit
We can also convert Module
to IR types of wasm-toolkit
, which is our native
Haskell Wasm engine. It's now the default backend of ahc-link
, but the
binaryen backend can still be chosen by ahc-link --binaryen
.
Generating JavaScript stub script
To make it actually run in Node.js/Chrome, we need two pieces of JavaScript code:
- Common runtime which can be reused across different Asterius-compiled modules. It's in asterius/rts/rts.js.
- Stub code which contains specific information like error messages, etc.
The linker generates the stub script along with the Wasm binary code, and concatenates the
runtime and the stub script into a self-contained JavaScript file which can be
run or embedded. It's possible to specify JavaScript "target" to either Node.js
or Chrome via ahc-link
flags.
The runtime debugging feature
There is a runtime debugging mode which can be enabled by the --debug
flag
for ahc-link
. When enabled, the compiler inserts "tracing" instructions in
the following places:
- The start of a function/basic block
- SetLocal, when the local type is I64
- Memory load/stores, when the value type is I64
The tracing messages are quite helpful in observing control flow transfers and
memory operations. Remember to also use the --output-link-report
flag to dump
the linking report, which contains mapping from data/function symbols to
addresses.
The runtime debugging mode also enables a "memory trap" which intercepts every memory load/store instruction and checks if the address is null pointer or other uninitialized regions of the linear memory. The program immediately aborts if an invalid address is encountered. (When debugging mode is switched off, program continues execution and the rest of control flow is all undefined behavior!)
Virtual address spaces
Remember that we're compiling to wasm32
which has a 32-bit address space, but
the host GHC is actually 64-bit, so all pointers in Asterius are 64 bits wide, and
upon load
/store
/call_indirect
, we truncate the 64-bit pointer, using only
the lower 32-bits for indexing.
The higher 32-bits of pointers are idle tag bits at our disposal, so, we implemented simple virtual address spaces. The linker/runtime is aware of the distinction between:
- The physical address, which is either an i32 index of the linear memory for data, or an i32 index of the table for functions.
- The logical address, which is the i64 pointer value we're passing around.
All access to the memory/table is achieved by using the logical address. The
access operations are accompanied by a mapping operation which translates a
logical address to a physical one. Currently it's just a truncate, but in
the future we may get a more feature-complete mmap
/munmap
implementation,
and some additional computation may occur when address translation is done.
We chose two magic numbers (in Asterius.Internals.MagicNumber
) as the tag
bits for data/function pointers. The numbers are chosen so that when applied,
the logical address does not exceed JavaScript's safe integer limit.
When we emit debug log entries, we may encounter various i64
values. We
examine the higher 32-bits, and if it matches the pointer tag bits, we do a
lookup in the data/function symbol table, and if there's a hit, we output the
symbol along with the value. This spares us the pain of keeping a lot of symbol/address
mappings in our working memory when examining the debug logs. Some false
positives (e.g. some random intermediate i64
value in a Haskell computation
accidentally collides with a logical address) may exist in theory, but the
probability should be very low.
Note that for consistency between vanilla/debug mode, the virtual address spaces are in effect even in vanilla mode. This won't add extra overhead, since the truncate instruction for 64-bit addresses has been present since the beginning.
Complete list of emitted debugging log entries
- Assertions: some hand-written WebAssembly functions in Asterius.Builtins contain assertions which are only active in debugging mode. Failure of an assertion causes a string error message to be printed and the whole execution flow to abort.
- Memory traps: In Asterius.MemoryTrap, we implement a rewriting pass which rewrites all load/store instructions into invocations of load/store wrapper functions. The wrapper functions are defined in Asterius.Builtins; they check the address and trap if it's an invalid one (null pointer, uninitialized region, etc).
- Control-flow: In Asterius.Tracing, we implement a rewriting pass on functions (which is later invoked at link time in Asterius.Resolve), which emits messages when:
  - Entering a Cmm function.
  - Entering a basic block. To make sense of block ids, you need to dump pre-linking IRs (which aren't processed by the relooper yet, and preserve the control-flow graph structure).
  - Assigning a value to an i64 local. To make sense of local ids, dump IRs. Also note that the local ids here don't match the actual local ids in Wasm binary code (there is a re-mapping upon serialization), but it shouldn't be a problem since we are debugging the higher level IR here.
Dumping IRs
There are multiple ways to dump IRs:
- Via GHC flags: GHC flags like -ddump-to-file -ddump-cmm-raw dump pretty-printed GHC IRs to files.
- Via environment variable: Set the ASTERIUS_DEBUG environment variable, then during booting, a number of IRs (mainly raw Cmm in its AST form, instead of pretty-printed form) will be dumped.
- Via ahc-link flag: Use ahc-link --output-ir to dump IRs when compiling user code.
High-level architecture
The asterius
project is hosted at
GitHub. The monorepo contains several
packages:
- asterius. This is the central package of the asterius compiler.
- binaryen. It contains the latest source code of the C++ library binaryen in tree, and provides complete raw bindings to its C API.
- ghc-toolkit. It provides a framework for implementing Haskell-to-X compilers by retrieving ghc's various types of in-memory intermediate representations. It also contains the latest source code of ghc-prim/integer-gmp/integer-simple/base in tree.
- wasm-toolkit. It implements the WebAssembly AST and binary encoder/decoder in Haskell, and is now the default backend for generating WebAssembly binary code.
The asterius
package provides an ahc
executable which is a drop-in
replacement of ghc
to be used with Setup configure
. ahc
redirects all
arguments to the real ghc
most of the time, but when it's invoked with the
--make
major mode, it invokes ghc
with its frontend plugin. This is
inspired by Edward Yang's
How to integrate GHC API programs with Cabal.
Based on ghc-toolkit
, asterius
implements a
ghc
frontend plugin
which translates
Cmm to
binaryen
IR. The serialized binaryen
IR can then be loaded and linked to a
WebAssembly binary (not implemented yet). The normal compilation pipeline which
generates native machine code is not affected.
About "booting"
In order for asterius
to support non-trivial Haskell programs (that is, at
least most things in Prelude
), it needs to run the compilation process for
base
and its dependent packages. This process is known as "booting".
The asterius
package provides an ahc-boot
test suite which tests booting by
compiling the wired-in packages provided by ghc-toolkit
and using ahc
to
replace ghc
when configuring. This is inspired by Joachim Breitner's
veggies
.
Writing WebAssembly code in Haskell
In Asterius.Builtins
, there are WebAssembly shims which serve as our runtime.
We choose to write WebAssembly code in Haskell, using Haskell as our familiar
meta-language.
As of now, there are two ways of writing WebAssembly code in Haskell. The first
way is directly manipulating AST types as specified in Asterius.Types
. Those
types are pretty bare-metal and map closely to binaryen IR. Simply write some
code to generate an AsteriusFunction
, and ensure the function and its symbol
are present in the store when linking starts. It will eventually be bundled into the
output WebAssembly binary file.
Directly using Asterius.Types
is not a pleasant experience: it's basically a
DDoS on one's working memory, since the developer needs to keep a lot of things
in mind: parameter/local ids, block/loop labels, etc. Also, the resulting
Haskell code is pretty verbose, littered with syntactic noise (e.g. tons of
list concats when constructing a block)
We now provide an EDSL in Asterius.EDSL
to construct an AsteriusFunction
.
Its core type is EDSL a
, and can be composed with a Monad
or Monoid
interface. Most builtin functions in Asterius.Builtins
are already refactored
to use this EDSL. Typical usages:
- "Allocate" a parameter/local. Use param or local to obtain an immutable Expression which corresponds to the value of a new parameter/local. There are also mutable variants.
- An opaque LVal type is provided to uniformly deal with local reads/assignments and memory loads/stores. Once an LVal is instantiated, it can be used to read an Expression in the pure world, or set an Expression in the EDSL monad.
- Several side-effecting instructions can simply be composed with the monadic/monoidal interface, without the need to explicitly construct an anonymous block.
- When we need named blocks/loops with branching instructions inside, use the block/loop combinators, which have the type (Label -> EDSL ()) -> EDSL (). Inside the passed-in continuation, we can use break' to perform branching. The Label type is also opaque and cannot be inspected; the only thing we know is that it's scope-checked just like any ordinary Haskell value, so it's impossible to accidentally branch to an "inner" label.
The EDSL only checks for scope safety, so we don't mess up different locals or jump to non-existent labels. Type safety is not guaranteed (the binaryen validator checks for it anyway). Underneath, it's just a shallowly embedded DSL implemented with a plain old state monad. Some people call it the "remote monad design pattern".
WebAssembly as a Haskell compilation target
There are a few issues to address when compiling Cmm to WebAssembly.
Implementing Haskell Stack/Heap
The Haskell runtime maintains a TSO (Thread State Object) for each Haskell thread, and each TSO contains a separate stack for the STG machine. The WebAssembly platform has its own "stack" concept though; the execution of WebAssembly is based on a stack machine model, where instructions consume operands on the stack and push new values onto it.
We use the linear memory to simulate Haskell stack/heap. Popping/pushing the Haskell stack only involves loading/storing on the linear memory. Heap allocation only involves bumping the heap pointer. Running out of space will trigger a WebAssembly trap, instead of doing GC.
All discussions in the documentation use the term "stack" for the Haskell stack, unless explicitly stated otherwise.
Implementing STG machine registers
The Haskell runtime makes use of "virtual registers" like Sp, Hp or R1 to
implement the STG machine. The NCG (Native Code Generator) tries to map some of
the virtual registers to real registers when generating assembly code. However,
WebAssembly doesn't have language constructs that map to real registers, so we
simply implement Cmm local registers as WebAssembly locals, and global
registers as fields of StgRegTable
.
Handling control flow
WebAssembly currently enforces structured control flow, which prohibits arbitrary branching. Also, explicit tail calls are missing.
The Cmm control flow mainly involves two forms of branching: in-function or
cross-function. Each function consists of a map from hoopl
labels to basic
blocks and an entry label. Branching happens at the end of each basic block.
In-function branching is relatively easy to handle. binaryen
provides a
"relooper" which can recover WebAssembly instructions with structured control
flow from a control-flow graph. Note that we're using our own relooper though,
see issue #22 for relevant
discussion.
Cross-function branching (CmmCall
) is tricky. WebAssembly lacks explicit tail
calls, and the relooper can't be easily used in this case since there's a
computed goto, and potential targets include all Cmm blocks involved in
linking. There are multiple possible ways to handle this situation:
- Collect all Cmm blocks into one function, and additionally add a "dispatcher" block. All CmmCalls save the callee to a register and branch to the "dispatcher" block, and the "dispatcher" uses br_table or a binary decision tree to branch to the entry block of the callee.
- One WebAssembly function for one CmmProc; upon CmmCall the function returns the function id of the callee. A mini-interpreter function at the top level repeatedly invokes the functions using call_indirect. This approach is actually used by the unregisterised mode of ghc.
We're using the latter approach: every CmmProc
marshals to one WebAssembly
function. This choice is tightly coupled with some other functionalities (e.g.
debug mode) and it'll take quite some effort to switch away.
Handling relocations
When producing a WebAssembly binary, we need to map CLabel
s to the precise
linear memory locations for CmmStatics
or the precise table ids for
CmmProc
s. They are unknown when compiling individual modules, so binaryen
is invoked only when linking, and during compiling we only convert CLabel
s to
some serializable representation.
Currently the WebAssembly community has a
proposal
for a linkable object format, and it's prototyped by lld
. We'll probably turn
to that format and use lld
some day, but right now we'll simply stick to our
own format for simplicity.
The word size story
Although wasm64
is scheduled, currently only wasm32
is implemented.
However, we are running 64-bit ghc
, and there are several places which need
extra care:
- The load/store instructions operate on 64-bit addresses, yet wasm32 uses uint32 when indexing into the linear memory.
- The CmmSwitch labels are 64-bit. CmmCondBranch also checks a 64-bit condition. br_if/br_table operate on uint32.
- Only i32/i64 are supported as wasm32 value types, but in Cmm we also need arithmetic on 8-bit/16-bit integers.
We insert instructions for converting between 32/64-bits in the codegen. The
binaryen
validator also helps checking bit lengths.
As for booleans: there's no native boolean type in either WebAssembly or Cmm.
As a convention we use uint32
.
Pages and addresses
The WebAssembly linear memory has a hard-coded page size of 64KB. There are several places which operate in units of pages rather than raw bytes:
- CurrentMemory/GrowMemory
- The Memory component of a Module
When performing final linking, we lay out static data segments in the linear
memory. We ensure the memory size is always divisible by MBLOCK_SIZE
, so it's
easy to allocate new mega blocks and calculate required page count.
The first 8 bytes of linear memory (from 0x0 to 0x7) are uninitialized. 0x0 is treated as null pointer, and loading/storing on null pointer or other uninitialized regions is prohibited. In debug mode the program immediately aborts.
Using experimental WebAssembly features
By default, Asterius only emits code that uses WebAssembly MVP features. There are flags to make use of WebAssembly experimental features:
- --tail-calls: Emits tail call opcodes for Cmm function calls; overrides the default trampoline approach. Only supported by the wasm-toolkit backend at the moment.
- --debug: Uses i64 BigInt integration for passing i64 values between js/wasm.
The above features require specific V8 flags to be switched on. They are known to work in the latest Node.js 12.x versions, and we test them on CI.
The V8 team maintains a Node.js 13.x build which integrates V8 trunk, described
here. It's possible to use that
build to evaluate experimental WebAssembly features; we provide a
script which
unzips the latest test-passing build to the current directory, so it's possible
to use the node
binary for testing bleeding-edge Wasm features in V8.
We are keeping an eye on the development of experimental WebAssembly features. Here is a list of V8 tracking issues of the features we are interested in. Some are already available in recent Node.js or Chromium releases.
- WebAssembly SIMD
- WebAssembly Multi-value
- WebAssembly nontrapping float-to-int conversions
- Tail call opcodes
- Reference types
- WebAssembly i64 BigInt integration
- WebAssembly JS Reflection API
- WebAssembly Bulk Memory
- Garbage Collection
- Exception handling
Hacking guide
Using VSCode remote containers
We recommend using VSCode Remote Containers to reproduce the very same dev environment used by our core team members. The steps to set up the dev environment are:
- Do a local clone of the asterius repo
- Install VSCode (at least 1.45) and its remote extension
- Install podman, and make sure the podman command works with the current user
- Set up a docker symlink which points to podman, according to the VSCode announcement of podman support
- docker pull terrorjack/asterius:dev
- Open the asterius repo with remote containers
Opening the repo with remote containers for the first time will take some time,
since it runs the build script to build asterius
and perform booting. Later
re-opening will be near instant, since it reuses the previous container.
The dev image should work with docker
too if the userns-remap
related
settings are correctly set up. Check the documentation section for
relevant explanation; when using docker
with default settings, there is a file
permission issue when mounting your local filesystem into the prebuilt container
images.
Using direnv
If direnv
is enabled, the PATH
of the current shell session will be extended
to include the locations of Asterius executables. This means it's possible to
run ahc-link ..
instead of stack exec ahc-link -- ..
.
Hacking with ghcid
A known-to-work workflow for hacking on Asterius is using ghcid
. We also include
an example .ghcid
file, so running ghcid
at the project root directory should
work out of the box.
Some notes regarding the usage of ghcid
:
- Multiple lib targets can be loaded at once, but only one main target (exe/test) can be loaded. When hacking a specific exe/test, modify the local utils/ghcid.sh script first. Before committing changes in the Haskell codebase, it would be nice to run stack build --test --no-run-tests to make sure all executables are not broken by lib changes.
To boot or not to boot
As described in the building guide, stack build
only builds the Asterius
compiler itself; additionally we need to run stack exec ahc-boot
to run the
compiler on the boot libs. This process is typically only needed once, but there
are cases when it needs to be re-run:
- The boot libs in ghc-toolkit/boot-libs are modified.
- The Asterius.Types module is modified, so the IR types have changed.
- The Asterius.CodeGen module is modified and you're sure different code will be generated when compiling the same Haskell/Cmm files.
Most other modifications in the Asterius lib/exes won't need a reboot. Specifically:
- Asterius.Builtins modifications don't impact the boot cache. The builtin module is generated on the fly with every linker invocation.
When rebooting, run utils/reboot.sh
in the project root directory, so that we
can ensure the booting is used with the up-to-date version of asterius
and the
boot lib sources.
The ahc-boot
process is configurable via these environment variables:
- ASTERIUS_CONFIGURE_OPTIONS
- ASTERIUS_BUILD_OPTIONS
- ASTERIUS_INSTALL_OPTIONS
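As a purely hypothetical example, assuming the options are forwarded to the corresponding configure/build/install steps of the boot libs, a more verbose boot could be requested like this:
$ ASTERIUS_BUILD_OPTIONS=-v2 utils/reboot.sh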
Doing profiled builds
Doing profiled builds within a local git tree
Use stack-profile.yaml
to overwrite stack.yaml
, and then run
utils/reboot.sh
to kick off the rebooting process. This will be quite slow due
to the nature of profiled builds; all libraries will be rebuilt with the
profiled flavor. It's better to perform a profiled build in a standalone git tree.
Once the profiled build is complete, it's possible to use RTS flags to obtain profile data when compiling Haskell sources. At runtime there are two ways to pass RTS flags to a Haskell executable:
- The GHCRTS environment variable
- The +RTS ... -RTS command line arguments
Always use GHCRTS
when running programs like ahc-link
, since those programs
can spawn other processes (e.g. ahc-ld
), and we're often interested in the
profile data of all Asterius executables. The GHCRTS
environment variable can
propagate to all processes.
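For example, assuming a profiled build and that example.hs is the program being compiled, a time-profiling report for all the Asterius executables involved can be requested like this (-p is the standard GHC RTS flag for cost-centre profiling):
$ GHCRTS=-p ahc-link --input-hs example.hs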
See the relevant
section
in the GHC user guide for more information on profiling Haskell apps. There are
also some third party applications useful for analyzing the profiling data, e.g.
eventlog2html and ghc-prof-flamegraph.
For now, a major problem with the profiled build is that it seems to emit dysfunctional code which doesn't work. Consequently, this affects the TH runner, so any dependency relying on TH isn't supported by the profiled build.
Measuring time/allocation differences
When working on a performance-related PR, we often want to measure the time/allocation differences it introduced. The workflow is roughly:
- Perform two profiled builds with Docker; one builds from the master branch, one from the PR's branch.
- Run ahc-link in the built images on the example program below, setting the necessary GHCRTS variable to generate the profile reports. The code should be put in two standalone directories, otherwise the .hi/.o files may conflict or be accidentally reused.
The profiled Docker images contain pre-compiled Cabal
. And the example program
we use to stress-test the linker is:
import Distribution.Simple
main = defaultMain
We choose this program since it's a classic and, although short, it pulls in a lot of data segments and functions, so it exposes the linker's performance bottlenecks pretty well.
Adding a test case
To add a test case, it is best to replicate what has been done for an existing testcase.
- For example, git grep bytearraymini should show all the places where the test case bytearraymini has been used. Replicating the same files for a new test case should "just work".
Code formatting
In Asterius we use ormolu
for formatting
Haskell and prettier
for formatting JavaScript.
Though not all parts of the codebase are currently formatted this way, it is
recommended that when you submit a PR you run the respective formatters on the
changed parts of the code, so that gradually the whole codebase is formatted
uniformly.
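One possible way to format just the files you touched, assuming origin/master is the base branch and that ormolu and prettier are on PATH:
$ ormolu --mode inplace $(git diff --name-only origin/master -- '*.hs')
$ prettier --write $(git diff --name-only origin/master -- '*.js')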
Hacking on build01
This section is for Tweagers only.
First, set up your build01
account according to the
handbook.
Don't forget to add the groups = ["docker"]
line in your PR.
Once the PR is merged, you can SSH in as a non-privileged NixOS user. You can
check out the asterius
repo, set up your favorite text editor, make edits and
push to the remote.
To build/boot and run tests, a dev container needs to be built first. The
dev.rootless.Dockerfile
can be used to build an image which has the same UID as your user and doesn't mess up local file permissions:
$ docker build --build-arg UID=$(id -u) --file dev.rootless.Dockerfile --tag my_dev_image .
Building the image can take around 10min.
After my_dev_image
is built, a dev container can be started:
$ docker run -it -v $(pwd):/asterius -w /asterius --name my_dev_container my_dev_image
The command above will start my_dev_container
from my_dev_image
, mount the
current project directory to /asterius
and drop into the bash prompt, from
where you can run build commands.
After exit
ing the current bash prompt of my_dev_container
, it can be restarted later:
$ docker start -ai my_dev_container
If you're using VSCode remote SSH, the first attempt to set up will fail. A known to work workaround is available at https://github.com/microsoft/vscode-remote-release/issues/648#issuecomment-503148523.
Reading list
Here is a brief list of relevant readings about GHC internals and WebAssembly suited for newcomers.
-
GHC documentation regarding the GHC API: a nice read for anyone looking to use the GHC API.
-
GHC commentary: a wiki containing lots of additional knowledge regarding GHC's implementation. Keep in mind some content is out-dated though. Some useful entries regarding this project:
- Building guide. A tl;dr for this section is our CI scripts.
- Overview of pipeline:
we use the Hooks mechanism (specifically,
runPhaseHook
) to replace the default pipeline with our own, to enable manipulation of in-memory IRs.
- How STG works: a nice tutorial containing several compiled examples, illustrating how the generated code works under the hood.
- The Cmm types: it's outdated and the types don't exactly match the GHC codebase now, but the explanations still shed some light on how the current Cmm types work.
- The runtime system: content regarding the runtime system.
-
Understanding the Stack: A blog post explaining how generated code works at the assembly level. Also, its sequel Understanding the RealWorld
-
The WebAssembly spec: a useful reference regarding what's already present in WebAssembly.
-
The binaryen C API: binaryen handles WebAssembly code generation. There are a few differences between the binaryen AST and the WebAssembly AST, the most notable ones being:
  - binaryen uses a recursive BinaryenExpression type which is side-effectful. The original WebAssembly standard instead uses a stack-based model and manipulates the operand stack with instructions.
  - binaryen contains a "Relooper" which can recover high-level structured control flow from a CFG. However, the relooper doesn't handle jumping to unknown labels (aka computed goto), so we don't use it to handle tail calls.
-
The following entries are papers which take much more time to read, but are still quite useful for newcomers:
  - Making a fast curry: push/enter vs. eval/apply for higher-order languages: A thorough explanation of what STG is and how it is implemented (via two different groups of rewrite rules, also with real benchmarks).
  - The STG runtime system (revised): Includes some details on the runtime system and is worth a read. It's a mystery why it's not merged into the commentary though. Install a TeX distribution like TeX Live or use a service like Overleaf to compile the .tex file to .pdf before reading.
  - The GHC storage manager: Similar to above.
  - Bringing the Web up to Speed with WebAssembly: The PLDI'17 paper about WebAssembly. Contains an overview of WebAssembly design rationales and rules of small-step operational semantics.
Finally, the GHC codebase itself is also a must-read, but since it's huge we only need to check relevant parts when unsure about its behavior. Tips on reading GHC code:
-
There are a lot of insightful and up-to-date comments which all begin with "Notes on xxx". It's a pity the notes are collected neither into the sphinx-generated documentation nor into the haddock docs of the GHC API.
-
When writing build.mk for compiling GHC, add HADDOCK_DOCS = YES to ensure the haddock docs of the GHC API are built, and EXTRA_HADDOCK_OPTS += --quickjump --hyperlinked-source to enable symbol hyperlinks in the source pages. This will save you tons of time grepping the ghc codebase.
-
grepping is still unavoidable in some cases, since there's a lot of CPP involved and it isn't well handled by haddock.
Project status & roadmap
Overview
The Asterius project has come a long way and some examples with complex dependencies already work. It's still less mature than GHCJS though; see the next section for details.
In general, it's hard to give an ETA for "production readiness", since improvements are continuous, and we haven't collected enough use cases from seed users yet. For more insight into what comes next for this project, we list our quarterly roadmap here.
Besides the goals in each quarter, we also do regular maintenance like
dependency upgrades and bugfixes. We also work on related projects (mainly
haskell-binaryen
and
inline-js
) to ensure they are kept in
sync and also useful to regular Haskell developers.
What works now
- Almost all GHC language features (TH support is partial, cross-splice state persistence doesn't work yet).
- The pure parts in standard libraries and other packages. IO is achieved via rts primitives or user-defined JavaScript imports.
- Importing JavaScript expressions via the foreign import javascript syntax. First-class garbage-collected JSVal type in Haskell land (a small sketch follows this list).
- Preliminary copying GC, managing both Haskell heap objects and JavaScript references.
- Cabal support. Use ahc-cabal to compile libraries and executables. Support for custom Setup.hs is limited.
- Marshaling between Haskell/JavaScript types based on aeson.
- Calling Haskell functions from JavaScript via the foreign export javascript syntax. Haskell closures can be passed across the Haskell/JavaScript boundary via StablePtr.
- Invoking the RTS API on the JavaScript side to manipulate Haskell closures and trigger evaluation.
- A linker which performs aggressive dead-code elimination, based on symbol reachability.
- A debugger which checks invalid memory access and outputs memory loads/stores and control flow transfers.
- Complete binaryen raw bindings, plus a monadic EDSL to construct WebAssembly code directly in Haskell.
- wasm-toolkit: a Haskell library to handle WebAssembly code, which already powers binary code generation.
- Besides WebAssembly MVP and BigInt, there are no special requirements on the underlying JavaScript engine at the moment.
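A rough illustration of the JSFFI surface mentioned above (a sketch only; the module and function names are made up, and the fetch import mirrors the example used later in this document):

{-# LANGUAGE ForeignFunctionInterface #-}

module Example where

import Asterius.Types

-- Import a JavaScript expression; the result is a first-class,
-- garbage-collected JSVal on the Haskell side.
foreign import javascript "new Date()" js_current_time :: IO JSVal

-- An async import: the returned Promise is awaited before js_fetch
-- returns to Haskell.
foreign import javascript safe "fetch($1)" js_fetch :: JSVal -> IO JSVal

-- Export a Haskell function so it can be called from JavaScript.
foreign export javascript "mult_hs" mult_hs :: Int -> Int -> Int

mult_hs :: Int -> Int -> Int
mult_hs = (*)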
What may stop one from using Asterius right now
- Lack of JavaScriptCore/Safari support, due to incomplete JavaScript BigInt support at the moment.
- Runtime bugs. The generated code comes with a complex hand-written runtime which is still buggy at times. The situation is expected to improve once we're able to work with an IR more high-level than Cmm and shave off the current hand-written garbage collector; see the 2020 Q3 section for more details.
- GHCJS projects aren't supported out of the box. Major incompatibilities include:
  - Word sizes differ. Asterius is still 64-bit based at the moment.
  - JSFFI syntax and semantics differ. Asterius uses Promise-based async JSFFI and GHCJS uses callbacks.
  - Cabal handles GHCJS and Asterius differently.
- Lack of Nix support.
- Lack of GHCi support.
- TH support is not 100% complete; certain TH APIs which require preserving state across splices (e.g. getQ/putQ) don't work yet.
- Cabal tests and benchmarks can't be run out of the box.
- Custom Setup.hs support is limited. If it has setup-deps outside GHC boot libs, it won't work.
- Lack of profiling support for generated code.
- Excessive memory usage when linking large programs.
Quarterly roadmap
2021 Q3
For the past months before this update, I took a break from the Asterius project and worked on a client project instead. There's a saying that "less is more", and I believe my absence from this project for a few months was beneficial in multiple ways:
- I gained a lot more nix-related knowledge.
- Purging my short-term memory of the project and coming back gave me some insight into the difficulties of onboarding new contributors.
- After all, it was a great mental relief to work on something where I was definitely not the bottleneck of the whole project.
Before I took the break, Asterius was stuck with a very complex & ad-hoc build system, and it was based on ghc-8.8. The most production-ready major version of ghc today is ghc-8.10. Therefore, the Q3 goals and roadmap have been adjusted accordingly:
- Upgrade Asterius to use ghc-8.10. The upgrade procedure should be principled & documented, so someone else can repeat this when Asterius upgrades to ghc-9.2 in the future.
- Use cabal & nix as the primary build system.
What has been achieved so far:
- There is a new ghc fork dedicated to asterius at https://github.com/tweag/ghc-asterius. It's based on the ghc-8.10 branch, the previous asterius-specific patches have all been ported, and I implemented nix-based logic to generate cabal-buildable ghc api packages to be used by Asterius, replacing the previous ad-hoc python script.
- There is a WIP branch of ghc-8.10 & nix support at https://github.com/tweag/asterius/pull/860. Most build errors in the host compiler have been fixed, and the booting logic will be fixed next.
- A wasi-sdk/wasi-libc fork is also maintained in the tweag namespace. It's possible to configure our ghc fork with the wasm32-unknown-wasi triple now, so that's a good start for the future work of properly transitioning Asterius to a wasi32 backend of ghc.
Remaining work of Q3 will be wrapping up #860 and merging it to master
.
Beyond Q3, the overall plan is also guided by the "less is more" principle: to reduce code rather than to add it, leveraging upstream logic whenever possible, while still maintaining and even improving the end-user experience. Many hacks were needed in the past for various reasons, and after all the lessons learned along the way, there are many things that should be shaved off:
- The hacks related to the 64-bit virtual address space. Reusing the host GHC API, which targets a 64-bit platform, was the easiest way to get the Asterius MVP working, but given we now have much better knowledge about how cross-compiling in ghc works, these hacks need to go away.
- The custom object format and linking logic. This was required since Asterius needed to record a lot of Haskell-specific info in the object files: JSFFI imports/exports, the static pointer table, etc. However, with runtime support, this custom info can all be replaced by vanilla data sections in the wasm or llvm bitcode object files.
- Following the entry above, most of the existing wasm codegen logic. It looks possible to leverage the llvm codegen, only adding specific patches to support features like JSFFI.
- Most of the existing JavaScript runtime. It will be gradually replaced by the cross-compiled ghc rts for the wasi32 target, component after component. The ultimate goal is to support generating self-contained, JavaScript-less wasm modules which work in runtimes beyond browsers/nodejs (that's why we stick to wasi-sdk instead of emscripten in the first place).
2021 Q1
In 2020 Q4 we mainly delivered:
- Use standalone stage-1 GHC API packages and support building Asterius using vanilla GHC.
- Remove numerous hacks and simplify the codebase, e.g.:
  - Make ahc a proper GHC frontend exe, support ahc -c on non-Haskell sources
  - Use vanilla archives and get rid of the custom ahc-ar
- Refactor things incompatible with the 32-bit pointer convention, e.g.:
  - Proper heap layout for JSVal# closures
  - Remove higher 32-bit data/function address tags
In 2021 Q1, the primary goals are:
- Finish transition to 32-bit code generation.
- Improve C/C++ support, including support for integer-gmp and cbits in common packages.
The plan for achieving the above goals:
- Audit the current code generator & runtime and remove everything incompatible with the 32-bit pointer convention.
- For the time being, favor simplicity/robustness over performance. Some previous optimizations may need to be reverted temporarily to simplify the codebase and reduce the refactoring overhead.
- Use wasi-sdk as the C toolchain to configure the stage-1 GHC and finish the transition.
A longer term goal beyond Q1 is upstreaming Asterius as a proper wasm backend of
GHC. We need to play well with wasi-sdk
for this to happen, so another thing
we're working on in Q1 is: refactor the linker infrastructure to make it
LLVM-compliant, which means managing non-standard entities (e.g. static
pointers, JSFFI imports/exports) in a standard-compliant way.
2020 Q4
In 2020 Q3 we mainly delivered:
- PIC (Position Independent Code) support. We worked on PIC because, in the beginning, we thought it was a prerequisite for C/C++ support. It turned out not to be, but PIC will still be useful in the future when we implement the dynamic linker and ghci support.
- Initial C/C++ support, using wasi-sdk to compile C/C++ sources. Right now this doesn't work with Cabal yet, so the C/C++ sources need to be manually added to asterius/libc to be compiled and linked. We have already replaced quite a few legacy runtime shims with actual C code (e.g. cbits in bytestring/text), and more will come in the future.
Proper C/C++ support requires Asterius to be a proper wasm32-targeting cross GHC which is configured to use wasi-sdk as the underlying toolchain. The immediate benefits are:
- Get rid of various hacks due to the word size mismatch in the code emitted by Asterius and wasi-sdk. Some packages (e.g. integer-gmp) are incompatible with these hacks.
- Implement proper Cabal integration and support cbits in user packages.
- Improve code size and runtime performance, getting rid of the i64/i32 pointer casting everywhere.
- Get rid of BigInt usage in the JavaScript runtime, and support running generated code in Safari.
Thus the goal of 2020 Q4 is finishing the 32-bit cross GHC transition. The steps to achieve this are roughly:
- Detangle the host/wasm GHC API usage. Asterius will shift away from using the ghc package of the host GHC and instead use its own stage-1 GHC API packages.
- Fix various issues when configuring GHC to target wasm32-wasi and using wasi-sdk as the toolchain.
- Refactor the code generator and the runtime to work with the new 32-bit pointer convention.
2020 Q3
Work in 2020 Q3 is focused on:
- Introducing C/C++ toolchain support. The first step is to introduce libc in the generated wasm code, and use libc functionality to replace certain runtime functionality (e.g. memory management). Once we're confident our runtime and generated code is compatible with libc, we'll look into building & linking C source files in Haskell packages.
- Research on a high-level variant of Cmm which abstracts away closure representation and can be efficiently mapped to platforms providing host garbage collection (e.g. wasm-gc, JavaScript, JVM). This will enable us to avoid relying on a hand-written custom garbage collector and improve the runtime reliability significantly.
Project Milestones, January 2022 edition
The goals for Asterius are described on the page WebAssembly goals on the GHC Wiki. This document describes some milestones on the path to those goals.
Getting to JavaScript-free functionality
Although JavaScript interoperation is the big use case, much of the support needed for WebAssembly is independent of JavaScript.
Codegen: New back end
A new back end will have to be defined in a way that fits into GHC's existing structure.
GHC support required:
- Make the Backend type abstract and add a new value constructor for it (rather than adding a new value constructor to the existing Backend or changing the NcgImpl record).
Codegen: handling arbitrary control-flow graphs
WebAssembly lacks goto
and provides only structured control flow:
loops, blocks, if
statements, and multilevel continue
/break
. A
Cmm control-flow graph must be converted to this structured control.
Status: A prototype has been implemented and tested, but the prototype works only on reducible control-flow graphs. A transformation from irreducible to reducible CFGs has yet to be implemented.
GHC support required:
- Dominator analysis on
CmmGraph
Codegen: fit linking information into standard object files
The Asterius prototype emits object files that are represented in a custom format. This format contains ad hoc information that can be handled only by a custom linker. The information currently stored in custom object files must either be expressed using standard object files that conform to C/C++ toolchain convention, or it must be eliminated.
Status: All information currently emitted by the Asterius prototype can be expressed using standard object files, with one exception: JSFFI records. We plan to turn these records into standard data segments whose symbols will be reachable from related Haskell functions. Such segments can be handled by a standard C/C++ linker. The data segments will be consumed by the JavaScript adjunct to GHC's run-time system, which will use them to reconstruct imported and exported functions.
GHC support required:
- None
Codegen: implement WebAssembly IR and binary encoder
Rather than attempt to prettyprint WebAssembly directly from Cmm, the
WebAssembly back end will first translate Cmm to an internal
representation of a WebAssembly module, tentatively to be called
WasmModule
. A WasmModule
can be serialized to the standard
WebAssembly binary format.
A preliminary design might look like this:
- A WasmModule contains sections
- A section may contain functions, memory segments or other metadata
- A function body is control flow (WasmStmt ...)
- Control flow may contain straight-line code
- Straight-line code may be a tree structure or may be a sequence of Wasm instructions
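A purely illustrative Haskell sketch of that shape (none of these names or fields are decided; only WasmModule and WasmStmt are mentioned above):

-- Illustrative shape only; the actual IR has yet to be designed.
newtype WasmModule = WasmModule [Section]

data Section
  = FunctionSection [Function]
  | MemorySection [DataSegment]
  | MetadataSection String        -- other metadata

data Function = Function
  { functionName :: String
  , functionBody :: WasmStmt
  }

-- Control flow built from WebAssembly's structured constructs.
data WasmStmt
  = Block [WasmStmt]
  | Loop [WasmStmt]
  | If WasmExpr [WasmStmt] [WasmStmt]
  | Straight StraightCode

-- Straight-line code: either an expression tree or a flat instruction sequence.
data StraightCode
  = Tree WasmExpr
  | Instrs [WasmInstr]

data WasmExpr    = WasmExpr     -- placeholder
data WasmInstr   = WasmInstr    -- placeholder
data DataSegment = DataSegment  -- placeholder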
Status: Except for that WasmStmt
fragment, which contains the
WebAssembly control-flow constructs, the internal representation has
yet to be defined.
And we have yet to reach consensus on whether we wish to be able to
emit both textual and binary WebAssembly, or whether we prefer to emit
only binary WebAssembly and to rely on an external disassembler to
produce a more readable representation. (External assemblers are
apparently not good enough to be able to rely on emitting only a
textual representation.)
GHC support required:
- None
Codegen: implement Cmm to WebAssembly IR codegen
We need a translator from CmmGroup
to WasmModule
. Our prototype
relooper translates CmmGraph
to WasmStmt ...
, and the other parts
of the translation should mostly be a 1-to-1 mapping. Some Cmm
features can be translated in more than one way:
-
Global registers. We can use the in-memory register table as in unregisterised mode, or one WebAssembly global for each global register, or use the WebAssembly multi-value feature to carry the registers around. Start with WebAssembly globals first: they are easy to implement and should be reasonably faster than memory loads/stores.
-
Cmm tail calls. We can use the experimental WebAssembly tail calls feature, or do trampolining by making each Cmm function return its jump target. Since WebAssembly tail calls are not widely implemented in engines yet, start with trampolining (a toy sketch follows this list).
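A toy sketch of the trampolining scheme in Haskell (illustrative only; not actual generated code): each "Cmm function" returns its jump target instead of tail-calling it, and a small driver loop keeps entering the returned target.

-- Each "function" returns the next function to enter, or Nothing when done.
newtype CmmFun = CmmFun (IO (Maybe CmmFun))

-- The trampoline driver loop.
trampoline :: CmmFun -> IO ()
trampoline (CmmFun enter) = do
  next <- enter
  case next of
    Nothing   -> pure ()
    Just targ -> trampoline targ

-- Example: a "function" that counts down, then stops.
countdown :: Int -> CmmFun
countdown n = CmmFun $
  if n <= 0
    then pure Nothing
    else do
      print n
      pure (Just (countdown (n - 1)))

main :: IO ()
main = trampoline (countdown 3)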
Status: Not started, but given the rich experience with the Asterius prototype, no difficulties are anticipated.
GHC support required:
- None
Build system
The build system has to be altered to select the proper C code for the WebAssembly target. We're hoping for the following:
-
The build system can build and package the run-time system standalone.
-
The build system can easily cross-compile from a POSIX host to the Wasm target.
-
A developer can instruct the build system to choose Wasm-compatible features selectively to build and test on a POSIX platform (so-called "feature vector").
Meeting these goals will require both conditional build rules and
CPP macros for code specific to wasm32-wasi
.
Status: Not yet begun.
GHC support required:
- Coordination with the cross-compilation team (Sylvain Henry, John Ericson)
RTS: avoid mmap
The run-time storage manager uses mmap
and munmap
to allocate and
free MBlock
s. But mmap
and munmap
aren't available on the WASI
platform, so we need to use standard libc allocation routines instead.
Status: we implemented the patch, tested with WebAssembly, i386 and x64-without-large-address-space.
GHC support required:
- New directory rts/wasi to go alongside rts/posix and rts/win32.
- Altered logic in rts/rts.cabal.in and elsewhere to use conditional compilation to select OSMem.c from the rts/wasi directory.
RTS: replace the timer used in the scheduler
The run-time system currently uses a timer to know when to deliver a Haskell Execution Context (virtual CPU) to another Haskell thread. But the timer is implemented using pthreads and POSIX signals, which are not available on WebAssembly---so it has to go. We'll need some other method for deciding when to switch contexts.
This change will remove dependencies on pthreads and on a POSIX signal (VTALRM).
Status: We have patched the run-time system to disable that timer,
and we have tested the patch on POSIX. In this patch, the scheduler
does a context switch at every heap-block allocation (as in the -C0
RTS
flag).
Yet to be done: determine a viable long-term strategy for deciding
when to context switch.
GHC support required:
- Patches to scheduler, of a detailed nature to be specified later
RTS: replace other uses of POSIX signals
The run-time system depends on the signals API in various ways: it can handle certain OS signals, and it can even support setting Haskell functions as signal handlers. Such functionality, which inherently depends on signals, must be made conditional on the target platform.
There is already a RTS_USER_SIGNALS
CPP macro that guards some
signal logic, but not all. To make signals truly optional, more work
is needed.
Status: In progress.
GHC support required:
- Not yet known
RTS: port libffi to WebAssembly
libffi
is required for dynamic exports to C. It's technically
possible to port libffi
to either pure WebAssembly or
WebAssembly+JavaScript.
Status: Not yet implemented.
GHC support required:
- Likely none.
Milestones along the way to full JavaScript interoperability
(The audience for this section is primarily the Asterius implementation team, but there are a few things that ought to be communicated to other GHC implementors.)
RTS for JSFFI: representing and garbage-collecting foreign references
When Haskell interoperates with JavaScript, Haskell objects need to be able to keep JavaScript objects alive and vice versa, even though they live on different heaps. Similarly, JavaScript needs to be able to reclaim JavaScript objects once there are no more references to them.
We propose to extend GHC with a new primitive type JSVal#
, whose
closure payload is a single word. The JavaScript adjunct uses this
word to index into an internal table. After each major garbage
collection, the collector notifies the JavaScript adjunct of all live
JSVal#
closures. The adjunct uses this report to drop its references
to JavaScript objects that cannot be reached from the Haskell heap.
Status: Not yet implemented.
GHC support required:
- Build-system support for the JavaScript adjunct to the RTS
- New primitive type JSVal#
- Patch to the garbage collector to report live JSVal# closures.
RTS: API/semantics for scheduling and JavaScript foreign calls
Write down and document whatever API is needed for calls across the Haskell/JavaScript boundary and for sharing the single CPU among both Haskell threads and JavaScript's event loop. Ideal documentation would include a small-step operational semantics.
Status: Work in progress
GHC support required:
- Coordinate with GHCJS team (unclear at what stage)
RTS: Scheduler issues
GHC's scheduler will need to be altered to support an event-driven model of concurrency. The details are work in progress.
Draft semantics of concurrency and foreign calls.
Note: This document assumes that every function takes exactly one argument. Just imagine that it's the last argument in a fully saturated call.
Foreign export asynchronous
Suppose that a Haskell function f
is exported to JavaScript
asynchronously (which might be the default). When JavaScript calls the
exported function with argument v
, it has the effect of performing
the IO action ⟦f⟧ v
, where the translation ⟦f⟧
is defined as
follows:
⟦f⟧ v = do
  p <- allocate new promise
  let run_f = case try (return $ f $ jsToHaskell v) of
        Left exn -> p.fails (exnToJS exn)
        Right a  -> p.succeeds (haskellToJS a)
  forkIO run_f
  return p -- returned to JavaScript
Not specified here is whether the scheduler is allowed to steal a few cycles to run previously forked threads.
N.B. This is just a semantics. We certainly have the option of implementing the entire action completely in the runtime system.
Not yet specified: What is the API by which JavaScript would call an asynchronously exported Haskell function? Would it, for example, use API functions to construct a Haskell closure, then evaluate it?
Foreign import asynchronous
Suppose that a JavaScript function g
is imported asynchronously
(which might be the default). Let types a
and b
stand for two
unknown but fixed types. The JavaScript function expects an argument
of type a
and returns a Promise
that (if successful) eventually
delivers a value of type b
. When a Haskell thunk of the form g e
is forced (evaluated), the machine performs the following monadic
action, the result of which is (eventually) written into the thunk.
do let v = haskellToJS e                 -- evaluates e, converts result to JavaScript
   p <- g v                              -- call returns a `Promise`, "immediately"
   m <- newEmptyMVar
   ... juju to associate m with p ...    -- RTS primitive?
   result <- takeMVar m
   case result of
     Left fails -> ... raise asynchronous exception ...
     Right b    -> return $ jsToHaskell b
CPU sharing
Suppose GHC wishes to say politely to the JavaScript engine, "every so
often I would like to use the CPU for a bounded time." It looks like
Haskell would need to add a message to the JavaScript message queue,
such that the function associated with that message is "run Haskell
for N ticks." Is the right API to call setTimeout
with a delay of 0
seconds?
Concurrency sketch
Let's suppose the state of a Haskell machine has these components:
- F ("fuel") is the number of ticks a Haskell thread can execute before returning control to JavaScript. This component is present only when Haskell code is running.
- R ("running") is either the currently running Haskell thread, or if no thread is currently running, it is • ("nothing").
- Q ("run queue") is a collection of runnable threads.
- H ("heap") is the Haskell heap, which may contain MVars and threads that are blocked on them.
Components R
and H
are used linearly, so they can be stored in
global mutable state.
The machine will enjoy a set of labeled transitions such as are
described in Simon PJ's paper on the "Awkward Squad." Call these the
"standard transitions." (The awkward-squad machine state is a single
term, formed by the parallel composition of R
with all the threads
of Q
and all the MVars of H
. The awkward squad doesn't care about
order, but we do.) To specify the standard transitions, we could add
an additional clock that tells the machine when to switch the running
thread R
out for a new thread from the queue. Or we could leave the
context switch nondeterministic, as it is in the awkward-squad paper.
Whatever seems useful.
Every state transition has the potential to use fuel. Fuel might actually be implemented using an allocation clock, but for semantic purposes, we can simply decrement fuel at each state transition, then gate the standard transitions on the condition F > 0.
At a high level, every invocation of Haskell looks the same:
JavaScript starts the Haskell machine in a state ⟨F, •, Q, H⟩
, and
the Haskell machine makes repeated state transitions until it reaches
one of two stopping states:
- ⟨F', •, [], H'⟩: no Haskell threads are left to run
- ⟨0, R', Q', H'⟩: fuel is exhausted, in which case the machine moves the currently running thread onto the run queue, reaching state ⟨0, •, R':Q', H'⟩
Once one of these states is reached, GHC's runtime system takes two actions:
- It allocates a polite request for the CPU and puts that request on the JavaScript message queue, probably using setTimeout with a delay of 0 seconds.
- It returns control to JavaScript.
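A toy Haskell model of the ⟨F, R, Q, H⟩ machine sketched above (purely illustrative; not GHC's actual scheduler, and the standard transition is left abstract):

data Thread = Thread
data Heap   = Heap

data Machine = Machine
  { fuel     :: Int           -- F: remaining ticks
  , running  :: Maybe Thread  -- R: Nothing plays the role of "•"
  , runQueue :: [Thread]      -- Q: runnable threads
  , heap     :: Heap          -- H: the Haskell heap
  }

-- One "standard transition"; left abstract in this sketch.
step :: Machine -> Machine
step = id

-- Run until no threads are left or the fuel is exhausted.
run :: Machine -> Machine
run m = case (fuel m, running m, runQueue m) of
  (_, Nothing, [])   -> m                                            -- ⟨F', •, [], H'⟩
  (0, Just t, q)     -> m { running = Nothing, runQueue = t : q }    -- ⟨0, •, R':Q', H'⟩
  (_, Nothing, t:ts) -> run (m { running = Just t, runQueue = ts })  -- schedule a thread
  (f, Just _, _)     -> run ((step m) { fuel = f - 1 })              -- gated on F > 0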
GHC RTS scheduler refactoring
All discussion in this document refers to the non-threaded RTS.
Potential semantics
GHC relies on the scheduler to manage both concurrency and foreign calls. Foreign calls are in play because most foreign calls are asynchronous, so implementing a foreign call requires support from the scheduler. A preliminary sketch of possible semantics can be found in file semantics.md
.
JavaScript user experience
I have foo.hs
. I can compile to foo.wasm
and foo.js
. foo.wasm
is a binary artifact that needs to be shipped with foo.js; there is nothing else you need to know about this file. foo.js
conforms to some
JavaScript module standard and exports a JavaScript object. Say this
object is foo
.
For each exported top-level Haskell function, foo
contains a
corresponding async method. Consider the most common case main :: IO ()
, then you can call foo.main()
. For something like fib :: Int -> Int
, you can do let r = await foo.fib(10)
and get the number result
in r
. The arguments and result can be any JavaScript value, if the
Haskell type is JSVal
.
Now, suppose we await foo.main()
, and main
finished successfully.
The RTS must remain alive, because:
- main might have forked other Haskell threads; those threads are expected to run in the background.
- main might have dynamically exported a Haskell function closure as a JSFunction. This JSFunction is passed into the outside JavaScript world, and it is expected to be called back some time in the future.
Notes regarding error handling: any unhandled Haskell exception is converted to a JavaScript error. Likewise, any JavaScript error is converted to a Haskell exception.
Notes regarding RTS startup: foo
encapsulates some RTS context. That
context is automatically initialized no later than the first time you
call any method in foo
.
Notes regarding RTS shutdown: not our concern yet. As long as the browser tab is alive, the RTS context should be alive.
Primer
ghc-devs thread: Thoughts on async RTS API?
ghc commentary: scheduler
Consider a native case...
Suppose we'd like to run some Haskell computation from C (e.g. the main
function). After the RTS state is initialized, we need to:
- If the Haskell function expects arguments, call the rts_mk* functions in RtsAPI.h to convert C argument values to Haskell closures. Call rts_apply repeatedly to apply the Haskell function closure to argument closures, until we end up with a closure of Haskell type IO a or a, ready to be evaluated.
- Call one of the eval functions in RtsAPI.h. The eval function creates a TSO (Thread State Object), representing the Haskell thread where the computation happens.
- The eval function does some extra bookkeeping, then enters the scheduler loop.
- The scheduler loop exits when the initial Haskell thread finishes. The thread return value and exit code are recorded.
- The eval function retrieves the thread return value and exit code. We need to check whether the thread completed successfully; if so, we can call one of the rts_get* functions in RtsAPI.h to convert the resulting Haskell closure to a C value.
The key logic is in the
schedule
function which implements the scheduler loop. The implementation is
quite complex; for now, we only need to keep in mind:
- In each iteration, the Haskell thread being run is not necessarily the initial thread we created to kick off evaluation. New threads may get forked and executed, but the loop exits only when the initial thread finishes!
- Threads may block for a variety of reasons; they will be suspended and resumed as needed. It may be possible that all live threads are blocked, in which case the RTS will attempt to make progress by collecting the file descriptors related to blocking I/O and doing a select() call, to ensure I/O can proceed for at least one file descriptor.
The problem
Suppose we'd like to call an async JavaScript function and get the result in Haskell:
foreign import javascript safe "fetch($1)" js_fetch :: JSRequest -> IO JSResponse
In Haskell, when js_fetch
returns, the actual fetch()
call should
have already resolved; if it rejected, then an exception should be
raised in Haskell.
Now, the main thread calls js_fetch
at some point, with no other threads involved. According to the previous section, the current call stack is
something like:
main -> rts_evalLazyIO -> scheduleWaitThread -> schedule -> fetch
The Haskell code does a fetch()
call (or it arranges the RTS to
perform one). fetch()
will immediately return a Promise
handle.
Now what? What do we do with this Promise
thing? More importantly,
the scheduler loop can't make any progress! The Haskell thread is
blocked and suspended, the run queue is empty, and the RTS scheduler only
knows about posix blocking read/write, so it doesn't know how to
handle this situation.
After fetch()
returns, the call stack is:
main -> rts_evalLazyIO -> scheduleWaitThread -> schedule
Remember the
"run-to-completion"
principle of the JavaScript concurrency model! We're currently inside
some JavaScript/WebAssembly function, which counts as a single tick in
the entire event loop. The functions we're running right now must run
to completion and return; only after that can the fetch() result become available.
become available.
Also remember how the WebAssembly/JavaScript interop works: you can only import synchronous JavaScript functions, and export WebAssembly functions as synchronous JavaScript functions. Every C function in the RTS that we cross-compile to WebAssembly is also synchronous; no magic blocking or preemptive context switch will ever take place!
What we need
All the scheduler-related synchronous C functions in the RTS, be it rts_eval* or schedule, only return when the initial Haskell thread completes. We must teach these functions to also return when the thread blocks, at least when the blocking reason is beyond conventional posix read/write.
Here's how things should look after the scheduler is refactored:
- There are async flavours of the scheduler functions. When they return, the Haskell thread may have completed, or may have been blocked for some reason. In the latter case, the returned blocking info will contain at least one file descriptor or Promise related to blocking, and also the blocked thread ids.
- When we do async JavaScript calls, we attach resolve/reject handlers to the returned Promise. These handlers will resume the entire RTS and carry on the Haskell computation.
- Since any Haskell thread may perform an async JavaScript call, all Haskell functions are exported as async JavaScript functions. A Promise is returned immediately, but it's resolved/rejected in the future, when the corresponding Haskell thread runs to completion.
Potential milestones
RTS: integrating foreign event loops
Draft:
The RTS scheduler is synchronous. If you call rts_eval*
to enter the
scheduler and do some evaluation, it'll only return when the relevant
Haskell thread is completed or killed. This model doesn't work if we
want to be able to call async foreign functions without blocking the
entire RTS. The root of this problem: the scheduler loop has no
knowledge about foreign event loops.
Status: we have looked into this, and based on our experience in Asterius, the implementation plan is as follows:
- Add CPS-style async versions of the rts_eval* RTS API functions. The original sync versions continue to work, but panic with a reasonable error message when an unsupported foreign blocking event occurs.
- The scheduler loop is broken down into "ticks". Each tick runs to the point when some Haskell computation finishes or blocks, much like a single iteration of the original scheduler loop. The scheduler ticks can be plugged into a foreign event loop, so Haskell evaluation fully interleaves with other foreign computation.
GHC support required:
- Restructuring of the current scheduler.
RTS: make usage of select/poll optional
In the current non-threaded RTS, when there are no immediately
runnable Haskell threads, a select()
call will be performed on all
the file descriptors related to blocking. The call returns when I/O is
possible for at least one file descriptor, therefore some Haskell
thread blocked on I/O can be resumed.
This may work for us when we target pure wasm32-wasi
instead of the
browser. The WASI standard defines a poll_oneoff
syscall, and
wasi-libc
implements select()
/poll()
using this syscall.
However, this doesn't work well with JavaScript runtime (or any
foreign event loop in general). poll()
calls are blocking calls, so
they can block the entire event loop, hang the browser tab and prevent
"real work" (e.g. network requests) from proceeding.
Status: we have looked into this, and there are roughly two possible approaches:
- Use the binaryen "asyncify" wasm rewriting pass to instrument the
linked wasm module, to implement the blocking behavior of
poll_oneoff
without actually blocking the entire event loop. Easy to implement, but it's a very ugly hack that also comes with penalty in code size and performance. - Restructure the scheduler, so that for non-threaded RTS, each
scheduler tick will not attempt to do a blocking
poll()
call at all. The higher-level caller of scheduler ticks will be in charge of collecting blocking I/O events and handling them.
GHC support required:
- Same as previous subsection