Asterius is a Haskell to WebAssembly compiler based on GHC. It
compiles simple Haskell source files or Cabal executable targets to
WebAssembly+JavaScript code which can be run in node.js or browsers.
It features seamless JavaScript interop (lightweight Async FFI with
Promise
support) and small output code (~600KB hello.wasm
for a
Hello World). A
lot of common Haskell packages like lens
are already supported. The
project is actively maintained by Tweag I/O.
Contributors
Asterius is maintained by Tweag I/O.
Have questions? Need help? Tweet at @tweagio.
Overview
Asterius compiles Haskell code to WebAssembly (Wasm). Its frontend is based on GHC.
The Asterius pipeline provides everything to create a Wasm instance which
exports the foreign exported functions (e.g. main
) that can be called from
JavaScript to execute the main Haskell program.
Asterius pipeline
Using prebuilt container images
We host prebuilt container images on Docker Hub under the
terrorjack/asterius
repository. The images work with podman
or
docker
.
About versioning
Whenever the master
branch gets a new commit, we trigger an image build on our
infrastructure. After the build completes, we push to the
terrorjack/asterius:latest
tag. When trying asterius
locally, it's
recommended to use terrorjack/asterius:latest
since it follows master
closely.
The images are built with the gitrev
label to indicate the exact asterius
repository revision. Use docker inspect terrorjack/asterius | grep "gitrev"
to
find out the revision info.
You may want to stick with a specific version of the prebuilt image for some
time for more reproducibility in e.g. CI builds. In that case, browse the
tags
page and use an image with a specific tag, e.g. terrorjack/asterius:200520
. We
always push a versioned tag first before we update the latest
tag.
Using the image
We recommend podman
for running containers from our prebuilt images. The
following commands are compatible with docker
as well; simply change podman
to docker
.
The images can be used interactively. Navigate to the project directory and use
the following command to start an interactive bash
session, mounting the
current directory to /workspace
. In the bash
session we can use tools like
ahc-cabal
, ahc-dist
or ahc-link
to compile the Haskell sources.
terrorjack@hostname:/project$ podman run -it --rm -v $(pwd):/workspace -w /workspace terrorjack/asterius
root@hostname:/workspace#
It's also possible to use the images in a non-interactive manner:
terrorjack@hostname:/project$ podman run --rm -v $(pwd):/workspace -w /workspace terrorjack/asterius ahc-link --input-hs example.hs
Check the reference of the docker run
command for
details. podman run
accepts most arguments of docker run
and has its own extensions.
podman
-specific tips
When using the prebuilt image with podman
, things should work out of the box
with the default configuration. Check the official installation
guide on how to install
podman
in your environment. It's likely that you'd like to use podman
with a
non-root user, in which case make sure to check the official
tutorial
for non-root users before usage.
docker
-specific tips
When using the prebuilt image with docker
, there's a file permission problem
with the default configuration: the default user in the container is root
, and
the processes will be run with the host root
user as well. So programs like
ahc-link
will create output files owned by root
in the host file system,
which is a source of annoyance. Things still work fine as long as you don't mind
manually calling chown
to fix the permissions.
The proper solution is remapping the root
user inside the container to the
current non-root user. See the docker official
userns-remap
guide and
this blog post
for further explanation.
Building guide
Building and using asterius
locally
Asterius is organized as a stack
project at the moment. The reason is mainly
historical: stack
has built-in support for managing different sandboxed GHC
installations, and we used to require a custom GHC fork to build, so using
stack
has been more convenient.
In principle, building with cabal
should also work, but this hasn't been
tested on CI yet. Some additional work is needed (checking in generated .cabal
files, setting up a cabal
project, etc) and PRs are welcome.
System dependencies
In addition to regular GHC dependencies, these dependencies are needed in the local environment:
- git
- binaryen (at least version_98)
- automake, autoconf (required by ahc-boot)
- cabal (at least v3.0.0.0)
- node, npm (at least v12)
- python3
- stack
- wasi-sdk (the WASI_SDK_PREFIX environment variable must point to the installation)
Preparing the source tree
After checking out, one needs to run a script to generate the in-tree private GHC API packages required by Asterius.
$ mkdir lib
$ pushd lib
$ ../utils/make-packages.py
$ rm -rf ghc
$ popd
The make-packages.py
script will check out our custom GHC
fork, run hadrian
to generate some
autogen files, and generate several Haskell packages in lib
. A run takes ~5min
on CI. This script only needs to be run once. After that, Asterius can be built
using vanilla GHC.
If it's inconvenient to run make-packages.py
, it's also possible to download
the generated packages from the CI artifacts. Check the CI log of a recent
commit, and one of the artifacts is named lib
. Download and unzip it in the
project root directory.
Building asterius
After checking out and running make-packages.py
, simply run stack build asterius
to build it.
After the asterius
package is built, run stack exec ahc-boot
to perform
booting. This will compile the standard libraries to WebAssembly and populate
the asterius
global package database. Some packages are compiled using
ahc-cabal
in the boot process, so an internet connection is required at least for the first
boot.
Calling executables of asterius
After the booting process completes, it's possible to use stack exec
to call
executables of asterius
, e.g. ahc-link
or ahc-cabal
. Although it's
possible to use stack install asterius
to install the executables to somewhere
in PATH
and directly call them later, this is not recommended, since the
asterius
executables rely on certain components in the PATH
set up by stack exec
.
If direnv
is enabled, then the shell session can
automatically set up the correct PATH
when navigating into the asterius
project directory. Thus it's possible to directly call ahc-boot
for booting,
ahc-link
for compiling, etc.
For trying small examples, it's convenient to put them in the test
directory
under the project root directory, since it's a .gitignore
item, so they won't
be tracked by git
.
Building and using asterius
with Docker
Using the prebuilt Docker image
The recommended way of trying asterius
is using our prebuilt Docker image on
Docker Hub. The image is updated
regularly upon new master
branch commits, and also ships ~2k prebuilt
packages from a recent stackage
snapshot, so it's convenient to test simple examples which use common
dependencies without needing to set up a cabal
project.
To use the image, mount the working directory containing the Haskell source code
as a Docker shared volume, then use the ahc-link
program:
username@hostname:~/project$ docker run --rm -it -v $(pwd):/project -w /project terrorjack/asterius
asterius@hostname:/project$ ahc-link --input-hs main.hs
Check the official
reference of docker run
to learn more about the command given in the example above. The example
opens an interactive bash
session for exploration, but it's also possible to
use docker run
to invoke the Asterius compiler on local Haskell source files.
Note that podman
can be used instead of docker
here.
Building the Docker images
The prebuilt Docker image can be reproduced by building from the in-tree
Dockerfile
s.
base.Dockerfile
can be used for building the base image. The base image
contains an out-of-the-box installation of asterius
, but doesn't come with the
additional stackage packages. There's very aggressive trimming logic in
base.Dockerfile
to make the image slimmer, so in the resulting base image,
there isn't a complete stack
project directory for asterius
, and it's not
possible to modify the Haskell logic of asterius
and partially rebuild/reboot
it given a base image.
stackage.Dockerfile
can be used for building the image containing additional
stackage packages upon the base image. Modify lts.sh
for adding/removing
packages to be built into the final image, and
ghc-toolkit/boot-libs/cabal.config
for modifying the package version
constraints. All the stackage packages are installed into the asterius
global
package database, so they can be directly used by ahc-link
, but this shouldn't
affect ahc-cabal
for installing other versions of those packages elsewhere.
The image for VSCode remote containers
dev.Dockerfile
is used to build terrorjack/asterius:dev
, which is the image
for VSCode remote containers.
Cabal support
Asterius now has preliminary Cabal support. By substituting toolchain
executables like ghc
/ghc-pkg
and supplying some other configure options,
Cabal can build static libraries and "executables" using Asterius. The
"executables" can be quickly converted to node/web artifacts using ahc-dist
.
We also provide ahc-cabal
which is a wrapper for cabal
. ahc-cabal
works
with typical nix-style commands like new-update
/new-build
, etc. The legacy
commands with v1
prefix may also work.
Using ahc-link
/ahc-dist
ahc-link
is the frontend program of Asterius. It takes a Haskell Main
module
and optionally an ES6 "entry" module as input, then emits a .wasm
WebAssembly
binary module and companion JavaScript files, which can then be run in
environments like Node.js or browsers.
ahc-dist
works similarly, except it takes the pseudo-executable file generated
from ahc-cabal
as input. All command-line arguments are the same as
ahc-link
, except ahc-link
takes --input-hs
, while ahc-dist
takes
--input-exe
.
Quick examples
Compiling a Haskell file, running the result with node
immediately: ahc-link --input-hs hello.hs --run
Compiling for browsers, bundling JavaScript modules to a single script:
ahc-link --input-hs hello.hs --browser --bundle
Compiling a Cabal executable target: ahc-cabal new-install --installdir . hello && ahc-dist --input-exe hello --run
Reference
--input-hs ARG
The Haskell Main
module's file path. This option is mandatory; all others are
optional. Works only for ahc-link
.
The Main
module may reference other local modules, as well as packages in the
asterius
global package database.
--input-exe ARG
The pseudo-executable file path. A pseudo-executable is produced by using
ahc-cabal
to compile a Cabal executable target. This works only for
ahc-dist
, and is also mandatory.
--input-mjs ARG
The ES6 "entry" module's file path. If not specified, a default entry module
will be generated, e.g. xxx.hs
's entry script will be xxx.mjs
. The entry
module can either be run by node
, or included in a <script>
tag, depending
on the target supplied at link time.
It's possible to override the default behavior by specifying your own entry module. The easiest way to write a custom entry module is to modify the default one:
import * as rts from "./rts.mjs";
import module from "./xxx.wasm.mjs";
import req from "./xxx.req.mjs";
module
.then(m => rts.newAsteriusInstance(Object.assign(req, { module: m })))
.then(i => {
i.exports.main();
});
xxx.wasm.mjs
and xxx.req.mjs
are generated at link-time. xxx.wasm.mjs
exports a default value, which is a Promise
resolving to a
WebAssembly.Module
value. xxx.req.mjs
exports the "request object"
containing app-specific data required to initialize an instance. After adding
the module
field to the request object, the result can be used as the input to
newAsteriusInstance
exported by rts.mjs
.
newAsteriusInstance
will eventually resolve to an Asterius instance object.
Using the instance object, one can call the exported Haskell functions.
--output-directory ARG
Specifies the output directory. Defaults to the same directory as --input-hs
.
--output-prefix
ARG
Specifies the prefix of the output files. Defaults to the base filename of
--input-hs
, so for xxx.hs
, we generate xxx.wasm
, xxx.req.mjs
, etc.
--verbose-err
This flag will enable more verbose runtime error messages. By default, the data segments related to runtime messages and the function name section are stripped in the output WebAssembly module for smaller binary size.
When reporting a runtime error in the asterius
issue tracker, it is
recommended to compile and run the example with --verbose-err
so there's more
info available.
--no-main
This is useful for compiling and linking a non-Main
module. This will pass
-no-hs-main
to GHC when linking, and the usual i.exports.main()
main
function won't be available.
Note that the default entry script won't work for such modules, since there
isn't an exported main
function, but it's still possible to export other
Haskell functions and call them from JavaScript; do not forget to use
--export-function=..
to specify those functions.
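For illustration, here is a hedged sketch of such a non-Main module; the module name and the mult binding are made up for this example, and the export name reuses the one from the example above:
module MyLib where

mult :: Int -> Int -> Int
mult = (*)

-- Exported to JavaScript; remember to pass --export-function=mult_hs at link time.
foreign export javascript "mult_hs" mult :: Int -> Int -> Int
It could then be compiled with something like ahc-link --input-hs MyLib.hs --no-main --export-function=mult_hs.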
--browser
Indicates the output code is targeting the browser environment. By default, the target is Node.js.
Since the runtime contains platform-specific modules, the compiled
WebAssembly/JavaScript code only works on a single specific platform. The
pseudo-executable generated by ahc
or ahc-cabal
is platform-independent
though; it's possible to compile Haskell to a pseudo-executable, and later use
ahc-dist
to generate code for different platforms.
--bundle
Instead of generating a bunch of ES6 modules in the target directory, generate a
self-contained xxx.js
script, and running xxx.js
has the same effect as
running the entry module. Only works for the browser target for now.
--bundle
is backed by webpack
under the hood and
performs minification on the bundled JavaScript file. It's likely beneficial
since it reduces the total size of scripts and doesn't require multiple requests
for fetching them.
--tail-calls
Enable the WebAssembly tail call opcodes. This requires Node.js/Chromium to be
called with the --experimental-wasm-return-call
flag.
See the "Using experimental WebAssembly features" section for more details.
--optimize-level=N
Set the optimize level of binaryen
. Valid values are 0
to 4
. The default
value is 4
.
Check the relevant source code in binaryen
for the passes enabled for
different optimize/shrink levels
here.
--shrink-level=N
Set the shrink level of binaryen
. Valid values are 0
to 2
. The default
value is 2
.
--ghc-option ARG
Specify additional ghc options. The {-# OPTIONS_GHC #-}
pragma also works.
--run
Runs the output code using node
. Ignored for browser targets.
--debug
Switch on the debug mode. The memory trap will be enabled, which replaces all load/store instructions in WebAssembly with load/store functions in JavaScript, performing aggressive validity checks on the addresses.
--yolo
Switch on the yolo mode. Garbage collection will never occur; instead, the storage manager will simply allocate more memory upon heap overflows. This is mainly used for debugging potential gc-related runtime errors.
--gc-threshold=N
Set the gc threshold value to N
MBs. The default value is 64
. The storage
manager won't perform actual garbage collection if the size of active heap
region is below the threshold.
--no-gc-sections
Do not run dead code elimination.
--export-function ARG
For each foreign export javascript
function f
that will be called, a
--export-function=f
link-time flag is mandatory.
--extra-root-symbol ARG
Specify a symbol to be added to the "root symbol set". Root symbols and their transitive dependencies will survive dead code elimination.
--output-ir
Output Wasm IRs of compiled Haskell modules and the resulting module. The IRs
aren't intended to be consumed by external tools like binaryen
/wabt
.
--console-history
The stdout
/stderr
of the runtime will preserve the already written content.
The UTF-8 decoded history content can be fetched via
i.stdio.stdout()
/i.stdio.stderr()
. These functions will also clear the
history when called.
This flag can be useful when writing headless Node.js or browser tests and the
stdout
/stderr
contents need to be compared against a file.
JavaScript FFI
Asterius implements JSFFI, which enables importing sync/async JavaScript code, and exporting static/dynamic Haskell functions. The JSFFI syntax and semantics are inspired by JSFFI in GHCJS, but they differ in certain ways.
Marshaling data between Haskell and JavaScript
Directly marshalable value types
There are mainly 3 kinds of marshalable value types which can be directly used as function arguments and return values in either JSFFI imports or exports:
- Regular Haskell value types like Int, Ptr, StablePtr, etc. When the MagicHash and UnliftedFFITypes extensions are enabled, some unboxed types like Int# are also supported.
- The JSVal type and its newtypes.
- The Any type.
The JSVal
type is exported by
Asterius.Types
.
It represents an opaque JavaScript value in the Haskell world; one can use JSFFI
imports to obtain JSVal
values, pass them across Haskell/JavaScript, store
them in Haskell data structures like ordinary Haskell values. JSVal
s are
garbage collected, but it's also possible to call freeJSVal
to explicitly free
them in the runtime.
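As a small hedged sketch (the JavaScript expression and the js_mkObj name are made up; freeJSVal is assumed to be in scope from the standard library alongside JSVal):
import Asterius.Types

foreign import javascript unsafe "({ answer: 42 })" js_mkObj :: IO JSVal

main :: IO ()
main = do
  v <- js_mkObj
  -- ... pass v around or store it like an ordinary Haskell value ...
  freeJSVal v -- explicitly drop the reference instead of waiting for GC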
The Any
type in GHC.Exts
represents a boxed Haskell value, which is a
managed pointer into the heap. This is only intended to be used by power users.
Just like regular ccall
imports/exports, the result type of javascript
imports/exports can be wrapped in IO
or not.
The JSVal
family of types
Other than JSVal
, Asterius.Types
additionally exports these types:
JSArray
JSFunction
JSObject
JSString
JSUint8Array
They are newtype
s of JSVal
and can be directly used as argument or result
types as well. The runtime doesn't perform type-checking on the JavaScript side,
e.g. it won't check if typeof $1 === "string"
when $1
is declared as a
JSString
. It's up to the users to guarantee the runtime invariants about such
JSVal
wrapper types.
User-defined newtype
s of JSVal
can also be used as marshalable value types,
as long as the newtype
constructor is available in scope.
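For example, a hedged sketch of a user-defined JSVal newtype used directly in import signatures (JSDate, js_now and js_year are made-up names; the source texts are plain JavaScript):
import Asterius.Types

newtype JSDate = JSDate JSVal

foreign import javascript unsafe "new Date()" js_now :: IO JSDate
foreign import javascript unsafe "$1.getFullYear()" js_year :: JSDate -> IO Int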
Marshaling structured data
Given the ability to pass simple value types, one can implement one's own utilities for passing a piece of structured data either from JavaScript to Haskell, or vice versa.
To build a Haskell data structure from a JavaScript value, usually we write a builder function which recursively traverses the substructure of the JavaScript value (sequence, tree, etc) and build up the Haskell structure, passing one cell at a time. Similarly, to pass a Haskell data structure to JavaScript, we traverse the Haskell data structure and build up the JavaScript value.
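As a minimal hedged sketch of such a builder for JavaScript arrays (the js_length and js_index import names are made up; their source texts are plain JavaScript):
import Asterius.Types

foreign import javascript unsafe "$1.length" js_length :: JSVal -> IO Int
foreign import javascript unsafe "$1[$2]" js_index :: JSVal -> Int -> IO JSVal

-- Build a Haskell list from a JavaScript array, one element at a time.
fromJSArrayLike :: JSVal -> IO [JSVal]
fromJSArrayLike arr = do
  n <- js_length arr
  mapM (js_index arr) [0 .. n - 1]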
The Asterius standard library provides functions for common marshaling purposes:
import Asterius.Aeson
import Asterius.ByteString
import Asterius.Text
import Asterius.Types
fromJSArray :: JSArray -> [JSVal]
toJSArray :: [JSVal] -> JSArray
fromJSString :: JSString -> String
toJSString :: String -> JSString
byteStringFromJSUint8Array :: JSUint8Array -> ByteString
byteStringToJSUint8Array :: ByteString -> JSUint8Array
textFromJSString :: JSString -> Text
textToJSString :: Text -> JSString
jsonToJSVal :: ToJSON a => a -> JSVal
jsonFromJSVal :: FromJSON a => JSVal -> Either String a
jsonFromJSVal' :: FromJSON a => JSVal -> a
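A short hedged usage sketch of these helpers (the Point type is made up; aeson's generic deriving is assumed):
{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}
import Asterius.Aeson
import Asterius.Types
import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)

data Point = Point { x :: Double, y :: Double }
  deriving (Generic, ToJSON, FromJSON)

roundTrip :: Point -> Either String Point
roundTrip = jsonFromJSVal . jsonToJSVal

main :: IO ()
main = do
  putStrLn (fromJSString (toJSString "hello")) -- String <-> JSString
  print (either (const 0) x (roundTrip (Point 1 2))) -- Haskell value <-> JSVal via aeson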
The 64-bit integer precision problem
Keep in mind that when passing 64-bit integers via Int
, Word
, etc, precision
can be lost, since they're represented by number
s on the JavaScript side. In
the future, we may consider using bigint
s instead of number
s as the
JavaScript representations of 64-bit integers to solve this issue.
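A hedged sketch of the problem (the identity round-trip import is made up): values beyond JavaScript's safe integer range may not survive the trip.
foreign import javascript unsafe "$1" js_roundTrip :: Int -> IO Int

main :: IO ()
main = do
  r <- js_roundTrip (2 ^ 60 + 1)
  print (r == 2 ^ 60 + 1) -- may print False: the value exceeds 2^53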
JSFFI imports
JSFFI import syntax
import Asterius.Types
foreign import javascript unsafe "new Date()" current_time :: IO JSVal
foreign import javascript interruptible "fetch($1)" fetch :: JSString -> IO JSVal
The source text of foreign import javascript
should be a single valid
JavaScript expression, using $n
to refer to the n
-th argument (starting from
1
). It's possible to use an IIFE (Immediately Invoked Function Expression) in the
source text, so more advanced JavaScript constructs can be used.
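For example, a hedged sketch of an IIFE import (the Haskell name is made up; the source text is plain JavaScript):
foreign import javascript unsafe "(() => { const d = new Date(); return d.getFullYear() * 100 + d.getMonth(); })()" js_yearMonth :: IO Int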
Sync/async JSFFI imports
The safety level in a foreign import javascript
declaration indicates whether
the JavaScript logic is asynchronous. When omitted, the default is unsafe
,
which means the JavaScript code will return the result synchronously. When
calling an unsafe
import, the whole runtime blocks until the result is
returned from JavaScript.
The safe
and interruptible
levels mean the JavaScript code should return a
Promise
which later resolves with the result. The current thread will be
suspended when such an import function is called, and resumed when the Promise
resolves or rejects. Other threads may continue execution when a thread is
blocked by a call to an async import.
Error handling in JSFFI imports
When calling a JSFFI import function, the JavaScript code may synchronously
throw exceptions or reject the Promise
with errors. They are wrapped as
JSException
s and thrown in the calling thread, and the JSException
s can be
handled like regular synchronous exceptions in Haskell. JSException
is also
exported by Asterius.Types
; it contains both a JSVal
reference to the
original JavaScript exception/rejection value, and a String
representation of
the error, possibly including a JavaScript stack trace.
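A hedged sketch of handling such a failure (the throwing import is made up; JSException is assumed to be an Exception instance, as implied above):
{-# LANGUAGE ScopedTypeVariables #-}
import Asterius.Types
import Control.Exception

foreign import javascript unsafe "(() => { throw new Error(\"boom\"); })()" js_boom :: IO ()

main :: IO ()
main = js_boom `catch` \(_ :: JSException) -> putStrLn "caught a JSException from the import"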
Accessing the asterius instance object
In the source text of a foreign import javascript
declaration, one can access
everything in the global scope and the function arguments. Additionally, there
is an __asterius_jsffi
binding which represents the Asterius instance object.
__asterius_jsffi
exposes certain interfaces for power users, e.g.
__asterius_jsffi.exposeMemory()
which exposes a memory region as a JavaScript
typed array. The interfaces are largely undocumented and not likely to be useful
to regular users.
There is one usage of __asterius_jsffi
which may be useful to regular users
though. Say that we'd like the JSFFI import code to call some 3rd-party library
code, but we don't want to pollute the global scope; we can assign the library
functions as additional fields of the Asterius instance object after it's
returned by newAsteriusInstance()
, then access them using __asterius_jsffi
in the JSFFI import code.
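A hedged sketch of that pattern (myLib and frobnicate are made-up names; it assumes the JavaScript side attached the library to the instance object, e.g. i.myLib = someLibrary, after newAsteriusInstance() returned):
-- Reaches a library attached to the Asterius instance object, without touching the global scope.
foreign import javascript unsafe "__asterius_jsffi.myLib.frobnicate($1)" js_frobnicate :: Int -> IO Int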
JSFFI exports
JSFFI static exports
foreign export javascript "mult_hs" (*) :: Int -> Int -> Int
The foreign export javascript
syntax can be used for exporting a static
top-level Haskell function to JavaScript. The source text is the export
function name, which must be globally unique. The supported export function
types are the same as for JSFFI imports.
For the exported functions we need to call in JavaScript, at link-time, each
exported function needs an additional --export-function
flag to be passed to
ahc-link
/ahc-dist
, e.g. --export-function=mult_hs
.
In JavaScript, after newAsteriusInstance()
returns the Asterius instance
object, one can access the exported functions in the exports
field:
const r = await i.exports.mult_hs(6, 7);
Note that all exported Haskell functions are async JavaScript functions. The
returned Promise
resolves with the result when the thread successfully
returns; otherwise it may reject with a JavaScript string, which is the
serialized form of the Haskell exception if present.
It's safe to call a JSFFI export function multiple times, or call another JSFFI
export function before a previous call resolves/rejects. The export functions
can be passed around as first-class JavaScript values, called as ordinary
JavaScript functions or indirectly as JavaScript callbacks. They can even be
imported back to Haskell as JSVal
s and called in Haskell.
JSFFI dynamic exports
import Asterius.Types
foreign import javascript "wrapper" makeCallback :: (JSVal -> IO ()) -> IO JSFunction
foreign import javascript "wrapper oneshot" makeOneshotCallback :: (JSVal -> IO ()) -> IO JSFunction
freeHaskellCallback :: JSFunction -> IO ()
The foreign import javascript "wrapper"
syntax can be used for exporting a
Haskell function closure to a JavaScript function dynamically. The type
signature must be of the form Fun -> IO JSVal
, where Fun
represents a
marshalable JSFFI function type in either JSFFI imports or static exports, and
the result can be JSVal
or its newtype
.
After declaring the "wrapper" function, one can pass a Haskell function closure
to it and obtain the JSVal
reference of the exported JavaScript function. The
exported function can be used in the same way as the JSFFI static exports.
When a JSFFI dynamic export is no longer useful, call freeHaskellCallback
to
free it. The JSVal
reference of the JavaScript callback as well as the
StablePtr
of the Haskell closure will be freed.
Sometimes, we expect a JSFFI dynamic export to be one-shot, called
only once. For such one-shot exports, use foreign import javascript "wrapper oneshot"
. The runtime will automatically free the resources once the exported
JavaScript function is invoked, and there'll be no need to manually call
freeHaskellCallback
for one-shot exports.
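A hedged usage sketch tying the pieces together (the js_setTimeout import is made up; makeCallback is the "wrapper" import declared above):
import Asterius.Types

foreign import javascript "wrapper" makeCallback :: (JSVal -> IO ()) -> IO JSFunction
foreign import javascript unsafe "setTimeout($1, 1000)" js_setTimeout :: JSFunction -> IO ()

main :: IO ()
main = do
  cb <- makeCallback (\_ -> putStrLn "tick")
  js_setTimeout cb
  -- when the callback is no longer needed: freeHaskellCallback cb
  -- (or declare a "wrapper oneshot" import and let the runtime free it)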
Template Haskell
We added
hooks
to these iserv
-related functions:
startIServ
stopIServ
iservCall
readIServ
writeIServ
The hook of hscCompileCoreExpr
is also used. The implementations of the hooks
are in
Asterius.GHCi.Internals
Normally, startIServ
and stopIServ
start/stop the current iserv
process.
We don't use the normal iserv
library for iserv
though; we use
inline-js-core
to start a node process. inline-js-core
has its own mechanism
of message passing between host/node, which is used for sending JavaScript code
to node for execution and getting results. In the case of TH, the linked
JavaScript and WebAssembly code is sent. Additionally, we create POSIX pipes and
pass the file descriptors as environment variables to the sent code; so most TH
messages are still passed via the pipes, like normal iserv
processes.
The iservCall
function is used for sending a Message
to iserv
and
synchronously getting the result. The sent messages are related to linking, like
loading archives and objects. Normally, linking is handled by the iserv
process, since it's linked with GHC's own runtime linker. In our case, porting
GHC's runtime linker to WebAssembly is going to be a huge project, so we still
perform TH linking in the host ahc
process. The linking messages aren't sent
to node at all; using the hooked iservCall
, we maintain our own in-memory
linker state which records information like the loaded archives and objects.
When splices are executed, GHC first emits a RunTH
message, then repeatedly
queries the response message from iserv
; if it's a RunTHDone
, then the dust
settles and GHC reads the execution result. The response message may also be a
query to GHC, then GHC sends back the query result and repeats the loop. In our
case, we don't send the RunTH
message itself to node; RunTH
indicates
execution has begun, so we perform linking, and use inline-js-core
to load the
linked JavaScript and WebAssembly code, then create and initialize the Asterius
instance object. The splice's closure address is known at link time, so we can
apply the TH runner's function closure to the splice closure, and kick off
evaluation from there. The TH runner function creates a fresh IORef QState
, a
Pipe
from the passed in pipe file descriptors, and uses ghci
library's own
runTH
function to run the splice. During execution, the Quasi
class methods
may be called, and on the node side, they are turned to THMessage
s sent back
to the host via the Pipe
, and the responses are then fetched.
Our function signatures of readIServ
and writeIServ
are modified. Normal GHC
simply uses Get
and Put
in the binary
library for reading/writing via the
Pipe
, but we simply read/write a polymorphic type variable a
, with Binary
and Typeable
constraints. Having a Binary
constraint allows fetching the
needed get
and put
functions, and Typeable
allows us to inspect the
message pre-serialization. This is important, since we need to catch RunTH
or
RunModFinalizer
messages. As mentioned before, these messages aren't sent to
node, and we have special logic to handle them.
As for hscCompileCoreExpr
: it's used for compiling the CoreExpr
of a splice
and getting the resulting RemoteHValue
. We don't support GHC bytecode, so we
overload it and go through the regular pipeline, compile it down to Cmm, then
WebAssembly, finally performing linking, using the closures of the TH runner
function and the splice as "root symbols". The resulting RemoteHValue
is not
"remote" though; it's simply the static address of the splice's closure, and the
TH runner function will need to encapsulate it as a RemoteRef
before feeding
to runTH
.
TH WIP branch:
asterius-TH
GitHub Project with relevant issues
Invoking RTS API in JavaScript
For the brave souls who prefer to play with raw pointers instead of syntactic sugar, it's possible to invoke RTS API directly in JavaScript. This grants us the ability to:
- Allocate memory, create and inspect Haskell closures on the heap.
- Trigger Haskell evaluation, then retrieve the results back into JavaScript.
- Use raw Cmm symbols to summon any function, not limited to the "foreign exported" ones.
Here is a simple example. Suppose we have a Main.fact
function:
fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n - 1)
The first step is ensuring fact
is actually contained in the final
WebAssembly binary produced by ahc-link
. ahc-link
performs aggressive
dead-code elimination (or more precisely, live-code discovery) by starting from
a set of "root symbols" (usually Main_main_closure
which corresponds to
Main.main
), repeatedly traversing ASTs and including any discovered symbols.
So if Main.main
does not have a transitive dependency on fact
, fact
won't
be included into the binary. In order to include fact
, either use it in some
way in main
, or supply --extra-root-symbol=Main_fact_closure
flag to
ahc-link
when compiling.
The next step is locating the pointer of fact
. The "Asterius instance" type
we mentioned before contains two "symbol map" fields: staticsSymbolMap
maps
static data symbols to linear memory absolute addresses, and
functionSymbolMap
maps function symbols to WebAssembly function table
indices. In this case, we can use i.staticsSymbolMap.Main_fact_closure
as the
pointer value of Main_fact_closure
. For a Haskell top-level function,
there're also pointers to the info table/entry function, but we don't need
those two in this example.
Since we'd like to call fact
, we need to apply it to an argument, build a
thunk representing the result, then evaluate the thunk to WHNF and retrieve the
result. Assuming we're passing --asterius-instance-callback=i=>{ ... }
to
ahc-link
, in the callback body, we can use RTS API like this:
const argument = i.exports.rts_mkInt(5);
const thunk = i.exports.rts_apply(i.staticsSymbolMap.Main_fact_closure, argument);
const tid = i.exports.rts_eval(thunk);
console.log(i.exports.rts_getInt(i.exports.getTSOret(tid)));
A line-by-line explanation follows:
- Assuming we'd like to calculate fact 5, we need to build an Int object whose value is 5. We can't directly pass the JavaScript 5; instead we should call rts_mkInt, which properly allocates a heap object and sets up the info pointer of an Int value. When we need to pass a value of a basic type (e.g. Int, StablePtr, etc), we should always call rts_mk* and use the returned pointers to the allocated heap objects.
- Then we can apply fact to 5 by using rts_apply. It builds a thunk without triggering evaluation. If we are dealing with a curried multiple-argument function, we should chain rts_apply repeatedly until we get a thunk representing the final result.
- Finally, we call rts_eval, which enters the runtime and performs all the evaluation for us. There are different types of evaluation functions: rts_eval evaluates a thunk of type a to WHNF. rts_evalIO evaluates the result of IO a to WHNF. rts_evalLazyIO evaluates IO a, without forcing the result to WHNF. It is also the default evaluator used by the runtime to run Main.main.
- All rts_eval* functions initiate a new Haskell thread for evaluation, and they return a thread ID. The thread ID is useful for inspecting whether or not evaluation succeeded and what the result is.
- If we need to retrieve the result back to JavaScript, we must pick an evaluator function which forces the result to WHNF. The rts_get* functions assume the objects are evaluated and won't trigger evaluation.
- Assuming we stored the thread ID in tid, we can use getTSOret(tid) to retrieve the result. The result is always a pointer into the Haskell heap, so additionally we need to use rts_getInt to retrieve the unboxed Int content to JavaScript.
Most users probably don't need to use RTS API manually, since the foreign import
/export
syntactic sugar and the makeHaskellCallback
interface should
be sufficient for typical use cases of Haskell/JavaScript interaction. Still, it won't hurt to know what's hidden beneath the syntactic sugar: foreign import/export is implemented by automatically generating stub WebAssembly functions which call the RTS API for you.
IR types and transformation passes
This section explains various IR types in Asterius, and hopefully presents a
clear picture of how information flows from Haskell to WebAssembly. (There's a
similar section in jsffi.md
which explains implementation details of JSFFI)
Cmm IR
Everything starts from Cmm, or more specifically, "raw" Cmm which satisfies:
-
All calls are tail calls, parameters are passed by global registers like R1 or on the stack.
-
All info tables are converted to binary data segments.
Check Cmm
module in ghc
package to get started on Cmm.
Asterius obtains in-memory raw Cmm via:
-
cmmToRawCmmHook
in our custom GHC fork. This allows us to get our hands on the Cmm generated by compiling either Haskell modules or .cmm
files (which are in rts
) -
There is some abstraction in
ghc-toolkit
, the compiler logic is actually in theCompiler
datatype as some callbacks, andghc-toolkit
converts them to hooks, frontend plugins andghc
executable wrappers.
There is one minor annoyance with the Cmm types in GHC (or any other GHC IR type): it's very hard to serialize/deserialize them without setting up complicated contexts related to package databases, etc. To experiment with new backends, it's reasonable to marshal to a custom serializable IR first.
Pre-linking expression IR
We then marshal raw Cmm to an expression IR defined in Asterius.Types
. Each
compilation unit (Haskell module or .cmm
file) maps to one AsteriusModule
,
and each AsteriusModule
is serialized to a .asterius_o
object file which
will be deserialized at link time. Since we serialize/deserialize a structured
expression IR faithfully, it's possible to perform aggressive LTO by
traversing/rewriting IR at link time, and that's what we're doing right now.
The expression IR is mostly a Haskell modeling of a subset of binaryen
's
expression IR, with some additions:
- Unresolved-related variants, which allow us to use a symbol as an expression. At link time, the symbols are rewritten to absolute addresses.
- Unresolved locals/globals. At link time, unresolved locals are laid out to Wasm locals, and unresolved globals (which are really just Cmm global regs) become fields in the global Capability's StgRegTable.
- EmitErrorMessage, as a placeholder for emitting a string error message then trapping. At link time, such error messages are collected into an "error message pool", and the Wasm code is just "calling some error message reporting function with an array index".
- Null. We're civilized, educated functional programmers and should really be using Maybe Expression in some fields instead of adding a Null constructor, but this is just handy. Blame me.
It's possible to encounter things we can't handle in Cmm (unsupported primops,
etc). So AsteriusModule
also contains compile-time error messages when
something isn't supported, but the errors are not reported; instead they are
deferred to runtime error messages. (Ideally link-time, but it turns out to be
hard)
The symbols are simply converted to Z-encoded strings that also contain module prefixes, and they are assumed to be unique across different compilation units.
The store
There's an AsteriusStore
type in Asterius.Types
. It's an immutable data
structure that maps symbols to underlying entities in the expression IR for
every single module, and is a critical component of the linker.
Modeling the store as a self-contained data structure makes it pleasant to
write linker logic, at the cost of exploding RAM usage. So we implemented a
poor man's KV store in Asterius.Store
which performs lazy-loading of modules:
when initializing the store, we only load the symbols, but not the actual
modules; only when a module is "requested" for the first time, we perform
deserialization for that module.
AsteriusStore
supports merging. It's a handy operation, since we can first
initialize a "global" store that represents the standard libraries, then make
another store based on compiling user input, simply merge the two and we can
start linking from the output store.
Post-linking expression IR
At link time, we take AsteriusStore
which contains everything (standard
libraries and user input code), then perform live-code discovery: starting
from a "root symbol set" (something like Main_main_closure
), iteratively
fetch the entity from the store, traverse the AST and collect new symbols. When
we reach a fixpoint, that fixpoint is the outcome of dependency analysis,
representing a self-contained Wasm module.
We then do some rewriting work on the self contained module: making symbol
tables, rewriting symbols to absolute addresses, using our own relooper to
convert from control-flow graphs to structured control flow, etc. Most of the
logic is in Asterius.Resolve
.
The output of the linker is a Module
. It differs from AsteriusModule
, and
although it shares quite a few datatypes with AsteriusModule
(for example,
Expression
), it guarantees that some variants will not appear (for example,
Unresolved*
). A Module
is ready to be fed to a backend which emits real
Wasm binary code.
There are some useful linker byproducts. For example, there's LinkReport
which contains mappings from symbols to addresses which will be lost in Wasm
binary code, but is still useful for debugging.
Generating binary code via binaryen
Once we have a Module
(which is essentially just Haskell modeling of binaryen
C API), we can invoke binaryen to validate it and generate Wasm binary code.
The low-level bindings are maintained in the binaryen
package, and
Asterius.Marshal
contains the logic to call the imported functions to do
actual work.
Generating binary code via wasm-toolkit
We can also convert Module
to IR types of wasm-toolkit
, which is our native
Haskell Wasm engine. It's now the default backend of ahc-link
, but the
binaryen backend can still be chosen by ahc-link --binaryen
.
Generating JavaScript stub script
To make it actually run in Node.js/Chrome, we need two pieces of JavaScript code:
- Common runtime which can be reused across different Asterius-compiled modules. It's in asterius/rts/rts.js.
- Stub code which contains specific information like error messages, etc.
The linker generates the stub script along with the Wasm binary code, and concatenates the
runtime and the stub script into a self-contained JavaScript file which can be
run or embedded. It's possible to specify JavaScript "target" to either Node.js
or Chrome via ahc-link
flags.
The runtime debugging feature
There is a runtime debugging mode which can be enabled by the --debug
flag
for ahc-link
. When enabled, the compiler inserts "tracing" instructions in
the following places:
- The start of a function/basic block
- SetLocal, when the local type is I64
- Memory load/stores, when the value type is I64
The tracing messages are quite helpful in observing control flow transfers and
memory operations. Remember to also use the --output-link-report
flag to dump
the linking report, which contains mapping from data/function symbols to
addresses.
The runtime debugging mode also enables a "memory trap" which intercepts every memory load/store instruction and checks if the address is null pointer or other uninitialized regions of the linear memory. The program immediately aborts if an invalid address is encountered. (When debugging mode is switched off, program continues execution and the rest of control flow is all undefined behavior!)
Virtual address spaces
Remember that we're compiling to wasm32
which has a 32-bit address space, but
the host GHC is actually 64-bit, so all pointers in Asterius are 64 bits wide, and
upon load
/store
/call_indirect
, we truncate the 64-bit pointer, using only
the lower 32-bits for indexing.
The higher 32-bits of pointers are idle tag bits at our disposal, so, we implemented simple virtual address spaces. The linker/runtime is aware of the distinction between:
- The physical address, which is either an i32 index of the linear memory for data, or an i32 index of the table for functions.
- The logical address, which is the i64 pointer value we're passing around.
All access to the memory/table is achieved by using the logical address. The
access operations are accompanied by a mapping operation which translates a
logical address to a physical one. Currently it's just a truncate, but in
the future we may get a more feature-complete mmap
/munmap
implementation,
and some additional computation may occur when address translation is done.
We chose two magic numbers (in Asterius.Internals.MagicNumber
) as the tag
bits for data/function pointers. The numbers are chosen so that when applied,
the logical address does not exceed JavaScript's safe integer limit.
When we emit debug log entries, we may encounter various i64
values. We
examine the higher 32-bits, and if it matches the pointer tag bits, we do a
lookup in the data/function symbol table, and if there's a hit, we output the
symbol along with the value. This spares us the pain of keeping a lot of symbol/address
mappings in our working memory when examining the debug logs. Some false
positives (e.g. some random intermediate i64
value in a Haskell computation
accidentally collides with a logical address) may exist in theory, but the
probability should be very low.
Note that for consistency between vanilla/debug mode, the virtual address spaces are in effect even in vanilla mode. This won't add extra overhead, since the truncate instruction for 64-bit addresses has been present since the beginning.
Complete list of emitted debugging log entries
- Assertions: some hand-written WebAssembly functions in Asterius.Builtins contain assertions which are only active in debugging mode. Failure of an assertion causes a string error message to be printed and the whole execution flow to abort.
- Memory traps: In Asterius.MemoryTrap, we implement a rewriting pass which rewrites all load/store instructions into invocations of load/store wrapper functions. The wrapper functions are defined in Asterius.Builtins; they check the address and trap if it's an invalid one (null pointer, uninitialized region, etc).
- Control-flow: In Asterius.Tracing, we implement a rewriting pass on functions (which is later invoked at link time in Asterius.Resolve), which emits messages when:
  - Entering a Cmm function.
  - Entering a basic block. To make sense of block ids, you need to dump pre-linking IRs (which aren't processed by the relooper yet, and preserve the control-flow graph structure).
  - Assigning a value to an i64 local. To make sense of local ids, dump IRs. Also note that the local ids here don't match the actual local ids in Wasm binary code (there is a re-mapping upon serialization), but it shouldn't be a problem since we are debugging the higher level IR here.
Dumping IRs
There are multiple ways to dump IRs:
- Via GHC flags: GHC flags like -ddump-to-file -ddump-cmm-raw dump pretty-printed GHC IRs to files.
- Via environment variable: Set the ASTERIUS_DEBUG environment variable, then during booting, a number of IRs (mainly raw Cmm in its AST form, instead of pretty-printed form) will be dumped.
- Via ahc-link flag: Use ahc-link --output-ir to dump IRs when compiling user code.
High-level architecture
The asterius
project is hosted at
GitHub. The monorepo contains several
packages:
- asterius. This is the central package of the asterius compiler.
- binaryen. It contains the latest source code of the C++ library binaryen in tree, and provides complete raw bindings to its C API.
- ghc-toolkit. It provides a framework for implementing Haskell-to-X compilers by retrieving ghc's various types of in-memory intermediate representations. It also contains the latest source code of ghc-prim/integer-gmp/integer-simple/base in tree.
- wasm-toolkit. It implements the WebAssembly AST and binary encoder/decoder in Haskell, and is now the default backend for generating WebAssembly binary code.
The asterius
package provides an ahc
executable which is a drop-in
replacement of ghc
to be used with Setup configure
. ahc
redirects all
arguments to the real ghc
most of the time, but when it's invoked with the
--make
major mode, it invokes ghc
with its frontend plugin. This is
inspired by Edward Yang's
How to integrate GHC API programs with Cabal.
Based on ghc-toolkit
, asterius
implements a
ghc
frontend plugin
which translates
Cmm to
binaryen
IR. The serialized binaryen
IR can then be loaded and linked to a
WebAssembly binary (not implemented yet). The normal compilation pipeline which
generates native machine code is not affected.
About "booting"
In order for asterius
to support non-trivial Haskell programs (that is, at
least most things in Prelude
), it needs to run the compilation process for
base
and its dependent packages. This process is known as "booting".
The asterius
package provides an ahc-boot
test suite which tests booting by
compiling the wired-in packages provided by ghc-toolkit
and using ahc
to
replace ghc
when configuring. This is inspired by Joachim Breitner's
veggies
.
Writing WebAssembly code in Haskell
In Asterius.Builtins
, there are WebAssembly shims which serve as our runtime.
We choose to write WebAssembly code in Haskell, using Haskell as our familiar
meta-language.
As of now, there are two ways of writing WebAssembly code in Haskell. The first
way is directly manipulating AST types as specified in Asterius.Types
. Those
types are pretty bare-metal and map closely to binaryen IR. Simply write some
code to generate an AsteriusFunction
, and ensure the function and its symbol
are present in the store when linking starts. It will eventually be bundled into the
output WebAssembly binary file.
Directly using Asterius.Types
is not a pleasant experience: it's basically a
DDoS on one's working memory, since the developer needs to keep a lot of things
in mind: parameter/local ids, block/loop labels, etc. Also, the resulting
Haskell code is pretty verbose, littered with syntactic noise (e.g. tons of
list concats when constructing a block)
We now provide an EDSL in Asterius.EDSL
to construct an AsteriusFunction
.
Its core type is EDSL a
, and can be composed with a Monad
or Monoid
interface. Most builtin functions in Asterius.Builtins
are already refactored
to use this EDSL. Typical usages:
- "Allocate" a parameter/local. Use param or local to obtain an immutable Expression which corresponds to the value of a new parameter/local. There are also mutable variants.
- An opaque LVal type is provided to uniformly deal with local reads/assignments and memory loads/stores. Once an LVal is instantiated, it can be used to read an Expression in the pure world, or set an Expression in the EDSL monad.
- Several side-effecting instructions can simply be composed with the monadic/monoidal interface, without the need to explicitly construct an anonymous block.
- When we need named blocks/loops with branching instructions inside, use the block/loop combinators, which have the type (Label -> EDSL ()) -> EDSL (). Inside the passed-in continuation, we can use break' to perform branching. The Label type is also opaque and cannot be inspected; the only thing we know is that it's scope-checked just like any ordinary Haskell value, so it's impossible to accidentally branch to an "inner" label.
The EDSL only checks for scope safety, so we don't mess up different locals or jump to non-existent labels. Type safety is not guaranteed (the binaryen validator checks for it anyway). Underneath, it's just a shallowly embedded DSL implemented with a plain old state monad. Some people call it the "remote monad design pattern".
WebAssembly as a Haskell compilation target
There are a few issues to address when compiling Cmm to WebAssembly.
Implementing Haskell Stack/Heap
The Haskell runtime maintains a TSO (Thread State Object) for each Haskell thread, and each TSO contains a separate stack for the STG machine. The WebAssembly platform has its own "stack" concept though; the execution of WebAssembly is based on a stack machine model, where instructions consume operands on the stack and push new values onto it.
We use the linear memory to simulate Haskell stack/heap. Popping/pushing the Haskell stack only involves loading/storing on the linear memory. Heap allocation only involves bumping the heap pointer. Running out of space will trigger a WebAssembly trap, instead of doing GC.
All discussions in the documentation use the term "stack" for the Haskell stack, unless explicitly stated otherwise.
Implementing STG machine registers
The Haskell runtime makes use of "virtual registers" like Sp, Hp or R1 to
implement the STG machine. The NCG (Native Code Generator) tries to map some of
the virtual registers to real registers when generating assembly code. However,
WebAssembly doesn't have language constructs that map to real registers, so we
simply implement Cmm local registers as WebAssembly locals, and global
registers as fields of StgRegTable
.
Handling control flow
WebAssembly currently enforces structured control flow, which prohibits arbitrary branching. Also, explicit tail calls are missing.
The Cmm control flow mainly involves two forms of branching: in-function or
cross-function. Each function consists of a map from hoopl
labels to basic
blocks and an entry label. Branching happens at the end of each basic block.
In-function branching is relatively easy to handle. binaryen
provides a
"relooper" which can recover WebAssembly instructions with structured control
flow from a control-flow graph. Note that we're using our own relooper though,
see issue #22 for relevant
discussion.
Cross-function branching (CmmCall
) is tricky. WebAssembly lacks explicit tail
calls, and the relooper can't be easily used in this case since there's a
computed goto, and potential targets include all Cmm blocks involved in
linking. There are multiple possible ways to handle this situation:
- Collect all Cmm blocks into one function, and additionally add a "dispatcher" block. All CmmCalls save the callee to a register and branch to the "dispatcher" block, and the "dispatcher" uses br_table or a binary decision tree to branch to the entry block of the callee.
- One WebAssembly function for one CmmProc; upon CmmCall the function returns the function id of the callee. A mini-interpreter function at the top level repeatedly invokes the functions using call_indirect. This approach is actually used by the unregisterised mode of ghc.
We're using the latter approach: every CmmProc
marshals to one WebAssembly
function. This choice is tightly coupled with some other functionalities (e.g.
debug mode) and it'll take quite some effort to switch away.
Handling relocations
When producing a WebAssembly binary, we need to map CLabel
s to the precise
linear memory locations for CmmStatics
or the precise table ids for
CmmProc
s. They are unknown when compiling individual modules, so binaryen
is invoked only when linking, and during compiling we only convert CLabel
s to
some serializable representation.
Currently the WebAssembly community has a
proposal
for a linkable object format, and it's prototyped by lld
. We'll probably turn
to that format and use lld
some day, but right now we'll simply stick to our
own format for simplicity.
The word size story
Although wasm64
is scheduled, currently only wasm32
is implemented.
However, we are running 64-bit ghc
, and there are several places which need
extra care:
- The load/store instructions operate on 64-bit addresses, yet wasm32 uses uint32 when indexing into the linear memory.
- The CmmSwitch labels are 64-bit. CmmCondBranch also checks a 64-bit condition. br_if/br_table operate on uint32.
- Only i32/i64 are supported as wasm32 value types, but in Cmm we also need arithmetic on 8-bit/16-bit integers.
We insert instructions for converting between 32/64-bits in the codegen. The
binaryen
validator also helps checking bit lengths.
As for booleans: there's no native boolean type in either WebAssembly or Cmm.
As a convention we use uint32
.
Pages and addresses
The WebAssembly linear memory has a hard-coded page size of 64KB. There are several places which operate in units of pages rather than raw bytes:
- CurrentMemory/GrowMemory
- The Memory component of a Module
When performing final linking, we lay out static data segments in the linear
memory. We ensure the memory size is always divisible by MBLOCK_SIZE
, so it's
easy to allocate new mega blocks and calculate required page count.
The first 8 bytes of linear memory (from 0x0 to 0x7) are uninitialized. 0x0 is treated as null pointer, and loading/storing on null pointer or other uninitialized regions is prohibited. In debug mode the program immediately aborts.
Using experimental WebAssembly features
By default, Asterius only emits code that uses WebAssembly MVP features. There are flags to make use of WebAssembly experimental features:
- --tail-calls: Emits tail call opcodes for Cmm function calls; overrides the default trampoline approach. Only supported by the wasm-toolkit backend at the moment.
- --debug: Uses i64 BigInt integration for passing i64 values between js/wasm.
The above features require specific V8 flags to be switched on. They are known to work in the latest Node.js 12.x versions, and we test them on CI.
The V8 team maintains a Node.js 13.x build which integrates V8 trunk, described
here. It's possible to use that
build to evaluate experimental WebAssembly features; we provide a
script which
unzips the latest test-passing build to the current directory, so it's possible
to use the node
binary for testing bleeding-edge Wasm features in V8.
We are keeping an eye on the development of experimental WebAssembly features. Here is a list of V8 tracking issues of the features we are interested in. Some are already available in recent Node.js or Chromium releases.
- WebAssembly SIMD
- WebAssembly Multi-value
- WebAssembly nontrapping float-to-int conversions
- Tail call opcodes
- Reference types
- WebAssembly i64 BigInt integration
- WebAssembly JS Reflection API
- WebAssembly Bulk Memory
- Garbage Collection
- Exception handling
Hacking guide
Using VSCode remote containers
We recommend using VSCode Remote Containers to reproduce the very same dev environment used by our core team members. The steps to set up the dev environment are:
- Do a local clone of the asterius repo
- Install VSCode (at least 1.45) and its remote extension
- Install podman, and make sure the podman command works with the current user
- Set up a docker symlink which points to podman, according to the VSCode announcement of podman support
- docker pull terrorjack/asterius:dev
- Open the asterius repo with remote containers
Opening the repo with remote containers for the first time will take some time,
since it runs the build script to build asterius
and perform booting. Later
re-opening will be near instant, since it reuses the previous container.
The dev image should work with docker
too if the userns-remap
related
settings are correctly set up. Check the documentation section for
relevant explanation; when using docker
with default settings, there is a file
permission issue when mounting your local filesystem into the prebuilt container
images.
Using direnv
If direnv
is enabled, the PATH
of the current shell session will be extended
to include the locations of Asterius executables. This means it's possible to
run ahc-link ..
instead of stack exec ahc-link -- ..
.
Hacking with ghcid
A known-to-work workflow for hacking on Asterius is using ghcid
. We also include
an example .ghcid
file, so running ghcid
at the project root directory should
work out of the box.
Some notes regarding the usage of ghcid
:
- Multiple lib targets can be loaded at once, but only one main target (exe/test) can be loaded. When hacking a specific exe/test, modify the local utils/ghcid.sh script first. Before committing changes in the Haskell codebase, it would be nice to run stack build --test --no-run-tests to make sure all executables are not broken by lib changes.
To boot or not to boot
As described in the building guide, stack build
only builds the Asterius
compiler itself; additionally we need to run stack exec ahc-boot
to run the
compiler on the boot libs. This process is typically only needed once, but there
are cases when it needs to be re-run:
- The boot libs in ghc-toolkit/boot-libs are modified.
- The Asterius.Types module is modified, so the IR types have changed.
- The Asterius.CodeGen module is modified and you're sure different code will be generated when compiling the same Haskell/Cmm files.
Most other modifications in the Asterius lib/exes won't need a reboot. Specifically:
- Asterius.Builtins modifications don't impact the boot cache. The builtin module is generated on the fly with every linker invocation.
When rebooting, run utils/reboot.sh
in the project root directory, so that we
can ensure the booting is used with the up-to-date version of asterius
and the
boot lib sources.
The ahc-boot
process is configurable via these environment variables:
- ASTERIUS_CONFIGURE_OPTIONS
- ASTERIUS_BUILD_OPTIONS
- ASTERIUS_INSTALL_OPTIONS
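As a purely hypothetical example, assuming the options are forwarded to the corresponding configure/build/install steps of the boot libs, a more verbose boot could be requested like this:
$ ASTERIUS_BUILD_OPTIONS=-v2 utils/reboot.sh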
Doing profiled builds
Doing profiled builds within a local git tree
Use stack-profile.yaml
to overwrite stack.yaml
, and then run
utils/reboot.sh
to kick off the rebooting process. This will be quite slow due
to the nature of profiled builds; all libraries will be rebuilt with the
profiled flavor. It's better to perform a profiled build in a standalone git tree.
Once the profiled build is complete, it's possible to use RTS flags to obtain profile data when compiling Haskell sources. At runtime there are two ways to pass RTS flags to a Haskell executable:
- The GHCRTS environment variable
- The +RTS ... -RTS command line arguments
Always use GHCRTS
when running programs like ahc-link
, since those programs
can spawn other processes (e.g. ahc-ld
), and we're often interested in the
profile data of all Asterius executables. The GHCRTS
environment variable can
propagate to all processes.
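For example, assuming a profiled build and that example.hs is the program being compiled, a time-profiling report for all the Asterius executables involved can be requested like this (-p is the standard GHC RTS flag for cost-centre profiling):
$ GHCRTS=-p ahc-link --input-hs example.hs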
See the relevant
section
in the GHC user guide for more information on profiling Haskell apps. There are
also some third party applications useful for analyzing the profiling data, e.g.
eventlog2html and ghc-prof-flamegraph.
For now, a major problem with the profiled build is that it seems to emit dysfunctional code which doesn't work. Consequently, this affects the TH runner, so any dependency relying on TH isn't supported by the profiled build.
Measuring time/allocation differences
When working on a performance-related PR, we often want to measure the time/allocation differences it introduced. The workflow is roughly:
- Perform two profiled builds with Docker; one builds from the master branch, one from the PR's branch.
- Run ahc-link in the built images on the example program below, setting the necessary GHCRTS variable to generate the profile reports. The code should be put in two standalone directories, otherwise the .hi/.o files may conflict or be accidentally reused.
The profiled Docker images contain pre-compiled Cabal
. And the example program
we use to stress-test the linker is:
import Distribution.Simple
main = defaultMain
We choose this program since it's a classic and, although short, it pulls in a lot of data segments and functions, so it exposes the linker's performance bottlenecks pretty well.
Adding a test case
To add a test case, it is best to replicate what has been done for an existing testcase.
- For example, git grep bytearraymini should show all the places where the test case bytearraymini has been used. Replicating the same files for a new test case should "just work".
Code formatting
In Asterius we use ormolu
for formatting
Haskell and prettier
for formatting JavaScript.
Though not all parts of the codebase are currently formatted this way, it is
recommended that when you submit a PR you run the respective formatters on the
changed parts of the code, so that gradually the whole codebase is formatted
uniformly.
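One possible way to format just the files you touched, assuming origin/master is the base branch and that ormolu and prettier are on PATH:
$ ormolu --mode inplace $(git diff --name-only origin/master -- '*.hs')
$ prettier --write $(git diff --name-only origin/master -- '*.js')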
Hacking on build01
This section is for Tweagers only.
First, set up your build01
account according to the
handbook.
Don't forget to add the groups = ["docker"]
line in your PR.
Once the PR is merged, you can SSH in as a non-privileged NixOS user. You can
check out the asterius
repo, set up your favorite text editor, make edits and
push to the remote.
To build/boot and run tests, a dev container needs to be built first. The
dev.rootless.Dockerfile
can be used to build an image which has the same UID as your user and doesn't mess up local file permissions:
$ docker build --build-arg UID=$(id -u) --file dev.rootless.Dockerfile --tag my_dev_image .
Building the image can take around 10min.
After my_dev_image
is built, a dev container can be started:
$ docker run -it -v $(pwd):/asterius -w /asterius --name my_dev_container my_dev_image
The command above will start my_dev_container
from my_dev_image
, mount the
current project directory to /asterius
and drop into the bash prompt, from
where you can run build commands.
After exit
ing the current bash prompt of my_dev_container
, it can be restarted later:
$ docker start -ai my_dev_container
If you're using VSCode remote SSH, the first attempt to set up will fail. A known to work workaround is available at https://github.com/microsoft/vscode-remote-release/issues/648#issuecomment-503148523.
Reading list
Here is a brief list of relevant readings about GHC internals and WebAssembly suited for newcomers.
-
GHC documentation regarding the GHC API: a nice read for anyone looking to use the GHC API.
-
GHC commentary: a wiki containing lots of additional knowledge regarding GHC's implementation. Keep in mind some content is out-dated though. Some useful entries regarding this project:
- Building guide. A tl;dr for this section is our CI scripts.
- Overview of pipeline:
we use the Hooks mechanism (specifically,
runPhaseHook
) to replace the default pipeline with our own, to enable manipulation of in-memory IRs.
- How STG works: a nice tutorial containing several compiled examples, illustrating how the generated code works under the hood.
- The Cmm types: it's outdated and the types don't exactly match the GHC codebase now, but the explanations still shed some light on how the current Cmm types work.
- The runtime system: content regarding the runtime system.
-
Understanding the Stack: A blog post explaining how generated code works at the assembly level. Also, its sequel Understanding the RealWorld
-
The WebAssembly spec: a useful reference regarding what's already present in WebAssembly.
-
The binaryen C API: binaryen handles WebAssembly code generation. There are a few differences between the binaryen AST and the WebAssembly AST, the most notable ones being:
  - binaryen uses a recursive BinaryenExpression type which is side-effectful. The original WebAssembly standard instead uses a stack-based model and manipulates the operand stack with instructions.
  - binaryen contains a "Relooper" which can recover high-level structured control flow from a CFG. However, the relooper doesn't handle jumping to unknown labels (aka computed goto), so we don't use it to handle tail calls.
-
The following entries are papers which take much more time to read, but are still quite useful for newcomers:
  - Making a fast curry: push/enter vs. eval/apply for higher-order languages: A thorough explanation of what STG is and how it is implemented (via two different groups of rewrite rules, also with real benchmarks).
  - The STG runtime system (revised): Includes some details on the runtime system and is worth a read. It's a mystery why it's not merged into the commentary though. Install a TeX distribution like TeX Live or use a service like Overleaf to compile the .tex file to .pdf before reading.
  - The GHC storage manager: Similar to above.
  - Bringing the Web up to Speed with WebAssembly: The PLDI'17 paper about WebAssembly. Contains an overview of WebAssembly design rationales and rules of small-step operational semantics.
Finally, the GHC codebase itself is also a must-read, but since it's huge we only need to check relevant parts when unsure about its behavior. Tips on reading GHC code:
-
There are a lot of insightful and up-to-date comments which all begin with "Notes on xxx". It's a pity the notes are collected neither into the sphinx-generated documentation nor into the haddock docs of the GHC API.
-
When writing build.mk for compiling GHC, add HADDOCK_DOCS = YES to ensure the haddock docs of the GHC API are built, and EXTRA_HADDOCK_OPTS += --quickjump --hyperlinked-source to enable symbol hyperlinks in the source pages. This will save you tons of time grepping the ghc codebase.
-
grepping is still unavoidable in some cases, since there's a lot of CPP involved and it isn't well handled by haddock.
Project status & roadmap
Overview
The Asterius project has come a long way and some examples with complex dependencies already work. It's still less mature than GHCJS though; see the next section for details.
In general, it's hard to give an ETA for "production readiness", since improvements are continuous, and we haven't collected enough use cases from seed users yet. For more insight into what comes next for this project, we list our quarterly roadmap here.
Besides the goals in each quarter, we also do regular maintenance like
dependency upgrades and bugfixes. We also work on related projects (mainly
haskell-binaryen
and
inline-js
) to ensure they are kept in
sync and also useful to regular Haskell developers.
What works now
- Almost all GHC language features (TH support is partial, cross-splice state persistence doesn't work yet).
- The pure parts in standard libraries and other packages. IO is achieved via rts primitives or user-defined JavaScript imports.
- Importing JavaScript expressions via the foreign import javascript syntax. First-class garbage-collected JSVal type in Haskell land (a small sketch follows this list).
- Preliminary copying GC, managing both Haskell heap objects and JavaScript references.
- Cabal support. Use ahc-cabal to compile libraries and executables. Support for custom Setup.hs is limited.
- Marshaling between Haskell/JavaScript types based on aeson.
- Calling Haskell functions from JavaScript via the foreign export javascript syntax. Haskell closures can be passed across the Haskell/JavaScript boundary via StablePtr.
- Invoking the RTS API on the JavaScript side to manipulate Haskell closures and trigger evaluation.
- A linker which performs aggressive dead-code elimination, based on symbol reachability.
- A debugger which checks invalid memory access and outputs memory loads/stores and control flow transfers.
- Complete binaryen raw bindings, plus a monadic EDSL to construct WebAssembly code directly in Haskell.
- wasm-toolkit: a Haskell library to handle WebAssembly code, which already powers binary code generation.
- Besides WebAssembly MVP and BigInt, there are no special requirements on the underlying JavaScript engine at the moment.
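A rough illustration of the JSFFI surface mentioned above (a sketch only; the module and function names are made up, and the fetch import mirrors the example used later in this document):

{-# LANGUAGE ForeignFunctionInterface #-}

module Example where

import Asterius.Types

-- Import a JavaScript expression; the result is a first-class,
-- garbage-collected JSVal on the Haskell side.
foreign import javascript "new Date()" js_current_time :: IO JSVal

-- An async import: the returned Promise is awaited before js_fetch
-- returns to Haskell.
foreign import javascript safe "fetch($1)" js_fetch :: JSVal -> IO JSVal

-- Export a Haskell function so it can be called from JavaScript.
foreign export javascript "mult_hs" mult_hs :: Int -> Int -> Int

mult_hs :: Int -> Int -> Int
mult_hs = (*)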
What may stop one from using Asterius right now
- Lack of JavaScriptCore/Safari support, due to incomplete JavaScript BigInt support at the moment.
- Runtime bugs. The generated code comes with a complex hand-written runtime which is still buggy at times. The situation is expected to improve once we're able to work with an IR more high-level than Cmm and shave off the current hand-written garbage collector; see the 2020 Q3 section for more details.
- GHCJS projects aren't supported out of the box. Major incompatibilities include:
  - Word sizes differ. Asterius is still 64-bit based at the moment.
  - JSFFI syntax and semantics differ. Asterius uses Promise-based async JSFFI and GHCJS uses callbacks.
  - Cabal handles GHCJS and Asterius differently.
- Lack of Nix support.
- Lack of GHCi support.
- TH support is not 100% complete; certain TH APIs which require preserving state across splices (e.g. getQ/putQ) don't work yet.
- Cabal tests and benchmarks can't be run out of the box.
- Custom Setup.hs support is limited. If it has setup-deps outside GHC boot libs, it won't work.
- Lack of profiling support for generated code.
- Excessive memory usage when linking large programs.
Quarterly roadmap
2021 Q3
For the past months before this update, I took a break from the Asterius project and worked on a client project instead. There's a saying that "less is more", and I believe my absence from this project for a few months was beneficial in multiple ways:
- I gained a lot more nix-related knowledge.
- Purging my short-term memory of the project and coming back gave me some insight into the difficulties of onboarding new contributors.
- After all, it was a great mental relief to work on something where I was definitely not the bottleneck of the whole project.
Before I took the break, Asterius was stuck with a very complex & ad-hoc build system, and it was based on ghc-8.8. The most production-ready major version of ghc today is ghc-8.10. Therefore, the Q3 goals and roadmap have been adjusted accordingly:
- Upgrade Asterius to use ghc-8.10. The upgrade procedure should be principled & documented, so someone else can repeat this when Asterius upgrades to ghc-9.2 in the future.
- Use cabal & nix as the primary build system.
What has been achieved so far:
- There is a new ghc fork dedicated to asterius at https://github.com/tweag/ghc-asterius. It's based on the ghc-8.10 branch, the previous asterius-specific patches have all been ported, and I implemented nix-based logic to generate cabal-buildable ghc api packages to be used by Asterius, replacing the previous ad-hoc python script.
- There is a WIP branch of ghc-8.10 & nix support at https://github.com/tweag/asterius/pull/860. Most build errors in the host compiler have been fixed, and the booting logic will be fixed next.
- A wasi-sdk/wasi-libc fork is also maintained in the tweag namespace. It's possible to configure our ghc fork with the wasm32-unknown-wasi triple now, so that's a good start for the future work of properly transitioning Asterius to a wasi32 backend of ghc.
Remaining work of Q3 will be wrapping up #860 and merging it to master
.
Beyond Q3, the overall plan is also guided by the "less is more" principle: to reduce code rather than to add it, leveraging upstream logic whenever possible, while still maintaining and even improving the end-user experience. Many hacks were needed in the past for various reasons, and after all the lessons learned along the way, there are many things that should be shaved off:
- The hacks related to the 64-bit virtual address space. Reusing the host GHC API, which targets a 64-bit platform, was the easiest way to get the Asterius MVP working, but given we now have much better knowledge about how cross-compiling in ghc works, these hacks need to go away.
- The custom object format and linking logic. This was required since Asterius needed to record a lot of Haskell-specific info in the object files: JSFFI imports/exports, the static pointer table, etc. However, with runtime support, this custom info can all be replaced by vanilla data sections in the wasm or llvm bitcode object files.
- Following the entry above, most of the existing wasm codegen logic. It looks possible to leverage the llvm codegen, only adding specific patches to support features like JSFFI.
- Most of the existing JavaScript runtime. It will be gradually replaced by the cross-compiled ghc rts for the wasi32 target, component after component. The ultimate goal is to support generating self-contained, JavaScript-less wasm modules which work in runtimes beyond browsers/nodejs (that's why we stick to wasi-sdk instead of emscripten in the first place).
2021 Q1
In 2020 Q4 we mainly delivered:
- Use standalone stage-1 GHC API packages and support building Asterius using vanilla GHC.
- Remove numerous hacks and simplify the codebase, e.g.:
  - Make ahc a proper GHC frontend exe, support ahc -c on non-Haskell sources
  - Use vanilla archives and get rid of the custom ahc-ar
- Refactor things incompatible with the 32-bit pointer convention, e.g.:
  - Proper heap layout for JSVal# closures
  - Remove higher 32-bit data/function address tags
In 2021 Q1, the primary goals are:
- Finish transition to 32-bit code generation.
- Improve C/C++ support, including support for integer-gmp and cbits in common packages.
The plan for achieving the above goals:
- Audit the current code generator & runtime and remove everything incompatible with the 32-bit pointer convention.
- For the time being, favor simplicity/robustness over performance. Some previous optimizations may need to be reverted temporarily to simplify the codebase and reduce the refactoring overhead.
- Use wasi-sdk as the C toolchain to configure the stage-1 GHC and finish the transition.
A longer term goal beyond Q1 is upstreaming Asterius as a proper wasm backend of
GHC. We need to play well with wasi-sdk
for this to happen, so another thing
we're working on in Q1 is: refactor the linker infrastructure to make it
LLVM-compliant, which means managing non-standard entities (e.g. static
pointers, JSFFI imports/exports) in a standard-compliant way.
2020 Q4
In 2020 Q3 we mainly delivered:
- PIC (Position Independent Code) support. We worked on PIC because, in the beginning, we thought it was a prerequisite for C/C++ support. It turned out not to be, but PIC will still be useful in the future when we implement the dynamic linker and ghci support.
- Initial C/C++ support, using wasi-sdk to compile C/C++ sources. Right now this doesn't work with Cabal yet, so the C/C++ sources need to be manually added to asterius/libc to be compiled and linked. We have already replaced quite a few legacy runtime shims with actual C code (e.g. cbits in bytestring/text), and more will come in the future.
Proper C/C++ support requires Asterius to be a proper wasm32-targeting cross GHC which is configured to use wasi-sdk as the underlying toolchain. The immediate benefits are:
- Get rid of various hacks due to the word size mismatch in the code emitted by Asterius and wasi-sdk. Some packages (e.g. integer-gmp) are incompatible with these hacks.
- Implement proper Cabal integration and support cbits in user packages.
- Improve code size and runtime performance, getting rid of the i64/i32 pointer casting everywhere.
- Get rid of BigInt usage in the JavaScript runtime, and support running generated code in Safari.
Thus the goal of 2020 Q4 is finishing the 32-bit cross GHC transition. The steps to achieve this are roughly:
- Detangle the host/wasm GHC API usage. Asterius will shift away from using the ghc package of the host GHC and instead use its own stage-1 GHC API packages.
- Fix various issues when configuring GHC to target wasm32-wasi and using wasi-sdk as the toolchain.
- Refactor the code generator and the runtime to work with the new 32-bit pointer convention.
2020 Q3
Work in 2020 Q3 is focused on:
- Introducing C/C++ toolchain support. The first step is to introduce libc in the generated wasm code, and use libc functionality to replace certain runtime functionality (e.g. memory management). Once we're confident our runtime and generated code is compatible with libc, we'll look into building & linking C source files in Haskell packages.
- Research on a high-level variant of Cmm which abstracts away closure representation and can be efficiently mapped to platforms providing host garbage collection (e.g. wasm-gc, JavaScript, JVM). This will enable us to avoid relying on a hand-written custom garbage collector and improve the runtime reliability significantly.
Project Milestones, January 2022 edition
The goals for Asterius are described on the page WebAssembly goals on the GHC Wiki. This document describes some milestones on the path to those goals.
Getting to JavaScript-free functionality
Although JavaScript interoperation is the big use case, much of the support needed for WebAssembly is independent of JavaScript.
Codegen: New back end
A new back end will have to be defined in a way that fits into GHC's existing structure.
GHC support required:
- Make the Backend type abstract and add a new value constructor for it (rather than adding a new value constructor to the existing Backend or changing the NcgImpl record).
Codegen: handling arbitrary control-flow graphs
WebAssembly lacks goto
and provides only structured control flow:
loops, blocks, if
statements, and multilevel continue
/break
. A
Cmm control-flow graph must be converted to this structured control.
Status: A prototype has been implemented and tested, but the prototype works only on reducible control-flow graphs. A transformation from irreducible to reducible CFGs has yet to be implemented.
GHC support required:
- Dominator analysis on
CmmGraph
Codegen: fit linking information into standard object files
The Asterius prototype emits object files that are represented in a custom format. This format contains ad hoc information that can be handled only by a custom linker. The information currently stored in custom object files must either be expressed using standard object files that conform to C/C++ toolchain convention, or it must be eliminated.
Status: All information currently emitted by the Asterius prototype can be expressed using standard object files, with one exception: JSFFI records. We plan to turn these records into standard data segments whose symbols will be reachable from related Haskell functions. Such segments can be handled by a standard C/C++ linker. The data segments will be consumed by the JavaScript adjunct to GHC's run-time system, which will use them to reconstruct imported and exported functions.
GHC support required:
- None
Codegen: implement WebAssembly IR and binary encoder
Rather than attempt to prettyprint WebAssembly directly from Cmm, the
WebAssembly back end will first translate Cmm to an internal
representation of a WebAssembly module, tentatively to be called
WasmModule
. A WasmModule
can be serialized to the standard
WebAssembly binary format.
A preliminary design might look like this:
- A WasmModule contains sections
- A section may contain functions, memory segments or other metadata
- A function body is control flow (WasmStmt ...)
- Control flow may contain straight-line code
- Straight-line code may be a tree structure or may be a sequence of Wasm instructions
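A purely illustrative Haskell sketch of that shape (none of these names or fields are decided; only WasmModule and WasmStmt are mentioned above):

-- Illustrative shape only; the actual IR has yet to be designed.
newtype WasmModule = WasmModule [Section]

data Section
  = FunctionSection [Function]
  | MemorySection [DataSegment]
  | MetadataSection String        -- other metadata

data Function = Function
  { functionName :: String
  , functionBody :: WasmStmt
  }

-- Control flow built from WebAssembly's structured constructs.
data WasmStmt
  = Block [WasmStmt]
  | Loop [WasmStmt]
  | If WasmExpr [WasmStmt] [WasmStmt]
  | Straight StraightCode

-- Straight-line code: either an expression tree or a flat instruction sequence.
data StraightCode
  = Tree WasmExpr
  | Instrs [WasmInstr]

data WasmExpr    = WasmExpr     -- placeholder
data WasmInstr   = WasmInstr    -- placeholder
data DataSegment = DataSegment  -- placeholder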
Status: Except for that WasmStmt
fragment, which contains the
WebAssembly control-flow constructs, the internal representation has
yet to be defined.
And we have yet to reach consensus on whether we wish to be able to
emit both textual and binary WebAssembly, or whether we prefer to emit
only binary WebAssembly and to rely on an external disassembler to
produce a more readable representation. (External assemblers are
apparently not good enough to be able to rely on emitting only a
textual representation.)
GHC support required:
- None
Codegen: implement Cmm to WebAssembly IR codegen
We need a translator from CmmGroup
to WasmModule
. Our prototype
relooper translates CmmGraph
to WasmStmt ...
, and the other parts
of the translation should mostly be a 1-to-1 mapping. Some Cmm
features can be translated in more than one way:
-
Global registers. We can use the in-memory register table as in unregisterised mode, or one WebAssembly global for each global register, or use the WebAssembly multi-value feature to carry the registers around. Start with WebAssembly globals first: they are easy to implement and should be reasonably faster than memory loads/stores.
-
Cmm tail calls. We can use the experimental WebAssembly tail calls feature, or do trampolining by making each Cmm function return its jump target. Since WebAssembly tail calls are not widely implemented in engines yet, start with trampolining (a toy sketch follows this list).
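A toy sketch of the trampolining scheme in Haskell (illustrative only; not actual generated code): each "Cmm function" returns its jump target instead of tail-calling it, and a small driver loop keeps entering the returned target.

-- Each "function" returns the next function to enter, or Nothing when done.
newtype CmmFun = CmmFun (IO (Maybe CmmFun))

-- The trampoline driver loop.
trampoline :: CmmFun -> IO ()
trampoline (CmmFun enter) = do
  next <- enter
  case next of
    Nothing   -> pure ()
    Just targ -> trampoline targ

-- Example: a "function" that counts down, then stops.
countdown :: Int -> CmmFun
countdown n = CmmFun $
  if n <= 0
    then pure Nothing
    else do
      print n
      pure (Just (countdown (n - 1)))

main :: IO ()
main = trampoline (countdown 3)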
Status: Not started, but given the rich experience with the Asterius prototype, no difficulties are anticipated.
GHC support required:
- None
Build system
The build system has to be altered to select the proper C code for the WebAssembly target. We're hoping for the following:
-
The build system can build and package the run-time system standalone.
-
The build system can easily cross-compile from a POSIX host to the Wasm target.
-
A developer can instruct the build system to choose Wasm-compatible features selectively to build and test on a POSIX platform (so-called "feature vector").
Meeting these goals will require both conditional build rules and
CPP macros for code specific to wasm32-wasi
.
Status: Not yet begun.
GHC support required:
- Coordination with the cross-compilation team (Sylvain Henry, John Ericson)
RTS: avoid mmap
The run-time storage manager uses mmap
and munmap
to allocate and
free MBlock
s. But mmap
and munmap
aren't available on the WASI
platform, so we need to use standard libc allocation routines instead.
Status: we implemented the patch, tested with WebAssembly, i386 and x64-without-large-address-space.
GHC support required:
- New directory rts/wasi to go alongside rts/posix and rts/win32.
- Altered logic in rts/rts.cabal.in and elsewhere to use conditional compilation to select OSMem.c from the rts/wasi directory.
RTS: replace the timer used in the scheduler
The run-time system currently uses a timer to know when to deliver a Haskell Execution Context (virtual CPU) to another Haskell thread. But the timer is implemented using pthreads and POSIX signals, which are not available on WebAssembly---so it has to go. We'll need some other method for deciding when to switch contexts.
This change will remove dependencies on pthreads and on a POSIX signal (VTALRM).
Status: We have patched the run-time system to disable that timer,
and we have tested the patch on POSIX. In this patch, the scheduler
does a context switch at every heap-block allocation (as in the -C0
RTS
flag).
Yet to be done: determine a viable long-term strategy for deciding
when to context switch.
GHC support required:
- Patches to scheduler, of a detailed nature to be specified later
RTS: replace other uses of POSIX signals
The run-time system depends on the signals API in various ways: it can handle certain OS signals, and it can even support setting Haskell functions as signal handlers. Such functionality, which inherently depends on signals, must be made conditional on the target platform.
There is already a RTS_USER_SIGNALS
CPP macro that guards some
signal logic, but not all. To make signals truly optional, more work
is needed.
Status: In progress.
GHC support required:
- Not yet known
RTS: port libffi to WebAssembly
libffi
is required for dynamic exports to C. It's technically
possible to port libffi
to either pure WebAssembly or
WebAssembly+JavaScript.
Status: Not yet implemented.
GHC support required:
- Likely none.
Milestones along the way to full JavaScript interoperability
(The audience for this section is primarily the Asterius implementation team, but there are a few things that ought to be communicated to other GHC implementors.)
RTS for JSFFI: representing and garbage-collecting foreign references
When Haskell interoperates with JavaScript, Haskell objects need to be able to keep JavaScript objects alive and vice versa, even though they live on different heaps. Similarly, JavaScript needs to be able to reclaim JavaScript objects once there are no more references to them.
We propose to extend GHC with a new primitive type JSVal#
, whose
closure payload is a single word. The JavaScript adjunct uses this
word to index into an internal table. After each major garbage
collection, the collector notifies the JavaScript adjunct of all live
JSVal#
closures. The adjunct uses this report to drop its references
to JavaScript objects that cannot be reached from the Haskell heap.
Status: Not yet implemented.
GHC support required:
- Build-system support for the JavaScript adjunct to the RTS
- New primitive type JSVal#
- Patch to the garbage collector to report live JSVal# closures.
RTS: API/semantics for scheduling and JavaScript foreign calls
Write down and document whatever API is needed for calls across the Haskell/JavaScript boundary and for sharing the single CPU among both Haskell threads and JavaScript's event loop. Ideal documentation would include a small-step operational semantics.
Status: Work in progress
GHC support required:
- Coordinate with GHCJS team (unclear at what stage)
RTS: Scheduler issues
GHC's scheduler will need to be altered to support an event-driven model of concurrency. The details are work in progress.
Draft semantics of concurrency and foreign calls.
Note: This document assumes that every function takes exactly one argument. Just imagine that it's the last argument in a fully saturated call.
Foreign export asynchronous
Suppose that a Haskell function f
is exported to JavaScript
asynchronously (which might be the default). When JavaScript calls the
exported function with argument v
, it has the effect of performing
the IO action ⟦f⟧ v
, where the translation ⟦f⟧
is defined as
follows:
⟦f⟧ v = do
  p <- allocate new promise
  let run_f = case try (return $ f $ jsToHaskell v) of
        Left exn -> p.fails (exnToJS exn)
        Right a  -> p.succeeds (haskellToJS a)
  forkIO run_f
  return p -- returned to JavaScript
Not specified here is whether the scheduler is allowed to steal a few cycles to run previously forked threads.
N.B. This is just a semantics. We certainly have the option of implementing the entire action completely in the runtime system.
Not yet specified: What is the API by which JavaScript would call an asynchronously exported Haskell function? Would it, for example, use API functions to construct a Haskell closure, then evaluate it?
Foreign import asynchronous
Suppose that a JavaScript function g
is imported asynchronously
(which might be the default). Let types a
and b
stand for two
unknown but fixed types. The JavaScript function expects an argument
of type a
and returns a Promise
that (if successful) eventually
delivers a value of type b
. When a Haskell thunk of the form g e
is forced (evaluated), the machine performs the following monadic
action, the result of which is (eventually) written into the thunk.
do let v = haskellToJS e                 -- evaluates e, converts result to JavaScript
   p <- g v                              -- call returns a `Promise`, "immediately"
   m <- newEmptyMVar
   ... juju to associate m with p ...    -- RTS primitive?
   result <- takeMVar m
   case result of
     Left fails -> ... raise asynchronous exception ...
     Right b    -> return $ jsToHaskell b
CPU sharing
Suppose GHC wishes to say politely to the JavaScript engine, "every so
often I would like to use the CPU for a bounded time." It looks like
Haskell would need to add a message to the JavaScript message queue,
such that the function associated with that message is "run Haskell
for N ticks." Is the right API to call setTimeout
with a delay of 0
seconds?
Concurrency sketch
Let's suppose the state of a Haskell machine has these components:
- F ("fuel") is the number of ticks a Haskell thread can execute before returning control to JavaScript. This component is present only when Haskell code is running.
- R ("running") is either the currently running Haskell thread, or if no thread is currently running, it is • ("nothing").
- Q ("run queue") is a collection of runnable threads.
- H ("heap") is the Haskell heap, which may contain MVars and threads that are blocked on them.
Components R
and H
are used linearly, so they can be stored in
global mutable state.
The machine will enjoy a set of labeled transitions such as are
described in Simon PJ's paper on the "Awkward Squad." Call these the
"standard transitions." (The awkward-squad machine state is a single
term, formed by the parallel composition of R
with all the threads
of Q
and all the MVars of H
. The awkward squad doesn't care about
order, but we do.) To specify the standard transitions, we could add
an additional clock that tells the machine when to switch the running
thread R
out for a new thread from the queue. Or we could leave the
context switch nondeterministic, as it is in the awkward-squad paper.
Whatever seems useful.
Every state transition has the potential to use fuel. Fuel might actually be implemented using an allocation clock, but for semantic purposes, we can simply decrement fuel at each state transition, then gate the standard transitions on the condition F > 0.
At a high level, every invocation of Haskell looks the same:
JavaScript starts the Haskell machine in a state ⟨F, •, Q, H⟩
, and
the Haskell machine makes repeated state transitions until it reaches
one of two stopping states:
- ⟨F', •, [], H'⟩: no Haskell threads are left to run
- ⟨0, R', Q', H'⟩: fuel is exhausted, in which case the machine moves the currently running thread onto the run queue, reaching state ⟨0, •, R':Q', H'⟩
Once one of these states is reached, GHC's runtime system takes two actions:
- It allocates a polite request for the CPU and puts that request on the JavaScript message queue, probably using setTimeout with a delay of 0 seconds.
- It returns control to JavaScript.
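A toy Haskell model of the ⟨F, R, Q, H⟩ machine sketched above (purely illustrative; not GHC's actual scheduler, and the standard transition is left abstract):

data Thread = Thread
data Heap   = Heap

data Machine = Machine
  { fuel     :: Int           -- F: remaining ticks
  , running  :: Maybe Thread  -- R: Nothing plays the role of "•"
  , runQueue :: [Thread]      -- Q: runnable threads
  , heap     :: Heap          -- H: the Haskell heap
  }

-- One "standard transition"; left abstract in this sketch.
step :: Machine -> Machine
step = id

-- Run until no threads are left or the fuel is exhausted.
run :: Machine -> Machine
run m = case (fuel m, running m, runQueue m) of
  (_, Nothing, [])   -> m                                            -- ⟨F', •, [], H'⟩
  (0, Just t, q)     -> m { running = Nothing, runQueue = t : q }    -- ⟨0, •, R':Q', H'⟩
  (_, Nothing, t:ts) -> run (m { running = Just t, runQueue = ts })  -- schedule a thread
  (f, Just _, _)     -> run ((step m) { fuel = f - 1 })              -- gated on F > 0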
GHC RTS scheduler refactoring
All discussion in this document refers to the non-threaded RTS.
Potential semantics
GHC relies on the scheduler to manage both concurrency and foreign calls. Foreign calls are in play because most foreign calls are asynchronous, so implementing a foreign call requires support from the scheduler. A preliminary sketch of possible semantics can be found in file semantics.md
.
JavaScript user experience
I have foo.hs
. I can compile to foo.wasm
and foo.js
. foo.wasm
is a binary artifact that needs to be shipped with foo.js; there is nothing else you need to know about this file. foo.js
conforms to some
JavaScript module standard and exports a JavaScript object. Say this
object is foo
.
For each exported top-level Haskell function, foo
contains a
corresponding async method. Consider the most common case main :: IO ()
, then you can call foo.main()
. For something like fib :: Int -> Int
, you can do let r = await foo.fib(10)
and get the number result
in r
. The arguments and result can be any JavaScript value, if the
Haskell type is JSVal
.
Now, suppose we await foo.main()
, and main
finished successfully.
The RTS must remain alive, because:
- main might have forked other Haskell threads; those threads are expected to run in the background.
- main might have dynamically exported a Haskell function closure as a JSFunction. This JSFunction is passed into the outside JavaScript world, and it is expected to be called back some time in the future.
Notes regarding error handling: any unhandled Haskell exception is converted to a JavaScript error. Likewise, any JavaScript error is converted to a Haskell exception.
Notes regarding RTS startup: foo
encapsulates some RTS context. That
context is automatically initialized no later than the first time you
call any method in foo
.
Notes regarding RTS shutdown: not our concern yet. As long as the browser tab is alive, the RTS context should be alive.
Primer
ghc-devs thread: Thoughts on async RTS API?
ghc commentary: scheduler
Consider a native case...
Suppose we'd like to run some Haskell computation from C (e.g. the main
function). After the RTS state is initialized, we need to:
- If the Haskell function expects arguments, call the rts_mk* functions in RtsAPI.h to convert C argument values to Haskell closures. Call rts_apply repeatedly to apply the Haskell function closure to argument closures, until we end up with a closure of Haskell type IO a or a, ready to be evaluated.
- Call one of the eval functions in RtsAPI.h. The eval function creates a TSO (Thread State Object), representing the Haskell thread where the computation happens.
- The eval function does some extra bookkeeping, then enters the scheduler loop.
- The scheduler loop exits when the initial Haskell thread finishes. The thread return value and exit code are recorded.
- The eval function retrieves the thread return value and exit code. We need to check whether the thread completed successfully; if so, we can call one of the rts_get* functions in RtsAPI.h to convert the resulting Haskell closure to a C value.
The key logic is in the
schedule
function which implements the scheduler loop. The implementation is
quite complex; for now, we only need to keep in mind:
- In each iteration, the Haskell thread being run is not necessarily the initial thread we created to kick off evaluation. New threads may get forked and executed, but the loop exits only when the initial thread finishes!
- Threads may block for a variety of reasons; they will be suspended and resumed as needed. It may be possible that all live threads are blocked, in which case the RTS will attempt to make progress by collecting the file descriptors related to blocking I/O and doing a select() call, to ensure I/O can proceed for at least one file descriptor.
The problem
Suppose we'd like to call an async JavaScript function and get the result in Haskell:
foreign import javascript safe "fetch($1)" js_fetch :: JSRequest -> IO JSResponse
In Haskell, when js_fetch
returns, the actual fetch()
call should
have already resolved; if it rejected, then an exception should be
raised in Haskell.
Now, the main thread calls js_fetch
at some point, with no other threads involved. According to the previous section, the current call stack is
something like:
main -> rts_evalLazyIO -> scheduleWaitThread -> schedule -> fetch
The Haskell code does a fetch()
call (or it arranges the RTS to
perform one). fetch()
will immediately return a Promise
handle.
Now what? What do we do with this Promise
thing? More importantly,
the scheduler loop can't make any progress! The Haskell thread is
blocked and suspended, the run queue is empty, and the RTS scheduler only
knows about posix blocking read/write, so it doesn't know how to
handle this situation.
After fetch()
returns, the call stack is:
main -> rts_evalLazyIO -> scheduleWaitThread -> schedule
Remember the
"run-to-completion"
principle of the JavaScript concurrency model! We're currently inside
some JavaScript/WebAssembly function, which counts as a single tick in
the entire event loop. The functions we're running right now must run
to completion and return; only after that can the fetch() result become available.
become available.
Also remember how the WebAssembly/JavaScript interop works: you can only import synchronous JavaScript functions, and export WebAssembly functions as synchronous JavaScript functions. Every C function in the RTS that we cross-compile to WebAssembly is also synchronous; no magic blocking or preemptive context switch will ever take place!
What we need
All the scheduler-related synchronous C functions in the RTS, be it rts_eval* or schedule, only return when the initial Haskell thread completes. We must teach these functions to also return when the thread blocks, at least when the blocking reason is beyond conventional posix read/write.
Here's how things should look after the scheduler is refactored:
- There are async flavours of the scheduler functions. When they return, the Haskell thread may have completed, or may have been blocked for some reason. In the latter case, the returned blocking info will contain at least one file descriptor or Promise related to blocking, and also the blocked thread ids.
- When we do async JavaScript calls, we attach resolve/reject handlers to the returned Promise. These handlers will resume the entire RTS and carry on the Haskell computation.
- Since any Haskell thread may perform an async JavaScript call, all Haskell functions are exported as async JavaScript functions. A Promise is returned immediately, but it's resolved/rejected in the future, when the corresponding Haskell thread runs to completion.
Potential milestones
RTS: integrating foreign event loops
Draft:
The RTS scheduler is synchronous. If you call rts_eval*
to enter the
scheduler and do some evaluation, it'll only return when the relevant
Haskell thread is completed or killed. This model doesn't work if we
want to be able to call async foreign functions without blocking the
entire RTS. The root of this problem: the scheduler loop has no
knowledge about foreign event loops.
Status: we have looked into this, and based on our experience in Asterius, the implementation plan is as follows:
- Add CPS-style async versions of the rts_eval* RTS API functions. The original sync versions continue to work, but panic with a reasonable error message when an unsupported foreign blocking event occurs.
- The scheduler loop is broken down into "ticks". Each tick runs to the point when some Haskell computation finishes or blocks, much like a single iteration of the original scheduler loop. The scheduler ticks can be plugged into a foreign event loop, so Haskell evaluation fully interleaves with other foreign computation.
GHC support required:
- Restructuring of the current scheduler.
RTS: make usage of select/poll optional
In the current non-threaded RTS, when there are no immediately
runnable Haskell threads, a select()
call will be performed on all
the file descriptors related to blocking. The call returns when I/O is
possible for at least one file descriptor, therefore some Haskell
thread blocked on I/O can be resumed.
This may work for us when we target pure wasm32-wasi
instead of the
browser. The WASI standard defines a poll_oneoff
syscall, and
wasi-libc
implements select()
/poll()
using this syscall.
However, this doesn't work well with JavaScript runtime (or any
foreign event loop in general). poll()
calls are blocking calls, so
they can block the entire event loop, hang the browser tab and prevent
"real work" (e.g. network requests) from proceeding.
Status: we have looked into this, and there are roughly two possible approaches:
- Use the binaryen "asyncify" wasm rewriting pass to instrument the
linked wasm module, to implement the blocking behavior of
poll_oneoff
without actually blocking the entire event loop. Easy to implement, but it's a very ugly hack that also comes with penalty in code size and performance. - Restructure the scheduler, so that for non-threaded RTS, each
scheduler tick will not attempt to do a blocking
poll()
call at all. The higher-level caller of scheduler ticks will be in charge of collecting blocking I/O events and handling them.
GHC support required:
- Same as previous subsection