caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: caml-list@inria.fr
Subject: [Caml-list] [ANN] wasicaml - a code emitter for OCaml targeting WebAssembly
Date: Tue, 22 Jun 2021 15:55:32 +0200
Message-ID: <2df12937-40cb-1766-73e8-c847311b7238@gerd-stolpmann.de>

Hello everybody,

I'd like to announce a new project to develop a code generator that emits
WebAssembly:

https://github.com/remixlabs/wasicaml

With the support of RemixLabs I have already created a first version
that takes OCaml bytecode as input and translates it to WebAssembly.
While this approach probably doesn't lead to the fastest code, it is
easy to accomplish, and it demonstrates the challenge (and already shows how
to solve many of the subproblems along the way).

To be precise, the target of the translator is wasm32-unknown-wasi, i.e.
the WASI ABI. This ABI is still in early development, but it already provides
the syscalls (or better, host calls) to access files, to get the current
time, and to read the environment. This is almost enough to run a compiler -
I only had to add system() so that ocamlc can start external preprocessors.
Also, because the current wasm implementations still lack
exception handling, I had to assume the presence of a host emulation of
exceptions (which is easy to provide if the host environment is JavaScript,
but not necessarily for other environments).

The translator takes the OCaml bytecode as input, i.e. you first create
an executable

$ ocamlc -o myexec ...

and then make wasm out of it:

$ wasicaml -o myexec.wasm myexec

If you omit the .wasm suffix, wasicaml will put a preamble in front of the
wasm code that starts the execution:

$ wasicaml -o myexec_wasm myexec
$ ./myexec_wasm

Because of this trick, many problems of cross-compiling can be avoided.
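
To try the two steps above, a small test program is enough. The following
is only a sketch: the file name hello.ml, the output file hello.txt and the
choice of the USER variable are assumptions made for illustration. It uses
only the standard library and touches the environment and the filesystem,
two of the host facilities mentioned earlier. Whether every call behaves
identically under the current wasicaml runtime has not been verified here.

```ocaml
(* hello.ml: hypothetical test program, standard library only. *)

(* Read the environment; fall back to a default when USER is unset. *)
let greeting_target () =
  match Sys.getenv_opt "USER" with
  | Some name -> name
  | None -> "world"

(* Write one line to a file and read it back, exercising file access. *)
let roundtrip path line =
  let oc = open_out path in
  output_string oc (line ^ "\n");
  close_out oc;
  let ic = open_in path in
  let result = input_line ic in
  close_in ic;
  result

let () =
  let msg = "Hello, " ^ greeting_target () in
  print_endline (roundtrip "hello.txt" msg)
```

With this file, the first step would be "ocamlc -o myexec hello.ml", followed
by the wasicaml invocation shown above.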

You may ask what the benefits of yet another "Web" language are. We already
have two emitters targeting JavaScript - isn't that enough? Well, two
answers here.

First, WASI is a proper LLVM target. Because of this, you can link
code from other languages with your executable (e.g. C or Rust). So
you are not limited to OCaml but can use any language that also targets
the WASI ABI. E.g. you can do

$ wasicaml -o myexec.wasm myexec -ccopt -lfoo

to also link in libfoo.a (which must also be compiled to wasm). So
it is multi-lingual from the beginning.

Second, WebAssembly can be used outside the web, too. WASI is aimed more
at the command line, server plugins, and OS-independent environments in
general. For example, imagine you have an Electron app with a
great UI, but for some special functionality you need to include some
OCaml code, too. You don't want to give up the OS-independence, and
WASI now gives you a natural option to add the OCaml code. And you still
have access to the filesystem without hassle. - Another example is edge
computing, i.e. when the cloud is extended by computers outside the data
center, and the code should be in a form so that it can be run on as many
platforms as possible. - All in all, WASI plays well when you need to
combine OS-independence with a classic way of organizing the code as a
command or as a server function, and you also need predictable performance.

The challenge of translating OCaml to wasm is mainly the garbage collector.
Wasm doesn't permit many of the tricks ocamlopt is using to know in which
memory (or register) locations OCaml values are stored. In wasm, there are
no registers; the closest equivalent is local variables. However, the GC
function cannot scan these variables, which makes it practically
impossible to keep OCaml values there across a call to a function that might
trigger a GC. There is also no really cheap way of obtaining a stack
descriptor.

Wasicaml inherits the stack from the bytecode interpreter and uses it as
its own shadow stack for OCaml values. As wasicaml is based on the bytecode
representation of the code, the bytecode instructions already ensure that
values always live in this stack when the GC might run. Wasicaml additionally
tries to identify values that don't need this special treatment (like ints
and bools) and that are preferably stored in local variables, giving the
wasm executor freedom to put these into registers or other high-speed
locations. (Unfortunately, most of the type information is already erased
in the bytecode, and this is definitely one of the deficiencies of the
bytecode approach.)
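
The shadow-stack idea can be sketched as a toy model. This is not wasicaml's
actual code: the names (stack, sp, push, pop, roots, call_with_spill) and the
fixed stack size are assumptions for illustration. The only thing taken from
OCaml itself is the value encoding, where a low bit of 1 marks an immediate
integer and a low bit of 0 marks a pointer.

```ocaml
(* Toy model of a shadow stack; not wasicaml's actual code. *)

(* OCaml's value encoding: low bit 1 = immediate integer,
   low bit 0 = pointer to a heap block. *)
let val_int n = (n lsl 1) lor 1
let int_val v = v asr 1
let is_int v = v land 1 = 1

(* The shadow stack lives in memory the GC can read,
   unlike wasm local variables. *)
let stack = Array.make 1024 (val_int 0)
let sp = ref 0 (* number of live slots *)

let push v =
  stack.(!sp) <- v;
  incr sp

let pop () =
  decr sp;
  stack.(!sp)

(* What a GC can do: walk the live slots and collect pointer roots. *)
let roots () =
  let acc = ref [] in
  for i = 0 to !sp - 1 do
    if not (is_int stack.(i)) then acc := stack.(i) :: !acc
  done;
  List.rev !acc

(* A call site: [p] is a pointer and is spilled before a call that may
   trigger the GC; [n] is a plain integer and stays in a local. *)
let call_with_spill p n may_gc =
  push p;
  let seen = may_gc () in (* the GC may run here and finds [p] as a root *)
  let p' = pop () in      (* reload, since a moving GC may have updated it *)
  (p', n + 1, seen)
```

In this model, only the pointer pays the cost of a memory round trip; the
integer never leaves its local, which is the freedom the wasm executor needs
to keep it in a register.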

In order to maximize the performance, it is probably best to avoid the
stack whenever possible. The current approach of transforming the bytecode
has not yet been taken as far as it can go with respect to such optimizations. For
example, there could be more analyses that figure out when GC runs are
actually possible and when it is safe to use local variables.

Another problem of the bytecode basis is that all function calls are
indirect, preventing the wasm executor from inlining functions.
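
The difference between the two kinds of call can be modelled in a few lines.
Again this is a toy sketch, not generated code: the array named table stands
in for the wasm function table, and all function names are made up.

```ocaml
(* Toy model of indirect vs. direct calls; not wasicaml's generated code. *)
let double x = 2 * x
let incr1 x = x + 1

(* Stand-in for the wasm function table. *)
let table : (int -> int) array = [| double; incr1 |]

(* Indirect: the callee is a table index known only at run time,
   so the executor cannot tell which body to inline. *)
let call_indirect idx arg = table.(idx) arg

(* Direct: the callee is named at the call site and can be inlined. *)
let call_direct arg = double arg
```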

As a project, I'd like to see wasicaml progressing in two directions.
First, make the current approach as good as possible - although basing
it on the bytecode representation has its downsides, it is easy to understand
and it makes it possible to figure out what the necessary ingredients for fast
code are. Second, get an idea of where a possible real wasm backend
would fit into the OCaml compiler (maybe it is c--, but maybe this doesn't
give us much and it is better to start from lambda).

Anyway, welcome to the new world of WebAssembly!

Gerd

--
PS. If you are interested in WebAssembly and would like to work with me on
another Wasm port for some time, there is a position:
https://www.mixtional.de/recruiting/2021-01/index.html

PPS. Wasicaml is a project of Figly, Inc., commonly known as RemixLabs,
developing a reactive low-code and code collaboration platform.
https://remixlabs.com/

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------