caml-list - the Caml user's mailing list
From: François Pottier <>
To: OCaML Mailing List <>, menhir-list <>
Subject: [Caml-list] [ANN] New release of Menhir (20211230)
Date: Fri, 31 Dec 2021 09:22:49 +0100
Message-ID: <>

Dear OCaml & Menhir users,

I am pleased to announce a new release of Menhir, with a major improvement.

The code back-end has been rewritten from the ground up by Émile Trotignon
and by myself, and now produces efficient and well-typed OCaml code. The
infamous Obj.magic is not used any more.

Furthermore, the new code back-end produces code that is more aggressively
optimized, leading to a significant reduction in memory allocation and a
typical performance improvement of up to 20% compared to the previous code
back-end.

To install the new version via opam:

   opam update
   opam install menhir.20211230

Happy well-typed parsing in 2022!

François Pottier

## 2021/12/30

* The code back-end has been rewritten from the ground up by Émile Trotignon
   and François Pottier, and now produces efficient and **well-typed** OCaml
   code. The infamous `Obj.magic` is not used any more.

   The table back-end and the Coq back-end are unaffected by this change.

   The main side effects of this change are as follows:

   - The code back-end now needs type information. This means that
     *either* Menhir's type inference mechanism must be enabled
              (the easiest way of enabling it is to use Menhir via `dune`
               and to check that the `dune-project` file says
               `(using menhir 2.0)` or later)
     *or* the type of every nonterminal symbol must be
          explicitly given via a `%type` declaration.

   - The code back-end no longer allows the type of any symbol to be an
     open polymorphic variant type, such as ```[> `A ]```. As a workaround,
     we suggest using a closed polymorphic variant instead.

   - The code back-end now adheres to the *simplified* error-handling
     strategy, as opposed to the *legacy* strategy.

     For grammars that do *not* use the `error` token, this makes no difference.

     For grammars that use the `error` token in the limited way permitted by
     the simplified strategy, this makes no difference either. The simplified
     strategy makes the following requirement: the `error` token should
     appear at the end of a production, whose semantic action should abort
     the parser by raising an exception.

     Grammars that make more complex use of the `error` token, and therefore
     need the `legacy` strategy, cannot be compiled by the new code back-end.
     As a workaround, it is possible to switch to the table back-end (using
     `--table --strategy legacy`) or to the ancient code back-end (using
     `--code-ancient`). **In the long run, we recommend abandoning the use of
     the `error` token**. Support for the `error` token may be removed
     entirely at some point in the future.

   The original code back-end, which has been around since the early days of
   Menhir (2005), temporarily remains available (using `--code-ancient`). It
   will be removed at some point in the future.

   The new code back-end offers several levels of optimization, which remain
   undocumented and are subject to change in the future. At present, the 
   levels are roughly as follows:

   - `-O 0 --represent-everything` uses a uniform representation of the stack
     and produces straightforward code.
   - `-O 0` uses a non-uniform representation of the stack; some stack cells
     have fewer fields; some stack cells disappear altogether.
   - `-O 1` reduces memory traffic by moving `PUSH` operations so that they
     meet `POP` operations and cancel out.
   - `-O 2` optimizes the reduction of unit productions (that is, productions
     whose right-hand side has length 1) by performing a limited amount of
     code specialization.

   The default level of optimization is the maximum level, `-O 2`.
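
   The idea behind `-O 1` can be conveyed by a toy OCaml sketch. This is not
   the code that Menhir generates: the `stack` type, `push`, `pop` and the two
   step functions are hypothetical. A naive step pushes a cell that the next
   operation immediately pops; once the `PUSH` is moved to meet the `POP`,
   the pair cancels out and the intermediate cell is never allocated.

   ```ocaml
   (* A toy parser stack: each cell holds a state number and a
      semantic value (an int here). Hypothetical, for illustration. *)
   type stack =
     | Empty
     | Cell of stack * int * int  (* rest, state, semantic value *)

   let push (s : stack) (state : int) (v : int) : stack = Cell (s, state, v)

   let pop (s : stack) : stack * int * int =
     match s with
     | Cell (rest, state, v) -> (rest, state, v)
     | Empty -> invalid_arg "pop: empty stack"

   (* Naive: shift a value (PUSH), then immediately reduce a rule that
      consumes it (POP) and pushes the result. One cell is allocated
      only to be discarded at once. *)
   let naive_step (s : stack) (v : int) : stack =
     let s = push s 7 v in
     let (s, _state, v) = pop s in
     push s 8 (v * 2)

   (* Optimized: the PUSH has met the POP and both have disappeared;
      the value flows directly into the reduction. *)
   let optimized_step (s : stack) (v : int) : stack =
     push s 8 (v * 2)
   ```

   Both functions leave the same stack behind; the optimized one simply
   performs one allocation fewer per step.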

* The new command line switch `--exn-carries-state` causes the exception
   `Error` to carry an integer parameter: `exception Error of int`. When the
   parser detects a syntax error, the number of the current state is reported
   in this way. This allows the caller to select a suitable syntax error
   message, along the lines described in Section 11 of the manual. This
   command line switch is currently supported by the code back-end only.
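
   A minimal sketch of the caller's side, under stated assumptions: the
   `Parser` module below is a hand-written stand-in that only mimics the
   declaration `exception Error of int`, and the state number 42, the message
   text and the function `message_of_state` are hypothetical. Menhir's
   `--compile-errors` switch can produce a similar mapping from a
   `.messages` file; here it is written by hand and raises `Not_found` for
   states that have no dedicated message.

   ```ocaml
   (* Stand-in for a parser generated with --exn-carries-state.
      Only the exception declaration mirrors the real interface. *)
   module Parser = struct
     exception Error of int

     (* Hypothetical entry point: always fails in (made-up) state 42. *)
     let main (_input : string) : int = raise (Error 42)
   end

   (* Hypothetical mapping from state numbers to error messages. *)
   let message_of_state (s : int) : string =
     match s with
     | 42 -> "Unexpected token after an operator.\n"
     | _ -> raise Not_found

   (* Catch the exception and use the state number to pick a message,
      falling back to a generic one when no message is known. *)
   let parse (input : string) : (int, string) result =
     try Ok (Parser.main input) with
     | Parser.Error state ->
         let msg =
           try message_of_state state
           with Not_found -> Printf.sprintf "Syntax error (state %d).\n" state
         in
         Error msg
   ```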

* The `$syntaxerror` keyword is no longer supported.

* Document the trick of wrapping module aliases in `open struct ... end`,
   like this: `%{ open struct module M = MyLongModuleName end %}`.
   This allows you to use the short name `M` in your grammar, but forces
   OCaml to infer types that refer to the long name `MyLongModuleName`.
   (Suggested by Frédéric Bour.)
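
   The same trick in plain OCaml, outside of any grammar file, as a minimal
   sketch: `MyLongModuleName`, its type `t` and its function `make` are
   placeholders, and `open struct ... end` requires OCaml 4.08 or later. In a
   `.mly` file, the `open struct ... end` part would sit between `%{` and
   `%}`.

   ```ocaml
   (* Placeholder for a real module with a long name. *)
   module MyLongModuleName = struct
     type t = { value : int }
     let make (n : int) : t = { value = n }
   end

   (* The alias M is usable below, but it is not exported: it lives in
      an anonymous structure that is only opened. *)
   open struct
     module M = MyLongModuleName
   end

   (* Written with the short name M. Because M is not part of this
      module's signature, the inferred signature (for example, as
      printed by ocamlc -i) gives x the type MyLongModuleName.t. *)
   let x = M.make 3
   ```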
