caml-list - the Caml user's mailing list
* [Caml-list] [ANN] New release of Menhir (20211230)
@ 2021-12-31  8:22 François Pottier
From: François Pottier @ 2021-12-31  8:22 UTC
  To: OCaML Mailing List, menhir-list


Dear OCaml & Menhir users,

I am pleased to announce a new release of Menhir, with a major improvement.

The code back-end has been rewritten from the ground up by Émile Trotignon
and by myself, and now produces efficient and well-typed OCaml code. The
infamous Obj.magic is not used any more.
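
As a rough illustration of what "well-typed" means here, an LR stack can be encoded as nested OCaml pairs. The type of the stack then records the type of every cell, and a reduction can pop semantic values without any cast. This is a sketch of the general idea only, not the code that Menhir actually generates; the functions `push` and `reduce_sum` and the production `expr -> expr PLUS INT` are hypothetical.

```ocaml
(* Push a semantic value of type 'a onto a stack of type 's.
   The result type 's * 'a records the type of the new top cell. *)
let push (stack : 's) (v : 'a) : 's * 'a = (stack, v)

(* Hypothetical reduction for the production  expr -> expr PLUS INT.
   The pattern pops three typed cells: an int (expr), a unit (PLUS),
   and an int (INT). It pushes their sum as the new expr.
   No Obj.magic is needed: the compiler checks every access. *)
let reduce_sum (((rest, e1), ()), n) = push rest (e1 + n)

(* Stack after reading "1 + 41": type ((unit * int) * unit) * int. *)
let example = push (push (push () 1) ()) 41

(* After the reduction, the stack has type unit * int. *)
let (_, result) = reduce_sum example
```

Here `result` is `42`. A real LR parser must also track its current state at runtime, which this sketch ignores.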

Furthermore, the new code back-end produces code that is more aggressively
optimized, leading to a significant reduction in memory allocation and a
typical performance improvement of up to 20% compared to the previous code
back-end.

The new release can be installed via opam:

   opam update
   opam install menhir.20211230

Happy well-typed parsing in 2022!

--
François Pottier
francois.pottier@inria.fr
http://cambium.inria.fr/~fpottier/

## 2021/12/30

* The code back-end has been rewritten from the ground up by Émile Trotignon
   and François Pottier, and now produces efficient and **well-typed** OCaml
   code. The infamous `Obj.magic` is not used any more.

   The table back-end and the Coq back-end are unaffected by this change.

   The main side effects of this change are as follows:

   - The code back-end now needs type information. This means that
     *either* Menhir's type inference mechanism must be enabled (the
     easiest way to enable it is to use Menhir via `dune` and to check
     that the `dune-project` file says `(using menhir 2.0)` or later),
     *or* the type of every nonterminal symbol must be given explicitly
     via a `%type` declaration.

   - The code back-end no longer allows the type of any symbol to be an
     open polymorphic variant type, such as ```[> `A ]```. As a workaround,
     we suggest using a closed polymorphic variant instead.

   - The code back-end now adheres to the *simplified* error-handling strategy,
     as opposed to the *legacy* strategy.

     For grammars that do *not* use the `error` token, this makes no difference.

     For grammars that use the `error` token in the limited way permitted by
     the simplified strategy, this makes no difference either. The simplified
     strategy makes the following requirement: the `error` token should always
     appear at the end of a production, whose semantic action should abort the
     parser by raising an exception.

     Grammars that make more complex use of the `error` token, and therefore
     need the *legacy* strategy, cannot be compiled by the new code back-end.
     As a workaround, it is possible to switch to the table back-end (using
     `--table --strategy legacy`) or to the ancient code back-end (using
     `--code-ancient`). **In the long run, we recommend abandoning the use of
     the `error` token**. Support for the `error` token may be removed
     entirely at some point in the future.
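
   For the restriction on open polymorphic variant types listed above, the suggested workaround is a closed type, which lists exactly the tags it admits. The sketch below is plain OCaml; the type `value` and the functions `classify` and `describe` are hypothetical names chosen for illustration. A type like `value` could then serve as the type of a nonterminal symbol, in place of an open type such as ``[> `Number of int ]``.

   ```ocaml
   (* A closed polymorphic variant type: no [>], so the set of tags is fixed. *)
   type value = [ `Number of int | `Ident of string ]

   (* Without the annotation, OCaml would infer the open type
      [> `Ident of string | `Number of int ]; the annotation closes it. *)
   let classify (s : string) : value =
     match int_of_string_opt s with
     | Some n -> `Number n
     | None -> `Ident s

   (* Pattern matching on a closed type is checked for exhaustiveness. *)
   let describe (v : value) : string =
     match v with
     | `Number n -> "number " ^ string_of_int n
     | `Ident s -> "ident " ^ s
   ```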

   The original code back-end, which has been around since the early days of
   Menhir (2005), temporarily remains available (using `--code-ancient`). It
   will be removed at some point in the future.

   The new code back-end offers several levels of optimization, which remain
   undocumented and are subject to change in the future. At present, the main
   levels are roughly as follows:

   - `-O 0 --represent-everything` uses a uniform representation of the stack
     and produces straightforward code.
   - `-O 0` uses a non-uniform representation of the stack; some stack cells
     have fewer fields; some stack cells disappear altogether.
   - `-O 1` reduces memory traffic by moving `PUSH` operations so that they
     meet `POP` operations and cancel out.
   - `-O 2` optimizes the reduction of unit productions (that is, productions
     whose right-hand side has length 1) by performing a limited amount of
     code specialization.

   The default level of optimization is the maximum level, `-O 2`.
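
   The idea behind `-O 1` can be pictured on a toy instruction list. In the sketch below, which is our own simplification and not Menhir's internal representation, a `Push` is moved forward past instructions that are assumed not to touch the stack; when it meets a `Pop`, the pair cancels out and becomes a direct assignment. The `instr` type and the `optimize` function are hypothetical.

   ```ocaml
   (* A toy instruction set for an abstract stack machine. *)
   type instr =
     | Push of string   (* push the value of a variable *)
     | Pop of string    (* pop the top cell into a variable *)
     | Work of string   (* any other operation, assumed not to touch the stack *)

   (* Move each Push forward past Work instructions; when a Push meets a
      Pop, drop both and emit a direct assignment instead. *)
   let rec optimize (code : instr list) : instr list =
     match code with
     | Push v :: Work w :: rest -> Work w :: optimize (Push v :: rest)
     | Push v :: Pop x :: rest -> Work (x ^ " := " ^ v) :: optimize rest
     | i :: rest -> i :: optimize rest
     | [] -> []

   let example = optimize [ Push "a"; Work "f ()"; Pop "x"; Work "g x" ]
   (* example = [Work "f ()"; Work "x := a"; Work "g x"] *)
   ```

   No stack cell is written or read in the optimized sequence, which is the kind of memory traffic `-O 1` aims to remove.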

* The new command line switch `--exn-carries-state` causes the exception
   `Error` to carry an integer parameter: `exception Error of int`. When the
   parser detects a syntax error, the number of the current state is reported
   in this way. This allows the caller to select a suitable syntax error
   message, along the lines described in
   [Section 11](http://cambium.inria.fr/~fpottier/menhir/manual.html#sec68)
   of the manual. This command line switch is currently supported by the code
   back-end only.
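
   A caller might use the state number roughly as sketched below. The module name `Parser` is an assumption standing in for a Menhir-generated parser compiled with `--exn-carries-state`, and the state numbers and messages in the hypothetical `message_of_state` function are invented; in practice, the mapping would follow the approach described in the manual.

   ```ocaml
   (* Stand-in for a generated parser module; with --exn-carries-state,
      the generated code declares  exception Error of int. *)
   module Parser = struct
     exception Error of int
   end

   (* Hypothetical mapping from state numbers to error messages. *)
   let message_of_state (s : int) : string =
     match s with
     | 0 -> "Expected an expression."
     | 7 -> "Unbalanced parenthesis."
     | _ -> Printf.sprintf "Syntax error (state %d)." s

   (* Run a parsing function, turning Parser.Error into a message.
      [Error] below is the result constructor, not the exception. *)
   let run (parse : unit -> 'a) : ('a, string) result =
     match parse () with
     | v -> Ok v
     | exception Parser.Error s -> Error (message_of_state s)
   ```

   For instance, `run (fun () -> raise (Parser.Error 7))` yields `Error "Unbalanced parenthesis."`.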

* The `$syntaxerror` keyword is no longer supported.

* Document the trick of wrapping module aliases in `open struct ... end`,
   like this: `%{ open struct module M = MyLongModuleName end %}`.
   This allows you to use the short name `M` in your grammar, but forces
   OCaml to infer types that refer to the long name `MyLongModuleName`.
   (Suggested by Frédéric Bour.)
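
The trick can be tried outside Menhir in plain OCaml (version 4.08 or later, where `open struct ... end` is available). In this sketch, `MyLongModuleName` is a stand-in module defined on the spot; inside a grammar, the `open struct ... end` line would go in the `%{ ... %}` header.

```ocaml
(* Stand-in for a module with a long name. *)
module MyLongModuleName = struct
  type t = Leaf | Node of t * t
end

(* The alias M is visible in the rest of this file, but because it is
   wrapped in [open struct ... end], it is not exported, so the types
   in this module's interface are expressed via MyLongModuleName.t. *)
open struct
  module M = MyLongModuleName
end

(* Written with the short name; its type is MyLongModuleName.t. *)
let tree = M.Node (M.Leaf, M.Leaf)
```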
