Very interesting and thought-provoking writeup, thank=
you!=C2=A0

Incidentally, we're investigat=
ing the same venues, in our CMU BAP project, as we found out that we need t=
he extensibility in the style of type classes/canonical structures to decou=
ple complex dependencies which arise in the program analysis domain.=C2=A0<=
/div>

In fact, we build our new BAP 2.0 framework largely on your [tagl=
ess-final][1] style which, let's admit it, works much better with type =
classes. Therefore we ended up implementing extensible type representations=
along with registries for our type classes. Unfortunately, the idea of sto=
ring rules in the registry didn't visit us, but we're now thinking =
about how to incorporate it (the classes that we have are very nontrivial, =
usually having hundreds of methods, so we're currently using functors t=
o manually derive on class from another, and registering the resulting stru=
ctures - but using your approach we can register functors as well and autom=
ate the derivation). We also didn't generalize the type class instantia=
tion, so our solutions do have some boilerplate (but I have to admit, that =
the total number of type classes that we need is not very big, so it really=
never bothered us). What could be surprising is that the universe of types=
actually grew quite large, that large that the linear search in the regist=
ry is not an option for us anymore. In fact, we have so many comparisons be=
tween treps, that instead of extracting the extension constructor number fr=
om an extensible variant we had to rely on our own freshly generated identi=
fier. But I'm running in front of myself, an important lesson that we h=
ave learned is that treps should not only be equality comparable but also o=
rdered (and even hashable) so that we can implement our registries as hash =
tables. It is also better to keep them abstract so that we can later extend=
them without breaking user code (to implement introspection as well as dif=
ferent resolution schemes). This is basically an elaboration of your approa=
ch (which is also could be commonly found in Janestreet's Core (Type_eq=
ual.Uid.t) and other implementations of existentials). In our case, we ende=
d up with the following implementation

```

=C2=A0 =C2=
=A0 type 'a witness =3D ..

=C2=A0 =C2=A0 m=
odule type Witness =3D sig

=C2=A0 =C2=A0 =C2=A0 type t

=C2= =A0 =C2=A0 =C2=A0 type _ witness +=3D Id : t witness

=C2=A0 =C2=A0 end

=C2=A0 =C2=A0 type 'a typeid =3D (module Witness with type t =3D = 'a)

=C2=A0 =C2=A0 type 'a key =3D {

=C2=A0 =C2=A0 =C2=A0 = ord : int;

=C2=A0 =C2=A0 =C2=A0 key : 'a typeid;=C2=A0

=C2= =A0 =C2=A0 =C2=A0 type _ witness +=3D Id : t witness

=C2=A0 =C2=A0 end

=C2=A0 =C2=A0 type 'a typeid =3D (module Witness with type t =3D = 'a)

=C2=A0 =C2=A0 type 'a key =3D {

=C2=A0 =C2=A0 =C2=A0 = ord : int;

=C2=A0 =C2=A0 =C2=A0 key : 'a typeid;=C2=A0

=C2=
=A0 =C2=A0 =C2=A0 name : string; (* for introspection *)

=C2=A0 =C2=A0 = =C2=A0 show : 'a -> Sexp.t; (* also for introspection *)

=C2=A0 = =C2=A0 }

=C2=A0 =C2=A0 = =C2=A0 show : 'a -> Sexp.t; (* also for introspection *)

=C2=A0 = =C2=A0 }

```

Now, we can use the `ord` field to ord=
er types, compare them, store in maps, hash tables, and even arrays. E.g., =
this is how our `teq` function looks like,

```

=C2=
=A0 =C2=A0 let same (type a b) x y : (a,b) Type_equal.t =3D

=C2=A0 =C2= =A0 =C2=A0 if x.id =3D=C2=A0 y.id then

=C2=A0 =C2=A0 =C2=A0 =C2=A0 let module X =3D (val x.= key : Witness with type t =3D a) in

=C2=A0 =C2=A0 =C2=A0 =C2=A0 let modu= le Y =3D (val y.key : Witness with type t =3D b) in

=C2=A0 =C2=A0 =C2=A0= =C2=A0 match X.Id with

=C2=A0 =C2=A0 =C2=A0 =C2=A0 | Y.Id -> Type_eq= ual.T

=C2=A0 =C2=A0 =C2=A0 =C2=A0 | _ -> failwith "broken type e= quality"

=C2=A0 =C2=A0 =C2=A0 else failwith "types are not equ= al"

=C2=A0 =C2= =A0 =C2=A0 if x.id =3D=C2=A0 y.id then

=C2=A0 =C2=A0 =C2=A0 =C2=A0 let module X =3D (val x.= key : Witness with type t =3D a) in

=C2=A0 =C2=A0 =C2=A0 =C2=A0 let modu= le Y =3D (val y.key : Witness with type t =3D b) in

=C2=A0 =C2=A0 =C2=A0= =C2=A0 match X.Id with

=C2=A0 =C2=A0 =C2=A0 =C2=A0 | Y.Id -> Type_eq= ual.T

=C2=A0 =C2=A0 =C2=A0 =C2=A0 | _ -> failwith "broken type e= quality"

=C2=A0 =C2=A0 =C2=A0 else failwith "types are not equ= al"

```

It is often used in the=
context where we already know that `x.id =3D <=
a href=3D"http://y.id">y.id`, e.g., when we already found an entry, so =
we just need to obtain the equality witness (we use Janestreet's Type_e=
qual.T, which is the same as yours eq type).=C2=A0

Concerning the representation of the registry, we also experimented with d=
ifferent approaches (since we have a few ways to make a type existential in=
OCaml), and found out the following to be the most efficient and easy to w=
ork with,

```

type ordered =3D {

=
=C2=A0 =C2=A0 order : 'a. 'a key -> 'a -> 'a -> in=
t;=C2=A0 } [@@unboxed]

```

Notice,=
that thanks to `[@@unboxed]` we got a free unpacked existential. We will n=
ext store `ordered` in our registry, which is a hash table,

<= /div>

```

let ordered : ordered Hashtbl.M(Int).t =3D Hashtbl.=
create (module Int)

```

and register it as simple as,

```

=C2=A0 let register: type p. p Key.t -> (p -> =
p -> int) -> unit =3D=C2=A0fun key order ->

=C2=A0 =C2=
=A0 Hashtbl.add_exn vtables ~key:(uid key) ~data:{

=C2=A0 =C2=A0 =C2=A0 = order =3D fun (type a) (k : a key) (x : a) (y : a) ->

=C2=A0 =C2=A0 = =C2=A0 =C2=A0 let T =3D same k key in (* obtain the witness that we found t= he right structure *)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 order x y

=C2=A0 = =C2=A0 =C2=A0}

=C2=A0 =C2=A0 =C2=A0 = order =3D fun (type a) (k : a key) (x : a) (y : a) ->

=C2=A0 =C2=A0 = =C2=A0 =C2=A0 let T =3D same k key in (* obtain the witness that we found t= he right structure *)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 order x y

=C2=A0 = =C2=A0 =C2=A0}

```

Instead of a =
hashtable, it is also possible to use `ordered array ref` (since our `ord` =
is just an integer which we increment every time a new class is declared). =
This will give us even faster lookup.=C2=A0

I hope=
that this was interesting. And if yes, I'm ready to elaborate more on =
our design decision or to hear suggestions and critics. Here are a few link=
s:

- https://github.com/BinaryAnalysisPlatform/bap - the BAP pr=
oject per se.=C2=A0

-=C2=A0https://binaryanalysisplatform.github.io/k=
nowledge-intro-1=C2=A0 - a small introductionary post about BAP 2.0 Kno=
wledge representation

-=C2=A0https:=
//github.com/BinaryAnalysisPlatform/bap/blob/master/lib/knowledge/bap_knowl=
edge.ml - the implementation of the knowledge system

- https://github.com/BinaryAnalysisPlatform/bap/tree/master/lib/bap_=
core_theory - The Core Theory, an exemplar type class of the theory tha=
t we're developing :)

Cheers,

Ivan G=
otovchits

Research Scientist, CMU Cylab

=
[1]:=C2=A0http://=
okmij.org/ftp/tagless-final/index.html

On Wed, Sep 4, 2019 at 11:2=
3 AM Oleg <oleg@okmij.org> wrot=
e:

This is to announce a simple, plain OCaml library to experiment with

type-class/implicits resolution, which can be thought of as evaluating

a Prolog-like program. One may allow `overlapping instances' -- or

prohibit them, insisting on uniqueness. One may make the search fully

deterministic, fully non-deterministic, or something in-between.

There is an immediate, albeit modest practical benefit: the facility

like "#install_printer", which was restricted to top-level, is no= w

available for all -- as a small, self-contained, plain OCaml library

with no knowledge or dependencies on the compiler internals. We show

an example at the end of the message.

This message has been inspired by the remarkable paper

=C2=A0 =C2=A0 =C2=A0 =C2=A0 Canonical Structures for the working Coq user=C2=A0 =C2=A0 =C2=A0 =C2=A0 Assia Mahboubi, Enrico Tassi

=C2=A0 =C2=A0 =C2=A0 =C2=A0 DOI: 10.1007/978-3-642-39634-2_5

Its introduction is particularly insightful: the power of

(mathematical) notation is in eliding distracting details. Yet to

formally check a proof, or to run a program, the omitted has to be

found. When pressed to fill in details, people `skillful in the art'

look in the database of the `state of the art', with the context as

the key. Computers can be programmed similarly; types well represent

the needed context to guide the search.

Mahboubi and Tassi's paper explains very well how this eliding and

filling-in is realized, as a programmable unification, and used in

Coq. Yet their insights go beyond Coq and deserve to be known better.

This message and the accompanying code is to explain them in

plain OCaml and encourage experimentation. It could have been titled

`Canonical Structures for the working OCaml (meta) programmer'.

The rudiment of canonical structures is already present in OCaml, in

the form of the registry of printers for user-defined types. This

facility is available only at the top-level, however, and deeply

intertwined with it. As a modest practical benefit, this facility is

now available for all programs, as a plain, small, self-contained

library, with no compiler or other magic. The full potential of the

method is realized however in (multi-) staged programming. In fact, I'm=

planning to use it in the upcoming version of MetaOCaml to implement

`lifting' from a value to the code that produces it -- letting the

users register lifting functions for their own data types.

http://okmij.org/ftp/ML/canonical.ml

=C2=A0 =C2=A0 =C2=A0 =C2=A0 The implementation and the examples, some of wh= ich are noted below.

http://okmij.org/ftp/ML/trep.ml

=C2=A0 =C2=A0 =C2=A0 =C2=A0 A generic library of type representations: some= thing like

=C2=A0 =C2=A0 =C2=A0 =C2=A0 Typeable in Haskell. Some day it may be built-i= n into the compiler

http://okmij.org/ftp/ML/canonical_leadup.ml

=C2=A0 =C2=A0 =C2=A0 =C2=A0 A well-commented code that records the progress= ive development of

=C2=A0 =C2=A0 =C2=A0 =C2=A0 canonical.ml. It is not part of the library, but = may serve as

=C2=A0 =C2=A0 =C2=A0 =C2=A0 its explanation.

Here are a few examples, starting with the most trivial one

=C2=A0 =C2=A0module Show =3D MakeResolve(struct type 'a dict =3D 'a= -> string end)

=C2=A0 =C2=A0let () =3D Show.register Int string_of_int=C2=A0 =C2=A0 (* Def= ine `instances' *)

=C2=A0 =C2=A0let () =3D Show.register Bool string_of_bool

=C2=A0 =C2=A0Show.find Int 1;;

However contrived and flawed, it is instructive. Here (Int : int trep)

is the value representing the type int. The type checker can certainly

figure out that 1 is of the type int, and could potentially save us

trouble writing Int explicitly. What the type checker cannot do by

itself is to find out which function to use to convert an int to a

string. After all, there are many of them. Show.register lets us

register the *canonical* int->string function. Show.find is to search

the database of such canonical functions: in effect, finding *the*

evidence that the type int->string is populated. Keeping Curry-Howard

in mind, Show.find does a _proof search_.

The type of Show.find is 'a trep -> ('a -> string). Compare w= ith

Haskell's show : Show a =3D> a -> String (or, desuraging =3D> = and Show)

show : ('a -> string) -> ('a -> string).=C2=A0 Haskell'= ;s show indeed does

not actually do anything: it is the identity function. All the hard

work -- finding out the right dictionary (the string producing

function) -- is done by the compiler. If one does not like the way the

compiler goes about it -- tough luck. There is little one may do save

complaining on reddit. In contrast, the first argument of Show.find is

trivial: it is a mere reflection of the type int, with no further

information. Hence Show.find has to do a non-trivial work.=C2=A0 In the

case of int, this work is the straightforward database search --

or, if you prefer, running the query ?- dict(int,R) against a logic

program

=C2=A0 =C2=A0 =C2=A0dict(int,string_of_int).

=C2=A0 =C2=A0 =C2=A0dict(bool,string_of_bool).

The program becomes more interesting when it comes to pairs:

=C2=A0 =C2=A0 =C2=A0dict(T,R) :- T =3D pair(X,Y), !,=C2=A0

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0dict(X,DX), dict(Y,DY), R=3Dmake_pair_dic= t(DX,DY).

Here is how it is expressed in OCaml:

let () =3D

=C2=A0 let open Show in

=C2=A0 let pat : type b. b trep -> b rule_body option =3D function

=C2=A0 =C2=A0 | Pair (x,y) ->

=C2=A0 =C2=A0 =C2=A0 =C2=A0 Some (Arg (x, fun dx -> Arg (y, fun dy ->=

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Fact (fun (x,y) -> "(" ^ dx= x ^ "," ^ dy y ^ ")"))))

=C2=A0 =C2=A0 | _=C2=A0 =C2=A0 =C2=A0 -> None

=C2=A0 in register_rule {pat}

let () =3D Show.register (Pair(Bool,Bool))

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(fun (x,y) -> string_of_bool x = ^ string_of_bool y)

Our library permits `overlapping instances'. We hence registered the

printer for generic pairs, and a particular printer just for pairs of

booleans.

The library is extensible with user-defined data types, for example:

=C2=A0 =C2=A0type my_fancy_datatype =3D Fancy of int * string * (int -> = string)

After registering the type with trep library, and the printer

=C2=A0 =C2=A0type _ trep +=3D MyFancy : my_fancy_datatype trep

=C2=A0 =C2=A0let () =3D Show.register MyFancy (function Fancy(x,y,_) -><= br> =C2=A0 =C2=A0 =C2=A0string_of_int x ^ "/" ^ y ^ "/" ^ &= quot;<fun>")

one can print rather complex data with fancy, with no further ado:

=C2=A0 =C2=A0Show.find (List(List(Pair(MyFancy,Int)))) [[(Fancy ...,5)];[]]=

As Mahboubi and Tassi would say, proof synthesis at work!

We should stress that what we have described is not a type-class

facility for OCaml. It is *meta* type-class facility.=C2=A0 Show.find hasmany drawbacks: we have to explicitly pass the trep argument like

Int. The resolution happens at run time, and hence the failure of the

resolution is a run-time exception. But the canonical instance

resolution was intended to be a part of a type checker. There, the

resolution failure is a type checking error. The trep argument,

representing the type in the object program, is also at

hand. Likewise, the drawbacks of Show.find disappear when we use the

library in a meta-program (code generator). The library then becomes a

type-class/implicits facility, for the generated code -- the facility,

we can easily (re)program.