ML Family Workshop

5Sep/140

ML Family Workshop

Welcome to the ML Family workshop liveblog with Leo, our guest blogger Gabriel Scherer and myself.

Welcome

Welcome, this year we worked closely with the OCaml workshop to give the two workshops they're own focus, this workshop is more theory focused then the OCaml workshop. This workshop doesn't just include features in current SML, but also features that could be included in the future. We welcome all ML's from the ML family. We will be publishing the proceeding and they will be available free online.

Session 1: Module Systems

Chair: Didier Rémy

1ML -- core and modules as one (Or: F-ing first-class modules) (Research presentation)

Andreas Rossberg

Abstract: We propose a redesign of ML in which modules are first-class values. Functions, functors, and even type constructors are one and the same construct. Likewise, no distinction is made between structures, records, or tuples, including tuples over types. Yet, 1ML does not depend on dependent types, and its type structure is expressible in terms of plain System F-omega, in a minor variation of our F-ing modules approach. Moreover, it supports (incomplete) Hindley/Milner-style type inference. Both is enabled by a simple universe distinction into small and large types.

We are rethinking the design of ML. We talk about ML as a language, its actually at least 2.5 languages. There is the core language and the module language. There's even a 3rd language of type expression. This are all syntactically different.

Who likes the type expression syntax in ML ? No-one

Modules in ML, as 2nd class. Modules are more verbose but powerful, leaving us with some difficult decisions. There has been work in packing modules as fist class values.

We want a classless society !
I want the expressive power of modules, with style.

I propose 1ML - first class modules language/ unified ML

No more staring with core language and add modules constructs. Instead we start with a module language and then add some core constructs.

Example of explicit typed 1ML: Similar to ML plus two functor types, one for pure and one for inprue. Type declaration syntax is replaced with anonymous modules.

Examples of types as modules: This common in module IL's, its not too new.

This extends to type constructors as applicative functors.

The unavoidable map functor example :)

A more interesting example: The textbook example of selection something at runtime. This could be done in OCaml but it would be much more verbose.

Example of collection to demonstrate associated types. Examples of how this can again be achieved in OCaml but its still more verbose.

Example code using higher kinded polymorphism, which couldn't be do in OCaml

I am using F-ing modules semantics and collapsing syntax and semantics.

Challenges: Decidability for subtypes, phase separation and type interface. The avoidance problem wasn't a problem at all.

Decidability: The ability to have abstract signatures can introduce decidability issues. Matching can lead in infinite sequence of substitution.

We disallow substitution an abstract types. This is also in papers from the 80's. We separate types into polytypes and monotypes. Only small types can have type type'.

Example of F'ing types definition for large and small types.

Phase separation: Not a problem

Type inference: We will only infer small types, so annotations can be omitted. We can recover ML style polymorphism with implicit functions.

Revisiting the Map example with all the type annotations.

Inference and subtypes: Subtyping relations almost degenerates to type equivalence on small types.

Incompleteness: Subtyping allows width subtyping, Include is a form of record concatenation and value restriction inside functors.

SML can't infer any of this either !

F-ing modules is almost a first class module language. Truly first-calls modules are perfectly doable without regressing the language. Isolate the essence of ML's core/modules.

Future: Generalise applicative functors, Generalise implicit functors (aka type classes ?), Row polymorphism, more sophisticated inference for first class polymorphism, recursive modules. This + MixML nit be like Scala.

Q: Is predicativity something we might regret in the future ?
A: We have pack like construct for first class modules

Q: The last 10 years we have moved towards dependant types, you've seemed to move the opposite direction ?
A: Interesting question

Type-level module aliases: independent and equal (Research presentation)

Jacques Garrigue (Nagoya University); Leo White (University of Cambridge)

Abstract: We promote the use of type-level module aliases, a trivial extension of the ML module system, which helps avoiding code dependencies, and provides an alternative to strengthening for type equalities.

Type-level module aliases - a simple feature in OCaml 4.02

Writing ```module S = String``` in signature will alias S to string.

Introducing the new inferred type-level module aliases and how this extending the subtyping relation.

The concept of type-level modules alias appears first in Traviatta (ICFP '06), used to allow type interface of recursive modules.

Later we discovered that type-level modules aliases were a good match for OCaml-style applicative functors.

Helping applicative functors was probably not enough to justify a new feature. However remember that the original goal was to simplify program analysis.

Modules as name spaces: Sharing of types allow grafting a modules somewhere else in the hierarchy.

This idea only working in theory: for separate compilation things backtrack. We have a risk of name conflict in are libraries as forest of modules. -pack is monolithic, using modules as name spaces creates large interfaces (e.g. JS core).

How to other language handle this issue: SML using a compilation manage which use special files using a dedicated syntax.

Example of manual packing

Now adding type-level module alias. We do not need to duplicate the module signatures.

Induced dependencies - a new compilation flag -no-alias-deps for 4.02.

3 Steps:
- Create a mapping unit whose role is only to map short names to prefixed names
- open this unit
- create an export unit

Compared to using -pack, this approach requires rename source files to unique name and add an open statement.

In core/async this divides the size of compiled interfaces by 3,

Currently, type level modules aliases can be created only for a limited subset of module paths, this excludes feature like functor applications, opaque coercions, functor arguments and recursive modules.

C: This also reduces required false dependences.

Q: What's the meta theory of this ?
A: This wasn't part of this work but were working into normalisation in the OCaml compiler

C: We are also using kinda thing for backpack in Haskell

Q: Currently we use ocamldep, what will happen to this tool ?

Session 2: Verification

Chair: Anil Madhavapeddy

Well-typed generic smart-fuzzing for APIs (Experience report)

Thomas Braibant (Cryptosense); Jonathan Protzenko; Gabriel Scherer (INRIA)

Abstract: In spite of recent advances in program certification, testing remains a widely-used component of the software development cycle. Various flavors of testing exist: popular ones include unit testing, which consists in manually crafting test cases for specific parts of the code base, as well as quickcheck-style testing, where instances of a type are automatically generated to serve as test inputs.

These classical methods of testing can be thought of as internal testing: the test routines access the internal representation of the data structures to be checked. We propose a new method of external testing where the library only deals with a module interface. The data structures are exported as abstract types; the testing framework behaves like regular client code and combines functions exported by the module to build new elements of the various abstract types. Any counter-examples are then presented to the user. Furthermore, we demonstrate that this methodology applies to the smart-fuzzing of security APIs.

I work for a start-up, work on hardware security models (HSM). Some hackers have been about to steal lots of money due to issues with HSM.

Cryptosense Workflow

Testing -> Learning -> Model-checking -> ...

We will focus on Testing, we automatically test API's using QuickCheck (a combinator library to generate test case in Haskell), But to generate tests we need to know how to generate our types/data structures.

ArtiCheck - a prototype library to generate test cases for safety properties. Comparison of QuickCheck of ArtiCheck.

Types help us to generate good random values, API's help generate values that have the right internal invariants.

Well-typed does not mean well-formed so we still need fuzzing.

Describing & Testing API's, we are only considering on first order functions.

Example of how we would test an example of API for red black trees.

Introducing a toy DSL for describing API's: describing types, functions and signatures.

There are many ways to combine these functions. so we introduce a richer DSL for types.

Field Report: Cryptosense
We need to enumerate a big combinational space made of constants and variables. We want good coverage and generate templates in a principle manor.

We have a library of enums with a low memory footprint with efficient access.

We have another DSL for describing combinatorial enumeration.

Example test results from a HSM. We have 10^5 test in 540 secs.

We have a principled way to test persistent APIs.

Writing a generator for testing BDD would be very tough without solutions like this.

Come work for Cryptosense :)

Q: How long does it take run the data on a real HSM ?
A: the numbers show actually already include already, we dynamically explore the state space.

Q: How do you trade-off coverage of API to coverage of state space ?
A: You a systematic approach. e.g. constantly generate trees which are bigger and bigger, so you don't just regenerate empty trees.

Q: Can you find a simple counterexample like QuickTest ?
A: Its not currently a function but maybe we can add a function to find an example smaller that exhibit the same behaviour

Q: Is this open source ?
A: Yes, the prototype is on github

Improving the CakeML Verified ML Compiler (Research presentation)

Ramana Kumar; Magnus O. Myreen (University of Cambridge); Michael Norrish (NICTA); Scott Owens (University of Kent)

Abstract: The CakeML project comprises a mechanised semantics for a subset of Standard ML and a formally verified compiler. We will discuss our plans for improving and applying CakeML in four directions: optimisations, new primitives, a library, and verified applications.

Introducing the CakeML team, we are verification people.

Why ML for verification: its clean and high level, its easy to reason about so can we then formally reason about it.

CakeML: a subset of ML which is easy to reason about, with verified implementation and demonstrate how easy it is to generate verified cakeML

How can we make proof assistant into trustworthy and practical program development platforms.
functions in HOL -> CakeMl -> Machine code

2 part to this talk:
- verified compiler
- making the compiler better

Highlights of CakeML compiler:
Most verified compiler work from source code to AST to IR to bytecode to Machine code. We take a difference approach and can do both dimensions in full.

The CakeML language, its SML without IO and functors. Since POPL '14 we've added arrays, vectors, string etc..

Contributions of POPL '14 paper: specification, verified algorithms, divergence preservation and bootstrapping. Proof developer where everything fits together.

Numbers: its slower, our aim is to be faster then interpreted OCaml.

The compiler phases are simple, we use only a few hops from IR to x86. Bytecode simplified proofs of real-eval-loop but made optimisation impossible.
We are now add many more IR's to optimise, like common compilers. Currently we only have x86 backend but we are adding ARM etc..

C: Inlining and specialisation of recursive functions are the key optimisations that you should focus on

C: You could use your compiler (written in cakeML) to evaluate the value of various optimisations

Q: Why don't you support functors ?
A: Its not a technical issue, this just a lot of work and its at the top of the priorities stack.

C: Your runtime value representation could make a big performance issues

C: Can you eliminate allocation in large blocks

Session 3: Beyond Hindley-Milner

Chair: Jacques Garrigue

The Rust Language and Type System (Demo)

Felix Klock; Nicholas Matsakis (Mozilla Research)

Abstract: Rust is a new programming language for developing reliable and efficient systems. It is designed to support concurrency and parallelism in building applications and libraries that take full advantage of modern hardware. Rust's static type system is safe and expressive and provides strong guarantees about isolation, concurrency, and memory safety.

Rust also offers a clear performance model, making it easier to predict and reason about program efficiency. One important way it accomplishes this is by allowing fine-grained control over memory representations, with direct support for stack allocation and contiguous record storage. The language balances such controls with the absolute requirement for safety: Rust's type system and runtime guarantee the absence of data races, buffer overflows, stack overflows, and accesses to uninitialized or deallocated memory. In this demonstration, we will briefly review the language features that Rust leverages to accomplish the above goals, focusing in particular on Rust's advanced type system, and then show a collection of concrete examples of program subroutines that are efficient, easy for programmers to reason about, and maintain the above safety property.

If time permits, we will also show the current state of Servo, Mozilla's research web browser that is implemented in Rust.

Motivation: We want to implement a next gen web browser servo

See the rusty playpen online http://www.rust-lang.org/

We want a language for systems programming, C/C++ dominate this space. We want sound type checking. We identify classic errors and using typing to fix them.

Well-typed programs help to assign blame.

Systems programmers want to be able to predict performance.

Rust syntax is very similar to OCaml. Types have to explicit on top-level functions. Rust has bounded polymorphism, we don't have functor, this is more similar to typeclasses.

Value model between OCaml and Rust is very different. Rust in-lines storage but must pick the size of the largest variant.

To move or copy ? Move semantics

The Copy bound expresses that a type is freely copyable and its checked by the compiler.

Many built-in types implement Copy

Why all the fuss about move semantics ?
Rust has 3 core types: non-reference, shared reference, mutable unaliased reference. (plus unsafe pointers etc..)
Rust patch matching moves stack slots. Rust introduces ref pattens and takes references to its interioa.

This is all more type soundness

Q: Is borrowing first order ? Can I give to someone else ?
A: Yes, I'll explain more later

Lifetimes: simulator to regions used by Tofte/Talpin

Example of sugar and de-suger forms of functions: how we can explicitly constrain lifetimes

Is every kind of mutability forced into a linearly passed type ?
Not really, there's inherited and interior mutability.

Rust has closures

Q: Does rust repeat earlier work on cyclone ?
A: Cyclone tried to stay very close to C but we more then happy to diverge from C/C++

Q: Large scale systems code becomes tanted with linearity, how do you deal with that ?
A: You can break out of it, only very occationally need to break into unsafe code

Q: How do you know that your type system is safe ?
A: There's ongoing work

Doing web-based data analytics with F# (Informed Position)

Tomas Petricek (University of Cambridge); Don Syme (Microsoft Research Cambridge)

Abstract: With type providers that integrate external data directly into the static type system, F# has become a fantastic language for doing data analysis. Rather than looking at F# features in isolation, this paper takes a holistic view and presents the F# approach through a case study of a simple web-based data analytics platform.
Experiments can have a life of its own independent of a large-scale theory. This is a relevant case study, theory and language independent and demonstrates a nice combination of language features

Demo: demo using type providers and translation from F# to JS.

Features used: type providers, meta-programming, ML tyoe inference and async workflows.

JS Intergration: questions on whether to use JS or F# semantics.
Asynchronous workflows - single thread semantics

Not your grandma's type safety,

Q: These language feature are good for specific application, do they make the language worse for other features ?
A: No the extensions use minimal syntax

Q: Is there version contrainting on type providers for data source ?
A: No

The rest of the posts will follow soon

Session 4: Implicits

Chair: Andreas Rossberg

Implicits in Practice (Demo)

Nada Amin (EPFL); Tiark Rompf (EPFL & Oracle Labs)

Abstract: Popularized by Scala, implicits are a versatile language feature that are receiving attention from the wider PL community. This demo will present common use cases and programming patterns with implicits in Scala.

Modular implicits (Research presentation)

Abstract: We propose a system for ad-hoc polymorphism in OCaml based on using modules as type-directed implicit parameters.

Session 5: To the bare metal

Chair: Martin Elsman

Metaprogramming with ML modules in the MirageOS (Experience report)

Anil Madhavapeddy; Thomas Gazagnaire (University of Cambridge); David Scott (Citrix Systems R&D); Richard Mortier (University of Nottingham)

Abstract: In this talk, we will go through how MirageOS lets the programmer build modular operating system components using a combination of functors and metaprogramming to ensure portability across both Unix and Xen, while preserving a usable developer workflow.

Compiling SML# with LLVM: a Challenge of Implementing ML on a Common Compiler Infrastructure (Research presentation)

Katsuhiro Ueno; Atsushi Ohori (Tohoku University)

Abstract: We report on an LLVM backend of SML#. This development provides detailed accounts of implementing functional language functionalities in a common compiler infrastructure developed mainly for imperative languages. We also describe techniques to compile SML#'s elaborated features including separate compilation with polymorphism, and SML#'s unboxed data representation.

Session 6: No longer foreign

Chair: Oleg Kiselyov

A Simple and Practical Linear Algebra Library Interface with Static Size Checking (Experience report)

Akinori Abe; Eijiro Sumii (Tohoku University)

Abstract: While advanced type systems--specifically, dependent types on natural numbers--can statically ensure consistency among the sizes of collections such as lists and arrays, such type systems generally require non-trivial changes to existing languages and application programs, or tricky type-level programming. We have developed a linear algebra library interface that guarantees consistency (with respect to dimensions) of matrix (and vector) operations by using generative phantom types as fresh identifiers for statically checking the equality of sizes (i.e., dimensions). This interface has three attractive features in particular.

(i) It can be implemented only using fairly standard ML types and its module system. Indeed, we implemented the interface in OCaml (without significant extensions like GADTs) as a wrapper for an existing library.

(ii) For most high-level operations on matrices (e.g., addition and multiplication), the consistency of sizes is verified statically. (Certain low-level operations, like accesses to elements by indices, need dynamic checks.)

(iii) Application programs in a traditional linear algebra library can be easily migrated to our interface. Most of the required changes can be made mechanically.

To evaluate the usability of our interface, we ported to it a practical machine learning library (OCaml-GPR) from an existing linear algebra library (Lacaml), thereby ensuring the consistency of sizes.

SML3d: 3D Graphics for Standard ML (Demo)

John Reppy (University of Chicago)

Abstract: The SML3d system is a collection of libraries designed to support real-time 3D graphics programming in Standard ML (SML). This paper gives an overview of the system and briefly highlights some of the more interesting aspects of its design and implementation.