syslog
2Sep/140

ICFP 2014: Day 2

Leonhard Markert@srg.cl $

This is day 2 of the 19th International Conference on Functional Programming. Leo and I(Heidi) are here bright and early to bring you the highlights from Gothenburg, Sweden. We are not the only livebloggers here at ICFP, check out Edward Yang's excellent notes.

Keynote (Chair: Jesse Tov)

Behavioral Software Contracts
Speaker: Robert Bruce Findler (Northwestern University)

Intro: Robby's work on behavioural software contracts is beautiful :)

Instead of an abstract, the keynote has a bibliography of interesting work in the area

Contracts are infectious

Example of racket code, working in a call-by-value language here a point-in function. Demonstration of contract violation including a blame, from applying bools instead of integers. Robby's adds an contract to guard access to point-in and the blame now points to the top-level.

Graph demonstrating how the number contracts in Racket in git has constantly increased. Robby identifiers the outliers e.g. big libraries with lots of contracts have been introduced. Most commits add 1 contract, if they add any.

Everyone is the audience is thinking, why use contracts when you can use types, as identified by Robby.

Contracts Verse Types: The previous example is used with types instead of contracts and negative integers are used to demonstrate how types are insufficient. He introduces a dependant contract, hinting at a comparison to dependent types. We have the arguments at the time of checking unlike dependant types hence its actually possible in real languages.

Switch to an example of a function called dc, which takes a drawing function and bonding box and draws as a side effect. Robby introduces a broken function, which introduces state pollution. We can't check that state after equals the state before, but this would be too late. Here's a contract example, generating a random state and testing for state pollution, this doesn't give us complete corrects but its good enough in this contexts.

Don't think about contracts in terms of projections, but in terms of boundaries (work by Christos)

Addition of contacts to simply typed lambda calculus, in particular how monitors add contracts to boundaries. Example of if we have contract (x =< 3), how values of x > 3 causes the function to abort. If we have another contract to checks f's arguments and return values are x=<3, we allow the function to passed though to with extra monitors on return values.

What happens if we have a bug in the monitor ?

Applications: How can contracts work if you don't give have access to the code the other side of the boundary. We can use the contract exercise function to generate random functions to test monitors. We have an notion of whether one contract is stronger than another.

Gradual typing: How we interfere between typed and untyped languages, see how we do it with Typed Racket.
Static Analysis: Havic, exhaustive boundary checking

This presentation is actually written in racket, all output is real output.

Takeaways: Contracts are not just crappy types, contracts infect code bases, boundaries are vital.

ICFP knows how to prove specifications. Contract infectiousness is our opportunity to share that knowledge with the world

Q: Randomisation would break formal tools, why not use NaN ?
A: Just setting to default may not find errors

Q: Contracts are checked at runtime, what about the overhead ? Is there a switch to turn it off ?
A: There's no kill switch, but there are option contracts

Q: Effects are alarming !
A: Well, it doesn't alarm me, its vital for practical code and we need to look into this further

Q: Why can't we introduce randomness to type checking ... ?
A: that the point

Q: Thank you for the cross community impact !

Q: How is the blaming system implemented ? Can we add it to python or C ?
A: Fundamentally we could do this, we would love to see more of this.

Q: The opposite of gradual typing, often checks aren't just typed based. Can we use types to check for placement of contracts ?
A: Contracts are specification, they can be wrong like normal programs.

Q: Dependent contracts seems to explicitly pass all arguments, its looks a bit strange ?
A: Explicitly labelling dependency makes it cheaper

 


 

Session 6: Contracts & Scheme (Chair: Michael Sperber)

Soft Contract Verification
Phuc C. Nguyen (University of Maryland); Sam Tobin-Hochstadt (Indiana University); David Van Horn (University of Maryland)

This is the snake game with contracts, most computation time is executing contracts not code, there is serious overhead.

Contract and blame avoidance cycle, how can we then introduce static contract verification.

We will consider soft contract verification:
- we can analysis first class contracts
- using SMT solver
- competitive
- works even without apparent contracts

CPCF

Introduce PCF language, and introducing contract notation.

How can we verify CPCF ? We abstract CPCF function to a type, symbolic values are sets of contracts.
Soundness: All concentration are approximated by CPCF

Check refine values are they flow through, these refines can influence the computation.

Theorem: Verified modules can't be blamed - Proved
We proof well to soft typing, occurrence typing, H.O recursion schemes and dependent refinements and video games for the beginning.

Contracts fail at first order, only need a F.O solver for verification
Q: If we replace with pos? with >4, where you short cutting the SMT solver ?
A: Yes, the name was just a shortcut

Q: How to do you handle recursion ?
A: This is described in the paper

On Teaching How to Design Programs: Observations from a Newcomer
Norman Ramsey (Tufts University)

This is course in problem solving, not functional programming

Process:
1) Describe data, input, making data examples
2) Describing functions, making function examples
3) Creating code templates from types, fill in the template
4) Test, review and re-factor

Example of 2D-tree data definition and writing a function to find the nearest point.

What's hard of students ?

- Type directed development is difficult even though there's mechanical rules to apply
- "Purpose statements" are hard, student just ignore them and read code, this breaks down then they get to recursion

The Tech:
- You don't need 5 language levels
- Don't be fooled by the DrRacket IDE - designed for fill racket, but every time you compile, untested code is thrown in your face
- Go deep intro "world programs" - interactive apps by composing pure functions

Open Problem - How can we evaluate students code ?
- Is this code the produce of systematic design ?
- Experienced instructions use point deducts

HtDP: low cost, high reward

Q: Design recipes suggest how to evaluate student assignments, this is the solution to your problem ?
A: We'll sit down and talk about it

 


 

Session 7: Tools Used in Anger (Chair: Jacques Garrigue)

SML# in Industry: A Practical ERP System Development (Experience Report)
Atsushi Ohori (Tohoku University); Katsuhiro Ueno (Tohoku University); Kazunori Hoshi (NEC Software Tohoku, Ltd.); Shinji Nozaki (NEC Software Tohoku, Ltd.); Takashi Sato (NEC Software Tohoku, Ltd.); Tasuku Makabe (NEC Software Tohoku, Ltd.); Yuki Ito (NEC Software Tohoku, Ltd.)

Lem: Reusable Engineering of Real-World Semantics
Dominic P. Mulligan (University of Cambridge); Scott Owens (University of Kent); Kathryn E. Gray (University of Cambridge); Tom Ridge (University of Leicester); Peter Sewell (University of Cambridge)


 

Session 8: Type Systems (Chair: Geoffrey Mainland)

Safe Zero-Cost Coercions for Haskell
Joachim Breitner (Karlsruhe Institute of Technology); Richard A. Eisenberg (University of Pennsylvania); Simon Peyton Jones (Microsoft Research); Stephanie Weirich (University of Pennsylvania)

Hindley-Milner Elaboration in Applicative Style (Functional Pearl)
Francois Pottier (INRIA)


 

Session 9: Incremental Computing (Chair: Tiark Rompf)

Settable and Non-Interfering Signal Functions for FRP
Daniel Winograd-Cort (Yale University); Paul Hudak (Yale University)

Functional Programming for Dynamic and Large Data with Self-Adjusting Computation
Yan Chen (Max Planck Institute for Software Systems); Umut Acar (Carnegie Mellon University); Kanat Tangwongsan (Mahidol University)


 

ICFP Contest Presentation

ICFP 2004 Most Influential Paper Award

1Sep/140

ICFP 2014: Day 1 (another perspective)

hh360@srg.cl $

This is day 1 of the 19th International Conference on Functional Programming. Leo and I (Heidi) are here live blogging from Gothenburg, Sweden. He posts is here, we will combine them (with photos :)) in the coming weeks.

Opening

Good morning and time to get started. Official definition of ICPF: its a forum for discussion of researchers. 466 people have registered for the conference or associated events. 21 sponsors giving $60K of sponsorship.

We can get a boat to the conference banquet :)

A big big thank you to the PC.


 

Keynote: Using Formal Methods to Enable More Secure Vehicles: DARPA's HACMS Program
Author & Speaker: Kathleen Fisher (Tufts University)

DAPRA, US funding organisation with rotating project management so constant surprises. Clean slate formal methods for proving security for embedded systems. Power, TV, medical devices, phones and vehicles, all have remote security risks with amusing exploits. Car is a computer on wheels, with many ECU's contacted with buses both high and low speed. If you can connect to the network, you can control anything under software control, getting to the network is easier then you'd think e,g using the diagnostic plug under the steering wheel. Almost all the car systems are under software control, include breaks. You can get access remotely by 4 ways: e.g. via bluetooth, allstar, virus to mechanises diagnostic equipment, via music to buffer overfull car software.

The traditional fix: air gaps & obscurity. Air gaps a no longer the case, cars need to be networked e.g. for remote monitoring.
Lots of documentation for this systems nowadays

The desktop approach fix: this approach barely works on desktop. The security issues with security code, approach 1/3 in recent watch list.
Current solutions are millions of LOC, extensive monitoring need extra CPU. Embedded codes goes though lots of review, no "patch every Tuesday" approach.

The new fix: Formal method based approach. Why now ? Lots of tools & infrastructure which is now mature and advances in SAT solves in later 10 years.

3 approaches:

code synthesis: people can't write secure C code, computers can and provide proofs. DSLs, Interactive theorem prover as PL.
HACMS program: collection of performers, bid for technical areas such as:
- Vehicle experts
- OS
- Control systems e.g. fly control
- Research Integration: managing rest of performers, composition, tools
- Red team: white hat hackers, measure effort needed to hack

Air team
- Boeing - unmanned little bird helicopter & open source version
- NICTA
- Galois
- RC*/U.Minn

All cars are vulnerable, no point in naming names.

composition means how do you get high assurance components and build a high assurance systems ?
Started with Pem testing 4 vehicles, 2 prep and 2 open source. The systems had been engineered to be easy to connect so this wasn't a surprise. Hijacking demonstration of AI drown using aircrack. This can be quite a shock to the user. e.g care driver. Ever 18 months, the red teams will report on how much effort is required to hack. Starting with monolithic quad. They began by upgrading processor and refactor monolithic software, the H/W abstraction layer was a big hit with the community. Gradually swapping out legacy code to high assure code.

How should you do if you know your under a DoS attack ? it depends on context

Its been proved that system is memory safe, ignore malformed messages, ignores non-authenticated and all good messages will reach motor control.

"The most secure UAV on the planet" :)

AADL - language for architecture specification used to propagate system wide properties down until an assurance it reached. A formal systematics haves been added to AADL and provided using resolute tool, assurance cases and synthesis tools. The glue code can be generated, this is typically the weak point.

Galois haven written two DSL's in Haskell, They have created Ivory for C like code and Tower for tasks/connections.

NICTA have written a fully functionally code micro kernel.

Boeing are using the tools build to develop the unmanned little bird. There on track for running real HACKEMS code on a helicopters in 2 years. There's a linux partition for mission specific code, the red team will have full access to this partition.

Q: Are you working on the wrong problem ? We don't have the talented people to make this commonplace, this isn't cool enough
A: Hacking demonstrations get people interested. The US government is considering mandating that all cars are networked, but this is dramatic
security issues, remote code execution

Q: Why is the hardware so constrained ?
A: It is very high scale, there's very small margins

Q: What did this cost, what this research want to do something differ to what management want ?
A; $70 million, we discuss but that why its good to NFS

Q: How much more expensive will high assures systems be ?
A: We don't know yet, that what we are working on. We focus on re-usability, open source, scale to reduce costs.

Q: What's the multi-core story ?
A: Its much more difficult to verify multi-core, bowing want single core, maybe they won't be available soon.

Q:
A: AR Don was started with an existing open source platform

Q: How will this approach scale
A: 80 % quadcoper is high assurance. But we don't even now how many CoL there is in car ? Linux is big, but we can add high assurance code progressively.

Q: How can insure and secure software co-exist ?
A: That's exactly what we are testing with the red team and linux partition.

Q: How do you know that hardware isn't vulnerable ?
A: Our project is big enough, other teams work this question.


Session 1: Domain Specific Languages I (Chair: Anil Madhavapedd)

Building Embedded Systems with Embedded DSLs (Experience Report)
Speaker: Patrick Hickey (Galois Inc)

A big thank you to kathleen

Goal: build a high assurance helicopter (quadcopter)

Challenges: Its an embedded systems, its high real time system (1 millsec), system needs to be safe and secure.
Constrains: 3 engineers and 18 months, similar projects are 50K CoL.

There are billions of embedded systems, the 1 pence cost sensitivity it very high. Basically choice between C, C or C.
Typical secure issues, difficult to push a patch.

Approach: We would love to use Haskell or OCaml, but GC is not compatible with hard real-time. C or C++ (a subsec), a trade-off between safety and developer productivity.

Build your own tools, starting with a clean state language, correct by construction. We built a embedded DSL compiler in just a few month. The language is embedded in Haskell and compiles to C, its OS and called Ivory. Ivory is safe subset of C with memory & type system, no undefined behaviour. No heap, you can only allocate to the stack or global. We embedded in Haskell so we can utilise its type checker. Haskell is ivory macro language.

Modules are composed to expressions and global, there's a global name space.

Tower DSL - for organising whole applications. Build on top off ivory, to combine procedures into applications.

The quadcopter story - we have almost build all code in ivory. The bulk of the code is building drivers, that was about 10 kloc, the application was only 3 kloc. We needed to operate with existing the messaging format, we mangled some python code to generate ivory. Totally about 48 kloc.

Evaluation:
The code was delivered to the red team, and now no undefined/buffer overflows etc.. There exists some broad system level issues.

This was a large complex application, but we build high assurance code very quickly. Embedding DSL in Haskell allowed us to build compositional programs.

Source code is at ivorylang.org

Q: You and OCaml both target ARM, so why not us OCaml
A: We wanted compatibility with niche systems

Q: Did you run out of memory ? how do you communication this to the programmer ?
A: We don't currently have way to determine maximum stack depth, we are working on it.

Q: Correct of construction C code, you mean subset of C right ?
A: Yes, no pointer arithmetic

Q: Array handling
A: All arrays are static length

Q: What about error messages for embedded DSL's ?
A: It tough, and we would love to improve it

Q: How do you stop integer overflow ?
A: Using assertions instead of expressing in the type systems. The user can manually add annotations.

Q: Your re-targeted a python script to generate ivory instead of C, how many bugs did you find ?
A: Not sure, its was more clean slate

Concurrent NetCore: From Policies to Pipelines
Speaker: Cole Schlesinger (Princeton University)

Policy language for describing network policies. A quick intro to basic networking, we traditionally we use distributed protocols such as OSPF but now we have a centralized server to coordinate. We us openflow 1.0, each switch is a simple predicate and action table. There isn't just 1 table, different tables for many difference purposes e.g. different layer. Some switches have reconfigurable pipelines. We are now looking at OpenFlow 2.0 to switch into two phrases: configuration (like flashing router) and population.

We have developed a core calculus for pipelines, we have hardware models for 3 switch architectures and add virtual pipelines to our compiler.

The language:

example using the Broadcom trident switch, each table is a function from a packet to a set of packets

The syntax for an example pipeline (trident switch)
W;(all,{port,vport}) ; x:(dst MAC,port) ; if -(vport=1) then y:(vport,port) else id ; z:(all,all)

The control can connect to many switchers, the programmer can express a virtual pipeline as a minimum for each physical pipeline

example controller
m:(typ, port) + r:(dst MAC, port)

We developed 3 general techniques we designed for "RMT" architecture
1. Consolidate packet duplication
2. Refactor field modification
3. find field size

The syntax for the language, see paper. We are inspired by the frenetic family of languages.

We have added concurrency and thus race conditions, we using typing judgements for read and write permissions on each field.
Our normalisation lemma handles rules such as no overlap between read and write sets.

Q: Your plus operator seems to compile after parallelism
A: Its not compiled away just implicit

 


 

Session 2: Static Analysis (Chair: Ken Friis Larse)

SeLINQ: Tracking Information Across Application-Database Boundaries

Speaker: Daniel Schoepe (Chalmers University of Technology)

Motivation: Information flow control for computation & database queries. We given data to many complex application that we don't trust. Communication bounds better application is weak spot, so we consider end-to-end. Our contribution is end-to-end information flow with type checker and compiles to F#.

Typically we interface with SQL database, if we don't do it properly then there's lots if issues. Now people write code to generate inference code with SQL

Information flow code should into lead private data. Non- interference means private data channels shouldn't interact with public data channels. Language-integrated query using a toy language (subset of F#) with includes meta-programming. Security policies and security levels are expressed in the type system. We can then annotate databases columns, e.g. name is public but age is private.

Low equivalence relation is equivalence taking into consideration the new types for security levels. We can then use this relation to encode non-interference.

The type system is based on flowCaml, by Pottier and Simonet, 2003.

Our type checker and compiler implementation is written in Haskell using BNFC, we support ADT's .

Using a movie rental database as an example, we consider addresses as private but cities as public. We can query for all rental from a city with public but if we then try to get addresses, the type checker will reject this.

Q: Do you support more complex labelling schemes ?
A: Yes, we support lattices

Q: Can we classify fields or rows ?
A: No

Q: The exists function you demonstrated doesn't depend on type of elements, does it ?
A: No, but we can express that

Q: Do you deal with aggregation ?
A: We haven't considered that yet

Type-Based Parametric Analysis of Program Families
Speaker: Sheng Chen (Oregon State University)

A software system is a family of programs with a configuration, making error messages are difficult to read. We start with a program family, select a program and analysis but variational analysis takes a different route. We will look at the variational analysis today (VA):

Programming by search, applying an analysis gives us results R', but we expected R. We can instead introduce variations to data structures. e.g list -> variational list. Then we adapt analysis, create correctness proof and evaluate performance.

There's a lot of very helpful code extracts, syntax definitions etc.. that I didn't have time to copy, see the paper for them.

Q: In your performance results, did you examples fit into the more or less aggressive approach ?
A: They work even in the more aggressive example
We are breaking for lunch so we'll be back at 2:00


Session 3: Binding Structure (Chair: Tarmo Uustal)

Romeo: A System for More Flexible Binding-Safe Programming
Paul Stansifer (Northeastern University); Mitchell Wand (Northeastern University)

Programming is hard. Meta programming allows us to program to program, but I can introducing name issues. Alpha equivalent inputs, should produce alpha equivalent outputs. We have developed a binding safe programming language, with notation to express complex binding. Romeo expresses binding information in types.

Example remeo code:
stx = (Lamda (a) (+ a 1))

Romeo automatically renames a (as its not exposed) to makes sure no names clash.

Again, have a look at the paper for cool code examples

Romeo is a binding safe language that support term languages with complex binding structure

Q: In meta programming we want effects, how do deal with e.g. meta programming for loop invariant code motion?
A: Remeo is side-effect free, effects in an meta programming we'll take this offline

Q: Is this online ?
A: Not yet

Q: This is very similar to lamdba M ?
A: This was your inspiration but its not powerful enough as we have syntax cases

Q: Hygienic macro expansion ?
A: We indirectly mentioned it

Q: Thanks for getting starting with an example
A: Thanks

Q: What's the complexity of expansing remeo code ?
A: potentially n^2, we think we could perform the substitutions lazily to improve this

Q: can we represent practical thing ?
A: We don't have a story for possible racket macro's you could want, some are simple

Maximal Sharing in the Lambda Calculus with letrec
Speaker: Clemens Grabmayer (VU University Amsterdam)

People desire increased sharing, e.g. to reduce code duplication and check equality of unfolding semantics of programs.

3 Steps:
- interpretation - turning lamdba-letrec into lambda term graphs
- bisimulation check & collapse of lambda let graphs -
- readback

Our tool is available on hackage
Possible extensions include adding support to full functional languages.

Q: How is this similar to disiding type equivalent in System F?
A: I will look into this

Q: Is this subsumed by Kleene algebra ?
A: I will look into this


Session 4: Program Optimisation (Chair: John Launchbury)

Practical and Effective Higher-Order Optimizations
Speaker: Lars Bergstrom (Mozilla Research)

Many CFA-based optimizations addressed in other ways, but higher-order inlining is the missing optimization. In practice, high performance functional programs avoid functions, e.g. OCaml compiler example.

What is the concrete function called from some computed function call ? (handled by 0CFA)

We have come up with a new analysis to see when can do higher-ordering inlining. We see if any free variables have been rebound between the closure capture location and the call site.

Technique:
- Normalize and number soure code
- Build control-flow graph
- Add CFA-infromed edges
- Perform rebinding

Q: How can I as a programmer understand when the optimisation will kick in ?
A: If your the compiler writer, yes, otherwise we don't yet have a story for this

Q: What happens with things like List.map and List.map ?
A: This are often special case in the compiler to inline

Worker/Wrapper/Makes It/Faster
Speaker: Jennifer Hackett (University of Nottingham)

Research focus on correctness of optimisations but there's a lot less work on how we check that the transformation is actually an improvement.
Example of adding an accumulator to the List.rev function

The worker wrapper theroms captures induction.

I'm headiing off and direct the reader over to Leo at http://www.syslog.cl.cam.ac.uk/2014/09/01/icfp-2014-day-1/

Filed under: Uncategorized No Comments
1Sep/140

ICFP 2014: Day 1

Leonhard Markert@srg.cl $

This is day 1 of the 19th International Conference on Functional Programming.

Opening by the General Chair, Johan Jeuring

Quick facts about ICFP 2014:

  • 340 people registered for ICFP
  • 466 people registered for any event
  • most people are from Sweden, UK and USA
  • 21 sponsors, $60k in sponsoring

Keynote: Using Formal Methods to Enable More Secure Vehicles -- DARPA's HACMS Program

Kathleen Fisher (Tufts University)

Four and a half years long program, 2012--2017.

Many everyday objects today are computers! Infrastructure, medical devices, refrigerators, cars ... examples given for these things being hacked.

Focuses on cars: "computer on wheel" -- 100s of interconnected computing units. Everything is under control of software: braking, accelerating, locking.

First attack vector was via diagnostic bus but this requires physical control (you need to be in the car). Later found that physical control is not necessary to take over a car (Bluetooth, emergency number, audio CD).

Documentation for car systems exist and spare parts can be ordered online and analyzed (so no air gap, no security by obscurity).

Use the desktop computer approach? -- anti-virus scanning, intrusion detection systems, patching infrastructure. Problem: "security software" tends to introduce vulnerabilities. Also: embedded systems don't have the resources to monitor themselves all the time the way desktop computers do.

SAT solvers are getting faster; this allows us to use them to verify real software and hardware.

Clean-slate methods for high-assurance software: code synthesis, DSLs, Interactive Theorem Prover as PL.

Structure of the HACMS program: technical areas are vehicle experts, OS (building on verified sel4 kernel), control systems, research integration (management, composition: how to put high-assurance components together to create a bigger high-assurance component?, integration), red team (hints at problem of how to assess the security of a system -- currently a heuristic is: how long does it take for hackers to take over the system?).

Air team works on helicopter and quadcopter; ground team works on cars.

Demo video: WiFi-controlled quadcopter being taken over using aircrack-ng.

Every 18 months the red team gets access to the system and tries to find vulnerabilities.

Started with monolithic software, no RTOS, no security. After first iteration: use faster processor, put FreeRTOS and HAL under still monolithic Ardupilot software. Next iterations: generate more and more code from new DSL.

SMACCMCopter 18 months assessment: it flies! Air team can prove system-wide security properties like memory safety, ignoring malformed / non-authenticated messages; all good messages received will reach the motor controller. Red team found no security flaws in sex weeks with full access to source code. "Probably the most secure UAV on the planet" ("kinda scary").

Research integration (Rockwell Collins) used AADL to describe how hardware and software components fit together. Developed tools (AGREE, Resolute) to reason about the system. They are used to check assumptions about the system (example: there is only one way to communicate with the quadcopter).

Control systems (Galois) created Ivory and Tower, open-source EDSLs for synthesizing safe low-level code and tasks, respectively. Designed and built SMACCMCopter, the first high-assurance UAV software, in under two engineer-years.

OS (NICTA) proved full functional correctness of a fast OS microkernel, sel4 (now open source). Also working on eChronos, a high-assurance RTOS product line, and formally verified OS components (drivers, file system etc). CAmkES is a system for connecting and configuring these components.

Vehicle experts (Boeing) integrate HACMS technologies into the Unmanned Little Bird.

Air team in phase two tries to re-architecture (split) the system so it runs on a flight computer and a mission computer as is usual in real world scenarios. Red team will then try to gain control over the flight computer from the mission computer.

More detailed description of individual tasks follows.

Tech transition: a huge problem. The cyber-physical systems industry know they have a cyber-security problem but there are many barriers to adoption of HACMS-like technology.

C: Make it cool! Come up with a catastrophic event!
A: Well this is a response to "you can take over a car wirelessly". Another aspect: it will probably be mandated soon for cars to take to each other -- but remote code execution was never considered.

Q: Why not have quadcores in the car?
A: Very price-sensitive; the industry has tiny margins.

Q: How much does this project cost?
A: $70m.

Q: Research: top-down or bottom-up, which is better?
A: We need both. DARPA is bottom-up, NSF is top-down.

Q: You expect to pay more for high-assurance code. What is the cost? Especially for the second, third, ... project.
A: High-assurance code will always be more expensive. We hope to bring down the additional cost by developing tools that aid building high-assurance code. But have to consider: traditionally we pay people to write sloppy code; the cost from running sloppy code is never properly accounted for! Maybe high-assurance software is cheaper in the long run.

Q: Why did you replace the Arduino processor?
A: We wanted to start with an existing open source hardware platform.

Q: (bit unclear)
A: We're close to having one high-assurance system, the SMACCMCopter. Still far away from having a high-assurance car for example (one problem: no-one knows how many lines of code a normal car has!)

Q: How do you combine secure and unsecure code?
A: That's one of the challenges of course.

Q: How do you know the hardware does what you think it does?
A: In this project, we simply trusted the hardware. Other people are working on hardware issues.

Session 1: Domain Specific Languages I (Chair: Anil Madhavapeddy)

Building Embedded Systems with Embedded DSLs (Experience Report)

Patrick Hickey (presenting), Lee Pike, Trevor Elliott, James Bielman, John Lauchbury (Galois Inc)

Open source: github.com/GaloisInc

Goal: Build a high assurance quadcopter controller (long-term: helicopter controller). Small embedded system (microcontroller), hard real time, safe, secure. Had three engineers, 18 months, starting from scratch. Expected to write ~50kloc C/C++ code.

Embedded systems are everywhere. They're basically computers from the 80s: small and cheap. Development tools are from the 80s too (C). You get all the security flaws you'd expect -- buffer overflows etc. Can't push patches, and there are more attack surfaces than ever.

Approach. Haskell, OCaml? Problem: very resource limited system; GC imcompatible with hard real-time. C, C++? Tradeoff between safety and productivity -- e.g. NASA Jet Propulsion Lab writes high assurance C; extremely costly.

Alternative: build your own tools. Start with clean slate, use language approach, correct by construction. Built an embedded DSL in just a few months: Ivory, embedded in Haskell, compiles to safe subset of C. It's memory safe and allows no undefined or implementation defined behavior. No heap, only global and stack allocation.

Haskell type system guarantees properties of Ivory language programs; Haskell is Ivory's macro language.

Ivory: expressions are pure Haskell values; statements are effectful, embedded in monad. Examples for what has been embedded in the type system from the paper are given.

Ivory programs (modules) are a collection of top level C procedures and variables.

Tower DSL: organising entire applications. Composes Ivory procedures into applications. Initial problem: multithreading with message passing; code generation of low level primitives. Tower is implemented as a Haskell macro that generates Ivory programs. Tower contains concept of Tasks and Channels. It became the DSL describing software components.

(Side story: a third-party Python tool generating C output was modified to output Ivory code as well; this code could then be type checked to make sure it was correct. This way, the Python tool didn't need to be trusted.)

SMACCMPilot: 10kloc (and growing) of drivers, 3kloc application (stabilizing the quadcopter and flying), 10kloc message marshaling. Generates about 48kloc of correct-by-construction C.

Red team results: found no buffer overflow, no undefined behaviours, no denial of service. After five months they found two subtle architecture level bugs. Lifted the layer of abstraction: no need to worry about low-level problems, can concentrate on architecture.

Conclusion. Quickly and cheaply built a large, complex application; caught lots of errors early by Haskell type checking, could focus on application design; used Haskell as a macro system making it possible to build compositional programs.

Q: How is this an improvement over the OCaml code generator?
A: Cross-compilation was a specific concern, making sure the code runs on 8-bit, 16-bit, 32-bit architectures.

Q: Restricted amount of memory: how did you deal with that?
A: Don't yet have a way to assess the maximum stack use of a program. But we think it is solvable.

Q: You construct correct-by-construction code that is a subset of C, e.g. you don't do pointer arithmetic.
A: Yes.

Q: Do all arrays have a size known at compile time?
A: Yes.

Q: Type error messages were a problem?
A: Yes.

Q: You don't allow undefined behavior. How do you prevent overflows?
A: This is not done at the type level; instead we insert assertions in the C code (at the moment ~2500 assertions). Hope to analyze and discharge these assertions statically in the future.

Q: How about Rust?
A: Big fan. Rust was in its infancy when the project started two years ago. But when using Haskell we get type checking for our macros.

Concurrent NetCore: From Policies to Pipelines

Cole Schlesinger (presenting), Michael Greenberg, David Walker (Princeton University)

Review on networking: have an enterprise network with hosts, connected to a switch. In a SDN, the controller tells the switch which rules to use to deal with packets. Openflow 1.0 uses prioritised rules for this. Has predicates and actions. But switch hardware is not that simple: lots of tables in a switch pipeline. New configurable designs available now.

Openflow 2.0: splits interaction between controller and switch into two phases: 1. configuration (think flashing an FPGA); 2. population of the tables.

Contributions: core calculus for packet processing pipelines and controller policies; hardware models for three switch architectures; compilation algorithms for deploying general controller polices to target switch architectures.

Example: Broadcom "trident" switch. Each component can be thought of as function from packets sets of packets. (empty: drop, singleton: forward, multiple: e.g. flooding).

Have forwarding table; sequencing; if-statements over predicates; rewrite component.

Intel "flexpipe" supports reconfigurability. Each table can match on and write to every field. Also supports concurrency; introduce concurrency operator to model this.

What about the controller? Easiest for programmer to define virtual pipeline that describes the requirements.

New operator: packet duplication -- copy and run packet through multiple components.

Example: compilation to the barefoot "RMT" architecture -- three passes: 1. consolidate packet duplication (output ports must be known at the time of packet duplication, also need to add metadata to model processing as it would have taken place in the virtual pipeline); 2. refactor field modification; 3. place generated tables into physical tables (use commutativity of concurrency and other tricks -- the algorithm is a combination of brute force and dynamic programming).

Metatheory. Predicates form a Boolean algebra; grammar given for Predicates and Policies. Have a type system and small-step semantics.

Typing judgement of the concurrency operator prevents race conditions. Can prove normalization and confluence, in combination giving strong normalization (determinacy). Adequacy can be shown too.

Future work: implement the compiler and transformer on a real chipset.

Q: Does your system support encryption of packets inside the network, like IPSEC?
A: Yes and no -- you could think of encryption as an atomic action but it has not been fully worked out.

Q: Plus operator?
A: Plus operator is duplication. Semantics: both packets are sequentially processed by the remaining pipeline.

Session 2: Static Analysis (Chair: Ken Friis Larse)

SeLINQ: Tracking Information Across Application-Database Boundaries

Daniel Schoepe, Daniel Hedin, Andrei Sabelfeld (Chalmers University of Technology)

Motivation: confidential data is entrusted to many application which are composed of many components. Information leakage often occurs at boundaries, e.g. SQL injections. Need to look at confidentiality as end-to-end security to prevent attaches.

Contributions: track information flow for code using database queries. Security type system ensures noninterference and proof of soundness. Compiles to executable F# code; realistic demo.

Approach: Traditionally, attacks often occur at component boundary, e.g. SQL injections. Tierless approach: correct communication across boundary ensured at compile-time. Example: write everything in F#, some of which compiles to SQL.

Intuition: untrusted program shouldn't leak private data to third parties. Need to track flow of information through the program.

Noninterference: private inputs must not influence public outputs. Attacker doesn't learn anything about secrets by observing public behavior of the program.

Language-integrated query -- build queries via meta-programming. Example: LINQ in F#. Guarantees user input is sanitized and SQL is syntactically correct.

Idea: types annotated with security levels (L for public, H for private).

Security policy specified by giving a typing to each database. Code example given.

Introduce low-equivalence relation for values that are indistinguishable for the attacker. Defined structurally based on type; components with level L have to be equal.

Each secure program should be noninterfering with itself.

Security type system has soundness: if program is well typed then it is noninterfering.

Implementation: type checker (based on constraint solving and unification)and compiler implemented in Haskell using BNFC. Language compiles to executable F# code; compilation consists mostly of removing security types. Type system and soundness proofs allow user-defined algebraic datatypes.

Case study: near-realistic movie rental database. Want to protect names and exact addresses of customers and staff. Examples given of valid and invalid queries.

Summary: light-weight framework for tracking information flow; security type system ensures noninterference; proof-of-concept implementation given; extension to ADTs; case study.

Future work: extend to client code (JS).

Q: Can I make the security policy express things like "only certain people should be able to access this"
A: No, can't do this at the moment.

Q: Could this be embedded into a computational framework where one could assist the system in showing that some access is in fact safe when the type system's approximation cannot show it?
A: That would be an interesting extension.

Q: Do you deal with aggregation?
A: This has not been considered yet.

Type-Based Parametric Analysis of Program Families

Sheng Chen (presenting), Martin Erwig (Oregon State University)

Can often think of programs as "program families" which by configuration can be refined to single programs. But build & install may fail in unexpected ways. Trying all combinations of options is infeasible.

Idea: use variational analysis directly on the program family / variational program. Introduce a framework for doing variational analysis in the type system.

Notation for choice in variational programs introduced.

Program by search.

Creating variational analyses: 1. add variation data structures, e.g. lists become variational lists (VL, variation in vales and list structures); 2. adapt analysis, e.g. analysis of sort becomes analysis of VL sort; 3. correctness proof (tedious); 4. performance analysis

Example: variational typing rules shown.

This framework consists of annotations (sets), types and constraints. All three have extension points. Application and Choice typing rules are shown.

A constraint solving example is shown involving merging and splitting.

Instantiating the framework for variational 0CFA: no extensions required for annotations, types and constraints but rules need to be extended.

Correctness evaluation of the framework. Implemented Flyweight Java version of the framework and proved it correct.

Performance evaluation shows that the framework performs almost as good as manually lifted 0CFA analysis.

Q: Is the aggressive (faster) approach powerful enough to check types?
A: Yes.

Q: Have you tried to do effect analysis in this framework?
A: The framework should be expressive enough to do this but it hasn't been done yet.

Session 3: Binding Structure (Chair: Tarmo Uustal)

Romeo: A System for More Flexible Binding-Safe Programming

Paul Stansifer (presenting), Mitchell Wand (Northeastern University)

Introduction: metaprogramming allows us to automatically generate programs. Name binding is a problem -- example given. Recent survey found that 8 out of 9 metaprogramming systems had name binding problems.

Binding safety: alpha-equivalent inputs should produce alpha-equivalent outputs. In theory this works: FreshML, MetaML. But these only support simple binding forms like λ. Real languages have more than λ, e.g. let* in Racket, "local" construct in SML.

Contributions: a binding-safe PL, Romeo, with a notation to express complex binding, clear big-step semantics, proof of correctness.

Explains the concepts of "imports" and "exports" in the context of name binding.

In Romeo, binding information is in the types.

What makes two let*s alpha-equivalent? Follow-up: when are two Romeo declarations alpha-equivalent? -- must have exactly the same behavior. Thus their exported binders must be the same.

Romeo's runtime enforces safety. It automatically avoids name clashes. Accidentally unbound names are a dynamic error ... which Romeo's optional deduction system can statically rule out.

Conclusion: Romeo is a binding-safe language that supports term languages with complex binding structure.

Q: What about de Bruijn indices? Effects in the metalanguage?
A: Romeo is side-effect free.

Q: Can I play with Romeo?
A: I'll try to convince my supervisor to release the source code. Currently the implementation is out of sync with the formalism.

Q: This looks similar to λ-m.
A: Lambda-m is not powerful enough. We give you syntax case.

C: Thanks for starting with a concrete example.
A: Thanks to my peers for the criticism!

Q: How does hygienic macro expansion relate to this?
A: If your macro gets larger and larger, there is less safety -- hygienic macros are all about protecting macros from each other.

Q: Complexity of expanding Romeo code?
A: O squared, so that's not good. Macro systems have the same problem though. Trick: perform actual substitution lazily, and we think the same technique could be used with Romeo also.

Q: How much expressiveness does Romeo have?
A: Can we express all Racket macros -- probably, we think most are just simple extensions of what we have right now.

Maximal Sharing in the Lambda Calculus with letrec

Clemens Grabmayer (presenting, VU University Amsterdam), Jan Rochel (Universiteit Utrecht)

Motivation: increasing sharing is desirable -- compact code, avoid duplicated work at run-time, enables us to check equality of unfolding semantics of programs.

1. interpretation of λ-letrec terms as term graphs; 2. bisimilarity and bisimulation collapse of λ term graphs; 3. readback of λ term graphs.

This method extends common subexpression elimination and is targeted at maximizing sharing statically.

Example of unfolding equivalence of fixpoint operator.

Example of graph interpretation given. First we generate a λ higher order term graphs. Transform into first-order term graph with scope vertices with backlinks, then into λ term graph.

Then we do a bisimulation check between λ term graphs.

The class of eager-scope λ term graphs is closed under functional bisimilarity.

Implementation: "maxsharing" on Hackage, uses Utrecht University Attribute Grammar Compiler.

Time complexity analysis of the individual steps follows.

Possible extensions: support for full functional languages (would require work on a core language with constructors and case statements); prevent space leaks caused by disadvantageous sharing.

Applications: use maximal sharing at run-time, possibly directly on supercombinator graphs; can be coupled with GC; code improvement (statically) to detect code duplication; checking function equivalence (relevant for proof assistants, theorem provers, dependently-typed PLs).

Q: Explicit weakening. Have you tried to implement reduction of the first order term graph?
A: It's possible. Used port graphs to implement reduction because that seemed easier.

Q: Does this subsume Kleene algebra?
A: Maybe? Interesting idea.

Session 4: Program Optimisation (Chair: John Launchbury)

Practical and Effective Higher-Order Optimizations

Lars Bergstrom (presenting, Mozilla Research), Matthew Fluet (Rochester Institute of Technology), John Reppy University of Chicago), Nora Sandler (University of Chicago), Matthew Le (Rochester Institute of Technology)

Control flow analysis (CFA) allows higher-order inlining. In practice, high performance function programs avoid functions.

Two subproblems: 1. what is the concrete function called from some computed function call? -- already handled by 0CFA; 2. does the potential inlining point have a compatible environment? -- "reflow", JITs.

Contribution: determine when higher-order inlining is safe. Key insight: don't check "is the environment the same?" but "could any of the free variables have been rebound between the closure capture location and the call site / potential inlining point?". This supersedes ad-hoc higher-order inlining tricks.

Technique: 1. normalize and number source code (annotations for variable binding sites); 2. build control-flow graph; 3. add CFA-informed edges; 4. Perform rebinding search on paths between closure capture location and potential inlining point

Safe and unsafe example given; control flow graphs shown.

Reflow analysis is strictly stronger. But this analysis has about 3% time cost of the entire compilation process.

Conclusion: start using closures in benchmarks!

Higher-order inlining heuristics still missing; correctness proof still incomplete.

Q (SPJ): You showed that the analysis is cheap but what do we get?
A: Benchmarks are in the paper.

Q: If I want to use this: is it clear for me where the optimization will kick in?
A: Well, yes, if you're the compiler writer (laughs from the audience). But this is a problem indeed.

Q: I usually pass closures to "map" and "fold", so these will have many call sites. Can this still be optimized in this framework?
A: "map" and "fold" are usually handled in a special way by the compiler anyway.

Q: Is it imaginable to prove that the optimization actually makes the program run faster?
A: Next presentation will try to answer this. But it's very difficult.

Q: What's the relationship to Milton's optimizations?
A: Not entirely sure. Is defunctionalization or CFA-style analysis better? An interesting question.

Worker/Wrapper/Makes It/Faster

Jennifer Hackett (presenting), Graham Hutton (University of Nottingham)

Context: lots of research into optimization focuses on correctness: is meaning preserved?; comparatively little work on improvement -- ensuring programs become "better".

Example: list reversal function. How can we prove that the improved program is both faithful to the original version and also more efficient?

The Worker/Wrapper transformation: Original program has type A, worker has type B, wrapper has type B -> A.

Formally, given

f : A -> A, g : B -> B, abs : B -> A, rep : A -> B

and

abs . rep = id, g . rep = rep . f

then

fix f = abs (fix g)

Reasoning about efficiency. Need some way to compare the cost of two programs. Naive approach: count steps taken to evaluate terms to some normal form. But this fails in a lazy context! Need precongruence.

Precongruence: a preorder and a congruence -- this excludes negative interaction between improvements at different points in the program.

Improvement theory: counting steps, also have a notion of improvement within a constant factor.

Improvement induction principle shown. "tick" idea. It's like guarded coinduction.

Shows the Worker/Wrapper theorem for improvement theory (typeless).

Structure of improvement proofs is essentially the same as that of correctness proofs.

Conclusions: we can make rigorous arguments about the efficiency of lazy programs; make performance guarantees for a general-purpose optimization.

Further work: structured recursion; turn it into a typed theory (advantage: only need to worry about contexts where the types make sense); look at improvement of space usage and other resources; quantify improvement (how much faster? which optimization should be applied if there are alternatives?); automate it.

Q: Do you have completeness results?
A: It's not clear whether Worker/Wrapper improvement proofs can be translated from fixpoint style to improvement theory.

Q: Do you have any examples where Worker/Wrapper transformation makes things slower?
A: Of course we have come across these, but we wouldn't put that in the paper! (Laughs from the audience). [Tree reversal example follows]

Q: Problem in GHC is the cost of building closures. How is this reflected in your framework?
A: Read early papers on improvement theory.

Q: How about local improvements?
A: This work does not look at the surroundings.

Q: Would things change dramatically if you applied this to a call by value language?
A: There has been some work on that.

Session 5: Context Dependence (Chair: Yukiyoshi Kameyama)

Compositional Semantics for Composable Continuations: From Abortive to Delimited Control

Paul Downen, Zena M. Ariola (University of Oregon)

Classical control: callcc is the classic control operator, going back to Scheme. Corresponds to classical logic (the way the λ-calculus corresponds to intuitionist logic).

Delimited control: delimit the scope of effects; continuations compose like functions. These little changes make delimited control vastly more expressive than classical control.

λ-calculus + callcc operator and λμ-calculus are equivalent. Relaxing λμ syntactically gives Λμ which allows delimited control! Λμ in turn is equivalent to lambda + shift0 + reset0.

Classical control in detail. Operational semantics of callcc as an extension of CBV lambda-calculus given. Alternatively, we can understand callcc using an equational theory. But: it's weaker than operational semantics. Some programs can be evaluated to a value but equational theory for callcc cannot reach a value! How can we know that we have the whole context?

Calling a continuation never returns -- it "jumps". A jump is the same when inside a larger evaluation context so it delimits the usable extent of a continuation.

λμ syntactically distinguishes jumps as "commands".

Relaxing the syntax. Λμ collapses term/command distinction. It is interpreted with the same rules only that we have more expressive meta-variables. No new constructs, no new rules. As typed calculi, Λμ is considered equivalent to Parigot's λμ. What happens if you ignore types?

shift and reset: common basis for delimited control. Continuations return and can be composed like functions.

λ-calculus + shift + reset is a true subset of Λμ. What's missing?

Difference between shift and shift0: the latter removes its surrounding delimiter.

λ-calculus + shift0 + reset0 is equivalent to Λμ.

Λμ as a framework for delimited control gives provable observational guarantees about the operators, e.g. idempotency of reset.

More in the paper: can get nice reasoning principles using call-by-value, call-by-name, call-by-need; equational correspondence with compositional transformations.

Q: What do you mean by equality in this presentation?
A: Observational equality -- this is about tools for reasoning about programs rather than expressivity of the calculi.

Q: λμ has a well known type system but to introduce delimited control you had to relax the syntax. Is there a type system for this?
A: (?)

Q: Classical logic, intuitionist logic -- delimited logic?
A: YES! Should investigate this.

Coeffects: A Calculus of Context-Dependent Computation

Tomas Petricek (presenting), Dominic Orchard, Alan Mycroft (University of Cambridge)

Coeffects is the dual of Effects. Done!

This talk is about the context in a type judgement.

Effect systems: Γ :- e : τ & σ
Coeffect : Γ & x :- e : τ

Interesting properties of Γ ("view from the extreme left"): variable related -- liveness, bounded linear logic, data-flow programming, provenance; environment related -- implicit parameters, type classes, distributed computing, platform versioning.

Coeffect calculus for bounded reuse: annotate each variable in the environment with the number of its uses in the expression, essentially tracking the number of variable uses.

Variable, abstraction and contraction rules shown.

Another possibility: coeffect calculus with historical values. Replace addition with max operation in the contraction rule.

Environment related: coeffect calculus for implicit parameters. Example: implicit parameter for time zone and time. We can calculate how many parameters are still required (i.e. not given by the current environment). The corresponding rule is non-deterministic.

What's in the paper? Unified system: coeffect scalar structure (a generalized semiring structure) and shape-indexed coeffects (context splitting and merging, per-variable or whole-context). Also have monadic and comonadic semantics -- not simple as λ-calculus is asymmetric (many to one)!

Why do coeffects matter? Generalize interesting systems (liveness, data-flow, implicit parameters, type classes) and (indexed) comonads are cool.

Q: What happens if you have (co)monads in the classical calculus?
A: Interesting question.

Program Chair's Report

Quick paper stats:

28 accepted papers out of 97 submissions.

2/9 functional pearls and 2/3 experience reports accepted.

299 reviews -- at least three per paper.

72 hour author response period (as usual).

It was decided not to allow papers from members of the program committee.

Comparison with previous years: number of papers approximately stable, share of accepted papers even more so.

Functional pearls are accepted less than normal papers.

Author's location: USA, UK, France, Japan, Netherlands, Sweden, Germany, other.

UK has highest acceptance rate, then USA.

Papers submitted only a couple of hours before the deadline were less likely to be accepted -- before that there seems to be no correlation between submission time and acceptance probability.

Big thanks to the program committee!

Deputy Lord Mayer of Gothenburg: invitation for drinks, short history of the city.

31Aug/140

ICFP 2014: Day 0 (WGP)

Leonhard Markert@srg.cl $

The 19th International Conference on Functional Programming will begin tomorrow. Two affiliated events, the 10th Workshop on Generic Programming (WGP) and the Workshop on Higher-Order Programming with Effects (HOPE), take place today.

WGP Session 1

Invited talk: Functional Programming, Object-Oriented Programming and Algebras

Bruno Oliveira (University of Hong Kong)

How to achieve modularity, type-safety and reuse without requiring a PhD in type theory? Bruno Oliveira proposes algebras as an alternative to algebraic datatypes and OO hierarchies, more specifically: variants of F-Algebras. Fold Algebras (using products of functions) and F-Algebras (using sums of products) are isomorphic.

An example of encoding generalized algebraic datatypes (GADTs) using type classes in a way similar to Church encodings due to Hinze from 2003, when there were no GADTs in Haskell.

Similarly, in OO, Church encodings were used to model the visitor pattern.

The Expression Problem: a problem of modularity. Given an existing type of expressions, how do you extend it (e.g. how do you extend data Exp = Lit Int | Add Exp Exp with multiplication)?

Solution 1: encode algebras as type classes, extend using subclasses.

Solution 2: use F-Algebras (Datatypes à la Carte)

In OO languages, can also solve the Expression Problem with algebras: use interface inheritance.

All current solutions using algebras still require heavy encodings. In the future: programming language support for algebras?

WGP Session 2

Generic Constructors and Eliminators from Descriptions

Larry Diehl (presenting), Tim Sheard (Portland State University)

Abstract: Dependently typed languages with an “open” type theory introduce new datatypes using an axiomatic approach. Each new datatype introduces axioms for constructing values of the datatype, and an elimination axiom (which we call the standard eliminator) for consuming such values. In a “closed” type theory a single introduction rule primitive and a single elimination rule primitive can be used for all datatypes, without adding axioms to the theory.

Abstract continued: We review a closed type theory, specified as an A GDA program, that uses descriptions for datatype construction. Descriptions make datatype definitions first class values, but writing programs using such datatypes requires low-level understanding of how the datatypes are encoded in terms of descriptions. In this work we derive constructors and standard eliminators, by defining generic functions parameterized by a description. Our generic type theory constructions are defined as generic wrappers around the closed type theory primitives, which are themselves generic functions in the A GDA model. Thus, we allow users to write programs in the model without understanding the details of the description-based encoding of datatypes, by using open type theory constructions as an internal domain-specific language (IDSL).

Ornaments in Practice

Thomas Williams (presenting), Pierre-Évariste Dagand, Didier Rémy (INRIA)

We often write very similar functions for structurally similar data types, e.g. "add" for natural numbers and "append" for lists. Natural numbers and lists have the same recursive structure and a coherent property, a "length" attribute:

add (length ml) (length nl) = length (append ml nl)

A simple projection function exists from lists to natural numbers: the "length" function.

The motivation for this paper was to add ornaments to ML. Naturals/lists example: an ornament is defined by a projection function from the ornamented type to the bare type, e.g. "length". Declare an ornament as

ornament from length : a list -> nat

The system checks whether recursive structure is preserved.

Syntactic lifting is how "append" can be derived from "add" almost automatically. Given

let rec add m n = match m with
  | Z -> n
  | S m' -> S (add m' n)

and the lifting expression

let lifting append from add with
  {length} -> {length} -> {length}

the system can derive

let rec append ml nl = match ml with
  | Nil -> nl
  | Cons(x, ml') -> Cons(?, append ml' nl)

But the ? is ambiguous (it's the "creative part") and needs to be completed manually, by code inference (heuristically) or a "patch".

A patch would look like

let append from add
  with {length} -> {length} -> {length}
  patch fun _ -> match _ with Cons(x, _) -> Cons({x}, _)

Applications of this: for refactoring (see question further down), or to derive an evaluation function (uniquely) from "conv", a bijective function between expression types.

More complex data structures can be lifted too (e.g. Sets and Maps); so can higher-order functions. Lifting for GADTs is also possible and is automatic for some invariants given the expected type of the function.

Conclusions: describing ornaments by projection is a good fit for ML, syntactic lifting gives good predictable results. Future work: better patches, integration into ML, combining ornaments.

Q: Projections seem very similar to abstraction functions in abstract interpretation.
A: Could be seen as abstract interpretation.

Q: Is this weaker than what was presented in the 2012 paper? There you could compute ornaments out of indices.
A: Indices flow in the other direction in the presented approach.

Q: Refactoring example. Normally want to throw old code away and keep new code. Here we get some conversion between old and new.
A: Yes, could remove old code.

Q: How about code maintenance?
A: Patches are part of the answer.

Q: Can the patch language also talk about the logical parts?
A: That would not be easy.

Type Inference for the Spine View of Data

Matthew Roberts (presenting), Tony Sloane (Macquarie University)

Context: rho-Calculus (theory of term rewriting), Pattern Calculus and bondi, Stratego, "Scrap your Boilerplate" (SYB paper), kiama, Reloaded/Revolutions.

Motivation: precisely characterize the type inference machinery needed to support the spine view of data.

Result: need to add "ispair" expression (see further down) to first-class polymorphism (FCP), which is a small extension of Hindley-Milner.

Fully applied constructor view means data equals constructor with all arguments (fully applied). When arguments are missing this is a function, so we can't pattern match against it.

Spine view only knows nodes of arity two or zero. This means that all pattern matching can be done using

ispair d bind (x, y) in e1 else e2

which is morally equivalent to

case c of o x y -> e1
          _     -> e2

Here x refers to the constructor and all its arguments except the last one, and y is the last argument.

Examples: generic map and fold.

Implementations: the "decidedly generic compiler" dgen and "kiama".

Q: Why is complete type inference so important?
A: The semantics of doing spine view with only one construct took many attempts; this one allows for type inference so it was chosen over the others.

Q: Have you tried to use FCP on other spine view representations?
A: With FCP, you trade off type annotations with data annotations.

WGP Session 3

First-class Isomorphic Specialization by Staged Evaluation

Alexander Slesarenko (presenting), Alexander Filippov, Alexey Romanov (Huawei Technologies)

There are three types of program specialization: partial evaluation, supercompilation and isomorphic specialization. Idea: we have A, B, P such that P : A -> B; also A', B' which are isomorphic to A and B, respectively, and P' : A' -> B' which can be computed more efficiently than P. Potentially there are multiple options for A', B' and P'. How can we use P' on A and B? (A and B: 'domain' data; A', B': 'core' data)

The example used is matrix-vector multiplication. Matrices, for example, can be represented as dense (vector of vectors), This is implemented as an EDSL (embedded domain specific language) in Scala. Here we can abstract over dense and sparse vectors and matrices. The specialization can take place at runtime, and we can allow for alternative specialized versions. Specialization can lead to faster code.

Generic composition of isomorphisms -- idea: build isomorphisms for each constructor in the core language. These isomorphisms are first class citizens of the framework.

How it works: staged method invocation with graphs.

Take-home agenda:

  1. Develop algorithm in EDSL
  2. Identify isomorphic representations in the domain language
  3. Implement ADTs using a core language and isomorphic representations
  4. Generate representation-specific implementations in the core language

Implementation can be found at github.com/scalan.

Q: How did you proceed? Paper includes theory and implementation notes
A: Original idea was to use graphs, that was the starting point

Q: You talk about large performance increases compared to original Scala implementation. Have you compared with C/C++?
A: No.

Algebraic Effects and Effect Handlers for Idioms and Arrows

Sam Lindley (University of Edinburgh)

Idea due to Plotkin: algebraic effects describe abstract computations. Abstract computations are trees (modulo equations).

Monad trees are unrestricted abstract computation trees (dynamic control and data flow). Arrow trees are monad trees where only static control flow is allowed (static control flow, dynamic data flow). Idiom trees (idiom = applicative functor) are arrow trees where only static data flow is allowed (static control and data flow).

Examples are given for each monad, arrow, and idiom; the control flow and data flow trees show the dynamic / static nature.

Static computations are useful as they allow optimisations and low-level implementations (arrows correspond approximately to circuits, for example).

Key contribution: using Flow Effects, we can write monad/arrow/idiom code in a single style -- this is very different from Haskell, where different notations exist for each.

A handler is an interpreter for an abstract computation. It is defined as a fold over a computation tree, specifying how return values and operations are interpreted. Given a monad handler, we can derive an arrow and idiom handler.

Q: Can it be inferred from a Flow Effects expression whether the computation it defines is a monad, an arrow or an idiom?
A: Yes.

Q: What's the status of the Flow Effects syntax?
A: Not finalised yet -- the syntax shown in the paper is a "mathematical" syntax with braces, indices etc.

A short discussion followed on whether a unifying notation is a good idea -- maybe it is a good thing that we can tell from the syntax whether a computation is a monad, an arrow or an idiom?

Scoping Rules on a Platter -- A Framework for Understanding and Specifying Name Binding

Larisse Voufo, Marcin Zalewski (presenting), Andrew Lumsdaine (Indiana University)

Name binding: for a given reference (use), where is the corresponding declaration (what it refers to)?

In order to understand and unambiguously define name binding rules, a number of name binding combinators has been developed.

Scope combinators are: hiding, merging, opening, weak hiding.

Together they provide a language that describes how scopes are combined to look up a reference. These are then used to describe how a specific language does name binding. The example is C++: the standard devotes over 60 pages in total to name binding rules (and it turns out both Clang and GCC get operator resolution wrong).

Discussant: Ilya Sergey

Q: Could we use attribute grammars for name binding?
A: Finding a grammar for C++ has been attempted many times... It's hard to take a principled approach and applying to C++.

Q: Did you find inconsistencies in the C++ standard when you tried to collect the scoping rules?
A: No, to our surprise -- it's complex but consistent.

Q: Did you come up with any ideas to simplify the rules along the way?
A: There don't seem to be any ways to simplify the rules (except using a different language).

WGP Session 4

Composing and Decomposing Data Types -- A Closed Type Families Implementation of Data Types à la Carte

Patrick Bahr (University of Copenhagen)

This paper was inspired by the recent addition of closed type families to the Glasgow Haskell Compiler (GHC). Closed type families were used to implement Data Types à la Carte, specifically the subtyping constraint :<:

Data Types à la Carte -- idea: decompose data types into two-level types. Express recursive data types as the fixpoint of a functor. These functors can be combined by coproduct construction :+:.

Example:

-- recursive data type
data Exp = Val Int | Add Exp Exp

-- as fixpoint of a functor
data Arith a = Val Int | Add a a
type Exp = Fix Arith

-- combine by coproduct construction
data Mul a = Mul a a
type Exp' = Fix (Arith :+: Mul)

The subtyping constraint is easily expressed as a type class with an injection function inj and a projection function prj. But this approach treats :+: asymmetrically, doesn't inspect the left-hand side and is ambiguous.

Contribution of this paper: a re-implementation of :<: which behaves as expected (f :<: g <==> there exists a unique injection from f to g), avoids ambiguous subtyping (by rejecting multiple occurrences of signatures) and allows expressing isomorphism.

Implementation: Have a type-level function Embed taking signatures f, g as arguments. Produces proof object p for f :<<: g -- a relaxed version of :<: allowing ambiguity -- and checks whether p proves f :<: g. We can then derive implementations of inj and prj using the proof object as an oracle in instance declarations.

The code for Embed, Sub and helper type classes is shown. It's all contained in the "compdata" package.

Caveats: unhelpful error messages; compile time performance is unpredictable.

Discussant: Wouter Swierstra

With this approach we gain lots of expressive power, e.g. control over how the search proceeds, at the price of more type level programming.

C: A solution for the error messages apparently exists. Also, Idris does this really nicely (it's also much more complicated).

Q: Motivation: want to use sums of products. Interesting tradeoff by maybe focusing on coproducts of products?
A: ?

Q: Would instance chains have solved your problem?
A: Yes.

Q: All this type hackery for deriving injections and projections -- how about just providing them every time?
A: Of course you can do that, but it's inconvenient and obvious from the type. If you need more flexibility, you can still do it manually.

Q: No canonical ordering for injection (eg sum of int and bool) -- is this a problem?
A: Yes, at the moment you have to annotate.

True Sums of Products

Edsko de Vries, Andres Löh (presenting) (Well-Typed LLP)

A new approach to datatype-generic programming in Haskell; notable features: faithful representation of datatypes as n-ary sums of n-ary products; a library of high-level combinators to encourage concise and reusable functions; separation of metadata in the representation.

Available on Hackage as "generics-sop". "basic-sop", "lens-sop" and "json-sop" provide examples, generic lenses and generic JSON (de)serialization respectively.

Motivation was to improve on existing models: GHC.Generics representation is too constrained (binary sums and products) but at the same time too flexible (arbitrary nesting). Also has noisy metadata everywhere. Functions end up making implicit assumptions about the structure of the data which is not guaranteed by the type system.

Discussant: Patrik Janssen

Q: What kind of code does this generate?
A: Not looked at yet. But lots of implicit arguments are potentially being passed around at runtime.

Q: Can this be done better in Agda?
A: Library was completely developed in Haskell. Can probably be done more nicely in Agda, but in this case the development was driven by an actual need to deal with certain generics problems in Haskell.

Q: n-ary lists are left-biased: don't you run into similar performance problems as with binary representation?
A: List-based representation gives uniform encoding for all constructors. But probably not ideal for data types with thousands of constructors.

19Jun/140

Ionel Gog receives Google Fellowship in Distributed Systems

Every year, Google award a number of PhD fellowship in Computer Science. These highly competitive awards support "some of the most outstanding graduate researchers in computer science across the globe" (Google). The SRG is fortunate to have Ionel Gog, a second-year PhD student, receiving the 2014 Google Europe Fellowship in Distributed Systems.

Ionel works on distributed data centre infrastructure for "big data" processing: his most recent project is the Circe workflow manager, which dynamically maps high-level declarative workflow descriptions to a range of different data processing systems -- such as Hadoop, Spark, GraphChi, PowerGraph or Naiad, -- choosing the best system automatically. Ionel also contributes to the R2D2 low-latency network interconnect, the Firmament cluster scheduler and the DIOS operating system as part of the SRG's CamSaS initiative for building a scalable systems software stack.

Congratulations, Ionel!

11Dec/130

Live-blog from CoNEXT 2013 – day 2

aa535@srg.cl $

Good morning! Today is the second day of CoNEXT 2013 and the sessions today are about trains, wires, tools and falling water. I will be blogging some of the sessions again today, and as before the official conference liveblog is on layer9.org.

10Dec/130

Live-blog from CoNEXT 2013 – day 1

aa535@srg.cl $

Kicking off the conference with some explanations about how the TPC selected the papers. The program looks great and I will be covering some of the sessions here over the next three days.

26Nov/130

misled by opportunistic routing…

Jon Crowcroft@srg.cl $

so one of the motivations that led me to think about all the Haggle stuff was the Milgram work. THis seemed to imply, as do pandemics, that a series of chance encounters could, over lots of iterations, provide end-to-end communications reliably.

10 years and 100 papers and algorithms later, it isn't looking so rosy - basically, pretty much every clever scheme has a delivery probability that is quite dismal, even if you set the lifetime of packets/bundles to be many days!

of course, we claim there are other reasons to build such a system (we always did - including reality mining style apps and serendipity)...

but the real problem is misunderstanding the result that Milgram got in his six-degrees work, which made use of the postal network - this isn't an opportunistic network - it is an unbelievably reliable aggregator and schedules delivery between pretty much anywhere on the planet mostly on a near daily basis. ANd the cunning people in the postal service historically provide us with a hint on how to apply this in opportunistic networking..

Indeed, if you go back to the distant past - read about Darwin's voyages on the Beagle (or, perhaps more fun, go see the movie Master and Commander) you'll see that any and all ships going around the world were tasked with the duty of delivering and picking up the post. Without deploying a specific scheduled post boat, instead, all boats going the right way were enlisted as the delivery channel.

Thus, instead of contacts between known friends/acquaintances being required in a successful chain, the weak ties are linked across distances by an early internet - but the early internet was, itself, spatially opportunistic (but "packets" were routed over it in aggregate sacks-worth to the right destination sets).

SO we need to build this (actually, back in Intel research cambridge days, I did propose putting a bluetooth device into the lining of all envelopes and postal bags as I guessed that people walking past the post room in each lab would "drop" and "pickup" their post via the bag, and that would achieve this miraculous shortcut that we've been looking for (without admitting defeat and going back to using the interweb:( )

So we need to data-mine people's use of google maps (i.e. equivalent of lifting their Satnav planned routes) and then use those people who've looked up a place (and maybe mentioned it in a tweet or other social media) to act as ferries (as per georgia tech term) Or mules, or, errr, unwitting post-people....to act as the glue to get us back to the reliable delivery (sure, it still will involve physical travel rather than light-speed photonic networking, but it might get similar delivery properties as the Big-I Internet)

Time to build a simulation and try it based on some trace data....

Building pony express trade routes to strengthen weak ties, and regain the six degrees of separation!!

12Nov/132

Live-blog from SenSys 2013 – Day 1

Aaron@srg.cl $

The 11th ACM Conference on Embedded Networked Sensor Systems (SenSys 2013) just kicked off, held at the University of Rome 'La Sapienza' in Rome. From Cambridge, Cecilia is chairing the session "Sensing People" and the SenSys 2013 Doctoral Colloquium.

Some facts about SenSys 2013:

  • 250+ attendees
  • 1/3 are students
  • 40% have registered for both the conference and the workshops

For submission:

  • 21 papers got accepted out of 123 submissions (17%)
  • most of them received 3 to 8 reviews
  • 70 presentations in the poster and demo track - each one also got a 1-min madness talk during the main conference

There is an interesting opening talk by the chairs showing some stats about the submission this year (as far as I can recall)

  • register the paper earlier, and submit it late (close to the DL) - this holds for most of the accepted ones
  • most of the accepted papers are from America, around 50%
  • certain key words in the title will lead to rejection: nodes, networking, vehicle, etc. (I should have taken a photo while laughing at that slide.. :(

Since there is no electro-plug in the conference room. Most of notes are on the paper. Will find time to move them here.

Keynote by Shahram Izadi from MSR Cambridge: "The New Face of the User Interface"

Introduce the work carried out at MSR Cambridge by the Human-Computer-Interaction (HCI) group, including the 3D interaction of physical and digital objects, 3D model reconstruction using smartphones (using the Queens' College and the Mathematics bridge in the demo). They highlight how to design using low price component and smartphones.

Session 1 - Communication Systems

Chaos: Versatile and Efficient All-to-All Data Sharing and In-Network Processing at Scale

Olaf Landsiedel (Chalmers University of Technology, Sweden), Federico Ferrari (ETH Zurich, Switzerland), Marco Zimmerling (ETH Zurich, Switzerland)

Q: Mainly evaluated by simulation, will it be a gap if the design is applied to real environment?

A: Not yet tested in real environment, hence we don't know.

Let the Tree Bloom: Scalable Opportunistic Routing with ORPL

Simon Duquennoy (SICS Swedish ICT AB, Sweden), Olaf Landsiedel (Chalmers University of Technology, Sweden), Thiemo Voigt (SICS Swedish ICT AB and Uppsala University, Sweden)

Practical Error Correction for Resource-Constrained Wireless Networks: Unlocking the Full Power of the CRC

Travis Mandel (University of Washington), Jens Mache (Lewis & Clark College)

Use CRC to correct wireless transmission errors, solving the reliable issue by their design TVA, Transmit-Verify-Ack.

Very good presentation, and presenter ends his talk with a song about CRC :)

Session 2 - Sensing People

iSleep: Unobtrusive Sleep Quality Monitoring using Smartphones

Tian Hao (Michigan State University), Guoliang Xing (Michigan State University), Gang Zhou (College of William and Mary)

Using sound monitoring to infer sleeping quality.

Lifestreams: a modular sense-making toolset for identifying important patterns from everyday life

Cheng-Kang Hsieh (UCLA CSD), Hongsuda Tangmunarunkit (UCLA CSD), Faisal Alquaddoomi (UCLA CSD), John Jenkins (Cornell Tech), Jinha Kang (UCLA CSD), Cameron Ketcham (Cornell Tech), Brent Longstaff (UCLA CSD), Joshua Selsky (Cornell Tech), Betta Dawson (UCLA CSD), Dallas Swendeman (UCLA David Geffen School of Medicine Department of Psychiatry and Biobehavioral Sciences), Deborah Estrin (Cornell Tech), Nithya Ramanathan (UCLA CSD)

Probably the longest author list in SenSys history.

14GB data collected for 44 young mothers.

Q: How do you recruit patients?

A: Done by the partners in hospital.

Q: Any new findings from the data?

A: We detect some hidden behaviors not noticed by the patients, e.g., unusual long walking outdoors, due to anxiety.

Diary: From GPS Signals to a Text-Searchable Diary

Dan Feldman (MIT), Andrew Sugaya (MIT), Cynthia Sung (MIT), Daniela Rus (MIT)

5Nov/130

Live-blog from SOSP 2013 – Day 2

Ionel Gog@srg.cl $

The second day of the 24th ACM Symposium on Operating Systems Principles has just started.  Several of us are here and will be live-blogging the talks and Q&A sessions.