ICFP 2014: Day 1 (another perspective)

1Sep/140

ICFP 2014: Day 1 (another perspective)

This is day 1 of the 19th International Conference on Functional Programming. Leo and I (Heidi) are here live blogging from Gothenburg, Sweden. He posts is here, we will combine them (with photos :)) in the coming weeks.

the beautiful conference location

Opening

Good morning and time to get started. Official definition of ICPF: its a forum for discussion of researchers.Â 466 people have registered for the conference or associated events. 21 sponsors giving $60K of sponsorship.

We can get a boat to the conference banquet :)

A big big thank you to the PC.

Keynote: Using Formal Methods to Enable More Secure Vehicles: DARPA's HACMS Program
Author & Speaker: Kathleen Fisher (Tufts University)

DAPRA, US funding organisation with rotating project management so constant surprises. Clean slate formal methods for proving securityÂ for embedded systems. Power, TV, medical devices, phones and vehicles, all have remote security risks with amusing exploits.Â Car is a computer on wheels, with many ECU's contacted with buses both high and low speed. If you can connect to theÂ network, you can control anything under software control, getting to the network is easier then you'd think e,g using theÂ diagnostic plug under the steering wheel. Almost all theÂ car systems are under software control, include breaks. You can get access remotely by 4 ways: e.g. via bluetooth, allstar, virus toÂ mechanises diagnostic equipment, via music to buffer overfull car software.

The traditional fix: air gaps & obscurity. Air gaps a no longer the case, cars need to be networked e.g. for remote monitoring.
Lots of documentation for this systems nowadays

The desktop approach fix: this approach barely works on desktop. The security issues with security code, approach 1/3 in recent watch list.
Current solutions are millions of LOC, extensive monitoring need extra CPU. Embedded codes goes though lots of review, no "patch every Tuesday" approach.

The new fix: Formal method based approach. Why now ? Lots of tools & infrastructure which is now mature and advances in SAT solves in later 10 years.

3 approaches:

code synthesis: people can't write secure C code, computers can and provide proofs. DSLs, Interactive theorem prover as PL.
HACMS program: collection of performers, bid for technical areas such as:
- Vehicle experts
- OS
- Control systems e.g. fly control
- Research Integration: managing rest of performers, composition, tools
- Red team: white hat hackers, measure effort needed to hack

Air team
- Boeing - unmanned little bird helicopter & open source version
- NICTA
- Galois
- RC*/U.Minn

All cars are vulnerable, no point in naming names.

composition means how do you get high assurance components and build a high assurance systems ?
Started with Pem testing 4 vehicles, 2 prep and 2 open source. The systems had been engineered to be easy to connect so this wasn't a surprise.Â Hijacking demonstration of AI drown using aircrack. This can be quite a shock to the user. e.g care driver.Â Ever 18 months, the red teams will report on how much effort is required to hack. Starting with monolithic quad. They began by upgrading processorÂ and refactor monolithic software, the H/W abstraction layer was a big hit with the community. Gradually swapping out legacy code to high assure code.

How should you do if you know your under a DoS attack ? it depends on context

Its been proved that system is memory safe, ignore malformed messages, ignores non-authenticated and all good messages will reach motor control.

"The most secure UAV on the planet" :)

AADL - language for architecture specification used to propagate system wide properties down until an assurance it reached. A formal systematics haves been added to AADL and provided using resolute tool, assurance cases and synthesis tools. The glue code can be generated, this is typically the weak point.

Galois haven written two DSL's in Haskell, They have created Ivory for C like code and Tower for tasks/connections.

NICTA have written a fully functionally code micro kernel.

Boeing are using the tools build to develop the unmanned little bird. There on track for running real HACKEMS code on a helicopters in 2 years.Â There's a linux partition for mission specific code, the red team will have full access to this partition.

Q: Are you working on the wrong problem ? We don't have the talented people to make this commonplace, this isn't cool enough
A: Hacking demonstrations get people interested. The US government is considering mandating that all cars are networked, but this is dramatic
security issues, remote code execution

Q: Why is the hardware so constrained ?
A: It is very high scale, there's very small margins

Q: What did this cost, what this research want to do something differ to what management want ?
A; $70 million, we discuss but that why its good to NFS

Q: How much more expensive will high assures systems be ?
A: We don't know yet, that what we are working on. We focus on re-usability, open source, scale to reduce costs.

Q: What's the multi-core story ?
A: Its much more difficult to verify multi-core, bowing want single core, maybe they won't be available soon.

Q:
A: AR Don was started with an existing open source platform

Q: How will this approach scale
A: 80 % quadcoper is high assurance. But we don't even now how many CoL there is in car ? Linux is big, but we can add high assurance code progressively.

Q: How can insure and secure software co-exist ?
A: That's exactly what we are testing with the red team and linux partition.

Q: How do you know that hardware isn't vulnerable ?
A: Our project is big enough, other teams work this question.

Session 1: Domain Specific Languages I (Chair: Anil Madhavapedd)

Building Embedded Systems with Embedded DSLs (Experience Report)
Speaker: Patrick Hickey (Galois Inc)

A big thank you to kathleen

Goal: build a high assurance helicopter (quadcopter)

Challenges: Its an embedded systems, its high real
time system (1 millsec), system needs to be safe and secure.
Constrains: 3 engineers and 18 months, similar projects are 50K CoL.

There are billions of embedded systems, the 1 pence cost sensitivity it very high. Basically choice between C, C or C.
Typical secure issues, difficult to push a patch.

Approach: We would love to use Haskell or OCaml, but GC is not compatible with hard real-time. C or C++ (a subsec), a trade-off between safety and developer productivity.

Build your own tools, starting with a clean state language, correct by construction. We built a embedded DSL compiler in just a few month. The language is embedded in Haskell and compiles to C, its OS and called Ivory. Ivory is safe subset of C with memory & type system, no undefined behaviour. No heap, you can only allocate to the stack or global. We embedded in Haskell so we can utilise its type checker. Haskell is ivory macro language.

Modules are composed to expressions and global, there's a global name space.

Tower DSL - for organising whole applications. Build on top off ivory, to combine procedures into applications.

The quadcopter story - we have almost build all code in ivory. The bulk of the code is building drivers, that was about 10 kloc, the application was only 3 kloc. We needed to operate with existing the messaging format, we mangled some python code to generate ivory. Totally about 48 kloc.

Evaluation:
The code was delivered to the red team, and now no undefined/buffer overflows etc.. There exists some broad system level issues.

This was a large complex application, but we build high assurance code very quickly. Embedding DSL in Haskell allowed us to build compositional programs.

Source code is at ivorylang.org

Q: You and OCaml both target ARM, so why not us OCaml
A: We wanted compatibility with niche systems

Q: Did you run out of memory ? how do you communication this to the programmer ?
A: We don't currently have way to determine maximum stack depth, we are working on it.

Q: Correct of construction C code, you mean subset of C right ?
A: Yes, no pointer arithmetic

Q: Array handling
A: All arrays are static length

Q: What about error messages for embedded DSL's ?
A: It tough, and we would love to improve it

Q: How do you stop integer overflow ?
A: Using assertions instead of expressing in the type systems. The user can manually add annotations.

Q: Your re-targeted a python script to generate ivory instead of C, how many bugs did you find ?
A: Not sure, its was more clean slate

Concurrent NetCore: From Policies to Pipelines
Speaker: Cole Schlesinger (Princeton University)

Policy language for describing network policies. A quick intro to basic networking, we traditionally we use distributed protocols such as OSPF but now we have a centralized server to coordinate. We us openflow 1.0, each switch is a simple predicate and action table. There isn't just 1 table, different tables for many difference purposes e.g. different layer. Some switches have reconfigurable pipelines. We are now looking at OpenFlow 2.0 to switch into two phrases: configuration (like flashing router) and population.

We have developed a core calculus for pipelines, we have hardware models for 3 switch architectures and add virtual pipelines to our compiler.

The language:

example using the Broadcom trident switch, each table is a function from a packet to a set of packets

The syntax for an example pipeline (trident switch)
W;(all,{port,vport}) ; x:(dst MAC,port) ; if -(vport=1) then y:(vport,port) else id ; z:(all,all)

The control can connect to many switchers, the programmer can express a virtual pipeline as a minimum for each physical pipeline

example controller
m:(typ, port) + r:(dst MAC, port)

We developed 3 general techniques we designed for "RMT" architecture
1. Consolidate packet duplication
2. Refactor field modification
3. find field size

The syntax for the language, see paper. We are inspired by the frenetic family of languages.

We have added concurrency and thus race conditions, we using typing judgements for read and write permissions on each field.
Our normalisation lemma handles rules such as no overlap between read and write sets.

Q: Your plus operator seems to compile after parallelism
A: Its not compiled away just implicit

Session 2: Static Analysis (Chair: Ken Friis Larse)

SeLINQ: Tracking Information Across Application-Database Boundaries

Speaker: Daniel Schoepe (Chalmers University of Technology)

Motivation: Information flow control for computation & database queries. We given data to many complex application that we don't trust. Communication bounds better application is weak spot, so we consider end-to-end. Our contribution is end-to-end information flow with type checker and compiles to F#.

Typically we interface with SQL database, if we don't do it properly then there's lots if issues. Now people write code to generate inference code with SQL

Information flow code should into lead private data. Non- interference means private data channels shouldn't interact with public data channels. Language-integrated query using a toy language (subset of F#) with includes meta-programming. Security policies and security levels are expressed in the type system. We can then annotate databases columns, e.g. name is public but age is private.

Low equivalence relation is equivalence taking into consideration the new types for security levels. We can then use this relation to encode non-interference.

The type system is based on flowCaml, by Pottier and Simonet, 2003.

Our type checker and compiler implementation is written in Haskell using BNFC, we support ADT's .

Using a movie rental database as an example, we consider addresses as private but cities as public. We can query for all rental from a city with public but if we then try to get addresses, the type checker will reject this.

Q: Do you support more complex labelling schemes ?
A: Yes, we support lattices

Q: Can we classify fields or rows ?
A: No

Q: The exists function you demonstrated doesn't depend on type of elements, does it ?
A: No, but we can express that

Q: Do you deal with aggregation ?
A: We haven't considered that yet

Type-Based Parametric Analysis of Program Families
Speaker: Sheng Chen (Oregon State University)

A software system is a family of programs with a configuration, making error messages are difficult to read. We start with a program family, select a program and analysis but variational analysis takes a different route. We will look at the variational analysis today (VA):

Programming by search, applying an analysis gives us results R', but we expected R. We can instead introduce variations to data structures. e.g list -> variational list. Then we adapt analysis, create correctness proof and evaluate performance.

There's a lot of very helpful code extracts, syntax definitions etc.. that I didn't have time to copy, see the paper for them.

Q: In your performance results, did you examples fit into the more or less aggressive approach ?
A: They work even in the more aggressive example
We are breaking for lunch so we'll be back at 2:00

Session 3: Binding Structure (Chair: Tarmo Uustal)

Romeo: A System for More Flexible Binding-Safe Programming
Paul Stansifer (Northeastern University); Mitchell Wand (Northeastern University)

Programming is hard. Meta programming allows us to program to program, but I can introducing name issues. Alpha equivalent inputs, should produce alpha equivalent outputs. We have developed a binding safe programming language, with notation to express complex binding. Romeo expresses binding information in types.

Example remeo code:
stx = (Lamda (a) (+ a 1))

Romeo automatically renames a (as its not exposed) to makes sure no names clash.

Again, have a look at the paper for cool code examples

Romeo is a binding safe language that support term languages with complex binding structure

Q: In meta programming we want effects, how do deal with e.g. meta programming for loop invariant code motion?
A: Remeo is side-effect free, effects in an meta programming we'll take this offline

Q: Is this online ?
A: Not yet

Q: This is very similar to lamdba M ?
A: This was your inspiration but its not powerful enough as we have syntax cases

Q: Hygienic macro expansion ?
A: We indirectly mentioned it

Q: Thanks for getting starting with an example
A: Thanks

Q: What's the complexity of expansing remeo code ?
A: potentially n^2, we think we could perform the substitutions lazily to improve this

Q: can we represent practical thing ?
A: We don't have a story for possible racket macro's you could want, some are simple

Maximal Sharing in the Lambda Calculus with letrec
Speaker: Clemens Grabmayer (VU University Amsterdam)

People desire increased sharing, e.g. to reduce code duplication and check equality of unfolding semantics of programs.

3 Steps:
- interpretation - turning lamdba-letrec into lambda term graphs
- bisimulation check & collapse of lambda let graphs -
- readback

Our tool is available on hackage
Possible extensions include adding support to full functional languages.

Q: How is this similar to disiding type equivalent in System F?
A: I will look into this

Q: Is this subsumed by KleeneÂ algebra ?
A: I will look into this

Session 4: Program Optimisation (Chair: John Launchbury)

Practical and Effective Higher-Order Optimizations
Speaker: Lars Bergstrom (Mozilla Research)

Many CFA-based optimizations addressed in other ways, but higher-order inlining is the missing optimization. In practice, high performance functional programs avoid functions, e.g. OCaml compiler example.

What is the concrete function called from some computed function call ? (handled by 0CFA)

We have come up with a new analysis to see when can do higher-ordering inlining. We see if any free variables have been rebound between the closure capture location and the call site.

Technique:
- Normalize and number soure code
- Build control-flow graph
- Add CFA-infromed edges
- Perform rebinding

Q: How can I as a programmer understand when the optimisation will kick in ?
A: If your the compiler writer, yes, otherwise we don't yet have a story for this

Q: What happens with things like List.map and List.map ?
A: This are often special case in the compiler to inline

Worker/Wrapper/Makes It/Faster
Speaker: Jennifer Hackett (University of Nottingham)

Research focus on correctness of optimisations but there's a lot less work on how we check that the transformation is actually an improvement.
Example of adding an accumulator to th
e List.rev function

The worker wrapper theroms captures induction.

I'm headiing off and direct the reader over to Leo at http://www.syslog.cl.cam.ac.uk/2014/09/01/icfp-2014-day-1/