{"id":1871,"date":"2014-11-25T14:51:45","date_gmt":"2014-11-25T14:51:45","guid":{"rendered":"http:\/\/www.syslog.cl.cam.ac.uk\/?p=1871"},"modified":"2016-10-04T13:10:07","modified_gmt":"2016-10-04T13:10:07","slug":"new-directions-in-operating-systems","status":"publish","type":"post","link":"https:\/\/www.syslog.cl.cam.ac.uk\/2014\/11\/25\/new-directions-in-operating-systems\/","title":{"rendered":"New Directions in Operating Systems"},"content":{"rendered":"
Notes from the New Directions in Operating Systems one-day conference at the Shoreditch Village Hall in London.<\/p>\n
Antti Kantee<\/a>, Fixup Software<\/a><\/p>\n Writing drivers and filesystems is hard. Want to reuse this existing code, while continuing to benefit from updates. A rump kernel runs on a hypervisor. It is a normal OS kernel with most of the code removed. Does not provide threads, scheduler, exec, VM. Can therefore run anywhere and integrate with anything. Some glue code connects it to the hypervisor. Application can add libc, etc, if desired. The syscall interface contains much useful logic (e.g. path resolution), so want to keep that too.<\/p>\n Was able to compile to JavaScript and debug a driver using Firebug. Fetched a web-page in the Linux kernel. Also runs on Xen (Mini-OS) and bare metal.<\/p>\n http:\/\/rumpkernel.org\/<\/a> (BSD license)<\/p>\n Where are they useful?<\/p>\n Norman Feske<\/a>, Genode Labs<\/a><\/p>\n Laptop boots to GENODE desktop environment. Aims:<\/p>\n A GENODE system is structured as a tree, with parents owning their children. Parents provide resources to their children from their own resources and control what the children can do. Each application only needs to trust those components it depends on (e.g. its ancestors and services it uses).<\/p>\n Combined with virtualisation, can run e.g. Linux and BSD as processes. Also takes components from many places: Linux TCP stack, Rump kernel filesystems, etc, each running as a GENODE process.<\/p>\n Security is based on object capabilities. e.g. a child can tell its parent that it provides a service (e.g. the GUI) and other children can request it. For example, xterm asks for access to the GUI. Its parent, the user session, tags the request with the application name and passes it to its parent (init), which forwards it to the GUI. Once the capability is granted, the application can use it directly without going via its parents.<\/p>\n Processes can trade resources. 
For example, a client can make a request to a server and pass the server the resources needed for the request.<\/p>\n VirtualBox has been ported (shows it booting Windows). Not optimised yet, but was able to play a video under it (slightly jerky).<\/p>\n Using a modified libc, can run a Unix environment without needing a virtualised kernel. Uses Vi to edit the GENODE background, showing integration with the rest of the system.<\/p>\n Runs a web browser (Qt app). The File Open dialog shows only a few files - the browser can't access anything else. Browser can run GENODE processes as (isolated) plugins. Uses a nested GENODE GUI to run a 3D demo. The plugin is also isolated from the browser - the browser can't get key presses from it (not sure how this works).<\/p>\n Current focus is making it usable as a dev system, adding a capability GUI.<\/p>\n Eating own dog food Capability-based UI<\/p>\n seL4 as base platform<\/p>\n ARM virtualization<\/p>\n Package management<\/p>\n Q: How to revoke capabilities? Robert Watson<\/a>, Cambridge University, FreeBSD<\/p>\n 1990s: MAC and DAC. Orange Book. Auditing. Slow, poor usability.<\/p>\n Computer security in 2014<\/p>\n - embedded Lots of access control models. Hard for OS vendors to decide what to implement. => Linux LSM, FreeBSD MAC, etc. LSM not very composable, but MAC is. If policies disagree, it uses the most restrictive one.<\/p>\n Filtering system calls is not a good approach, because of concurrency races between the check and the call. Instead every kernel component talks to the MAC framework. Attach labels to many kernel objects. They are useful for many different security frameworks.<\/p>\n MAC originally controlled access by users, but on modern devices it's applications that are the key subjects.<\/p>\n Debugging security is hard. Need good tracing. DTrace useful here.<\/p>\n Capsicum. Applications get compromised. Want to contain the attacker. Object capabilities for FreeBSD (extends POSIX by removing ambient authority). 
File descriptors become capabilities. Now a production feature in FreeBSD 10. Linux patches have been posted to linux-kernel. Linux seccomp required 11,300 lines of code, compared to 100 on Capsicum (doing what?).<\/p>\n Libraries can use sandboxing even when their application doesn't.<\/p>\n Compartmentalization is Neat! CHERI downloadable now!<\/p>\n Conclusion CHERI: Extending processors with better support for security. Open Source. Can be run on FPGA at home.<\/p>\n Gareth Rushgrove<\/a>, Puppet Labs<\/a><\/p>\n Any input to infrastructure is configuration<\/p>\n Configuration management is about managing those inputs over time<\/em><\/strong><\/p>\n History Future infrastructure as code<\/p>\n History<\/p>\n 50s research, 60s 480 series, 1991 MIL-HDBK-61 (conf mgmt guidance), Identification Configuration management verifies that the system is identified and documented in detail and performs as intended<\/p>\n Infrastructure as code<\/p>\n Ruby puppet SSH config example<\/p>\n Immutable Infrastructure Containers How immutable are your Docker containers? Infrastructure with APIs Configuration at a distance Increasingly managing higher-level systems Simpler hosts Moving configuration from host to network Future infrastructure as code Going from Puppet to etcd Going from etcd to Puppet Software installation done I want a pony Conclusions Michael Scherer, Red Hat<\/a><\/p>\n OSTree: read-only system in \/usr (with symlinks to \/var for mutable stuff). To upgrade, you must reboot. Can also reboot to old version. \"Git for filesystem\" (can branch, but not diff).<\/p>\n Want to protect base system from containers for OpenShift. Also, protect containers from each other. Uses SELinux for this (sVirt).<\/p>\n Example: setting up a static web service, plus wiki and cache. Will put each service in its own container. Web designer used Fedora 20, so want to use the same. Want to isolate MediaWiki from the network, except for the cache service. Using S3 storage for MediaWiki. 
Want to make sure that updates are tracked, not just hacks on the production server. Everything is described in a JSON file (not shown).<\/p>\n Peter Tribble, Tribblix<\/a><\/p>\n Good features: ZFS, DTrace, Zones, compatibility.<\/p>\n Some cruft in the code base, bits not open sourced, things that need to be brought up-to-date.<\/p>\n Various distros: OpenIndiana, OmniOS (servers), SmartOS, Delphix (storage), etc.<\/p>\n But didn't like them. Wanted to understand how an OS really works. => Tribblix.<\/p>\n Focus on being retro, lightweight. Uses zones for app deployment.<\/p>\n Problems: not enough time or people. Fragmentation (work only being done at the distro level), SPARC port (about 3 users), cgo (C Go bindings) not working.<\/p>\n Martin Lucina, Lucina & Associates<\/a><\/p>\n Rump kernel plus:<\/p>\n e.g. \"rumprun xen ...\". Runs as PV Xen guest using Mini-OS. Bare metal in progress.<\/p>\n Demo: compiles to 20 MB hello world (debug build).<\/p>\n Demo: builds the same project once for Unix and once for Xen. Passes parameters for network and to map host files to guest NS.<\/p>\n Lots of changes to Mini-OS; needs upstreaming.<\/p>\n Franco Fichtner<\/a>, Packetwerk<\/a><\/p>\n Currently, no standards in this area. Reasons to do it include: speed, keeping complex code out of the kernel, using user-space libraries, binary libraries from vendors, debugging.<\/p>\n gettimeofday slows things down. So, push the timestamp into the packet metadata and pass it through the stack.<\/p>\n Zero copy: share memory between kernel and user. DPDK is stable. PF_RING has a nice API. Linux only \/ GPL. Also, Intel only. Netmap (FreeBSD; patches exist for Linux).<\/p>\n Hard to play around with transport layer if you have to recompile the kernel every time.<\/p>\n Example: switch positions of TCP port and sequence number headers. 
If both endpoints are modified they will still work, but monitoring systems on the path will find the connections difficult to track (looks like the port keeps increasing).<\/p>\n Alexis Richardson<\/a>, Zettio<\/a><\/p>\n Weave: software-defined network, for networking containers. Focus is ease of use.<\/p>\n Docker has changed things: now a container is what you ship.<\/p>\n New myths (like Sun's old networking myths):<\/p>\n Hajime Tazaki<\/a>, University of Tokyo<\/p>\n It takes a long time to get a new network feature (e.g. a TCP option) deployed and widely used. In the filesystem world, FUSE makes it easy to test new features, even though performance isn't great.<\/p>\n Options:<\/p>\n NUSE is a library OS version of a normal network stack. Patch to the kernel tree adds new arch \"arch\/sim\". Hijacks the application's system calls to redirect them to NUSE so that existing apps don't need to be changed.<\/p>\n Possible uses for NUSE: developing new protocols, process-level network virtualisation. Currently, very few applications work unmodified (ping etc).<\/p>\n Anil Madhavapeddy<\/a>, OCaml Labs<\/a><\/p>\n In distributed systems, complexity is the killer. We need to get rid of all these layers. We all depend on infrastructure that suffers security breaches again and again. Let's get rid of POSIX and just keep network protocols as the standards.<\/p>\n Example: the slide deck is being served from a little ARM box (attached to the laptop by a network cable).<\/p>\n Although Mirage often runs on Xen, Mirage is more general than that. e.g. there is a JavaScript backend. You can start by building against Unix with Linux sockets, then remove this and use direct stacks.<\/p>\n Mirage aims to be contained, compact and efficient. And type-safe.<\/p>\n Each Xen guest has a flat memory layout and uses a single core. 
We trust all the code running in the unikernel, relying on type-safety, and try to protect against external threats.<\/p>\n Some common unikernels (DNS server, web server, etc) are around half a MB (smaller with dead-code elimination). Unikernels boot fast enough that they can be used like processes. Small enough to store the binaries in Git. Example: git pull to update website. Can git bisect to track down problems. Can detect a commit to a Git repository fixing a security bug and recompile within seconds.<\/p>\n Mirage defines abstract module types for devices, network flows, etc. There are about 80 libraries implementing these protocols.<\/p>\n OCaml is a good choice because of its module system. Like C++ templates but with more type safety.<\/p>\n http:\/\/nymote.org<\/a> is a project building various services using Mirage.<\/p>\n Example: OCaml video decoder compiled to JavaScript. The Mirage JavaScript port replaces the entire HTTP library with JavaScript requests. Karanbir Singh<\/a>, CentOS<\/a><\/p>\n CentOS has thousands of extra tests, performed on every build. Built by ops people, not developers.<\/p>\n It's the MySQL developers' responsibility to check their code, but CentOS's responsibility to make sure it links against the right versions of its libraries.<\/p>\n They have a custom build system called Reimzul. A system of triggers will e.g. run the tests for MySQL when OpenSSL changes. Can automatically find code that changed and map it to tests that failed.<\/p>\n Also test external uses of CentOS. e.g. check that WordPress and Drupal still work after a CentOS change.<\/p>\n Also sanity checks: e.g. new version number is higher than previous.<\/p>\n 128 physical machines, 768 cores.<\/p>\n Geo Carncross, Telemetry<\/a><\/p>\n\n
Genode -- OS security by design<\/h1>\n
\n
\n- noux (gcc, vim, bash, coreutils)
\n- wireless networking<\/p>\n
\nA: Destroy subsystem or destroy cap referent object.<\/p>\nNew ideas about old OS security<\/h1>\n
\n- ubiquitous
\n- many computers per person rather than many users per computer
\n- security still neglected
\n- software liability still disclaimed
\n- lots of malware<\/p>\n
\n- but different ways to break things up
\n- has overhead
\n- hardware does this badly
\n- so, put capability support into the architecture!
\n- now, userspace sandboxing with little OS intervention
\n- BERI\/CHERI
\n- can we do fine-grained CPU-supported compartments and not touch the kernel?
\n- How does tagged memory interact with virtual-memory systems?
\n- How should syscalls work? Should we block them?
\n- How do signals work?<\/p>\n
\n- Blurring boundaries to support application needs
\n- FreeBSD OS book (second edition)
\n- Saltzer and Schroeder<\/p>\nManaging configuration for future operating systems<\/h1>\n
\nEmerging Patterns
\n- immutable infrastructure
\n- infra APIs
\n- Autonomous systems
\n- Simpler hosts<\/p>\n
\n1998 ANSI-EIA-649?<\/p>\n
\nControl
\nStatus accounting
\nVerification and audit<\/p>\n
\n- Build once, run many times
\n- Amazon Machine Images
\n- End-to-end automation to avoid the golden image problem (heavyweight build requirements but no tracing)
\n- AMI build, audit, and trace cycle is faster, fundamental change<\/p>\n
\n- Docker is the UI
\n- LXC was painful
\n- Usability matters
\n- We have the primitives for immutable infrastructure but toolchains are coming up<\/p>\n
\n- Nothing. You must impose it.<\/p>\n
\n- Cloud providers
\n- OSv machine APIs
\n- Network and storage, too (not just compute)
\n- Not just *nix, Windows too
\n- I love PowerShell.<\/p>\n
\n- No physical access.
\n- No remote access.
\n- Not user-like. Programmatic.<\/p>\n
\n- Not individual machines
\n- Servers are cattle not pets (fields and farms)
\n- Autoscaling groups (AWS)
\n- Mesos is a system which does this aggregation
\n- An operating system for a data center
\n- Kubernetes
\n- An ocean of user containers
\n- Scheduled and dynamically packed into nodes<\/p>\n
\n- Combinatorial package explosion
\n- Management of this package configuration is hard
\n- Project Atomic addresses this
\n- OSTree \"git for operating system binaries\"
\n- atomic config example
\n- CoreOS
\n- For container operating system (etcd, initd)
\n- \"firmware for running containers\" ~ John Vincent lusis.org<\/p>\n
\n- etcd, consul, zookeeper<\/p>\n
\n- From
\n- host centric
\n- localized
\n- executable for integration
\n- To
\n- Cluster centric
\n- Distributed
\n- HTTP for integration<\/p>\n
\n- Puppet is a DSL for describing infrastructure
\n- provides a graph abstraction
\n- where similar interfaces exist, provides abstractions
\n- e.g. key value store
\n- garethr\/key_value_config<\/p>\n
\n- e.g. writing to disk<\/p>\n
\nMore interesting to control installed software
\n- start a Docker container
\n- run app on Mesos
\n- set up security groups on AWS
\n- DigitalOcean<\/p>\n
\n- Managing an autoscaling CoreOS\/Atomic cluster in AWS
\n- with etcd\/consul
\n- immutable instances
\n- with the network in VPC\/Weave
\n- with Docker containers arranged by Kubernetes
\n- All from Puppet DSL<\/p>\n
\n- Future here but not evenly distributed
\n- These tools look like the future
\n- Manage not just provision
\n- In Search of Certainty<\/p>\nExploring a new way to manage systems with ostree and Atomic project<\/h1>\n
Tribblix: adventures with illumos<\/h1>\n
Rumprun for Rump Kernels: Instant Unikernels for POSIX applications<\/h1>\n
\n
An introduction to userland networking<\/h1>\n
Weave: Myths of the New OS<\/h1>\n
\n
Network Stack in Userspace<\/h1>\n
\n
Jitsu: Just-in-Time Summoning of Unikernels<\/h1>\n
\nExample: TLS echo service running on ARM board, all implemented in OCaml.<\/p>\nCentOS Linux: A Continuously integrating platform<\/h1>\n
How to program computers (kos)<\/h1>\n