EuroSys 2011, day three
Session 7: Better Clouds
Kaleidoscope: Cloud Micro-Elasticity via VM State Coloring
The problem is that load on Internet services fluctuates wildly throughout the day, but the bursts are very short (median around 20 minutes). Meanwhile cloud providers are becoming "less elastic" (bigger VMs that stay up for longer) and cannot support such short bursts, because VMs are too heavyweight to start and stop on that timescale.
The approach builds on VM cloning (SnowFlock), but SnowFlock propagates state lazily, which leads to lots of blocking after the clone (shown for TPC-H). Kaleidoscope's fix is to color each page by its probable role (code vs data, kernel vs user, etc.) and then tune prefetching by color (such as read-ahead for cached files). Kaleidoscope also reduces the footprint of cloned VMs by allocating memory on demand and de-duplicating pages. Most server apps tolerate cloning (the only change is a new IP address for the clones), and SPECweb, MySQL and httperf work fine.
The experiments involved running Apache and TPC-H. Blocking decreases from 2 minutes to 30 seconds. TPC-H takes 80 seconds on a cold Xen VM, 20 seconds on a warm one, 130 seconds on a SnowFlock clone, and 30 seconds on a Kaleidoscope clone. Based on a simulation of an AT&T hosting service, Kaleidoscope achieved 98% lower overhead while using a 50% smaller data center. - dgm36
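To make the color-directed prefetching idea concrete, here is a minimal Python sketch. It is not Kaleidoscope's implementation: the `Color` set, the `PageInfo` fields, and the `WINDOW` sizes are all hypothetical, chosen only to illustrate "classify a page by probable role, then pick a prefetch policy per color".

```python
from dataclasses import dataclass
from enum import Enum, auto


class Color(Enum):
    """Hypothetical page colors; the talk mentions code vs data, kernel vs user."""
    KERNEL_CODE = auto()
    USER_CODE = auto()
    KERNEL_DATA = auto()
    USER_DATA = auto()
    PAGE_CACHE = auto()
    FREE = auto()


@dataclass(frozen=True)
class PageInfo:
    """Assumed per-page metadata; a real system would derive this from guest state."""
    executable: bool = False
    kernel: bool = False
    file_cache: bool = False
    free: bool = False


def color_of(page: PageInfo) -> Color:
    """Map page metadata to a color (illustrative rules only)."""
    if page.free:
        return Color.FREE
    if page.file_cache:
        return Color.PAGE_CACHE
    if page.executable:
        return Color.KERNEL_CODE if page.kernel else Color.USER_CODE
    return Color.KERNEL_DATA if page.kernel else Color.USER_DATA


# Extra pages to fetch along with the faulting one, per color.
# These numbers are made up for the example, not taken from the talk.
WINDOW = {
    Color.KERNEL_CODE: 2,
    Color.USER_CODE: 2,
    Color.KERNEL_DATA: 0,
    Color.USER_DATA: 0,
    Color.PAGE_CACHE: 3,  # read-ahead for cached files
    Color.FREE: 0,
}


def pages_to_fetch(pfn: int, colors: list[Color]) -> list[int]:
    """Return the page frame numbers to fetch when the clone faults on `pfn`.

    The batch is the faulting page plus up to WINDOW[color] following pages,
    stopping early at the first page of a different color.
    """
    color = colors[pfn]
    if color is Color.FREE:
        return []  # nothing to copy: free pages can be allocated on demand
    batch = [pfn]
    nxt = pfn + 1
    while len(batch) <= WINDOW[color] and nxt < len(colors) and colors[nxt] is color:
        batch.append(nxt)
        nxt += 1
    return batch
```

With these assumed windows, a fault on a cached-file page pulls in a few same-colored neighbours in one round trip, while a fault on a user data page fetches only that page. The point of the sketch is the policy lookup by color, not the particular numbers.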