14Aug/130

Liveblog from SIGCOMM13 â€“ Day 2

Hello again from SIGCOMM 2013, Hong Kong, day 2. I'll be live blogging as much of the event as I can, weather permitting...

Morning Sessions Canceled

The Hong Kong weather observatory has issued a "Typhoon Warning 8" due to Tropical Cyclone Utor, which basically means that all government infrastructure has been suspended. This includes the University at which SIGCOMM13 is being held. We are awaiting further updates. Current weather forecasts suggest that this condition will remain until around 2pm today. Â More details expected at midday.

Datacenter Networking 1

Ananta: Cloud Scale Load Balancing

http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p207.pdf

Desire a loadÂ balancerÂ that does 20TBbps and has N+1Â redundancyÂ with quick failover.Â
Distributed loadballencer with PAXOS for keeping NAT state synced.
To manage latency, group pools of 8 port and 160 VMs.

[mpg39: sorry, this is a bit of a poor summary, I was a bit late and the speaker was veryÂ quiet]Â

Speeding up Distributed Request-Response Workï¬‚ows

http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p219.pdf

Latency matters ==> Strict SLAs for latencies
Why? Transient hotspots, network congestion etc.
Web services are implemented as complex workflows with a collection of input/output dependencies.
Up to 150 workflow stages, degree 40, 10 steps ==> stochastic latencies accumulate
Lots of work has been done, but we care about end-to-end latency
Reducing latency:
- Application level techniques:
  1. Reissue a request after a timeout -> uses extra resources
  2. Terminate a request early. Use whatever results we have got before a timeout to produce results. Â ->returns Â incomplete results
  3. Priority and pre-emption can be used. -> introduces additional contention
- Results in conservative parameters for all of the above.
Kwiken: Uses workflow description and logs to get an idea for latency at different stages.
Challenges:
1. Â Different stages show different gains with different techniques
2. Composing costs across stages is non-trivial
3. Decomposing end-to-end latency into indivitual components is hard.
Kwiken: End-To-End Design
Goal to minimise latency subject to cost <= budget
Solution #3 Minimize sum of variance in workflow latency instead
Solution #2 [..]
Solution #3 Construct per-stage variance response curves.
Eval: Traces from 45 most used workflows at Bing, simulated work loads
150 stage workload, with a 1% budget results in a 75% latency reduction using a combination or reissues and terminations.
Conclusions
- Complex workflows lead to high tail latencies
- Kiwiken is an end-to-end framework with combines many techniques.

Leveraging Endpoint Flexibility in Data-Intensive Clusters

http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p231.pdfÂ

Communication is crucial to analytics. 33% of the time is spent in comms.
Network usage isÂ imbalanced
Â What is the soruces of cross-rack traffic.Â
54%/40% from DFS writes.
DFS - GFS, HDFS, Cosmos etc.
- Many frameworks communicate via the DFS.
- Â Files ate 64MB to 1GB
- Each block is replicated 3x for FT
- 2 fault domains
- Uniformly (random) placed for balanced storage
How to handle DFS flows? (A few seconds long)
Observation: We don't care where the replicas end up.
Sinbad - Faster writes by avoid contention during writes.
Assumptions:
- Link util is stable during the write duration
- All blocks are the same size.
Greedy placement minimizes average block/file write time.
Reality:
- Link utilisations are reasonably stable
- Blocks are mostly (93%) the same size.
Sinbad puts the first block through the lowest loaded link.
Master:
- Prosodically estimates network hotspots
- takes greed online decisions
- adds hysteresis to next measurement
Evaluation
- 3000 node trace driven simulation.
- Simulation matches against a 100node EC2 deployment.
Simulation 1.39X, expr 1.26X. If using in memory storage 1.6X faster.
What about storage balance?
- Hotspots are uniformly distributed
Related work:
- #1 increase capacity (bi-sec BW)
- #2 Decrease loads: Data locatalty
- #3 Balance usage
Planning to deploy Sinbad at Facebook.

Questions

Q: How do you asses the readÂ performance? A: Assume that the file will be read by many readers. Â

Q: What does the name mean? A: Wanted to call it bad, then nbad, but that sounded bad, so called it Sinbad.

Q: What is the network architecture assumption? A: We need to know the possible locations of bottlenecks.

Q: [can't hear] A: used GFS open source using a plugin.

Q: Can you apply this to task scheduling? A: The reason we didn't do it is the complexity is very high if you must consider receiving as well.

Liveblog from SIGCOMM13 â€“ Day 2

Morning Sessions Canceled

Datacenter Networking 1

Ananta: Cloud Scale Load Balancing

Leveraging Endpoint Flexibility in Data-Intensive Clusters

Leave a comment

Pages

Categories

Blogroll

Archive

Meta