syslog
23 Oct 2013

Liveblogging IMC 2013 – Day 1

Beautiful and warm morning in Barcelona! I'm here today with a whole bunch of ex-SRG people: Narseo Vallina-Rodriguez, Ilias Leontiadis and Nishanth Sastry. I will do my best to liveblog as much of the event as I can.

Muddles in the Middle

Revealing Middlebox Interference with Tracebox - short paper

G. Detal, B. Hesmans, O. Bonaventure (Université Catholique de Louvain), Y. Vanaubel and B. Donnet (Université de Liège)

The end-to-end principle does not hold because of middleboxes in the network, especially NAT boxes.

The talk refers to previous work showing that today's middleboxes can modify almost every field of a TCP packet, which can make some middleboxes difficult to detect. This is the motivation for Tracebox: they want to detect and localise all middleboxes.

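Based on probes with incrementally increasing TTL and the ICMP replies they trigger. The ICMP reply includes (part of) the probe that triggered it, so comparing the quote with what was sent reveals middlebox modifications. Issue: ICMP (RFC 792) only requires the first 8 bytes of the original payload to be included, which for TCP covers just the ports and the sequence number, so only the sequence number can really be used.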
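To make the comparison step concrete, here is a minimal Python sketch. It assumes the raw bytes of the TCP header we sent and of the quote from the ICMP reply are already available (capturing them is out of scope), and the names `diff_quoted_tcp` and `TCP_PREFIX` are hypothetical, not Tracebox's own code.

```python
import struct

# First 8 bytes of a TCP header: source port, destination port, sequence number.
TCP_PREFIX = struct.Struct("!HHI")
FIELD_NAMES = ("src_port", "dst_port", "seq")


def diff_quoted_tcp(sent: bytes, quoted: bytes) -> list:
    """Names of TCP fields whose values differ between the probe we sent and
    the copy quoted back in the ICMP reply (first 8 bytes only)."""
    sent_fields = TCP_PREFIX.unpack(sent[:8])
    quoted_fields = TCP_PREFIX.unpack(quoted[:8])
    return [name for name, a, b in zip(FIELD_NAMES, sent_fields, quoted_fields) if a != b]


if __name__ == "__main__":
    probe = TCP_PREFIX.pack(40000, 80, 1000)
    # Quote from a hypothetical hop behind a middlebox that rewrites sequence numbers.
    quote = TCP_PREFIX.pack(40000, 80, 987654)
    print(diff_quoted_tcp(probe, quote))  # ['seq']
```

Repeating this per TTL localises the change: the first hop whose quote differs from the probe lies just after the modifying middlebox.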

Deployed on PlanetLab: 70 vantage points probing the top 5000 Alexa sites... 80% of the paths tested contain at least one middlebox.

The TCP MSS option gets modified as well...

The conclusion is that Tracebox can detect some middleboxes sometimes...

 

Demystifying the Dark Side of the Middle: A Field Study of Middlebox Failures in Datacenters - full paper

R. Potharaju (Purdue University) and N. Jain (Microsoft Research)

Confusion between application and network failures: nobody knows why things are not working and everyone thinks it is the other guys' fault. In 2008, middlebox failures contributed to 43% of high-severity incidents.

Want to answer three questions:

  • How reliable are middleboxes? How often do they fail and how long do failures last?
  • Failure root causes - What causes middlebox failures and what are the main resolution actions?
  • Middlebox redundancy - is middlebox redundancy 100% effective?

Challenges:

  • How to remove duplicate events from syslog?
  • How to measure failure impact? Traffic impact: compare the traffic distributions before and after a failure, using the ratio of median traffic before/after the failure. The best results came from 2-hour windows.
  • How to determine the root cause from error reports written in natural language? They use the approach of NETSIEVE (NSDI'13) to extract what the problems were, what activities were carried out and what actions were taken to resolve them.
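A minimal sketch of the traffic-impact metric, assuming traffic samples come as (timestamp in seconds, traffic) pairs. The 2-hour window is from the talk; the name `traffic_impact` is hypothetical, and we compute after/before (rather than before/after) so that values below 1.0 read as lost traffic.

```python
from statistics import median

WINDOW_S = 2 * 3600  # 2-hour window on each side of the failure, as reported in the talk


def traffic_impact(samples, failure_ts, window_s=WINDOW_S):
    """Ratio of median traffic after a failure to median traffic before it.

    samples: iterable of (timestamp_seconds, traffic) pairs.
    A ratio well below 1.0 suggests the failure disrupted traffic.
    """
    samples = list(samples)
    before = [v for t, v in samples if failure_ts - window_s <= t < failure_ts]
    after = [v for t, v in samples if failure_ts <= t < failure_ts + window_s]
    if not before or not after:
        raise ValueError("need traffic samples on both sides of the failure")
    med_before = median(before)
    if med_before == 0:
        raise ValueError("no traffic before the failure")
    return median(after) / med_before
```

Using medians rather than means keeps a single spike or dip inside the window from dominating the ratio.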

Going back to the questions they wanted to answer:

How reliable are they?

  • IDS systems fail infrequently, but with high variability
  • Load balancers have low failure rates, but different generations of load balancers have different characteristics.
  • VPNs and firewalls are quite bad, though, with around 24 failures per year for firewalls.

Most failures are short-lived, dominated by connectivity errors.

Median number of annual failures is small (1-3)

What causes the failures?

  1. connectivity failures
  2. hardware failures (mostly PSUs)
  3. misconfiguration, but not very frequent

Typically the first remedy tried is to replace the cables and reboot, but the most common resolution is hardware replacement.

Effectiveness of middlebox redundancy: apply the traffic impact test to groups of redundant middleboxes. Redundancy was ineffective in 33% of firewall failures, but almost 100% effective for IDS and VPN systems. The main causes are misconfiguration errors...
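The group-level test can be sketched the same way: a failure is masked if the summed traffic of the redundant group stays close to its pre-failure level. The function name and the 0.9 cut-off are hypothetical choices for illustration, not values from the paper.

```python
from statistics import median


def redundancy_effective(group_before, group_after, threshold=0.9):
    """True if the summed traffic of a redundant group (e.g. a firewall pair)
    stayed close to its pre-failure level.

    Each argument is a list of per-interval traffic totals for the whole group.
    threshold is a hypothetical cut-off, not a value taken from the paper.
    """
    med_before = median(group_before)
    if med_before == 0:
        raise ValueError("no traffic before the failure")
    return median(group_after) / med_before >= threshold
```

The share of failures where this returns False would then correspond to the "redundancy was ineffective" figure, under whatever threshold is chosen.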

QA:

Q: Methodology - look back 2 hours, but traffic itself is varying (peak/off-peak), how do you deal with that?

A: All events in the dataset that operators deemed actionable were covered, so the results should be right.

Q: If I were to take the results and be practical... It sounds like I should deal with power supply vendors and have better configuration deployment policies. What can you do about firewall failures, though?

A: The main reason was the difficulty of detecting the problem, i.e. of replicating the problem the user is facing. Another problem was inconsistency between operators due to the lack of central management.

 

A Method for Identifying and Confirming the Use of URL Filtering Products for Censorship - short paper

J. Dalek, B. Haselton, H. Noman, A. Senft, M. Crete-Nishihata (Citizen Lab), P. Gill (Citizen Lab/Stony Brook University) and R.J. Deibert (Citizen Lab)

URL filtering devices are extremely common in all kinds of networks and are very versatile, making them very difficult to study.

Why is studying them important? Censorship and policy impacts.

Challenges: having vantage points within each country of interest, and the personal risk involved in a lot of client-side measurement (especially in some countries).

Proposed methodology: locate deployments (with SHODAN, which looks interesting by the way), validate whether each installation is live (SHODAN data might be out of date, so they use WhatWeb) and confirm the scope of deployment through site submission and in-country testing: generate test sites and submit them to URL filter vendors, then use a tester in the field and a remote server to fetch the pages and compare the results. A page is considered blocked if the two copies are not the same.
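The comparison step could look roughly like the Python sketch below, assuming both sides' fetches are already collected as URL-to-body maps. The helper names are hypothetical, and treating a URL that the field tester could not fetch at all as blocked is our assumption. Real pages often carry dynamic content, so a strict hash comparison like this is only a starting point.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Short fingerprint of a fetched page, so the two vantage points can exchange hashes."""
    return hashlib.sha256(body).hexdigest()


def blocked_urls(field_fetches: dict, control_fetches: dict) -> list:
    """URLs whose in-country copy differs from the copy fetched by the remote control server.

    Both arguments map URL -> page body (bytes). URLs missing from the field
    results (e.g. the connection was reset) are treated as blocked as well.
    """
    blocked = []
    for url, control_body in control_fetches.items():
        field_body = field_fetches.get(url)
        if field_body is None or fingerprint(field_body) != fingerprint(control_body):
            blocked.append(url)
    return sorted(blocked)
```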

Both SmartFilter and BlueCoat are used on Etisalat in the UAE, and SmartFilter in Saudi Arabia as well (Bayanet). They verified manually that SmartFilter was responsible for the blocking, using a test site with an image of adult content from Google Image Search :). Netsweeper is used in the Middle East; it is peculiar in that it submits new domains automatically. Interestingly, on Yemennet pages were sometimes blocked and sometimes not, apparently because at the time they had a shortage of software user licenses.

Hope to cover larger datasets in the future.

 

Peer-assisted Content Distribution in Akamai NetSession - full paper

M. Zhao (University of Pennsylvania), P. Aditya (MPI-SWS), A. Chen (University of Pennsylvania), Y. Lin (Duke University), A. Haeberlen (University of Pennsylvania), P. Druschel (MPI-SWS), B. Maggs (Duke University/Akamai), B. Wishon and M. Ponec (Akamai)

The goal is to better understand peer-assisted CDN systems (hybrids of P2P and infrastructure solutions).

They use data from Akamai's NetSession: 25 million users in 239 countries currently! The data is from October 2012 (logins, logouts and downloads): 4 billion log entries and 133 million unique IPs across all countries. Quite a long description of how NetSession works at a high level (very similar to BitTorrent).

They confirmed that content popularity still follows a power law... Peer assist is generally used for files of hundreds of MB.

31% of users have uploading enabled. This is likely driven by default settings: one version has it enabled by default and one disabled, and more than 99% of users keep the default setting. Surprisingly, peers sent as much as 71.4% of the data delivered by the CDN. Download rates from peers are only slightly lower than from the infrastructure CDN.

Risk: some ASes can upload a lot to others - potentially expensive. From the data, the traffic is generally balanced.

... aaand run out of time :)

Analyzing the Potential Benefits of CDN Augmentation Strategies for Internet Video Workloads - full paper

A. Balachandran (CMU), V. Sekar (Stony Brook University), A. Akella (UW Madison) and S. Seshan (CMU)

Data from Conviva: 2 months, 30 million video sessions, clients in the US.

CDN infrastructure is being stressed because of the huge increase of video traffic - CDN streaming failures are becoming frequent.

Existing CDN augmentation efforts: hybrid P2P-CDN designs (Akamai NetSession, ChinaCache LiveSky) and federated CDNs (Open Carrier Exchange).

The questions are:

  1. are there any video-specific access patterns?
  2. what are the potential improvements?

Conventional wisdom: P2P will work for live content, but not for video on demand. The data makes this questionable to some extent: VOD for series can work very well right after the initial release (not really surprising).

Their P2P CDN model for analysis: users are geographically close, watch the same chunk of a video, and the size of the "swarm" of users is limited. Bandwidth savings: 98% for live and 87% for VOD with 5-minute chunks and a swarm size of up to 100. P2P is especially good for filtering out "early quitters" (people watching only an initial fraction of a video).
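A back-of-the-envelope Python sketch of how such a model turns into a savings figure. The simplifications are ours, not necessarily the paper's: viewers are grouped into swarms by (region, video, 5-minute chunk), each swarm holds at most 100 viewers, and the CDN serves exactly one copy per swarm while peers serve everyone else. The function names are hypothetical.

```python
import math
from collections import Counter

CHUNK_S = 5 * 60  # 5-minute chunks, as in the talk
MAX_SWARM = 100   # swarm size cap, as in the talk


def chunk_index(position_s):
    """Map a playback position (seconds) to its 5-minute chunk."""
    return int(position_s // CHUNK_S)


def p2p_savings(requests, max_swarm=MAX_SWARM):
    """Fraction of chunk downloads that peers, rather than the CDN, could serve.

    requests: iterable of (region, video_id, chunk_index) tuples, one per viewer per chunk.
    """
    groups = Counter(requests)
    total = sum(groups.values())
    if total == 0:
        return 0.0
    # Every full or partial swarm still needs one copy from the CDN.
    cdn_copies = sum(math.ceil(n / max_swarm) for n in groups.values())
    return 1 - cdn_copies / total
```

The sketch also hints at why early quitters matter: a viewer who only requests chunk 0 lands in the largest swarm, where the CDN's single copy is shared the most.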

Federated CDN: When is federation beneficial?

ISPs have varying regional presence and an ISP without significant coverage in an area can redirect customers to a federating ISP.

Regional interest in live content typically correlates with population density, but occasionally (e.g. sporting events with a team playing at home) there are deviations.

Model as a resource allocation problem: minimize latency + number of session drops.

With country-wide federation it is sufficient to provision 0.8 of the historical peak load to achieve 100% availability (1.4 with regional federation).
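The intuition for why federation lowers the required provisioning can be shown with a toy calculation, which is a simplification and not the paper's resource allocation model. If each region is provisioned for its own peak, total capacity is the sum of the per-region peaks; if load can be redirected freely, capacity only needs to cover the peak of the combined load. The region names and load figures in the usage note are invented for illustration.

```python
def sum_of_peaks(loads_by_region):
    """Capacity needed if every region is provisioned for its own peak."""
    return sum(max(series) for series in loads_by_region.values())


def peak_of_sum(loads_by_region):
    """Capacity needed if load can be shifted freely across regions (full federation).

    All series must be sampled at the same time instants.
    """
    return max(sum(values) for values in zip(*loads_by_region.values()))
```

For hypothetical hourly loads `{"east": [10, 80, 30], "west": [30, 20, 70]}`, the sum of peaks is 150 while the peak of the sum is only 100, because the two regions peak at different times. The toy model ignores the latency cost of serving one region from another.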

Q: Have you looked at the performance cost? E.g. serving data to the west coast from the east coast...

A: Not suggesting to spread around too widely, just showing that there is space for improvement due to time zone differences.

 

Keynote

by Pablo Rodriguez

IMC was started when drinking wine over the problem of not having a good venue to submit papers to :)

Pablo started his PhD in physics, counting photons in a basement... But he got bored of that and became more interested in content distribution, which became a real problem when a Victoria's Secret advert broke the Internet!

Went to Microsoft and started working on P2P content distribution systems and immediately got blocked by IT department... Solution: go and have a chat with Bill Gates! Kept on working on the overlay content distribution system - Avalanche.

The big issue then became incentives: ISPs could not possibly be happy about that... Go back to science!

Fast-forward to moving to Telefonica: the same problem, rehashing the paper, changing the title, convincing people and putting P2P on set-top boxes. But in fact P2P started going down because of the cloud. In the long run it turned out it was cheaper and easier to do Cloud than P2P... But Pablo got promoted out of the whole problem...

Offering some insider information now: telcos have reached the end of one business period, have become a commodity business and revenues have flattened out while traffic is growing exponentially. Sadly, 60% of costs go to access, a cost that is not going to go down. This is why AT&T have decided to sell their cell towers and Telefonica and Vodafone are consolidating infrastructure.

What is the new business then?

Telco revenue worldwide is $2,000 billion, while most internet businesses are a fraction of that.

Internet of Things and Big Data

The largest social network? The phone network! Not very explicit, but a very valuable one. It is authenticated and has a call graph, billing, location, scale, universal trust and everything else!

Phone network data can help city planners, institutions and policy makers: socio-economic analysis, epidemic analysis, human migration, etc., etc.

A clash with privacy... The future, then, is the Individual as the New Platform: giving data and control back to the users.

Moving on to more peculiar fields... Food and fine dining: the story of ElBulli, its collaboration with Telefonica and the important role technology plays.

The world has changed since the first IMC - the community has adapted well, but needs to keep adapting: huge amounts of data, environment changing continuously... Economics, incentives, sociology, policy, politics... But our community is doing well and can play an important role in changing the world :)

QA

Q: One of the problems is privacy of data measurements and repeatability... Solutions to that or ways of moving forward?

A: More and more data is becoming available already... Will probably just keep moving in that direction

Q: You raised the question 'Why' do one thing or another. There is a corporate and a personal why, which are not necessarily aligned - is the friction good or bad? Are companies too inflexible to realise the need or is it beneficial for individuals to try and convince others that their research is important?

A: There is only one why and that is yours: what makes you passionate about a problem and what will help you navigate around all the issues? It is important to feel in place

Q: (Bala): AT&T research is not closed! Only changed the name and fired a few people.

Names and Domains

D-mystifying the D-Root Address Change (review) (short)

M. Lentz, D. Levin, J. Castonguay, N. Spring and B. Bhattacharjee (University of Maryland)

 

Understanding the Domain Registration Behavior of Spammers (review) (long)

S. Hao (Georgia Institute of Technology), M. Thomas (Verisign, Inc.), V. Paxson (ICSI/UC Berkeley), N. Feamster (Georgia Institute of Technology), C. Kreibich, C. Grier (ICSI) and S. Hollenbeck (Verisign, Inc.)

 

Domains are a valuable resource, but often abused by spammers. Any chance we could use domain registration information to predict spam before it happens rather than filter it at the time of use?

Typical lifecycle: 1-10 years of active time, an auto-renew grace period of 45 days, a redemption grace period of 30 days, pending delete of 5 days and then removal.
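The post-expiry arithmetic can be sketched with Python's `datetime`, assuming the phases run back to back from the expiry date with the durations quoted in the talk; registries and registrars may apply them differently in practice. The function name `lifecycle` is hypothetical.

```python
from datetime import date, timedelta

AUTO_RENEW_GRACE = timedelta(days=45)
REDEMPTION_GRACE = timedelta(days=30)
PENDING_DELETE = timedelta(days=5)


def lifecycle(expiry: date) -> dict:
    """Dates on which a non-renewed domain enters each post-expiry phase."""
    redemption_start = expiry + AUTO_RENEW_GRACE
    pending_delete_start = redemption_start + REDEMPTION_GRACE
    removed = pending_delete_start + PENDING_DELETE
    return {
        "auto_renew_grace": expiry,
        "redemption_grace": redemption_start,
        "pending_delete": pending_delete_start,
        "removed": removed,
    }
```

For a domain expiring on 1 March 2012, this gives redemption from 15 April, pending delete from 15 May and removal on 20 May 2012, i.e. 80 days after expiry, which is when the name becomes available for re-registration.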

For their research, they had to collect all the new domains in .com. They received the data from Verisign:

  • over 5 months (March-July 2012)
  • almost 13 million domains
  • 134k+ were blacklisted later

What registrars are used by spammers? 70% of spammer domains from only 10 registrars. These registrars only contributed 20% of total domains in .com. However, number of non-spam and spam domains is correlated, hence most spammers use the most popular registrars that also host non-spam domains - no simple solution with registrar control.

Do spammers register domains in groups? Taking 5-minute epochs, only 20% of spammer domains were registered individually, and most were registered in batches of 200 or more.
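A minimal Python sketch of the batching measurement, assuming each registration record carries a domain, a registrar name and a Unix timestamp. The 5-minute epoch is from the talk; grouping by registrar within an epoch is our assumption, and the function names are hypothetical.

```python
from collections import defaultdict

EPOCH_S = 5 * 60  # 5-minute epochs, as in the talk


def batches(registrations, epoch_s=EPOCH_S):
    """Group (domain, registrar, unix_ts) records into batches keyed by (registrar, epoch)."""
    groups = defaultdict(list)
    for domain, registrar, ts in registrations:
        groups[(registrar, ts // epoch_s)].append(domain)
    return dict(groups)


def batch_size_of(registrations, epoch_s=EPOCH_S):
    """Map each domain to the size of the batch it was registered in."""
    sizes = {}
    for members in batches(registrations, epoch_s).values():
        for domain in members:
            sizes[domain] = len(members)
    return sizes
```

Fixed epoch boundaries can split one real batch across two epochs, so the resulting batch sizes are a lower-bound view.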

How can we identify "abnormally large" registration batches?

Spammers commonly re-register just-deleted domains when registering in batches, but only 6.8% of these had previously been blacklisted.

Conclusion: registration patterns can help in distinguishing spammer domains, but they have not found a single accurate feature-set to do domain checking automatically.

Q: What about spelling variations and different zones?

A: Only focused on .com (no comment about spelling)

Q: Why are non-spammer domains registered in bulk?

A: Parking domains to attract traffic, different kinds of malicious use...

Q: Why do spammers register domains in bulk?

A: The hypothesis is bulk discounts and the availability of APIs.

 

On Measuring the Client-side DNS Infrastructure (review) (long)

K. Schomp, T. Callahan, M. Rabinovich (Case Western Reserve University) and M. Allman (ICSI)

The chair suggested this is a "perfect IMC paper" :)

DNS resolution path is complex and hidden: multiple layers of resolvers, controlled by different organisations - hard to say who's responsible when things go wrong...

They want to discover open resolvers (ODNS) and classify those that forward to recursive resolvers (RDNS) as forwarding DNS resolvers (FDNS).

RDNS servers don't typically respond to direct probing, so they have to be reached through FDNS resolvers. On the authoritative side they use CNAME chaining: RDNS resolvers frequently peer with other RDNS resolvers, and the authoritative server sees the requests.

They only probe their own domain to have minimal effects on the network.

ODNS are short-lived: 20% are available for less than 3 hours, so experiments must be run during discovery.

They measure FDNS caching using cache injection. Simply having the origin ask an FDNS for a domain makes it query an RDNS (which caches the answer too), so the result is ambiguous. When the same request is repeated straight afterwards, the FDNS no longer forwards it in 7%-9% of cases.

To probe RDNS they use coordinated probing of multiple FDNS servers simultaneously, relying on the fact that many FDNS servers share a small number of RDNS servers and that FDNS use pools of RDNS (80% of FDNS use 3 or more RDNS servers).

20% of FDNS are at least 1,000 miles from all of their RDNS servers, and 20% of FDNS have an RTT of 100ms or more to all of their RDNS. Some neat trickery on how the RTT is estimated; check the paper.

Caching is problematic (as shown before): TTLs are ignored or violated. To check this, they use majority reporting with coordinated probing. Small TTLs are commonly increased and long ones are decreased. In aggregate, 30% of records are evicted before their TTL expires, while 10% are retained for longer than their TTL.
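Once an effective cache lifetime has been measured for a record (for example, how long probes keep receiving the cached answer), classifying it is simple. In this Python sketch the function names and the tolerance that absorbs measurement jitter are hypothetical.

```python
def classify_ttl(advertised_ttl_s, observed_lifetime_s, tolerance_s=1.0):
    """Classify a cached record by how long the resolver actually kept it."""
    if observed_lifetime_s < advertised_ttl_s - tolerance_s:
        return "evicted early"
    if observed_lifetime_s > advertised_ttl_s + tolerance_s:
        return "retained past TTL"
    return "honoured"


def summarise(observations, tolerance_s=1.0):
    """Fraction of (advertised_ttl, observed_lifetime) pairs falling into each category."""
    counts = {"evicted early": 0, "retained past TTL": 0, "honoured": 0}
    for ttl, lifetime in observations:
        counts[classify_ttl(ttl, lifetime, tolerance_s)] += 1
    total = len(observations)
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}
```

Applied to a whole dataset, the "evicted early" and "retained past TTL" shares correspond to the 30% and 10% aggregate figures above.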

More "popular" RDNS discovered early in the scan are more likely to be honest - something to do with the fact that RDNS is discovered through FDNS, naturally. FDNS dataset may not be representative though.

Q: What exactly do you need to send to FDNS to do an injection?

A: Simply send a response to port 53 on the devices and they accept it - the transaction ID does not matter and they accept an unsolicited response just out of the blue, scarily!

Q: any particular manufacturer doing that?

A: Not a single one for sure, but it is hard to say. Most of the vulnerable DNS servers run the same embedded web server, but they are not sure whether that's a good indication.

Q: Local/home resolvers?

A: The home ones are likely the cause - they simply forward.

V4 and V6

Internet Nameserver IPv4 and IPv6 Address Relationships (review) (long)
A. Berger (MIT CSAIL/Akamai), N. Weaver (ICSI/UC San Diego), R. Beverly (Naval Postgraduate School) and L. Campbell (Larry Campbell)

Interested in finding associated IPv6 and IPv4 nameservers. The why: IPv6 geolocation, security (DoS).

One technique is passive and opportunistic, based on the DNS hierarchy: the first-level authoritative DNS server encodes the requesting resolver's IPv4 address in its IPv6 response. The IPv6 address seen next can then be logged for matching. It is a simple technique requiring almost no modifications, and it is used by Akamai.
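One way to picture the passive technique is the Python sketch below. The label format (`v4-a-b-c-d` as the leftmost label) and the helper names are hypothetical, not Akamai's actual encoding, and we assume the follow-up query reaches the second-level server over IPv6, so that its source address is the resolver's IPv6 address.

```python
import ipaddress

PREFIX = "v4-"  # hypothetical label prefix


def encode_v4(v4: str, zone: str) -> str:
    """Build a query name that carries the resolver's IPv4 address in its first label."""
    addr = ipaddress.IPv4Address(v4)
    return PREFIX + str(addr).replace(".", "-") + "." + zone


def decode_pair(qname: str, v6_source: str):
    """At the second-level server: recover the (IPv4, IPv6) pair for one resolver.

    Returns None if the query name does not carry an encoded address.
    """
    first_label = qname.split(".", 1)[0].lower()
    if not first_label.startswith(PREFIX):
        return None
    v4 = ipaddress.IPv4Address(first_label[len(PREFIX):].replace("-", "."))
    v6 = ipaddress.IPv6Address(v6_source)
    return str(v4), str(v6)
```

Each decoded pair is one observation; many such pairs over time feed the equivalence classes discussed next.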

Sometimes, although less commonly, the pairs are not 1-1. There are 674k pairs in Akamai's dataset. The equivalence classes (types of mappings) can become quite large, sometimes up to 100s of addresses. 34% are 1-1; aggregating the addresses by prefix (/64 for v6) raises this to ~50%.
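Building equivalence classes from observed (IPv4, IPv6) pairs amounts to finding connected components, which a small union-find does well. This Python sketch optionally collapses addresses to a prefix first (e.g. /64 for IPv6); the function names are hypothetical, and the IPv4 prefix option is included only for symmetry, since the talk mentions only /64 for v6.

```python
import ipaddress
from collections import defaultdict


def _aggregate(addr, prefix_len):
    """Replace an address by its enclosing prefix (e.g. /64), or keep it as-is."""
    if prefix_len is None:
        return addr
    return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))


def equivalence_classes(pairs, v4_prefix_len=None, v6_prefix_len=None):
    """Group observed (ipv4, ipv6) pairs into connected components (union-find).

    Returns a list of (set_of_v4, set_of_v6) tuples, one per equivalence class.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for v4, v6 in pairs:
        a = ("v4", _aggregate(v4, v4_prefix_len))
        b = ("v6", _aggregate(v6, v6_prefix_len))
        parent[find(a)] = find(b)  # union the two components

    classes = defaultdict(lambda: (set(), set()))
    for node in list(parent):
        v4s, v6s = classes[find(node)]
        (v4s if node[0] == "v4" else v6s).add(node[1])
    return list(classes.values())


def one_to_one_share(classes):
    """Fraction of equivalence classes with exactly one IPv4 and one IPv6 member."""
    if not classes:
        return 0.0
    hits = sum(1 for v4s, v6s in classes if len(v4s) == 1 and len(v6s) == 1)
    return hits / len(classes)
```

For example, one IPv4 address seen with two IPv6 addresses from the same /64 forms a single 1-to-2 class, but becomes 1-1 once IPv6 addresses are aggregated to /64. This mirrors how prefix aggregation raises the 1-1 share.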

 

Active probing technique: the glue in NS records alternates between AAAA and A. CNAMEs encode the source addresses, and the final response to the prober contains a TXT record with a nonce, the 3 addresses from the CNAME chain plus the address from the final query.

Validation: starting with the passive dataset, they determined the open resolvers with v6 access: 5,300 IPv4 and only 1,700 IPv6. Addresses in a 1-1 equivalence class: 87%. More probes reduce this proportion with this technique though (reasons discussed in the paper), but the changes become marginal after a certain number of probes.

For validation, contacted 6 network operators; 3 responded that all the 1-1 associations were indeed the same machine.

Akamai uses the technique to bootstrap IPv6 geolocation

Q: 1000s in your dataset and millions in the previous talk - why is that the case?

A: Restricted size of the dataset to begin with, but what significantly cut down the proportion was requirement for IPv6

Q: geoip - techniques that give you finer-grain info?

A: further aggregation, but very importantly need more sophisticated filtering and data validation

 

Understanding IPv6 Internet Background Radiation (review) (long)
J. Czyz, K. Lady, S. Miller, M. Bailey (University of Michigan), M. Kallitsis and M. Karir (Merit Network)

 

Speedtrap: Internet-Scale IPv6 Alias Resolution (review) (short)
M. Luckie (CAIDA/UC San Diego), R. Beverly, W. Brinkmeyer (Naval Postgraduate School), and K. Claffy (CAIDA/UC San Diego)

How is the router-level structure of IPv6 internet evolving with deployment? Need traceroute and alias resolution, but so far no internet-scale alias resolution technique.

Speedtrap uses the IP-ID to fingerprint IPv6 routers, trying to send the minimum number of packets given the lack of counter velocity. Technique: it obtains an ID field by sending a router an ICMP Packet Too Big (PTB) message with an MTU field smaller than the size of the packets solicited from it... Details in the paper.
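Speedtrap's exact test is in the paper. The Python sketch below only illustrates the general idea behind IP-ID-based alias resolution, as a simplified monotonicity check: if two addresses share one ID counter, their IDs, merged in the order the replies were received, should form a strictly increasing sequence. It assumes samples are (receive_time, id) pairs from interleaved probing and ignores counter wrap-around. The function name is hypothetical.

```python
def may_be_aliases(samples_a, samples_b):
    """True if the ID samples of two addresses are consistent with one shared counter.

    samples_*: lists of (receive_time, ip_id) tuples collected by interleaved probing.
    Counter wrap-around is ignored in this simplified sketch.
    """
    merged = sorted(samples_a + samples_b)  # order all replies by receive time
    ids = [ip_id for _, ip_id in merged]
    return all(earlier < later for earlier, later in zip(ids, ids[1:]))
```

A False result rules out a shared counter, while True only means the addresses may be aliases; more interleaved samples make a coincidental pass less likely.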

 

Sorry about the lack of detail in the very last session... Anyway, done for today!