[repost ]Kill The Telcos Save The Internet – The Unsocial Network


original:http://highscalability.com/blog/2011/11/10/kill-the-telcos-save-the-internet-the-unsocial-network.html

Someone is killing the Internet. Since you probably use the Internet every day, you might find this surprising. It almost sounds silly, and the reason is technical, but our crack team of networking experts has examined the patient and made the diagnosis. What did they find?

This is a classic story in a strange setting, the network, but the themes are universal: centralization vs. decentralization (that's where the telcos obviously come in), good vs. evil, order vs. disorder, tyranny vs. freedom, change vs. stasis, simplicity vs. complexity. And it's all being carried out on a battlefield few get to see: the infrastructure of the Internet.

Our emergency medics for this battle, in this free-flowing and wide-ranging podcast, pinpoint the problem: through IPv6 and telco domination we are losing the original beauty and simplicity of the Internet. In summary:

We've effectively turned the Internet into a place with a bunch of tunnels infected with many layers of translation points. This is not the Internet we were thinking of 20 to 30 years ago. We are stuffing IPv4 and IPv6 packets into a bunch of tunnels: MPLS, VPLS, PPPoE, PPPoA, etc. Just a bunch of tunnels going on. The IETF standards continually talk about tunnels: IPv6-over-IPv4 tunnels, carrier-grade NAT, 6RD, and even plain NAT.

In another post, Greg Ferro asks: Shouldn't we replace tunnels with routing and let the network be a network, not a bunch of tunnels over the backbone? The alternative to tunneling is routing, and that's what the Internet has always been about. Why isn't the industry going back to what the Internet was? Let's have IPv4 and IPv6 on routers and let's have public addresses on everything.

So there are lots of issues here, but the main themes are: 1) tunnels suck, and 2) centralization vs. decentralization.

What Are Tunnels?

First, what are these tunnel things? From networking expert Ivan Pepelnjak:

You can talk about tunneling when a protocol that should be lower in the protocol stack gets encapsulated in a protocol that you’d usually find above or next to it. MAC-in-IP, IPv6-in-IPv4, IP-over-GRE-over-IP, MAC-over-VPLS-over-MPLS-over-GRE-over-IPsec-over-IP … these are tunnels.
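To make that concrete, here is a minimal sketch, not from the post, using scapy to build a 6in4 packet: an entire IPv6 packet riding as the payload of an IPv4 packet. The addresses are documentation ranges.

```python
# A 6in4 tunnel in miniature: an IPv6 packet encapsulated in an IPv4 packet.
# Requires scapy (pip install scapy); addresses are RFC documentation ranges.
from scapy.all import IP, IPv6, ICMPv6EchoRequest

inner = IPv6(src="2001:db8::1", dst="2001:db8::2") / ICMPv6EchoRequest()
outer = IP(src="192.0.2.1", dst="198.51.100.1", proto=41) / inner  # protocol 41 = IPv6-in-IPv4

outer.show()  # routers in the core see only the outer IPv4 header
```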

Why Do Tunnels Suck?

The podcasters had a lot of reasons why tunnels aren’t a good design:

  1. MTU issues. Different networks have different packet sizes, so packets have to be fragmented and reassembled as they flow through the network. This process is complex, slow, and error prone (see the sketch after this list).
  2. Visibility. When packets are inside a tunnel you can’t apply your security policies to what is inside the payload.
  3. Load sharing. You can't reorder packets within a flow, and inside a tunnel all the traffic looks like a single flow, so you can't load-share it across multiple links (also shown in the sketch after this list).
  4. Suboptimal paths. Once you put a packet in a tunnel you can't react to network changes, so there's no guarantee your traffic is taking the optimal path. It could be taking the worst path, and you'd have no idea. Tunnel-based networks won't survive a catastrophe, whereas adaptive routed networks will.
  5. NAT. Why do we have to translate at all? We are introducing yet another layer of NAT, carrier-grade NAT (evil). NAT broke the security model based on unique IP addresses: the IP address is no longer a unique identifier for a user, so there's no way to identify who is doing the bad thing. All you can do is identify the organization the IP address came from. We are just playing a blame-shifting game.
  6. Complexity. Overlays, tunnels, and NAT bring complexity into the network, and that complexity creates failures in multiple modes and in many different ways.
  7. Centralization. More on this in the next section.
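A back-of-the-envelope sketch, mine rather than the podcast's, of points 1 and 3: each encapsulation layer eats into the 1500-byte Ethernet MTU, and because load sharing hashes the outer 5-tuple, every flow inside a tunnel collapses onto a single link. The header sizes are typical values and the hash function is illustrative.

```python
# Point 1: stacked encapsulations shrink the effective MTU.
# Typical header sizes in bytes; real deployments vary.
OVERHEAD = {"ipv4": 20, "ipv6": 40, "gre": 4, "mpls": 4}

def effective_mtu(layers, mtu=1500):
    """MTU left for the innermost payload after each tunnel layer takes its cut."""
    return mtu - sum(OVERHEAD[layer] for layer in layers)

print(effective_mtu(["ipv4", "gre"]))          # plain GRE tunnel: 1476
print(effective_mtu(["ipv4", "gre", "ipv4"]))  # IP-over-GRE-over-IP: 1456

# Point 3: routers load-share by hashing the (src, dst, sport, dport, proto)
# 5-tuple onto one of n links. Python's hash is stable within a single run.
def pick_link(five_tuple, n_links=4):
    return hash(five_tuple) % n_links

inner_flows = [("10.0.0.1", "10.0.9.9", 40000 + i, 443, "tcp") for i in range(8)]
print({pick_link(f) for f in inner_flows})  # distinct flows: typically several links

tunnel = ("192.0.2.1", "198.51.100.1", 0, 0, "gre")
print({pick_link(tunnel) for _ in inner_flows})  # tunneled: always the same single link
```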

Telephony Thinking Is Killing The Internet

Tunneling is about centralization; the Internet is about decentralization. Tunneling creates distributed state that must be tightly coupled to the core: if you want fast reroute on a failure, for example, you have to tightly couple the edges with the core.

You have to choose one or the other: centralization or decentralization. The core business of carriers should be bandwidth: transport IP, lay cable, and peer everywhere. Focus on delivering packets instead of layering on higher-margin complexity. More bandwidth everywhere. Stop running something over something else. Instead, move intelligence to the edge and keep the core simple, with fast switching.

The centralization push is coming from the carriers because complex services in the core are high-margin services. Layer 2 vMotion, for example, is another attempt to create a high-margin product through complexification and centralization.

And then the podcast wraps up with this call to action:

This telephony-style thinking, and the demands it places on vendors, is slowly killing the Internet. Carriers think telephony. They are trying to impose a telephony model on the Internet, which means centralization and complexity. What makes the Internet go around today is telcos connected to each other. We probably can't change telco thinking. Maybe it's time for the telcos to close down and go somewhere else.


[repost ]Paper: Feeding Frenzy: Selectively Materializing Users’ Event Feeds


original:http://highscalability.com/blog/2012/1/17/paper-feeding-frenzy-selectively-materializing-users-event-f.html

How do you scale an inbox that has multiple highly volatile feeds? That’s a problem faced by social networks like Tumblr, Facebook, and Twitter. Follow a few hundred event sources and it’s hard to scalably order an inbox so that you see a correct view as event sources continually publish new events.

This can be treated as a view-materialization problem in a database. In a database, a view is a virtual table defined by a query, and it can be accessed like a table. Materialization refers to when the data behind the view is actually computed. If a view is a join over several tables and the join is performed each time the view is accessed, performance will be slow. If the view is precomputed, access to it will be fast, but more resources are used, especially considering that the view may never be accessed.

Your wall/inbox/stream is a view on all the people/things you follow. If you never look at your inbox then materializing the view in your inbox is a waste of resources, yet you’ll be mad if displaying your inbox takes forever because all your event streams must be read, sorted, and filtered.
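Here is a minimal sketch, mine rather than the paper's, of the two extremes. Pull keeps writes cheap and pays at read time by merging every followed stream; push pays at write time by fanning each event out into every follower's materialized inbox.

```python
import heapq
from collections import defaultdict

events = defaultdict(list)   # producer -> [(timestamp, event)], appended in time order
inboxes = defaultdict(list)  # consumer -> materialized feed (push model)
follows = {"alice": ["bob", "carol"]}
followers = {"bob": ["alice"], "carol": ["alice"]}

def publish_pull(producer, ts, event):
    events[producer].append((ts, event))  # write is O(1)

def read_pull(consumer, k=10):
    # Read is expensive: merge every followed stream on demand.
    merged = heapq.merge(*(events[p] for p in follows[consumer]))
    return list(merged)[-k:]

def publish_push(producer, ts, event):
    for c in followers.get(producer, []):  # write is O(followers)
        inboxes[c].append((ts, event))

def read_push(consumer, k=10):
    return heapq.nlargest(k, inboxes[consumer])  # read is cheap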

What’s a smart way of handling the materialization problem? That’s what is addressed in a very good paper on the subject, Feeding Frenzy: Selectively Materializing Users’ Event Feeds, from researchers at Yahoo!, who found:

The best policy is to decide whether to push or pull events on a per producer/consumer basis. This technique minimizes system cost both for workloads with a high query rate and those with a high event rate. It also exposes a knob, the push threshold, that we can tune to reduce latency in return for higher system cost.

I learned about this paper from Tumblr’s Blake Matheny, in an interview with him for a forthcoming post. This is broadly how they handle the inbox problem at Tumblr. More details later.

Abstract from the paper:

Near real-time event streams are becoming a key feature of many popular web applications. Many web sites allow users to create a personalized feed by selecting one or more event streams they wish to follow. Examples include Twitter and Facebook, which allow a user to follow other users’ activity, and iGoogle and My Yahoo, which allow users to follow selected RSS streams. How can we efficiently construct a web page showing the latest events from a user’s feed? Constructing such a feed must be fast so the page loads quickly, yet reflects recent updates to the underlying event streams. The wide fanout of popular streams (those with many followers) and high skew (fanout and update rates vary widely) make it difficult to scale such applications.

We associate feeds with consumers and event streams with producers. We demonstrate that the best performance results from selectively materializing each consumer’s feed: events from high-rate producers are retrieved at query time, while events from lower-rate producers are materialized in advance. A formal analysis of the problem shows the surprising result that we can minimize global cost by making local decisions about each producer/consumer pair, based on the ratio between a given producer’s update rate (how often an event is added to the stream) and a given consumer’s view rate (how often the feed is viewed). Our experimental results, using Yahoo!’s web-scale database PNUTS, shows that this hybrid strategy results in the lowest system load (and hence improves scalability) under a variety of workloads.
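The per-pair decision rule, as I read the abstract, sketched in Python. The threshold value, the rate units, and the helper names (pull_latest, materialized) are illustrative assumptions, not details from the paper.

```python
def should_push(update_rate, view_rate, push_threshold=1.0):
    """Materialize this producer's events into this consumer's feed when the
    consumer views often relative to how often the producer publishes.
    Lowering push_threshold pushes more pairs: lower read latency, higher cost."""
    if update_rate == 0:
        return True  # a silent producer costs nothing to materialize
    return view_rate / update_rate >= push_threshold

def assemble_feed(consumer, follows, update_rates, view_rate,
                  materialized, pull_latest, k=10):
    """Merge the pre-materialized portion of the feed with query-time pulls
    from the high-rate producers, newest first."""
    items = list(materialized[consumer])  # (timestamp, event) pairs pushed earlier
    for p in follows[consumer]:
        if not should_push(update_rates[p], view_rate):
            items.extend(pull_latest(p, k))  # high-rate producer: fetch at read time
    return sorted(items, reverse=True)[:k]
```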
