An Avalanche of DNA

Well, Microsoft's research arm has come up with a new form of peer-to-peer filesharing called Avalanche, which is based on DNA coding. Most of you know how BitTorrent works, but for those of you who don't, I'll explain it.

BitTorrent basically works around the idea of, "Why have one big server send lots of people a file when everyone could send everyone else the parts of the file they have?" You split the file up into hundreds (or thousands) of chunks, send some of the chunks to each "peer", and the peers then swap the chunks they have with the peers that don't have them. Nifty, and it can be fast for files that have a lot of peers. But the main problem with this is: what happens if Peer X is the only peer that has chunk #283, and he goes offline?
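To make the chunk #283 problem concrete, here's a rough Python sketch. The peer names, the 300-chunk file and the helper functions are all made up for illustration; this is not how a real BitTorrent client tracks pieces, just the set arithmetic behind the problem.

```python
def available_chunks(peers):
    """Union of the chunk numbers held by every peer that is still online."""
    held = set()
    for chunks in peers.values():
        held |= chunks
    return held


def unobtainable(peers, total_chunks):
    """Chunk numbers that no online peer can supply."""
    return sorted(set(range(total_chunks)) - available_chunks(peers))


TOTAL_CHUNKS = 300  # hypothetical file split into 300 chunks

# Hypothetical swarm: between them the peers hold every chunk,
# but chunk 283 lives on peer_x and nowhere else.
swarm = {
    "peer_a": set(range(0, 150)),
    "peer_b": set(range(150, 283)) | set(range(284, 300)),
    "peer_x": {283},
}

before = unobtainable(swarm, TOTAL_CHUNKS)

# Peer X goes offline before anyone has copied chunk 283 from him.
still_online = {name: chunks for name, chunks in swarm.items() if name != "peer_x"}
after = unobtainable(still_online, TOTAL_CHUNKS)

print(before)  # []
print(after)   # [283]
```

Everyone left in the swarm can get 299 of the 300 chunks and nobody can finish the download.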

Microsoft's Avalanche uses algebra to get around this. Instead of splitting a file into lots of chunks and sending them as they are, it calculates lots of linear combinations of the chunks, and sends those. Each of these combinations contains information about the entire file, so once you've obtained enough of them (as many independent combinations as there are chunks), you can reconstruct the original file, no matter which peers they came from. It's kind of how DNA works.
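Here's a minimal Python sketch of the linear-combination idea, under some loud assumptions: the arithmetic is done modulo the small prime 257 (picked only because every byte value fits below it), the coefficients are chosen at random, and the names encode and decode are mine. It is not Avalanche's actual wire format or field choice, just the same algebra in miniature.

```python
import random

P = 257  # assumption for this sketch: all arithmetic is modulo this small prime


def encode(chunks, rng):
    """Build one random linear combination of all the chunks.
    Returns (coefficients, combination); the receiver needs both."""
    coeffs = [rng.randrange(P) for _ in chunks]
    width = len(chunks[0])
    combo = [sum(c * chunk[j] for c, chunk in zip(coeffs, chunks)) % P
             for j in range(width)]
    return coeffs, combo


def decode(packets, k):
    """Recover the k original chunks by Gauss-Jordan elimination mod P.
    Returns None if the packets do not yet contain k independent combinations."""
    rows = [list(coeffs) + list(combo) for coeffs, combo in packets]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None  # not enough independent combinations yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # modular inverse (Fermat's little theorem)
        rows[col] = [(x * inv) % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[i][k:] for i in range(k)]


data = b"Avalanche!!!"  # toy 12-byte "file"
k = 4                   # split into 4 chunks of 3 bytes each
size = len(data) // k
chunks = [list(data[i * size:(i + 1) * size]) for i in range(k)]

rng = random.Random(1)
packets = []
recovered = None
while recovered is None:  # keep collecting combinations until decoding works
    packets.append(encode(chunks, rng))
    recovered = decode(packets, k)

result = bytes(b for chunk in recovered for b in chunk)
print(result == data)  # True
```

The point to notice is that no single packet is special. Any peer can vanish, and as long as you can still collect k independent combinations from whoever is left, you get the whole file back.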

I was really quite impressed when I read about this. Nature has a lot of cool things about it that seem to work well when modelled in a computer (take neural networks for example). But the main reason we've had to come up with hacks like DNA coding to distribute files is bastard ISPs.

There's a very easy way to distribute a large file amongst a lot of users. It's called multicast. But a lot of ISPs choose to buy dodgy cheap routers which don't route multicast packets, or choose not to allow multicast because they don't know who to charge for it or how. Is it moral for an ISP to charge 500 people 20c/MB when they're only using the same amount of bandwidth as one user? Probably not. So you make each of those 500 people maintain an independent connection. Sure, it means you use 500 times the bandwidth, but at least billing is easy.
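The 500-times figure is easy to check with a back-of-the-envelope Python sketch. The 100 MB file size is a made-up number and the function name is mine; the 500 users and 20c/MB come from the paragraph above. It only counts the traffic leaving the sender, which is a simplification.

```python
def sender_traffic_mb(file_mb, receivers, multicast):
    """Traffic leaving the sender: one copy with multicast, one copy per receiver without."""
    return file_mb if multicast else file_mb * receivers


FILE_MB = 100            # hypothetical file size
RECEIVERS = 500          # from the example above
RATE_CENTS_PER_MB = 20   # from the example above

unicast_mb = sender_traffic_mb(FILE_MB, RECEIVERS, multicast=False)
multicast_mb = sender_traffic_mb(FILE_MB, RECEIVERS, multicast=True)
ratio = unicast_mb // multicast_mb

# What each user pays for their own copy, whichever way it was delivered.
per_user_cents = FILE_MB * RATE_CENTS_PER_MB

print(unicast_mb, multicast_mb, ratio)  # 50000 100 500
print(per_user_cents)                   # 2000
```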