Twitter needs to get creative
Nik Cubrilovic over at TechCrunch has a post up about why Twitter is having a hard time scaling:
Twitter is unique in that it needs to parse a large number of messages and deliver them to multiple recipients, with each user having unique connection.
Hmm, that sounds like something familiar… some other site I know.. somethi… Oh, wait… EMAIL.
Why doesn’t Twitter just set up a little mini internet-like SMTP-inspired network behind their load balancer, with the equivalent of SMTP servers handling a handful of Twitter users on each server, put a bunch of routers in between, and do it that way? I mean, isn’t Twitter just email with a bunch of functionality cut out? Isn’t posting on Twitter the same thing as sending a one-line email to a bunch of friends? And doesn’t everyone just read their tweets in the equivalent of a featureless inbox?
Forget about the whole one big database table idea that we think is mandatory for web apps. Just copy a distributed architecture that you know will work and will scale.
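To make the idea concrete, here is a minimal sketch of what "SMTP servers handling a handful of users each" might look like. Everything in it is an assumption for illustration: the class names (MailboxServer, Cluster), the CRC32-based routing, and the in-memory dictionaries standing in for per-server storage. It is not a description of how Twitter works or should be built.

```python
import zlib
from collections import defaultdict


class MailboxServer:
    """Hypothetical node that owns all state for a handful of users."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> set of followers
        self.sent = defaultdict(list)      # author -> messages they posted
        self.inbox = defaultdict(list)     # reader -> (author, text) pairs


class Cluster:
    """Hypothetical router layer: maps each user to one home server."""

    def __init__(self, n_servers):
        self.servers = [MailboxServer() for _ in range(n_servers)]

    def home(self, user):
        # A stable hash, so the same user always lands on the same server.
        index = zlib.crc32(user.encode("utf-8")) % len(self.servers)
        return self.servers[index]

    def follow(self, follower, author):
        # The follower list lives on the author's home server.
        self.home(author).followers[author].add(follower)

    def post(self, author, text):
        origin = self.home(author)
        origin.sent[author].append(text)  # the "sent items" copy
        # Fan out: one delivery per follower, each to that follower's server.
        for reader in sorted(origin.followers[author]):
            self.home(reader).inbox[reader].append((author, text))

    def timeline(self, user):
        # Reading touches only the reader's own home server.
        return list(self.home(user).inbox[user])


cluster = Cluster(n_servers=4)
cluster.follow("bob", "alice")
cluster.follow("carol", "alice")
cluster.post("alice", "hello world")
print(cluster.timeline("bob"))
```

The point of the sketch is that a post writes to the author's server plus one inbox per follower, and a read touches a single server, so no one table has to hold everything.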






I think there are some pretty big differences between Twitter and SMTP, most of which have to do with recalling or modifying system state.
For instance, an SMTP message, once delivered to a peer, can be forgotten about completely by the sender. Twitter, on the other hand, wants to remember every single message that gets sent by every user for all time, I suppose so that people can check the web site at any point in the future and see what someone sent at any point in the past. SMTP would croak, too, if it had to handle this.
A second difference having to do with system state is recalling the sometimes-large number of friends/recipients that messages need to be relayed to. With SMTP, the recipients of each message are literally written within the message itself: the onus is on the message writer, not the system, to remember whom to send a message to. Twitter takes this onus from the user and places it on its database.
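A rough sketch of that difference, using Python's standard email package for the SMTP side. The addresses, the FOLLOWERS table, and the two helper functions are made up for illustration; the table stands in for whatever database the system would actually use.

```python
from email.message import EmailMessage
from email.utils import getaddresses

# Email style: the writer names the recipients inside the message itself.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com, carol@example.com"
msg.set_content("Lunch at noon?")


def recipients_from_message(message):
    """Recover the recipient list from the message headers alone."""
    headers = message.get_all("To", [])
    return [address for _name, address in getaddresses(headers)]


# Twitter style: the post names no one, so the system has to remember
# who should receive it. This dict is a hypothetical stand-in for that state.
FOLLOWERS = {
    "alice@example.com": ["bob@example.com", "carol@example.com"],
}


def recipients_from_system(author, followers=FOLLOWERS):
    """Look up the recipient list in stored state, keyed by the author."""
    return followers.get(author, [])


print(recipients_from_message(msg))
print(recipients_from_system("alice@example.com"))
```

Both calls end up with the same two recipients; what differs is where the list has to be stored and who is responsible for keeping it.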
So I’m not really convinced you can replace all of the features of Twitter with something like SMTP, at least not without a lot of person-hours. But with just one or two people at Twitter working on infrastructure, such tasks are probably pie-in-the-sky compared with just keeping the database from catching on fire.
Leif,
I don’t buy it. Saving items in a “sent items” store on the user’s local server should be trivial. Almost all email applications do this. And the friends list can be stored locally too. There’s no need to compile friends lists across different users (i.e., who are the friends of BOTH Joe and Bob?) so that’s a trivial lookup.
You’re right that the number of recipients is sometimes large (tens of thousands) but only a few users have that many followers. What matters for scalability is the average number of recipients, which is probably pretty reasonable (~50 or so).
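As a back-of-the-envelope illustration of average versus worst case, here is a small calculation. The follower counts are invented for the example (999 users with 40 followers each and one user with 10,000); they are not real Twitter numbers, and fanout_stats is a hypothetical helper.

```python
def fanout_stats(follower_counts):
    """Return (mean, max, total) deliveries if every user posts once."""
    total = sum(follower_counts)
    mean = total / len(follower_counts)
    return mean, max(follower_counts), total


# Invented distribution: many ordinary users, one very popular one.
counts = [40] * 999 + [10_000]
mean_fanout, max_fanout, total_deliveries = fanout_stats(counts)
print(mean_fanout, max_fanout, total_deliveries)
```

With these made-up numbers the mean comes out just under 50 deliveries per post, even though the single largest fan-out is 10,000.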
I just don’t buy the “oh, it’s so complicated” argument. It’s not complicated. They just chose a shitty architecture to begin with because they didn’t think it through, and now they’re having a hard time because they have to do the transition while under a huge load and whilst trying to maintain uptime.
Not that I don’t sympathize with them… architecture is hard. I’m just pointing out that the architecture they need isn’t some crazy new thing. It’s been around a long time; they just didn’t realize they needed it.
Erik