Monday, 18 November 2013

How the Nephew of Computer Science Royalty Remade Twitter

Sam Ritchie wasn't trained as a programmer. He was a paddler on the U.S. Sprint Kayak team, reaching the pinnacle of this niche sport at the 2009 World Championships. He was a math and science student who majored in mechanical and aerospace engineering at Princeton. It wasn't until after his uncle died, in late 2011, that he scaled the heights as a coder, creating one of the key tools used to build the burgeoning web empire that is Twitter.


There was something rather poetic about this transformation. His uncle, you see, was Dennis Ritchie, one of the most important software developers in the history of computing. At Bell Labs in Murray Hill, New Jersey, not far from Princeton, Dennis Ritchie created the C programming language, still one of the most popular languages on earth, and together with Ken Thompson, he built the UNIX operating system, the basis for every Apple computer, tablet, and phone sold today - not to mention a world of Linux machines and Android devices.


'I accepted the soul of Dennis Ritchie,' Sam Ritchie says, in his typically playful way. 'I was a terrible programmer, and then he passed.'


Sam joined Twitter just before his uncle died, and there, together with an ex-quantum physics professor named Oscar Boykin, he built something called Summingbird, a new-age development tool that lets even rather green programmers quickly and relatively easily construct software that rapidly analyzes massive amounts of online data. At Twitter - where about 5,700 tweets are posted every second - that's something pretty close to gold. Analyzing all that data is a way of understanding how the service works - and improving it - but it's also a means of targeting ads, the heartbeat of the company's business.


Summingbird is another milestone in the evolution of a new type of software that makes good use of the never-ending stream of information that comes tumbling off the internet with each passing second. Built mostly by the giants of the web, this software includes everything from Hadoop, a way of crunching data stored across dozens or even hundreds of machines, to tools like Twitter's Storm, which uses myriad machines to analyze newer data in near real-time, as it comes off the net.


What Summingbird offers is a way of building software and services that can tap both kinds of tools, both the massive 'batch processing' of Hadoop and the real-time analysis you get from Storm. 'Summingbird can describe logic that can run in real-time or on Hadoop or just on your laptop,' Boykin says. 'You can run it in all these different places without having to worry too much about each one, and you can then combine all the results.' That's not something we've seen before, and as companies move more and more towards real-time analytics, this sort of tool will become increasingly important.
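
To make Boykin's point concrete, here is a minimal Python sketch of the general idea: the job's logic is written once and handed to two different runners, one that processes a stored data set in a single pass and one that processes items as they arrive. This is not Summingbird's API. Every name in it (`extract_words`, `run_batch`, `run_streaming`) is hypothetical, and both runners are simple in-memory stand-ins for Hadoop and Storm.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List

# Hypothetical job logic, written once: split a tweet into lowercase words.
def extract_words(tweet: str) -> List[str]:
    return tweet.lower().split()

# Stand-in for a batch platform: consume the whole stored data set,
# then return a single final result.
def run_batch(tweets: Iterable[str],
              logic: Callable[[str], List[str]]) -> Counter:
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(logic(tweet))
    return counts

# Stand-in for a real-time platform: consume items one at a time and
# yield a snapshot of the running totals after each item.
def run_streaming(tweets: Iterable[str],
                  logic: Callable[[str], List[str]]) -> Iterator[Counter]:
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(logic(tweet))
        yield Counter(counts)  # copy, so earlier snapshots stay unchanged
```

Because both runners apply the same `extract_words` function, the final streaming snapshot matches the batch result for the same input. That is the property that lets one description of the logic run in more than one place.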


Not long after Ritchie and Boykin built Summingbird, a Twitter college intern named Wen-Hao Lue used the tool in building the company's new Headlines service, which quickly grabs links to news stories and webpages related to a particular tweet and then embeds them in the tweet itself. Headlines requires access to an enormous amount of processing power and data - data spread across thousands of Twitter servers and, in some cases, only just posted to the net - and with Summingbird, Lue, a relative novice in the coding world, could tap that power and data with unusual ease. Without Summingbird, he says, building Headlines was 'definitely not' a task he could have pulled off during a four-month internship.


The Odd Couple

After college, Sam Ritchie dabbled in programming, building stuff for the iPhone, and he eventually worked his way up to more ambitious online development. He wound up at Twitter when a company he was interviewing with, BackType, was acquired by the social networking outfit. BackType is where Storm was originally built, under the direction of a developer named Nathan Marz. After the acquisition, the tool became an integral part of Twitter's underlying infrastructure. It was a way of instantly analyzing stuff that was happening on the social network and feeding it to web 'dashboards' used by Twitter employees and ad partners.


'Think of the data available at Twitter as flows of data - garden hoses flying around,' Ritchie says. 'Storm is like a gold pan that helps you pull the good nuggets out.'


Like web giants such as Yahoo and Facebook, the company also crunched massive amounts of older data using Hadoop. But this was a slower process, and like Storm, Hadoop was a rather difficult thing to use, even for seasoned programmers. If you wanted to tap the immense power of either tool, you needed a certain expertise, and building something that tapped both was particularly difficult. But then Ritchie ran into Oscar Boykin.


Boykin had joined Twitter after a long career in physics. As it turns out, particle physicists are rather well suited to building the kind of massive, complex software that runs modern web services. Adrian Cockcroft, the director of cloud architecture at Netflix, is a physicist, as are Mike Miller and Alan Hoffman, the cofounders of big data outfit Cloudant. 'It's a very common thing. From physics and math in general into computer science - that's a constant flow,' Boykin says. 'Physicists are attracted to - or instilled with - the notion that they can probably solve any problem.'


On the surface, Boykin and Ritchie seem very different. The dark-bearded, 40-ish Boykin certainly has the air of a college professor, choosing his words carefully, while the blond, 20-something Ritchie is the unrestrained, talkative sort. But they have the kind of rapport where they complete each other's thoughts - and off-handedly make fun of their differences. When Boykin is asked to describe his background, Ritchie responds first. 'You have a lot to talk about, man,' he says.


'He's calling me old,' Boykin responds.


After meeting at Twitter, they quickly realized that they wanted to build the same thing. Having worked on systems that tapped into either Hadoop or Storm, they wanted to build a tool that would provide a common means of fashioning software and services that plugged into both at the same time.


Trail Philosophy

This became an obsession of sorts. Ritchie is now an ultramarathoner, and towards the end of a recent 100-mile race, Boykin, also a runner, joined him to help keep up his spirits - and talk about Summingbird. 'We were at mile 80, and we were talking about Summingbird,' Ritchie remembers. 'This woman says: "We've got a couple of trail philosophers out here."'


Together with a few other developers, they designed the tool in a matter of months. Basically, it's a coding library that lets you build a single piece of software that can crunch enormous amounts of stored data with Hadoop, and then, if you want to fold in newer data as that long analysis job is wrapping up, it can also hook into Storm. 'Hadoop is very reliable, but it's also a bit slow. This lets you also run stuff in real-time, getting up-to-the-millisecond results,' Boykin says. 'You don't have to worry about two sets of systems and the complex process of merging the two.'
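
The merging step Boykin describes can be illustrated with a small Python sketch. It assumes the job is a count, so partial results can simply be added together. The function names, the `(timestamp, tag)` event shape, and the `batch_cutoff` parameter are all hypothetical and are not taken from Summingbird itself.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[int, str]  # (timestamp, hashtag); an assumed, illustrative shape

# The counting logic, shared by the "batch" and "real-time" halves.
def count_hashtags(events: Iterable[Event]) -> Counter:
    counts: Counter = Counter()
    for _timestamp, tag in events:
        counts[tag] += 1
    return counts

# Split events at a cutoff time, count each half separately, then merge.
def merged_view(events: List[Event], batch_cutoff: int) -> Counter:
    older = [e for e in events if e[0] < batch_cutoff]   # already stored
    newer = [e for e in events if e[0] >= batch_cutoff]  # still arriving
    batch_counts = count_hashtags(older)      # stand-in for the slow, complete job
    realtime_counts = count_hashtags(newer)   # stand-in for the fast, recent job
    # Counts add up regardless of how the events were split between the halves.
    return batch_counts + realtime_counts
```

Since addition is associative and commutative, the merged result is the same wherever the cutoff falls, so in this sketch the caller never has to reconcile the two halves by hand.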


They called it Summingbird because most of Twitter's internal software tools carry names that play off the company's famous avian theme, and as is often the case at Twitter, they open sourced the tool, letting anyone outside the company use it for free. Some outsiders are already kicking the tires on it, including Tom White, a longtime Hadoop developer and user. He says Summingbird is still rough around the edges, but he certainly sees the need for this kind of hybrid 'big data' tool.


'You need to have an overall system that codifies the use of these [big data] systems,' he says. Spark, a sweeping software platform developed at the University of California at Berkeley, does both Hadoop-style batch processing and Storm-style real-time jobs. But it's not like Summingbird. It doesn't provide a means of merging results from those two worlds in the way Summingbird does.


Sam Ritchie is a free spirit. Chatting inside Twitter's office in early October, he wears a single flip-flop, the sort that long-distance runners like to wear. The other one broke, so he just started showing up at the office half-barefoot. The next time we talk to him, he has left Twitter, departing the day after its big IPO. He's moving to Colorado to build a website called Paddleguru.com, a return to the world of sprint kayaking. But whatever else he does, he has left his mark on the world of elite programming. His uncle would be proud.



Cade Metz is the editor of Wired Enterprise.

