Thursday, 03 April 2014

Twitter details its Manhattan real-time database


Twitter's service is nothing if not fast-moving, and on Tuesday night the company published a blog post detailing the database that helps it keep up. Called Manhattan, it's a distributed, real-time database built to serve multiple teams and applications within the company.


It's also something of an indictment of existing open source database technologies, at least when it comes to handling the scale and, probably more accurately, the speed of Twitter. The blog post was authored by Twitter software engineer Peter Schuller, who wrote:


'We were spending far too much time firefighting production systems to meet the performance expectations of our various products, and standing up new storage capacity for a use case involved too much manual work and process. Our experience developing and operating production storage at Twitter's scale made it clear that the situation was simply not sustainable.'


Schuller goes into a fair amount of detail about how Twitter built Manhattan to be reliable, consistent and easy to use, and also describes some of the data formats it's designed to handle. For now, users interact with Manhattan as a key-value store, but Twitter is looking to add other interfaces, including a graph-based capability. It consists of three storage engines, designed for read-only Hadoop data, write-heavy data and read-heavy data, respectively. It also has numerous services built in, including ones for importing Hadoop data, ensuring strong consistency and counting time-series data.
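
To make the single-interface, multiple-engine split concrete, here is a minimal sketch of a key-value front end that hands each dataset to the storage engine chosen for its workload. It is not Manhattan's API, which the post doesn't publish: `KeyValueService`, `ReadOnlyEngine`, `MutableEngine` and their methods are hypothetical names, and the write-heavy and read-heavy engines are collapsed into one in-memory stand-in because their internals aren't described here.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional, Tuple


class StorageEngine(ABC):
    """Hypothetical engine interface: every engine answers the same key-value calls."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...


class ReadOnlyEngine(StorageEngine):
    """Bulk-loaded once (e.g. from a batch job's output), then immutable."""

    def __init__(self, records: Iterable[Tuple[str, bytes]]):
        self._data: Dict[str, bytes] = dict(records)

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        # Updates are assumed to arrive as a fresh bulk import, not single writes.
        raise PermissionError("read-only dataset; reload it via a batch import")


class MutableEngine(StorageEngine):
    """Assumed stand-in for both the write-heavy and read-heavy engines."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class KeyValueService:
    """Single key-value interface; the engine is picked per dataset at creation."""

    def __init__(self) -> None:
        self._datasets: Dict[str, StorageEngine] = {}

    def create_dataset(self, name: str, engine: StorageEngine) -> None:
        self._datasets[name] = engine

    def get(self, dataset: str, key: str) -> Optional[bytes]:
        return self._datasets[dataset].get(key)

    def put(self, dataset: str, key: str, value: bytes) -> None:
        self._datasets[dataset].put(key, value)


# Usage: callers make the same get/put calls whatever engine sits underneath.
svc = KeyValueService()
svc.create_dataset("user_scores", ReadOnlyEngine([("alice", b"42")]))
svc.create_dataset("sessions", MutableEngine())
svc.put("sessions", "s1", b"active")
print(svc.get("user_scores", "alice"))  # b'42'
print(svc.get("sessions", "s1"))        # b'active'
```

Keeping the engine choice behind one interface means an application can move a dataset between engines without changing its read and write calls, which is the appeal of exposing Manhattan as a plain key-value store.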



Perhaps most importantly for developers and engineers, Manhattan is a storage service meant to be consumed just like any other cloud storage service. 'Engineers can provision what their application needs (storage size, queries per second, etc) and start using storage in seconds without having to wait for hardware to be installed or for schemas to be set up,' Schuller wrote. Twitter took great care to ensure that Manhattan's multitenancy (i.e., serving many teams and applications simultaneously) doesn't degrade performance for everyone else when one user hogs too many resources.
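
Per-tenant quotas are one common way to stop a single tenant from starving the rest. The sketch below assumes a token bucket per tenant, sized from a provisioned queries-per-second figure; the post as summarized here doesn't say Manhattan uses token buckets, and `TenantQuotas`, `provision` and `admit` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional


class TokenBucket:
    """Refills at `rate` tokens per second up to `capacity`; each request spends one."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float]) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new tenant can burst immediately
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TenantQuotas:
    """One bucket per tenant, so a noisy tenant only exhausts its own allowance."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def provision(self, tenant: str, qps: float, burst: Optional[float] = None) -> None:
        # Burst defaults to one second's worth of the provisioned QPS.
        capacity = burst if burst is not None else qps
        self._buckets[tenant] = TokenBucket(qps, capacity, self._clock)

    def admit(self, tenant: str) -> bool:
        # Unprovisioned tenants are rejected outright.
        bucket = self._buckets.get(tenant)
        return bucket is not None and bucket.try_acquire()


# Usage with a fake clock so the result is deterministic.
now = [0.0]
quotas = TenantQuotas(clock=lambda: now[0])
quotas.provision("timeline", qps=2)
quotas.provision("search", qps=100)
print([quotas.admit("timeline") for _ in range(3)])  # [True, True, False]
print(quotas.admit("search"))                         # True
now[0] += 0.5  # half a second later, "timeline" has earned one token back
print(quotas.admit("timeline"))                       # True
```

Because each tenant draws from its own bucket, the exhausted `timeline` tenant is throttled while `search` is still admitted; a shared bucket would instead let one busy tenant consume everyone's capacity.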


Twitter plans to release a technical paper at some point detailing even more about how Manhattan is built. Given the company's penchant for open source, it wouldn't be surprising if it open sourced Manhattan at some point as well. The company released its MySQL code in 2012, and recently contributed code to Facebook's WebScaleSQL open source project.


The mere presence of Manhattan speaks to the incredible and often unique needs of large web companies, but it's fair to wonder how long their current technologies will remain on the cutting edge. For a growing number of applications, companies like Twitter, Google, Facebook and LinkedIn seem to have moved on from the first batch of NoSQL technologies - which are now working their way into large enterprises - and are now building new systems, just as they built Cassandra, Voldemort and BigTable in the past. Maybe Manhattan will be tomorrow's Cassandra, and LinkedIn's Espresso the new MongoDB, for the next wave of startup developers looking to do something new.


By Derrick Harris
