Abstract:
Distributed storage is the backbone of many of the largest web sites. Traditionally these systems have been the province of web giants like Google, Yahoo, and Amazon, but a recent surge of open source systems has begun to make this technology available to a much broader user base. Among them is Project Voldemort, an open source distributed storage system originally developed at LinkedIn, whose design and implementation are covered here. Voldemort handles a large share of LinkedIn's traffic, serving thousands of requests per second over terabytes of data. It is a distributed key-value store that LinkedIn uses for highly scalable storage, and it is still under development. Voldemort is neither an object database nor a relational database. It does not try to support arbitrary relations or the ACID properties; rather, it is a big, distributed, fault-tolerant, persistent hash table. A 2012 study comparing systems for storing application performance management (APM) data reported that Voldemort, Cassandra, and HBase offered linear scalability in most cases, with Voldemort having the lowest latency and Cassandra the highest throughput.
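
To make the "distributed hash table" idea concrete, here is a minimal sketch in Python. It is not Voldemort's API or partitioning scheme: the class name `ToyDistributedStore`, its methods, and the modulo-based key placement are all hypothetical, and the "nodes" are in-memory dictionaries, so persistence, replication, and fault tolerance are not modeled at all.

```python
import hashlib


class ToyDistributedStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'.

    Hypothetical illustration only: persistence, replication, and failure
    handling are deliberately left out.
    """

    def __init__(self, num_nodes: int) -> None:
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        # Each node is a plain dict standing in for a storage server.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key: str) -> dict:
        # Hash the key with a stable hash (MD5, chosen arbitrarily) and
        # pick a node by modulo. This placement rule is an assumption.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, value: bytes) -> None:
        # Store the value on the single node that owns this key.
        self._node_for(key)[key] = value

    def get(self, key: str, default=None):
        # Look the key up on the node that owns it.
        return self._node_for(key).get(key, default)

    def delete(self, key: str) -> None:
        # Remove the key if present; a missing key is not an error.
        self._node_for(key).pop(key, None)
```

The interface is only `put`, `get`, and `delete` on single keys, which is what distinguishes a store like this from a relational database: there are no joins, no multi-key transactions, and every operation touches only the node that owns the key.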