Gorilla: A fast, scalable, in-memory time series database (2016)
Recorded: May 25, 2026, 7:57 a.m.
| Original | Summarized |
Gorilla: A fast, scalable, in-memory time series database – the morning paper Skip to content the morning paper Menu About Gorilla: A fast, scalable, in-memory time series database May 3, 2016July 27, 2017 ~ adriancolyer Gorilla: A fast, scalable, in-memory time series database – Pelkonen et al. 2015 In the 18 months prior to publication, Gorilla helped Facebook engineers identify and debug several such production issues. An important requirement to operating [these] large scale services is to accurately monitor the health and performance of the underlying system ad quickly identify and diagnose problems as they arise. Facebook uses a time series database to store system measuring data points and provides quick query functionalities on top. As of Spring 2015, Facebook’s monitoring systems generated more than 2 billion unique time series of counters, with about 12 million data points added per second – over 1 trillion data points per day. Here then are the design goals for Gorilla: Store 2 billion unique time series, identifiable via a string key To meet the performance requirements, Gorilla is built as an in-memory TSDB that functions as a write-through cache for monitoring data ultimately written to an HBase data store. To meet the requirements to store 26 hours of data in-memory, Gorilla incorporates a new time series compression algorithm that achieves an average 12x reduction in size. The in-memory data structures allow fast and efficient scans of all data while maintaining constant time lookup of individual time series. The key specified in the monitoring data is used to uniquely identify a time series. By sharding all monitoring data based on these unique string keys, each time series dataset can be mapped to a single Gorilla host. Thus, we can scale Gorilla by simply adding new hosts and tuning the sharding function to map new time series data to the expanded set of hosts. When Gorilla was launched to production 18 months ago, our dataset of all time series data inserted in the past 26 hours fit into 1.3TB of RAM evenly distributed across 20 machines. Since then, we have had to double the size of the clusters twice due to data growth, and are now running on 80 machines within each Gorilla cluster. This process was simple due to the share-nothing architecture and focus on horizontal scalability. The in-memory data structure is anchored in a C++ standard library unordered map. This proved to have sufficient performance and no issues with lock contention. For persistence Gorilla stores data in GlusterFS, a POSIX-compliant distributed file system with 3x replication. “HDFS, or other distributed file systems would have sufficed just as easily.” For more details on the data structures and how Gorilla handles failures, see sections 4.3 and 4.4 in the paper. I want to focus here on the techniques Gorilla uses for time series compression to fit all of that data into memory! Gorilla compresses data points within a time series with no additional compression used across time series. Each data point is a pair of 64-bit values representing the time stamp and value at that time. Timestamps and values are compressed separately using information about previous values. When it comes to time stamps, a key observation is that most sources log points at fixed intervals (e.g. one point every 60 seconds). Every now and then the data point may be logged a little bit early or late (e.g., a second or two), but this window is normally constrained. We’re now entering a world where every bit counts, so if we can represent successive time stamps with very small numbers, we’re winning… Each data block is used to store two hours of data. The block header stores the starting time stamp, aligned to this two hour window. The first time stamp in the block (first entry after the start of the two hour window) is then stored as a delta from the block start time, using 14 bits. 14 bits is enough to span a bit more than 4 hours at second resolution so we know we won’t need more than that. For all subsequent time stamps, we compare deltas. Suppose we have a block start time of 02:00:00, and the first time stamp is 62 seconds later at 02:01:02. The next data point is at 02:02:02, another 60 seconds later. Comparing these two deltas, the second delta (60 seconds), is 2 seconds shorter than the first one (62). So we record -2. How many bits should we use to record the -2? As few as possible ideally! We can use tag bits to tell us how many bits the actual value is encoded with. The scheme works as follows: Calculate the delta of deltas: D = (tn – tn-1) – (tn-1 – tn-2) The particular values for the time ranges were selected by sampling a set of real time series from production systems and choosing the ranges that gave the best compression ratios. Figure 3 shows the results of time stamp compression in Gorilla. We have found that about 96% of all time stamps can be compressed to a single bit. (i.e., 96% of all time stamps occur at regular intervals, such that the delta of deltas is zero). So much for time stamps, what about the data values themselves? We discovered that the value in most time series does not change significantly when compared to its neighboring data points. Further, many data sources only store integers. This allowed us to tune the expensive prediction scheme in [25] to a simpler implementation that merely compares the current value to the previous value. If values are close together the sign, exponent, and first few bits of the mantissa will be identical. We leverage this to compute a simple XOR of the current and previous values rather than employing a delta encoding scheme. The values are then encoded as follows: The first value is stored with no compression Store the control bit ‘0’ and then store the meaningful XOR’d value (i.e. 032) in the example above. If the meaningful bits do not fit within the meaningful bit range of the previous value, then store the control bit ‘1’ followed by the number of leading zeros in the next 5 bits, the length of the meaningful XOR’d value in the next 6 bits, and finally the meaningful bits of the XOR’d value. Roughly 51% of all values are compressed to a single bit since the current and previous values are identical. About 30% of the values are compressed with the control bits ’10’ with an average compressed size of 26.6 bits. The remaining 19% are compressed with control bits ’11’, with an average size of 36.9 bits, due to the extra overhead required to encode the length of leading zero bits and meaningful bits. Building on top of Gorilla The correlation engine calculates the Pearson Product-Moment Correlation Coefficient (PPMCC) which compares a test time series to a large set of time series. We find that PPMCC’s ability to find correlation between similarly shaped time series, regardless of scale, greatly helps automate root-cause analysis and answer the question “What happened around the time my service broke?”. We found that this approach gives satisfactory answers to our question and was simpler to implement than similarly focused approaches described in the literature[10, 18, 16]. To compute PPMCC, the test time series is distributed to each Gorilla host along with all of the time series keys. Then, each host independently calculates the top N correlated time series, ordered by the absolute value of the PPMCC compared to the needle, and returning the time series values. In the future, we hope that Gorilla enables more advanced data mining techniques on our monitoring time series data, such as those described in the literature for clustering and anomaly detection [10, 11, 16]. Summing up lessons learned the authors provide three further takeaways: Prioritize recent data over historical data. Why things are broken right now is a more pressing question than why they were broken 2 days ago. We found that building a reliable, fault tolerant system was the most time consuming part of the project. While the team prototyped a high performance, compressed, in-memory TSDB in a very short period of time, it took several more months of hard work to make it fault tolerant. However, the advantages of fault tolerance were visible when the system successfully survived both real and simulated failures. Share this: Share on LinkedIn (Opens in new window) Email a link to a friend (Opens in new window) Print (Opens in new window) Related Posted in Uncategorized DatastoresFacebookTime series Post navigation 16 thoughts on “Gorilla: A fast, scalable, in-memory time series database” Pingback: Finding surprising patterns in a time series database in linear time and space | the morning paper Pingback: Big Analytics Roundup (May 9, 2016) | The Big Analytics Blog Pingback: Towards parameter-free data mining | the morning paper Jay says: June 2, 2016 at 11:52 pm I would like to explore and evaluate.. I don’t find the software for download for installation.. Can you please let us know if this software is available for public to use and explore? Is it an open source? adriancolyer says: June 3, 2016 at 9:13 am Hi Jay, This is a Facebook internal system and as far as I know they have no plans to make it available outside their organisation. Regards, Adrian. misev says: December 23, 2016 at 7:00 am It’s been open-sourced now: https://github.com/facebookincubator/beringei/ Jay says: June 2, 2016 at 11:55 pm From where shall I download the software for evaluation? Pingback: End of Term, and the power of compound interest | the morning paper Pingback: The Morning Paper on Operability | the morning paper onon says: November 5, 2016 at 2:15 pm it seems very complicated to decompress the timestamp and value? Pingback: Kraken: Leveraging live traffic tests to identify and resolve resource utilization bottlenecks in large scale web services | the morning paper Pingback: So that was 2016 | the morning paper Pingback: Gorilla: A fast, scalable, in-memory time series database | the morning paper | What I learned Pingback: Chronix: Long term storage and retrieval technology for anomaly detection in operational data | the morning paper Pingback: BTrDB: Optimizing Storage System Design for Timeseries Processing | the morning paper Pingback: Observability - kubedex.com Comments are closed. Blog at WordPress.com. Reblog Subscribe Subscribed the morning paper Join 3,159 other subscribers
Sign me up Already have a WordPress.com account? Log in now.
the morning paper Subscribe Subscribed Sign up Copy shortlink Report this content View post in Reader Manage subscriptions Collapse this bar %d |
Gorilla is presented as a fast, scalable, in-memory time series database developed to manage monitoring data for large-scale services, exemplified by Facebook. The system was designed to provide rapid monitoring and diagnosis of system health by storing system measuring data points and offering quick querying capabilities. To meet the demands of Facebook's monitoring systems, which generated over two billion unique time series and processed massive data influx, Gorilla had stringent design goals, including storing two billion unique time series, handling insertion rates of up to seventy million data points per minute, retaining data for the last twenty-six hours, supporting up to forty thousand queries per second, and achieving read latencies under one millisecond. The system was architected as an in-memory time series database that functions as a write-through cache for data ultimately stored in an HBase data store, utilizing a share-nothing architecture to facilitate horizontal scalability through sharding based on unique string keys. To achieve the necessary performance, Gorilla incorporated a novel time series compression algorithm to fit the data within memory. For time stamps, the compression scheme exploits the observation that most logging occurs at fixed intervals, using two-hour blocks where the first time stamp is stored as a delta from the block start time. Subsequent time stamps are then encoded by comparing successive deltas, allowing the system to record the difference in a highly compressed manner. This approach successfully compressed approximately ninety-six percent of all time stamps down to a single bit. For the data values themselves, the method leverages the fact that values often do not change significantly between sequential data points, allowing the use of XOR operations instead of traditional delta encoding. This technique is further refined by examining the leading and trailing zero bits to determine the most efficient storage format for the compressed values, resulting in various compression ratios depending on the similarity of neighboring values. The operational success of Gorilla stemmed from its low-latency processing, which enabled the development of advanced tools built on top of the database, such as a correlation engine that calculates the Pearson Product-Moment Correlation Coefficient to identify relationships between time series, aiding in root-cause analysis. The authors emphasized three key lessons from the project: the necessity of prioritizing recent data over historical data when diagnosing issues, recognizing that read latency is paramount for practical application, and understanding that high availability is more critical than absolute resource efficiency. While the initial prototype focused on high performance and compression, achieving fault tolerance required additional development, demonstrating that building a reliable and fault-tolerant system is a crucial, time-consuming aspect of large-scale system design. |