Why is archiving old data a bad idea?
Mar 12, 2024
The idea of runaway expenses may give you the chills if you handle massive amounts of log data. The expense of storing your hot data is too high, and accessing your cold storage is too challenging. So now you have to make tradeoffs like reducing visibility into your older logs to save some bucks.
But what if you didn't have to make that tradeoff? What if you could query all your data at all times, even from cold S3 storage? This is exactly what CtrlB is here to help you with.
Why does hot storage cost so much?
Hot storage gives you excellent scalability, low-latency ingest, and efficient query performance. But these advantages have always come at a price. Most platforms rely on expensive hardware such as storage arrays and solid-state drives (SSDs) to achieve high performance. These options give you quick read and write times for your data, but the additional expense makes hot storage very costly.
When you take into account the massive volume of data that many businesses ingest on a daily basis, these costs increase even more. The default hot storage retention for many log management systems and observability platforms may be anywhere from three to thirty days. Some businesses ingest so many logs that, in order to save money, data must be moved to cold storage after only a few days. Additionally, some solutions require you to export your log data yourself because they won't keep it after the hot storage period ends, creating additional headaches and complexity.
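To make the arithmetic behind this tradeoff concrete, here is a minimal sketch of how hot storage cost grows with ingest volume and retention. The function name, the parameters, and every number in the usage example are hypothetical placeholders chosen for illustration; they are not real vendor prices or CtrlB figures, and the model ignores compression, replication, and indexing overhead.

```python
def hot_storage_monthly_cost(daily_ingest_gb: float,
                             retention_days: int,
                             price_per_gb_month: float) -> float:
    """Estimate the steady-state monthly cost of keeping logs in hot storage.

    All inputs are hypothetical placeholders, not real vendor prices.
    """
    if daily_ingest_gb < 0 or retention_days < 0 or price_per_gb_month < 0:
        raise ValueError("inputs must be non-negative")
    # At steady state, the hot tier holds one day's ingest per retained day.
    stored_gb = daily_ingest_gb * retention_days
    return stored_gb * price_per_gb_month


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    for days in (3, 30):
        cost = hot_storage_monthly_cost(100, days, 0.5)
        print(f"{days:>2} days retained -> {cost:.2f} per month")
```

Under this simplified model, cost scales linearly with the retention window, which is why shortening retention is often the first lever teams reach for.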
The problems with cold storage
Comparing, comprehending, and analyzing your data becomes considerably more difficult once it is placed in cold storage. "Those who cannot remember the past are condemned to repeat it," goes the adage. The same might be said of your data. What happens when your team faces a significant production incident that resembles one from six months or a year ago, but you cannot access the relevant data to compare and determine what went wrong? What happens if you are trying to decipher patterns in your data that point to recurring slowdowns, but cannot go far enough back in time to fully understand those patterns?
Large-scale, cyclical events are another common use case where cold storage presents challenges. Imagine you're running a yearly Black Friday sale. When all your data is in hot storage, you can compare large events in near real time and use those comparisons to support your business. When the data from the last major event is in cold storage, though, you can't make those connections fast enough to act on them. You lose valuable contextual information, such as what a customer bought or looked at last year, that you could connect to your most recent data. The customer who almost made a purchase last year might be a near-miss this year, too. Even worse, you'll never connect those dots and understand what happened.
In the end, having more data collected over a longer period of time improves its precision and accuracy, makes patterns easier to identify, offers useful context, and yields richer insights. If you put data into cold storage, where it is difficult to query and access, you run the risk of losing the information your business needs to succeed.
Introducing CtrlB
Since we store data on cheap S3 storage and let you query it directly, you don't have to worry about losing old data anymore. This gives you the following benefits:
- Make historical comparisons for deeper data insights. Compare today’s data against last week, last month, or last year—with no penalty for querying cold storage and no lost data.
- Eliminate the complexity of storage tiers. With tiered storage, either you or a platform handling the migration for you has to oversee the data transfer between tiers. To guarantee data integrity if the migration runs into problems, you must back up your data beforehand. You will also need additional storage capacity to accommodate duplicate data during migrations. Moving data between tiers is not a concern when all of your data is queryable in one place.
- Eliminate difficult decisions about data management. Working with tiered storage forces you to make difficult choices. When should the data be moved from hot to warm to cold? It's not just about the money; you also need to know which data is critical for operations. The more applications you have, the more difficult these choices become. Do you try to cut costs by shortening hot storage retention as your data volume and spending increase? What happens if your CFO says costs should be cut but your engineering teams insist on keeping the data? These choices can create conflict between teams and cause anxiety. With CtrlB, all your data stays queryable while remaining cost-effective like cold storage, so you'll have one less issue to worry about and more time to spend elsewhere.
If this sounds interesting, don't hesitate to reach out to us at support@ctrlb.ai