The Cloud Dilemma: Balancing Observability at Scale
Jun 25, 2025

Modern engineering teams face a fundamental tension: cloud observability promises deep insight, but achieving it often forces trade-offs that hamper visibility or agility unless you rethink the architecture. CtrlB’s unified observability data lake is designed to eliminate these trade-offs. It ingests any schema, stores data natively on cloud object storage, and runs hybrid-search queries via a serverless, massively parallel (MPP) query engine. We’ll walk through five of these dilemmas:
Complexity vs. Control
Observability solutions today sit between two extremes. Using a managed SaaS (e.g., Datadog, Azure Sentinel, Splunk Cloud) offers ease of use but locks you into hidden constraints and often incurs additional costs (e.g., egress, billing tiers). Running open-source stacks (Elastic, Loki, Prometheus, etc.) on your cloud gives control, but at the price of great complexity and maintenance overhead. In practice, teams find themselves burdened with managing indices, clusters, and pipelines just to keep the lights on. Self-hosting stacks like ELK on SSD drives up cost and complexity.
CtrlB takes a different path: all your telemetry data lives in cloud object storage (S3). We support any schema of observability, security, or analytics data with full SQL and hybrid search. In plain terms, you no longer need separate systems for logs, traces, services, or alerts; all data lands in one platform. In practice, that means fewer moving parts and more centralised control, without sacrificing the flexibility to ask any question of your data. CtrlB is more cost-effective than most SaaS observability vendors, and its total cost of ownership is significantly lower than self-hosting Elastic.
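To make the "one platform, any schema" idea concrete, here is a purely illustrative Python sketch (not CtrlB’s actual API; the function names are hypothetical): events with completely different shapes land in one store and are queried uniformly.

```python
import json

# One logical store for logs, traces, and alerts, regardless of schema.
store = []

def ingest(raw_json):
    """Schema-less ingestion: accept any JSON event and keep it as-is."""
    store.append(json.loads(raw_json))

# Events with entirely different shapes land in the same store.
ingest('{"type": "log", "level": "error", "msg": "payment timeout", "ts": 1718000000}')
ingest('{"type": "trace", "span_id": "a1b2", "duration_ms": 412, "ts": 1718000003}')
ingest('{"type": "alert", "severity": "high", "rule": "latency-slo", "ts": 1718000005}')

def query(predicate):
    """Uniform querying across all telemetry: one source of truth."""
    return [e for e in store if predicate(e)]

# Correlate an incident across modalities with a single query.
incident = query(lambda e: 1718000000 <= e["ts"] <= 1718000010)
print(len(incident))  # all three events fall in the window
```

The point of the sketch is the shape of the workflow, not the mechanics: no per-type backend, no schema migration before ingest.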
Flexibility vs. Fragmentation
Another key frustration in observability is that different data types (logs, traces, alerts, geospatial data, ML vectors, etc.) often live in disconnected silos. Best-of-breed tools or databases exist for each modality, but making sense of that data is a headache. Teams lose agility because correlating an event across logs can mean stitching data from two or more platforms. Industry analysts note that the observability stack is fragmented at most companies: traces and logs often live on different platforms and are hard to connect. This fragmentation forces either expensive engineering efforts or limited point solutions.
CtrlB’s answer is a hybrid search on a single engine. We ingest all telemetry into one unified store and support multiple search modalities on the same data. Hybrid Search combines different search modalities (full-text, analytical, time-series, vector) in one engine, letting you query and analyse data from a single source of truth. You are not limited to a homebrew query language. CtrlB fully supports ANSI SQL on any field, plus rich full-text lookups, without requiring data reshaping. This flexibility, all in one place, breaks the tradeoff between freedom and fragmentation.
For example, consider a security analyst investigating an incident: they might keyword-search logs, filter by a time-series anomaly window, and aggregate the results analytically. With CtrlB, they stay on one platform. Under the hood, we handle schema-less ingestion (any JSON or log schema) and micro-indexing for full-text fields. The alternative, wiring together three or four different datastores, simply disappears. No more stitching ELK with Prometheus and Jaeger: CtrlB gives you one unified observability fabric.
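The analyst’s hybrid query above can be sketched in a few lines of plain Python (again illustrative, not CtrlB syntax): three search modalities applied to the same records in a single pass.

```python
# Toy log records; in CtrlB these would live as Parquet objects on S3.
logs = [
    {"ts": 100, "latency_ms": 40,  "msg": "GET /health ok"},
    {"ts": 105, "latency_ms": 950, "msg": "upstream timeout contacting auth"},
    {"ts": 110, "latency_ms": 30,  "msg": "GET /health ok"},
    {"ts": 112, "latency_ms": 880, "msg": "retry after timeout"},
]

def hybrid_search(records, keyword, min_latency, t_start, t_end):
    """Combine full-text, analytical, and time-range predicates in one query."""
    return [
        r for r in records
        if keyword in r["msg"]              # full-text modality
        and r["latency_ms"] >= min_latency  # analytical modality
        and t_start <= r["ts"] <= t_end     # time-series modality
    ]

hits = hybrid_search(logs, keyword="timeout", min_latency=500, t_start=100, t_end=115)
print([h["ts"] for h in hits])  # [105, 112]
```

In a fragmented stack, each of those three predicates would typically run on a different system, and the join would happen by hand.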
Cost vs. Coverage
One of the hardest trade-offs is financial. Every gigabyte of telemetry can cost money, so many teams sample, filter, or drop data to keep budgets under control. The prohibitive cost of storing terabytes of telemetry data forces teams to sample data, creating blind spots in the system. For example, ingesting 700 GB/day (a moderate load in a busy cloud) costs roughly $734k/year in Azure Sentinel, $547k in QRadar, and over $500k in Splunk. Even at 100 GB/day, Splunk Cloud would be around $80k/year. And Datadog’s indexing model can exceed $3M/year for 100B events. In short, traditional vendors make unfiltered observability a luxury: organisations end up throwing away low-value logs or chopping retention from 90 days to 7 just to survive.
CtrlB’s architecture slashes those costs. We store logs in cost-effective Parquet files on S3, achieving 15–20x compression over raw log text. In practice, CtrlB’s storage footprint is a small fraction of the raw ingest volume.
These efficiencies translate into dramatic dollar savings. CtrlB can handle hybrid searches up to 10× faster while cutting costs by up to 90%. For example, the cost of retaining 30 TB/month (1 TB/day) for 90 days is only about $6,100 on CtrlB, versus roughly $55k on Elasticsearch or $67k on Splunk and hundreds of thousands on SaaS platforms. In other words, CtrlB lets you store and search all your telemetry (logs, traces, services) at near-unlimited scale for a fraction of the bill. By avoiding heavy indexing and using cheap object storage, we remove the need for data sampling and “cheaper tiers”. You get full coverage without breaking the bank.
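Here is the back-of-the-envelope storage math behind those numbers, as a hedged sketch. The compression ratio is the midpoint of the 15–20x range quoted above, and the per-GB rate is an assumed S3 Standard list price, not CtrlB pricing; the quoted totals also include query compute, which is why they are higher than raw storage alone.

```python
# Back-of-the-envelope storage math for 1 TB/day retained for 90 days.
# Assumptions: 18x compression (midpoint of the 15-20x claim) and an
# object-storage rate of roughly $0.023 per GB-month (S3 Standard list price).
raw_gb = 1_000 * 90                 # 90 TB of raw log text over the window
compression_ratio = 18
stored_gb = raw_gb / compression_ratio
s3_price_per_gb_month = 0.023

monthly_storage_cost = stored_gb * s3_price_per_gb_month
print(round(stored_gb))             # ~5000 GB actually sitting on S3
print(round(monthly_storage_cost))  # ~$115/month for storage alone
```

Even allowing generous headroom for compute and requests on top of this, the storage side of the bill is effectively noise compared with index-heavy architectures.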
Speed vs. Overhead
Traditional log platforms (like ELK) achieve low-latency results by pre-indexing every field and shard, which demands constant CPU usage and expensive SSD storage. On the flip side, “cheap” stores (like Loki on S3) minimize indexing but trade off query performance. This is the classic speed vs overhead dilemma: either you invest heavily in infrastructure or accept slower queries. CtrlB breaks this tradeoff with a compute-on-read, massively parallel architecture. Logs are indexed only once at ingest, and queries are executed by a fast, cloud-native query layer that dynamically spins up compute at scale. The result? High performance without high cost.
Instead of provisioning SSD-backed nodes or long-lived clusters, CtrlB spins up just enough compute when needed and tears it down when done. This model gives you fast access to log data at scale while keeping ongoing infrastructure costs low. In everyday use, engineers get interactive query speeds on typical workloads without over-provisioning, pre-processing, or vendor lock-in. You no longer have to choose between visibility and performance.
With CtrlB, you keep all logs searchable and accessible, stored cost-effectively on S3, and pay only for compute when queries are run.
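The compute-on-read model can be sketched with ephemeral workers in Python; this is a toy analogy (a thread pool standing in for on-demand serverless compute, with lists standing in for Parquet partitions on S3), not CtrlB internals.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "partitions": in CtrlB these would be Parquet objects on S3.
partitions = [
    [{"level": "error"}, {"level": "info"}],
    [{"level": "info"}],
    [{"level": "error"}, {"level": "error"}],
]

def scan(partition):
    """One ephemeral worker scans one partition; no long-lived cluster."""
    return sum(1 for row in partition if row["level"] == "error")

# Spin up just enough compute for this query, then tear it all down.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    error_count = sum(pool.map(scan, partitions))

print(error_count)  # 3
```

The key property is that the workers exist only for the lifetime of the query: between queries, the only standing cost is the storage itself.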
Scaling Without Sacrifice
Finally, as environments grow, observability must grow with them, without forcing painful choices. In traditional “shared-nothing” architectures, more data requires more nodes, more indexing, and more human ops. One overloaded Elasticsearch node can slow the entire cluster, and multi-tenant contention becomes a risk. The promise of cloud-native systems is agility and scale, but the reality often looks like spiralling overhead. As infrastructure grows, so do the layers of tooling needed to keep it running: autoscaling groups, container orchestrators, security scanners, observability stacks, storage tiers, log forwarders, cost optimisers, the list never ends.
For many teams, observability becomes yet another layer to manage, configure, and scale manually. Self-hosted solutions like ELK or Loki require careful tuning of shards, replicas, and retention policies. Add in pipeline management (FluentBit, Vector), tracing backends (Jaeger, Tempo), and you’re now maintaining a small fleet of observability infrastructure, often without a dedicated observability engineer. Even in SaaS setups, complexity doesn’t vanish. Now you're managing vendor billing dashboards, noisy alert thresholds, agent upgrades, and feature limitations tied to license tiers. Teams throttle data volume or delay queries to cope, essentially sacrificing responsiveness as they scale.
CtrlB’s decoupled, serverless design sidesteps this. Storage and compute are fully independent. You dump data into S3 and let our query service elastically scale Lambdas on demand. In decoupled architectures, more data doesn’t necessarily mean more nodes – you can scale compute independently of storage growth.
In practical terms, CtrlB handles modern loads gracefully. Whether you ingest 1 TB/day or 7 TB/day, the ingestion pipeline sustains the same per-second rate, and costs grow linearly only with storage (at pennies/GB). As data volumes increase, CtrlB’s query time increases only modestly, a scaling story most rivals cannot match without disproportionate overhead. This model lets you focus on what matters: understanding your systems, not maintaining yet another system to help you do that.
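The "costs grow linearly only with storage" claim reduces to simple arithmetic. The figures below are illustrative assumptions (the same 18x compression and ~$0.023/GB-month object-storage rate used earlier), not CtrlB pricing:

```python
# With decoupled storage and compute, ingest volume drives only the storage
# bill, and it drives it linearly. Rates below are assumed, not CtrlB pricing.
storage_price_per_gb_month = 0.023   # assumed object-storage list price

def monthly_storage_cost(ingest_tb_per_day, compression_ratio=18):
    stored_gb = ingest_tb_per_day * 1_000 * 30 / compression_ratio
    return stored_gb * storage_price_per_gb_month

# 7x the ingest means 7x the storage bill, and nothing more: no extra nodes,
# shards, or replicas to provision.
print(round(monthly_storage_cost(1), 2))
print(round(monthly_storage_cost(7), 2))
```

In a shared-nothing cluster, the same 7x jump would typically also mean more nodes, more shards, and more operational load; here it is a line item on a storage bill.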
Conclusion
Observability shouldn’t demand sacrifices. CtrlB’s unified data lake approach means you no longer juggle multiple tools or compromise on data fidelity. By storing everything in cheap object storage, indexing smartly (once, into compressed Parquet), and running queries on a massively parallel serverless engine, we eliminate the usual trade-offs in speed, cost, and flexibility. Teams get full coverage and sub-second analytics across vast datasets, all while cutting total cost of ownership by an order of magnitude.
In short, CtrlB turns these observability dilemmas upside-down. You gain simplicity and control, flexibility without fragmentation, coverage without ruinous cost, speed without heavy overhead, and seamless scale without compromise.
Ready to unify your telemetry? Reach out to see a live demo. Your cloud observability just became affordable and fast.