Logs as Data: Why CtrlB Treats Raw Logs Like a Data Lake

May 7, 2025

In today’s cloud-native world, logs have grown beyond simple debugging tools; they’re now a valuable source of deep operational insight. Yet most logging stacks still handle them as short-lived debugging artefacts. At CtrlB, we see things differently: logs are raw, untapped data, and we treat them with the same design principles you’d apply to a data lake.

This post explores our philosophy: why CtrlB treats raw logs like a data lake, how that choice shapes observability, and what it means for developers building and operating distributed systems.

The Problem with Traditional Log Stacks

In most traditional observability setups, logs are piped into a rigid pipeline: parse, index, and visualise, usually in that order. Popular tools enforce early schema binding, meaning engineers must decide upfront how their logs should look, what they should mean, and which ones are worth retaining.

This approach has serious drawbacks:

Over-indexing costs: Indexing everything up front buys query performance at the price of expensive storage and compute.
Inflexible logging: Logs are the richest source of truth, yet their value is often lost to rigid formats, filters, and early assumptions that discard what might matter later.
Lost context: Logs become decoupled from the services and traces they belong to.
Logs as an afterthought: Treated as transient, logs are often summarised or discarded quickly, missing opportunities for deeper insight.

Logs as a First-Class Data Source

Instead of rushing to discard logs, CtrlB treats raw logs as a long-term, queryable asset. Think of it like a data lake: logs are stored raw, preserved in full fidelity, and made available for ad hoc search or correlation later.

Schema-Less Search

We don’t enforce a schema on ingest. Logs can be stored as key-value pairs, structured JSON, or unstructured text. When you query, you define the structure you care about. This allows teams to evolve their log formats over time without breaking downstream tools.
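To make schema-on-read concrete, here is a minimal, self-contained sketch. It assumes three raw lines in different formats and a query that supplies its own field extraction for unstructured text; `parse_at_read_time` and `query` are hypothetical names for illustration, not CtrlB’s API or query language.

```python
import json
import re

# Raw lines kept exactly as received; no schema is applied on ingest.
RAW_LOGS = [
    '{"level": "error", "service": "checkout", "msg": "payment timeout"}',
    'level=info service=cart msg="item added"',
    "2025-05-07T10:15:00Z ERROR checkout payment gateway unreachable",
]

# key=value pairs, where a value is either "quoted text" or a bare token.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_at_read_time(line: str) -> dict:
    """Best-effort parse of one raw line, done only when a query runs."""
    try:
        parsed = json.loads(line)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    pairs = KV_PATTERN.findall(line)
    if pairs:
        return {key: value.strip('"') for key, value in pairs}
    # Unstructured text: expose the whole line as a single field.
    return {"message": line}


def query(raw_lines, where, text_pattern=None):
    """Filter raw lines using a structure chosen by the query, not the ingest path.

    `where` maps field names to expected values (compared case-insensitively).
    `text_pattern` is an optional regex with named groups that pulls fields out
    of unstructured lines for this query only; the stored lines are never modified.
    """
    hits = []
    for line in raw_lines:
        record = parse_at_read_time(line)
        if text_pattern and set(record) == {"message"}:
            match = re.search(text_pattern, line)
            if match:
                record.update(match.groupdict())
        if all(str(record.get(k, "")).lower() == str(v).lower() for k, v in where.items()):
            hits.append(record)
    return hits


hits = query(
    RAW_LOGS,
    where={"level": "error", "service": "checkout"},
    text_pattern=r"^\S+ (?P<level>\w+) (?P<service>\S+) (?P<msg>.*)$",
)
for hit in hits:
    print(hit["service"], "|", hit["msg"])
# checkout | payment timeout
# checkout | payment gateway unreachable
```

The JSON line and the plain-text line both match, even though only one of them was ever structured; the extraction rule lives in the query, so changing a log format only means changing the query.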

Service-Aware Context

Logs aren’t just blobs of text. They belong to services, requests, and traces. CtrlB tags logs with service context on ingest, allowing you to filter, correlate, and trace across your stack without manually pre-processing or stitching things back together.
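A minimal sketch of the idea, assuming the sender (for example, a log shipper) supplies the service name and, when available, a trace ID as metadata alongside each line. `LogRecord`, `ingest`, `lines_for_trace`, and `lines_for_service` are hypothetical names for illustration, not CtrlB’s implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    raw: str                        # the original line, stored verbatim
    service: str                    # attached at ingest from sender metadata
    trace_id: Optional[str] = None  # trace context, when the sender provides it


def ingest(raw: str, sender_metadata: dict) -> LogRecord:
    """Tag a raw line with service and trace context without altering the line."""
    return LogRecord(
        raw=raw,
        service=sender_metadata.get("service", "unknown"),
        trace_id=sender_metadata.get("trace_id"),
    )


def lines_for_trace(records, trace_id):
    """Every line from every service that belongs to one trace, in arrival order."""
    return [r for r in records if r.trace_id == trace_id]


def lines_for_service(records, service):
    """Every line emitted by one service, whether or not it carries a trace ID."""
    return [r for r in records if r.service == service]


store = [
    ingest("POST /checkout started", {"service": "api-gateway", "trace_id": "abc123"}),
    ingest("reserving 2 items", {"service": "inventory", "trace_id": "abc123"}),
    ingest("cache warmed", {"service": "inventory"}),
    ingest("payment timeout after 5s", {"service": "payments", "trace_id": "abc123"}),
]

print([r.service for r in lines_for_trace(store, "abc123")])
# ['api-gateway', 'inventory', 'payments']
print([r.raw for r in lines_for_service(store, "inventory")])
# ['reserving 2 items', 'cache warmed']
```

Because the context is attached at ingest rather than parsed out of the text, correlating one request across services is a plain filter, and the raw line itself is untouched.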

Pipeline Control (In Progress)

We’re building towards a world where teams can control how logs flow and how they are enriched or archived, without waiting on DevOps or rewriting config files. The idea is to offer seamless, centralised control over log pipelines, directly from the CtrlB control plane.

Data Lake, Not Junk Drawer

If you store everything raw, doesn’t that become a junk drawer?
Not if it’s searchable, organised, and connected. Just like a data lake, CtrlB provides:

Optimised full-text search over raw logs
Schema-less access to raw data, returning what you need at sub-second speeds
Correlation of logs with services and traces
Retention flexibility, letting teams decide what’s hot vs cold data
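To make the hot/cold idea concrete, here is a toy in-memory sketch. It assumes a simple age-based policy (`HOT_WINDOW`, a hypothetical seven-day setting) and that cold data stays directly searchable rather than needing to be restored first; in a real deployment the cold tier would sit on cheaper storage. `tier_for` and `search` are illustrative names, not CtrlB’s API.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)  # hypothetical per-team policy: last 7 days stay hot


def tier_for(ts: datetime, now: datetime, hot_window: timedelta = HOT_WINDOW) -> str:
    """Classify a log line as hot or cold purely by its age."""
    return "hot" if now - ts <= hot_window else "cold"


def search(logs, term, now):
    """Case-insensitive full-text search over raw lines in both tiers.

    Cold lines are scanned alongside hot ones, so nothing has to be
    restored from an archive before it can be queried.
    """
    term = term.lower()
    return [(tier_for(ts, now), line) for ts, line in logs if term in line.lower()]


now = datetime(2025, 5, 7, tzinfo=timezone.utc)
logs = [
    (now - timedelta(hours=2), "ERROR payments: card declined"),
    (now - timedelta(days=30), "ERROR payments: gateway timeout"),
    (now - timedelta(days=1), "INFO cart: item added"),
]
print(search(logs, "payments", now))
# [('hot', 'ERROR payments: card declined'), ('cold', 'ERROR payments: gateway timeout')]
```

The tier only decides where a line lives and what it costs to keep; it does not decide whether the line can be searched.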

You get the benefits of full fidelity without being locked into a parsing schema from day one.

Who Benefits from This?

Developers, who can search logs the way they think (by service, span, or trace) rather than guessing index formats.
SREs, who need long-term visibility into production behaviour without juggling brittle log parsers.
Security teams, who want raw logs preserved for long-term retention and kept searchable without rehydrating archived data or ballooning costs.
FinOps teams, who need to balance cost against visibility and avoid overpaying for rarely queried logs; cold-tier logs stay instantly accessible, with no surprise rehydration delays.

Why CtrlB Built It This Way

CtrlB was designed from the ground up for the realities of cloud-native, microservice-heavy systems. Logs are not an afterthought or a sidecar; they are central to understanding how your systems behave. We believe storing them as raw, queryable data, not just rendered dashboards, is the only way to unlock their true potential.

We’re not just building another logging tool. We’re building a new foundation for observability, one where data comes first and insights follow. We call it Log Native Observability.

Closing Thoughts

Logs are data. It’s time we treated them that way. By embracing the data lake approach, CtrlB enables teams to make logging more flexible, more affordable, and more useful. We’re rethinking what log storage and search should look like in a modern, distributed world, without forcing engineers to jump through hoops just to ask simple questions of their systems.

Key Takeaways:

Logs are a valuable data resource, not just debugging artefacts.
Schema-less, service-aware log storage unlocks flexibility and long-term value.
A data lake approach delivers better observability, lower costs, and richer insights.

If you’re tired of managing brittle log pipelines or throwing away data you might need later, CtrlB might just be what you’ve been looking for.

