Finding the Needle.
CtrlB vs. ClickHouse, 8 Ways.

PUBLISHED:May 29, 2026

CtrlB vs ClickHouse benchmark comparison chart

ClickHouse documentation recommends using hasAnyTokens / hasAllTokens for full-text search against its text inverted index. We followed that guidance precisely and ran eight WHERE … LIMIT 100 queries against a 5 TB log table, comparing it with CtrlB on the same hardware.

99×

Fastest speedup
span_id point lookup

17×

Smaller index
16 GiB vs. 282 GiB

8/8

Queries won
CtrlB wins every pattern

Key Findings

CtrlB wins all 8 queries, from 2.2× to 99× faster, cold cache, same hardware
Index footprint: CtrlB 16.3 GiB vs. ClickHouse 282.4 GiB (17× gap)
CtrlB queries run in standard SQL with no additional syntax layer

ClickHouse is a capable analytical database with active documentation on optimizing full-text search using its inverted index functions. We followed those recommendations and set up both engines on identical hardware against the same 5 TB production dataset. Results are the p50 of 5 cold runs.

Test Environment

Component	Configuration
Dataset	5 TB structured Kubernetes log data, production workload
Table	`default.benchmark_5tb`
Hardware	16 vCPU instance, identical for both engines
Cache state	Cold cache at the start of every run
Measurement	p50 of 5 cold runs per query, wall-clock latency end-to-end
ClickHouse version	v26.2
ClickHouse indexes	`bloom_filter(0.01)` on `trace_id` & `span_id`; `text(tokenizer=splitByNonAlpha)` on `message`
CtrlB queries	Standard SQL with `LIKE` and equality predicates
ClickHouse queries	Written per ClickHouse documentation for full-text index usage

Table Schema & Index Setup

The ClickHouse table carries three secondary indexes representing the two index families ClickHouse provides for this workload.

ClickHouse

CREATE TABLE default.benchmark_5tb
(
  -- Timestamp & human-readable date
  `_timestamp`       UInt64,
  `ist_timestamp`    String DEFAULT '',

  -- Free-text log line (the haystack)
  `message`          String,

  -- Distributed tracing IDs (high-cardinality)
  `trace_id`         String DEFAULT '',
  `span_id`          String DEFAULT '',

  -- Kubernetes metadata (low/medium cardinality)
  `namespace_name`   LowCardinality(String) DEFAULT '',
  `container_name`   LowCardinality(String) DEFAULT '',
  `pod_name`         String DEFAULT '',
  `plugin_source`    LowCardinality(String) DEFAULT '',
  `stream`           LowCardinality(String) DEFAULT '',
  `bytes`            UInt64 DEFAULT 0,

  -- Full-text inverted index (splitByNonAlpha tokenizer)
  INDEX idx_message_text message
    TYPE text(tokenizer = splitByNonAlpha) GRANULARITY 100000000,

  -- Bloom filters for exact ID lookups (1% false-positive rate)
  INDEX idx_trace_id trace_id TYPE bloom_filter(0.01) GRANULARITY 1,
  INDEX idx_span_id  span_id  TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY (namespace_name, container_name, stream, _timestamp)
SETTINGS
  index_granularity      = 8192,
  min_compress_block_size = 65536,
  old_parts_lifetime     = 480

Sample log record

One representative row from the 5 TB Kubernetes log dataset used in this benchmark:

High cardinality Low / medium cardinality

Sample record
{
  "_timestamp": 1780257600000000,High
  "bytes": 17540,High
  "container_name": "rabbitmq-data-engine", Low
  "ist_timestamp": "01/06/2026, 01:30:00",High
  "message": "I, [2025-03-08T20:57:48.308159 #1]  INFO -- : [70b56038c39e2247e90f419c] [15dc74a5] [New relic Debug] - (job_started) - {:enqueued_at=>1741467082, :scheduled_at=>nil, :class=>\"TrackingCrawlerSqs\", :queue=>\"data_engine-high-processor\", :job_id=>\"70b56038c39e2247e90f419c\", :execution_id=>\"15dc74a5\", :execution_count=>1, :execution_at=>1741467468, :duration=>386, :status=>\"started\", :engine_name=>\"container_tracking\"} [batch:56e6d843-dee7-4160-8855-f3cbaeee41561162]", High
  "namespace_name": "production", Low
  "plugin_source": "k8s-fb-1.19.2", Low
  "pod_name": "ctrlbgo-app-rabbitmq-data-engine-79db6c66bc-gp6dl", High
  "span_id": "9ed343b6a933fc9a", High
  "stream": "stdout", Low
  "timestamp": "2026-05-31T19:59:59.065497790Z",High
  "trace_id": "9c3116fe0a2bbd33d9b1f4771f2868c4" High

}

The Two Index Families

idx_trace_id and idx_span_id use bloom_filter(0.01) with GRANULARITY 1. These are high-cardinality, near-unique ID columns where the entire point is exact-match lookup. A Bloom filter at granularity 1 is the textbook ClickHouse answer: one filter per MergeTree granule (~8,192 rows), letting ClickHouse skip granules that provably don't contain the value. The 0.01 false-positive rate is standard ClickStack tuning.

idx_message_text uses TYPE text(tokenizer = splitByNonAlpha) — ClickHouse's inverted full-text index on the free-text message column, tokenizing on non-alphanumeric boundaries. This is ClickHouse's strongest possible choice for substring search. The GRANULARITY 100000000 effectively covers the entire dataset in a single index entry, correct for a posting-list inverted index.

Ingestion Rate: 280 MB/s vs. 210 MB/s

Both engines ingested the same 5 TB dataset on identical hardware — a 16 vCPU VM — with sustained ingest measured during load. Storage was the only deliberate difference: CtrlB streams to object storage; ClickHouse wrote to local SSD (required for its MergeTree layout).

CtrlB · Ingestion rate

280 MB/s

~33% higher sustained ingest

16 vCPU VM · object storage (cold streaming)

ClickHouse · Ingestion rate

210 MB/s

Sustained ingest during load

16 vCPU VM · local SSD

Results: Wall-Clock Latency Per Query

8 queries · p50 of 5 cold runs · lower is better. ClickHouse queries use hasAnyTokens / hasAllTokens as recommended in the ClickHouse documentation for activating the text inverted index.

Query pattern	CtrlB	ClickHouse	Advantage
Find one `trace_id`	140 ms	7.1 s	50×
Find one `span_id`	92 ms	9.1 s	99×
Substring match, rare UUID	46 ms	721 ms	16×
Substring match, common token (`ERROR`)	33 ms	2.1 s	64×
Two filters: `container_name` + `trace_id`	107 ms	588 ms	5.5×
Filter + substring on `message`	180 ms	2.1 s	12×
Double substring on `message`, worst case	1.25 s	2.8 s	2.2×
High-cardinality equality on `pod_name`	3.6 s	24.5 s	7×

The 8 Queries

Each query was run against default.benchmark_5tb. CtrlB uses standard SQL. ClickHouse uses the syntax recommended in its documentation for full-text index coverage.

Query 1: Find One trace_id High-Cardinality Point Lookup

The most common operation in distributed tracing: pull all log lines for a given trace. Both engines use equality filters. ClickHouse has a Bloom filter on trace_id with GRANULARITY 1, the most aggressive resolution possible. At 1% FPR with thousands of granules across 5 TB, the Bloom filter generates hundreds of false-positive granule reads. Each false-positive requires decompressing the granule and doing a precise row-level check. That I/O and CPU cost accumulates to 7+ seconds. CtrlB's sparse index achieves near-zero false positives at this cardinality.

CtrlB

SELECT * FROM benchmark_5tb_rti_v17
WHERE trace_id = '2d3d5a83f6a923d9...'
LIMIT 100

Wall clock 140 ms

ClickHouse

SELECT * FROM default.benchmark_5tb
WHERE trace_id = '8ceafb5879a818a5...'
  AND _timestamp >= 1776617583000000
  AND _timestamp <= 1776842347000000
ORDER BY _timestamp DESC
LIMIT 100 FORMAT JSONEachROW

Wall clock 7.1 s · 50× slower

Query 2: Find One span_id High-Cardinality Point Lookup

Same pattern as trace_id, but on span_id. The gap widens to 99×.

CtrlB

SELECT * FROM benchmark_5tb_rti_v17
WHERE span_id = '07766058015caeea'
LIMIT 100

Wall clock 92 ms

ClickHouse

SELECT * FROM default.benchmark_5tb
WHERE span_id = 'fbf291e847a2ac5e'
  AND _timestamp >= 1776617583000000
  AND _timestamp <= 1776842347000000
ORDER BY _timestamp DESC
LIMIT 100 FORMAT JSONEachROW

Wall clock 9.1 s · 99× slower

Query 3: Substring Match, Rare UUID Full-Text / Wildcard Search

A LIKE '%<uuid>%' search on the message column for a low-frequency substring. CtrlB: 46 ms. ClickHouse with its text index: 721 ms.

CtrlB

SELECT * FROM benchmark_5tb_rti_v17
WHERE message LIKE '%9d46ccea-98bd-4cef-bbd5%'
LIMIT 100

Wall clock 46 ms

ClickHouse

SELECT * FROM default.benchmark_5tb
WHERE hasAnyTokens(message,
    'c8fedcbb-9a05-47a4-8dd8-f24f75291749')
  AND _timestamp >= 1776617583000000
  AND _timestamp <= 1776842347000000
ORDER BY _timestamp DESC LIMIT 100

Wall clock 721 ms · 16× slower

Query 4: Substring Match, Common Token Full-Text / Wildcard Search

Searching for 'ERROR', a high-frequency token, is the worst case for an inverted index: the posting list is enormous and provides almost no pruning benefit.

CtrlB

SELECT * FROM benchmark_5tb_rti_v17
WHERE message LIKE '%ERROR%'
LIMIT 100

Wall clock 33 ms

ClickHouse

SELECT * FROM default.benchmark_5tb
WHERE hasAnyTokens(message, 'ERROR')
  AND _timestamp >= 1776617583000000
  AND _timestamp <= 1776842347000000
ORDER BY _timestamp DESC LIMIT 100

Wall clock 2.1 s · 64× slower

Query 5: Two Filters: container_name + trace_id Low/Medium Cardinality Exact Filter

A compound predicate combining a low-cardinality equality filter with a high-cardinality ID lookup. CtrlB: 107 ms. ClickHouse: 588 ms (5.5× slower).

CtrlB

SELECT * FROM benchmark_5tb_rti_v17
WHERE container_name = 'sidekiq-data-engine'
  AND trace_id = '2d3d5a83f6a923d9...'
LIMIT 100

Wall clock 107 ms

ClickHouse

SELECT * FROM default.benchmark_5tb
WHERE container_name = 'controller'
  AND trace_id = '4f99282f17d604c3...'
  AND _timestamp >= 1776617583000000
  AND _timestamp <= 1776842347000000
ORDER BY _timestamp DESC
LIMIT 100 FORMAT JSONEachROW

Wall clock 588 ms · 5.5× slower

Query 6: Filter + Substring on message Full-Text / Wildcard Search

Filter by container_name, then search for 'INFO' in the message — the bread-and-butter observability query: narrow by service, then search logs.

CtrlB

SELECT * FROM benchmark_5tb_rti_v17
WHERE container_name = 'sidekiq-data-engine'
  AND message LIKE '%INFO%'
LIMIT 100

Wall clock 180 ms

ClickHouse

SELECT * FROM default.benchmark_5tb
WHERE container_name = 'demo-app'
  AND hasAnyTokens(message, 'INFO')
  AND _timestamp >= 1776617583000000
  AND _timestamp <= 1776842347000000
ORDER BY _timestamp DESC LIMIT 100

Wall clock 2.1 s · 12× slower

Query 7: Double Substring on message Full-Text / Wildcard Search

Two simultaneous LIKE conditions on message: 'INFO' AND 'timeout'. The most demanding full-text pattern in this benchmark. The gap narrows here because the intersection of posting lists for two common tokens genuinely prunes granules.

CtrlB

SELECT * FROM benchmark_5tb_rti_v17
WHERE message LIKE '%INFO%'
  AND message LIKE '%timeout%'
LIMIT 100

Wall clock 1.25 s

ClickHouse

SELECT * FROM default.benchmark_5tb
WHERE hasAnyTokens(message, 'INFO')
  AND hasAnyTokens(message, 'timeout')
  AND _timestamp >= 1776617583000000
  AND _timestamp <= 1776842347000000
ORDER BY _timestamp DESC LIMIT 100

Wall clock 2.8 s · 2.2× slower

Query 8: High-Cardinality Equality on pod_name High Cardinality Exact Filter

An equality filter on pod_name, which has very high cardinality (unique per pod deployment). Neither engine has a dedicated secondary index on this column; both must scan. CtrlB's storage layout and scan path are more efficient even without a secondary index.

CtrlB

SELECT * FROM benchmark_5tb_rti_v17
WHERE pod_name =
  'demo-app-sqs-a4544e758c-fe295'
LIMIT 100

Wall clock 3.6 s

ClickHouse

SELECT * FROM default.benchmark_5tb
WHERE pod_name =
  'ingress-nginx-controller-ddc1a51c61-f795c'
  AND _timestamp >= 1776617583000000
  AND _timestamp <= 1776842347000000
ORDER BY _timestamp DESC LIMIT 100

Wall clock 24.5 s · 7× slower

Index Footprint: 16 GiB vs. 282 GiB

The most significant infrastructure difference is the index size on the message column. ClickHouse requires a full-text inverted index to make substring search tractable. That index weighs 282.4 GiB. CtrlB's sparse index on the same column is 16.3 GiB, 17× smaller.

CtrlB · Sparse index

16.3 GiB

0.33% of raw data

Streams cold data from object storage. No local SSD required.

ClickHouse · Text inverted index

282.4 GiB

5.6% of raw data

Requires oversized local SSD to keep compressed MergeTree parts + ~250 GiB text index hot.

The ClickHouse text inverted index uses a Lucene-style approach: for every token produced by the splitByNonAlpha tokenizer, it maintains a posting list of which granules contain that token. At 5 TB of log data, that posting list becomes very large, at 5.6% of the raw data size, compared to CtrlB's 0.33%.

Storage cost is not included in the latency numbers, but it shapes every capacity decision once the dataset crosses a few terabytes. At 50 TB, the ClickHouse text index alone would approach 2.8 TB of additional local SSD.

Same CPU. Very Different Bill.

Both engines ran on a 16-vCPU instance. The CPU was identical. What differed was the attached storage requirement.

Attribute	CtrlB	ClickHouse
Data storage	Object storage (cold streaming)	Local SSD required
Index size (message col.)	16.3 GiB	282.4 GiB
Index as % of raw data	0.33%	5.60%
Index scaling with data	Sub-linear (sparse)	Linear with data volume

Architectural Differences

ClickHouse Index Characteristics

Lucene-style inverted indexes work well for document search: structured token vocabularies and read-heavy patterns over well-defined fields. Log observability presents a different access pattern:

High-cardinality point lookups on UUIDs. Even 1% FPR becomes expensive at 5 TB scale, producing hundreds of false-positive granule reads per query
Substring matches on freeform message content that may cross token boundaries
Compound filters mixing cardinality levels in a single query
Cold cache on every query, with datasets too large to keep hot

CtrlB Index Design

CtrlB's sparse index is built for the observability workload. It keeps a small footprint, reads directly from object storage, achieves near-zero false positives on ID lookups at any scale, and accepts standard SQL LIKE natively.

Conclusion

ClickHouse is a capable analytical database with strong text indexing support. This benchmark followed its documentation for full-text search configuration and ran both engines on identical hardware. The results reflect how each system handles the specific access patterns common in log observability workloads at scale.

CtrlB is purpose-built for this workload shape. Across all 8 queries, the results were consistent:

Performance

8 of 8 queries: CtrlB is faster, by 2.2× to 99×

Index size

17× smaller index: 16.3 GiB vs. 282.4 GiB on the message column

SQL compatibility

Standard SQL with no additional syntax requirements

Storage

No local SSD requirement. Streams cold data directly from object storage

Benchmark dataset

The 5 TB Kubernetes log dataset used in this benchmark is stored in a public S3 bucket. Click here to browse or download the data for reproducibility.

Finding the Needle.
CtrlB vs. ClickHouse, 8 Ways.

Test Environment

Table Schema & Index Setup

Sample log record

The Two Index Families

Ingestion Rate: 280 MB/s vs. 210 MB/s

Results: Wall-Clock Latency Per Query

The 8 Queries

Index Footprint: 16 GiB vs. 282 GiB

Same CPU. Very Different Bill.

Architectural Differences

ClickHouse Index Characteristics

CtrlB Index Design

Conclusion

Benchmark dataset

LATEST ARTICLES

Your LLM doesn't need to read the same log line 900,000 times

Choosing to Drop the Vendor Workspace and Going Headless Instead

One unified data lake vs. a fragmented stack

Take control of your observability

Seamless Integration

Real-time Analytics

Enterprise Grade

Submit Your Query

Finding the Needle.CtrlB vs. ClickHouse, 8 Ways.

Test Environment

Table Schema & Index Setup

Sample log record

The Two Index Families

Ingestion Rate: 280 MB/s vs. 210 MB/s

Results: Wall-Clock Latency Per Query

The 8 Queries

Index Footprint: 16 GiB vs. 282 GiB

Same CPU. Very Different Bill.

Architectural Differences

ClickHouse Index Characteristics

CtrlB Index Design

Conclusion

Benchmark dataset

LATEST ARTICLES

Your LLM doesn't need to read the same log line 900,000 times

Choosing to Drop the Vendor Workspace and Going Headless Instead

One unified data lake vs. a fragmented stack

Take control of your observability

Seamless Integration

Real-time Analytics

Enterprise Grade

Submit Your Query

Finding the Needle.
CtrlB vs. ClickHouse, 8 Ways.