ctrlb-decompose
Compress raw log lines into structural patterns with statistics, anomalies, and correlations. Turn millions of noisy log lines into a handful of actionable patterns.
How It Works
ctrlb-decompose runs a two-stage normalization and clustering pipeline that processes logs in a single streaming pass with a minimal memory footprint.
Timestamp Extraction
Strip & parse timestamps (ISO 8601, Apache, syslog, Unix epoch, etc.) into normalized <TS> markers with DateTime values.
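As a rough illustration of this step, the sketch below handles only the ISO 8601 case in Python. The function name `extract_timestamp`, the regular expression, and the choice to treat timestamps without an offset as UTC are assumptions made for the example, not details of ctrlb-decompose itself.

```python
import re
from datetime import datetime, timezone

# Assumed pattern: date, "T" or space, time, optional fraction, optional offset.
ISO_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2})(?:\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)

def extract_timestamp(line):
    """Replace the first ISO 8601 timestamp with <TS> and return (line, datetime)."""
    match = ISO_RE.search(line)
    if match is None:
        return line, None
    date, time, offset = match.groups()
    # Assumption: a missing offset is treated as UTC; fractional seconds are dropped.
    offset = "+00:00" if offset in (None, "Z") else offset
    parsed = datetime.fromisoformat(f"{date}T{time}{offset}")
    return line[:match.start()] + "<TS>" + line[match.end():], parsed

normalized, ts = extract_timestamp("2024-01-15T14:22:01Z ERROR db timeout")
print(normalized)  # <TS> ERROR db timeout
```

Other formats (Apache, syslog, Unix epoch) would each need their own pattern and parser in the same style.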
Drain3 Clustering
Tree-based similarity clustering groups logtypes into patterns. Differing tokens become <*> wildcards. Incremental — no second pass needed.
Statistics Accumulation
DDSketch quantiles (p50/p99), HyperLogLog cardinality estimation, top-k values, temporal bucketing, and reservoir-sampled example lines.
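To show why a quantile sketch needs no raw value storage, here is a minimal DDSketch-style structure in Python. It is a simplified sketch under stated assumptions: the class name `TinyDDSketch` and the 1% relative accuracy are chosen for the example, only positive values are supported, and the bucket-count cap that keeps a real DDSketch at a fixed size is omitted.

```python
import math

class TinyDDSketch:
    """Log-spaced buckets: each value is only counted, never stored."""

    def __init__(self, alpha=0.01):  # alpha = assumed relative accuracy
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = {}  # bucket index -> count
        self.count = 0

    def add(self, value):
        if value <= 0:
            raise ValueError("this sketch handles positive values only")
        # Bucket i covers the interval (gamma^(i-1), gamma^i].
        key = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[key] = self.buckets.get(key, 0) + 1
        self.count += 1

    def quantile(self, q):
        if self.count == 0:
            return None
        rank = q * (self.count - 1)
        running = 0
        for key in sorted(self.buckets):
            running += self.buckets[key]
            if running > rank:
                # Representative value of the bucket, within alpha of any member.
                return 2 * self.gamma ** key / (self.gamma + 1)

sketch = TinyDDSketch()
for latency_ms in range(1, 1001):
    sketch.add(latency_ms)
p50, p99 = sketch.quantile(0.50), sketch.quantile(0.99)
```

Here `p50` and `p99` land within about 1% of the true values (500 and 990) even though the sketch only holds bucket counts.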
Scoring & Correlation
Keyword-based severity (ERROR > WARN > INFO > DEBUG), temporal co-occurrence, shared variable correlation, and error cascade detection across patterns.
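The keyword-based severity ordering can be sketched as follows. This is an illustrative Python version that assumes exact, case-insensitive keyword matching; the function name `severity` and the matching rule are hypothetical, and the correlation and cascade detection steps are not covered here.

```python
import re

# Lowest to highest, following ERROR > WARN > INFO > DEBUG.
SEVERITY_ORDER = ["DEBUG", "INFO", "WARN", "ERROR"]

def severity(template):
    """Return the highest-ranked severity keyword in a template, or None."""
    words = set(re.findall(r"[A-Z]+", template.upper()))
    found = [level for level in SEVERITY_ORDER if level in words]
    return found[-1] if found else None

print(severity("<TS> WARN retry after ERROR from upstream"))  # ERROR
```

Because matching is on whole words, a variant such as `WARNING` would need its own alias in this sketch.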
Stage 1 — CLP Encoding
CLP (Compact Log Pattern) encoding normalizes variable tokens into typed placeholders, so structurally identical lines produce identical logtypes regardless of the actual values.
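A minimal Python sketch of the idea, assuming whitespace tokenization: tokens that look like values are swapped for typed placeholders, and the values are kept alongside. The placeholder names `<int>`, `<float>`, and `<var>` and the function name `encode` are hypothetical; the actual encoding may use different markers and rules.

```python
import re

INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+\.\d+")

def encode(line):
    """Split a line into a logtype (static text + placeholders) and its variables."""
    logtype, variables = [], []
    for token in line.split():
        if INT_RE.fullmatch(token):
            logtype.append("<int>")
            variables.append(int(token))
        elif FLOAT_RE.fullmatch(token):
            logtype.append("<float>")
            variables.append(float(token))
        elif any(ch.isdigit() for ch in token):
            # Assumption: any other token containing a digit is a variable.
            logtype.append("<var>")
            variables.append(token)
        else:
            logtype.append(token)
    return " ".join(logtype), variables

a, _ = encode("user 42 logged in from 10.0.1.15 after 3.5 s")
b, _ = encode("user 7 logged in from 10.0.9.1 after 0.2 s")
print(a == b)  # True: both lines share one logtype
```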
Stage 2 — Drain3 Clustering
The Drain algorithm builds a prefix tree over logtypes and groups them by token similarity (configurable threshold, default 0.4). Where tokens diverge, the template gains a <*> wildcard.
This runs incrementally — each line is processed once with no second pass needed.
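The similarity test and wildcard merge can be illustrated with a simplified Python sketch. It assumes logtypes are compared only against templates with the same token count and scans them linearly; the prefix tree and LRU eviction are left out, and the class name `MiniDrain` is hypothetical. The 0.4 threshold is the default stated above.

```python
WILDCARD = "<*>"

def similarity(template, tokens):
    """Fraction of positions where template and tokens agree."""
    if not template:
        return 1.0
    same = sum(1 for a, b in zip(template, tokens) if a == b)
    return same / len(template)

def merge(template, tokens):
    """Keep matching tokens; replace diverging positions with a wildcard."""
    return [a if a == b else WILDCARD for a, b in zip(template, tokens)]

class MiniDrain:
    def __init__(self, threshold=0.4):
        self.threshold = threshold
        self.by_length = {}  # token count -> list of [template, line count]

    def add(self, logtype):
        """Process one logtype once and return the template it was assigned to."""
        tokens = logtype.split()
        clusters = self.by_length.setdefault(len(tokens), [])
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = similarity(cluster[0], tokens)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= self.threshold:
            best[0] = merge(best[0], tokens)
            best[1] += 1
        else:
            best = [tokens, 1]
            clusters.append(best)
        return " ".join(best[0])

drain = MiniDrain()
drain.add("connected to db-1 in <int> ms")
print(drain.add("connected to db-2 in <int> ms"))  # connected to <*> in <int> ms
```

Each call to `add` touches a line exactly once, which is what makes the approach incremental.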
Variable Classification
Extracted variables are classified into semantic types for richer analysis:
| Type | Example | Detection Method |
|---|---|---|
| IPv4 / IPv6 | 10.0.1.15 | CIDR pattern match |
| UUID | 550e8400-e29b-... | 8-4-4-4-12 hex format |
| Duration | 45ms, 3.2s | Numeric + time unit suffix |
| HexID | 0x1a2b3c | 4+ hex digits |
| Integer | 200 | Parses as i64 |
| Float | 3.14 | Contains ., parses as f64 |
| Enum | ERROR | Low cardinality (<=20 unique, top-3 >= 80%) |
| Timestamp | 2024-01-15T14:22:01Z | RFC 3339 pattern |
| String | anything else | Fallback |
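The table can be read as an ordered series of checks. The Python sketch below is one possible ordering, not the tool's implementation: it uses the standard `ipaddress` parser in place of a pattern match, requires a `0x` prefix for HexID, and accepts an assumed set of duration units. Enum is decided per variable slot across many values rather than per token, so it is a separate function.

```python
import ipaddress
import re
from collections import Counter

UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
RFC3339_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})"
)
DURATION_RE = re.compile(r"\d+(?:\.\d+)?(?:ns|us|ms|s|m|h)")  # unit list is assumed
HEX_RE = re.compile(r"0x[0-9a-fA-F]{4,}")  # assumes a 0x prefix
INT_RE = re.compile(r"[+-]?\d+")

def classify(token):
    """Return the semantic type of a single extracted value."""
    try:
        return "IPv4" if ipaddress.ip_address(token).version == 4 else "IPv6"
    except ValueError:
        pass
    if UUID_RE.fullmatch(token):
        return "UUID"
    if RFC3339_RE.fullmatch(token):
        return "Timestamp"
    if DURATION_RE.fullmatch(token):
        return "Duration"
    if HEX_RE.fullmatch(token):
        return "HexID"
    if INT_RE.fullmatch(token) and -2**63 <= int(token) < 2**63:
        return "Integer"  # fits in a signed 64-bit integer
    if "." in token:
        try:
            float(token)
            return "Float"
        except ValueError:
            pass
    return "String"

def is_enum(values):
    """Low cardinality: at most 20 unique values, top 3 covering at least 80%."""
    if not values:
        return False
    counts = Counter(values)
    top3 = sum(n for _, n in counts.most_common(3))
    return len(counts) <= 20 and top3 / len(values) >= 0.8

print(classify("45ms"), classify("0x1a2b3c"), classify("3.14"))
```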
Memory Efficiency
| Component | Approach |
|---|---|
| Drain3 clusters | O(k) with LRU eviction (default 10k max) |
| Quantiles | DDSketch: fixed ~200 bytes per numeric slot, no raw value storage |
| Cardinality | HyperLogLog++: ~200 bytes per high-cardinality variable |
| Examples | Reservoir sampling: bounded buffer per pattern |
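To illustrate how example lines stay bounded regardless of input size, here is a minimal reservoir sampler (Algorithm R) in Python. The class name `Reservoir`, its `capacity` parameter, and the optional `seed` are assumptions made for the example.

```python
import random

class Reservoir:
    """Keeps a uniform random sample of at most `capacity` items from a stream."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.seen = 0
        self.items = []
        self._rng = random.Random(seed)

    def offer(self, line):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(line)  # buffer not yet full: always keep
            return
        # Keep the new line with probability capacity / seen.
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = line

sample = Reservoir(capacity=5, seed=0)
for i in range(10_000):
    sample.offer(f"line {i}")
print(len(sample.items))  # 5
```

Memory stays proportional to `capacity`, not to the number of lines seen.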
Installation
macOS (Homebrew)
Debian / Ubuntu
Build from source
Usage & Options
Command Line Options

    ctrlb-decompose [OPTIONS] [FILE]

    Arguments:
      [FILE]           Log file path (reads stdin if omitted or "-")

    Options:
      --human          Human-readable output with colors (default)
      --llm            LLM-optimized compact markdown
      --json           Structured JSON output
      --top <N>        Show top N patterns (default: 20)
      --context <N>    Example lines per pattern (default: 0)
      --no-color       Disable ANSI colors
      --no-banner      Suppress header/footer
      -q, --quiet      Suppress progress messages
      -h, --help       Show help
      -V, --version    Show version

Output Formats
| Format | Flag | Best for |
|---|---|---|
| Human | --human (default) | Terminal investigation — colored, visual bars |
| LLM | --llm | Feeding into LLMs — compact, token-efficient markdown |
| JSON | --json | Programmatic consumption — structured, machine-readable |