Total Disclosure Benchmark Registry.

Complete benchmark suite. All results honest — including where Purgr loses.

CARD 1 — NIAH: 100% FACTUAL RECALL

The needle was recovered at all 7 scales from 10K to 1M tokens (100% recall). Lifecycle cadence: compress() called every 10 messages (PRODUCTION_BATCH_SIZE=10).

Scale | TRR | Final Call Latency | Needle
10k | 44.3% | 5.8ms | ✓ survived
25k | 57.8% | 24.5ms | ✓ survived
50k | 62.6% | 27.3ms | ✓ survived
100k | 69.5% | 50.8ms | ✓ survived
200k | 70.2% | 105.5ms | ✓ survived
500k | 71.4% | 318.4ms | ✓ survived
1M | 75.0% | 714.9ms | ✓ survived
TRR Progression (10k → 1M)
10k: ████░░░░░░ 44.3%
25k: ██████░░░░ 57.8%
50k: ██████░░░░ 62.6%
100k: ███████░░░ 69.5%
200k: ███████░░░ 70.2%
500k: ███████░░░ 71.4%
1M: ████████░░ 75.0%
Methodology
Lifecycle benchmark — compress() called every 10 messages simulating realistic production session cadence.
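The cadence described above can be sketched as a loop that appends messages one at a time and compresses the running history after every tenth message. This is a minimal illustration only: `run_lifecycle` and `toy_compress` are hypothetical stand-ins, not Purgr's API, and the toy compressor simply keeps the needle plus the five most recent messages.

```python
from typing import Callable, List, Tuple

PRODUCTION_BATCH_SIZE = 10  # cadence quoted in the methodology


def run_lifecycle(messages: List[str],
                  compress: Callable[[List[str]], List[str]],
                  batch_size: int = PRODUCTION_BATCH_SIZE) -> Tuple[List[str], int]:
    """Append messages one by one; compress the history every `batch_size` messages."""
    history: List[str] = []
    calls = 0
    for i, msg in enumerate(messages, start=1):
        history.append(msg)
        if i % batch_size == 0:
            history = compress(history)
            calls += 1
    return history, calls


def toy_compress(history: List[str]) -> List[str]:
    """Hypothetical stand-in: keep any needle message plus the 5 most recent messages."""
    recent = history[-5:]
    kept = [m for m in history[:-5] if "NEEDLE" in m]
    return kept + recent


# 95 filler messages with one needle planted early in the session.
messages = [f"filler {i}" for i in range(95)]
messages.insert(3, "NEEDLE: the launch code is 7421")

final_history, compress_calls = run_lifecycle(messages, toy_compress)
needle_survived = any("NEEDLE" in m for m in final_history)
print(compress_calls, len(final_history), needle_survived)
```

The point of the lifecycle shape is that the needle has to survive every intermediate compress() call, not just a single pass over the full transcript.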
CARD 2 — COMPETITIVE: PURGR vs LLMLINGUA-2

96% vs 75% NIAH. Zero dependencies vs transformer model required.

Comparison by Fixture (P = Purgr, L = LLMLingua-2)
Dollar Amt | P: ██████████ | L: ██████████
Deadline | P: ██████░░░░ | L: ██████░░░░
Person/Role | P: ██████████ | L: █████░░░░░
Version Str | P: ██████████ | L: ░░░░░░░░░░
Negation | P: ██████████ | L: ██████████
Compound | P: ██████████ | L: ██████████
Coding Early | P: ██████████ | L: not reported
Doc Middle | P: ██████████ | L: not reported
Plan Early | P: ██████████ | L: not reported
Plan Late | P: ██████████ | L: not reported
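Fixture | Purgr | LLMLingua-2
Dollar Amount | 100% | 100%
Deadline | 60% | 60%
Person/Role | 100% | 50%
Version String | 100% | 0%
Negation | 100% | 100%
Compound | 100% | 100%
Coding Early | 100% | not reported
Document Middle | 100% | not reported
Planning Early | 100% | not reported
Planning Late | 100% | not reported
Mean | 96% | 75%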
Tool Comparison
Tool | Mean NIAH | Avg TRR | Latency (10k tokens) | Dependencies
Purgr | 96% | 44.3% | 5.8ms | Zero
LLMLingua-2 | 75% | ~50% | 14,224ms | Transformer model
Latency measured on identical hardware. Purgr: final compress() call. LLMLingua-2: full pipeline including model inference.
Catastrophic Failure Detected
adv-4-version went from 0% to 100% after char n-gram Jaccard addition. LLMLingua-2 scores 0% on version strings.
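Why character n-grams help on version strings can be shown with a small sketch. It assumes whitespace tokenization for the word-level comparison and trigrams for the character-level one; neither choice is confirmed by the source, and the function names are illustrative only.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def char_ngrams(text: str, n: int = 3) -> set:
    """All contiguous character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def word_jaccard(x: str, y: str) -> float:
    return jaccard(set(x.split()), set(y.split()))


def char_jaccard(x: str, y: str, n: int = 3) -> float:
    return jaccard(char_ngrams(x, n), char_ngrams(y, n))


needle = "v2.14.3"
mention = "v2.14.3-rc1"

# As whole tokens the two strings share nothing; as trigrams they share 5 of 9.
word_score = word_jaccard(needle, mention)
char_score = char_jaccard(needle, mention)
print(word_score, round(char_score, 3))
```

A version string is a single opaque token to a word-level comparison, so any variation in suffix or punctuation drops the overlap to zero. Character n-grams keep the shared prefix visible.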
CARD 3 — O(N) SCALING

Linear scaling confirmed 10k to 1M tokens. 714.9ms at 1M on local CPU, no GPU.

Latency Growth vs Token Scale
10k: █ 5.8ms
100k: ███ 50.8ms
500k: ██████ 318.4ms
1M: ██████████ 714.9ms
Scale | TRR | Final Call Latency | Needle
10k | 44.3% | 5.8ms | ✓ survived
100k | 69.5% | 50.8ms | ✓ survived
1M | 75.0% | 714.9ms | ✓ survived
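One way to sanity-check the linear-scaling claim from the figures above is to estimate the scaling exponent k in latency ∝ N^k from two measurements. A value near 1 indicates linear growth and a value near 2 indicates quadratic growth. The sketch below uses only the three rows reported in the table; `scaling_exponent` is an illustrative helper, not part of the benchmark suite.

```python
import math

# (tokens, final-call latency in ms) copied from the table above
points = [(10_000, 5.8), (100_000, 50.8), (1_000_000, 714.9)]


def scaling_exponent(p0, p1) -> float:
    """Exponent k such that latency grows like N**k between two measurements."""
    (n0, t0), (n1, t1) = p0, p1
    return math.log(t1 / t0) / math.log(n1 / n0)


overall = scaling_exponent(points[0], points[-1])
per_decade = [scaling_exponent(a, b) for a, b in zip(points, points[1:])]
print(round(overall, 2), [round(k, 2) for k in per_decade])
```

On these numbers the overall exponent comes out at about 1.05, with per-decade values of roughly 0.94 and 1.15, which is far from the 2.0 a quadratic algorithm would show.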
Architectural Note: O(N²) Blowup Prevented
O(N) scaling behavior is confirmed. The 20-sample Jaccard boundary window caps pairwise evaluations on high-turn conversations, keeping per-message cost flat and avoiding memory degradation or processing blowup at massive scales.
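The effect of a bounded comparison window can be illustrated by counting pairwise evaluations. The sketch assumes each message is compared against at most the previous 20 messages; how Purgr actually applies its window is not specified in the source, so treat this as an illustration of the complexity argument only.

```python
from typing import Optional

WINDOW = 20  # the 20-sample boundary window quoted above


def pair_evaluations(num_messages: int, window: Optional[int] = None) -> int:
    """Count comparisons when message i is compared with earlier messages.

    With window=None every earlier message is compared (quadratic growth).
    With a window, at most `window` earlier messages are compared (linear growth).
    """
    total = 0
    for i in range(num_messages):
        total += i if window is None else min(i, window)
    return total


all_pairs_1k = pair_evaluations(1_000)            # 499,500
windowed_1k = pair_evaluations(1_000, WINDOW)     # 19,790
windowed_2k = pair_evaluations(2_000, WINDOW)     # 39,790
print(all_pairs_1k, windowed_1k, windowed_2k)
```

Doubling the message count roughly doubles the windowed total, while the unbounded total would roughly quadruple.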
CARD 5 — DETERMINISM: SAME INPUT, SAME OUTPUT

10 runs on identical input produce byte-identical compressed output and identical Merkle roots.

Msgs: 100% identical
Merkle: 100% identical
Runs: 10 tested
Latency: 6.6ms mean
Property | Result | Notes
Tokens | Identical | 1,206 tokens, all 10 runs
Content | Byte-identical | Zero content variance
Merkle Root | Identical | Receipt chain is reproducible
Signatures | Unique | Cryptographically correct
Reproducibility
Compression decisions are driven entirely by deterministic scoring — EWMA Jaccard overlap, Koopman operator matrix updates, and regex-based fact detection. No random sampling, no stochastic elements.
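A determinism check of this kind can be sketched as follows: run the compressor ten times on the same input, then confirm that the output bytes and the Merkle root never vary. The compressor here is a hypothetical deterministic stand-in, and the Merkle construction (SHA-256 leaves, last node duplicated on odd levels) is one common convention assumed for illustration, not Purgr's documented receipt format.

```python
import hashlib
from typing import List


def merkle_root(leaves: List[str]) -> str:
    """SHA-256 Merkle root; an odd node at any level is paired with itself."""
    level = [hashlib.sha256(leaf.encode("utf-8")).digest() for leaf in leaves]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


def toy_compress(messages: List[str]) -> List[str]:
    """Hypothetical deterministic stand-in: drop exact duplicates, keep order."""
    seen = set()
    out = []
    for m in messages:
        if m not in seen:
            seen.add(m)
            out.append(m)
    return out


session = ["deploy v1.0.2", "ok", "ok", "budget is $4,500", "deploy v1.0.2"]

outputs = set()
roots = set()
for _ in range(10):
    compressed = toy_compress(session)
    outputs.add("\n".join(compressed).encode("utf-8"))
    roots.add(merkle_root(compressed))

print(len(outputs), len(roots))
```

If either set ends up with more than one element, some step in the pipeline depends on something other than the input.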
Bug Fix Note: v1.0.2
Anchor IDs were previously generated using performance.now() timestamps, causing non-deterministic IDs. This was corrected in v1.0.2.
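The class of bug described here can be illustrated by contrasting a timestamp-derived ID with a content-derived one. The source does not say what replaced the timestamps in v1.0.2, so the content-hash scheme below is an assumption, and both function names are hypothetical.

```python
import hashlib
import time


def timestamp_anchor_id(index: int) -> str:
    """Non-deterministic: depends on when the call happens, like the pre-fix behaviour."""
    return f"anchor-{index}-{time.perf_counter_ns()}"


def content_anchor_id(index: int, content: str) -> str:
    """Deterministic: depends only on the message position and its content."""
    digest = hashlib.sha256(f"{index}:{content}".encode("utf-8")).hexdigest()[:12]
    return f"anchor-{index}-{digest}"


first = content_anchor_id(0, "budget is $4,500")
second = content_anchor_id(0, "budget is $4,500")
other = content_anchor_id(0, "budget is $4,600")
print(first == second, first == other)
```

Any ID that feeds into a hash or signature has to be a pure function of the input, otherwise byte-identical output across runs is impossible.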
CARD 4 — FACT FIDELITY: 143/143 CRITICAL FACTS PRESERVED

Post-compression deterministic verification. Every receipt shows how many critical facts survived.

Session | Tokens In | TRR | Facts | Preserved | Fidelity
Developer session (bigcon.txt) | 242,304 | 53% | 143 | 143 | 100%
Elon Musk interview | 25,749 | 67% | 6 | 6 | 100%
Yann LeCun interview | 34,591 | 92% | 1 | 1 | 100%
Grok technical session | 44,214 | 17% | 16 | 16 | 100%
Fact Categories
Facts include: currency values, dates, identifiers, regulatory citations, version strings. Verification has zero dependencies, is deterministic, and its result is recorded in the Ed25519-signed receipt.
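A post-compression fidelity check along these lines can be sketched with regular expressions: extract candidate facts from the original, then count how many still appear verbatim in the compressed text. The three patterns below cover only currency, ISO dates, and dotted version strings, and are illustrative assumptions rather than Purgr's actual detectors.

```python
import re
from typing import Set, Tuple

# Illustrative patterns only; the real fact categories are broader.
FACT_PATTERNS = {
    "currency": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "version": re.compile(r"\bv?\d+\.\d+\.\d+\b"),
}


def extract_facts(text: str) -> Set[str]:
    """Collect every substring matched by any fact pattern."""
    facts: Set[str] = set()
    for pattern in FACT_PATTERNS.values():
        facts.update(pattern.findall(text))
    return facts


def fidelity(original: str, compressed: str) -> Tuple[int, int]:
    """Return (facts preserved verbatim in `compressed`, facts found in `original`)."""
    facts = extract_facts(original)
    preserved = {f for f in facts if f in compressed}
    return len(preserved), len(facts)


original = ("Budget is $4,500.00, due 2025-03-14. Ship v1.0.2 after review. "
            "Lots of filler chatter here.")
good = "Budget $4,500.00 due 2025-03-14; ship v1.0.2."
lossy = "Budget due soon; ship v1.0.2."

good_score = fidelity(original, good)
lossy_score = fidelity(original, lossy)
print(good_score, lossy_score)
```

Because the check is a pure string comparison, it can run after compression without a model and gives the same answer on every run.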
CARD 6 — CLA LIVENESS: LOGICALLY NECESSARY MESSAGES RESCUED

NCD/LZ hash containment identifies messages Koopman would incorrectly compress. Tested on 242k-token developer session (bigcon.txt).

Configuration | TRR | Rescued | Why
DMD only | 53% | 0 | Baseline
DMD + Liveness | 51% | 4 messages | NCD/LZ identifies load-bearing messages
DMD + Liveness + Co-occurrence | 49% | 10 messages | Semantic signal rescues 6 additional messages
Why TRR Drops
TRR drops as more messages are correctly identified as load-bearing and preserved. Lower TRR here means better accuracy — not worse compression.
Configuration
Overhead: ~942ms on the 242k-token session. Off by default. Pro feature, enabled with enableLiveness: true.
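For readers unfamiliar with NCD, the standard normalized compression distance is NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The sketch below uses zlib as the compressor. The source gives no formula for its "NCD/LZ hash containment" check, so this shows the general idea under that assumption and is not Purgr's implementation.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length of the zlib-compressed payload at maximum compression level."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: lower means y adds little new information to x."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy, cxy = compressed_len(bx), compressed_len(by), compressed_len(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


earlier = ("The staging database lives on host db-stage-04 and the migration "
           "script must run before the API container is restarted on Friday.")
unrelated = ("Lunch options near the office include a ramen bar, two bakeries, "
             "and a vegetarian place that closes early on public holidays.")

same_score = ncd(earlier, earlier)
unrelated_score = ncd(earlier, unrelated)
print(round(same_score, 2), round(unrelated_score, 2))
```

A message whose content is largely contained in what later messages depend on yields a low distance, which is the kind of signal a liveness check can use to keep it.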
CARD 7 — VERIFYDOCUMENT(): HARD-FACT TRACEABILITY

Standalone stateless function. Proves which specific numbers, dates, and identifiers in an LLM response exist in the source document.

Category | Description
Grounded | Fact present verbatim in document
Derived | Fact mathematically computable from adjacent document facts
Ungrounded | ⚠ Not traceable to document; verify manually
Properties
Each compression decision is recorded, independently verifiable, and signed with Ed25519. Zero dependencies. Runs in browser. No model required.
Honest Scope
Proves token presence, not relational accuracy.
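The three categories can be illustrated with a small numeric-only sketch. It treats a number as Grounded if the same token appears in the document, Derived if it equals the sum, difference, or product of any two document numbers, and Ungrounded otherwise. The function name, the choice of arithmetic operations, and the relaxation of "adjacent" to "any pair" are all assumptions made for illustration; this is not the verifyDocument() implementation.

```python
import math
import re
from itertools import combinations
from typing import Dict

NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def classify_numbers(response: str, document: str) -> Dict[str, str]:
    """Label each number in `response` as grounded, derived, or ungrounded."""
    doc_tokens = set(NUMBER.findall(document))
    doc_values = [float(t) for t in doc_tokens]

    # Values reachable from any two document numbers by +, -, or *.
    derived_values = []
    for a, b in combinations(doc_values, 2):
        derived_values.extend([a + b, abs(a - b), a * b])

    labels: Dict[str, str] = {}
    for token in NUMBER.findall(response):
        value = float(token)
        if token in doc_tokens:
            labels[token] = "grounded"
        elif any(math.isclose(value, d) for d in derived_values):
            labels[token] = "derived"
        else:
            labels[token] = "ungrounded"
    return labels


document = "First-quarter revenue was 120 and second-quarter revenue was 150."
response = "Revenue grew from 120 to 150, a total of 270, beating the 400 forecast."

labels = classify_numbers(response, document)
print(labels)
```

Note what the sketch does not check: it would also mark "revenue fell from 150 to 120" as fully grounded, which is exactly the token-presence versus relational-accuracy limit stated above.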
CARD 8 — DIVERSE CONTENT: COMPRESSION ACROSS CONTENT TYPES

Purgr compresses any content type — not just developer conversations.

Content Type | Format | Messages | Tokens | TRR | Facts
Developer architecture session | claude-web-export | 617 | 242,304 | 53% | 143/143
Lex Fridman / Elon Musk interview | prose-transcript | 477 | 25,749 | 67% | 6/6
Lex Fridman / Yann LeCun interview | prose-transcript | 202 | 34,591 | 92% | 1/1
Grok technical conversation | prose-transcript | 560 | 44,214 | 17% | 16/16
Parser Note
prose-transcript parser handles: podcast exports, interview transcripts, raw chat logs, any continuous prose with no recognized speaker markers.

Testing Integrity.

All benchmarks are evaluated against verbatim production builds with no benchmarking cheats.

01 / PRODUCTION

Verbatim Builds.

Tests are evaluated on raw output from Purgr, run against unmodified dependencies.

02 / HONESTY

Known Limitations.

Pure synthetic vocabulary-drift NIAH scores 0% — no token overlap means no lexical signal. Co-occurrence matrix improves semantic survival on real multi-session content. Semantic paraphrase detection without embeddings remains an open research problem.
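The vocabulary-drift limitation is easy to reproduce in miniature. The sketch below assumes lowercase whitespace tokenization, which may differ from Purgr's tokenizer, and compares a needle against a paraphrase that shares no tokens with it.

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


needle = "the server password is tangerine"
near_copy = "the password is tangerine"
paraphrase = "our machine's login secret: an orange-like citrus"

near_score = token_jaccard(needle, near_copy)      # 4 shared tokens out of 5
drift_score = token_jaccard(needle, paraphrase)    # no shared tokens
print(near_score, drift_score)
```

With zero shared tokens, any purely lexical score is zero regardless of how the overlap is weighted, which is why this case needs a semantic signal.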

03 / REPRO

Source Included.

Fixture synthesis and NIAH run scripts are included unmodified in the deployed SDK source.