Contamination and Honesty Checks

Picochat assumes small-model evals are easy to fool unless the pipeline makes cheating visible.

The honesty system is designed to answer four questions:

Did eval prompts leak into SFT?
Did eval answers leak into SFT?
Does the base corpus contain eval prompts or support phrases?
Is the generated answer copying the prompt, corpus, or training rows?

Checked Before Training

Preflight and data honesty inspection check:

exact prompt overlaps between SFT and eval
near prompt overlaps between SFT and eval
answer overlaps
corpus prompt hits
support phrase hits in corpus
SFT/eval category coverage
eval split coverage
unanswerable/boundary checks

Release profiles treat detected corpus contamination as a blocking failure.
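The exact and near prompt-overlap checks listed above can be sketched as follows. This is a minimal illustration, not Picochat's actual implementation: the function names, the word-trigram Jaccard measure, and the 0.6 threshold are all assumptions chosen for the example.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def shingles(text: str, n: int = 3) -> set:
    """Set of word n-grams; texts shorter than n words yield one shingle."""
    words = normalize(text).split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def find_prompt_overlaps(sft_prompts, eval_prompts, near_threshold=0.6):
    """Return (exact, near) lists of (sft_index, eval_index) pairs.

    near_threshold is a hypothetical cutoff for this sketch only.
    """
    eval_norm = [normalize(p) for p in eval_prompts]
    eval_sh = [shingles(p) for p in eval_prompts]
    exact, near = [], []
    for i, prompt in enumerate(sft_prompts):
        s_norm, s_sh = normalize(prompt), shingles(prompt)
        for j in range(len(eval_prompts)):
            if s_norm == eval_norm[j]:
                exact.append((i, j))
            elif jaccard(s_sh, eval_sh[j]) >= near_threshold:
                near.append((i, j))
    return exact, near
```

The pairwise loop is quadratic in the number of prompts, which is tolerable for small SFT and eval sets. A larger set would likely need an index such as MinHash, which this sketch does not attempt.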

Checked During and After Runs

Training and reports track:

train/validation gap
validation BPB
train-only canary reproduction
generated n-gram overlap
prompt echo
missing support
forbidden claims
unsupported answers
refusal behavior

The post-run gate uses these signals to decide whether to block a release.
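As a rough illustration of two of these signals, generated n-gram overlap and prompt echo, the sketch below scores how much of an answer is copied from a reference text and turns that into a list of blocking reasons. The function names, the 4-gram window, and the 0.5 thresholds are hypothetical and are not taken from Picochat.

```python
def word_ngrams(text: str, n: int) -> list:
    """Word n-grams of lowercased, whitespace-split text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_overlap(generated: str, reference: str, n: int = 4) -> float:
    """Fraction of the generated text's n-grams that also occur in reference."""
    gen = word_ngrams(generated, n)
    if not gen:
        return 0.0  # answer too short to form a single n-gram
    ref = set(word_ngrams(reference, n))
    return sum(g in ref for g in gen) / len(gen)

def release_blockers(prompt, answer, training_rows,
                     max_echo=0.5, max_train_overlap=0.5):
    """Return reasons to block; thresholds are illustrative assumptions."""
    blockers = []
    if ngram_overlap(answer, prompt) > max_echo:
        blockers.append("prompt_echo")
    if any(ngram_overlap(answer, row) > max_train_overlap
           for row in training_rows):
        blockers.append("training_row_copy")
    return blockers
```

An empty list means neither signal fired. A real gate would presumably combine these with the other tracked signals, such as canary reproduction and the train/validation gap.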

Why Regex Stream Scanning Exists

Large ClimbMix corpora are too big for naive phrase matching. Picochat uses a compiled word-boundary regex matcher over normalized stream chunks for corpus phrase scans. This avoids an expensive phrase-by-phrase inner loop and makes preflight practical on multi-billion-character corpora.
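A minimal sketch of the idea, assuming the corpus arrives as an iterable of text chunks: all phrases are compiled into one word-boundary alternation, and a short tail of each chunk is carried forward so that phrases split across a chunk edge are still found. The names `compile_matcher` and `scan_stream` and the normalization rule are assumptions for illustration, not Picochat's actual code.

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace runs to single spaces."""
    return re.sub(r"\s+", " ", text.lower())

def compile_matcher(phrases):
    """One regex for all phrases, longest first, plus the longest length."""
    norm = sorted({normalize(p).strip() for p in phrases} - {""},
                  key=len, reverse=True)
    if not norm:
        raise ValueError("no phrases to scan for")
    pattern = r"\b(?:" + "|".join(map(re.escape, norm)) + r")\b"
    return re.compile(pattern), len(norm[0])

def scan_stream(chunks, phrases) -> Counter:
    """Count phrase hits over a stream without loading it all at once."""
    matcher, keep = compile_matcher(phrases)
    hits = Counter()
    buf = ""
    lo = 0  # first buffer index at which a new match may start
    for chunk in chunks:
        buf = normalize(buf + chunk)
        hi = len(buf) - keep  # a match starting before hi is fully visible
        if hi <= lo:
            continue  # not enough text yet; keep accumulating
        for m in matcher.finditer(buf, lo):
            if m.start() >= hi:
                break  # may be cut off by the chunk edge; rescan next round
            hits[m.group(0)] += 1
            lo = max(lo, m.end())
        cut = hi - 1  # keep one char of left context so \b sees its neighbour
        buf = buf[cut:]
        lo = max(lo - cut, 1)
    for m in matcher.finditer(buf, lo):  # final flush: no more text arrives
        hits[m.group(0)] += 1
    return hits
```

Each chunk is scanned once by the regex engine regardless of how many phrases there are, which is the point of replacing a per-phrase loop. One limitation of this sketch: `\b` only behaves as intended for phrases that begin and end with word characters.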

What Honesty Does Not Prove

Passing honesty checks does not prove the model is smart. It only means the score is less likely to be inflated by obvious leakage.

A clean score can still be weak if:

the base corpus is low quality
the model is undertrained
SFT only taught format
eval is too narrow
external benchmarks are missing

Honesty checks are necessary, not sufficient.

Operator Rule

Do not copy failed eval prompts into SFT. Eval is the scoreboard; SFT is the practice set. Add new practice rows that teach the underlying behavior without duplicating the held-out scoring item.