Concordance Engine

Where it helps

Pre-submission self-check

Before sending a manuscript out, recompute every reported p-value from its raw inputs with verify_statistics_pvalue. This catches transposed numerators, wrong-tail conventions, and copy-paste drift between draft revisions.

Peer review

Reviewing a chemistry paper claiming a stoichiometric mechanism? Paste the equation into verify_chemistry. If it doesn't balance, the equation as written is incomplete or incorrect, and you have a concrete comment to write.
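To make "doesn't balance" concrete, here is a minimal sketch of an atom-count check. It is not verify_chemistry's implementation, which this page does not describe. The names count_atoms and is_balanced are hypothetical, and the parser assumes simple formulas with no parentheses, charges, or hydrates.

```python
import re
from collections import Counter


def count_atoms(side: str) -> Counter:
    """Total atom counts for one side of an equation, e.g. '2H2 + O2'."""
    totals = Counter()
    for term in side.split("+"):
        term = term.strip()
        # Optional integer coefficient, then element symbols with optional counts.
        m = re.fullmatch(r"(\d*)\s*((?:[A-Z][a-z]?\d*)+)", term)
        if not m:
            raise ValueError(f"unsupported term: {term!r}")
        coeff = int(m.group(1) or 1)
        for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", m.group(2)):
            totals[element] += coeff * int(n or 1)
    return totals


def is_balanced(equation: str) -> bool:
    """True when both sides of 'reactants -> products' have equal atom counts."""
    left, right = equation.split("->")
    return count_atoms(left) == count_atoms(right)


print(is_balanced("2H2 + O2 -> 2H2O"))  # balanced
print(is_balanced("H2 + O2 -> H2O"))    # oxygen does not balance
```

A check like this only confirms conservation of atoms. It says nothing about whether the mechanism is chemically plausible.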

Reproducibility audits

Ask Claude (with the engine connected) to walk a paper's results: "for each table reporting a p-value, recompute from the supplied n and effect size; flag mismatches." That's a real reproducibility pass, not a vibe check.
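The "flag mismatches" step can be sketched as a comparison between each reported claim and a recomputed value. This is an illustration, not the engine's logic: claim_consistent is a hypothetical helper, the rows are made-up placeholders, and the recomputed p-values are assumed to come from whatever recomputation you trust.

```python
import re


def claim_consistent(claim: str, recomputed_p: float) -> bool:
    """Check a reported claim such as 'p < 0.001' or 'p = 0.03' against a recomputed p."""
    m = re.fullmatch(r"\s*p\s*(<|=|>)\s*(0?\.\d+)\s*", claim)
    if not m:
        raise ValueError(f"unsupported claim: {claim!r}")
    op, text = m.groups()
    bound = float(text)
    if op == "<":
        return recomputed_p < bound
    if op == ">":
        return recomputed_p > bound
    # Equality: accept anything that rounds to the reported number of decimals.
    decimals = len(text.split(".")[1])
    return abs(recomputed_p - bound) <= 0.5 * 10 ** -decimals


# Hypothetical audit rows: (where the claim appears, the claim, the recomputed p).
rows = [
    {"table": "Table 1", "claim": "p < 0.001", "recomputed_p": 0.0003},
    {"table": "Table 2", "claim": "p = 0.03", "recomputed_p": 0.047},
]
flagged = [r["table"] for r in rows
           if not claim_consistent(r["claim"], r["recomputed_p"])]
print(flagged)  # ['Table 2']
```

Treating inequalities and point values differently matters: a bound like "p < 0.001" can be correct even when the recomputed value is far from 0.001.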

Teaching

Have students hand-derive a t-statistic, then verify against the engine. The mismatch is the lesson — what assumption did they get wrong?
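For reference, the hand derivation for an equal-variance (pooled) two-sample t-test is below. This assumes the pooled form; a class working with Welch's test would use a different standard error and degrees of freedom.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```

Typical slips are forgetting to pool the variances, dropping the square root, or using n instead of n - 1 in the weights.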

A worked example

You're reviewing a methods section that reports:

"A two-sample t-test (n₁ = n₂ = 30, mean diff = 1.0, pooled SD = 1.0) yielded p < 0.001."

Run it:

verify_statistics_pvalue({
  "test": "two_sample_t",
  "n1": 30, "n2": 30,
  "mean1": 5.0, "mean2": 4.0,
  "sd1": 1.0, "sd2": 1.0,
  "tail": "two",
  "claimed_p": 0.001
})

→ {"status": "MISMATCH",
   "detail": "claimed p=0.001, recomputed p=0.000297 (diff 7.0e-04)",
   "data": {"recomputed_t": 3.873, "df": 58.0,
             "recomputed_p": 0.000297, "tail": "two-sided"}}

The author wasn't lying; they were rounding. The engine reports MISMATCH because the claim was entered as the point value 0.001, while the recomputed value is about 0.0003. So "p < 0.001" is true and "p ≈ 0.001" is not. Now you have a precise reviewer comment instead of a hunch.
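If you want to sanity-check the arithmetic yourself, the sketch below recomputes the same test using only the Python standard library. It is an independent illustration, not the engine's code: the function names are hypothetical, it assumes a pooled-variance test, and it gets the p-value by integrating the t density with Simpson's rule.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 4000) -> float:
    """Two-sided p-value: 1 minus the mass in [-|t|, |t|] (Simpson's rule, even steps)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_mass = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * half_mass)


def pooled_t_test(n1, n2, mean1, mean2, sd1, sd2):
    """Return (t, df, two-sided p) for an equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, df, two_sided_p(t, df)


# Same summary statistics as the worked example above.
t, df, p = pooled_t_test(30, 30, 5.0, 4.0, 1.0, 1.0)
print(f"t = {t:.3f}, df = {df}, p = {p:.2e}")
print("p < 0.001 holds:", p < 0.001)
```

Run as written, this gives t ≈ 3.873 on 58 degrees of freedom and a p-value well under 0.001, consistent with the inequality the authors reported. Small differences from the engine's last digits are possible, since the numerical method here is only an approximation.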

What it doesn't replace

The engine doesn't read your data. It can't tell you whether your experimental design is sound, whether your effect size is meaningful, whether you should have pre-registered, or whether your sample is representative. Those are scientific judgments. The engine catches the layer below judgment — the arithmetic that has to be right before any of the judgment matters.

A correct p-value computed from a confounded experiment is still wrong in the way that matters. The engine is necessary, not sufficient.

Domains covered for science workflows
