By Condition
| Condition | Runs | Avg Tokens | Conv.% |
| Loading… |
By Genre
No analytics data yet.
Study data reflects testing runs only (conditions A–D). Use the panel below to explicitly tag runs as study runs.
Tag Runs for Study
Select completed runs and assign them a study condition. Only explicitly tagged runs appear in Study Data.
Condition Summary
| Condition | n | Total Critiques | Avg Critiques | A count | C count | I count | I-rate | A-rate | Avg Tokens | Avg Loops | Conv.% |
| Loading… |
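The columns of the Condition Summary can be illustrated with a minimal sketch. It assumes a hypothetical `Run` record holding per-run A/C/I critique counts, token and loop totals, and a convergence flag. It also assumes that I-rate and A-rate are pooled ratios (count divided by all critiques in the condition) and that Conv.% is the share of runs that converged. None of these definitions is confirmed by the dashboard text, so treat the code as an illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record layout; field names are assumptions.
    condition: str
    a: int          # critiques labelled A
    c: int          # critiques labelled C
    i: int          # critiques labelled I
    tokens: int
    loops: int
    converged: bool


def summarize(runs):
    """Aggregate tagged runs into one summary row per condition."""
    groups = defaultdict(list)
    for run in runs:
        groups[run.condition].append(run)

    summary = {}
    for cond, rs in sorted(groups.items()):
        n = len(rs)
        a = sum(r.a for r in rs)
        c = sum(r.c for r in rs)
        i = sum(r.i for r in rs)
        total = a + c + i
        summary[cond] = {
            "n": n,
            "total_critiques": total,
            "avg_critiques": total / n,
            # Assumed definition: pooled rate over all critiques in the condition.
            "i_rate": i / total if total else None,
            "a_rate": a / total if total else None,
            "avg_tokens": sum(r.tokens for r in rs) / n,
            "avg_loops": sum(r.loops for r in rs) / n,
            "conv_pct": 100.0 * sum(r.converged for r in rs) / n,
        }
    return summary
```

A pooled rate weights every critique equally. Averaging per-run rates would weight every run equally instead, and the two can differ when runs produce very different numbers of critiques.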
Per-Manuscript (B3 Paired Comparison)
I-rate (Inappropriate Rate) for each manuscript under each condition. Use the paired per-manuscript values as input to Wilcoxon signed-rank tests.
| Manuscript | Cond A I-rate | Cond B I-rate | Cond C I-rate | Cond D I-rate |
| Loading… |
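A paired comparison of two condition columns from the table above can be sketched with a Wilcoxon signed-rank test. The sketch below uses only the standard library and a normal approximation without tie or continuity correction. The function name and return layout are assumptions, not the dashboard's implementation. For small samples an exact test (for example `scipy.stats.wilcoxon`) is usually the safer choice.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test on paired samples x and y.

    Zero differences are dropped; tied absolute differences get average ranks.
    The p-value uses a normal approximation (no tie or continuity correction).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return {"n": n, "w_plus": w_plus, "w_minus": w_minus, "z": z, "p": p}
```

Each input list would hold one condition's I-rate per manuscript, in the same manuscript order, for example the Cond A column against the Cond B column. Floating-point rates that should be tied may differ in the last digits, so rounding the inputs before ranking can be worthwhile.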
ACI Distribution by Condition
Inappropriate Rate by Condition
All usage data is fully anonymized and aggregated. No user IDs, emails, or reconstructable sequences are stored or displayed.
B2 — Inter-Rater Reliability tracking. Upload sampled critique labels (agent vs. human) to compute Cohen's κ.
| Batch ID | n Critiques | Cohen's κ | Agreement Rate | Date |
| Loading… |
| Critique ID | Title | Agent Label | Human Label | Agreement | Rater |
| Select a batch. |
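The two batch-level figures in this section, Agreement Rate and Cohen's κ, can be computed from paired agent and human labels as in the sketch below. The function name and input shape (two equal-length lists of label strings) are assumptions made for illustration. The formula is the standard unweighted κ = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter


def cohens_kappa(agent_labels, human_labels):
    """Return (agreement_rate, kappa) for two equal-length label sequences."""
    if len(agent_labels) != len(human_labels) or not agent_labels:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(agent_labels)
    # Observed agreement: fraction of critiques where both raters match.
    observed = sum(a == h for a, h in zip(agent_labels, human_labels)) / n

    # Chance agreement from each rater's marginal label frequencies.
    agent_counts = Counter(agent_labels)
    human_counts = Counter(human_labels)
    expected = sum(
        agent_counts[label] * human_counts[label] for label in agent_counts
    ) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return observed, float("nan")
    return observed, (observed - expected) / (1 - expected)
```

A high agreement rate can coexist with a low κ when one label dominates, which is why both columns are worth reading together.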
B3 — Statistical test results. Enter results from Wilcoxon signed-rank or Mann-Whitney U tests. Results auto-format for paper citation.
| Test | Conditions | Metric | Statistic | p-value | Effect (r) | 95% CI | n A / n B | Citation |
| No results yet. |
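How an entered result might be turned into a citation string is sketched below. The output format is a guess at a common reporting style (p-values without a leading zero, `p < .001` below that threshold) and is not taken from the dashboard. The effect size uses the common convention r = |z| / √n, which is also an assumption, and conventions differ on which n to use. All function names are hypothetical.

```python
import math


def format_p(p):
    """Format a p-value without a leading zero, e.g. 'p = .043'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def effect_size_r(z, n):
    """Effect size r = |z| / sqrt(n); one common convention, assumed here."""
    return abs(z) / math.sqrt(n)


def format_citation(test_name, symbol, statistic, p, z, n):
    """Build a one-line citation string for a non-parametric test result."""
    r = effect_size_r(z, n)
    return (
        f"{test_name}: {symbol} = {statistic:g}, "
        f"{format_p(p)}, r = {r:.2f}, n = {n}"
    )


if __name__ == "__main__":
    print(format_citation("Wilcoxon signed-rank", "W", 0, 0.0431, -2.0226, 5))
```

For a Mann-Whitney U result the same helper could be called with `symbol="U"`, with n taken as the combined sample size under this convention.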
Download a fully self-contained study report (works offline). Source is locked to Testing for study exports.
Run counts by condition (testing)