Marnie — Analytics
Production
Total Runs
Completed
Total Tokens
Avg Loops
Convergence Rate
Avg Runtime
seconds
Runs per day (30 days)
By Condition
Condition | Runs | Avg Tokens | Conv. %
By Genre
Genre | Runs

Study data reflects testing runs only (conditions A–D). Use the panel below to explicitly tag runs as study runs.

Tag Runs for Study

Select completed runs and assign them a study condition. Only explicitly tagged runs appear in Study Data.

Condition Summary
Condition | n | Total Critiques | Avg Critiques | A count | C count | I count | I-rate | A-rate | Avg Tokens | Avg Loops | Conv. %
Per-Manuscript (B3 Paired Comparison)

I-rate per condition per manuscript, for use in Wilcoxon signed-rank (paired) tests.

Manuscript | Cond A I-rate | Cond B I-rate | Cond C I-rate | Cond D I-rate
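
As a rough illustration of how the per-manuscript I-rates could feed a paired test, the sketch below computes the Wilcoxon signed-rank sums for two conditions. It is not the dashboard's own code: the function name `wilcoxon_signed_rank` and the sample values are hypothetical, zero differences are dropped (one common convention), and no p-value is computed. For p-values, a statistics package such as SciPy (`scipy.stats.wilcoxon`) is the safer route.

```python
def wilcoxon_signed_rank(cond_x, cond_y):
    """Return (w_plus, w_minus, n_used) for paired samples.

    cond_x, cond_y: I-rates for the same manuscripts, in the same order.
    Pairs with a zero difference are dropped (an assumed convention).
    """
    if len(cond_x) != len(cond_y):
        raise ValueError("paired samples must have the same length")

    # Round to suppress floating-point noise so equal differences tie cleanly.
    diffs = [round(x - y, 12) for x, y in zip(cond_x, cond_y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, n


# Hypothetical I-rates for four manuscripts under two conditions.
cond_a = [0.30, 0.25, 0.40, 0.10]
cond_b = [0.20, 0.30, 0.10, 0.10]
w_plus, w_minus, n_used = wilcoxon_signed_rank(cond_a, cond_b)
statistic = min(w_plus, w_minus)  # the usual two-sided test statistic
```

The smaller of the two rank sums is the statistic usually reported for a two-sided test; `n_used` is the number of pairs left after dropping zero differences.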
ACI Distribution by Condition
Inappropriate Rate by Condition
Data Integrity
Flexible Query
Group | Value | Count
Run a query above.
Convergence
Loops to convergence (distribution)
Critique Stats
Auditor recommendations
Agent Stats
Agent | Runs | Total Critiques | Total Tokens
All usage data is fully anonymized and aggregated. No user IDs, emails, or reconstructable sequences are stored or displayed.
Runs per day (60 days)
Review Mode Distribution

B2 — Inter-Rater Reliability tracking. Upload sampled critique labels (agent vs. human) to compute Cohen's κ.

Batch ID | n Critiques | Cohen's κ | Agreement Rate | Date
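
For reference, Cohen's κ and the raw agreement rate for one batch of paired labels can be sketched as follows. This is a minimal illustration, not the dashboard's implementation: the function names and the sample labels are hypothetical, and the label set `A`/`C`/`I` is assumed from the column names used elsewhere on this page.

```python
import math
from collections import Counter


def agreement_rate(agent_labels, human_labels):
    """Fraction of critiques where agent and human assigned the same label."""
    if len(agent_labels) != len(human_labels) or not agent_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == h for a, h in zip(agent_labels, human_labels))
    return matches / len(agent_labels)


def cohens_kappa(agent_labels, human_labels):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters."""
    p_o = agreement_rate(agent_labels, human_labels)
    n = len(agent_labels)

    # Chance agreement: sum over labels of the product of each rater's
    # marginal proportion for that label.
    agent_freq = Counter(agent_labels)
    human_freq = Counter(human_labels)
    p_e = sum(agent_freq[k] * human_freq[k] for k in agent_freq) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return math.nan
    return (p_o - p_e) / (1 - p_e)


# Hypothetical batch of six sampled critiques.
agent = ["A", "A", "C", "I", "I", "A"]
human = ["A", "C", "C", "I", "A", "A"]
kappa = cohens_kappa(agent, human)
rate = agreement_rate(agent, human)
```

κ corrects the raw agreement rate for the agreement expected by chance, which is why the two columns can diverge when one label dominates a batch.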

B3 — Statistical test results. Enter results from Wilcoxon signed-rank or Mann-Whitney U tests. Results auto-format for paper citation.

Test | Conditions | Metric | Statistic | p-value | Effect (r) | 95% CI | n A / n B | Citation
No results yet.

Download a fully self-contained study report (works offline). Source is locked to Testing for study exports.

Run counts by condition (testing)

Completed runs from your admin account. Select one or more runs, choose a run type, and assign them to the analytics dashboard.

Run Date Loops Assignment Actions