Science Replication
Automated computational reproducibility assessment for social science papers
Pipeline
- Paper analysis — PDF parsing, table/figure extraction, claim identification
- Code analysis — Stata/R/Python parsing, dependency graphs, master script detection
- Code execution — Dockerized sandboxes with timeout and memory limits
- Results comparison — Fuzzy variable matching, SE/R²/N comparison, tolerance-based scoring
- Figure comparison — Side-by-side visual comparison via Claude vision
- Report generation — Classification with detailed cell-level evidence
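One way to approach master script detection in the code-analysis step is to build a call graph from Stata `do`/`run`/`include` statements. The master is then any file that calls other scripts but is never called itself. This is a minimal sketch under stated assumptions. The replication package is assumed to arrive as a mapping from file name to source text. The regex, the helper names `build_call_graph` and `find_master_scripts`, and the root-node heuristic are all hypothetical and are not taken from the project.

```python
import re

# Hypothetical, simplified pattern for Stata calls such as `do "code/01_clean.do"`
# or `run analysis.do`. Assumption: paths contain no spaces and end in .do;
# paths assembled from local macros are not resolved.
CALL_PATTERN = re.compile(
    r'^\s*(?:do|run|include)\s+"?([^"\s]+?\.do)"?\s*(?:,.*)?$',
    re.IGNORECASE | re.MULTILINE,
)


def build_call_graph(scripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each .do file name to the set of package .do files it calls."""
    graph: dict[str, set[str]] = {}
    for name, source in scripts.items():
        called = set()
        for match in CALL_PATTERN.finditer(source):
            # Keep only the base name so "code/01_clean.do" matches "01_clean.do".
            called.add(match.group(1).replace("\\", "/").split("/")[-1])
        # Ignore calls to files that are not part of the package.
        graph[name] = called.intersection(scripts)
    return graph


def find_master_scripts(scripts: dict[str, str]) -> list[str]:
    """Return files that call other scripts but are never called themselves."""
    graph = build_call_graph(scripts)
    called_somewhere = set().union(*graph.values())
    return sorted(
        name for name, callees in graph.items()
        if callees and name not in called_somewhere
    )
```

A heuristic like this misses calls whose file names are built from local macros. A real pipeline would likely need a fallback, such as preferring a file named `master.do` when the graph has no clear root.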
Published Reports
Fully Reproducible
Kafle & Balasubramanya (2023). Reducing food insecurity through equitable investments in irrigation: The case of Niger
6/6 tables reproduced · Stata → Python translation · Panel FE
Fully Reproducible
Martinez (2022). How Much Should We Trust the Dictator's GDP Growth Estimates?
1 coefficient reproduced · Stata → Python · I4R reference
Fully Reproducible
Williams (2022). Historical Lynchings and Contemporary Voting
2 coefficients reproduced · Stata → Python · I4R reference
I4R Meta Database Coverage
We mapped 109 papers from the I4R (Institute for Replication) Meta Database, together with 6,583 robustness-check coefficients that serve as ground-truth reference data.
| Status | Papers | Description |
|---|---|---|
| Fully Reproduced | 3 | All coefficients match within tolerance |
| Largely Reproduced | 1 | Most coefficients match, with minor differences |
| Attempted | 13 | OSF packages downloaded, translated, and executed |
| Awaiting Execution | 118 | I4R reference data mapped; code and data still needed |
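A small decision rule can turn per-coefficient comparisons into the four statuses in the table above. The sketch below makes two assumptions that the page does not state. First, "most coefficients" is read as a strict majority; the `LARGELY_THRESHOLD` cut-off of 0.5 is a hypothetical choice. Second, an executed paper with no coefficients matched is classified as Attempted. The function `classify` is illustrative only.

```python
from enum import Enum


class Status(str, Enum):
    FULLY = "Fully Reproduced"
    LARGELY = "Largely Reproduced"
    ATTEMPTED = "Attempted"
    AWAITING = "Awaiting Execution"


# Hypothetical cut-off: "most" is read as a strict majority of compared coefficients.
LARGELY_THRESHOLD = 0.5


def classify(executed: bool, matches: list[bool]) -> Status:
    """Assign a status from execution state and per-coefficient match flags."""
    if not executed:
        # Reference data may be mapped, but the translated code has not run yet.
        return Status.AWAITING
    if matches and all(matches):
        return Status.FULLY
    share = sum(matches) / len(matches) if matches else 0.0
    if share > LARGELY_THRESHOLD:
        return Status.LARGELY
    # Executed, but too few (or no) coefficients matched within tolerance.
    return Status.ATTEMPTED
```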
Pipeline: Stata .do → Python translation → execution → coefficient comparison vs I4R reference
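The coefficient-comparison step can be sketched with the standard library alone:

- `difflib.get_close_matches` pairs each reference variable name with the closest reproduced name, after lowercasing and stripping underscores.
- `math.isclose` applies a combined relative/absolute tolerance.

The helper names and the default tolerances (1% relative, 1e-4 absolute) are hypothetical choices for illustration, not the values the pipeline uses.

```python
import difflib
import math
from typing import Optional


def _normalize(name: str) -> str:
    # Compare names case-insensitively and ignore underscores.
    return name.lower().replace("_", "")


def match_variable(name: str, candidates: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the candidate whose normalized name is closest to `name`, or None."""
    lookup = {_normalize(c): c for c in candidates}
    hits = difflib.get_close_matches(_normalize(name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def within_tolerance(reproduced: float, reference: float,
                     rel_tol: float = 0.01, abs_tol: float = 1e-4) -> bool:
    # math.isclose tests |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol).
    return math.isclose(reproduced, reference, rel_tol=rel_tol, abs_tol=abs_tol)


def compare_coefficients(reproduced: dict[str, float], reference: dict[str, float],
                         **tolerances: float) -> dict[str, bool]:
    """Flag each reference coefficient as matched (True) or not (False)."""
    results = {}
    for ref_name, ref_value in reference.items():
        match = match_variable(ref_name, list(reproduced))
        results[ref_name] = (
            match is not None
            and within_tolerance(reproduced[match], ref_value, **tolerances)
        )
    return results
```

Because `math.isclose` scales the relative tolerance by the larger magnitude, a sign flip on a non-trivial coefficient always fails. The absolute floor keeps near-zero estimates from failing on rounding noise alone.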