Science Replication
Automated computational reproducibility assessment for social science papers
Pipeline
- Paper analysis — PDF parsing, table/figure extraction, claim identification
- Code analysis — Stata/R/Python parsing, dependency graphs, master script detection
- Code execution — Dockerized sandboxes with timeout and memory limits
- Results comparison — Fuzzy variable matching, SE/R²/N comparison, tolerance-based scoring
- Figure comparison — Side-by-side visual comparison via Claude vision
- Report generation — Classification with detailed cell-level evidence
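Master script detection can be framed as a call-graph problem. The sketch below is a minimal illustration under stated assumptions, not this project's implementation. It assumes a replication package is loaded as a dict mapping do-file paths to their source text. It treats lines that begin with `do`, `run`, or `include` as calls to other files, and it picks the root file (called by no other file) that reaches the most other files. The names `build_call_graph` and `detect_master` are hypothetical.

```python
import re
from typing import Optional

# Assumed heuristic: a line beginning with `do`, `run`, or `include` calls another do-file.
# The target may be quoted ("01 clean.do") or bare ($code/02_reg).
CALL_RE = re.compile(r'^\s*(?:do|run|include)\s+(?:"([^"]+)"|(\S+))', re.IGNORECASE)


def normalize(path: str) -> str:
    """Reduce a path to a lowercase file name; Stata lets callers omit the .do suffix."""
    base = path.replace("\\", "/").split("/")[-1].lower()
    return base if base.endswith(".do") else base + ".do"


def build_call_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each do-file to the set of do-files in the package that it calls."""
    graph = {normalize(name): set() for name in files}
    for name, source in files.items():
        for line in source.splitlines():
            m = CALL_RE.match(line)
            if m:
                target = normalize(m.group(1) or m.group(2))
                if target in graph:  # ignore calls to files not in the package
                    graph[normalize(name)].add(target)
    return graph


def detect_master(files: dict[str, str]) -> Optional[str]:
    """Return the root do-file (called by no other file) that reaches the most files."""
    graph = build_call_graph(files)
    called = set().union(*graph.values())
    roots = [f for f in graph if f not in called]

    def reach(start: str) -> int:
        # Iterative depth-first search; the `seen` set makes cycles harmless.
        seen, stack = set(), [start]
        while stack:
            for child in graph[stack.pop()] - seen:
                seen.add(child)
                stack.append(child)
        return len(seen)

    return max(roots, key=reach, default=None)
```

Commented-out calls (lines starting with `*` or `//`) are skipped because the pattern only matches a command at the start of a line. Paths built from globals such as `$code/02_reg` are reduced to their file name, so two folders containing files with the same name would collide under this sketch.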
Published Reports
- Fully Reproducible · Martinez (2022): How Much Should We Trust the Dictator's GDP Growth
  1 coefficient reproduced · Stata → Python · I4R reference
- Fully Reproducible · Williams (2022): Historical Lynchings and Contemporary Voting
  2 coefficients reproduced · Stata → Python · I4R reference
- Fully Reproducible · Geography of Repression: Opposition to Autocracy in Chile
  1 coefficient reproduced (0.7% diff) · Stata → Python · I4R reference
- Largely Reproducible · Side Effects of Immunity: Malaria and African Development
  Tables 1, 3 reproduced; Table 2 mixed · Stata → Python · I4R reference
- Largely Reproducible · Finance and Green Growth: Financial development and CO2 emissions
  3 coefficients compared · Stata → Python · I4R reference
- Partially Reproducible · Teaching Norms: Direct Evidence of Parental Transmission
  26 CSV outputs produced; interaction terms diverge · Stata → Python · I4R reference
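Differences such as the 0.7% reported for the Chile paper are relative differences between a replicated estimate and its reference value. The following is a minimal sketch of a tolerance check covering the coefficient, SE, R², and N. It assumes per-statistic relative tolerances and requires N to match exactly. The values in `TOLERANCES`, the `RegStats` container, and the demo numbers are all hypothetical and are not the pipeline's actual thresholds or any paper's results.

```python
from dataclasses import dataclass


@dataclass
class RegStats:
    """One regression coefficient and its fit statistics, as reported by either side."""
    coef: float
    se: float
    r2: float
    n: int


# Illustrative relative tolerances only; N is compared exactly.
TOLERANCES = {"coef": 0.01, "se": 0.05, "r2": 0.01}


def relative_diff(replicated: float, reference: float) -> float:
    """|replicated - reference| / |reference|, or the absolute gap when the reference is 0."""
    if reference == 0:
        return abs(replicated)
    return abs(replicated - reference) / abs(reference)


def compare(rep: RegStats, ref: RegStats) -> dict[str, bool]:
    """Per-statistic pass/fail flags for one replicated coefficient against its reference."""
    checks = {
        name: relative_diff(getattr(rep, name), getattr(ref, name)) <= tol
        for name, tol in TOLERANCES.items()
    }
    checks["n"] = rep.n == ref.n
    # A sign flip counts as a failure even when the magnitudes are close.
    checks["sign"] = (rep.coef >= 0) == (ref.coef >= 0)
    return checks


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the arithmetic.
    ref = RegStats(coef=0.150, se=0.0400, r2=0.230, n=1200)
    rep = RegStats(coef=0.15105, se=0.0412, r2=0.231, n=1200)
    print(f"{relative_diff(rep.coef, ref.coef):.1%}")  # 0.7%
    print(all(compare(rep, ref).values()))  # True
```

Separate tolerances per statistic reflect the fact that standard errors are more sensitive than point estimates to clustering and small-sample corrections that differ between Stata and Python defaults.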
I4R Meta Database Coverage
We mapped 109 papers from the I4R Meta Database, covering 6,583 robustness-check coefficients, as ground-truth reference data.
| Status | Papers | Description |
|---|---|---|
| Fully Reproduced | 6 | All coefficients match within tolerance |
| Largely Reproduced | 2 | Most coefficients match, minor differences |
| Partially Reproduced | 1 | Some outputs produced, significant discrepancies |
| Attempted | 23 | Packages downloaded, translated, executed |
| Awaiting Execution | 109 | I4R reference data mapped, code+data needed |
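One way to turn per-coefficient comparisons into the statuses in this table is sketched below. The table gives no numeric thresholds, so the 75% cutoff for "Largely" (`largely_threshold`) and the rule that a run with zero matching coefficients stays "Attempted" are assumptions for illustration only.

```python
from enum import Enum


class Status(Enum):
    FULLY = "Fully Reproduced"
    LARGELY = "Largely Reproduced"
    PARTIALLY = "Partially Reproduced"
    ATTEMPTED = "Attempted"
    AWAITING = "Awaiting Execution"


def classify(executed: bool, matches: list[bool], largely_threshold: float = 0.75) -> Status:
    """Map per-coefficient match flags to a status (thresholds are hypothetical)."""
    if not executed:
        return Status.AWAITING
    share = sum(matches) / len(matches) if matches else 0.0
    if share == 1.0:
        return Status.FULLY
    if share >= largely_threshold:
        return Status.LARGELY
    if share > 0:
        return Status.PARTIALLY
    # Code ran, but nothing could be compared or nothing matched.
    return Status.ATTEMPTED
```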
Pipeline: Stata .do → Python translation → execution → coefficient comparison vs I4R reference
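Before coefficients can be compared, term names have to be paired across languages. Stata labels an interaction `c.treat#c.post` and the constant `_cons`, whereas patsy-style formulas in statsmodels produce `treat:post` and `Intercept`. The sketch below normalizes both naming schemes and then falls back to `difflib.get_close_matches`. The alias table, the normalization rules, and the 0.8 cutoff are assumptions and do not describe the pipeline's documented behavior.

```python
import re
from difflib import get_close_matches
from typing import Optional

# Hypothetical aliases for the constant term across packages.
ALIASES = {"intercept": "_cons", "const": "_cons"}


def normalize_term(name: str) -> str:
    """Map a Stata or patsy-style coefficient name to a comparable key."""
    s = name.lower().strip()
    s = ALIASES.get(s, s)
    s = re.sub(r"\bc\.", "", s)  # drop Stata's continuous-variable operator c.
    parts = re.split(r"#+|:", s)  # Stata interactions use #, patsy uses :
    # Sorting makes treat:post and post:treat map to the same key.
    return "_x_".join(sorted(p.strip() for p in parts))


def match_terms(
    python_terms: list[str], stata_terms: list[str], cutoff: float = 0.8
) -> dict[str, Optional[str]]:
    """Pair each translated-code term with its closest Stata term, or None if nothing is close."""
    keys = {normalize_term(t): t for t in stata_terms}  # later duplicates overwrite earlier ones
    matched: dict[str, Optional[str]] = {}
    for term in python_terms:
        hit = get_close_matches(normalize_term(term), list(keys), n=1, cutoff=cutoff)
        matched[term] = keys[hit[0]] if hit else None
    return matched
```

For example, `match_terms(["treat:post", "ln_gdp", "Intercept"], ["c.treat#c.post", "lngdp", "_cons"])` pairs every term. A Python-only regressor such as `rainfall` maps to `None`, which is where a comparison report would flag a missing or renamed variable.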