Home

Row

A scoping review on metrics to quantify reproducibility

Reproducibility is increasingly recognized as essential to scientific progress and integrity. A growing number of meta-research studies and projects aim to evaluate interventions to improve reproducibility, including three recently funded EU consortia. Large-scale replication projects, and replication studies more generally, which aim to quantify different aspects of reproducibility, have become more common. Since no standardized approach to measuring reproducibility exists, a diverse set of metrics has emerged, highlighting the need for a comprehensive overview of the available metrics. To this end, we conducted a scoping review of the published literature and identified a total of 50 metrics to quantify reproducibility. These metrics were characterized by their type (formulas and/or statistical models, frameworks, graphical representations, studies and questionnaires, algorithms), the input they require, and their appropriate application scenarios. Each metric addresses distinct questions, problems, and needs. Our review provides a comprehensive resource in the form of a "live", interactive table for future replication teams and meta-researchers, supporting them in selecting the metrics best aligned with their research questions and project goals.

This dashboard presents the full table of reproducibility metrics identified in our scoping review; it can be found in the panel Table. All columns can be searched and sorted, making the table easy to navigate. The cited papers are listed in the panel References.

This table is intended to be a living document that evolves with the community. Feedback, including requests to add metrics, can be sent to Rachel Heyard. The code for the live table is hosted at github.com/rachelHey/reproducibility_metrics, and we invite anyone to suggest additional reproducibility metrics via GitHub issues. A CSV version of the table is available at osf.io/sbcy3.
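For readers who prefer to work with the table programmatically, the CSV export can be filtered with a few lines of code. The sketch below is purely illustrative: the column names (`metric`, `type`) and the sample rows are assumptions for demonstration, not the actual headers of the file at osf.io/sbcy3 — inspect the downloaded CSV for its real structure first.

```python
import csv
import io

# Hypothetical excerpt of the metrics table. The real CSV from osf.io/sbcy3
# may use different column names and category labels.
SAMPLE = """metric,type
Sceptical p-value,Formula/statistical model
Replication Bayes factor,Formula/statistical model
Z-curve,Algorithm
"""

def metrics_of_type(csv_text, wanted_type):
    """Return the names of all metrics whose 'type' column equals wanted_type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["metric"] for row in reader if row["type"] == wanted_type]

print(metrics_of_type(SAMPLE, "Formula/statistical model"))
# prints ['Sceptical p-value', 'Replication Bayes factor']
```

To run this against the actual export, replace `SAMPLE` with the contents of the downloaded file and adjust the column names to match its header.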


Table

References

Column

Alipourfard, Nazanin, Beatrix Arendt, Daniel M. Benjamin, Noam Benkler, Michael Bishop, Mark Burstein, Martin Bush, et al. 2024. “Systematizing Confidence in Open Research and Evidence (SCORE),” February. https://doi.org/10.31235/osf.io/46mnb.
Amaral, Olavo B, Kleber Neves, Ana P Wasilewska-Sampaio, and Clarissa FD Carneiro. 2019. “The Brazilian Reproducibility Initiative.” Edited by Peter Rodgers, Timothy M Errington, and Richard Klein. eLife 8 (February): e41602. https://doi.org/10.7554/eLife.41602.
Amini, Shahram M., and Christopher F. Parmeter. 2012. “Comparison of Model Averaging Techniques: Assessing Growth Determinants.” Journal of Applied Econometrics 27 (5): 870–76. https://doi.org/10.1002/jae.2288.
Anderson, Samantha F., and Ken Kelley. 2022. “Sample Size Planning for Replication Studies: The Devil Is in the Design.” Psychological Methods, no pagination specified. https://doi.org/10.1037/met0000520.
Anderson, Samantha F., and Scott E. Maxwell. 2016. “There’s More Than One Way to Conduct a Replication Study: Beyond Statistical Significance.” Psychological Methods 21 (1): 1–12. https://doi.org/10.1037/met0000051.
Aria, Massimo, Trang Le, Corrado Cuccurullo, Alessandra Belfiore, and June Choe. 2024. “openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex.” The R Journal 15 (4): 167–80. https://doi.org/10.32614/RJ-2023-089.
Arroyo-Araujo, María, Radka Graf, Martine Maco, Elsbeth van Dam, Esther Schenker, Wilhelmus Drinkenburg, Bastijn Koopmans, et al. 2019. “Reproducibility via Coordinated Standardization: A Multi-Center Study in a Shank2 Genetic Rat Model for Autism Spectrum Disorders.” Scientific Reports 9 (1): 11602. https://doi.org/10.1038/s41598-019-47981-0.
Arroyo-Araujo, María, Bernhard Voelkl, Clément Laloux, Janja Novak, Bastijn Koopmans, Ann-Marie Waldron, Isabel Seiffert, et al. 2022. “Systematic Assessment of the Replicability and Generalizability of Preclinical Findings: Impact of Protocol Harmonization Across Laboratory Sites.” PLOS Biology 20 (11): e3001886. https://doi.org/10.1371/journal.pbio.3001886.
Asendorpf, Jens B., Mark Conner, Filip De Fruyt, Jan De Houwer, Jaap J. A. Denissen, Klaus Fiedler, Susann Fiedler, et al. 2013. “Recommendations for Increasing Replicability in Psychology.” European Journal of Personality 27 (2): 108–19. https://doi.org/10.1002/per.1919.
Bachmann, Gregor, Thomas Hofmann, and Aurélien Lucchi. 2022. “Generalization Through The Lens Of Leave-One-Out Error.” arXiv. https://doi.org/10.48550/ARXIV.2203.03443.
Bahor, Zsanett, Jing Liao, Gillian Currie, Can Ayder, Malcolm Macleod, Sarah K. McCann, Alexandra Bannach-Brown, et al. 2021. “Development and Uptake of an Online Systematic Review Platform: The Early Years of the CAMARADES Systematic Review Facility (SyRF).” BMJ Open Science 5 (1): e100103. https://doi.org/10.1136/bmjos-2020-100103.
Baig, Sabeeh A. 2022. “Bayesian Inference: Evaluating Replication Attempts With Bayes Factors.” Nicotine & Tobacco Research 24 (4): 626–29. https://doi.org/10.1093/ntr/ntab219.
Balli, Hatice Ozer, and Bent E. Sørensen. 2013. “Interaction Effects in Econometrics.” Empirical Economics 45 (1): 583–603. https://doi.org/10.1007/s00181-012-0604-2.
Barba, Lorena A. 2018. “Terminologies for Reproducible Research.” arXiv. https://doi.org/10.48550/arXiv.1802.03311.
Bartoš, František, and Ulrich Schimmack. 2022. “Z-Curve 2.0: Estimating Replication Rates and Discovery Rates.” Meta-Psychology 6 (September). https://doi.org/10.15626/MP.2021.2720.
Bastiaansen, Jojanneke A., Yoram K. Kunkels, Frank J. Blaauw, Steven M. Boker, Eva Ceulemans, Meng Chen, Sy-Miin Chow, et al. 2020. “Time to Get Personal? The Impact of Researchers’ Choices on the Selection of Treatment Targets Using the Experience Sampling Methodology.” Journal of Psychosomatic Research 137 (October): 110211. https://doi.org/10.1016/j.jpsychores.2020.110211.
Bayarri, M. J, and A. M Mayoral. 2002. “Bayesian Design of Successful Replications.” The American Statistician 56 (3): 207–14. https://doi.org/10.1198/000313002155.
Belbasis, Lazaros, and Orestis A. Panagiotou. 2022. “Reproducibility of Prediction Models in Health Services Research.” BMC Research Notes 15 (1): 204. https://doi.org/10.1186/s13104-022-06082-4.
Belz, Anya. 2022. “A Metrological Perspective on Reproducibility in NLP.” Computational Linguistics 48 (4): 1125–35. https://doi.org/10.1162/coli_a_00448.
Belz, Anya, Maja Popovic, and Simon Mille. 2022. “Quantified Reproducibility Assessment of NLP Results.” In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 16–28. Dublin, Ireland: Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.2.
Bland, Martin J., and Douglas G. Altman. 1986. “Statistical Methods for Assessing Agreement Between Two Methods of Clinical Measurement.” The Lancet 327 (8476): 307–10. https://doi.org/10.1016/S0140-6736(86)90837-8.
Boehm, Udo, Jeffrey Annis, Michael J. Frank, Guy E. Hawkins, Andrew Heathcote, David Kellen, Angelos-Miltiadis Krypotos, et al. 2018. “Estimating Across-Trial Variability Parameters of the Diffusion Decision Model: Expert Advice and Recommendations.” Journal of Mathematical Psychology 87 (December): 46–75. https://doi.org/10.1016/j.jmp.2018.09.004.
Bonett, Douglas G. 2012. “Replication-Extension Studies.” Current Directions in Psychological Science 21 (6): 409–12. https://doi.org/10.1177/0963721412459512.
———. 2021. “Design and Analysis of Replication Studies.” Organizational Research Methods 24 (3): 513–29. https://doi.org/10.1177/1094428120911088.
Borsboom, Denny, Eiko I. Fried, Sacha Epskamp, Lourens J. Waldorp, Claudia D. Van Borkulo, Han L. J. Van Der Maas, and Angélique O. J. Cramer. 2017. “False Alarm? A Comprehensive Reanalysis of ‘Evidence That Psychopathology Symptom Networks Have Limited Replicability’ by Forbes, Wright, Markon, and Krueger (2017).” Journal of Abnormal Psychology 126 (7): 989–99. https://doi.org/10.1037/abn0000306.
Botvinik-Nezer, Rotem, Felix Holzmeister, Colin F. Camerer, Anna Dreber, Juergen Huber, Magnus Johannesson, Michael Kirchler, et al. 2020. “Variability in the Analysis of a Single Neuroimaging Dataset by Many Teams.” Nature 582 (7810): 84–88. https://doi.org/10.1038/s41586-020-2314-9.
Bouwmeester, S., P. P. J. L. Verkoeijen, B. Aczel, F. Barbosa, L. Bègue, P. Brañas-Garza, T. G. H. Chmura, et al. 2017. “Registered Replication Report: Rand, Greene, and Nowak (2012).” Perspectives on Psychological Science 12 (3): 527–42. https://doi.org/10.1177/1745691617693624.
Boyce, Veronica, Maya Mathur, and Michael C. Frank. 2023. “Eleven Years of Student Replication Projects Provide Evidence on the Correlates of Replicability in Psychology.” Royal Society Open Science 10 (11): 231240. https://doi.org/10.1098/rsos.231240.
Brandt, Mark J., Hans IJzerman, Ap Dijksterhuis, Frank J. Farach, Jason Geller, Roger Giner-Sorolla, James A. Grange, Marco Perugini, Jeffrey R. Spies, and Anna van ’t Veer. 2014. “The Replication Recipe: What Makes for a Convincing Replication?” Journal of Experimental Social Psychology 50 (January): 217–24. https://doi.org/10.1016/j.jesp.2013.10.005.
Brauer, Jurgen. 2007. “Data, Models, Coefficients: The Case of United States Military Expenditure.” Conflict Management and Peace Science 24 (1): 55–64. https://doi.org/10.1080/07388940601102845.
Braver, Sanford L., Felix J. Thoemmes, and Robert Rosenthal. 2014. “Continuously Cumulating Meta-Analysis and Replicability.” Perspectives on Psychological Science 9 (3): 333–42. https://doi.org/10.1177/1745691614529796.
Breznau, Nate, Eike Mark Rinke, Alexander Wuttke, Hung H. V. Nguyen, Muna Adem, Jule Adriaans, Amalia Alvarez-Benjumea, et al. 2022. “Observing Many Researchers Using the Same Data and Hypothesis Reveals a Hidden Universe of Uncertainty.” Proceedings of the National Academy of Sciences 119 (44): e2203150119. https://doi.org/10.1073/pnas.2203150119.
Brunner, Jerry, and Ulrich Schimmack. 2020. “Estimating Population Mean Power Under Conditions of Heterogeneity and Selection for Significance.” Meta-Psychology 4 (May). https://doi.org/10.15626/MP.2018.874.
Camerer, Colin F., Anna Dreber, Eskil Forsell, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, et al. 2016. “Evaluating Replicability of Laboratory Experiments in Economics.” Science 351 (6280): 1433–36. https://doi.org/10.1126/science.aaf0918.
Camerer, Colin F., Anna Dreber, Felix Holzmeister, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, et al. 2018. “Evaluating the Replicability of Social Science Experiments in Nature and Science Between 2010 and 2015.” Nature Human Behaviour 2 (9): 637–44. https://doi.org/10.1038/s41562-018-0399-z.
Chang, Andrew C, and Phillip Li. 2022. “Is Economics Research Replicable? Sixty Published Papers From Thirteen Journals Say Often Not.” Critical Finance Review 11 (1): 185–206. https://doi.org/10.1561/104.00000053.
Chang, J.-Y. A., J. B. Chilcott, and N. R. Latimer. 2024. “Leveraging Real-World Data to Assess Treatment Sequences in Health Economic Evaluations: A Study Protocol for Emulating Target Trials Using the English Cancer Registry and US Electronic Health Records-Derived Database.” Monograph. HEDS Discussion Paper. https://eprints.whiterose.ac.uk/208318/.
Cheung, I., L. Campbell, E. P. LeBel, R. A. Ackerman, B. Aykutoğlu, Š. Bahník, J. D. Bowen, et al. 2016. “Registered Replication Report: Study 1 From Finkel, Rusbult, Kumashiro, & Hannon (2002).” Perspectives on Psychological Science 11 (5): 750–64. https://doi.org/10.1177/1745691616664694.
Clemens, Michael A. 2017. “The Meaning of Failed Replications: A Review and Proposal.” Journal of Economic Surveys 31 (1): 326–42. https://doi.org/10.1111/joes.12139.
Cobey, Kelly D, Christophe A Fehlmann, Marina Christ Franco, Ana Patricia Ayala, Lindsey Sikora, Danielle B Rice, Chenchen Xu, et al. 2023. “Epidemiological Characteristics and Prevalence Rates of Research Reproducibility Across Disciplines: A Scoping Review of Articles Published in 2018-2019.” Edited by David B Allison, Mone Zaidi, Colby J Vorland, Arthur Lupia, and Jon Agley. eLife 12 (June): e78518. https://doi.org/10.7554/eLife.78518.
Coles, Nicholas A., David S. March, Fernando Marmolejo-Ramos, Jeff T. Larsen, Nwadiogo C. Arinze, Izuchukwu L. G. Ndukaihe, Megan L. Willis, et al. 2022. “A Multi-Lab Test of the Facial Feedback Hypothesis by the Many Smiles Collaboration.” Nature Human Behaviour 6 (12): 1731–42. https://doi.org/10.1038/s41562-022-01458-9.
Cologna, Viktoria, Niels G. Mede, Sebastian Berger, John Besley, Cameron Brick, Marina Joubert, Edward Maibach, et al. 2024. “Trust in Scientists and Their Role in Society Across 67 Countries,” January. https://doi.org/10.31219/osf.io/6ay7s.
Coretta, Stefano, Joseph V. Casillas, Simon Roessig, Michael Franke, Byron Ahn, Ali H. Al-Hoorie, Jalal Al-Tamimi, et al. 2023. “Multidimensional Signals and Analytic Flexibility: Estimating Degrees of Freedom in Human-Speech Analyses.” Advances in Methods and Practices in Psychological Science 6 (3): 25152459231162567. https://doi.org/10.1177/25152459231162567.
Costigan, Samantha, John Ruscio, and Jarret T. Crawford. 2024. “Performing Small-Telescopes Analysis by Resampling: Empirically Constructing Confidence Intervals and Estimating Statistical Power for Measures of Effect Size.” Advances in Methods and Practices in Psychological Science 7 (1): 25152459241227865. https://doi.org/10.1177/25152459241227865.
Cova, Florian, Brent Strickland, Angela Abatista, Aurélien Allard, James Andow, Mario Attie, James Beebe, et al. 2021. “Estimating the Reproducibility of Experimental Philosophy.” Review of Philosophy and Psychology 12 (1): 9–44. https://doi.org/10.1007/s13164-018-0400-9.
Cumming, Geoff. 2008. “Replication and p Intervals: P Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better.” Perspectives on Psychological Science 3 (4): 286–300. https://doi.org/10.1111/j.1745-6924.2008.00079.x.
Cumming, Geoff, and Robert Maillardet. 2006. “Confidence Intervals and Replication: Where Will the Next Mean Fall?” Psychological Methods 11 (3): 217–27. https://doi.org/10.1037/1082-989X.11.3.217.
De Vet, Henrica C. W., Caroline B. Terwee, Dirk L. Knol, and Lex M. Bouter. 2006. “When to Use Agreement Versus Reliability Measures.” Journal of Clinical Epidemiology 59 (10): 1033–39. https://doi.org/10.1016/j.jclinepi.2005.10.015.
Dear, Peter. n.d. Revolutionizing the Sciences. Accessed September 9, 2024. https://www.bloomsbury.com/uk/revolutionizing-the-sciences-9781352003130/.
Dear, Peter Robert. 2019. Revolutionizing the Sciences: European Knowledge in Transition, 1500-1700. Third edition. Oxford: Macmillan International Higher Education.
Dixon, Peter, and Scott Glover. 2020. “Assessing Evidence for Replication: A Likelihood-Based Approach.” Behavior Research Methods 52 (6): 2452–59. https://doi.org/10.3758/s13428-020-01403-6.
Dongen, Noah N. N. van, Johnny B. van Doorn, Quentin F. Gronau, Don van Ravenzwaaij, Rink Hoekstra, Matthias N. Haucke, Daniel Lakens, et al. 2019. “Multiple Perspectives on Inference for Two Simple Statistical Scenarios.” The American Statistician 73 (sup1): 328–39. https://doi.org/10.1080/00031305.2019.1565553.
Dreber, Anna, Thomas Pfeiffer, Johan Almenberg, Siri Isaksson, Brad Wilson, Yiling Chen, Brian A. Nosek, and Magnus Johannesson. 2015. “Using Prediction Markets to Estimate the Reproducibility of Scientific Research.” Proceedings of the National Academy of Sciences 112 (50): 15343–47. https://doi.org/10.1073/pnas.1516179112.
Duvendack, Maren, Richard Palmer-Jones, and W. Robert Reed. 2017. “What Is Meant by Replication and Why Does It Encounter Resistance in Economics?” American Economic Review 107 (5): 46–51. https://doi.org/10.1257/aer.p20171031.
Ebersole, Charles R., Olivia E. Atherton, Aimee L. Belanger, Hayley M. Skulborstad, Jill M. Allen, Jonathan B. Banks, Erica Baranski, et al. 2016. “Many Labs 3: Evaluating Participant Pool Quality Across the Academic Semester via Replication.” Journal of Experimental Social Psychology, Special Issue: Confirmatory, 67 (November): 68–82. https://doi.org/10.1016/j.jesp.2015.10.012.
Ebersole, Charles R., Maya B. Mathur, Erica Baranski, Diane-Jo Bart-Plange, Nicholas R. Buttrick, Christopher R. Chartier, Katherine S. Corker, et al. 2020. “Many Labs 5: Testing Pre-Data-Collection Peer Review as an Intervention to Increase Replicability.” Advances in Methods and Practices in Psychological Science 3 (3): 309–31. https://doi.org/10.1177/2515245920958687.
Erdfelder, Edgar, and Rolf Ulrich. 2018. “Zur Methodologie von Replikationsstudien.” Psychologische Rundschau 69 (1): 3–21. https://doi.org/10.1026/0033-3042/a000387.
Errington, Timothy M, Maya Mathur, Courtney K Soderberg, Alexandria Denis, Nicole Perfito, Elizabeth Iorns, and Brian A Nosek. 2021. “Investigating the Replicability of Preclinical Cancer Biology.” Edited by Renata Pasqualini and Eduardo Franco. eLife 10 (December): e71601. https://doi.org/10.7554/eLife.71601.
Fabrigar, Leandre R., and Duane T. Wegener. 2016. “Conceptualizing and Evaluating the Replication of Research Results.” Journal of Experimental Social Psychology 66 (September): 68–80. https://doi.org/10.1016/j.jesp.2015.07.009.
Farrar, Benjamin, Markus Boeckle, and Nicola Clayton. 2020. “Replications in Comparative Cognition: What Should We Expect and How Can We Improve?” Animal Behavior and Cognition 7 (1): 1–22. https://doi.org/10.26451/abc.07.01.02.2020.
Fidler, Fiona, Yung En Chee, Bonnie C. Wintle, Mark A. Burgman, Michael A. McCarthy, and Ascelin Gordon. 2017. “Metaresearch for Evaluating Reproducibility in Ecology and Evolution.” BioScience, January, biw159. https://doi.org/10.1093/biosci/biw159.
Fišar, Miloš, Ben Greiner, Christoph Huber, Elena Katok, Ali Ozkes, and Management Science Reproducibility Collaboration. 2024. “Reproducibility in Management Science,” February. https://doi.org/10.31219/osf.io/mydzv.
Fletcher, Samuel C. 2021. “How (Not) to Measure Replication.” European Journal for Philosophy of Science 11 (2): 57. https://doi.org/10.1007/s13194-021-00377-2.
Forbes, Miriam K., Aidan G. C. Wright, Kristian E. Markon, and Robert F. Krueger. 2021. “Quantifying the Reliability and Replicability of Psychopathology Network Characteristics.” Multivariate Behavioral Research 56 (2): 224–42. https://doi.org/10.1080/00273171.2019.1616526.
Fraser, Hannah, Martin Bush, Bonnie C. Wintle, Fallon Mody, Eden T. Smith, Anca M. Hanea, Elliot Gould, Victoria Hemming, Daniel G. Hamilton, Libby Rumpff, David P. Wilkinson, Ross Pearson, Felix Singleton Thorn, et al. 2023. “Predicting Reliability Through Structured Expert Elicitation with the repliCATS (Collaborative Assessments for Trustworthy Science) Process.” Edited by Ferrán Catalá-López. PLOS ONE 18 (1): e0274429. https://doi.org/10.1371/journal.pone.0274429.
Fu, Qianrao, Herbert Hoijtink, and Mirjam Moerbeek. 2021. “Sample-Size Determination for the Bayesian t Test and Welch’s Test Using the Approximate Adjusted Fractional Bayes Factor.” Behavior Research Methods 53 (1): 139–52. https://doi.org/10.3758/s13428-020-01408-1.
Gelman, Andrew, and John Carlin. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9 (6): 641–51. https://doi.org/10.1177/1745691614551642.
Gelman, Andrew, and Hal Stern. 2006. “The Difference Between Significant and Not Significant Is Not Itself Statistically Significant.” The American Statistician 60 (4): 328–31. https://doi.org/10.1198/000313006X152649.
González-Barahona, Jesús M., and Gregorio Robles. 2012. “On the Reproducibility of Empirical Software Engineering Studies Based on Data Retrieved from Development Repositories.” Empirical Software Engineering 17 (1-2): 75–89. https://doi.org/10.1007/s10664-011-9181-9.
Goodman, Steven N., Daniele Fanelli, and John P. A. Ioannidis. 2016. “What Does Research Reproducibility Mean?” Science Translational Medicine 8 (341): 341ps12. https://doi.org/10.1126/scitranslmed.aaf5027.
Guan, Jianmin, Ping Xiang, and Xiaofen Deng Keating. 2004. “Evaluating the Replicability of Sample Results: A Tutorial of Double Cross-Validation Methods.” Measurement in Physical Education and Exercise Science 8 (4): 227–41. https://doi.org/10.1207/s15327841mpee0804_4.
Hagger, Martin S., Nikos L. D. Chatzisarantis, Hugo Alberts, Calvin Octavianus Anggono, Cédric Batailler, Angela R. Birt, Ralf Brand, et al. 2016. “A Multilab Preregistered Replication of the Ego-Depletion Effect.” Perspectives on Psychological Science: A Journal of the Association for Psychological Science 11 (4): 546–73. https://doi.org/10.1177/1745691616652873.
Hanousek, Jan, Dana Hajkova, and Randall K. Filer. 2008. “A Rise by Any Other Name? Sensitivity of Growth Regressions to Data Source.” Journal of Macroeconomics 30 (3): 1188–1206. https://doi.org/10.1016/j.jmacro.2007.08.015.
Hedges, Larry V., and Jacob M. Schauer. 2019a. “More Than One Replication Study Is Needed for Unambiguous Tests of Replication.” Journal of Educational and Behavioral Statistics 44 (5): 543–70. https://doi.org/10.3102/1076998619852953.
———. 2019b. “Statistical Analyses for Studying Replication: Meta-Analytic Perspectives.” Psychological Methods 24 (5): 557–70. https://doi.org/10.1037/met0000189.
———. 2021. “The Design of Replication Studies.” Journal of the Royal Statistical Society Series A: Statistics in Society 184 (3): 868–86. https://doi.org/10.1111/rssa.12688.
Heirene, Robert M. 2021. “A Call for Replications of Addiction Research: Which Studies Should We Replicate and What Constitutes a ‘Successful’ Replication?” Addiction Research & Theory 29 (2): 89–97. https://doi.org/10.1080/16066359.2020.1751130.
Held, Leonhard. 2019. “The Assessment of Intrinsic Credibility and a New Argument for p < 0.005.” Royal Society Open Science 6 (3): 181534. https://doi.org/10.1098/rsos.181534.
———. 2020. “A New Standard for the Analysis and Design of Replication Studies.” Journal of the Royal Statistical Society Series A: Statistics in Society 183 (2): 431–48. https://doi.org/10.1111/rssa.12493.
Held, Leonhard, Robert Matthews, Manuela Ott, and Samuel Pawel. 2022. “Reverse‐Bayes Methods for Evidence Assessment and Research Synthesis.” Research Synthesis Methods 13 (3): 295–314. https://doi.org/10.1002/jrsm.1538.
Held, Leonhard, Charlotte Micheloud, and Samuel Pawel. 2022. “The Assessment of Replication Success Based on Relative Effect Size.” The Annals of Applied Statistics 16 (2). https://doi.org/10.1214/21-AOAS1502.
Held, Leonhard, Samuel Pawel, and Charlotte Micheloud. 2024. “The Assessment of Replicability Using the Sum of p-Values.” Royal Society Open Science 11 (8): 240149. https://doi.org/10.1098/rsos.240149.
Heller, Ruth, and Daniel Yekutieli. 2014. “Replicability Analysis for Genome-Wide Association Studies.” The Annals of Applied Statistics 8 (1). https://doi.org/10.1214/13-AOAS697.
Heyard, Rachel, Samuel Pawel, Kimberley Wever, Hanno Würbel, Bernhard Voelkl, and Leonhard Held. 2023. “Reproducibility Metrics - Study Protocol.” OSF. https://doi.org/10.17605/OSF.IO/7VC4Z.
Higgins, J. P T. 2003. “Measuring Inconsistency in Meta-Analyses.” BMJ 327 (7414): 557–60. https://doi.org/10.1136/bmj.327.7414.557.
Hildebrandt, Tom, and Jason M. Prenoveau. 2020. “Rigor and Reproducibility for Data Analysis and Design in the Behavioral Sciences.” Behaviour Research and Therapy 126 (March): 103552. https://doi.org/10.1016/j.brat.2020.103552.
Hoogeveen, Suzanne, Alexandra Sarafoglou, Balazs Aczel, Yonathan Aditya, Alexandra J. Alayan, Peter J. Allen, Sacha Altay, et al. 2023. “A Many-Analysts Approach to the Relation Between Religiosity and Well-Being.” Religion, Brain & Behavior 13 (3): 237–83. https://doi.org/10.1080/2153599X.2022.2070255.
Hung, Kenneth, and William Fithian. 2020. “Statistical Methods for Replicability Assessment.” The Annals of Applied Statistics 14 (3). https://doi.org/10.1214/20-AOAS1336.
Huntington-Klein, Nick, Andreu Arenas, Emily Beam, Marco Bertoni, Jeffrey R. Bloem, Pralhad Burli, Naibin Chen, et al. 2021. “The Influence of Hidden Researcher Decisions in Applied Microeconomics.” Economic Inquiry 59 (3): 944–60. https://doi.org/10.1111/ecin.12992.
Irvine, Krin, David A. Hoffman, and Tess Wilkinson-Ryan. 2018. “Law and Psychology Grows Up, Goes Online, and Replicates.” Journal of Empirical Legal Studies 15 (2): 320–55. https://doi.org/10.1111/jels.12180.
Jaric, Ivana, Bernhard Voelkl, Irmgard Amrein, David P. Wolfer, Janja Novak, Carlotta Detotto, Ulrike Weber-Stadlbauer, et al. 2024. “Using Mice from Different Breeding Sites Fails to Improve Replicability of Results from Single-Laboratory Studies.” Lab Animal 53 (1): 18–22. https://doi.org/10.1038/s41684-023-01307-w.
Jiang, Wei, Jing-Hao Xue, and Weichuan Yu. 2016. “What Is the Probability of Replicating a Statistically Significant Association in Genome-Wide Association Studies?” Briefings in Bioinformatics, September, bbw091. https://doi.org/10.1093/bib/bbw091.
Kačmár, Pavol, and Matúš Adamkovič. 2020. “Replikačné Štúdie v Psychológii: Pojednanie o Dvoch Dôležitých Otázkách. [Replication Studies in Psychology: A Discourse on Two Important Issues.].” Československá Psychologie: Časopis Pro Psychologickou Teorii a Praxi 64 (1): 66–83.
Kirkby, Robert. 2023. “Quantitative Macroeconomics: Lessons Learned from Fourteen Replications.” Computational Economics 61 (2): 875–96. https://doi.org/10.1007/s10614-022-10234-w.
Klein, Richard A, Corey L. Cook, Charles R. Ebersole, Christine Vitiello, Brian A. Nosek, Joseph Hilgard, Paul Hangsan Ahn, et al. 2022. “Many Labs 4: Failure to Replicate Mortality Salience Effect With and Without Original Author Involvement.” Collabra: Psychology 8 (1): 35271. https://doi.org/10.1525/collabra.35271.
Klein, Richard A., Kate A. Ratliff, Michelangelo Vianello, Reginald B. Adams, Štěpán Bahník, Michael J. Bernstein, Konrad Bocian, et al. 2014. “Investigating Variation in Replicability.” Social Psychology 45 (3): 142–52. https://doi.org/10.1027/1864-9335/a000178.
Klein, Richard A., Michelangelo Vianello, Fred Hasselman, Byron G. Adams, Reginald B. Adams, Sinan Alper, Mark Aveyard, et al. 2018. “Many Labs 2: Investigating Variation in Replicability Across Samples and Settings.” Advances in Methods and Practices in Psychological Science 1 (4): 443–90. https://doi.org/10.1177/2515245918810225.
Klugkist, Irene, and Thom Benjamin Volker. 2023. “Bayesian Evidence Synthesis for Informative Hypotheses: An Introduction.” Psychological Methods, September. https://doi.org/10.1037/met0000602.
Lakens, Daniël, Anne M. Scheel, and Peder M. Isager. 2018. “Equivalence Testing for Psychological Research: A Tutorial.” Advances in Methods and Practices in Psychological Science 1 (2): 259–69. https://doi.org/10.1177/2515245918770963.
LeBel, Etienne P., Randy J. McCarthy, Brian D. Earp, Malte Elson, and Wolf Vanpaemel. 2018. “A Unified Framework to Quantify the Credibility of Scientific Findings.” Advances in Methods and Practices in Psychological Science 1 (3): 389–402. https://doi.org/10.1177/2515245918787489.
Lin, Lifeng, and Haitao Chu. 2022. “Assessing and Visualizing Fragility of Clinical Results with Binary Outcomes in R Using the Fragility Package.” Edited by Paul Aurelian Gagniuc. PLOS ONE 17 (6): e0268754. https://doi.org/10.1371/journal.pone.0268754.
Lin, Lifeng, Aiwen Xing, Haitao Chu, M. Hassan Murad, Chang Xu, Benjamin R. Baer, Martin T. Wells, and Luis Sanchez-Ramos. 2023. “Assessing the Robustness of Results from Clinical Trials and Meta-Analyses with the Fragility Index.” American Journal of Obstetrics and Gynecology 228 (3): 276–82. https://doi.org/10.1016/j.ajog.2022.08.053.
Liou, Michelle, Hong-Ren Su, Juin-Der Lee, Philip E. Cheng, Chien-Chih Huang, and Chih-Hsin Tsai. 2003. “Bridging Functional MR Images and Scientific Inference: Reproducibility Maps.” Journal of Cognitive Neuroscience 15 (7): 935–45. https://doi.org/10.1162/089892903770007326.
Liu, Yang, Alex Kale, Tim Althoff, and Jeffrey Heer. 2021. “Boba: Authoring and Visualizing Multiverse Analyses.” IEEE Transactions on Visualization and Computer Graphics 27 (2): 1753–63. https://doi.org/10.1109/TVCG.2020.3028985.
Low, Jeffrey, Joseph S. Ross, Jessica D. Ritchie, Cary P. Gross, Richard Lehman, Haiqun Lin, Rongwei Fu, Lesley A. Stewart, and Harlan M. Krumholz. 2017. “Comparison of Two Independent Systematic Reviews of Trials of Recombinant Human Bone Morphogenetic Protein-2 (rhBMP-2): The Yale Open Data Access Medtronic Project.” Systematic Reviews 6 (1): 28. https://doi.org/10.1186/s13643-017-0422-x.
Luijken, K., A. Lohmann, U. Alter, J. Claramunt Gonzalez, F. J. Clouth, J. L. Fossum, L. Hesen, et al. 2024. “Replicability of Simulation Studies for the Investigation of Statistical Methods: The RepliSims Project.” Royal Society Open Science 11 (1): 231003. https://doi.org/10.1098/rsos.231003.
Maier-Hein, Klaus H., Peter F. Neher, Jean-Christophe Houde, Marc-Alexandre Côté, Eleftherios Garyfallidis, Jidan Zhong, Maxime Chamberland, et al. 2017. “The Challenge of Mapping the Human Connectome Based on Diffusion Tractography.” Nature Communications 8 (November): 1349. https://doi.org/10.1038/s41467-017-01285-x.
Maitra, Ranjan. 2010. “A Re-Defined and Generalized Percent-Overlap-of-Activation Measure for Studies of fMRI Reproducibility and Its Use in Identifying Outlier Activation Maps.” NeuroImage 50 (1): 124–35. https://doi.org/10.1016/j.neuroimage.2009.11.070.
Manolov, Rumen, and René Tanious. 2020. “Assessing Consistency in Single-Case Data Features Using Modified Brinley Plots.” Behavior Modification 46 (3): 581–627. https://doi.org/10.1177/0145445520982969.
Manolov, Rumen, René Tanious, and Belén Fernández‐Castilla. 2022. “A Proposal for the Assessment of Replication of Effects in Single‐case Experimental Designs.” Journal of Applied Behavior Analysis 55 (3): 997–1024. https://doi.org/10.1002/jaba.923.
Marcoci, Alexandru, David Peter Wilkinson, Anna Lou Abatayo, Ernest Baskin, Henk Berkman, Erin Michelle Buchanan, Sara Capitán, et al. 2024. “Predicting the Replicability of Social and Behavioural Science Claims from the COVID-19 Preprint Replication Project with Structured Expert and Novice Groups,” February. https://doi.org/10.31222/osf.io/xdsjf.
Mateu, Pedro, Brooks Applegate, and Chris L. Coryn. 2024. “Towards More Credible Conceptual Replications Under Heteroscedasticity and Unbalanced Designs.” Quality & Quantity 58 (1): 723–51. https://doi.org/10.1007/s11135-023-01657-0.
Mathur, Maya B., and Tyler J. VanderWeele. 2019a. “New Metrics for Meta-Analyses of Heterogeneous Effects.” Statistics in Medicine 38 (8): 1336–42. https://doi.org/10.1002/sim.8057.
———. 2019b. “Challenges and Suggestions for Defining Replication ‘Success’ When Effects May Be Heterogeneous: Comment on Hedges and Schauer (2019).” Psychological Methods 24 (5): 571–75. https://doi.org/10.1037/met0000223.
———. 2020. “New Statistical Metrics for Multisite Replication Projects.” Journal of the Royal Statistical Society Series A: Statistics in Society 183 (3): 1145–66. https://doi.org/10.1111/rssa.12572.
Matthews, Robert A. J. 2001. “Methods for Assessing the Credibility of Clinical Trial Outcomes.” Drug Information Journal 35 (4): 1469–78. https://doi.org/10.1177/009286150103500442.
Mbuagbaw, Lawrence, Daeria O. Lawson, Livia Puljak, David B. Allison, and Lehana Thabane. 2020. “A Tutorial on Methodological Studies: The What, When, How and Why.” BMC Medical Research Methodology 20 (1): 226. https://doi.org/10.1186/s12874-020-01107-7.
McGuire, Daniel, Yu Jiang, Mengzhen Liu, J. Dylan Weissenkampen, Scott Eckert, Lina Yang, Fang Chen, et al. 2021. “Model-Based Assessment of Replicability for Genome-Wide Association Meta-Analysis.” Nature Communications 12 (1): 1964. https://doi.org/10.1038/s41467-021-21226-z.
McIntosh, Leslie D., Anthony Juehne, Cynthia R. H. Vitale, Xiaoyan Liu, Rosalia Alcoser, J. Christian Lukas, and Bradley Evanoff. 2017. “Repeat: A Framework to Assess Empirical Reproducibility in Biomedical Research.” BMC Medical Research Methodology 17 (1): 143. https://doi.org/10.1186/s12874-017-0377-6.
McShane, Blakeley B., Ulf Böckenholt, and Karsten T. Hansen. 2022. “Variation and Covariation in Large-Scale Replication Projects: An Evaluation of Replicability.” Journal of the American Statistical Association 117 (540): 1605–21. https://doi.org/10.1080/01621459.2022.2054816.
McShane, Blakeley B., Jennifer L. Tackett, Ulf Böckenholt, and Andrew Gelman. 2019. “Large-Scale Replication Projects in Contemporary Psychological Research.” The American Statistician 73 (sup1): 99–105. https://doi.org/10.1080/00031305.2018.1505655.
Micheloud, Charlotte, Fadoua Balabdaoui, and Leonhard Held. 2023. “Assessing Replicability with the Sceptical p-Value: Type-I Error Control and Sample Size Planning.” Statistica Neerlandica 77 (4): 573–91. https://doi.org/10.1111/stan.12312.
Milcu, Alexandru, Ruben Puga-Freitas, Aaron M. Ellison, Manuel Blouin, Stefan Scheu, Grégoire T. Freschet, Laura Rose, et al. 2018. “Genotypic Variability Enhances the Reproducibility of an Ecological Study.” Nature Ecology & Evolution 2 (2): 279–87. https://doi.org/10.1038/s41559-017-0434-x.
Muradchanian, Jasmine, Rink Hoekstra, Henk Kiers, and Don Van Ravenzwaaij. 2021. “How Best to Quantify Replication Success? A Simulation Study on the Comparison of Replication Success Metrics.” Royal Society Open Science 8 (5): 201697. https://doi.org/10.1098/rsos.201697.
National Academies of Sciences, Engineering, and Medicine. 2019. Reproducibility and Replicability in Science. Washington, D.C.: National Academies Press. https://doi.org/10.17226/25303.
Naudet, Florian, Charlotte Sakarovitch, Perrine Janiaud, Ioana Cristea, Daniele Fanelli, David Moher, and John P. A. Ioannidis. 2018. “Data Sharing and Reanalysis of Randomized Controlled Trials in Leading Biomedical Journals with a Full Data Sharing Policy: Survey of Studies Published in The BMJ and PLOS Medicine.” BMJ 360 (February): k400. https://doi.org/10.1136/bmj.k400.
Nieuwenhuis, Sander, Birte U. Forstmann, and Eric-Jan Wagenmakers. 2011. “Erroneous Analyses of Interactions in Neuroscience: A Problem of Significance.” Nature Neuroscience 14 (9): 1105–7. https://doi.org/10.1038/nn.2886.
Nordling, Torbjörn, and Tomas Melo Peralta. 2022. “A Literature Review of Methods for Assessment of Reproducibility in Science.” https://doi.org/10.21203/rs.3.rs-2267847/v4.
Nosek, Brian A., Tom E. Hardwicke, Hannah Moshontz, Aurélien Allard, Katherine S. Corker, Anna Dreber, Fiona Fidler, et al. 2022. “Replicability, Robustness, and Reproducibility in Psychological Science.” Annual Review of Psychology 73 (1): 719–48. https://doi.org/10.1146/annurev-psych-020821-114157.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. https://doi.org/10.1126/science.aac4716.
Page, Matthew J., David Moher, Fiona M. Fidler, Julian P. T. Higgins, Sue E. Brennan, Neal R. Haddaway, Daniel G. Hamilton, et al. 2021. “The REPRISE Project: Protocol for an Evaluation of REProducibility and Replicability In Syntheses of Evidence.” Systematic Reviews 10 (1): 112. https://doi.org/10.1186/s13643-021-01670-0.
Patil, Prasad, Roger D. Peng, and Jeffrey T. Leek. 2016. “What Should Researchers Expect When They Replicate Studies? A Statistical View of Replicability in Psychological Science.” Perspectives on Psychological Science 11 (4): 539–44. https://www.jstor.org/stable/26358643.
Pauli, Francesco. 2019. “A Statistical Model to Investigate the Reproducibility Rate Based on Replication Experiments.” International Statistical Review 87 (1): 68–79. https://doi.org/10.1111/insr.12273.
Pawel, Samuel, and Leonhard Held. 2022. “The Sceptical Bayes Factor for the Assessment of Replication Success.” Journal of the Royal Statistical Society Series B: Statistical Methodology 84 (3): 879–911. https://doi.org/10.1111/rssb.12491.
Pawel, Samuel, Rachel Heyard, Charlotte Micheloud, and Leonhard Held. 2024a. “Replication of ‘Null Results’: Absence of Evidence or Evidence of Absence?” eLife 12 (February). https://doi.org/10.7554/eLife.92311.2.
———. 2024b. “Replication of Null Results: Absence of Evidence or Evidence of Absence?” Edited by Philip Boonstra and Peter Rodgers. eLife 12 (May): RP92311. https://doi.org/10.7554/eLife.92311.
Pollock, Danielle, Micah D. J. Peters, Hanan Khalil, Patricia McInerney, Lyndsay Alexander, Andrea C. Tricco, Catrin Evans, et al. 2023. “Recommendations for the Extraction, Analysis, and Presentation of Results in Scoping Reviews.” JBI Evidence Synthesis 21 (3): 520. https://doi.org/10.11124/JBIES-22-00123.
Protzko, John, Jon Krosnick, Leif Nelson, Brian A. Nosek, Jordan Axt, Matt Berent, Nicholas Buttrick, et al. 2023. “High Replicability of Newly Discovered Social-Behavioural Findings Is Achievable.” Nature Human Behaviour, November, 1–9. https://doi.org/10.1038/s41562-023-01749-9.
Puoliväli, Tuomas, Satu Palva, and J. Matias Palva. 2020. “Influence of Multiple Hypothesis Testing on Reproducibility in Neuroimaging Research: A Simulation Study and Python-Based Software.” Journal of Neuroscience Methods 337 (May): 108654. https://doi.org/10.1016/j.jneumeth.2020.108654.
Asendorpf, Jens B., Mark Conner, Filip De Fruyt, Jan De Houwer, Jaap J. A. Denissen, Klaus Fiedler, Susann Fiedler, et al. 2013. “Recommendations for Increasing Replicability in Psychology.” European Journal of Personality 27 (2): 108–19. https://doi.org/10.1002/per.1919.
Rosenthal, Robert. 1990. “Replication in Behavioral Research.” Journal of Social Behavior & Personality 5 (4): 1–30.
Rouder, Jeffrey N., and Richard D. Morey. 2012. “Default Bayes Factors for Model Selection in Regression.” Multivariate Behavioral Research 47 (6): 877–903. https://doi.org/10.1080/00273171.2012.734737.
Salganik, Matthew J., Ian Lundberg, Alexander T. Kindel, Caitlin E. Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M. Altschul, et al. 2020. “Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration.” Proceedings of the National Academy of Sciences 117 (15): 8398–8403. https://doi.org/10.1073/pnas.1915006117.
Schauer, J. M., and L. V. Hedges. 2021. “Reconsidering Statistical Methods for Assessing Replication.” Psychological Methods 26 (1): 127–39. https://doi.org/10.1037/met0000302.
Schauer, Jacob M. 2023. “On the Accuracy of Replication Failure Rates.” Multivariate Behavioral Research 58 (3): 598–615. https://doi.org/10.1080/00273171.2022.2066500.
Schauer, Jacob M., Kaitlyn G. Fitzgerald, Sarah Peko-Spicer, Mena C. R. Whalen, Rrita Zejnullahi, and Larry V. Hedges. 2021. “An Evaluation of Statistical Methods for Aggregate Patterns of Replication Failure.” The Annals of Applied Statistics 15 (1). https://doi.org/10.1214/20-AOAS1387.
Schauer, Jacob M., and Larry V. Hedges. 2020. “Assessing Heterogeneity and Power in Replications of Psychological Experiments.” Psychological Bulletin 146 (8): 701–19. https://doi.org/10.1037/bul0000232.
Schweinsberg, Martin, Michael Feldman, Nicola Staub, Olmo R. van den Akker, Robbie C. M. van Aert, Marcel A. L. M. van Assen, Yang Liu, et al. 2021. “Same Data, Different Conclusions: Radical Dispersion in Empirical Results When Independent Analysts Operationalize and Test the Same Hypothesis.” Organizational Behavior and Human Decision Processes 165 (July): 228–49. https://doi.org/10.1016/j.obhdp.2021.02.003.
Silberzahn, R., E. L. Uhlmann, D. P. Martin, P. Anselmi, F. Aust, E. Awtrey, Š. Bahník, et al. 2018. “Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results.” Advances in Methods and Practices in Psychological Science 1 (3): 337–56. https://doi.org/10.1177/2515245917747646.
Simonsohn, Uri. 2015. “Small Telescopes: Detectability and the Evaluation of Replication Results.” Psychological Science 26 (5): 559–69. https://doi.org/10.1177/0956797614567341.
Simonsohn, Uri, Leif D. Nelson, and Joseph P. Simmons. 2014. “P-Curve: A Key to the File-Drawer.” Journal of Experimental Psychology: General 143 (2): 534–47. https://doi.org/10.1037/a0033242.
Song, Q. Chelsea, Chen Tang, and Serena Wee. 2021. “Making Sense of Model Generalizability: A Tutorial on Cross-Validation in R and Shiny.” Advances in Methods and Practices in Psychological Science 4 (1): 251524592094706. https://doi.org/10.1177/2515245920947067.
Soto, Christopher J. 2019. “How Replicable Are Links Between Personality Traits and Consequential Life Outcomes? The Life Outcomes of Personality Replication Project.” Psychological Science 30 (5): 711–27. https://doi.org/10.1177/0956797619831612.
Steiner, Peter M., Patrick Sheehan, and Vivian C. Wong. 2023a. “Correspondence Measures for Assessing Replication Success.” Psychological Methods, No Pagination Specified. https://doi.org/10.1037/met0000597.
———. 2023b. “Correspondence Measures for Assessing Replication Success.” Psychological Methods, July. https://doi.org/10.1037/met0000597.
Steiner, Peter M., Vivian C. Wong, and Kylie Anglin. 2019. “A Causal Replication Framework for Designing and Assessing Replication Efforts.” Zeitschrift Für Psychologie 227 (4): 280–92. https://doi.org/10.1027/2151-2604/a000385.
Steinle, Friedrich. 2016. “Stability and Replication of Experimental Results: A Historical Perspective.” In Reproducibility, 39–63. John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118865064.ch3.
Stroebe, Wolfgang, and Fritz Strack. 2014. “The Alleged Crisis and the Illusion of Exact Replication.” Perspectives on Psychological Science 9 (1): 59–71. https://doi.org/10.1177/1745691613514450.
Suetake, Hirotaka, Tsukasa Fukusato, Takeo Igarashi, and Tazro Ohta. 2022. “A Workflow Reproducibility Scale for Automatic Validation of Biological Interpretation Results.” GigaScience 12 (December): giad031. https://doi.org/10.1093/gigascience/giad031.
Sumner, Josh Q., Cynthia Hudson Vitale, and Leslie D. McIntosh. 2022. “RipetaScore: Measuring the Quality, Transparency, and Trustworthiness of a Scientific Work.” Frontiers in Research Metrics and Analytics 6 (January): 751734. https://doi.org/10.3389/frma.2021.751734.
Tackett, Jennifer L., and Blakeley B. McShane. 2018. “Conceptualizing and Evaluating Replication Across Domains of Behavioral Research.” Behavioral and Brain Sciences 41: e152. https://doi.org/10.1017/S0140525X18000882.
Tetens, Holm. 2016. “Reproducibility, Objectivity, Invariance.” In Reproducibility, 13–20. John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118865064.ch1.
Thompson, Bruce. 1994. “The Pivotal Role of Replication in Psychological Research: Empirically Evaluating the Replicability of Sample Results.” Journal of Personality 62 (2): 157–76. https://doi.org/10.1111/j.1467-6494.1994.tb00289.x.
Tricco, Andrea C., Erin Lillie, Wasifa Zarin, Kelly K. O’Brien, Heather Colquhoun, Danielle Levac, David Moher, et al. 2018. “PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation.” Annals of Internal Medicine 169 (7): 467–73. https://doi.org/10.7326/M18-0850.
Tsai, Tsung-Hsun, Chingwei David Shin, Laura M. Neumann, and Barry W. Grau. 2012. “Generalizability Analyses of NBDE Part II.” Evaluation & the Health Professions 35 (2): 169–81. https://doi.org/10.1177/0163278711425382.
Tukey, John Wilder. 1977. Exploratory Data Analysis. Addison-Wesley Series in Behavioral Science. Reading, MA: Addison-Wesley.
Van Aert, Robbie C. M., and Marcel A. L. M. Van Assen. 2017. “Bayesian Evaluation of Effect Size After Replicating an Original Study.” Edited by Daniele Marinazzo. PLOS ONE 12 (4): e0175302. https://doi.org/10.1371/journal.pone.0175302.
Verhagen, Josine, and Eric-Jan Wagenmakers. 2014a. “Bayesian Tests to Quantify the Result of a Replication Attempt.” Journal of Experimental Psychology: General 143 (4): 1457–75. https://doi.org/10.1037/a0036731.
———. 2014b. “‘Bayesian Tests to Quantify the Result of a Replication Attempt’: Correction to Verhagen and Wagenmakers (2014).” Journal of Experimental Psychology: General 143 (6): 2073. https://doi.org/10.1037/a0038326.
Veronese, Mattia, Gaia Rizzo, Martin Belzunce, Julia Schubert, Graham Searle, Alex Whittington, Ayla Mansur, et al. 2021. “Reproducibility of Findings in Modern PET Neuroimaging: Insight from the NRM2018 Grand Challenge.” Journal of Cerebral Blood Flow and Metabolism: Official Journal of the International Society of Cerebral Blood Flow and Metabolism 41 (10): 2778–96. https://doi.org/10.1177/0271678X211015101.
Voelkl, Bernhard, Rachel Heyard, Daniele Fanelli, Kimberley Wever, Leonhard Held, Zacharias Maniadis, Sarah McCann, Stephanie Zellers, and Hanno Würbel. 2024. “The iRISE Reproducibility Glossary,” June. https://doi.org/10.17605/OSF.IO/BR9SP.
Wagenmakers, E.-J., T. Beek, L. Dijkhoff, Q. F. Gronau, A. Acosta, R. B. Adams, D. N. Albohn, et al. 2016. “Registered Replication Report: Strack, Martin, & Stepper (1988).” Perspectives on Psychological Science 11 (6): 917–28. https://doi.org/10.1177/1745691616674458.
Walsh, Michael, Sadeesh K. Srinathan, Daniel F. McAuley, Marko Mrkobrada, Oren Levine, Christine Ribic, Amber O. Molnar, et al. 2014. “The Statistical Significance of Randomized Controlled Trial Results Is Frequently Fragile: A Case for a Fragility Index.” Journal of Clinical Epidemiology 67 (6): 622–28. https://doi.org/10.1016/j.jclinepi.2013.10.019.
Wang, Jiping, Hongmin Liang, Qingzhao Zhang, and Shuangge Ma. 2022. “Replicability in Cancer Omics Data Analysis: Measures and Empirical Explorations.” Briefings in Bioinformatics 23 (5): bbac304. https://doi.org/10.1093/bib/bbac304.
Wang, Shirley V., Sebastian Schneeweiss, and RCT-DUPLICATE Initiative. 2023. “Emulation of Randomized Clinical Trials With Nonrandomized Database Analyses: Results of 32 Clinical Trials.” JAMA 329 (16): 1376–85. https://doi.org/10.1001/jama.2023.4221.
Wang, Shirley V., Sushama Kattinakere Sreedhara, Sebastian Schneeweiss, REPEAT Initiative, Jessica M. Franklin, Joshua J. Gagne, Krista F. Huybrechts, et al. 2022. “Reproducibility of Real-World Evidence Studies Using Clinical Practice Data to Inform Regulatory and Coverage Decisions.” Nature Communications 13 (1): 5126. https://doi.org/10.1038/s41467-022-32310-3.
Wilensky, Uri, and William Rand. 2007. “Making Models Match: Replicating an Agent-Based Model.” Journal of Artificial Societies and Social Simulation 10 (4).
Wilson, Brent M., and John T. Wixted. 2023a. “On the Importance of Modeling the Invisible World of Underlying Effect Sizes.” Social Psychological Bulletin 18 (November): e9981. https://doi.org/10.32872/spb.9981.
Wong, Vivian C., Kylie Anglin, and Peter M. Steiner. 2022. “Design-Based Approaches to Causal Replication Studies.” Prevention Science 23 (5): 723–38. https://doi.org/10.1007/s11121-021-01234-7.
Xiao, Mengli, Haitao Chu, James S. Hodges, and Lifeng Lin. 2024. “Quantifying Replicability of Multiple Studies in a Meta-Analysis.” The Annals of Applied Statistics 18 (1). https://doi.org/10.1214/23-AOAS1806.
Xu, Xin. 2022. “Epistemic Diversity and Cross-Cultural Comparative Research: Ontology, Challenges, and Outcomes.” Globalisation, Societies and Education 20 (1): 36–48. https://doi.org/10.1080/14767724.2021.1932438.
Yang, Yang, Wu Youyou, and Brian Uzzi. 2020a. “Estimating the Deep Replicability of Scientific Findings Using Human and Artificial Intelligence.” Proceedings of the National Academy of Sciences 117 (20): 10762–68. https://doi.org/10.1073/pnas.1909046117.
Youyou, Wu, Yang Yang, and Brian Uzzi. 2023. “A Discipline-Wide Investigation of the Replicability of Psychology Papers over the Past Two Decades.” Proceedings of the National Academy of Sciences 120 (6): e2208863120. https://doi.org/10.1073/pnas.2208863120.
Zhao, Yi, and Xiaoquan Wen. 2021. “Statistical Assessment of Replicability via Bayesian Model Criticism.” arXiv. https://doi.org/10.48550/ARXIV.2105.03993.
Zwaan, Rolf A., Alexander Etz, Richard E. Lucas, and M. Brent Donnellan. 2018. “Making Replication Mainstream.” Behavioral and Brain Sciences 41: e120. https://doi.org/10.1017/S0140525X17001972.