Notes - Statistics Done Wrong: The Woefully Complete Guide (2015)

Appendix A. Notes

Articles from some publishers, such as BMJ, BMC, and PLOS, are freely available online. Free copies of others can sometimes be found by searching for their titles. Most references include Digital Object Identifiers (DOIs), which may be entered at http://dx.doi.org/ to find the authoritative online version of the article.

Introduction

1. J.P.A. Ioannidis. “Why Most Published Research Findings Are False.” PLOS Medicine 2, no. 8 (2005): e124. DOI: 10.1371/journal.pmed.0020124.

2. N.J. Horton and S.S. Switzer. “Statistical Methods in the Journal.” New England Journal of Medicine 353, no. 18 (2005): 1977–1979. DOI: 10.1056/NEJM200511033531823.

3. B.L. Anderson, S. Williams, and J. Schulkin. “Statistical Literacy of Obstetrics-Gynecology Residents.” Journal of Graduate Medical Education 5, no. 2 (2013): 272–275. DOI: 10.4300/JGME-D-12-00161.1.

4. D.M. Windish, S.J. Huot, and M.L. Green. “Medicine residents’ understanding of the biostatistics and results in the medical literature.” JAMA 298, no. 9 (2007): 1010–1022. DOI: 10.1001/jama.298.9.1010.

5. S. Goodman. “A Dirty Dozen: Twelve P-Value Misconceptions.” Seminars in Hematology 45, no. 3 (2008): 135–140. DOI: 10.1053/j.seminhematol.2008.04.003.

6. P.E. Meehl. “Theory-testing in psychology and physics: A methodological paradox.” Philosophy of Science 34, no. 2 (1967): 103–115.

7. G. Taubes and C.C. Mann. “Epidemiology faces its limits.” Science 269, no. 5221 (1995): 164–169. DOI: 10.1126/science.7618077.

8. D. Fanelli and J.P.A. Ioannidis. “US studies may overestimate effect sizes in softer research.” Proceedings of the National Academy of Sciences 110, no. 37 (2013): 15031–15036. DOI: 10.1073/pnas.1302997110.

Chapter 1

1. B. Thompson. “Two and One-Half Decades of Leadership in Measurement and Evaluation.” Journal of Counseling & Development 70, no. 3 (1992): 434–438. DOI: 10.1002/j.1556-6676.1992.tb01631.x.

2. E.J. Wagenmakers. “A practical solution to the pervasive problems of p values.” Psychonomic Bulletin & Review 14, no. 5 (2007): 779–804. DOI: 10.3758/BF03194105.

3. J. Neyman and E.S. Pearson. “On the Problem of the Most Efficient Tests of Statistical Hypotheses.” Philosophical Transactions of the Royal Society of London, Series A 231 (1933): 289–337.

4. S.N. Goodman. “Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy.” Annals of Internal Medicine 130, no. 12 (1999): 995–1004. DOI: 10.7326/0003-4819-130-12-199906150-00008.

5. S.N. Goodman. “P values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate.” American Journal of Epidemiology 137, no. 5 (1993): 485–496.

6. R. Hubbard and M.J. Bayarri. “Confusion Over Measures of Evidence (p’s) Versus Errors (α’s) in Classical Statistical Testing.” The American Statistician 57, no. 3 (2003): 171–178. DOI: 10.1198/0003130031856.

7. M.J. Gardner and D.G. Altman. “Confidence intervals rather than P values: estimation rather than hypothesis testing.” BMJ 292 (1986): 746–750.

8. G. Cumming, F. Fidler, M. Leonard, P. Kalinowski, A. Christiansen, A. Kleinig, J. Lo, N. McMenamin, and S. Wilson. “Statistical Reform in Psychology: Is Anything Changing?” Psychological Science 18, no. 3 (2007): 230–232. DOI: 10.1111/j.1467-9280.2007.01881.x.

9. P.E. Tressoldi, D. Giofré, F. Sella, and G. Cumming. “High Impact = High Statistical Standards? Not Necessarily So.” PLOS ONE 8, no. 2 (2013): e56180. DOI: 10.1371/journal.pone.0056180.

10. B. Thompson. “Why ‘Encouraging’ Effect Size Reporting Is Not Working: The Etiology of Researcher Resistance to Changing Practices.” The Journal of Psychology 133, no. 2 (1999): 133–140. DOI: 10.1080/00223989909599728.

11. J. Cohen. “The earth is round (p < .05).” American Psychologist 49, no. 12 (1994): 997–1003. DOI: 10.1037/0003-066X.49.12.997.

12. F. Fidler, N. Thomason, G. Cumming, S. Finch, and J. Leeman. “Editors Can Lead Researchers to Confidence Intervals, but Can’t Make Them Think: Statistical Reform Lessons From Medicine.” Psychological Science 15, no. 2 (2004): 119–126. DOI: 10.1111/j.0963-7214.2004.01502008.x.

Chapter 2

1. P.E. Tressoldi, D. Giofré, F. Sella, and G. Cumming. “High Impact = High Statistical Standards? Not Necessarily So.” PLOS ONE 8, no. 2 (2013): e56180. DOI: 10.1371/journal.pone.0056180.

2. R. Tsang, L. Colley, and L.D. Lynd. “Inadequate statistical power to detect clinically significant differences in adverse event rates in randomized controlled trials.” Journal of Clinical Epidemiology 62, no. 6 (2009): 609–616. DOI: 10.1016/j.jclinepi.2008.08.005.

3. D. Moher, C. Dulberg, and G. Wells. “Statistical power, sample size, and their reporting in randomized controlled trials.” JAMA 272, no. 2 (1994): 122–124. DOI: 10.1001/jama.1994.03520020048013.

4. P.L. Bedard, M.K. Krzyzanowska, M. Pintilie, and I.F. Tannock. “Statistical Power of Negative Randomized Controlled Trials Presented at American Society for Clinical Oncology Annual Meetings.” Journal of Clinical Oncology 25, no. 23 (2007): 3482–3487. DOI: 10.1200/JCO.2007.11.3670.

5. C.G. Brown, G.D. Kelen, J.J. Ashton, and H.A. Werman. “The beta error and sample size determination in clinical trials in emergency medicine.” Annals of Emergency Medicine 16, no. 2 (1987): 183–187. DOI: 10.1016/S0196-0644(87)80013-6.

6. K.C. Chung, L.K. Kalliainen, and R.A. Hayward. “Type II (beta) errors in the hand literature: the importance of power.” The Journal of Hand Surgery 23, no. 1 (1998): 20–25. DOI: 10.1016/S0363-5023(98)80083-X.

7. K.S. Button, J.P.A. Ioannidis, C. Mokrysz, B.A. Nosek, J. Flint, E.S.J. Robinson, and M.R. Munafò. “Power failure: why small sample size undermines the reliability of neuroscience.” Nature Reviews Neuroscience 14 (2013): 365–376. DOI: 10.1038/nrn3475.

8. J. Cohen. “The statistical power of abnormal-social psychological research: A review.” Journal of Abnormal and Social Psychology 65, no. 3 (1962): 145–153. DOI: 10.1037/h0045186.

9. P. Sedlmeier and G. Gigerenzer. “Do studies of statistical power have an effect on the power of studies?” Psychological Bulletin 105, no. 2 (1989): 309–316. DOI: 10.1037/0033-2909.105.2.309.

10. G. Murray. “The task of a statistical referee.” British Journal of Surgery 75, no. 7 (1988): 664–667. DOI: 10.1002/bjs.1800750714.

11. S.E. Maxwell. “The Persistence of Underpowered Studies in Psychological Research: Causes, Consequences, and Remedies.” Psychological Methods 9, no. 2 (2004): 147–163. DOI: 10.1037/1082-989X.9.2.147.

12. E. Hauer. “The harm done by tests of significance.” Accident Analysis & Prevention 36, no. 3 (2004): 495–500. DOI: 10.1016/S0001-4575(03)00036-8.

13. D.F. Preusser, W.A. Leaf, K.B. DeBartolo, R.D. Blomberg, and M.M. Levy. “The effect of right-turn-on-red on pedestrian and bicyclist accidents.” Journal of Safety Research 13, no. 2 (1982): 45–55. DOI: 10.1016/0022-4375(82)90001-9.

14. P.L. Zador. “Right-turn-on-red laws and motor vehicle crashes: A review of the literature.” Accident Analysis & Prevention 16, no. 4 (1984): 241–245. DOI: 10.1016/0001-4575(84)90019-8.

15. National Highway Traffic Safety Administration. “The Safety Impact of Right Turn on Red.” February 1995. URL: http://www.nhtsa.gov/people/injury/research/pub/rtor.pdf.

16. G. Cumming. Understanding the New Statistics. Routledge, 2012. ISBN: 978-0415879682.

17. S.E. Maxwell, K. Kelley, and J.R. Rausch. “Sample Size Planning for Statistical Power and Accuracy in Parameter Estimation.” Annual Review of Psychology 59, no. 1 (2008): 537–563. DOI: 10.1146/annurev.psych.59.103006.093735.

18. J.P.A. Ioannidis. “Why Most Discovered True Associations Are Inflated.” Epidemiology 19, no. 5 (2008): 640–648. DOI: 10.1097/EDE.0b013e31818131e7.

19. J.P.A. Ioannidis. “Contradicted and initially stronger effects in highly cited clinical research.” JAMA 294, no. 2 (2005): 218–228. DOI: 10.1001/jama.294.2.218.

20. J.P.A. Ioannidis and T.A. Trikalinos. “Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials.” Journal of Clinical Epidemiology 58, no. 6 (2005): 543–549. DOI: 10.1016/j.jclinepi.2004.10.019.

21. B. Brembs, K.S. Button, and M.R. Munafò. “Deep impact: unintended consequences of journal rank.” Frontiers in Human Neuroscience 7 (2013): 291. DOI: 10.3389/fnhum.2013.00291.

22. K.C. Siontis, E. Evangelou, and J.P.A. Ioannidis. “Magnitude of effects in clinical trials published in high-impact general medical journals.” International Journal of Epidemiology 40, no. 5 (2011): 1280–1291. DOI: 10.1093/ije/dyr095.

23. A. Gelman and D. Weakliem. “Of beauty, sex, and power: statistical challenges in estimating small effects.” American Scientist 97 (2009): 310–316. DOI: 10.1511/2009.79.310.

24. H. Wainer. “The Most Dangerous Equation.” American Scientist 95 (2007): 249–256. DOI: 10.1511/2007.65.249.

25. A. Gelman and P.N. Price. “All maps of parameter estimates are misleading.” Statistics in Medicine 18, no. 23 (1999): 3221–3234. DOI: 10.1002/(SICI)1097-0258(19991215)18:23<3221::AID-SIM312>3.0.CO;2-M.

26. R. Munroe. “reddit’s new comment sorting system.” October 15, 2009. URL: http://redditblog.com/2009/10/reddits-new-comment-sorting-system.html.

27. E. Miller. “How Not To Sort By Average Rating.” February 6, 2009. URL: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html.

Chapter 3

1. S.E. Lazic. “The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?” BMC Neuroscience 11 (2010): 5. DOI: 10.1186/1471-2202-11-5.

2. S.H. Hurlbert. “Pseudoreplication and the design of ecological field experiments.” Ecological Monographs 54, no. 2 (1984): 187–211. DOI: 10.2307/1942661.

3. D.E. Kroodsma, B.E. Byers, E. Goodale, S. Johnson, and W.C. Liu. “Pseudoreplication in playback experiments, revisited a decade later.” Animal Behaviour 61, no. 5 (2001): 1029–1033. DOI: 10.1006/anbe.2000.1676.

4. D.M. Primo, M.L. Jacobsmeier, and J. Milyo. “Estimating the impact of state policies and institutions with mixed-level data.” State Politics & Policy Quarterly 7, no. 4 (2007): 446–459. DOI: 10.1177/153244000700700405.

5. W. Rogers. “Regression standard errors in clustered samples.” Stata Technical Bulletin, no. 13 (1993): 19–23. URL: http://www.stata-press.com/journals/stbcontents/stb13.pdf.

6. L.V. Hedges. “Correcting a Significance Test for Clustering.” Journal of Educational and Behavioral Statistics 32, no. 2 (2007): 151–179. DOI: 10.3102/1076998606298040.

7. A. Gelman and J. Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007. ISBN: 978-0521686891.

8. J.T. Leek, R.B. Scharpf, H.C. Bravo, D. Simcha, B. Langmead, W.E. Johnson, D. Geman, K. Baggerly, and R.A. Irizarry. “Tackling the widespread and critical impact of batch effects in high-throughput data.” Nature Reviews Genetics 11, no. 10 (2010): 733–739. DOI: 10.1038/nrg2825.

9. R.A. Heffner, M.J. Butler, and C.K. Reilly. “Pseudoreplication revisited.” Ecology 77, no. 8 (1996): 2558–2562. DOI: 10.2307/2265754.

10. M.K. McClintock. “Menstrual synchrony and suppression.” Nature 229 (1971): 244–245. DOI: 10.1038/229244a0.

11. H.C. Wilson. “A critical review of menstrual synchrony research.” Psychoneuroendocrinology 17, no. 6 (1992): 565–591. DOI: 10.1016/0306-4530(92)90016-Z.

12. Z. Yang and J.C. Schank. “Women do not synchronize their menstrual cycles.” Human Nature 17, no. 4 (2006): 433–447. DOI: 10.1007/s12110-006-1005-z.

13. A.L. Harris and V.J. Vitzthum. “Darwin’s legacy: an evolutionary view of women’s reproductive and sexual functioning.” Journal of Sex Research 50, no. 3-4 (2013): 207–246. DOI: 10.1080/00224499.2012.763085.

Chapter 4

1. H. Haller and S. Krauss. “Misinterpretations of significance: A problem students share with their teachers?” Methods of Psychological Research 7, no. 1 (2002).

2. R. Bramwell, H. West, and P. Salmon. “Health professionals’ and service users’ interpretation of screening test results: experimental study.” BMJ 333 (2006): 284–286. DOI: 10.1136/bmj.38884.663102.AE.

3. D. Hemenway. “Survey Research and Self-Defense Gun Use: An Explanation of Extreme Overestimates.” The Journal of Criminal Law and Criminology 87, no. 4 (1997): 1430–1445. URL: http://www.jstor.org/stable/1144020.

4. D. McDowall and B. Wiersema. “The incidence of defensive firearm use by US crime victims, 1987 through 1990.” American Journal of Public Health 84, no. 12 (1994): 1982–1984. DOI: 10.2105/AJPH.84.12.1982.

5. G. Kleck and M. Gertz. “Illegitimacy of One-Sided Speculation: Getting the Defensive Gun Use Estimate Down.” Journal of Criminal Law & Criminology 87, no. 4 (1996): 1446–1461.

6. E. Gross and O. Vitells. “Trial factors for the look elsewhere effect in high energy physics.” The European Physical Journal C 70, no. 1-2 (2010): 525–530. DOI: 10.1140/epjc/s10052-010-1470-8.

7. E.J. Wagenmakers. “A practical solution to the pervasive problems of p values.” Psychonomic Bulletin & Review 14, no. 5 (2007): 779–804. DOI: 10.3758/BF03194105.

8. D.G. Smith, J. Clemens, W. Crede, M. Harvey, and E.J. Gracely. “Impact of multiple comparisons in randomized clinical trials.” The American Journal of Medicine 83, no. 3 (1987): 545–550. DOI: 10.1016/0002-9343(87)90768-6.

9. J. Carp. “The secret lives of experiments: methods reporting in the fMRI literature.” Neuroimage 63, no. 1 (2012): 289–300. DOI: 10.1016/j.neuroimage.2012.07.004.

10. Y. Benjamini and Y. Hochberg. “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” Journal of the Royal Statistical Society Series B 57, no. 1 (1995): 289–300. URL: http://www.jstor.org/stable/2346101.

Chapter 5

1. A. Gelman and H. Stern. “The Difference Between ‘Significant’ and ‘Not Significant’ is not Itself Statistically Significant.” The American Statistician 60, no. 4 (2006): 328–331. DOI: 10.1198/000313006X152649.

2. M. Bland. “Keep young and beautiful: evidence for an ‘anti-aging’ product?” Significance 6, no. 4 (2009): 182–183. DOI: 10.1111/j.1740-9713.2009.00395.x.

3. S. Nieuwenhuis, B.U. Forstmann, and E.J. Wagenmakers. “Erroneous analyses of interactions in neuroscience: a problem of significance.” Nature Neuroscience 14, no. 9 (2011): 1105–1109. DOI: 10.1038/nn.2886.

4. A.F. Bogaert. “Biological versus nonbiological older brothers and men’s sexual orientation.” Proceedings of the National Academy of Sciences 103, no. 28 (2006): 10771–10774. DOI: 10.1073/pnas.0511152103.

5. J. McCormack, B. Vandermeer, and G.M. Allan. “How confidence intervals become confusion intervals.” BMC Medical Research Methodology 13 (2013). DOI: 10.1186/1471-2288-13-134.

6. N. Schenker and J.F. Gentleman. “On judging the significance of differences by examining the overlap between confidence intervals.” The American Statistician 55, no. 3 (2001): 182–186. DOI: 10.1198/000313001317097960.

7. S. Belia, F. Fidler, J. Williams, and G. Cumming. “Researchers misunderstand confidence intervals and standard error bars.” Psychological Methods 10, no. 4 (2005): 389–396. DOI: 10.1037/1082-989X.10.4.389.

8. J.R. Lanzante. “A cautionary note on the use of error bars.” Journal of Climate 18, no. 17 (2005): 3699–3703. DOI: 10.1175/JCLI3499.1.

9. K.R. Gabriel. “A simple method of multiple comparisons of means.” Journal of the American Statistical Association 73, no. 364 (1978): 724–729. DOI: 10.1080/01621459.1978.10480084.

10. M.R. Stoline. “The status of multiple comparisons: simultaneous estimation of all pairwise comparisons in one-way ANOVA designs.” The American Statistician 35, no. 3 (1981): 134–141. DOI: 10.1080/00031305.1981.10479331.

Chapter 6

1. P.N. Steinmetz and C. Thorp. “Testing for effects of different stimuli on neuronal firing relative to background activity.” Journal of Neural Engineering 10, no. 5 (2013): 056019. DOI: 10.1088/1741-2560/10/5/056019.

2. N. Kriegeskorte, W.K. Simmons, P.S.F. Bellgowan, and C.I. Baker. “Circular analysis in systems neuroscience: the dangers of double dipping.” Nature Neuroscience 12, no. 5 (2009): 535–540. DOI: 10.1038/nn.2303.

3. E. Vul, C. Harris, P. Winkielman, and H. Pashler. “Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition.” Perspectives on Psychological Science 4, no. 3 (2009): 274–290. DOI: 10.1111/j.1745-6924.2009.01125.x.

4. E. Vul and H. Pashler. “Voodoo and circularity errors.” Neuroimage 62, no. 2 (2012): 945–948. DOI: 10.1016/j.neuroimage.2012.01.027.

5. S.M. Stigler. Statistics on the Table. Harvard University Press, 1999. ISBN: 978-0674009790.

6. J.P. Simmons, L.D. Nelson, and U. Simonsohn. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22, no. 11 (2011): 1359–1366. DOI: 10.1177/0956797611417632.

7. D. Bassler, M. Briel, V.M. Montori, M. Lane, P. Glasziou, Q. Zhou, D. Heels-Ansdell, S.D. Walter, and G.H. Guyatt. “Stopping Randomized Trials Early for Benefit and Estimation of Treatment Effects: Systematic Review and Meta-regression Analysis.” JAMA 303, no. 12 (2010): 1180–1187. DOI: 10.1001/jama.2010.310.

8. V.M. Montori, P.J. Devereaux, and N. Adhikari. “Randomized trials stopped early for benefit: a systematic review.” JAMA 294, no. 17 (2005): 2203–2209. DOI: 10.1001/jama.294.17.2203.

9. S. Todd, A. Whitehead, N. Stallard, and J. Whitehead. “Interim analyses and sequential designs in phase III studies.” British Journal of Clinical Pharmacology 51, no. 5 (2001): 394–399. DOI: 10.1046/j.1365-2125.2001.01382.x.

10. L.K. John, G. Loewenstein, and D. Prelec. “Measuring the prevalence of questionable research practices with incentives for truth telling.” Psychological Science 23, no. 5 (2012): 524–532. DOI: 10.1177/0956797611430953.

Chapter 7

1. D.G. Altman, B. Lausen, W. Sauerbrei, and M. Schumacher. “Dangers of Using ‘Optimal’ Cutpoints in the Evaluation of Prognostic Factors.” Journal of the National Cancer Institute 86, no. 11 (1994): 829–835. DOI: 10.1093/jnci/86.11.829.

2. L. McShane, D.G. Altman, W. Sauerbrei, S.E. Taube, M. Gion, and G.M. Clark. “Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK).” Journal of the National Cancer Institute 97, no. 16 (2005): 1180–1184. DOI: 10.1093/jnci/dji237.

3. V. Fedorov, F. Mannino, and R. Zhang. “Consequences of dichotomization.” Pharmaceutical Statistics 8, no. 1 (2009): 50–61. DOI: 10.1002/pst.331.

4. S.E. Maxwell and H.D. Delaney. “Bivariate Median Splits and Spurious Statistical Significance.” Psychological Bulletin 113, no. 1 (1993): 181–190. DOI: 10.1037/0033-2909.113.1.181.

Chapter 8

1. R. Abbaszadeh, A. Rajabipour, M. Mahjoob, M. Delshad, and H. Ahmadi. “Evaluation of watermelons texture using their vibration responses.” Biosystems Engineering 115, no. 1 (2013): 102–105. DOI: 10.1016/j.biosystemseng.2013.01.001.

2. M.J. Whittingham, P.A. Stephens, R.B. Bradbury, and R.P. Freckleton. “Why do we still use stepwise modelling in ecology and behaviour?” Journal of Animal Ecology 75, no. 5 (2006): 1182–1189. DOI: 10.1111/j.1365-2656.2006.01141.x.

3. D.A. Freedman. “A note on screening regression equations.” The American Statistician 37, no. 2 (1983): 152–155. DOI: 10.1080/00031305.1983.10482729.

4. L.S. Freedman and D. Pee. “Return to a note on screening regression equations.” The American Statistician 43, no. 4 (1989): 279–282. DOI: 10.1080/00031305.1989.10475675.

5. Rischio and Prevenzione Investigators. “Efficacy of n-3 polyunsaturated fatty acids and feasibility of optimizing preventive strategies in patients at high cardiovascular risk: rationale, design and baseline characteristics of the Rischio and Prevenzione study, a large randomised trial in general practice.” Trials 11, no. 1 (2010): 68. DOI: 10.1186/1745-6215-11-68.

6. The Risk and Prevention Study Collaborative Group. “n–3 Fatty Acids in Patients with Multiple Cardiovascular Risk Factors.” New England Journal of Medicine 368, no. 19 (2013): 1800–1808. DOI: 10.1056/NEJMoa1205409.

7. C. Tuna. “When Combined Data Reveal the Flaw of Averages.” The Wall Street Journal (2009). URL: http://online.wsj.com/news/articles/SB125970744553071829.

8. P.J. Bickel, E.A. Hammel, and J.W. O’Connell. “Sex bias in graduate admissions: Data from Berkeley.” Science 187, no. 4175 (1975): 398–404. DOI: 10.1126/science.187.4175.398.

9. S.A. Julious and M.A. Mullee. “Confounding and Simpson’s paradox.” BMJ 309, no. 6967 (1994): 1480–1481. DOI: 10.1136/bmj.309.6967.1480.

10. R. Perera. “Commentary: Statistics and death from meningococcal disease in children.” BMJ 332, no. 7553 (2006): 1297–1298. DOI: 10.1136/bmj.332.7553.1297.

Chapter 9

1. J.P.A. Ioannidis. “Why Most Discovered True Associations Are Inflated.” Epidemiology 19, no. 5 (2008): 640–648. DOI: 10.1097/EDE.0b013e31818131e7.

2. M.J. Shun-Shin and D.P. Francis. “Why Even More Clinical Research Studies May Be False: Effect of Asymmetrical Handling of Clinically Unexpected Values.” PLOS ONE 8, no. 6 (2013): e65323. DOI: 10.1371/journal.pone.0065323.

3. J.P. Simmons, L.D. Nelson, and U. Simonsohn. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22, no. 11 (2011): 1359–1366. DOI: 10.1177/0956797611417632.

4. A.T. Beall and J.L. Tracy. “Women Are More Likely to Wear Red or Pink at Peak Fertility.” Psychological Science 24, no. 9 (2013): 1837–1841. DOI: 10.1177/0956797613476045.

5. A. Gelman. “Too Good to Be True.” Slate (2013). URL: http://www.slate.com/articles/health_and_science/science/2013/07/statistics_and_psychology_multiple_comparisons_give_spurious_results.html.

6. K.M. Durante, A. Rae, and V. Griskevicius. “The Fluctuating Female Vote: Politics, Religion, and the Ovulatory Cycle.” Psychological Science 24, no. 6 (2013): 1007–1016. DOI: 10.1177/0956797612466416.

7. C.R. Harris and L. Mickes. “Women Can Keep the Vote: No Evidence That Hormonal Changes During the Menstrual Cycle Impact Political and Religious Beliefs.” Psychological Science 25, no. 5 (2014): 1147–1149. DOI: 10.1177/0956797613520236.

8. M. Jeng. “A selected history of expectation bias in physics.” American Journal of Physics 74 (2006): 578. DOI: 10.1119/1.2186333.

9. J.R. Klein and A. Roodman. “Blind analysis in nuclear and particle physics.” Annual Review of Nuclear and Particle Science 55 (2005): 141–163. DOI: 10.1146/annurev.nucl.55.090704.151521.

10. A.W. Chan, A. Hróbjartsson, K.J. Jørgensen, P.C. Gøtzsche, and D.G. Altman. “Discrepancies in sample size calculations and data analyses reported in randomised trials: comparison of publications with protocols.” BMJ 337 (2008): a2299. DOI: 10.1136/bmj.a2299.

11. A.W. Chan, A. Hróbjartsson, M.T. Haahr, P.C. Gøtzsche, and D.G. Altman. “Empirical Evidence for Selective Reporting of Outcomes in Randomized Trials: Comparison of Protocols to Published Articles.” JAMA 291, no. 20 (2004): 2457–2465. DOI: 10.1001/jama.291.20.2457.

12. D. Fanelli and J.P.A. Ioannidis. “US studies may overestimate effect sizes in softer research.” Proceedings of the National Academy of Sciences 110, no. 37 (2013): 15031–15036. DOI: 10.1073/pnas.1302997110.

Chapter 10

1. P.C. Gøtzsche. “Believability of relative risks and odds ratios in abstracts: cross sectional study.” BMJ 333, no. 7561 (2006): 231–234. DOI: 10.1136/bmj.38895.410451.79.

2. M. Bakker and J.M. Wicherts. “The (mis)reporting of statistical results in psychology journals.” Behavior Research Methods 43, no. 3 (2011): 666–678. DOI: 10.3758/s13428-011-0089-5.

3. E. García-Berthou and C. Alcaraz. “Incongruence between test statistics and P values in medical papers.” BMC Medical Research Methodology 4, no. 1 (2004): 13. DOI: 10.1186/1471-2288-4-13.

4. P.C. Gøtzsche. “Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal antiinflammatory drugs in rheumatoid arthritis.” Controlled Clinical Trials 10 (1989): 31–56. DOI: 10.1016/0197-2456(89)90017-2.

5. K.A. Baggerly and K.R. Coombes. “Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology.” The Annals of Applied Statistics 3, no. 4 (2009): 1309–1334. DOI: 10.1214/09-AOAS291.

6. The Economist. “Misconduct in science: An array of errors.” September 2011. URL: http://www.economist.com/node/21528593.

7. G. Kolata. “How Bright Promise in Cancer Testing Fell Apart.” New York Times (2011). URL: http://www.nytimes.com/2011/07/08/health/research/08genes.html.

8. V. Stodden, P. Guo, and Z. Ma. “Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals.” PLOS ONE 8, no. 6 (2013): e67111. DOI: 10.1371/journal.pone.0067111.

9. G.K. Sandve, A. Nekrutenko, J. Taylor, and E. Hovig. “Ten Simple Rules for Reproducible Computational Research.” PLOS Computational Biology 9, no. 10 (2013): e1003285. DOI: 10.1371/journal.pcbi.1003285.

10. C.G. Begley and L.M. Ellis. “Drug development: Raise standards for preclinical cancer research.” Nature 483, no. 7 (2012): 531–533. DOI: 10.1038/483531a.

11. F. Prinz, T. Schlange, and K. Asadullah. “Believe it or not: how much can we rely on published data on potential drug targets?” Nature Reviews Drug Discovery 10 (2011): 328–329. DOI: 10.1038/nrd3439-c1.

12. J.P.A. Ioannidis. “Contradicted and initially stronger effects in highly cited clinical research.” JAMA 294, no. 2 (2005): 218–228. DOI: 10.1001/jama.294.2.218.

Chapter 11

1. S. Schroter, N. Black, S. Evans, F. Godlee, L. Osorio, and R. Smith. “What errors do peer reviewers detect, and does training improve their ability to detect them?” Journal of the Royal Society of Medicine 101, no. 10 (2008): 507–514. DOI: 10.1258/jrsm.2008.080062.

2. A.A. Alsheikh-Ali, W. Qureshi, M.H. Al-Mallah, and J.P.A. Ioannidis. “Public Availability of Published Research Data in High-Impact Journals.” PLOS ONE 6, no. 9 (2011): e24357. DOI: 10.1371/journal.pone.0024357.

3. J.M. Wicherts, D. Borsboom, J. Kats, and D. Molenaar. “The poor availability of psychological research data for reanalysis.” American Psychologist 61, no. 7 (2006): 726–728. DOI: 10.1037/0003-066X.61.7.726.

4. J.M. Wicherts, M. Bakker, and D. Molenaar. “Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results.” PLOS ONE 6, no. 11 (2011): e26828. DOI: 10.1371/journal.pone.0026828.

5. B. Goldacre. Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients. Faber & Faber, 2013. ISBN: 978-0865478008.

6. T.H. Vines, A.Y.K. Albert, R.L. Andrew, F. Débarre, D.G. Bock, M.T. Franklin, K.J. Gilbert, J.S. Moore, S. Renaut, and D.J. Rennison. “The availability of research data declines rapidly with article age.” Current Biology 24, no. 1 (2014): 94–97. DOI: 10.1016/j.cub.2013.11.014.

7. T.H. Vines, A.Y.K. Albert, R.L. Andrew, F. Débarre, D.G. Bock, M.T. Franklin, K.J. Gilbert, J.S. Moore, S. Renaut, and D.J. Rennison. “Data from: The availability of research data declines rapidly with article age.” Dryad Digital Repository (2013). DOI: 10.5061/dryad.q3g37.

8. A.W. Chan, A. Hróbjartsson, M.T. Haahr, P.C. Gøtzsche, and D.G. Altman. “Empirical Evidence for Selective Reporting of Outcomes in Randomized Trials: Comparison of Protocols to Published Articles.” JAMA 291, no. 20 (2004): 2457–2465. DOI: 10.1001/jama.291.20.2457.

9. J.J. Kirkham, K.M. Dwan, D.G. Altman, C. Gamble, S. Dodd, R. Smyth, and P.R. Williamson. “The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews.” BMJ 340 (2010): c365. DOI: 10.1136/bmj.c365.

10. W. Bouwmeester, N.P.A. Zuithoff, S. Mallett, M.I. Geerlings, Y. Vergouwe, E.W. Steyerberg, D.G. Altman, and K.G.M. Moons. “Reporting and Methods in Clinical Prediction Research: A Systematic Review.” PLOS Medicine 9, no. 5 (2012): e1001221. DOI: 10.1371/journal.pmed.1001221.

11. K. Huwiler-Müntener, P. Jüni, C. Junker, and M. Egger. “Quality of Reporting of Randomized Trials as a Measure of Methodologic Quality.” JAMA 287, no. 21 (2002): 2801–2804. DOI: 10.1001/jama.287.21.2801.

12. A.C. Plint, D. Moher, A. Morrison, K. Schulz, D.G. Altman, C. Hill, and I. Gaboury. “Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review.” Medical Journal of Australia 185, no. 5 (2006): 263–267.

13. E. Mills, P. Wu, J. Gagnier, D. Heels-Ansdell, and V.M. Montori. “An analysis of general medical and specialist journals that endorse CONSORT found that reporting was not enforced consistently.” Journal of Clinical Epidemiology 58, no. 7 (2005): 662–667. DOI: 10.1016/j.jclinepi.2005.01.004.

14. L.K. John, G. Loewenstein, and D. Prelec. “Measuring the prevalence of questionable research practices with incentives for truth telling.” Psychological Science 23, no. 5 (2012): 524–532. DOI: 10.1177/0956797611430953.

15. N.A. Vasilevsky, M.H. Brush, H. Paddock, L. Ponting, S.J. Tripathy, G.M. LaRocca, and M.A. Haendel. “On the reproducibility of science: unique identification of research resources in the biomedical literature.” PeerJ 1 (2013): e148. DOI: 10.7717/peerj.148.

16. G.B. Emerson, W.J. Warme, F.M. Wolf, J.D. Heckman, R.A. Brand, and S.S. Leopold. “Testing for the presence of positive-outcome bias in peer review: a randomized controlled trial.” Archives of Internal Medicine 170, no. 21 (2010): 1934–1939. DOI: 10.1001/archinternmed.2010.406.

17. P.A. Kyzas, K.T. Loizou, and J.P.A. Ioannidis. “Selective Reporting Biases in Cancer Prognostic Factor Studies.” Journal of the National Cancer Institute 97, no. 14 (2005): 1043–1055. DOI: 10.1093/jnci/dji184.

18. D. Eyding, M. Lelgemann, U. Grouven, M. Härter, M. Kromp, T. Kaiser, M.F. Kerekes, M. Gerken, and B. Wieseler. “Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials.” BMJ 341 (2010): c4737. DOI: 10.1136/bmj.c4737.

19. E.H. Turner, A.M. Matthews, E. Linardatos, R.A. Tell, and R. Rosenthal. “Selective publication of antidepressant trials and its influence on apparent efficacy.” New England Journal of Medicine 358, no. 3 (2008): 252–260. DOI: 10.1056/NEJMsa065779.

20. J.P.A. Ioannidis and T.A. Trikalinos. “An exploratory test for an excess of significant findings.” Clinical Trials 4, no. 3 (2007): 245–253. DOI: 10.1177/1740774507079441.

21. K.K. Tsilidis, O.A. Panagiotou, E.S. Sena, E. Aretouli, E. Evangelou, D.W. Howells, R.A.S. Salman, M.R. Macleod, and J.P.A. Ioannidis. “Evaluation of Excess Significance Bias in Animal Studies of Neurological Diseases.” PLOS Biology 11, no. 7 (2013): e1001609. DOI: 10.1371/journal.pbio.1001609.

22. G. Francis. “Too good to be true: Publication bias in two prominent studies from experimental psychology.” Psychonomic Bulletin & Review 19, no. 2 (2012): 151–156. DOI: 10.3758/s13423-012-0227-9.

23. U. Simonsohn. “It Does Not Follow: Evaluating the One-Off Publication Bias Critiques by Francis.” Perspectives on Psychological Science 7, no. 6 (2012): 597–599. DOI: 10.1177/1745691612463399.

24. R.F. Viergever and D. Ghersi. “The Quality of Registration of Clinical Trials.” PLOS ONE 6, no. 2 (2011): e14701. DOI: 10.1371/journal.pone.0014701.

25. A.P. Prayle, M.N. Hurley, and A.R. Smyth. “Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study.” BMJ 344 (2012): d7373. DOI: 10.1136/bmj.d7373.

26. V. Huser and J.J. Cimino. “Linking ClinicalTrials.gov and PubMed to Track Results of Interventional Human Clinical Trials.” PLOS ONE 8, no. 7 (2013): e68409. DOI: 10.1371/journal.pone.0068409.

27. C.W. Jones, L. Handler, K.E. Crowell, L.G. Keil, M.A. Weaver, and T.F. Platts-Mills. “Non-publication of large randomized clinical trials: cross sectional analysis.” BMJ 347 (2013): f6104. DOI: 10.1136/bmj.f6104.

28. S. Mathieu, A.W. Chan, and P. Ravaud. “Use of trial register information during the peer review process.” PLOS ONE 8, no. 4 (2013): e59910. DOI: 10.1371/journal.pone.0059910.

29. E.J. Wagenmakers, R. Wetzels, D. Borsboom, H.L.J. van der Maas, and R.A. Kievit. “An Agenda for Purely Confirmatory Research.” Perspectives on Psychological Science 7, no. 6 (2012): 632–638. DOI: 10.1177/1745691612463078.

Chapter 12

1. J.P.A. Ioannidis. “Why Most Published Research Findings Are False.” PLOS Medicine 2, no. 8 (2005): e124. DOI: 10.1371/journal.pmed.0020124.

2. J.D. Schoenfeld and J.P.A. Ioannidis. “Is everything we eat associated with cancer? A systematic cookbook review.” American Journal of Clinical Nutrition 97, no. 1 (2013): 127–134. DOI: 10.3945/ajcn.112.047142.

3. V. Prasad, A. Vandross, C. Toomey, M. Cheung, J. Rho, S. Quinn, S.J. Chacko, D. Borkar, V. Gall, S. Selvaraj, N. Ho, and A. Cifu. “A Decade of Reversal: An Analysis of 146 Contradicted Medical Practices.” Mayo Clinic Proceedings 88, no. 8 (2013): 790–798. DOI: 10.1016/j.mayocp.2013.05.012.

4. J. LeLorier, G. Gregoire, and A. Benhaddad. “Discrepancies between meta-analyses and subsequent large randomized, controlled trials.” New England Journal of Medicine 337 (1997): 536–542. DOI: 10.1056/NEJM199708213370806.

5. T.V. Pereira and J.P.A. Ioannidis. “Statistically significant meta-analyses of clinical trials have modest credibility and inflated effects.” Journal of Clinical Epidemiology 64, no. 10 (2011): 1060–1069. DOI: 10.1016/j.jclinepi.2010.12.012.

6. A. Tatsioni, N.G. Bonitsis, and J.P.A. Ioannidis. “Persistence of Contradicted Claims in the Literature.” JAMA 298, no. 21 (2007): 2517–2526. DOI: 10.1001/jama.298.21.2517.

7. F. Gonon, J.P. Konsman, D. Cohen, and T. Boraud. “Why Most Biomedical Findings Echoed by Newspapers Turn Out to be False: The Case of Attention Deficit Hyperactivity Disorder.” PLOS ONE 7, no. 9 (2012): e44275. DOI: 10.1371/journal.pone.0044275.

8. M. Marshall, A. Lockwood, C. Bradley, C. Adams, C. Joy, and M. Fenton. “Unpublished rating scales: a major source of bias in randomised controlled trials of treatments for schizophrenia.” The British Journal of Psychiatry 176, no. 3 (2000): 249–252. DOI: 10.1192/bjp.176.3.249.

9. J.J. Kirkham, K.M. Dwan, D.G. Altman, C. Gamble, S. Dodd, R. Smyth, and P.R. Williamson. “The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews.” BMJ 340 (2010): c365. DOI: 10.1136/bmj.c365.

10. J.R. Lanzante. “A cautionary note on the use of error bars.” Journal of Climate 18, no. 17 (2005): 3699–3703. DOI: 10.1175/JCLI3499.1.

11. E. Wagenmakers, R. Wetzels, D. Borsboom, and H.L. van der Maas. “Why psychologists must change the way they analyze their data: The case of psi.” Journal of Personality and Social Psychology 100, no. 3 (2011): 426–432. DOI: 10.1037/a0022790.

12. J. Galak, R.A. LeBoeuf, L.D. Nelson, and J.P. Simmons. “Correcting the past: Failures to replicate psi.” Journal of Personality and Social Psychology 103, no. 6 (2012): 933–948. DOI: 10.1037/a0029709.

13. R. Hake. “Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses.” American Journal of Physics 66, no. 1 (1998): 64–74. DOI: 10.1119/1.18809.

14. L.C. McDermott. “Research on conceptual understanding in mechanics.” Physics Today 37, no. 7 (1984): 24. DOI: 10.1063/1.2916318.

15. J. Clement. “Students’ preconceptions in introductory mechanics.” American Journal of Physics 50, no. 1 (1982): 66–71. DOI: 10.1119/1.12989.

16. D.A. Muller. Designing Effective Multimedia for Physics Education. PhD thesis. University of Sydney, April 2008. URL: http://www.physics.usyd.edu.au/super/theses/PhD(Muller).pdf.

17. C.H. Crouch, A.P. Fagen, J.P. Callan, and E. Mazur. “Classroom demonstrations: Learning tools or entertainment?” American Journal of Physics 72, no. 6 (2004): 835–838. DOI: 10.1119/1.1707018.

18. H. Haller and S. Krauss. “Misinterpretations of significance: A problem students share with their teachers?” Methods of Psychological Research 7, no. 1 (2002).

19. C.H. Crouch and E. Mazur. “Peer Instruction: Ten years of experience and results.” American Journal of Physics 69, no. 9 (2001): 970–977. DOI: 10.1119/1.1374249.

20. N. Lasry, E. Mazur, and J. Watkins. “Peer instruction: From Harvard to the two-year college.” American Journal of Physics 76, no. 11 (2008): 1066–1069. DOI: 10.1119/1.2978182.

21. A.M. Metz. “Teaching Statistics in Biology: Using Inquiry-based Learning to Strengthen Understanding of Statistical Analysis in Biology Laboratory Courses.” CBE Life Sciences Education 7 (2008): 317–326. DOI: 10.1187/cbe.07-07-0046.

22. R. Delmas, J. Garfield, A. Ooms, and B. Chance. “Assessing students’ conceptual understanding after a first course in statistics.” Statistics Education Research Journal 6, no. 2 (2007): 28–58.

23. Nature Editors. “Reporting checklist for life sciences articles.” May 2013. URL: http://www.nature.com/authors/policies/checklist.pdf.

24. E. Eich. “Business Not as Usual.” Psychological Science 25, no. 1 (2014): 3–6. DOI: 10.1177/0956797613512465.

25. R. Schekman. “How journals like Nature, Cell and Science are damaging science.” The Guardian (2013). URL: http://www.theguardian.com/commentisfree/2013/dec/09/how-journals-nature-science-cell-damage-science.

26. R.D. deShazo, S. Bigler, and L.B. Skipworth. “The Autopsy of Chicken Nuggets Reads ‘Chicken Little’.” American Journal of Medicine 126, no. 11 (2013): 1018–1019. DOI: 10.1016/j.amjmed.2013.05.005.