replication crisis

description: a crisis in some scientific disciplines where published results are difficult or impossible to replicate

28 results

Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth
by Stuart Ritchie
Published 20 Jul 2020

But even if you think the talk of a ‘crisis’ is grandiose or exaggerated, there’s one final argument in my quiver.122 It’s this: the reforms we’ve discussed in this chapter would all be beneficial for science even if there weren’t a replication crisis. This brings to mind the following classic cartoon about climate change, from the Lexington Herald-Leader’s Joel Pett: With apologies to Pett, let me rewrite his cartoon to respond to a different kind of doubter: Openness. Transparency. Improved statistics. Pre-registration. Automated error-checking. Clever ways to catch fraudsters. Preprints. Better hiring practices. A new culture of humility. Etc. Etc. What if the replication crisis is a big hoax and we create a better science for nothing? Epilogue O, while you live, tell truth, and shame the Devil!

The policy, which was intended to protect the ‘integrity’ of Scotland’s ‘clean and green … brand’ (whatever that means), was derided as ‘cheap populism’ by a political commentator and described as ‘extremely concern[ing]’ in an open letter signed by twenty-eight scientific societies.24 What all this tells us is that, regardless of our discussion of the replication crisis and its associated failings, politicians will still trample all over science if they think it’ll lead them towards votes. The worry that the arguments in this book might be misappropriated to make selective, insincere attacks on research shouldn’t stop us from publicly discussing the replication crisis and its associated problems. We mustn’t make science suck in its stomach whenever a member of the public or a politician is watching. In fact, a frank admission of science’s weaknesses is the best way to pre-empt attacks by science’s critics and to be honest more generally about how the uncertainty-filled process of science really works.

But the very fact that we don’t know – along with the fact that so many high-profile, puffed-up findings have fallen apart upon closer inspection – is, I’d argue, cause for enough concern. For responses to other criticisms of the idea that there’s a crisis, see Harold Pashler & Christine R. Harris, ‘Is the Replicability Crisis Overblown? Three Arguments Examined’, Perspectives on Psychological Science 7, no. 6 (Nov. 2012): pp. 531–36; https://doi.org/10.1177/1745691612463401 30.  Alexander Bird, ‘Understanding the Replication Crisis as a Base Rate Fallacy’, British Journal for the Philosophy of Science, 13 Aug. 2018; https://doi.org/10.1093/bjps/axy051 31.  Of course, the argument of the original authors (those whose findings failed to replicate) has often been that the modifications aren’t, in fact, slight, and break the experiment in important ways.

pages: 283 words: 102,484

Everything Is Predictable: How Bayesian Statistics Explain Our World
by Tom Chivers
Published 6 May 2024

Jaynes—do talk about “objective Bayesianism,” trying to base priors on logical principles. “I don’t think [Jaynes] succeeded,” Kevin McConway of the Open University told me, “but he did have a good try.” CHAPTER TWO Bayes in Science THE REPLICATION CRISIS IN SCIENCE, AND SOME WAYS TO FIX IT In 2011, a series of unwelcome things happened, and science was shaken to the core. Not everyone noticed. The “replication crisis,” as it was known, probably didn’t affect your daily life (it didn’t affect mine for some years, and I was writing about science for a living). Most scientists—even most psychologists, whose discipline was the worst affected—were able to go on for quite a long time as if nothing had happened.

Well: the cause of the replication crisis has been greatly discussed. It is a story of bad incentives—publish or perish, the demand for novelty—and scientists have come up with many sensible proposals for how to fix them. Lowering the threshold for “significance” is one; requiring preregistration of hypotheses in order to prevent HARKing is another; having journals agree to publish papers on the strength of the methods, not the nature of the findings, in order to avoid the novelty filter, is a third. But you could go deeper and say that the underlying cause of the replication crisis is even more basic: it’s that science, like Jakob Bernoulli three hundred years ago, is doing sampling probabilities, not inferential probabilities.

It’s not that the theses of these books were wrong—most of this research does stand up to scrutiny, even after 2011 and Daryl Bem and all those things, knowing what we know now about the replication crisis and the statistical problems in psychology. When presented with questions framed like this, people really do seem to give incoherent, irrational answers. Dan Ariely’s own work has come under scrutiny after a 2012 paper of his4 turned out to be based on fraudulent data—Ariely denies making up the data himself, but admits he has no good story for how it happened.5 And a lot of the work on “social priming” that Kahneman’s book cited has since been undermined, as we discussed in the section on the replication crisis in chapter 2. But it’s definitely true that framing affects how people view risk, and that people misjudge risk on the basis of how easily they can think of examples.

pages: 290 words: 82,871

The Hidden Half: How the World Conceals Its Secrets
by Michael Blastland
Published 3 Apr 2019

Index abstract formulas 141 Academy of Medical Sciences 133 adoption studies 41 aid, economic development 141 aid-effectiveness craze, the 153 alcohol consumption 180 AllTrials campaign 114–5 Altman, Doug 129–30 Amano, Yukiya 185 ambiguity 209–10 Amgen 111–2 Analysis (radio programme) 102 analytic validity 158, 263n18 anarchy 224 aphorisms 68–9, 149 apprenticeships 205–6 argument, beliefs and habits of 186 asthma 135 Attanasio, Orazio 225–9, 230 Autho, David 219–23 average knowledge 173 background influences 23–34 background norms, rejecting 24–5 bacon 161–3, 162–3 Banerjee, Abhijit 150–4, 157 Bangladesh 80–2, 82, 101–2, 158, 261n6 Bank of England 103, 216 Bank of Japan 103 Basbøll, Thomas 244–5 baseline data 165 base-rate neglect 176–7 basic laws 140 Bateson, William 245 BBC 88, 98 Beatles, the 52–3, 259n33 Begley, Glenn 111–7 behaviour context-specific 42–3 environmental cues 65–7 behavioural economics 157 Behavioural Insight Team 155, 156, 232 beliefs 60 contradictory 63–4 inconsistency of 60–6 justification 60–1, 63 manipulation 62–3 power of information on 66–8 self-contradiction 61–2 Berlin, Isaiah 199 betting, on knowledge 236–7 big causes, power of 35 big events causal intricacy 193–6 complexity 185–7 difficulty determining causality 188–96 power of circumstance 196–9 big picture, the 215–6 Bijani, Ladan 40–1 Bijani, Laleh 40–1 biographies 49 biological randomness 43–4 biomedical science, research standards 129–36 Bolsover 217–8 Boorstin, Daniel 17, 136, 138, 264n24 Booth, Charles 146–7 BP 211 brain, the 64 plasticity 56 self-justifying 83 breast cancer 45–6, 46 Brexit referendum 18–9, 20, 90, 214–8, 223–4, 241 Bunnings 77 Burckhardt, Jacob 255n20 Burke, Edmund 269n1 Burns, Terry 102–3 business decisions, failures 210–1 cancer 45–8 breast 45–6, 46 lung 174–5 risk 162–3, 166, 174–5 screening 132–3 Cancer Research UK 133 canned laughter 154–5 capitalism 118 Carillion 211 Carp, Joshua 123–4 Cartwright, Nancy 79, 79–82, 82, 193–4, 195, 202–3, 203–4, 263n18 
causal instincts 123 causal interactions, complexity 239 causal intricacy 193–4 causal models 242–4, 243, 269–70n3 causal theorizing 212–4 causality assumption of 212–4 difficulty determining 188–96 existence of 276–7n12 hard 225–9 importance of 212 mechanical models 242–4, 243 in one person 48 cause and effect dependable 203–4 patterns of 23, 25–6, 26 supposed 248 unreliable 204 causes and causal influences 90, 94 competing 248 criminals 29 interaction 193–6 and luck 178 secret life of 8–11 simple 184–5 cells, biographical stories 47–8 certainty, desire for 235 Chadwick, Edwin 146–7 chance 14, 37–8, 247, 281n1 chaos theory 56–7, 276n10 Chater, Nick 59, 60, 63, 64–5, 66–7 Chernobyl disaster 185 child and adolescent development 23–6, 41–2 child mental health 206–7 childhood influences 23–5 delinquent boys 26–34 China, rise of 218–23, 279n19 choice, situated 31–3, 34 choice blindness 62 choices 60 Cialdini, Robert 154–5 Cifu, Adam 131–2 circumstances 70 power of 196–9 claims inflation 130 climate change 238–9 Clinton, Hillary 222 Cochrane Collaboration, the 189–90 cognition 64 cognitive biases 14 cognitive limitations 14, 214 Comaroff, John 107–8 common sense 69–70 comparative cost analysis 173 competence 236–7 complacency 237 complexity adding 244 big events 185–7 facing 15 hidden 184–201 of reality 245 complexity theory 276n10 complexity-avoidance 187 complications, hidden 187 Conan Doyle, Arthur 108 confidence 72 consistency 68–75, 202–4, 260n6, 260n8 constructive realism 17 consumer behaviour 77 context 41–2, 72, 101 context-specific behaviour 72 context-specific learning 42–3 control alternative to 248–9 elusiveness of 85–6 powers of 195 conviction 104 coping strategies 16–7, 225–46 adapting 230–3 betting 236–7 communicate uncertainty 237–9 embracing uncertainty 234–6 exceptions 244–5 experiment 230–3 governing for uncertainty 239–41 managing for uncertainty 241–2 metaphors 242–4 negative capability 234 relax 246 triangulation 233–4 use of probability 242 
Corbyn, Jeremy 20 corporate power 241 cost/benefit analysis, cows 117–22 cows, cost/benefit analysis 117–22 Coyle, Diane 216, 262n12 Crabbe, John 85–7 credibility 238–9 credibility crisis 18 crime causes of 142–4 heroes and villains view 142 opportunist 144–5 reduced opportunity 144–5 theory of 142–6, 143 victims and survivors view 142–3 criminals causal influences 29 childhood influences 26–34 desisters 30 high rate chronics 30 life-course persistent offenders 28–9 life-courses 28, 236 variables 31 critical factors 83–5 crowds, wisdom of 149 cultural difference 79–82, 79–85 Daniels, Denise 43–4, 57 Darwin, Charles 50–1 data granularity 216–7 interpretation 98–100 Dawid, Philip 276–7n12 De Rond, Mark 198, 201 de Vries, Ymkje Anna 114 deadweight cost 205–6 debate 98 decision making 58–60 influences 32–3 situated choice 31–3 deep preferences 65 deeper rationale, construction of 60 Deepwater Horizon 211 defining characteristics 43 degrees of freedom 122–9 delinquent boys 26–34 dementia 176–7, 274n16 democracy 20 Deng Xiaoping 219 Denrell, Jerker 199, 201 desires 59 details importance of 49–54 neglecting 151–2 problem of 229 selective 26 determinism 28 development economics 150–3 developmental difference, sources of variation 9–11 developmental noise 10 difference 15 pockets of 214–24 Dilnot, Andrew 237, 275n3 disciplined pluralism 231 disorder 45 forces of 11–3 doubt 238 Down’s syndrome 166 drugs comparative cost analysis 173 impact 171–2 medical effect 167–9, 169, 170–4 non-responders 172 Numbers-Needed-to-Treat (NNTs) 168, 169, 170, 173–4 predictive weakness 170–3 duelling certainties 235 Duflo, Esther 83, 84, 141, 150–3, 157–8, 158–9, 230–1 ecological validity 263n18 economic development, aid 141 economic forecasting 92, 102–7 economic recovery 217–8 economics 233 economy, the 87–100, 91, 93, 94, 95 education 151–2, 206–7, 275– 6n7 Einstein, Albert 140–1 Emerson, Ralph Waldo 68 enigmatic variation 13–6, 48 environment context 72 non-shared 37 shared 35 
environmental influences 43–4 epidemiology 181 epigenetics 6–7 erratic influences 60 essential you, the 59–60 estimates 89–91, 96 European Central Bank 103 evidence 21 balance of 114 conclusive 186, 187 the Janus effect 121, 122–9 limitations of 117–22 statistical significance 137 strength of 137 evidence-based medicine 133–4 exceptions 214–24, 244–5 expectations 35 big 196 frustration of 15 of regularity 47, 202–4 unrealistic 182 experience, influence of 33, 34, 55–7 experiment 230–3 expertise, crisis of 18–9 experts, credibility crisis 18–9 external validity 101, 158, 263n18, 264n19 extreme performance 199 failure 204–11 fairness 66–7 false negatives 113–4 false positives 113–4, 122 falsification 245 family, changes of 41 farmer and a chicken, the 202–4 fate 30 fears, exaggerated 46 Financial Times 77 First World War 108 Fitzroy, Robert 50 flat mind, the 60, 60–8 Flaubert, Gustave 139 forecasting 109 former Yugoslavia 108 foxes 199 France 186–7 Freedman, Sir Lawrence 108, 109 freedom 236 Fukushima nuclear power station meltdown 185–7 fundamentals 141 identifying 153 further education 208–9 Galbraith, John Kenneth 110 Gartner, Klaus 87 Gash, Tom 142–3 Gates, Bill 199 GDP data 262n12 growth estimation 88–100, 91, 93, 94, 95, 262–3n14 local 214–5, 216, 218 Gelman, Andrew 124–5, 244 gene–environment interaction 6–7 general principles 140 generalities 174 generalization 76–8, 146, 152, 263n18 genes and genetics influence of 34–7, 39–41, 44, 45–7 overclaiming 134–5 power of 33, 45 genetic risk 45–7 genius, dangerous 212–4 genotype 8 Germany 185, 186, 188 Gillam, John 77 global financial crisis, 2008–9 104, 106, 210, 235 globalization 213 Gove, Michael 18–9 granularity 216–7 ground truth 217 groupthink 149 guarantees, lack of 160 Guardian 207 Gupta, Rajeev 117, 118 Haldane, Andy 216–7, 218 Harford, Tim 156–7, 237 Harris, Judith Rich 40–2, 72 Hayek, Friedrich 105–6 health screening 177 heart disease 163–6 hedgehogs 199 Henry (ex-delinquent) 32 Hensall, Abigail 39–40, 41 
Hensall, Brittany 39–40, 41 herd mentality 154–5 hidden causes 35–8 hidden half, the coping strategies 225–46 ignoring 202–24 mystery of 35 power of 44–5 hidden trivia 8–9 hindsight 78 hindsight bias 83 history 107–8 lessons of 109 Homebase 76–7 Honda, US motorcycle market penetration 196–9 hubris 77 human sameness irregularity 45–9 limits of 34–45 human understanding, fundamentals 213 Human Zoo, The (radio programme) 60–6 humility 224, 248–9 IBM 199 ibuprofen 163–5 ideological divide 240 ideologies 9–10 idiosyncratic influence 53–4 ignorance 21, 107 disguising 242 the shock of 7 imagination 138 impulsive judgement, value of 149 incarceration rates, United States of America 222, 240, 280n10 incidentals, effect of 51–2 incoherency problem, the 149 inconsistency beliefs 60–6 justifiable 70–1 incredible certitude 209 Indian Express 117 individual differences 56 individuality conjoined twins 39–42 neurological foundation of 56 industrial policy 208 inflation 102–7 influences background 23–34 childhood 26–34 criminals 26–34 decision making 32–3 environmental 43–4 erratic 60 hidden 204 microenvironmental 8–9, 253–4n12 information power of 66–8 selective 66–7 Institute for Fiscal Studies 205–6 Institute for Government 208–9 intangible differences 253n11 intangible variation 10, 229 interaction, problems of 193–6 internal validity 101–2, 158 International Journal of Epidemiology 43 intuition 54, 204 Ioannidis, John 121, 133–6 irrationality, human 14 irregularity 94 disruptive power of 224 frustration of 15 human 45–9 influence 12 problem of 229 underestimating 214–24 Islamic State 108 it’s-all-because problem 91, 96 James, Henry 29, 56 James, William 141 Janus effect, the 121, 122–9 Johansen, Petter 62 Johnson, Samuel 214 Johnson, Wendy 71–2 Jones, Susannah Mushatt 162–3, 165 journalism 237–8 Juno (film) 193 Kaelin, William 130 Kawashima, Kihachiro 197 Kay, John 16, 68, 197, 231, 232 Keats, John 138–9, 234 Kempermann, Gerd 56, 57 Keynes, John Maynard 107, 271n9 
Keynesianism 103 King, Mervyn 103, 104, 106, 110 Kinnell, Galway 28 Knausgaard, Karl Ove 86–7 Knight, Frank 107 Knightian uncertainty 107 knowledge 12–3, 170 advance of 20–1 average 173 betting on 236–7 credibility crisis 18 critical factors 83–5 failures of 19, 76–8, 79–82 fallibility of 248 generalizable 234 generalization 76–8 illusion of 136, 138 lessons of the past 102–7, 107–10 in medicine 182 negative capability 138–9 as obstacle to progress 17 obvious 82 paths to 136–9 plausibility mistaken for 132 practical 30–1 pretence of 105–6 probabilistic 160, 161, 163–4, 172–3 and probability 180 problem of scale 177–80 provenance 116 relevant 82–5 replication crisis 111–7 subverting 76–110 and time variations 87–100, 91, 93, 94, 95 transfer 37, 76–8, 83, 101–2 unknowns 85–7 validity 100–2 validity across time 107–10 weakest-link principle 79–82 Krugman, Paul 210 Lancet 225–6 Langley, Winnie 51, 165, 178 Laub, John 26–34, 42 law-like effects, claims about 21 learning styles 207 Leicester City Football Club 199–201 Leon (ex-delinquent) 31–2 Leyser, Ottoline 114 life, mechanics of 51 life-course persistent offenders 28–9 limits and limitations 16–7, 44, 75 base-rate neglect 176–7 of cleverness 278n14 individual level 174–6, 178–9, 181–3 lack of guarantees 160 marginal probabilistic outcomes 176–7 medical effect 167–9, 169, 170–4 on prediction 165–6 on probability 160–83 problem of scale 161–6, 174– 6, 177–80, 181–3 Liskov Substitution Principle 261n3 Little Britain (TV comedy) 192 Liu, Chengwei 198, 201 lives, understanding 29 location shift 264n20 Loken, Erik 124–5 long-acting reversible contraceptives (LARCS) 190 luck 37–8, 48, 178, 198 lung cancer 174–5 Lyko, Frank 1, 2 machine mode thinking 151–2 Macron, Emmanuel 20 Manski, Charles 209, 235 Mao Zedong 218 marginal probabilistic outcomes 176–7 marmorkrebs 1–9, 4, 10, 12, 12–3, 22, 35, 81, 182, 252n2 Marteau, Theresa 65 Martin, George 52 May, Theresa 208 Mayne, Stephen 77 measurement 99–100 mechanical relationships 
212, 242, 244 mechanical thinking 242–4, 243 media stigma 192–3 medical effect, drugs 167–9, 169, 170–4 medical reversal 131–3 medicine comparative cost analysis 173 knowledge in 182 non-responders 172 Numbers-Needed-to-Treat (NNTs) 168, 169, 170, 173–4 personalized 181–3 predictive weakness 170–3 probability and 167–9, 169, 170–4 memory 56, 102–7 Mendelian randomization 233 Menon, Anand 214–5 mental shortcuts 14–5 mere facts 202–3 meta-science 19, 20 methodological revisions 97–8, 120 mice 55 microenvironmental influences 8–9, 253–4n12 micro-irregularity 35–7 micro-particulars 128 Microsoft 147–50, 199 Miller, Helen 66–7, 67 mind, the flat 59–60, 60–8 shape 59 models and modelling 140, 242–4, 243, 269–70n3 moment when, the 52 morality, changing 108 More or Less (radio programme) 237 Munafò, Marcus 234 Nadella, Satya 147–8 National Survey of Family Growth 192 National Surveys of Sexual Attitudes and Lifestyles 191–2 nationalism 108 Nature 2, 112, 136, 168, 174 nature/nurture debate 3, 5–6, 9–10 negative capability 138–9, 234 neurology 58 New England Journal of Medicine 131–2 Newcastle upon Tyne 214 Newton, Isaac 140–1 noise 14 definition 10 developmental 10 as intellectual dross 11 re-appraisal of 11–3 non-shared environment 37 Nosek, Brian 129 noses 49–51 Nottingham 217 Numbers-Needed-to-Treat (NNTs) 168, 169 nurture, influence of 44 O’Connor, Sarah 217–8 Office for National Statistics 89, 92, 98, 99–100, 216 O’Neill, Onora 238 opinions 21, 59 order 11–2, 13 organ donation campaign 155–6 outside influence 44 overclaiming 134–5 overconfidence 21 overseas business expansion 76–8 Oxfam, sexual abuse scandal 210 Paphides, Pete 52–3 parental behaviour 41 parents, impact of 41 Parris, Matthew 63 parthenogenesis 1–2 particularism 271–2n15 particularity problem, the 93 past, the, lessons of 102–7, 107–10 pattern-making instinct 21 patterns 13 pendulums 57 perceptual systems 64 performance 72–5 personalized medicine 181–3 Peto, Richard 47–8 phenotypes 8 physiognomy, and 
character 50 plausibility 132 Plomin, Robert 43–4, 49, 57 pluralism 231–2 polarization 235 policy making 231–2 appraisal 277n4 chances of success 208 failures 204–9 governing for uncertainty 239–41 and probability 178–9 secret of 209 seminar 207–8 sequential changes 208 political assumptions, fall of 20 political beliefs 60–6 population validity 263n18 populism, rise of 20 poverty 240–1 Prasad, Vinayak 131–2 precision 183 predictability 28 predictive weakness 165–6, 170–3 preferences 59, 62 deep 65 priming 126–8 probabilistic knowledge 160, 161, 163–4, 170, 172–3 probability 54, 70, 107, 258n25, 272n2 advantages 177–80 base-rate neglect 176–7 difference in 30 fear of low probabilities 166 individual level 174–6, 178–9, 181–3 limits and limitations 160–83 marginal 176–7 medical effect 167–9, 169, 170–4 paradox 170 and policy making 178–9 predictive weakness 165–6 problem of scale 161–6, 174– 6, 177–80, 181–3 recognizing significance 161 risk evaluation 161–6 suggestion of knowledge 180 use of 242 usefulness 161 problems, conceptualizing 17 productivity growth 209–10 progress, knowledge as obstacle to 17 psychoanalysis 58 psychology 58 Pullinger, John 278n14 Pullman, Philip 37 quantification, risk and risk-taking 162–5 racism 125–6 radical uncertainty 106, 107 Radio, Andrew 102 rage to conclude, the 139 randomized controlled trials, value of 280n6 randomness, pure 9 Ranieri, Claudio 200–1 rationality 68, 260n6, 260n8 reality 230, 245, 254n14 reciprocity 155 reflection 65–6 regularity 73, 160 assumption of 212–4 expectations of 47, 202–4 search for 212, 230 statistical 240–1 replication crisis 18, 111–7, 117– 22, 129, 136, 138 Replication Project 129 research 111–39 balance of evidence 114 breadth 130 claims inflation 130 confidence in 115–6 credibility crisis 18 decision rules 136–9 depth 130 evidence-based medicine 133–4 false negatives 113–4 false positives 113–4, 122 fragility 128–9 freedom 122–9 half wrong 113, 115–6 the Janus effect 121, 122–9 limitations of 
117–22 micro-particulars 128 multiple analyses 125–6 multiple conclusions 117–22 overclaiming 134–5 priming 126–8 redemption 20 replication crisis 111–7, 117– 22, 129, 136, 138 rigour 19 scepticism 115–6 standards 129–36 statistical significance 122 triangulation 138 validity 101–2 research-credibility crisis 18 rigour 19, 246 risk and risk-taking 70–1, 107, 186 alcohol consumption 180 cancer 162–3, 166, 174–5 communication of 133 evaluation 161–6 heart disease 163–6 quantification 162–5, 166 quantified 187 risk-perception 71 Rockhill, Beverly 181 Rolling Stone magazine 23 Rose, Geoffrey 175–6 Rowntree Joseph 146–7 Royal Bank of Scotland 211 Russell, Bertrand 202, 202–3 samples, validity 100–2 Sampson, Robert 26–34, 42, 236 sanitation 225–9 Santayana, George 109 scale, problem of 161–6, 174–6, 177–80, 181–3 scepticism 105, 115–6, 128, 206 schizophrenia 34–6, 256n10 Science 56 Scientific American 55 Scotland, Triple-P parenting programme 206 screening 132–3, 177 searing memory, doctrine of the 102–7 selection bias 244 self-understanding 67 Sense about 115 serendipitous events 43, 52–3 sex education, role of 189–90 short-term gene–environment interaction 7 significance, recognizing 161 Silberzahn, Raphael 125–6 Simmons, Joseph 122–3 situated choice 31–3, 34, 42 situations, appraisal of 72 sliding-doors moments 50 small differences, power of 56–7 small effects, influence of 49–54 small experiences, influence of 35–7 smartphones 97, 191 Smith, George Davey 50, 51, 234, 281n1 social contexts 31, 195 social media 191 social mobility 240–1 social policy 195 social proof 154–6 social reformers 146–7 social science, utility of 146–50 special theory of relativity 140–1 Spiegelhalter, David 180, 244–5 spontaneous interaction 9 stagflation 103 statins 171 statistical regularities 240–1 statistical significance 122, 137 stents, use of 131 stories and storytelling 25–6, 53–4, 244–5, 247, 248, 258n25, 258n27 structural forces 54 Sun, the 51 support factors 194 Surfers Against 
Sewage 70–1 surgeons, skills 73–4 system 1 thinking 149 systematic forces 54 systems-level thinking 153 Tamil Nadu 79–82, 101–2 Tangiers, Morocco 84 technology, changing 108 Teen Mom (TV show) 193 teenage pregnancy rate decline in 184, 188–96 estimates 275n3 terrible simplifiers 255n20 Tesco 77, 211 Thaler, Richard 157 theories 140–59 analytic validity 158 arguments about 150–4 of crime 142–6, 143 development economics 150–3 fitness 157 implementation 152 limitations 157 and practice 141 refining 156–7 relevance 157–8 social science 146–50 tension in 154–9 using 156–7 ‘thick’ description 86 time, validity across 107–10 Time magazine 193 time variations, and knowledge 87–100, 91, 93, 94, 95 The Times 63 toilets 225–9 Toshiba 211 trade-offs 190–1 trends 54 trials 156 triangulation 138, 233–4 Triple-P parenting programme 206–7 trivia, importance of 84–5 true uncertainty 107 Trump, Donald 20, 218, 222, 223–4 trust 238 trust deficit 218 trustworthiness 238 Tufte, Edward 139 turning points, variety 49–54 TV crime shows 143, 143 twins and twin studies conjoined 39–42 identical 34–7, 39, 256n10 Tyson, Mike 23, 23–6 Tyson, Rodney 24–5, 255n3 Uhlmann, Eric 125–6 uncertainty 89–90, 100, 209– 12, 254n14 admitting 238 communicating 237–9 data 89–91 embracing 234–6 erratic 93 governing for 239–41 Knightian 107 language of 238 managing for 241–2 in medicine 167–9, 169, 170–4 perpetual 230 radical 106, 107 true 107 uncertainty laundering 268n33 understanding hidden half of 13 limiting effects on 14 limits of 54 unemployment 221–2, 263n17 unintended consequences 105, 229 United States of America China trade 220–3 incarceration rates 222, 240, 280n10 labour market 221 minimum wage 266–7n10 unemployment 221–2 universal gravitational attraction, theory of 140–1 unknowns 85–7, 206 unusual, the 195 upbringing 23–5 Uyeno, Lori 47 validity across time 107–10 analytic 158, 263n18 ecological 263n18 external 101, 158, 263n18, 264n19 internal 101–2, 158 knowledge 100–2, 107–10 population 
263n18 research 101–2 samples 100–2 values 59, 232 variation, sources of 5–8 Volkswagen, diesel emissions scandal 211 Wall Street Journal 219 Wallace, Alfred Russel 259n33 Walmart 77 Watts, Duncan 68, 69, 147–50 weakest-link principle 79–82 Wedgwood, Josiah 50–1 Wellington, Duke of 51 Wesfarmers 76–7 West Germany, motorcycle thefts 142–4 Western, Bruce 54 Wilson, Harold 99 World Bank Independent Evaluation Group 79 World Health Organization 162 world picture 63–4 Wright, Sewall 253n11

More striking than what I think is how many others say something similar: that we have somehow over-reached, and now wake up to the fact that life is not the shining edifice of robust understanding that our mass of research findings suggests. These findings are prolific, to be sure. But they have started to fall over at an alarming rate, failing to replicate as scientists seek to repeat each other’s work. You might have heard talk of a replication crisis, even a crisis of expertise, or research-credibility crisis. Take a moment to absorb that phrase: ‘research’ faces a ‘credibility crisis’. We’re not sure what to believe even from people whose purpose is finding out what to believe. If the knowledge factory, of all places, can’t be relied on for knowledge, we know we’re in trouble.

He was more careful than that, and actually said: ‘People have had enough of experts from organisations with acronyms saying that they know what is best and getting it consistently wrong.’ Which, if that is what these experts did, would be reasonable enough. He didn’t say whether people had also had enough of politicians, and I suspect at the time he had little idea of the research credibility or replication crisis, which is more serious and remains under-reported. This book is written against that more serious background – of anxiety about failures of knowledge in the social and human sciences, coupled with a movement to raise the game. It has various names, this movement – ‘meta-science’ is one you might come across if you haven’t already.

pages: 533 words: 125,495

Rationality: What It Is, Why It Seems Scarce, Why It Matters
by Steven Pinker
Published 14 Oct 2021

Not to mention the fact that when spooky coincidences are noticed, they tend to get embellished (Lincoln did not have a secretary named Kennedy), while pesky noncoincidences (like their different days, months, and years of birth and death) are ignored. Scientists are not immune to the Texas sharpshooter fallacy. It’s one of the explanations for the replicability crisis that rocked epidemiology, social psychology, human genetics, and other fields in the 2010s.59 Think of all the foods that are good for you which used to be bad for you, the miracle drug that turns out to work no better than the placebo, the gene for this or that trait which was really noise in the DNA, the cute studies showing that people contribute more to the coffee fund when two eyespots are posted on the wall and that they walk more slowly to the elevator after completing an experiment that presented them with words associated with old age.

For its posterior credence to be higher than the posterior credence in its opposite, the likelihood of the data given that the hypothesis is true must be far higher than the likelihood of the data given that the hypothesis is false. The evidence, in other words, must be extraordinary. A failure of Bayesian reasoning among scientists themselves is a contributor to the replicability crisis that we met in chapter 4. The issue hit the fan in 2011 when the eminent social psychologist Daryl Bem published the results of nine experiments in the prestigious Journal of Personality and Social Psychology which claimed to show that participants successfully predicted (at a rate above chance) random events before they took place, such as which of two curtains on a computer screen hid an erotic image before the computer had selected where to place it.14 Not surprisingly, the effects failed to replicate, but that was a foregone conclusion given the infinitesimal prior probability that a social psychologist had disproven the laws of physics by showing some undergraduates some porn.
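
The Bayesian point in this excerpt can be sketched in a few lines. This is only an illustration under stated assumptions, not anything taken from the book: the function name `posterior_credence` is hypothetical, and the numbers (a one-in-a-million prior, likelihoods of 0.8 and 0.05) are invented to show how a tiny prior swamps a result that looks strong on its own.

```python
def posterior_credence(prior, p_data_if_true, p_data_if_false):
    """Bayes' rule for a hypothesis versus its opposite.

    prior           -- credence in the hypothesis before seeing the data
    p_data_if_true  -- likelihood of the data if the hypothesis is true
    p_data_if_false -- likelihood of the data if the hypothesis is false
    """
    weight_true = prior * p_data_if_true
    weight_false = (1 - prior) * p_data_if_false
    return weight_true / (weight_true + weight_false)


# Illustrative numbers only (assumptions, not from the source).
tiny_prior = 1e-6      # an extraordinary claim, such as precognition
even_prior = 0.5       # an unsurprising claim

# The same evidence (likelihood ratio 0.8 / 0.05 = 16) in both cases.
surprising = posterior_credence(tiny_prior, 0.8, 0.05)
ordinary = posterior_credence(even_prior, 0.8, 0.05)

# Likelihood ratio needed before the surprising claim passes 50 percent.
required_ratio = (1 - tiny_prior) / tiny_prior

print(f"surprising claim: {surprising:.6f}")
print(f"ordinary claim:   {ordinary:.3f}")
print(f"likelihood ratio needed: {required_ratio:,.0f}")
```

With the same evidence, the ordinary claim ends up well above 50 percent while the surprising one stays far below it, which is the sense in which the evidence "must be extraordinary."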

young upstart overturns scientific applecart
a scientific revolution in x
everything you know about y is wrong
The problem is that “surprising” is a synonym for “low prior probability,” assuming that our cumulative scientific understanding is not worthless. This means that even if the quality of evidence is constant, we should have a lower credence in claims that are surprising. But the problem is not just with journalists. The physician John Ioannidis scandalized his colleagues and anticipated the replicability crisis with his 2005 article “Why Most Published Research Findings Are False.” A big problem is that many of the phenomena that biomedical researchers hunt for are interesting and a priori unlikely to be true, requiring highly sensitive methods to avoid false positives, while many true findings, including successful replication attempts and null results, are considered too boring to publish.

Super Thinking: The Big Book of Mental Models
by Gabriel Weinberg and Lauren McCann
Published 17 Jun 2019

In order to be sure a study result isn’t a fluke, it needs to be replicated. Interestingly, in some fields, such as psychology, there has been a concerted effort to replicate positive results, but those efforts have found that fewer than 50 percent of positive results can be replicated. That rate is low, and this problem is aptly named the replication crisis. This final section offers some models to explain how this happens, and how you can nevertheless gain more confidence in a research area. Replication efforts are an attempt to distinguish between false positive and true positive results. Consider the chances of replication in each of these two groups.

Using those numbers, a replication rate of 50 percent requires about 60 percent of the studies to have been true positives and 40 percent of them to have been false positives. To see this, consider 100 studies: If 60 were true positives, we would expect 48 of those to replicate (80 percent of 60). Of the remaining 40 false positives, 2 would replicate (5 percent of 40) for a total of 50. The replication rate would then be 50 per 100 studies, or 50 percent. [Figure: Replication Crisis, Re-test 100 Studies] So, under this scenario, about a fourth of the failed replications (12 of 50) are explained by a lack of power in the replication efforts. These are real results that would likely be replicated successfully either if an additional replication study were done or if the original replication study had a higher sample size.
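
The arithmetic in this excerpt can be checked with a short sketch. The 80 percent power, the 5 percent false positive rate, and the 60/40 split come from the excerpt itself; the function and key names are hypothetical and chosen only for illustration.

```python
def replication_breakdown(n_studies, share_true, power, false_positive_rate):
    """Expected outcome of re-testing a set of published positive results.

    share_true          -- fraction of the original results that are real effects
    power               -- chance a real effect replicates
    false_positive_rate -- chance a false positive "replicates" by luck
    """
    true_positives = n_studies * share_true
    false_positives = n_studies - true_positives

    replicated_true = true_positives * power
    replicated_false = false_positives * false_positive_rate
    replicated = replicated_true + replicated_false

    failed = n_studies - replicated
    failed_real_effects = true_positives - replicated_true  # lost to low power

    return {
        "replicated": replicated,
        "failed": failed,
        "failed_real_effects": failed_real_effects,
        "share_of_failures_due_to_power": failed_real_effects / failed,
    }


# The scenario from the excerpt: 100 studies, 60 percent true positives.
result = replication_breakdown(100, 0.60, 0.80, 0.05)

print(f"replicated: {result['replicated']:.0f} of 100")
print(f"failed replications: {result['failed']:.0f}")
print(f"failures that were real effects: {result['failed_real_effects']:.0f}")
print(f"share due to low power: {result['share_of_failures_due_to_power']:.0%}")
```

Changing `share_true` shows how sensitive the overall replication rate is to the mix of true and false positives in the original set.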

If this type of data dredging happens routinely enough, then you can see why a large number of studies in the set to be replicated might have been originally false positives. In other words, in this set of one hundred studies, the base rate of false positives is likely much larger than 5 percent, and so another large part of the replication crisis can likely be explained as a base rate fallacy. Unfortunately, studies are much, much more likely to be published if they show statistically significant results, which causes publication bias. Studies that fail to find statistically significant results are still scientifically meaningful, but both researchers and publications have a bias against them for a variety of reasons.

pages: 184 words: 46,395

The Choice Factory: 25 Behavioural Biases That Influence What We Buy
by Richard Shotton
Published 12 Feb 2018

Contents Praise for The Choice Factory Preface Introduction Bias 1: The Fundamental Attribution Error Bias 2: Social Proof Bias 3: Negative Social Proof Bias 4: Distinctiveness Bias 5: Habit Bias 6: The Pain of Payment Bias 7: The Danger of Claimed Data Bias 8: Mood Bias 9: Price Relativity Bias 10: Primacy Effect Bias 11: Expectancy Theory Bias 12: Confirmation Bias Bias 13: Overconfidence Bias 14: Wishful Seeing Bias 15: Media Context Bias 16: The Curse of Knowledge Bias 17: Goodhart’s Law Bias 18: The Pratfall Effect Bias 19: Winner’s Curse Bias 20: The Power of the Group Bias 21: Veblen Goods Bias 22: The Replicability Crisis Bias 23: Variability Bias 24: Cocktail Party Effect Bias 25: Scarcity Ethics Conclusion References Further reading Acknowledgements Index Praise for The Choice Factory “This book is a Haynes Manual for understanding consumer behaviour. You should buy a copy - and then buy another copy to give to one of the 97% of people in marketing who are too young to remember what a bloody Haynes Manual is

The point of these tests isn’t to answer the question with 100% certainty; it’s to give you enough evidence to run a larger-scale test with your advertising message. The worries about behavioural science extend beyond representative samples. Critics, like Brian Nosek, are concerned that some experiments have failed to be replicated. Luckily, we’ll be discussing replicability in the next chapter. Bias 22: The Replicability Crisis How should marketers react? Your first sip of coffee causes you to wince. For the last fortnight, each hot drink has been accompanied by a twinge in your back tooth. You decide to ring your dentist to arrange an appointment before it deteriorates any further. The receptionist answers within a few rings and, after expressing her concern, she checks the dentist’s availability.

, by Evan Davis, John Kay, and Jonathan Star [London Business School Review, Vol. 2, No. 3, pp. 1–23, 1991] Marketers Are from Mars, Consumers Are from New Jersey by Bob Hoffman [2015] Bias 16: The curse of knowledge Made to Stick: Why Some Ideas Survive and Others Die by Chip Heath and Dan Heath [2008] The Wiki Man by Rory Sutherland [2011] Bias 17: Goodhart’s law Long and Short of It: Balancing Short- and Long-Term Marketing Strategies by Les Binet and Peter Field [2012] Management in 10 Words by Terry Leahy [2012] Leading by Alex Ferguson and Michael Moritz [2015] Bias 18: The pratfall effect Social Animal by Elliot Aronson [1972] The Wasp Factory by Iain Banks [1984] Bias 19: Winner’s curse The Winner’s Curse: Paradoxes and Anomalies of Economic Life by Richard Thaler [1991] Originals: How Non-Conformists Move the World by Adam Grant [2016] ‘Harnessing naturally occurring data to measure the response of spending to income’, by Michael Gelman, Shachar Kariv, Matthew Shapiro, Dan Silverman, and Steven Tadelis [Science, Vol. 345, No. 6193, pp. 212–215, 2014] ‘The Psychology of Windfall Gains’, by Hal Arkes, Cynthia Joyner, Mark Pezzo, Jane Gradwohl Nash, Karen Siegel-Jacobs, and Eric Stone [Organizational Behaviour and Human Decision Processes, Vol. 59, No. 3, pp. 331–347, 1994] On the Fungibility of Spending and Earnings – Evidence from Rural China and Tanzania by Luc Christiaensen and Lei Pan [2012] Bias 20: The power of the group ‘Humour in Television Advertising: The Effects of Repetition and Social Setting’, by Yong Zhang and George Zinkhan [Advances In Consumer Research, Vol. 18, pp. 813–818, 1991] ‘Feeling More Together: Group Attention Intensifies Emotion’, by Garriy Shteynberg, Jacob Hirsh, Evan Apfelbaum, Jeff Larsen, Adam Galinsky, and Neal Roese [Emotion, Vol. 14, No. 6, pp. 1102–1114, 2014] Bias 21: Veblen goods ‘Commercial Features of Placebo and Therapeutic Efficacy’, by Rebecca Waber, Baba Shiv, Ziv Carmon, and Dan Ariely [Journal of the American Medical Association, Vol. 299, No. 9, pp. 1016–1017, 2008] Bias 22: The replicability crisis ‘Why Susie Sells Seashells by the Seashore: Implicit Egotism and Major Life Decisions’, by Brett Pelham, Matthew Mirenberg, and John Jones [Journal of Personality and Social Psychology, Vol. 82, No. 4, pp. 469–487, 2002] ‘Rich the banker? What’s not in a Name’, by Tim Harford [2016], www.timharford.com/2016/11/rich-the-banker-whats-not-in-a-name ‘Estimating the reproducibility of psychological science’, by Brian Nosek et al.

pages: 428 words: 103,544

The Data Detective: Ten Easy Rules to Make Sense of Statistics
by Tim Harford
Published 2 Feb 2021

Brian Nosek has given useful interviews to several podcasts, including You Are Not So Smart (episode 100), https://youarenotsosmart.com/2017/07/19/yanss-100-the-replication-crisis/; Planet Money (episode 677); EconTalk, November 16, 2015, http://www.econtalk.org/brian-nosek-on-the-reproducibility-project/; The Hidden Brain (episode 32), https://www.npr.org/templates/transcript/transcript.php?storyId=477921050; as well as BBC Analysis, “The Replication Crisis,” November 12, 2018, https://www.bbc.co.uk/programmes/m00013p9. 11. This figure of thirty-nine is based on the subjective opinion of the replicating researchers.

Daniel Kahneman himself dramatically raised the profile of the issue when he wrote an open letter to psychologists in the field warning them of a looming “train wreck” if they could not improve the credibility of their research.25 The entire saga—Ioannidis’s original paper, Bem’s nobody-believes-this finding, the high-profile struggles to replicate Baumeister’s, Cuddy’s, and Bargh’s research, and as the coup de grâce, Nosek’s discovery that (as Ioannidis had said all along) high-profile psychological studies were more likely not to replicate than to stand up—was sometimes described as a “replication crisis” or a “reproducibility crisis.” In the light of Kickended, perhaps none of this should have been a surprise—but it is shocking nonetheless. The famous psychological results are famous not because they are the most rigorously demonstrated, but because they’re interesting. Fluke results are far more likely to be surprising, and so far more likely to hit that Goldilocks level of counterintuitiveness (not too absurd, but not too predictable) that makes them so fascinating.

The journey toward more rigorous science requires many steps, and we at least are taking some of them. I recently had the chance to interview Richard Thaler, a Nobel Memorial Prize winner in economics, who has collaborated with Daniel Kahneman and many other psychologists. He struck me as well placed to evaluate psychology as a sympathetic outsider. “I think the replication crisis has been great for psychology,” he told me. “There’s just better hygiene.”36 Brian Nosek, meanwhile, told the BBC: “I think if we do another large reproducibility project five years from now, we are going to see a dramatic improvement in reproducibility in the field.”37 * * * — In the early chapters of this book, I cited numerous psychological studies of motivated reasoning and the biased assimilation of information.

pages: 338 words: 85,566

Restarting the Future: How to Fix the Intangible Economy
by Jonathan Haskel and Stian Westlake
Published 4 Apr 2022

Modifying Rules as Technology Progresses In addition to being imperfect, the rules for subsidising intangibles are sometimes downright perverse, either because they are based on outdated models of how investment takes place or because they overlook some important type of investment. Let’s consider two examples: the growing importance of software tools and data in research, and what is called the replication crisis. It is widely agreed that the explosion of computing power in recent decades has increased the returns to data-intensive research.12 Much research involves the creation and analysis of new data sets, along with the development of new tools to do so. Consider, for example, OpenSAFELY, a data platform that allows medical researchers to study data from British patients’ National Health Service electronic health records securely and pseudonymously, and that enables urgent COVID-19-related research using very large data sets.13 Ben Goldacre, one of the project’s leaders, has frequently written about the difficulty of persuading traditional research funding bodies to finance the development of these kinds of data sets and tools, and of having them recognised as legitimate research outputs alongside traditional academic publication.

Research funders are beginning to change their attitudes, but it is a slow process, and it is constrained by the bureaucratic nature of funders. Another widely recognised issue in science is the replication crisis,14 in which a whole range of research findings that were thought to be reliable turn out to be uncertain: when researchers try to replicate the experiments on which the findings are based, they are unable to obtain the same results, suggesting the original findings were at best the result of luck and at worst of fraud.

So, for example, the psychological phenomenon of priming—the idea that showing a person a series of words or images about, say, old people will cause them, subconsciously, to act in an “elderly” way—was shown to be either nonexistent or much weaker than psychologists had thought. The replication crisis has given rise to systematic attempts to see if time-honoured findings are actually replicable.15 Such replication attempts are often funded by philanthropists, such as John Arnold. Replication attempts could be an extremely valuable undertaking, greatly enhancing humanity’s knowledge base.

The Ethical Algorithm: The Science of Socially Aware Algorithm Design
by Michael Kearns and Aaron Roth
Published 3 Oct 2019

The methodological dangers presented by the combination of algorithmic and human p-hacking have generated acrimonious controversies and hand-wringing over scientific findings that don’t reflect reality. These play a central role in what is broadly referred to as the “reproducibility crisis” in science, which has its own Wikipedia page that begins: The replication crisis (or replicability crisis or reproducibility crisis) is an ongoing (2019) methodological crisis in science in which scholars have found that the results of many scientific studies are difficult or impossible to replicate or reproduce on subsequent investigation, either by independent researchers or by the original researchers themselves.

pages: 172 words: 51,837

How to Read Numbers: A Guide to Statistics in the News (And Knowing When to Trust Them)
by Tom Chivers and David Chivers
Published 18 Mar 2021

It transpired that he had asked his PhD student to break up the data into ‘males, females, lunch goers, dinner goers, people sitting alone, people eating with groups of 2, people eating in groups of 2+, people who order alcohol, people who order soft drinks, people who sit close to buffet, people who sit far away, and so on’.9 Other methodological problems were found with Wansink’s old papers, and other emails revealed shoddy statistical practice – in one, he suggests that ‘we should be able to get much more from this … I think it would be good to mine it for significance and a good story.’10 He wanted the research to ‘go virally big time’. This was a dramatic example. But p-hacking – in less dramatic forms – goes on all the time. It is usually innocent. Academics desperate to get p<0.05, so they can get their paper published, will rerun a trial, or reanalyse the data. You might have heard of the ‘replication crisis’, in which lots of important findings in psychology and other disciplines have turned out not to exist when other scientists tried to replicate their findings. It was based on a failure of scientists to understand exactly this problem: they kept chopping up their data and rerunning their studies until they found statistically significant results, not realising that by doing so they were rendering their work meaningless.
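Why chopping data into subgroups so easily produces a "significant" result can be seen from a simple probability calculation. The sketch below assumes that no real effect exists and that each subgroup test is independent, which is a simplification (subgroups cut from the same data set overlap); the function name `chance_of_false_positive` is hypothetical.

```python
def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent tests gives p < alpha
    when there is no real effect in any of them."""
    return 1.0 - (1.0 - alpha) ** n_tests


# 11 matches the number of subgroups named in the quoted email;
# the other values are arbitrary points of comparison.
for n_tests in (1, 5, 11, 20):
    probability = chance_of_false_positive(n_tests)
    print(f"{n_tests:>2} tests: {probability:.0%} chance of a spurious finding")
```

Under these assumptions, a single test has the nominal 5 percent false-positive rate, but eleven separate looks at the data raise the chance of at least one spurious finding to about 43 percent.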

Daniel Kahneman, the great psychologist and pioneer of our understanding of cognitive biases – he won the Nobel Prize in Economic Sciences for his work with Amos Tversky – wrote in 2011 that ‘disbelief is not an option’ when it came to the astonishing priming effects.3 A picture of a pair of eyes above an honesty box led people to put more money in it than if there was a neutral picture of flowers.4 Thinking about a shameful action, like stabbing a colleague in the back, leads people to buy more soap and disinfectant than they normally would, to scrub their soul clean: the ‘Lady Macbeth effect.’5 But by the time that BBC article – and others, such as a long piece in The Atlantic from 20146 – were published, research into money priming was struggling. People were trying to find the same results as the early researchers and not finding them, or finding them to be much smaller and less impressive. What was going on? Well, a lot of things. And there are many excellent books to read about the ‘replication crisis’ – the sudden realisation in many parts of science, but especially psychology, and especially social priming, that a huge swath of past research did not stand up to scrutiny. But the one we’re going to look at is the demand for novelty in science. There is a huge problem at the core of how science is done.

pages: 371 words: 107,141

You've Been Played: How Corporations, Governments, and Schools Use Games to Control Us All
by Adrian Hon
Published 14 Sep 2022

New York Times, updated November 12, 2021, www.nytimes.com/2021/01/23/business/financial-aid-college-merit-aid.html; “Scholarships & Financial Aid,” Wabash College, accessed November 28, 2021, www.wabash.edu/admissions/finances/sources. 5. “Transcript: Ezra Klein Interviews Agnes Callard,” Ezra Klein Show, New York Times, May 14, 2021, www.nytimes.com//2021/05/14/podcasts/ezra-klein-podcast-agnes-callard-transcript.html. 6. Kelsey Piper, “Science Has Been in a ‘Replication Crisis’ for a Decade. Have We Learned Anything?” Vox, October 14, 2020, www.vox.com/future-perfect/21504366/science-replication-crisis-peer-review-statistics. 7. Brian Resnick, “The ‘Marshmallow Test’ Said Patience Was a Key to Success. A New Replication Tells Us S’More.” Vox, June 6, 2018, www.vox.com/science-and-health/2018/6/6/17413000/marshmallow-test-replication-mischel-psychology; Yuichi et al., “Predicting Adolescent Cognitive and Self-Regulatory Competencies from Preschool Delay of Gratification: Identifying Diagnostic Conditions,” Developmental Psychology 26, no. 6 (2016): 978–986, https://doi.org/10.1037/0012-1649.26.6.978; Tyler W.

Sometimes this happens because designers mistakenly overgeneralise findings from studies to include interventions that are different from those in the original study, or interventions that are delivered to different sets of people in different conditions—a brain training game that works for university students in a lab doesn’t necessarily work for a fifty-year-old at home, especially if it’s been redesigned during its journey to the consumer. Exaggerations can also occur intentionally, when marketers cherry-pick the single best improvement from a study and suggest it applies to their entire app. These misrepresentations come at a time when behavioural science itself has been in the midst of a “replication crisis,” along with fields like economics and medicine.6 Influential findings like the results of the marshmallow test, which showed that children’s ability to delay gratification predicted their future academic achievement, have failed to be replicated after repeated retesting.7 In other words, a lot of what’s published in scientific journals has turned out to be completely wrong.

pages: 297 words: 83,651

The Twittering Machine
by Richard Seymour
Published 20 Aug 2019

The complaints about ‘fake news’ indicate that the embattled political establishment has not yet mastered the new media. But the problem goes even deeper than that and, in a strange way, the myth of a ‘post-truth’ society is a bungled attempt to diagnose the rot. In the sciences, there is an ongoing ‘replication crisis’ afflicting medicine, economics, psychology and evolutionary biology. The crisis consists of the fact that the results of many scientific studies are impossible to replicate in subsequent tests. In a survey of 1,500 scientists in the journal Nature, 70 per cent of the respondents failed to replicate the findings of another scientist’s experiment.50 Half of them couldn’t even replicate their own findings.

Jane and Chris Fleming, Modern Conspiracy: The Importance of Being Paranoid, Bloomsbury: New York and London, 2014 pp. 4–5. Devorah Baum notices a similar pattern. Feeling Jewish: (A Book for Just About Anyone), Yale University Press: New Haven and London, 2017, pp. 53–5. 50. In a survey of 1,500 scientists . . . Monya Baker, ‘1,500 scientists lift the lid on reproducibility’, Nature, 25 May 2016. On the replication crisis, see Nature’s special online report, ‘Challenges in irreproducible research’, Nature, 18 October 2018. 51. According to the historian of ideas Philip Mirowski . . . Philip Mirowski, Science-Mart: Privatizing American Science, Harvard University Press: Cambridge, MA, 2011. 52. Among the worst examples of this degradation . . .

pages: 442 words: 94,734

The Art of Statistics: Learning From Data
by David Spiegelhalter
Published 14 Oct 2019

This has led to doubts about the reliability of parts of the scientific literature, with claims that many ‘discoveries’ cannot be reproduced by other researchers – such as the continuing dispute over whether adopting an assertive posture popularly known as a ‘power pose’ can induce hormonal and other changes.8 The inappropriate use of standard statistical methods has received a fair share of the blame for what has become known as the reproducibility or replication crisis in science. With the growing availability of massive data sets and user-friendly analysis software, it might be thought that there is less need for training in statistical methods. This would be naïve in the extreme. Far from freeing us from the need for statistical skills, bigger data and the rise in the number and complexity of scientific studies makes it even more difficult to draw appropriate conclusions.

Locators in italics refer to figures and tables A A/B tests 107 absolute risk 31–2, 36–7, 383 adjustment 110, 133, 135, 383 adjuvant therapy 181–5, 183–4 agricultural experiments 105–6 AI (artificial intelligence) 144–5, 185–6, 383 alcohol consumption 112–13, 299–300 aleatory uncertainty 240, 306, 383 algorithms – accuracy 163–7 – biases 179 – for classification 143–4, 148 – complex 174–7 – contests 148, 156, 175, 277–8 see also Titanic challenge – meaning of 383 – parameters 171 – performance assessment 156–63, 176, 177 – for prediction 144, 148 – robustness 178 – sensitivity 157 – specificity 157 – and statistical variability 178–9 – transparency 179–81 allocation bias 85 analysis 6–12, 15 apophenia 97, 257 Arbuthnot, John 253–5 Archbishop of Canterbury 322–3 arm-crossing behaviour 259–62, 260, 263, 268–70, 269 artificial intelligence (AI) 144–5, 185–6, 383 ascertainment bias 96, 383 assessment of statistical claims 368–71 associations 109–14, 138 autism 113 averages 46–8, 383 B bacon sandwiches 31–4 bar charts 28, 30 Bayes, Thomas 305 Bayes factors 331–2, 333, 384 Bayes’ Theorem 307, 313, 315–16, 384 Bayesian hypothesis testing 219, 305–38 Bayesian learning 331 Bayesian smoothing 330 Bayesian statistical inference 323–34, 325, 384 beauty 179 bell-shaped curves 85–91, 87 Bem, Daryl 341, 358–9 Bernoulli distribution 237, 384 best-fit lines 125, 393 biases 85, 179 bias/variance trade-off 169–70, 384 big data 145–6, 384 binary data 22, 385 binary variables 27 binomial distribution 230–6, 232, 235, 385 birth weight 85–91 blinding 101, 385 BMI (body mass index) 28 body mass index (BMI) 28 Bonferroni correction 280, 290–1, 385 boosting 172 bootstrapping 195–203, 196, 198, 200, 202, 208, 229–30, 386 bowel cancer 233–6, 235 Box, George 139 box-and-whisker plots 42, 43, 44, 45 Bradford-Hill, Austin 114 Bradford-Hill criteria 114–17 brain tumours 95–6, 135, 301–3 breast cancer screening 214–16, 215 breast cancer surgery 181–5, 183–4 Brier score 164–7, 386 Bristol Royal 
Infirmary 19–21, 56–8 C Cairo, Alberto 25, 65 calibration 161–3, 162, 386 Cambridge University 110, 111 cancer – breast 181–5, 183–4, 214–16, 215 – lung 98, 114, 266 – ovarian 361 – risk of 31–6 carbonated soft drinks 113 Cardiac Surgical Registry (CSR) 20–1 case-control studies 109, 386 categorical variables 27–8, 386 causation 96–9, 114–17, 128 reverse causation 112–15, 404 Central Limit Theorem 199, 238–9, 386–7 chance 218, 226 child heart surgery see heart surgery chi-squared goodness-of-fittest 271, 272, 387 chi-squared test of association 268–70, 387 chocolate 348 classical probability 217 classification 143–4, 148–54 classification trees 154–6, 155, 168, 174, 387 cleromancy 81 clinical trials 82–3, 99–107, 131, 280, 347 clustering 147 cohort studies 109, 387 coins 308, 309 communication 66–9, 353, 354, 364–5 complex algorithms 138–9 complexity parameters 171 computer simulation 205–7, 208 conclusions 15, 22, 347 conditional probability 214–16 confidence intervals 241–4, 243, 248–51, 250, 271–3, 335–6, 387–8 confirmatory studies 350–1, 388 confounders 110, 135, 388 confusion matrixes 157 continuous variables 46, 388 control groups 100, 389 control limits 234, 389 correlation 96–7, 113 count variables 44–6, 389 counterfactuals 97–8, 389 crime 83–5, 321–2 see also homicides Crime Survey for England and Wales 83–5 cross-sectional studies 108–9 cross-validation 170–1, 389 CSR(Cardiac Surgical Registry) 20–1 D Data 7–12, 15, 22 data collection 345 data distribution see sample distribution data ethics 371 data literacy 12, 389 data science 11, 145–6, 389 data summaries 40 data visualization 22, 25, 65–6, 69 data-dredging 12 death 9 see also mortality; murder; survival rates deduction 76 deep learning 147, 389 dependent events 214, 389 dependent variables 60, 125–6, 389 deterministic models 128–9, 138 dice 205–7, 206, 213 differences between groups of numbers 51–6 distribution 43 DNA evidence 216 dogs 179 Doll, Richard 114 doping 310–13, 311–12, 314, 315–16 
dot-diagrams 42, 43, 44, 45 dynamic graphics 71 E Ears 108–9 education 95–6, 106–7, 131, 135, 178–9 election result predictions 372–6, 375 see also opinion polls empirical distribution 197, 404 enumerative probability 217–18 epidemiology 95, 117, 389 epistemic uncertainty 240, 306, 308, 309, 390 error matrixes 157, 158, 390 errors in coding 345–6 ESP (extra-sensory perception) 341, 358–9 ethics 371 eugenics 39 expectation 231, 390 expected frequencies 32, 209–13, 211, 214–16, 215, 390 explanatory variables 126, 132–5 exploratory studies 350, 390 exposures 114, 390 external validity 82–3, 390 extra-sensory perception (ESP) 341, 358–9 F False discovery rate 280, 390 false-positives 278–80, 390 feature engineering 147, 390 Fermat, Pierre de 207 final odds 316 financial crisis of 2007–2008 139–40 financial models 139–40 Fisher, Ronald 258, 265–6, 336, 345 five-sigma results 281–2 forensic epidemiology 117, 391 forensic statistics 6 framing 391 – of numbers 24–5 – of questions 79–80 fraud 347–50 funnel plots 234, 391 G Gallup, George 81 Galton, Francis 39–40, 58, 121–2, 238–9 gambler’s fallacy 237 gambling 205–7, 206, 213 garden of forking paths 350 Gaussian distribution see normal distribution GDP (Gross Domestic Product) 8–9 gender discrimination 110, 111 Gini index 49 Gombaud, Antoine 205–7 Gross Domestic Product (GDP) 8–9 Groucho principle 358 H Happiness 9 HARKing 351–2 hazard ratios 357, 391 health 169–70 heart attacks 99–104 Heart Protection Study (HPS) 100–2, 103, 273–5, 274, 282–7 heart surgery 19–21, 22–4, 23, 56–8, 57, 93, 136–8, 137 heights 122–5, 123, 124, 127, 134, 201, 202, 243, 275–8, 276 hernia surgery 106 HES (Hospital Episode Statistics) 20–1 hierarchical modelling 328, 391 Higgs bosons 281–2 histograms 42, 43, 44, 45 homicides 1–6, 222–6, 225, 248, 270–1, 272, 287–94 Hospital Episode Statistics (HES) 20–1 hospitals 19–21, 25–7, 26, 56–61, 138 house prices 48, 112–14 HPS (Heart Protection Study) 100–2, 103, 273–5, 274, 282–7 hypergeometric 
distribution 264, 391 hypotheses 256–7 hypothesis testing 253–303, 336, 392 see also Neyman-Pearson Theory; null hypothesis significance testing; P-values I IARC (International Agency for Research in Cancer) 31 icon arrays 32–4, 33, 392 income 47–8 independent events 214, 392 independent variables 60, 126, 392 induction 76–7, 392 inductive behaviour 283 inductive inference 76–83, 78, 239, 392 infographics 69, 70 insurance 180 ‘intention to treat’ principle 100–1, 392 interactions 172, 392 internal validity 80–1, 392 International Agency for Research in Cancer (IARC) 31 inter-quartile range (IQR) 51, 89, 392 IQ 349 IQR (inter-quartile range) 49, 51, 89, 392 J Jelly beans in a jar 40–6, 48, 49, 50 K Kaggle contests 148, 156, 175, 277–8 see also Titanic challenge k-nearest neighbors algorithm 175 L LASSO 172–4 Law of Large Numbers 237, 393 law of the transposed conditional 216, 313 league tables 25, 130–1 see also tables least-squares regression lines 124, 125, 393 left-handedness 113–14, 229–33, 232 legal cases 313, 321, 331–2 likelihood 327, 336, 394 likelihood ratios 314–23, 319–20, 332, 394 line graphs 4, 5 linear models 132, 138 literal populations 91–2 logarithmic scale 44, 45, 394 logistic regression 136, 172, 173, 394 London Underground 24 loneliness 80 long-run frequency probability 218 look elsewhere effect 282 lung cancer 98, 114, 266 lurking factors 113, 135, 394–5 M Machine learning 139, 144–5, 395 mammography 214–16, 215 margins of error 189, 199, 200, 244–8, 395 mean average 46–8 mean squared error (MSE) 163–4, 165, 395 measurement 77–9 meat 31–4 media 356–8 median average 46, 47–8, 51, 89, 395 Méré, Chevalier de 205–7, 213 meta-analysis 102, 104, 395 metaphorical populations 92–3 mode 46, 48, 395 mortality 47, 113–14 MRP (multilevel regression and post-stratification) 329, 396 MSE (mean squared error) 163–4, 165, 395 mu 190 multilevel regression and post-stratification (MRP) 329, 396 multiple linear regression 132–3, 134 multiple regression 135, 136, 
396 multiple testing 278–80, 290, 396 murders 1–6, 222–6, 225, 248, 270–1, 287–94 N Names, popularity of 66, 67 National Sexual Attitudes and Lifestyle Survey (Natsal) 52, 69, 70, 73–5 natural variability 226 neural networks 174 Neyman, Jerzy 242, 283, 335–6 Neyman-Pearson Theory 282–7, 336–7 NHST (null hypothesis significance testing) 266–71, 294–7, 296 non-significant results 299, 346–7, 370 normal distribution 85–91, 87, 226, 237–9, 396–7 null hypotheses 257–65, 336, 397 null hypothesis significance testing (NHST) 266–71, 294–7, 296 O Objective priors 327 observational data 108, 114–17, 128 odds 34, 314, 316 odds ratios 34–6 one-sided tests 264, 397–8 one-tailed P-values 264, 398 opinion polls 82, 245–7, 246, 328–9 see also election result predictions ovarian cancer 361 over-fitting 167–71, 168 P P-hacking 351 P-values 264–5, 283, 285, 294–303, 336, 401 parameters 88, 240, 398 Pascal, Blaise 207 patterns 146–7 Pearson, Egon 242, 283, 336 Pearson, Karl 58 Pearson correlation coefficient 58, 59, 96–7, 126, 398 percentiles 48, 89, 398–9 performance assessment of algorithms 156–67, 176, 177 permutation tests 261–4, 263, 399 personal probability 218–19 pie charts 28, 29 placebo effect 131 placebos 100, 101, 399 planning 13–15, 344–5 Poisson distribution 223–4, 225, 270–1, 399 poker 322–3 policing 107 popes 114 population distribution 86–91, 195, 399 population growth 61–6, 62–4 population mean 190–1, 395 see also expectation populations 74–5, 80–93, 399 posterior distributions 327, 400 power of a test 285–6, 400 PPDAC (Problem, Plan, Data, Analysis, Conclusion) problem-solving cycle 13–15, 14, 108–9, 148–54, 344–8, 372–6, 400 practical significance 302, 400 prayer 107 precognition 341, 358–9 Predict 2.1 182 prediction 144, 148–54 predictive analytics 144, 400 predictor variables 392 pre-election polls see opinion polls presentation 22–7 press offices 355–6 priming 80 prior distributions 327, 400 prior odds 316 probabilistic forecasts 161, 400 probabilities, accuracy 
163–7 probability 10 meaning of 216–22, 400–1 rules of 210–13 and uncertainty 306–7 probability distribution 90, 401 probability theory 205–27, 268–71 probability trees 210–13, 212 probation decisions 180 Problem, Plan, Data, Analysis, Conclusion (PPDAC) problem-solving cycle 13–15, 14, 108–9, 148–54, 344–8, 372–6, 400 problems 13 processed meat 31–4 propensity 218 proportions, comparisons 28–37, 33, 35 prosecutor’s fallacy 216, 313 prospective cohort studies 109, 401 pseudo-random-number generators 219 publication bias 367–8 publication of findings 355 Q QRPs (questionable research practices) 350–3 quartiles 89, 402 questionable research practices (QRPs) 350–3 Quetelet, Adolphe 226 R Race 179 random forests 174 random match probability 321, 402 random observations 219 random sampling 81–2, 208, 220–2 random variables 221, 229, 402 randomization 108, 266 randomization tests 261–4, 263, 399 randomized controlled trials (RCTs) 100–2, 105–7, 114, 135, 402 randomizing devices 219, 220–1 range 49, 402 rate ratios 357, 402 Receiver Operating Characteristic (ROC) curves 157–60, 160, 402 recidivism algorithms 179–80 regression 121–40 regression analysis 125–8, 127 regression coefficients 126, 133, 403 regression modelling strategies 138–40 regression models 171–4 regression to the mean 125, 129–32, 403 regularization 170 relative risk 31, 403 reliability of data 77–9 replication crisis in science 11–12 representative sampling 82 reproducibility crisis 11–12, 297, 342–7, 403 researcher degrees of freedom 350–1 residual errors 129, 403 residuals 122–5, 403 response variables 126, 135–8 retrospective cohort studies 109, 403 reverse causation 112–15, 404 Richard III 316–21 risk, expression of 34 robust measures 51 ROC (Receiver Operating Characteristic) curves 157–60, 160, 402 Rosling, Hans 71 Royal Statistical Society 68, 79 rules for effective statistical practice 379–80 Ryanair 79 S Salmon 279 sample distribution 43 sample mean 190–1, 395 sample size 191, 192–5, 193–4, 
283–7 sampling 81–2, 93 sampling distributions 197, 404 scatter-plots 2–4, 3 scientific research 11–12 selective reporting 12, 347 sensitivity 157–60, 404 sentencing 180 Sequential Probability Ratio Test (SPRT) 292, 293 sequential testing 291–2, 404 sex ratio 253–5, 254, 261, 265 sexual partners 47, 51–6, 53, 55, 73–5, 191–201, 193–4, 196, 198, 200 Shipman, Harold 1–6, 287–94, 289, 293 shoe sizes 49 shrinkage 327, 404 sigma 190, 281–2 signal and the noise 129, 404 significance testing see null hypothesis significance testing Silver, Nate 27 Simonsohn, Uri 349–52, 366 Simpson’s Paradox 111, 112, 405 size of a test 285–6, 405 skewed distribution 43, 405 smoking 98, 114, 266 social acceptability bias 74 social physics 226 Somerton, Francis see Titanic challenge sortilege 81 sortition 81 Spearman’s rank correlation 58–60, 405 specificity 157–9, 405 speed cameras 130, 131–2 speed of light 247 sports doping 310–13, 311–12, 314, 315–16 sports teams 130–1 spread 49–51 SPRT (Sequential Probability Ratio Test) 292, 293 standard deviation 49, 88, 126, 405 standard error 231, 405–6 statins 36–7, 99–104, 273–5, 274, 282–7 statistical analysis 6–12, 15 statistical inference 208, 219, 229–51, 305–38, 323–8, 335, 404 statistical methods 12, 346–7, 379 statistical models 121, 128–9, 404 statistical practice 365–7 statistical science 2, 7, 404 statistical significance 255, 265–8, 270–82, 404 Statistical Society 68 statistics – assessment of claims 368–71 – as a discipline 10–11 – ideology 334–8 – improvements 362–4 – meaning of 404 – publications 16 – rules for effective practice 379–80 – teaching of 13–15 STEP (Study of the Therapeutic Effects of Intercessory Prayer) 107 storytelling 69–71 stratification 110, 383 Streptomycin clinical trial 105, 114 strip-charts 42, 43, 44, 45 strokes 99–104 Student’s t-statistic 275–7 Study of the Therapeutic Effects of Intercessory Prayer (STEP) 107 subjective probability 218–19 summaries 40, 49, 50, 51 supermarkets 112–14 supervised learning 
143–4, 404 support-vector machines 174 surgery – breast cancer surgery 181–5, 183–4 – heart surgery 19–21, 22–4, 23, 56–8, 57, 93, 136–8, 137 – hernia surgery 106 survival rates 25–7, 26, 56–61, 57, 60–1 systematic reviews 102–4 T T-statistic 275–7, 404 tables 22–7, 23 tail-area 231 tea tasting 266 teachers 178–9 teaching of statistics 13–15 technology 1 telephone polls 82 Titanic challenge 148–56, 150, 152–3, 155, 162, 166–7, 172, 173, 175, 176, 177, 277 transposed conditionals, law of 216, 313 trees 7–8 trends 61–6, 62–4, 67 two-sided tests 265, 397–8 two-tailed P-values 265, 398 Type I errors 283–5, 404 Type II errors 283–5, 407 U Uncertainty 208, 240, 306–7, 383, 390 uncertainty intervals 199, 200, 241, 335 unemployment 8–9, 189–91, 271–3 university education 95–6, 135, 301–3 see also Cambridge University unsupervised learning 147, 407 US Presidents 167–9 V Vaccination 113 validity of data 79–83 variability 10, 49–51, 178–9, 407 variables 27, 56–61 variance 49, 407 Vietnam War draft lottery 81–2 violence 113 virtual populations 92 volunteer bias 85 voting age 79–80 W Waitrose 112–14 weather forecasts 161, 164, 165 weight loss 348 ‘When I’m Sixty-Four’ 351–2 wisdom of crowds 39–40, 48, 51, 407 Z Z-scores 89, 407

Investing Amid Low Expected Returns: Making the Most When Markets Offer the Least
by Antti Ilmanen
Published 24 Feb 2022

See Black (1993) on data mining concerns, Cochrane (2011) on factor zoo, Bailey-Lopes de Prado (2014) on deflating performance metrics, Harvey-Liu-Zhu (2016) on diagnostics and remedies, and Hou-Xue-Zhang (2020) on replicating anomalies. Yet, Jensen-Kelly-Pedersen (2021) presents compelling evidence that the case for a replication crisis in finance has been overstated. 13 There is a problem with statistical answers which adjust the significance of a single result for the broad set of trials and errors made. Such adjustments often require quantifying how many strategies/models/specifications were tried. Since our understanding on various factor premia reflects the gradual collective effort of generations of researchers, this quantification may be infeasible.
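The reliance of such adjustments on a trial count can be illustrated with a Bonferroni correction. This is a minimal sketch only: the excerpt does not name a specific adjustment, Bonferroni is chosen here as one simple and well-known example, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_trials: int) -> float:
    """Per-test significance threshold after a Bonferroni correction.

    n_trials is the number of strategies/models/specifications tried,
    which is exactly the quantity the excerpt says may be unknowable.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    return alpha / n_trials


def is_significant(p_value: float, alpha: float, n_trials: int) -> bool:
    """True if p_value clears the corrected threshold."""
    return p_value <= bonferroni_threshold(alpha, n_trials)


# The same p-value passes or fails depending on the assumed trial count.
print(is_significant(0.01, alpha=0.05, n_trials=1))
print(is_significant(0.01, alpha=0.05, n_trials=100))
```

Because the verdict changes with `n_trials`, the correction is only as good as the estimate of how many attempts were made, which is the difficulty the passage raises.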

Journal of Financial Economics 135(1), 213–230. Jegadeesh, Narasimhan; and Sheridan Titman (1993), “Returns to buying winners and selling losers: Implications for stock market efficiency.” Journal of Finance 48(1), 65–91. Jensen, Theis I.; Bryan Kelly; and Lasse H. Pedersen (2021), “Is there a replication crisis in finance?” NBER working paper 28432. Jones, Brad (2017), “Leaning with the wind: Long-term asset owners and procyclical investing,” Journal of Investment Management 15(2), 16–38. Jones, Robert C.; and Russ Wermers (2011), “Active management in mostly efficient markets,” Financial Analysts Journal 67(6), 29–45.

pages: 848 words: 227,015

On the Edge: The Art of Risking Everything
by Nate Silver
Published 12 Aug 2024

And betting sports—or almost anything else—requires a tolerance for financial swings that isn’t for everyone. But for the most part, I endorse the sentiment. In recent years, researchers have discovered that a large share of experimental results published in academic journals—the majority of results in some fields—can’t be verified when other researchers try to duplicate them. (This is called the replication crisis.) Occasionally, the reason is something like fraud, but more often the issue is just that statistical inference is hard and the pressure to publish is intense. Academics have an incentive to cater to the whims of peer reviewers and department chairs—more so than to be accurate. When you bet, though, all you care about is accuracy.

In a normal distribution, 68 percent, 95 percent, and 99.7 percent of the data, respectively, is within one, two and three standard deviations of the mean, so someone whose IQ is said to be three standard deviations above the mean is very smart indeed. Statistically significant: Unlikely to be due to chance. In classical statistics, it means that the null hypothesis can be rejected with a specified probability, usually 95 percent. The term is falling out of favor in the River as a result of the adaptation of Bayesian statistics and the replication crisis, the failure of many published academic findings using classical statistics to be verified by other researchers. Steam chasing: In sports betting, the practice of following steam—changes in betting prices at bookmakers that you believe reflect the action of sharp bettors but which have not yet been incorporated by other bookmakers.
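The 68, 95 and 99.7 percent figures quoted for the normal distribution can be reproduced from the error function in Python's standard library. This is a small sketch; the helper name `share_within` is an illustrative choice, not something from the book.

```python
import math


def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard
    deviations of the mean, i.e. P(|Z| <= k) for a standard normal Z."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The complement of the three-standard-deviation figure, split evenly between the two tails, is the share of people above that cutoff.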

L., 72–73 Mickelson, Phil, 197n middle (sports betting), 489 Midriver, 21, 489 Miller, Ed, 134–35, 172, 177n, 186 mimetic desire, 330–31, 489 See also conformity Mindlin, Ivan “Doc,” 195 mining (crypto), 489 misclick, 489 misogyny, 68, 118–19 Mitchell, Melanie, 450, 459 mixed strategies, 58, 60, 63, 425–26, 490 Mizuhara, Ippei, 173 model mavericks/model mediators, 446–47, 490 models abstract thinking and, 23 AI existential risk and, 446–48 vs. algorithms, 478 defined, 490 sports betting and, 179–80, 182 Moneyball, 137, 145, 153, 171, 179–80, 489 moneylines (sports betting), 183, 490 Moneymaker, Chris, 12, 43, 68, 493 Monnette, John, 103–4 moral hazard, 30, 261, 490 moral philosophy consequentialism, 359, 481, 533n deontology, 359, 368, 481, 482 game theory and, 367–68 impartiality, 358–59, 360–61, 366–67, 368, 377, 487, 533n, 538n modern value proposal, 469–72 moral parliament, 364, 470 overfitting/underfitting and, 362–68 rationality, 372–73, 495 River-Village conflict and, 30–31 See also effective altruism; rationalism; utilitarianism Morgenstern, Oskar, 22, 50–51 Moritz, Michael, 247, 248, 258, 259, 265–66, 271 Moskovitz, Dustin, 338–39 Motte-and-bailey fallacy, 490 “move fast and break things,” 250, 270, 419, 490 Mowshowitz, Zvi, 370 Murray, John, 174, 177, 208 Musk, Elon AI existential risk and, 406n, 416 Sam Altman and, 406 autism and, 282, 284 competitiveness and, 25–26 cryptocurrency and, 314–15 cults of personality and, 31 culture wars and, 29 effective altruism and, 344 luck and, 278, 280 megalothymia and, 468 OpenAI founding and, 406 poker and, 251 politics and, 267n resentment and, 277, 278 risk tolerance and, 229, 247–48, 251, 252, 264–65, 299 River and, 299 River-Village conflict and, 26–27, 267n, 295 secular stagnation and, 467 mutually assured destruction (MAD), 58, 421, 424–27, 488, 490 N Nakamoto, Satoshi, 322–23, 496 narcissism, 274–75 Nash equilibrium defined, 47, 490 dominant strategies and, 55 everyday randomization and, 64 in poker, 57–58, 
60, 61, 62 prisoner’s dilemma as, 54 reciprocity and, 471 in sports betting, 58–60, 508n Negreanu, Daniel, 48–49, 66–67, 99, 100, 239, 508n nerd-sniping, 490 networking, 191, 197, 333 Neumann, Adam, 30, 281, 282, 283 neural net, 433–34, 490 New York Times, The, 27, 295 Neymar, 18, 82–83 NFTs, 325–26 apeing, 480 Bored Apes, 480 bubble in, 311, 312 DAOs and, 307 defined, 325, 490 focal points and, 330–34 profitability of, 331–32, 530n nits (gambling), 9, 114, 482, 490 Nitsche, Dominik, 49 nodes, 490 normal distribution, 491 nosebleed gambling, 491 NOT INVESTMENT ADVICE, 491 Noyce, Robert, 257 NPC (nonplayer character) syndrome, 378–79, 490 nuclear existential risk, 407, 420–30 Bayesian reasoning on, 423 game theory and, 58, 328, 420–21, 424, 426, 483 Kelly criterion and, 408–9 mutually assured destruction and, 58, 421, 424–27, 488, 490 nuclear proliferation and, 421, 540n odds of, 422–24 rationality and, 427–28 societal institutions and, 250, 456 stability-instability paradox and, 425 technological Richter scale and, 449 nuts (poker), 491 O Obama, Barack, 267 Occam’s razor, 491 Ocean’s 11, 142 Ohtani, Shohai, 173 Old Man Coffee (OMC), 491 O’Leary, Kevin, 301 “Ones Who Walk Away from Omelas, The” (Le Guin), 454n OpenAI AI breakthrough and, 415 attempt to fire Altman, 408, 411, 452n founding of, 406–7, 414 River-Village conflict and, 27 Oppenheimer, Robert, 407, 421, 425 optimism, 407–8, 413–14, 539n See also “Techno-Optimist Manifesto” optionality, 76–77, 99n, 116, 470, 491 options trading, 318–21 Ord, Toby, 352, 369–70, 380, 443 order of magnitude, 491 originating (sports betting), 491 orthogonality thesis, 418, 491 Oster, Emily, 348–49 outliers, 491 outs (poker), 491 outside view, 492 overbet (poker), 492 overdetermined, 492 overfitting/underfitting, 361, 361, 362–68, 492 P p(doom), 369, 372, 375–76, 380, 401, 408–9, 412, 416, 419, 442, 444–46, 455, 458, 492 See also existential risk Page, Larry, 259, 406 Palihapitiya, Chamath, 272, 277, 280 paper clip thought 
experiment, 372, 402, 418, 442, 487, 491 parameters, 491 Pareto optimal solutions, 492 Parfit, Derek, 364–65, 443–44n, 495 parlay, 492 Pascal, Blaise, 22, 457n, 492 Pascal’s Mugging, 22, 457n, 492 Pascal’s Wager, 457n, 492 patience, 258, 259, 260 payoff matrix, 492 Peabody, Rufus, 178–80, 181, 182–83, 191, 193, 195, 204, 517n Pepe, 492 Perkins, Bill, 374–75 Persinger, LoriAnn, 118 Petrov, Stanislav, 424, 426 p-hacking, 492 physical risk-takers, 217–21 Piper, Kelsey, 505n pips, 492 pit boss, 493 pits, 493 See also table games plurality, 470–71, 493 plus EV, 493 See also EV maximizing pocket pair, 41, 493 point-spread betting, 183–84, 493 poker abstract thinking and, 23–24 abuse and, 118 AI and, 40, 46–48, 60–61, 430–33, 437, 439, 507n asymmetric odds and, 248–49 attention to detail and, 233–34 bluffing, 39–40, 51, 70–75, 77, 78, 101, 509n calmness and, 221 cheating, 84, 85–86, 123–24, 126–28, 512n competitiveness and, 112, 118, 120, 243 concrete learning and, 432 corporatization of, 43–44 courage and, 222–23 deception and, 60 degens and nits, 9, 114, 482 edge and, 22, 63, 86 effective altruism and, 347–48, 367 estimation ability and, 237–38 fictional portrayals of, 45, 112, 134, 333, 487 game theory development and, 22, 50–51 game trees in, 61, 508n Garrett-Robbi hand, 80–86, 89, 117, 123–29, 130, 444–45, 512n gender and, 70, 82, 84, 100, 117–19, 511n Hellmuth’s career, 97–100 high-stakes cash games, 83–84, 115, 251–52 innovations in, 45–46 lack of money drive and, 243 language and, 439–40 mixed strategies and, 60, 63, 425–26 models in, 23–24 money and, 108–11, 120–21, 511n Elon Musk’s strategy, 251 origins of, 40 personality and, 111–17, 129–30 PokerGO studio, 48–49 post-oak bluffing, 64–65 prediction markets and, 370–71 preparation and, 233 prisoner’s dilemma and, 56–57, 508n privilege and, 82–83, 120–21 probabilistic thinking and, 41, 104–5, 127, 154n, 237 process-oriented thinking and, 226–27 race and, 118, 120, 121–22 raise-or-fold attitude and, 229–30 
randomization and, 57–58, 63 regulation of, 13 scientific approach to, 41, 42–43 strategic empathy and, 225 tells, 7–8, 88, 99–104, 118, 233–34, 238, 437, 498 tournaments, 6, 7–8, 56, 154n, 503n variance and, 105, 106–11, 112 See also exploitative strategies; risk impact; solvers (poker); World Series of Poker Poker Boom (2004–2007), 12–13, 68, 315, 493 PokerGO studio, 48–49, 73, 77 polarized vs. condensed ranges (poker), 493 politics, 14–17 AI existential risk and, 458, 541n analytics and, 254 contrarianism and, 242, 254n decoupling and, 25, 27 effective altruism/rationalism and, 377–78 election forecasting, 13–14, 16–17, 27, 137, 182n, 433, 448n EV maximizing and, 14–15 expertise and, 272 gambling and, 17, 504n NFTs and, 326 prediction markets and, 373, 374–75, 535n probabilistic thinking and, 15, 17 reference classes and, 448n River-Village conflict and, 27–28, 29, 30, 267–68, 271, 505–6n SBF and, 26, 341n, 342 Village and, 26, 267–68, 271 Polk, Doug, 65–67 polymaths, 493 See also fox/hedgehog model Ponzi schemes, 309, 337, 493 Population Bomb, The (Ehrlich), 412n, 463 Porter, Jontay, 173, 177 position (poker), 493 posthumanism, 499 Postle, Mike, 84 post-oak bluffing, 64–65 pot-committed (poker), 493 Pot-Limit Omaha (PLO), 487, 493 Poundstone, William, 396 Power Law, The (Mallaby), 286 precautionary principle, 493 Precipice, The (Ord), 369–70, 443 prediction markets, 369–75, 380, 493, 535n preflop (poker), 41, 493 preparation, 232–33 price discovery, 493 priors, 493–94; see also Bayes’ theorem prisoner’s dilemma, 52–57 AI existential risk and, 417 arms race and, 478 cryptocurrency and, 315–18 defined, 494, 507–8n dominant strategies and, 54–55 poker and, 56–57, 508n reciprocity and, 367–68 regulation and, 144 sports betting and, 205 trust and, 472 updated version of, 52–54, 53 probabilistic thinking AI and, 439 AI existential risk and, 445–46 asymmetric odds and, 255 vs. 
determinism, 253–55, 264, 482 distribution, 9, 491 effective altruism and, 367 importance of, 15–16 poker and, 41, 104–5, 127, 154n, 237 politics and, 15, 17 prediction markets, 369–75, 493, 535n slots and, 153–55, 155 sports betting and, 16–17 theory invention, 22 See also EV maximizing probability distribution, 494 process-oriented thinking, 180, 226–27, 495 Professional Blackjack (Wong), 136 progress studies, 494 prop bets, 180, 182–83, 494 prospect theory, 428n, 494 provenance, 494 public (sports betting), 494 pump-and-dump, 494 punt (poker), 494 pure strategy, 59, 494 push (sports betting), 494 pushing the button, 494 See also existential risk Putin, Vladimir, 421–22, 424, 425 put options, 480 Q quantification, 345–51, 352, 359–60, 364 quants, 494 quantum mechanics, 253n Quit (Duke), 90, 230 Qureshi, Haseeb, 338 R Rabois, Keith, 284–85, 286–87 race casinos and, 135–36, 513n poker and, 118, 120, 121–22 River and, 29, 506n VC discrimination and, 287–90 Rain Man, 136 raise-or-fold situation, 229–31, 494 rake (casino poker), 494 Ralston, Jon, 147 randomization, 57–58, 59–60, 63, 64, 426, 438, 494 See also variance range (poker), 494 rationalism AI existential risk and, 21, 457 defined, 352–53, 354, 495 effective hedonism and, 376 futurism and, 379 impartiality and, 377 politics and, 17, 377–78 prediction markets and, 369, 372–73, 380 River and, 343 tech sector and, 21 Upriver and, 20 utilitarianism and, 364 varying streams of, 355–56, 356, 380–81, 533n wealthy elites and, 344 rationality, 17, 54, 372–73, 427–28, 495 Rawls, John, 364 Ray, John J., III, 301–2, 303 rec (recreational) players, 495 reciprocity, 130, 367–68, 471–72, 495 reference classes, 448, 450, 452, 457, 495 regression analysis, 23, 495 regulation AI, 270, 458, 541n casinos, 134, 135, 143–44, 157, 513n, 514n poker and, 13 River-Village conflict and, 31 Silicon Valley, 269–70, 272 regulatory capture, 31, 269, 270, 495 reinforcement learning from human feedback (RLHF), 440–41, 442, 495 Reinkemeier, 
Tobias, 102–3 replication crisis, 179, 497 Repugnant Conclusion, 364–65, 403, 495 resilience, 116–17 results-oriented thinking, 495 retail bookmakers, 186–90, 187, 489, 518n return on investment (ROI), 477, 495 revealed preference, 495 revenge, 428–30 Rhodes, Richard, 418n, 496 risk aversion, 137, 268, 277, 427–28, 495 risk ignorance, 247–48, 264, 265, 266 risk impact, 87–97 attention to detail and, 235 Coates on, 89–91, 125 flow state and, 88, 93–94, 95, 126 Garrett-Robbi hand and, 125–26 sports and, 94 tells and, 88 Tendler on, 91–94, 125 risk-loving disposition, 495 risk-neutral disposition, 495 risk of ruin, 495 risk-taker attributes, 23–26, 217–18, 221–43 adaptability, 235–37 asymmetric odds, 248, 259, 260–62 attention to detail, 233–35 calmness, 221–22 courage, 222–24 estimation ability, 237–38 fragile ego, 223 independence, 31, 239–40, 249, 268 lack of money drive, 242–43 patience, 258, 259–60, 260 preparation, 232–33 process-oriented thinking, 226–27, 495 raise-or-fold attitude, 229–31 resentment and, 223, 277 risk tolerance, 26, 30, 227–29 strategic empathy, 224–25 venture capital and, 248–49 See also contrarianism risk tolerance consequences and, 30 COVID-19 and, 6–7, 8–9, 10, 10 decision science on, 427–28 degens and nits, 9, 114, 482 founders and, 247–48, 251, 252, 264–65, 337, 403 gender and, 120 insufficiency of, 90 life expectancy and, 10–11 luck and, 116 Elon Musk and, 229, 247–48, 251, 252, 264–65, 299 poker and, 113–14 as River attribute, 26, 30, 227–29 River-Village conflict and, 29, 30 SBF and, 334–35, 397–403, 537–38n slots and, 168 sports betting and, 179, 196 statistical distribution and, 9 table games and, 165–66 venture capital and, 249, 264 Village and, 137 See also physical risk-takers River, the Archipelago, 22, 310, 478 autism and, 282–84, 525n collegiality within, 249–50 concrete learning and, 432n cultural domination of, 137–38 decoupling and, 24–25, 26, 27, 352, 505n defined, 495 demographics of, 29, 506n effective altruism and, 343 
fictional portrayals of, 112 gender and, 29, 117, 506n Las Vegas veneration of, 139 map of, 18, 19, 20–26 megalothymia and, 468 name of, 18, 42, 504n obsession and, 196 prediction markets and, 371–72, 493 process-oriented thinking and, 495 quantification and, 352 race and, 29, 506n rationalism and, 343 SBF’s presence in, 299 self-awareness and, 417 venture capital and, 249–50 See also risk-taker attributes; River-Village conflict river (poker), 42, 495 River-Village conflict, 26–31 culture wars and, 29, 272–73 decoupling and, 27, 482 higher education and, 294–96 moral hazard and, 30 moral philosophy and, 30–31 politics and, 27–28, 29, 30, 267–68, 271, 505–6n regulatory capture and, 31, 269 risk aversion and, 493 Silicon Valley and, 26, 267–75, 290, 295, 505n RLHF (reinforcement learning from human feedback), 440–41, 442, 495 Robins, Jason, 184, 186 robustness, 495 Rock, Arthur, 257, 296 rock paper scissors, 47, 58 Roffman, Marvin, 151 Rogers, Kenny, 229 roon, 410–13, 417, 442, 443, 452, 459–60, 501, 539n Rounders, 45, 112, 134, 333, 487 Rousseau, Jean-Jacques, 54 Roxborough, Roxy, 178 rug pull (crypto), 496 rule utilitarianism, 368, 500 Rumbolz, Mike, 138, 142, 153, 167, 186 running good/rungood, 496 Russell, Stuart, 441 Russian roulette, 496 r/wallstreetbets, 314–15, 317–18, 321, 411, 489, 496 Ryder, Nick, 415n, 430–31, 433, 479 S Sagan, Scott, 425, 426 Sagbigsal, Bryan, 127 Saltz, Jerry, 329, 331n, 484 sample size, 496 sampling error, 489 Sassoon, Danielle, 401 Satoshi (cryptocurrency), 496 SBF.

pages: 281 words: 79,464

Against Empathy: The Case for Rational Compassion
by Paul Bloom

Doris, Talking to Our Selves: Reflection, Ignorance, and Agency (Oxford: Oxford University Press, 2015). 223 Jonathan Haidt captures Jonathan Haidt, “The Emotional Dog and Its Rational Tail: A Social Intuitionist Approach to Moral Judgment,” Psychological Review 108 (2001): 814–34. The issue in “repligate” For discussion, see Paul Bloom, “Psychology’s Replication Crisis Has a Silver Lining,” The Atlantic, February 19, 2016, http://www.theatlantic.com/science/archive/2016/02/psychology-studies-replicate/468537. 224 eventually published this failure Brian D. Earp et al., “Out, Damned Spot: Can the ‘Macbeth Effect’ Be Replicated?” Basic and Applied Social Psychology 36 (2014): 91–98.

Know Thyself
by Stephen M Fleming
Published 27 Apr 2021

In a 2015 study, it was found that out of one hundred textbook findings in academic psychology, only thirty-nine could be successfully reproduced. A more recent replication of twenty-one high-profile studies published in Science and Nature found a slightly better return—62 percent—but one that should still be concerning to social scientists wishing to build on the latest findings. This replication crisis takes on deeper meaning for students just embarking on their PhDs, who are often tasked with repeating a key study from another lab as a jumping-off point for new experiments. When apparently solid findings begin to disintegrate, months and even years can be wasted chasing down results that don’t exist.

pages: 404 words: 92,713

The Art of Statistics: How to Learn From Data
by David Spiegelhalter
Published 2 Sep 2019

This has led to doubts about the reliability of parts of the scientific literature, with claims that many ‘discoveries’ cannot be reproduced by other researchers—such as the continuing dispute over whether adopting an assertive posture popularly known as a ‘power pose’ can induce hormonal and other changes.8 The inappropriate use of standard statistical methods has received a fair share of the blame for what has become known as the reproducibility or replication crisis in science. With the growing availability of massive data sets and user-friendly analysis software, it might be thought that there is less need for training in statistical methods. This would be naïve in the extreme. Far from freeing us from the need for statistical skills, bigger data and the rise in the number and complexity of scientific studies makes it even more difficult to draw appropriate conclusions.

pages: 289 words: 92,714

The Rationalist's Guide to the Galaxy: Superintelligent AI and the Geeks Who Are Trying to Save Humanity's Future
by Tom Chivers
Published 12 Jun 2019

Much of our understanding of them comes from the work of Daniel Kahneman and Amos Tversky, a pair of Israeli psychologists who did a series of groundbreaking experiments in the 1970s, although many other psychologists have worked on them since. (A worthwhile caveat to mention at this point: since Kahneman and Tversky did their work, and since Kahneman’s book Thinking, Fast and Slow made it especially famous in 2011, psychology in particular and science in general has been wracked by the ‘replication crisis’, in which many high-profile studies have turned out to be untrustworthy. Most of the stuff Kahneman and Tversky talked about is, I think, pretty robust, but it’s just worth taking everything in psychology with a pinch of salt at this point.) The Rationalists are extremely interested in all this.

pages: 375 words: 102,166

The Genetic Lottery: Why DNA Matters for Social Equality
by Kathryn Paige Harden
Published 20 Sep 2021

Certainly, there are misuses of genetic data that need to be guarded against, which I will return to in chapter 12. But while researchers might have good intentions, the widespread practice of ignoring genetics in social science research has significant costs. In the past few years, the field of psychology has been rocked by a “replication crisis,” in which it has become clear that many of the field’s splashy findings, published in the top journals, could not be reproduced and are likely to be false. Writing about the methodological practices that led to the mass production of illusory findings (practices known as “p-hacking”), the psychologist Joseph Simmons and his colleagues wrote that “everyone knew [p-hacking] was wrong, but they thought it was wrong the way it is wrong to jaywalk.”

Calling Bullshit: The Art of Scepticism in a Data-Driven World
by Jevin D. West and Carl T. Bergstrom
Published 3 Aug 2020

If researchers have faked their data, we wouldn’t expect to be able to replicate their experiments. Fraud generates enormous publicity, which can give a misleading impression of its frequency.*2 But outright fraud is rare. It might explain why one study in a thousand can’t be replicated. It doesn’t explain why half of the results in fields are irreproducible. How then do we explain the replication crisis? To answer this, it is helpful to take a detour and look at a statistic known as a p-value. THE PROSECUTOR’S FALLACY As we’ve seen, most scientific studies look to patterns in data to make inferences about the world. But how can we distinguish a pattern from random noise? And how do we quantify how strong a particular pattern is?

pages: 338 words: 104,815

Nobody's Fool: Why We Get Taken in and What We Can Do About It
by Daniel Simons and Christopher Chabris
Published 10 Jul 2023

Bhattacharjee, “The Mind of a Con Man,” New York Times Magazine, April 26, 2013 [https://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html]. Psychologist Yoel Inbar described his firsthand knowledge of the Stapel case in an episode of the podcast Two Psychologists, Four Beers titled “The Replication Crisis Gets Personal” [https://www.fourbeers.com/4]. Stapel’s memoir, Faking Science: A True Story of Academic Fraud, was published in Dutch in 2012 and translated into English by Nicholas J. L. Brown in 2016 [http://nick.brown.free.fr/stapel/FakingScience-20161115.pdf]. 24. Wansink study: B. Wansink, D.

Visual Thinking: The Hidden Gifts of People Who Think in Pictures, Patterns, and Abstractions
by Temple Grandin, Ph.d.
Published 11 Oct 2022

With all of the effort to replicate the experiment, nobody thought to ask about the stirring method. Most of the errors in the findings can be traced to poorly described methods, which make it difficult for another scientist to accurately replicate an experiment. These are the kinds of details that jump out at visual thinkers. We are in the middle of a replication crisis in biomedical research. In the last few years, the number of studies that have been retracted from the scientific literature has increased significantly. Massive pressure on researchers to publish in order to keep the grant money flowing is largely responsible. A review of the literature by Elisabeth M.

User Friendly: How the Hidden Rules of Design Are Changing the Way We Live, Work & Play
by Cliff Kuang and Robert Fabricant
Published 7 Nov 2019

Descartes imagined that he was in thrall to some demon who held him asleep in a dream, controlling everything he experienced. Modern philosophers call this the “brain in a vat” thought experiment; you might also imagine The Matrix. 32. Some experiments in the field of embodied cognition have come under scrutiny in the broader psychological community, owing to the so-called replication crisis that has rocked the entire profession. But “grounded cognition” remains a live vein of research. 33. Samuel McNerney, “A Brief Guide to Embodied Cognition: Why You Are Not Your Brain,” Scientific American, November 4, 2011, https://blogs.scientificamerican.com/guest-blog/a-brief-guide-to-embodied-cognition-why-you-are-not-your-brain. 34.

pages: 314 words: 122,534

The Missing Billionaires: A Guide to Better Financial Decisions
by Victor Haghani and James White
Published 27 Aug 2023

This proliferation of factors, now in the hundreds, has given birth to the term “factor zoo” in reference to the overall literature.12 The same phenomenon magnified 10‐fold is seen in the explosion in the number of ETFs and indexes on which they are based.13 Adding to the doubt and confusion over the factor zoo's quickly expanding size is the well‐publicized “replication crisis” in social science research, referring to the realization that many academic papers in economics and other social sciences report findings that other researchers are unable to reproduce.14 While data mining concerns cast doubt on the authenticity of the factor zoo's exotic members more than on the core factors, another data issue impacts all the factors.

pages: 530 words: 147,851

Small Men on the Wrong Side of History: The Decline, Fall and Unlikely Return of Conservatism
by Ed West
Published 19 Mar 2020

They look at their political opponents, who in their heads think of themselves as being on the side of Galileo and Darwin against the bigoted establishment, as more resembling a modern-day Inquisition, ruthless in enforcing orthodoxy wherever they can. Likewise many of the establishment beliefs trotted out in top-selling pop psychology books and repeated at TED Talks and then parroted in high-status postcodes turn out to be rubbish. Among the theories that have crumbled from psychology’s ‘replication crisis’ of the 2010s is ‘stereotype threat’, the idea that preconceived beliefs about people become self-fulfilling prophecies and affect their outcomes. Stereotype threat explains that there are fewer women than men at the top of maths and science-based professions because they are put off by the perception that men are better, an idea so comforting that one 1995 paper has been cited over five thousand times.

Blueprint: The Evolutionary Origins of a Good Society
by Nicholas A. Christakis
Published 26 Mar 2019

Heisenberg also argued that positivists could actually undermine their own program and illustrated this with an example from the history of science in which claims of meteorites in the eighteenth century were “dismissed as rank superstition,” whereas, of course, they do exist. 37. D. Kevles, In the Name of Eugenics: Genetics and the Uses of Human Heredity (New York: Knopf, 1985); R. Merton, The Sociology of Science: Theoretical and Empirical Investigations (Chicago: University of Chicago Press, 1973). We should also be cognizant of the “replication crisis” afflicting so many branches of science in the 2010s, including psychology, economics, physics, biology, epidemiology, and oncology. 38. P. W. Anderson, “More Is Different,” Science 177 (1972): 393–396. 39. Interestingly, humans are natural-born essentialists. From an early age, we categorize objects according to fundamental commonalities, discriminate between these categories, and assign each category a basic essence.

pages: 741 words: 199,502

Human Diversity: The Biology of Gender, Race, and Class
by Charles Murray
Published 28 Jan 2020

You can square that figure and point out that IQ explains only 25 percent of the variance in job performance. If you’re an employer, however, and are told that a standard deviation increase in IQ is associated with half a standard deviation increase in overall job performance, a predictive validity of +.50 is a big deal.18 Since we live in an age when the social sciences are suffering from a replication crisis, I emphasize again that the generalizations I have made about the relationships of g to educational attainment and job productivity are drawn from hundreds of studies. Psychometric g Versus Other Personal Traits The popular suspicion of IQ’s relationship to success has been tenacious, but for an understandable reason.
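The two readings of a +.50 predictive validity in the passage above can be checked with a few lines of Python. This is a minimal sketch that assumes the simple one-predictor case, where the validity is a Pearson correlation and both variables are measured in standard deviation units; the function names are illustrative and not from the book.

```python
def variance_explained(r: float) -> float:
    """Share of variance in the outcome accounted for by the predictor."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


def predicted_z(r: float, predictor_z: float) -> float:
    """Predicted outcome, in standard deviations, for a predictor value
    given in standard deviations (standardized simple regression)."""
    return r * predictor_z


r = 0.50
print(variance_explained(r))   # the "only 25 percent" reading
print(predicted_z(r, 1.0))     # the "half a standard deviation" reading
```

Both numbers come from the same correlation, which is why the passage treats them as two framings of one result.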