p-hacking


25 results

Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth
by Stuart Ritchie
Published 20 Jul 2020

Andrew Gelman & Eric Loken, ‘The Garden of Forking Paths: Why Multiple Comparisons can be a Problem, Even When There is no “Fishing Expedition” or “p-Hacking” and the Research Hypothesis was Posited Ahead of Time’, unpublished, 4 Nov. 2013; http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf. And Jorge Luis Borges, ‘The Garden of Forking Paths’, Labyrinths, tr. Donald A. Yates (New York: New Directions, 1962, 1964).

73. This framing of the p-hacking problem is due to Yarkoni & Westfall, who call p-hacking ‘procedural overfitting’: Yarkoni & Westfall, ‘Choosing Prediction’, p. 1103.

74. Roger Giner-Sorolla, ‘Science or Art?

By openly admitting to dredging through his datasets looking for anything that was ‘significant’, Wansink had unintentionally revealed a major flaw in the way he, and unfortunately many thousands of other scientists, conducted research. That flaw has been dubbed ‘p-hacking’.47 Because the p < 0.05 criterion is so important for getting papers published – after all, it supposedly signals a real effect – scientists whose studies show ambiguous or disappointing results regularly use practices that ever so slightly nudge, or hack, their p-values below that crucial threshold. Such p-hacking comes in two main flavours. In one, scientists pursuing a particular hypothesis run and re-run and re-re-run their analysis of an experiment, each time in a marginally different way, until chance eventually grants them a p-value below 0.05.
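
To make that first flavour concrete, here is a minimal Python sketch (an editorial illustration, not code from Ritchie’s book; the helper p_value_for_difference and the subgroup-cutoff scheme are invented for the example). The outcome data are pure noise, yet re-running the same comparison across many marginally different subgroup splits will often turn up at least one p-value under 0.05:

    import math
    import random

    def p_value_for_difference(group_a, group_b):
        """Crude two-sample z-test on group means; returns a two-sided p-value."""
        n_a, n_b = len(group_a), len(group_b)
        mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
        var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
        var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
        z = (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)
        return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

    random.seed(42)
    # One null dataset: 200 'participants' with a random covariate x
    # and an outcome y that depends on nothing at all.
    data = [(random.random(), random.gauss(0, 1)) for _ in range(200)]

    # Nineteen marginally different ways to split the sample in two.
    results = []
    for cutoff in [i / 20 for i in range(1, 20)]:
        a = [y for x, y in data if x < cutoff]
        b = [y for x, y in data if x >= cutoff]
        results.append((p_value_for_difference(a, b), cutoff))

    best_p, best_cutoff = min(results)
    print(f"best of 19 analyses: p = {best_p:.3f} at cutoff {best_cutoff:.2f}")

Each individual split is a legitimate-looking analysis; reporting only the best of the nineteen is what makes the ‘finding’ spurious.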

The scientist can then declare, often perhaps convincing even themselves, that they’d been searching for these results from the start.49 This latter type of p-hacking is known as HARKing, or Hypothesising After the Results are Known. It’s nicely summed up by the oft-repeated analogy of the ‘Texas sharpshooter’, who takes his revolver and randomly riddles the side of a barn with gunshots, then swaggers over to paint a bullseye around the few bullet holes that happen to be near to one another, claiming that’s where he was aiming all along.50 Both kinds of p-hacking are instances of the same mistake and ironically, it’s precisely the one that p-values were invented to avoid: capitalising on random chance.

The Ethical Algorithm: The Science of Socially Aware Algorithm Design
by Michael Kearns and Aaron Roth
Published 3 Oct 2019

But the study failed to be replicated, and “power poses” became the poster child for the reproducibility crisis and the dangers of p-hacking. “We realized entire literatures could be false positives,” [prominent p-hacking critic Joe] Simmons says. They had collaborated with enough other researchers to recognize that the practice was widespread and counted themselves among the guilty. The famous priming studies we discussed also failed to hold up to scrutiny. And food science has been under suspicion for years. A notable p-hacking scandal rocked the food science community in 2017. In this instance the principal researcher in question, a celebrated Cornell professor named Brian Wansink, seemed to actively embrace p-hacking as a means of generating results.

But in a more typical situation, sequential decisions are being made at least in part by human beings, and it isn’t possible to reason about the myriad counterfactuals: What would I have done in every circumstance had my analyses come out differently? This is why making decisions based on the data has classically been viewed with skepticism, going by various derogatory names including “data snooping,” “data dredging,” and—as we have seen—“p-hacking.” The methodological dangers presented by the combination of algorithmic and human p-hacking have generated acrimonious controversies and hand-wringing over scientific findings that don’t reflect reality. These play a central role in what is broadly referred to as the “reproducibility crisis” in science, which has its own Wikipedia page, which begins: The replication crisis (or replicability crisis or reproducibility crisis) is an ongoing (2019) methodological crisis in science in which scholars have found that the results of many scientific studies are difficult or impossible to replicate or reproduce on subsequent investigation, either by independent researchers or by the original researchers themselves.

The crisis has long-standing roots; the phrase was coined in the early 2010s as part of a growing awareness of the problem. While p-hacking is not the only culprit here—poor study design, sloppy experimental technique, and even occasionally outright fraud and deception are present—the concern is that even well-intentioned data-driven scientific exploration can lead to false findings that cannot be reproduced on fresh data or in fresh experiments. Among the more prominent examples associated with p-hacking was the controversial research on “power poses” that we discussed earlier. Here is how it was described in the New York Times Magazine in late 2017: The study found that subjects who were directed to stand or sit in certain positions—legs astride, or feet up on a desk—reported stronger “feelings of power” after posing than they did before.

pages: 172 words: 51,837

How to Read Numbers: A Guide to Statistics in the News (And Knowing When to Trust Them)
by Tom Chivers and David Chivers
Published 18 Mar 2021

Uncovering Wansink’s behaviour took months of digging by conscientious, statistically minded researchers and an experienced science journalist. Most of the time, journalists writing about science are writing quick news stories from press releases. They are not going to be able to spot p-hacking even if they have the dataset, which they usually don’t. And p-hacked studies have an unfair advantage: because they don’t need to be true, it’s easier to get them to be exciting. So they tend to turn up in the news a lot. There’s no easy way for readers to spot this in news stories. But it’s worth being aware that just because something is ‘statistically significant’, it doesn’t mean that it’s actually significant, or even that it’s true.

She and Wansink published five different papers from that dataset, including the ‘men eat to impress women’ study. In it, they found a p-value of 0.02 for men eating more pizza around women, and 0.04 for salad. But that blog post raised red flags with scientists. Behaviour like this is known as ‘p-hacking’, massaging the data to get your p-value to a publishable below-0.05 figure. Methodologically savvy researchers started to go through all Wansink’s old work, and a source leaked his emails to Stephanie M. Lee, an investigative science journalist at BuzzFeed News. It transpired that he had asked his PhD student to break up the data into ‘males, females, lunch goers, dinner goers, people sitting alone, people eating with groups of 2, people eating in groups of 2+, people who order alcohol, people who order soft drinks, people who sit close to buffet, people who sit far away, and so on’.9 Other methodological problems were found with Wansink’s old papers, and other emails revealed shoddy statistical practice – in one, he suggests that ‘we should be able to get much more from this … I think it would be good to mine it for significance and a good story.’10 He wanted the research to ‘go virally big time’.

This was a dramatic example. But p-hacking – in less dramatic forms – goes on all the time. It is usually innocent. Academics desperate to get p<0.05, so they can get their paper published, will rerun a trial, or reanalyse the data. You might have heard of the ‘replication crisis’, in which lots of important findings in psychology and other disciplines have turned out not to exist when other scientists tried to replicate their findings.

pages: 301 words: 85,263

New Dark Age: Technology and the End of the Future
by James Bridle
Published 18 Jun 2018

It’s easy to calculate and it’s easy to read, meaning that more and more journals use it as shorthand for reliability when sifting through potentially thousands of submissions. Moreover, p-hacking doesn’t just depend on getting those serendipitous results and running with them. Instead, researchers can comb through vast amounts of data to find the results they need. Say that instead of rolling ten green dice, I also rolled ten blue ones, ten yellow ones, ten red ones, and so on. I could roll fifty different colours, and most of them would come out close to the average. But the more I rolled, the more likely I would be to get an anomalous result – and this is the one I could publish. This practice has given p-hacking another name: data dredging. Data dredging has become particularly notorious in the social sciences, where social media and other sources of big behavioural data have suddenly and vastly increased the amount of information available to researchers.
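
Bridle’s dice experiment is easy to simulate. In this sketch (my own illustration, not from the book; the rejection region of 25–45 for the sum of ten fair dice approximates a two-sided 5% cutoff around the expected total of 35, not an exact one), most colours land near the average, but with fifty colours a few will look anomalous purely by chance:

    import random

    random.seed(0)

    def roll_ten_dice():
        """Total of ten fair six-sided dice; expected value 35."""
        return sum(random.randint(1, 6) for _ in range(10))

    colours = 50
    totals = {c: roll_ten_dice() for c in range(colours)}
    # Flag totals outside roughly the central 95% of the null distribution.
    anomalous = {c: t for c, t in totals.items() if not 25 <= t <= 45}
    print(f"{len(anomalous)} of {colours} colours look 'loaded' by chance alone")

Publishing only the anomalous colour, and never mentioning the other forty-nine sets of rolls, is exactly the dredging the passage describes.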

But the pervasiveness of p-hacking isn’t limited to the social sciences. A comprehensive analysis of 100,000 open access papers in 2015 found evidence of p-hacking across multiple disciplines.16 The researchers mined the papers for every p-value they could find, and they discovered that the vast majority just scraped under the 0.05 boundary – evidence, they said, that many scientists were adjusting their experimental designs, data sets, or statistical methods in order to get a result that crossed the significance threshold.

‘People Who Mattered 2014’, Time, December 2014, time.com.

13. Yudhijit Bhattacharjee, ‘The Mind of a Con Man’, New York Times, April 26, 2013, nytimes.com.

14. Monya Baker, ‘1,500 scientists lift the lid on reproducibility’, Nature, May 25, 2016, nature.com.

15. For more on the math of this experiment, see Jean-Francois Puget, ‘Green dice are loaded (welcome to p-hacking)’, IBM developerWorks blog entry, March 22, 2016, ibm.com.

16. M. L. Head, et al., ‘The Extent and Consequences of P-Hacking in Science’, PLOS Biology 13:3 (2015).

17. John P. A. Ioannidis, ‘Why Most Published Research Findings Are False’, PLOS Medicine 2:8 (August 2005).

18. Derek J. de Solla Price, Little Science, Big Science, New York: Columbia University Press, 1963.

19. Siebert, Machesky, and Insall, ‘Overflow in science and its implications for trust’, eLife 14 (September 2015), ncbi.nlm.nih.gov.

20. Ibid.

21. Michael Eisen, ‘Peer review is f***ed up – let’s fix it’, personal blog entry, October 28, 2011, michaeleisen.org.

22. Emily Singer, ‘Biology’s big problem: There’s too much data to handle’, Wired, October 11, 2013, wired.com.

23. Lisa Grossman and Maggie McKee, ‘Is the LHC throwing away too much data?’

Calling Bullshit: The Art of Scepticism in a Data-Driven World
by Jevin D. West and Carl T. Bergstrom
Published 3 Aug 2020

Under those circumstances it may feel easy to rationalize the latter approach. “I’m sure the trend is really there,” you might tell yourself. “I was thinking about omitting women from the study right from the start.” Congratulations. You’ve just p-hacked your study.*9 * * * — IMAGINE A THOUSAND RESEARCHERS of unimpeachable integrity, all of whom refuse to p-hack under any circumstances. These virtuous scholars test a thousand hypotheses about relationships between political victories and analgesic use, all of which are false. Simply by chance, roughly fifty of these hypotheses will be statistically supported at the p = 0.05 level.

Each individual test then requires stronger evidence to be considered significant, so that there is roughly a one-in-twenty chance that any of the tested hypotheses would appear significant if the null hypothesis were true. *9 To illustrate how powerful p-hacking techniques can be, Joseph Simmons and colleagues Leif Nelson and Uri Simonsohn tested a pair of hypotheses they were pretty sure were untrue. One was an unlikely hypothesis; the other was impossible. The unlikely hypothesis was that listening to children’s music makes people feel older than they really are. Volunteers listened to either a children’s song or a control song, and later were asked how old they felt. With a bit of p-hacking, the researchers concluded that listening to a children’s song makes people feel older, with statistical significance at the p < 0.05 level.
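
The correction described at the start of that footnote is, in effect, a Bonferroni correction: divide the 0.05 threshold by the number of tests. A small simulation (my sketch, not the book’s; it assumes p-values that are uniform under a true null hypothesis) shows why it works:

    import random

    random.seed(1)

    def null_p_value():
        """Under a true null hypothesis, p-values are uniform on [0, 1]."""
        return random.random()

    num_tests, alpha, trials = 20, 0.05, 10_000

    naive = sum(
        any(null_p_value() < alpha for _ in range(num_tests))
        for _ in range(trials))
    corrected = sum(
        any(null_p_value() < alpha / num_tests for _ in range(num_tests))
        for _ in range(trials))

    # Chance that at least one of 20 null tests looks 'significant':
    print(f"uncorrected: {naive / trials:.2f}")      # near 1 - 0.95**20, about 0.64
    print(f"Bonferroni:  {corrected / trials:.2f}")  # back down to about 0.05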

In the case of the Higgs boson, there were already good reasons to expect that the Higgs boson would exist, and its existence was subsequently confirmed. But this is not always the case.*6 The important thing to remember is that a very unlikely hypothesis remains unlikely even after someone obtains experimental results with a very low p-value.

P-HACKING AND PUBLICATION BIAS

Purely as a matter of convention, we often use a p-value of 0.05 as a cutoff for saying that a result is statistically significant.*7 In other words, a result is statistically significant when p < 0.05, i.e., when it would have less than 5 percent probability of arising due to chance alone.

pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python
by Joel Grus
Published 13 Apr 2015

[Book index excerpt; the entries relevant here read: hypothesis testing, Statistical Hypothesis Testing; p-hacking, P-hacking; p-values, Example: Flipping a Coin.]

If instead we’d seen 540 heads, then we’d have:

    import math

    # normal_two_sided_bounds is defined earlier in the book
    p_hat = 540 / 1000
    mu = p_hat
    sigma = math.sqrt(p_hat * (1 - p_hat) / 1000)   # 0.0158

    normal_two_sided_bounds(0.95, mu, sigma)        # [0.5091, 0.5709]

Here, “fair coin” doesn’t lie in the confidence interval. (The “fair coin” hypothesis doesn’t pass a test that you’d expect it to pass 95% of the time if it were true.)

P-hacking

A procedure that erroneously rejects the null hypothesis only 5% of the time will — by definition — 5% of the time erroneously reject the null hypothesis:

    import random

    def run_experiment():
        """flip a fair coin 1000 times, True = heads, False = tails"""
        return [random.random() < 0.5 for _ in range(1000)]

    def reject_fairness(experiment):
        """using the 5% significance levels"""
        num_heads = len([flip for flip in experiment if flip])
        return num_heads < 469 or num_heads > 531

    random.seed(0)
    experiments = [run_experiment() for _ in range(1000)]
    num_rejections = len([experiment for experiment in experiments
                          if reject_fairness(experiment)])
    print(num_rejections)  # 46

What this means is that if you’re setting out to find “significant” results, you usually can.

pages: 250 words: 64,011

Everydata: The Misinformation Hidden in the Little Data You Consume Every Day
by John H. Johnson
Published 27 Apr 2016

Consider: In the endometrial cancer study, the researchers “found a link, but not a cause-and-effect relationship, between coffee drinking and lower risk of endometrial cancer,” according to WebMD.35 In other words, there was correlation, but not causation. “P-hacking” (named after p-values) is a term used when researchers “collect or select data or statistical analyses until nonsignificant results become significant,” according to a PLoS Biology article.36 This is similar to cherry picking, as p-hacking researchers simply throw things at the wall until something sticks, metaphorically speaking (although there probably are some scientists who actually throw things at the wall until something sticks…).

Aggregated data—Individual data points combined together into groups (e.g., the total number of votes in a state are aggregated to determine who receives that state’s Electoral College votes)
Average—A type of summary statistic (usually the mean, mode, or median) that describes the data in a single metric
Big data—Data that’s too big for people to process without the use of sophisticated machinery or computing capacity, given its enormous volume
Bivariate relationship—A fancy way of saying that there is a relationship between two (“bi”) variables (“variate”) (e.g., the price of your house is related to the number of bathrooms it has)
Black swan event—Something that is highly improbable, yet has a massive impact when it occurs
Causation—A relationship where it is determined that one factor causes another factor
Cherry-picking—Choosing anecdotal examples from the data to make your point, while ignoring other data points that may contradict it
Confidence interval—A way to measure the level of statistical certainty about results; typically expressed as a range of values, the confidence interval tells you the range of values within which you’re likely to see the estimate (assuming, of course, you have a random—and representative—sample)
Confidence level—The term we use to determine how confident we are that we’re measuring the data correctly
Confirmation bias—The tendency to interpret data in a way that reinforces your preconceptions
Correlation—A type of statistical relationship between two variables, usually defined as positive (moving in the same direction) or negative (moving in opposite directions)
Data—Information or facts
Dependence—When one variable is said to be directly determined by another
Deterministic forecast—A forecast for which you determine a precise outcome (e.g., it will rain tomorrow at 9 a.m. at my house)
Economic impact—How much something is going to cost in terms of time, money, health, or other resources
Estimate—A statistic capturing an inference about a population from a sample of data
Everydata—The term we use to describe everyday data
External validity—The extent to which the results from your sample can be extended to draw meaningful conclusions about the full population
False positive—A situation in which the statistical forecast predicts an untrue outcome (e.g., your credit card company calls you suspecting a recent purchase you actually made was fraudulent)
Forecast—A statement about the future; while forecast and prediction may have different meanings to specific groups of people (see chapter 8), we generally use them synonymously unless noted otherwise
Forecast bias—The term used to describe when a prediction is consistently high (a positive forecast bias) or low (a negative bias)
Inference—The process of making statistical conclusions about the data
Magnitude—Essentially, the size of the effect
Margin of error—A way to measure statistical uncertainty
Mean—What most people think of when you say “average” (to get the mean, you add up all the values, then divide by the number of data points)
Median—The middle value in a data set that has been rank ordered
Misrepresentation—When data is portrayed in an inaccurate or misleading manner
Mode—The data point (or points) most frequently found in your data
Observation—Looking at one unit, such as a person, a price, or a day
Odds—In statistics, the odds of something happening is the ratio of the probability of an outcome to the probability that it doesn’t occur (e.g., a horse’s statistical odds of winning a race might be ⅓, which means it is probable that the horse will win one out of every three races; in betting jargon, the odds are typically the reverse, so this same horse would have 2–1 odds against, which means it has a ⅔ chance of losing)
Omitted variable—A variable that plays a role in a relationship, but may be overlooked or otherwise not included; omitted variables are one of the primary reasons why correlation doesn’t equal causation
Outlier—A particular observation that doesn’t fit; it may be much higher (or lower) than all the other data, or perhaps it just doesn’t fall into the pattern of everything else that you’re seeing
P-hacking—Named after p-values, p-hacking is a term for the practice of repeatedly analyzing data, trying to find ways to make nonsignificant results significant
P-value—A way to measure statistical significance; the lower your p-value is, the less likely it is that the results you’re seeing are due to chance
Population—The entire set of data or observations that you want to study and draw inferences about; statisticians rarely have the ability to look at the entire population in a study, although it could be possible with a small, well-defined group (e.g., the voting habits of all 100 U.S. senators)
Prediction—See forecast
Prediction error—A way to measure uncertainty in the future, essentially by comparing the predicted results to the actual outcomes, once they occur
Prediction interval—The range in which we expect to see the next data point
Probabilistic forecast—A forecast where you determine the probability of an outcome (e.g., there is a 30 percent chance of thunderstorms tomorrow)
Probability—The likelihood (typically expressed as a percentage, fraction, or decimal) that an outcome will occur
Proxy—A factor that you believe is closely related (but not identical) to another difficult-to-measure factor (e.g., IQ is a proxy for innate ability)
Random—When an observed pattern is due to chance, rather than some observable process or event
Risk—A term that can mean different things to different people; in general, risk takes into account not only the probability of an event, but also the consequences
Sample—Part of the full population (e.g., the set of Challenger launches with O-ring failures)
Sample selection—A potential statistical problem that arises when the way a sample has been chosen is directly related to the outcomes one is studying; also, sometimes used to describe the process of determining a sample from a population
Sampling error—The uncertainty of not knowing if a sample represents the true value in the population or not
Selection bias—A potential concern when a sample is comprised of those who chose to participate, a factor which may bias the results
Spurious correlation—A statistical relationship between two factors that has no practical or economic meaning, or one that is driven by an omitted variable (e.g., the relationship between murder rates and ice cream consumption)
Statistic—A numeric measure that describes an aspect of the data (e.g., a mean, a median, a mode)
Statistical impact—Having a statistically significant effect of some undetermined size
Statistical significance—A probability-based method to determine whether an observed effect is truly present in the data, or just due to random chance
Summary statistic—Metric that provides information about one or more aspects of the data; averages and aggregated data are two examples of summary statistics
Weighted average—An average calculated by assigning each value a weight (based on the value’s relative importance)

Kathleen Doheny, “Coffee May Lower Endometrial Cancer Risk,” WebMD website, February 6, 2015, http://www.webmd.com/cancer/news/20150206/coffee-linked-to-possible-lower-endometrial-cancer-risk.

36. Megan L. Head, Luke Holman, Rob Lanfear, Andrew T. Kahn, and Michael D. Jennions, “The Extent and Consequences of P-Hacking in Science,” Public Library of Science (PLoS) Biology 13, no. 3 (2015): e1002106, doi: 10.1371/journal.pbio.1002106.

37. Jonah Lehrer, “The Truth Wears Off,” New Yorker website, December 13, 2010, http://www.newyorker.com/magazine/2010/12/13/the-truth-wears-off.

38. Rebecca Steinbach, Chloe Perkins, Lisa Tompson, Shane Johnson, Ben Armstrong, Judith Green, Chris Grundy, Paul Wilkinson, and Phil Edwards, “The Effect of Reduced Street Lighting on Road Casualties and Crime in England and Wales: Controlled Interrupted Time Series Analysis,” Journal of Epidemiology & Community Health (June 3, 2015), doi: 10.1136/jech-2015-206012, http://jech.bmj.com/content/early/2015/07/08/jech-2015-206012.full.pdf+html.

39.

pages: 283 words: 102,484

Everything Is Predictable: How Bayesian Statistics Explain Our World
by Tom Chivers
Published 6 May 2024

They also did other things, like stopping collecting data if their p-value dropped for a moment below 0.05. Simmons, Nelson, and Simonsohn estimated that by running a few simple tricks like this, you could make it more than 60 percent likely you’d find an apparently significant result. This is known as “hypothesizing after results are known”—HARKing—or “p-hacking,” and it happens all the time, not just in wry papers intended to demonstrate that it’s possible. One example: there’s a thing called the competitive reaction time task (CRTT), which is used to measure aggression, especially in research into the psychological effects of video games. A player plays either a violent or a nonviolent video game.
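
That trick of stopping data collection the moment the p-value dips below 0.05 is known as optional stopping, and it is easy to demonstrate. In this sketch (my own illustration, not Simmons, Nelson, and Simonsohn’s actual code; the peeking schedule and the approximate z-test are assumptions), a perfectly fair coin is tested repeatedly as data accumulate, and the false-positive rate climbs well above the nominal 5 percent:

    import math
    import random

    def looks_unfair(heads, n):
        """Two-sided z-test of 'this coin is fair' at roughly p < 0.05."""
        z = (heads - n / 2) / math.sqrt(n / 4)
        return abs(z) > 1.96

    def peeking_experiment(max_n=200, first_peek=30, peek_every=10):
        heads = 0
        for flip in range(1, max_n + 1):
            heads += random.random() < 0.5
            if flip >= first_peek and flip % peek_every == 0:
                if looks_unfair(heads, flip):
                    return True   # stop early and declare 'significance'
        return False

    random.seed(0)
    trials = 2_000
    rate = sum(peeking_experiment() for _ in range(trials)) / trials
    print(f"false-positive rate with peeking: {rate:.2f}")  # well above 0.05

A single test at the full sample size would reject the fair coin only about 5 percent of the time; getting eighteen peeks at the same accumulating data inflates that rate several-fold.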

Also, Stephanie Lee, a science journalist at BuzzFeed, got hold of his emails, in which—it transpired—he had told his PhD student to cut the data up into “males, females, lunch goers, dinner goers, people sitting alone, people eating with groups of 2, people eating in groups of 2+, people who order alcohol, people who order soft drinks, people who sit close to buffet, people who sit far away, and so on,” in order to “mine it for significance… squeeze some blood out of this rock” and get it to “go virally big time.”16 As a result, eighteen of Wansink’s papers have been retracted; seven have received “expressions of concern,” which journals append to studies they don’t think can be fully trusted, but aren’t ready to retract altogether; and fifteen have been corrected.17 Wansink, meanwhile, resigned from Cornell in 2019, after the university found him to have committed scientific misconduct, and barred him from teaching and research.18 This is a particularly egregious example, but in a way Wansink was unlucky that he was publicly destroyed for something that was almost standard practice. P-hacking goes on all the time, in much less dramatic ways—and a lot of scientists have absolutely no idea that they’re doing anything wrong. The aforementioned Daryl Bem, in a 1987 book chapter written as a guide to help students get their research published, wrote that “there are two articles you can write: the article you planned to write when you designed your study; the article that makes the most sense now that you have seen the results.

The data may be strong enough to justify recentering your article around the new findings and subordinating or even ignoring your original hypotheses…. Think of your dataset as a jewel. Your task is to cut and polish it, to select the facets to highlight, and to craft the best setting for it.”19 It’s not intended as a call for p-hacking, but “recentering your article around the new findings” is exactly what both the “False Positive” guys and Wansink were doing, and as they demonstrated, if you do that, you can very easily get statistically significant findings from utterly meaningless noise. So far we’ve just looked at specific scientists, but it’s worth getting a sense of how big a problem this was in science at large.

Super Thinking: The Big Book of Mental Models
by Gabriel Weinberg and Lauren McCann
Published 17 Jun 2019

That’s because when a study is designed for a 5 percent chance of a false positive, that chance applies only to one statistical test, but very rarely is only one statistical test conducted. The act of running additional tests to look for statistically significant results has many names, including data dredging, fishing, and p-hacking (trying to hack your data looking for small enough p-values). Often this is done with the best of intentions, as seeing data from an experiment can be illuminating, spurring a researcher to form new hypotheses. The temptation to test these additional hypotheses is strong, since the data needed to analyze them has already been collected.

There are ways to overcome these issues, such as the following:

- Using lower p-values to properly account for false positive error in the original study, across all the tests that are conducted
- Using a larger sample size in a replication study to be able to detect a smaller effect size
- Specifying statistical tests to run ahead of time to avoid p-hacking

Nevertheless, as a result of the replication crisis and the reasons that underlie it, you should be skeptical of any isolated study, especially when you don’t know how the data was gathered and analyzed. More broadly, when you interpret a claim, it is important to evaluate critically any data that backs up that claim: Is it from an isolated study or is there a body of research behind the claim?

[Book index excerpt; the entries relevant here read: p-hacking, 169, 172; p-values, 164, 165, 167–69, 172; replication crisis, 168–72; statistical significance, 164–67, 170.]

pages: 375 words: 102,166

The Genetic Lottery: Why DNA Matters for Social Equality
by Kathryn Paige Harden
Published 20 Sep 2021

In the past few years, the field of psychology has been rocked by a “replication crisis,” in which it has become clear that many of the field’s splashy findings, published in the top journals, could not be reproduced and are likely to be false. Writing about the methodological practices that led to the mass production of illusory findings (practices known as “p-hacking”), the psychologist Joseph Simmons and his colleagues wrote that “everyone knew [p-hacking] was wrong, but they thought it was wrong the way it is wrong to jaywalk.” Really, however, “it was wrong the way it is wrong to rob a bank.”27 Like p-hacking, the tacit collusion in some areas of the social sciences to ignore genetic differences between people is not wrong in the way that jaywalking is wrong. Researchers are not taking a victimless shortcut by ignoring something (genetics) that is only marginally relevant to their work.

The Knowledge Machine: How Irrationality Created Modern Science
by Michael Strevens
Published 12 Oct 2020

Dear’s own acute characterization of the passage, and of the house style of the early Transactions in general, is from the same page.
168 can be gamed to illuminate the data: The process of running through many different statistical analyses to squeeze some significant result from a set of observations is called “data dredging” or (referring to a particular class of statistical methods) “p-hacking.” You can try out p-hacking yourself at http://fivethirtyeight.com/features/science-isnt-broken/ (Aschwanden, “Science Isn’t Broken”).
169 “The particular and endless modifications”: Scoresby, Account of the Arctic Regions, vol. 1, 426–7.
169 During one particularly brutal winter freeze: As described in Glaisher, “On the Severe Weather at the Beginning of the Year 1855,” 16–30.
171 “the old-school version of Photoshop”: Libbrecht’s comparison to Photoshop and his snowflake statistics are from Pilcher, “No Great Flakes,” 71.
172 even the camera has a point of view: Many further fascinating aspects of the changing notion of objectivity in scientific imagery, including both the Cajal-Golgi dispute and the case of snowflakes, are described in Daston and Galison’s book Objectivity and in the literature on which it draws.
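The dredging process this note describes is easy to reproduce. Below is a minimal sketch in Python (my illustration, not anything from Strevens or the FiveThirtyEight tool; NumPy and SciPy are assumed, and the subject and variable counts are arbitrary): generate pure noise, test every pairwise correlation, and report whatever clears p < 0.05.

# A minimal sketch of data dredging: generate pure noise, run every
# pairwise correlation test, and report whatever clears p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_vars = 50, 15          # 15 noise variables -> 105 pairwise tests
data = rng.normal(size=(n_subjects, n_vars))

hits = []
for i in range(n_vars):
    for j in range(i + 1, n_vars):
        r, p = stats.pearsonr(data[:, i], data[:, j])
        if p < 0.05:
            hits.append((i, j, round(r, 2), round(p, 3)))

print(f"{len(hits)} 'significant' correlations out of 105 tests on pure noise")
# At alpha = 0.05, roughly 105 * 0.05 ≈ 5 false positives are expected.

On average about five of the 105 tests come back “significant” even though, by construction, nothing is there; report only those five and you have a dredged finding.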

M., 304n McLane, Maureen, 278 Meander River, 278 Meaning of Human Existence, The (Wilson), 275 measurement, Born’s rule and, 148–49 mechanical theory of heat, See kinetic theory of heat Medawar, Peter, 173 medical experimentation, Avicenna’s rules for, 117 Mendel, Gregor, 47 Mercury (god), 274 Mer de Glace (Mount Blanc), 51–52 metaphysics in Cartesian physics, 132–33 Descartes and, 271 and nature of causal principles, 147 Newton and, 186 methodism/methodists defined, 7, 293 lack of guides to govern, 73–74 and self-correcting nature of science, 58–59 and standardized rules/procedures for empirical inquiry, 58 subjectivity vs., 50 and three essential ingredients for thriving science, 283–84 Michelson, Albert, 112–14 Michelson–Morley experiment, 112–14, 113 microcosm, consonance with macrocosm, 210, 236 microscope, 168–70 Mie, Gustav, 143 Miletus, 2, 278, 279, 279 Millikan, Robert, 48 minutiae, pursuit of, 203 miseducation, 262–64 Mlodinow, Leonard, 260 modern science distinguished from natural philosophy, 3, 116–19, 281–82 four innovations leading to, 119 late arrival of, 1–4, 8–9, 201–8, 241, 289–90 as social practice, 275 moons of Jupiter, 106 morality, in contemporary science, 260 moral personae, 316n moral strategy, when teaching scientific habits of thought, 258–59 Morley, Edward, 112–14 Morris, Errol, 22 motion, in Aristotelian physics, 133–34 motivation; See also fighting spirit comparison of Kuhn and Popper on, 39 and the iron rule’s procedural consensus, 99–103, 195–96 and Kuhnian paradigms, 282–83 objectivity and, 85–86 as a problem for science, 33, 37–38, 116, 203 Mousterian tools, 239–41, 241 NASA, 34 nation-states, rise of, 246 natural philosophy Bacon and, 304n and consequences of causal principles, 147 distinguished from modern science, 3, 116–19, 281–82 God’s place in Cartesian philosophy, 205–6 lack of procedural consensus, 117–18 origins, 278 and schisms, 97–98 in seventeenth century, 243 natural selection, 28, 219, 226 Nazi Germany, 13, 21–22 Neanderthals, 223, 240 negative clause, 118 nervous system, 153–55 Neuhauss, Richard, 170–71 neurons, 153, 153–55 neutrino, 228 neutron, 228, 230 Newcomb, Simon, 301n New Organon, The (Bacon), 105–6, 109 Newton, Sir Isaac, 135–40, 290 and alchemy, 183–92 Anglican ordination crisis, 250–52 Aristotle’s methods compared to, 202–3 and beauty, 227 as central to Scientific Revolution, 244 childhood, 135 and compartmentalization, 188–89, 238, 272 death of, 140 Descartes and, 188, 273 gravity theory, 27–28, 40, 68, 111, 136–40, 188, 195 on hypotheses, 311n–312n and iron rule, 137, 191–92, 195 law of universal gravitation, 136–40 and manipulation of experiment results, 48 as mathematical physicist, 190 method for empirical inquiry, 137–40, 191, 194, 243 nonempirical pursuits, 186 as nonhumanistic thinker, 271–72 praise from fellow scientists/philosophers, 140–42 Principia, 135, 137–39, 141, 191–92, 248–49 and quantum mechanics as example of shallow explanation, 144 religious beliefs of, 186–87, 250, 316n on separation of science and religion, 247 and shallow conception of explanation, 138–39, 151 sketch of philosopher’s stone, 214 view of universe as riddle, 212, 214 Newtonian gravitation, 27–28, 40, 42–50, 68, 111, 195 Newtonian physics, 112, 114 “Newtonian style” (Cohen), 307n “Newtonian university,” 273 New York City, 278–79, 289 New Zealand, 35, 36 Nibbs, George, 176 Ninety-Five Theses, 242, 245 Nobel Duel, The (Wade), 296n, 304n nonempirical thinking, iron rule’s prohibition of, 173, 180–81, 191, 237, 258 “non-overlapping magisteria,” 207 
novelty, worldview and, 25 objectivity, 85–86; See also subjectivity and Cajal’s defense of neuron doctrine, 154–55 drive for, 152–72 Eddington and, 155–61 and idea of science as “self-correcting,” 59–62 and iron rule, 118, 181–82 logical impossibility of objective rule for interpreting evidence, 79–82 logical impossibility of finding objective rule for weighing evidence, 79–82 new ideas/technology and, 168 in Philosophical Transactions, 166–67, 249–50 photography and, 168–72 in public scientific argument, 163–64, 166 rhetoric of, 166–67 and Royal Society, 166–67 and “science wars,” 262–63 and snowflake studies, 168–72, 169, 171 sterilization and, 161; See also “sterilization” of scientific argument observation and Aristotle’s approach to inquiry, 204 Born’s rule and, 148–49 and essence of science, 5 iron rule’s exclusive concern with, 206 supremacy of, 173–97 Occam’s Razor, 92 octet (eightfold way), 230–31, 231 official publications/speech; See also scientific argument aesthetic reasoning’s exclusion from, 209–10, 235, 238, 257 iron rule’s limitations on, 118, 195, 208, 238 and objectivity, 161, 163–64, 166, 172 defined, 163 omega-minus particle, 233, 233, 234 On Growth and Form (Thompson), 221 On the Marriage of Philology and Mercury (Martianus), 274 On the Origin of Species (Darwin), 177, 206–7, 219 Oppenheimer, Robert, 229 Oration on the Dignity of Man (Pico della Mirandola), 269–70 Oreskes, Naomi, 299n “organization man,” 25–26 Organon (Aristotle), 105 Origin of Continents and Oceans, The (Wegener), 55, 55–56 Owen, Richard, 76 Paolozzi, Eduardo, 286 Paracelsus, 212, 270 paradigm, 26–32, 296n and crisis, 28 defined, 24 and Eddington experiment, 46–47 ending of, 28–29 explanatory relativism and, 151 and modern science vs. natural philosophy, 282–83 and motivation, 38, 116, 282 relevance of experimental inquiry and, 36 and scientific revolutions, 26–30 paradigm shift, 124 particle accelerators, 228 particle physics, 227–35 particles, Born’s rule and, 148–49 Pasteur, Louis, 50–52, 82 Penrose, Roger, 146 perception, 24–25 perspective, 242 persuasion, beauty and, 235 p-hacking, 309n pharmaceuticals, corporate-funded research and, 53 philosopher-scientists, 265 philosophical argument Boyle and, 244 Descartes and, 188, 270–71 iron rule’s exclusion of, 207–8 philosophical question about science, the, 9, 32–33, 65, 103–4, 116–19 Philosophical Transactions of the Royal Society, 157, 166, 249–50 philosophy abandoning for pursuit of science, 256 and Aristotle’s approach to inquiry, 204 contemporary science’s hostility to, 260–62 removal from explanation, 118 and schism, 97 in years leading to Scientific Revolution, 242–43 photoelectric effect, 144 photography and objectivity, 170 in scientific papers, 168–72 photons, 144, 150, 228 physical contact, in Cartesian physics, 130–35 physics Aristotelian, 123–24, 133–34 Cartesian, 130–38, 270–71 Einsteinian, See relativity theory of heat, 76–77, 90–94, 107–8 mathematical foundations, 194 Newtonian, 27–28, 40, 68, 111, 136–40, 188, 195 particle physics, 227–35 “post-empirical,” 284–85 quantum physics, 28 and velocity, 27 Pico della Mirandola, Giovanni, 269–70 Piero della Francesca, 242 Pinch, Trevor, 285–86 Pinker, Steven, 265 Pittoni, Giovanni Battista, 141–42 planetary motion, 106, 134 Plato, 187, 304n plausibility rankings, 83 and Baconian convergence, 109–11 Bayesian framework for, 302n defined, 293 Eddington and, 157, 160–61 and falsification, 162, 281 and IPCC reports, 288 and iron rule, 196, 307–8n and scientific papers, 165–67 and scientific 
reasoning, 83–84, 163 playing cards (psychology experiment), 24 Polyprion, 221, 222 Popper, Karl, 6, 13–22 birth and childhood, 14 on blaming of assumptions, 72 and critical spirit, 281–82 early years, 14–15 and Eddington’s eclipse expedition, 42, 68 extraphilosophical claims, 40 and falsification, 46 and Hume’s view of inductive reasoning, 17–18, 21 and interpretation of evidence, 162 and iron rule of explanation, 102 and Kelvin’s estimate of earth’s age, 78–79 Kuhn and, 25, 30, 38–40 and motivation, 19–20, 39, 281 and objectivity, 85 and plausibility rankings, 110 and repeating experiments, 73 “post-empirical physics,” 284–85 Pouchet, Felix, 52 power, scientific discovery and, 64, 312n predictions explanations vs., 303n true theory and, 18 predictive power commonalities between Kuhn and Popper, 39 and paradigms, 32 and shallow conception of explanation, 151 Principia (Newton), 135, 141 adoption of methods outlined in, 142, 248–49 on hypothesis, 191 and Newton’s conception of empirical inquiry, 137–38 and shallow conception of explanation, 138–39 principle of total evidence, 315n prisca sapientia (ancient wisdom), 187, 212, 227 private thought/reasoning, See scientific reasoning probabilistic logics, 302n, 305n procedural consensus and Baconian convergence, 110–19 explanatory relativism as obstacle to, 128 and iron rule, 97, 99–101, 103, 117–18, 195–96 procedure, prescribed by the iron rule, 93, 96, 203 Protestantism, 245–46 proton, 232, 234 psuche (soul), 126–29, 134, 140, 307n psychoanalytic schools, 303–4n Ptolemaic paradigm, 31–32 Ptolemy, 27 public arguments/documents, See scientific argument public speech codes, 7, 181, 251 Pufendorf, Samuel von, 316n puffer fish, 221–22, 223, 226 puzzle-solving, paradigms and, 31–32, 39 Pythagoras, 187 quantitative data, 202–3, 244 quantum ether, 148 quantum mechanics, 28, 142–51 quarks, 101, 232 quinarian system, 214–20, 215–17 quintessence, 133–34, 228 radiation and kinetic vs. 
caloric theory, 92, 93 superposition and, 144 radical subjectivism and conditions for healthy science, 285–86 critique of scientific objectivity, 164–66 defined, 63, 294 and iron rule, 162–63 and norm of objectivity, 161 problems with absolute claims of, 64–65 radiolarians, 221 reasoning, See scientific reasoning record, scientific paper as, 165–66 refutation, 15–16, 18–20, 39 regularity/uniformity, See inductive reasoning Reichenbach, Hans, 21–22 Rejection of Continental Drift (Oreskes), 299n relativism, See explanatory relativism relativity theory, 28 general theory, 34–35, 41–42, 49, 111–112, 155–61 special theory, 72, 114 religion; See also God and Cartesian philosophy of knowledge, 270–71 in Cartesian physics, 132 Darwin and, 207 exclusion from iron rule, 204–7 Newton and, 186–88 Newton’s Anglican ordination crisis, 250–52 and Scientific Revolution, 242–43 separation from political identity in 1600s, 246–47 Whewell and, 178–83, 204–5 religious tolerance, 251–52 Renaissance, 242, 269–71 reporting, of experimental results, 48, 66–67, 157–61 research, industry-sponsored, 52–53, 84 reticular theory of the brain, 154 retina, 153 rigor, 32–33 Roman Catholicism, See Catholic Church Romanes, George, 207 Rome, ancient, 2, 242, 274 Royal Astronomical Society, 45 Royal Greenwich Observatory (England), 69, 69 Royal Society and Eddington’s eclipse expedition, 45 on empirical testing, 173 and observation-based inquiry, 194 prose style in early publications from, 166–67 Philosophical Transactions (house journal), 157, 166, 249–50 rule of four, 210–12, 213, 313n rules, of modern science, See scientific method Russell, Bertrand, 17 Sacred Theory of the Earth (Burnet), 75, 310n–311n Salk Institute (San Diego), 60–63, 61 SARS-CoV-2, 287 Schally, Andrew, 33–34, 37–38, 99, 264 schisms, 97 Schneider, Stephen, 288–89 Schrödinger, Erwin, 148 Schrödinger’s equation, 148, 150 Schultz, Howard, 267 science (generally) essential ingredients for, 283–84 essential subjectivity of, 66–86 lack of consensus about nature of, 5 late arrival of, 1–4, 8–9, 201–8, 241, 289–90 and method, 63–65 resisting pressure to “improve,” 284 as social institution, 290 theatrical element of, 190–91, 265–67 science education Kuhnian program for, 282–83 miseducation and, 262–64 and simplemindedness, 259–60 training process, 255–68 “science wars,” 262–63 Science without Laws (Giere), 296n scientific argument data as by-product of, 98, 103–4, 195–6 elimination of aesthetic reasoning from, 209–10 exclusive focus on empirical testing, 257 first modern scientists’ approach to, 265–67 private reasoning vs., 163–64, 181, 235, 238, 249–51, 265–66, 273 scientific attitude, 273 scientific journals, See scientific argument scientific method (generally); See also Great Method Debate, iron rule of explanation inherent strangeness of, 4–5 lack of consensus about nature of, 5–6 self-correcting, 58–62, 111 scientific papers changes in standards for content, 167–68 functions of, 164–65 subjectivist critique of, 164–66 See also scientific argument scientific reasoning Newton vs.

pages: 305 words: 75,697

Cogs and Monsters: What Economics Is, and What It Should Be
by Diane Coyle
Published 11 Oct 2021

Perhaps this is changing, although my impression is that too little attention is still paid to the epistemological status of the data it is now so easy to download and feed into statistics packages. Bayesian inference is rarely (though increasingly) taught despite its usefulness as a practical tool in the face of uncertainty. Economics research is hardly ever replicated, nor are negative results published—problems affecting other disciplines too, of course, as the recent ‘p-hacking’ debate shows (Fanelli 2010; Head et al. 2015). This may be coming for us soon. One of the improvements employers have long wanted to see in economics degrees is much better practical preparation for collecting and understanding statistics, as well as using them in careful econometrics. This is an area where much has improved in teaching practice in the past decade.
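Coyle’s point that Bayesian inference is a practical tool in the face of uncertainty can be made concrete with the simplest conjugate example: updating a Beta prior on an unknown proportion after observing binomial data. A minimal sketch (SciPy assumed; the prior and the data are invented for illustration):

# A minimal Bayesian update: Beta prior on an unknown proportion,
# binomial data in, Beta posterior out.
from scipy import stats

prior_a, prior_b = 1, 1            # flat Beta(1, 1) prior on the proportion
successes, trials = 18, 50         # hypothetical data

posterior = stats.beta(prior_a + successes, prior_b + trials - successes)
lo, hi = posterior.ppf([0.025, 0.975])
print(f"posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")

The posterior is itself a full distribution, so the uncertainty is carried through rather than collapsed into a single significant-or-not verdict.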

Hayek, F. A., 1935, ‘Socialist Calculation I: The Nature and History of the Problem’, reprinted in Individualism and Economic Order, 121–147, Chicago: University of Chicago Press, 1948.
Hayek, F., 1944, The Road to Serfdom, London: Routledge.
Hayek, F. A., 1945, ‘The Use of Knowledge in Society’, The American Economic Review, 35 (4), 519–530.
Head, M. L., L. Holman, R. Lanfear, A. T. Kahn, and M. D. Jennions, 2015, ‘The Extent and Consequences of P-Hacking in Science’, PLoS Biol, 13 (3), e1002106, https://doi.org/10.1371/journal.pbio.1002106.
Heckman, James J., and Sidharth Moktan, 2020, ‘Publishing and Promotion in Economics: The Tyranny of the Top Five’, Journal of Economic Literature, 58 (2), 419–470.
Hedlund, J., 2000, ‘Risky Business: Safety Regulations, Risk Compensation, and Individual Behaviour’, Injury Prevention, 6, 82–89.

pages: 340 words: 94,464

Randomistas: How Radical Researchers Changed Our World
by Andrew Leigh
Published 14 Sep 2018

This troubling finding immediately suggested that up to 5 per cent of published results might be due to luck rather than to genuine effects. Worse still, if researchers were rerunning their analysis with different specifications until they got a result that was significant at the 95 per cent level (a practice known as ‘P-hacking’), then the resulting research might be even more error-prone. An unscrupulous academic who started each project with twenty junk theories could reasonably expect that mere luck would make one of them appear significant at the 95 per cent level. Discard the other nineteen and – voilà! – there’s your publishable result.
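The arithmetic behind the twenty-junk-theories academic is worth spelling out; note that the 64 per cent figure below is implied by the passage rather than stated in it. A sketch in Python (NumPy assumed):

# Twenty junk theories, each tested at the 95 per cent level: luck alone
# "confirms" about one of them, and usually confirms at least one.
import numpy as np

alpha, n_theories = 0.05, 20
print("expected spurious hits per project:", alpha * n_theories)        # 1.0
print("P(at least one hit):", round(1 - (1 - alpha) ** n_theories, 2))  # 0.64

# Quick simulation check over 100,000 such projects:
rng = np.random.default_rng(1)
hits = rng.random((100_000, n_theories)) < alpha
print("simulated P(at least one hit):", round(hits.any(axis=1).mean(), 2))

So the unscrupulous academic is more likely than not to walk away with at least one ‘publishable’ result from pure noise.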

Olds, David 211 ‘once and done’ campaign, and Smile Train aid charity 158 O’Neill, John, and Black Saturday 2009 13–14 O’Neill, Maura 210 Oportunidades Mexico 117 see also President Vincent Fox Oregon research on health insurance 42 parachute study, and randomised evaluation of 12 Pare, Ambroise, and soldiers’ gunpowder burns 22–3 parenting programs 68–9 and Chicago ‘Parent Academy’ 9 and Incredible Years Basic Parenting Programme 69 and randomised evaluations 70 ‘Triple P’ positive parenting program 68–9 ‘partial equilibrium’ effect 191 Peirce, Charles Sanders 49–51 Perry, Rick 150–1 Perry Preschool 66–8, 71, 169, 191–2 see also David Weikart; Evelyn Moore ‘P-hacking’ 195–6 Piaget, Jean 66 Pinker, Stephen 177 placebo effect 10, 29–31, 34, 138, 192 and John Haygarth 23–4 placebo surgery 18–21 see also sham surgery Planet Money 103 policing programs 91–4, 209 ‘broken windows policing’ 209 and ‘hot spots’ policing 93 and ‘problem oriented policing’ 94 and randomised evaluations 94 see also criminal justice experiments; Lawrence Sherman; Patrick Murphy; Rudi Lammers political campaign strategies and Benin political campaign 160 and control groups 148, 155 and ‘deep canvassing’ 163–4 and Harold Gosnell 148–50 and lobbying in US 162 and online campaigning 154–5 and political speeches 160–1 and ‘robocalls’ 152 and Sierra Leone election debates 161 and use of ‘social pressure’ 151–2 see also Get Out the Vote Pope Benedict XVI 119 ‘power of free’ theory 112 pragmatism 50 see also Charles Sanders Pierce ‘problem oriented policing’ 94 Programme for International Student Assessment 73 Progresa Mexico 117–18 see also President Ernesto Zedillo Project Independence 60–1 see also Ben Graber; Judith Gueron; Manpower Demonstration Research Corporation (MDRC) Project STAR experiment 81 Promise Academy 78–9 Prospera Mexico 118 psychology experiments 50–1, 143, 170, 177, 196 see also Charles Sanders Pierce; Joseph Jastrow ‘publication bias’ 199 Pyrotron 14–15 see also Andrew Sullivan Quintanar, Maricela 38–40 Quora 131 RAND Health Insurance Experiment 41, 169 randomised auditing 174–5 randomised trials see also A/B testing and ‘anchoring’ effect 133 and the book of Daniel 22 and Community Led Sanitation 116 and control groups 13, 67–8, 74, 78, 82 and data collection 171–2 and the driving licence experiment 109 and the ‘experimental idea’ 194 fairness of 37, 100, 177, 185 and ‘fixed mindset’ 6 and ‘general equilibrium’ effect 191 and the ‘gold standard’ 194 and ‘growth mindset’ 6 and ‘healthy cohort’ effect 12 and Highest Paid Person’s Opinion (HiPPO) 6 and Kenyan mini-bus driver experiments 115–16 and ‘natural experiments’ 193 and N-of-1 168–9 and the No Child Left Behind Act 210 and ‘the paradox of choice’ 195 and ‘partial equilibrium’ effect 191 and ‘publication bias’ 199 and replication of 90, 124, 195, 197–8 and sex education 119–20 and single-centre trials 197 and ‘virginity pledges’ in the US 46–7 randomistas, Angus Deaton Nobel laureate on 12 Read India 188 see also Rukmini Banerji Reagan, President Ronald 59, 151 Registry for International Development Impact Evaluations 199 replication 90, 195, 197–8 ‘restorative justice conferencing’ 84 restorative justice experiments 85–6, 182 Results for America 211 Rhinehart, Luke, and The Dice Man 180 Roach, William 52 ‘robocalls’ 152 Romney, Mitt 147 Rossi, Peter 190 ‘Rossi’s Law’ 190, 206 Rothamsted Experimental Centre 53 Rudder, Christian 130 see also OkCupid Sachs, Jeffrey 121 Sackett, David 27, 206 Sacred Heart Mission 36 Salk, Jonas 168 Salvation Army’s 
‘Red Kettle Christmas drive 157 Sandburg, Sheryl 144 Saut, Fabiola Vasquez 110 see also Acayucan road experiment ‘scaling proven success,’ and ‘Development Innovation Ventures’ 210 Scared Straight 7–8, 94, 98–9, 189 see also Danny Glover; James Finckenauer Schmidt, Eric, and Google 143 Schwarzenegger, Arnold 75, 173 Science 163 ‘Science of Philanthropy Initiative’ 159 scurvy treatment trials 3–5, 16 see also Gilbert Blane; James Cook; James Lind; William Stark Second Chance Act 210 Seeger, Pete, and ‘The Draft Dodger Rag’ 42 Semelweiss, Ignaz 25 Sesame Street 63–5, 83 see also Joan Cooney sex education 119–20 sham surgery trials 19–20, 182 and ‘clinical equipoise’ 21 Sherman, Lawrence 91–4, 101 ‘Shoes for Better Tomorrows’ (TOMS) 113–15 see also Blake Mycoskie; Bruce Wydick Sierra Leone election debates 161 see also Saa Badabla SimCalc, and online learning tools 77 ‘single subject’ trials 168–9 see also N-of-1 Siroker, Dan 148 Sliding Doors 9 Smile Train aid charity, and ‘once and done’ campaign 158 social experiments large-scale 41 social field experiments and control groups 37, 39–41, 139 and credit card upgrades 132–3 and pay rates 136–7 and retail discounts 133 and ‘split cable’ techniques 139–40 and Western Union money transfers 130 social program trials and Kenyan electricity trial 110 and smoking deterrents 47–8 see also Acayucan road experiment; neighbourhood project social service agencies 36, 69 ‘soft targeting’ 36 ‘split cable’ technique 139–40 St.

pages: 442 words: 94,734

The Art of Statistics: Learning From Data
by David Spiegelhalter
Published 14 Oct 2019

Any amount of tweaking is fine in exploratory studies, but confirmatory studies should be carried out according to a pre-specified, and preferably public, protocol. Each can use P-values to summarize the strength of evidence for their conclusions, but these P-values should be clearly distinguished and interpreted very differently. Activities that are intended to create statistically significant results have come to be known as ‘P-hacking’, and although the most obvious technique is to carry out multiple tests and report the most significant, there are many more subtle ways in which researchers can exercise their degrees of freedom. Does listening to the Beatles’ song ‘When I’m Sixty-Four’ make you younger? You might feel fairly confident about the correct answer to this question.
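One of the ‘more subtle ways’ of exercising researcher degrees of freedom is optional stopping: re-computing the P-value as the data accumulate and stopping the moment it dips below 0.05. This rough simulation (my construction, not an example from the book) shows the inflation under a true null:

# Optional stopping: re-testing after each batch of data and stopping at
# the first P < 0.05 pushes the false-positive rate well above 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_experiments, false_positives = 2_000, 0

for _ in range(n_experiments):
    x = rng.normal(size=200)              # the null is true: mean really is 0
    for n in range(20, 201, 10):          # peek after every 10 observations
        if stats.ttest_1samp(x[:n], 0).pvalue < 0.05:
            false_positives += 1
            break

print(f"false-positive rate with peeking: {false_positives / n_experiments:.1%}")
# A single pre-specified test at n = 200 would be wrong about 5% of the time.

In runs like this the peeking rate comes out at several times the nominal 5 per cent, which is why a confirmatory protocol must fix the sample size, or the stopping rule, in advance.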

Locators in italics refer to figures and tables A A/B tests 107 absolute risk 31–2, 36–7, 383 adjustment 110, 133, 135, 383 adjuvant therapy 181–5, 183–4 agricultural experiments 105–6 AI (artificial intelligence) 144–5, 185–6, 383 alcohol consumption 112–13, 299–300 aleatory uncertainty 240, 306, 383 algorithms – accuracy 163–7 – biases 179 – for classification 143–4, 148 – complex 174–7 – contests 148, 156, 175, 277–8 see also Titanic challenge – meaning of 383 – parameters 171 – performance assessment 156–63, 176, 177 – for prediction 144, 148 – robustness 178 – sensitivity 157 – specificity 157 – and statistical variability 178–9 – transparency 179–81 allocation bias 85 analysis 6–12, 15 apophenia 97, 257 Arbuthnot, John 253–5 Archbishop of Canterbury 322–3 arm-crossing behaviour 259–62, 260, 263, 268–70, 269 artificial intelligence (AI) 144–5, 185–6, 383 ascertainment bias 96, 383 assessment of statistical claims 368–71 associations 109–14, 138 autism 113 averages 46–8, 383 B bacon sandwiches 31–4 bar charts 28, 30 Bayes, Thomas 305 Bayes factors 331–2, 333, 384 Bayes’ Theorem 307, 313, 315–16, 384 Bayesian hypothesis testing 219, 305–38 Bayesian learning 331 Bayesian smoothing 330 Bayesian statistical inference 323–34, 325, 384 beauty 179 bell-shaped curves 85–91, 87 Bem, Daryl 341, 358–9 Bernoulli distribution 237, 384 best-fit lines 125, 393 biases 85, 179 bias/variance trade-off 169–70, 384 big data 145–6, 384 binary data 22, 385 binary variables 27 binomial distribution 230–6, 232, 235, 385 birth weight 85–91 blinding 101, 385 BMI (body mass index) 28 body mass index (BMI) 28 Bonferroni correction 280, 290–1, 385 boosting 172 bootstrapping 195–203, 196, 198, 200, 202, 208, 229–30, 386 bowel cancer 233–6, 235 Box, George 139 box-and-whisker plots 42, 43, 44, 45 Bradford-Hill, Austin 114 Bradford-Hill criteria 114–17 brain tumours 95–6, 135, 301–3 breast cancer screening 214–16, 215 breast cancer surgery 181–5, 183–4 Brier score 164–7, 386 Bristol Royal Infirmary 19–21, 56–8 C Cairo, Alberto 25, 65 calibration 161–3, 162, 386 Cambridge University 110, 111 cancer – breast 181–5, 183–4, 214–16, 215 – lung 98, 114, 266 – ovarian 361 – risk of 31–6 carbonated soft drinks 113 Cardiac Surgical Registry (CSR) 20–1 case-control studies 109, 386 categorical variables 27–8, 386 causation 96–9, 114–17, 128 reverse causation 112–15, 404 Central Limit Theorem 199, 238–9, 386–7 chance 218, 226 child heart surgery see heart surgery chi-squared goodness-of-fittest 271, 272, 387 chi-squared test of association 268–70, 387 chocolate 348 classical probability 217 classification 143–4, 148–54 classification trees 154–6, 155, 168, 174, 387 cleromancy 81 clinical trials 82–3, 99–107, 131, 280, 347 clustering 147 cohort studies 109, 387 coins 308, 309 communication 66–9, 353, 354, 364–5 complex algorithms 138–9 complexity parameters 171 computer simulation 205–7, 208 conclusions 15, 22, 347 conditional probability 214–16 confidence intervals 241–4, 243, 248–51, 250, 271–3, 335–6, 387–8 confirmatory studies 350–1, 388 confounders 110, 135, 388 confusion matrixes 157 continuous variables 46, 388 control groups 100, 389 control limits 234, 389 correlation 96–7, 113 count variables 44–6, 389 counterfactuals 97–8, 389 crime 83–5, 321–2 see also homicides Crime Survey for England and Wales 83–5 cross-sectional studies 108–9 cross-validation 170–1, 389 CSR(Cardiac Surgical Registry) 20–1 D Data 7–12, 15, 22 data collection 345 data distribution see sample distribution data ethics 371 data literacy 12, 389 data 
science 11, 145–6, 389 data summaries 40 data visualization 22, 25, 65–6, 69 data-dredging 12 death 9 see also mortality; murder; survival rates deduction 76 deep learning 147, 389 dependent events 214, 389 dependent variables 60, 125–6, 389 deterministic models 128–9, 138 dice 205–7, 206, 213 differences between groups of numbers 51–6 distribution 43 DNA evidence 216 dogs 179 Doll, Richard 114 doping 310–13, 311–12, 314, 315–16 dot-diagrams 42, 43, 44, 45 dynamic graphics 71 E Ears 108–9 education 95–6, 106–7, 131, 135, 178–9 election result predictions 372–6, 375 see also opinion polls empirical distribution 197, 404 enumerative probability 217–18 epidemiology 95, 117, 389 epistemic uncertainty 240, 306, 308, 309, 390 error matrixes 157, 158, 390 errors in coding 345–6 ESP (extra-sensory perception) 341, 358–9 ethics 371 eugenics 39 expectation 231, 390 expected frequencies 32, 209–13, 211, 214–16, 215, 390 explanatory variables 126, 132–5 exploratory studies 350, 390 exposures 114, 390 external validity 82–3, 390 extra-sensory perception (ESP) 341, 358–9 F False discovery rate 280, 390 false-positives 278–80, 390 feature engineering 147, 390 Fermat, Pierre de 207 final odds 316 financial crisis of 2007–2008 139–40 financial models 139–40 Fisher, Ronald 258, 265–6, 336, 345 five-sigma results 281–2 forensic epidemiology 117, 391 forensic statistics 6 framing 391 – of numbers 24–5 – of questions 79–80 fraud 347–50 funnel plots 234, 391 G Gallup, George 81 Galton, Francis 39–40, 58, 121–2, 238–9 gambler’s fallacy 237 gambling 205–7, 206, 213 garden of forking paths 350 Gaussian distribution see normal distribution GDP (Gross Domestic Product) 8–9 gender discrimination 110, 111 Gini index 49 Gombaud, Antoine 205–7 Gross Domestic Product (GDP) 8–9 Groucho principle 358 H Happiness 9 HARKing 351–2 hazard ratios 357, 391 health 169–70 heart attacks 99–104 Heart Protection Study (HPS) 100–2, 103, 273–5, 274, 282–7 heart surgery 19–21, 22–4, 23, 56–8, 57, 93, 136–8, 137 heights 122–5, 123, 124, 127, 134, 201, 202, 243, 275–8, 276 hernia surgery 106 HES (Hospital Episode Statistics) 20–1 hierarchical modelling 328, 391 Higgs bosons 281–2 histograms 42, 43, 44, 45 homicides 1–6, 222–6, 225, 248, 270–1, 272, 287–94 Hospital Episode Statistics (HES) 20–1 hospitals 19–21, 25–7, 26, 56–61, 138 house prices 48, 112–14 HPS (Heart Protection Study) 100–2, 103, 273–5, 274, 282–7 hypergeometric distribution 264, 391 hypotheses 256–7 hypothesis testing 253–303, 336, 392 see also Neyman-Pearson Theory; null hypothesis significance testing; P-values I IARC (International Agency for Research in Cancer) 31 icon arrays 32–4, 33, 392 income 47–8 independent events 214, 392 independent variables 60, 126, 392 induction 76–7, 392 inductive behaviour 283 inductive inference 76–83, 78, 239, 392 infographics 69, 70 insurance 180 ‘intention to treat’ principle 100–1, 392 interactions 172, 392 internal validity 80–1, 392 International Agency for Research in Cancer (IARC) 31 inter-quartile range (IQR) 51, 89, 392 IQ 349 IQR (inter-quartile range) 49, 51, 89, 392 J Jelly beans in a jar 40–6, 48, 49, 50 K Kaggle contests 148, 156, 175, 277–8 see also Titanic challenge k-nearest neighbors algorithm 175 L LASSO 172–4 Law of Large Numbers 237, 393 law of the transposed conditional 216, 313 league tables 25, 130–1 see also tables least-squares regression lines 124, 125, 393 left-handedness 113–14, 229–33, 232 legal cases 313, 321, 331–2 likelihood 327, 336, 394 likelihood ratios 314–23, 319–20, 332, 394 line graphs 4, 5 linear 
models 132, 138 literal populations 91–2 logarithmic scale 44, 45, 394 logistic regression 136, 172, 173, 394 London Underground 24 loneliness 80 long-run frequency probability 218 look elsewhere effect 282 lung cancer 98, 114, 266 lurking factors 113, 135, 394–5 M Machine learning 139, 144–5, 395 mammography 214–16, 215 margins of error 189, 199, 200, 244–8, 395 mean average 46–8 mean squared error (MSE) 163–4, 165, 395 measurement 77–9 meat 31–4 media 356–8 median average 46, 47–8, 51, 89, 395 Méré, Chevalier de 205–7, 213 meta-analysis 102, 104, 395 metaphorical populations 92–3 mode 46, 48, 395 mortality 47, 113–14 MRP (multilevel regression and post-stratification) 329, 396 MSE (mean squared error) 163–4, 165, 395 mu 190 multilevel regression and post-stratification (MRP) 329, 396 multiple linear regression 132–3, 134 multiple regression 135, 136, 396 multiple testing 278–80, 290, 396 murders 1–6, 222–6, 225, 248, 270–1, 287–94 N Names, popularity of 66, 67 National Sexual Attitudes and Lifestyle Survey (Natsal) 52, 69, 70, 73–5 natural variability 226 neural networks 174 Neyman, Jerzy 242, 283, 335–6 Neyman-Pearson Theory 282–7, 336–7 NHST (null hypothesis significance testing) 266–71, 294–7, 296 non-significant results 299, 346–7, 370 normal distribution 85–91, 87, 226, 237–9, 396–7 null hypotheses 257–65, 336, 397 null hypothesis significance testing (NHST) 266–71, 294–7, 296 O Objective priors 327 observational data 108, 114–17, 128 odds 34, 314, 316 odds ratios 34–6 one-sided tests 264, 397–8 one-tailed P-values 264, 398 opinion polls 82, 245–7, 246, 328–9 see also election result predictions ovarian cancer 361 over-fitting 167–71, 168 P P-hacking 351 P-values 264–5, 283, 285, 294–303, 336, 401 parameters 88, 240, 398 Pascal, Blaise 207 patterns 146–7 Pearson, Egon 242, 283, 336 Pearson, Karl 58 Pearson correlation coefficient 58, 59, 96–7, 126, 398 percentiles 48, 89, 398–9 performance assessment of algorithms 156–67, 176, 177 permutation tests 261–4, 263, 399 personal probability 218–19 pie charts 28, 29 placebo effect 131 placebos 100, 101, 399 planning 13–15, 344–5 Poisson distribution 223–4, 225, 270–1, 399 poker 322–3 policing 107 popes 114 population distribution 86–91, 195, 399 population growth 61–6, 62–4 population mean 190–1, 395 see also expectation populations 74–5, 80–93, 399 posterior distributions 327, 400 power of a test 285–6, 400 PPDAC (Problem, Plan, Data, Analysis, Conclusion) problem-solving cycle 13–15, 14, 108–9, 148–54, 344–8, 372–6, 400 practical significance 302, 400 prayer 107 precognition 341, 358–9 Predict 2.1 182 prediction 144, 148–54 predictive analytics 144, 400 predictor variables 392 pre-election polls see opinion polls presentation 22–7 press offices 355–6 priming 80 prior distributions 327, 400 prior odds 316 probabilistic forecasts 161, 400 probabilities, accuracy 163–7 probability 10 meaning of 216–22, 400–1 rules of 210–13 and uncertainty 306–7 probability distribution 90, 401 probability theory 205–27, 268–71 probability trees 210–13, 212 probation decisions 180 Problem, Plan, Data, Analysis, Conclusion (PPDAC) problem-solving cycle 13–15, 14, 108–9, 148–54, 344–8, 372–6, 400 problems 13 processed meat 31–4 propensity 218 proportions, comparisons 28–37, 33, 35 prosecutor’s fallacy 216, 313 prospective cohort studies 109, 401 pseudo-random-number generators 219 publication bias 367–8 publication of findings 355 Q QRPs (questionable research practices) 350–3 quartiles 89, 402 questionable research practices (QRPs) 350–3 Quetelet, Adolphe 
226 R Race 179 random forests 174 random match probability 321, 402 random observations 219 random sampling 81–2, 208, 220–2 random variables 221, 229, 402 randomization 108, 266 randomization tests 261–4, 263, 399 randomized controlled trials (RCTs) 100–2, 105–7, 114, 135, 402 randomizing devices 219, 220–1 range 49, 402 rate ratios 357, 402 Receiver Operating Characteristic (ROC) curves 157–60, 160, 402 recidivism algorithms 179–80 regression 121–40 regression analysis 125–8, 127 regression coefficients 126, 133, 403 regression modelling strategies 138–40 regression models 171–4 regression to the mean 125, 129–32, 403 regularization 170 relative risk 31, 403 reliability of data 77–9 replication crisis in science 11–12 representative sampling 82 reproducibility crisis 11–12, 297, 342–7, 403 researcher degrees of freedom 350–1 residual errors 129, 403 residuals 122–5, 403 response variables 126, 135–8 retrospective cohort studies 109, 403 reverse causation 112–15, 404 Richard III 316–21 risk, expression of 34 robust measures 51 ROC (Receiver Operating Characteristic) curves 157–60, 160, 402 Rosling, Hans 71 Royal Statistical Society 68, 79 rules for effective statistical practice 379–80 Ryanair 79 S Salmon 279 sample distribution 43 sample mean 190–1, 395 sample size 191, 192–5, 193–4, 283–7 sampling 81–2, 93 sampling distributions 197, 404 scatter-plots 2–4, 3 scientific research 11–12 selective reporting 12, 347 sensitivity 157–60, 404 sentencing 180 Sequential Probability Ratio Test (SPRT) 292, 293 sequential testing 291–2, 404 sex ratio 253–5, 254, 261, 265 sexual partners 47, 51–6, 53, 55, 73–5, 191–201, 193–4, 196, 198, 200 Shipman, Harold 1–6, 287–94, 289, 293 shoe sizes 49 shrinkage 327, 404 sigma 190, 281–2 signal and the noise 129, 404 significance testing see null hypothesis significance testing Silver, Nate 27 Simonsohn, Uri 349–52, 366 Simpson’s Paradox 111, 112, 405 size of a test 285–6, 405 skewed distribution 43, 405 smoking 98, 114, 266 social acceptability bias 74 social physics 226 Somerton, Francis see Titanic challenge sortilege 81 sortition 81 Spearman’s rank correlation 58–60, 405 specificity 157–9, 405 speed cameras 130, 131–2 speed of light 247 sports doping 310–13, 311–12, 314, 315–16 sports teams 130–1 spread 49–51 SPRT (Sequential Probability Ratio Test) 292, 293 standard deviation 49, 88, 126, 405 standard error 231, 405–6 statins 36–7, 99–104, 273–5, 274, 282–7 statistical analysis 6–12, 15 statistical inference 208, 219, 229–51, 305–38, 323–8, 335, 404 statistical methods 12, 346–7, 379 statistical models 121, 128–9, 404 statistical practice 365–7 statistical science 2, 7, 404 statistical significance 255, 265–8, 270–82, 404 Statistical Society 68 statistics – assessment of claims 368–71 – as a discipline 10–11 – ideology 334–8 – improvements 362–4 – meaning of 404 – publications 16 – rules for effective practice 379–80 – teaching of 13–15 STEP (Study of the Therapeutic Effects of Intercessory Prayer) 107 storytelling 69–71 stratification 110, 383 Streptomycin clinical trial 105, 114 strip-charts 42, 43, 44, 45 strokes 99–104 Student’s t-statistic 275–7 Study of the Therapeutic Effects of Intercessory Prayer (STEP) 107 subjective probability 218–19 summaries 40, 49, 50, 51 supermarkets 112–14 supervised learning 143–4, 404 support-vector machines 174 surgery – breast cancer surgery 181–5, 183–4 – heart surgery 19–21, 22–4, 23, 56–8, 57, 93, 136–8, 137 – hernia surgery 106 survival rates 25–7, 26, 56–61, 57, 60–1 systematic reviews 102–4 T T-statistic 275–7, 
404 tables 22–7, 23 tail-area 231 tea tasting 266 teachers 178–9 teaching of statistics 13–15 technology 1 telephone polls 82 Titanic challenge 148–56, 150, 152–3, 155, 162, 166–7, 172, 173, 175, 176, 177, 277 transposed conditionals, law of 216, 313 trees 7–8 trends 61–6, 62–4, 67 two-sided tests 265, 397–8 two-tailed P-values 265, 398 Type I errors 283–5, 404 Type II errors 283–5, 407 U Uncertainty 208, 240, 306–7, 383, 390 uncertainty intervals 199, 200, 241, 335 unemployment 8–9, 189–91, 271–3 university education 95–6, 135, 301–3 see also Cambridge University unsupervised learning 147, 407 US Presidents 167–9 V Vaccination 113 validity of data 79–83 variability 10, 49–51, 178–9, 407 variables 27, 56–61 variance 49, 407 Vietnam War draft lottery 81–2 violence 113 virtual populations 92 volunteer bias 85 voting age 79–80 W Waitrose 112–14 weather forecasts 161, 164, 165 weight loss 348 ‘When I’m Sixty-Four’ 351–2 wisdom of crowds 39–40, 48, 51, 407 Z Z-scores 89, 407 PELICAN BOOKS Economics: The User’s Guide Ha-Joon Chang Human Evolution Robin Dunbar Revolutionary Russia: 1891–1991 Orlando Figes The Domesticated Brain Bruce Hood Greek and Roman Political Ideas Melissa Lane Classical Literature Richard Jenkyns Who Governs Britain?

Alpha Trader
by Brent Donnelly
Published 11 May 2021

Then you test, “What if stocks fall 2% or more Thursday AND 5% or more Friday” … The more you mess around with your parameters, the more likely you are to find “interesting” results. This is called “p-hacking” in the research business. You keep tweaking the inputs until you get an interesting result. That is not good research. Don’t torture your data in an effort to make it confess. Keep your analysis simple. For the best ever and super simple explanation of how p-hacking, snooping and data-mining work, Google “xkcd green jelly beans”. 3. Be aware of the trend in your data. If you do any study of S&P 500 data since 1930, you will generally find that bullish strategies perform well and bearish strategies perform poorly!
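Donnelly’s warning translates directly into code: sweep enough parameter combinations over a made-up price series that has no edge at all, and some rule will look impressive in-sample. A toy sketch (the trading rule and the parameter grids are arbitrary stand-ins, not anything from the book):

# Parameter-sweep p-hacking on a backtest: the returns are pure noise,
# yet trying 130 (lookback, threshold) combinations finds a "winner".
import numpy as np

rng = np.random.default_rng(3)
returns = rng.normal(0, 0.01, size=2_500)        # fake daily returns, no edge

best_params, best_pnl = None, -np.inf
for lookback in range(2, 12):                            # tweakable input #1
    for threshold in np.arange(-0.03, 0.0301, 0.005):    # tweakable input #2
        recent = np.convolve(returns, np.ones(lookback), "full")[:len(returns)]
        # "Buy tomorrow if the last `lookback` days summed below `threshold`"
        pnl = np.where(recent[:-1] < threshold, returns[1:], 0.0).sum()
        if pnl > best_pnl:
            best_params, best_pnl = (lookback, round(threshold, 3)), pnl

print(f"best in-sample rule: {best_params}, P&L: {best_pnl:.2f}")
# Re-run on fresh random data and the "edge" vanishes: it was never there.

The fix Donnelly suggests, keeping the analysis simple and the parameters few, works because it shrinks the space in which luck can hide.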

See Big Five Personality Traits OPEC meetings March 2020, 443 as market catalyst, 311 opinions, weakly held strong, 34, 83, 387, 411 orderliness, 56 organization skills, working to improve, 76 See also organized/meticulous organized/meticulous, 35, 95—100 build/optimize information network, 97—98 daily plan, 95-97 data collection, 97 as trader attributes, 75, 95—100 oscillators, 325 outcome bias, 240—241 “Outside the Box” (newsletter), 492 overbought/oversold indicators, 234, 337—341 deviation from moving average, 338, 340—341 relative strength indicator (RSI), 338—339, 341 overconfidence, 65, 180—187, 476, 477, 482 how to combat, 187 as leak, 81 men versus women and, 35—36, 171, 186, 187 miscalibration, 183—184 overestimation, 182 overplacement, 182—183 overtrading and, 171, 173, 181, 185, 187 risk-seeking behaviors and, 186 trading and, 62, 66, 71 types of, 182—183 See also emotion oversold indicators, 234 overtrading, 57, 157, 187—197, 381—382 how to combat, 193—197 as leak, 194 major economic announcements/market events and, 188—189 overconfidence and, 171, 181, 185 time dilation and, 189, 483 trading as stimulant/sensation-seeking and, 188, 190 See also gambling p-hacking, 222 Pallesen, Jonatan, 45 panic/euphoria model, 299 Panigrahi, Asha, 69 participants, market, 273—275 banks, 273 central banks, 274 institutional, 273 monitoring, 274—275 retail, 273 Pascal’s Wager, 459 passion for trading, 34, 456—457, 465 as trader attribute, 75, 143, 144—146 trader statement of purpose and, 456—457 versus obsession, 145 path dependence, 54, 176, 192, 356 trading month, 359 understanding, 359—362 when starting new trading job, 359—360 when switching firms, 360 patience, trader attribute of, 75, 104, 138—139 pattern recognition, 163, 216, 218, 219, 349 analogs, 402—405 average path of equities since 1990, 401, 402 EURUSD after Macron and after Merkel- Macron proposal, 402—403, 403 EURUSD rolling 40-day peak to trough drawdown after Macron won, 404 idea generation and, 392, 400—406, 407 rolling 5-day percentage of SPX positive days, 401 seasonal, 401—402 patterns, method for accepting or rejecting, 219 Peoples Bank of China (PBoC), 307 performance analysis, trade, 392, 423—424 directional bias and, 424 monthly analytics, 392, 424, 434 postmortem, 392, 424, 434 recording detailed results, 392, 424, 434 persistence, 56 personality, 46—58 effect of IQ and on male earnings, 49, 50 financial success and, 49—58 success and, 46—49 See also Big Five Personality Traits perspective, trader, 132 phone, 134—135, 138 plan, 454 trading process and, 194, 483 inability to stick to, 157 Porter, David, 66 portfolio managers.

pages: 362 words: 103,087

The Elements of Choice: Why the Way We Decide Matters
by Eric J. Johnson
Published 12 Oct 2021

Researchers overcome this by searching all the online databases for results and by systematically asking people to share these studies. The second caution about forest plots is that they won’t necessarily detect which experiments have inflated their results, and/or shrunk their confidence intervals, by what is called p-hacking—essentially doing many possible analyses and reporting only those that worked best. There are plots and analyses, called funnel plots and p-curves, that can help detect this.
30. See Jachimowicz et al., “When and Why Defaults Influence Decisions: A Meta-Analysis of Default Effects.”
31.
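The note names p-curves without saying what they look like, so here is the intuition: among studies that reached significance, p-values from a real effect pile up near zero, while p-values from a true null are roughly uniform below 0.05, and p-hacking pushes mass toward the 0.05 edge. A schematic simulation (not the formal p-curve analysis; NumPy and SciPy assumed):

# Schematic p-curve: the shape of the significant p-values distinguishes
# real effects (mass near 0) from nulls (roughly flat below 0.05).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def significant_pvalues(effect, n_studies=5_000, n=30):
    ps = np.array([stats.ttest_1samp(rng.normal(effect, 1, n), 0).pvalue
                   for _ in range(n_studies)])
    return ps[ps < 0.05]

for label, effect in [("real effect", 0.5), ("null effect", 0.0)]:
    ps = significant_pvalues(effect)
    counts, _ = np.histogram(ps, bins=np.arange(0, 0.051, 0.01))
    print(f"{label}: share of significant p-values per 0.01-wide bin:",
          np.round(counts / counts.sum(), 2))
# p-hacked nulls would tilt the flat curve upward toward the 0.04-0.05 bin.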

See describing options options trading, 285–89 order (ordering), 185–220 by attributes, 211–14 election ballots, 186–90 Expedia search, 199–202 on menus, 192–94, 214–20 Netflix, 269–70 primacy, 191–99, 206–7, 208–9 recency, 191–92, 202–8 visuals, 208–11 organ donations, 107–22, 157–58, 338–39n across countries, 109–10, 110, 111–14 alternatives, 117–22 defaults, 112–17, 120, 120–21, 151–52 incentives and, 117–18, 339n opt-in condition, 112, 114, 115, 116, 120, 121 opt-out condition, 112–16, 117, 119, 121, 127 out-of-pocket costs, 174–76 overdraft protection, 5–6 OxyContin, 317–18 Packard, Vance, 300 Pailhès, Alice, 305 Pandora, Music Genome Project, 277 “paradox of choice,” 167 Parag, Pathak, 162 path comparisons, 35–39, 36 path integration, 35–39, 36 Pathways to Technology School, 183 Patient Protection and Affordable Care Act (Obamacare). See Affordable Care Act payday loans, 127–28 p-curves, 341n Pension Protection Act of 2006, 129–30 persistent defaults, 152–53, 155 personalized defaults, 150–55 p-hacking, 341n physicians. See doctors Picwell, Inc., 281 plausible paths, 30–53, 264, 319–20 Amazon gift certificates, 34–35, 39–40, 41 Copenhagen Airport exit, 30–32, 31 dating sites, 44–52 doctors and generics, 11 finding fluency, 41–44 health insurance, 180–81 larger-later outcome, 34–35, 37, 38 Netflix, 269 path comparison, 35–39, 36 path integration, 35–39, 36 to patience, 33–41 seeing clearly, 52–53 smaller-sooner outcome, 34–35, 37, 39 Sullenberger and Flight 1549, 23–24 use of term, 23, 30 polarization, 323–24 political forecasting, 291–93 Pope, Devin, 68–69 Pope, Jaren, 68–69 pop-ups, 9, 266–67 Posner, Richard, 117 Poterba, James, 199 Practice Fusion, 317–18 preference assembly.

pages: 428 words: 103,544

The Data Detective: Ten Easy Rules to Make Sense of Statistics
by Tim Harford
Published 2 Feb 2021

Hand, Dark Data (Princeton, NJ: Princeton University Press, 2020).
16. Andrew Gelman and Eric Loken, “The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is No ‘Fishing Expedition’ or ‘P-Hacking’ and the Research Hypothesis Was Posited Ahead of Time,” working paper, November 14, 2013, http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf.
17. J. P. Simmons, L. D. Nelson, and U. Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” Psychological Science 22, no. 11 (2011), 1359–66, https://doi.org/10.1177/0956797611417632.
18.

pages: 533 words: 125,495

Rationality: What It Is, Why It Seems Scarce, Why It Matters
by Steven Pinker
Published 14 Oct 2021

It’s one of the explanations for the replicability crisis that rocked epidemiology, social psychology, human genetics, and other fields in the 2010s.59 Think of all the foods that are good for you which used to be bad for you, the miracle drug that turns out to work no better than the placebo, the gene for this or that trait which was really noise in the DNA, the cute studies showing that people contribute more to the coffee fund when two eyespots are posted on the wall and that they walk more slowly to the elevator after completing an experiment that presented them with words associated with old age. It’s not that the investigators faked their data. It’s that they engaged in what is now known as questionable research practices, the garden of forking paths, and p-hacking (referring to the probability threshold, p, that counts as “statistically significant”).60 Imagine a scientist who runs a laborious experiment and obtains data that are the opposite of “Eureka!” Before cutting his losses, he may be tempted to wonder whether the effect really is there, but only with the men, or only with the women, or if you throw out the freak data from the participants who zoned out, or if you exclude the crazy Trump years, or if you switch to a statistical test which looks at the ranking of the data rather than their values down to the last decimal place.
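Pinker’s tempted scientist can be simulated directly: take one dataset in which nothing is going on, walk each fork he lists (men only, women only, freak data thrown out, a rank-based test swapped in), and count how often at least one path crosses p < 0.05. A rough sketch (the forks and sample sizes are my stand-ins for his examples):

# The garden of forking paths: one null dataset, five defensible analyses.
# Declaring victory if ANY path clears p < 0.05 inflates the error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def min_p_over_forks(x, y, male):
    """Smallest p-value across several analysis choices of the same data."""
    return min(
        stats.ttest_ind(x, y).pvalue,                    # everyone
        stats.ttest_ind(x[male], y[male]).pvalue,        # only the men
        stats.ttest_ind(x[~male], y[~male]).pvalue,      # only the women
        stats.ttest_ind(x[abs(x) < 2], y[abs(y) < 2]).pvalue,  # drop "freak" data
        stats.mannwhitneyu(x, y).pvalue,                 # rank-based test instead
    )

hits, trials = 0, 2_000
for _ in range(trials):
    x, y = rng.normal(size=(2, 40))        # null: the two groups are identical
    male = rng.random(40) < 0.5
    hits += min_p_over_forks(x, y, male) < 0.05

print(f"'significant' down some path: {hits / trials:.1%} (nominal rate: 5%)")

Because every path looks defensible on its own, no single analysis is fraudulent; the inflation comes entirely from the freedom to choose after seeing the data.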

See deep learning patterning, random vs. nonrandom, 112–13 Paxton, Ken, 130–31 peace, democracy and, 264, 266, 269–72, 327 Peanuts (cartoon), 286, 298 Pearl, Judea, 261 Pearson, Egon, 221–22 peer group, expressive rationality and, 297–98 peer review, 41, 58, 160, 300–301, 316 Pence, Mike, 82–83 Pennycook, Gordon, 310–11 perceptrons. See deep learning p-hacking, 145 Pizzagate conspiracy theory, 299, 302, 304 plane crashes as risk, 33, 120, 121, 122 Plato, Euthyphro, 67 poker, 231 police and correlation–causation confusion, 260 evidence-based evaluation of, 317 killing African Americans, 123, 124–25, 141 reporting concerns to, 299, 308 policy avoiding sectarian symbolism in, 312 behavioral insights from cognitive science, 56 deliberative democracy and, 317 discounting the future, 51–52 evidence-based, 316, 317 libertarian paternalism, 56 randomized controlled trials to test, 265 rational choice axioms and, 191, 193–94 signaling equality and fairness, 165 taboo tradeoffs and, 63–64 See also government political commentary.

pages: 184 words: 46,395

The Choice Factory: 25 Behavioural Biases That Influence What We Buy
by Richard Shotton
Published 12 Feb 2018

He recruited 270 scientists to replicate 98 published psychology experiments. A worryingly low percentage produced the same results as the originals. Depending on the statistical measure used, only 36% to 47% of studies were successfully replicated. But why such a low rate? Suggestions have ranged from outright fraud to p-hacking. That’s the term for blindly testing dozens of variables in the hope that, by sheer fluke, some show statistically significant results. Nosek’s work should make us cautious about placing too much weight on results backed by only a single experiment.

How to apply this effect

1. Be sceptical, not cynical

Nosek’s series of experiments has generated much publicity.

pages: 848 words: 227,015

On the Edge: The Art of Risking Everything
by Nate Silver
Published 12 Aug 2024

Often extended outside of poker to refer to a best-in-class experience, e.g., “We took some edibles at the Sphere. So cool, man, it was the nuts.”
Occam’s razor: A heuristic, named after the fourteenth-century English philosopher William of Ockham, that simpler solutions are more likely to be true. Generally regarded highly in the River, because more complex solutions can give rise to p-hacking and overfitting, where the data is tortured to produce the desired conclusion. A related term is parsimony.
Odds: Sometimes a synonym for probability, but more precisely, odds refer to the likelihood of an event not occurring compared to it happening, expressed as a ratio. For instance, if you assign 5:1 odds against Elon Musk replying to your email, that means there are 5 times he won’t reply for every time he does, so your chances of getting a reply are 1 in 6.

p(doom): Short for “probability of doom,” one’s subjective estimate of existential risk, particularly existential risk from AI.
Pepe: A reference to Rare Pepe, a popular early NFT collection featuring Pepe the Frog, a much-memed green cartoon frog. Although the meme was co-opted by right-wing groups in the mid-2010s, its creator has disavowed that connotation and it is typically apolitical today.
p-hacking: Any of a number of dubious methods for obtaining an ostensibly statistically significant result in order to increase the chances of publication in an academic journal. The term is derived from the p-value, a measure of statistical significance in classical statistics.
Pip: The spots on a die or a playing card.
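The odds entry implies a two-line conversion that is easy to get backwards, so it may help to make it explicit (a minimal sketch; the function names are mine):

# "5:1 odds against" means 5 misses for every hit, i.e. a 1-in-6 chance.
def prob_from_odds_against(odds: float) -> float:
    return 1 / (odds + 1)

def odds_against_from_prob(p: float) -> float:
    return (1 - p) / p

print(prob_from_odds_against(5))      # 0.1666... -> the glossary's 1 in 6
print(odds_against_from_prob(1 / 6))  # 5.0 -> back to 5:1 against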

L., 72–73 Mickelson, Phil, 197n middle (sports betting), 489 Midriver, 21, 489 Miller, Ed, 134–35, 172, 177n, 186 mimetic desire, 330–31, 489 See also conformity Mindlin, Ivan “Doc,” 195 mining (crypto), 489 misclick, 489 misogyny, 68, 118–19 Mitchell, Melanie, 450, 459 mixed strategies, 58, 60, 63, 425–26, 490 Mizuhara, Ippei, 173 model mavericks/model mediators, 446–47, 490 models abstract thinking and, 23 AI existential risk and, 446–48 vs. algorithms, 478 defined, 490 sports betting and, 179–80, 182 Moneyball, 137, 145, 153, 171, 179–80, 489 moneylines (sports betting), 183, 490 Moneymaker, Chris, 12, 43, 68, 493 Monnette, John, 103–4 moral hazard, 30, 261, 490 moral philosophy consequentialism, 359, 481, 533n deontology, 359, 368, 481, 482 game theory and, 367–68 impartiality, 358–59, 360–61, 366–67, 368, 377, 487, 533n, 538n modern value proposal, 469–72 moral parliament, 364, 470 overfitting/underfitting and, 362–68 rationality, 372–73, 495 River-Village conflict and, 30–31 See also effective altruism; rationalism; utilitarianism Morgenstern, Oskar, 22, 50–51 Moritz, Michael, 247, 248, 258, 259, 265–66, 271 Moskovitz, Dustin, 338–39 Motte-and-bailey fallacy, 490 “move fast and break things,” 250, 270, 419, 490 Mowshowitz, Zvi, 370 Murray, John, 174, 177, 208 Musk, Elon AI existential risk and, 406n, 416 Sam Altman and, 406 autism and, 282, 284 competitiveness and, 25–26 cryptocurrency and, 314–15 cults of personality and, 31 culture wars and, 29 effective altruism and, 344 luck and, 278, 280 megalothymia and, 468 OpenAI founding and, 406 poker and, 251 politics and, 267n resentment and, 277, 278 risk tolerance and, 229, 247–48, 251, 252, 264–65, 299 River and, 299 River-Village conflict and, 26–27, 267n, 295 secular stagnation and, 467 mutually assured destruction (MAD), 58, 421, 424–27, 488, 490 N Nakamoto, Satoshi, 322–23, 496 narcissism, 274–75 Nash equilibrium defined, 47, 490 dominant strategies and, 55 everyday randomization and, 64 in poker, 57–58, 60, 61, 62 prisoner’s dilemma as, 54 reciprocity and, 471 in sports betting, 58–60, 508n Negreanu, Daniel, 48–49, 66–67, 99, 100, 239, 508n nerd-sniping, 490 networking, 191, 197, 333 Neumann, Adam, 30, 281, 282, 283 neural net, 433–34, 490 New York Times, The, 27, 295 Neymar, 18, 82–83 NFTs, 325–26 apeing, 480 Bored Apes, 480 bubble in, 311, 312 DAOs and, 307 defined, 325, 490 focal points and, 330–34 profitability of, 331–32, 530n nits (gambling), 9, 114, 482, 490 Nitsche, Dominik, 49 nodes, 490 normal distribution, 491 nosebleed gambling, 491 NOT INVESTMENT ADVICE, 491 Noyce, Robert, 257 NPC (nonplayer character) syndrome, 378–79, 490 nuclear existential risk, 407, 420–30 Bayesian reasoning on, 423 game theory and, 58, 328, 420–21, 424, 426, 483 Kelly criterion and, 408–9 mutually assured destruction and, 58, 421, 424–27, 488, 490 nuclear proliferation and, 421, 540n odds of, 422–24 rationality and, 427–28 societal institutions and, 250, 456 stability-instability paradox and, 425 technological Richter scale and, 449 nuts (poker), 491 O Obama, Barack, 267 Occam’s razor, 491 Ocean’s 11, 142 Ohtani, Shohai, 173 Old Man Coffee (OMC), 491 O’Leary, Kevin, 301 “Ones Who Walk Away from Omelas, The” (Le Guin), 454n OpenAI AI breakthrough and, 415 attempt to fire Altman, 408, 411, 452n founding of, 406–7, 414 River-Village conflict and, 27 Oppenheimer, Robert, 407, 421, 425 optimism, 407–8, 413–14, 539n See also “Techno-Optimist Manifesto” optionality, 76–77, 99n, 116, 470, 491 options trading, 318–21 Ord, Toby, 352, 369–70, 380, 443 
order of magnitude, 491 originating (sports betting), 491 orthogonality thesis, 418, 491 Oster, Emily, 348–49 outliers, 491 outs (poker), 491 outside view, 492 overbet (poker), 492 overdetermined, 492 overfitting/underfitting, 361, 361, 362–68, 492 P p(doom), 369, 372, 375–76, 380, 401, 408–9, 412, 416, 419, 442, 444–46, 455, 458, 492 See also existential risk Page, Larry, 259, 406 Palihapitiya, Chamath, 272, 277, 280 paper clip thought experiment, 372, 402, 418, 442, 487, 491 parameters, 491 Pareto optimal solutions, 492 Parfit, Derek, 364–65, 443–44n, 495 parlay, 492 Pascal, Blaise, 22, 457n, 492 Pascal’s Mugging, 22, 457n, 492 Pascal’s Wager, 457n, 492 patience, 258, 259, 260 payoff matrix, 492 Peabody, Rufus, 178–80, 181, 182–83, 191, 193, 195, 204, 517n Pepe, 492 Perkins, Bill, 374–75 Persinger, LoriAnn, 118 Petrov, Stanislav, 424, 426 p-hacking, 492 physical risk-takers, 217–21 Piper, Kelsey, 505n pips, 492 pit boss, 493 pits, 493 See also table games plurality, 470–71, 493 plus EV, 493 See also EV maximizing pocket pair, 41, 493 point-spread betting, 183–84, 493 poker abstract thinking and, 23–24 abuse and, 118 AI and, 40, 46–48, 60–61, 430–33, 437, 439, 507n asymmetric odds and, 248–49 attention to detail and, 233–34 bluffing, 39–40, 51, 70–75, 77, 78, 101, 509n calmness and, 221 cheating, 84, 85–86, 123–24, 126–28, 512n competitiveness and, 112, 118, 120, 243 concrete learning and, 432 corporatization of, 43–44 courage and, 222–23 deception and, 60 degens and nits, 9, 114, 482 edge and, 22, 63, 86 effective altruism and, 347–48, 367 estimation ability and, 237–38 fictional portrayals of, 45, 112, 134, 333, 487 game theory development and, 22, 50–51 game trees in, 61, 508n Garrett-Robbi hand, 80–86, 89, 117, 123–29, 130, 444–45, 512n gender and, 70, 82, 84, 100, 117–19, 511n Hellmuth’s career, 97–100 high-stakes cash games, 83–84, 115, 251–52 innovations in, 45–46 lack of money drive and, 243 language and, 439–40 mixed strategies and, 60, 63, 425–26 models in, 23–24 money and, 108–11, 120–21, 511n Elon Musk’s strategy, 251 origins of, 40 personality and, 111–17, 129–30 PokerGO studio, 48–49 post-oak bluffing, 64–65 prediction markets and, 370–71 preparation and, 233 prisoner’s dilemma and, 56–57, 508n privilege and, 82–83, 120–21 probabilistic thinking and, 41, 104–5, 127, 154n, 237 process-oriented thinking and, 226–27 race and, 118, 120, 121–22 raise-or-fold attitude and, 229–30 randomization and, 57–58, 63 regulation of, 13 scientific approach to, 41, 42–43 strategic empathy and, 225 tells, 7–8, 88, 99–104, 118, 233–34, 238, 437, 498 tournaments, 6, 7–8, 56, 154n, 503n variance and, 105, 106–11, 112 See also exploitative strategies; risk impact; solvers (poker); World Series of Poker Poker Boom (2004–2007), 12–13, 68, 315, 493 PokerGO studio, 48–49, 73, 77 polarized vs. 
condensed ranges (poker), 493 politics, 14–17 AI existential risk and, 458, 541n analytics and, 254 contrarianism and, 242, 254n decoupling and, 25, 27 effective altruism/rationalism and, 377–78 election forecasting, 13–14, 16–17, 27, 137, 182n, 433, 448n EV maximizing and, 14–15 expertise and, 272 gambling and, 17, 504n NFTs and, 326 prediction markets and, 373, 374–75, 535n probabilistic thinking and, 15, 17 reference classes and, 448n River-Village conflict and, 27–28, 29, 30, 267–68, 271, 505–6n SBF and, 26, 341n, 342 Village and, 26, 267–68, 271 Polk, Doug, 65–67 polymaths, 493 See also fox/hedgehog model Ponzi schemes, 309, 337, 493 Population Bomb, The (Ehrlich), 412n, 463 Porter, Jontay, 173, 177 position (poker), 493 posthumanism, 499 Postle, Mike, 84 post-oak bluffing, 64–65 pot-committed (poker), 493 Pot-Limit Omaha (PLO), 487, 493 Poundstone, William, 396 Power Law, The (Mallaby), 286 precautionary principle, 493 Precipice, The (Ord), 369–70, 443 prediction markets, 369–75, 380, 493, 535n preflop (poker), 41, 493 preparation, 232–33 price discovery, 493 priors, 493–94; see also Bayes’ theorem prisoner’s dilemma, 52–57 AI existential risk and, 417 arms race and, 478 cryptocurrency and, 315–18 defined, 494, 507–8n dominant strategies and, 54–55 poker and, 56–57, 508n reciprocity and, 367–68 regulation and, 144 sports betting and, 205 trust and, 472 updated version of, 52–54, 53 probabilistic thinking AI and, 439 AI existential risk and, 445–46 asymmetric odds and, 255 vs. determinism, 253–55, 264, 482 distribution, 9, 491 effective altruism and, 367 importance of, 15–16 poker and, 41, 104–5, 127, 154n, 237 politics and, 15, 17 prediction markets, 369–75, 493, 535n slots and, 153–55, 155 sports betting and, 16–17 theory invention, 22 See also EV maximizing probability distribution, 494 process-oriented thinking, 180, 226–27, 495 Professional Blackjack (Wong), 136 progress studies, 494 prop bets, 180, 182–83, 494 prospect theory, 428n, 494 provenance, 494 public (sports betting), 494 pump-and-dump, 494 punt (poker), 494 pure strategy, 59, 494 push (sports betting), 494 pushing the button, 494 See also existential risk Putin, Vladimir, 421–22, 424, 425 put options, 480 Q quantification, 345–51, 352, 359–60, 364 quants, 494 quantum mechanics, 253n Quit (Duke), 90, 230 Qureshi, Haseeb, 338 R Rabois, Keith, 284–85, 286–87 race casinos and, 135–36, 513n poker and, 118, 120, 121–22 River and, 29, 506n VC discrimination and, 287–90 Rain Man, 136 raise-or-fold situation, 229–31, 494 rake (casino poker), 494 Ralston, Jon, 147 randomization, 57–58, 59–60, 63, 64, 426, 438, 494 See also variance range (poker), 494 rationalism AI existential risk and, 21, 457 defined, 352–53, 354, 495 effective hedonism and, 376 futurism and, 379 impartiality and, 377 politics and, 17, 377–78 prediction markets and, 369, 372–73, 380 River and, 343 tech sector and, 21 Upriver and, 20 utilitarianism and, 364 varying streams of, 355–56, 356, 380–81, 533n wealthy elites and, 344 rationality, 17, 54, 372–73, 427–28, 495 Rawls, John, 364 Ray, John J., III, 301–2, 303 rec (recreational) players, 495 reciprocity, 130, 367–68, 471–72, 495 reference classes, 448, 450, 452, 457, 495 regression analysis, 23, 495 regulation AI, 270, 458, 541n casinos, 134, 135, 143–44, 157, 513n, 514n poker and, 13 River-Village conflict and, 31 Silicon Valley, 269–70, 272 regulatory capture, 31, 269, 270, 495 reinforcement learning from human feedback (RLHF), 440–41, 442, 495 Reinkemeier, Tobias, 102–3 replication crisis, 179, 497 
Repugnant Conclusion, 364–65, 403, 495 resilience, 116–17 results-oriented thinking, 495 retail bookmakers, 186–90, 187, 489, 518n return on investment (ROI), 477, 495 revealed preference, 495 revenge, 428–30 Rhodes, Richard, 418n, 496 risk aversion, 137, 268, 277, 427–28, 495 risk ignorance, 247–48, 264, 265, 266 risk impact, 87–97 attention to detail and, 235 Coates on, 89–91, 125 flow state and, 88, 93–94, 95, 126 Garrett-Robbi hand and, 125–26 sports and, 94 tells and, 88 Tendler on, 91–94, 125 risk-loving disposition, 495 risk-neutral disposition, 495 risk of ruin, 495 risk-taker attributes, 23–26, 217–18, 221–43 adaptability, 235–37 asymmetric odds, 248, 259, 260–62 attention to detail, 233–35 calmness, 221–22 courage, 222–24 estimation ability, 237–38 fragile ego, 223 independence, 31, 239–40, 249, 268 lack of money drive, 242–43 patience, 258, 259–60, 260 preparation, 232–33 process-oriented thinking, 226–27, 495 raise-or-fold attitude, 229–31 resentment and, 223, 277 risk tolerance, 26, 30, 227–29 strategic empathy, 224–25 venture capital and, 248–49 See also contrarianism risk tolerance consequences and, 30 COVID-19 and, 6–7, 8–9, 10, 10 decision science on, 427–28 degens and nits, 9, 114, 482 founders and, 247–48, 251, 252, 264–65, 337, 403 gender and, 120 insufficiency of, 90 life expectancy and, 10–11 luck and, 116 Elon Musk and, 229, 247–48, 251, 252, 264–65, 299 poker and, 113–14 as River attribute, 26, 30, 227–29 River-Village conflict and, 29, 30 SBF and, 334–35, 397–403, 537–38n slots and, 168 sports betting and, 179, 196 statistical distribution and, 9 table games and, 165–66 venture capital and, 249, 264 Village and, 137 See also physical risk-takers River, the Archipelago, 22, 310, 478 autism and, 282–84, 525n collegiality within, 249–50 concrete learning and, 432n cultural domination of, 137–38 decoupling and, 24–25, 26, 27, 352, 505n defined, 495 demographics of, 29, 506n effective altruism and, 343 fictional portrayals of, 112 gender and, 29, 117, 506n Las Vegas veneration of, 139 map of, 18, 19, 20–26 megalothymia and, 468 name of, 18, 42, 504n obsession and, 196 prediction markets and, 371–72, 493 process-oriented thinking and, 495 quantification and, 352 race and, 29, 506n rationalism and, 343 SBF’s presence in, 299 self-awareness and, 417 venture capital and, 249–50 See also risk-taker attributes; River-Village conflict river (poker), 42, 495 River-Village conflict, 26–31 culture wars and, 29, 272–73 decoupling and, 27, 482 higher education and, 294–96 moral hazard and, 30 moral philosophy and, 30–31 politics and, 27–28, 29, 30, 267–68, 271, 505–6n regulatory capture and, 31, 269 risk aversion and, 493 Silicon Valley and, 26, 267–75, 290, 295, 505n RLHF (reinforcement learning from human feedback), 440–41, 442, 495 Robins, Jason, 184, 186 robustness, 495 Rock, Arthur, 257, 296 rock paper scissors, 47, 58 Roffman, Marvin, 151 Rogers, Kenny, 229 roon, 410–13, 417, 442, 443, 452, 459–60, 501, 539n Rounders, 45, 112, 134, 333, 487 Rousseau, Jean-Jacques, 54 Roxborough, Roxy, 178 rug pull (crypto), 496 rule utilitarianism, 368, 500 Rumbolz, Mike, 138, 142, 153, 167, 186 running good/rungood, 496 Russell, Stuart, 441 Russian roulette, 496 r/wallstreetbets, 314–15, 317–18, 321, 411, 489, 496 Ryder, Nick, 415n, 430–31, 433, 479 S Sagan, Scott, 425, 426 Sagbigsal, Bryan, 127 Saltz, Jerry, 329, 331n, 484 sample size, 496 sampling error, 489 Sassoon, Danielle, 401 Satoshi (cryptocurrency), 496 SBF.

pages: 318 words: 73,713

The Shame Machine: Who Profits in the New Age of Humiliation
by Cathy O'Neil
Published 15 Mar 2022

Quincy Bioscience Holding Company, Inc., case 17-3745, March 5, 2018, https://www.ftc.gov/system/files/documents/cases/quincy_bioscience_ca2_ftc_brief_special_appendix_2018-0228.pdf. According to the FTC's complaint against Prevagen: Megan L. Head et al., "The Extent and Consequences of P-Hacking in Science," PLOS Biology 13, no. 3 (2015), https://doi.org/10.1371/journal.pbio.1002106. They showed no statistically significant improvement: Federal Trade Commission, case 17-3745. The psychologist Donna Hicks: D.

pages: 302 words: 85,877

Cult of the Dead Cow: How the Original Hacking Supergroup Might Just Save the World
by Joseph Menn
Published 3 Jun 2019

“I took my stupidity very seriously, and chafed under the oppressive hierarchy of the Informed Aristocracy,” he wrote. “Before cDc, there were the Elites and the Losers. It was a simple, feudal, pre-pubescent system of class discrimination, based on connections (primarily) and knowledge or experience in the h/p [hacking/phreaking] arts.… cDc was really a liberating force.” After a while, Kevin and Bill decided the group couldn’t be all ridiculous humor and overwrought exhortations, that it needed some hacker credibility. And so it was that the decidedly untechnical Bill went to the Texas Tech library, studied a book on Unix operating systems, and posted a decent summary of software commands that continued to circulate online for years.

pages: 404 words: 92,713

The Art of Statistics: How to Learn From Data
by David Spiegelhalter
Published 2 Sep 2019

Any amount of tweaking is fine in exploratory studies, but confirmatory studies should be carried out according to a pre-specified, and preferably public, protocol. Each can use P-values to summarize the strength of evidence for their conclusions, but these P-values should be clearly distinguished and interpreted very differently. Activities that are intended to create statistically significant results have come to be known as ‘P-hacking’, and although the most obvious technique is to carry out multiple tests and report the most significant, there are many more subtle ways in which researchers can exercise their degrees of freedom. Does listening to the Beatles’ song ‘When I’m Sixty-Four’ make you younger? You might feel fairly confident about the correct answer to this question.
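A minimal simulation makes the multiple-testing point concrete. The sketch below is an illustration, not anything from Spiegelhalter's book: every dataset is generated with no real effect, a single pre-specified test is compared against the practice of running twenty analyses and reporting the smallest P-value, and all parameter choices (twenty tests, samples of thirty, a 0.05 threshold) are assumptions picked for the demonstration.

import math
import random
import statistics

def two_sample_p(a, b):
    # Two-sided P-value from a two-sample z-test (normal approximation).
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
n_studies, n_tests = 1000, 20      # illustrative assumptions
honest = hacked = 0
for _ in range(n_studies):
    pvals = []
    for _ in range(n_tests):
        a = [random.gauss(0, 1) for _ in range(30)]
        b = [random.gauss(0, 1) for _ in range(30)]   # same distribution: the null is true
        pvals.append(two_sample_p(a, b))
    honest += pvals[0] < 0.05      # the one pre-specified analysis
    hacked += min(pvals) < 0.05    # report the best of twenty analyses

print(f"pre-specified test 'significant': {honest / n_studies:.0%}")
print(f"best of {n_tests} 'significant':  {hacked / n_studies:.0%}")

Run under these assumptions, the pre-specified test comes out 'significant' in about 5% of studies, exactly as the threshold promises, while taking the best of twenty crosses P < 0.05 in roughly 1 - 0.95^20, about 64%, of them; that gap is why exploratory tweaking and confirmatory protocols have to be kept apart.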

pages: 292 words: 106,826

Boom: Bubbles and the End of Stagnation
by Byrne Hobart and Tobias Huber
Published 29 Oct 2024

Over the past decade and a half, we have witnessed the eruption of a reproducibility crisis, starting with the 2005 publication of John Ioannidis’s landmark paper “Why Most Published Research Findings Are False.” The paper showed that when the design and publication of a study is biased toward positive results—which is almost invariably the case—most of the results that get published are false. The culprit? “P-hacking,” whereby data is manipulated to make patterns appear statistically significant. The reproducibility crisis identified in Ioannidis’s paper has since been confirmed by myriad empirical studies. Almost every scientific field has been affected, from clinical trials in medicine to research in bioinformatics, neuroimaging, cognitive science, epidemiology, economics, political science, psychiatry, education, sociology, computer science, machine learning, and AI.
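The arithmetic behind that claim is compact enough to show. What follows is a hedged sketch of the Bayesian bookkeeping in Ioannidis's paper, with illustrative inputs rather than his published figures: assume one in ten tested hypotheses is true, 50% statistical power, a 0.05 significance threshold, and a bias term, loosely modeled on the paper's parameter u, giving the fraction of would-be null results that p-hacking converts into positives.

def positive_predictive_value(prior, power, alpha, bias=0.0):
    # Share of 'significant' published findings that are actually true.
    # bias: fraction of analyses that would have been non-significant
    # but are nudged over the threshold (0.0 means no p-hacking).
    true_pos = prior * power + prior * (1 - power) * bias
    false_pos = (1 - prior) * alpha + (1 - prior) * (1 - alpha) * bias
    return true_pos / (true_pos + false_pos)

print(positive_predictive_value(prior=0.10, power=0.5, alpha=0.05))             # ~0.53
print(positive_predictive_value(prior=0.10, power=0.5, alpha=0.05, bias=0.2))   # ~0.22

Under these assumed numbers, barely half of the positive findings are true even with no manipulation at all; let p-hacking flip just a fifth of the null results and the proportion falls to roughly one in five, which is the mechanism by which most published findings can be false.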

pages: 402 words: 129,876

Bad Pharma: How Medicine Is Broken, and How We Can Fix It
by Ben Goldacre
Published 1 Jan 2012

Darby S, McGale P, Correa C, Taylor C, Arriagada R, Clarke M, Cutter D, Davies C, Ewertz M, Godwin J, Gray R, Pierce L, Whelan T, Wang Y, Peto R.Albain K, Anderson S, Arriagada R, Barlow W, Bergh J, Bliss J, Buyse M, Cameron D, Carrasco E, Clarke M, Correa C, Coates A, Collins R, Costantino J, Cutter D, Cuzick J, Darby S, Davidson N, Davies C, Davies K, Delmestri A, Di Leo A, Dowsett M, Elphinstone P, Evans V, Ewertz M, Gelber R, Gettins L, Geyer C, Goldhirsch A, Godwin J, Gray R, Gregory C, Hayes D, Hill C, Ingle J, Jakesz R, James S, Kaufmann M, Kerr A, MacKinnon E, McGale P, McHugh T, Norton L, Ohashi Y, Paik S, Pan HC, Perez E, Peto R, Piccart M, Pierce L, Pritchard K, Pruneri G, Raina V, Ravdin P, Robertson J, Rutgers E, Shao YF, Swain S, Taylor C, Valagussa P, Viale G, Whelan T, Winer E, Wang Y, Wood W, Abe O, Abe R, Enomoto K, Kikuchi K, Koyama H, Masuda H, Nomura Y, Ohashi Y, Sakai K, Sugimachi K, Toi M, Tominaga T, Uchino J, Yoshida M, Haybittle JL, Leonard CF, Calais G, Geraud P, Collett V, Davies C, Delmestri A, Sayer J, Harvey VJ, Holdaway IM, Kay RG, Mason BH, Forbes JF, Wilcken N, Bartsch R, Dubsky P, Fesl C, Fohler H, Gnant M, Greil R, Jakesz R, Lang A, Luschin-Ebengreuth G, Marth C, Mlineritsch B, Samonigg H, Singer CF, Steger GG, Stöger H, Canney P, Yosef HM, Focan C, Peek U, Oates GD, Powell J, Durand M, Mauriac L, Di Leo A, Dolci S, Larsimont D, Nogaret JM, Philippson C, Piccart MJ, Masood MB, Parker D, Price JJ, Lindsay MA, Mackey J, Martin M, Hupperets PS, Bates T, Blamey RW, Chetty U, Ellis IO, Mallon E, Morgan DA, Patnick J, Pinder S, Olivotto I, Ragaz J, Berry D, Broadwater G, Cirrincione C, Muss H, Norton L, Weiss RB, Abu-Zahra HT, Portnoj SM, Bowden S, Brookes C, Dunn J, Fernando I, Lee M, Poole C, Rea D, Spooner D, Barrett-Lee PJ, Mansel RE, Monypenny IJ, Gordon NH, Davis HL, Cuzick J, Lehingue Y, Romestaing P, Dubois JB, Delozier T, Griffon B, Mace Lesec'h J, Rambert P, Mustacchi G, Petruzelka, Pribylova O, Owen JR, Harbeck N, Jänicke F, Meisner C, Schmitt M, Thomssen C, Meier P, Shan Y, Shao YF, Wang X, Zhao DB, Chen ZM, Pan HC, Howell A, Swindell R, Burrett JA, Clarke M, Collins R, Correa C, Cutter D, Darby S, Davies C, Davies K, Delmestri A, Elphinstone P, Evans V, Gettins L, Godwin J, Gray R, Gregory C, Hermans D, Hicks C, James S, Kerr A, MacKinnon E, Lay M, McGale P, McHugh T, Sayer J, Taylor C, Wang Y, Albano J, de Oliveira CF, Gervásio H, Gordilho J, Johansen H, Mouridsen HT, Gelman RS, Harris JR, Hayes D, Henderson C, Shapiro CL, Winer E, Christiansen P, Ejlertsen B, Ewertz M, Jensen MB, Møller S, Mouridsen HT, Carstensen B, Palshof T, Trampisch HJ, Dalesio O, de Vries EG, Rodenhuis S, van Tinteren H, Comis RL, Davidson NE, Gray R, Robert N, Sledge G, Solin LJ, Sparano JA, Tormey DC, Wood W, Cameron D, Chetty U, Dixon JM, Forrest P, Jack W, Kunkler I, Rossbach J, Klijn JG, Treurniet-Donker AD, van Putten WL, Rotmensz N, Veronesi U, Viale G, Bartelink H, Bijker N, Bogaerts J, Cardoso F, Cufer T, Julien JP, Rutgers E, van de Velde CJ, Cunningham MP, Huovinen R, Joensuu H, Costa A, Tinterri C, Bonadonna G, Gianni L, Valagussa P, Goldstein LJ, Bonneterre J, Fargeot P, Fumoleau P, Kerbrat P, Luporsi E, Namer M, Eiermann W, Hilfrich J, Jonat W, Kaufmann M, Kreienberg R, Schumacher M, Bastert G, Rauschecker H, Sauer R, Sauerbrei W, Schauer A, Schumacher M, Blohmer JU, Costa SD, Eidtmann H, Gerber G, Jackisch C, Loibl S, von Minckwitz G, de Schryver A, Vakaet L, Belfiglio M, Nicolucci A, Pellegrini F, Pirozzoli MC, Sacco M, Valentini M, McArdle CS, Smith DC, 
Stallard S, Dent DM, Gudgeon CA, Hacking A, Murray E, Panieri E, Werner ID, Carrasco E, Martin M, Segui MA, Galligioni E, Lopez M, Erazo A, Medina JY, Horiguchi J, Takei H, Fentiman IS, Hayward JL, Rubens RD, Skilton D, Scheurlen H, Kaufmann M, Sohn HC, Untch M, Dafni U, Markopoulos C, Dafni D, Fountzilas G, Mavroudis D, Klefstrom P, Saarto T, Gallen M, Margreiter R, de Lafontan B, Mihura J, Roché H, Asselain B, Salmon RJ, Vilcoq JR, Arriagada R, Bourgier C, Hill C, Koscielny S, Laplanche A, Lê MG, Spielmann M, A'Hern R, Bliss J, Ellis P, Kilburn L, Yarnold JR, Benraadt J, Kooi M, van de Velde AO, van Dongen JA, Vermorken JB, Castiglione M, Coates A, Colleoni M, Collins J, Forbes J, Gelber RD, Goldhirsch A, Lindtner J, Price KN, Regan MM, Rudenstam CM, Senn HJ, Thuerlimann B, Bliss JM, Chilvers CE, Coombes RC, Hall E, Marty M, Buyse M, Possinger K, Schmid P, Untch M, Wallwiener D, Foster L, George WD, Stewart HJ, Stroner P, Borovik R, Hayat H, Inbar MJ, Robinson E, Bruzzi P, Del Mastro L, Pronzato P, Sertoli MR, Venturini M, Camerini T, De Palo G, Di Mauro MG, Formelli F, Valagussa P, Amadori D, Martoni A, Pannuti F, Camisa R, Cocconi G, Colozza A, Passalacqua R, Aogi K, Takashima S, Abe O, Ikeda T, Inokuchi K, Kikuchi K, Sawa K, Sonoo H, Korzeniowski S, Skolyszewski J, Ogawa M, Yamashita J, Bastiaannet E, van de Velde CJ, van de Water W, van Nes JG, Christiaens R, Neven P, Paridaens R, Van den Bogaert W, Braun S, Janni W, Martin P, Romain S, Janauer M, Seifert M, Sevelda P, Zielinski CC, Hakes T, Hudis CA, Norton L, Wittes R, Giokas G, Kondylis D, Lissaios B, de la Huerta R, Sainz MG, Altemus R, Camphausen K, Cowan K, Danforth D, Lichter A, Lippman M, O'Shaughnessy J, Pierce LJ, Steinberg S, Venzon D, Zujewski JA, D'Amico C, Lioce M, Paradiso A, Chapman JA, Gelmon K, Goss PE, Levine MN, Meyer R, Parulekar W, Pater JL, Pritchard KI, Shepherd LE, Tu D, Whelan T, Nomura Y, Ohno S, Anderson A, Bass G, Brown A, Bryant J, Costantino J, Dignam J, Fisher B, Geyer C, Mamounas EP, Paik S, Redmond C, Swain S, Wickerham L, Wolmark N, Baum M, Jackson IM, Palmer MK, Perez E, Ingle JN, Suman VJ, Bengtsson NO, Emdin S, Jonsson H, Del Mastro L, Venturini M, Lythgoe JP, Swindell R, Kissin M, Erikstein B, Hannisdal E, Jacobsen AB, Varhaug JE, Erikstein B, Gundersen S, Hauer-Jensen M, Høst H, Jacobsen AB, Nissen-Meyer R, Blamey RW, Mitchell AK, Morgan DA, Robertson JF, Ueo H, Di Palma M, Mathé G, Misset JL, Levine M, Pritchard KI, Whelan T, Morimoto K, Sawa K, Takatsuka Y, Crossley E, Harris A, Talbot D, Taylor M, Martin AL, Roché H, Cocconi G, di Blasio B, Ivanov V, Paltuev R, Semiglazov V, Brockschmidt J, Cooper MR, Falkson CI, A'Hern R, Ashley S, Dowsett M, Makris A, Powles TJ, Smith IE, Yarnold JR, Gazet JC, Browne L, Graham P, Corcoran N, Deshpande N, di Martino L, Douglas P, Hacking A, Høst H, Lindtner A, Notter G, Bryant AJ, Ewing GH, Firth LA, Krushen-Kosloski JL, Nissen-Meyer R, Anderson H, Killander F, Malmström P, Rydén L, Arnesson LG, Carstensen J, Dufmats M, Fohlin H, Nordenskjöld B, Söderberg M, Carpenter JT, Murray N, Royle GT, Simmonds PD, Albain K, Barlow W, Crowley J, Hayes D, Gralow J, Green S, Hortobagyi G, Livingston R, Martino S, Osborne CK, Adolfsson J, Bergh J, Bondesson T, Celebioglu F, Dahlberg K, Fornander T, Fredriksson I, Frisell J, Göransson E, Iiristo M, Johansson U, Lenner E, Löfgren L, Nikolaidis P, Perbeck L, Rotstein S, Sandelin K, Skoog L, Svane G, af Trampe E, Wadström C, Castiglione M, Goldhirsch A, Maibach R, Senn HJ, Thürlimann B, Hakama M, Holli K, Isola J, Rouhento 
K, Saaristo R, Brenner H, Hercbergs A, Martin AL, Roché H, Yoshimoto M, Paterson AH, Pritchard KI, Fyles A, Meakin JW, Panzarella T, Pritchard KI, Bahi J, Reid M, Spittle M, Bishop H, Bundred NJ, Cuzick J, Ellis IO, Fentiman IS, Forbes JF, Forsyth S, George WD, Pinder SE, Sestak I, Deutsch GP, Gray R, Kwong DL, Pai VR, Peto R, Senanayake F, Boccardo F, Rubagotti A, Baum M, Forsyth S, Hackshaw A, Houghton J, Ledermann J, Monson K, Tobias JS, Carlomagno C, De Laurentiis M, De Placido S, Williams L, Hayes D, Pierce LJ, Broglio K, Buzdar AU, Love RR, Ahlgren J, Garmo H, Holmberg L, Liljegren G, Lindman H, Wärnberg F, Asmar L, Jones SE, Gluz O, Harbeck N, Liedtke C, Nitz U, Litton A, Wallgren A, Karlsson P, Linderholm BK, Chlebowski RT, Caffier H.