model building

183 results

pages: 134 words: 43,617

Double Helix
by James D. Watson and Gunther S. Stent
Published 1 Jan 1968

Even the presence of Peter, saying he felt sure his father would soon spring into action, failed to ruffle Maurice’s plans. Again he emphasized that he wanted to put off more model building until Rosy was gone, six weeks from then. Francis seized the occasion to ask Maurice whether he would mind if we started to play about with DNA models. When Maurice’s slow answer emerged as no, he wouldn’t mind, my pulse rate returned to normal. For even if the answer had been yes, our model building would have gone ahead.

25

The next few days saw Francis becoming increasingly agitated by my failure to stick close to the molecular models.

Maurice had told Francis, however, that the diameter of the DNA molecule was thicker than would be the case if only one polynucleotide (a collection of nucleotides) chain were present. This made him think that the DNA molecule was a compound helix composed of several polynucleotide chains twisted about each other. If true, then before serious model building began, a decision would have to be made whether the chains would be held together by hydrogen bonds or by salt linkages involving the negatively charged phosphate groups. A further complication arose from the fact that four types of nucleotides were found in DNA. In this sense, DNA was not a regular molecule but a highly irregular one.

Given this conclusion, Maurice suspected that three polynucleotide chains were used to construct the helix. He did not, however, share our belief that Pauling’s model-building game would quickly solve the structure, at least not until further X-ray results were obtained. Most of our conversation, instead, centered on Rosy Franklin. More trouble than ever was coming from her. She was now insisting that not even Maurice himself should take any more X-ray photographs of DNA.

[Figure caption: The chemical structures of the four DNA bases as they were often drawn about 1951. Because the electrons in the five- and six-membered rings are not localized, each base has a planar shape with a thickness of 3.4 Å.]

pages: 257 words: 13,443

Statistical Arbitrage: Algorithmic Trading Insights and Techniques
by Andrew Pole
Published 14 Sep 2007

Again, superficially only. Our participation in the market is not accounted for in the model building process. A buy order from us adds to demand, dragging up price; the opposite for a sell order. Thus, our own trading introduces a force against us in the market. So our forecasts are really only valid providing we do not act on them and participate in the market. One might ask that, since the goal is to build a forecast model that is exploitable, why not include the information that the forecasts will be traded into the model building? The short—and probably also the long—answer to that is that it is just too difficult.

Statistical Arbitrage describes the phenomena, the driving forces generating those phenomena, the patterns of dynamic development of exploitable opportunities, and models for exploitation of the basic reversion to the mean in securities prices. It also offers a good deal more, from hints at more sophisticated models to valuable practical advice on model building and performance monitoring—advice applicable far beyond statistical arbitrage. Chapters 1 and 2 speak to the genesis of statistical arbitrage, the venerable pairs trading schemes of the 1980s, with startling illustration of the enormous extent and productivity of the opportunities. This demonstration sets the scene for theoretical development, providing the first step to critical understanding of practical exploitation with rules for calibrating trade placement.

“The completed star catalogue tripled the number of entries in the sky atlas Tycho Brahe had compiled at Uraniborg in Denmark, and improved the precision of the census by several orders of magnitude.” In Longitude by Dava Sobel.

The questions are unanswerable here. One cannot offer a philosophy or sociology of finance. But one can strive for scientific rigor in data analysis, hypothesis positing, model building, and testing. That rigor is the basis of any belief one can claim for the validity of understanding and coherent actions in exploiting emergent properties of components of the financial emporium. This volume presents a critical analysis of what statistical arbitrage is: a formal theoretical underpinning for the existence of opportunities and quantification thereof, and an explication of the enormous shifts in the structure of the U.S. economy reflected in financial markets with specific attention on the dramatic consequences for arbitrage possibilities.

Statistics in a Nutshell
by Sarah Boslaugh
Published 10 Nov 2012

Methods for Building Regression Models

We’ve been looking at fairly simple regression models, but often the model-building process begins with 10, 20, or even more predictor variables under consideration for inclusion, and even with a smaller number of predictors, you might want to use a formal model-building process. Many statistical packages include several choices of algorithms for model-building, and in some systems, you can combine different methods or algorithms within the same model. There are two categories of model building: stepwise methods for considering predictors for inclusion and exclusion and blocking methods to designate which predictors should be considered for inclusion in a given step.
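As a rough illustration of the stepwise idea described in this excerpt, the sketch below runs a forward selection loop in plain Python. It is not the procedure of any particular statistical package: the helper names (`solve`, `rss`, `forward_stepwise`), the toy data, and the stopping rule (stop when the best candidate reduces the residual sum of squares by less than 5% of the total) are all assumptions made for the example. Real packages typically use F-tests, p-values, or information criteria instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def forward_stepwise(predictors, y, min_gain=0.05):
    """Greedily add the predictor that most reduces RSS.

    Stops when the best remaining candidate improves RSS by less than
    min_gain times the intercept-only RSS (an illustrative rule, not a standard one).
    """
    total = rss([], y)
    chosen, current = [], total
    remaining = dict(predictors)
    while remaining:
        scores = {name: rss([predictors[c] for c in chosen] + [col], y)
                  for name, col in remaining.items()}
        best = min(scores, key=scores.get)
        if current - scores[best] < min_gain * total:
            break
        chosen.append(best)
        current = scores[best]
        del remaining[best]
    return chosen


# Toy data (made up): y is roughly 3 + 2 * x1; x2 and x3 are unrelated to y.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 0, 1, 0, 1, 0],
    "x3": [5, 3, 8, 1, 9, 2],
}
y = [5.1, 6.8, 9.1, 11.0, 12.9, 15.1]
selected = forward_stepwise(predictors, y)  # -> ['x1']
```

Because the loop only ever looks at the data in hand, it is exposed to the "fishing expedition" criticism that applies to any purely automated selection, which is why the choice of stopping rule matters.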

This type of model is particularly useful in observational studies in which you can’t use random assignment to attempt to control the influence of variables (such as demographics) that might be related to your outcome. Blocking can also be combined with automated model building because it is possible to use one automated method in one block and another (or no automated method) in another block. Continuing with the preceding example, you might have measurements of a number of demographic characteristics and not be sure which are most useful in explaining variance in your model. If it is acceptable in your field to use automated processes of model building, you could enter all the demographics in a single block and let the algorithm decide which are most useful in explaining the variance in your outcome variable.

However, many statisticians frown on building models based purely on a given data set, deeming it “going on a fishing expedition” and, when nonlinear regression is involved, arbitrary curve-fitting. We discussed the dangers of mechanical model-building in Chapter 10, but the cautions apply even more here because you are not simply adding and subtracting predictor variables but also changing their form. However, this type of model building is acceptable in some fields, so if that is the case in your workplace or school, there’s no reason you shouldn’t take advantage of all the possibilities offered by modern computer packages. Some statistical packages allow you to request that all possible linear and nonlinear relationships between two variables be calculated, and then you can simply select the one that does the best job of explaining the data.

pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006
by Ben Goertzel and Pei Wang
Published 1 Jan 2007

Looks / Program Evolution for General Intelligence focused on the symbolic regression domain; I am not aware of any results for the artificial ant problem. The final contributor to consider is the sampling mechanism (knowledge-driven knob-creation followed by probabilistic model-building). We can test to what extent model-building contributes to the bottom line by simply disabling it and assuming probabilistic independence between all knobs. The result here is of interest because model-building can be quite expensive (O(n^2 N) per generation, where n is the problem size and N is the population size [footnote 12]). In 50 independent runs of MOSES without model-building, a global optimum was still discovered in all runs.

However, the variance in the number of evaluations required was much higher (in two cases over 100,000 evaluations were needed). The new average was 26,355 evaluations to reach an optimum (about 3.5 times more than required with model-building). The contribution of model-building to the performance of MOSES is expected to be even greater for more difficult problems. Applying MOSES without model-building (i.e., a model assuming no interactions between variables) is a way to test the combination of representation-building with an approach resembling the probabilistic incremental program learning (PIPE) [29] algorithm, which learns programs based on a probabilistic model without any interactions.
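To make "a model assuming no interactions between variables" concrete, here is a minimal sketch of a univariate estimation-of-distribution loop on bit strings. It is not MOSES, hBOA, or PIPE: the function names, the OneMax toy scoring function, and every parameter value are assumptions chosen for illustration. Each "knob" is a single bit with its own independent probability, which is the kind of model that remains when dependency learning is switched off.

```python
import random


def onemax(bits):
    """Toy scoring function (an assumption for this sketch): the number of 1-bits."""
    return sum(bits)


def univariate_eda(n_bits=20, pop_size=60, n_keep=20, generations=40, seed=0):
    """Sample, select, and re-estimate independent per-bit probabilities."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # one independent marginal per knob
    best = None
    for _ in range(generations):
        # Each bit is sampled on its own: no interactions between knobs are modeled.
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        pop.sort(key=onemax, reverse=True)
        if best is None or onemax(pop[0]) > onemax(best):
            best = pop[0]
        elite = pop[:n_keep]
        # Re-estimate each marginal from the selected individuals, clamped
        # away from 0 and 1 so that no bit value is lost permanently.
        probs = [min(0.95, max(0.05, sum(ind[i] for ind in elite) / n_keep))
                 for i in range(n_bits)]
    return best, probs


best, probs = univariate_eda()
```

Re-estimating the marginals costs time linear in the number of knobs per individual, which is the cheap end of the trade-off the excerpt describes; a model that also learns pairwise or higher-order dependencies pays more per generation in the hope of needing fewer evaluations.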

It is based on Koza's genetic programming [10], and modifies the representation based on a syntactic analysis driven by the scoring function, as well as a modularity bias. The representation-building that takes place consists of introducing new compound operators, and hence modifying the implicit distance function in tree-space. This modification is uniform, in the sense that the new operators can be placed in any context, without regard for semantics.

[Footnote 12] The fact that reduction to normal tends to reduce the problem size is another synergy between it and the application of probabilistic model-building.

[Footnote 13] There is in fact even more information available in the hBOA models concerning hierarchy and direction of dependence, but this is difficult to analyze.

The Economics Anti-Textbook: A Critical Thinker's Guide to Microeconomics
by Rod Hill and Anthony Myatt
Published 15 Mar 2010

But the pool-player analogy – he acts as if he understood the laws of physics without realizing that he does – has exerted a powerful influence over the imagination of economists and helps to explain their extreme scepticism that anything can be learnt about economic behaviour by asking people. What about thought experiments? This is just a folksy way of talking about model building and deductive thinking. This is nothing new. As we know, model building can be useful – providing the predictions can be tested against the evidence. If they are not tested, or if they can’t be tested, we end up with the situation described by Wassily Leontief: ‘Page after page of professional economic journals are filled with mathematical formulas leading the reader from sets of more or less plausible but entirely arbitrary assumptions to precisely stated but irrelevant theoretical conclusions’ (1983: viii).

Suggestions for further reading

For a focused discussion of social cohesion and its economic benefits, see Dayton-Johnson (2001); for an interesting discussion of why and how economics has ignored community and the consequences of this neglect, see Marglin (2008) The Dismal Science: How thinking like an economist undermines community.

2 | Introducing economic models

‘Whether you can observe a thing or not depends on the theory which you use. It is the theory which decides what can be observed.’ Albert Einstein

‘Economics is the science of thinking in terms of models joined to the art of choosing models which are relevant to the contemporary world.’ John Maynard Keynes

1 The standard text

1.1 Model building and model testing

Science is a method – a process of forming hypotheses, making predictions and testing the predictions against the facts. Sometimes a hypothesis will emerge from inference: looking at the world and making a generalization about it. Sometimes it will emerge from a process of deduction: thinking about the world in a systematic way.

The point is that if a model makes good predictions, then that aspect of reality which is ignored (or simplified away) did not significantly affect the outcome. Therefore, it is inappropriate to judge the usefulness of a model by the realism of its assumptions; the only relevant test is the accuracy of the model’s predictions. Having established this, most texts then illustrate the notion of model building by presenting several different models – one of which is invariably the production possibility frontier.

1.2 Examples of economic models

The production possibility frontier (PPF)

The production possibility frontier model has several uses. Besides providing a visual illustration of the contrast between efficiency and inefficiency, it also illustrates opportunity cost and the inevitability of trade-offs.
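A small numerical sketch can show how a PPF encodes efficiency and opportunity cost. The economy below is entirely hypothetical: the two goods ("guns" and "butter"), the circular frontier guns² + butter² = 100, and the function names are assumptions for illustration, not taken from the excerpted text.

```python
import math

FRONTIER_RADIUS = 10.0  # hypothetical frontier: guns**2 + butter**2 == 100


def max_butter(guns):
    """Largest butter output attainable for a given guns output."""
    return math.sqrt(FRONTIER_RADIUS ** 2 - guns ** 2)


def opportunity_cost(guns_from, guns_to):
    """Butter given up per extra unit of guns when moving along the frontier."""
    return (max_butter(guns_from) - max_butter(guns_to)) / (guns_to - guns_from)


def is_efficient(guns, butter, tol=1e-9):
    """A point is efficient only if it lies on the frontier, not inside it."""
    return abs(butter - max_butter(guns)) <= tol


early_cost = opportunity_cost(0, 6)  # butter falls from 10 to 8 over 6 guns
late_cost = opportunity_cost(6, 8)   # butter falls from 8 to 6 over 2 guns
```

On this bowed-out frontier the cost of extra guns rises from about 0.33 units of butter per gun to 1.0, and a point such as (6, 7) is inefficient because 8 units of butter are attainable with the same guns output.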

pages: 250 words: 79,360

Escape From Model Land: How Mathematical Models Can Lead Us Astray and What We Can Do About It
by Erica Thompson
Published 6 Dec 2022

Other memes include ‘Distracted Boyfriend’ (one thing superseded by another), ‘Picard Facepalm’ (you did something predictably silly), ‘Running Away Balloon’ (being held back from achieving an objective) and ‘This Is Fine’ (denial of a bad situation). This capacity for metaphor, and elaboration of the metaphor to generate insight or amusement, is what underlies our propensity for model-building. When you create a metaphor, or model, or meme, you are reframing a situation from a new perspective, emphasising one aspect of it and playing down others. Why is a computational model akin to the Earth’s climate? What does a Jane Austen novel have to tell us about human relationships today?

In order to do that, we would also need to agree that the difference is not problematic: for instance, if we want to decide which truck to use for commercial deliveries, half a second doesn’t matter, but if we are planning a movie stunt in which a motorbike crosses the path of the truck, half a second could be critically important and the more complex model justified. Baking a cake without a recipe Historian and philosopher of science Marcel Boumans said that ‘model-building is like baking a cake without a recipe’. We have a general shared idea of what the outcome should look like, the kinds of ingredients we are likely to need and the types of operations (mixing, heating) that will get us to the point of having a product. Very different kinds of ingredients and operations can have equally delicious outcomes, all of which we would recognise as cake.

But adequacy-for-purpose alone can also be problematic when we risk dressing something up as something it isn’t – in general, it is helpful to have some indication of what is inside. You could put candles on a round of cheese and sing ‘Happy Birthday’, but the guests might be a little surprised when they bit into it. This analogy also highlights the culturally specific aspects of model-building. As I am British, I have expectations of what ‘cake’ means that come from my own upbringing and cultural surroundings. I know what I mean by a cake! But American readers may have a slightly different idea; German readers a different idea again; Indonesian readers a very different idea – and the notion of ‘putting candles on it and singing “Happy Birthday”’ might also need to be recast into other settings.

pages: 287 words: 78,609

The Molecule of More: How a Single Chemical in Your Brain Drives Love, Sex, and Creativity - and Will Determine the Fate of the Human Race
by Daniel Z. Lieberman and Michael E. Long
Published 13 Aug 2018

How do we perceive something that is so far away that we can’t even see it? We use our imagination. Models are imaginary representations of the world that we build in order to better understand it. In some ways model building is like latent inhibition. Models contain only the elements of the environment that the model builder believes are essential. Other details are discarded. That makes the world easier to comprehend and, later, to imagine a variety of ways it might be manipulated for maximum benefit. Model building isn’t something we’re aware of. The brain builds models automatically as we go about our day, and updates them as we learn new things. Models not only simplify our conception of the world; they also allow us to abstract, to take specific experiences and use them to craft broad, general rules.

It would be paralyzing if I had to do that with every car I encountered. Based on my experience with real cars, I built a model of an abstract car. If a car I’ve never seen before fits the general outlines of my abstract conception, I can quickly classify it and know that it’s made for driving. Recognizing a car may seem trivial, but model building also helps us with the most cosmic abstractions. Watching how real objects moved led Newton to develop his abstract law of universal gravitation, which not only predicts how apples fall from trees, but also the movements of planets, stars, and galaxies. MENTAL TIME TRAVEL Models can be helpful when we need to choose among a number of different options.

INDEX abstract thinking and concepts and antipsychotic medication, 115 and creativity, 135, 148 and dopamine control circuit, 57, 62–63, 105–106, 172, 202, 223 dreams, 128. See also dreams and extrapersonal space, 94–95 geniuses and suppressed H&N functioning, 135–138 and happiness, 218–219, 223 liberals, 148–152, 160 and model building, 121, 124, 172 addictive behaviors and ADHD, 80–83 adolescents, 53–54 alcoholism. See alcohol addiction compulsive shopping, 48 and desire, 36–40, 43, 47 and DNA, changes in, 103–104 drug addiction. See drug addiction and easy access, 51–52 gambling, 12, 48–49, 53, 56, 61 and Here and Now neurotransmitters (H&Ns), 104–105 hypersexuality, 48–50 and impulsiveness, 46–47 pornography, 50–53, 104–105, 159–160, 208 psychotherapy for, 100–105 sex addiction, 37, 51–53 smoking (nicotine), 46–47, 51–52, 105 treatment for, 100–105 triggers, 101–103 video games, 53–56 and wanting versus liking, 34–35, 45–48, 53, 56 and willpower, 47, 97, 100.

pages: 524 words: 120,182

Complexity: A Guided Tour
by Melanie Mitchell
Published 31 Mar 2009

I have used examples of models related to the Prisoner’s Dilemma to illustrate all these points, but my previous discussion could be equally applied to nearly all other simplified models of complex systems. I will give the last word to physicist (and ahead-of-his-time model-building proponent) Philip Anderson, from his 1977 Nobel Prize acceptance speech: The art of model-building is the exclusion of real but irrelevant parts of the problem, and entails hazards for the builder and the reader. The builder may leave out something genuinely relevant; the reader, armed with too sophisticated an experimental probe or too accurate a computation, may take literally a schematized model whose main aim is to be a demonstration of possibility.

“areas ranging from the maintenance of biodiversity to the effectiveness of bacteria in producing new antibiotics”: E.g., Nowak, M. A. and Sigmund, K., Biodiversity: Bacterial game dynamics. Nature, 418, 2002, pp. 138–139; Wiener, P., Antibiotic production in a spatially structured environment. Ecology Letters, 3(2), 2000, pp. 122–130. “All models are wrong”: Box, G.E.P. and Draper, N. R., Empirical Model Building and Response Surfaces. New York: Wiley, 1987, p. 424. “Replication is one of the hallmarks”: Axelrod R., Advancing the art of simulation in the social sciences. In Conte, R., Hegselmann, R., Terna, P. (editors), Simulating Social Phenomena. (Lecture Notes in Economics and Mathematical Systems 456).

“Jose Manuel Galan and Luis Izquierdo published results”: Galan, J. M. and Izquierdo, L. R., Appearances can be deceiving: Lessons learned re-implementing Axelrod’s ‘Evolutionary Approaches to Norms.’ Journal of Artificial Societies and Social Simulation, 8 (3), 2005, [http://jasss.soc.surrey.ac.uk/8/3/2.html]. “The art of model building”: Anderson, Nobel Prize acceptance speech, 1977. Part IV “In Ersilia”: From Calvino, I. Invisible Cities. New York: Harcourt Brace Jovanovich, 1974, p. 76. (Translated by W. Weaver.) Chapter 15 “The Science of Networks”: Parts of this chapter were adapted from Mitchell, M., Complex systems: Network thinking.

pages: 204 words: 58,565

Keeping Up With the Quants: Your Guide to Understanding and Using Analytics
by Thomas H. Davenport and Jinho Kim
Published 10 Jun 2013

How do you select variables and begin to envision how they might be related to each other? We are still largely in the realm of the subjective here. Hypotheses—the early story you tell about your analysis—are simply educated guesses about what variables really matter in your model. At this stage, model building involves using logic, experience, and previous findings to hypothesize your dependent variable—the one you are trying to predict or explain—and the independent variables that will affect it. You will, of course, test your hypothesis later; that is what differentiates analytical thinking from less precise approaches to decision making like intuition. 3.

As with any other type of model, a few concrete examples (historical or made up) are extremely useful to step through in structuring the basic model. In this exercise, the quant must listen carefully, ask clarifying questions, and absorb as much of the knowledge of the business decision maker as possible. This is as much about relationship building as it is model building. At this point the quant team should be ready to spring into action. It needs to select the right math approach, formalize the model so it can be represented in the computer, collect the data, and get it into the computer. The analyst can then test the model by performing sensitivity analysis on variables and relationships, and trying alternatives.

case=2393563144534950884; “People v. Collins,” http://en.wikipedia.org/wiki/People_v._Collins.

Chapter 3

1. A. M. Starfield, Karl A. Smith, and A. L. Bleloch, How to Model It: Problem Solving for the Computer Age (New York: McGraw-Hill, 1994), 19.
2. George Box and Norman R. Draper, Empirical Model-Building and Response Surfaces (New York: Wiley, 1987), 424.
3. Garth Sundem, Geek Logik: 50 Foolproof Equations for Everyday Life (New York: Workman, 2006).
4. Minnie Brashears, Mark Twain, Son of Missouri (Whitefish, MT: Kessinger Publishing, 2007).
5. Ernest E. Leisy, ed., The Letters of Quintus Curtius Snodgrass (Irving, TX: University Press of Dallas, 1946).
6.

The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy
by Matthew Hindman
Published 24 Sep 2018

But what has been especially needed are not new facts but new models, theories that explain the broad patterns of digital traffic and revenue from Facebook and Google all the way down to tiny personal blogs. What we need are simplified stories that can explain both the enormous concentration at the top of the web and the (very) long tail of smaller sites. Chapters 4 and 5 take on the task of model building, using different but complementary approaches. Chapter 4 builds a formal economic model of online content production. This deductive model is based on three key assumptions. First, larger sites are assumed to have economies of scale, both in the production of content and in their ability to turn traffic into revenue.

Weak preferences give no reason for users to spread out, but strong preferences just end up sending users to portals, aggregators, and search engines. If digital media diversity requires Goldilocks preferences—not too strong, but not too weak—then it is unlikely to be stable. Yet before we delve into model building, several topics touched on earlier require a fuller treatment. We will start by discussing online aggregation and the economics of bundling. We will then lay out what we know about media preferences, and why larger sites now get more money per visitor than small local producers.

In general, though, there are good reasons to prefer the power law label, even when other distributions may fit the data slightly better. Of course other related distributions often fit better: they have two or more parameters, while pure power laws have only one. Parsimony is a cardinal virtue in model building, and each additional parameter provides latitude for mischief. As John von Neumann reportedly said, “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”1 In any case, our data show a good fit to a pure power law, the discussion in the previous paragraphs notwithstanding.

pages: 923 words: 163,556

Advanced Stochastic Models, Risk Assessment, and Portfolio Optimization: The Ideal Risk, Uncertainty, and Performance Measures
by Frank J. Fabozzi
Published 25 Feb 2008

Notice that only one of the dummy variable regression coefficients is statistically significant at the 10% level, b3. This suggests that market participants reassessed the impact of the coupon rate (x1) on the yield spread. The estimated regression coefficient for x1 on November 28, 2005 was 33.25 (b2) whereas on June 6, 2005 it was 61.39 (b2 + b3 = 33.25 + 28.14).

MODEL BUILDING TECHNIQUES

We now turn our attention to the model building process in the sense that we attempt to find the independent variables that best explain the variation in the dependent variable y. At the outset, we do not know how many and which independent variables to use. Increasing the number of independent variables does not always improve regressions.

DECOMPOSITION OF TIME SERIES REPRESENTATION OF TIME SERIES WITH DIFFERENCE EQUATIONS APPLICATION: THE PRICE PROCESS CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

PART Two - Basic Probability Theory

CHAPTER 8 - Concepts of Probability Theory
HISTORICAL DEVELOPMENT OF ALTERNATIVE APPROACHES TO PROBABILITY SET OPERATIONS AND PRELIMINARIES PROBABILITY MEASURE RANDOM VARIABLE CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 9 - Discrete Probability Distributions
DISCRETE LAW BERNOULLI DISTRIBUTION BINOMIAL DISTRIBUTION HYPERGEOMETRIC DISTRIBUTION MULTINOMIAL DISTRIBUTION POISSON DISTRIBUTION DISCRETE UNIFORM DISTRIBUTION CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 10 - Continuous Probability Distributions
CONTINUOUS PROBABILITY DISTRIBUTION DESCRIBED DISTRIBUTION FUNCTION DENSITY FUNCTION CONTINUOUS RANDOM VARIABLE COMPUTING PROBABILITIES FROM THE DENSITY FUNCTION LOCATION PARAMETERS DISPERSION PARAMETERS CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 11 - Continuous Probability Distributions with Appealing Statistical Properties
NORMAL DISTRIBUTION CHI-SQUARE DISTRIBUTION STUDENT’S t-DISTRIBUTION F-DISTRIBUTION EXPONENTIAL DISTRIBUTION RECTANGULAR DISTRIBUTION GAMMA DISTRIBUTION BETA DISTRIBUTION LOG-NORMAL DISTRIBUTION CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 12 - Continuous Probability Distributions Dealing with Extreme Events
GENERALIZED EXTREME VALUE DISTRIBUTION GENERALIZED PARETO DISTRIBUTION NORMAL INVERSE GAUSSIAN DISTRIBUTION α-STABLE DISTRIBUTION CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 13 - Parameters of Location and Scale of Random Variables
PARAMETERS OF LOCATION PARAMETERS OF SCALE CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 14 - Joint Probability Distributions
HIGHER DIMENSIONAL RANDOM VARIABLES JOINT PROBABILITY DISTRIBUTION MARGINAL DISTRIBUTIONS DEPENDENCE COVARIANCE AND CORRELATION SELECTION OF MULTIVARIATE DISTRIBUTIONS CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 15 - Conditional Probability and Bayes’ Rule
CONDITIONAL PROBABILITY INDEPENDENT EVENTS MULTIPLICATIVE RULE OF PROBABILITY BAYES’ RULE CONDITIONAL PARAMETERS CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 16 - Copula and Dependence Measures
COPULA ALTERNATIVE DEPENDENCE MEASURES CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

PART Three - Inductive Statistics

CHAPTER 17 - Point Estimators
SAMPLE, STATISTIC, AND ESTIMATOR QUALITY CRITERIA OF ESTIMATORS LARGE SAMPLE CRITERIA MAXIMUM LIKELIHOOD ESTIMATOR EXPONENTIAL FAMILY AND SUFFICIENCY CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 18 - Confidence Intervals
CONFIDENCE LEVEL AND CONFIDENCE INTERVAL CONFIDENCE INTERVAL FOR THE MEAN OF A NORMAL RANDOM VARIABLE CONFIDENCE INTERVAL FOR THE MEAN OF A NORMAL RANDOM VARIABLE WITH UNKNOWN VARIANCE CONFIDENCE INTERVAL FOR THE VARIANCE OF A NORMAL RANDOM VARIABLE CONFIDENCE INTERVAL FOR THE VARIANCE OF A NORMAL RANDOM VARIABLE WITH UNKNOWN MEAN CONFIDENCE INTERVAL FOR THE PARAMETER P OF A BINOMIAL DISTRIBUTION CONFIDENCE INTERVAL FOR THE PARAMETER λ OF AN EXPONENTIAL DISTRIBUTION CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 19 - Hypothesis Testing
HYPOTHESES ERROR TYPES QUALITY CRITERIA OF A TEST EXAMPLES CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

PART Four - Multivariate Linear Regression Analysis

CHAPTER 20 - Estimates and Diagnostics for Multivariate Linear Regression Analysis
THE MULTIVARIATE LINEAR REGRESSION MODEL ASSUMPTIONS OF THE MULTIVARIATE LINEAR REGRESSION MODEL ESTIMATION OF THE MODEL PARAMETERS DESIGNING THE MODEL DIAGNOSTIC CHECK AND MODEL SIGNIFICANCE APPLICATIONS TO FINANCE CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 21 - Designing and Building a Multivariate Linear Regression Model
THE PROBLEM OF MULTICOLLINEARITY INCORPORATING DUMMY VARIABLES AS INDEPENDENT VARIABLES MODEL BUILDING TECHNIQUES CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

CHAPTER 22 - Testing the Assumptions of the Multivariate Linear Regression Model
TESTS FOR LINEARITY ASSUMED STATISTICAL PROPERTIES ABOUT THE ERROR TERM TESTS FOR THE RESIDUALS BEING NORMALLY DISTRIBUTED TESTS FOR CONSTANT VARIANCE OF THE ERROR TERM (HOMOSKEDASTICITY) ABSENCE OF AUTOCORRELATION OF THE RESIDUALS CONCEPTS EXPLAINED IN THIS CHAPTER (IN ORDER OF PRESENTATION)

APPENDIX A - Important Functions and Their Features
APPENDIX B - Fundamentals of Matrix Operations and Concepts
APPENDIX C - Binomial and Multinomial Coefficients
APPENDIX D - Application of the Log-Normal Distribution to the Pricing of Call Options

References Index The Frank J.

The two illustrations presented include the estimation of the duration of certain sectors of the financial market and the prediction of the 10-year Treasury yield. In Chapter 21, we focus on the design and the building process of multivariate linear regression models. The three principal topics covered in this chapter are the problem of multicollinearity, incorporating dummy variables into a regression model and model building techniques using stepwise regression analysis. Multicollinearity is the problem that is caused by including in a multivariate linear regression independent variables that themselves may be highly correlated. Dummy variables allow the incorporation of independent variables that represent a characteristic or attribute such as industry sector or a time period within which an observation falls.

pages: 338 words: 106,936

The Physics of Wall Street: A Brief History of Predicting the Unpredictable
by James Owen Weatherall
Published 2 Jan 2013

(For instance, Osborne showed that price changes aren’t independent. This is especially true during market crashes, when a series of downward ticks makes it very likely that prices will continue to fall. When this kind of herding effect is present, even Osborne’s extended Brownian motion model is going to be an unreliable guide.) The model-building process involves constantly updating your best models and theories in light of new evidence, pulling yourself up by the bootstraps as you progressively understand whatever you’re studying — be it cells, hurricanes, or stock prices. Not everyone who has worked with mathematical models in finance has been as sensitive to the importance of this methodology as Osborne was, which is one of the principal reasons why mathematical models have sometimes been associated with financial ruin.

Sometimes you realize that your original solution is no good, because it depends too heavily on assumptions that never really apply; other times, you discover that the solution is pretty good but can be improved in simple ways; and other times still, you realize that your solution is great under certain circumstances, but you need to think about what to do when those circumstances don’t apply. Obviously, physicists aren’t the only people who have thought about understanding the world in this way. This kind of model building is ubiquitous in economics and in other sciences. Unsurprisingly, most advances in economics have been made by economists. But physicists are very good — perhaps especially good — at thinking like this. And they are usually trained in a way that helps them solve certain kinds of problems in economics, without the political or intellectual baggage that sometimes hampers economists.

For this reason, a measure of caution and a good helping of common sense are always going to be important when we try to use models successfully. But recognizing that we will never be able to predict everything, and that we shouldn’t assume our models reveal some deep truth about what can and cannot occur, is part and parcel of what I have described as thinking like a physicist — it amounts to resisting complacency in model building. And indeed, trying to figure out how to predict the kinds of events that might have seemed like black swans from the perspective of (say) Osborne’s random walk model is precisely what led Sornette to start thinking about dragon kings. Surely not every black swan is really a dragon king in disguise.

pages: 600 words: 72,502

When More Is Not Better: Overcoming America's Obsession With Economic Efficiency
by Roger L. Martin
Published 28 Sep 2020

That is what human advancement is about: building better models. But only in the rarified air of PhD programs is model building a core aspect of the educational experience. The rest of the educational sphere is primarily concerned with learning and applying the existing models of the world, as if they were certainly correct. This most definitely does not need to be the case. Students don’t need to wait until they enter a PhD program (which only a distinct minority of undergraduates ever ends up doing) to engage in mental model building. Beth Grosso demonstrates that her fourth- and fifth-grade students can easily learn to build new models and challenge existing ones.

See economic growth growth rate, 8, 11 gun control, 197, 206 Hamilton, Alexander, 40 Harris, Michael, 47 Harvard Business School, 175 Harvard Kennedy School, 180 Harvey, Campbell, 155 Hasso Plattner Institute of Design, 179–180 hedge funds, 91, 155, 157, 158 hedonistic adjustments, 10–11 Heinz, 123–124, 187, 189 high-frequency trading, 90, 156–157 high-income Americans, 7–8 high-speed trading, 90 honeybees, 74–75 Hong Kong, 93–94 hotel industry, 122 House of Representatives, 201, 202 housing bubble, 81, 137, 140 Huizenga, Wayne, 71–72 humanities, 181–182, 184 IBM, 132 IDEO, 179 ideology, 213–214 iHeartMedia, 97–98, 99, 101 IIT Institute of Design, 180 Illinois Institute of Technology, 180 improvement, 103–106, 113, 132 income average, by percentile, 1913–2015, 7 disparities, 66 growth, 33 mean family, 4, 6 median family, 4–5, 11, 38 real, 10, 11 income distribution, 14, 36–37, 38, 63, 70, 161 income taxes, 159–162 incumbents, 202 Industrial Revolution, 41 industry consolidation, 71–73 See also mergers inflation, 24, 31, 103 innovation, 54 input-output relationships, 81–82 input-output tables, 22 Instagram, 61, 65, 71, 129, 191, 192 Institute for Competitiveness & Prosperity, 17 integrative thinking, 174, 176, 178 Intel, 129 interdependencies, 106–113 interest rates, 80, 81 internet, 63–64, 66 internet of things (IoT), 106 Investors Exchange (IEX), 156–157 invisible hand, 39, 41 Irish potato famine, 75 isolation, 199 Italy, 157 I-Think Initiative, 171–172, 181 Japanese auto manufacturers, 43, 151 Jefferson, Thomas, 40 Jenner, Kylie, 65 job design, 31–32 job market, 66–70 Joe’s Stone Crab, 115–120, 121, 132–133 Johnson, Dwayne, 65 Kaplan, Robert, 129 Kardashian, Kim, 65 Ka-shing, Li, 52 Katsuyama, Brad, 156 Kelley, David, 179 Kennedy, John F., 198 killer whales, 82–83 King, Martin Luther, Jr., 192 Kraft, 123–124 Kraft Heinz, 123–124, 127 labor division of, 39–41 rewards for, 67–70 labor costs, 49–50, 63, 122, 124–125, 128 labor market, 67–70 ladder of inference, 167 
laws, revision of, 142–145 leading brands, 191–192 leaky bucket metaphor, 27–28 Leamington Ketchup Affair, 187–190 LearningEdge platform, 177 legal system, 105–106 legislation, 93 antitrust, 53–56, 152, 153 revising, 142–145 Lehman Brothers, 84–86, 104, 137 Leontief, Wassily, 21–22, 30, 80 leveraged buyouts (LBOs), 97–99, 101 Lightner, Candace, 193–194 living standards, 9–11, 33 lobbyists, 91, 93 Loblaws, 188 Long Depression, 31 long-term capital, 157, 158–159 Long-Term Stock Exchange (LTSE), 156 Loosemore, Tom, 147–149 Lorenz, Edward, 81 Loungani, Prakash, 82 Love Canal families, 193, 207 Loyalty Effect, The (Reichheld), 27–28 Lucas, Robert, 24 MacArthur, Douglas, 43 machine model, 22, 25, 26, 30–44, 94, 100, 103–104, 123, 210 Madison, James, 40 management models, 49–50 scientific, 42 total quality, 43 manufacturing industries, 41, 126 Martin, Paul, 141 Martin Prosperity Institute, 2, 17 Massachusetts Institute of Technology (MIT), 176–177, 184 master’s in business administration (MBA), 174–175 McKelvey, Bill, 62 mean family income, 4, 6 Measure What Matters (Doerr), 52 median family income, 4–5, 11, 38 median voter, 38 mental proximity, 145–149 mergers, 53–54, 63–64, 123–124, 141 metaphors for education, 29 importance of, 25–26 leaky bucket metaphor, 27–28 machine, 22, 25, 26, 30–44, 94, 100, 103–104, 123, 210 Microsoft, 131 middle class, 9, 14, 36 mobile operating systems, 131 mobility, 9, 37 models building better, 171–172, 179 in business and public policy, 27–30 of citizens, 145 core components of, 29 critical evaluation of, 171 doubling-down on existing, 214 economic. See economic models imperfections in, 45–57 use of proxies and, 45–57 Moms Demand Action for Gun Sense in America, 197 monetarists, 24 monetary policies, 31, 103 monocultures, 73–75, 153 monopolies, 53–54, 63, 129–135, 152–154 mortgage-backed bond market, 109–111 mortgages, 80–81 Mothers Against Drunk Driving (MADD), 193–196, 199 multihoming, 192 NAFTA.

pages: 317 words: 84,400

Automate This: How Algorithms Came to Rule Our World
by Christopher Steiner
Published 29 Aug 2012

The least squares method allows for predictive model building based on observed results. To find the best model according to the data in hand, Gauss developed equations that minimize the squares of the differences between measured values, which could be stock prices of the past, and values provided by a predictive model. The model—which can be anything from a quadratic equation to a multitiered algorithm—is adjusted up and down until the point of least squares is found. The least squares method forms the backbone of modern statistics and algorithmic model building. When building an algorithm to, say, predict future stock prices, testing it means running the algorithm through loops of data representing days and prices that have already taken place.
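The least squares idea described above can be sketched for the simplest case, a straight line fitted to past prices. The closed-form formulas are the standard ones for a one-variable fit; the price series and the names `fit_line` and `sum_squared_error` are made up for illustration and are not from the book.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_error(xs, ys, intercept, slope):
    """Sum of squared differences between observed and modeled values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical history: trading day number and closing price.
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 11.0, 14.0, 15.0]

intercept, slope = fit_line(days, prices)
best = sum_squared_error(days, prices, intercept, slope)

# Nudging the fitted parameters in any direction can only increase the error.
worse = sum_squared_error(days, prices, intercept + 0.5, slope - 0.1)
print(f"price ~ {intercept:.2f} + {slope:.2f} * day, squared error {best:.2f}")
```

Testing the fit against data that has "already taken place", as the excerpt puts it, amounts to computing the same squared error on days the model was not fitted to.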

Simpson jurors evaluated, 177 see also litigation Lawrence, Peter, 1–2 least squares method, 62–63 Le Corbusier, 56 Lee, Spike, 87 Lehman Brothers, 191, 192 Leibniz, Gottfried, 26, 57–61, 68, 72 binary language of, 57–58, 60–61, 71, 73 Leipzig, 58 Lennon, John, 104, 107–8 “In My Life” claimed by, 110–11 as math savant, 103 “Let It Be,” 103 Levchin, Max, 188 leverage, trading on margin with, 51 Lewis, Michael, 141, 202 Li, David X., 65 Liber Abaci (The Book of Calculation) (Fibonacci), 56–57 Library of Congress, 193 Lin, Jeremy, 142–43 linguistics, 187 liquidity crisis, potential, 51–52 Lisp, 12, 93, 94 lit fiber, 114, 120 lithium hydroxide, 166 Lithuania, 69 litigation: health insurers and, 181 stock prices and potential, 27 Walgreens and, 156 logic: algorithms and, 71 broken down into mechanical operations, 58–59 logic theory, 73 logic trees, 171 London, 59, 66–67, 68, 121, 198 Los Angeles International Airport, security algorithm at, 135 Los Angeles Lakers, 143 loudness, 93, 106 Lovelace, Ada, 73 Lovell, James, 165–67 Lulea, Sweden, 204 lunar module, 166 lung cancer, 154 McAfee, Andrew P., 217–18 McCartney, Paul, 104, 105, 107 “In My Life” claimed by, 110–11 as math savant, 103 McCready, Mike, 78–83, 85–89 McGuire, Terry, 145, 168–72, 174–76 machine-learning algorithms, 79, 100 Magnetar Capital, 3–4, 10 Mahler, Gustav, 98 Major Market Index, 40, 41 Making of a Fly, The (Lawrence), prices of, 1–2 Malyshev, Mikhail, 190 management consultants, 189 margin, trading with, 51 market cap, price swings and, 49 market makers: bids and offers by, 35–36 Peterffy as, 31, 35–36, 38, 51 market risk, 66 Maroon 5, 85 Marseille, 147, 149 Marshall, Andrew, 140 Martin, George, 108–10 Martin, Max (Martin Sandberg), 88–89 math: behind algorithms, 6, 53 education in, 218–20 mathematicians: algorithms and, 6, 71 online, 53 on Wall Street, 13, 23, 24, 27, 71, 179, 185, 201–3 Mattingly, Ken, 167 MBAs: eLoyalty’s experience with, 187 Peterffy’s refusal to hire, 47 MDCT scans, 154 
measurement errors, distribution of, 63 medical algorithms, 54, 146 in diagnosis and testing, 151–56, 216 in organ sharing, 147–51 patient data and home monitoring in, 158–59 physicians’ practice and, 156–62 medical residencies, game theory and matching for, 147 medicine, evidence-based, 156 Mehta, Puneet, 200, 201 melodies, 82, 87, 93 Mercer, Robert, 178–80 Merrill Lynch, 191, 192, 200 Messiah, 68 metal: trading of, 27 volatility of, 22 MGM, 135 Miami University, 91 Michigan, 201 Michigan, University of, 136 Microsoft, 67, 124, 209 microwaves, 124 Midas (algorithm), 134 Miller, Andre, 143 mind-reading bots, 178, 181–83 Minneapolis, Minn., 192–93 minor-league statistics, baseball, 141 MIT, 24, 73, 128, 160, 179, 188, 217 Mocatta & Goldsmid, 20 Mocatta Group, 20, 21–25, 31 model building, predictive, 63 modifiers, 71 Boolean, 72–73 Mojo magazine, 110 Moneyball (Lewis), 141 money markets, 214 money streams, present value of future, 57 Montalenti, Andrew, 200–201 Morgan Stanley, 116, 128, 186, 191, 200–201, 204 mortgage-backed securities, 203 mortgages, 57 defaults on, 65 quantitative, 202 subprime, 65, 202, 216 Mosaic, 116 movies, algorithms and, 75–76 Mozart, Wolfgang Amadeus, 77, 89, 90, 91, 96 MP3 sharing, 83 M Resort Spa, sports betting at, 133–35 Mubarak, Hosni, 140 Muller, Peter, 128 music, 214 algorithms in creation of, 76–77, 89–103 decoding Beatles’, 70, 103–11 disruptors in, 102–3 homogenization or variety in, 88–89 outliers in, 102 predictive algorithms for success of, 77–89 Music X-Ray, 86–87 Musikalisches Würfelspiel, 91 mutual funds, 50 MyCityWay, 200 Najarian, John A., 119 Naples, 121 Napoleon I, emperor of France, 121 Napster, 81 Narrative Science, 218 NASA: Houston mission control of, 166, 175 predictive science at, 61, 164, 165–72, 174–77, 180, 194 Nasdaq, 177 algorithm dominance of, 49 Peterffy and, 11–17, 32, 42, 47–48, 185 terminals of, 14–17, 42 trading method at, 14 National Heart, Lung, and Blood Institute, 159 Nationsbank, Chicago Research 
and Trading Group bought by, 46 NBA, 142–43 Neanderthals, human crossbreeding with, 161 Nebraska, 79–80, 85 Netflix, 112, 207 Netherlands, 121 Netscape, 116, 188 Nevermind, 102 New England Patriots, 134 New Jersey, 115, 116 Newsweek, 126 Newton, Isaac, 57, 58, 59, 64, 65 New York, N.Y., 122, 130, 192, 201–2, 206 communication between markets in Chicago and, 42, 113–18, 123–24 financial markets in, 20, 198 high school matching algorithm in, 147–48 McCready’s move to, 85 Mocatta’s headquarters in, 26 Peterffy’s arrival in, 19 tech startups in, 210 New York Commodities Exchange (NYCE), 26 New Yorker, 156 New York Giants, 134 New York Knicks, 143 New York magazine, 34 New York State, health department of, 160 New York Stock Exchange (NYSE), 3, 38–40, 44–45, 49, 83, 123, 184–85 New York Times, 123, 158 New York University, 37, 132, 136, 201, 202 New Zealand, 77, 100, 191 Nietzsche, Friedrich, 69 Nirvana, 102 Nixon, Richard M., 140, 165 Nobel Prize, 23, 106 North Carolina, 48, 204 Northwestern University, 145, 186 Kellogg School of Management at, 10 Novak, Ben, 77–79, 83, 85, 86 NSA, 137 NuclearPhynance, 124 nuclear power, 139 nuclear weapons, in Iran, 137, 138–39 number theory, 65 numerals: Arabic-Indian, 56 Roman, 56 NYSE composite index, 40, 41 Oakland Athletics, 141 Obama, Barack, 46, 218–19 Occupy Wall Street, 210 O’Connor & Associates, 40, 46 OEX, see S&P 100 index Ohio, 91 oil prices, 54 OkCupid, 144–45 Olivetti home computers, 27 opera, 92, 93, 95 Operation Match, 144 opinions-driven people, 173, 174, 175 OptionMonster, 119 option prices, probability and statistics in, 27 options: Black-Scholes formula and, 23 call, 21–22 commodities, 22 definition of, 21 pricing of, 22 put, 22 options contracts, 30 options trading, 36 algorithms in, 22–23, 24, 114–15 Oregon, University of, 96–97 organ donor networks: algorithms in, 149–51, 152, 214 game theory in, 147–49 oscilloscopes, 32 Outkast, 102 outliers, 63 musical, 102 outputs, algorithmic, 54 Pacific Exchange, 40 Page, 
Larry, 213 PageRank, 213–14 pairs matching, 148–51 pairs trading, 31 Pakistan, 191 Pandora, 6–7, 83 Papanikolaou, Georgios, 153 Pap tests, 152, 153–54 Parham, Peter, 161 Paris, 56, 59, 121 Paris Stock Exchange, 122 Parse.ly, 201 partial differential equations, 23 Pascal, Blaise, 59, 66–67 pathologists, 153 patient data, real-time, 158–59 patterns, in music, 89, 93, 96 Patterson, Nick, 160–61 PayPal, 188 PCs, Quotron data for, 33, 37, 39 pecking orders, social, 212–14 Pennsylvania, 115, 116 Pennsylvania, University of, 49 pension funds, 202 Pentagon, 168 Perfectmatch.com, 144 Perry, Katy, 89 Persia, 54 Peru, 91 Peterffy, Thomas: ambitions of, 27 on AMEX, 28–38 automated trading by, 41–42, 47–48, 113, 116 background and early career of, 18–20 Correlator algorithm of, 42–45 early handheld computers developed by, 36–39, 41, 44–45 earnings of, 17, 37, 46, 48, 51 fear that algorithms have gone too far by, 51 hackers hired by, 24–27 independence retained by, 46–47 on index funds, 41–46 at Interactive Brokers, 47–48 as market maker, 31, 35–36, 38, 51 at Mocatta, 20–28, 31 Nasdaq and, 11–18, 32, 42, 47–48, 185 new technology innovated by, 15–16 options trading algorithm of, 22–23, 24 as outsider, 31–32 profit guidelines of, 29 as programmer, 12, 15–16, 17, 20–21, 26–27, 38, 48, 62 Quotron hack of, 32–35 stock options algorithm as goal of, 27 Timber Hill trading operation of, see Timber Hill traders eliminated by, 12–18 trading floor methods of, 28–34 trading instincts of, 18, 26 World Trade Center offices of, 11, 39, 42, 43, 44 Petty, Tom, 84 pharmaceutical companies, 146, 155, 186 pharmacists, automation and, 154–56 Philips, 159 philosophy, Leibniz on, 57 phone lines: cross-country, 41 dedicated, 39, 42 phones, cell, 124–25 phosphate levels, 162 Physicians’ Desk Reference (PDR), 146 physicists, 62, 157 algorithms and, 6 on Wall Street, 14, 37, 119, 185, 190, 207 pianos, 108–9 Pincus, Mark, 206 Pisa, 56 pitch, 82, 93, 106 Pittsburgh International Airport, security algorithm 
at, 136 Pittsburgh Pirates, 141 Pius II, Pope, 69 Plimpton, George, 141–42 pneumonia, 158 poetry, composed by algorithm, 100–101 poker, 127–28 algorithms for, 129–35, 147, 150 Poland, 69, 91 Polyphonic HMI, 77–79, 82–83, 85 predictive algorithms, 54, 61, 62–65 prescriptions, mistakes with, 151, 155–56 present value, of future money streams, 57 pressure, thriving under, 169–70 prime numbers, general distribution pattern of, 65 probability theory, 66–68 in option prices, 27 problem solving, cooperative, 145 Procter & Gamble, 3 programmers: Cope as, 92–93 at eLoyalty, 182–83 Peterffy as, 12, 15–16, 17, 20–21, 26–27, 38, 48, 62 on Wall Street, 13, 14, 24, 46, 47, 53, 188, 191, 203, 207 programming, 188 education for, 218–20 learning, 9–10 simple algorithms in, 54 Progress Energy, 48 Project TACT (Technical Automated Compatibility Testing), 144 proprietary code, 190 proprietary trading, algorithmic, 184 Prussia, 69, 121 PSE, 40 pseudocholinesterase deficiency, 160 psychiatry, 163, 171 psychology, 178 Pu, Yihao, 190 Pulitzer Prize, 97 Purdue University, 170, 172 put options, 22, 43–45 Pythagorean algorithm, 64 quadratic equations, 63, 65 quants (quantitative analysts), 6, 46, 124, 133, 198, 200, 202–3, 204, 205 Leibniz as, 60 Wall Street’s monopoly on, 183, 190, 191, 192 Queen’s College, 72 quizzes, and OkCupid’s algorithms, 145 Quotron machine, 32–35, 37 Rachmaninoff, Sergei, 91, 96 Radiohead, 86 radiologists, 154 radio transmitters, in trading, 39, 41 railroad rights-of-way, 115–17 reactions-based people, 173–74, 195 ReadyForZero, 207 real estate, 192 on Redfin, 207 recruitment, of math and engineering students, 24 Redfin, 192, 206–7, 210 reflections-driven people, 173, 174, 182 refraction, indexes of, 15 regression analysis, 62 Relativity Technologies, 189 Renaissance Technologies, 160, 179–80, 207–8 Medallion Fund of, 207–8 retirement, 50, 214 Reuter, Paul Julius, 122 Rhode Island hold ‘em poker, 131 rhythms, 82, 86, 87, 89 Richmond, Va., 95 Richmond Times-Dispatch, 
95 rickets, 162 ride sharing, algorithm for, 130 riffs, 86 Riker, William H., 136 Ritchie, Joe, 40, 46 Rochester, N.Y., 154 Rolling Stones, 86 Rondo, Rajon, 143 Ross, Robert, 143–44 Roth, Al, 147–49 Rothschild, Nathan, 121–22 Royal Society, London, 59 RSB40, 143 runners, 39, 122 Russia, 69, 193 intelligence of, 136 Russian debt default of 1998, 64 Rutgers University, 144 Ryan, Lee, 79 Saint Petersburg Academy of Sciences, 69 Sam Goody, 83 Sandberg, Martin (Max Martin), 88–89 Sandholm, Tuomas: organ donor matching algorithm of, 147–51 poker algorithm of, 128–33, 147, 150 S&P 100 index, 40–41 S&P 500 index, 40–41, 51, 114–15, 218 Santa Cruz, Calif., 90, 95, 99 satellites, 60 Savage Beast, 83 Saverin, Eduardo, 199 Scholes, Myron, 23, 62, 105–6 schools, matching algorithm for, 147–48 Schubert, Franz, 98 Schwartz, Pepper, 144 science, education in, 139–40, 218–20 scientists, on Wall Street, 46, 186 Scott, Riley, 9 scripts, algorithms for writing, 76 Seattle, Wash., 192, 207 securities, 113, 114–15 mortgage-backed, 203 options on, 21 Securities and Exchange Commission (SEC), 185 semiconductors, 60, 186 sentence structure, 62 Sequoia Capital, 158 Seven Bridges of Königsberg, 69, 111 Shannon, Claude, 73–74 Shuruppak, 55 Silicon Valley, 53, 81, 90, 116, 188, 189, 215 hackers in, 8 resurgence of, 198–211, 216 Y Combinator program in, 9, 207 silver, 27 Simons, James, 179–80, 208, 219 Simpson, O.

Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data
by Dipanjan Sarkar
Published 1 Dec 2016

Integrated Development Environments Environment Setup Virtual Environments Python Syntax and Structure Data Structures and Types Numeric Types Strings Lists Sets Dictionaries Tuples Files Miscellaneous Controlling Code Flow Conditional Constructs Looping Constructs Handling Exceptions Functional Programming Functions Recursive Functions Anonymous Functions Iterators Comprehensions Generators The itertools and functools Modules Classes Working with Text String Literals String Operations and Methods Text Analytics Frameworks Summary Chapter 3:​ Processing and Understanding Text Text Tokenization Sentence Tokenization Word Tokenization Text Normalization Cleaning Text Tokenizing Text Removing Special Characters Expanding Contractions Case Conversions Removing Stopwords Correcting Words Stemming Lemmatization Understanding Text Syntax and Structure Installing Necessary Dependencies Important Machine Learning Concepts Parts of Speech (POS) Tagging Shallow Parsing Dependency-based Parsing Constituency-based Parsing Summary Chapter 4:​ Text Classification What Is Text Classification?​ Automated Text Classification Text Classification Blueprint Text Normalization Feature Extraction Bag of Words Model TF-IDF Model Advanced Word Vectorization Models Classification Algorithms Multinomial Naïve Bayes Support Vector Machines Evaluating Classification Models Building a Multi-Class Classification System Applications and Uses Summary Chapter 5:​ Text Summarization Text Summarization and Information Extraction Important Concepts Documents Text Normalization Feature Extraction Feature Matrix Singular Value Decomposition Text Normalization Feature Extraction Keyphrase Extraction Collocations Weighted Tag–Based Phrase Extraction Topic Modeling Latent Semantic Indexing Latent Dirichlet Allocation Non-negative Matrix Factorization Extracting Topics from Product Reviews Automated Document Summarization Latent Semantic Analysis TextRank Summarizing a Product Description Summary Chapter 
6:​ Text Similarity and Clustering Important Concepts Information Retrieval (IR) Feature Engineering Similarity Measures Unsupervised Machine Learning Algorithms Text Normalization Feature Extraction Text Similarity Analyzing Term Similarity Hamming Distance Manhattan Distance Euclidean Distance Levenshtein Edit Distance Cosine Distance and Similarity Analyzing Document Similarity Cosine Similarity Hellinger-Bhattacharya Distance Okapi BM25 Ranking Document Clustering Clustering Greatest Movies of All Time K-means Clustering Affinity Propagation Ward’s Agglomerative Hierarchical Clustering Summary Chapter 7:​ Semantic and Sentiment Analysis Semantic Analysis Exploring WordNet Understanding Synsets Analyzing Lexical Semantic Relations Word Sense Disambiguation Named Entity Recognition Analyzing Semantic Representations Propositional Logic First Order Logic Sentiment Analysis Sentiment Analysis of IMDb Movie Reviews Setting Up Dependencies Preparing Datasets Supervised Machine Learning Technique Unsupervised Lexicon-based Techniques Comparing Model Performances Summary Index Contents at a Glance About the Author About the Technical Reviewer Acknowledgments Introduction Chapter 1:​ Natural Language Basics Chapter 2:​ Python Refresher Chapter 3:​ Processing and Understanding Text Chapter 4:​ Text Classification Chapter 5:​ Text Summarization Chapter 6:​ Text Similarity and Clustering Chapter 7:​ Semantic and Sentiment Analysis Index About the Author and About the Technical Reviewer About the Author Dipanjan Sarkar is a data scientist at Intel, the world’s largest silicon company, which is on a mission to make the world more connected and productive.

We will be measuring the performance of our models using the very same metrics, and you may remember seeing these metrics from Chapter 3, when we were building some of our taggers and parsers. Building a Multi-Class Classification System We have gone through all the steps necessary for building a classification system, from normalization to feature extraction, model building, and evaluation. In this section, we will be putting everything together and applying it on some real-world data to build a multi-class text classification system. For this, we will be using the 20 newsgroups dataset available for download using scikit-learn. The 20 newsgroups dataset comprises around 18,000 newsgroups posts spread across 20 different categories or topics, thus making this a 20-class classification problem!
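The steps listed in this excerpt (normalization, feature extraction, model building, evaluation) can be sketched end to end on a toy corpus. This is not the book's code: the book works with scikit-learn and the 20 newsgroups dataset, while the sketch below hand-writes a small multinomial Naive Bayes classifier with Laplace smoothing so that it runs without any download. The documents, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Normalization step: lowercase and split on whitespace."""
    return text.lower().split()

def train(docs, labels):
    """Model building: collect per-class document and word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))  # bag-of-words features
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(model, doc):
    """Return the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    tokens = [t for t in tokenize(doc) if t in vocab]  # skip unseen words
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for t in tokens:
            score += math.log(
                (word_counts[label][t] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Toy three-class corpus standing in for newsgroup posts.
train_docs = [
    "the rocket launch reached orbit",
    "nasa rocket orbit mission",
    "the team won the hockey game",
    "hockey players scored a goal",
    "new graphics card renders images",
    "image rendering on graphics hardware",
]
train_labels = ["space", "space", "sport", "sport", "graphics", "graphics"]
model = train(train_docs, train_labels)

# Evaluation on held-out documents.
test_docs = ["rocket mission to orbit", "hockey game goal"]
test_labels = ["space", "sport"]
predictions = [predict(model, d) for d in test_docs]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```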

Clusters are formed by connecting objects based on their distance and they can be visualized using a dendrogram. The output of these models is a complete, exhaustive hierarchy of clusters. They are mainly subdivided into agglomerative and divisive clustering models. Centroid-based clustering models: These models build clusters in such a way that each cluster has a central representative member that represents each cluster and has the features that distinguish that particular cluster from the rest. There are various algorithms in this, like k-means, k-medoids, and so on, where we need to set the number of clusters 'k' in advance, and distance metrics like squares of distances from each data point to the centroid need to be minimized.
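The centroid-based approach described above can be sketched as a plain k-means loop. This is a minimal illustration, not the book's implementation: it seeds the centroids with the first `k` points (an assumption chosen to keep the run deterministic) and uses squared Euclidean distance on made-up 2-D points.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(p, centroids[i]))

def kmeans(points, k, max_iter=100):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = list(points[:k])  # deterministic seeding (assumption)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = [
            mean_point(c) if c else centroids[i]  # keep empty clusters in place
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Two visibly separate groups of hypothetical points; k is set in advance.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Each update step moves a centroid to the mean of its members, which is what reduces the sum of squared distances mentioned in the excerpt.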

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps
by Valliappa Lakshmanan, Sara Robinson and Michael Munn
Published 31 Oct 2020

Connected Patterns Patterns Reference Pattern Interactions Patterns Within ML Projects ML Life Cycle AI Readiness Common Patterns by Use Case and Data Type Natural Language Understanding Computer Vision Predictive Analytics Recommendation Systems Fraud and Anomaly Detection Index Machine Learning Design Patterns Solutions to Common Challenges in Data Preparation, Model Building, and MLOps Valliappa Lakshmanan, Sara Robinson, and Michael Munn Machine Learning Design Patterns by Valliappa Lakshmanan, Sara Robinson, and Michael Munn Copyright © 2021 Valliappa Lakshmanan, Sara Robinson, and Michael Munn. All rights reserved. Printed in the United States of America.

Given a set of features about the instance, the model will calculate a predicted value. In order to do that, the model is trained on training examples, which associate an instance with a label. A training example refers to a single instance (row) of data from your dataset that will be fed to your model. Building on the timestamp use case, a full training example might include: “day of week,” “city,” and “type of car.” A label is the output column in your dataset—the item your model is predicting. Label can refer both to the target column in your dataset (also called a ground truth label) and the output given by your model (also called a prediction).
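The vocabulary in this excerpt (instance, features, training example, label) maps onto a very small data structure. The sketch below is illustrative only: the three feature names come from the excerpt, while the label column `trip_minutes`, the row values, and the helper `to_example` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TrainingExample:
    features: dict  # input columns describing one instance (row)
    label: float    # ground truth value the model should learn to predict

def to_example(row, label_column):
    """Split one dataset row into its features and its label."""
    features = {k: v for k, v in row.items() if k != label_column}
    return TrainingExample(features=features, label=row[label_column])

# Hypothetical rows; "trip_minutes" plays the role of the label column.
rows = [
    {"day_of_week": "Mon", "city": "Austin", "type_of_car": "sedan", "trip_minutes": 18.0},
    {"day_of_week": "Sat", "city": "Boston", "type_of_car": "suv", "trip_minutes": 31.5},
]
examples = [to_example(r, "trip_minutes") for r in rows]
print(examples[0])
```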

This book is targeted primarily at data scientists, data engineers, and ML engineers, so let’s start with those. A data scientist is someone focused on collecting, interpreting, and processing datasets. They run statistical and exploratory analysis on data. As it relates to machine learning, a data scientist may work on data collection, feature engineering, model building, and more. Data scientists often work in Python or R in a notebook environment, and are usually the first to build out an organization’s machine learning models. A data engineer is focused on the infrastructure and workflows powering an organization’s data. They might help manage how a company ingests data, data pipelines, and how data is stored and transferred.

pages: 509 words: 92,141

The Pragmatic Programmer
by Andrew Hunt and Dave Thomas
Published 19 Oct 1999

Where Do Estimates Come From? All estimates are based on models of the problem. But before we get too deeply into the techniques of building models, we have to mention a basic estimating trick that always gives good answers: ask someone who's already done it. Before you get too committed to model building, cast around for someone who's been in a similar situation in the past. See how their problem got solved. It's unlikely you'll ever find an exact match, but you'd be surprised how many times you can successfully draw on others' experiences. Understand What's Being Asked The first part of any estimation exercise is building an understanding of what's being asked.

From your understanding of the question being asked, build a rough and ready bare-bones mental model. If you're estimating response times, your model may involve a server and some kind of arriving traffic. For a project, the model may be the steps that your organization uses during development, along with a very rough picture of how the system might be implemented. Model building can be both creative and useful in the long term. Often, the process of building the model leads to discoveries of underlying patterns and processes that weren't apparent on the surface. You may even want to reexamine the original question: "You asked for an estimate to do X. However, it looks like Y, a variant of X, could be done in about half the time, and you lose only one feature."

Don't Be a Slave to Formal Methods: Don't blindly adopt any technique without putting it into the context of your development practices and capabilities.
59. Costly Tools Don't Produce Better Designs: Beware of vendor hype, industry dogma, and the aura of the price tag. Judge tools on their merits.
60. Organize Teams Around Functionality: Don't separate designers from coders, testers from data modelers. Build teams the way you build code.
61. Don't Use Manual Procedures: A shell script or batch file will execute the same instructions, in the same order, time after time.
62. Test Early. Test Often. Test Automatically: Tests that run with every build are much more effective than test plans that sit on a shelf.
63.

pages: 287 words: 44,739

Guide to business modelling
by John Tennent, Graham Friend and Economist Group
Published 15 Dec 2005

Next the team should list all the tasks and activities that need to be done to achieve the goal. Tasks are pieces of work carried out by one person. Activities are defined as a package of work comprising several tasks, carried out by one or more persons. Activities may include data collection, model building, meetings with suppliers, product testing and the various stages of the corporate approvals process. A useful technique is to write each task or activity on a note and stick it on a flip chart on the wall. The team should then be asked to cluster groups of related tasks and activities to define the stages of the project.

THE MODEL DEVELOPMENT PROCESS Set up output and input templates The first stage of the model development process is to set up sheets containing templates for all the outputs. The content of the output sheets was the subject of Chapter 3. Templates for the profit and loss, balance sheet and cash flow can also be created if these form part of the solution. During the model-building process the financial statement templates can be completed continuously, as if performing double-entry book-keeping. The modeller can ensure the accuracy of the work by checking that the balance sheet continues to balance throughout the development of the model. Sheets containing the input templates should also be created at the start of the process.
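The balance check mentioned in this excerpt is normally done inside the spreadsheet itself. As a language-neutral illustration, the same test can be sketched in Python: total assets must equal liabilities plus equity in every period. The figures, the rounding tolerance, and the function name `balances` are hypothetical.

```python
def balances(assets, liabilities, equity, tolerance=0.005):
    """True when assets equal liabilities plus equity, within rounding."""
    return abs(assets - (liabilities + equity)) <= tolerance

# Hypothetical model output, one entry per forecast period.
periods = {
    "year 1": {"assets": 1000.0, "liabilities": 600.0, "equity": 400.0},
    "year 2": {"assets": 1150.0, "liabilities": 650.0, "equity": 500.0},
    "year 3": {"assets": 1300.0, "liabilities": 700.0, "equity": 590.0},
}

# Collect every period in which the balance sheet fails to balance.
unbalanced = [name for name, p in periods.items() if not balances(**p)]
print("periods that do not balance:", unbalanced)
```

Running such a check after each modelling step localizes an error to the step that introduced it, which is the point of completing the statements continuously.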

MACROS FOR REPEATED TASKS Personal macros Some macros that modellers create may be relevant to more than one spreadsheet. Excel allows modellers to record macros in a “Personal Macro Workbook” which can then be run from any spreadsheet. Many modellers establish a library of macros for repeated tasks which can dramatically speed up the process of model building. Formatting macros Chapter 8 recommended using a separate font, background colour and border and switching off protection to distinguish input cells from output cells. Each time an input cell is required the modeller must go through the laborious task of formatting the cell accordingly. Fortunately, a personal macro can be recorded to automate this task.

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
by Zdravko Markov and Daniel T. Larose
Published 5 Apr 2007

The first volume in this series, Discovering Knowledge in Data: An Introduction to Data Mining, by Daniel Larose, appeared in 2005, and introduced the reader to this rapidly growing field of data mining. The second volume in the series, Data Mining Methods and Models, by Daniel Larose, appeared in 2006, and explores the process of data mining from the point of view of model building—the development of complex and powerful predictive models that can deliver actionable results for a wide range of business and research problems. Although Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage serves well as a stand-alone resource for learning how to apply data mining techniques to Web-based data, reference is sometimes made to more complete coverage of certain topics in the earlier volumes.

DATA MINING AS A PROCESS The book continues the coverage of data mining as a process. The particular standard process used is the CRISP-DM framework: the cross-industry standard process for data mining. CRISP-DM demands that data mining be seen as an entire process, from communication of the business problem through data collection and management, data preprocessing, model building, model evaluation, and finally, model deployment. Therefore, this book is not only for analysts and managers, but also for data management professionals, database analysts, decision makers, and others who would like to leverage their repositories of Web-based data. THE SOFTWARE The software used in this book includes the following: the WEKA open-source data mining software and the Clementine data mining software suite.

Either way, the perspective is global, encompassing the entire data set. On the other hand, patterns are essentially local features of the data. Recognizable patterns may in fact hold true for only a few variables or a fraction of the records in the data. Classification methods, which we are about to examine, deal with global model building. Association rules, on the other hand, are particularly well suited to uncovering local patterns in the data. As soon as one applies the if clause in an association rule, one is partitioning the data so that, usually, most of the records do not apply. Applying the if clause “drills down” deeper into the data set, with the aim of uncovering a hidden local pattern, which may or may not be relevant to the bulk of the data.
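The point about the if clause partitioning the data can be made concrete with a short sketch. The terms "coverage" (share of records matching the if clause) and "confidence" (share of those that also satisfy the then clause) are standard association-rule vocabulary rather than wording from this excerpt, and the page-visit records and the function name `rule_stats` are made up.

```python
def rule_stats(records, antecedent, consequent):
    """Coverage and confidence of the rule: if antecedent then consequent."""
    matching = [r for r in records if antecedent <= r]  # if clause applies
    both = [r for r in matching if consequent <= r]     # then clause also holds
    coverage = len(matching) / len(records)
    confidence = len(both) / len(matching) if matching else 0.0
    return coverage, confidence

# Hypothetical Web sessions, each recorded as the set of pages visited.
records = [
    {"home"}, {"home", "search"}, {"home", "news"}, {"search"},
    {"home", "cart", "checkout"}, {"news"}, {"home", "search", "cart"},
    {"home"}, {"search", "news"}, {"home", "news"},
]

coverage, confidence = rule_stats(records, {"cart"}, {"checkout"})
print(f"if clause applies to {coverage:.0%} of records; "
      f"confidence within that subset is {confidence:.0%}")
```

Only two of the ten sessions satisfy the if clause, so the rule describes a local pattern in a small partition of the data rather than a global model of all records.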

pages: 586 words: 159,901

Wall Street: How It Works And for Whom
by Doug Henwood
Published 30 Aug 1998

The early 1970s, Fama recalled, were a euphoric time for financial theorists; the SLB theory for a moment seemed to have "solved" the problem of pricing securities. "We should have known better," Fama wrote. "The SLB model is just a model and so surely false." For a model building profession, this is an odd discovery, but Fama went on to defend SLB and model-building in general by saying it was useful to discover the ways in which it is proved false, which might be true if the model builders had approached the world with humility rather than euphoria. Though it is still reverently taught to business school students, there are simply too many anomalies to sustain the pure SLB beta model, most importantly the size effect.

The MM thesis, even with its tax modifications, became so widely accepted in the academic literature that interest in the relation between finance and the real world faded to near-invisibility. In the words of a paper that would challenge this orthodoxy 30 years later, MM's triumph "dominated the investment literature until recently...largely because of [its] theoretical appeal" (Cantor 1990). It also made the life of model-building economists much easier; as Mark Gertler (1988) put it, "Apart from its formal elegance, the MM theorem was attractive because it provided researchers with a rigorous justification for abstracting from the complications induced by financial considerations." For one thing, you needed fewer variables in your equations if you did, and for another, it fit in nicely with the orthodox preference for individualism: you could focus only on the firm or the household as your unit of analysis, and ignore broader social institutions like the structure of financial markets (or even the state).

Though modern monetarists, from Milton Friedman to Alan Greenspan's Fed staff, have tried to give their doctrine a dynamic spin, it's a flat, static, and trivializing view of a multidimensional, dynamic, and complex system. Keynes's Treatise In a considerable advance on static monetarism, Keynes began his model-building by separating a community's income into two parts, that earned in the production of consumer goods, and that in the production of investment goods, and separating its expenditures also into two parts, that spent on consumption goods and that on savings. This both rejects the homogeneity of the monetarists' T, and moves away from static identities towards a dynamic system.

Beginning R: The Statistical Programming Language
by Mark Gardener
Published 13 Jun 2012

Contents Chapter 1: Introducing R: What It Is and How to Get It Getting the Hang of R The R Website Downloading and Installing R from CRAN Running the R Program Finding Your Way with R Getting Help via the CRAN Website and the Internet The Help Command in R Anatomy of a Help Item in R Command Packages Standard Command Packages What Extra Packages Can Do for You How to Get Extra Packages of R Commands Running and Manipulating Packages Summary Chapter 2: Starting Out: Becoming Familiar with R Some Simple Math Use R Like a Calculator Storing the Results of Calculations Reading and Getting Data into R Using the combine Command for Making Data Using the scan Command for Making Data Reading Bigger Data Files Viewing Named Objects Viewing Previously Loaded Named-Objects Removing Objects from R Types of Data Items Number Data Text Items Converting Between Number and Text Data The Structure of Data Items Vector Items Data Frames Matrix Objects List Objects Examining Data Structure Working with History Commands Using History Files Editing History Files Saving Your Work in R Saving the Workspace on Exit Saving Data Files to Disk Reading Data Files from Disk Saving Data to Disk as Text Files Summary Chapter 3: Starting Out: Working With Objects Manipulating Objects Manipulating Vectors Manipulating Matrix and Data Frames Manipulating Lists Viewing Objects within Objects Looking Inside Complicated Data Objects Opening Complicated Data Objects Quick Looks at Complicated Data Objects Viewing and Setting Names Rotating Data Tables Constructing Data Objects Making Lists Making Data Frames Making Matrix Objects Re-ordering Data Frames and Matrix Objects Forms of Data Objects: Testing and Converting Testing to See What Type of Object You Have Converting from One Object Form to Another Summary Chapter 4: Data: Descriptive Statistics and Tabulation Summary Commands Summarizing Samples Summary Statistics for Vectors Cumulative Statistics Summary Statistics for Data Frames Summary 
Statistics for Matrix Objects Summary Statistics for Lists Summary Tables Making Contingency Tables Selecting Parts of a Table Object Converting an Object into a Table Testing for Table Objects Complex (Flat) Tables Testing “Flat” Table Objects Summary Commands for Tables Cross Tabulation Summary Chapter 5: Data: Distribution Looking at the Distribution of Data Stem and Leaf Plot Histograms Density Function Types of Data Distribution The Shapiro-Wilk Test for Normality The Kolmogorov-Smirnov Test Quantile-Quantile Plots Summary Chapter 6: Simple Hypothesis Testing Using the Student’s t-test Two-Sample t-Test with Unequal Variance Two-Sample t-Test with Equal Variance One-Sample t-Testing Using Directional Hypotheses Formula Syntax and Subsetting Samples in the t-Test The Wilcoxon U-Test (Mann-Whitney) Two-Sample U-Test One-Sample U-Test Using Directional Hypotheses Formula Syntax and Subsetting Samples in the U-test Paired t- and U-Tests Correlation and Covariance Simple Correlation Covariance Significance Testing in Correlation Tests Formula Syntax Tests for Association Multiple Categories: Chi-Squared Tests Single Category: Goodness of Fit Tests Summary Chapter 7: Introduction to Graphical Analysis Box-whisker Plots Basic Boxplots Customizing Boxplots Horizontal Boxplots Scatter Plots Basic Scatter Plots Adding Axis Labels Plotting Symbols Setting Axis Limits Using Formula Syntax Adding Lines of Best-Fit to Scatter Plots Pairs Plots (Multiple Correlation Plots) Line Charts Line Charts Using Numeric Data Line Charts Using Categorical Data Pie Charts Cleveland Dot Charts Bar Charts Single-Category Bar Charts Multiple Category Bar Charts Horizontal Bars Bar Charts from Summary Data Copy Graphics to Other Applications Use Copy/Paste to Copy Graphs Save a Graphic to Disk Summary Chapter 8: Formula Notation and Complex Statistics Examples of Using Formula Syntax for Basic Tests Formula Notation in Graphics Analysis of Variance (ANOVA) One-Way ANOVA Simple Post-hoc 
Testing Extracting Means from aov() Models Two-Way ANOVA Extracting Means and Summary Statistics Interaction Plots More Complex ANOVA Models Other Options for aov() Summary Chapter 9: Manipulating Data and Extracting Components Creating Data for Complex Analysis Data Frames Matrix Objects Creating and Setting Factor Data Making Replicate Treatment Factors Adding Rows or Columns Summarizing Data Simple Column and Row Summaries Complex Summary Functions Summary Chapter 10: Regression (Linear Modeling) Simple Linear Regression Linear Model Results Objects Similarity between lm() and aov() Multiple Regression Formulae and Linear Models Model Building Curvilinear Regression Logarithmic Regression Polynomial Regression Plotting Linear Models and Curve Fitting Best-Fit Lines Confidence Intervals on Fitted Lines Summarizing Regression Models Diagnostic Plots Summary of Fit Summary Chapter 11: More About Graphs Adding Elements to Existing Plots Error Bars Adding Legends to Graphs Adding Text to Graphs Adding Points to an Existing Graph Adding Various Sorts of Lines to Graphs Matrix Plots (Multiple Series on One Graph) Multiple Plots in One Window Splitting the Plot Window into Equal Sections Splitting the Plot Window into Unequal Sections Exporting Graphs Using Copy and Paste to Move a Graph Saving a Graph to a File Using the Device Driver to Save a Graph to Disk Summary Chapter 12: Writing Your Own Scripts: Beginning to Program Copy and Paste Scripts Make Your Own Help File as Plain Text Using Annotations with the # Character Creating Simple Functions One-Line Functions Using Default Values in Functions Simple Customized Functions with Multiple Lines Storing Customized Functions Making Source Code Displaying the Results of Customized Functions and Scripts Displaying Messages as Part of Script Output Summary Appendix: Answers to Exercises Chapter 1 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Exercise 4 Solution Exercise 5 Solution Chapter 2 Exercise 1 
Solution Exercise 2 Solution Chapter 3 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Exercise 4 Solution Chapter 4 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Chapter 5 Exercise 1 Solution Exercise 2 Solution Chapter 6 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Exercise 4 Solution Exercise 5 Solution Chapter 7 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Exercise 4 Solution Exercise 5 Solution Chapter 8 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Exercise 4 Solution Exercise 5 Solution Chapter 9 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Exercise 4 Solution Exercise 5 Solution Chapter 10 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Exercise 4 Solution Exercise 5 Solution Chapter 11 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Exercise 4 Solution Chapter 12 Exercise 1 Solution Exercise 2 Solution Exercise 3 Solution Exercise 4 Solution Exercise 5 Solution Introduction Chapter 1 Introducing R: What It Is and How to Get It What you will learn in this chapter: Discovering what R is How to get the R program How to install R on your computer How to start running the R program How to use the help system and find help from other sources How to get additional libraries of commands R is more than just a program that does statistics.

So, the following examples are quite different in meaning: y ~ x1 + x2 y ~ I(x1 + x2) In the first case you indicate a multiple regression of y against x1 and x2. The second case indicates a simple regression of y against the sum of x1 and x2. You see an example of this in action shortly in the section, “Curvilinear Regression.” Model Building When you have several or many predictor variables, you usually want to create the most statistically significant model from the data. You have two main choices: forward stepwise regression and backward deletion. Forward stepwise regression: Start off with the single best variable and add more variables to build your model into a more complex form Backward deletion: Put all the variables in and reduce the model by removing variables until you are left with only significant terms.
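Forward stepwise regression can be sketched outside R as well. The Python below is a minimal illustration, not the book's procedure: it assumes AIC (computed as n·log(RSS/n) + 2k) as the selection criterion, uses made-up data, and the helper names `fit_rss`, `aic`, and `forward_select` are hypothetical. It also assumes the predictors are not collinear, since the small solver does no rank checking.

```python
import math


def fit_rss(columns, y):
    """Least-squares fit of y on an intercept plus columns; return the residual sum of squares."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    # Back substitution for the coefficients.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))


def aic(rss, n, n_params):
    """AIC up to an additive constant; the floor avoids log(0) on a perfect fit."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * n_params


def forward_select(data, response):
    """Add the best remaining predictor while doing so lowers the AIC."""
    y = data[response]
    n = len(y)
    remaining = [name for name in data if name != response]
    selected = []
    best = aic(fit_rss([], y), n, 1)  # intercept-only starting model
    while remaining:
        scores = {
            name: aic(fit_rss([data[v] for v in selected + [name]], y), n, len(selected) + 2)
            for name in remaining
        }
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break  # no remaining term improves the model
        best = scores[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected


# Made-up data: y is roughly 2 * x1, while x2 is unrelated.
data = {
    "y":  [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
selected = forward_select(data, "y")
```

Backward deletion would run the same loop in reverse: start with every predictor and drop the term whose removal lowers the criterion most, stopping when no removal helps.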

The anova() command produces the classic ANOVA table from the result of an lm() command. Linear modeling: formula syntax The basis of the lm() command is the formula syntax (also known as model syntax). This takes the form of response ~ predictor(s). Complex models can be specified using this syntax. Model building: add1(), drop1() You can build regression models in a forward stepwise manner or by using backward deletion. Moving forward, terms are added using the add1() command. Backward deletion uses the drop1() command. Comparing regression models: anova() You can compare regression models using the anova() command.

pages: 660 words: 141,595

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking
by Foster Provost and Tom Fawcett
Published 30 Jun 2013

(In other words, when do we stop recursing?) It should be clear that we would stop when the nodes are pure, or when we run out of variables to split on. But we may want to stop earlier; we will return to this question in Chapter 5. Visualizing Segmentations Continuing with the metaphor of predictive model building as supervised segmentation, it is instructive to visualize exactly how a classification tree partitions the instance space. The instance space is simply the space described by the data features. A common form of instance space visualization is a scatterplot on some pair of features, used to compare one variable against another to detect correlations and relationships.

A General Method for Avoiding Overfitting More generally, if we have a collection of models with different complexities, we could choose the best simply by estimating the generalization performance of each. But how could we estimate their generalization performance? On the (labeled) test data? There’s one big problem with that: test data should be strictly independent of model building so that we can get an independent estimate of model accuracy. For example, we might want to estimate the ultimate business performance or to compare the best model we can build from one family (say, classification trees) against the best model from another family (say, logistic regression). If we don’t care about comparing models or getting an independent estimate of the model accuracy and/or variance, then we could pick the best model based on the testing data.
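One common way to keep test data out of model building is a three-way split, sketched below in Python. This is an illustration under stated assumptions rather than the authors' procedure: the data are synthetic, the "models" are simple thresholds, and the names `three_way_split` and `accuracy` are hypothetical.

```python
import random


def three_way_split(items, frac_train=0.6, frac_valid=0.2, seed=0):
    """Shuffle once, then cut into disjoint training, validation, and test sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * frac_train)
    n_valid = int(len(shuffled) * frac_valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def accuracy(threshold, rows):
    """Fraction of rows where the rule 'x > threshold' matches the label."""
    return sum((x > threshold) == label for x, label in rows) / len(rows)


# Synthetic, noise-free data: the label is True exactly when x > 5.
data = [(x, x > 5) for x in range(20)]
train, valid, test = three_way_split(data)

# Candidate models come from the training data only.
candidates = sorted({x for x, _ in train})
# Model choice uses the validation data only.
chosen = max(candidates, key=lambda t: accuracy(t, valid))
# The test data are touched once, after every modeling decision is final.
test_estimate = accuracy(chosen, test)
```

Because the test rows play no part in either generating or choosing candidates, `test_estimate` stays an independent estimate of accuracy.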

Manhattan distance (equation), * Other Distance Functions Mann-Whitney-Wilcoxon measure, The Area Under the ROC Curve (AUC) margin-maximizing boundary, Support Vector Machines, Briefly margins, Support Vector Machines, Briefly market basket analysis, Associations Among Facebook Likes–Associations Among Facebook Likes Massachusetts Institute of Technology (MIT), Data Science, Engineering, and Data-Driven Decision Making, Privacy, Ethics, and Mining Data About Individuals mathematical functions, overfitting in, Overfitting in Mathematical Functions–Overfitting in Mathematical Functions matrix factorization, Data Reduction, Latent Information, and Movie Recommendation maximizing objective functions, * Avoiding Overfitting for Parameter Optimization maximizing the margin, Support Vector Machines, Briefly maximum likelihood model, Profiling: Finding Typical Behavior McCarthy, Cormac, Term Frequency McKinsey and Company, Data-Analytic Thinking mean generalization, From Holdout Evaluation to Cross-Validation, Summary Mechanical Turk, Final Example: From Crowd-Sourcing to Cloud-Sourcing Medicare fraud, detecting, Data Understanding Michael Jackson’s Malt Whisky Companion (Jackson), Example: Whiskey Analytics micro-outsourcing, Final Example: From Crowd-Sourcing to Cloud-Sourcing Microsoft, Term Frequency, Attracting and Nurturing Data Scientists and Their Teams Mingus, Charles, Example: Jazz Musicians missing values, Data Preparation mobile devices location of, finding, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data mining data from, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data–Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data model accuracy, Holdout Data and Fitting Graphs model building, test data and, A General Method for Avoiding Overfitting model evaluation and classification, Problems with Unbalanced Classes model induction, Models, Induction, and Prediction model 
intelligibility, Intelligibility model performance, visualizing, Visualizing Model Performance–Example: Performance Analytics for Churn Modeling area under ROC curves, The Area Under the ROC Curve (AUC) cumulative response curves, Cumulative Response and Lift Curves–Cumulative Response and Lift Curves lift curves, Cumulative Response and Lift Curves–Cumulative Response and Lift Curves profit curves, Profit Curves–Profit Curves ranking vs. classifying cases, Visualizing Model Performance–Example: Performance Analytics for Churn Modeling model types, Models, Induction, and Prediction Black-Sholes option pricing, Models, Induction, and Prediction descriptive, Models, Induction, and Prediction predictive, Models, Induction, and Prediction modelers, Overfitting in Mathematical Functions modeling algorithms, A General Method for Avoiding Overfitting, Flaws in the Big Red Proposal modeling labs, From Holdout Evaluation to Cross-Validation models comprehensibility, Evaluation creating, Models, Induction, and Prediction first-layer, Nonlinear Functions, Support Vector Machines, and Neural Networks fitting to data, Fitting a Model to Data, The Fundamental Concepts of Data Science linear, Fitting a Model to Data parameterizing, Fitting a Model to Data parameters, Fitting a Model to Data problems, Probability Estimation producing, From Holdout Evaluation to Cross-Validation second-layer, Nonlinear Functions, Support Vector Machines, and Neural Networks structure, Fitting a Model to Data table, Generalization understanding types of, Visualizing Segmentations worsening, * Example: Why Is Overfitting Bad?

pages: 543 words: 153,550

Model Thinker: What You Need to Know to Make Data Work for You
by Scott E. Page
Published 27 Nov 2018

We often think we see patterns in election outcomes, stock prices, and scoring in sporting events, but instead, to borrow Nassim Taleb’s lovely phrase, we are being fooled by randomness. The Bernoulli urn model describes random processes that produce discrete outcomes, like the flip of a coin or the roll of a die. Developed centuries ago to explain the odds of winning at gambling, it now occupies a central position in probability theory. The random walk model builds on that model by keeping running totals of the number of heads and tails. The model can capture the movement of particles in liquids and gases, the movement of animals in physical space, and growth in human height from birth to childhood. The chapter begins with brief coverage of the Bernoulli urn model along with an analysis of the length of streaks.

Given that Berkshire reveals its investments, we can also rule out fraud. Bernie Madoff did not reveal his investments. His proclaimed streak of successes—decades of consecutive positive returns—was so unlikely that his clients should have demanded transparency. Random Walk Models Our next model, the simple random walk model, builds on the Bernoulli urn model by keeping running totals of past outcomes. We set the initial value, the state of the model, to be zero. If we draw a white ball, we add 1 to the total. If we choose a gray ball, we subtract 1. The state of the model at any time equals the sum of the previous outcomes (i.e., the total number of white balls drawn minus the number of gray balls drawn).
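The simple random walk described here can be written as a short Python sketch. The function names `walk_from_draws` and `random_walk` and the parameter `p_white` are assumptions made for illustration; they are not taken from the book.

```python
import random


def walk_from_draws(draws):
    """Running total: start at 0, add 1 per white ball, subtract 1 per gray ball."""
    state = 0
    path = [state]
    for ball in draws:
        state += 1 if ball == "white" else -1
        path.append(state)
    return path


def random_walk(steps, p_white=0.5, seed=None):
    """Draw balls from a Bernoulli urn and return the resulting walk."""
    rng = random.Random(seed)
    draws = ["white" if rng.random() < p_white else "gray" for _ in range(steps)]
    return walk_from_draws(draws)


example = walk_from_draws(["white", "gray", "white", "white"])
# example is [0, 1, 0, 1, 2]: three white balls minus one gray ball leaves a state of 2.
simulated = random_walk(100, seed=1)
```

Each state is the sum of all earlier outcomes, so consecutive states always differ by exactly 1.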

“Challenges to Replication and Iteration in Field Experiments: Evidence from Two Direct Mail Shots.” American Economic Review Papers & Proceedings 107, no. 5: 1–3. Bowles, Samuel, and Herbert Gintis. 2002. “The Inheritance of Inequality.” Journal of Economic Perspectives 16, no. 3: 3–30. Box, George E. P., and Norman Draper. 1987. Empirical Model-Building and Response Surfaces. New York: Wiley. Boyd, Robert. 2006. “Reciprocity: You Have to Think Different.” Journal of Evolutionary Biology 19: 1380–1382. Breiman, Leo. 1996. “Bagging Predictors.” Machine Learning 24, no. 2: 123–140. Briggs, Andrew, and Mark Sculpher. 1998. “An Introduction to Markov Modeling for Economic Evaluation.”

pages: 226 words: 59,080

Economics Rules: The Rights and Wrongs of the Dismal Science
by Dani Rodrik
Published 12 Oct 2015

Economists also use models, however, to shed light on the functioning of other institutions—schools, trade unions, governments. But what are economic models? The easiest way to understand them is as simplifications designed to show how specific mechanisms work by isolating them from other, confounding effects. A model focuses on particular causes and seeks to show how they work their effects through the system. A modeler builds an artificial world that reveals certain types of connections among the parts of the whole—connections that might be hard to discern if you were looking at the real world in its welter of complexity. Models in economics are no different from physical models used by physicians or architects. A plastic model of the respiratory system that you might encounter in a physician’s office focuses on the detail of the lungs, leaving out the rest of the human body.

As the Finnish philosopher Uskali Mäki explains, the economics modeler in fact practices a similar method of insulation, isolation, and identification. The main difference is that the lab experiment purposely manipulates the physical environment to achieve the isolation needed to observe the causal effect, whereas a model does this by manipulating the assumptions that go into it. Models build mental environments to test hypotheses. You may object that in a lab experiment, as artificial as its environment may be, the action still takes place in the real world. We know if it works or does not work, in at least one setting. An economic model, by contrast, is a thoroughly artificial construct that unfolds in our minds only.

pages: 824 words: 218,333

The Gene: An Intimate History
by Siddhartha Mukherjee
Published 16 May 2016

X-ray pictures would help, of course—but trying to determine structures of biological molecules using experimental methods, Crick argued, was absurdly laborious—“like trying to determine the structure of a piano by listening to the sound it made while being dropped down a flight of stairs.” But what if the structure of DNA was so simple—so elegant—that it could be deduced by “common sense,” by model building? What if a stick-and-stone assemblage could solve DNA? Fifty miles away, at King’s College in London, Franklin had little interest in building models with toys. With her laserlike focus on experimental studies, she had been taking photograph after photograph of DNA—each with increasing clarity.

In Pasadena, meanwhile, Linus Pauling was also trying to solve the structure of DNA. Pauling’s “assault on DNA,” Watson knew, would be nothing short of formidable. He would come at it with a bang, deploying his deep understanding of chemistry, mathematics, and crystallography—but more important, his instinctual grasp of model building. Watson and Crick feared that they would wake up one morning, open the pages of an august scientific journal, and find the solved structure of DNA staring back at them. Pauling’s name—not theirs—would be attached to the article. In the first weeks of January 1953, that nightmare seemed to come true: Pauling and Robert Corey wrote a paper proposing a structure of DNA and sent a preliminary copy to Cambridge.

He wasn’t going to repeat the same error. By the time he had returned to Cambridge and jumped over the back gate of the college, he was convinced that DNA had to be made of two intertwined, helical chains: “important biological objects come in pairs.” The next morning, Watson and Crick raced down to the lab and started model building in earnest. Geneticists count; biochemists clean. Watson and Crick played. They worked methodically, diligently, and carefully—but left enough room for their key strength: lightness. If they were to win this race, it would be through whimsy and intuition; they would laugh their way to DNA. At first, they tried to salvage the essence of their first model, placing the phosphate backbone in the middle, and the bases projecting out to the sides.

pages: 504 words: 89,238

Natural language processing with Python
by Steven Bird , Ewan Klein and Edward Loper
Published 15 Dec 2009

((x = bruce | x = julia) -> admire(x, y)))') >>> m2.satisfiers(fmla6, 'y', g2) set(['b']) Your Turn: Devise a new model based on m2 such that (27a) comes out false in your model; similarly, devise a new model such that (27b) comes out true. Model Building We have been assuming that we already had a model, and wanted to check the truth of a sentence in the model. By contrast, model building tries to create a new model, given some set of sentences. If it succeeds, then we know that the set is consistent, since we have an existence proof of the model. We invoke the Mace4 model builder by creating an instance of Mace() and calling its build_model() method, in an analogous way to calling the Prover9 theorem prover.
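The idea that a model serves as an existence proof of consistency can be illustrated without Mace4. The sketch below is not NLTK's API and not how Mace4 works internally: it is a brute-force search over truth assignments for propositional sentences, with sentences represented as plain Python callables. The name `find_model` is hypothetical.

```python
from itertools import product


def find_model(sentences, symbols):
    """Return the first truth assignment satisfying every sentence, or None."""
    for values in product([True, False], repeat=len(symbols)):
        valuation = dict(zip(symbols, values))
        if all(sentence(valuation) for sentence in sentences):
            return valuation  # a model exists, so the set is consistent
    return None  # no assignment works, so the set is inconsistent


# "P -> Q" and "P": consistent, and any model must make both P and Q true.
consistent = [
    lambda v: (not v["P"]) or v["Q"],
    lambda v: v["P"],
]
# "P" and "-P": no assignment can satisfy both.
inconsistent = [
    lambda v: v["P"],
    lambda v: not v["P"],
]

model = find_model(consistent, ["P", "Q"])
no_model = find_model(inconsistent, ["P"])
```

Exhaustive search like this grows exponentially with the number of symbols and covers only propositional logic; a first-order model builder must also search over domains and interpretations of predicates.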

English, 63 code blocks, nested, 25 code examples, downloading, 57 code points, 94 codecs module, 95 coindex (in feature structure), 340 collocations, 20, 81 comma operator (,), 133 comparative wordlists, 65 comparison operators numerical, 22 for words, 23 complements of lexical head, 347 complements of verbs, 313 complex types, 373 complex values, 336 components, language understanding, 31 computational linguistics, challenges of natural language, 441 computer understanding of sentence meaning, 368 concatenation, 11, 88 lists and strings, 87 strings, 16 conclusions in logic, 369 concordances creating, 40 graphical POS-concordance tool, 184 conditional classifiers, 254 conditional expressions, 25 conditional frequency distributions, 44, 52–56 combining with regular expressions, 103 condition and event pairs, 52 counting words by genre, 52 generating random text with bigrams, 55 male and female names ending in each alphabet letter, 62 plotting and tabulating distributions, 53 using to find minimally contrasting set of words, 64 ConditionalFreqDist, 52 commonly used methods, 56 conditionals, 22, 133 confusion matrix, 207, 240 consecutive classification, 232 non phrase chunking with consecutive classifier, 275 consistent, 366 466 | General Index constituent structure, 296 constituents, 297 context exploiting in part-of-speech classifier, 230 for taggers, 203 context-free grammar, 298, 300 (see also grammars) probabilistic context-free grammar, 320 contractions in tokenization, 112 control, 22 control structures, 26 conversion specifiers, 118 conversions of data formats, 419 coordinate structures, 295 coreferential, 373 corpora, 39–52 annotated text corpora, 46–48 Brown Corpus, 42–44 creating and accessing, resources for further reading, 438 defined, 39 differences in corpus access methods, 50 exploring text corpora using a chunker, 267 Gutenberg Corpus, 39–42 Inaugural Address Corpus, 45 from languages other than English, 48 loading your own corpus, 51 obtaining from 
Web, 416 Reuters Corpus, 44 sources of, 73 tagged, 181–189 text corpus structure, 49–51 web and chat text, 42 wordlists, 60–63 corpora, included with NLTK, 46 corpus case study, structure of TIMIT, 407–412 corpus HOWTOs, 122 life cycle of, 412–416 creation scenarios, 412 curation versus evolution, 415 quality control, 413 widely-used format for, 421 counters, legitimate uses of, 141 cross-validation, 241 CSV (comma-separated value) format, 418 CSV (comma-separated-value) format, 170 D \d decimal digits in regular expressions, 110 \D nondigit characters in regular expressions, 111 data formats, converting, 419 data types dictionary, 190 documentation for Python standard types, 173 finding type of Python objects, 86 function parameter, 146 operations on objects, 86 database query via natural language, 361–365 databases, obtaining data from, 418 debugger (Python), 158 debugging techniques, 158 decimal integers, formatting, 119 decision nodes, 242 decision stumps, 243 decision trees, 242–245 entropy and information gain, 243 decision-tree classifier, 229 declarative style, 140 decoding, 94 def keyword, 9 defaultdict, 193 defensive programming, 159 demonstratives, agreement with noun, 329 dependencies, 310 criteria for, 312 existential dependencies, modeling in XML, 427 non-projective, 312 projective, 311 unbounded dependency constructions, 349–353 dependency grammars, 310–315 valency and the lexicon, 312 dependents, 310 descriptive models, 255 determiners, 186 agreement with nouns, 333 deve-test set, 225 development set, 225 similarity to test set, 238 dialogue act tagging, 214 dialogue acts, identifying types, 235 dialogue systems (see spoken dialogue systems) dictionaries feature set, 223 feature structures as, 337 pronouncing dictionary, 63–65 Python, 189–198 default, 193 defining, 193 dictionary data type, 190 finding key given a value, 197 indexing lists versus, 189 summary of dictionary methods, 197 updating incrementally, 195 storing features and values, 327 
translation, 66 dictionary methods, 197 dictionary data structure (Python), 65 directed acyclic graphs (DAGs), 338 discourse module, 401 discourse semantics, 397–402 discourse processing, 400–402 discourse referents, 397 discourse representation structure (DRS), 397 Discourse Representation Theory (DRT), 397–400 dispersion plot, 6 divide-and-conquer strategy, 160 docstrings, 143 contents and structure of, 148 example of complete docstring, 148 module-level, 155 doctest block, 148 doctest module, 160 document classification, 227 documentation functions, 148 online Python documentation, versions and, 173 Python, resources for further information, 173 docutils module, 148 domain (of a model), 377 DRS (discourse representation structure), 397 DRS conditions, 397 DRT (Discourse Representation Theory), 397– 400 Dublin Core Metadata initiative, 435 duck typing, 281 dynamic programming, 165 General Index | 467 application to parsing with context-free grammar, 307 different approaches to, 167 E Earley chart parser, 334 electronic books, 80 elements, XML, 425 ElementTree interface, 427–429 using to access Toolbox data, 429 elif clause, if . . . 
elif statement, 133 elif statements, 26 else statements, 26 encoding, 94 encoding features, 223 encoding parameters, codecs module, 95 endangered languages, special considerations with, 423–424 entities, 373 entity detection, using chunking, 264–270 entries adding field to, in Toolbox, 431 contents of, 60 converting data formats, 419 formatting in XML, 430 entropy, 251 (see also Maximum Entropy classifiers) calculating for gender prediction task, 243 maximizing in Maximum Entropy classifier, 252 epytext markup language, 148 equality, 132, 372 equivalence (<->) operator, 368 equivalent, 340 error analysis, 225 errors runtime, 13 sources of, 156 syntax, 3 evaluation sets, 238 events, pairing with conditions in conditional frequency distribution, 52 exceptions, 158 existential quantifier, 374 exists operator, 376 Expected Likelihood Estimation, 249 exporting data, 117 468 | General Index F f-structure, 357 feature extractors defining for dialogue acts, 235 defining for document classification, 228 defining for noun phrase (NP) chunker, 276–278 defining for punctuation, 234 defining for suffix checking, 229 Recognizing Textual Entailment (RTE), 236 selecting relevant features, 224–227 feature paths, 339 feature sets, 223 feature structures, 328 order of features, 337 resources for further reading, 357 feature-based grammars, 327–360 auxiliary verbs and inversion, 348 case and gender in German, 353 example grammar, 333 extending, 344–356 lexical heads, 347 parsing using Earley chart parser, 334 processing feature structures, 337–344 subsumption and unification, 341–344 resources for further reading, 357 subcategorization, 344–347 syntactic agreement, 329–331 terminology, 336 translating from English to SQL, 362 unbounded dependency constructions, 349–353 using attributes and constraints, 331–336 features, 223 non-binary features in naive Bayes classifier, 249 fields, 136 file formats, libraries for, 172 files opening and reading local files, 84 writing program output 
to, 120 fillers, 349 first-order logic, 372–385 individual variables and assignments, 378 model building, 383 quantifier scope ambiguity, 381 summary of language, 376 syntax, 372–375 theorem proving, 375 truth in model, 377 floating-point numbers, formatting, 119 folds, 241 for statements, 26 combining with if statements, 26 inside a list comprehension, 63 iterating over characters in strings, 90 format strings, 118 formatting program output, 116–121 converting from lists to strings, 116 strings and formats, 117–118 text wrapping, 120 writing results to file, 120 formulas of propositional logic, 368 formulas, type (t), 373 free, 375 Frege’s Principle, 385 frequency distributions, 17, 22 conditional (see conditional frequency distributions) functions defined for, 22 letters, occurrence in strings, 90 functions, 142–154 abstraction provided by, 147 accumulative, 150 as arguments to another function, 149 call-by-value parameter passing, 144 checking parameter types, 146 defined, 9, 57 documentation for Python built-in functions, 173 documenting, 148 errors from, 157 for frequency distributions, 22 for iteration over sequences, 134 generating plurals of nouns (example), 58 higher-order, 151 inputs and outputs, 143 named arguments, 152 naming, 142 poorly-designed, 147 recursive, call structure, 165 saving in modules, 59 variable scope, 145 well-designed, 147 gazetteer, 282 gender identification, 222 Decision Tree model for, 242 gender in German, 353–356 Generalized Phrase Structure Grammar (GPSG), 345 generate_model ( ) function, 55 generation of language output, 29 generative classifiers, 254 generator expressions, 138 functions exemplifying, 151 genres, systematic differences between, 42–44 German, case and gender in, 353–356 gerunds, 211 glyphs, 94 gold standard, 201 government-sponsored challenges to machine learning application in NLP, 257 gradient (grammaticality), 318 grammars, 327 (see also feature-based grammars) chunk grammar, 265 context-free, 298–302 parsing 
with, 302–310 validating Toolbox entries with, 433 writing your own, 300 dependency, 310–315 development, 315–321 problems with ambiguity, 317 treebanks and grammars, 315–317 weighted grammar, 318–321 dilemmas in sentence structure analysis, 292–295 resources for further reading, 322 scaling up, 315 grammatical category, 328 graphical displays of data conditional frequency distributions, 56 Matplotlib, 168–170 graphs defining and manipulating, 170 directed acyclic graphs, 338 greedy sequence classification, 232 Gutenberg Corpus, 40–42, 80 G hapaxes, 19 hash arrays, 189, 190 (see also dictionaries) gaps, 349 H General Index | 469 head of a sentence, 310 criteria for head and dependencies, 312 heads, lexical, 347 headword (lemma), 60 Heldout Estimation, 249 hexadecimal notation for Unicode string literal, 95 Hidden Markov Models, 233 higher-order functions, 151 holonyms, 70 homonyms, 60 HTML documents, 82 HTML markup, stripping out, 418 hypernyms, 70 searching corpora for, 106 semantic similarity and, 72 hyphens in tokenization, 110 hyponyms, 69 I identifiers for variables, 15 idioms, Python, 24 IDLE (Interactive DeveLopment Environment), 2 if . . . 
elif statements, 133 if statements, 25 combining with for statements, 26 conditions in, 133 immediate constituents, 297 immutable, 93 implication (->) operator, 368 in operator, 91 Inaugural Address Corpus, 45 inconsistent, 366 indenting code, 138 independence assumption, 248 naivete of, 249 indexes counting from zero (0), 12 list, 12–14 mapping dictionary definition to lexeme, 419 speeding up program by using, 163 string, 15, 89, 91 text index created using a stemmer, 107 words containing a given consonant-vowel pair, 103 inference, 369 information extraction, 261–289 470 | General Index architecture of system, 263 chunking, 264–270 defined, 262 developing and evaluating chunkers, 270– 278 named entity recognition, 281–284 recursion in linguistic structure, 278–281 relation extraction, 284 resources for further reading, 286 information gain, 243 inside, outside, begin tags (see IOB tags) integer ordinal, finding for character, 95 interpreter >>> prompt, 2 accessing, 2 using text editor instead of to write programs, 56 inverted clauses, 348 IOB tags, 269, 286 reading, 270–272 is operator, 145 testing for object identity, 132 ISO 639 language codes, 65 iterative optimization techniques, 251 J joint classifier models, 231 joint-features (maximum entropy model), 252 K Kappa coefficient (k), 414 keys, 65, 191 complex, 196 keyword arguments, 153 Kleene closures, 100 L lambda expressions, 150, 386–390 example, 152 lambda operator (λ), 386 Lancaster stemmer, 107 language codes, 65 language output, generating, 29 language processing, symbol processing versus, 442 language resources describing using OLAC metadata, 435–437 LanguageLog (linguistics blog), 35 latent semantic analysis, 171 Latin-2 character encoding, 94 leaf nodes, 242 left-corner parser, 306 left-recursive, 302 lemmas, 60 lexical relationships between, 71 pairing of synset with a word, 68 lemmatization, 107 example of, 108 length of a text, 7 letter trie, 162 lexical categories, 179 lexical entry, 60 lexical 
relations, 70 lexical resources comparative wordlists, 65 pronouncing dictionary, 63–65 Shoebox and Toolbox lexicons, 66 wordlist corpora, 60–63 lexicon, 60 (see also lexical resources) chunking Toolbox lexicon, 434 defined, 60 validating in Toolbox, 432–435 LGB rule of name resolution, 145 licensed, 350 likelihood ratios, 224 Linear-Chain Conditional Random Field Models, 233 linguistic objects, mappings from keys to values, 190 linguistic patterns, modeling, 255 linguistics and NLP-related concepts, resources for, 34 list comprehensions, 24 for statement in, 63 function invoked in, 64 used as function parameters, 55 lists, 10 appending item to, 11 concatenating, using + operator, 11 converting to strings, 116 indexing, 12–14 indexing, dictionaries versus, 189 normalizing and sorting, 86 Python list type, 86 sorted, 14 strings versus, 92 tuples versus, 136 local variables, 58 logic first-order, 372–385 natural language, semantics, and, 365–368 propositional, 368–371 resources for further reading, 404 logical constants, 372 logical form, 368 logical proofs, 370 loops, 26 looping with conditions, 26 lowercase, converting text to, 45, 107 M machine learning application to NLP, web pages for government challenges, 257 decision trees, 242–245 Maximum Entropy classifiers, 251–254 naive Bayes classifiers, 246–250 packages, 237 resources for further reading, 257 supervised classification, 221–237 machine translation (MT) limitations of, 30 using NLTK’s babelizer, 30 mapping, 189 Matplotlib package, 168–170 maximal projection, 347 Maximum Entropy classifiers, 251–254 Maximum Entropy Markov Models, 233 Maximum Entropy principle, 253 memoization, 167 meronyms, 70 metadata, 435 OLAC (Open Language Archives Community), 435 modals, 186 model building, 383 model checking, 379 models interpretation of sentences of logical language, 371 of linguistic patterns, 255 representation using set theory, 367 truth-conditional semantics in first-order logic, 377 General Index | 471 what can 
be learned from models of language, 255 modifiers, 314 modules defined, 59 multimodule programs, 156 structure of Python module, 154 morphological analysis, 213 morphological cues to word category, 211 morphological tagging, 214 morphosyntactic information in tagsets, 212 MSWord, text from, 85 mutable, 93 N \n newline character in regular expressions, 111 n-gram tagging, 203–208 across sentence boundaries, 208 combining taggers, 205 n-gram tagger as generalization of unigram tagger, 203 performance limitations, 206 separating training and test data, 203 storing taggers, 206 unigram tagging, 203 unknown words, 206 naive Bayes assumption, 248 naive Bayes classifier, 246–250 developing for gender identification task, 223 double-counting problem, 250 as generative classifier, 254 naivete of independence assumption, 249 non-binary features, 249 underlying probabilistic model, 248 zero counts and smoothing, 248 name resolution, LGB rule for, 145 named arguments, 152 named entities commonly used types of, 281 relations between, 284 named entity recognition (NER), 281–284 Names Corpus, 61 negative lookahead assertion, 284 NER (see named entity recognition) nested code blocks, 25 NetworkX package, 170 new words in languages, 212 472 | General Index newlines, 84 matching in regular expressions, 109 printing with print statement, 90 resources for further information, 122 non-logical constants, 372 non-standard words, 108 normalizing text, 107–108 lemmatization, 108 using stemmers, 107 noun phrase (NP), 297 noun phrase (NP) chunking, 264 regular expression–based NP chunker, 267 using unigram tagger, 272 noun phrases, quantified, 390 nouns categorizing and tagging, 184 program to find most frequent noun tags, 187 syntactic agreement, 329 numerically intense algorithms in Python, increasing efficiency of, 257 NumPy package, 171 O object references, 130 copying, 132 objective function, 114 objects, finding data type for, 86 OLAC metadata, 74, 435 definition of metadata, 435 Open 
Language Archives Community, 435 Open Archives Initiative (OAI), 435 open class, 212 open formula, 374 Open Language Archives Community (OLAC), 435 operators, 369 (see also names of individual operators) addition and multiplication, 88 Boolean, 368 numerical comparison, 22 scope of, 157 word comparison, 23 or operator, 24 orthography, 328 out-of-vocabulary items, 206 overfitting, 225, 245 P packages, 59 parameters, 57 call-by-value parameter passing, 144 checking types of, 146 defined, 9 defining for functions, 143 parent nodes, 279 parsing, 318 (see also grammars) with context-free grammar left-corner parser, 306 recursive descent parsing, 303 shift-reduce parsing, 304 well-formed substring tables, 307–310 Earley chart parser, parsing feature-based grammars, 334 parsers, 302 projective dependency parser, 311 part-of-speech tagging (see POS tagging) partial information, 341 parts of speech, 179 PDF text, 85 Penn Treebank Corpus, 51, 315 personal pronouns, 186 philosophical divides in contemporary NLP, 444 phonetics computer-readable phonetic alphabet (SAMPA), 137 phones, 63 resources for further information, 74 phrasal level, 347 phrasal projections, 347 pipeline for NLP, 31 pixel images, 169 plotting functions, Matplotlib, 168 Porter stemmer, 107 POS (part-of-speech) tagging, 179, 208, 229 (see also tagging) differences in POS tagsets, 213 examining word context, 230 finding IOB chunk tag for word's POS tag, 272 in information retrieval, 263 morphology in POS tagsets, 212 resources for further reading, 214 simplified tagset, 183 storing POS tags in tagged corpora, 181 tagged data from four Indian languages, 182 unsimplified tags, 187 use in noun phrase chunking, 265 using consecutive classifier, 231 pre-sorting, 160 precision, evaluating search tasks for, 239 precision/recall trade-off in information retrieval, 205 predicates (first-order logic), 372 prepositional phrase (PP), 297 prepositional phrase attachment ambiguity, 300 Prepositional Phrase Attachment
Corpus, 316 prepositions, 186 present participles, 211 Principle of Compositionality, 385, 443 print statements, 89 newline at end, 90 string formats and, 117 prior probability, 246 probabilistic context-free grammar (PCFG), 320 probabilistic model, naive Bayes classifier, 248 probabilistic parsing, 318 procedural style, 139 processing pipeline (NLP), 86 productions in grammars, 293 rules for writing CFGs for parsing in NLTK, 301 program development, 154–160 debugging techniques, 158 defensive programming, 159 multimodule programs, 156 Python module structure, 154 sources of error, 156 programming style, 139 programs, writing, 129–177 advanced features of functions, 149–154 algorithm design, 160–167 assignment, 130 conditionals, 133 equality, 132 functions, 142–149 resources for further reading, 173 sequences, 133–138 style considerations, 138–142 legitimate uses for counters, 141 procedural versus declarative style, 139 General Index | 473 Python coding style, 138 summary of important points, 172 using Python libraries, 167–172 Project Gutenberg, 80 projections, 347 projective, 311 pronouncing dictionary, 63–65 pronouns anaphoric antecedents, 397 interpreting in first-order logic, 373 resolving in discourse processing, 401 proof goal, 376 properties of linguistic categories, 331 propositional logic, 368–371 Boolean operators, 368 propositional symbols, 368 pruning decision nodes, 245 punctuation, classifier for, 233 Python carriage return and linefeed characters, 80 codecs module, 95 dictionary data structure, 65 dictionary methods, summary of, 197 documentation, 173 documentation and information resources, 34 ElementTree module, 427 errors in understanding semantics of, 157 finding type of any object, 86 getting started, 2 increasing efficiency of numerically intense algorithms, 257 libraries, 167–172 CSV, 170 Matplotlib, 168–170 NetworkX, 170 NumPy, 171 other, 172 reference materials, 122 style guide for Python code, 138 textwrap module, 120 Python Package 
Index, 172 Q quality control in corpus creation, 413 quantification first-order logic, 373, 380 quantified noun phrases, 390 scope ambiguity, 381, 394–397 474 | General Index quantified formulas, interpretation of, 380 questions, answering, 29 quotation marks in strings, 87 R random text generating in various styles, 6 generating using bigrams, 55 raster (pixel) images, 169 raw strings, 101 raw text, processing, 79–128 capturing user input, 85 detecting word patterns with regular expressions, 97–101 formatting from lists to strings, 116–121 HTML documents, 82 NLP pipeline, 86 normalizing text, 107–108 reading local files, 84 regular expressions for tokenizing text, 109– 112 resources for further reading, 122 RSS feeds, 83 search engine results, 82 segmentation, 112–116 strings, lowest level text processing, 87–93 summary of important points, 121 text from web and from disk, 80 text in binary formats, 85 useful applications of regular expressions, 102–106 using Unicode, 93–97 raw( ) function, 41 re module, 101, 110 recall, evaluating search tasks for, 240 Recognizing Textual Entailment (RTE), 32, 235 exploiting word context, 230 records, 136 recursion, 161 function to compute Sanskrit meter (example), 165 in linguistic structure, 278–281 tree traversal, 280 trees, 279–280 performance and, 163 in syntactic structure, 301 recursive, 301 recursive descent parsing, 303 reentrancy, 340 references (see object references) regression testing framework, 160 regular expressions, 97–106 character class and other symbols, 110 chunker based on, evaluating, 272 extracting word pieces, 102 finding word stems, 104 matching initial and final vowel sequences and all consonants, 102 metacharacters, 101 metacharacters, summary of, 101 noun phrase (NP) chunker based on, 265 ranges and closures, 99 resources for further information, 122 searching tokenized text, 105 symbols, 110 tagger, 199 tokenizing text, 109–112 use in PlaintextCorpusReader, 51 using basic metacharacters, 98 using for 
relation extraction, 284 using with conditional frequency distributions, 103 relation detection, 263 relation extraction, 284 relational operators, 22 reserved words, 15 return statements, 144 return value, 57 reusing code, 56–59 creating programs using a text editor, 56 functions, 57 modules, 59 Reuters Corpus, 44 root element (XML), 427 root hypernyms, 70 root node, 242 root synsets, 69 Rotokas language, 66 extracting all consonant-vowel sequences from words, 103 Toolbox file containing lexicon, 429 RSS feeds, 83 feedparser library, 172 RTE (Recognizing Textual Entailment), 32, 235 exploiting word context, 230 runtime errors, 13 S \s whitespace characters in regular expressions, 111 \S nonwhitespace characters in regular expressions, 111 SAMPA computer-readable phonetic alphabet, 137 Sanskrit meter, computing, 165 satisfies, 379 scope of quantifiers, 381 scope of variables, 145 searches binary search, 160 evaluating for precision and recall, 239 processing search engine results, 82 using POS tags, 187 segmentation, 112–116 in chunking and tokenization, 264 sentence, 112 word, 113–116 semantic cues to word category, 211 semantic interpretations, NLTK functions for, 393 semantic role labeling, 29 semantics natural language, logic and, 365–368 natural language, resources for information, 403 semantics of English sentences, 385–397 quantifier ambiguity, 394–397 transitive verbs, 391–394 λ-calculus, 386–390 SemCor tagging, 214 sentence boundaries, tagging across, 208 sentence segmentation, 112, 233 in chunking, 264 in information retrieval process, 263 sentence structure, analyzing, 291–326 context-free grammar, 298–302 dependencies and dependency grammar, 310–315 grammar development, 315–321 grammatical dilemmas, 292 parsing with context-free grammar, 302–310 resources for further reading, 322 summary of important points, 321 syntax, 295–298 sents( ) function, 41 sequence classification, 231–233 other methods, 233 POS tagging with consecutive
classifier, 232 sequence iteration, 134 sequences, 133–138 combining different sequence types, 136 converting between sequence types, 135 operations on sequence types, 134 processing using generator expressions, 137 strings and lists as, 92 shift operation, 305 shift-reduce parsing, 304 Shoebox, 66, 412 sibling nodes, 279 signature, 373 similarity, semantic, 71 Sinica Treebank Corpus, 316 slash categories, 350 slicing lists, 12, 13 strings, 15, 90 smoothing, 249 space-time trade-offs in algorithm design, 163 spaces, matching in regular expressions, 109 Speech Synthesis Markup Language (W3C SSML), 214 spellcheckers, Words Corpus used by, 60 spoken dialogue systems, 31 spreadsheets, obtaining data from, 418 SQL (Structured Query Language), 362 translating English sentence to, 362 stack trace, 158 standards for linguistic data creation, 421 standoff annotation, 415, 421 start symbol for grammars, 298, 334 startswith( ) function, 45 stemming, 107 NLTK HOWTO, 122 stemmers, 107 using regular expressions, 104 using stem( ) function, 105 stopwords, 60 stress (in pronunciation), 64 string formatting expressions, 117 string literals, Unicode string literal in Python, 95 strings, 15, 87–93 accessing individual characters, 89 accessing substrings, 90 basic operations with, 87–89 converting lists to, 116 formats, 117–118 formatting lining things up, 118 tabulating data, 119 immutability of, 93 lists versus, 92 methods, 92 more operations on, useful string methods, 92 printing, 89 Python’s str data type, 86 regular expressions as, 101 tokenizing, 86 structurally ambiguous sentences, 300 structure sharing, 340 interaction with unification, 343 structured data, 261 style guide for Python code, 138 stylistics, 43 subcategories of verbs, 314 subcategorization, 344–347 substrings (WFST), 307 substrings, accessing, 90 subsumes, 341 subsumption, 341–344 suffixes, classifier for, 229 supervised classification, 222–237 choosing features, 224–227 documents, 227
exploiting context, 230 gender identification, 222 identifying dialogue act types, 235 part-of-speech tagging, 229 Recognizing Textual Entailment (RTE), 235 scaling up to large datasets, 237 sentence segmentation, 233 sequence classification, 231–233 Swadesh wordlists, 65 symbol processing, language processing versus, 442 synonyms, 67 synsets, 67 semantic similarity, 71 in WordNet concept hierarchy, 69 syntactic agreement, 329–331 syntactic cues to word category, 211 syntactic structure, recursion in, 301 syntax, 295–298 syntax errors, 3 T \t tab character in regular expressions, 111 T9 system, entering text on mobile phones, 99 tabs avoiding in code indentation, 138 matching in regular expressions, 109 tag patterns, 266 matching, precedence in, 267 tagging, 179–219 adjectives and adverbs, 186 combining taggers, 205 default tagger, 198 evaluating tagger performance, 201 exploring tagged corpora, 187–189 lookup tagger, 200–201 mapping words to tags using Python dictionaries, 189–198 nouns, 184 part-of-speech (POS) tagging, 229 performance limitations, 206 reading tagged corpora, 181 regular expression tagger, 199 representing tagged tokens, 181 resources for further reading, 214 across sentence boundaries, 208 separating training and testing data, 203 simplified part-of-speech tagset, 183 storing taggers, 206 transformation-based, 208–210 unigram tagging, 202 unknown words, 206 unsimplified POS tags, 187 using POS (part-of-speech) tagger, 179 verbs, 185 tags in feature structures, 340 IOB tags representing chunk structures, 269 XML, 425 tagsets, 179 morphosyntactic information in POS tagsets, 212 simplified POS tagset, 183 terms (first-order logic), 372 test sets, 44, 223 choosing for classification models, 238 testing classifier for document classification, 228 text, 1 computing statistics from, 16–22 counting vocabulary, 7–10 entering on mobile phones (T9 system), 99 as lists of words, 10–16 searching, 4–7 examining common contexts, 5 text alignment, 30 text 
editor, creating programs with, 56 textonyms, 99 textual entailment, 32 textwrap module, 120 theorem proving in first-order logic, 375 timeit module, 164 TIMIT Corpus, 407–412 tokenization, 80 chunking and, 264 in information retrieval, 263 issues with, 111 list produced from tokenizing string, 86 regular expressions for, 109–112 representing tagged tokens, 181 segmentation and, 112 with Unicode strings as input and output, 97 tokenized text, searching, 105 tokens, 8 Toolbox, 66, 412, 431–435 accessing data from XML, using ElementTree, 429 adding field to each entry, 431 resources for further reading, 438 validating lexicon, 432–435 tools for creation, publication, and use of linguistic data, 421 top-down approach to dynamic programming, 167 top-down parsing, 304 total likelihood, 251 training classifier, 223 classifier for document classification, 228 classifier-based chunkers, 274–278 taggers, 203 unigram chunker using CoNLL 2000 Chunking Corpus, 273 training sets, 223, 225 transformation-based tagging, 208–210 transitive verbs, 314, 391–394 translations comparative wordlists, 66 machine (see machine translation) treebanks, 315–317 trees, 279–281 representing chunks, 270 traversal of, 280 trie, 162 trigram taggers, 204 truth conditions, 368 truth-conditional semantics in first-order logic, 377 tuples, 133 lists versus, 136 parentheses with, 134 representing tagged tokens, 181 Turing Test, 31, 368 type-raising, 390 type-token distinction, 8 TypeError, 157 types, 8, 86 (see also data types) types (first-order logic), 373 U unary predicate, 372 unbounded dependency constructions, 349–353 defined, 350 underspecified, 333 Unicode, 93–97 decoding and encoding, 94 definition and description of, 94 extracting from files, 94 resources for further information, 122 using your local encoding in Python, 97 unicodedata module, 96 unification, 342–344 unigram taggers confusion matrix for, 240 noun phrase chunking with, 272 unigram tagging, 202 lookup
tagger (example), 200 separating training and test data, 203 478 | General Index unique beginners, 69 Universal Feed Parser, 83 universal quantifier, 374 unknown words, tagging, 206 updating dictionary incrementally, 195 US Presidential Inaugural Addresses Corpus, 45 user input, capturing, 85 V valencies, 313 validity of arguments, 369 validity of XML documents, 426 valuation, 377 examining quantifier scope ambiguity, 381 Mace4 model converted to, 384 valuation function, 377 values, 191 complex, 196 variables arguments of predicates in first-order logic, 373 assignment, 378 bound by quantifiers in first-order logic, 373 defining, 14 local, 58 naming, 15 relabeling bound variables, 389 satisfaction of, using to interpret quantified formulas, 380 scope of, 145 verb phrase (VP), 297 verbs agreement paradigm for English regular verbs, 329 auxiliary, 336 auxiliary verbs and inversion of subject and verb, 348 categorizing and tagging, 185 examining for dependency grammar, 312 head of sentence and dependencies, 310 present participle, 211 transitive, 391–394 W \W non-word characters in Python, 110, 111 \w word characters in Python, 110, 111 web text, 42 Web, obtaining data from, 416 websites, obtaining corpora from, 416 weighted grammars, 318–321 probabilistic context-free grammar (PCFG), 320 well-formed (XML), 425 well-formed formulas, 368 well-formed substring tables (WFST), 307– 310 whitespace regular expression characters for, 109 tokenizing text on, 109 wildcard symbol (.), 98 windowdiff scorer, 414 word classes, 179 word comparison operators, 23 word occurrence, counting in text, 8 word offset, 45 word processor files, obtaining data from, 417 word segmentation, 113–116 word sense disambiguation, 28 word sequences, 7 wordlist corpora, 60–63 WordNet, 67–73 concept hierarchy, 69 lemmatizer, 108 more lexical relations, 70 semantic similarity, 71 visualization of hypernym hierarchy using Matplotlib and NetworkX, 170 Words Corpus, 60 words( ) function, 40 wrapping text, 
120 X XML, 425–431 ElementTree interface, 427–429 formatting entries, 430 representation of lexical entry from chunk parsing Toolbox record, 434 resources for further reading, 438 role of, in using to represent linguistic structures, 426 using ElementTree to access Toolbox data, 429 using for linguistic structures, 425 validity of documents, 426 Z zero counts (naive Bayes classifier), 249 zero projection, 347
About the Authors
Steven Bird is Associate Professor in the Department of Computer Science and Software Engineering at the University of Melbourne, and Senior Research Associate in the Linguistic Data Consortium at the University of Pennsylvania.

English, 63 code blocks, nested, 25 code examples, downloading, 57 code points, 94 codecs module, 95 coindex (in feature structure), 340 collocations, 20, 81 comma operator (,), 133 comparative wordlists, 65 comparison operators numerical, 22 for words, 23 complements of lexical head, 347 complements of verbs, 313 complex types, 373 complex values, 336 components, language understanding, 31 computational linguistics, challenges of natural language, 441 computer understanding of sentence meaning, 368 concatenation, 11, 88 lists and strings, 87 strings, 16 conclusions in logic, 369 concordances creating, 40 graphical POS-concordance tool, 184 conditional classifiers, 254 conditional expressions, 25 conditional frequency distributions, 44, 52–56 combining with regular expressions, 103 condition and event pairs, 52 counting words by genre, 52 generating random text with bigrams, 55 male and female names ending in each alphabet letter, 62 plotting and tabulating distributions, 53 using to find minimally contrasting set of words, 64 ConditionalFreqDist, 52 commonly used methods, 56 conditionals, 22, 133 confusion matrix, 207, 240 consecutive classification, 232 noun phrase chunking with consecutive classifier, 275 consistent, 366 constituent structure, 296 constituents, 297 context exploiting in part-of-speech classifier, 230 for taggers, 203 context-free grammar, 298, 300 (see also grammars) probabilistic context-free grammar, 320 contractions in tokenization, 112 control, 22 control structures, 26 conversion specifiers, 118 conversions of data formats, 419 coordinate structures, 295 coreferential, 373 corpora, 39–52 annotated text corpora, 46–48 Brown Corpus, 42–44 creating and accessing, resources for further reading, 438 defined, 39 differences in corpus access methods, 50 exploring text corpora using a chunker, 267 Gutenberg Corpus, 39–42 Inaugural Address Corpus, 45 from languages other than English, 48 loading your own corpus, 51 obtaining from
Web, 416 Reuters Corpus, 44 sources of, 73 tagged, 181–189 text corpus structure, 49–51 web and chat text, 42 wordlists, 60–63 corpora, included with NLTK, 46 corpus case study, structure of TIMIT, 407–412 corpus HOWTOs, 122 life cycle of, 412–416 creation scenarios, 412 curation versus evolution, 415 quality control, 413 widely-used format for, 421 counters, legitimate uses of, 141 cross-validation, 241 CSV (comma-separated value) format, 418 CSV (comma-separated-value) format, 170 D \d decimal digits in regular expressions, 110 \D nondigit characters in regular expressions, 111 data formats, converting, 419 data types dictionary, 190 documentation for Python standard types, 173 finding type of Python objects, 86 function parameter, 146 operations on objects, 86 database query via natural language, 361–365 databases, obtaining data from, 418 debugger (Python), 158 debugging techniques, 158 decimal integers, formatting, 119 decision nodes, 242 decision stumps, 243 decision trees, 242–245 entropy and information gain, 243 decision-tree classifier, 229 declarative style, 140 decoding, 94 def keyword, 9 defaultdict, 193 defensive programming, 159 demonstratives, agreement with noun, 329 dependencies, 310 criteria for, 312 existential dependencies, modeling in XML, 427 non-projective, 312 projective, 311 unbounded dependency constructions, 349–353 dependency grammars, 310–315 valency and the lexicon, 312 dependents, 310 descriptive models, 255 determiners, 186 agreement with nouns, 333 dev-test set, 225 development set, 225 similarity to test set, 238 dialogue act tagging, 214 dialogue acts, identifying types, 235 dialogue systems (see spoken dialogue systems) dictionaries feature set, 223 feature structures as, 337 pronouncing dictionary, 63–65 Python, 189–198 default, 193 defining, 193 dictionary data type, 190 finding key given a value, 197 indexing lists versus, 189 summary of dictionary methods, 197 updating incrementally, 195 storing features and values, 327
translation, 66 dictionary methods, 197 dictionary data structure (Python), 65 directed acyclic graphs (DAGs), 338 discourse module, 401 discourse semantics, 397–402 discourse processing, 400–402 discourse referents, 397 discourse representation structure (DRS), 397 Discourse Representation Theory (DRT), 397–400 dispersion plot, 6 divide-and-conquer strategy, 160 docstrings, 143 contents and structure of, 148 example of complete docstring, 148 module-level, 155 doctest block, 148 doctest module, 160 document classification, 227 documentation functions, 148 online Python documentation, versions and, 173 Python, resources for further information, 173 docutils module, 148 domain (of a model), 377 DRS (discourse representation structure), 397 DRS conditions, 397 DRT (Discourse Representation Theory), 397– 400 Dublin Core Metadata initiative, 435 duck typing, 281 dynamic programming, 165 General Index | 467 application to parsing with context-free grammar, 307 different approaches to, 167 E Earley chart parser, 334 electronic books, 80 elements, XML, 425 ElementTree interface, 427–429 using to access Toolbox data, 429 elif clause, if . . . 
elif statement, 133 elif statements, 26 else statements, 26 encoding, 94 encoding features, 223 encoding parameters, codecs module, 95 endangered languages, special considerations with, 423–424 entities, 373 entity detection, using chunking, 264–270 entries adding field to, in Toolbox, 431 contents of, 60 converting data formats, 419 formatting in XML, 430 entropy, 251 (see also Maximum Entropy classifiers) calculating for gender prediction task, 243 maximizing in Maximum Entropy classifier, 252 epytext markup language, 148 equality, 132, 372 equivalence (<->) operator, 368 equivalent, 340 error analysis, 225 errors runtime, 13 sources of, 156 syntax, 3 evaluation sets, 238 events, pairing with conditions in conditional frequency distribution, 52 exceptions, 158 existential quantifier, 374 exists operator, 376 Expected Likelihood Estimation, 249 exporting data, 117 468 | General Index F f-structure, 357 feature extractors defining for dialogue acts, 235 defining for document classification, 228 defining for noun phrase (NP) chunker, 276–278 defining for punctuation, 234 defining for suffix checking, 229 Recognizing Textual Entailment (RTE), 236 selecting relevant features, 224–227 feature paths, 339 feature sets, 223 feature structures, 328 order of features, 337 resources for further reading, 357 feature-based grammars, 327–360 auxiliary verbs and inversion, 348 case and gender in German, 353 example grammar, 333 extending, 344–356 lexical heads, 347 parsing using Earley chart parser, 334 processing feature structures, 337–344 subsumption and unification, 341–344 resources for further reading, 357 subcategorization, 344–347 syntactic agreement, 329–331 terminology, 336 translating from English to SQL, 362 unbounded dependency constructions, 349–353 using attributes and constraints, 331–336 features, 223 non-binary features in naive Bayes classifier, 249 fields, 136 file formats, libraries for, 172 files opening and reading local files, 84 writing program output 
to, 120
relations, 70 lexical resources comparative wordlists, 65 pronouncing dictionary, 63–65 Shoebox and Toolbox lexicons, 66 wordlist corpora, 60–63 lexicon, 60 (see also lexical resources) chunking Toolbox lexicon, 434 defined, 60 validating in Toolbox, 432–435 LGB rule of name resolution, 145 licensed, 350 likelihood ratios, 224 Linear-Chain Conditional Random Field Models, 233 linguistic objects, mappings from keys to values, 190 linguistic patterns, modeling, 255 linguistics and NLP-related concepts, resources for, 34 list comprehensions, 24 for statement in, 63 function invoked in, 64 used as function parameters, 55 lists, 10 appending item to, 11 concatenating, using + operator, 11 converting to strings, 116 indexing, 12–14 indexing, dictionaries versus, 189 normalizing and sorting, 86 Python list type, 86 sorted, 14 strings versus, 92 tuples versus, 136 local variables, 58 logic first-order, 372–385 natural language, semantics, and, 365–368 propositional, 368–371 resources for further reading, 404 logical constants, 372 logical form, 368 logical proofs, 370 loops, 26 looping with conditions, 26 lowercase, converting text to, 45, 107 M machine learning application to NLP, web pages for government challenges, 257 decision trees, 242–245 Maximum Entropy classifiers, 251–254 naive Bayes classifiers, 246–250 packages, 237 resources for further reading, 257 supervised classification, 221–237 machine translation (MT) limitations of, 30 using NLTK’s babelizer, 30 mapping, 189 Matplotlib package, 168–170 maximal projection, 347 Maximum Entropy classifiers, 251–254 Maximum Entropy Markov Models, 233 Maximum Entropy principle, 253 memoization, 167 meronyms, 70 metadata, 435 OLAC (Open Language Archives Community), 435 modals, 186 model building, 383 model checking, 379 models interpretation of sentences of logical language, 371 of linguistic patterns, 255 representation using set theory, 367 truth-conditional semantics in first-order logic, 377 General Index | 471 what can 
be learned from models of language, 255 modifiers, 314 modules defined, 59 multimodule programs, 156 structure of Python module, 154 morphological analysis, 213 morphological cues to word category, 211 morphological tagging, 214 morphosyntactic information in tagsets, 212 MSWord, text from, 85 mutable, 93 N \n newline character in regular expressions, 111 n-gram tagging, 203–208 across sentence boundaries, 208 combining taggers, 205 n-gram tagger as generalization of unigram tagger, 203 performance limitations, 206 separating training and test data, 203 storing taggers, 206 unigram tagging, 203 unknown words, 206 naive Bayes assumption, 248 naive Bayes classifier, 246–250 developing for gender identification task, 223 double-counting problem, 250 as generative classifier, 254 naivete of independence assumption, 249 non-binary features, 249 underlying probabilistic model, 248 zero counts and smoothing, 248 name resolution, LGB rule for, 145 named arguments, 152 named entities commonly used types of, 281 relations between, 284 named entity recognition (NER), 281–284 Names Corpus, 61 negative lookahead assertion, 284 NER (see named entity recognition) nested code blocks, 25 NetworkX package, 170 new words in languages, 212 472 | General Index newlines, 84 matching in regular expressions, 109 printing with print statement, 90 resources for further information, 122 non-logical constants, 372 non-standard words, 108 normalizing text, 107–108 lemmatization, 108 using stemmers, 107 noun phrase (NP), 297 noun phrase (NP) chunking, 264 regular expression–based NP chunker, 267 using unigram tagger, 272 noun phrases, quantified, 390 nouns categorizing and tagging, 184 program to find most frequent noun tags, 187 syntactic agreement, 329 numerically intense algorithms in Python, increasing efficiency of, 257 NumPy package, 171 O object references, 130 copying, 132 objective function, 114 objects, finding data type for, 86 OLAC metadata, 74, 435 definition of metadata, 435 Open 
Language Archives Community, 435 Open Archives Initiative (OAI), 435 open class, 212 open formula, 374 Open Language Archives Community (OLAC), 435 operators, 369 (see also names of individual operators) addition and multiplication, 88 Boolean, 368 numerical comparison, 22 scope of, 157 word comparison, 23 or operator, 24 orthography, 328 out-of-vocabulary items, 206 overfitting, 225, 245 P packages, 59 parameters, 57 call-by-value parameter passing, 144 checking types of, 146 defined, 9 defining for functions, 143 parent nodes, 279 parsing, 318 (see also grammars) with context-free grammar left-corner parser, 306 recursive descent parsing, 303 shift-reduce parsing, 304 well-formed substring tables, 307–310 Earley chart parser, parsing feature-based grammars, 334 parsers, 302 projective dependency parser, 311 part-of-speech tagging (see POS tagging) partial information, 341 parts of speech, 179 PDF text, 85 Penn Treebank Corpus, 51, 315 personal pronouns, 186 philosophical divides in contemporary NLP, 444 phonetics computer-readable phonetic alphabet (SAMPA), 137 phones, 63 resources for further information, 74 phrasal level, 347 phrasal projections, 347 pipeline for NLP, 31 pixel images, 169 plotting functions, Matplotlib, 168 Porter stemmer, 107 POS (part-of-speech) tagging, 179, 208, 229 (see also tagging) differences in POS tagsets, 213 examining word context, 230 finding IOB chunk tag for word's POS tag, 272 in information retrieval, 263 morphology in POS tagsets, 212 resources for further reading, 214 simplified tagset, 183 storing POS tags in tagged corpora, 181 tagged data from four Indian languages, 182 unsimplified tags, 187 use in noun phrase chunking, 265 using consecutive classifier, 231 pre-sorting, 160 precision, evaluating search tasks for, 239 precision/recall trade-off in information retrieval, 205 predicates (first-order logic), 372 prepositional phrase (PP), 297 prepositional phrase attachment ambiguity, 300 Prepositional Phrase Attachment
Corpus, 316 prepositions, 186 present participles, 211 Principle of Compositionality, 385, 443 print statements, 89 newline at end, 90 string formats and, 117 prior probability, 246 probabilistic context-free grammar (PCFG), 320 probabilistic model, naive Bayes classifier, 248 probabilistic parsing, 318 procedural style, 139 processing pipeline (NLP), 86 productions in grammars, 293 rules for writing CFGs for parsing in NLTK, 301 program development, 154–160 debugging techniques, 158 defensive programming, 159 multimodule programs, 156 Python module structure, 154 sources of error, 156 programming style, 139 programs, writing, 129–177 advanced features of functions, 149–154 algorithm design, 160–167 assignment, 130 conditionals, 133 equality, 132 functions, 142–149 resources for further reading, 173 sequences, 133–138 style considerations, 138–142 legitimate uses for counters, 141 procedural versus declarative style, 139 General Index | 473 Python coding style, 138 summary of important points, 172 using Python libraries, 167–172 Project Gutenberg, 80 projections, 347 projective, 311 pronouncing dictionary, 63–65 pronouns anaphoric antecedents, 397 interpreting in first-order logic, 373 resolving in discourse processing, 401 proof goal, 376 properties of linguistic categories, 331 propositional logic, 368–371 Boolean operators, 368 propositional symbols, 368 pruning decision nodes, 245 punctuation, classifier for, 233 Python carriage return and linefeed characters, 80 codecs module, 95 dictionary data structure, 65 dictionary methods, summary of, 197 documentation, 173 documentation and information resources, 34 ElementTree module, 427 errors in understanding semantics of, 157 finding type of any object, 86 getting started, 2 increasing efficiency of numerically intense algorithms, 257 libraries, 167–172 CSV, 170 Matplotlib, 168–170 NetworkX, 170 NumPy, 171 other, 172 reference materials, 122 style guide for Python code, 138 textwrap module, 120 Python Package 
Index, 172 Q quality control in corpus creation, 413 quantification first-order logic, 373, 380 quantified noun phrases, 390 scope ambiguity, 381, 394–397 474 | General Index quantified formulas, interpretation of, 380 questions, answering, 29 quotation marks in strings, 87 R random text generating in various styles, 6 generating using bigrams, 55 raster (pixel) images, 169 raw strings, 101 raw text, processing, 79–128 capturing user input, 85 detecting word patterns with regular expressions, 97–101 formatting from lists to strings, 116–121 HTML documents, 82 NLP pipeline, 86 normalizing text, 107–108 reading local files, 84 regular expressions for tokenizing text, 109– 112 resources for further reading, 122 RSS feeds, 83 search engine results, 82 segmentation, 112–116 strings, lowest level text processing, 87–93 summary of important points, 121 text from web and from disk, 80 text in binary formats, 85 useful applications of regular expressions, 102–106 using Unicode, 93–97 raw( ) function, 41 re module, 101, 110 recall, evaluating search tasks for, 240 Recognizing Textual Entailment (RTE), 32, 235 exploiting word context, 230 records, 136 recursion, 161 function to compute Sanskrit meter (example), 165 in linguistic structure, 278–281 tree traversal, 280 trees, 279–280 performance and, 163 in syntactic structure, 301 recursive, 301 recursive descent parsing, 303 reentrancy, 340 references (see object references) regression testing framework, 160 regular expressions, 97–106 character class and other symbols, 110 chunker based on, evaluating, 272 extracting word pieces, 102 finding word stems, 104 matching initial and final vowel sequences and all consonants, 102 metacharacters, 101 metacharacters, summary of, 101 noun phrase (NP) chunker based on, 265 ranges and closures, 99 resources for further information, 122 searching tokenized text, 105 symbols, 110 tagger, 199 tokenizing text, 109–112 use in PlaintextCorpusReader, 51 using basic metacharacters, 98 using for 
relation extraction, 284 using with conditional frequency distributions, 103 relation detection, 263 relation extraction, 284 relational operators, 22 reserved words, 15 return statements, 144 return value, 57 reusing code, 56–59 creating programs using a text editor, 56 functions, 57 modules, 59 Reuters Corpus, 44 root element (XML), 427 root hypernyms, 70 root node, 242 root synsets, 69 Rotokas language, 66 extracting all consonant-vowel sequences from words, 103 Toolbox file containing lexicon, 429 RSS feeds, 83 feedparser library, 172 RTE (Recognizing Textual Entailment), 32, 235 exploiting word context, 230 runtime errors, 13 S \s whitespace characters in regular expressions, 111 \S nonwhitespace characters in regular expressions, 111 SAMPA computer-readable phonetic alphabet, 137 Sanskrit meter, computing, 165 satisfies, 379 scope of quantifiers, 381 scope of variables, 145 searches binary search, 160 evaluating for precision and recall, 239 processing search engine results, 82 using POS tags, 187 segmentation, 112–116 in chunking and tokenization, 264 sentence, 112 word, 113–116 semantic cues to word category, 211 semantic interpretations, NLTK functions for, 393 semantic role labeling, 29 semantics natural language, logic and, 365–368 natural language, resources for information, 403 semantics of English sentences, 385–397 quantifier ambiguity, 394–397 transitive verbs, 391–394 λ-calculus, 386–390 SemCor tagging, 214 sentence boundaries, tagging across, 208 sentence segmentation, 112, 233 in chunking, 264 in information retrieval process, 263 sentence structure, analyzing, 291–326 context-free grammar, 298–302 dependencies and dependency grammar, 310–315 grammar development, 315–321 grammatical dilemmas, 292 parsing with context-free grammar, 302–310 resources for further reading, 322 summary of important points, 321 syntax, 295–298 sents( ) function, 41 sequence classification, 231–233 other methods, 233 POS tagging with consecutive
classifier, 232 sequence iteration, 134 sequences, 133–138 combining different sequence types, 136 converting between sequence types, 135 operations on sequence types, 134 processing using generator expressions, 137 strings and lists as, 92 shift operation, 305 shift-reduce parsing, 304 Shoebox, 66, 412 sibling nodes, 279 signature, 373 similarity, semantic, 71 Sinica Treebank Corpus, 316 slash categories, 350 slicing lists, 12, 13 strings, 15, 90 smoothing, 249 space-time trade-offs in algorithm design, 163 spaces, matching in regular expressions, 109 Speech Synthesis Markup Language (W3C SSML), 214 spellcheckers, Words Corpus used by, 60 spoken dialogue systems, 31 spreadsheets, obtaining data from, 418 SQL (Structured Query Language), 362 translating English sentence to, 362 stack trace, 158 standards for linguistic data creation, 421 standoff annotation, 415, 421 start symbol for grammars, 298, 334 startswith( ) function, 45 stemming, 107 NLTK HOWTO, 122 stemmers, 107 using regular expressions, 104 using stem( ) function, 105 stopwords, 60 stress (in pronunciation), 64 string formatting expressions, 117 string literals, Unicode string literal in Python, 95 strings, 15, 87–93 accessing individual characters, 89 accessing substrings, 90 basic operations with, 87–89 converting lists to, 116 formats, 117–118 formatting lining things up, 118 tabulating data, 119 immutability of, 93 lists versus, 92 methods, 92 more operations on, useful string methods, 92 printing, 89 Python’s str data type, 86 regular expressions as, 101 tokenizing, 86 structurally ambiguous sentences, 300 structure sharing, 340 interaction with unification, 343 structured data, 261 style guide for Python code, 138 stylistics, 43 subcategories of verbs, 314 subcategorization, 344–347 substrings (WFST), 307 substrings, accessing, 90 subsumes, 341 subsumption, 341–344 suffixes, classifier for, 229 supervised classification, 222–237 choosing features, 224–227 documents, 227
exploiting context, 230 gender identification, 222 identifying dialogue act types, 235 part-of-speech tagging, 229 Recognizing Textual Entailment (RTE), 235 scaling up to large datasets, 237 sentence segmentation, 233 sequence classification, 231–233 Swadesh wordlists, 65 symbol processing, language processing versus, 442 synonyms, 67 synsets, 67 semantic similarity, 71 in WordNet concept hierarchy, 69 syntactic agreement, 329–331 syntactic cues to word category, 211 syntactic structure, recursion in, 301 syntax, 295–298 syntax errors, 3 T \t tab character in regular expressions, 111 T9 system, entering text on mobile phones, 99 tabs avoiding in code indentation, 138 matching in regular expressions, 109 tag patterns, 266 matching, precedence in, 267 tagging, 179–219 adjectives and adverbs, 186 combining taggers, 205 default tagger, 198 evaluating tagger performance, 201 exploring tagged corpora, 187–189 lookup tagger, 200–201 mapping words to tags using Python dictionaries, 189–198 nouns, 184 part-of-speech (POS) tagging, 229 performance limitations, 206 reading tagged corpora, 181 regular expression tagger, 199 representing tagged tokens, 181 resources for further reading, 214 across sentence boundaries, 208 separating training and testing data, 203 simplified part-of-speech tagset, 183 storing taggers, 206 transformation-based, 208–210 unigram tagging, 202 unknown words, 206 unsimplified POS tags, 187 using POS (part-of-speech) tagger, 179 verbs, 185 tags in feature structures, 340 IOB tags representing chunk structures, 269 XML, 425 tagsets, 179 morphosyntactic information in POS tagsets, 212 simplified POS tagset, 183 terms (first-order logic), 372 test sets, 44, 223 choosing for classification models, 238 testing classifier for document classification, 228 text, 1 computing statistics from, 16–22 counting vocabulary, 7–10 entering on mobile phones (T9 system), 99 as lists of words, 10–16 searching, 4–7 examining common contexts, 5 text alignment, 30 text 
editor, creating programs with, 56 textonyms, 99 textual entailment, 32 textwrap module, 120 theorem proving in first order logic, 375 timeit module, 164 TIMIT Corpus, 407–412 tokenization, 80 chunking and, 264 in information retrieval, 263 issues with, 111 list produced from tokenizing string, 86 regular expressions for, 109–112 representing tagged tokens, 181 segmentation and, 112 with Unicode strings as input and output, 97 tokenized text, searching, 105 tokens, 8 Toolbox, 66, 412, 431–435 accessing data from XML, using ElementTree, 429 adding field to each entry, 431 resources for further reading, 438 validating lexicon, 432–435 tools for creation, publication, and use of linguistic data, 421 top-down approach to dynamic programming, 167 top-down parsing, 304 total likelihood, 251 training classifier, 223 classifier for document classification, 228 classifier-based chunkers, 274–278 taggers, 203 unigram chunker using CoNLL 2000 Chunking Corpus, 273 training sets, 223, 225 transformation-based tagging, 208–210 transitive verbs, 314, 391–394 translations comparative wordlists, 66 machine (see machine translation) treebanks, 315–317 trees, 279–281 representing chunks, 270 traversal of, 280 trie, 162 trigram taggers, 204 truth conditions, 368 truth-conditional semantics in first-order logic, 377 tuples, 133 lists versus, 136 parentheses with, 134 representing tagged tokens, 181 Turing Test, 31, 368 type-raising, 390 type-token distinction, 8 TypeError, 157 types, 8, 86 (see also data types) types (first-order logic), 373 U unary predicate, 372 unbounded dependency constructions, 349–353 defined, 350 underspecified, 333 Unicode, 93–97 decoding and encoding, 94 definition and description of, 94 extracting from files, 94 resources for further information, 122 using your local encoding in Python, 97 unicodedata module, 96 unification, 342–344 unigram taggers confusion matrix for, 240 noun phrase chunking with, 272 unigram tagging, 202 lookup
tagger (example), 200 separating training and test data, 203 478 | General Index unique beginners, 69 Universal Feed Parser, 83 universal quantifier, 374 unknown words, tagging, 206 updating dictionary incrementally, 195 US Presidential Inaugural Addresses Corpus, 45 user input, capturing, 85 V valencies, 313 validity of arguments, 369 validity of XML documents, 426 valuation, 377 examining quantifier scope ambiguity, 381 Mace4 model converted to, 384 valuation function, 377 values, 191 complex, 196 variables arguments of predicates in first-order logic, 373 assignment, 378 bound by quantifiers in first-order logic, 373 defining, 14 local, 58 naming, 15 relabeling bound variables, 389 satisfaction of, using to interpret quantified formulas, 380 scope of, 145 verb phrase (VP), 297 verbs agreement paradigm for English regular verbs, 329 auxiliary, 336 auxiliary verbs and inversion of subject and verb, 348 categorizing and tagging, 185 examining for dependency grammar, 312 head of sentence and dependencies, 310 present participle, 211 transitive, 391–394 W \W non-word characters in Python, 110, 111 \w word characters in Python, 110, 111 web text, 42 Web, obtaining data from, 416 websites, obtaining corpora from, 416 weighted grammars, 318–321 probabilistic context-free grammar (PCFG), 320 well-formed (XML), 425 well-formed formulas, 368 well-formed substring tables (WFST), 307– 310 whitespace regular expression characters for, 109 tokenizing text on, 109 wildcard symbol (.), 98 windowdiff scorer, 414 word classes, 179 word comparison operators, 23 word occurrence, counting in text, 8 word offset, 45 word processor files, obtaining data from, 417 word segmentation, 113–116 word sense disambiguation, 28 word sequences, 7 wordlist corpora, 60–63 WordNet, 67–73 concept hierarchy, 69 lemmatizer, 108 more lexical relations, 70 semantic similarity, 71 visualization of hypernym hierarchy using Matplotlib and NetworkX, 170 Words Corpus, 60 words( ) function, 40 wrapping text, 
120 X XML, 425–431 ElementTree interface, 427–429 formatting entries, 430 representation of lexical entry from chunk parsing Toolbox record, 434 resources for further reading, 438 role of, in using to represent linguistic structures, 426 using ElementTree to access Toolbox data, 429 using for linguistic structures, 425 validity of documents, 426 Z zero counts (naive Bayes classifier), 249 zero projection, 347 About the Authors Steven Bird is Associate Professor in the Department of Computer Science and Software Engineering at the University of Melbourne, and Senior Research Associate in the Linguistic Data Consortium at the University of Pennsylvania.

pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values
by Brian Christian
Published 5 Oct 2020

This remark, widely attributed to Brooks in many sources, appears to have been first stated as “It turns out to be better to use the world as its own model” in Brooks, “Intelligence Without Representation.” 3. The now-famous statistical adage “All models are wrong” first appeared in Box, “Science and Statistics”; it later appeared with the silver lining “but some are useful” in Box, “Robustness in the Strategy of Scientific Model Building.” PROLOGUE 1. Information about Walter Pitts’s life is incredibly scarce. I have drawn from what little primary-source material there is, chiefly Pitts’s letters to Warren McCulloch, which are accessible in the McCulloch archive at the American Philosophical Society in Philadelphia. I’m grateful for the kind assistance of the staff there.

The “intrinsic curiosity module” is actually a bit subtler and more complex than this, because it is designed to predict only user-controllable aspects of the screen, for which another, “inverse dynamics” model is used. For the full details, see Pathak et al., “Curiosity-Driven Exploration by Self-Supervised Prediction.” For some other related approaches, which incentivize exploration by rewarding “information gain,” see, e.g., Schmidhuber, “Curious Model-Building Control Systems”; Stadie, Levine, and Abbeel, “Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models”; and Houthooft et al., “VIME.” 51. Burda et al., “Large-Scale Study of Curiosity-Driven Learning.” 52. See Burda et al., “Exploration by Random Network Distillation.” 53.

Bourgin, David D., Joshua C. Peterson, Daniel Reichman, Thomas L. Griffiths, and Stuart J. Russell. “Cognitive Model Priors for Predicting Human Decisions.” In Proceedings of the 36th International Conference on Machine Learning (ICML), 2019. Box, George E. P. “Robustness in the Strategy of Scientific Model Building.” In Robustness in Statistics, edited by Robert L. Launer and Graham N. Wilkinson, 201–36. Academic Press, 1979. ———. “Science and Statistics.” Journal of the American Statistical Association 71, no. 356 (December 1976): 791–99. Brandt, Felix, Vincent Conitzer, Ulle Endriss, Jérôme Lang, and Ariel D. Procaccia.

pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets
by David J. Leinweber
Published 31 Dec 2008

These techniques are based on the idea of testing the hypothesis that a new model has predictive superiority over a previous benchmark model. The new model is clearly data-mined to some extent, given that the benchmark model was developed beforehand.6 • Use truly bogus test models. You can calibrate the model-building process using a model based on random data. There is a ferocious amount of technology that can be brought to bear on forecasting problems. One neural net product advertised in Technical Analysis of Stocks & Commodities magazine claims to be able to “forecast any market, using any data.”7 This is no doubt true, subject to the usual caveats.

One neural net product advertised in Technical Analysis of Stocks & Commodities magazine claims to be able to “forecast any market, using any data.”7 This is no doubt true, subject to the usual caveats. Throw in enough genetic algorithms, wavelets, and the like, and you are certain to come up with a model. But is it any good? A useful indicator in answering this question is to take the same model-building process and use it to build a test model for the same forecast target, but using a completely random set of data.8 This test model has to be truly bogus. If your actual model has performance statistics similar to the bogus test version, you know it’s time to visit the data miner’s rehabilitation clinic. Summary (and Sermonette) These dairy product and calendar examples are obviously contrived.
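The bogus-model check described in this excerpt can be sketched in a few lines of Python. This is a toy illustration, not the author's procedure: the "model-building process" here is assumed to be nothing more than picking, from many candidate series, the one that correlates best with the target in-sample. The names `pearson` and `best_mined_correlation`, the sample sizes, and the seed are all hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_mined_correlation(target, candidates):
    """Toy model-building process (an assumption for illustration):
    keep whichever candidate series fits the target best in-sample."""
    return max(abs(pearson(series, target)) for series in candidates)


rng = random.Random(0)          # fixed seed so the run is repeatable
n_obs, n_candidates = 20, 500   # short history, many things to try

# The forecast target and every candidate predictor are pure noise.
target = [rng.gauss(0, 1) for _ in range(n_obs)]
bogus_candidates = [
    [rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)
]

bogus_score = best_mined_correlation(target, bogus_candidates)
print(f"best |correlation| mined from pure noise: {bogus_score:.2f}")
```

If the same search run on real data produces a score close to `bogus_score`, the apparent fit is probably no better than what mining random numbers delivers.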

Leaving some of the data out of the sample used to build the model is a good idea, as is holding back some data to use in testing the model. This holdback sample can be a period of time or a cross section of data. The cross-sectional holdback works where there is enough data to do this, as in the analysis of individual stocks. You can use stocks with symbols starting with A through L for model building and save M through Z for verification purposes. It is possible to mine these holdback samples as well. Every time you visit the out-of-sample period for verification purposes, you do a little more data mining. Testing the process to see if you can produce models of similar quality using purely random data is often a sobering experience.
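The cross-sectional holdback described above amounts to a simple partition on the first letter of each symbol. A minimal Python sketch, assuming symbols are plain strings; the function name `split_by_symbol` and the sample tickers are illustrative only:

```python
def split_by_symbol(symbols):
    """Partition symbols into a model-building set (A through L)
    and a verification set (M through Z) by first letter."""
    build = [s for s in symbols if "A" <= s[0].upper() <= "L"]
    verify = [s for s in symbols if "M" <= s[0].upper() <= "Z"]
    return build, verify


# Illustrative tickers only; any list of symbol strings would do.
symbols = ["AAPL", "IBM", "MSFT", "XOM", "GE", "T"]
build_set, verify_set = split_by_symbol(symbols)
print("build: ", build_set)
print("verify:", verify_set)
```

As the excerpt warns, the verification set only stays clean if it is consulted rarely; every extra look at it is a little more data mining.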

pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline
by Cathy O'Neil and Rachel Schutt
Published 8 Oct 2013

models, Modeling–Exploratory Data Analysis, Code readability and reusability adding assumptions about errors, Adding in modeling assumptions about the errors–Adding in modeling assumptions about the errors algorithms vs., Linear Regression, Linear Regression at scale, Sample R code: K-NN on the housing dataset autocorrelation, correcting, A Baby Model–A Baby Model building, But how do you build a model?, Model Building Tips–Productionizing machine learning models causal, In-Sample, Out-of-Sample, and Causality causality and, In-Sample, Out-of-Sample, and Causality–In-Sample, Out-of-Sample, and Causality coding for, Code readability and reusability–Get a pair! data, Modeling defined, What is a model?

There is no one obvious definition, and, in fact, a multitude of definitions might work depending on the context—there is no ground truth! Some definitions of engagement could depend on the frequency or rhythm with which a user comes to a site, or how much they create or consume content. It’s a semi-supervised learning problem where you’re simultaneously trying to define the labels as well as predict them. Model Building Tips Here are a few good guidelines to building good production models: Models are not black boxes You can’t build a good model by assuming that the algorithm will take care of everything. For instance, you need to know why you are misclassifying certain people, so you’ll need to roll up your sleeves and dig into your model to look at what happened.

pages: 245 words: 12,162

In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation
by William J. Cook
Published 1 Jan 2011

In terms of money, a New York Times article by Gina Kolata states, “Solving linear programming problems for industry is a multibillion-dollar-a-year business.”11 Take that, Professor Hotelling. Readers interested in learning the art of capturing problems with linear constraints can find what they are looking for in Paul Williams’s excellent book Model Building in Mathematical Programming.12 Williams’s examples include food manufacturing, refinery optimization, farm planning, mining, airline pricing, power generation, and on and on. The Simplex Algorithm How often do you see mathematics described on the front page of the New York Times? The proof of Fermat’s Last Theorem made the cut, but the Four-Color Theorem missed out.

The term “nonnegative” is not the same as “positive,” which means strictly greater than 0. Safire, W. 1990. “On Language”. New York Times, February 11. Cottle, R. W. 2006. Math. Program. 105, 1–8. Dantzig (1991). Gill, P. E., et al. 2008. Discrete Optim. 5, 151–58. Dantzig (1963). Kolata, G. 1989. New York Times, March 12. Notes to Chapter 6 12. Williams, H. P. 1999. Model Building in Mathematical Programming. John Wiley & Sons, Chichester, UK. 13. Dongarra, J., F. Sullivan. 2000. Comp. Sci. Eng. 2, 22–23. 14. Chvátal, V. 1983. Linear Programming. W. H. Freeman and Company, New York. 15. In an LP model we must assume that we can produce and sell fractions of a widget; we will come back to this point at the end of the chapter. 16. http://campuscgi.princeton.edu/∼rvdb/JAVA/pivot/simple.html.

pages: 50 words: 13,399

The Elements of Data Analytic Style
by Jeff Leek
Published 1 Mar 2015

Figure 2.2 A spurious correlation Particular caution should be used when applying words such as “cause” and “effect” when performing inferential analysis. Causal language applied to even clearly labeled inferential analyses may lead to misinterpretation - a phenomenon called causation creep. 2.8.2 Overfitting Interpreting an exploratory analysis as predictive A common mistake is to use a single, unsplit data set for both model building and testing. If you apply a prediction model to the same data set used to build the model you can only estimate “resubstitution error” or “training set error”. These estimates are very optimistic estimates of the error you would get if using the model in practice. If you try enough models on the same set of data, you eventually can predict perfectly. 2.8.3 n of 1 analysis Descriptive versus inferential analysis.
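The point about resubstitution error in the excerpt above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not code from the book: the labels are coin flips unrelated to the feature, and the "model" is a hypothetical one-nearest-neighbour rule (`predict_1nn`), chosen because it memorises its training data.

```python
import random


def predict_1nn(train, x):
    """Predict the label of the training point whose feature is closest to x."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def error_rate(train, data):
    """Fraction of points in `data` that the 1-NN rule built on `train` gets wrong."""
    wrong = sum(predict_1nn(train, x) != y for x, y in data)
    return wrong / len(data)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Feature is a random number; label is a coin flip unrelated to the feature,
# so no model can genuinely do better than chance.
data = [(rng.random(), rng.choice([0, 1])) for _ in range(200)]
train, test = data[:100], data[100:]

resub_error = error_rate(train, train)    # scored on the data used to build it
heldout_error = error_rate(train, test)   # scored on data never seen before

print(f"resubstitution error: {resub_error:.2f}")
print(f"held-out error:       {heldout_error:.2f}")
```

Because each training point is its own nearest neighbour, the resubstitution error is zero even though there is nothing to learn; only the held-out estimate reflects how the rule would fare in practice.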

pages: 268 words: 75,490

The Knowledge Economy
by Roberto Mangabeira Unger
Published 19 Mar 2019

It is also to slight the diversity of the history of economics, exemplified by the late nineteenth-century rivals to the marginalist program such as Edgeworth’s treatment of economics as a psychological science in the tradition of Bentham or Marshall’s proposal to develop economics as a science of loosely connected and context-bound causal sequences in the manner of natural history and by analogy to the science of tides or of the weather. Under this practice of fragmentary theory from the inside, the thinker must engage the specialized discipline on its own terms (for it will refuse to rise to his) and by its own standards, including those of its mathematics and of its model-building, while holding himself to different terms and higher standards. He must show the steps by which the established economics can expand its vision, enlarge its tools, and relate insight into the actual and imagination of the adjacent possible, given that to understand something is always to grasp what it can become.

Consider, as an example, the vexed question of the role of mathematics in economic theory. In the analytic practice inaugurated by the marginalists, mathematics acquired a central function: it served as the favored instrument of a practice of economics, one that was closer to logic than to causal science. In the model-building into which post-marginalist economics devolved, mathematics remained the fundamental tool, exposing the implications of each model of a piece of economic activity on the basis of factual stipulations and causal theories, as well as in the light of normative commitments, supplied from outside the apparatus of economic analysis.

pages: 290 words: 76,216

What's Wrong With Economics: A Primer for the Perplexed
by Robert Skidelsky
Published 3 Mar 2020

The normative or prescriptive purpose of modelling is hardly ever acknowledged, because economics is supposed to be ‘scientific’ and ‘value-free’. The economist Jevons put one view of the task of economics simply: ‘The investigator begins with the facts and ends with them.’ In his conception, there are three stages in model-building: the inductive hypothesis, the deduction of a conclusion, the testing of the conclusion against reality.5 The process may be illustrated as follows. An observation suggests a ‘conjecture’ or ‘hypothesis’ as to why something may be the case. You then develop a theory which involves establishing a causal link between your conjecture and other factors called variables.

Are economists’ models intended as replications or simplifications of actual behaviour, or are they intended to create behaviour consistent with the economists’ models – to create self-fulfilling prophecies, so to speak? It seems pretty obvious that economic models are intended to be both descriptive and prescriptive, wobbling between claims that this is how humans behave in fact, and this is how they should behave, both converging on a predictive claim. Paul Krugman (b.1953) has described the model-building process as follows: ‘You make a set of clearly untrue simplifications to get the system down to something you can handle; those simplifications are dictated partly by guesses about what is important, partly by the modelling techniques available. And the end result, if the model is a good one, is an improved insight into why the vastly more complex real system behaves the way it does.’10 The argument here is that economists need the untrue simplifications to get the generalising machinery going.

pages: 398 words: 31,161

Gnuplot in Action: Understanding Data With Graphs
by Philipp Janert
Published 2 Jan 2010

A catalogue of more advanced mathematical techniques helpful in data analysis and model building. The choice of topics is excellent, but the presentation often seems a bit aloof and too terse for the uninitiated. Very expensive. An Introduction to Mathematical Modeling by Edward A. Bender. Dover (1978, 2000). Short and idiosyncratic. A variety of problems are investigated and mathematical models developed to help answer specific questions. Requires only basic math skills, the emphasis being on the conceptual model building process. Problem Solving: A Statistician’s Guide by Chris Chatfield. Chapman & Hall (1995).

In comparison to statistical methods, it helps us discover new and possibly quite unexpected behavior. Moreover, it helps us develop an intuitive understanding of the data and the information it contains. Since it doesn’t require particular math skills, it is accessible to anyone with an interest and a certain amount of intuition. Even if rigorous model building is our ultimate goal, graphical methods still need to be the first step, so that we can develop a sense for the data, its behavior, and quality. Knowing this, we can then select the most appropriate formal methods. 1.2.3 Limitations of graphical analysis Of course, graphical analysis has limitations and its own share of problems. Graphical analysis doesn’t scale.

pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms
by Mehmed Kantardzić
Published 2 Jan 2003

Combinatorial Approach. It is, in essence, a brute-force approach, where the search is performed across all possible combinations of independent variables to determine the best regression model. Irrespective of whether the sequential or combinatorial approach is used, the maximum benefit to model building occurs from a proper understanding of the application domain. Additional postprocessing steps may estimate the quality of the linear regression model. Correlation analysis attempts to measure the strength of a relationship between two variables (in our case this relationship is expressed through the linear regression equation).
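The combinatorial approach described above can be sketched as an exhaustive search over predictor subsets, scoring each candidate regression by its residual sum of squares. The helper names, the toy data, and the choice of RSS as the selection criterion are assumptions made for illustration; the book does not prescribe this particular implementation.

```python
from itertools import combinations

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def residual_sum_of_squares(rows, y, subset):
    """Fit y ~ intercept + the chosen columns by least squares; return the RSS."""
    design = [[1.0] + [row[j] for j in subset] for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def best_subset(rows, y, size):
    """Brute-force search over every subset of `size` independent variables."""
    n_features = len(rows[0])
    return min(combinations(range(n_features), size),
               key=lambda s: residual_sum_of_squares(rows, y, s))

# Toy data: y depends only on columns 0 and 2 (y = 1 + 2*x0 - 3*x2).
rows = [[1, 2, 0], [2, 1, 1], [3, 5, 0], [4, 3, 2], [5, 8, 1], [6, 4, 3]]
y = [1 + 2 * r[0] - 3 * r[2] for r in rows]
chosen = best_subset(rows, y, 2)
print(chosen)
```

The number of subsets grows combinatorially with the number of candidate variables, which is why this search is only practical for small problems and why domain knowledge that prunes the candidates pays off.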

Miner, Handbook of Statistical Analysis and Data Mining Applications, Elsevier Inc., Amsterdam, 2009. The book is a comprehensive professional reference book that guides business analysts, scientists, engineers, and researchers (both academic and industrial) through all stages of data analysis, model building, and implementation. The handbook helps one discern technical and business problems, understand the strengths and weaknesses of modern data-mining algorithms, and employ the right statistical methods for practical application. Use this book to address massive and complex data sets with novel statistical approaches and be able to objectively evaluate analyses and solutions.

This means that in smoothing there is a delay in producing the result at discrete time n. 3. Prediction. The task of prediction is to forecast data in the future. The aim is to derive information about what the quantity of interest will be like at some time n + n0 in the future, for n0 > 0, by using data measured up to and including time n. Prediction may be viewed as a form of model building in the sense that the smaller we make the prediction error, the better the network serves as a model of the underlying physical process responsible for generating the data. The block diagram of an ANN for a prediction task is given in Figure 7.9. Figure 7.9. Block diagram of an ANN-based prediction. 7.5 MULTILAYER PERCEPTRONS (MLPs) Multilayer feedforward networks are one of the most important and most popular classes of ANNs in real-world applications.

pages: 237 words: 82,266

You Say Tomato, I Say Shut Up
by Annabelle Gurwitch
Published 31 Aug 2010

Here is a list of things I’ve seen recommended for building intimacy: cleaning, shopping, folding laundry, building a snowman, dancing lessons, taxidermy, going to amusement parks, gardening, badminton, Bible study, coin collecting, and model building. Here is the list of things we don’t do as a couple: cleaning, shopping, folding laundry, building a snowman, dancing lessons, taxidermy, going to amusement parks, gardening, badminton, Bible study, coin collecting, and model building. Just living with someone can create a surfeit of closeness and transparency. If Jeff is any example, men prefer to operate on a need-to-know basis. For example, I happen to come from extremely hairless people; however, it’s true that a few hairs have appeared of late.

pages: 807 words: 154,435

Radical Uncertainty: Decision-Making for an Unknowable Future
by Mervyn King and John Kay
Published 5 Mar 2020

The early attempts foundered, as we have seen, on the failure to appreciate that apparently stable empirical relationships could suddenly break down when, for example, the government changes the nature of its policy intervention (the Lucas critique). The intellectual appeal of basing forecasts on a rigorous theoretical foundation describing the behaviour of individuals and the economy is easily understood. But the programme of model-building which searches for a stable underlying set of structural relationships could be made consistent with observations of the economy only by the introduction of the shocks and shifts about which nothing could usefully be said. The result was that phenomena such as the financial crisis or the Great Depression could be explained only in terms of unanticipated developments in technology or a sudden preference for leisure rather than work.

And these representations are the basis of the scientific advance which followed the formulation of Newtonian mechanics. The models that NASA has developed – based on the long-established and empirically verified equations of planetary motion, and the agency’s knowledge of the capabilities of its own rockets – represent the limits of human achievement in model building. Their map is not the territory, but it represents the relevant features of the territory sufficiently well that computer simulations reproduce more or less exactly the experience of the rocket in outer space. Such modelling is possible because of NASA’s knowledge of the solar system (it can be represented precisely by a relatively simple set of equations), because the agency is confident of the stationarity of that system, and because it is not necessary to anticipate how that system will respond to the agency’s actions.

A., Hurley, A. and Irby, J. E. (trans.), The Garden of Forking Paths (London: Penguin, 2018) Bowden, M., The Finish: The Killing of Osama bin Laden (New York: Atlantic Monthly Press, 2012) Bower, T., Branson (London: Fourth Estate, 2001) Box, G. E. P., ‘Robustness in the Strategy of Scientific Model Building’ (1979) in Launer, R. L. and Wilkinson, G. N. (eds), Robustness in Statistics (Cambridge, Massachusetts: Academic Press, 1979), 201–36 Bradley, H. in Stephen, L. (ed.), ‘Jedediah Buxton’, Dictionary of National Biography Vol. VIII (1886), 106 Brands, H. W., The General vs. The President (New York: Doubleday, 2016) Brearley, M., On Form (London: Little Brown, 2017) Brooks, B.

pages: 304 words: 88,773

The Ghost Map: A Street, an Epidemic and the Hidden Power of Urban Networks.
by Steven Johnson
Published 18 Oct 2006

We have already seen amazing advances in our understanding of the way genes build organisms, but the application of that understanding—particularly in the realm of medicine—is only starting to bear fruit. A decade or two from now, we may possess tools that will allow us to both analyze the genetic composition of a newly discovered bacterium and, using computer modeling, build an effective vaccine or antiviral drug in a matter of days. At that point, the primary issue will be production and delivery of the drugs. We’ll know how to make a cure for any rogue virus that shows up; the question will be whether we can produce enough supplies of the cure to stop the path of the disease.

A post-9/11 city could be built along similar lines: the density of traditional metropolitan space in distributed nodes limited to 50,000 to 100,000 people each, separated by expanses of low-density development: parkland, nature preserves, sports facilities, even vineyards where the climate allows. Such a model would reverse the Olmsted vision of urban greenery: rather than carve out a park in the middle of an immense city, the new model builds a space for nature on the edges of the city center—Peripheral Park, instead of Central. In medieval times, the walls protected the town population. In these theoretical settlements, the open spaces separating the nodes would keep the city safe. Imagine a city of 2 million people, built out of twenty nodes.

pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists
by Gary Marcus and Jeremy Freeman
Published 1 Nov 2014

Building the Brain The Brain Simulation Platform, led by HBP partners from EPFL, GRS-SIM, and the Royal Institute of Technology in Stockholm, will guide neuroscientists through the process of building models of proteins, cells, synapses, circuits, brain areas, and whole brains. At each step, a scientist will be prompted through the web interface to select the data, analysis methods and model-building methods necessary to construct the model. The building workflow will be populated by default parameters derived from the selected dataset, but these can be overridden so that the scientist is free to test hypotheses or examine “what-if” scenarios. For example, the workflow may populate a neural circuit with a high density of neurons taken from a normal brain, but the researcher using the platform may wish to examine the impact of reducing neuron density—as can occur during degenerative diseases such as Alzheimer’s.

If the model replicates experimental findings (that were not used to build the model in the first place), this is evidence the experimental findings can be explained by the measured data. However, if the model does not replicate a particular experiment, that result is also extremely informative—guiding the neuroscientist to acquire additional specific data to refine the model-building process. In either case, the model provides an important tool to test the relevance and impact of neuroscience data on a precise scientific question. Unifying Brain Models The core strategy within the Human Brain Project is to continually produce new releases of unifying brain models. A unifying brain model is the model that best accounts for all available data by reproducing the greatest array of experimental data.

pages: 125 words: 27,675

Applied Text Analysis With Python: Enabling Language-Aware Data Products With Machine Learning
by Benjamin Bengfort , Rebecca Bilbro and Tony Ojeda
Published 10 Jun 2018

Data products that build models from natural language are a special case of machine learning as they enable increasingly novel human-computer interaction. As data products have become more successful, there has been increasing interest in generally defining a machine learning workflow for more rapid model building. Usually the discussion of machine learning techniques separates workflows and their interaction with data management because they are loosely independent or dependent only on the algorithm selected. However, by combining these workflows into a single generalization, we are able to leverage a much larger machine learning space, potentially creating global optimizations and automatic analyses that can be steered (guided) by experts.

pages: 700 words: 160,604

The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race
by Walter Isaacson
Published 9 Mar 2021

They were self-appointed jesters in a court of fools.”4 The Caltech biochemist Linus Pauling had just rocked the scientific world, and paved the way for his first Nobel Prize, by figuring out the structure of proteins using a combination of X-ray crystallography, his understanding of the quantum mechanics of chemical bonds, and Tinkertoy model building. Over their lunches at the Eagle, Watson and Crick plotted how to use the same tricks to beat Pauling in the race to discover the structure of DNA. They even had the tool shop of the Cavendish Lab cut tin plates and copper wires to represent the atoms and other components for the desktop model they planned to tinker with until they got all the elements and bonds correct

Wilkins, momentarily bonding with Franklin, told her that if they left for the station right away, they could make the 3:40 train back to London, which they did. Not only were Watson and Crick embarrassed; they were put in a penalty box. Word came down from Sir Lawrence that they were to stop working on DNA. Their model-building components were packed up and sent to Wilkins and Franklin in London. * * * Adding to Watson’s dismay was the news that Linus Pauling was coming over from Caltech to lecture in England, which would likely catalyze his own attempt to solve the structure of DNA. Fortunately, the U.S. State Department came to the rescue.

He had to climb over the back gate into his residential college, which had locked up for the night. The next morning, when he went into the Cavendish lab, he encountered Sir Lawrence Bragg, who had demanded that he and Crick steer clear of DNA. But confronted with Watson’s excited summary of what he had learned, and hearing of his desire to get back to model-building, Sir Lawrence gave his assent. Watson rushed down the stairs to the machine shop to set them to work on making a new set of components. Watson and Crick soon got more of Franklin’s data. She had submitted to Britain’s Medical Research Council a report on her work, and a member of the council shared it with them.

Thinking with Data
by Max Shron
Published 15 Aug 2014

In the case of the apartment prices and public transit, finding or plotting a map of apartment prices next to a base layer of transit connections is probably the easiest thing to do first. By looking at the map, we can see whether such a relationship seems plausible, and start to gain intuition for the problem of making scatterplots or building a model. Building exploratory scatterplots should precede the building of a model, if for no reason other than to check that the intuition gained from making the map makes sense. The relationships may be so obvious, or the confounders so unimportant, that the model is unnecessary. A lack of obvious relationships in pairwise scatterplots does not mean that a model of greater complexity would not be able to find signal, but if that’s what we’re up against, it is important to know it ahead of time.

Real-World Kanban
by Mattias Skarin
Published 23 Jun 2015

To notice them, you would have to observe events and behaviors for a longer period of time. What these five factors have in common is that they provide focus and direction for improvements and behaviors driven by the management team. “All models are wrong, some are useful,” statistician George E.P. Box wrote in Empirical Model-Building and Response Surfaces, and this is no exception. I’ve found models useful in helping to break out of vicious short-term cycles, so I share them with you in the following figure and table.
System enabler: Improve the whole
Examples:
• Improve lead time through the full value chain, not just through individual functions.
• Test working user scenarios instead of individual functions.
• Reduce total costs, not just IT costs.

pages: 416 words: 112,268

Human Compatible: Artificial Intelligence and the Problem of Control
by Stuart Russell
Published 7 Oct 2019

Machines are also subject to certain speed limits imposed by the real world on the rate at which new knowledge of the world can be acquired—one of the valid points made by Kevin Kelly in his article on oversimplified predictions about superhuman AI.53 For example, to determine whether a specific drug cures a certain kind of cancer in an experimental animal, a scientist—human or machine—has two choices: inject the animal with the drug and wait several weeks or run a sufficiently accurate simulation. To run a simulation, however, requires a great deal of empirical knowledge of biology, some of which is currently unavailable; so, more model-building experiments would have to be done first. Undoubtedly, these would take time and must be done in the real world. On the other hand, a machine scientist could run vast numbers of model-building experiments in parallel, could integrate their outcomes into an internally consistent (albeit very complex) model, and could compare the model’s predictions with the entirety of experimental evidence known to biology.

pages: 764 words: 261,694

The Elements of Statistical Learning (Springer Series in Statistics)
by Trevor Hastie , Robert Tibshirani and Jerome Friedman
Published 25 Aug 2009

• Asymptotic likelihood theory says that if the model is correct, then $\hat{\beta}$ is consistent (i.e., converges to the true $\beta$).
• A central limit theorem then shows that the distribution of $\hat{\beta}$ converges to $N(\beta, (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1})$. This and other asymptotics can be derived directly from the weighted least squares fit by mimicking normal theory inference.
• Model building can be costly for logistic regression models, because each model fitted requires iteration. Popular shortcuts are the Rao score test, which tests for inclusion of a term, and the Wald test, which can be used to test for exclusion of a term. Neither of these requires iterative fitting, and both are based on the maximum-likelihood fit of the current model.

Therefore, the collection of basis functions is
$$\mathcal{C} = \{(X_j - t)_+,\ (t - X_j)_+\}, \quad t \in \{x_{1j}, x_{2j}, \ldots, x_{Nj}\},\ j = 1, 2, \ldots, p. \tag{9.18}$$
If all of the input values are distinct, there are $2Np$ basis functions altogether. Note that although each basis function depends only on a single $X_j$, for example, $h(X) = (X_j - t)_+$, it is considered as a function over the entire input space $\mathbb{R}^p$. The model-building strategy is like a forward stepwise linear regression, but instead of using the original inputs, we are allowed to use functions from the set $\mathcal{C}$ and their products. Thus the model has the form
$$f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X), \tag{9.19}$$
where each $h_m(X)$ is a function in $\mathcal{C}$, or a product of two or more such functions.
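The candidate set of reflected pairs defined in (9.18) can be sketched in a few lines. This only constructs the basis functions and checks the count; it does not perform the forward stepwise fit. The function names `hinge_pair` and `candidate_basis` and the toy inputs are hypothetical.

```python
def hinge_pair(j, t):
    """Return the reflected pair (X_j - t)_+ and (t - X_j)_+ as functions of a row x."""
    def positive_part(x):
        return max(x[j] - t, 0.0)

    def negative_part(x):
        return max(t - x[j], 0.0)

    return positive_part, negative_part

def candidate_basis(rows):
    """Build one reflected pair per predictor j and per unique observed value x_ij."""
    basis = []
    n_features = len(rows[0])
    for j in range(n_features):
        for t in sorted({row[j] for row in rows}):  # knots at unique observed values
            basis.extend(hinge_pair(j, t))
    return basis

# N = 3 observations, p = 2 predictors, all input values distinct within a column.
rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
C = candidate_basis(rows)
print(len(C))  # 2 * N * p basis functions for this input

# Each basis function takes a full input row but reads only one coordinate.
print(C[0]([1.5, 0.0]), C[1]([1.5, 0.0]))
```

Representing each basis function as a closure over `(j, t)` mirrors the remark that a function of a single coordinate is still treated as a function on the whole input space, which makes forming products of basis functions straightforward.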

At each stage we consider as a new basis function pair all products of a function $h_m$ in the model set $\mathcal{M}$ with one of the reflected pairs in $\mathcal{C}$. We add to the model $\mathcal{M}$ the term of the form
$$\hat{\beta}_{M+1}\, h_\ell(X) \cdot (X_j - t)_+ + \hat{\beta}_{M+2}\, h_\ell(X) \cdot (t - X_j)_+, \quad h_\ell \in \mathcal{M},$$
FIGURE 9.10. Schematic of the MARS forward model-building procedure. On the left are the basis functions currently in the model: initially, this is the constant function $h(X) = 1$. On the right are all candidate basis functions to be considered in building the model. These are pairs of piecewise linear basis functions as in Figure 9.9, with knots $t$ at all unique observed values $x_{ij}$ of each predictor $X_j$.

Robot Futures
by Illah Reza Nourbakhsh
Published 1 Mar 2013

The desire of people—their entire purchasing trajectory—becomes deeply influenced by the idiosyncrasies of which raw materials are cheap, which overstocks the company has, and which currency exchange rates happen to be favorable this month. In the context of today’s presumptions about personal privacy, this level of consumer profiling, model building, and customized interaction may seem unacceptable. But the transition to this scenario does not happen overnight, and as we have seen privacy eroded for economic gain for years, so there is every reason we can expect this trend to continue. Much of the marketing optimization outlined in this chapter can be performed without associating actual identities with people—just by aggregating across types of people using estimates such as age, gender, and profession.

pages: 161 words: 39,526

Applied Artificial Intelligence: A Handbook for Business Leaders
by Mariya Yao , Adelyn Zhou and Marlene Jia
Published 1 Jun 2018

Based on these profiles, your NLG solution can create similar content that is tailored to specific platforms and to individual user groups, increasing the likelihood that the targeted audience will engage with your ads. Data about click-through rates can then be fed back to fine-tune your model, building more detailed profiles of how targeted users respond to certain types of messaging. Unlike older A/B testing methods that optimize for statistical significance, AI solutions can be used in real-time, continuous optimization. They work well when you are frequently testing new versions of ads. For example, bandit algorithms, a type of semi-supervised machine learning, are often used to optimize pricing, creative, and placement decisions in digital advertising.
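To make the idea of continuous optimization concrete, here is a minimal sketch of epsilon-greedy, one simple bandit strategy. The excerpt does not name a specific algorithm, so this choice, the function name `epsilon_greedy`, and the simulated click-through rates are all assumptions for illustration.

```python
import random

def epsilon_greedy(true_rates, rounds, epsilon, seed=0):
    """Simulate serving ad variants whose click probabilities are `true_rates`.

    With probability `epsilon` (or while some variant is still untried) a random
    variant is shown; otherwise the variant with the best observed click-through
    rate so far is shown.
    """
    rng = random.Random(seed)
    shows = [0] * len(true_rates)
    clicks = [0] * len(true_rates)
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in shows:
            arm = rng.randrange(len(true_rates))  # explore
        else:
            arm = max(range(len(true_rates)),
                      key=lambda a: clicks[a] / shows[a])  # exploit
        shows[arm] += 1
        if rng.random() < true_rates[arm]:  # simulated user click
            clicks[arm] += 1
    return shows, clicks

# Three hypothetical ad variants with made-up click-through rates.
shows, clicks = epsilon_greedy([0.02, 0.05, 0.20], rounds=2000, epsilon=0.1)
print(shows, clicks)
```

Unlike a fixed A/B split, the allocation shifts toward whichever variant is performing best while a small exploration rate keeps testing the others, which suits settings where new ad versions arrive frequently.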

pages: 403 words: 111,119

Doughnut Economics: Seven Ways to Think Like a 21st-Century Economist
by Kate Raworth
Published 22 Mar 2017

London: University of Chicago Press, p. 46. 37. Goffman, E. (1974) Frame Analysis: An Essay on the Organization of Experience. New York: Harper & Row. 38. Keynes, J.M. (1961) The General Theory of Employment, Interest and Money. London: Macmillan & Co., p. viii. 39. Box, G. and Draper, N. (1987) Empirical Model Building and Response Surfaces. New York: John Wiley & Sons, p. 424. 40. Lakoff, G. (2014) The All New Don’t Think of an Elephant. White River Junction, VT: Chelsea Green. 41. Tax Justice Network, www.taxjustice.net and Global Alliance for Tax Justice, www.globaltaxjustice.org 1. Change the Goal 1.

Bowen, A. and Hepburn, C. (2012) ‘Prosperity With Growth: Economic Growth, Climate Change and Environmental Limits’, Centre for Climate Change Economics and Policy Working Paper no. 109. Bowles, S. and Gintis, H. (2011) A Cooperative Species: Human Reciprocity and Its Evolution. Princeton: Princeton University Press. Box, G. and Draper, N. (1987) Empirical Model Building and Response Surfaces. New York: John Wiley & Sons. Boyce, J. K. et al. (1999) ‘Power distribution, the environment, and public health: a state-level analysis’, Ecological Economics 29: 127–140. Braungart, M. and McDonough, W. (2009) Cradle to Cradle: Re-making the Way We Make Things. London: Vintage Books.

pages: 124 words: 40,697

The Grand Design
by Stephen Hawking and Leonard Mlodinow
Published 14 Jun 2010

Fortunately, the human brain processes that data, combining the input from both eyes, filling in gaps on the assumption that the visual properties of neighboring locations are similar and interpolating. Moreover, it reads a two-dimensional array of data from the retina and creates from it the impression of three-dimensional space. The brain, in other words, builds a mental picture or model. The brain is so good at model building that if people are fitted with glasses that turn the images in their eyes upside down, their brains, after a time, change the model so that they again see things the right way up. If the glasses are then removed, they see the world upside down for a while, then again adapt. This shows that what one means when one says “I see a chair” is merely that one has used the light scattered by the chair to build a mental image or model of the chair.

Scikit-Learn Cookbook
by Trent Hauck
Published 3 Nov 2014

scikit-learn Cookbook Over 50 recipes to incorporate scikit-learn into every step of the data science pipeline, from feature extraction to model building and model evaluation Trent Hauck BIRMINGHAM - MUMBAI scikit-learn Cookbook Copyright © 2014 Packt Publishing All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews. Every effort has been made in the preparation of this book to ensure the accuracy of the information presented.

pages: 174 words: 42,316

Look Evelyn, Duck Dynasty Wiper Blades. We Should Get Them.: A Collection of New Essays
by David Thorne
Published 3 Dec 2014

They had a comfortable couch with a pair of earphones and even though I never bought anything, the owner, a large bearded man named Maurice, didn’t care how long I stayed. Maurice didn’t like people, especially people who took records out of sleeves or asked about Compact Discs, but we got on alright. He was really into model building and after closing one night, I helped him assemble and paint a plastic ED209 robot. I wasn’t actually allowed to assemble or paint anything but I handed him things when he asked and we chatted about our favourite science fiction authors. He gave me a copy of Ender’s Game to read. The next day, when Maurice was busy yelling at a customer and the phone rang, he asked me to answer it.

Governing the Commons: The Evolution of Institutions for Collective Action
by Elinor Ostrom
Published 29 Nov 1990

Fourth, the solutions presented for "the" government to impose are themselves based on models of idealized markets or idealized states. We in the social sciences face as great a challenge in how to address the analysis of CPR problems as do the communities of people who struggle with ways to avoid CPR problems in their day-to-day lives. The theoretical enterprise requires social scientists to engage in model-building,20 but does not limit theoretical inquiry to that specific level of discourse. We need to appreciate the analytical power that can be derived from the prior intellectual efforts of important contributors such as Hobbes, Montesquieu, Madison, Hamilton, Tocqueville, and many others.21 Contemporary studies in the theory of public and social choice, the economics of transactions costs, the new institutional economics, law and economics, game theory, and many related fields22 are making important contributions that need to be carried forward in theoretically informed empirical inquiries in both laboratory and field settings.

pages: 190 words: 52,865

Full Stack Web Development With Backbone.js
by Patrick Mulder
Published 18 Jun 2014

Backbone.Collection, sorting/filtering models with, 61–71 Backbone.js dependencies, 2 distributed application design, 6 fetching local copy of, 4 fetching via content delivery networks, 5 fetching with Node’s package manager, 2 philosophy of, 2, 145 Backbone.Model building a data layer, 26 data resolution, 88 DRYer Views and ViewModels, 46 modal view, 125 sorting, 62 wrapping a data store, 101 Backbone.ModelBinder, 39 Backbone.Obscura, 68, 137 Backbone.Router addressing state, 49–55 orchestrating views, 55–60 overview of, 49 Backbone.Sync, 84, 87 Backbone.View basic events, 31 basic rendering, 37 basic view templates, 41 DRYer Views and ViewModels, 46 filtering, 66 handling UI events, 43 modal view, 125 navbar view, 123 parent/child views, 56 rendering a collection, 42 sorting, 62 templates, 74 Backburner, 76 backend-as-a-service providers, 94, 98 bind function, 159 bindAll function, 159 binding, 39 Bluebird library, 103 Bower, 136 Browserify, 10, 29, 136 browsers development console, 15 DOM representation in, 161 packaging modules for, 9 security in, 113 (see also authentication) browsing experience mock-up, 19 Brunch, 136 build automation goals of, 77 Grunt, 77 overview of, 135 scaffolding components, 143 tools to improve productivity, 135 Yeoman, 138 C callbacks, 103 Cartero, 136 Cavage, Mark, 100 chaining methods, 161 change events, 28 Chaplin framework, 136, 146 child views, 56 className property, 37 click dummy basic CSS for, 25 basic events, 31 basic HTML for, 24 data layer, 26 preparation overview, 24 Cloudflare, 5 Cocoa API, 22 Codepen.io, 5 CoffeeScript, 136 collection helpers, 161 collections filtering, 21, 66 pagination, 68 sorting, 21, 62 transforming, 61 Underscore.js helpers, 158 command line interface (CLI) benefits of, 1 bundling modules from, 10 npm (Node package manager), 2 CommonJS modules benefits of, 8 Browserify, 10 Cartero management system, 136 Express.js and Stitch, 13 require in browsers, 9 
comparator function, 62 content delivery network (CDN), 5 controllers, 24, 55 convention-over-configuration, 147 cookies drawbacks of, 115, 118 overview of, 114 session management, 118 user signup, 116 CORS (cross origin resource sharing), 99 createUser method, 116 cross-site request forgery (CSRF), 114 cross-site scripting (XSS), 114 D data binding, 39 building the data layer, 26 controlling access to, 113 (see also authentication) representation with models, 21 transforming with Underscore.js, 158 databases non-relational, 98 NoSQL, 98 relational, 98 wrapping data stores behind an API, 101 debuggers, 15 Decker, Kevin, 145 default properties, 27 dependencies managing with Bower, 136 resolving with main.js file, 141 reusing across projects, 8 Underscore.js, 158–160 Document Object Model (DOM) changing multiple nodes at once, 76 manipulation libraries, 2 node types, 161 statelessness and, 19 DOM nodes attaching/removing event handlers, 162 chaining methods on, 161 operating on directly, 161 preventing event bubbling, 162 selecting with jQuery, 161 types of, 161 DRYer views, 46 E Eastridge, Ryan, 145 ECO (embedded CoffeeScript), 75 event bubbling, 162 event handlers attaching/removing, 162 for UI events, 43 event listeners, 39 events change events, 28 default, 31 handling UI events, 43 sources of, 21, 31 Express.js, 13, 100 extend function, 27, 160 F fetching information asynchronous effects, 92 from hosted backend services, 94 overview of, 83, 87 RESTful web service handling, 84 filtering, 66 Firebase, 94 frameworks benefits of, 145 Chaplin, 146 Giraffe, 146 Junior, 146 Marionette, 146 Rendr, 146 Thorax.js, 146 Function.prototype.bind, 159 functional programming, 158 functions binding context to, 159 get, 28 private, 28 set, 28 sharing across multiple objects, 160 G get function, 28 Giraffe, 146 Grunt, 77 H Handlebars, 76 hashes/hashbangs, 50 Homebrew package manager, 157 HTTP requests basic verbs, 84 cookies, 115 sending from JavaScript, 163 signing, 
114 HTTP responses, 102 I index.html, 9 inheritance, 160 isomorphic application design, 97 J JavaScript adding modules from command line, 143 Ajax, 163 basic abstractions for Backbone.js, 1 debugging, 15 distributed application design, 6 HTTP requests from, 163 jQuery basics of, 160 element selection, 161 event handling, 162 Node.js installation, 157 overview of, 157 promises, 102 Underscore.js benefits of, 158 collections/arrays, 158 functions, 159 objects, 160 utility functions, 160 (see also objects) jQuery Ajax browsing experience mock-up, 19 jQuery API for, 163 basics of, 160 chaining methods, 161 collection helpers, 161 element selection, 161 event handling, 162 node wrappers, 161 referencing, 35 JSBin, 5 JSFiddle, 5 JSLint/JSHint, 16 JST (JavaScript Templates), 74 Junior, 146 K key-value pairs data representation with, 21 primary keys, 107 syntax considerations for, 28 L LAMP (Linux-Apache-MySQL-PHP), 98 Layout View, 55 Linux, Node.js installation, 157 M Mac OS Homebrew package manager, 157 Node.js installation, 157 main.js file, 141 Marionette, 146 Mincer, 13 mixin functions, 46 mock-ups APIs, 85 browsing experience, 19 data, 149 wireframes, 19 Mockjax, 149 modal view, 125 model parameter, 29 model-view-controller (MVC) pattern, 22 models (see Backbone models) modules Browserify, 10 bundling from command line, 10 choosing, 8 CommonJS, 8 packaging for browsers, 9 RequireJS, 142 Morell, Jeremy, 68 Munich Cinema example API creation, 100 click dummy preparation basic CSS, 25 basic events, 31 basic HTML, 24 data layer, 26 overview of, 24 current web page, 18 preliminary mock-up, 19 project goals, 18 star rating/voting system, 108 synchronizing state in basic sync and fetch, 87 fetching remote movies, 84 goals of, 83 user interface DRYer views/ViewModels, 46 goals for, 35 handling UI events, 43 interfacing the DOM, 36–43 referencing jQuery, 35 N Navbar view, 123 navigate function, 54 navigation view (navbar), 123 NeXTSTEP operating system, 22 
noBackend providers, 94, 98 Node.js installation of, 157 package manager, 2 read-eval-print-loop (REPL), 15 nodes (see DOM nodes) non-relational data stores, 98 npm (Node package manager), 2, 8 O object-relational-mapper (ORM), 98 objects customizing interfaces of, 160 rendering within templates, 160 open-source software, 4 P package managers, 13 pagination, 68 parent views, 56 passwords, 113 (see also authentication) persistence, 101, 108 primary keys, 107 private functions, 28 productivity, improving, 135 (see also workflow automation) promises, 103 proxies, 98 publish-subscribe pattern, 31 pushState(), 50 R React.js, 77 read-eval-print-loop (REPL), 15, 29 relational databases, 98 render function, 37 Rendr, 146 representations in RESTful web services, 85 with models, 21 RequireJS adding modules, 142 benefits of, 140 main.js file, 141 RESTful web services, 84 Restify library, 100 router basics addressing state defining routes, 51 goal of, 49 navigating, 54 preparing, 50 orchestrating views Layout View, 55 parent/child views, 56 overview of, 49 S security, 113 (see also authentication) session management Backbone applications API calls, 118 login dialog, 129 modal view, 125 navbar view, 123 cookies, 118 creating new, 131 logout, 132 set function, 28 signing requests approaches to, 114 benefits of, 114 sorting, 62 Sprockets, 13 state addressing with routers defining routes, 51 goal of, 49 navigating, 54 preparing, 50 authentication and, 131 decoupling from UI benefits of, 22 models and collections, 21 MVC pattern, 22 need for, 19 views, 22 synchronizing basic sync and fetch, 87 fetching remote information, 84 overview of, 83 statelessness, 19, 84 Stitch, 13 T tagName property, 37 template property, 41, 75 templates embedded CoffeeScript, 75 Handlebars, 76 JavaScript Templates, 74 overview of, 73 Thorax.js benefits of, 145 getting started application initialization, 150 build tasks, 147 installation/setup of, 147 mock data preparation, 149 overview of, 
146 rendering advanced views, 154 Router setup, 152 Thorax.Collection, 152 TodoMVC demo, 24 tokens, access, 114 U Ubuntu, Node.js installation, 157 Underscore.js benefits of, 158 collections/arrays, 158 functions, 159 objects, 160 utility functions, 160 user interface decoupling from state benefits of, 22 models and collections, 21 MVC pattern, 22 need for, 19 views, 22 DRYer views/ViewModels, 46 goals for, 35 handling UI events, 43 interfacing the DOM basic rendering, 37 basic view templates, 41 bindings to data changes, 39 rendering a collection, 42 strategy overview, 36 referencing jQuery, 35 V ViewModels, 46 views advanced view templates, 73 Backbone views, 22 data display management with, 23 DRYer view/ViewModels, 46 Layout View, 55 modal view, 125 MVC pattern, 22 navbar view, 123 parent/child views, 56 updating immediately, 39 welcome view, 59 vulnerabilities cross-site request forgery (CSRF), 114 cross-site scripting (XSS), 114 W Walmart’s shopping cart, 147 welcome view, 59 Windows, Node.js installation, 157 wireframes benefits of, 19 creating, 18 workflow, automation of (see build automation) X XMLHttpRequest object, 163 Y Yeoman application directory, 140 benefits of, 136, 138 installation of, 138 running, 139 Z Zepto library, 2, 160 About the Author Before discovering software development for web applications with Java and Ruby in 2008, Patrick Mulder mainly worked as a software engineer on measurement equipment and electronic devices.

pages: 560 words: 135,629

Eloquent JavaScript: A Modern Introduction to Programming
by Marijn Haverbeke
Published 15 Nov 2018

Exercises Keyboard Bindings Efficient Drawing Circles Proper Lines PART III: NODE 20 NODE.JS Background The node Command Modules Installing with NPM Package Files Versions The File System Module The HTTP Module Streams A File Server Summary Exercises Search Tool Directory Creation A Public Space on the Web 21 PROJECT: SKILL-SHARING WEBSITE Design Long Polling HTTP Interface The Server Routing Serving Files Talks as Resources Long Polling Support The Client HTML Actions Rendering Components Polling The Application Exercises Disk Persistence Comment Field Resets 22 JAVASCRIPT AND PERFORMANCE Staged Compilation Graph Layout Defining a Graph Force-Directed Layout Avoiding Work Profiling Function Inlining Creating Less Garbage Garbage Collection Dynamic Types Summary Exercises Pathfinding Timing Optimizing EXERCISE HINTS Chapter 2: Program Structure Looping a Triangle FizzBuzz Chessboard Chapter 3: Functions Minimum Recursion Bean Counting Chapter 4: Data Structures: Objects and Arrays The Sum of a Range Reversing an Array A List Deep Comparison Chapter 5: Higher-Order Functions Everything Dominant Writing Direction Chapter 6: The Secret Life of Objects A Vector Type Groups Iterable Groups Borrowing a Method Chapter 7: Project: A Robot Measuring a Robot Robot Efficiency Persistent Group Chapter 8: Bugs and Errors Retry The Locked Box Chapter 9: Regular Expressions Quoting Style Numbers Again Chapter 10: Modules A Modular Robot Roads Module Circular Dependencies Chapter 11: Asynchronous Programming Tracking the Scalpel Building Promise.all Chapter 12: Project: A Programming Language Arrays Closure Comments Fixing Scope Chapter 14: The Document Object Model Build a Table Elements by Tag Name The Cat’s Hat Chapter 15: Handling Events Balloon Mouse Trail Tabs Chapter 16: Project: A Platform Game Pausing the Game A Monster Chapter 17: Drawing on Canvas Shapes The Pie Chart A Bouncing Ball Precomputed Mirroring Chapter 18: HTTP and Forms Content Negotiation A JavaScript 
Workbench Conway’s Game of Life Chapter 19: Project: A Pixel Art Editor Keyboard Bindings Efficient Drawing Circles Proper Lines Chapter 20: Node.js Search Tool Directory Creation A Public Space on the Web Chapter 21: Project: Skill-Sharing Website Disk Persistence Comment Field Resets Chapter 22: JavaScript and Performance Pathfinding Optimizing INDEX For Lotte and Jan “We think we are creating the system for our own purposes.

If it does, set it to the result of evaluating the second argument to set and then return that value. If the outermost scope is reached (Object.getPrototypeOf returns null) and we haven’t found the binding yet, it doesn’t exist, and an error should be thrown. Chapter 14: The Document Object Model Build a Table You can use document.createElement to create new element nodes, document.createTextNode to create text nodes, and the appendChild method to put nodes into other nodes. You’ll want to loop over the key names once to fill in the top row and then again for each object in the array to construct the data rows.
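The table-building hint above can be sketched roughly as follows. This is a minimal illustration, not the book's own solution: the function name `buildTable` and its `doc` parameter are assumptions made here so the code can run against any object that provides `createElement` and `createTextNode`, and it assumes a non-empty array of objects that all share the same keys.

```javascript
// Sketch only: buildTable and its doc parameter are hypothetical names.
// doc is any object with createElement/createTextNode; in a browser, pass document.
// Assumes rows is a non-empty array of objects with identical keys.
function buildTable(doc, rows) {
  const table = doc.createElement("table");
  const keys = Object.keys(rows[0]);

  // First loop over the key names: fill in the top (header) row.
  const headerRow = doc.createElement("tr");
  for (const key of keys) {
    const th = doc.createElement("th");
    th.appendChild(doc.createTextNode(key));
    headerRow.appendChild(th);
  }
  table.appendChild(headerRow);

  // Then loop over the keys again for each object to construct the data rows.
  for (const row of rows) {
    const tr = doc.createElement("tr");
    for (const key of keys) {
      const td = doc.createElement("td");
      td.appendChild(doc.createTextNode(String(row[key])));
      tr.appendChild(td);
    }
    table.appendChild(tr);
  }
  return table;
}
```

In a browser, the returned node could be attached with something like `document.body.appendChild(buildTable(document, data))`, where `data` stands for whatever array of objects is being displayed.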

pages: 463 words: 140,499

The Tyranny of Nostalgia: Half a Century of British Economic Decline
by Russell Jones
Published 15 Jan 2023

The predictions of some basic models can be as accurate as those of more complex alternatives. Model building therefore inevitably involves considerable trial and error, or ‘learning by doing’. It is a work in progress, and economists would benefit from being more outward looking, and seeking to work with engineers, climate scientists, computer scientists and ecologists to better understand the challenges facing society. How, for example, will the development of artificial intelligence or quantum computing improve the process of econometric model building? There are, however, certain fundamental characteristics that the most effective models embody.

pages: 183 words: 49,460

Start Small, Stay Small: A Developer's Guide to Launching a Startup
by Rob Walling
Published 15 Jan 2010

The focus of this book is building and launching a successful software, web or mobile startup with no external funding. This process includes: Developing the proper mindset for a self-funded startup; Understanding the Market-First Approach; Finding and testing a niche market; Choosing the optimal platform, price and revenue model; Building a killer sales website; Understanding the primary purpose of your sales website; Building the right kind of interest, and thus driving the right kind of traffic, to your website; Learning how to outsource; Working with virtual assistants; Determining what to do after launch: do you grow the business or start over?

pages: 167 words: 50,652

Alternatives to Capitalism
by Robin Hahnel and Erik Olin Wright

An alternative impulse is to enunciate the basic values that animate the search for alternatives and the core principles of institutional design that would facilitate a realization of those values, but not attempt a comprehensive, integrated design model of the alternative system as a whole. Both of these strategies have value. The detailed model-building strategy is useful and sometimes inspiring, so long as one treats these as speculative ideas to inform the messy trial-and-error experimentation of emancipatory social transformation rather than blueprints. The more open-ended discussion of general principles and values can help give us a sense of the direction we want to move and provide a basis for critical evaluation of our experiments, but provides less clarity of what it might be like to live in the destination itself.

Learning Flask Framework
by Matt Copperwaite and Charles Leifer
Published 26 Nov 2015

In this chapter we shall: • Install Flask-Admin and add it to our website • Add views for working with the Entry, Tag, and User models • Add a view for managing the website's static assets • Integrate the admin with the Flask-Login framework • Create a column to identify a user as an administrator • Create a custom index page for the admin dashboard Installing Flask-Admin Flask-Admin provides a readymade admin interface for Flask applications. Flask-Admin also integrates nicely with SQLAlchemy to provide views for managing your application's models. Building an Administrative Dashboard The following image gives a sneak preview of what the Entry admin will look like by the end of this chapter: While this amount of functionality requires relatively little code, we still have a lot to cover, so let's get started. Begin by installing Flask-Admin into virtualenv using pip.

Presentation Zen
by Garr Reynolds
Published 15 Jan 2012

Use diagrams and models rather than bulleted lists; smaller photographs rather than full-screen pictures; plain backgrounds rather than corporate templates; and icons rather than words. 4. “Build” your slides. Build up your complex slides as you’re talking about them. If you’re showing a graph, start with the axes, then the labels, then the bars or lines, then the highlighted points. If you’re showing a model, build it up step by step. It’s easy to do this in PowerPoint using the Custom Animation feature (but don’t use fancy animation—just let each part “appear”). Or, simply use a series of slides that build up to the final picture. 5. Get them active. Make your webinars active and interactive. Your audience is attending a live event, so involve them in it.

Analysis of Financial Time Series
by Ruey S. Tsay
Published 14 Oct 2001

As an illustration, consider the quarterly growth rate of U.S. real gross national product (GNP), seasonally adjusted, from the second quarter of 1947 to the first quarter of 1991. This series is used in Chapter 4 as an example of nonlinear economic time series. Here we simply employ an AR(3) model for the data. Denoting the growth rate by r_t, we can use the model building procedure of the next subsection to estimate the model. The fitted model is [Figure 2.4: four panels, (a) to (d), each plotting acf (vertical axis, -1.0 to 1.0) against lag (horizontal axis, 0 to 20).]

Such an identifiability problem is serious because, without proper constraints, the likelihood function of a vector ARMA(1,1) model for the data is not uniquely defined, resulting in a situation similar to the exact multicollinearity in a regression analysis. This type of identifiability problem can occur in a vector model even if none of the components is a white noise series. These two simple examples highlight the new issues involved in the generalization to VARMA models. Building a VARMA model for a given data set thus requires some attention. In the time series literature, methods of structural specification have been proposed to overcome the identifiability problem; see Tiao and Tsay (1989), Tsay (1991), and the references therein. We do not discuss the detail of structural specification here because VAR and VMA models are sufficient in most financial applications.

pages: 239 words: 68,598

The Vanishing Face of Gaia: A Final Warning
by James E. Lovelock
Published 1 Jan 2009

Computer models are so helpful that before long many biologists and geologists put their field equipment in store and began a new life working with their models pretending that they were the real world. This Pygmalion fate – falling in love with the model – is all too easy, as generations of the young and old playing their computer games have found. Gradually the world of science has evolved to the dangerous point where model‐building has precedence over observation and measurement, especially in Earth and life sciences. In certain ways modelling by scientists has become a threat to the foundation on which science has stood: the acceptance that nature is always the final arbiter and that a hypothesis must always be tested by experiment and observation in the real world.

pages: 218 words: 67,330

Kelly: More Than My Share of It All
by Clarence L. Johnson
Published 1 Jan 1985

The total flight time for the Marquardt ramjet at the time was not over seven hours, obtained mainly on the ramjet test vehicle for the Boeing Bomarc missile. This test vehicle, the X-7, had been built and operated by the Skunk Works. On August 29, 1959, our A-12 design, the twelfth in the series, was declared the winner and Bissell gave us a limited go-ahead. We were to conduct tests on models, build a full-scale mockup, and investigate specific electronic features of the airplane over a four-month period. On January 30, 1960, we were given a full go-ahead for design, manufacture, and test of 12 aircraft. The code name was Oxcart, a name selected from a list of deliberately deceptive identifications.

pages: 204 words: 66,619

Think Like an Engineer: Use Systematic Thinking to Solve Everyday Challenges & Unlock the Inherent Values in Them
by Mushtak Al-Atabi
Published 26 Aug 2014

Crashing their plane 7 times and changing their wing design 200 times, the levels of failures that they encountered were so disheartening that Wilbur Wright declared in 1901: "Not within a thousand years will man ever fly." However, we all know that they prevailed and eventually created history. It is important to mention here that when we are operating at the border between record levels of performance and uncharted territories (the outer two circles of the Performance Growth Model), building momentum through the attempt-fail cycle may take more than one lifetime or a career before achieving success. Sometimes those who start the cycle may not be around when success is reached. For example, scientists have been working very hard for the past few decades trying to find a cure or a vaccine for AIDS.

pages: 257 words: 64,763

The Great American Stickup: How Reagan Republicans and Clinton Democrats Enriched Wall Street While Mugging Main Street
by Robert Scheer
Published 14 Apr 2010

Treasury secretary, he still did not understand the inner workings of the unregulated derivatives market that had made him a very rich man but would impoverish many others. “They took the money and ran” is the best way to describe the decade of irresponsible greed in which the top CEOs entrusted subordinates, claiming expertise in mathematical model building and market arbitrage, to do their wizardry, no questions asked. That story has been well documented in the case of the much-disgraced executives of Citigroup and AIG, but it remained for Paulson’s memoir to confirm that the abysmal ignorance extended to the highest reaches of even mighty Goldman Sachs.

pages: 234 words: 68,798

The Science of Storytelling: Why Stories Make Us Human, and How to Tell Them Better
by Will Storr
Published 3 Apr 2019

Lewis implored a young writer in 1956, ‘instead of telling us a thing was “terrible”, describe it so that we’ll be terrified. Don’t say it was “delightful”; make us say “delightful” when we’ve read the description.’ The abstract information contained in adjectives such as ‘terrible’ and ‘delightful’ is thin gruel for the model-building brain. In order to experience a character’s terror or delight or rage or panic or sorrow, it has to make a model of it. By building its model of the scene, in all its vivid and specific detail, it experiences what’s happening on the page almost as if it’s actually happening. Only that way will the scene truly rouse our emotions.

Bulletproof Problem Solving
by Charles Conn and Robert McLean
Published 6 Mar 2019

Chapter Six Big Guns of Analysis In Chapter 5 we introduced a set of first‐cut heuristics and root cause thinking to make the initial analysis phase of problem solving simpler and faster. We showed that you can often get good‐enough analytic results quickly for many types of problems with little mathematics or model building. But what should you do when faced with a complex problem that really does require a robustly quantified solution? When is it time to call in the big guns—Bayesian statistics, regression analysis, Monte Carlo simulation, randomized controlled experiments, machine learning, game theory, or crowd‐sourced solutions?

pages: 296 words: 66,815

The AI-First Company
by Ash Fontana
Published 4 May 2021

A/B test, 271 accessibility of data, 72, 107 accuracy, 175, 203–4 in proof of concept phase, 59–60 active learning-based systems, 94–95 acyclic, 150, 271 advertising, 227, 240 agent-based models (ABMs), 103–5, 271 simulations versus, 105 aggregated data, 81, 83 aggregating advantages, 222–65 branding and, 255–56 data aggregation and, 241–45 on demand side, 225 disruption and, 239–41 first-mover advantage and, 253–55 and integrating incumbents, 244–45 and leveraging the loop against incumbents, 256–61 positioning and, 245–56 ecosystem, 251–53 staging, 249–51 standardization, 247–51 storage, 246–47 pricing and, 236–39 customer data contribution, 237 features, 238–39 transactional, 237, 281 updating, 238 usage-based, 237–38, 281 on supply side, 224–25 talent loop and, 260–61 traditional forms of competitive advantage versus, 224–25 with vertical integration, see vertical integration aggregation theory, 243–44, 271 agreement rate, 216 AI (artificial intelligence), 1–3 coining of term, 5 definitions and analogies regarding, 15–16 investment in, 7 lean, see Lean AI AI-First Century, 3 first half of (1950–2000), 3–9 cost and power of computers and, 8 progression to practice, 5–7 theoretical foundations, 4–5 second half of (2000–2050), 9 AI-First companies, 1, 9, 10, 44 eight-part framework for, 10–13 learning journey of, 44–45 AI-First teams, 127–42 centralized, 138–39 decentralized, 139 management of, 135–38 organization structure of, 138–39 outsourcing, 131 support for, 134–35 when to hire, 130–32 where to find people for, 133 who to hire, 128–30 airlines, 42 Alexa, 8, 228, 230 algorithms, 23, 58, 200–201 evolutionary, 150–51, 153 alliances of corporate and noncorporate organizations, 251 Amazon, 34, 37, 84, 112, 226 Alexa, 8, 228, 230 Mechanical Turk, 98, 99, 215 analytics, 50–52 anonymized data, 81, 83 Apple, 8, 226 iPhone, 252 application programming interfaces (APIs), 86, 118–22, 159, 172, 236, 271 applications, 171 area underneath the curve (AUC), 206, 272 
artificial intelligence, see AI artificial neural network, 5 Atlassian Corporation, 243 augmentation, 172 automation versus, 163 availability of data, 72–73 Babbage, Charles, 2 Bank of England, 104–5 Bayesian networks, 150, 201 Bengio, Yoshua, 7 bias, 177 big-data era, 28 BillGuard, 112 binary classification, 204–6 blockchain, 109–10, 117, 272 Bloomberg, 73, 121 brain, 5, 15, 31–32 shared, 31–33 branding, 256–57 breadth of data, 76 business goal, in proof of concept phase, 60 business software companies, 113 buying data, 119–22 data brokers, 119–22 financial, 120–21 marketing, 120 car insurance, 85 Carnegie, Andrew, 226 cars, 6, 254 causes, 145 census, 118 centrifugal process, 49–50 centripetal process, 50 chess, 6 chief data officer (CDO), 138 chief information officer (CIO), 138 chief technology officer (CTO), 139 Christensen, Clay, 239 cloud computing, 8, 22, 78–79, 87, 242, 248, 257 Cloudflare, 35–36 clustering, 53, 64, 95, 272 Coase, Ronald, 226 compatibility, 251–52 competitions, 117–18 competitive advantages, 16, 20, 22 in DLEs, 24, 33 traditional forms of, 224–25 see also aggregating advantages complementarity, 253 complementary data, 89, 124, 272 compliance concerns, 80 computer chips, 7, 22, 250 computers, 2, 3, 6 cost of, 8–9 power of, 7, 8, 19, 22 computer vision, 90 concave payoffs, 195–98, 272 concept drift, 175–76, 272 confusion matrix, 173–74 consistency, 256–57 consultants, 117–18, 131 consumer apps, 111–13, 272 consumer data, 109–14 apps, 111–13 customer-contributed data versus, 109 sensor networks, 113–14 token-based incentives for, 109–10 consumer reviews, 29, 43 contractual rights, 78–82 clean start advantage and, 78–79 negotiating, 79 structuring, 79–82 contribution margin, 214, 272 convex payoffs, 195–97, 202, 272 convolutional neural networks (CNNs), 151, 153 Conway, John, 104 cost of data labeling, 108 in ML management, 158 in proof of concept phase, 60 cost leadership, 272 DLEs and, 39–41 cost of goods sold (COGS), 217 crawling, 115–16, 
281 Credit Karma, 112 credit scores, 36–37 CRM (customer relationship management), 159, 230–31, 255, 260, 272 Salesforce, 159, 212, 243, 248, 258 cryptography, 272 crypto tokens, 109–10, 272 CUDA, 250 customer-generated data, 77–91 consumer data versus, 109 contractual rights and, 78–82 clean start advantage and, 78–79 negotiating, 79 structuring, 79–82 customer data coalitions, 82–84 data integrators and, 86–89 partnerships and, 89–91 pricing and, 237 workflow applications for, 84–86 customers costs to serve, 242 direct relationship with, 242 needs of, 49–50 customer support agents, 232, 272 customer support tickets, 260, 272 cybernetics, 4, 273 Dark Sky, 112, 113 DARPA (Defense Advanced Research Projects Agency), 5 dashboards, 171 data, 1, 8, 69, 273 aggregation of, 241–45 big-data era, 28 complementary, 89 harvesting from multiple sources, 57 incomplete, 178 information versus, 22–23 missing sources of, 177 in proof of concept phase, 60 quality of, 177–78 scale effects with, 22 sensitive, 57 starting small with, 56–58 vertical integration and, 231–32 data acquisition, 69–126, 134 buying data, 119–22 consumer data, 109–14 apps, 111–13 customer-contributed data versus, 109 sensor networks, 113–14 token-based incentives for, 109–10 customer-generated data, see customer-generated data human-generated data, see human-generated data machine-generated data, 102–8 agent-based models, 103–5 simulation, 103–4 synthetic, 105–8 partnerships for, 89–91 public data, 115–22 buying, see buying data consulting and competitions, 117–18 crawling, 115–16, 281 governments, 118–19 media, 118 valuation of, 71–77 accessibility, 72, 107 availability, 72–73 breadth, 76 cost, 73 determination, 74–76 dimensionality, 75 discrimination, 72–74 fungibility, 74 perishability and relevance, 74–75, 201 self-reinforcement, 76 time, 73–74 veracity, 75 volume of, 76–77 data analysts, 128–30, 132, 133, 137, 273 data as a service (DaaS), 116, 120 databases, 258 data brokers, 119–22 financial, 120–21 
marketing, 120 data cleaning, 162–63 data distribution drift, 178 data drift, 176, 273 data-driven media, 118 data engineering, 52 data engineers, 128–30, 132, 133, 137, 161, 273 data exhaust, 80, 257–58, 273 data infrastructure engineers, 129–32, 137, 273 data integration and integrators, 86–90, 276 data labeling, 57, 58, 92–100, 273 best practices for, 98 human-in-the-loop (HIL) systems, 100–101, 276 management of, 98–99 measurement in, 99–100 missing labels, 178 outsourcing of, 101–2 profitability metrics and, 215–16 tools for, 93–97 data lake, 57, 163 data learning effects (DLEs), 15–47, 48, 69, 222, 273 competitive advantages of, 24, 33 data network effects, 19, 26–33, 44, 273 edges of, 24 entry-level, 26–29, 31–33, 36–37, 274 network effects versus, 24–25 next-level, 26–27, 29–33, 36–37, 278 what type to build, 33 economies of scale in, 34 formula for, 17–20 information accumulation and, 21 learning effects and, 20–21 limitations of, 21, 42–43 loops around, see loops network effects and, 24–26 powers of, 34–42 compounding, 36–38 cost leadership, 39–41 flywheels, 37–38 price optimization, 41–42 product utility, 35–36 winner-take-all dynamics, 34–35 product value and, 39 scale effects and, 21–23 variety and, 34–35 data learning loops, see loops data lock-in, 247–48 data networks, 109–10, 143–44, 273 normal networks versus, 26 underneath products, 25–26 data pipelines, 181, 216 breaks in, 87, 181 data platform, 57 data processing capabilities (computing power), 7, 8, 19, 22 data product managers, 129–32, 274 data rights, 78–82, 246 data science, 52–56 decoupling software engineering from, 133 data scientists, 54–56, 117, 128–30, 132–39, 161, 274 data stewards, 58, 274 data storage, 57, 81, 246–47, 257 data validators, 161 data valuation, 71–77 accessibility in, 72, 107 availability in, 72–73 breadth in, 76 cost in, 73 determination in, 74–76 dimensionality in, 75 discrimination in, 72–74 fungibility in, 74 perishability and relevance in, 74–75, 201 
self-reinforcement in, 76 time in, 73–74 veracity in, 75 decision networks, 150, 153 decision trees, 149–50, 153 deduction and induction, 49–50 deep learning, 7, 147–48, 274 defensibility, 200, 274 defensible assets, 25 Dell, Michael, 226 Dell Technologies, 226 demand, 225 denial-of-service (DoS) attacks, 36 designers, 129 differential privacy, 117, 274 dimensionality reduction, 53, 274 disruption, 239–41 disruption theory, 239, 274 distributed systems, 8, 9 distribution costs, 243 DLEs, see data learning effects DoS (denial-of-service) attacks, 36 drift, 175–77, 203, 274 concept, 175–76 data, 176 minimizing, 201 e-commerce, 29, 31, 34, 37, 41, 84 economies of scale, 19, 34 ecosystem, 251–52 edges, 24, 274 enterprise resource planning (ERP), 161, 250, 274 entry-level data network effects, 26–29, 31–33, 36–37, 274 epochs, 173, 275 equity capital, 230 ETL (extract, transform, and load), 58, 275 evolutionary algorithms, 150–51, 153 expected error reduction, 96 expected model change, 96 Expensify, 85–86 Facebook, 25, 43, 112, 119, 122 features, 63–64, 145, 275 finding, 64–65 pricing and, 238–39 federated learning, 117, 275 feedback data, 36, 199–200 feed-forward networks, 151, 153 financial data brokers for, 120–21 stock market, 72, 74, 120–21 first-movers, 253–55, 275 flywheels, 37–38, 243–44 Ford, Henry, 49 fungibility of data, 74 Game of Life, 104 Gaussian mixture model, 275 generative adversarial networks (GANs), 152, 153 give-to-get model, 36 global multiuser models, 275 glossary, 271–82 Google, 111–12, 115, 195, 241, 251, 253–54 governments, 118–19 gradient boosted tree, 53, 275 gradient descent, 208 graph, 275 Gulf War, 6 hedge funds, 227 heuristics, 139, 231, 275 Hinton, Geoffrey, 7 histogram, 53, 275 holdout data, 199 horizontal products, 210–12, 276 HTML (hypertext markup language), 116, 276 human-generated data, 91–102 data labeling in, 57, 58, 92–100, 273 best practices for, 98 human-in-the-loop (HIL) systems, 100–101, 276 management of, 98–99 measurement 
in, 99–100 missing labels, 178 outsourcing of, 101–2 profitability metrics and, 215–16 tools for, 93–97 human learning, 16–17 hyperparameters, 173, 276 hypertext markup language (HTML), 116, 276 IBM (International Business Machines), 5–8, 255 image recognition, 76–77, 146 optical character, 72, 278 incumbents, 276 integrating, 245–46 leveraging the loop against, 256–61 independent software vendors (ISVs), 161, 248, 276 induction and deduction, 49–50 inductive logic programming (ILP), 149, 153 Informatica, 86 information, 1, 2, 276 data versus, 22–23 informational leverage, 3 Innovator’s Dilemma, The (Christensen), 239 input cost analysis, 215–16 input data, 199 insourcing, 102, 276 integration, 86–90, 276 predictions and, 171 testing, 174 integrations-first versus workflow-first companies, 88–89 intellectual leverage, 3 intellectual property (IP), 25, 251 intelligence, 1, 2, 5, 15, 16 artificial, see AI intelligent applications, 257–60, 276 intelligent systems, 19 interaction frequency, 197 interactive machine learning (IML), 96–97, 276 International Telecommunications Union (ITU), 250–51 Internet, 8, 19, 32, 69, 241–42, 244 inventory management software, 260 investment firms, 232 iPhone, 252 JIRA, 243 Kaggle, 9, 56, 117 Keras, 251 k-means, 276 knowledge economy, 21 Kubernetes, 251 language processing, 77, 94 latency, 158 layers of neurons, 7, 277 Lean AI, 48–68, 277 customer needs and, 49–50 decision tree for, 50–52 determining customer need for AI, 50–60 data and, 56–58 data science and, 54–56 sales and, 58–60 statistics and, 53–54 lean start-up versus, 61–62 levels in, 65–66 milestones for, 61 minimum viable product and, 62–63 model features lean start-ups, 61–62 learning human formula for, 16–17 machine formula for, 17–20 learning effects, 20–22, 277 moving beyond, 20–21 legacy applications, 257, 277 leverage, 3 linear optimization, 42 LinkedIn, 122 loans, 35, 37, 227 lock-in, 247–48 loops, 187–221, 273 drift and, 201 entropy and, 191–92 good versus bad, 191–92 
metrics for measuring, see metrics moats versus, 187–88, 192–94 physics of, 190–92 prediction and, 202–3 product payoffs and, 195–98, 202 concave, 195–98 convex, 195–97, 202 picking the product to build, 198 repeatability in, 188–89 scale and, 198–201, 203 and data that doesn’t contribute to output, 199–200 loss, 207–8, 277 loss function, 275, 277 machine-generated data, 102–8 agent-based models, 103–5 simulation, 103–4 synthetic, 105–8 machine learning (ML), 9, 145–47, 277 types of, 147–48 machine learning engineers, 39, 56, 117–18, 129, 130, 132, 138, 139, 161, 277 machine learning management loop, 277 machine learning models (ML models), 9, 26, 27, 31, 52–56, 59, 61, 134 customer predictions and, 80–81 features of, 61, 63 machine learning models, building, 64–65, 143–54 compounding, 148–52 diverse disciplines in, 149–51 convolutional neural networks in, 151, 153 decision networks in, 150, 153 decision trees in, 149–50, 153 defining features, 146–47 evolutionary algorithms in, 150–51, 153 feed-forward networks in, 151, 153 generative adversarial networks in, 152, 153 inductive logic programming in, 149, 153 machine learning in, 151–52 primer for, 145–47 recurrent neural networks in, 151, 153 reinforcement learning in, 152, 153 statistical analysis in, 149, 153 machine learning models, managing, 155–86 acceptance, 157, 162–66 accountability and, 164 and augmentation versus automation, 163 budget and, 164 data cleaning and, 162–63 distribution and, 165 executive education and, 165–66 experiments and, 165 explainability and, 166 feature development and, 163 incentives and, 164 politics and, 163–66 product enhancements and, 165 retraining and, 163 and revenues versus costs, 164 schedule and, 163 technical, 162–63 and time to value, 164 usage tracking and, 166 decentralization versus centralization in, 156 experimentation versus implementation in, 155 implementation, 158–66 data, 158–59 security, 159–60 sensors, 160 services, 161 software, 159 staffing, 161–162 loop 
in, 156, 166–81 deployment, 171–72 monitoring, see monitoring model performance training, 168–69 redeploying, 181 reproducibility and, 170 rethinking, 181 reworking, 179–80 testing, 172–74 versioning, 169–70, 281 ROI in, 164, 176, 181 testing and observing in, 156 machine learning researchers, 129–34, 135–36, 138, 277 management of AI-First teams, 135–38 of data labeling, 98–99 of machine learning models, see machine learning models, managing manual acceptance, 208–9 manufacturing, 6 marketing, customer data coalitions and, 83 marketing segmentation, 277 McCulloch, Warren, 4–5 McDonald’s, 256 Mechanical Turk, 98, 99, 215 media, 118 medical applications, 90–91, 145, 208 metrics, 203 measurement, 203–9 accuracy, 203–4 area underneath the curve, 206, 272 binary classification, 204–6 loss, 207–8 manual acceptance, 208–9 precision and recall, 206–7 receiver operating characteristic, 205–6, 279 usage, 209 profitability, 209–18 data labeling and, 215–16 data pipes and, 216 input cost analysis, 215–16 research cost analysis, 217–18 unit analysis, 213–14 and vertical versus horizontal products, 210–12 Microsoft, 8, 247 Access, 257 Outlook, 252 military, 6, 7 minimum viable product (MVP), 62–63, 277 MIT (Massachusetts Institute of Technology), 4, 5 ML models, see machine learning models moats, 277 loops versus, 187–88, 192–94 mobile phones, 113 iPhone, 252 monitoring, 277 monitoring model performance, 174–78 accuracy, 175 bias, 177 data quality, 177–78 reworking and, 179–80 stability, 175–77 MuleSoft, 86, 87 negotiating data rights, 79, 80 Netflix, 242, 243 network effects, 15–16, 20, 22, 23, 44, 278 compounding of, 36 data network effects versus, 24–25 edges of, 24 limits to, 42–43 moving beyond, 24–26 products with versus without, 26 scale effects versus, 24 traditional, 27 value of, 27 networks, 7, 15, 17 data networks versus, 26 neural networks, 5, 7, 8, 19, 23, 53, 54, 277–78 neurons, 5, 7, 15 layers of, 7, 276 next-level data network effects, 26–27, 29–33, 36–37, 278 
nodes, 21, 23–25, 27, 44, 278 NVIDIA, 250 Obama administration, 118 Onavo, 112 optical character recognition software, 72, 278 Oracle, 247, 248 outsourcing, 216 data labeling, 101–2 team members, 131 overfitting, 82 Pareto optimal solution, 56, 278 partial plots, 53, 278 payoffs, 195–98 concave, 195–98 convex, 195–97, 202 perceptron algorithm, 5 perishability of data, 74–75, 201 personalization, 255–56 personally identifiable information (PII), 81, 278 personnel lock-in, 248 perturbation, 178, 278 physical leverage, 3 Pitts, Walter, 4–5 POC (proof of concept), 59–60, 63, 278 positioning, 245–56 power generators, 209, 278 power teachers, 209 precision, 278 precision and recall, 206–7 prediction usability threshold (PUT), 62–64, 90, 91, 173, 200–202, 279 predictions, 34–35, 48, 63, 65, 148, 202–3 predictive pricing, 41, 42 prices charged by data vendors, 73 pricing of AI-First products, 236–39 customer data contribution and, 237 features and, 238–39 transactional, 237, 280 updating and, 238 usage-based, 237–38, 281 of data integration products, 87 optimization of, 41–42 personalized, 41 predictive, 41, 42 ROI-based, 235–36, 279 Principia Mathematica, 4 prisoner’s dilemma, 104 probability, in data labeling, 107 process automation, 6 process lock-in, 248 products, 59 features of, 61, 63 lock-in and, 248 utility of, 35–36 value of, 39 profit, 213 profitability metrics, 209–18 data labeling and, 215–16 data pipes and, 216 input cost analysis, 215–16 research cost analysis, 217–18 unit analysis, 213–14 and vertical versus horizontal products, 210–12 proof of concept (POC), 59–60, 63, 278 proprietary information, 44, 279 feedback data, 199–200 protocols, 248 public data, 115–22 buying, see buying data consulting and competitions, 117–18 crawling, 115–16, 281 governments, 118–19 media, 118 PUT (prediction usability threshold), 62–64, 90, 91, 173, 200–202, 278 quality, 175, 177–78 query by committee, 96 query languages, 279 random forest, 53, 64, 279 recall, 279 receiver 
operating characteristic (ROC) curve, 205–6, 279 recurrent neural networks (RNNs), 151, 153 recursion, 150, 279 regression, 64 reinforcement learning (RL), 103, 147–48, 152, 153, 279 relevance of data, 74–75 reliability, 175 reports, 171 research and development (R & D), 42 cost analysis, 217–18 revolutionary products, 252 robots, 6 ROI (return on investment), 55, 63–65, 93, 164, 176, 181, 198, 279 pricing based on, 235–36, 279 Russell, Bertrand, 4 sales, 58–60 Salesforce, 159, 212, 243, 248, 258 SAP (Systems Applications and Products in Data Processing), 6, 159, 161, 247, 248 SAS, 253 scalability, in data labeling, 106 scale, 20–22, 227, 279 economies of, 19, 34 loops and, 198–200, 203 in ML management, 158 moving beyond, 21–23 network effects versus, 24 scatter plot, 53, 280 scheme, 279 search engines, 31 secure multiparty computation, 117, 279 security, 159 Segment, 87–88 self-reinforcing data, 76 selling data, 122 sensors, 113–14, 160, 280 shopping online, 29, 31, 34, 37, 41, 84 simulation, 103–4, 280 ABMs versus, 105 social networks, 16, 20, 44 Facebook, 25, 43, 112, 119, 122 LinkedIn, 122 software, 159 traditional business models for, 233–34 software-as-a-service (SaaS), 87, 280 software development kits (SDKs), 112, 280 software engineering, decoupling data science from, 133 software engineers, 139, 134–37 Sony, 7 speed of data labeling, 108 spreadsheets, 171 Square Capital, 35 stability, 175–77 staging, 249–51 standardization, 247–48, 249–50 statistical analysis, 149, 153 statistical process control (SPC), 156, 173, 280 statistics, 53–54 stocks, 72, 74, 120–21 supervised machine learning, 147–48, 280 supply, 225 supply-chain tracking, 260 support vector machines, 280 synthetic data, 105–8, 216 system of engagement, 280 system of record, 243, 281 systems integrators (SIs), 161, 248, 281 Tableau, 253 talent loop, 260–61, 281 Taylor, Frederick W., 6 teams in proof of concept phase, 60 see also AI-First teams telecommunications industry, 250–51 telephones 
mobile, 113 iPhone, 253 networks, 23–25 templates, 171 temporal leverage, 3 threshold logic unit (TLU), 5 ticker data, 120–21 token-based incentives, 109–10 tools, 2–3, 93–97 training data, 199 transactional pricing, 237, 280 transaction costs, 243 transfer learning, 147–48 true and false, 204–6 Turing, Alan, 5 23andMe, 112 Twilio, 87 uncertainty sampling, 96 unit analysis, 213–14 United Nations, 250 unsupervised machine learning, 53, 147–48, 281 Upwork, 99 usability, 255–56 usage-based pricing, 237–38, 281 usage metrics, 209 user interface (UI), 89, 159, 281 utility of network effects, 42 of products, 35–36 validation data, 199 value chain, 18–19, 281 value proposition, 59 values, missing, 178 variable importance plots, 53, 281 variance reduction, 96 Veeva Systems, 212 vendors, 73, 161 data, prices charged by, 73 independent software, 161, 248, 276 lock-in and, 247–48 venture capital, 230 veracity of data, 75 versioning, 169–70, 281 vertical integration, 226–37, 239, 244, 252, 281 vertical products, 210–12, 282 VMWare, 248 waterfall charts, 282 Web crawlers, 115–16, 282 weights, 150, 281 workflow applications, 84–86, 253, 259, 282 workflow-first versus integrations-first companies, 88–89 yield management systems, 42 Zapier, 87 Zendesk, 233 zettabyte, 8, 282 Zetta Venture Partners, 8–9 ABOUT THE AUTHOR Ash Fontana became one of the most recognized startup investors in the world after launching online investing at AngelList.

HBase: The Definitive Guide
by Lars George
Published 29 Aug 2011

PE (Performance Evaluation) tool, Performance Evaluation, Performance Evaluation perf.hfile.block.cache.size property, Configuration performance, MapReduce Locality, MapReduce Locality, Log-Structured Merge-Trees, Garbage Collection Tuning, Garbage Collection Tuning, Memstore-Local Allocation Buffer, Memstore-Local Allocation Buffer, Compression, Enabling Compression, Managed Splitting, Managed Splitting, Region Hotspotting, Presplitting Regions, Presplitting Regions, Load Balancing, Load Balancing, Merging Regions, Merging Regions, Client API: Best Practices, Client API: Best Practices, Configuration, Configuration, Load Tests, YCSB best practices for, Client API: Best Practices, Client API: Best Practices block replication and, MapReduce Locality, MapReduce Locality load tests for, Load Tests, YCSB seek compared to transfer operations, Log-Structured Merge-Trees tuning, Garbage Collection Tuning, Garbage Collection Tuning, Memstore-Local Allocation Buffer, Memstore-Local Allocation Buffer, Compression, Enabling Compression, Managed Splitting, Managed Splitting, Region Hotspotting, Presplitting Regions, Presplitting Regions, Load Balancing, Load Balancing, Merging Regions, Merging Regions, Configuration, Configuration compression, Compression, Enabling Compression configuration for, Configuration, Configuration garbage collection, Garbage Collection Tuning, Garbage Collection Tuning load balancing, Load Balancing, Load Balancing managed splitting, Managed Splitting, Managed Splitting memstore-local allocation buffer, Memstore-Local Allocation Buffer, Memstore-Local Allocation Buffer merging regions, Merging Regions, Merging Regions presplitting regions, Presplitting Regions, Presplitting Regions region hotspotting, Region Hotspotting Performance Evaluation (PE) tool, Performance Evaluation, Performance Evaluation Persistent time varying rate (PTVR) metric rate, Contexts, Records, and Metrics physical models, Dimensions Pig, Pig, Pig, Pig, Pig, Pig, Pig Grunt shell 
for, Pig, Pig installing, Pig Pig Latin query language for, Pig pipelined writes, LogSyncer Class piping commands into HBase Shell, Scripting, Scripting planet-sized web applications, The Dawn of Big Data POM (Project Object Model), Building the Examples pom.xml file, Dynamic Provisioning ports, Operation, Operation, Operation, Master UI, Adding a local backup master, Required Ports for Avro, Operation required for each server, Required Ports for REST, Operation for Thrift, Operation for web-based UI, Master UI, Adding a local backup master postAddColumn() method, MasterObserver class, The MasterObserver Class postAssign() method, MasterObserver class, The MasterObserver Class postBalance() method, MasterObserver class, The MasterObserver Class postBalanceSwitch() method, MasterObserver class, The MasterObserver Class postCheckAndDelete, Handling client API events postCheckAndPut() method, RegionObserver class, Handling client API events postCreateTable() method, MasterObserver class, The MasterObserver Class postDelete() method, RegionObserver class, Handling client API events postDeleteColumn() method, MasterObserver class, The MasterObserver Class postDeleteTable() method, MasterObserver class, The MasterObserver Class postDisableTable() method, MasterObserver class, The MasterObserver Class postEnableTable() method, MasterObserver class, The MasterObserver Class postExists() method, RegionObserver class, Handling client API events postGet() method, RegionObserver class, Handling client API events postGetClosestRowBefore() method, RegionObserver class, Handling client API events postIncrement() method, RegionObserver class, Handling client API events postIncrementColumnValue() method, RegionObserver class, Handling client API events postModifyColumn() method, MasterObserver class, The MasterObserver Class postModifyTable() method, MasterObserver class, The MasterObserver Class postMove() method, MasterObserver class, The MasterObserver Class 
postOpenDeployTasks() method, RegionServerServices class, The RegionCoprocessorEnvironment class postPut() method, RegionObserver class, Handling client API events postScannerClose() method, RegionObserver class, Handling client API events postScannerNext() method, RegionObserver class, Handling client API events postScannerOpen() method, RegionObserver class, Handling client API events postUnassign() method, MasterObserver class, The MasterObserver Class power supply unit (PSU), requirements for, Servers preAddColumn() method, MasterObserver class, The MasterObserver Class preAssign() method, MasterObserver class, The MasterObserver Class preBalance() method, MasterObserver class, The MasterObserver Class preBalanceSwitch() method, MasterObserver class, The MasterObserver Class preCheckAndDelete() method, RegionObserver class, Handling client API events preCheckAndPut() method, RegionObserver class, Handling client API events preClose() method, RegionObserver class, State: pending close preCompact() method, RegionObserver class, State: open preCreateTable() method, MasterObserver class, The MasterObserver Class preDelete() method, RegionObserver class, Handling client API events preDeleteColumn() method, MasterObserver class, The MasterObserver Class preDeleteTable() method, MasterObserver class, The MasterObserver Class predicate deletions, Tables, Rows, Columns, and Cells, Log-Structured Merge-Trees predicate pushdown, Introduction to Filters preDisableTable() method, MasterObserver class, The MasterObserver Class preEnableTable() method, MasterObserver class, The MasterObserver Class preExists() method, RegionObserver class, Handling client API events PrefixFilter class, PrefixFilter, PrefixFilter, Filters Summary preFlush() method, RegionObserver class, State: open preGet() method, RegionObserver class, Handling client API events preGetClosestRowBefore() method, RegionObserver class, Handling client API events preIncrement() method, RegionObserver class, 
Handling client API events preIncrementColumnValue() method, RegionObserver class, Handling client API events preModifyColumn() method, MasterObserver class, The MasterObserver Class preModifyTable() method, MasterObserver class, The MasterObserver Class preMove() method, MasterObserver class, The MasterObserver Class preOpen() method, RegionObserver class, State: pending open prepare() method, ObserverContext class, The ObserverContext class prePut() method, RegionObserver class, Handling client API events preScannerClose() method, RegionObserver class, Handling client API events preScannerNext() method, RegionObserver class, Handling client API events preScannerOpen() method, RegionObserver class, Handling client API events preShutdown() method, MasterObserver class, The MasterObserver Class preSplit() method, RegionObserver class, State: open preStopMaster() method, MasterObserver class, The MasterObserver Class preUnassign() method, MasterObserver class, The MasterObserver Class preWALRestore() method, RegionObserver class, State: pending open prewarmRegionCache() method, HTable class, The HTable Utility Methods process limits, File handles and process limits, File handles and process limits processors, Servers (see CPU) profiles, Maven, Dynamic Provisioning, Dynamic Provisioning Project Object Model, Building the Examples (see POM) properties, for configuration, HBase Configuration Properties, HBase Configuration Properties Protocol Buffers, Introduction to REST, Thrift, and Avro, Protocol Buffer (application/x-protobuf), Advanced Schemas encoding for REST, Protocol Buffer (application/x-protobuf) schema used by, Advanced Schemas pseudodistributed mode, Pseudodistributed mode, Pseudodistributed mode, Adding a local region server PSU (power supply unit), requirements for, Servers PTVR (Persistent time varying rate), Contexts, Records, and Metrics Puppet, deployment using, Puppet and Chef Put class, Single Puts, Single Puts put command, HBase Shell, Quick-Start 
Guide, Data manipulation Put type, KeyValue class, The KeyValue class put() method, HTable class, Put Method, Atomic compare-and-set, Single Puts, Single Puts, Client-side write buffer, List of Puts, List of Puts, List of Puts (see also checkAndPut() method, HTable class) list-based, List of Puts, List of Puts for multiple operations, Client-side write buffer, List of Puts for single operations, Single Puts, Single Puts putLong() method, Bytes class, The Bytes Class putTable() method, HTablePool class, HTablePool PyHBase client, Other Clients Q QualifierFilter class, QualifierFilter, Filters Summary quit command, HBase Shell, Basics quotes, in HBase Shell, Commands R RAID, Servers RAM, Servers (see memory) RandomRowFilter class, RandomRowFilter, Filters Summary range partitions, Auto-Sharding Rate (R) metric type, Contexts, Records, and Metrics raw() method, Result class, The Result class RDBMS (Relational Database Management System), The Dawn of Big Data, The Dawn of Big Data, The Problem with Relational Database Systems, The Problem with Relational Database Systems, Database (De-)Normalization, Database (De-)Normalization converting to HBase, Database (De-)Normalization, Database (De-)Normalization limitations of, The Dawn of Big Data, The Dawn of Big Data, The Problem with Relational Database Systems, The Problem with Relational Database Systems read-only tables, Table Properties read/write performance, Dimensions readFields() method, Writable interface, Tables record IDs, custom versioning for, Custom Versioning RecordReader class, InputFormat recovered.edits directory, Region-level files, Log splitting, Edits recovery Red Hat Enterprise Linux, Operating system (see RHEL) Red Hat Package Manager, Operating system (see RPM) Reducer class, Reducer referential integrity, The Problem with Relational Database Systems RegexStringComparator class, Comparators region hotspotting, Region Hotspotting region servers, Auto-Sharding, Implementation, Specifying region 
servers, Web-based UI Introduction, Cluster Operations, Cluster Status Information, Main page, Region Server UI, Main page, Region Server Metrics, Region Server Metrics, Garbage Collection Tuning, Startup check, Node Decommissioning, Node Decommissioning, Rolling Restarts, Rolling Restarts, Adding a local region server, Adding a region server, Required Ports, Analyzing the Logs, Garbage collection/memory tuning, Stability issues, “Could not obtain block” errors, HBase Configuration Properties, HBase Configuration Properties adding, Adding a region server for fully distributed mode, Specifying region servers heap for, Garbage collection/memory tuning local, adding, Adding a local region server logfiles created by, Analyzing the Logs metrics exposed by, Region Server Metrics, Region Server Metrics ports for, Required Ports properties for, HBase Configuration Properties, HBase Configuration Properties rolling restart for, Rolling Restarts, Rolling Restarts shutting down, troubleshooting, Stability issues, “Could not obtain block” errors startup check for, Startup check status information for, Web-based UI Introduction, Cluster Status Information, Main page, Region Server UI, Main page stopping, Cluster Operations, Node Decommissioning, Node Decommissioning workloads of, handling, Garbage Collection Tuning RegionCoprocessorEnvironment class, The RegionCoprocessorEnvironment class .regioninfo file, Region-level files RegionLoad class, Cluster Status Information, Cluster Status Information RegionObserver class, The RegionObserver Class, The BaseRegionObserver class regions, Auto-Sharding, Auto-Sharding, Auto-Sharding, The HTable Utility Methods, The HTable Utility Methods, Handling region life-cycle events, State: pending close, Tables, Cluster Operations, Cluster Operations, Cluster Operations, Cluster Operations, Cluster Operations, Cluster Operations, Cluster Status Information, Cluster Status Information, Cluster Status Information, Cluster Status Information, Tools, 
Tools, Tools, Tools, Tools, Tools, Tools, Main page, User Table page, User Table page, User Table page, User Table page, Region-level files, Region-level files, Region splits, Region splits, Compactions, Compactions, Region Lookups, The Region Life Cycle, Managed Splitting, Managed Splitting, Presplitting Regions, Presplitting Regions, Merging Regions, Merging Regions, Configuration, HBase Fsck assigning to a server, Tools cache for, The HTable Utility Methods closing, Cluster Operations, Tools compacting, Cluster Operations, Tools, User Table page, Compactions, Compactions deploying or undeploying, Cluster Operations files for, Region-level files, Region-level files flushing, Cluster Operations, Tools life-cycle state changes, Handling region life-cycle events, State: pending close, The Region Life Cycle listing, User Table page, User Table page lookups for, Region Lookups map of, The HTable Utility Methods merging, Merging Regions, Merging Regions moving to a different server, Cluster Operations, Tools presplitting, Presplitting Regions, Presplitting Regions reassigning to a new server, HBase Fsck size of, increasing, Configuration splitting, Auto-Sharding, Cluster Operations, Tools, User Table page, Region splits, Region splits, Managed Splitting, Managed Splitting status information for, Cluster Status Information, Cluster Status Information, Cluster Status Information in transition, map of, Cluster Status Information, Main page unassigning, Tools RegionScanner class, Read Path regionservers file, Specifying region servers, regionserver, regionservers, Script-Based (see also configuration) RegionSplitter utility, Presplitting Regions Relational Database Management System, The Dawn of Big Data (see RDBMS) remote method invocation (RMI), JMX Remote API remote procedure call, Client-side write buffer (see RPC) RemoteAdmin class, REST Java client RemoteHTable class, REST Java client, REST Java client remove() method, HTableDescriptor class, Table Properties 
removeFamily() method, HTableDescriptor class, Table Properties remove_peer command, HBase Shell, Replication replication, Column Families, Replication, Replication, Region server failover, Replication, Replication for column families, Column Families in HBase Shell, Replication Representational State Transfer, Introduction to REST, Thrift, and Avro (see REST) requests, current number of, Cluster Status Information reset() method, Filter interface, Custom Filters REST (Representational State Transfer), Introduction to REST, Thrift, and Avro, Introduction to REST, Thrift, and Avro, REST, REST Java client, Operation, Operation, Operation, Operation, Operation, Supported formats, Raw binary (application/octet-stream), Plain (text/plain), Plain (text/plain), XML (text/xml), XML (text/xml), XML (text/xml), JSON (application/json), JSON (application/json), JSON (application/json), Protocol Buffer (application/x-protobuf), Raw binary (application/octet-stream), REST Java client, REST Java client, HBase Configuration Properties Base64 encoding used in, XML (text/xml), JSON (application/json) documentation for, Operation formats supported by, Supported formats, Raw binary (application/octet-stream) Java client for, REST Java client, REST Java client JSON format for, JSON (application/json), JSON (application/json) plain text format for, Plain (text/plain), Plain (text/plain) port for, Operation Protocol Buffer format for, Protocol Buffer (application/x-protobuf) raw binary format for, Raw binary (application/octet-stream) starting gateway server for, Operation stopping, Operation verifying operation of, Operation XML format for, XML (text/xml), XML (text/xml) Result class, The Result class, The Result class ResultScanner class, The ResultScanner Class, The ResultScanner Class, Client API: Best Practices RHEL (Red Hat Enterprise Linux), Operating system RMI (remote method invocation), JMX Remote API rolling restarts, Rolling Restarts, Rolling Restarts -ROOT- table, Region 
Lookups RootComparator class, The KeyValue class RootKeyComparator class, The KeyValue class round-trip time, Client-side write buffer row keys, Tables, Rows, Columns, and Cells, Tables, Rows, Columns, and Cells, Concepts, Partial Key Scans, Pagination, Time Series Data, Time Series Data, Time Series Data field swap and promotion of, Time Series Data for pagination, Pagination for partial key scans, Partial Key Scans randomization of, Time Series Data salting prefix for, Time Series Data RowComparator class, The KeyValue class RowCountProtocol interface, The BaseEndpointCoprocessor class RowFilter class, RowFilter, RowFilter, Filters Summary RowLock class, Single Puts rows, Tables, Rows, Columns, and Cells, Tables, Rows, Columns, and Cells, Single Puts, Single Puts, Single Puts, Single Puts, Client-side write buffer, List of Puts, Single Gets, The Result class, Single Gets, Single Gets, List of Gets, List of Gets, Single Deletes, Single Deletes, Single Deletes, Single Deletes, List of Deletes, List of Deletes, Batch Operations, Batch Operations, Row Locks, Row Locks, Scans, Caching Versus Batching, Multiple Counters, Data manipulation, Data manipulation, Data manipulation, Data manipulation, Data manipulation adding, Single Puts, Single Puts, Client-side write buffer, List of Puts, Data manipulation multiple operations, Client-side write buffer, List of Puts single operations, Single Puts, Single Puts batch operations on, Batch Operations, Batch Operations counting, Data manipulation deleting, Single Deletes, Single Deletes, List of Deletes, List of Deletes, Data manipulation multiple operations, List of Deletes, List of Deletes single operations, Single Deletes, Single Deletes getting, Single Gets, The Result class, List of Gets, List of Gets, Data manipulation multiple operations, List of Gets, List of Gets single operations, Single Gets, The Result class locking, Single Puts, Single Puts, Single Gets, Single Gets, Single Deletes, Single Deletes, Row Locks, Row 
Locks, Multiple Counters scanning, Scans, Caching Versus Batching, Data manipulation RPC (remote procedure call), Client-side write buffer, RPC Metrics, RPC Metrics metrics for, RPC Metrics, RPC Metrics put operations as, Client-side write buffer RPM (Red Hat Package Manager), Operating system Ruby hashes, in HBase Shell, Commands RVComparator class, The KeyValue class S S (String) metric type, Contexts, Records, and Metrics S3 (Simple Storage Service), S3, S3 Safari Books Online, Safari® Books Online sales, data requirements of, The Dawn of Big Data salting, Time Series Data scalability, Scalability, Scalability Scan class, Introduction, Introduction, Introduction scan command, HBase Shell, Quick-Start Guide, Data manipulation scan operations, Scans, Caching Versus Batching, The ResultScanner Class, Caching Versus Batching, Caching Versus Batching, Caching Versus Batching, Caching Versus Batching, Read Path, Partial Key Scans, Partial Key Scans, Pagination, Pagination (see also get operations) batching, Caching Versus Batching, Caching Versus Batching caching, Caching Versus Batching, Caching Versus Batching leases for, The ResultScanner Class pagination, Pagination, Pagination partial key scans, Partial Key Scans, Partial Key Scans scan() method, HTable class, Introduction to Filters filters for, Introduction to Filters (see filters) schema, Schema Definition, Column Families, Tables, Table Properties, Column Families, Column Families column families, Column Families, Column Families tables, Tables, Table Properties script-based deployment, Script-Based, Script-Based scripting, in HBase Shell, Scripting, Scripting search integration, Search Integration, Search Integration secondary indexes, Dimensions, Secondary Indexes, Secondary Indexes seek operations, compared to transfer operations, Log-Structured Merge-Trees sequential consistency, Nonrelational Database Systems, Not-Only SQL or NoSQL?

Guide, Data manipulation Put type, KeyValue class, The KeyValue class put() method, HTable class, Put Method, Atomic compare-and-set, Single Puts, Single Puts, Client-side write buffer, List of Puts, List of Puts, List of Puts (see also checkAndPut() method, HTable class) list-based, List of Puts, List of Puts for multiple operations, Client-side write buffer, List of Puts for single operations, Single Puts, Single Puts putLong() method, Bytes class, The Bytes Class putTable() method, HTablePool class, HTablePool PyHBase client, Other Clients Q QualifierFilter class, QualifierFilter, Filters Summary quit command, HBase Shell, Basics quotes, in HBase Shell, Commands R RAID, Servers RAM, Servers (see memory) RandomRowFilter class, RandomRowFilter, Filters Summary range partitions, Auto-Sharding Rate (R) metric type, Contexts, Records, and Metrics raw() method, Result class, The Result class RDBMS (Relational Database Management System), The Dawn of Big Data, The Dawn of Big Data, The Problem with Relational Database Systems, The Problem with Relational Database Systems, Database (De-)Normalization, Database (De-)Normalization converting to HBase, Database (De-)Normalization, Database (De-)Normalization limitations of, The Dawn of Big Data, The Dawn of Big Data, The Problem with Relational Database Systems, The Problem with Relational Database Systems read-only tables, Table Properties read/write performance, Dimensions readFields() method, Writable interface, Tables record IDs, custom versioning for, Custom Versioning RecordReader class, InputFormat recovered.edits directory, Region-level files, Log splitting, Edits recovery Red Hat Enterprise Linux, Operating system (see RHEL) Red Hat Package Manager, Operating system (see RPM) Reducer class, Reducer referential integrity, The Problem with Relational Database Systems RegexStringComparator class, Comparators region hotspotting, Region Hotspotting region servers, Auto-Sharding, Implementation, Specifying region 
servers, Web-based UI Introduction, Cluster Operations, Cluster Status Information, Main page, Region Server UI, Main page, Region Server Metrics, Region Server Metrics, Garbage Collection Tuning, Startup check, Node Decommissioning, Node Decommissioning, Rolling Restarts, Rolling Restarts, Adding a local region server, Adding a region server, Required Ports, Analyzing the Logs, Garbage collection/memory tuning, Stability issues, “Could not obtain block” errors, HBase Configuration Properties, HBase Configuration Properties adding, Adding a region server for fully distributed mode, Specifying region servers heap for, Garbage collection/memory tuning local, adding, Adding a local region server logfiles created by, Analyzing the Logs metrics exposed by, Region Server Metrics, Region Server Metrics ports for, Required Ports properties for, HBase Configuration Properties, HBase Configuration Properties rolling restart for, Rolling Restarts, Rolling Restarts shutting down, troubleshooting, Stability issues, “Could not obtain block” errors startup check for, Startup check status information for, Web-based UI Introduction, Cluster Status Information, Main page, Region Server UI, Main page stopping, Cluster Operations, Node Decommissioning, Node Decommissioning workloads of, handling, Garbage Collection Tuning RegionCoprocessorEnvironment class, The RegionCoprocessorEnvironment class .regioninfo file, Region-level files RegionLoad class, Cluster Status Information, Cluster Status Information RegionObserver class, The RegionObserver Class, The BaseRegionObserver class regions, Auto-Sharding, Auto-Sharding, Auto-Sharding, The HTable Utility Methods, The HTable Utility Methods, Handling region life-cycle events, State: pending close, Tables, Cluster Operations, Cluster Operations, Cluster Operations, Cluster Operations, Cluster Operations, Cluster Operations, Cluster Status Information, Cluster Status Information, Cluster Status Information, Cluster Status Information, Tools, 
Tools, Tools, Tools, Tools, Tools, Tools, Main page, User Table page, User Table page, User Table page, User Table page, Region-level files, Region-level files, Region splits, Region splits, Compactions, Compactions, Region Lookups, The Region Life Cycle, Managed Splitting, Managed Splitting, Presplitting Regions, Presplitting Regions, Merging Regions, Merging Regions, Configuration, HBase Fsck assigning to a server, Tools cache for, The HTable Utility Methods closing, Cluster Operations, Tools compacting, Cluster Operations, Tools, User Table page, Compactions, Compactions deploying or undeploying, Cluster Operations files for, Region-level files, Region-level files flushing, Cluster Operations, Tools life-cycle state changes, Handling region life-cycle events, State: pending close, The Region Life Cycle listing, User Table page, User Table page lookups for, Region Lookups map of, The HTable Utility Methods merging, Merging Regions, Merging Regions moving to a different server, Cluster Operations, Tools presplitting, Presplitting Regions, Presplitting Regions reassigning to a new server, HBase Fsck size of, increasing, Configuration splitting, Auto-Sharding, Cluster Operations, Tools, User Table page, Region splits, Region splits, Managed Splitting, Managed Splitting status information for, Cluster Status Information, Cluster Status Information, Cluster Status Information in transition, map of, Cluster Status Information, Main page unassigning, Tools RegionScanner class, Read Path regionservers file, Specifying region servers, regionserver, regionservers, Script-Based (see also configuration) RegionSplitter utility, Presplitting Regions Relational Database Management System, The Dawn of Big Data (see RDBMS) remote method invocation (RMI), JMX Remote API remote procedure call, Client-side write buffer (see RPC) RemoteAdmin class, REST Java client RemoteHTable class, REST Java client, REST Java client remove() method, HTableDescriptor class, Table Properties 
removeFamily() method, HTableDescriptor class, Table Properties remove_peer command, HBase Shell, Replication replication, Column Families, Replication, Replication, Region server failover, Replication, Replication for column families, Column Families in HBase Shell, Replication Representational State Transfer, Introduction to REST, Thrift, and Avro (see REST) requests, current number of, Cluster Status Information reset() method, Filter interface, Custom Filters REST (Representational State Transfer), Introduction to REST, Thrift, and Avro, Introduction to REST, Thrift, and Avro, REST, REST Java client, Operation, Operation, Operation, Operation, Operation, Supported formats, Raw binary (application/octet-stream), Plain (text/plain), Plain (text/plain), XML (text/xml), XML (text/xml), XML (text/xml), JSON (application/json), JSON (application/json), JSON (application/json), Protocol Buffer (application/x-protobuf), Raw binary (application/octet-stream), REST Java client, REST Java client, HBase Configuration Properties Base64 encoding used in, XML (text/xml), JSON (application/json) documentation for, Operation formats supported by, Supported formats, Raw binary (application/octet-stream) Java client for, REST Java client, REST Java client JSON format for, JSON (application/json), JSON (application/json) plain text format for, Plain (text/plain), Plain (text/plain) port for, Operation Protocol Buffer format for, Protocol Buffer (application/x-protobuf) raw binary format for, Raw binary (application/octet-stream) starting gateway server for, Operation stopping, Operation verifying operation of, Operation XML format for, XML (text/xml), XML (text/xml) Result class, The Result class, The Result class ResultScanner class, The ResultScanner Class, The ResultScanner Class, Client API: Best Practices RHEL (Red Hat Enterprise Linux), Operating system RMI (remote method invocation), JMX Remote API rolling restarts, Rolling Restarts, Rolling Restarts -ROOT- table, Region 
Lookups RootComparator class, The KeyValue class RootKeyComparator class, The KeyValue class round-trip time, Client-side write buffer row keys, Tables, Rows, Columns, and Cells, Tables, Rows, Columns, and Cells, Concepts, Partial Key Scans, Pagination, Time Series Data, Time Series Data, Time Series Data field swap and promotion of, Time Series Data for pagination, Pagination for partial key scans, Partial Key Scans randomization of, Time Series Data salting prefix for, Time Series Data RowComparator class, The KeyValue class RowCountProtocol interface, The BaseEndpointCoprocessor class RowFilter class, RowFilter, RowFilter, Filters Summary RowLock class, Single Puts rows, Tables, Rows, Columns, and Cells, Tables, Rows, Columns, and Cells, Single Puts, Single Puts, Single Puts, Single Puts, Client-side write buffer, List of Puts, Single Gets, The Result class, Single Gets, Single Gets, List of Gets, List of Gets, Single Deletes, Single Deletes, Single Deletes, Single Deletes, List of Deletes, List of Deletes, Batch Operations, Batch Operations, Row Locks, Row Locks, Scans, Caching Versus Batching, Multiple Counters, Data manipulation, Data manipulation, Data manipulation, Data manipulation, Data manipulation adding, Single Puts, Single Puts, Client-side write buffer, List of Puts, Data manipulation multiple operations, Client-side write buffer, List of Puts single operations, Single Puts, Single Puts batch operations on, Batch Operations, Batch Operations counting, Data manipulation deleting, Single Deletes, Single Deletes, List of Deletes, List of Deletes, Data manipulation multiple operations, List of Deletes, List of Deletes single operations, Single Deletes, Single Deletes getting, Single Gets, The Result class, List of Gets, List of Gets, Data manipulation multiple operations, List of Gets, List of Gets single operations, Single Gets, The Result class locking, Single Puts, Single Puts, Single Gets, Single Gets, Single Deletes, Single Deletes, Row Locks, Row 
Locks, Multiple Counters scanning, Scans, Caching Versus Batching, Data manipulation RPC (remote procedure call), Client-side write buffer, RPC Metrics, RPC Metrics metrics for, RPC Metrics, RPC Metrics put operations as, Client-side write buffer RPM (Red Hat Package Manager), Operating system Ruby hashes, in HBase Shell, Commands RVComparator class, The KeyValue class S S (String) metric type, Contexts, Records, and Metrics S3 (Simple Storage Service), S3, S3 Safari Books Online, Safari® Books Online sales, data requirements of, The Dawn of Big Data salting, Time Series Data scalability, Scalability, Scalability Scan class, Introduction, Introduction, Introduction scan command, HBase Shell, Quick-Start Guide, Data manipulation scan operations, Scans, Caching Versus Batching, The ResultScanner Class, Caching Versus Batching, Caching Versus Batching, Caching Versus Batching, Caching Versus Batching, Read Path, Partial Key Scans, Partial Key Scans, Pagination, Pagination (see also get operations) batching, Caching Versus Batching, Caching Versus Batching caching, Caching Versus Batching, Caching Versus Batching leases for, The ResultScanner Class pagination, Pagination, Pagination partial key scans, Partial Key Scans, Partial Key Scans scan() method, HTable class, Introduction to Filters filters for, Introduction to Filters (see filters) schema, Schema Definition, Column Families, Tables, Table Properties, Column Families, Column Families column families, Column Families, Column Families tables, Tables, Table Properties script-based deployment, Script-Based, Script-Based scripting, in HBase Shell, Scripting, Scripting search integration, Search Integration, Search Integration secondary indexes, Dimensions, Secondary Indexes, Secondary Indexes seek operations, compared to transfer operations, Log-Structured Merge-Trees sequential consistency, Nonrelational Database Systems, Not-Only SQL or NoSQL?

pages: 704 words: 182,312

This Is Service Design Doing: Applying Service Design Thinking in the Real World: A Practitioners' Handbook
by Marc Stickdorn , Markus Edgar Hormess , Adam Lawrence and Jakob Schneider
Published 12 Jan 2018

In return, changes to your business model affect the employee and/or customer experience. Expert Tip “Experienced users of the Business Model Canvas will not just use it as a checklist to fill out the boxes. They will use the BMC to design business models that outperform competitors’ with a powerful business model story in which all business model building blocks reinforce each other. Great examples of business model stories are Nespresso, IKEA, or the Nintendo Wii, which succeeded not just based on a great value proposition, but based on a powerful business model.” — Alexander Osterwalder The Business Model Canvas and similar canvases can be used to understand the influence of various options on the employee and customer experience as well as on the business impact.

Running prototyping sessions

Prototyping methods share a common underlying structure. Any prototyping method can be broken down into the following three core activities:

→ Preparation: Prepare prototypes through setting up templates or canvases, scripting and practicing an intended interaction or walkthrough, making physical models, building stages/sets, and/or preparing environments.
→ Use: Use the prepared objects and practiced activities to explore, evaluate, or communicate a design concept.
→ Research: Use research methods while running the use scenarios to capture feedback and data and to generate insights.

Expert Tip: “Make sure you do not get stuck on solving problems that only exist in your prototyping environment and not in the actual service system.

pages: 252 words: 73,131

The Inner Lives of Markets: How People Shape Them—And They Shape Us
by Tim Sullivan
Published 6 Jun 2016

While asymmetric information—when the seller knows more than the buyer—complicates the job of turning the economy into an internet bazaar, eBay and its e-commerce brethren have found many ways to get the market to work reasonably well. And as we’ll see, their successes have provided economists with yet more fodder for their model building and experiments—often from within the companies themselves. E-Commerce Comes of Age Skoll came to Silicon Valley just as Omidyar and others were trying to figure out how to transform the World Wide Web into something that could serve as a platform for transparent market exchange. Part of the challenge was that it was created for an entirely different purpose.

pages: 288 words: 73,297

The Biology of Desire: Why Addiction Is Not a Disease
by Marc Lewis Phd
Published 13 Jul 2015

Researchers have found additional brain changes in systems underlying cognitive control, delayed gratification, and abstract skills like comparing and predicting outcomes and selecting best choices. According to the disease model, all these changes are caused by exposure to drugs of abuse, and they are difficult if not impossible to reverse. Of course the disease model builds on a biological framework, and it does a good job of explaining why some individuals are more vulnerable to addiction than others, based on genetic differences and other dispositional factors. And the cure? Well, there doesn’t seem to be one. Addiction is currently viewed as a chronic disease.

pages: 262 words: 79,790

Final Exam: A Surgeon's Reflections on Mortality
by Pauline W. Chen
Published 1 Jan 2006

The averaged judgment of the two trainees turned out to be slightly more reliable. Poses et al., “House Officers’ Prognostic Judgments.”

107 It is a process: L. L. Emanuel has proposed using a more accurately descriptive asymptotic model of death. Rather than defining two discrete states, life and death, the asymptotic model “builds on the fundamental assertion that both biological life and personhood decline in a continuous fashion rather than as an event.” L. L. Emanuel, “Reexamining Death.”

108 “Perhaps the classification”: Lynn and Harrold, Handbook for Mortals. Joanne Lynn has written that while health care and life expectancy have changed, the language we use to describe illness, treatment, and even payment has become misleading.

pages: 589 words: 69,193

Mastering Pandas
by Femi Anthony
Published 21 Jun 2015

Upon submitting our data to Kaggle, the following results are obtained:

Formula | Kaggle Score
C(Pclass) + C(Sex) + Fare | 0.77033
C(Pclass) + C(Sex) | 0.76555
C(Sex) | 0.76555
C(Pclass) + C(Sex) + Age + SibSp + Parch | 0.76555
C(Pclass) + C(Sex) + Age + Parch + C(Embarked) | 0.78947
C(Pclass) + C(Sex) + Age + SibSp + Parch + C(Embarked) | 0.79426

Random forest

The random forest is an example of a non-parametric model, as are decision trees. Random forests are based on decision trees. The decision boundary is learned from the data itself; it doesn't have to be a line, a polynomial, or a radial basis function. The random forest model builds upon the decision tree concept by producing a large number, or forest, of decision trees. It takes a random sample of the data and identifies a set of features to grow each decision tree. The error rate of the model is compared across sets of decision trees to find the set of features that produces the strongest classification model.
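The sampling idea described above can be illustrated with a small, dependency-free sketch. It is not the book's code and not a full random forest: each "tree" here is only a one-split decision stump, and the toy data, the function names (fit_stump, fit_forest, predict), and the parameter defaults are all assumptions made for illustration. What it does show is the core recipe: draw a bootstrap sample of the rows, pick a random subset of features, grow a tree on that sample, and let the trees vote.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Return (feature, threshold, left_label, right_label) with fewest training errors."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split: always predict the majority class of the sample.
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample rows with replacement
        features = rng.sample(all_features, n_features)  # random subset of features
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                features))
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 0 has small values, class 1 has large values.
ROWS = [[1, 2], [2, 1], [2, 3], [3, 2], [7, 8], [8, 7], [8, 9], [9, 8]]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(ROWS, LABELS)
print(predict(forest, [0, 0]), predict(forest, [10, 10]))
```

In practice one would more likely reach for a library implementation with full-depth trees; the sketch only makes the bootstrap-and-vote mechanics visible.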

pages: 330 words: 77,729

Big Three in Economics: Adam Smith, Karl Marx, and John Maynard Keynes
by Mark Skousen
Published 22 Dec 2006

His economic theory may have been defective, his revolutionary socialism may have been destructive, and Marx himself may have been irascible, but his philosophical analysis of market capitalism has elements of merit and deserves our attention.

Update: Marxists Keep Their Hero Alive and Kicking

Marxism has never made much of an inroad into economics, which emphasizes high theory and econometric model-building. The few Marxists on campus have included Maurice Dobb at Cambridge, Paul Baran at Stanford, and Paul Sweezy at Harvard. Sweezy (1910-2004) was the most fascinating, being the only economist I know who went from laissez-faire to Marxism. (Whittaker Chambers, Mark Blaug, and Thomas Sowell all went in the opposite direction.)

pages: 372 words: 67,140

Jenkins Continuous Integration Cookbook
by Alan Berg
Published 15 Mar 2012

The Job exporter plugin gives Jenkins the ability to export Job-related information into a properties file that can later be picked up for re-use by other Jobs. Reviewing the code, src/main/java/com/meyling/hudson/plugin/job_exporter/ExporterBuilder.java extends hudson.tasks.Builder, whose perform method is invoked when a build is run. The perform method receives the hudson.model.Build object when it is called. The Build instance contains information about the build itself. Calling the build.getBuiltOnStr() method returns a string that contains the name of the node that the build is running on. The plugin uses a number of these methods to discover the information that is later written to a properties file.

pages: 302 words: 73,946

People Powered: How Communities Can Supercharge Your Business, Brand, and Teams
by Jono Bacon
Published 12 Nov 2019

It is also the foundation on which the others are built. More complex communities such as our Champion and Collaborator models incorporate the fundamental principles of the Consumer model. If you want to build a community of customers and fans, this is a good model to start with. MODEL 2: CHAMPIONS The Champions model builds on top of the Consumer model and takes us a step further. Here community members go beyond discussing a shared interest to actively delivering work that champions the success of the community and its members. They can become an army of advocates that support the success of what you and other community members are trying to accomplish.

pages: 342 words: 72,927

Transport for Humans: Are We Nearly There Yet?
by Pete Dyson and Rory Sutherland
Published 15 Jan 2021

In a 1980s cassette deck that might have been the fluidity of the eject mechanism, for instance; and in a Dyson vacuum cleaner, it’s the transparent body revealing the high-tech wizardry inside. If your job is to maximize a one-dimensional target, these look like alchemy. Table 5 summarizes the model.

37 Developed in the 1980s by Professor Noriaki Kano, his model builds on concepts of ‘hygiene factors’ and value enhancement. See Uxness. 2021. What is Kano model. Web page (www.uxness.in/2015/07/kano-model.html).

Table 5. Kano model summary applied to transport design.

Kano model attributes | Transport design example
(a) Must be: Basic and essential needs | Access to seating; Passenger safety; Clean windows
(b) One dimensional: Performance indicators and commodities; more is better | Speed and journey time; Frequency of service and punctuality
(c) Attractive/delight: Delight features providing satisfaction when achieved but minimal dissatisfaction when omitted | Timely journey updates; Free WiFi; Discretionary effort
(d) Indifferent: Essential to the product but unacknowledged by the user | Service lifespan of machinery; Motorway drainage
(e) Reverse: Advanced/luxury features that add complexity some users would rather go without | Overbearing customer service; Journey customization and optional extras

These attractive delight features can be achieved through discretionary effort: when you do things for passengers that they know you didn’t have to do.

pages: 829 words: 186,976

The Signal and the Noise: Why So Many Predictions Fail-But Some Don't
by Nate Silver
Published 31 Aug 2012

Needlessly complicated models may fit the noise in a problem rather than the signal, doing a poor job of replicating its underlying structure and causing predictions to be worse. But how much detail is too much—or too little? Cartography takes a lifetime to master and combines elements of both art and science. It probably goes too far to describe model building as an art form, but it does require a lot of judgment. Ideally, however, questions like Kokko’s can be answered empirically. Is the model working? If not, it might be time for a different level of resolution. In epidemiology, the traditional models that doctors use are quite simple—and they are not working that well.

Too many of its parameters are not known to within an order of magnitude; depending on which values you plug in, it can yield answers anywhere from that we are all alone in the universe to that there are billions and billions of extraterrestrial species. However, the Drake equation has nevertheless been a highly useful lens for astronomers to think about life, the universe, and everything.

90. George E. P. Box and Norman R. Draper, Empirical Model-Building and Response Surfaces (New York: Wiley, 1987), p. 424.

91. “Norbert Wiener,” Wikiquote.org. http://en.wikiquote.org/wiki/Norbert_Wiener.

CHAPTER 8: LESS AND LESS AND LESS WRONG

1. Roland Lazenby, The Show: The Inside Story of the Spectacular Los Angeles Lakers in the Words of Those Who Lived It (New York: McGraw-Hill Professional, 2006).

2.

pages: 341 words: 84,752

All Tomorrow's Parties
by William Gibson
Published 2 Jan 1999

Like the afterimages of the DatAmerica flows are permanent now, retinally ingrained. No light penetrates from the corridor outside-he's blocked every pinhole with black tape-and the old man's halogen is off. He assumes the old man sleeps there, but he's never seen him do it, never heard any sounds that might indicate a transition from model-building to sleep. Maybe the old man sleeps upright on his mat, Gundam in one hand, brush in the other. Sometimes he can hear music from the adjacent cartons, but it's faint, as though the neighbors use earphones. He has no idea how many people live here in this corridor. It looks as though there might be room for six, but he's seen more, and it may be that they shelter here in shifts.

pages: 411 words: 80,925

What's Mine Is Yours: How Collaborative Consumption Is Changing the Way We Live
by Rachel Botsman and Roo Rogers
Published 2 Jan 2010

“Currently parents are spending upward of $20,000 just on the staples of kids’ clothing by the time their child is seventeen. And they’re retiring some fourteen hundred items!” explained Reinhart. The entrepreneurs discovered that the problems plaguing existing online children’s swap sites were twofold: inconvenience and poor quality. The team had already solved the first problem with the original thredUp model, building an easy-to-use interface and prepaid envelope system that made the experience feel similar to Netflix. But ensuring quality was a big concern, especially for children’s clothing. The challenge was to design a reputation system that would encourage members to send only what they themselves would want to receive.

pages: 290 words: 119,172

Beginning Backbone.js
by James Sugrue
Published 15 Dec 2013

Unless you are a team of one developer (and even then it’s debatable), there is a measure of consistency required so that everyone understands how the data model is defined and represented. This is exactly where Backbone comes in, making it more comfortable for developers to deal with data models, build module views, and send events across the application.

Design Patterns for Web Applications

Design patterns are credited with bringing maturity to software development. Ever since the seminal Gang of Four book Design Patterns: Elements of Reusable Object-Oriented Software, which introduced a series of reusable solutions and approaches for building applications, programmers have been using patterns to tame their code base.

pages: 291 words: 81,703

Average Is Over: Powering America Beyond the Age of the Great Stagnation
by Tyler Cowen
Published 11 Sep 2013

These programs will confirm some connections we already believe in, see some connections that we currently do not grasp, and perhaps generate some hypotheses that we do not suspect. Economics is not yet there, but perhaps in the next fifty years such endeavors will supplant the economist’s reliance on theoretical models. The power and quality of data will likely grow more rapidly than the power and quality of our best models. Current model-building in the social sciences is analogous to “grandmaster intuition pre–Deep Blue.” Making models has been a very useful approach and indeed it still is useful because the Deep Blue of the social sciences has yet to arrive. In economics, the early uses of machine intelligence will reinforce our understanding of some basic regularities behind economic phenomena.

pages: 261 words: 86,905

How to Speak Money: What the Money People Say--And What It Really Means
by John Lanchester
Published 5 Oct 2014

It’s an example of the moral underpinnings of economics, the fact that “the study of mankind in the ordinary business of life” takes us deep into questions of value, both economic and moral. One way of talking about what has gone wrong in much of economics, especially in how the subject is taught, is to say that it has stopped engaging with questions like these. A field that began life as a branch of “moral philosophy” has turned into a playground of model building, dominated by inappropriate certainties. The particular nature of these certainties is the second way in which economics has gone wrong. They concern a set of assumptions, tied to a particular dogma about how human beings and markets work. The funny thing is, we’re not all that far away from the situation described by Alfred Marshall, when he made his inaugural lecture as Cambridge professor of political economy, “The Present Position of Economics,” in 1885: The chief fault in English economists at the beginning of the century was not that they ignored history and statistics, but that they regarded man as so to speak a constant quantity, and gave themselves little trouble to study his variations.

pages: 360 words: 85,321

The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling
by Adam Kucharski
Published 23 Feb 2016

The group eventually disbanded in 1987, but Kent would continue to bet on sports for the next two decades. Kent said the division of labor remained much the same: he would come up with the forecasts, and Walters would implement the betting. Kent pointed out that much of the success of his predictions came from the attention he put into the computer models. “It’s the model-building that’s important,” he said. “You have to know how to build a model. And you never stop building the model.” Kent generally worked alone on his predictions, but he did get help with one sport. An economist at a major university on the West Coast came up with football predictions each week. The man was very private about his betting research, and Kent referred to him only as “Professor number 1.”

pages: 309 words: 81,975

Brave New Work: Are You Ready to Reinvent Your Organization?
by Aaron Dignan
Published 1 Feb 2019

Thaler, “Anomalies: The Endowment Effect, Loss Aversion, and Status Quo Bias,” Journal of Economic Perspectives 5, no. 1 (1991), doi:10.1257/jep.5.1.193. “put a man on the moon”: Astro Teller, “Google X Head on Moonshots: 10x Is Easier Than 10 Percent,” Wired, February 11, 2013, www.wired.com/2013/02/moonshots-matter-heres-how-to-make-them-happen. “All models are wrong”: George E. P. Box and Norman R. Draper, Empirical Model-Building and Response Surfaces (New York: John Wiley & Sons, 1987), 440. PART TWO: THE OPERATING SYSTEM human-centric design principles: Gary Hamel, “First, Let’s Fire All the Managers,” Harvard Business Review, December 2011, https://hbr.org/2011/12/first-lets-fire-all-the-managers. “business is to increase its profits”: Milton Friedman, “The Social Responsibility of Business Is to Increase Its Profits,” The New York Times Magazine, September 13, 1970, www.colorado.edu/studentgroups/libertarians/issues/friedman-soc-resp-business.html [inactive].

pages: 321

Finding Alphas: A Quantitative Approach to Building Trading Strategies
by Igor Tulchinsky
Published 30 Sep 2019

• Multiple-hypothesis testing principles imply that the more effort spent sifting through evidence and the more alternatives considered, the lower the likelihood of choosing an optimal model.
• An out-of-sample period is necessary to validate a model’s predictive ability. The longer the out-of-sample period, the higher the confidence in the model but the less in-sample data available to calibrate the model. The optimal ratio of in-sample to out-of-sample data in model building depends on the model’s complexity.

LOOKING BACK

Backtesting involves looking back in time to evaluate how a forecast or trading strategy would have performed historically. Although backtesting is invaluable (providing a window into both the markets and how the alpha would have performed), there are two important points to remember:
• History does not repeat itself exactly.
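
The in-sample versus out-of-sample trade-off described in this excerpt can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the 25% hold-out fraction, the made-up return series, and the deliberately trivial "model" (forecasting the in-sample mean). None of it is taken from the book.

```python
from statistics import mean


def chronological_split(series, out_of_sample_fraction=0.25):
    """Split a time-ordered series into in-sample and out-of-sample parts.

    The most recent observations are held out, so the model never sees
    the data it is later validated on.
    """
    if not 0.0 < out_of_sample_fraction < 1.0:
        raise ValueError("out_of_sample_fraction must be between 0 and 1")
    cut = int(len(series) * (1.0 - out_of_sample_fraction))
    if cut == 0 or cut == len(series):
        raise ValueError("series too short for the requested split")
    return series[:cut], series[cut:]


def mean_abs_error(forecast, actuals):
    """Average absolute gap between one constant forecast and the actuals."""
    return mean(abs(a - forecast) for a in actuals)


# Hypothetical daily returns, oldest first (made-up numbers).
returns = [0.01, -0.02, 0.015, 0.00, 0.005, -0.01, 0.02, -0.005]
in_sample, out_of_sample = chronological_split(returns, 0.25)

# "Calibrate" the simplest possible model on in-sample data only.
forecast = mean(in_sample)

# Compare the fit on data the model saw with data it did not see.
in_sample_error = mean_abs_error(forecast, in_sample)
out_of_sample_error = mean_abs_error(forecast, out_of_sample)
print(in_sample_error, out_of_sample_error)
```

Raising `out_of_sample_fraction` leaves more data for validation but less for calibration, which is the trade-off the excerpt describes.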

pages: 290 words: 85,847

A Brief History of Motion: From the Wheel, to the Car, to What Comes Next
by Tom Standage
Published 16 Aug 2021

More than 5 million attendees saw the exhibit, lining up for hours to take an eighteen-minute ride on benches that moved over the landscape as smoothly and continuously as the model cars seen whizzing through the congestion-free “City of 1960” below. The diorama combined scenes at a variety of different scales and included more than half a million model buildings, a million trees, and fifty thousand cars, including ten thousand moving along a fourteen-lane highway. Speakers built into the benches played a commentary that, as Business Week put it, “unfolds a prophecy of cities, towns and countrysides served by a comprehensive road system.” For Americans emerging from the Great Depression as a new world war was breaking out in Europe, Futurama offered a hopeful vision of a better future.

pages: 253 words: 79,595

The Joy of Less, A Minimalist Living Guide: How to Declutter, Organize, and Simplify Your Life
by Francine Jay

Do the same for books (on assigned shelves), magazines (on a shelf or rack), and electronic and computer equipment (in a special drawer, cabinet, or container). Modules are particularly useful for organizing craft and hobby supplies. Instead of housing them in a common drawer or cabinet, separate the materials by activity: knitting, scrapbooking, painting, model building, jewelry making, et cetera. Assign each activity its own container; clear plastic storage bins work well, as do the heavy cardboard boxes in which reams of paper are sold (cover them with fabric or contact paper to make them more attractive). Deep, rectangular baskets will also do the trick. When you’re ready to engage in a particular hobby, simply retrieve its module and unpack its supplies onto a convenient (clear!)

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences
by Rob Kitchin
Published 25 Aug 2014

By way of illustration, the emerging field of social physics, where physicists and others often make pronouncements about social and spatial processes based on big data analysis, especially relating to cities and the supposed laws that underpin their formation and functions (Bettencourt et al. 2007; Lehrer 2010; Lohr 2013), often wilfully ignores a couple of centuries of social science scholarship, including nearly a century of quantitative analysis and model building. The result is an analysis of cities that is largely reductionist, functionalist and ignores the effects of culture, politics, policy, governance and capital, and a rich tradition of work that has sought to understand how cities operate socially, culturally, politically, and economically (reproducing the same kinds of limitations generated by the quantitative/positivist social sciences in the mid-twentieth century) (Kitchin 2014; Mattern 2013).

pages: 298 words: 95,668

Milton Friedman: A Biography
by Lanny Ebenstein
Published 23 Jan 2007

In a contemporary conference comment, Friedman made reference to the “desultory skirmishing between what have been loosely designated as the National Bureau and the Cowles Commission techniques of investigating business cycles.”8 Robert Solow, who received the Nobel Prize in Economics in 1987, relates an anecdote about the response to Cowles Commission member Lawrence Klein’s Economic Fluctuations in the United States (1950) at a Cowles seminar, which gives an idea of relations between Friedman and the commission: “There was formal discussion. Friedman concluded that the whole econometric model-building enterprise had been shown to be worthless and congratulated the Cowles Commission on its self-immolation.”9 According to a number of historians of economic thought, the relationship between the Cowles Commission and the Department of Economics at Chicago was less than harmonious. Melvin Reder writes that starting in the later 1940s there was a “struggle for intellectual preeminence and institutional control between Friedman, Wallis and their adherents on one side, and the Cowles Commission and its supporters on the other.

pages: 342 words: 90,734

Mysteries of the Mall: And Other Essays
by Witold Rybczynski
Published 7 Sep 2015

The point was not that one approach was better than the other—Learning from Las Vegas allowed that “both kinds of architecture are valid”—but that, historically speaking, ducks are few and far between. Venturi and his co-authors argued that clients were better served by decorated sheds than by dramatically modeled buildings, no matter how exciting. After all, it is the former approach that has produced some of our most memorable public buildings: John Russell Pope’s National Gallery of Art in Washington, D.C., Carrère & Hastings’s New York Public Library, and McKim, Mead & White’s Symphony Hall in Boston—all decorated sheds.

pages: 282 words: 88,320

Brick by Brick: How LEGO Rewrote the Rules of Innovation and Conquered the Global Toy Industry
by David Robertson and Bill Breen
Published 24 Jun 2013

The second came when LEGO had the insight that it must evolve from producing stand-alone toys to creating an entire system of play, with the brick as the unifying element. Long before the first computer software programs were patented, LEGO made the brick backward compatible, so that a newly manufactured brick could connect with an original 1958 brick. Thanks to backward compatibility, kids could integrate LEGO model buildings from one kit with LEGO model cars, light pylons, traffic signs, train tracks, and more from other kits. No matter what the toy, every brick clicked with every other brick, which meant every LEGO kit was expandable. Thus, the LEGO universe grew with the launch of each new toy. An early publicity campaign summed up the company’s capacity for endless play (and limitless sales) thusly: “You can go on and on, building and building.

pages: 309 words: 95,495

Foolproof: Why Safety Can Be Dangerous and How Danger Makes Us Safe
by Greg Ip
Published 12 Oct 2015

Apart from being a difficult person, Minsky had another problem: his theory wasn’t very useful. Economists had done their best to mimic the natural sciences by building models that yielded predictable results. If the price of widgets went up this much, their sales would fall that much. If interest rates rose by x, employment would fall by y. Minskyism was antithetical to such elegant model building. Extrapolating established trends was, he believed, precisely what caused them to collapse. Confidence and credit would grow hand in hand, until some event caused them to break, and the fragility that had grown quietly alongside them over the years would be exposed. When? Minsky didn’t know any better than anyone else.

pages: 355 words: 92,571

Capitalism: Money, Morals and Markets
by John Plender
Published 27 Jul 2015

More recently, the attempt initiated by the American economist Paul Samuelson in the 1940s to emulate the certainty of the physical sciences led economists to dispense completely with human nature in building their models. They also employ a simplifying concept known as ‘the representative agent’, which assumes, in effect, that everyone in the economy is the same. Another blind spot lies in economists’ tendency to downplay the importance of the institutional context of economies in their model building. The result, as the widely respected British economist John Kay has observed, is that their models resemble the completely artificial worlds found in computer games. Robert Lucas, doyen of modern macro-economics, has defended mainstream economists for their failure to foresee the crisis, arguing that economic theory predicts that these events cannot be predicted.

pages: 369 words: 94,588

The Enigma of Capital: And the Crises of Capitalism
by David Harvey
Published 1 Jan 2010

On the other side of the Atlantic, Robert Samuelson, a columnist for the Washington Post, wrote in a somewhat similar vein: ‘Here we have the most spectacular economic and financial crisis in decades … and the one group that spends most of its waking hours analyzing the economy basically missed it.’ Yet the country’s 13,000 or so economists seemed singularly disinclined to engage in ‘rigorous self-criticism to explain their lapses’. Samuelson’s own conclusion was that the economic theorists were too interested in sophisticated forms of mathematical model-building to bother with the messiness of history and that this messiness had caught them out. The Nobel Prize-winning economist and columnist for The New York Times Paul Krugman agreed (sort of!). ‘[T]he economics profession went astray,’ he wrote, ‘because economists, as a group, mistook beauty, clad in impressive-looking mathematics, for truth.’

pages: 324 words: 91,653

The Quantum Thief
by Hannu Rajaniemi
Published 1 Jan 2010

How are you, lover? Pixil’s qupt startles him. It comes with an euphoric burst of joy. Great news. Everybody thought you were adorable. They want you back. I spoke to my mother, and I really think you are just paranoid— He yanks the entanglement ring off and throws it away. It ricochets among the Martian model buildings. The green monster scutters away and hides beneath his bed. He kicks at the cathedral. A part of it dissolves into inert tempmatter, and white dust rises into the air. He continues to break the models, until the floor is full of dust and fragments. He sits among the ruins for a while and tries to figure out how to reassemble them in his head.

Exploring the World of Lucid Dreaming
by Stephen Laberge, Phd and Howard Rheingold
Published 8 Feb 2015

The person who solved automobile repair problems did so by bringing the elements of the problem into his dream and manipulating them until a solution emerged. The chemistry student simply continued working on problems as he would while awake. The following letter is an example of another kind of mental model building, in which the lucid dreamer was able to model a highly abstract concept (note that the dreamer had already been through the preparation and incubation phases): A little over a year ago, I was in a linear algebra class that introduced me to vector spaces. I was having a lot of trouble understanding the topic on more than a superficial level.

pages: 307 words: 88,180

AI Superpowers: China, Silicon Valley, and the New World Order
by Kai-Fu Lee
Published 14 Sep 2018

Those trends often led them into industries crowded with hundreds of near-identical copycats vying for the hot market of the year. As Taobao did to eBay, these impersonators undercut any attempt to charge users by offering their own products for free. The sheer density of competition and willingness to drive prices down to zero forced companies to iterate: to tweak their products and invent new monetization models, building robust businesses with high walls that their copycat competitors couldn’t scale. In a market where copying was the norm, these entrepreneurs were forced to work harder and execute better than their opponents. Silicon Valley prides itself on its aversion to copying, but this often leads to complacency.

pages: 286 words: 95,372

The Fields Beneath: The History of One London Village
by Gillian Tindall
Published 1 Oct 2002

The concept of ‘slum clearance’ began in the nineteenth century, and the original motive was not even altruism or a passion for social equality so much as concern for public health: the ‘dens and courts’ of places like Somers Town were felt to be breeding grounds for disease which might then spread to more salubrious areas. The Metropolitan Association for Improving the Dwellings of the Industrious Classes, which was founded in 1842 and erected its first ‘Model Buildings’ in St Pancras had, it is true, something of the spirit of all subsequent rehousing enterprises, but its critics were quick to point out that in fact it was not catering for the neediest classes. As John Hollinshead wrote in Ragged London (1861), At St Pancras they have done nothing for the worst class in Somers Town and Agars Town, and they have wasted their means on a class who are well able to help themselves … The costermongers, the street hawkers – the industrious poor – are still rotting up their filthy, ill-drained, ill-ventilated courts, while well-paid mechanics, clerks and porters, willing to sacrifice a certain portion of their self-respect, are the constant tenants of all these model dwellings.

Concentrated Investing
by Allen C. Benello
Published 7 Dec 2016

It makes you much more sensitive to getting something generally right, as opposed to a multivariate, 600-line model that can get everything precisely wrong. The younger guys from the model generation figure that if it’s in the model with all the assumptions then it must be right. I believe that model building sometimes distracts from time spent figuring out the key strategic questions that should be addressed to management. He also reads the annual report and a great many transcripts of quarterly earnings calls and presentations. I like to read transcripts. I get a good feel for the management and trends in the business by reading a year’s worth of transcripts.

pages: 340 words: 91,416

Lost in Math: How Beauty Leads Physics Astray
by Sabine Hossenfelder
Published 11 Jun 2018

“In hindsight, it is surprising how much emphasis was put on this naturalness argument,” says Michael. “If I look back, people repeated the same argument, again and again, not really reflecting on it. They were saying the same thing, saying the same thing, for ten years. It is really surprising that this was the main driver for so much of model building. Looking back, I find this strange. I still think naturalness is appealing, but I’m not convinced anymore that this points to new physics at the LHC.” The LHC finished its first run in February 2013, then shut down for an upgrade. The second run at higher energies started in April 2015. Now it is October 2015, and in the coming months we expect to see preliminary results of run two.

The Wood Age: How One Material Shaped the Whole of Human History
by Roland Ennos
Published 18 Feb 2021

When one visits a new town, there is nowhere better to get a feel for the place and a sense of its history: to understand why it is there and why it is laid out in the way it is. Charming and old-fashioned, the museums are a fine testament to civic pride and to the enthusiasm and curiosity of the local people. A treasure trove of everyday bygones—workmen’s tools, sepia photographs of men in suits and bowler hats and women in voluminous skirts and aprons, model buildings and ships, stuffed animals and human skeletons—they bring our ancestors alive to us like nothing else. The museums invariably start with a section on the local geography and geology, with a collection of fossils, before introducing a major display on “Xville in the early Stone Age.” An imaginative diorama depicts Xville thousands of years ago.

The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do
by Erik J. Larson
Published 5 Apr 2021

The fact that conjectures lead to discoveries doesn’t fit with mechanical accounts of science; to the contrary, it contradicts them. But detective work, scientific discovery, innovation, and common sense are all workings of the mind; they are all inferences that AI scientists in search of generally intelligent machines must somehow account for. As you can see, cognitive modeling—building a computer to think, to infer—is puzzling. AI researchers (at least for now) should be most concerned with inference in its everyday context. Why? Because the vast majority of inferences we make are seemingly mundane, like all the multifarious leaps and guesses made in the course of ordinary conversation.

pages: 976 words: 233,138

The Rough Guide to Poland
by Rough Guides
Published 18 Sep 2018

A high-quality event featuring top international performers. Unsound Oct unsound.pl. Week-long celebration of international electronica and avant-garde DJ activity in various venues across town. Szopki competition (Konkurs szopek) Dec. Kraków craftsmen display their szopki (ornate Christmas cribs featuring model buildings and nativity figures) on the main square. The best examples are displayed in the Kraków History Museum. Advent market (Targi Bożonarodzeniowe) Dec. Craft stalls selling jewellery, accessories, woodcarving, speciality foodstuffs, mulled wine and sausages fill the main square. Małopolska The area around Kraków presents an attractive landscape of rolling fields, quiet villages and market towns.

Nowomyśliwska 98 • Daily: Feb & March 10am–4pm; April & Oct 10am–5pm; May, June & Sept 10am–6pm; July & Aug 9am–7pm; Nov–Jan 10am–3pm • 19zł • 609 038 580, baltyckiparkminiatur.pl • Bus #10 from Międzyzdroje, or free golf cart in season Located at the western end of town some 3km from the centre, the open-air Baltic Miniature Park (Bałtycki Park Miniatur) displays model buildings from all over the Baltic region – from Rosenborg Castle in Denmark to St Petersburg’s Cathedral of the Assumption, with loads of famous Polish landmarks in between. With model railways, seasonal flower displays and a lake for model boats, it will keep the family entertained for an hour or so. The bison reserve May–Sept Tues–Sun 10am–6pm; Oct–April Tues–Sat 8am–4pm • 6zł • Head east along ul.

pages: 848 words: 227,015

On the Edge: The Art of Risking Everything
by Nate Silver
Published 12 Aug 2024

A barbecue restaurant in Austin, looking at its sales numbers, could run a regression analysis to adjust for factors like the day of the week, the weather, and if there was a big football game in town. The natural companion to analytic thinking is abstract thinking—that is, trying to derive general rules or principles from the things you observe in the world. Another way to describe this is “model building.” The models can be formal, as in a statistical model or even a philosophical model.[*6] Or they can be informal, as in a mental model, or a set of heuristics (rules of thumb) that adapt well to new situations. In poker, for instance, there are millions of permutations for how a particular hand might play out, and it’s impossible to plan for every one.
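
The barbecue-restaurant regression mentioned at the start of this excerpt can be sketched in a few lines of Python. This is a hedged illustration, not the author's method. The data are invented and noise-free, the variable names are hypothetical, and ordinary least squares is solved with the normal equations and a small hand-written Gaussian elimination so the example needs no third-party libraries.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ordinary_least_squares(rows, targets):
    """Fit targets ~ rows via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented observations: (is_weekend, is_rainy, football_game_in_town, sales).
days = [
    (0, 0, 0, 100.0),
    (1, 0, 0, 140.0),
    (0, 1, 0, 85.0),
    (0, 0, 1, 125.0),
    (1, 1, 0, 125.0),
    (1, 0, 1, 165.0),
    (0, 1, 1, 110.0),
]
# Leading 1.0 is the intercept column (baseline sales on an ordinary day).
rows = [[1.0, w, r, g] for w, r, g, _ in days]
sales = [s for *_, s in days]

baseline, weekend_lift, rain_effect, game_lift = ordinary_least_squares(rows, sales)
print(baseline, weekend_lift, rain_effect, game_lift)
```

Each fitted coefficient estimates one factor's effect on sales with the other factors held fixed, which is what "adjusting for" those factors means here. Real sales data would be noisy, so the fit would be approximate.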

Statistical analysis on past sales patterns can potentially account for this. It is not as easy as it sounds, and there are many ways it can go wrong (essentially, this is the subject of The Signal and the Noise). But nearly all professions in the River, including the more philosophical ones you’ll find Upriver, involve some attempt at model building. The final term in the cognitive cluster, “decoupling,” is probably less familiar. It’s really just the same thought process as applied to philosophical or political ideas. As Sarah Constantin puts it, decoupling is “the ability to block out context…the opposite of holistic thinking. It’s the ability to separate, to view things in the abstract, to play devil’s advocate.”

pages: 227 words: 32,306

Using Open Source Platforms for Business Intelligence: Avoid Pitfalls and Maximize Roi
by Lyndsay Wise
Published 16 Sep 2012

User Guide Tool description Methodology Instructions Use Rights, Disclaimer BI Initiative Selection Detailed selection and analysis of BI initiative to be implemented Enhance/display BI applications Incorporate (or integrate with) new source systems Build/enhance the data warehouse platform Profile/Rapid ROI Key inputs that drive costs and benefits throughout model Build/enhance the BI platform Initiative selection drives cost and benefit calculations and improvements to maturity levels Initiative Benefits Estimates direct and indirect benefits enabled by the solution. Four worksheets IT Labor/Services savings Direct cost savings (IT and business) Business user productivity benefits Revenue growth (Margin) List user-selected initiatives that enable each benefit Analysis/Report Analysis of costs and benefits by type Cash flow ROI, payback period, IRR, NPV Financial graphs Capability/maturity levels (As-Is vs.

pages: 317 words: 106,130

The New Science of Asset Allocation: Risk Management in a Multi-Asset World
by Thomas Schneeweis , Garry B. Crowder and Hossein Kazemi
Published 8 Mar 2010

For the most part, this book does not attempt to depict the results of the most current research on various approaches to asset allocation. In many cases, that research has not undergone a full review or critical analysis and is often based solely on algorithmic based model building. Also, many individuals are simply not aware of or at ease with this current research since their investment background is often rooted in traditional investment books in which much of this “current research” is not included.2

IN THE BEGINNING

It should be of no surprise to investors that the two fundamental directives of asset allocation: (1) estimate what may happen and (2) choose a course of action based on those estimates have been at the core of practitioner and academic debate.

pages: 313 words: 101,403

My Life as a Quant: Reflections on Physics and Finance
by Emanuel Derman
Published 1 Jan 2004

Recently I experienced a burst of recognition while reading the autobiography of Erwin Chargaff, the Vienna-born discoverer of the eponymous base-pairing Chargaff rules that led to Watson and Crick's discovery of the double helical structure of DNA. Chargaff, who disliked the imaginative, shot-in-the-dark, theoretical-physics-like style that characterized Watson and Crick's approach to model building, grew bitter at being asked why he had not made the discovery implied by his rules. In his autobiography he wrote: "Most people are wise and applaud the inevitable; but I, inexplicably, love to be on the losing side." By this route I came to the point where, in 1988, I began to interview for positions at other investment banks.

pages: 471 words: 97,152

Animal Spirits: How Human Psychology Drives the Economy, and Why It Matters for Global Capitalism
by George A. Akerlof and Robert J. Shiller
Published 1 Jan 2009

Brainard, William C., and George L. Perry. 2000. “Making Policy in a Changing World.” In William Brainard and George Perry, eds., Economic Events, Ideas, and Policies: The 1960s and After. Washington, D.C.: Brookings Institution Press, pp. 43–82. Brainard, William C., and James Tobin. 1968. “Pitfalls in Financial Model Building.” American Economic Review 58(2):99–122. Brock, William A., and Steven N. Durlauf. 2003. “Multinomial Choice and Social Interactions.” Unpublished paper, University of Wisconsin-Madison, January 27. Brown, Lucia. 1952. “Individualistic Interiors Mark Dozen Exhibit Homes.” Washington Post, September 7, p.

pages: 349 words: 98,868

Nervous States: Democracy and the Decline of Reason
by William Davies
Published 26 Feb 2019

One of the few indisputable features of quantitative easing is that it benefits the wealthy, as it inflates the price of assets (including real estate), adding to the feeling that this was a conspiracy by elites against ordinary people.3 In what sense were economic technocrats really being objective or apolitical any longer? What is even more bewildering about this whole story is how it combines matters of economic expertise, such as model-building and risk analysis, with matters of the greatest national urgency. One moment we were trusting experts to provide an objective analysis of the world; the next, we were trusting them to act decisively to stave off threats to the basic fabric of civil society. Financial regulators and central banks hire people on the basis of their economic and mathematical skills, but have ended up with far weightier responsibilities.

Lectures on Urban Economics
by Jan K. Brueckner
Published 14 May 2011

Extending the public-sector analysis of chapter 8, section 10.4 addresses a crime-related resource allocation problem: how to divide a city’s police force between its rich and poor neighborhoods. 10.2 The Economic Theory of Crime 10.2.1 A simple model In order to develop a clear picture of how economic factors determine the level of crime in a city, a simple model is useful. This model builds on the work of Gary Becker (1968) but uses elements of a framework sketched by Edward Glaeser (1999). The approach is to focus on the “occupational choices” of individuals, asking whether they become legitimate workers or criminals. After making an occupational choice, criminals also decide on the “intensity” of their criminal activity (the number of crimes to commit per period).

pages: 367 words: 97,136

Beyond Diversification: What Every Investor Needs to Know About Asset Allocation
by Sebastien Page
Published 4 Nov 2020

To improve our forecasts, we must get more specific about the components of our forward- looking views. What do we expect to receive as dividends? What’s our earnings growth forecast? Do we expect significant valuation changes? To answer these questions, we must throw away our envelope and fire up a spreadsheet. The Simplest Valuation-Based Model: Building Blocks The building block model decomposes equity returns into three components: income, growth, and valuation change. I’m partial to it because my first research project in the industry was to backtest it on data for more than 20 countries. In this, I had help from Mark Kritzman, president and CEO of Windham Capital Management and senior partner at State Street Associates.
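
The three-component decomposition named in this excerpt can be written as a short Python sketch. The excerpt does not give the formula, so the additive form below is an assumption (a common first approximation), and the function name and input figures are hypothetical.

```python
def building_block_return(income, growth, valuation_change):
    """Expected equity return as the sum of three building blocks.

    income            -- expected dividend yield, e.g. 0.02 for 2%
    growth            -- expected earnings growth, e.g. 0.05 for 5%
    valuation_change  -- expected annual return from multiple
                         expansion (+) or contraction (-)

    Assumption: the components are simply added, ignoring compounding
    and cross terms, which is reasonable only for small annual rates.
    """
    return income + growth + valuation_change


# Hypothetical inputs: 2% yield, 5% growth, valuations drifting down 1% a year.
expected_return = building_block_return(0.02, 0.05, -0.01)
print(f"{expected_return:.1%}")
```

Keeping the three inputs separate makes each assumption explicit, so a forecast can be challenged component by component rather than as a single number.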

pages: 359 words: 97,415

Vanishing Frontiers: The Forces Driving Mexico and the United States Together
by Andrew Selee
Published 4 Jun 2018

“We really need to know what is the business and market of the drug trade,” including how the groups move billions of dollars in drug profits back to Mexico from the United States each year. We lack this information partly because most of the violence has occurred on the Mexican side of the border, leading to greater attention there, and partly because US agencies that follow domestic crime, such as the FBI and the DEA, do so largely on a law enforcement model, building cases, rather than on an intelligence model that tries to understand the way criminal groups operate broadly. Throughout 2011, as the FBI was zeroing in on Treviño’s Oklahoma horse ranch and the Monterrey fusion center was mapping the Zetas’ structure in northeastern Mexico, Valdés worked closely with John Brennan, then the White House’s chief counterterrorism advisor (and later CIA director), on a project to map the structure and networks of the organized crime groups on both sides of the border.

pages: 335 words: 96,002

WEconomy: You Can Find Meaning, Make a Living, and Change the World
by Craig Kielburger , Holly Branson , Marc Kielburger , Sir Richard Branson and Sheryl Sandberg
Published 7 Mar 2018

Rarely, if ever, do I hear people acknowledge that the downsides of starting a charity are similar to those that occur when starting a business. No model is perfect at launch—that goes for charity, too. There is risk, a steep learning curve, and a huge initial investment of your own time and money. Running a charity means years devoted to intense learnings about the most effective intervention models, building networks among nonprofit partners, and understanding the governance and fiduciary responsibilities. The skill set is enormous. A 2013 survey found that more than two-thirds of people believe there are too many charities raising money for the same cause.5 Why reinvent the wheel when you can build on something with a proven track record?

pages: 332 words: 100,601

Rebooting India: Realizing a Billion Aspirations
by Nandan Nilekani
Published 4 Feb 2016

An app economy India is often said to be beset by the ‘curse of informality’. Nearly three-fourths of all employment is in the informal sector3—millions of farmers tilling tiny patches of land, retailers running cramped shops, women selling vegetables by the roadside. A common solution to ‘formalize’ our economy is to adopt the industrial model, building organized factories staffed by thousands of employees, creating a network of organized retail outlets, and promoting large organized farms, which can run into thousands of hectares. This is the route the western world took in the twentieth century. But is this really a twenty-first century plan for India?

pages: 419 words: 102,488

Chaos Engineering: System Resiliency in Practice
by Casey Rosenthal and Nora Jones
Published 27 Apr 2020

Teams ultimately learn more from facilitated experiment design and resulting comparison of their mental models. 1 Lisanne Bainbridge, “Ironies of Automation,” Automatica, Vol. 19, No. 6 (1983). 2 Sidney Dekker, Foundations of Safety Science: A Century of Understanding Accidents and Disasters (Boca Raton: Taylor & Francis, CRC Press, 2019). 3 G. E. P. Box, “Robustness in the Strategy of Scientific Model Building,” in R. L. Launer and G. N. Wilkinson (eds.), Robustness in Statistics (New York: Academic Press, 1979), pp. 201–236. 4 Ali Basiri et al., “Automating Chaos Experiments in Production,” International Conference on Software Engineering (ICSE), 2019, https://arxiv.org/abs/1905.04648. 5 Robert R.

pages: 353 words: 97,029

How Big Things Get Done: The Surprising Factors Behind Every Successful Project, From Home Renovations to Space Exploration
by Bent Flyvbjerg and Dan Gardner
Published 16 Feb 2023

There is nothing as practical as a theory that is correct. Regression to the mean has been proven mathematically for many types of statistics, and it is highly useful in health, insurance, and schools, on factory floors, in casinos, and in risk management; e.g., for flight safety. Much of statistics and statistical modeling builds on regression to the mean, including the law of large numbers, sampling, standard deviations, and conventional tests of statistical significance. Anyone who has done a basic statistics course has been trained in regression to the mean, whether they are aware of it or not. But regression to the mean presupposes that a population mean exists.

pages: 898 words: 236,779

Digital Empires: The Global Battle to Regulate Technology
by Anu Bradford
Published 25 Sep 2023

Despite their multifaceted global influence, the ethos of these US companies is far from global. These companies continue to reflect the “Californian ideology” and the American techno-libertarian instincts, exporting within their products and services the values of the US market-driven regulatory model. Building on the discussion of the American, Chinese, and European regulatory models in Part I, and the conflicts among those models in Part II, the book now turns to examine these models’ relative global influence. This last part shows how the digital empires have not only engaged in regulatory battles with each other, but also competed for global prominence.

It still today airs news in 27 languages to 23 countries, including Afghanistan, Iran, Russia, and Pakistan.78 Projects such as RFE/RL were designed to counter Communist propaganda and expose people living under Communism to reform ideas with greater knowledge of the everyday freedoms that Western citizens enjoyed.79 Thus, even before the internet era, the US view was that access to free information offered a path toward democracy around the world. The US’s efforts to globalize the anti-censorship aspect of its regulatory model builds on these other democracy promotion initiatives. Starting in the early 2000s in the aftermath of the 9/11 terrorist attacks, there were efforts in Congress to promote American internet freedoms abroad. In 2001–2003, several bipartisan bills were introduced with the goal of exporting democracy, including a bill calling for a global strategy to “defeat Internet jamming and censorship.”80 Members of Congress repeatedly expressed their view that the internet was a tool for promoting individual rights and democracy abroad, and that the US government should increase its funding dedicated to supporting internet freedoms abroad.

pages: 519 words: 102,669

Programming Collective Intelligence
by Toby Segaran
Published 17 Dec 2008

, Learning from Clicks, Learning from Clicks, Setting Up the Database, Feeding Forward, Training with Backpropagation, Training Test, Training Test artificial, Learning from Clicks, Learning from Clicks, Setting Up the Database, Feeding Forward, Training with Backpropagation, Training Test, Training Test backpropagation, Training with Backpropagation connecting to search engine, Training Test designing click-training network, Learning from Clicks feeding forward, Feeding Forward setting up database, Setting Up the Database training test, Training Test neural network classifier, Exercises neural networks, Neural Networks, Neural Networks, Neural Networks, Neural Networks, Training a Neural Network, Training a Neural Network, Training a Neural Network, Strengths and Weaknesses, Strengths and Weaknesses backpropagation, and, Training a Neural Network black box method, Strengths and Weaknesses combinations of words, and, Neural Networks multilayer perceptron network, Neural Networks strengths and weaknesses, Strengths and Weaknesses synapses, and, Neural Networks training, Training a Neural Network using code, Training a Neural Network news sources, A Corpus of News newsfeatures.py, Selecting Sources, Downloading Sources, Downloading Sources, Downloading Sources, Converting to a Matrix, Using NumPy, The Algorithm, Displaying the Results, Displaying the Results, Displaying by Article, Displaying by Article getarticlewords function, Downloading Sources makematrix function, Converting to a Matrix separatewords function, Downloading Sources shape function, The Algorithm showarticles function, Displaying the Results, Displaying by Article showfeatures function, Displaying the Results, Displaying by Article stripHTML function, Downloading Sources transpose function, Using NumPy nn.py, Setting Up the Database, Setting Up the Database, Setting Up the Database, Setting Up the Database searchnet class, Setting Up the Database, Setting Up the Database, Setting Up the Database, 
Setting Up the Database generatehiddennode function, Setting Up the Database getstrength method, Setting Up the Database setstrength method, Setting Up the Database nnmf.py, The Algorithm difcost function, The Algorithm non-negative matrix factorization (NMF), Supervised versus Unsupervised Learning, Clustering, Non-Negative Matrix Factorization, Non-Negative Matrix Factorization, Non-Negative Matrix Factorization, Using Your NMF Code factorization, Supervised versus Unsupervised Learning goal of, Non-Negative Matrix Factorization update rules, Non-Negative Matrix Factorization using code, Using Your NMF Code normalization, Normalization Function numerical predictions, Building Price Models numpredict.py, Building a Sample Dataset, Building a Sample Dataset, Defining Similarity, Defining Similarity, Defining Similarity, Defining Similarity, Subtraction Function, Subtraction Function, Weighted kNN, Weighted kNN, Cross-Validation, Cross-Validation, Cross-Validation, Heterogeneous Variables, Scaling Dimensions, Optimizing the Scale, Optimizing the Scale, Uneven Distributions, Estimating the Probability Density, Graphing the Probabilities, Graphing the Probabilities, Graphing the Probabilities createcostfunction function, Optimizing the Scale createhiddendataset function, Uneven Distributions crossvalidate function, Cross-Validation, Optimizing the Scale cumulativegraph function, Graphing the Probabilities distance function, Defining Similarity dividedata function, Cross-Validation euclidian function, Defining Similarity gaussian function, Weighted kNN getdistances function, Defining Similarity inverseweight function, Subtraction Function knnestimate function, Defining Similarity probabilitygraph function, Graphing the Probabilities probguess function, Estimating the Probability Density, Graphing the Probabilities rescale function, Scaling Dimensions subtractweight function, Subtraction Function testalgorithm function, Cross-Validation weightedknn function, Weighted 
kNN wineprice function, Building a Sample Dataset wineset1 function, Building a Sample Dataset wineset2 function, Heterogeneous Variables NumPy, Using NumPy, Using NumPy, Simple Usage Example, NumPy, Installation on Other Platforms, Installation on Other Platforms installation on other platforms, Installation on Other Platforms installation on Windows, Simple Usage Example usage example, Installation on Other Platforms using, Using NumPy O online technique, Strengths and Weaknesses Open Web APIs, Open APIs optimization, Optimization, Group Travel, Representing Solutions, Representing Solutions, Representing Solutions, Representing Solutions, The Cost Function, The Cost Function, The Cost Function, Random Searching, Hill Climbing, Simulated Annealing, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Optimizing for Preferences, Optimizing for Preferences, The Cost Function, The Cost Function, Network Visualization, Network Visualization, Counting Crossed Lines, Drawing the Network, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises, Optimizing the Scale, Exercises, Optimization, Optimization annealing starting points, Exercises cost function, The Cost Function, Optimization exercises, Exercises genetic algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms crossover or breeding, Genetic Algorithms generation, Genetic Algorithms mutation, Genetic Algorithms population, Genetic Algorithms genetic optimization stopping criteria, Exercises group travel cost function, Exercises group travel planning, Group Travel, Representing Solutions, Representing Solutions, Representing Solutions, The Cost Function, The Cost Function car rental period, The Cost Function departure time, Representing Solutions price, Representing Solutions time, Representing Solutions waiting time, The Cost Function hill climbing, Hill Climbing line 
angle penalization, Exercises network visualization, Network Visualization, Counting Crossed Lines, Drawing the Network counting crossed lines, Counting Crossed Lines drawing networks, Drawing the Network layout problem, Network Visualization network vizualization, Network Visualization pairing students, Exercises preferences, Optimizing for Preferences, Optimizing for Preferences, The Cost Function, The Cost Function cost function, The Cost Function running, The Cost Function student dorm, Optimizing for Preferences random searching, Random Searching representing solutions, Representing Solutions round-trip pricing, Exercises simulated annealing, Simulated Annealing where it may not work, Genetic Algorithms optimization.py, Group Travel, Representing Solutions, Representing Solutions, The Cost Function, Random Searching, Hill Climbing, Simulated Annealing, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Optimizing the Scale annealingoptimize function, Simulated Annealing geneticoptimize function, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms elite, Genetic Algorithms maxiter, Genetic Algorithms mutprob, Genetic Algorithms popsize, Genetic Algorithms getminutes function, Representing Solutions hillclimb function, Hill Climbing printschedule function, Representing Solutions randomoptimize function, Random Searching schedulecost function, The Cost Function P PageRank algorithm, Real-Life Examples, The PageRank Algorithm pairing students, Exercises Pandora, Real-Life Examples parse tree, Programs As Trees Pearson correlation, Hierarchical Clustering, Viewing Data in Two Dimensions hierarchical clustering, Hierarchical Clustering multidimensional scaling, Viewing Data in Two Dimensions Pearson correlation coefficient, Pearson Correlation Score, Pearson Correlation Coefficient, Pearson Correlation Coefficient code, Pearson Correlation Coefficient Pilgrim, Mark, 
Universal Feed Parser polynomial transformation, The Kernel Trick poplib, Exercises population, Genetic Algorithms, What Is Genetic Programming?, Creating the Initial Population, Genetic Algorithms diversity and, Creating the Initial Population Porter Stemmer, Finding the Words on a Page Pr(Document), Exercises prediction markets, Real-Life Examples price models, Building a Sample Dataset, Building a Sample Dataset, k-Nearest Neighbors, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises building sample dataset, Building a Sample Dataset eliminating variables, Exercises exercises, Exercises item types, Exercises k-nearest neighbors (kNN), k-Nearest Neighbors laptop dataset, Exercises leave-one-out cross-validation, Exercises optimizing number of neighbors, Exercises search attributes, Exercises varying ss for graphing probability, Exercises probabilities, Calculating Probabilities, Starting with a Reasonable Guess, Probability of a Whole Document, A Quick Introduction to Bayes' Theorem, Combining the Probabilities, Graphing the Probabilities, Conditional Probability assumed probability, Starting with a Reasonable Guess Bayes' Theorem, A Quick Introduction to Bayes' Theorem combining, Combining the Probabilities conditional probability, Calculating Probabilities graphing, Graphing the Probabilities of entire document given classification, Probability of a Whole Document product marketing, Other Uses for Learning Algorithms public message boards, Filtering Spam pydelicious, Simple Usage Example, Simple Usage Example, pydelicious installation, Simple Usage Example usage example, Simple Usage Example pysqlite, Building the Index, Persisting the Trained Classifiers, Installation on All Platforms, Installation on All Platforms, pysqlite, Simple Usage Example importing, Persisting the Trained Classifiers installation on other platforms, Installation on All Platforms installation on Windows, Installation on All Platforms usage example, 
Simple Usage Example Python, Style of Examples, Python Tips advantages of, Style of Examples tips, Python Tips Python Imaging Library (PIL), Drawing the Dendrogram, Python Imaging Library, Installation on Windows, Installation on Windows, Installation on Windows installation on other platforms, Installation on Windows usage example, Installation on Windows Windows installation, Installation on Windows Python, genetic programming and, Programs As Trees, Programs As Trees, Representing Trees in Python, Building and Evaluating Trees, Displaying the Program building and evaluating trees, Building and Evaluating Trees displaying program, Displaying the Program representing trees, Representing Trees in Python traversing complete tree, Programs As Trees Q query layer, Design of a Click-Tracking Network querying, Querying, Querying query function, Querying R radial-basis function, The Kernel Trick random searching, Random Searching random-restart hill climbing, Hill Climbing ranking, What's in a Search Engine?

pages: 329 words: 106,831

All Your Base Are Belong to Us: How Fifty Years of Video Games Conquered Pop Culture
by Harold Goldberg
Published 5 Apr 2011

Thinking the movies were real because the national news was also presented in black-and-white, he hid behind the couch in utter terror, positive the scary monster was going to get him. It was only when his mother bought him a Godzilla model kit that he realized the monstrosity wasn’t going to harm him; he felt that something he made would never turn on him. Yet his mother, Beverly, had ambivalent feelings as his life beyond school became a mix of model building, board games like Panzer Blitz, and the strategy game of many future game designers, Go. “You’ll never amount to anything if you keep that up,” she would say. But he kept on building, graduating to bigger things, constructing dioramas and model trains. His mother would just shrug. And he loved robots, not just those in his favorite sci-fi movies, like 2001: A Space Odyssey and Star Wars, but toys you could control remotely.

pages: 358 words: 106,729

Fault Lines: How Hidden Fractures Still Threaten the World Economy
by Raghuram Rajan
Published 24 May 2010

Subjects completed an average of 10.6 Bionicles when the completed models were left standing in front of them and only 7.2 when the completed ones were dismantled in front of their eyes. Thus they continued to make Bionicles for lower wages when the experiment was structured to give the work more meaning. Seeing the fruits of your labor, even in something as trivial as model building, seems important for motivation!3 In some jobs, it is very hard to see the effects of one’s work. On an assembly line, a worker is just one cog in a huge production machine, and her role in the final product may be small. No wonder modern management techniques try to make each worker feel important both individually and as part of a team: the Japanese kaizen system of continuous improvement, for example, involves all workers in making changes to enhance productivity, no matter how small the changes might be.

pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
by Eric Siegel
Published 19 Feb 2013

See deception, predicting life insurance companies, PA for lift Lindbergh, Charles LinkedIn friendships, predicting job skills, predicting Linux operating systems LiveJournal Lloyds TSB loan default risks, predicting location data logistic regression London Stock Exchange Los Angeles Police Department Lotti, Michael Loukides, Mike love, predicting Lynyrd Skynyrd (band) M MacDowell, Andie machine learning about courses on in crime prediction data preparation phase data preparation phase for decision trees in induction and induction vs. deduction learning data learning from mistakes in learning machines, building overlearning predictive models, building with silence, concept of testing and validating data univariate vs. multivariate models See also Watson computer Jeopardy! challenge machine risk macroscopic risks Mac vs. Windows users Madrigal, Alexis Magic 8 Ball toy marketing and advertising banner ads and consumer behavior mouse clicks and consumer behavior PA for targeting direct marketing marketing models do-overs in messages, creative design for Persuasion Effect, The quantum humans, influencing response uplift modeling marketing segmentation, decision trees and marriage and divorce, predicting Mars Climate Orbiter Martin, Andres D.

Nation-Building: Beyond Afghanistan and Iraq
by Francis Fukuyama
Published 22 Dec 2005

The authors of the chapters on Iraq (Diamond, Mendelson Forman, and Dobbins) have criticized the Bush administration for failing to provide adequate force levels to stabilize that country in the aftermath of active combat. I, by contrast, argue in the introductory chapter that the Afghan light-footprint model builds early local ownership and is more sustainable on the part of foreign donors and taxpayers. Flournoy notes the importance of maintaining long-term political support for nation-building operations, which is always easier to do when taxpayers do not believe they are financing open-ended development projects.

pages: 312 words: 35,664

The Mathematics of Banking and Finance
by Dennis W. Cox and Michael A. A. Cox
Published 30 Apr 2006

In the case of portfolio analysis this would include market sentiment, the global economy and local factors such as interest rates and currency movements. What will also be available is historic knowledge and management’s judgement to supplement any internally derived estimates. There are clearly uncertainties related to any prediction, or model building, which should be based on stable foundations. A number of factors may affect the VaR. Many of these are effectively time effects which show movements in the key variables that impact the specific VaR. Consequently, showing the time trend inherent in the movement of the VaR tends to show additional information to management.

pages: 379 words: 109,612

Is the Internet Changing the Way You Think?: The Net's Impact on Our Minds and Future
by John Brockman
Published 18 Jan 2011

The Royal Society, founded two decades after Galileo’s death, chose as their motto Nullius in verba (“On the authority of no one”), a principle strikingly at variance with the pre-Gutenberg world. The assumptions (e.g., I should be free to think about and question anything), methods (experimentation, statistical inference, model building), and content (evolutionary biology, quantum mechanics, the computational Theory of Mind) of modern thought are unimaginably different from those held by our ancestors living before Gutenberg. All this—to simplify slightly—because of a drop in the cost of producing books. So what is happening to us, now that the Internet has engulfed us?

pages: 355 words: 63

The Elusive Quest for Growth: Economists' Adventures and Misadventures in the Tropics
by William R. Easterly
Published 1 Aug 2002

Oxford: Oxford University Press. World Bank. 1994c. “Lao People’s Democratic Republic.” Country Economic Memorandum. Report 12554. March 24. World Bank. 1995a. Bureaucrats in Business. Oxford: Oxford University Press. World Bank. 1995b. Latin America After Mexico: Quickening the Pace. Washington, D.C. World Bank. 1995c. RMSM-X Model Building Reference Guide. Washington, D.C. July. World Bank. 1996a. Uganda: The Challenge of Growth and Poverty Reduction. Washington, D.C. World Bank. 1996b. Bangladesh Country Economic Memorandum. Washington, D.C. Report 15900-BD. World Bank. 1997a. Croatia: Beyond Stabilization. Washington, D.C. World Bank. 1997b.

pages: 416 words: 106,582

This Will Make You Smarter: 150 New Scientific Concepts to Improve Your Thinking
by John Brockman
Published 14 Feb 2012

Stuart Firestein The Name Game Even words that, like “gravity,” seem well settled may lend more of an aura to an idea than it deserves. Seth Lloyd Living Is Fatal People are bad at probability on a deep, intuitive level. Garrett Lisi Uncalculated Risk We are afraid of the wrong things, and we are making bad decisions. Neil Gershenfeld Truth Is a Model Building models is . . . a never-ending process of discovery and refinement. Jon Kleinberg E Pluribus Unum The challenge for a distributed system is to achieve this illusion of a single unified behavior in the face of so much underlying complexity. Stefano Boeri A Proxemics of Urban Sexuality Even the warmest and most cohesive community can rapidly dissolve in the absence of erotic tension.

pages: 335 words: 111,405

B Is for Bauhaus, Y Is for YouTube: Designing the Modern World From a to Z
by Deyan Sudjic
Published 17 Feb 2015

In 1939, Wallace Harrison’s Trylon and Perisphere provided a soaring landmark, reflecting a world still lost in adolescent wonder at nylon, chrome, motorcars and air conditioning. Norman Bel Geddes designed Futurama, the enormous General Motors display, which proudly claimed to be the city of tomorrow – its 500,000 scale model buildings, its one million trees and 50,000 cars, 10,000 of which actually moved, defining the idea of the modern world in the popular imagination. The models were wonderful, but they reduced the individual to the scale of an ant, paving the way for Moses to start driving expressways through the Bronx, and demolishing swathes of Manhattan for the building of the Lincoln Center during the 1940s and 1950s.

pages: 432 words: 106,612

Trillions: How a Band of Wall Street Renegades Invented the Index Fund and Changed Finance Forever
by Robin Wigglesworth
Published 11 Oct 2021

“While we get many queries regarding PSP, I’ve yet to see an armored car drive up to our door with money in it to invest,” LeBaron observed in November 1973.19 That led a columnist for Pensions & Investments, an industry magazine, to give Batterymarch a “Dubious Achievement Award,” for its endurance in touting its index fund for an entire year without winning a single client.20 Like a good sport, LeBaron went to P&I’s offices to pick up his award, framed the certificate, and hung it in his office.21 The San Francisco–based investment management arm American Express was also at the time in the process of setting up an index fund—advised by William Sharpe at nearby Stanford—but progress was slow and success uncertain. For a while, it seemed like the Field of Dreams model—build it and they will come—might not actually work for the handful of companies willing to explore a new frontier of finance. “We were renegades,” Sinquefield recalls. But the renegades would eventually prove successful, and their invention would ultimately humiliate many of the industry luminaries who had long heaped scorn upon them.

Capital Ideas Evolving
by Peter L. Bernstein
Published 3 May 2007

Like an engineer who looks across a river and begins to design in his mind a method to cross that river, Sharpe is looking for ways to help individual investors get from here to there, from a miasma of self-defeating decisions into an environment where they know how to analyze the investment problem and where to seek solutions to it. His views on CAPM follow, but first, a brief review of what the model is all about. In essence, the model builds up from Harry Markowitz’s key notion of diversification: The risk of a portfolio is less than the risk of all the assets of which it is composed. Even a portfolio composed of highly risky assets would not be a risky portfolio if the returns on the individual assets in the portfolio had low levels of correlation with one another.
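The diversification claim in the excerpt above can be illustrated with the standard two-asset portfolio variance formula. This is a sketch under stated assumptions, not code from the book: the function name, the 30% volatilities, and the equal weights are hypothetical values chosen for illustration.

```python
import math


def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio.

    w1      weight of asset 1 (asset 2 gets 1 - w1)
    sigma1  volatility of asset 1
    sigma2  volatility of asset 2
    rho     correlation between the two assets' returns
    """
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(variance)


if __name__ == "__main__":
    # Two hypothetical risky assets, each with 30% volatility, held 50/50.
    for rho in (1.0, 0.5, 0.0):
        vol = portfolio_volatility(0.5, 0.30, 0.30, rho)
        print(f"correlation {rho:+.1f}: portfolio volatility {vol:.4f}")
```

With perfect correlation the portfolio is exactly as risky as its components. As the correlation falls, portfolio volatility drops below that of either asset, which is the effect the passage attributes to Markowitz.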

pages: 321 words: 113,564

AI in Museums: Reflections, Perspectives and Applications
by Sonja Thiel and Johannes C. Bernhardt
Published 31 Dec 2023

The Example Application of Genetic Algorithm for the Framework of Cultural and Creative Brand Design in Tamsui Historical Museum. Soft Computing 22 (8), 2527–2545. https://doi.org/10.1007/s00500-017-2507-9. D’Orazio, Dario/Montoschi, Federico et al. (2020). Acoustic Comfort in Highly Attended Museums: A Dynamical Model. Building and Environment, 183. https://doi.org/10.1016/j.buildenv.2020.107176. Gao, Yuan (2021). Forecast Model of Perceived Demand of Museum Tourists Based on Neural Network Integration. Neural Computing & Applications 33 (2), 625–35. https://doi.org/10.1007/s00521-020-05012-4. Garzia, Fabio (2022).

pages: 390 words: 109,438

Into the Raging Sea
by Rachel Slade
Published 4 Apr 2018

See Tropical Storm Erika Essex Junto, 139 Estonia (ship), 91 European Centre for Medium-Range Weather Forecasts (ECMWF), 353–54 Exxon Valdez, 266 families, 266 Facebook page of, 267, 278 Glanfield’s gifts to, 278 Marine Board hearings and, 285–86, 321, 326 notification of, 229–39 reaction to end of search and rescue, 267 spiritual connection with loved ones, 345–49 TOTE’s settlement with, 288, 357 Fantome (schooner), 51 Fawcett, Keith, 296 Fisker-Andersen, Jim, 78, 118, 300–301 “flags of convenience,” 142 Florida, 124, 125, 203, 351 Franklin, James, 51–52 free surface effect, 281 fuel, 115–16, 148, 257 Fukushima nuclear plant meltdown, 170 George III, King of England, 138 Georgia, 203 Glanfield, John, 82–83, 85–86, 89, 92, 93, 277–82, 314 on downflooding angle, 278–79, 281–82 model-building hobby of, 277–78 tour of El Faro, 295 Global Forecast System, 353 global warming, 116 Gloria, Hurricane, 313 Gold Rush, 255 Gomorrah (Saviano), 32–33 Gonzales, Laurence, 273 GPS systems, 16, 18, 181, 185, 210 Great Bahama Bank, 22–23, 103 Great Inagua (GI), 206, 215–20, 248 Great Lakes, 272–73 Great Land (ship), 233 Green, Robert, 39, 40, 285, 347–48 Greene, Philip, 119–20, 302 Davidson interviewed by, 299 Marine Board hearing testimony of, 288–94 Griffin, Keith, 151, 180 Guatemala, 51 Gulf of Aden, 34 Gulf of Mexico, 23, 51, 132 Gulfstream IV, 52 Haiyan, Typhoon, 53 Haley, Paul, 32, 33, 153 Hamilton, Alexander, 139, 140, 254 Hamm, Frank, 102–5, 178, 195, 197–98, 285 Davidson’s encouragement of, 341–43 Davidson’s steering commands to, 186, 189, 190, 191, 192, 193 health problems of, 102 helmet of found, 347 Shultz’s discussion with, 126–27 Hamm, Rochelle, 102–3, 285, 347, 359 Hanjin, 34 Harvey, Hurricane, 351 Hawaii, 170 hearings.

pages: 292 words: 107,998

Cities in the Sky: The Quest to Build the World's Tallest Skyscrapers
by Jason M. Barr
Published 13 May 2024

But here was a regional governmental entity producing building space to earn a profit. More importantly, it showed, over time, that placemaking via record-breaking skyscrapers was a viable option for cities, as the Twin Towers became instant icons of the Manhattan skyline. Just as important was their economic success, which created a new model: Build a record breaker with state support. If need be, fill it up with government agencies (or state-owned businesses outside the United States), give it time for neighborhood growth to kick in, and reap the returns. That it took till 1998 for this strategy to surface in Asia—starting with the Petronas Towers in Kuala Lumpur—was due to the time Asia needed to catch up with its economic development and infrastructure.

pages: 440 words: 117,978

Cuckoo's Egg
by Clifford Stoll
Published 2 Jan 1989

Or we could disable Joe’s account, and see if our troubles ended. My thoughts about the hacker were sidetracked when I found a note in my office: the astronomy group needed to know how the quality of the telescope’s images degraded if they loosened the specifications for the mirrors. This meant an evening of model building, all inside the computer. I wasn’t officially working for them anymore, but blood’s thicker than water … by midnight, I’d plotted the graphs for them. The next morning, I eagerly explained my suspicions about a hacker to Dave Cleveland, “I’ll bet you cookies to doughnuts it’s a hacker.” Dave sat back, closed his eyes, and whispered, “Yep, cookies for sure.”

pages: 492 words: 118,882

The Blockchain Alternative: Rethinking Macroeconomic Policy and Economic Theory
by Kariappa Bheemaiah
Published 26 Feb 2017

As per the Bank of England, 65% of bank lending is directed to residential mortgages while only 14% is directed to non-real estate business creation (Bank of England, 2012). 10 See the cost of printing money at: http://www.federalreserve.gov/faqs/currency_12771.htm 11 The New Keynesian model is the most popular alternative to the real business cycle theory among mainstream economists and policymakers. Whereas the real business cycle model features monetary neutrality and emphasizes that there should be no active stabilization policy by governments, the New Keynesian model builds in friction that generates monetary non-neutrality and gives rise to a welfare justification for activist economic policies (Sims, 2012). 12 This is known as the net interest margin (NIM). NIM = (Investment Returns – Interest Expenses) / Average Earning Assets. 13 Note: The authors systematically refer to their model in terms of the CBDC.
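The net interest margin formula quoted in the footnote above can be worked through with a small sketch. The function name and all figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def net_interest_margin(investment_returns, interest_expenses, average_earning_assets):
    """NIM = (Investment Returns - Interest Expenses) / Average Earning Assets."""
    if average_earning_assets <= 0:
        raise ValueError("average_earning_assets must be positive")
    return (investment_returns - interest_expenses) / average_earning_assets


if __name__ == "__main__":
    # Hypothetical bank: earns 50 on its assets, pays 20 in interest,
    # and holds 1,000 in average earning assets (same currency units).
    nim = net_interest_margin(50.0, 20.0, 1_000.0)
    print(f"NIM = {nim:.2%}")
```

In this hypothetical case the margin is 30 divided by 1,000, or 3% of average earning assets.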

pages: 352 words: 120,202

Tools for Thought: The History and Future of Mind-Expanding Technology
by Howard Rheingold
Published 14 May 2000

But if you wanted to plot ten thousand points on a line, or turn a list of numbers into a graphic model of airflow patterns over an airplane wing, you wouldn't want data processing or batch processing. You would want modelling -- an exotic new use for computers that the aircraft designers were pioneering. All Licklider sought, at first, was a mechanical servant to take care of the clerical and calculating work that accompanied model building. Not long after, however, he began to wonder if computers could help formulate models as well as calculate them. When he attained tenure, later that same year, Licklider decided to join a consulting firm near Cambridge named Bolt, Beranek & Newman. They offered him an opportunity to pursue his psychoacoustic research -- and a chance to learn about digital computers.

pages: 457 words: 112,439

Zero History
by William Gibson
Published 6 Sep 2010

Part of the fuselage of a model plane: curved, streamlined, its upper surface yellow, dotted with brown. She bent for a closer look, saw a miniature leopard print, on plastic. “Don’t touch. Stings.” “What is it?” “Taser.” “A Taser?” “Heidi’s. Brought it from Los Angeles by accident, in her bag of Airfix parts. Swept it blindly up with her model-building bumf, when she was well pissed.” “TSA didn’t notice it?” “I hate to break this to you,” he said, feigning grave seriousness, “but that’s actually been known to happen. TSA not noticing the odd thing. Shocking, I know …” “But where would she even get it?” “America? But contrary to the saying, what happens in Vegas evidently doesn’t always stay there.

pages: 426 words: 117,775

The Charisma Machine: The Life, Death, and Legacy of One Laptop Per Child
by Morgan G. Ames
Published 19 Nov 2019

Teachers, he said, should be “struck by the level of intellectual effort that the children were putting into this activity [of playing video games] and the level of learning that was taking place, a level that seemed far beyond that which had taken place just a few hours earlier in school.”95 The precision and technical mastery required by boys’ model-building contests in the early to mid-twentieth century were transferred onto these virtual games, and their guns, cars, planes, and physical competitions reflected the concerns and sensibilities of those boyhood worlds—and almost always featured a male protagonist.96 In the last decade, toy and technology manufacturers in the United States and around the world have broadened their markets and have ostensibly started to target some toys toward a more gender-neutral audience.

pages: 421 words: 120,332

The World in 2050: Four Forces Shaping Civilization's Northern Future
by Laurence C. Smith
Published 22 Sep 2010

Kirilenko, “Climate Change and Food Stress in Russia: What If the Market Transforms as It Did during the Past Century?” Climatic Change 86 (2008): 123-150. 318 There’s more to it than just temperature and rain. A key issue is the so-called CO2 fertilization effect. Plants like CO2, so having more of it in the air tends to increase crop yields. Most agro-climate models build in a hefty benefit for this, based on early greenhouse experiments using enclosed chambers. This enables the models to offset a large share of the damages of summer heat and drought, owing to the anticipated fertilizing benefit from elevated CO2 levels. However, more realistic experiments staged outdoors, using blowers over actual farm fields, show a much lower fertilization benefit.

pages: 755 words: 121,290

Statistics hacks
by Bruce Frey
Published 9 May 2006

magic number, lotteries and MANOVA (multivariate analysis of variance) MCAT (Medical College Admission Test) mean [See also standard error of the mean] ACT calculating Central Limit Theorem central tendency and cut score and 2nd defined 2nd effect size and linear regression and normal curve and 2nd normal distribution precision of predicting test performance 2nd regression toward 2nd 3rd T scores z score 2nd 3rd measurement [See also standard error of measurement] t tests asking questions categorical converting raw scores defined effect of increasing sample size Gott's Principle graphs and improving test scores levels of 2nd normal distribution percentile ranks precise predicting with normal curve probability characteristics reliability of standardized scores 2nd testing fairly validity of 2nd 3rd measures of central tendency median central tendency and 2nd 3rd defined normal curve and medical decisions Michie, Donald Microsoft Excel DATAS software histograms predicting football games Milgram, Stanley 2nd 3rd 4th mind control Minnesota Multiphasic Personality Inventory-II test mnemonic devices mode central tendency and 2nd defined normal curve and models building 2nd defined goodness-of-fit statistic and money casinos and 2nd infinite doubling of Monopoly Monty Hall problem multiple choice questions analysis of answer options writing good 2nd 3rd multiple regression criterion variables and defined multiple predictor variables predicting football games multiple regression) multiplicative rule 2nd multivariate analysis of variance (MANOVA) mutually exclusive outcomes negative correlation 2nd negative numbers negative wording Newcomb, Simon 2nd Nigrini, Mark 2nd 3rd 4th 5th 6th nominal level of measurement 2nd 3rd non-experimental designs norm-referenced scoring defined 2nd percentile ranks simplicity of normal curve Central Limit Theorem and
overview precision of predicting with z score and 2nd normal distribution applying characteristics iTunes shuffle and overview shape of traffic patterns null hypothesis defined errors in testing Law of Large Numbers and possible outcomes purpose 2nd 3rd research hypothesis and statistical significance and nuts 2nd O'Reilly Media 2nd observed score 2nd 3rd odds [See also gambling] figuring out 2nd pot odds 2nd Powerball lottery one-way chi-square test ordering scores ordinal level of measurement outcomes blackjack 2nd coin toss comparing number of possible 2nd dice rolls 2nd gambler's fallacy about identifying unexpected likelihood of 2nd mutually exclusive occurrence of specific 2nd predicting 2nd predicting baseball games shuffled deck of cards spotting random trial-and-error learning two-point conversion chart and outs p-values pairs of cards, counting by parallel forms reliability partial correlations Party Shuffle (iTunes) 2nd 3rd 4th Pascal's Triangle Pascal, Blaise passing epochs payoffs expected 2nd magic number for lotteries Powerball lottery Pearson correlation coefficient 2nd Pedrotti, J.T.

pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr
by Doug Turnbull and John Berryman
Published 30 Apr 2016

A first_name score, for instance, doesn’t provide a signal about the relationship between a first name in the search string and the first_name in the document (which might be nice). Instead it provides vague information. It assumes that any search term might be a first name and scores accordingly. This isn’t helpful when composing a ranking function. Signal modeling builds fields that can be queried with less ambiguity, as you understand the questions to be answered by searched fields. When signal modeling, you must answer these questions: 1. How do users intend to search these fields to obtain the needed information? 2. What work needs to occur to improve or adjust that information?

pages: 478 words: 126,416

Other People's Money: Masters of the Universe or Servants of the People?
by John Kay
Published 2 Sep 2015

Throughout the capital allocation process, expertise in investment has been supplanted by expertise in the mechanics of financial intermediation, an activity that requires greater intellectual capabilities and the ability to do complicated mathematics, rather than the convivial conversation of the nineteenth hole. In the housing sector, local knowledge of property and people has been replaced by model building and securitised product design. In the markets for listed securities, knowledge of companies has been eroded and the greatest rewards are now earned by those who design and implement sophisticated trading algorithms. Banks have centralised small business lending, and venture capital investors have shifted their attention to the refinancing of established businesses.

pages: 432 words: 124,635

Happy City: Transforming Our Lives Through Urban Design
by Charles Montgomery
Published 12 Nov 2013

Merely walking was a challenge because Bogotá’s sidewalks had disappeared under parked cars, and hawkers had completely taken over downtown plazas. The most visible injustice lay in the way that Bogotá apportioned the right to get around. Only one in five Bogotan families owned a car, but the city was increasingly using the highway-fed North American metropolis as its role model, building more road space and leaving drivers, cyclists, and bus riders to duke it out on the open road. Before Peñalosa’s election, Bogotá had been getting technical and planning advice from the Japanese International Cooperation Agency (JICA). This was not unusual. Poor cities often accept help from such international aid agencies.

Building and Dwelling: Ethics for the City
by Richard Sennett
Published 9 Apr 2018

The ‘desks’ I favour are trestle tables on which 4 by 8 sheets of plywood can be laid; during meetings people stand, walking around the physical objects they are creating. I am a big fan of styrofoam. It’s easy to cut and carve, which means that people themselves can do the work of producing a model. The ‘expert’ will show the component parts which people might carve, usually bringing a bag of these parts to serve as type-forms. The point of this model-building is not to produce one but several models of the same building. We would show people how components could be combined in several ways, using a kind of styrofoam glue which is quick-drying and water-solvent, so that it is easy to create, alter or break up a form. The subjunctive voice thus can morph into visual form, in which possibilities and what-if?

pages: 410 words: 120,234

Across the Airless Wilds: The Lunar Rover and the Triumph of the Final Moon Landings
by Earl Swift
Published 5 Jul 2021

But the Land Locomotion Research Laboratory took up other research, wholly terrestrial in nature, that would prove helpful to lunar studies in the not-distant future. Frank Pavlics, now chief of the lab’s Experimental Design Section, was overseeing the construction of scale models for use in the soil bins. They enabled him to study the behavior of entire vehicles in different types of soil and terrain, rather than just wheels or tracks. Model building would become key to his future inquiries. As it happened, they didn’t have to wait long to get back into their lunar experiments, because in late 1960 Bekker received another sweetheart offer. General Motors had recently created a new Defense Systems Division, with offices in Detroit. Its programs catered to the varied needs of the U.S. military.

Hedgehogging
by Barton Biggs
Published 3 Jan 2005

As the agony is prolonged, you become obsessed with this abusive relationship, and it can even overwhelm everything else in your real life. We were tortured unmercifully by a short position we took in oil in May 2004, when the price was about $40 a barrel. We did all our usual analytical work and model building. Our reasoning went like this: Other than a brief war-related spike in 1990, both the nominal and real price of oil were back to the highest levels since the early 1980s. World production and inventories were rising, and the strategic oil reserve was nearing peak capacity. Consumption, we postulated, was probably decelerating as both the world and the Chinese economy slowed.

pages: 411 words: 119,022

Build: An Unorthodox Guide to Making Things Worth Making
by Tony Fadell
Published 2 May 2022

But I wanted it to feel real—to get them, and especially me, to really dive to the details. I wanted to convince them and I wanted them to challenge it and I wanted to tell the story. To figure out if it held together. It took around nine to twelve months of making prototypes and interactive models, building bits of software, talking to users and experts, and testing it with friends before Matt and I decided to take the plunge and actually pitch investors. We didn’t have perfect data that we’d succeed. No amount of research or delayed intuition will ever guarantee that. We probably had 40–50 percent of the risks of starting this company identified, with ideas for how to mitigate them.

pages: 1,073 words: 314,528

Strategy: A History
by Lawrence Freedman
Published 31 Oct 2013

He predicted a change that would involve clarifying and bringing to the surface “the variables and logical models our minds must be using now in decision-making and of persistently improving the logic of these models.”12 One of those he recruited, political-scientist-cum-economist Herbert Simon, recalled a determination to transform business education from a “wasteland of vocationalism” into a “science-based professionalism.” By 1965, Ford was reporting “an increased use of quantitative analysis and model building” and more publications in disciplinary journals in economics, psychology, and statistics. Its original concept had been to integrate the case study method as taught at Harvard with economics, sharpening the case studies while tempering economic theory with a dose of realism. The balance was to be shifted to more research with less description, more theory and less practice.

This encouraged the view that liberal individualism was rational and collectivism irrational.3 The core attraction of the theory, however, was not ideological but that it was elegant, parsimonious, and genuinely innovative. Some of those attracted by its virtues even gamefully sought to demonstrate that it was not incompatible with Marxism. Unfortunately it was often asserted dogmatically and embraced as a project of ambitious model-building. There was ambiguity about whether this theory was descriptive or prescriptive. Did it explain how actors did behave or how they should behave? If prescriptive, then actors would need to make a deliberate decision to follow the advice. That would be the rational thing to do. “To identify a rational choice is to say that an agent would, in some sense and circumstances, do well to make it.

pages: 484 words: 136,735

Capitalism 4.0: The Birth of a New Economy in the Aftermath of Crisis
by Anatole Kaletsky
Published 22 Jun 2010

The quixotic demands to choose between fully predetermined individual microfoundations and uniform rational expectations should have been laughed out of court. So why did this not happen? One reason was purely intellectual. Not only did these methodologies seem to turn economics into a mathematically based science, but they had the further flattering feature of allowing the model-building economist to decree the universal laws of motion to be obeyed by all humanity. Rational expectations did not just raise economics to the same status as physics; they elevated economists to the role that Newton had reserved for God. A much more important reason why the rational expectations research programs, despite obvious impracticability, had such a hypnotic influence over academic economics was that they dovetailed so perfectly with the conservative and individualistic ideology that was starting, by the early 1970s, to overwhelm the previous generation’s faith in benign bureaucracy.

pages: 872 words: 135,196

The Market for Force: The Consequences of Privatizing Security
by Deborah D. Avant
Published 17 Oct 2010

Indeed, principals often screen and select agents in order to attain a particular standard and use sanctions to reinforce education and socialization. Furthermore, sanctions (intentional or not) are often key to the development of common values and practices.58 Can these two logics be additive and complementary? Might a model building on both indicate more clearly the range of privatization’s effects across all three dimensions of control? I join a variety of scholars who question the gulf between economists (rationalists) and sociologists (constructivists) to argue that this is the case.59 To begin, I simply juxtapose these two arguments to generate hypotheses that look separately at the three different dimensions of control.

pages: 515 words: 126,820

Blockchain Revolution: How the Technology Behind Bitcoin Is Changing Money, Business, and the World
by Don Tapscott and Alex Tapscott
Published 9 May 2016

See also Microfinance; Remittances asset ownership, 193–95 humanitarian aid, 20–21, 188–92 implementation challenges and leadership opportunities, 195–96 road map to, 178–81 Prosperity paradox, 51, 172–75 Prosperity passport, 177–78 Prosperity purgatory, 175–76 Prosumers, 136–37, 180–81 Protocols, 271–72 Public discourse, 212–13 Publicity rights, 48–49 Public key infrastructure (PKI), 39–40, 43 Public ledgers, 242 Public policy and implementation, 213–14, 302 Public value, partnering to create, 209–10 Putin, Vladimir, 244 Pyramid of rights, 48 Quantum computing, 277 Rajgopal, Kausik, 194 Ramamurthi, Suresh, 73 Random-sample elections, 219 Reagan, Ronald, 199 Real estate, 159–60 Re-architecting the firm, 21–22, 87–114. See also New business models building ConsenSys, 87–92, 112–14 changing boundaries of the firm, 92–109 determining corporate boundaries, 109–14 Record keeping, 159, 205 Record labels, 228–31, 234–35, 236 Recruiting process, search costs, 96–97, 98 Recycling equipment, 261–62 Red Cross, 20–21, 188–90 Reddit, 88, 129, 130, 248, 301 Redistributed capitalism, 25, 49, 163 Reduced instruction set computer (RISC), 260–61 Regulation (regulators), 9, 174, 289–93, 295 governance vs., 296–97 players in blockchain ecosystem, 286–87 role in financial services industry, 70 World Wide Ledger, 75–76 Remittances, 20, 59, 182–88 Analie’s story, 182–83, 186–87 Renewable energy, 148, 149–50 Reporters Without Borders, 244 Representative democracy, 211–12, 214–15 Reputation, 12, 16, 37 bAirbnb and, 116 credit score and, 79–82 fair music industry, 234 financial services, 63, 64, 79–82, 84 networked integrity and, 30–33 peer production and, 130–31 political, 210–11 Request for proposal (RFP), 160 Resiliency, 84, 148–49, 154 Resource extraction, 157–58 Retail banking services, 71–73 Retail operations and sales, in IoTs, 161 Revelator, 238 Rich databases, 233 Rights creators, 132–34, 234 Rights preservation, 45–49, 202 “Right to be forgotten,” 78 Ripple, 32, 37, 94, 257, 262 
Ripple Labs, 59, 67 Risk management, 59–60, 63, 64, 116 Rivest, Ron, 320n Robles Elvira, Eduardo, 218, 219 Rodriguez, Keonne, 266, 274 Ronen V, 132–33 Roosevelt, Franklin Delano, 34 Rossiello, Elizabeth, 287 Roth, Kenneth, 200 Royalties, 229–31, 232, 233 R3 Consortium, 69–70, 305 Russia, 9, 243–44, 253–54 Safaricom, 176 Sales, 161, 179 Salesforce Chatter, 139 Salesforce.com, 118 Santander, 58 Sarbanes-Oxley Act, 74 Satoshi Nakamoto, 5, 6, 133 design principles, 29–31, 34–37, 39–42, 50, 152, 282, 308 implementation challenges, 256, 263, 269 Scalability, 152, 285, 288 Scaling Bitcoin, 288 Scalingbitcoin.org, 305 Scaling ignorance, 213 Scenario planning, 223 Schelling, Thomas, 279 Scherbius, Arthur, 27 Schmidt, Eric, 270 Schneider, Nathan, 259 Search costs, 95–99, 121–22, 142 Secure hash algorithm (SHA-256), 32, 40, 259 Secure multiparty computation, 28 Securities and Exchange Commission (SEC), 83 Security, 39–41, 141, 314n bAirbnb and, 116 breakthrough, 39–41 government and, 202 implications, 41 principle of, 39 problem to be solved, 39, 255 Self-Aware Systems, 46 Self-launched musicians, 235–37 Self-service, 207–11 Sequence search, 97 Serial expropriation, 200 Service aggregators, 17–18, 134–35, 164–65 SETLcoin, 70 Settlement risk, 59 Seven design principles.

pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts
by Richard Susskind and Daniel Susskind
Published 24 Aug 2015

This boundary of advantage, for example, is in fact ‘endogenous’, depending not only upon the productivity of the people and machines but, for instance, on their wages and rents, which in turn depend upon the prices of the goods and services that the tasks they carry out produce, which in turn depend upon the nature of consumer demand and the form of production in the economy, and so on. We abstract from this in this simple model. A more complex model is the subject of research by Daniel Susskind at the University of Oxford. That model builds on the ‘Ricardian model of the labor market’, set out in Daron Acemoglu and David Autor, ‘Skills, Tasks and Technologies: Implications for Employment and Earnings’, in Handbook of Labor Economics, Volume 4, Part B, ed. David Card and Orley Ashenfelter (2011), 1043–171. 28 The spirit of their anxieties is shared with the original nineteenth-century ‘Luddites’ (whose name derives from their declared support for Ned Ludd, an East Midlands weaver who smashed a set of framing machines in anger and in fear in the early tremors of the Industrial Revolution).

How I Became a Quant: Insights From 25 of Wall Street's Elite
by Richard R. Lindsey and Barry Schachter
Published 30 Jun 2007

After just starting my dissertation at Chicago I got an offer to work for Goldman Sachs for what was then planned to be a year away from academia.5 The offer was from the fixed-income group at the asset management division (GSAM). GSAM was only a few years old at the time, not the powerhouse it is today, and thus the opportunities were great. I was originally hired to be a model-building quant. However, after doing some research for the group as a consultant and spending a summer there, I quickly became full-time for a trial year, splitting my time between quant work and portfolio management. I got my start specializing in mortgage-backed securities, and I’m fond of saying it was a great baptism, as every investing disaster that can befall you happens to some variety of a mortgage-back in reasonably short order (interest rate risk, default risk, and maybe most important, “oh my God, who knew I wrote options” risk).

pages: 505 words: 142,118

A Man for All Markets
by Edward O. Thorp
Published 15 Nov 2016

I remember the pervasive acetone smell from drying glue, like that of some brands of nail polish remover. My first propeller-driven planes, powered by rubber band motors, didn’t fly well. They were too heavy because I had used excessive amounts of glue to be sure everything would hold together. When I learned to use glue more judiciously, I had some satisfying flights. The skills from model building and using tools were a valuable prequel to the science experiments that would occupy me during the next few years, and my introduction to planes helped me follow the details of the great air battles of World War II. I was sorry to see Uncle Ed go and worried about what would happen to him if war came.

pages: 478 words: 142,608

The God Delusion
by Richard Dawkins
Published 12 Sep 2006

Emily Dickinson said, That it will never come again Is what makes life so sweet. If the demise of God will leave a gap, different people will fill it in different ways. My way includes a good dose of science, the honest and systematic endeavour to find out the truth about the real world. I see the human effort to understand the universe as a model-building enterprise. Each of us builds, inside our head, a model of the world in which we find ourselves. The minimal model of the world is the model our ancestors needed in order to survive in it. The simulation software was constructed and debugged by natural selection, and it is most adept in the world familiar to our ancestors on the African savannah: a three-dimensional world of medium-sized material objects, moving at medium speeds relative to one another.

The Rough Guide to Cyprus (Travel Guide eBook)
by Rough Guides
Published 30 Apr 2019

That’s the core of the experience (it takes about 45min), though you can add overviews from four castle viewpoints, your kids can take part in treasure hunts, you can experience medieval swordsmanship, make models and much more. The concept is the brainchild of Russian architect Svetlana Petrachkova, and seems to be a Russian/Cypriot co-production, with a team of modellers, craftsmen and computer whizzes who between them created the whole extraordinary edifice. Interestingly, the model buildings were produced, not by traditional modelling methods, but by a 3-D printer. Distinctly odd, then, not cheap and rather fact-heavy, yet certainly worth a look, especially if you’re interested in Cyprus’s medieval history, so often overshadowed by the classical period at the one end and the modern at the other.

pages: 286 words: 94,017

Future Shock
by Alvin Toffler
Published 1 Jun 1984

A remarkable little book. 13 Boulding on post-civilization: [134], p. 7. 13 Boulding's reference to Julius Caesar is from "The Prospects of Economic Abundance," his lecture at the Nobel Conference, Gustavus Adolphus College, 1966. 14 Figures on US agricultural output are from "Malthus, Marx and the North American Breadbasket" by Orville Freeman in Foreign Affairs, July, 1967, p. 587. 15 There is, as yet, no widely accepted or wholly satisfactory term to describe the new stage of social development toward which we seem to be racing. Daniel Bell, the sociologist, coined the term "post-industrial" to signify a society in which the economy is largely based on service, the professional and technical classes dominate, theoretical knowledge is central, intellectual technology—systems analysis, model building, and the like—is highly developed, and technology is, at least potentially, capable of self-sustaining growth. The term has been criticized for suggesting that the society to come will no longer be technologically based—an implication that Bell specifically and carefully avoids. Kenneth Boulding's favorite term, "post-civilization," is employed to contrast the future society with "civilization"—the era of settled communities, agriculture, and war.

pages: 608 words: 150,324

Life's Greatest Secret: The Race to Crack the Genetic Code
by Matthew Cobb
Published 6 Jul 2015

Her X-ray data that had so entranced Watson had clearly shown that the phosphate-sugar groups were on the outside, not the inside, whereas the magnesium ions that held Watson and Crick’s triple helix together would be unable to fulfil this function because they would be surrounded by water molecules. If such a structure ever existed, it would instantly fly apart. All this had been explained in her talk at the King’s meeting, but Watson had not fully understood what she was saying and had not written anything down. Watson and Crick’s first venture into model-building ended in embarrassing failure. Watson’s failure to pay attention was even more significant than he realised. According to Franklin’s notes, when she spoke at the November meeting she described the shape of the ‘unit cell’ (the shape of each molecule) of the DNA crystal as ‘monoclinic’. This crystallographic jargon meant that the molecule would show rotational symmetry, and that if there were chains of molecules wrapped around each other in the structure, they must run in opposite directions.

pages: 444 words: 151,136

Endless Money: The Moral Hazards of Socialism
by William Baker and Addison Wiggin
Published 2 Nov 2009

Although they seem to vigorously support freely floating exchange rates, they conclude in a 1988 paper looking at the economic volatility from the end of Bretton Woods (1971) that “floating rates may not provide the degree of insulation once believed,” debunking the thesis that “transmission that occurred under fixed exchange rates …was mostly prevented when exchange rates floated.”41 Their disparagement of the work of other economists studying the field of fixed and floating exchange rates and their degree of stability to the world economic system is illuminating, for it highlights the inadequacy of econometric capability: “The exercises in model building that have occupied specialists in international economics seem designed to impress readers with the ingenuity of the effort rather than the value of the analytical contribution. The theoretical predictions in many cases conflict because they are model specific. Similarly, the empirical evidence on channels of transmission based on these theories has not yet resulted in a consensus.

Mastering Blockchain, Second Edition
by Imran Bashir
Published 28 Mar 2018

AI is a field of computer science that endeavors to build intelligent agents that can make rational decisions based on the scenarios and environment that they observe around them. Machine learning plays a vital role in AI by making use of raw data as a learning resource. A key requirement in AI-based systems is the availability of authentic data that can be used for machine learning and model building. The explosion of data coming out of IoT devices, smartphones, and other means of data acquisition means that AI and machine learning are becoming more and more powerful. There is, however, a requirement of authenticity of data. Once consumers, producers, and other entities are on a blockchain, the data that is generated as a result of interaction between these entities can be readily used as an input to machine learning engines with a guarantee of authenticity.

pages: 542 words: 145,022

In Pursuit of the Perfect Portfolio: The Stories, Voices, and Key Insights of the Pioneers Who Shaped the Way We Invest
by Andrew W. Lo and Stephen R. Foerster
Published 16 Aug 2021

One method would be to go beyond the constraints of “long-only” investing by considering using leverage and shorting, for example, with a 130/30 portfolio, shorting the equivalent of 30 percent of one’s portfolio, with the shorting proceeds reinvested in more equities.56 Another approach would be to uncover what he describes as a firm’s “franchise value,” its value-added growth component, and invest in those stocks that have underpriced franchises.57

The Endowment Model

Building on these earlier insights, Leibowitz teamed up with Bova and his former colleague at TIAA-CREF, Brett Hammond, to synthesize these ideas about asset allocation and diversification, and created a new framework for endowment funds, one that was also applicable to individual investors. “I always found the individual investors’ problems more challenging,” he admitted.

The Volatility Smile
by Emanuel Derman,Michael B.Miller
Published 6 Sep 2016

In the face of very low safe yields, badly engineered financial models were indeed used to tempt investors—at times misleadingly and deceptively—into buying structured CDOs that promised optimistically high yields. Though our expertise lies in models for option valuation rather than mortgage securities, we also wanted to write a book that illustrates how to be sensible about model building.

THE BLACK-SCHOLES-MERTON MODEL AND ITS DISCONTENTS

Stephen Ross of MIT, one of the inventors of the binomial option valuation model and the theory of risk-neutral valuation, once wrote: “When judged by its ability to explain the empirical data, option pricing theory is the most successful theory not only in finance, but in all of economics” (Ross 1987).

The Disappearing Act
by Florence de Changy
Published 24 Dec 2020

Three months later, The Sunday Times reported that the Malaysian police had completed their investigation and cleared everyone on board the flight of any suspicions.17 Everyone but the pilot. I found these suspicions surrounding Zaharie somewhat incompatible with the portrayal coming from other sources, that of a smiling grandfather, a model-building and cookery enthusiast, a handyman, a man who always enjoyed a good joke and was well liked by his flying students. True, he readily talked politics and had a sophisticated flight simulator at home. But even the ‘long-time associate’ quoted by The New Zealand Herald acknowledged that Zaharie lived for the ‘3 Fs: family, food and flying’.

pages: 606 words: 157,120

To Save Everything, Click Here: The Folly of Technological Solutionism
by Evgeny Morozov
Published 15 Nov 2013

As Donald Green and Ian Shapiro note in their devastating critique of rational-choice theory—which also takes aim at Lohmann’s work—its leading proponents “share a propensity to engage in method-driven research, and . . . this propensity is characteristic of the drive for universalism.” In other words, since model building is their hammer and their only tool, the proponents of rational-choice theory see everything as a nail—and so they attempt to explain any kind of behavior, no matter how complex or culturally specific, using the dry talk of incentives and opportunities. It’s no wonder that Clay Shirky can explain the behavior of anorexic girls, open-source communities, revolutionaries in East Germany, and rebellious teenagers in Belarus through one clean theory of information cascades.

Smart Grid Standards
by Takuro Sato
Published 17 Nov 2015

As in the case of the Israeli grid, the rate of penetration gradually decreases as the network energy capacity increases to comply with increasing generations. This figure also shows that, for smaller network energy capacity, penetration increases at a faster rate when stored energy is used locally than when it is transmitted. This phenomenon occurs because when we allow transmission of the stored energy, the model builds larger storage to reach almost the same grid penetration but achieves a slight reduction of the conventional backup capacity needed. This can be seen from the close correspondence between the lower conventional backup capacities for most of the renewable system sizes given in Figure 9.10 and the relatively higher network energy capacity of the storage under the SET model.

pages: 584 words: 187,436

More Money Than God: Hedge Funds and the Making of a New Elite
by Sebastian Mallaby
Published 9 Jun 2010

The board of Commodities Corporation met in July 1971 and agreed to give Weymar a last chance: The firm would be closed if it lost another $100,000.17 In the bloodletting that followed, four of the original seven founding professionals, including Cootner, left the firm. But the recovery came soon, and it laid the foundations for one of the most successful trading operations of the era. AFTER THE 1971 DEBACLE, WEYMAR SET ABOUT RETHINKING his theory of the market. He had begun with an economist’s faith in model building and data: Prices reflected the fundamental forces of supply and demand, so if you could anticipate those things you were on your way to riches. But experience had taught him some humility. An exaggerated faith in data could turn out to be a curse, breeding the sort of hubris that leads you into trading positions too big to be sustainable.

pages: 602 words: 177,874

Thank You for Being Late: An Optimist's Guide to Thriving in the Age of Accelerations
by Thomas L. Friedman
Published 22 Nov 2016

Boko Haram Bombetoka Bay Bonde, Bob Bork, Les Boston Consulting Group Boston Globe Bourguiba, Habib Boys & Girls Clubs of America Brainerd, Mary “Brains & Machines” (blog) Braun, Gil Brazil breakers, super-empowered; degrading of; humiliation and; weak states and Brew “Brief History of Jews and African Americans in North Minneapolis, A” (Quednau) Brimeyer, Jim Brin, Sergey broadband Broadgate, Wendy Brock, David Brooks, David Brooks, Mel Brookview golf club Brown, John Seely Brynjolfsson, Erik Bucksbaum, Phil Buffett, Warren building information modeling buildings, energy efficient Burke, Edmund Burke, Tom Burnett, T Bone Burning Glass Technologies business: social responsibility and Business Bridge Business Insider Busteed, Brandon Bustle.com “Caddie Chatter” (Long and Seitz) Cairo calcium carbonate California, University of, at San Diego Cambodia campaign spending Campbell, James R.

pages: 777 words: 186,993

Imagining India
by Nandan Nilekani
Published 25 Nov 2008

Birla Institute of Technology and Science birth certificates Bissell, George Bissell, William Blitz Bloom, David Bloomberg, Michael Bofors scandal Bombay Bombay Club Bombay House Bombay Plan Bombay Sensex (BSE) Bombay University bonds, government Bordia, Anil Borlaug, Norman Bose, Ashish Bose, Subhas Chandra Bound Together (Chanda) BPL Mobile Brahmins Brazil BRIC countries bridges British Indian Association Buch, Madhabi build-operate-transfer (BOT) models build-own-operate-transfer (BOOT) model Bulgaria Burma Burmah-Shell Bush, George W. business, Indian: attitudes towards ; big ; caste system in ; competition in ; entrepreneurship in ; family-owned ; foreign ; government regulation of; licenses for ; management of ; mergers and acquisitions (M&A) in ; profits of; small; state-run ; supply chains of ; transparency in ; younger generation in business process outsourcing (BPO) Business Standard Calcutta Calcutta Municipal Corporation Calcutta Tramways Calcutta University Calico Act (1701) Caltex Calypso Foods canals cancer “cap and trade” system capitalism carbon emissions carbon-pricing systems carbon taxes Caste, Class and Power (Béteille) caste certificates Center for Science and Environment (CSE) Center of Indian Trade Unions (CITU) Center of Policy Research (CPR) Central Intelligence Agency (CIA) CENVAT Chak de India Chancellor, Richard Chanda, Nayan Chandigarh Chandni Chowk Chandra, Kanchan Chandrasekhar, S.

Economic Origins of Dictatorship and Democracy
by Daron Acemoğlu and James A. Robinson
Published 28 Sep 2001

This discussion and Proposition 5.1, therefore, highlight how in nondemocracy equilibrium policies are determined by a combination of the preferences of the elite and the constraints that they face. When these constraints are absent or very loose, as in the case in which (5.4) does not bind, what matters is the preferences of the elite. When the constraints are tight (e.g., when (5.4) binds), the elite are constrained in the choices they can make. Our model builds in a natural way on existing models of revolutions. This research – for example, Roemer (1985), Grossman (1991, 1994), Wintrobe (1998), and Bueno de Mesquita et al. (2003) – examines simple games where authoritarian regimes can be overthrown by the citizens and then make various types of responses, concessions such as cutting taxes and redistributing assets, or repression.

pages: 619 words: 197,256

Apollo
by Charles Murray and Catherine Bly Cox
Published 1 Jan 1989

Johnson, who didn’t have an engineering degree (he dropped out after his first year at the University of Virginia, partly for financial reasons and partly because he was bored with school), had grown up within shouting distance of Langley. He had watched the Brain Busters and imitated them, and in the process he became an unexcelled builder of model airplanes. With a minor reputation even in aeronautical engineering circles as the kid who had won a variety of awards in model-building competitions, he was hired as a model-builder for P.A.R.D. at the age of eighteen. In later years, it was a recurring headache for Bob Gilruth to promote Johnson to the next Civil Service level. The Civil Service people kept insisting that because he was not an engineer he couldn’t go any further.

pages: 1,380 words: 190,710

Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems
by Heather Adkins, Betsy Beyer, Paul Blankinship, Ana Oprea, Piotr Lewandowski and Adam Stubblefield
Published 29 Mar 2020

When building the model, review any previously performed risk analyses so you can assign categories of incidents appropriate severity ratings. Doing so will ensure that not all incidents receive a critical or moderate severity rating. Accurate ratings will help incident commanders prioritize when multiple incidents are reported simultaneously. A priority model defines how quickly personnel need to respond to incidents. This model builds upon your understanding of incident severity and can also use a five-point (0–4) scale, with 0 indicating high priority and 4 indicating low priority. Priority drives the tempo of the work required: an incident rated 0 merits immediate incident response, where team members respond to this incident before any other work.
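
The five-point priority scale described in this excerpt can be sketched in a few lines of Python. This is only an illustration and is not taken from the book: the names `Priority`, `Incident`, and `response_order` are hypothetical, and the only facts drawn from the excerpt are that the scale runs from 0 to 4, that 0 is the highest priority, and that a priority-0 incident is handled before any other work.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Five-point scale: 0 is the highest priority, 4 the lowest."""
    P0 = 0
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


@dataclass
class Incident:
    name: str            # hypothetical label for the incident
    priority: Priority   # assigned from the priority model


def response_order(incidents):
    """Return incidents in the order they should be handled.

    Lower numbers come first, so any P0 incident precedes all other work.
    sorted() is stable, so incidents of equal priority keep their
    original (reported) order.
    """
    return sorted(incidents, key=lambda incident: incident.priority)


if __name__ == "__main__":
    reported = [
        Incident("disk alert", Priority.P3),
        Incident("credential leak", Priority.P0),
        Incident("phishing report", Priority.P2),
    ]
    for incident in response_order(reported):
        print(incident.priority.name, incident.name)
```

Using an integer-backed enum keeps the ordering explicit, which is the property an incident commander would rely on when several incidents arrive at once.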

pages: 1,409 words: 205,237

Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale
by Jan Kunigk, Ian Buss, Paul Wilkinson and Lars George
Published 8 Jan 2019

Instances summary As we have seen, cloud providers take different approaches in the structure of their instance portfolios, but each of them provides solutions to address typical big data use cases. All providers permit significant vertical scalability, which some situations might require. Also consider existing reference architectures for the individual cloud providers from Hadoop distributors such as Cloudera. Storage and Life Cycle Models Building on the concepts we introduced in “Cluster Life Cycle Models”, let us see how we can use the storage offerings of the three cloud providers we have just covered to implement various life cycle models for Hadoop clusters. Suspendable clusters At the time of this writing, in our experience, the majority of Hadoop clusters deployed in public clouds are built as suspendable clusters: EBS, Azure VHDs or managed disks, or Google persistent disks are used to implement the cluster’s HDFS instance.

pages: 823 words: 220,581

Debunking Economics - Revised, Expanded and Integrated Edition: The Naked Emperor Dethroned?
by Steve Keen
Published 21 Sep 2011

W. (1950) ‘Mechanical models in economic dynamics,’ Economica, 17(67): 283–305. Phillips, A. W. (1954) ‘Stabilisation policy in a closed economy,’ Economic Journal, 64(254): 290–323. Phillips, A. W. (1968) ‘Models for the control of economic fluctuations,’ in Scientific Growth Systems, Mathematical Model Building in Economics and Industry, London, Griffin, pp. 159–65. Pierce, A. (2008) ‘The Queen asks why no one saw the credit crunch coming,’ Daily Telegraph, London. Pigou, A. C. (1922) ‘Empty economic boxes – a reply,’ Economic Journal, 36: 459–65. Pigou, A. C. (1927) ‘The law of diminishing and increasing cost,’ Economic Journal, 41: 188–97.

pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology
by Ray Kurzweil
Published 14 Jul 2005

Scanning-bandwidth, price-performance, and image-reconstruction times are also seeing comparable exponential growth. These trends hold true for all of the forms of scanning: fully noninvasive scanning, in vivo scanning with an exposed skull, and destructive scanning. Databases of brain-scanning information and model building are also doubling in size about once per year. We have demonstrated that our ability to build detailed models and working simulations of subcellular portions, neurons, and extensive neural regions follows closely upon the availability of the requisite tools and data. The performance of neurons and subcellular portions of neurons often involves substantial complexity and numerous nonlinearities, but the performance of neural clusters and neuronal regions is often simpler than their constituent parts.

pages: 846 words: 232,630

Darwin's Dangerous Idea: Evolution and the Meanings of Life
by Daniel C. Dennett
Published 15 Jan 1995

Skinner was a greedy reductionist, trying to explain all the design (and design power) in a single stroke. The proper response to him should have been: "Nice try — but it turns out to be much more complicated than you think!" And one should have said it without sarcasm, for Skinner's was a nice try. It was a great idea, which inspired (or provoked) a half-century of hardheaded experimentation and model-building from which a great deal was learned. Ironically, it was the repeated failures of another brand of greedy reductionism, dubbed "Good Old-Fashioned AI" or "GOFAI" by Haugeland (1985), that really convinced psychologists that the mind was indeed a phenomenon of surpassing architectural complexity — much too complicated for behaviorism to describe.

pages: 798 words: 240,182

The Transhumanist Reader
by Max More and Natasha Vita-More
Published 4 Mar 2013

As explained earlier, Blue Brain is based on model generation by stochastic sampling. This means that without using a tool such as the ATLUM to acquire and reconstruct in accordance with the structure and function in an individual brain, the Blue Brain Project will not develop a whole brain emulation in the truest sense. Even so, the large-scale model building methods, verification protocols, simulations, and hypothesis testing that result from Blue Brain will be extremely valuable for the development of substrate-independent minds. The collaborative work published by Bock et al. (2011) and by Briggman et al. (2011) are outstanding examples of projects that generate results with useful insights about scope and resolution, as well as about structure-function.

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
Published 17 Apr 2017

There are many different kinds of data models, and every data model embodies assumptions about how it is going to be used. Some kinds of usage are easy and some are not supported; some operations are fast and some perform badly; some data transformations feel natural and some are awkward. It can take a lot of effort to master just one data model (think how many books there are on relational data modeling). Building software is hard enough, even when working with just one data model and without worrying about its inner workings. But since the data model has such a profound effect on what the software above it can and can’t do, it’s important to choose one that is appropriate to the application. In this chapter we will look at a range of general-purpose data models for data storage and querying (point 2 in the preceding list).

pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
Published 16 Mar 2017

There are many different kinds of data models, and every data model embodies assumptions about how it is going to be used. Some kinds of usage are easy and some are not supported; some operations are fast and some perform badly; some data transformations feel natural and some are awkward. It can take a lot of effort to master just one data model (think how many books there are on relational data modeling). Building software is hard enough, even when working with just one data model and without worrying about its inner workings. But since the data model has such a profound effect on what the software above it can and can’t do, it’s important to choose one that is appropriate to the application. In this chapter we will look at a range of general-purpose data models for data storage and querying (point 2 in the preceding list).

Data Mining: Concepts and Techniques
by Jiawei Han, Micheline Kamber and Jian Pei
Published 21 Jun 2011

.; Weaver, W., The Mathematical Theory of Communication. (1949) University of Illinois Press. [Swe88] Swets, J., Measuring the accuracy of diagnostic systems, Science 240 (1988) 1285–1293. [Swi98] Swiniarski, R., Rough sets and principal component analysis and their applications in feature extraction and selection, data model building and classification, In: (Editors: Pal, S.K.; Skowron, A.) Rough Fuzzy Hybridization: A New Trend in Decision-Making (1999) Springer Verlag, Singapore. [SWJR07] Song, X.; Wu, M.; Jermaine, C.; Ranka, S., Conditional anomaly detection, IEEE Trans. on Knowledge and Data Engineering 19 (5) (2007) 631–645.

pages: 1,088 words: 297,362

The London Compendium
by Ed Glinert
Published 30 Jun 2004

According to the novelist Israel Zangwill, though, the street came alive at Jewish festival times with a ‘pandemonium of caged poultry, clucking and quacking, and cackling and screaming. Fowls and geese and ducks were bought alive, and taken to have their throats cut for a fee by the official slaughterers.’ On the night that Jack the Ripper claimed his fourth and fifth victims, 30 September 1888, a policeman discovered graffiti on the wall of Wentworth Model Buildings, on the east side, which read: ‘The Juwes [sic] are not the men That will be Blamed for nothing’, and on the ground below the graffiti a fragment from the apron of Catherine Eddowes, the fifth victim. To prevent a pogrom the police wiped the words from the wall of the block which, at that time, was inhabited by many Jews, but failed to photograph the wall for their records.

Engineering Security
by Peter Gutmann

“Using Soft Systems Methodology in the Design of Information Systems”, John Mingers, in “Information Systems Provision: The Contribution of Soft Systems Methodology”, McGraw-Hill, 1995. “Soft Systems Methodology: A Thirty Year Retrospective”, Peter Checkland, Systems Research and Behavioral Science, Vol.17, No.S1 (November 2000), p.S11. “Soft Systems Methodology in Action”, Peter Checkland and Jim Scholes, John Wiley and Sons, 1999. “Soft Systems Methodology: Conceptual Model Building and Its Contribution”, Brian Wilson, John Wiley and Sons, 2001. “Soft Systems Methodology”, Peter Checkland, in “Rational Analysis for a Problematic World Revisited (2nd ed)”, John Wiley and Sons, 2001, p.61. “Cloud Computing Roundtable”, Eric Grosse, John Howie, James Ransome, Jim Reavis and Steve Schmidt, IEEE Security and Privacy, Vol.8, No.6 (November/December 2010), p.17.

Principles of Corporate Finance
by Richard A. Brealey, Stewart C. Myers and Franklin Allen
Published 15 Feb 2014

The complete model of your project would include a set of equations for each of the variables: market size, price, market share, unit variable cost, and fixed cost. Even if you allowed for only a few interdependencies between variables and across time, the result would be quite a complex list of equations. Perhaps that is not a bad thing if it forces you to understand what the project is all about. Model building is like spinach: You may not like the taste, but it is good for you. Step 2: Specifying Probabilities Remember the procedure for simulating the gambling strategy? The first step was to specify the strategy, the second was to specify the numbers on the roulette wheel, and the third was to tell the computer to select these numbers at random and calculate the results of the strategy: The steps are just the same for your scooter project: Think about how you might go about specifying your possible errors in forecasting market size.
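
The three steps in this excerpt (model the project, specify probabilities, let the computer draw at random and compute the result) can be sketched as a small Monte Carlo simulation. This is an illustration only and is not taken from the book: every figure and distribution below is a made-up placeholder, the function names are hypothetical, and the five variables are drawn independently, which ignores the interdependencies the excerpt mentions.

```python
import random
import statistics


def simulate_cash_flow(rng):
    """Draw one scenario and return the resulting annual cash flow."""
    # Step 2: probabilities. All figures are hypothetical placeholders.
    market_size = rng.gauss(1_000_000, 100_000)   # units sold in the whole market
    market_share = rng.uniform(0.08, 0.12)        # fraction of the market captured
    price = rng.gauss(400.0, 20.0)                # price per unit
    unit_variable_cost = rng.gauss(300.0, 15.0)   # variable cost per unit
    fixed_cost = rng.gauss(3_000_000, 200_000)    # fixed cost per year

    # Step 1: the model, reduced here to a single equation.
    units_sold = market_size * market_share
    return units_sold * (price - unit_variable_cost) - fixed_cost


def run_simulation(trials=10_000, seed=0):
    """Step 3: sample many scenarios; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [simulate_cash_flow(rng) for _ in range(trials)]


if __name__ == "__main__":
    flows = run_simulation()
    print("mean cash flow:", round(statistics.mean(flows)))
    print("share of losing scenarios:", sum(f < 0 for f in flows) / len(flows))
```

Replacing the independent draws with correlated ones (for example, letting price depend on market size) is where the "complex list of equations" in the excerpt would come in.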

pages: 2,466 words: 668,761

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

The procedures can be generalized to deal with views that are not orthographic; to deal with points that are observed in only some views; to deal with unknown camera parameters (like focal length); and to exploit various sophisticated searches for appropriate correspondences. It is practical to accurately reconstruct a model of an entire city from images. Some applications are: • Model building: For example, one might build a modeling system that takes many views depicting an object and produces a very detailed 3D mesh of textured polygons for use in computer graphics and virtual reality applications. It is routine to build models like this from video, but such models can now be built from apparently random sets of pictures.