Markov chain

57 results

Monte Carlo Simulation and Finance
by Don L. McLeish
Published 1 Apr 2005

N, where Pij ≥ 0 for all i, j and Σj Pij = 1 for all i. A limiting distribution of a Markov chain is a vector (π say) of long-run probabilities of the individual states with the property that πi = limt→∞ P[Xt = i]. A stationary distribution of a Markov chain is the column vector (π say) of probabilities of the individual states such that π′P = π′. (3.28) For a Markov chain, every limiting distribution is in fact a stationary distribution. For the basic theory of Markov chains, see the Appendix. Roughly, a Markov chain which eventually "forgets" the states that were occupied in the distant past, in other words one for which the probability of the current state does not vary much as we condition on different states in the distant past, is called ergodic.
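The claim that every limiting distribution is also stationary can be checked numerically. A minimal NumPy sketch, using a made-up 3-state transition matrix (any ergodic chain would behave the same way):

```python
import numpy as np

# A small 3-state transition matrix P (rows sum to 1); the numbers are
# hypothetical, chosen only to illustrate the definitions in the excerpt.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Limiting distribution: the rows of P^t converge as t grows (ergodic chain),
# so any row of a high power of P approximates pi_i = lim P[X_t = i].
Pt = np.linalg.matrix_power(P, 50)
pi_limit = Pt[0]

# Stationary distribution: solve pi' P = pi' directly, i.e. take the left
# eigenvector of P for eigenvalue 1 and normalize it to sum to 1.
vals, vecs = np.linalg.eig(P.T)
pi_stat = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi_stat = pi_stat / pi_stat.sum()

# The limiting distribution is stationary, as the excerpt states.
print(np.allclose(pi_limit, pi_stat, atol=1e-8))
```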

In R, rbinom(m,n,p) will generate a vector of length m of Binomial(n, p) variates. Random Samples Associated with Markov Chains. Consider a finite state Markov chain, a sequence of (discrete) random variables X1, X2, . . ., each of which takes integer values 1, 2, . . . , N (called states). The number of states of a Markov chain may be large or even infinite, and it is not always convenient to label the states with the positive integers, so it is common instead to define the state space as the set of all possible states of the chain; we will give some examples of this later. For the present we restrict attention to the case of a finite state space.
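A finite-state chain of this kind is easy to simulate. A sketch in Python/NumPy, with a made-up 3-state transition matrix; for comparison, the R call rbinom(m, n, p) mentioned above corresponds to rng.binomial(n, p, size=m) in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix; states are labelled 1, 2, 3
# as in the excerpt (stored 0-based internally).
P = np.array([[0.1,  0.6,  0.3],
              [0.4,  0.4,  0.2],
              [0.5, 0.25, 0.25]])

def simulate_chain(P, x0, n, rng):
    """Draw X_1, ..., X_n from the chain started at state x0 (0-based)."""
    path = [x0]
    for _ in range(n - 1):
        # Next state is drawn from the row of P for the current state.
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return np.array(path) + 1   # report states as 1..N

path = simulate_chain(P, 0, 10, rng)
print(path)
```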

Roughly, the method consists of using a convenient "proposal" Markov chain with transition matrix Q to generate transitions, but then only "accepting" the move to these new states with a probability that depends on the distribution π. The idea resembles that behind importance sampling. The basic result on which the Metropolis-Hastings algorithm is pinned is the following theorem. Theorem 31. Suppose Qij is the transition matrix of a Markov chain. Assume that g is a vector of non-negative values such that Σi gi = G and (gj Qji)/(gi Qij) ≤ K < ∞ for all i, j, for some finite value K. Define ρij = min(1, (gj Qji)/(gi Qij)). Then the Markov chain with transition probability matrix Pij = Qij ρij, for i ≠ j, (3.29) has stationary distribution πi = gi/G.
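Theorem 31 translates almost directly into code. A sketch of the Metropolis-Hastings acceptance step on a finite state space, with made-up weights g and a uniform proposal chain Q (an illustration of the theorem, not the book's own implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target weights g, so that pi_i = g_i / G; hypothetical values.
g = np.array([1.0, 2.0, 3.0, 4.0])
N = len(g)

# Proposal chain Q: uniform proposal over all states (so Q_ij = Q_ji).
Q = np.full((N, N), 1.0 / N)

def mh_step(i, g, Q, rng):
    j = rng.choice(N, p=Q[i])
    # Acceptance probability rho_ij = min(1, g_j Q_ji / (g_i Q_ij)).
    rho = min(1.0, g[j] * Q[j, i] / (g[i] * Q[i, j]))
    return j if rng.random() < rho else i

# Run the chain and compare visit frequencies with pi = g / G.
x, counts = 0, np.zeros(N)
for _ in range(100_000):
    x = mh_step(x, g, Q, rng)
    counts[x] += 1

print(counts / counts.sum())   # should be close to g / G = [0.1, 0.2, 0.3, 0.4]
```

Note that a rejected proposal still counts as a visit to the current state; that is what makes the visit frequencies converge to π.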

pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy
by Sharon Bertsch McGrayne
Published 16 May 2011

When Smith spoke at a workshop in Quebec in June 1989, he showed that Markov chain Monte Carlo could be applied to almost any statistical problem. It was a revelation. Bayesians went into “shock induced by the sheer breadth of the method.”12 By replacing integration with Markov chains, they could finally, after 250 years, calculate realistic priors and likelihood functions and do the difficult calculations needed to get posterior probabilities. To outsiders, one of the amazing aspects of Bayes’ history is that physicists and statisticians had known about Markov chains for decades. To illustrate this puzzling lapse, some flashbacks are required.

Gelfand was studying the Wong paper when David Clayton from the University of Leicester dropped by and said, “Oh, the paper by Geman and Geman has something to do with it, I think.” Clayton had written a technical report which, although never published, concerned Gibbs sampling. The minute Gelfand saw the Gemans’ paper, the pieces came together: Bayes, Gibbs sampling, Markov chains, and iterations. A Markov chain can be composed of scores of links, and for each one a range of possible variables must be sampled and calculated one after another. Anyone studying a rare and subtle effect must compute each chain over and over again to get a big enough number to reveal the rarity. The numbers involved become enormous, and the length and tediousness of the calculations turned off most researchers.

Monte Carlo began in 1906, when Andrei Andreyevich Markov, a Russian mathematician, invented Markov chains of variables. The calculations took so long, though, that Markov himself applied his chains only to the vowels and consonants in a Pushkin poem. Thirty years later, with the beginning of nuclear physics in the 1930s, the Italian physicist Enrico Fermi was studying neutrons in collision reactions. Fermi fought insomnia by computing Markov chains in his head to describe the paths of neutrons in collision reactions. To the surprise of his colleagues the next morning, Fermi could predict their experimental results.

pages: 332 words: 93,672

Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy
by George Gilder
Published 16 Jul 2018

Now it is time to move beyond the slippery slopes of the Internet and provide an immutable database on which to build new structures of trust and truth: low-entropy carriers for a high-entropy era of human creativity and accomplishment. The new era will move beyond Markov chains of disconnected probabilistic states to blockchain hashes of history and futurity, trust and truth. The opposite of memoryless Markov chains is blockchains. While Markov chains gain efficiency and speed by rigorously eclipsing the past, blockchains elaborately perpetuate the past in every block with a mathematical hash. Perhaps ten thousand times slower to compute than Markov chains, every hash contains an indelible signature of all transactions going back to the original block. Markov chains tell you the statistically likely future without knowing the past; blockchains enable the real-life future by indelibly recording the past.

Facilitating Baum’s work was the prestigious but little-known Markov contributor Lee Neuwirth, the longtime head of IDA, who named the predictive use of the chains “hidden Markov models” at a 1980 conference in Princeton. By every measure, the most widespread, immense, and influential of Markov chains today is Google’s foundational algorithm, PageRank, which encompasses the petabyte reaches of the entire World Wide Web. Treating the Web as a Markov chain enables Google’s search engine to gauge the probability that a particular Web page satisfies your search.5 To construct his uncanny search engine, Larry Page paradoxically began with the Markovian assumption that no one is actually searching for anything.

Gödel dictates a mathematics of creativity. This mathematics will first encounter a major obstacle in the stunning successes of the prevailing system of the world not only in Silicon Valley but also in finance. CHAPTER 8 Markov and Midas One of the supremely seminal ideas of the twentieth century is the Markov chain. Introduced by the Russian mathematician and information theorist Andrey Markov in 1913, it became a set of statistical tools for predicting the future from the present. Powerfully extended in what are called “hidden Markov models,” the technique can uncover unobserved realities behind a series of observations, such as images of cats and dogs at Google, patterns of weather over time, or even human minds.1 A black-bearded atheist, chess master, and political activist, Markov was dubbed “Andrey the Furious.”

Understanding search engines: mathematical modeling and text retrieval
by Michael W. Berry and Murray Browne
Published 15 Jan 2005

If one assumes that there are no such webpages in the instance of the Web to be modeled (this is obviously not true in reality), then the matrix A is, by definition, a row stochastic matrix, so that the power iteration in equation (7.4) reflects a Markov chain [61, 63] or random walk on the graph defined by the link structure (inlinks and outlinks) of the Web (or Google's cached instance of the Web). An irreducible Markov chain is one in which every state can be (ultimately) reached from all other states. In this context, we would say that there exists a path in the Web (or neighborhood) graph from node (or webpage) i to node j for all i, j. In addition, one can show [61, Ch. 8] that the stationary distribution vector x for such a Markov chain is unique and positive (by the Perron-Frobenius theorem, this is the dominant left eigenvector corresponding to the eigenvalue λ = 1), desirable properties for a PageRank vector. Several perturbations are made to the stochastic matrix A to produce an irreducible Markov chain [49]. Figure 7.3 illustrates one candidate approach (originally used by Google developers Brin and Page) to guarantee irreducibility.

In order to guarantee that the power iteration converges to a unique (and positive) solution x, commonly referred to as the stationary distribution vector for the corresponding Markov chain, some adjustments have to be made to the matrix A. 7.2.1 PageRank Adjustments. The matrix A from equation (7.4), also referred to as the raw hyperlinked matrix [49], is nonnegative with all row sums equal to either 1 or 0. A webpage that has no outlinks will have all 0's in its corresponding row of the matrix A.
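The Brin-Page style perturbation and the resulting power iteration can be sketched in a few lines of NumPy. The 4-page link matrix below is a made-up toy example (with no dangling pages, so A is already row stochastic), and alpha = 0.85 is the commonly cited damping value:

```python
import numpy as np

# Tiny 4-page web; A is the row-normalized hyperlink matrix (hypothetical links).
A = np.array([[0,   1/2, 1/2, 0  ],
              [0,   0,   1,   0  ],
              [1/3, 1/3, 0,   1/3],
              [0,   0,   1,   0  ]])

# Perturbation to force irreducibility: G = alpha*A + (1 - alpha)*(1/n) ee',
# i.e. with probability 1 - alpha the surfer teleports to a random page.
alpha, n = 0.85, A.shape[0]
G = alpha * A + (1 - alpha) / n

# Power iteration x' <- x'G until the stationary distribution settles.
x = np.full(n, 1.0 / n)
for _ in range(100):
    x = x @ G

print(x)   # unique, positive PageRank vector; entries sum to 1
```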

Research in the use of old PageRank scores to estimate new PageRank scores (without reconstructing the Markov chain) is ongoing [48, 23, 49]. Taking into account changes in the link structure (addition/deletion of inlinks and outlinks) must be addressed, as well as the addition/deletion of webpages themselves. The latter issue can definitely affect the size of the stochastic iteration matrix A and its perturbations, making a difficult computation even harder. Approximate methods for updating Markov chains [48, 49] are promising alternatives to the computationally intensive exact methods (see [23]).

pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
by Pedro Domingos
Published 21 Sep 2015

Still not enough to pass the Turing test, but models like this are a key component of machine-translation systems, like Google Translate, which lets you see the whole web in English (or almost), regardless of the language the pages were originally written in. PageRank, the algorithm that gave rise to Google, is itself a Markov chain. Larry Page’s idea was that web pages with many incoming links are probably more important than pages with few, and links from important pages should themselves count for more. This sets up an infinite regress, but we can handle it with a Markov chain. Imagine a web surfer going from page to page by randomly following links: the states of this Markov chain are web pages instead of characters, making it a vastly larger problem, but the math is the same. A page’s score is then the fraction of the time the surfer spends on it, or equivalently, his probability of landing on the page after wandering around for a long time.
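The random-surfer description can be simulated directly: follow a random outlink at each step and record the fraction of time spent on each page. A toy sketch with a hypothetical 4-page web (teleportation omitted for brevity, since this little graph is already strongly connected):

```python
import random

# Hypothetical mini-web: page -> list of outgoing links.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a", "b", "d"], "d": ["c"]}

# Random surfer: the long-run fraction of time spent on a page is its
# score, exactly as the excerpt describes.
rng = random.Random(0)
page, visits = "a", {p: 0 for p in links}
steps = 200_000
for _ in range(steps):
    page = rng.choice(links[page])
    visits[page] += 1

scores = {p: v / steps for p, v in visits.items()}
print(scores)   # heavily linked page "c" ends up with the highest score
```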

Markov chains turn up everywhere and are one of the most intensively studied topics in mathematics, but they’re still a very limited kind of probabilistic model. We can go one step further with a model like this: The states form a Markov chain, as before, but we don’t get to see them; we have to infer them from the observations. This is called a hidden Markov model, or HMM for short. (Slightly misleading, because it’s the states that are hidden, not the model.)
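For concreteness, a minimal sketch of inference in an HMM: the forward algorithm, which sums over all hidden-state paths to get the likelihood of an observation sequence. The two-state weather-style model and all its numbers are made up for illustration:

```python
import numpy as np

# Hypothetical two-state HMM: hidden weather {rainy, sunny}, observed
# activity {walk, shop, clean}. All probabilities are illustrative only.
T = np.array([[0.7, 0.3],        # transition matrix of the hidden Markov chain
              [0.4, 0.6]])
E = np.array([[0.1, 0.4, 0.5],   # emission: P(observation | hidden state)
              [0.6, 0.3, 0.1]])
start = np.array([0.6, 0.4])     # initial hidden-state distribution

def forward(obs):
    """Forward algorithm: P(observation sequence), summing over hidden paths."""
    alpha = start * E[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ T) * E[:, o]   # propagate, then weight by emission
    return alpha.sum()

print(forward([0, 1, 2]))   # likelihood of observing walk, shop, clean
```

A sanity check on the recursion: the likelihoods of all possible observation sequences of a fixed length must sum to 1.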

And in this way we can easily compute the probabilities of extremely unusual states, including states that were never observed before. Bayesian networks give the lie to the common misconception that machine learning can’t predict very rare events, or “black swans,” as Nassim Taleb calls them. In retrospect, we can see that Naïve Bayes, Markov chains, and HMMs are all special cases of Bayesian networks. Markov chains encode the assumption that the future is conditionally independent of the past given the present. HMMs assume in addition that each observation depends only on the corresponding state. Bayesian networks are for Bayesians what logic is for symbolists: a lingua franca that allows us to elegantly encode a dizzying variety of situations and devise algorithms that work uniformly in all of them.

Analysis of Financial Time Series
by Ruey S. Tsay
Published 14 Oct 2001

The function Pt (θ, h, A) = P(X h ∈ A | X t = θ), h>t is called the transition probability function of the Markov process. If the transition probability depends on h − t, but not on t, then the process has a stationary transition distribution. 10.1 MARKOV CHAIN SIMULATION Consider an inference problem with parameter vector θ and data X, where θ ∈ Θ. To make inference, we need to know the distribution P(θ | X). The idea of Markov chain simulation is to simulate a Markov process on Θ, which converges to a stationary transition distribution that is P(θ | X). The key to Markov chain simulation is to create a Markov process whose stationary transition distribution is a specified P(θ | X) and run the simulation sufficiently long so that the distribution of the current values of the process is close enough to the stationary transition distribution.

It turns out that, for a given P(θ | X), many Markov chains with the desired property can be constructed. We refer to methods that use Markov chain simulation to obtain the distribution P(θ | X) as Markov Chain Monte Carlo (MCMC) methods. The development of MCMC methods took place in various forms in the statistical literature. Consider the problem of “missing value” in data analysis. Most statistical methods discussed in this book were developed under the assumption of “complete data” (i.e., there is no missing value).
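One standard way to construct such a chain, covered in the book's Section 10.2 as Gibbs sampling, is to draw each coordinate of θ in turn from its full conditional distribution. A textbook illustration on a bivariate normal target with correlation ρ (a generic example, not one taken from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gibbs sampling sketch for a standard bivariate normal with correlation rho:
# alternately draw each coordinate from its full conditional,
#   x | y ~ N(rho*y, 1 - rho^2)   and   y | x ~ N(rho*x, 1 - rho^2).
rho, n_iter, burn = 0.8, 20_000, 1_000
x = y = 0.0
draws = []
for t in range(n_iter):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    if t >= burn:                 # discard the burn-in period
        draws.append((x, y))

draws = np.array(draws)
# After burn-in, the draws should look like the target: means near 0,
# sample correlation near rho.
print(draws.mean(axis=0), np.corrcoef(draws.T)[0, 1])
```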

9. Multivariate Volatility Models and Their Applications, 357: 9.1 Reparameterization, 358; 9.2 GARCH Models for Bivariate Returns, 363; 9.3 Higher Dimensional Volatility Models, 376; 9.4 Factor-Volatility Models, 383; 9.5 Application, 385; 9.6 Multivariate t Distribution, 387; Appendix A. Some Remarks on Estimation, 388. 10. Markov Chain Monte Carlo Methods with Applications, 395: 10.1 Markov Chain Simulation, 396; 10.2 Gibbs Sampling, 397; 10.3 Bayesian Inference, 399; 10.4 Alternative Algorithms, 403; 10.5 Linear Regression with Time-Series Errors, 406; 10.6 Missing Values and Outliers, 410; 10.7 Stochastic Volatility Models, 418; 10.8 Markov Switching Models, 429; 10.9 Forecasting, 438; 10.10 Other Applications, 441. Index, 445. Preface: This book grew out of an MBA course in analysis of financial time series that I have been teaching at the University of Chicago since 1999.

pages: 665 words: 159,350

Shape: The Hidden Geometry of Information, Biology, Strategy, Democracy, and Everything Else
by Jordan Ellenberg
Published 14 May 2021

For babies born in 1917: Vensie, Adelle, Allwood, Walter, Wandeliottlie, Kathryn, Fran, Earnet, Carlus, Hazellia, Oberta . . . The Markov chain, simple as it is, is somehow capturing something of the style of naming practices of different eras. And there’s a way in which one almost experiences it as creative. Some of these names aren’t bad! You can imagine a kid in elementary school named “Jalee,” or, for a retro feel, “Vensie.” Maybe not “Naftalene.” The ability of a Markov chain to produce something like language gives one pause. Is language just a Markov chain? When we talk, are we just producing new words based on the last few words we said, based on some probability distribution we’ve come to learn based on all the other utterances we’ve ever heard?

For Markov to really knock down his rival, he needed to come up with a counterexample: a family of variables whose average was completely predictable, but which weren’t independent from each other. It was with this in mind that he invented what we now call the Markov chain. And guess what—it’s the same idea that Ross came up with to model the mosquito, that Bachelier had applied to the stock market, and that Einstein had used to explain Brownian motion. Markov’s first paper on the Markov chain appeared in 1906; he was fifty years old and had retired the year before from his academic position. It was the perfect time to really lean into an intellectual beef. Markov considered a mosquito leading a very constrained life; it only has two places it can fly.

This simple Markov process produces something that isn’t English but recognizably kind of looks like English. That’s the spooky power of the chain. Of course, the Markov chain depends on the body of text you use to learn the probabilities: the “training data,” as we say in the machine learning business. Norvig used the massive body of text Google harvested from websites and your emails; Shannon used the books on his shelf; Markov used Pushkin. Here’s some text I generated using a Markov chain trained on a list of names given to babies born in the United States in 1971: Teandola, Amberylon, Madrihadria, Kaseniane, Quille, Abenellett . . .
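A letter-level name generator of the kind Ellenberg describes fits in a few lines. A sketch with a tiny made-up training list standing in for a full year of US baby names:

```python
import random

# Letter-bigram Markov chain trained on a small hypothetical name list
# (a toy stand-in for the real training data described in the excerpt).
names = ["anna", "annie", "hannah", "diana", "adriana", "joanna",
         "brianna", "juliana", "liana", "ariana"]

# Tally observed followers of each letter; '^' and '$' mark start and end.
counts = {}
for name in names:
    seq = "^" + name + "$"
    for a, b in zip(seq, seq[1:]):
        counts.setdefault(a, []).append(b)

def generate(rng):
    """Walk the chain from '^' until it emits the end marker '$'."""
    out, c = [], "^"
    while True:
        c = rng.choice(counts[c])
        if c == "$":
            return "".join(out)
        out.append(c)

rng = random.Random(3)
print([generate(rng) for _ in range(5)])
```

Because each next letter is drawn from the empirical follower distribution, the output is novel but recognizably in the style of the training names, which is exactly the effect the excerpt describes.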

pages: 169 words: 41,887

Literary Theory for Robots: How Computers Learned to Write
by Dennis Yi Tenen
Published 6 Feb 2024

A long tradition of language machines also includes An Essay Towards a Real Character, and a Philosophical Language by John Wilkins. Written in 1668, the book imagines a new system of writing that would function as a kind of a translation protocol between cultures, commercial interests, and religions. Later, the mathematical notion of a Markov chain, central to contemporary AI, developed out of a paper on Pushkin’s poetry. These mostly forgotten artifacts remain in the background of contemporary computation. Later, we’ll have the chance to pluck a few of them out—­unplug and hold them for a bit to see how the whole thing fits together.

Chapter 7 Markov’s Pushkin A big leap forward for linguistic intelligence in the twenty-­first century came with a little-­known, older paper by a statistician, published in a journal of mathematics, on the topic of Russia’s greatest poet, Alexander Pushkin. Wheel, table, pattern, template, schema—­the time has come for us to trace the last of our arcane figures—­the Orion’s Belt—­a string of words, arranged in a series of statistically probable continuations based on observed probabilities, called a Markov chain. An idea before its time, it gathered dust in the archives until computers became powerful enough to appreciate it. Where the figures we have encountered so far were all a part of the same family of methods, Markov took a radically different and simple approach. The chain represented an entirely separate branch of statistically minded language scholarship, which went against the grain of conventional linguistics, changing the very meaning of meaning in the process.

Imagine for a moment attempting to guess what letter may follow the initial letter Z to continue a valid English word (similar to the game of Wordle). The letter A would be a good guess, given its occurrence in words like zag or zap. An O would start us on a path to zoo or zombie. However, certain letters would never follow Z, as no words contain the combination of ZK or ZP, for example. You can try experimenting with Markov chains on paper by writing down a few random letters and then guessing at the possible words that could result from their beginnings. Here we must also embark on a little historical side quest. The elephant in the room we’ve managed to ignore so far has been the development of the telegraph, happening roughly from the time of Babbage—­past Markov—­to MIT’s first research computer.
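The Wordle-style guessing game above amounts to estimating P(next letter | current letter) from a word list. A toy sketch for the letter Z, with a small hypothetical word list standing in for a real dictionary:

```python
from collections import Counter

# Which letters follow "z", and how often? The word list is a made-up
# miniature stand-in for a dictionary.
words = ["zag", "zap", "zoo", "zombie", "zero", "zebra", "zigzag",
         "zone", "zinc", "amaze", "blaze", "crazy", "pizza"]

followers = Counter()
for w in words:
    for a, b in zip(w, w[1:]):
        if a == "z":
            followers[b] += 1

# Normalize the counts into an estimated next-letter distribution.
total = sum(followers.values())
probs = {letter: n / total for letter, n in followers.items()}
print(sorted(probs.items(), key=lambda kv: -kv[1]))
```

As the excerpt predicts, letters like A and O get substantial probability, while combinations never observed (ZK, ZP) get none at all.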

pages: 360 words: 85,321

The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling
by Adam Kucharski
Published 23 Feb 2016

To predict the effect of the next shuffle, you only need to know the current order; having additional information on the cards’ arrangement several shuffles ago is irrelevant. Thanks to Markov’s work, this one-step memory has become known as the “Markov property.” If the random event is repeated several times, it’s a “Markov chain.” From card shuffling to Chutes and Ladders, Markov chains are common in games of chance. They can also help when searching for hidden information. Remember how it takes at least six dovetail shuffles to properly mix up a deck of cards? One of the mathematicians behind that result was a Stanford professor named Persi Diaconis.

It meant solving the equations that described how particles collided, a frustrating task given the crude calculators around at the time. After years of battling with the problem, Metropolis and his colleagues realized that if they linked the brute force of the Monte Carlo method with a Markov chain, they would be able to infer the properties of substances made of interacting particles. By making smarter guesses, it would be possible to gradually uncover values that couldn’t be observed directly. The technique, which became known as “Markov chain Monte Carlo,” is the same one Coram would later use to decipher the prison messages. It eventually took Coram a few thousand rounds of computer-assisted guessing to crack the prison code.

[Index entries matching the query: Markov, Andrei, 62; Markov chain, 62, 64; Markov chain Monte Carlo technique, 64; Markov property, 62, 63.]

pages: 197 words: 35,256

NumPy Cookbook
by Ivan Idris
Published 30 Sep 2012

The steady state vector determination. A Markov chain is a system that has at least two states. For detailed information on Markov chains, please refer to http://en.wikipedia.org/wiki/Markov_chain. The state at time t depends on the state at time t-1, and only on the state at t-1. The system switches at random between these states. I would like to define a Markov chain for a stock. Let's say that we have the states flat F, up U, and down D. We can determine the steady state based on end-of-day close prices. Far into the distant future, or in theory after infinite time, the state probabilities of our Markov chain system will not change anymore.
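The recipe's steady state can be computed by powering up the transition matrix until its rows stop changing. A sketch with made-up F/U/D transition probabilities (the cookbook would estimate these from end-of-day close prices):

```python
import numpy as np

# Hypothetical F/U/D transition matrix; in the recipe these probabilities
# would be estimated from end-of-day close prices.
P = np.array([[0.5, 0.3, 0.2],   # from F: flat
              [0.3, 0.4, 0.3],   # from U: up
              [0.2, 0.4, 0.4]])  # from D: down

# "Far into the distant future" the chain forgets its start: all rows of
# P^t converge to the same steady state vector pi.
Pt = np.linalg.matrix_power(P, 100)
pi = Pt[0]

print(dict(zip("FUD", pi.round(4))))
```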

pages: 589 words: 69,193

Mastering Pandas
by Femi Anthony
Published 21 Jun 2015

In conducting Bayesian analysis in Python, we will need a module that will enable us to calculate the likelihood function using the Monte Carlo method that was described earlier. The PyMC library fulfills that need. It provides a Monte Carlo method known commonly as Markov Chain Monte Carlo (MCMC). I will not delve further into the technical details of MCMC, but the interested reader can find out more about MCMC implementation in PyMC at the following references: Monte Carlo Integration in Bayesian Estimation at http://bit.ly/1bMALeu; Markov Chain Monte Carlo Maximum Likelihood at http://bit.ly/1KBP8hH; Bayesian Statistical Analysis Using Python, Part 1, SciPy 2014, Chris Fonnesbeck at http://www.youtube.com/watch?

For more information, refer to the following web pages: http://en.wikipedia.org/wiki/Poisson_process http://pymc-devs.github.io/pymc/tutorial.html https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers We now create a model with the FB Count data (fb_activity) and the parameters (early_mean, late_mean, and rate). Next, using PyMC, we create an MCMC object that enables us to fit our data using Markov Chain Monte Carlo methods. We then call the sample method on the resulting MCMC object to do the fitting:

In [94]: fb_activity_model = pm.Model([fb_activity, early_mean, late_mean, rate])
In [95]: from pymc import MCMC; fbM = MCMC(fb_activity_model)
In [96]: fbM.sample(iter=40000, burn=1000, thin=20)
[-----------------100%-----------------] 40000 of 40000 complete in 11.0 sec

Fitting the model using MCMC involves using Markov Chain Monte Carlo methods to generate a probability distribution for the posterior, P(s,e,l | D).

[Index entries matching the query: Markov Chain Monte Carlo (MCMC), about / Monte Carlo estimation of the likelihood function and PyMC; Markov Chain Monte Carlo Maximum Likelihood, reference link / Monte Carlo estimation of the likelihood function and PyMC.]
Measures of variability, dispersion, or spread measure of variabilityabout / Measures of central tendency and variability, Measures of variability, dispersion, or spread medianabout / Measures of central tendency and variability, The median melt() function, pandasabout / The pandas melt() function melt() function, Rabout / The R melt() function melt functionusing / Using the melt function used, for reshaping / Reshaping using melt merge functionabout / SQL-like merging/joining of DataFrame objects merge function, argumentsleft / SQL-like merging/joining of DataFrame objects right / SQL-like merging/joining of DataFrame objects how / SQL-like merging/joining of DataFrame objects on / SQL-like merging/joining of DataFrame objects left_on / SQL-like merging/joining of DataFrame objects right_on / SQL-like merging/joining of DataFrame objects left_index / SQL-like merging/joining of DataFrame objects right_index / SQL-like merging/joining of DataFrame objects sort / SQL-like merging/joining of DataFrame objects suffixes / SQL-like merging/joining of DataFrame objects copy / SQL-like merging/joining of DataFrame objects merge operationreference link / The join function mergingabout / Merging and joining reference link / The concat function, SQL-like merging/joining of DataFrame objects methods, for reshaping DataFramesabout / Other methods to reshape DataFrames melt function / Using the melt function pandas.get_dummies() function / The pandas.get_dummies() function methods, math.pyrank(..) / pandas/stats solve(..) / pandas/stats inv(..) / pandas/stats is_psd(..) / pandas/stats newey_west(..) / pandas/stats calc_F(..) / pandas/stats methods, parsers.pyread_csv(..) / pandas/io read_table(..) / pandas/io read_fwf(..) / pandas/io methods, pickle.pyto_pickle(..) / pandas/io read_pickle(..) / pandas/io methods, plotting.pyscatter_matrix(..) / pandas/tools andrews_curves(..) / pandas/tools parallel_coordinates(..) / pandas/tools lag_plot(..) 
/ pandas/tools autocorrelation_plot(..) / pandas/tools bootstrap_plot(..) / pandas/tools radviz(..) / pandas/tools methods, sql.pypandasSQL_builder(..) / pandas/io get_schema(..) / pandas/io read_sql_table(..) / pandas/io read_sql_query(..) / pandas/io read_sql(..) / pandas/io methods, util.pyisleapyear(..) / pandas/tseries pivot_annual(..) / pandas/tseries MinGW installation, on WindowsURL / Source installation missing datahandling / Handling missing data missing valueshandling / Handling missing values modeabout / Measures of central tendency and variability, The mode Monte Carlo (MC) integrationabout / Monte Carlo estimation of the likelihood function and PyMC reference link / Monte Carlo estimation of the likelihood function and PyMC Monte Carlo estimation, likelihood functionabout / Monte Carlo estimation of the likelihood function and PyMC Monte Carlo estimation, PyMCabout / Monte Carlo estimation of the likelihood function and PyMC MSI packagesURL, for download / Core Python installation multi-indexing / MultiIndexing MultiIndexgroupby operation, using with / Using groupby with a MultiIndex multiple columnsselecting, in R / Multicolumn selection in R selecting, in pandas / Multicolumn selection in pandas multiple functionsapplying, to column / Applying multiple functions multiple object classes, internals.pyBlock / pandas/core NumericBlock / pandas/core FloatOrComplexBlock / pandas/core ComplexBlock / pandas/core FloatBlock / pandas/core IntBlock / pandas/core BoolBlock / pandas/core TimeDeltaBlock / pandas/core DatetimeBlock / pandas/core ObjectBlock / pandas/core SparseBlock / pandas/core BlockManager / pandas/core SingleBlockManager / pandas/core JoinUnit / pandas/core N N-dimensional version, DataFramereference link / pandas/core naïve approach, to Titanic problem / A naïve approach to Titanic problem negative binomial distributionabout / The negative binomial distribution normal distributionabout / The normal distribution NoSQLURL / Variety of big data 
np.nan* aggregation functions, NumPyreference link / Handling missing data np.newaxis function / Adding a dimension np.reshape functionURL / Reshaping null, and alternative hypothesesalpha value / The alpha and p-values p-value / The alpha and p-values null hypothesisabout / The null and alternative hypotheses Null Signifcance Hypothesis Testing (NHST) / A t-test example numexprreference link / pandas/computation NumPyndarrays / NumPy ndarrays URL / NumPy ndarrays datatypes / NumPy datatypes datatypes, URL / NumPy datatypes indexing / NumPy indexing and slicing slicing / NumPy indexing and slicing array, slicing / Array slicing array, masking / Array masking complex indexing / Complex indexing NumpyURL / Source installation numpy.dotURL / Basic operations numpy.percentile functionreference link / Quartile NumPy arrayURL / NumPy ndarrays creating / NumPy array creation creating, via numpy.array / NumPy arrays via numpy.array creating, via numpy.arange / NumPy array via numpy.arange creating, via numpy.linspace / NumPy array via numpy.linspace creating, via various other functions / NumPy array via various other functions indexing, URL / Array slicing copies / Copies and views views / Copies and views operations / Operations btoadcasting / Broadcasting shape manipulation / Array shape manipulation sorting / Array sorting Numpy arrayversus R-matrix / R-matrix and NumPy array compared NumPy array, creating via various functionabout / NumPy array via various other functions numpy.ones / numpy.ones numpy.eye / numpy.eye numpy.diag / numpy.diag numpy.random.rand / numpy.random.rand numpy.empty / numpy.empty numpy.tile / numpy.tile NumPy ndarraysabout / NumPy ndarrays O objectsslicing / Slicing and selection oddsabout / Bayes theory and odds one sample independent t-test / Types of t-tests Open SuseURL / Linux operations, NumPy arraybasic operations / Basic operations reduction operations / Reduction operations statistical operators / Statistical operators logical 
operators / Logical operators Ordinary Least Squares (OLS) / pandas/stats overfitting / The problem of overfitting P p-valuereferences / The alpha and p-values pad methodreference link / Handling missing values paired samples t-test / Types of t-tests Pandasinstalling, from third-party vendor / Installation of Python and pandas from a third-party vendor pandasabout / How Python and pandas fit into the data analytics mix, What is pandas?

pages: 2,466 words: 668,761

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

The x-axis shows the number of samples generated and the y-axis shows the maximum absolute error in any of the probability values for a query on PropertyCost.

13.4.2 Inference by Markov chain simulation

Markov chain Monte Carlo (MCMC) algorithms work differently from rejection sampling and likelihood weighting. Instead of generating each sample from scratch, MCMC algorithms generate a sample by making a random change to the preceding sample. Think of an MCMC algorithm as being in a particular current state that specifies a value for every variable and generating a next state by making random changes to the current state. The term Markov chain refers to a random process that generates a sequence of states. (Markov chains also figure prominently in Chapters 14 and 16; the simulated annealing algorithm in Chapter 4 and the WalkSAT algorithm in Chapter 7 are also members of the MCMC family.)
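The "random change to the preceding sample" idea can be sketched with a toy Metropolis-style chain. Everything here is illustrative and not from the book: the five-state target weights are made up, and a simple symmetric ±1 proposal on a ring is used so that the acceptance rule reduces to a ratio of weights.

```python
import random

# Toy unnormalized target over the five states 0..4 (made-up weights).
weights = [1.0, 2.0, 4.0, 2.0, 1.0]

def metropolis_chain(n_steps, start=0, seed=0):
    """Random-walk Metropolis on a ring: propose a neighbouring state,
    accept with probability min(1, w[proposal] / w[current])."""
    rng = random.Random(seed)
    state = start
    samples = []
    for _ in range(n_steps):
        proposal = (state + rng.choice([-1, 1])) % len(weights)
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal          # accept the random change
        samples.append(state)         # either way, record the current state
    return samples

samples = metropolis_chain(100_000)
# Long-run frequencies approach weights / sum(weights) = [.1, .2, .4, .2, .1]
freq = [samples.count(s) / len(samples) for s in range(len(weights))]
```

Each sample differs from its predecessor by at most one step, yet the empirical frequencies still converge to the target distribution, which is the point of the MCMC family.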

Figure 13.21 (a) The states and transition probabilities of the Markov chain for the query P(Rain | Sprinkler = true, WetGrass = true). Note the self-loops: the state stays the same when either variable is chosen and then resamples the same value it already has. (b) The transition probabilities when the CPT for Rain constrains it to have the same value as Cloudy.

Analysis of Markov chains

We have said that Gibbs sampling works by wandering randomly around the state space to generate samples. To explain why Gibbs sampling works correctly—that is, why its estimates converge to correct values in the limit—we will need some careful analysis.

(This section is somewhat mathematical and can be skipped on first reading.) We begin with some of the basic concepts for analyzing Markov chains in general. Any such chain is defined by its initial state and its transition kernel k(x → x'), the probability of a transition to state x' starting from state x. Now suppose that we run the Markov chain for t steps, and let πt(x) be the probability that the system is in state x at time t. Similarly, let πt+1(x') be the probability of being in state x' at time t + 1. Given πt(x), we can calculate πt+1(x') by summing, for all states x the system could be in at time t, the probability of being in x times the probability of making the transition to x':

πt+1(x') = Σx πt(x) k(x → x')

We say that the chain has reached its stationary distribution if πt = πt+1.
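The update and the stationarity condition are easy to check numerically. The 3-state kernel below is a made-up example, not one from the text; iterating the update drives π to a fixed point that one more step leaves unchanged.

```python
# Hypothetical 3-state transition kernel k(x -> x'); each row sums to 1.
K = [[0.5, 0.4, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]

def step(pi, K):
    """One update: pi_{t+1}(x') = sum over x of pi_t(x) * k(x -> x')."""
    n = len(K)
    return [sum(pi[x] * K[x][xp] for x in range(n)) for xp in range(n)]

pi = [1.0, 0.0, 0.0]        # start in state 0 with certainty
for _ in range(200):
    pi = step(pi, K)

# At the stationary distribution, one more step changes nothing.
pi_next = step(pi, K)
```

After enough steps the chain forgets its initial state: starting from any of the three states yields the same limiting π.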

pages: 337 words: 103,522

The Creativity Code: How AI Is Learning to Write, Paint and Think
by Marcus Du Sautoy
Published 7 Mar 2019

Pushkin, poetry and probabilities

As a young man, François Pachet fantasised about being a musician, composing hits and playing the guitar like his heroes, but, despite some fair attempts at writing music, he was eventually seduced into a career in AI. While head of the Sony Computer Science Laboratory in Paris, Pachet discovered that the tools he was learning in AI could help him compose music. He created the first AI jazz improviser using a mathematical formula from probability theory known as the Markov chain. Markov chains have been bubbling under many of the algorithms we have been considering thus far. They are fundamental tools for a slew of applications: from modelling chemical processes and economic trends to navigating the internet and assessing population dynamics. Intriguingly, the Russian mathematician Andrey Markov chose to test his theory not on science but on Pushkin’s poetry.

Tossing a coin does not depend on previous tosses, so it was not what Markov was after. But what if we add a little dependence, so that the next event depends on what just happened but not on how the system arrived at the current event? A series of events where the probability of each event depends only on the previous event became known as a Markov chain. Predicting the weather is a possible example. Tomorrow’s weather is certainly dependent on today’s but not particularly dependent on what happened last week. Consider the following model. It can be sunny, cloudy or rainy. If it is sunny today there is a 60 per cent chance of sun tomorrow, a 30 per cent chance of clouds and a 10 per cent chance of rain.
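A minimal simulation of this weather chain: only the sunny row of the transition table is given in the text, so the cloudy and rainy rows below are invented for illustration.

```python
import random

# Transition probabilities; the "sunny" row comes from the text, the
# other two rows are hypothetical numbers added for illustration.
transitions = {
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},  # hypothetical
    "rainy":  {"sunny": 0.2, "cloudy": 0.4, "rainy": 0.4},  # hypothetical
}

def simulate(start, days, seed=42):
    """Walk the chain: tomorrow depends only on today, not on last week."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(days):
        state = rng.choices(list(transitions[state]),
                            weights=transitions[state].values())[0]
        path.append(state)
    return path

week = simulate("sunny", 6)   # one starting day plus six forecast days
```

The defining property is visible in the code: each draw consults only the current state's row, never the earlier history.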

Knowledge of the previous letter thus changed the chances of a given outcome. This is not unexpected: most words tend to alternate between consonants and vowels. The chance of a vowel following a vowel, he calculated, was only 13 per cent. Eugene Onegin therefore provided a perfect example of a Markov chain to help him explain his ideas. Models of this sort are sometimes called models with amnesia: they forget what has happened and depend on the present to make their predictions. Sometimes the model may be improved by considering how the two previous states might affect the next state. (Knowledge of the two previous letters in Pushkin’s poems might help sharpen your chances of guessing the next letter.)
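Markov's counting exercise is easy to reproduce on any text. The sample sentence below is just an English stand-in for Pushkin, so the resulting estimate will differ from his 13 per cent for Eugene Onegin.

```python
# Estimate first-order letter transitions from a text, in the spirit of
# Markov's hand count over Eugene Onegin (English stand-in text here).
text = ("a series of events where the probability of each event depends "
        "only on the previous event became known as a markov chain")
letters = [c for c in text.lower() if c.isalpha()]
vowels = set("aeiou")

pairs = list(zip(letters, letters[1:]))              # (current, next) letters
after_vowel = [nxt for cur, nxt in pairs if cur in vowels]
p_vowel_after_vowel = sum(nxt in vowels for nxt in after_vowel) / len(after_vowel)
```

Extending this to a second-order model just means conditioning on pairs of letters instead of single letters, exactly the sharpening the passage describes.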

pages: 764 words: 261,694

The Elements of Statistical Learning (Springer Series in Statistics)
by Trevor Hastie , Robert Tibshirani and Jerome Friedman
Published 25 Aug 2009

Then the page rank solution (divided by N) is the stationary distribution of an irreducible, aperiodic Markov chain over the N webpages. Definition (14.107) also corresponds to an irreducible, aperiodic Markov chain, with different transition probabilities than those from the (1 − d)/N version. Viewing PageRank as a Markov chain makes clear why the matrix A has a maximal real eigenvalue of 1. Since A has positive entries with each column summing to one, Markov chain theory tells us that it has a unique eigenvector with eigenvalue one, corresponding to the stationary distribution of the chain (Bremaud, 1999).

FIGURE 14.46. PageRank algorithm: example of a small network.
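The stationary-distribution view suggests computing PageRank by power iteration. The sketch below uses a hypothetical 4-page link structure (not the network of Figure 14.46) and a damping factor d = 0.85; since every column of A sums to one, repeated application converges to the stationary rank vector.

```python
# Power iteration for PageRank: repeatedly apply the column-stochastic
# matrix A until the rank vector stops changing.  The link structure is
# hypothetical, not the network from the figure.
d = 0.85                                          # damping factor
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}    # page -> pages it links to
N = len(links)

# A[i][j] = (1 - d)/N, plus d/outdegree(j) whenever page j links to i,
# so every column of A sums to one.
A = [[(1 - d) / N for _ in range(N)] for _ in range(N)]
for j, outs in links.items():
    for i in outs:
        A[i][j] += d / len(outs)

r = [1.0 / N] * N                  # start from the uniform distribution
for _ in range(100):
    r = [sum(A[i][j] * r[j] for j in range(N)) for i in range(N)]
```

Page 3, which nothing links to, ends up with the minimum rank (1 − d)/N, while heavily linked pages accumulate rank from their in-neighbours.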

Under regularity conditions it can be shown that this procedure eventually stabilizes, and the resulting random variables are indeed a sample from the joint distribution of U1, U2, . . . , UK. This occurs despite the fact that the samples (U1(t), U2(t), . . . , UK(t)) are clearly not independent for different t. More formally, Gibbs sampling produces a Markov chain whose stationary distribution is the true joint distribution, and hence the term “Markov chain Monte Carlo.” It is not surprising that the true joint distribution is stationary under this process, as the successive steps leave the marginal distributions of the Uk’s unchanged. Note that we don’t need to know the explicit form of the conditional densities, but just need to be able to sample from them.
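A minimal Gibbs sampler along these lines, for the standard textbook case of a bivariate normal with correlation ρ (a generic example, not one from the book): each full conditional U1 | U2 and U2 | U1 is itself normal, so it can be sampled directly, exactly as the passage requires.

```python
import random

# Gibbs sampling for a bivariate normal with correlation rho: the full
# conditional U1 | U2 = u2 is N(rho * u2, 1 - rho^2), and symmetrically
# for U2 | U1, so each step just draws from a univariate normal.
rho = 0.8
sd = (1 - rho ** 2) ** 0.5           # conditional standard deviation

def gibbs(n_samples, burn_in=500, seed=0):
    rng = random.Random(seed)
    u1, u2 = 0.0, 0.0
    draws = []
    for t in range(n_samples + burn_in):
        u1 = rng.gauss(rho * u2, sd)   # sample U1 | U2 = u2
        u2 = rng.gauss(rho * u1, sd)   # sample U2 | U1 = u1
        if t >= burn_in:               # discard the warm-up steps
            draws.append((u1, u2))
    return draws

draws = gibbs(20_000)
mean_u1 = sum(a for a, _ in draws) / len(draws)
```

Although successive draws are correlated, the empirical mean and the empirical covariance between U1 and U2 approach their true values (0 and ρ), illustrating the stationarity claim.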

Gauss-Markov theorem, 51–52
Gibbs sampler, 279–280, 641
Markov chain Monte Carlo (MCMC) methods, 279
Markov graph, 627
Markov networks, 638–648
MCMC, see Markov Chain Monte Carlo Methods
Metropolis-Hastings algorithm, 282
PageRank, 576

pages: 255 words: 78,207

Web Scraping With Python: Collecting Data From the Modern Web
by Ryan Mitchell
Published 14 Jun 2015

By starting with a random starting word (in this case, the ubiquitous “I”), we can traverse through the Markov chain easily, generating as many words as we like. Six Degrees of Wikipedia: Conclusion In Chapter 3, we created a scraper that collects links from one Wikipedia article to the next, starting with the article on Kevin Bacon, and stores them in a database. Why are we bringing it up again? Because it turns out the problem of choosing a path of links that starts on one page and ends up on the target page (i.e., finding a string of pages between https://en.wikipedia.org/wiki/Kevin_Bacon and https://en.wikipedia.org/wiki/Eric_Idle) is the same as finding a Markov chain where both the first word and last word are defined.
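The link-path search described here can be sketched as a breadth-first search over the stored links. The adjacency dict and page names below are hypothetical stand-ins for the link table the Chapter 3 scraper filled, not the book's actual database schema:

```python
from collections import deque

def find_chain(links, start, target):
    """Breadth-first search for the shortest chain of pages from start to target.

    links: dict mapping each page to the list of pages it links to.
    Returns the chain as a list of pages, or None if no chain exists.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        page = path[-1]
        if page == target:
            return path
        for nxt in links.get(page, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy link graph (hypothetical page names, not real Wikipedia link data)
links = {
    "Kevin_Bacon": ["Footloose", "Apollo_13"],
    "Apollo_13": ["Tom_Hanks"],
    "Tom_Hanks": ["Monty_Python"],
    "Monty_Python": ["Eric_Idle"],
}
print(find_chain(links, "Kevin_Bacon", "Eric_Idle"))
# → ['Kevin_Bacon', 'Apollo_13', 'Tom_Hanks', 'Monty_Python', 'Eric_Idle']
```

Because BFS explores by path length, the first chain it returns is a shortest one, which is exactly the "six degrees" question.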

That is, if our weather system represented an extremely small Internet, “rainy” would have a low page rank, while “cloudy” would have a high page rank. With all of this in mind, let’s bring it back down to a more concrete example: analyzing and writing text.

Again using the inauguration speech of William Henry Harrison analyzed in the previous example, we can write the following code that generates arbitrarily long Markov chains (with the chain length set to 100) based on the structure of its text:

from urllib.request import urlopen
from random import randint

def wordListSum(wordList):
    sum = 0
    for word, value in wordList.items():
        sum += value
    return sum

def retrieveRandomWord(wordList):
    randIndex = randint(1, wordListSum(wordList))
    for word, value in wordList.items():
        randIndex -= value
        if randIndex <= 0:
            return word

def buildWordDict(text):
    #Remove newlines and quotes
    text = text.replace("\n", " ")
    text = text.replace("\"", "")
    #Make sure punctuation marks are treated as their own "words,"
    #so that they will be included in the Markov chain
    punctuation = [',', '.', ';', ':']
    for symbol in punctuation:
        text = text.replace(symbol, " " + symbol + " ")
    words = text.split(" ")
    #Filter out empty words
    words = [word for word in words if word != ""]
    wordDict = {}
    for i in range(1, len(words)):
        if words[i-1] not in wordDict:
            #Create a new dictionary for this word
            wordDict[words[i-1]] = {}
        if words[i] not in wordDict[words[i-1]]:
            wordDict[words[i-1]][words[i]] = 0
        wordDict[words[i-1]][words[i]] = wordDict[words[i-1]][words[i]] + 1
    return wordDict

text = str(urlopen("http://pythonscraping.com/files/inaugurationSpeech.txt").read(), 'utf-8')
wordDict = buildWordDict(text)

#Generate a Markov chain of length 100
length = 100
chain = ""
currentWord = "I"
for i in range(0, length):
    chain += currentWord + " "
    currentWord = retrieveRandomWord(wordDict[currentWord])

print(chain)

The output of this code changes every time it is run, but here’s an example of the uncannily nonsensical text it will generate:

I sincerely believe in Chief Magistrate to make all necessary sacrifices and oppression of the remedies which we may have occurred to me in the arrangement and disbursement of the democratic claims them , consolatory to have been best political power in fervently commending every other addition of legislation , by the interests which violate that the Government would compare our aboriginal neighbors the people to its accomplishment .

pages: 313 words: 34,042

Tools for Computational Finance
by Rüdiger Seydel
Published 2 Jan 2002

Chapter 5 in [Gla04] provides more discussion and many references. Besides volume integration, Monte Carlo is needed to integrate over possibly high-dimensional probability distributions. Drawing samples from the required distribution can be done by running a cleverly constructed Markov chain. This kind of method is called Markov Chain Monte Carlo (MCMC). That is, a chain of random variables X0, X1, X2, ... is constructed where, for given Xj, the next state Xj+1 does not depend on the history of the chain X0, X1, ..., Xj−1. By suitable construction criteria, convergence to any chosen target distribution is obtained.
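A minimal sketch of such a construction, assuming a discrete target distribution and a symmetric uniform proposal (a Metropolis sampler, so the Hastings correction cancels; the target weights below are illustrative):

```python
import random

def metropolis_chain(weights, steps, seed=0):
    """Sample states 0..N-1 with probability proportional to weights,
    using a symmetric proposal (jump to a uniformly random state).
    The target weights need not be normalized."""
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    counts = [0] * n
    for _ in range(steps):
        proposal = rng.randrange(n)
        # Accept with probability min(1, w(proposal) / w(state))
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    return [c / steps for c in counts]

# Target proportional to (1, 2, 3, 4): long-run frequencies ≈ (.1, .2, .3, .4)
freqs = metropolis_chain([1, 2, 3, 4], steps=200_000)
```

Each step depends only on the current state, yet the chain's long-run occupation frequencies converge to the chosen target, which is the property MCMC exploits.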

Barrett et al.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia (1994).
[BaR96] M. Baxter, A. Rennie: Financial Calculus. An Introduction to Derivative Pricing. Cambridge University Press, Cambridge (1996).
[Beh00] E. Behrends: Introduction to Markov Chains. Vieweg, Braunschweig (2000).
[Ben84] A. Bensoussan: On the theory of option pricing. Acta Applicandae Math. 2 (1984) 139-158.
[Bi79] P. Billingsley: Probability and Measure. John Wiley, New York (1979).

J. Finance 39 (1984) 1511-1524.
[GeG98] T. Gerstner, M. Griebel: Numerical integration using sparse grids. Numer. Algorithms 18 (1998) 209-232.
[GeG03] T. Gerstner, M. Griebel: Dimension-adaptive tensor-product quadrature. Computing 70,4 (2003).
[GiRS96] W.R. Gilks, S. Richardson, D.J. Spiegelhalter (Eds.): Markov Chain Monte Carlo in Practice. Chapman & Hall, Boca Raton (1996).
[Gla04] P. Glasserman: Monte Carlo Methods in Financial Engineering. Springer, New York (2004).
[GV96] G. H. Golub, C. F. Van Loan: Matrix Computations. Third Edition. The Johns Hopkins University Press, Baltimore (1996).
[GK01] L. Grüne, P.E.

pages: 361 words: 100,834

Mapmatics: How We Navigate the World Through Numbers
by Paulina Rowinska
Published 5 Jun 2024

Liu, ‘Sampling from Complicated and Unknown Distributions’, Physica A: Statistical Mechanics and Its Applications 506 (15 September 2018): 170–8, https://doi.org/10.1016/j.physa.2018.03.096. animals or plants over time: Liu, Cho and Wang, ‘PEAR’. called Markov chain Monte Carlo (MCMC): Benjamin Fifield et al., ‘Automated Redistricting Simulation Using Markov Chain Monte Carlo’, Journal of Computational and Graphical Statistics 29, no. 4 (2020): 715–28, https://doi.org/10.1080/10618600.2020.1739532. led by mathematician Moon Duchin: Daryl DeFord, Moon Duchin and Justin Solomon, ‘Recombination: A Family of Markov Chains for Redistricting’, Harvard Data Science Review 3, no. 1 (31 March 2021), https://doi.org/10.1162/99608f92.eb30390f.

The worst maps ‘die’, that is, they are removed from the final set. And so, the population evolves, generation by generation, until the sample is large enough. Evolutionary algorithms aren’t the only option. In 2014, a team of scientists at Princeton University generated electoral maps using a popular sampling method called Markov chain Monte Carlo (MCMC). To visualize how MCMC works, imagine that all possible maps are scattered across a hilly terrain. The ‘better’ a map is, according to a pre-defined criterion, the higher it will be placed. MCMC walks around the area, spending more time in places higher above the ground, and so collecting more plausible maps.

E. ref1 Google ref1, ref2, ref3, ref4, ref5, ref6, ref7 Gordon, Noel ref1 GPS (Global Positioning System) ref1, ref2, ref3, ref4 graph theory cognitive graphs ref1 five-colour theorem ref1, ref2 four-colour theorem ref1, ref2, ref3 graph labelling ref1, ref2 invention of ref1, ref2, ref3, ref4, ref5, ref6 Königsberg bridges puzzle ref1, ref2, ref3, ref4, ref5, ref6, ref7 Markov chain Monte Carlo (MCMC) ref1 representation ref1 term ‘graph’ ref1n travelling salesperson problem (TSP) see travelling salesperson problem (TSP) wedding planning ref1, ref2 gravity ref1, ref2 Grayeyes, Willie ref1n Great Bear, The (Patterson) ref1n great circles ref1, ref2, ref3, ref4, ref5, ref6n Gregory, James ref1 grid cells ref1, ref2 Groes, Nils ref1 Grotius, Hugo ref1 Gunter, Edmund ref1 Guthrie, Frederick ref1, ref2, ref3 H1N1 pandemic ref1, ref2 Haken, Wolfgang ref1 Hamilton, Sir William Rowan ref1, ref2 Hamming code ref1 Hamming, Richard ref1, ref2 Hausdorff, Felix ref1 Hayes, Brian ref1n Heawood, Percy ref1, ref2 Heezen, Bruce ref1, ref2, ref3 Helsinki airport, Finland ref1 Henry, Patrick ref1 Hess, Henry ref1 ‘heuristic’ defined ref1 Hinks, John ref1 History of the Life and Voyages of Christopher Columbus (Irving) ref1 homeomorphisms ref1, ref2 Horserød camp, Denmark ref1n ‘How Long is the Coast of Britain?’

pages: 407 words: 104,622

The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution
by Gregory Zuckerman
Published 5 Nov 2019

Balding and bearded, Baum pursued math research while juggling government assignments, just like Simons. Over the course of several summers in the late 1960s, Baum and Lloyd Welch, an information theorist working down the hall, developed an algorithm to analyze Markov chains, which are sequences of events in which the probability of what happens next depends only on the current state, not past events. In a Markov chain, it is impossible to predict future steps with certainty, yet one can observe the chain to make educated guesses about possible outcomes. Baseball can be seen as a Markov game. If a batter has three balls and two strikes, the order in which they came and the number of fouls in between don’t matter.

They hadn’t tried using more-complex math to build trading formulas, partly because the computing power didn’t seem sufficient. Now Ax thought it might be time to give it a shot. Ax had long believed financial markets shared characteristics with Markov chains, those sequences of events in which the next event is only dependent on the current state. In a Markov chain, each step along the way is impossible to predict with certainty, but future steps can be predicted with some degree of accuracy if one relies on a capable model. When Simons and Baum developed their hypothetical trading model at the IDA, a decade prior, they, too, had described the market as a Markov-like process.

When Simons and Baum developed their hypothetical trading model at the IDA, a decade prior, they, too, had described the market as a Markov-like process. To improve their predictive models, Ax concluded it was time to bring in someone with experience developing stochastic equations, the broader family of equations to which Markov chains belong. Stochastic equations model dynamic processes that evolve over time and can involve a high level of uncertainty. Straus had recently read academic literature suggesting that trading models based on stochastic equations could be valuable tools. He agreed that Axcom needed to recruit additional mathematical firepower. A bit later, René Carmona, a professor at nearby University of California, Irvine, got a call from a friend.

pages: 404 words: 43,442

The Art of R Programming
by Norman Matloff

But while that may be true in the three-dimensional case, the approach shown here is quite fruitful in the n-ary case, in n-dimensional space. The cross product there is defined as an n-by-n determinant of the form shown in Equation 8.1, and thus the preceding code generalizes perfectly. 8.4.2 Extended Example: Finding Stationary Distributions of Markov Chains A Markov chain is a random process in which we move among various states, in a “memoryless” fashion, whose definition need not concern us here. The state could be the number of jobs in a queue, the number of items stored in inventory, and so on. We will assume the number of states to be finite. As a simple example, consider a game in which we toss a coin repeatedly and win a dollar whenever we accumulate three consecutive heads.
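The coin game described here can be modeled directly. A sketch (in Python rather than the book's R) under an assumed encoding, since the book's own setup is not quoted in this excerpt: take the state to be the current run of consecutive heads, with the third head paying a dollar and resetting the run. The stationary distribution is found by power iteration, i.e. repeatedly applying π ← πP:

```python
def stationary(P, iters=200):
    """Power iteration: repeatedly apply pi <- pi * P until it converges."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# States: current run of consecutive heads (0, 1, or 2); the third head
# pays a dollar and resets the run to 0 (an assumed encoding of the game).
P = [
    [0.5, 0.5, 0.0],  # no heads yet: tails stays, heads starts a run
    [0.5, 0.0, 0.5],  # one head: tails resets, heads extends the run
    [1.0, 0.0, 0.0],  # two heads: either outcome returns to state 0
]
pi = stationary(P)
print(pi)  # ≈ [4/7, 2/7, 1/7]
```

With this encoding the chain spends 1/7 of the tosses in state 2, so the player banks a dollar at the long-run rate (1/7)·(1/2) = 1/14 per toss.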

Sorting
Linear Algebra Operations on Vectors and Matrices
    8.4.1 Extended Example: Vector Cross Product
    8.4.2 Extended Example: Finding Stationary Distributions of Markov Chains
Set Operations
Simulation Programming in R
    8.6.1 Built-In Random Variate Generators
    8.6.2 Obtaining the Same Random Stream in Repeated Runs
    8.6.3 Extended Example: A Combinatorial Simulation

9 OBJECT-ORIENTED PROGRAMMING
S3 Classes
    9.1.1 S3 Generic Functions
    9.1.2 Example: OOP in the lm() Linear Model Function
    9.1.3 Finding the Implementations of Generic Methods
    9.1.4 Writing S3 Classes
    9.1.5 Using Inheritance
    9.1.6 Extended Example: A Class for Storing Upper-Triangular Matrices
    9.1.7 Extended Example: A Procedure for Polynomial Regression

See packages installing R, 353–354 downloading base package from CRAN, 353 from Linux package manager, 353–354 from source, 354 install_packages() function, 356 integrated development environments (IDEs), xx, 186 intensity, pixel, 63–64 interactive mode, 2–3 interfacing R to other languages, 323–332 using R from Python, 330–332 writing C/C++ functions to be called from R, 323–330 compiling and running code, 325 debugging R/C code, 326–327 extracting subdiagonals from square matrix example, 324–325 prediction of discrete-valued time series example, 327–330 internal data sets, 5 internal storage, matrix, 59, 61 Internet, accessing, 246–250 implementing parallel R example, 248–250 sockets, 247–248 TCP/IP, 247 Internet Protocol (IP) address, 247 intersect() set operation, 202 intextract() function, 243 I/O (input/output), 231–250 accessing Internet, 246–250 implementing parallel R example, 248–250 sockets in R, 247–248 TCP/IP, 247 accessing keyboard and monitor, 232–235 using print() function, 234–235 using readline() function, 234 using scan() function, 232–234 reading files, 235 accessing files on remote machines via URLs, 243 connections, 237–238 reading data frame or matrix from files, 236 reading PUMS census files example, 239–243 reading text files, 237 writing files getting files and directory information, 245 sum contents of many files example, 245–246 writing to files, 243–245 IP (Internet Protocol) address, 247 J join operation, 109 K keyboard, accessing, 232–235 printing to screen, 234–235 using readline() function, 234 using scan() function, 232–234 KMC (k-means clustering), 338–340 L lag operations, vector, 50–51 lapply() function applying functions to lists, 95 lists, 50 looping over nonvector sets, 142 using on data frames, 112–113 latency, 346 lazy evaluation principle, 52, 147 leaving-one-out method, 219, 222 legend() function, 270 length() function obtaining length of vector, 27 vector indexing, 32 levels, factors and, 121–122 .libPaths() function, 
356–357 library functions, 165 linear algebra operations, on vectors and matrices, 61, 196–201 finding stationary distributions of Markov chains example, 199–201 vector cross product example, 198–199 lines() function, 264 Linux package manager, installing R from, 353–354 lists, 12–14, 85–100 accessing components and values, 93–95 applying functions to, 95–99 abalone data example, 99 lapply() and sapply() functions, 95 text concordance example, 95–98 general operations, 87–93 adding and deleting list elements, 88–90 getting size of list, 90 list indexing, 87–88 text concordance example, 90–93 recursive lists, 99–100 lm()function, 15, 208–210 load balance, 349–350 locator() function determining relevant rows and columns, 64–65 pinpointing locations with, 271–272 loess() function, 276 log10() math function, 189 logical operations, 30–31 logistic regression models, applying, 113–115 log() math function, 189 long-run state distribution, Markov modeling, 200 loops, control statements, 140–142 lowess() function, 276 ls() function environment and scope, 155–156 listing objects with, 226–227 M magnifying portions of curve, 277–280 makerow() function, 241–242 managers, snow package, 335 managing objects, 226–230 determining object structure, 228–230 exists() function, 230 listing objects with ls() function, 226–227 removing specific objects with rm() function, 227–228 saving collection of objects with save() function, 228 INDEX 365 mapsound() function, 115–116 marginal values, variable, 131 m argument, apply() function, 70 Markov chains, 199–201 MASS package, 23, 356 math functions, 189–193 calculating probability example, 190–191 calculus, 192–193 cumulative sums and products, 191 minima and maxima, 191–192 matrices, 11–12, 59–83 adding and deleting rows and columns, 73–78 finding closest pair of vertices in graph example, 75–78 resizing matrix, 73–75 applying functions to rows and columns, 70–73 apply() function, 70–72 finding outliers example, 72–73 avoiding unintended 
dimension reduction, 80–81 linear algebra operations on, 196–201 naming rows and columns, 81–82 operations, 61–70 filtering, 66–69 generating covariance matrix example, 69–70 image manipulation example, 63–66 linear algebra operations, 61 matrix indexing, 62–63 reading from files, 236 vector/matrix distinction, 78–79 as vectors, 28 matrix/array-like operations, 130–131 matrix class, 79 matrix() function, 60 matrix-inverse update method, 222 matrix-like operations, 104–109 apply() function, 107 extracting subdata frames, 104–105 NA values, 105–106 rbind() and cbind() functions, 106–107 salary study example, 108–109 matrix-multiplication operator, 12 maxima function, 191–192 max() math function, 190, 192 mean() function, 38 366 INDEX memory chunking, 320–321 functional programming, 314–316 avoiding memory copy example, 315–316 copy-on-change issues, 314–315 vector assignment issues, 314 using R packages for memory management, 321 merge() function, 109–110 merge sort method, numerical sorting, 347 merging data frames, 109–112 employee database example, 111–112 metacharacters, 254 methods() function, 210 microdata, 239 minima function, 191–192 min() math function, 190, 191 M/M/1 queue, 165, 168 modes batch, 1, 3, 24 defined, 26 interactive, 2–3 modulo operator, 44 monitor, accessing, 232–235 using print() function, 234–235 using readline() function, 234 using scan() function, 232–234 Monte Carlo simulation, achieving better speed in, 308–311 multicore machines, 340–341 mutlinks() function, 336 mutual outlinks, 333–334, 341–342 mvrnorm() function, MASS package, 23, 356 N named arguments, 146–147 names() function, 56 naming matrix rows and columns, 81–82 vector elements, 56 NA values matrix-like operations, 105–106 vectors, 43 n browser command, 289 nchar() function, 252 ncol() function, 79 negative subscripts, 32, 63 network, defined, 247 Newton-Raphson method, 192 next statement, 141 Nile data set, 5 noise, adding to image, 65–66 nominal variables, 121 nonlocals 
writing to with superassignment operator, 161–162 writing with assign() function, 163 nonvector sets, looping control statements over, 143 nonvisible functions, 211 nreps values, 205 nrow() function, 79 NULL values, 44 O object-oriented programming.

pages: 489 words: 117,470

Programming in Lua, Fourth Edition
by Roberto Ierusalimschy
Published 14 Jul 2016

Exercise 18.5: Write a true iterator that traverses all subsets of a given set. (Instead of creating a new table for each subset, it can use the same table for all its results, only changing its contents between iterations.)

19 Interlude: Markov Chain Algorithm

Our next complete program is an implementation of the Markov chain algorithm, described by Kernighan & Pike in their book The Practice of Programming (Addison-Wesley, 1999). The program generates pseudo-random text based on what words can follow a sequence of n previous words in a base text. For this implementation, we will assume that n is two.

Modules and Packages
    The Function require
    The Basic Approach for Writing Modules in Lua
    Submodules and Packages

III. Lua-isms

18. Iterators and the Generic for
    Iterators and Closures
    The Semantics of the Generic for
    Stateless Iterators
    Traversing Tables in Order
    True Iterators
19. Interlude: Markov Chain Algorithm
20. Metatables and Metamethods
    Arithmetic Metamethods
    Relational Metamethods
    Library-Defined Metamethods
    Table-Access Metamethods
21. Object-Oriented Programming
    Classes
    Inheritance
    Multiple Inheritance
    Privacy
    The Single-Method Approach
    Dual Representation
22. The Environment
    Global Variables with Dynamic Names
    Global-Variable Declarations
    Non-Global Environments
    Using _ENV
    Environments and Modules
    _ENV and load
23.

The Markov program:

local MAXGEN = 200
local NOWORD = "\n"

-- build table
local w1, w2 = NOWORD, NOWORD
for nextword in allwords() do
    insert(prefix(w1, w2), nextword)
    w1 = w2; w2 = nextword
end
insert(prefix(w1, w2), NOWORD)

-- generate text
w1 = NOWORD; w2 = NOWORD    -- reinitialize
for i = 1, MAXGEN do
    local list = statetab[prefix(w1, w2)]
    -- choose a random item from list
    local r = math.random(#list)
    local nextword = list[r]
    if nextword == NOWORD then return end
    io.write(nextword, " ")
    w1 = w2; w2 = nextword
end

Exercises

Exercise 19.1: Generalize the Markov-chain algorithm so that it can use any size for the sequence of previous words used in the choice of the next word.

20 Metatables and Metamethods

Usually, each value in Lua has a quite predictable set of operations. We can add numbers, we can concatenate strings, we can insert key–value pairs into tables, and so on.

pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
by Aurélien Géron
Published 13 Mar 2017

If it goes to state s1, it will then most likely go to state s2 (90% probability), then immediately back to state s1 (with 100% probability). It may alternate a number of times between these two states, but eventually it will fall into state s3 and remain there forever (this is a terminal state). Markov chains can have very different dynamics, and they are heavily used in thermodynamics, chemistry, statistics, and much more.

Figure 16-7. Example of a Markov chain

Markov decision processes were first described in the 1950s by Richard Bellman. They resemble Markov chains but with a twist: at each step, an agent can choose one of several possible actions, and the transition probabilities depend on the chosen action. Moreover, some state transitions return some reward (positive or negative), and the agent’s goal is to find a policy that will maximize rewards over time.

Markov Decision Processes

In the early 20th century, the mathematician Andrey Markov studied stochastic processes with no memory, called Markov chains. Such a process has a fixed number of states, and it randomly evolves from one state to another at each step. The probability for it to evolve from a state s to a state s′ is fixed, and it depends only on the pair (s, s′), not on past states (the system has no memory). Figure 16-7 shows an example of a Markov chain with four states. Suppose that the process starts in state s0, and there is a 70% chance that it will remain in that state at the next step.
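The four-state chain described in these passages can be simulated directly. The transition matrix below is consistent with that description, but the entries not quoted in the text (the 0.2 and the two 0.1 probabilities) are assumptions:

```python
import random

# Transition matrix for the four-state chain; rows are current states,
# columns are next states. Entries not stated in the text are assumed.
P = [
    [0.7, 0.2, 0.0, 0.1],  # s0: 70% stay, else to s1 or straight to s3
    [0.0, 0.0, 0.9, 0.1],  # s1: most likely to s2, sometimes to s3
    [0.0, 1.0, 0.0, 0.0],  # s2: immediately back to s1
    [0.0, 0.0, 0.0, 1.0],  # s3: terminal state
]

def run_chain(P, state=0, max_steps=1000, seed=42):
    """Random walk on the chain until a terminal state absorbs it."""
    rng = random.Random(seed)
    path = [state]
    for _ in range(max_steps):
        if P[state][state] == 1.0:  # terminal: nothing else is reachable
            break
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path

path = run_chain(P)
# Every walk eventually falls into the terminal state s3 and stays there
```

Running this repeatedly shows the dynamics the text describes: stretches of s0, occasional s1/s2 alternation, then absorption in s3.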

summary, Stepping Back testing and validating, Testing and Validating-Testing and Validating types of systems, Types of Machine Learning Systems-Model-based learningbatch and online learning, Batch and Online Learning-Online learning instance-based versus model-based learning, Instance-Based Versus Model-Based Learning-Model-based learning supervised/unsupervised learning, Supervised/Unsupervised Learning-Reinforcement Learning workflow example, Model-based learning-Model-based learning machine translation (see natural language processing (NLP)) make(), Introduction to OpenAI Gym Manhattan norm, Select a Performance Measure manifold assumption/hypothesis, Manifold Learning Manifold Learning, Manifold Learning, LLE(see also LLE (Locally Linear Embedding) MapReduce, Frame the Problem margin violations, Soft Margin Classification Markov chains, Markov Decision Processes Markov decision processes, Markov Decision Processes-Markov Decision Processes master service, The Master and Worker Services Matplotlib, Create the Workspace, Take a Quick Look at the Data Structure, The ROC Curve, Error Analysis max margin learning, Pretraining on an Auxiliary Task max pooling layer, Pooling Layer max-norm regularization, Max-Norm Regularization-Max-Norm Regularization max_norm(), Max-Norm Regularization max_norm_regularizer(), Max-Norm Regularization max_pool(), Pooling Layer Mean Absolute Error (MAE), Select a Performance Measure-Select a Performance Measure mean coding, Variational Autoencoders Mean Square Error (MSE), Linear Regression, Manually Computing the Gradients, Sparse Autoencoders measure of similarity, Instance-based learning memmap, Incremental PCA memory cells, Model Parallelism, Memory Cells Mercer's theorem, Kernelized SVM meta learner (see blending) min-max scaling, Feature Scaling Mini-batch Gradient Descent, Mini-batch Gradient Descent-Mini-batch Gradient Descent, Training and Cost Function, Feeding Data to the Training Algorithm-Feeding Data to the Training 
Algorithm mini-batches, Online learning minimize(), Gradient Clipping, Freezing the Lower Layers, Policy Gradients, Learning to Play Ms.

pages: 443 words: 51,804

Handbook of Modeling High-Frequency Data in Finance
by Frederi G. Viens, Maria C. Mariani and Ionut Florescu
Published 20 Dec 2011

Moody and Saffell (2001) found that a trading system using direct reinforcement learning outperforms a Q-trader for the asset allocation problem between the S&P500 and T-bill. Dempster and Romahi (2002) compared four methods for foreign exchange trading (reinforcement learning, genetic algorithms, Markov chain linear programming, and simple heuristic) and concluded that a combination of technical indicators leads to better performance than using only individual indicators. Dempster and Leemans (2006) reached a similar conclusion using adaptive reinforcement learning. Bates et al. (2003) used Watkin’s Q-learning algorithm to maximize profits; these authors compared order flow and order book data and compared with technical trading rules.

In the case of stochastic volatility models, M = 1 and R = 2; thus the market is arbitrage-free but not complete. This means that the derivative prices (options) are not uniquely determined by the traded asset price. The same conclusion has been reached when approximating the stochastic volatility process with a Markov chain [19]. The solution in the above citation was to fix the price of a certain derivative as a given asset and express all the other derivative prices in terms of the price of the given derivative. Next, we exemplify the choice by deriving the same nonlinear PDE in two different cases: one, when we pick the price of an option as a given asset, and two, when the volatility is a traded asset (the case of the S&P 500).

14.4.1 THE PDE DERIVATION WHEN WE TAKE A PARTICULAR OPTION AS A GIVEN ASSET

As we have mentioned, with the two sources of randomness, we need two contracts to hedge the option: one being the underlying asset S as usual, and the second a particular option V1 written on the asset S.

See also Likelihood ratio test (LRT) LRT p-values, 192–195 Lunch-time trader activity, 42 Machine learning methods, 48, 64–65 calibration of, 68 Machine learning perspective, 62 Machine-readable news, 64 Major financial events observations centered on, 107 probability curves for, 108 Mancino, Maria Elvira, xiv, 243 Marginal utility function, 299 Mariani, Maria C., xiv, 347, 383 Market capitalization index, 128 Market completeness assumption, 302 Market complexity, modeling of, 99 Market crash, 346 2008, 136 Market index (indices) exponents calculated for, 345 squared returns of, 220 technique for producing, 110 Market index decrease, spread and, 105 Market inefficiencies, for small-space and mid-volume classes, 44 Market microstructure effects, 263 Market microstructure, effects on Fourier estimator, 245 Market microstructure contaminations, 273 Market microstructure model, of ultra high frequency trading, 235–242 Market model, 296–297 Market movement, indicators of, 110 Market reaction, to abnormal price movements, 45 Market-traded option prices, 219 Markov chain, stochastic volatility process with, 401 Markowitz-type optimization, 286 Martingale-difference process, 178. See also Continuous semimartingales; Equivalent martingale measure; Exponential martingale process Supermartingale Matlab, 14, 257 Matlab module, 125, 339 Maximum likelihood estimation (MLE), 13–14, 185 Index finite-sample performance of, 14–17 performance of, 23–24 Maximum likelihood estimators (MLEs, mles), 4, 6, 172–175, 190, 225.

pages: 295 words: 66,824

A Mathematician Plays the Stock Market
by John Allen Paulos
Published 1 Jan 2003

Occasionally, however, after the bad coin lands tails, the good coin, which lands tails almost 25 percent of the time, will land tails twice in succession, and you would move down to stair number -3, where the pattern will likely begin again. This latter downward pattern happens slightly more frequently (with probability .905 × .255 × .255) than does a rare head on the bad coin being followed by two heads on the good one (with probability .095 × .745 × .745) and your moving up three stairs as a consequence. So-called Markov chains are needed for a fuller analysis.) So far, so what? Game S is simple and results in steady movement down the staircase to the bottom, and game C is complicated and also results in steady movement down the staircase to the bottom. Parrondo’s fascinating discovery is that if you play these two games in succession in random order (keeping your place on the staircase as you switch between games), you will steadily ascend to the top of the staircase.
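The closing remark that Markov chains are needed for a fuller analysis can be made concrete: treat the player's stair position mod 3 as a three-state Markov chain and compare the long-run drift of the games. A sketch under stated assumptions: the win probabilities below (0.095 on a multiple of 3, 0.745 otherwise, 0.495 for the simple game) are the canonical Parrondo parameters, which match the coin probabilities quoted above but are not taken verbatim from the book:

```python
def stationary(p, iters=10_000):
    """Stationary distribution of stair position mod 3, where p[s] is the
    probability of winning (moving up one stair) in state s."""
    pi = [1.0 / 3] * 3
    for _ in range(iters):
        nxt = [0.0] * 3
        for s in range(3):
            nxt[(s + 1) % 3] += pi[s] * p[s]        # win: up one stair
            nxt[(s - 1) % 3] += pi[s] * (1 - p[s])  # lose: down one stair
        pi = nxt
    return pi

def drift(p):
    """Expected stairs gained per play in the long run."""
    return sum(pi_s * (2 * p_s - 1) for pi_s, p_s in zip(stationary(p), p))

bad, good, fair = 0.095, 0.745, 0.495
game_c = [bad, good, good]              # complicated game: bad coin on multiples of 3
mix = [(fair + p) / 2 for p in game_c]  # pick game S or game C at random each play

print(drift(game_c))  # slightly negative: game C alone drifts down
print(drift(mix))     # slightly positive: random alternation climbs
```

The stationary distribution shows why: mixing in the simple game changes how often the player sits on a multiple of 3, which is enough to flip the sign of the expected step.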

As we’ll see in chapter 8, however, not everyone believes that stocks’ rates of return are normally distributed. 7 Diversifying Stock Portfolios Long before my children’s fascination with Super Mario Brothers, Tetris, and more recent addictive games, I spent interminable hours as a kid playing antediluvian, low-tech Monopoly with my two brothers. The game requires the players to roll dice and move around the board buying, selling, and trading real estate properties. Although I paid attention to the probabilities and expected values associated with various moves (but not to what have come to be called the game’s Markov chain properties), my strategy was simple: Play aggressively, buy every property whether it made sense or not, and then bargain to get a monopoly. I always traded away railroads and utilities if I could, much preferring to build hotels on the real estate I owned instead. A Reminiscence and a Parable Although the game’s get-out-of-jail-free card was one of the few ties to the present-day stock market, I’ve recently had a tiny epiphany.

pages: 62 words: 14,996

SciPy and NumPy
by Eli Bressert
Published 14 Oct 2012

Some classic examples are performing linear regression, finding a function’s minimum and maximum values, determining the root of a function, and finding where two functions intersect. Below we begin with a simple linear regression and then expand it to fitting non-linear data. Tip: The optimization and minimization tools that NumPy and SciPy provide are great, but they do not have Markov Chain Monte Carlo (MCMC) capabilities—in other words, Bayesian analysis. There are several popular MCMC Python packages like PyMC,[11] a rich package with many options, and emcee,[12] an affine invariant MCMC ensemble sampler (meaning that large scales are not a problem for it). 3.1.1 Data Modeling and Fitting There are several ways to fit data with a linear regression.
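The simple linear regression the excerpt starts from fits in a few lines. The book goes on to use SciPy's fitting routines; as a minimal stand-in, NumPy's `polyfit` handles the linear case (the line `y = 2.5x − 1` and the noise level here are made-up illustration values):

```python
import numpy as np

# Synthetic data: points on the line y = 2.5 x - 1 plus Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 100)
y = 2.5 * x - 1.0 + rng.normal(0.0, 0.5, size=x.size)

# Least-squares fit of a degree-1 polynomial, i.e. simple linear regression.
slope, intercept = np.polyfit(x, y, deg=1)
```

With 100 points and modest noise, the fitted slope and intercept land close to the true 2.5 and −1.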

pages: 298 words: 84,394

We Are All Completely Beside Ourselves
by Karen Joy Fowler
Published 29 May 2013

Yet we all ate more at Grandma Donna’s, where we were left alone to fill our plates or not, where the piecrusts were flaky and the orange-cranberry muffins light as clouds; where there were silver candles in silver candlesticks, a centerpiece of autumn leaves, and everything was done with unassailable taste. Grandma Donna passed the oyster stuffing and asked my father straight out what he was working on, it being so obvious his thoughts were not with us. She meant it as a reprimand. He was the only one at the table who didn’t know this, or else he was ignoring it. He told her he was running a Markov chain analysis of avoidance conditioning. He cleared his throat. He was going to tell us more. We moved to close off the opportunity. Wheeled like a school of fish, practiced, synchronized. It was beautiful. It was Pavlovian. It was a goddamn dance of avoidance conditioning. “Pass the turkey, Mother,” my uncle Bob said, sliding smoothly into his traditional rant about the way turkeys are being bred for more white meat and less dark.

“‘The Learning Curve in Stochastic Learning Theory.’ I could hardly follow from one paragraph to the next. It was like I’d never seen those words before. Maybe if I’d gone to college.” “Wouldn’t have helped.” I told him briefly about Thanksgiving and how Dad had annoyed Grandma Donna with his Markov chains. I mentioned Peter’s SATs and Uncle Bob’s conspiracy theories and I almost told him Mom had given me her journals, but what if he’d wanted to see them? I didn’t want to admit, even to him, that they were lost. We walked into Bakers Square, with their gingham curtains, laminated place mats, and Muzak.

pages: 379 words: 113,656

Six Degrees: The Science of a Connected Age
by Duncan J. Watts
Published 1 Feb 2003

IEEE Transactions on Power Systems, 14(3), 967–979 (1999). Sachtjen, M. L., Carreras, B. A., and Lynch, V. E. Disturbances in a power transmission system. Physical Review E, 61(5), 4877–4882 (2000). Asavathiratham, C. The influence model: A tractable representation for the dynamics of networked Markov chains (Ph.D. dissertation, Department of Electrical Engineering and Computer Science, MIT, 2000). Emergence Although the author doesn’t call it emergence, one of the earliest works to really grapple with the issue of emergent behavior in complex (social) systems is Schelling, T. C. Micromotives and Macrobehavior (Norton, New York, 1978).

Competing technologies, increasing returns, and lock-in by historical events. Economic Journal, 99(394), 116–131 (1989). Arthur, W. B., and Lane, D. A. Information contagion. Structural Change and Economic Dynamics, 4(1), 81–103 (1993). Asavathiratham, C. The Influence Model: A Tractable Representation for the Dynamics of Networked Markov Chains. Ph.D. Dissertation, Department of Electrical Engineering and Computer Science, MIT (MIT, Cambridge, MA, 2000). Asch, S. E. Effects of group pressure upon the modification and distortion of judgments. In Cartwright, D., and Zander, A. (eds.), Group Dynamics: Research and Theory (Row, Peterson, Evanston, IL, 1953), pp. 151–162.

Mathematical Finance: Core Theory, Problems and Statistical Algorithms
by Nikolai Dokuchaev
Published 24 Apr 2007

In this case, since r(t) is an observable parameter and the process R(t) is observable, the process is observable as well, and one may apply the method described above for the estimation of the parameters (α, λ, σ) of its equation. Other models for stock prices. There are many other special types of diffusion models for stock prices: • the volatility is non-random, and the appreciation rate a(t) is an Ito process driven by some Wiener process; • (a(t), σ(t)) = f(ξ(t)), where f is a deterministic function and ξ is a Markov chain process; • σ(t) = CS(t)^p; • the volatility σ(t) is itself an Ito process driven by some Wiener process. All these models (and many others) need statistical evaluation when implemented with real market data, and that is one of the mainstream research fields in financial econometrics and statistical finance.
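The second model in the list, (a(t), σ(t)) = f(ξ(t)) with ξ a Markov chain, is the classic regime-switching setup. A minimal simulation sketch with two hypothetical volatility regimes (the sigmas and switching probability below are made-up illustration values, not from the book):

```python
import random

def regime_switching_returns(n, p_stay=0.98, sigmas=(0.1, 0.4), seed=0):
    """Log-returns whose volatility is driven by a hidden two-state
    Markov chain: state 0 = calm (low sigma), state 1 = turbulent.
    At each step the chain keeps its current state with probability
    p_stay and switches otherwise."""
    rng = random.Random(seed)
    state = 0
    returns, states = [], []
    for _ in range(n):
        if rng.random() > p_stay:
            state = 1 - state            # regime switch
        returns.append(rng.gauss(0.0, sigmas[state]))
        states.append(state)
    return returns, states
```

Because the two transition probabilities are symmetric, the chain spends about half its time in each regime in the long run, and the turbulent-regime returns show visibly larger variance.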

pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions
by Brian Christian and Tom Griffiths
Published 4 Apr 2016

use this approach when trying to decipher codes: See Berg-Kirkpatrick and Klein, “Decipherment with a Million Random Restarts.” called the Metropolis Algorithm: Sometimes also known as the Metropolis-Hastings Algorithm, this technique is described in Metropolis et al., “Equation of State Calculations by Fast Computing Machines,” and Hastings, “Monte Carlo Methods Using Markov Chains and Their Applications.” The Metropolis Algorithm was developed by Nicholas Metropolis and the two husband-and-wife teams of Marshall and Arianna Rosenbluth and Edward and Augusta Teller in the 1950s. Metropolis was the first author on the paper describing the algorithm, so today it is known as the Metropolis Algorithm—which is doubly ironic.
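The algorithm the note describes is only a few lines of code. A minimal sketch of Metropolis sampling from an unnormalized discrete target (the three-state weights and the ring-shaped random-walk proposal are hypothetical illustration choices):

```python
import random

def metropolis(weights, n_steps, seed=0):
    """Metropolis sampling of states 0..k-1 with target probabilities
    proportional to `weights` (no normalization needed).  The proposal
    is a symmetric random walk on a ring: step left or right with equal
    probability; accept the move with probability min(1, w[j] / w[i])."""
    rng = random.Random(seed)
    k, state = len(weights), 0
    counts = [0] * k
    for _ in range(n_steps):
        j = (state + rng.choice((-1, 1))) % k
        if rng.random() < min(1.0, weights[j] / weights[state]):
            state = j
        counts[state] += 1
    return [c / n_steps for c in counts]

# Target proportional to (1, 2, 7); the long-run visit frequencies
# should approach (0.1, 0.2, 0.7).
freqs = metropolis([1.0, 2.0, 7.0], 300_000)
```

Because the proposal is symmetric, the acceptance ratio reduces to min(1, w[j]/w[i]), which is the original Metropolis rule; Hastings's 1970 generalization covers asymmetric proposals.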

“Can the Maximin Principle Serve as a Basis for Morality? A Critique of John Rawls’s Theory.” The American Political Science Review 69, no. 2 (1975): 594–606. Harvey, Jerry B. “The Abilene Paradox: The Management of Agreement.” Organizational Dynamics 3, no. 1 (1974): 63–80. Hastings, W. K. “Monte Carlo Methods Using Markov Chains and Their Applications.” Biometrika 57 (1970): 97–109. Hawken, Angela, and Mark Kleiman. Managing Drug Involved Probationers with Swift and Certain Sanctions: Evaluating Hawaii’s HOPE. Report submitted to the National Institute of Justice. 2009. http://www.ncjrs.gov/pdffiles1/nij/grants/229023.pdf.

pages: 573 words: 157,767

From Bacteria to Bach and Back: The Evolution of Minds
by Daniel C. Dennett
Published 7 Feb 2017

This objection is irrelevant to our application of Bayesian thinking, which is specifically designed to address the predicament of an agent blessed (or cursed) with a set of priors and wondering what to do next. The subjectivity of the agent is an integral part of the design problem being faced. 42There are skeptics. Domingos (2015) points out that the implementation of Bayesian expectation-generation depends heavily on the MCMC (Markov Chain Monte Carlo) algorithm, a brute force algorithm that does the heavy lifting; it gets the job done, but “does not simulate any real process” (p. 164). “Bayesians, for one, probably have MCMC to thank for the rising popularity of their methods more than anything else” (p. 165). Perhaps, but since the models simulate massively parallel architectures, their use of MCMC is almost a forced move. 43There are several distinct meanings of the term; in another sense, legacy code is compiled but restricted for special applications (for instance, to permit the program to run on an outdated platform). 44One of the many virtues of the detailed model proposed by Eliasmith in his groundbreaking How to Build a Brain (2013) is that his “semantic pointer architecture” accounts for behavioral comprehension (perception and control) of the animal variety, while leaving plenty of plasticity—knobs to turn—for invaders—words and other memes—to exploit.

… manipulation, in communication, 342 Markkula, Gustav, 353–54 Markov Chain Monte Carlo (MCMC) algorithm, 170n Marler, Peter, 177 Marsalis, Winton, 327 Mars Climate Orbiter, 67 Marx, Karl, 33–34, 163, 340 …

pages: 543 words: 153,550

Model Thinker: What You Need to Know to Make Data Work for You
by Scott E. Page
Published 27 Nov 2018

Conservation and Society 14, no. 4: 380–390. Kauffman, Stuart. 1993. The Origins of Order: Self-Organization and Selection in Evolution. Oxford: Oxford University Press. Kennedy, John F. 1956. Profiles in Courage. New York: Harper & Brothers. Khmelev, Dmitri, and F. J. Tweedie. 2001. “Using Markov Chains for Identification of Writers.” Literary and Linguistic Computing 16, no. 4: 299–307. Kleinberg, Jon, and M. Raghu. 2015. “Team Performance with Test Scores.” Working paper, Cornell University School of Information. Knox, Grahame. n.d. “Lost at Sea.” Insight, http://insight.typepad.co.uk/lost_at_sea.pdf.

Journal of Finance 7, no. 1: 77–91. Markus, Greg B. 1988. “The Impact of Personal and National Economic Conditions on the Presidential Vote: A Pooled Cross-Sectional Analysis.” American Journal of Political Science 32: 137–154. Martin, Andrew D., and Kevin M. Quinn. 2002. “Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the U.S. Supreme Court, 1953–1999.” Political Analysis 10: 134–153. Martin, Francis, et al. 2008. “The Genome of Laccaria bicolor Provides Insights into Mycorrhizal Symbiosis.” Nature 452: 88–92. Martinez Peria, Maria Soledad, Giovanni Majnoni, Matthew T. Jones, and Winfrid Blaschke. 2001.

Data Mining: Concepts and Techniques
by Jiawei Han, Micheline Kamber and Jian Pei
Published 21 Jun 2011

Among many available analysis packages, BLAST (Basic Local Alignment Search Tool) is one of the most popular tools in biosequence analysis. Hidden Markov Model for Biological Sequence Analysis Given a biological sequence, biologists would like to analyze what that sequence represents. To represent the structure or statistical regularities of sequence classes, biologists construct various probabilistic models such as Markov chains and hidden Markov models. In both models, the probability of a state depends only on that of the previous state; therefore, they are particularly useful for the analysis of biological sequence data. The most common methods for constructing hidden Markov models are the forward algorithm, the Viterbi algorithm, and the Baum-Welch algorithm.
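Of the three algorithms named, Viterbi is the most compact: a dynamic program that finds the single most probable hidden-state path for an observation sequence. A minimal sketch (the two-state weather HMM below is a standard toy example, not a biosequence model):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence.
    V[t][s] holds (best probability of any path ending in state s
    after observing obs[:t+1], that path)."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, best = max(
                (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                 V[t - 1][prev][1] + [s])
                for prev in states)
            V[t][s] = (prob, best)
    return max(V[-1].values())[1]

# Toy two-state HMM with hypothetical numbers:
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
path = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
```

The same recursion with a sum in place of the max gives the forward algorithm, and Baum-Welch wraps the forward-backward pass in an EM loop to fit the parameters.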

[PHM-A+04]; and Yan, Han, and Afshar [YHA03]. The study on sequence classification includes Ji, Bailey, and Dong [JBD05] and Ye and Keogh [YK09], with a survey by Xing, Pei, and Keogh [XPK10]. Dong and Pei [DP07] provide an overview on sequence data mining methods. Methods for analysis of biological sequences including Markov chains and hidden Markov models are introduced in many books or tutorials such as Waterman [Wat95]; Setubal and Meidanis [SM97]; Durbin, Eddy, Krogh, and Mitchison [DEKM98]; Baldi and Brunak [BB01]; Krane and Raymer [KR03]; Rabiner [Rab89]; Jones and Pevzner [JP04]; and Baxevanis and Ouellette [BO04].

… market basket analysis 244–246, 271–272 Markov chains 591 materialization full 159, 179, 234 …

pages: 255 words: 76,834

Creative Selection: Inside Apple's Design Process During the Golden Age of Steve Jobs
by Ken Kocienda
Published 3 Sep 2018

So, I wound up in the odd situation where I got approval to ask for help, as long as I didn’t tell the people I was asking exactly why I was asking or what I planned to do with their answers. This didn’t present as big a roadblock as it could have. The Apple text experts I spoke to didn’t seem overly distracted by my need for cloak-and-dagger-style confidentiality. They introduced me to concepts like Markov chains, conditional random fields, Bayesian inferences, and dynamic programming. In the end, these tech talks inspired me more than they directly informed my algorithms. Honestly, much of the math they described was beyond me. I’m not an engineer by training—in fact, I never took a single math course in college.

pages: 342 words: 72,927

Transport for Humans: Are We Nearly There Yet?
by Pete Dyson and Rory Sutherland
Published 15 Jan 2021

It remains to be seen whether Suffolk’s sandy soil is different to Somerset’s limestone rock base.21 The Bible offers a reference-class parable that questions the foundations of this assumption.22 Future British transport planners who wish to inspire confidence in their adjustments will turn to HM Treasury’s Magenta Book, which contains guidance on what to consider when designing an evaluation. They can use it to apply systems thinking, Stacy matrices, Bayesian belief networks, Markov chain Monte Carlo methods, qualitative comparative analysis and agent-based models.23 These methods are only as good as the available data and the skill and experience of the economists making the estimates, so attracting and developing the brightest talent are essential for innovation. International collaboration may help too.

pages: 306 words: 82,765

Skin in the Game: Hidden Asymmetries in Daily Life
by Nassim Nicholas Taleb
Published 20 Feb 2018

fn2 Complex regulations allow former government employees to find jobs helping firms navigate the regulations they themselves created. fn3 Thirty-nine percent of Americans will spend a year in the top 5 percent of the income distribution, 56 percent will find themselves in the top 10 percent, and 73 percent will spend a year in the top 20 percent. fn4 Or, more mathematically: Dynamic equality assumes Markov chain with no absorbing states. fn5 A technical comment (for nitpickers): what we can call here imperfect ergodicity means that each one of us has long-term, ergodic probabilities that have some variation among individuals: your probability of ending in the one percent may be higher than mine; nevertheless no state will have a probability of 0 for me, and no state will have a transition probability of 1 for you.
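The footnote's condition can be checked numerically: in a chain with no absorbing states, the long-run distribution over states is the same whatever state you start in. A minimal sketch with a hypothetical three-class income-mobility matrix (the numbers are illustration values, not Taleb's):

```python
def distribution_after(P, start, steps=2000):
    """Occupancy distribution `steps` transitions after starting
    with certainty in state `start`."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

# Hypothetical mobility matrix: every class reachable from every
# other, so no state is absorbing and no transition probability is 1.
P = [[0.6, 0.3, 0.1],
     [0.3, 0.5, 0.2],
     [0.1, 0.4, 0.5]]
from_bottom = distribution_after(P, 0)
from_top = distribution_after(P, 2)
```

Starting from the bottom or the top class yields the same long-run distribution; add an absorbing state (a row with a 1 on the diagonal) and that start-independence disappears, which is the dynamic-equality condition of the footnote.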

pages: 321

Finding Alphas: A Quantitative Approach to Building Trading Strategies
by Igor Tulchinsky
Published 30 Sep 2019

Trend analysis is an example of applications of statistical models in alpha research. In particular, a hidden Markov model is frequently utilized for that purpose, based on the belief that price movements of the stock market are not totally random. In a statistics framework, the hidden Markov model is a composition of two or more stochastic processes: a hidden Markov chain, which accounts for the temporal variability, and an observable process, which accounts for the spectral variability. In this approach, the pattern of the stock market behavior is determined based on these probability values at a particular time. The goal is to figure out the hidden state sequence given the observation sequence, extract the long-term probability distribution, and identify the current trend relative to that distribution.
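The "long-term probability distribution" mentioned here is the chain's stationary distribution π, with πP = π. A minimal sketch of extracting it by power iteration (the two-regime transition matrix below is a hypothetical up-trend/down-trend example, not from the book):

```python
def stationary(P, iters=2000):
    """Approximate the stationary distribution pi (pi P = pi) of a
    transition matrix P by power iteration from a uniform start."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical two-regime transition matrix; detailed balance gives
# the exact stationary distribution (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = stationary(P)
```

Comparing the current regime's recent behavior against this long-run distribution is one way to flag a trend as unusually persistent.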

Seeking SRE: Conversations About Running Production Systems at Scale
by David N. Blank-Edelman
Published 16 Sep 2018

For a given replication scheme, let’s say that we have n disks in our replication group, and that we lose data if we lose more than m disks; for example, for three-way database replication we have n = 3 and m = 2; for RS(9,6) erasure coding, we have n = 9 and m = 3. We can take these variables and model a Markov chain, shown in Figure 17-1, in which each state represents a given number of failures in a group and the transitions between states indicate the rate of disks failing or being recovered. (Figure 17-1. Durability Markov model.) In this model, the flow from the first to second state is equal to nλ because there are n disks remaining that each fail at a rate of λ failures per hour.
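The expected time until the chain reaches the data-loss state (the MTTDL) comes from solving a small linear system of first-step equations. A minimal sketch in pure Python, assuming a simplified model in which each of the k failed disks is independently repaired at rate μ (the chapter's exact repair model may differ):

```python
def mttdl(n, m, lam, mu):
    """Mean time to data loss for a group of n disks that loses data
    once m+1 disks are down.  State k = number of failed disks;
    failures arrive at rate (n-k)*lam, repairs at rate k*mu
    (assumption: failed disks are repaired independently).  Solves
    E_k = 1/R_k + (fail_k/R_k) E_{k+1} + (repair_k/R_k) E_{k-1},
    with E_{m+1} = 0, and returns E_0."""
    size = m + 1
    # Row k encodes (fail_k + repair_k) E_k - fail_k E_{k+1} - repair_k E_{k-1} = 1.
    A = [[0.0] * size for _ in range(size)]
    b = [1.0] * size
    for k in range(size):
        fail, repair = (n - k) * lam, k * mu
        A[k][k] = fail + repair
        if k + 1 < size:
            A[k][k + 1] = -fail      # k = m fails into absorption, E = 0
        if k > 0:
            A[k][k - 1] = -repair
    # Gaussian elimination (the tridiagonal system is diagonally dominant).
    for col in range(size):
        for row in range(col + 1, size):
            factor = A[row][col] / A[col][col]
            for j in range(col, size):
                A[row][j] -= factor * A[col][j]
            b[row] -= factor * b[col]
    E = [0.0] * size
    for k in range(size - 1, -1, -1):
        E[k] = (b[k] - sum(A[k][j] * E[j] for j in range(k + 1, size))) / A[k][k]
    return E[0]
```

As a sanity check, for a mirrored pair (n = 2, m = 1) this reproduces the textbook closed form MTTDL = (3λ + μ) / (2λ²).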

Spotify and, The Future: Speed at Scale, Safely
SRE problems addressed by, Some SRE Problems Machine Learning Can Help Solve
TensorFlow and TensorBoard, Using TensorFlow and TensorBoard-Using TensorFlow and TensorBoard
time series: server requests waiting, Time series: server requests waiting-Time series: server requests waiting
training a neural network from scratch, A neural network from scratch-A neural network from scratch
MacNamara, Ríona, Do Docs Better, Do Docs Better: Integrating Documentation into the Engineering Workflow-Communicating the Value of Documentation
Mangot, Dave, Replies
Markdown, The Google Experience: g3doc and EngPlay, Pick the simplest markup language that supports your needs
market-oriented teams, Get Rid of as Many Handoffs as Possible
Markov model, Estimating durability
markup language, Pick the simplest markup language that supports your needs
Maslow's Hierarchy of Needs, Production Engineering at Facebook
Master Service Agreements (MSAs), Negotiating SLAs with vendors
McDuffee, Keith, Replies
McEniry, Chris, Replies
Mean Time Between Failures (MTBF), Always in a State of Partial Failure
Mean Time to Detect (MTTD), Clearing the Way for SRE in the Enterprise
Mean Time to Failure (MTTF): in Markov chain, Estimating durability; inappropriate optimization of, Antipattern 13: Optimizing Failure Avoidance Rather Than Recovery Time (MTTF > MTTR)-Antipattern 13: Optimizing Failure Avoidance Rather Than Recovery Time (MTTF > MTTR)
Mean Time to Recovery (MTTR), Antipattern 13: Optimizing Failure Avoidance Rather Than Recovery Time (MTTF > MTTR)-Antipattern 13: Optimizing Failure Avoidance Rather Than Recovery Time (MTTF > MTTR)
Mean Time to Repair (MTTR), Always in a State of Partial Failure: defined, Clearing the Way for SRE in the Enterprise; window of vulnerability and, Window of Vulnerability
Mediratta, Bharat, Pattern 1: Birth of Automated Testing at Google
Meickle, James, Beyond Burnout, Beyond Burnout-Mental Disorder Resources
Menchaca, Joaquin, Replies
mental disorders, persons with: and diversity conversation, Mental Disorders Are Missing from the Diversity Conversation; benefits for, Benefits-Benefits; business environment, Sanity Isn’t a Business Requirement; compensation in job application process, Compensation; crisis and, Leaving; defined, Defining Mental Disorders; importance of detailed job postings, Application; inclusivity as beneficial to all, Inclusivity for Anyone Helps Everyone; ineffectiveness of common workplace strategies towards, Thoughts and Prayers Aren’t Scalable; interviewing for job, Interviewing; job duties, Job Duties; leaving a job, Leaving; on-call work and, Application; onboarding packets, Onboarding; pro-inclusion patterns/antipatterns, Full-Stack Inclusivity-Mental Disorder Resources; promotion, Promotion; resources, Mental Disorder Resources; training, Training; working conditions, Working Conditions-Working Conditions; workplace inclusivity and, Beyond Burnout-Mental Disorder Resources
mental health, Beyond Burnout-Mental Disorder Resources: crises, Leaving; defined, Beyond Burnout
mental models, Mental Models
mentorship programs, Training
Mercereau, Jonathan, Working with Third Parties Shouldn’t Suck, Working with Third Parties Shouldn’t Suck-Closing Thoughts
Messeri, Eran, Pattern 1: Birth of Automated Testing at Google, Pattern 3: Create a Shared Source Code Repository
metrics, Orienting to a Data-Driven Approach, Setting goals and defining metrics of success, Monitoring, Metrics, and KPIs (see also incident metrics)
Miasnikoŭ, Stas, Do Docs Better, Do Docs Better: Integrating Documentation into the Engineering Workflow-Communicating the Value of Documentation
Michel, Drew, SRE Without SRE, SRE Without SRE: The Spotify Case Study-The Future: Speed at Scale, Safely
microservices: context vs. control at Netflix, Context Versus Control in SRE-Context Versus Control in SRE; current state of microservice networking, Current State of Microservice Networking; service mesh and (see service mesh)
mid-sized organizations (see Soundcloud, SRE at)
migration cost, Project Operating Expense and Abandonment Expense (see also abandonment expense)
migrations, database reliability engineering and, Migration Patterns-Rollback testing
Mineiro, Luis, Replies
Mitchell, Tom, What Do We Mean by Learning?

pages: 370 words: 94,968

The Most Human Human: What Talking With Computers Teaches Us About What It Means to Be Alive
by Brian Christian
Published 1 Mar 2011

“For the millionth time, I did not even remotely x! You’re the one who …” And on and on. A close reading of this dialogue, with MGonz in mind, turns up something interesting, and very telling: each remark after the first is only about the previous remark. The friends’ conversation has become stateless, unanchored from all context, a kind of “Markov chain” of riposte, meta-riposte, meta-meta-riposte. If we can be induced to sink to this level, of course the Turing test can be passed. Once again, the scientific perspective on what types of human behavior are imitable shines incredible light on how we conduct our own, human lives. There’s a sense in which verbal abuse is simply less complex than other forms of conversation.

pages: 311 words: 94,732

The Rapture of the Nerds
by Cory Doctorow and Charles Stross
Published 3 Sep 2012

“That's a lie,” Huw says. “Even if you believe it, it’s still a lie. You aren’t me. We have no common ancestor. You’re synthetic, created out of nothing to look and sound like me, or almost like me, just to discredit and provoke me. Some radical sectarian faction whipped you up out of polygons and Markov chains.” 639,219 studies Huw intently, tip of her tongue resting on her square, even teeth. “It’s remarkable,” she says at length. “Just incredible. To think that we share a common basis. Goodness me, love, you’re practically catatonic with denial, aren’t you? All right, I’ve heard your hypothesis. Now I’d like you to hear mine.

pages: 340 words: 97,723

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity
by Amy Webb
Published 5 Mar 2019

James Andrews, mathematician and professor at Florida State University who specialized in group theory and knot theory. Jean Bartik, mathematician and one of the original programmers for the ENIAC computer. Albert Turner Bharucha-Reid, mathematician and theorist who made significant contributions in Markov chains, probability theory, and statistics. David Blackwell, statistician and mathematician who made significant contributions to game theory, information theory, probability theory, and Bayesian statistics. Mamie Phipps Clark, a PhD and social psychologist whose research focused on self-consciousness.

pages: 268 words: 109,447

The Cultural Logic of Computation
by David Golumbia
Published 31 Mar 2009

[Alan] Newell and [Herbert] Simon, fresh from the 1956 Dartmouth Summer Seminar on artificial intelligence organized by John McCarthy, presented their Logic Theory Machine. Noam Chomsky discussed an early version of transformational-generative (TG) grammar in his “Three Models of Language.” (TG grammars would rapidly replace the Markov-chain model of Shannon and Miller with their more powerful formal-mechanical model. Miller recalls Newell told him “Chomsky was developing exactly the same kind of ideas for language that he and Herb Simon were developing for theorem proving.”) (Edwards 1996, 228–9)3 In our current climate it is remarkable to think of such diverse figures working together on similar problems, but in the ferment of Cambridge in the 1950s we find nearly the entire set of figures and views that would rise to prominence in later decades.

The Art of SEO
by Eric Enge , Stephan Spencer , Jessie Stricchiola and Rand Fishkin
Published 7 Mar 2012

Here are some definitions that are useful for this discussion: Unique content This is content that is written by humans; is completely different from any other combination of letters, symbols, or words on the Web; and has clearly not been manipulated through computer text-processing algorithms (such as Markov chain–employing spam tools). Snippets These are small chunks of content such as quotes that are copied and reused; these are almost never problematic for search engines, especially when included in a larger document with plenty of unique content. Shingles Search engines look at relatively small phrase segments (e.g., five to six words), checking for the presence of the same segments on other pages on the Web.

Creating sites with little useful unique content. There are many techniques for doing this, including: Duplicating pages with minimal or no changes and exposing them to the same search engines under new URLs or domains. Machine-generating content to chosen keyword densities (e.g., using a technique such as Markov chains; this is not recommended). Incorporating keyword-rich but nonsensical gibberish (also known as spamglish) into site content. Creating a low-value site solely for affiliate marketing purposes (see the upcoming subsection on duplicate content for a more complete definition of thin affiliate).
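A minimal sketch of the kind of Markov-chain text generation the excerpt is warning about: a word-level bigram model that records which words follow which, then random-walks the table to emit superficially fluent gibberish. The seed corpus here is a made-up toy, not anything from the book.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def generate(chain, start, length, seed=0):
    """Random-walk the chain from a start word; stop early at a dead end."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
sample = generate(chain, "the", 8)
```

Every emitted word pair occurs somewhere in the source text, which is exactly why this output fools shingle-free duplicate detection while remaining meaningless — and why search engines penalize it.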

pages: 451 words: 103,606

Machine Learning for Hackers
by Drew Conway and John Myles White
Published 10 Feb 2012

The big idea behind this new approach, which we’ll call stochastic optimization, is to move through the range of possible parameters slightly randomly, but making sure to head in directions where your error function tends to go down rather than up. This approach is related to a lot of popular optimization algorithms that you may have heard of, including simulated annealing, genetic algorithms, and Markov chain Monte Carlo (MCMC). The specific algorithm we’ll use is called the Metropolis method; versions of the Metropolis method power a lot of modern machine learning algorithms. To illustrate the Metropolis method, we’ll work through this chapter’s case study: breaking secret codes. The algorithm we’re going to define isn’t a very efficient decryption system and would never be seriously used for production systems, but it’s a very clear example of how to use the Metropolis method.
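The idea in the excerpt can be sketched in a few lines. This is not the book's decryption case study but a toy Metropolis-style minimizer: propose a random local move, always accept improvements, and accept worsening moves with a probability that decays with how much worse they are, so the walk can escape local minima. The target function and all parameters are illustrative assumptions.

```python
import math
import random

def metropolis_minimize(f, x0, step=0.5, iters=5000, temp=1.0, seed=0):
    """Minimize f by a Metropolis random walk; return the best point seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)   # random local proposal
        fc = f(cand)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-(fc - fx) / temp).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

best_x, best_f = metropolis_minimize(lambda v: (v - 3.0) ** 2, x0=0.0)
```

The sequence of accepted points forms a Markov chain whose stationary distribution concentrates on low-error regions, which is the link to the MCMC methods named above.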

A Brief History of Everyone Who Ever Lived
by Adam Rutherford
Published 7 Sep 2016

The stellar Nobel-prize winning embryologist Christiane Nüsslein-Volhard discovered the Toll family of genes in drosophila in the 1980s, and the name was given after she was heard to exclaim, ‘Das ist ja toll!’, which in English means, ‘That’s fantastic!’ 11 There are many ways to do this statistical trick, with different power and reliabilities, but Rasmussen used one called a Bayesian Markov Chain Monte Carlo in software called BEAST 2. We may not be very good at naming human genes, but we make up for it in analytical algorithms. 12 In autumn 2015, a tremendous paper was published in the journal Science, in which the first ancient African genome was retrieved from a man who had died and was buried in a cave in the highlands of Ethiopia.

pages: 354 words: 105,322

The Road to Ruin: The Global Elites' Secret Plan for the Next Financial Crisis
by James Rickards
Published 15 Nov 2016

Complexity and Bayes fit together hand in glove for solving capital markets problems. Capital markets are complex systems nonpareil. Market participants must forecast continually to optimize trading strategies and asset allocations. Forecasting capital markets is treacherous because they do not behave according to the Markovian stochastics widely used on Wall Street. A Markov chain has no memory; capital markets do. Capital markets produce surprises, no different from the butterfly effect identified by Lorenz in 1960. Since 2009 I have achieved superior results using complexity and Bayes to navigate the uncharted waters of systemic risk. A simple application of Bayes’ theorem can provide insights into otherwise secret understandings.

pages: 502 words: 107,510

Natural Language Annotation for Machine Learning
by James Pustejovsky and Amber Stubbs
Published 14 Oct 2012

For task A on the English corpus, it scored (.92, .80, .85, .92, and .65) and (.88, .60, .71, .88, and .59) for precision, recall, F-measure, type, and value, respectively, for each run. For tasks C–F on the English corpus, it scored (.55, .82, .55, and .59) and (.54, .81, .55, and .60). TRIPS and TRIOS (UzZaman and Allen 2010) TRIPS and TRIOS were two different programs submitted by the same team of researchers, though both used combinations of Markov chains and CRF models trained on various syntactic and attribute features. The systems returned the same results for task A: .85 precision, .85 recall, .85 F-measure, .94 type accuracy, and .76 value accuracy. For task B, TRIOS obtained precision, recall, and F-measure values of .80, .74, and .77, while TRIPS scored .55, .88, and .68, respectively.

pages: 412 words: 104,864

Silence on the Wire: A Field Guide to Passive Reconnaissance and Indirect Attacks
by Michal Zalewski
Published 4 Apr 2005

To verify their estimates and demonstrate the issue in practice, the researchers used the Hidden Markov Model and Viterbi algorithm to guess keystrokes. A Markov Model is a method for describing a discrete system in which the next value depends only on its current state, and not on the previous values (Markov chain). The Hidden Markov Model is a variant that provides a method for describing a system for which each internal state generates an observation, but for which the actual state is not known. This model is commonly used in applications such as speech recognition, in which the goal is to obtain pure data (a textual representation of the spoken word) from its specific manifestation (sampled waveform).
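The Viterbi algorithm the researchers used can be sketched on a toy HMM. This is not their keystroke model; the two hidden "keys" and the short/long timing observations below are invented stand-ins, just to show how the most likely hidden-state path is recovered from observations.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        nxt = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability
            p, prev = max(
                (best[r][0] * trans_p[r][s] * emit_p[s][o], r) for r in states
            )
            nxt[s] = (p, best[prev][1] + [s])
        best = nxt
    return max(best.values())[1]

# Hypothetical two-key model: hidden keys 'a'/'b' emit noisy inter-keystroke
# timings, observed only as "short" or "long".
states = ["a", "b"]
start = {"a": 0.5, "b": 0.5}
trans = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}}
emit = {"a": {"short": 0.9, "long": 0.1}, "b": {"short": 0.2, "long": 0.8}}
path = viterbi(["short", "short", "long"], states, start, trans, emit)
```

The real attack scales the same recursion up: states are keys or key pairs, observations are measured timings, and the decoded path is the guessed keystroke sequence.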

pages: 549 words: 116,200

With a Little Help
by Cory Efram Doctorow , Jonathan Coulton and Russell Galen
Published 7 Dec 2010

2852 "You think I'm less sarcastic, more understanding." 2853 "Or you're better at seeming less sarcastic and more understanding." 2854 "I think working on the campaign is making you a better robot," BIGMAC said. 2855 "That was pretty sarcastic." 2856 "Or was it?" 2857 "You're really workin' the old Markov chains today, aren't you? I've got six more interviews lined up for you tomorrow --" 2858 "Saw that, put it in my calendar." BIGMAC read all the Campaign's email, and knew all that I was up to before I did. It was a little hard to get used to. 2859 "And I've got someone from Nature Computation interested in your paper about advising depressed people as a training exercise for machine-learning systems." 2860 "Saw that too." 2861 I sighed.

pages: 588 words: 131,025

The Patient Will See You Now: The Future of Medicine Is in Your Hands
by Eric Topol
Published 6 Jan 2015

This relied upon quickly answering complex questions that would not be amenable to a Google search.30–32 IBM Watson was taught through hundreds of thousands of questions from prior Jeopardy shows, armed with all the information in Wikipedia, and programmed to do predictive modeling. There’s no prediction of the future here, just prediction that IBM Watson has the correct answer. Underlying its predictive capabilities was quite a portfolio of machine learning systems, including Bayesian nets, Markov chains, support vector machine algorithms, and genetic algorithms.33 I won’t go into any more depth; my brain is not smart enough to understand it all, and fortunately it’s not particularly relevant to where we are going here. Another subtype of AI and machine learning,2,20,34–48 known as deep learning, has deep importance to medicine.

Principles of Protocol Design
by Robin Sharp
Published 13 Feb 2008

After a number of attempts, the message will presumably get through, unless of course the system is overloaded or one of the senders has a defect and never stops transmitting.3 Contention protocols offer a form of what is known as statistical multiplexing, where the capacity of the multiplexed service is divided out among the senders in a non-deterministic manner. Their analysis (see, for example, Chapter 4 in [11]) is usually based on the theory of discrete-time Markov chains, which we shall not consider in detail here. In the case of unrestricted contention protocols, this analysis yields the intuitively obvious result that unless the generated traffic (number of new messages generated per unit time) is very moderate, then unrestricted contention is not a very effective method, leading to many collisions and long delays before a given message in fact gets through.
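The intuitively obvious result mentioned above can be checked with a quick simulation instead of the full Markov-chain analysis. In this sketch (a slotted-ALOHA-style assumption, not the book's exact model), frames arrive per slot as a Poisson variate, and a slot succeeds only when exactly one sender transmits in it.

```python
import math
import random

def poisson(rng, lam):
    """Knuth's method for sampling a Poisson variate (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def aloha_throughput(load, slots=200_000, seed=1):
    """Fraction of slots carrying exactly one frame, at the given offered
    load (mean frames per slot); zero frames is idle, two or more collide."""
    rng = random.Random(seed)
    ok = sum(1 for _ in range(slots) if poisson(rng, load) == 1)
    return ok / slots

moderate = aloha_throughput(1.0)   # near the theoretical peak of 1/e
heavy = aloha_throughput(4.0)      # overload: almost every slot collides
```

Throughput peaks around 37% of capacity at a load of one frame per slot and collapses as load grows, matching the chapter's point that unrestricted contention only works at moderate traffic.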

pages: 519 words: 142,646

Track Changes
by Matthew G. Kirschenbaum
Published 1 May 2016

But a work such as Pride and Prejudice and Zombies also seems to brazenly embrace the same anxieties about word processing and the automated production of literature that were forecasted by authors like Stanislaw Lem with his U-Write-It, Fritz Leiber with his wordmills, and Italo Calvino with his OEPHLW. In these dystopian scenarios, literature is produced according to type or template, the masterpieces of yesteryear mere fodder for the Markov chains that algorithmically generate the consumer-profiled best-sellers of tomorrow. Yet the reality is that enterprises like Grahame-Smith’s are notable precisely because of their comparative scarcity—the book and its handful of derivatives stand out because readers are stimulated rather than jaded by the self-conscious novelty of the prospect of reading a literary remix.

pages: 532 words: 141,574

Bleeding Edge: A Novel
by Thomas Pynchon
Published 16 Sep 2013

“Our design precedents happened to be pretty solid, for one thing” According to Justin, DeepArcher’s roots reach back to an anonymous remailer, developed from Finnish technology from the penet.fi days and looking forward to various onion-type forwarding procedures nascent at the time. “What remailers do is pass data packets on from one node to the next with only enough information to tell each link in the chain where the next one is, no more. DeepArcher goes a step further and forgets where it’s been, immediately, forever.” “Kind of like a Markov chain, where the transition matrix keeps resetting itself.” “At random.” “At pseudorandom.” To which the guys have also added designer linkrot to camouflage healthy pathways nobody wants revealed. “It’s really just another maze, only invisible. You’re dowsing for transparent links, each measuring one pixel by one, each link vanishing and relocating as soon as it’s clicked on . . . an invisible self-recoding pathway, no chance of retracing it.”

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
by Ralph Kimball and Margy Ross
Published 30 Jun 2013

Top line refers to the common practice by senior managers of seeing a quick top line review of what has happened in the enterprise over the past 24 hours. Data warehouse and long time series applications: All forms of reporting, ad hoc querying, historical analysis, master data management, large scale temporal dynamics, and Markov chain analysis. Each cache that exists in a given environment is physical and distinct from the other caches. Data moves from the raw source down this highway through ETL processes. There may be multiple paths from the raw source to intermediate caches. For instance, data could go to the real-time cache to drive a zero latency-style user interface, but at the same time be extracted directly into a daily top line cache that would look like a classic operational data store (ODS).

pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology
by Ray Kurzweil
Published 14 Jul 2005

The system is almost as accurate as I would be and much faster. Markov Models. Another method that is good at applying probabilistic networks to complex sequences of information involves Markov models.170 Andrei Andreyevich Markov (1856–1922), a renowned mathematician, established a theory of "Markov chains," which was refined by Norbert Wiener (1894–1964) in 1923. The theory provided a method to evaluate the likelihood that a certain sequence of events would occur. It has been popular, for example, in speech recognition, in which the sequential events are phonemes (parts of speech). The Markov models used in speech recognition code the likelihood that specific patterns of sound are found in each phoneme, how the phonemes influence each other, and likely orders of phonemes.

pages: 778 words: 239,744

Gnomon
by Nick Harkaway
Published 18 Oct 2017

The last digit of each stock value shifts to 4, just for a moment and one after another, running down the alphabetical list. For me this is like witnessing a solar eclipse or seeing Halley’s Comet, which I did when I was very small and plan to do again when I’m old. It is a rare and beautiful mathematical caprice called a Markov chain: an apparently meaningful sequence in a flow of random numbers. This is a particularly pretty one, a wonder of nature requiring a staggering string of coincidences. It looks almost like animation, conveys a sense of movement and of deliberation. The 4 moves back up the line, then hovers around the middle of the list.

The Art of Computer Programming: Fundamental Algorithms
by Donald E. Knuth
Published 1 Jan 1974

Suppose each arc e of G has been assigned a probability p(e), where the probabilities satisfy the conditions 0 ≤ p(e) ≤ 1 and ∑_{init(e)=V_j} p(e) = 1 for 1 ≤ j ≤ n. Consider a random path, which starts at V_1 and subsequently chooses branch e of G with probability p(e), until V_n is reached; the choice of branch taken at each step is to be independent of all previous choices. For example, consider the graph of exercise 2.3.4.1-7, and assign a probability to each of the arcs e_1, e_2, …, e_9. Then the path "Start-A-B-C-A-D-B-C-Stop" is chosen with probability equal to the product of the probabilities of the arcs it traverses. Such random paths are called Markov chains, after the Russian mathematician Andrei A. Markov, who first made extensive studies of stochastic processes of this kind. The situation serves as a model for certain algorithms, although our requirement that each choice must be independent of the others is a very strong assumption. The purpose of this exercise is to analyze the computation time for algorithms of this kind.
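Random paths of the kind described can be simulated directly. The graph and arc probabilities below are illustrative, not the values from Knuth's exercise; the simulation estimates the expected number of arcs traversed before Stop is reached, which for this choice of probabilities works out to 17/3 by first-step analysis.

```python
import random

def random_path(chain, start, stop, rng):
    """Walk from start, choosing each successor by its arc probability,
    until the stop vertex is reached; return the list of vertices visited."""
    path = [start]
    while path[-1] != stop:
        succs, probs = zip(*chain[path[-1]])
        path.append(rng.choices(succs, weights=probs)[0])
    return path

# Hypothetical graph: outgoing arc probabilities sum to 1 at each vertex.
chain = {
    "Start": [("A", 1.0)],
    "A": [("B", 0.5), ("D", 0.5)],
    "B": [("C", 1.0)],
    "C": [("A", 0.25), ("Stop", 0.75)],
    "D": [("B", 1.0)],
}
rng = random.Random(42)
lengths = [len(random_path(chain, "Start", "Stop", rng)) - 1
           for _ in range(10_000)]
avg_steps = sum(lengths) / len(lengths)
```

Averaging over many sampled paths approximates the same expected step count that the exercise derives analytically, which is the model Knuth uses for an algorithm's computation time.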

pages: 1,535 words: 337,071

Networks, Crowds, and Markets: Reasoning About a Highly Connected World
by David Easley and Jon Kleinberg
Published 15 Nov 2010

Atlantic Monthly, 176(1):101–108, July 1945. [89] Vincent Buskens and Arnout van de Rijt. Dynamics of networks if everyone strives for structural holes. American Journal of Sociology, 114(2):371–407, 2009. [90] Samuel R. Buss and Peter Clote. Solving the Fisher-Wright and coalescence problems with a discrete Markov chain analysis. Advances in Applied Probability, 36:1175–1197, 2004. [91] Robert B. Cairns and Beverly D. Cairns. Lifelines and Risks: Pathways of Youth in our Time. Cambridge University Press, 1995. [92] Colin Camerer. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton University Press, 2003.

pages: 1,263 words: 371,402

The Year's Best Science Fiction: Twenty-Sixth Annual Collection
by Gardner Dozois
Published 23 Jun 2009

He flicked between statistical summaries, technical overviews of linguistic structure, and snippets from the millions of conversations the software had logged. Food, weather, sex, death. As human dialogue the translations would have seemed utterly banal, but in context they were riveting. These were not chatterbots blindly following Markov chains, designed to impress the judges in a Turing test. The Phites were discussing matters by which they genuinely lived and died. When Daniel brought up a page of conversational topics in alphabetical order, his eyes were caught by the single entry under the letter G. Grief. He tapped the link, and spent a few minutes reading through samples, illustrating the appearance of the concept following the death of a child, a parent, a friend.