backpropagation


description: an algorithm for training artificial neural networks that adjusts the weights by propagating error gradients backwards through the network's layers

67 results

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

by Pedro Domingos  · 21 Sep 2015  · 396pp  · 117,149 words

learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, and the analogizers’ is the support vector machine. In practice, however, each

the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly. The connectionists’ master algorithm is backpropagation, which compares a system’s output with the desired one and then successively changes the connections in layer after layer of neurons so as

we need to do is simulate it on the computer. The key problem that evolutionaries solve is learning structure: not just adjusting parameters, like backpropagation does, but creating the brain that those adjustments can then fine-tune. The evolutionaries’ master algorithm is genetic programming, which mates and evolves

inverse deduction is purely qualitative; we need to learn not just who interacts with whom, but how much, and backpropagation can do that. Nevertheless, both inverse deduction and backpropagation would be lost in space without some basic structure on which to hang the interactions and parameters they find, and

!” Hinton’s latest passion is deep learning, which we’ll meet later in this chapter. He was also involved in the development of backpropagation, an even better algorithm than Boltzmann machines for solving the credit-assignment problem that we’ll look at next. Boltzmann machines could solve the

sees more and more images of your grandmother and other people, the weights gradually converge to values that let it discriminate between the two. Backpropagation, as this algorithm is known, is phenomenally more powerful than the perceptron algorithm. A single neuron could only learn straight lines. Given enough

hidden neurons, a multilayer perceptron, as it’s called, can represent arbitrarily convoluted frontiers. This makes backpropagation—or simply backprop—the connectionists’ master algorithm. Backprop is an instance of a strategy that is very common in both nature and technology: if you’re in a hurry

is no simple linear relationship between these quantities. Rather, the cell maintains its stability through interlocking feedback loops, leading to very complex behavior. Backpropagation is well suited to this problem because of its ability to efficiently learn nonlinear functions. If we had a complete map of the cell

learn a complete model of a cell’s metabolic networks by a combination of structure search, with or without crossover, and parameter learning via backpropagation, but there are too many bad local optima to get stuck in. We need to reason with larger chunks, assembling and reassembling them

the tables have no holes or errors. Combining connectionism and evolutionism was fairly easy: just evolve the network structure and learn the parameters by backpropagation. But unifying logic and probability is a much harder problem. Attempts to do it go all the way back to Leibniz, who was

such as a multilayer perceptron. The neural network’s job is now to predict the value of a state, and the error signal for backpropagation is the difference between the predicted and observed values. There’s a problem, however. In supervised learning the target value for a state
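
A minimal sketch of that error signal, assuming a generic value-prediction function and an observed value (all names here are illustrative, not from the book):

```python
# Sketch: the error signal for backpropagation when a network predicts the
# value of a state is simply (predicted value - observed value).
def value_error(predict, state, observed_value):
    predicted_value = predict(state)          # forward pass through the network
    return predicted_value - observed_value   # backprop would drive this toward zero

# Usage with a trivial stand-in "network" that always predicts 0.5:
print(value_error(lambda s: 0.5, state=None, observed_value=1.0))  # -> -0.5
```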

the master equation we saw in the previous section.) The connectionists’ master algorithm is backpropagation, which they use to figure out which neurons are responsible for which errors and adjust their weights accordingly. Backpropagation is a form of gradient descent, which Alchemy uses to optimize the weights of a

re helping to bring about. You’ve met the five tribes of machine learning and their master algorithms: symbolists and inverse deduction; connectionists and backpropagation; evolutionaries and genetic algorithms; Bayesians and probabilistic inference; analogizers and support vector machines. And because you’ve traveled over a vast territory, negotiated

perceptrons; Hopfield on Hopfield networks; Ackley, Hinton, and Sejnowski on Boltzmann machines; Sejnowski and Rosenberg on NETtalk; and Rumelhart, Hinton, and Williams on backpropagation. “Efficient backprop,”* by Yann LeCun, Léon Bottou, Genevieve Orr, and Klaus-Robert Müller, in Neural Networks: Tricks of the Trade, edited by Genevieve Orr and

, 116–118 Automation, machine learning and, 10 Automaton, 123 The Average American (O’Keefe), 206 Average member, 206 Axon, 95 Babbage, Charles, 28 Backpropagation (backprop), 52, 104, 107–111, 115, 302 Alchemy and, 252 genetic algorithms vs., 128 neural networks and, 112–114 reinforcement learning and, 222 Bagging, 238

(NIPS), 170, 172 Conjunctive concepts, 65–68, 74 Connectionists/connectionism, 51, 52, 54, 93–119 Alchemy and, 252 autoencoder and, 116–118 backpropagation and, 52, 107–111 Boltzmann machine and, 103–104 cell model, 114–115 connectomics, 118–119 deep learning and, 115 further reading, 302–303

Generalizations, choosing, 60, 61 Generative model, Bayesian network as, 159 Gene regulation, Bayesian networks and, 159 Genetic algorithms, 122–128 Alchemy and, 252 backpropagation vs., 128 building blocks and, 128–129, 134 schemas, 129 survival of the fittest programs, 131–134 The Genetical Theory of Natural Selection (Fisher),

equation, 30 Web 2.0, 21 Web advertising, 10–11, 160, 305 Weighted k-nearest-neighbor algorithm, 183–185, 190 Weights attribute, 189 backpropagation and, 111 Master Algorithm and, 242 meta-learning and, 237–238 perceptron’s, 97–99 relational learning and, 229 of support vectors, 192–193

The Deep Learning Revolution (The MIT Press)

by Terrence J. Sejnowski  · 27 Sep 2018

that no learning algorithm for multilayer networks was possible. 1986—David Rumelhart and Geoffrey Hinton publish “Learning Internal Representations by Error-Propagation,” which introduced the “backprop” learning algorithm now used for deep learning. 1988—Richard Sutton publishes “Learning to Predict by the Methods of Temporal Differences” in Machine Learning. Temporal difference

mathematical concept in machine learning: for many problems, a cost function can be found for which the solution is the state. Box 8.1, Error Backpropagation: Inputs to the backprop network are propagated feedforward: in the diagram above, the inputs on the left propagate forward through the connections (arrows) to

the fastest way down a slope. Rumelhart discovered how to calculate the gradient for each weight in the network by a process called the “backpropagation of errors,” or “backprop” for short (box 8.1). Starting on the output layer, where the error is known, it is easy to calculate the gradient on
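
As a rough illustration of that layer-by-layer computation (a minimal sketch with made-up sizes and a squared-error cost, not Rumelhart's original notation):

```python
import numpy as np

# Minimal sketch: one forward and one backward pass through a tiny
# one-hidden-layer network with sigmoid units and squared-error cost.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # input vector
t = np.array([1.0])               # target output
W1 = rng.normal(size=(4, 3))      # input -> hidden weights
W2 = rng.normal(size=(1, 4))      # hidden -> output weights

# Forward pass
h = sigmoid(W1 @ x)               # hidden activations
y = sigmoid(W2 @ h)               # network output

# Backward pass: start where the error is known (the output layer) ...
delta_out = (y - t) * y * (1 - y)         # gradient w.r.t. output pre-activation
grad_W2 = np.outer(delta_out, h)          # gradient for output-layer weights

# ... then propagate the error back to the hidden layer.
delta_hidden = (W2.T @ delta_out) * h * (1 - h)
grad_W1 = np.outer(delta_hidden, x)       # gradient for hidden-layer weights

# A gradient-descent step subtracts a small multiple of each gradient.
lr = 0.1
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```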

1968 science fiction film 2001: A Space Odyssey, to pursue artificial intelligence when he was nine years old. He had independently discovered a version of backpropagation for his doctoral dissertation in 1987,10 after which he moved to Toronto, to work with Geoffrey. He later moved to AT&T Bell Laboratories

an adjustable filter that automatically reduces noise. algorithm A step-by-step recipe that you follow to achieve a goal, not unlike baking a cake. backprop (backpropagation of errors) Learning algorithm that optimizes a neural network by gradient descent to minimize a cost function and improve performance. Bayes’s rule Formula that

at UCSD was founded by Don Norman, an expert on human factors and ergonomics, and had an eclectic faculty. 2. The mathematics used in the backpropagation learning algorithm had been around for some time, going back to the 1960s in the control theory literature, but it was the application to multilayer

Its Discontents: The Legacy of the Past Tense Debate,” Cognitive Science 38, no. 6 (2014): 1190–1228. 13. D. Zipser and R. A. Andersen, “A Back-Propagation Programmed Network That Simulates Response Properties of a Subset of Posterior Parietal Neurons,” Nature 331, no. 6158 (1988): 679–684. This network transformed the position

being hit, 148 Backgammon, 34, 144f, 148. See also TD-Gammon backgammon board, 144f learning how to play, 143–146, 148–149 Backpropagation (backprop) learning algorithm, 114f, 217, 299n2 Backpropagation of errors (backprop), 111b, 112, 118, 148 Bag-of-words model, 251 Ballard, Dana H., 96, 297nn11–12, 314n8 Baltimore, David A., 307n5 Bar

, Roger, 312n1 Blind source separation problem, 81, 82f, 83f Blocks World, 27 Boahen, Kwabena A., 313n14 Boltzmann, Ludwig, 99 Boltzmann learning, unsupervised, 106 Boltzmann machine backpropagation of errors contrasted with, 112 Charles Rosenberg on, 112 criticisms of, 106 diagram, 98b at equilibrium, 99 Geoffrey Hinton and, 49, 79, 104, 105f, 106

minima, 95b, 96, 96f. See also Attractor states; Global energy minimum Engelbart, Douglas C., 289n40 Enzymes, 265 Epigenetics, 107 Ermentrout, G. Bard, 297n6 Error backpropagation. See Backpropagation of errors Escherichia coli (E. coli), 266 scanning electron micrograph of, 266f Everest, George, 50f Evolution, 267. See also Orgel’s second rule evolutionary origins

Why Machines Learn: The Elegant Math Behind Modern AI

by Anil Ananthaswamy  · 15 Jul 2024  · 416pp  · 118,522 words

That was in 1982. Then, in 1986, David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams published a pathbreaking paper on an algorithm called backpropagation. (The idea itself predated their work, but their paper put it firmly on the map.) The algorithm, which showed how to train multi-layer perceptrons

fifteen more years before computers became powerful enough to handle the computational demands of artificial neural networks, but the “backprop” paper set a slow-burning revolution in motion. The precursor to the backpropagation algorithm, with its emphasis on calculus, however, was taking shape at about the same time as Rosenblatt was showing

it.” The line connecting ADALINE to modern neural networks (which have multiple layers and are trained using an algorithm called backpropagation) is clear. “The LMS algorithm is the foundation of backprop. And backprop is the foundation of AI,” Widrow told me. “In other words, if you trace it back, this whole field

of AI right now, [it] all starts with ADALINE.” In terms of the backpropagation algorithm, this is a fair assessment. Of course,

were beginning to probe how to train multi-layer perceptrons (or multi-layer neural networks). The outline of an algorithm that would soon be called backpropagation, or backprop, was taking shape. But the computing power in those days wasn’t up to the task. “Nobody could do

one knew yet how to efficiently train them. By 1986, that, too, had changed, irrevocably, with the publication of the first detailed exposition of the backpropagation algorithm. And within a few years of that, another paper, by a mathematician named George Cybenko, further inflamed passions about neural networks: Cybenko showed that

kind of multi-layer network, given enough neurons, could approximate any function in terms of transforming an input into a desired output. Before we tackle backpropagation, we’ll jump ahead to one of the classic findings about neural networks, the universal approximation theorem. MATHEMATICAL CODA CONVERGENCE PROOF / HOPFIELD NETWORK Theorem:

or a complex speech waveform; or recognizes images; or even generates new images. The theorem is called the universal approximation theorem. The implication of the backpropagation algorithm, detailed in the 1986 Rumelhart, Hinton, and Williams paper, was that multilayer neural networks could now be trained, while one kept in mind practical

is characterized by more than one weight matrix. By the mid- to late 1980s, researchers were successfully training some deep neural networks thanks to the backpropagation algorithm (which we’ll come to in the next chapter); the algorithm could deal with hidden layers. “But, at the time, there was no

before we can appreciate such mysteries, we need to examine the algorithm that allowed researchers to start training deep neural networks in the first place: backpropagation. CHAPTER 10 The Algorithm that Put Paid to a Persistent Myth It’s AI folklore that Minsky and Papert killed research on neural networks, starting

Rumelhart would point out the simpler solution. Their combined effort, with help from computer scientist Ronald Williams, would lead to the modern version of the backpropagation algorithm. But we are jumping ahead. Hinton’s path from Edinburgh to San Diego, to work with Rumelhart, wasn’t straightforward. Hinton handed in his

the mathematical portion of their book Rosenblatt’s chapters on multi-layer machines and his proof of convergence of a probabilistic learning algorithm based on back propagation of errors,” write professor of philosophy Hubert L. Dreyfus and his brother, Stuart E. Dreyfus, professor of industrial engineering and operations research, both at

stochastic gradient descent to train multi-layer perceptrons with hidden units; and Seppo Linnainmaa, in his 1970 master’s thesis, developed the code for efficient backpropagation. In 1974, Paul Werbos submitted his Ph.D. thesis at Harvard. Titled Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences,

it came closest to articulating the modern version of the backpropagation algorithm. The thesis wasn’t publicized much; nor was it aimed at researchers in neural networks. Despite such developments, none of them made their mark

. First Werbos and then Rumelhart, Hinton, and Williams, independently, developed an elegant technique for calculating the partial derivatives using the chain rule. THE BACKPROPAGATION ALGORITHM To understand “backpropagation” (the term introduced by Rosenblatt), we’ll turn to the simplest possible one-hidden-layer network, with one hidden neuron. During training, for
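
For the simplest case the passage describes, one input, one hidden neuron, and one output, the chain-rule structure of the gradients (written in generic notation, not the book's) is:

```latex
% One input x, one hidden neuron h, one output y, squared-error loss E:
h = \sigma(w_1 x), \qquad y = \sigma(w_2 h), \qquad E = \tfrac{1}{2}(y - t)^2
% Chain rule, computed from the output backwards:
\frac{\partial E}{\partial w_2} = (y - t)\,\sigma'(w_2 h)\,h
\frac{\partial E}{\partial w_1} = (y - t)\,\sigma'(w_2 h)\,w_2\,\sigma'(w_1 x)\,x
```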

computations, but we also need to remember the old weights. (An aside: There’s a very important and interesting question about whether biological brains do backpropagation. The algorithm is considered biologically implausible, precisely because it needs to store the entire weight matrix used during the forward pass; no one knows how

shown how this algorithm would work by creating a table of the intermediate operations leading to the final result. He wrote about this procedure of backpropagation, “In general, the procedure…allows us to calculate the derivatives backwards down any ordered table of operations, so long as the operations correspond to

the fact that a neural network with hidden layers could approximate any function—which it could, given enough neurons. “We were the group that used backpropagation to develop interesting representations,” said Hinton, who is now at the University of Toronto. And therein lies the import of neural networks. The algorithms we

(More hidden layer neurons would enable a smoother decision boundary.): Rumelhart, Hinton, and Williams emphasized this aspect in their paper on backpropagation, the title of which read, “Learning Representations by Back-propagating Errors.” The abstract of their paper states, “As a result of the weight adjustments, internal ‘hidden’ units which are not part

the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure.” Of course, publishing the paper—it’s barely three pages long—involved laying some groundwork

might be referees,” Hinton told me. One of them was Stuart Sutherland, an experimental psychologist at the University of Sussex. Hinton described to Sutherland how backpropagation allowed a neural network to learn representations. “It took a while to explain to him, but then he really got it,” Hinton said. The canvassing

the next chapter, when we tackle image recognition, the application that brought neural networks fame. Even as Rumelhart, Hinton, and Williams were working on their backpropagation paper, a young student in Paris had independently developed an algorithm that achieved similar results. A colleague told Hinton that “there is a kid in

Using the chain rule, the derivative factors into two parts whose product gives the result. QED. GENERALIZATION OF THE BACKPROPAGATION ALGORITHM Let’s start with an input vector, x. Say x = [x1, x2]. Take the first hidden layer of a neural network. Let’s

problem, using a neural network architecture that became one of his signature contributions to AI: the convolutional neural network. The CNN was trained using the backpropagation algorithm, unlike the neocognitron. A few years after LeCun’s paper was published, he met Fukushima. “He told me that when he saw our

The algorithm could then calculate an error for each unit and the requisite gradient to perform an update. Under special conditions, the algorithm behaves like backpropagation. While getting his Ph.D., LeCun began thinking about neural networks for invariant image recognition (of the kind we just saw). He presented a paper

a neural network to learn these kernels; after all, the elements of each kernel matrix are the weights of individual neurons. Training a network using backpropagation to do some task would, in essence, help the network find the appropriate kernels. We need to understand one more commonly used operation in

activation function is one such decision. The only condition is that the activation function should be differentiable, or at least approximately so, to enable the backpropagation of gradients. These hand-chosen parameters, including the size and number of kernel filters, the size and number of max pooling filters, the number of

constitute so-called hyperparameters. Fine-tuning, or finding the right values for, the hyperparameters is an art unto itself. Crucially, these are not learned via backpropagation. LeCun’s LeNet was somewhat more complicated than our example, but not overly so; he made it work. Also, it was a deep neural

called long short-term memory, or LSTM, proposed in 1997 by Jürgen Schmidhuber, whom we met in previous chapters, and his colleague Sepp Hochreiter.) The backpropagation algorithm is the workhorse for training neural networks, particularly feedforward networks. The algorithm can also be used to train recurrent networks, but we won’t

are others. Different activation functions lead neurons and the networks they constitute to behave differently; most important, these functions must be differentiable in order for backpropagation to work. (As pointed out earlier, there are activation functions that are not differentiable over their entire domain, but they can still be used,
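
A small illustration of that caveat, taking ReLU as a common example (not one named in this excerpt): the function is not differentiable at zero, but a conventional value for the gradient there is enough for backpropagation in practice.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def relu_grad(z):
    # ReLU is not differentiable at z == 0; by convention we return 0 there
    # (any value in [0, 1] would do), which suffices for backpropagation.
    return 1.0 if z > 0 else 0.0
```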

17. See also biological neurons associative memory, 244–45, 256 AT&T Bell Labs, 206, 222–23, 240, 360 Avati, Anand, 186 B backpropagating error correction procedure, 305 backpropagation algorithm ADALINE and, 94 chain rule, 331 convolutional neural network and, 356 end game for, 308–11 generalization of, 342–45 history of

354–55 function approximator, 417 G Gamma function, 172–73 Gauss, Carl Friedrich, 2 Gaussian distribution, 2, 117–19 “Gender Shades” (Buolamwini), 421 generalization of backpropagation algorithm, 342–45 measure theory and, 166 overfitting vs., 162 rethinking, 394–95 generalization error, 392–93, 405 generative AI, 419 Gibson, James, 19 Goldilocks

images, convolutional neural networks and, 374–75 Hilbert, David, 180, 236 Hilbert spaces, 236 Hill, Alison, 124 hill climbing algorithm, 309–10 Hinton, Geoffrey on backpropagation algorithm, 56, 278 brain research, 302–3, 425 influences of, 303 LeCun collaboration, 340–41, 359 on LeNet, 374 Microsoft and, 376–77 on Minsky

The Road to Conscious Machines

by Michael Wooldridge  · 2 Nov 2018  · 346pp  · 97,890 words

what the ‘weights’ of the connections between neurons should be. And PDP provided a solution to this problem in the form of an algorithm called backpropagation, more commonly referred to as backprop – probably the single most important technique in the field of neural nets. As is often the case in science

the network. (Imagine that the network has been shown a picture of a cat, and the output layer has classified it as a dog.) The backprop algorithm propagates the error backwards through the network through each preceding layer (hence the name – backward propagation). It does this by first computing a landscape

meetings held in Asilomar, California, in 2015 and 2017. axon The component part of a neuron which connects it with other neurons. See also synapse. backprop/backpropagation The most important algorithm for training neural nets. backward chaining In knowledge-based systems, the idea that we start with a goal that we are

demonstrate the components of intelligent behaviour, in the hope they can later be integrated. gradient descent A technique used when training neural nets. See also backpropagation. Grand Challenge A competition for driverless cars, organized by US military funding agency DARPA, which led to the triumph of the robot named STANLEY in

drones 282–4 Autonomous Vehicle Disengagement Reports 231 autonomous vehicles see driverless cars autonomous weapons 281–7 autonomy levels 227–8 Autopilot 228–9 B backprop/backpropagation 182–3 backward chaining 94 Bayes nets 158 Bayes’ Theorem 155–8, 365–7 Bayesian networks 158 behavioural AI 132–7 beliefs 108–10 bias

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again

by Eric Topol  · 1 Jan 2019  · 424pp  · 114,905 words

Recurrent Neural Network—for tasks that involve sequential inputs, like speech or language, this neural network processes an input sequence one element at a time Backpropagation—an algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation

(DNN) were gaining considerable interest, and the field came back to life. A seminal Nature paper in 1986 by David Rumelhart and Geoffrey Hinton on backpropagation provided an algorithmic method for automatic error correction in neural networks and reignited interest in the field.15 It turned out this was the heart

, the layers are not designed by humans; indeed, they are hidden from the human users, and they are adjusted by techniques like Geoff Hinton’s backpropagation as a DNN interacts with the data. We’ll use an example of a machine being trained to read chest X-rays. Thousands of chest

, the father of deep learning, has even called the entire methodology into question.8 Although he invented backpropagation, the method for error correction in neural networks, he recently said he had become “deeply suspicious” of backprop, saying his view had become that we should “throw it all away and start again.”9

out the mechanics by which the brain works, but by giving us the conceptual tools to understand how it works. In Chapter 4 I reviewed backpropagation, the way neural networks learn by comparing their output with the desired output and adjusting in reverse order of execution. That critical concept wasn’t

thought to be biologically plausible. Recent work has actually borne out the brain’s way of using backpropagation to implement algorithms.53 Similarly, most neuroscientists thought biological neural networks, as compared with artificial neural networks, only do supervised learning. But that turns out

. 8. Somers, J., “Is AI Riding a One-Trick Pony?,” MIT Technology Review. 2017. 9. Perez, C. E., “Why We Should Be Deeply Suspicious of BackPropagation,” Medium. 2017. 10. Marcus, Deep Learning. 11. Hinton, G., S. Sabour, and N. Frosst, Matrix Capsules with EM Routing. 2018. ICLR. Simonite, T., “Google’s

, 229 Auris Health, 161 autism, genomics and, 211 automated science, 229–231 availability bias, 46 Avati, Anand, 187 Awdish, Rana, 306–307 Babylon Health, 265 backpropagation, 70 (table) Hinton and, 72, 77, 93 neuroscience and, 223 Baicker, Katherine, 193 Bayes’s theorem, 34, 43 Bayesian network, 9 Bejnordi, Babak, 127 BenevolentAI

, 227 Heyman, Jared, 53 high blood pressure, misdiagnosis and, 28–29 Hill, Austin Bradford, 143 Hinton, Geoffrey, 74–75, 203 AI, liabilities of, on, 93 backpropagation and, 72, 77, 93 capsule networks and, 93 neuroscience and, 224–225 radiology and, 114–115 Hippocrates, 235 Holter monitor, ECGs and, 152 honest signals

, 71–72 structured data and, 90 See also deep neural networks (DNNs) NeuroLex Diagnostics, 169 neurology, physical exams in, 300–301 neuromorphic chips, 227 neuroscience backpropagation and, 223 biohybrid computers and, 227 DeepMind and, 222–223 grid cells and, 222–223 Hinton and, 224–225 image recognition and, 227–228 machine

Architects of Intelligence

by Martin Ford  · 16 Nov 2018  · 586pp  · 186,548 words

readers can translate as simply “stuff under the deep learning hood.” Opening the hood and delving into the details of these terms is entirely optional: BACKPROPAGATION (or BACKPROP) is the learning algorithm used in deep learning systems. As a neural network is trained (see supervised learning below), information propagates back through the

individual neurons. The result is that the entire network gradually homes in on the correct answer. Geoff Hinton co-authored the seminal academic paper on backpropagation in 1986. He explains backprop further in his interview. An even more obscure term is GRADIENT DESCENT. This refers to the specific mathematical technique that the

backpropagation algorithm uses to reduce the error as the network is trained. You may also run into terms that refer to various types, or configurations, of

good enough to be incredibly useful in many applications. MARTIN FORD: Is it true that the thing that has really made deep learning possible is backpropagation? The idea that you can send the error information back through the layers, and adjust each layer based on the final outcome. YOSHUA BENGIO: Indeed

, backpropagation has been at the heart of the success of deep learning in recent years. It is a method to do credit assignment, that is, to

figure out how internal neurons should change to make the bigger network behave properly. Backpropagation, at least in the context of neural networks, was discovered in the early 1980s, at the time when I started my own work. Yann LeCun

mechanisms, memory, and the ability to not just classify but also generate images. MARTIN FORD: Do we know if the brain does something similar to backpropagation? YOSHUA BENGIO: That’s a good question. Neural nets are not trying to imitate the brain, but they are inspired by some of its computational

for the Advancement of Artificial Intelligence and the Association of Computing Machinery. Chapter 4. GEOFFREY HINTON In the past when AI has been overhyped—including backpropagation in the 1980s—people were expecting it to do great things, and it didn’t actually do things as great as they hoped. Today, it

Hinton is sometimes known as the Godfather of Deep Learning, and he has been the driving force behind some of its key technologies, such as backpropagation, Boltzmann machines, and the Capsules neural network. In addition to his roles at Google and the University of Toronto, he is also Chief Scientific Advisor

of the Vector Institute for Artificial Intelligence. MARTIN FORD: You’re most famous for working on the backpropagation algorithm. Could you explain what backpropagation is? GEOFFREY HINTON: The best way to explain it is by explaining what it isn’t. When most people think about neural

, with each weight having to be updated multiple times. It is an incredibly slow algorithm, but it works, and it’ll do whatever you want. Backpropagation is basically a way of achieving the same thing. It’s a way of tinkering with the weights so that the network does what you

. It’s faster by a factor of how many weights there are in the network. If you’ve got a network with a billion weights, backpropagation is going to be a billion times faster than the dumb algorithm. The dumb algorithm works by having you adjust one of the weights slightly

. You have control over that whole process because it’s all going on inside the neural net; you know all the weights that are involved. Backpropagation makes use of all that by sending information backward through the net. Using the fact that it knows all the weights, it can compute in

a little bit bigger or smaller to improve the output. The difference is that in evolution, you measure the effect of a change, and in backpropagation, you compute what the effect would be of making a change, and you can do that for all the weights at once with no interference
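
A minimal sketch of the contrast Hinton is drawing, using a simple linear model and squared error (names and numbers are illustrative): the “dumb” method evaluates the loss once per perturbed weight, while the calculus-based route yields every partial derivative from a single pass.

```python
import numpy as np

# The "dumb algorithm": perturb one weight at a time and measure the effect.
def finite_difference_grad(loss, w, eps=1e-6):
    grad = np.zeros_like(w)
    for i in range(w.size):                 # one loss evaluation per weight
        w_plus = w.copy()
        w_plus[i] += eps
        grad[i] = (loss(w_plus) - loss(w)) / eps
    return grad

# Backprop-style: compute all partial derivatives at once with calculus.
def analytic_grad(x, t, w):
    y = x @ w                               # forward pass of a linear model
    return (y - t) * x                      # dL/dw for L = 0.5*(y - t)^2

x = np.array([0.5, -1.0, 2.0])
t = 1.0
w = np.array([0.1, 0.2, 0.3])
loss = lambda w_: 0.5 * (x @ w_ - t) ** 2

print(finite_difference_grad(loss, w))      # ~ the same numbers,
print(analytic_grad(x, t, w))               # but one pass instead of one per weight
```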

bit better. You still need to do the process a number of times, but it’s much faster than the evolutionary approach. MARTIN FORD: The backpropagation algorithm was originally created by David Rumelhart, correct, and you took that work forward? GEOFFREY HINTON: Lots of different people invented different versions of

backpropagation before David Rumelhart. They were mainly independent inventions, and it’s something I feel I’ve got too much credit for. I’ve seen things

in the press that say I invented backpropagation, and that’s completely wrong. It’s one of these rare cases when an academic feels he’s got too much credit for something! My

the record straight on that. In 1981, I was a postdoc in San Diego, California and David Rumelhart came up with the basic idea of backpropagation, so it’s his invention. Myself and Ronald Williams worked with him on formulating it properly. We got it working, but we didn’t do

I think of as a much more interesting idea, even though it doesn’t work as well. Then in 1984, I went back and tried backpropagation again so I could compare it with the Boltzmann machine, and discovered it actually worked much better, so I started communicating with David Rumelhart again

learn these feature vectors, and it was learning distributed representations of words. We submitted a paper to Nature in 1986 that had this example of backpropagation learning distributed features of words, and I talked to one of the referees of the paper, and that was what got him really excited about

learning algorithm that could learn representations of things was a big breakthrough. My contribution was not discovering the backpropagation algorithm, that was something Rumelhart had pretty much figured out, it was showing that backpropagation would learn these distributed representations, and that was what was interesting to psychologists, and eventually, to AI people

the neural network was pretty good at that and that it would discover these distributed representations of words. It made a big impact because the backpropagation algorithm could learn representations and you didn’t have to put them in by hand. People like Yann LeCun had been doing that in computer

vision for a while. He was showing that backpropagation would learn good filters for processing visual input in order to make good decisions, and that was a bit more obvious because we knew the

brain did things like that. The fact that backpropagation would learn distributed representations that captured the meanings and the syntax of words was a big breakthrough. MARTIN FORD: Is it correct to say that

that, but you also need to make a distinction between AI and machine learning on the one hand, and psychology on the other hand. Once backpropagation became popular in 1986, a lot of psychologists got interested in it, and they didn’t really lose their interest in it, they kept believing

amazing, but actually, it was just pretty good. In the early 1990s, other machine learning methods on small datasets turned out to work better than backpropagation and required fewer things to be fiddled with to get them to work well. In particular, something called the support vector machine did better at

learn multiple layers of hidden representations. Each layer would be a whole bunch of feature detectors that represent in a particular way. The idea of backpropagation was that you’d learn lots of layers, and then you’d be able to do amazing things, but we had great difficulty learning more

conventional AI is just wrong. MARTIN FORD: You gave an interview toward the end of 2017 where you said that you were suspicious of the backpropagation algorithm and that it needed to be thrown out and we needed to start from scratch. (https://www.axios.com/artificial-intelligence-pioneer-says-we

the context of the conversation wasn’t properly reported. I was talking about trying to understand the brain, and I was raising the issue that backpropagation may not be the right way to understand the brain. We don’t know for sure, but there are some reasons now for believing that

the brain might not use backpropagation. I said that if the brain doesn’t use backpropagation, then whatever the brain is using would be an interesting candidate for artificial systems. I didn’t at all mean

that we should throw out backpropagation. Backpropagation is the mainstay of all the deep learning that works, and I don’t think we should get rid of it. MARTIN FORD: Presumably, it

to be all sorts of ways of improving it, and there may well be other algorithms that are not backpropagation that also work, but I don’t think we should stop doing backpropagation. That would be crazy. MARTIN FORD: How did you become interested in artificial intelligence? What was the path that

dead end, or do you think that neural networks are the future of AI? GEOFFREY HINTON: In the past when AI has been overhyped—including backpropagation in the 1980s—people were expecting it to do great things, and it didn’t actually do things as great as they hoped. Today, it

at Google and Chief Scientific Adviser of the Vector Institute for Artificial Intelligence. Geoff was one of the researchers who introduced the backpropagation algorithm and the first to use backpropagation for learning word embeddings. His other contributions to neural network research include Boltzmann machines, distributed representations, time-delay neural nets, mixtures of

and an explosion in the amount of training data available? YANN LECUN: Yes, but it was more deliberate than that. With the emergence of the backpropagation algorithm in 1986-87, people were able to train neural nets with multiple layers, which was something that the old models didn’t do. This

was a community of people around the world who were working on neural nets, and I connected with them and ended up discovering things like backpropagation in parallel with people like David Rumelhart and Geoffrey Hinton. MARTIN FORD: So, in the early 1980s there was a lot of research in this

doesn’t say “horse,” you tell it that it’s wrong and here is the answer that it should have said. Then by using the backpropagation algorithm, it adjusts all the weights of all the connections in the network so that next time you show the same image of a horse

to be very, very uniform all over, whether you’re looking at the visual or prefrontal cortex. MARTIN FORD: Does the brain use something like backpropagation? YANN LECUN: We don’t really know. There are more fundamental questions than that, though. Most of the learning algorithms that people have come up

way as to improve this objective function? We don’t know that. If it estimates that gradient, does it do it by some form of backpropagation? It’s probably not backpropagation as we know it, but it could be a form of approximation of gradient estimation that is very similar to

the future? Or is there another thing out there that’s completely different, where we’re going to end up throwing away deep learning and back propagation and all of that, and have something entirely new? FEI-FEI LI: If you look at human civilization, the path of scientific progress is always

generation of AI. AI is a broad category, though, and I think when people discuss AI, what they really mean is the specific toolset of backpropagation, supervised learning, and neural networks. That is the most common piece of deep learning that people are working on right now. Of course, deep learning

is limited. Just because we invented electricity as a utility, it didn’t suddenly solve all of the problems of humanity. In the same way, backpropagation will not solve all the problems of humanity, but it is turning out to be incredibly valuable, and we’re nowhere near done building out

all the things we could do with neural networks trained by backpropagation. We’re just in the early phases of figuring out the implications of even the current generation of technology. Sometimes, when I’m giving a

at transfer or multitask learning; we need to figure out how to use unlabeled data better. So yes, there are a lot of things that backpropagation doesn’t do well, and again causality is one of them. When I look at the amount of high value projects being created, I don

there a visiting researcher from the University of Toronto got me involved in a project on neural networks. That’s when I learned about Rumelhart’s backpropagation and the use of logistic sigmoid functions in neural network algorithms. Fast forward, I did well enough to get a Rhodes scholarship to go to

because one thing happened and then another thing happened, it’s just going to get better and better. For deep learning, the fundamental algorithm of backpropagation was developed in the 1980s, and those people eventually got it to work fantastically after 30 years of work. It was largely written off in

were also written off at the same time. No one predicted which one out of those 100 things would pop. It happened to be that backpropagation came together with a few extra things, such as clamping, more layers, and a lot more computation, and provided something great. You could never have

predicted that backpropagation and not one of those 99 other things were going to pop through. It was by no means inevitable. Deep learning has had great success

forward too, but something will come along to replace it. MARTIN FORD: When you say deep learning, do you mean by that neural networks using backpropagation? RODNEY BROOKS: Yes, but with lots of layers. MARTIN FORD: Maybe then the next thing will still be neural networks but with a different algorithm

deep learning broadly as any approach using sophisticated neural networks with lots of layers, rather than using a very technical definition involving specific algorithms like backpropagation or gradient descent. JOSH TENENBAUM: To me, the idea of using neural networks with lots of layers is also just one tool in the toolkit

Genius Makers: The Mavericks Who Brought A. I. To Google, Facebook, and the World

by Cade Metz  · 15 Mar 2021  · 414pp  · 109,622 words

new”—that scientists should never give up on an idea unless someone had proven it wouldn’t work. Twenty years earlier, Rosenblatt had proven that backpropagation wouldn’t work, so Hinton gave up on it. Then Rumelhart made this small suggestion. Over the next several weeks, the two men got to

actually recognize patterns in images. These were simple images. The system couldn’t recognize a dog or a cat or a car, but thanks to backpropagation, it could now handle that thing called “exclusive-or,” moving beyond the flaw that Marvin Minsky pinpointed in neural networks more than a decade earlier

) group of connectionists that convened once a year, at various places across the country, to discuss many of the same ideas percolating in San Diego. Backpropagation was one of them. The Boltzmann Machine was another. Years later, when asked to explain the Boltzmann Machine for the benefit of an ordinary person

digital technology. “It was the most exciting time of my life,” Sejnowski says. “We were convinced we had figured out how brains work.” But, like backpropagation, the Boltzmann Machine was still ongoing research that didn’t quite do anything useful. For years, it, too, lingered on the fringes of academia. Hinton

Baltimore on weekends so he could collaborate with Sejnowski in the lab at Johns Hopkins, and somewhere along the way, he also started tinkering with backpropagation, reckoning it would throw up useful comparisons. He thought he needed something he could compare with the Boltzmann Machine, and

backpropagation was as good as anything else. An old idea was new. At Carnegie Mellon, he had more than just the opportunity to explore these two

. The breakthrough came in 1985, a year after the lecture he gave Minsky in Boston. But the breakthrough wasn’t the Boltzmann Machine. It was backpropagation. In San Diego, he and Rumelhart had shown that a multilayered neural network could adjust its own weights. Then, at Carnegie Mellon, Hinton showed that

was Bill, it could learn that Bill was John’s father. Unbeknownst to Hinton, others in completely separate fields had designed mathematical techniques similar to backpropagation in the past. But unlike those before him, he showed that this mathematical idea had a future, and not just with images but with words

hour to mail a package to the editors of Nature, one of the world’s leading science journals. The package contained a research paper describing backpropagation, written with Rumelhart and a Northeastern University professor named Ronald Williams. It was published later that year. This was the kind of academic moment that

sentences. “I discovered we spoke the same language,” he says. Two years later, when LeCun finished his PhD thesis, which explored a technique similar to backpropagation, Hinton flew to Paris and joined the thesis committee, though he still knew almost no French. Typically, when reading research papers, he skipped the math

him in. Sutskever was a mathematics student, and in those few minutes, he seemed like a sharp one. Hinton gave him a copy of the backpropagation paper—the paper that had finally revealed the potential of deep neural networks twenty-five years earlier—and told him to come back once he

years.” When Geoff Hinton heard this, he pretended to count backwards through the years, as if to make sure GANs weren’t any cooler than backpropagation, before acknowledging that LeCun’s claim wasn’t far from the truth. Goodfellow’s work sparked a long line of projects that refined and expanded

University hires Geoff Hinton. 1984—Geoff Hinton and Yann LeCun meet in France. 1986—David Rumelhart, Geoff Hinton, and Ronald Williams publish their paper on “backpropagation,” expanding the powers of neural networks. Yann LeCun joins Bell Labs in Holmdel, New Jersey, where he begins building LeNet, a neural network that can

he called “NETtalk”: “Learning, Then Talking,” New York Times, August 16, 1988. His breakthrough was a variation: Yann LeCun, Bernhard Boser, John Denker et al., “Backpropagation Applied to Handwritten Zip Code Recognition,” Neural Computation (Winter 1989), http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf. ANNA was the acronym for

, 10 use of AI technology by bad actors, 243 AT&T, 52–53 Australian Centre for Robotic Vision, 278 autonomous weapons, 240, 242, 244, 308 backpropagation ability to handle “exclusive-or” questions, 38–39 criticism of, 38 family tree identification, 42 Geoff Hinton’s work with, 41 Baidu auction for acquiring

Rubik’s Cube demonstration, 276–78, 281, 297–98 use of Covariant’s automation technology in a Berlin warehouse, 284–85 Rosenblatt, Frank criticism of backpropagation, 38 death, 26–27 education and training, 17 Mark I machine development, 18 Perceptron machine demonstration, 15–19 research efforts, 25–26, 34, 36 rivalry

Data Mining: Concepts, Models, Methods, and Algorithms

by Mehmed Kantardzić  · 2 Jan 2003  · 721pp  · 197,134 words

as market basket analysis, the Apriori algorithm, and WWW path-traversal patterns. 5. Artificial Neural Networks. Common examples are multilayer perceptrons with backpropagation learning and Kohonen networks. 6. Genetic Algorithms. They are very useful as a methodology for solving hard optimization problems, and they are often a part

neural networks. (a) Feedforward network; (b) recurrent network. Although many neural-network models have been proposed in both classes, the multilayer feedforward network with a backpropagation-learning mechanism is the most widely used model in terms of practical applications. Probably over 90% of commercial and industrial applications are based on this

applied on much more complex ANN architecture, and its implementation is discussed in Section 7.5, where the basic principles of multilayer feedforward ANNs with backpropagation are introduced. This example only shows how weight factors change with every training (learning) sample. We gave the results only for the first iteration. The

successfully to solve some difficult and diverse problems by training the network in a supervised manner with a highly popular algorithm known as the error backpropagation algorithm. This algorithm is based on the error-correction learning rule and it may be viewed as its generalization. Basically, error

backpropagation learning consists of two phases performed through the different layers of the network: a forward pass and a backward pass. In the forward pass, a

direction of synaptic connections. The synaptic weights are adjusted to make the actual response of the network closer to the desired response. Formalization of the backpropagation algorithm starts with the assumption that an error signal exists at the output of a neuron j at iteration n (i.e., presentation of the

is the number of inputs for the jth neuron. Also, we use the symbol v as a shorthand notation for the previously defined variable net. The backpropagation algorithm applies a correction Δwji(n) to the synaptic weight wji(n), which is proportional to the partial derivative ∂E(n)/∂wji(n). Using the

the form The correction Δwji(n) applied to wji(n) is defined by the delta rule where η is the learning-rate parameter of the backpropagation algorithm. The use of the minus sign accounts for gradient descent in weight space, that is, a direction for weight change that reduces the value
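
Written out in the standard form consistent with this passage (η the learning rate, δj(n) the local gradient of neuron j, yi(n) the input arriving from neuron i), the correction is:

```latex
% Gradient-descent correction (the minus sign gives descent in weight space):
\Delta w_{ji}(n) = -\,\eta \,\frac{\partial E(n)}{\partial w_{ji}(n)}
% which, in terms of the local gradient and the input to the weight, becomes
\Delta w_{ji}(n) = \eta \, \delta_j(n) \, y_i(n)
```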

the local gradient δj(n) for a given node on a layer closer to the inputs. Let us analyze once more the application of the backpropagation-learning algorithm with two distinct passes of computation that are distinguished for each training example. In the first pass, which is referred to as the

connections for this layer. The backward procedure is repeated until all layers are covered and all weight factors in the network are modified. Then, the backpropagation algorithm continues with a new training sample. When there are no more training samples, the first iteration of the learning process finishes. With the same

through a second, third, and sometimes hundreds of iterations until error energy Eav for the given iteration is small enough to stop the algorithm. The backpropagation algorithm provides an “approximation” to the trajectory in weight space computed by the method of steepest descent. The smaller we make the learning rate parameter

. The idea behind momentum is apparent from its name: including some kind of inertia in weight corrections. The inclusion of the momentum term in the backpropagation algorithm has a stabilizing effect in cases where corrections in weight factors have a high oscillation and sign changes. The momentum term may also have
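
With the momentum term mentioned here, the correction commonly takes the form of the generalized delta rule, where α (0 ≤ α < 1) is the momentum constant; this is the standard formulation rather than necessarily the book's exact notation:

```latex
\Delta w_{ji}(n) = \alpha \, \Delta w_{ji}(n-1) + \eta \, \delta_j(n) \, y_i(n)
```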

to be in mid-range regardless of the values of its inputs, and the learning process will converge much faster with every new iteration. In backpropagation learning, we typically use the algorithm to compute the synaptic weights by using as many training samples as possible. The hope is that the neural

them appreciate the technology’s origin, capabilities, and potential applications. The book examines all the important aspects of this emerging technology, covering the learning process, backpropagation, radial basis functions, recurrent networks, self-organizing systems, modular networks, temporal processing, neurodynamics, and VLSI implementation. It integrates computer experiments throughout to demonstrate how neural

Neural Networks with Java introduces the Java programmer to the world of neural networks and artificial intelligence (AI). Neural-network architectures such as the feedforward backpropagation, Hopfield, and Kohonen networks are discussed. Additional AI topics, such as Genetic Algorithms and Simulated Annealing, are also introduced. Practical examples are given for each

networks. In the neuro-fuzzy methods the main idea is to encode a fuzzy system in a neural network, and to apply standard approaches like backpropagation in order to train such a network. This way, neuro-fuzzy systems combine the representational advantages of fuzzy systems with the flexibility and adaptivity of

: NeuroDimension Inc. (www.neurosolutions.com) NeuroSolutions combines a modular, icon-based network design interface with an implementation of advanced learning procedures, such as recurrent backpropagation and backpropagation through time, and it solves data-mining problems such as classification, prediction, and function approximation. Some other notable features include C++ source code generation, customized

Domain-specific knowledge Don’t care symbol Eigenvalue Eigenvector Empirical risk Empirical risk minimization (ERM) Encoding Encoding scheme Ensemble learning Bagging Boosting AdaBoost Entropy Error back-propagation algorithm Error energy Error-correction learning Error rate Euclidean distance Exponential moving average Exploratory analysis Exploratory visualizations Extension principle False acceptance rate (FAR) False reject

Data Mining: Concepts and Techniques

by Jiawei Han, Micheline Kamber and Jian Pei  · 21 Jun 2011

to Improve Classification Accuracy 8.7. Summary 8.8. Exercises 8.9. Bibliographic Notes 9. Classification 9.1. Bayesian Belief Networks 9.2. Classification by Backpropagation 9.3. Support Vector Machines 9.4. Classification Using Frequent Patterns 9.5. Lazy Learners (or Learning from Your Neighbors) 9.6. Other Classification Methods

, including ensemble methods and how to handle imbalanced data. Chapter 9 discusses advanced methods for classification, including Bayesian belief networks, the neural network technique of backpropagation, support vector machines, classification using frequent patterns, k-nearest-neighbor classifiers, case-based reasoning, genetic algorithms, rough set theory, and fuzzy set approaches. Additional topics

. Normalization is particularly useful for classification algorithms involving neural networks or distance measurements such as nearest-neighbor classification and clustering. If using the neural network backpropagation algorithm for classification mining (Chapter 9), normalizing the input values for each attribute measured in the training tuples will help speed up the learning phase

they become complex. We discuss some work in this area, such as the extraction of classification rules from a “black box” neural network classifier called backpropagation, in Chapter 9. In summary, we have presented several evaluation measures. The accuracy measure works best when the data classes are fairly evenly distributed. Other

the tuple that is least likely to belong to the positive class lands at the bottom of the list. Naïve Bayesian (Section 8.3) and backpropagation (Section 9.2) classifiers return a class probability distribution for each prediction and, therefore, are appropriate, although other classifiers, such as decision tree classifiers (Section

, there is less chance of costly false negative errors). Examples of such classifiers include naïve Bayesian classifiers (Section 8.3) and neural network classifiers like backpropagation (Section 9.2). The threshold-moving method, although not as popular as over- and undersampling, is simple and has shown some success for the two

advanced techniques for data classification. We start with Bayesian belief networks (Section 9.1), which unlike naïve Bayesian classifiers, do not assume class conditional independence. Backpropagation, a neural network algorithm, is discussed in Section 9.2. In general terms, a neural network is a set of connected input/output units in

training process in the form of network topology and/or conditional probability values. This can significantly improve the learning rate. 9.2. Classification by Backpropagation “What is backpropagation?” Backpropagation is a neural network learning algorithm. The neural networks field was originally kindled by psychologists and neurobiologists who sought to develop and test computational

and numeric prediction in data mining. There are many different kinds of neural networks and neural network algorithms. The most popular neural network algorithm is backpropagation, which gained repute in the 1980s. In Section 9.2.1 you will learn about multilayer feed-forward networks, the type of neural network on

which the backpropagation algorithm performs. Section 9.2.2 discusses defining a network topology. The backpropagation algorithm is described in Section 9.2.3. Rule extraction from trained neural networks is discussed in Section 9

.2.4. 9.2.1. A Multilayer Feed-Forward Neural Network The backpropagation algorithm performs learning on a multilayer feed-forward neural network. It iteratively learns a set of weights for prediction of the class label of tuples

“good” network structure. These typically use a hill-climbing approach that starts with an initial structure that is selectively modified. 9.2.3. Backpropagation “How does backpropagation work?” Backpropagation learns by iteratively processing a data set of training tuples, comparing the network's prediction for each tuple with the actual known target value

are made in the “backwards” direction (i.e., from the output layer) through each hidden layer down to the first hidden layer (hence the name backpropagation). Although it is not guaranteed, in general the weights will eventually converge, and the learning process stops. The algorithm is summarized in Figure 9.3

. However, once you become familiar with the process, you will see that each step is inherently simple. The steps are described next. Figure 9.3 Backpropagation algorithm. Initialize the weights: The weights in the network are initialized to small random numbers (e.g., ranging from −1.0 to 1.0, or

function, because it maps a large input domain onto the smaller range of 0 to 1. The logistic function is nonlinear and differentiable, allowing the backpropagation algorithm to model classification problems that are linearly inseparable. We compute the output values, Oj, for each hidden layer, up to and including the output

) “What is l in Eq. (9.8)?” The variable l is the learning rate, a constant typically having a value between 0.0 and 1.0. Backpropagation learns using a gradient descent method to search for a set of weights that fits the training data so as to minimize the mean-squared
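
The quantities this passage refers to, written in the standard form for this algorithm (the excerpt omits the numbered equations, so the notation below is a reconstruction rather than a verbatim copy): net input and logistic output of a unit j, the error terms for output and hidden units, and the updates scaled by the learning rate l.

```latex
% Net input and logistic (sigmoid) output of unit j:
I_j = \sum_i w_{ij} O_i + \theta_j, \qquad O_j = \frac{1}{1 + e^{-I_j}}
% Error of an output unit (target T_j) and of a hidden unit:
Err_j = O_j (1 - O_j)(T_j - O_j), \qquad Err_j = O_j (1 - O_j) \sum_k Err_k\, w_{jk}
% Weight and bias updates with learning rate l:
w_{ij} \leftarrow w_{ij} + l \cdot Err_j \cdot O_i, \qquad \theta_j \leftarrow \theta_j + l \cdot Err_j
```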

been presented. This latter strategy is called epoch updating, where one iteration through the training set is an epoch. In theory, the mathematical derivation of backpropagation employs epoch updating, yet in practice, case updating is more common because it tends to yield more accurate results. Terminating condition: Training stops when ■ All

prespecified number of epochs has expired. In practice, several hundreds of thousands of epochs may be required before the weights will converge. “How efficient is backpropagation?” The computational efficiency depends on the time spent training the network. Given |D| tuples and w weights, each epoch requires O(|D| × w) time. However, in the worst-case

. For example, a technique known as simulated annealing can be used, which also ensures convergence to a global optimum. Sample calculations for learning by the backpropagation algorithm Figure 9.5 shows a multilayer feed-forward neural network. Let the learning rate be 0.9. The initial weight and bias values of

network are given in Table 9.1, along with the first training tuple, X, with a class label of 1. This example shows the calculations for backpropagation, given the first training tuple, X. The tuple is fed into the network, and the net input and output of each unit are computed. These
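
A small sketch of the kind of per-tuple calculation described, using the update rules above but with made-up weights, biases, and inputs (the excerpt does not reproduce Table 9.1's actual values, so everything below is hypothetical):

```python
import math

# Hypothetical tiny network: 3 inputs -> 2 hidden units -> 1 output unit.
x = [1.0, 0.0, 1.0]                 # one training tuple
target = 1.0
lr = 0.9                            # learning rate, as in the example

w_hidden = [[0.2, -0.3, 0.4], [0.1, 0.25, -0.2]]   # hidden-unit weights
b_hidden = [-0.4, 0.2]                              # hidden-unit biases
w_out = [0.3, -0.2]                                 # output-unit weights
b_out = 0.1

sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

# Forward pass: net input and logistic output of each unit.
h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
     for ws, b in zip(w_hidden, b_hidden)]
o = sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + b_out)

# Backward pass: error of the output unit, then of each hidden unit.
err_o = o * (1 - o) * (target - o)
err_h = [hi * (1 - hi) * err_o * w for hi, w in zip(h, w_out)]

# Weight and bias updates, applied in the backwards direction.
w_out = [w + lr * err_o * hi for w, hi in zip(w_out, h)]
b_out += lr * err_o
w_hidden = [[w + lr * eh * xi for w, xi in zip(ws, x)]
            for ws, eh in zip(w_hidden, err_h)]
b_hidden = [b + lr * eh for b, eh in zip(b_hidden, err_h)]

print(w_out, b_out)   # weights after one backpropagation update
```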

is input to the trained network, and the net input and output of each unit are computed. (There is no need for computation and/or backpropagation of the error.) If there is one output node per class, then the output node with the highest value determines the predicted class label for

may be considered as belonging to the positive class, while values less than 0.5 may be considered negative. Several variations and alternatives to the backpropagation algorithm have been proposed for classification in neural networks. These may involve the dynamic adjustment of the network topology and of the learning rate or

other parameters, or the use of different error functions. 9.2.4. Inside the Black Box: Backpropagation and Interpretability “Neural networks are like a black box. How can I 'understand' what the backpropagation network has learned?” A major disadvantage of neural networks lies in their knowledge representation. Acquired knowledge in the

, the kernel chosen does not generally make a large difference in resulting accuracy. SVM training always finds a global solution, unlike neural networks, such as backpropagation, where many local minima usually exist (Section 9.2.3). So far, we have described linear and nonlinear SVMs for binary (i.e., two-class

Learners (or Learning from Your Neighbors) The classification methods discussed so far in this book—decision tree induction, Bayesian classification, rule-based classification, classification by backpropagation, support vector machines, and classification based on association rule mining—are all examples of eager learners. Eager learners, when given a set of training tuples

of variables. They provide a graphical model of causal relationships, on which learning can be performed. Trained Bayesian belief networks can be used for classification. ■ Backpropagation is a neural network algorithm for classification that employs a method of gradient descent. It searches for a set of weights that can model the

patterns serve as combined features, which are considered in addition to single features when building a classification model. ■ Decision tree classifiers, Bayesian classifiers, classification by backpropagation, support vector machines, and classification based on frequent patterns are all examples of eager learners in that they use training tuples to construct a generalization

the input and output layers. (b) Using the multilayer feed-forward neural network obtained in (a), show the weight values after one iteration of the backpropagation algorithm, given the training instance “(sales, senior, 31…35, 46K…50K)”. Indicate your initial weight values and biases and the learning rate used. 9.2

hardware at the time, dampened enthusiasm for research in computational neuronal modeling for nearly 20 years. Renewed interest was sparked following the presentation of the backpropagation algorithm in 1986 by Rumelhart, Hinton, and Williams [RHW86], as this algorithm can learn concepts that are linearly inseparable. Since then, many variations of

backpropagation have been proposed, involving, for example, alternative error functions (Hanson and Burr [HB87]); dynamic adjustment of the network topology (Mézard and Nadal [MN89]; Fahlman and

]; Ripley [Rip96]; and Haykin [Hay99]. Many books on machine learning, such as Mitchell [Mit97] and Russell and Norvig [RN95], also contain good explanations of the backpropagation algorithm. There are several techniques for extracting rules from neural networks, such as those found in these papers: SN88, Gal93, TS93, Avn95, LSL95, CS96 and

(ICDE’01) Heidelberg, Germany. (Apr. 2001), pp. 443–452. [BCP93] Brown, D.E.; Corruble, V.; Pittard, C.L., A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems, Pattern Recognition 26 (1993) 953–961. [BD01] Bickel, P.J.; Doksum, K.A., Mathematical Statistics: Basic Ideas and Selected

42 (1990) 393–405. [CPS98] Cios, K.; Pedrycz, W.; Swiniarski, R., Data Mining Methods for Knowledge Discovery. (1998) Kluwer Academic. [CR95] Chauvin, Y.; Rumelhart, D., Backpropagation: Theory, Architectures, and Applications. (1995) Lawrence Erlbaum. [Cra89] Crawford, S.L., Extensions to the CART algorithm, Int. J. Man-Machine Studies 31 (Aug. 1989) 197

Comprehensive Foundation. (1999) Prentice-Hall. [Hay08] Haykin, S., Neural Networks and Learning Machines. (2008) Prentice-Hall. [HB87] Hanson, S.J.; Burr, D.J., Minkowski-r back-propagation: Learning in connectionist models with non-euclidian error signals, In: Neural Information Proc. Systems Conf. Denver, CO. (1987), pp. 348–357. [HBV01] Halkidi, M.; Batistakis

data mining 604–607, 624 automatic classification 445 AVA. see all-versus-all AVC-group 347 AVC-set 347 average() 215 B background knowledge 30–31 backpropagation 393, 398–408, 437 activation function 402 algorithm illustration 401 biases 402, 404 case updating 404 efficiency 404 epoch updating 404 error 403 functioning of

, 385 accuracy 330 accuracy improvement techniques 377–385 active learning 433–434 advanced methods 393–442 applications 327 associative 415, 416–419, 437 automatic 445 backpropagation 393, 398–408, 437 bagging 379–380 basic concepts 327–330 Bayes methods 350–355 Bayesian belief networks 393–397, 436 boosting 380–382 case

counting 256 E eager learners 423, 437 Eclat (Equivalence Class Transformation) algorithm 260, 272 e-commerce 609 editing method 425 efficiency Apriori algorithm 255–256 backpropagation 404 data mining algorithms 31 elbow method 486 email spam filtering 435 engineering applications 613 ensemble methods 378–379, 386 bagging 379–380 boosting 380

22 objective measures 21–22 strong association rules 264–265 subjective measures 22 threshold 21–22 unexpected 22 interestingness constraints 294 application of 297 interpretability backpropagation and 406–408 classification 369 cluster analysis 447 data 85 data quality and 85 probabilistic hierarchical clustering 469 interquartile range (IQR) 49, 555 interval-scaled

, 422–426, 437 case-based reasoning classifiers 425–426 k-nearest-neighbor classifiers 423–425 l-diversity method 622 learning active 430, 433–434, 437 backpropagation 400 as classification step 328 connectionist 398 by examples 445 by observation 445 rate 397 semi-supervised 572 supervised 330 transfer 430, 434–436, 438

592, 593 homogeneous 592, 593 information 592–594 mining in science applications 612–613 social 592 statistical modeling of 592–594 neural networks 19, 398 backpropagation 398–408 as black boxes 406 for classification 19, 398 disadvantages 406 fully connected 399, 406–407 learning 398 multilayer feed-forward 398–399 pruning

Data Science from Scratch: First Principles with Python

by Joel Grus  · 13 Apr 2015  · 579pp  · 76,657 words

result is a network that performs “or, but not and,” which is precisely XOR (Figure 18-3). Figure 18-3. A neural network for XOR Backpropagation Usually we don’t build neural networks by hand. This is in part because we use them to solve much bigger problems — an image recognition

to “reason out” what the neurons should be. Instead (as usual) we use data to train neural networks. One popular approach is an algorithm called backpropagation that has similarities to the gradient descent algorithm we looked at earlier. Imagine we have a training set that consists of input vectors and corresponding

+ 1)] for __ in range(output_size)]

# the network starts out with random weights
network = [hidden_layer, output_layer]

And we can train it using the backpropagation algorithm:

# 10,000 iterations seems enough to converge
for __ in range(10000):
    for input_vector, target_vector in zip(inputs, targets
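
Since the snippet above is cut off on both ends, here is a minimal self-contained sketch in the same spirit — a sigmoid network stored as a list of layers, each neuron a list of weights with the bias last. It is a reconstruction under those assumptions, not Grus's code verbatim:

import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def neuron_output(weights, inputs_with_bias):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs_with_bias)))

def feed_forward(network, input_vector):
    """Return each layer's outputs; every neuron's last weight acts as its bias."""
    outputs = []
    for layer in network:
        inputs_with_bias = input_vector + [1.0]            # append the bias input
        layer_output = [neuron_output(neuron, inputs_with_bias) for neuron in layer]
        outputs.append(layer_output)
        input_vector = layer_output                        # feed into the next layer
    return outputs

def backpropagate(network, input_vector, target_vector, learning_rate=1.0):
    """One training step for a network with exactly one hidden layer."""
    hidden_outputs, outputs = feed_forward(network, input_vector)

    # deltas use the derivative of the sigmoid, o * (1 - o)
    output_deltas = [o * (1 - o) * (o - t)
                     for o, t in zip(outputs, target_vector)]
    hidden_deltas = [h * (1 - h) * sum(output_deltas[i] * network[-1][i][j]
                                       for i in range(len(output_deltas)))
                     for j, h in enumerate(hidden_outputs)]

    # gradient-descent updates: output layer first, then the hidden layer
    for i, neuron in enumerate(network[-1]):
        for j, h in enumerate(hidden_outputs + [1.0]):
            neuron[j] -= learning_rate * output_deltas[i] * h
    for i, neuron in enumerate(network[0]):
        for j, x in enumerate(input_vector + [1.0]):
            neuron[j] -= learning_rate * hidden_deltas[i] * x

# tiny usage, mirroring the chapter's XOR setup
inputs  = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
targets = [[0.0], [1.0], [1.0], [0.0]]
network = [[[random.random() for _ in range(2 + 1)] for _ in range(2)],   # hidden layer
           [[random.random() for _ in range(2 + 1)] for _ in range(1)]]   # output layer
for _ in range(10000):
    for input_vector, target_vector in zip(inputs, targets):
        backpropagate(network, input_vector, target_vector)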

and Argument Unpacking arithmetic in Python, Arithmetic performing on vectors, Vectors artificial neural networks, Neural Networks (see also neural networks) assignment, multiple, in Python, Tuples B backpropagation, Backpropagation bagging, Random Forests bar charts, Bar Charts-Line Charts Bayes’s Theorem, Bayes’s Theorem, A Really Dumb Spam Filter Bayesian Inference, Bayesian Inference Beautiful

and PageRank, Directed Graphs and PageRank-Directed Graphs and PageRank eigenvector centrality, Eigenvector Centrality-Centrality networks, Network Analysis neural networks, Neural Networks-For Further Exploration backpropagation, Backpropagation example, defeating a CAPTCHA, Example: Defeating a CAPTCHA-Example: Defeating a CAPTCHA feed-forward, Feed-Forward Neural Networks perceptrons, Perceptrons neurons, Neural Networks NLP (see

Entropy of a Partition Creating a Decision Tree Putting It All Together Random Forests For Further Exploration 18. Neural Networks Perceptrons Feed-Forward Neural Networks Backpropagation Example: Defeating a CAPTCHA For Further Exploration 19. Clustering The Idea The Model Example: Meetups Choosing k Example: Clustering Colors Bottom-up Hierarchical Clustering For

Artificial Intelligence: A Modern Approach

by Stuart Russell and Peter Norvig  · 14 Jul 2019  · 2,466pp  · 668,761 words

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future

by Luke Dormehl  · 10 Aug 2016  · 252pp  · 74,167 words

Rule of the Robots: How Artificial Intelligence Will Transform Everything

by Martin Ford  · 13 Sep 2021  · 288pp  · 86,995 words

The Elements of Statistical Learning (Springer Series in Statistics)

by Trevor Hastie, Robert Tibshirani and Jerome Friedman  · 25 Aug 2009  · 764pp  · 261,694 words

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006

by Ben Goertzel and Pei Wang  · 1 Jan 2007  · 303pp  · 67,891 words

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think

by James Vlahos  · 1 Mar 2019  · 392pp  · 108,745 words

Superintelligence: Paths, Dangers, Strategies

by Nick Bostrom  · 3 Jun 2014  · 574pp  · 164,509 words

The Long History of the Future: Why Tomorrow's Technology Still Isn't Here

by Nicole Kobie  · 3 Jul 2024  · 348pp  · 119,358 words

Machine, Platform, Crowd: Harnessing Our Digital Future

by Andrew McAfee and Erik Brynjolfsson  · 26 Jun 2017  · 472pp  · 117,093 words

How the Mind Works

by Steven Pinker  · 1 Jan 1997  · 913pp  · 265,787 words

Robot Rules: Regulating Artificial Intelligence

by Jacob Turner  · 29 Oct 2018  · 688pp  · 147,571 words

Artificial Intelligence: A Guide for Thinking Humans

by Melanie Mitchell  · 14 Oct 2019  · 350pp  · 98,077 words

On Intelligence

by Jeff Hawkins and Sandra Blakeslee  · 1 Jan 2004  · 246pp  · 81,625 words

Darwin Among the Machines

by George Dyson  · 28 Mar 2012  · 463pp  · 118,936 words

The Mathematics of Banking and Finance

by Dennis W. Cox and Michael A. A. Cox  · 30 Apr 2006  · 312pp  · 35,664 words

When Computers Can Think: The Artificial Intelligence Singularity

by Anthony Berglas, William Black, Samantha Thalind, Max Scratchmann and Michelle Estes  · 28 Feb 2015

Global Catastrophic Risks

by Nick Bostrom and Milan M. Cirkovic  · 2 Jul 2008

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity

by Amy Webb  · 5 Mar 2019  · 340pp  · 97,723 words

Rationality: What It Is, Why It Seems Scarce, Why It Matters

by Steven Pinker  · 14 Oct 2021  · 533pp  · 125,495 words

Analysis of Financial Time Series

by Ruey S. Tsay  · 14 Oct 2001

Know Thyself

by Stephen M Fleming  · 27 Apr 2021

Prediction Machines: The Simple Economics of Artificial Intelligence

by Ajay Agrawal, Joshua Gans and Avi Goldfarb  · 16 Apr 2018  · 345pp  · 75,660 words

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots

by John Markoff  · 24 Aug 2015  · 413pp  · 119,587 words

The Age of Spiritual Machines: When Computers Exceed Human Intelligence

by Ray Kurzweil  · 31 Dec 1998  · 696pp  · 143,736 words

Python Data Analytics: With Pandas, NumPy, and Matplotlib

by Fabio Nelli  · 27 Sep 2018  · 688pp  · 107,867 words

The Rationalist's Guide to the Galaxy: Superintelligent AI and the Geeks Who Are Trying to Save Humanity's Future

by Tom Chivers  · 12 Jun 2019  · 289pp  · 92,714 words

Driverless: Intelligent Cars and the Road Ahead

by Hod Lipson and Melba Kurman  · 22 Sep 2016

The Ethical Algorithm: The Science of Socially Aware Algorithm Design

by Michael Kearns and Aaron Roth  · 3 Oct 2019

Programming Collective Intelligence

by Toby Segaran  · 17 Dec 2008  · 519pp  · 102,669 words

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

by Aurélien Géron  · 13 Mar 2017  · 1,331pp  · 163,200 words

The Alignment Problem: Machine Learning and Human Values

by Brian Christian  · 5 Oct 2020  · 625pp  · 167,349 words

The Thinking Machine: Jensen Huang, Nvidia, and the World's Most Coveted Microchip

by Stephen Witt  · 8 Apr 2025  · 260pp  · 82,629 words

Mastering Machine Learning With Scikit-Learn

by Gavin Hackeling  · 31 Oct 2014

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence

by John Brockman  · 5 Oct 2015  · 481pp  · 125,946 words

Seeking SRE: Conversations About Running Production Systems at Scale

by David N. Blank-Edelman  · 16 Sep 2018

Explaining Humans: What Science Can Teach Us About Life, Love and Relationships

by Camilla Pang  · 12 Mar 2020  · 256pp  · 67,563 words

The Means of Prediction: How AI Really Works (And Who Benefits)

by Maximilian Kasy  · 15 Jan 2025  · 209pp  · 63,332 words

Hello World: Being Human in the Age of Algorithms

by Hannah Fry  · 17 Sep 2018  · 296pp  · 78,631 words

Finding Alphas: A Quantitative Approach to Building Trading Strategies

by Igor Tulchinsky  · 30 Sep 2019  · 321pp

Your Face Belongs to Us: A Secretive Startup's Quest to End Privacy as We Know It

by Kashmir Hill  · 19 Sep 2023  · 487pp  · 124,008 words

The Optimist: Sam Altman, OpenAI, and the Race to Invent the Future

by Keach Hagey  · 19 May 2025  · 439pp  · 125,379 words

Data Mining in Time Series Databases

by Mark Last, Abraham Kandel and Horst Bunke  · 24 Jun 2004  · 205pp  · 20,452 words

Coders: The Making of a New Tribe and the Remaking of the World

by Clive Thompson  · 26 Mar 2019  · 499pp  · 144,278 words

The Singularity Is Near: When Humans Transcend Biology

by Ray Kurzweil  · 14 Jul 2005  · 761pp  · 231,902 words

I, Warbot: The Dawn of Artificially Intelligent Conflict

by Kenneth Payne  · 16 Jun 2021  · 339pp  · 92,785 words

Demystifying Smart Cities

by Anders Lisdorf

The Coming Wave: Technology, Power, and the Twenty-First Century's Greatest Dilemma

by Mustafa Suleyman  · 4 Sep 2023  · 444pp  · 117,770 words

Heart of the Machine: Our Future in a World of Artificial Emotional Intelligence

by Richard Yonck  · 7 Mar 2017  · 360pp  · 100,991 words

Empire of AI: Dreams and Nightmares in Sam Altman's OpenAI

by Karen Hao  · 19 May 2025  · 660pp  · 179,531 words

The Age of Extraction: How Tech Platforms Conquered the Economy and Threaten Our Future Prosperity

by Tim Wu  · 4 Nov 2025  · 246pp  · 65,143 words

Human Frontiers: The Future of Big Ideas in an Age of Small Thinking

by Michael Bhaskar  · 2 Nov 2021

The Future of the Brain: Essays by the World's Leading Neuroscientists

by Gary Marcus and Jeremy Freeman  · 1 Nov 2014  · 336pp  · 93,672 words

Ghost Road: Beyond the Driverless Car

by Anthony M. Townsend  · 15 Jun 2020  · 362pp  · 97,288 words

Rationality: From AI to Zombies

by Eliezer Yudkowsky  · 11 Mar 2015  · 1,737pp  · 491,616 words

The Science of Language

by Noam Chomsky  · 24 Feb 2012

Being You: A New Science of Consciousness

by Anil Seth  · 29 Aug 2021  · 418pp  · 102,597 words

Applied Artificial Intelligence: A Handbook for Business Leaders

by Mariya Yao, Adelyn Zhou and Marlene Jia  · 1 Jun 2018  · 161pp  · 39,526 words