Artificial Intelligence: A Guide for Thinking Humans
by Melanie Mitchell
Published 14 Oct 2019

For the ImageNet project, Mechanical Turk was “a godsend.” The service continues to be widely used by AI researchers for creating data sets; nowadays, academic grant proposals in AI commonly include a line item for “Mechanical Turk workers.”
The ImageNet Competitions
In 2010, the ImageNet project launched the first ImageNet Large Scale Visual Recognition Challenge, in order to spur progress toward more general object-recognition algorithms. Thirty-five programs competed, representing computer-vision researchers from academia and industry around the world. The competitors were given labeled training images—1.2 million of them—and a list of possible categories. The task for the trained programs was to output the correct category of each input image. The ImageNet competition had a thousand possible categories, compared with PASCAL’s twenty.

The following year, the highest-scoring program—also using support vector machines—showed a respectable but modest improvement, getting 74 percent of the test images correct. Most people in the field expected this trend to continue; computer-vision research would chip away at the problem, with gradual improvement at each annual competition. However, these expectations were upended in the 2012 ImageNet competition: the winning entry achieved an amazing 85 percent correct. Such a jump in accuracy was a shocking development. What’s more, the winning entry did not use support vector machines or any of the other dominant computer-vision methods of the day. Instead, it was a convolutional neural network.

It didn’t take long before all the big tech companies (as well as many smaller ones) were snapping up deep-learning experts and their graduate students as fast as possible. Seemingly overnight, deep learning became the hottest part of AI, and expertise in deep learning guaranteed computer scientists a large salary in Silicon Valley or, better yet, venture capital funding for their proliferating deep-learning start-up companies. The annual ImageNet competition began to see wider coverage in the media, and it quickly morphed from a friendly academic contest into a high-profile sparring match for tech companies commercializing computer vision. Winning at ImageNet would guarantee coveted respect from the vision community, along with free publicity, which might translate into product sales and higher stock prices.

pages: 345 words: 75,660

Prediction Machines: The Simple Economics of Artificial Intelligence
by Ajay Agrawal, Joshua Gans and Avi Goldfarb
Published 16 Apr 2018

Inventory management involved predicting how many items would be in a warehouse on a given day. More recently, entirely new classes of prediction problems emerged. Many were nearly impossible before the recent advances in machine intelligence technology, including object identification, language translation, and drug discovery. For example, the ImageNet Challenge is a high-profile annual contest to predict the name of an object in an image. Predicting the object in an image can be a difficult task, even for humans. The ImageNet data contains a thousand categories of objects, including many breeds of dog and other similar images. It can be difficult to tell the difference between a Tibetan mastiff and a Bernese mountain dog, or between a safe and a combination lock.

pages: 416 words: 118,522

Why Machines Learn: The Elegant Math Behind Modern AI
by Anil Ananthaswamy
Published 15 Jul 2024

Titled “ImageNet: A Large-Scale Hierarchical Image Database,” the paper included an immense dataset of millions of hand-labeled images consisting of thousands of categories (immense by the standards of 2009). In 2010, the team put out the ImageNet challenge: Use 1.2 million ImageNet images, binned into 1,000 categories, to train your computer vision system to correctly categorize those images, and then test it on 100,000 unseen images to see how well the system recognizes them. The contest was so new that it was conducted as a “taster competition” alongside a more established contest, the PASCAL Visual Object Classes Challenge 2010. Standard computer vision still ruled the roost then. In recognition of this, the ImageNet challenge provided users with so-called scale invariant feature transforms (SIFTs).

In January 2022, at a town hall meeting organized under the aegis of the National Science Foundation, Tom Goldstein of the University of Maryland argued that much of the history of machine learning has been focused on theoretically principled mathematical frameworks (the kind that gave us support vector machines and kernel methods, for example). But by 2012, when AlexNet won the ImageNet competition, things had changed. AlexNet was a stupendous experimental success; there was no adequate theory to explain its performance. According to Goldstein, the AI community said to itself, “Maybe we shouldn’t have such a focus on theory. Maybe we should be doing experimental science to progress machine learning.”

pages: 346 words: 97,330

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass
by Mary L. Gray and Siddharth Suri
Published 6 May 2019

They tried a few different workflows but were ultimately able to use about 49,000 workers from 167 countries to accurately label 3.2 million images. After two and a half years, their collective labor created a massive, gold-standard data set of high-resolution images, each with highly accurate labels of the objects in the image. Li called it ImageNet. Thanks to ImageNet competitions held annually since its creation, research teams use the data set to develop more sophisticated image recognition algorithms and to advance the state of the art. Having a gold-standard data set allowed researchers to measure the accuracy of their new algorithms and to compare their algorithms with the current state of the art.

To incentivize researchers to use the data set, Li and her colleagues organized an annual contest pitting the best algorithms for the image recognition problem, from various research teams around the world, against one another. The progress scientists made toward this goal was staggering. The annual ImageNet competition saw a roughly 10x reduction in error and a roughly 3x increase in precision in recognizing images over the course of eight years. Eventually the vision algorithms achieved a lower error rate than the human workers. The algorithmic and engineering advances that scientists achieved over the eight years of competition fueled much of the recent success of neural networks, the so-called deep learning revolution, which would impact a variety of fields and problem domains.

Without them generating and improving the size and quality of the training data, ImageNet would not exist. ImageNet’s success is a noteworthy example of the paradox of automation’s last mile in action. Humans trained an AI, only to have the AI ultimately take over the task entirely. Researchers could then open up even harder problems. For example, after the ImageNet challenge finished, researchers turned their attention to finding where an object is in an image or video. These problems needed yet more training data, generating another wave of ghost work. But ImageNet is merely one of many examples of how computer programmers and business entrepreneurs use ghost work to create training data to develop better artificial intelligence.
The Range of Ghost Work: From Micro-Tasks to Macro-Tasks
The platforms generating on-demand ghost work offer themselves up as gatekeepers helping employers-turned-requesters tackle problems that need a bit of human intelligence.

The Ethical Algorithm: The Science of Socially Aware Algorithm Design
by Michael Kearns and Aaron Roth
Published 3 Oct 2019

But money alone wasn’t enough to recruit talent—top researchers want to work where other top researchers are—so it was important for AI labs that wanted to recruit premium talent to be viewed as places that were already on the cutting edge. In the United States, this included research labs at companies such as Google and Facebook. One way to do this was to beat the big players in a high-profile competition. The ImageNet competition was perfect—focused on exactly the kind of vision task for which deep learning was making headlines. The contest required each team’s computer program to classify the objects in images into a thousand different and highly specific categories, including “frilled lizard,” “banded gecko,” “oscilloscope,” and “reflex camera.”

The training images came with labels, so that the learning algorithms could be told what kind of object was in each image. Such competitions have proliferated in recent years; the Netflix competition, which we have mentioned a couple of times already, was an early example. Commercial platforms such as Kaggle (which now, in fact, hosts the ImageNet competition) offer datasets and competitions—some offering awards of $100,000 for winning teams—for thousands of diverse, complex prediction problems. Machine learning has truly become a competitive sport. It wouldn’t make sense to score ImageNet competitors based on how well they classified the training images—after all, an algorithm could have simply memorized the labels for the training set, without learning any generalizable rule for classifying images.

Instead, the right way to evaluate the competitors is to see how well their models classify new images that they have never seen before. The ImageNet competition reserved 100,000 “validation” images for this purpose. But the competition organizers also wanted to give participants a way to see how well they were doing. So they allowed each team to test their progress by submitting their current model and being told how frequently it correctly classified the validation images.
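The point about memorized labels can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration (the one-dimensional "images", the `Memorizer` and `ThresholdRule` classes, the 0.5 boundary); it is not the ImageNet scoring code. A model that stores training labels verbatim scores perfectly on the training set but only at chance on held-out examples, while a model that learns a simple rule does well on both.

```python
import random


def make_examples(n, rng):
    """Toy data: a point x in [0, 1) is labeled 'cat' below 0.5, else 'dog'."""
    xs = [rng.random() for _ in range(n)]
    return [(x, "cat" if x < 0.5 else "dog") for x in xs]


class Memorizer:
    """Stores every training label verbatim; guesses 'cat' for unseen inputs."""

    def fit(self, examples):
        self.table = dict(examples)
        return self

    def predict(self, x):
        return self.table.get(x, "cat")


class ThresholdRule:
    """Learns a single cut point: the midpoint between the two class means."""

    def fit(self, examples):
        cats = [x for x, y in examples if y == "cat"]
        dogs = [x for x, y in examples if y == "dog"]
        self.cut = (sum(cats) / len(cats) + sum(dogs) / len(dogs)) / 2
        return self

    def predict(self, x):
        return "cat" if x < self.cut else "dog"


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model.predict(x) == y for x, y in examples) / len(examples)


rng = random.Random(0)
train = make_examples(1000, rng)
held_out = make_examples(1000, rng)  # never shown to either model

memorizer = Memorizer().fit(train)
rule = ThresholdRule().fit(train)

for name, model in [("memorizer", memorizer), ("threshold rule", rule)]:
    print(name, "train:", accuracy(model, train), "held-out:", accuracy(model, held_out))
```

Scoring on the training set alone would rank the memorizer first, which is why a separate set of unseen images is needed.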

pages: 288 words: 86,995

Rule of the Robots: How Artificial Intelligence Will Transform Everything
by Martin Ford
Published 13 Sep 2021

Without relying on GPU chips to accelerate their deep neural network, it’s doubtful that the winning team’s entry would have performed well enough to win the contest. We’ll delve further into the history of deep learning in Chapter 4. The University of Toronto’s team used GPUs manufactured by NVIDIA, a company founded in 1993 whose business focused exclusively on designing and manufacturing state-of-the-art graphics chips. In the wake of the 2012 ImageNet competition and the ensuing widespread recognition of the powerful synergy between deep learning and GPUs, the company’s trajectory shifted dramatically, transforming it into one of the most prominent technology companies associated with the rise of artificial intelligence. Evidence of the deep learning revolution manifested directly in the company’s market value: between January 2012 and January 2020 NVIDIA’s shares soared by more than 1,500 percent.

Many of the startup companies and university researchers working in this area believe, like Covariant, that a strategy founded on deep neural networks and reinforcement learning is the best way to fuel progress toward more dexterous robots. One notable exception is Vicarious, a small AI company based in the San Francisco Bay Area. Founded in 2010—two years before the 2012 ImageNet competition brought deep learning to the forefront—Vicarious’s long-term objective is to achieve human-level or artificial general intelligence. In other words, the company is, in a sense, competing directly with higher-profile and far better funded initiatives like those at DeepMind and OpenAI. We’ll delve into the paths being forged by those two companies and the general quest for human-level AI in Chapter 5.

Schmidhuber is clearly frustrated over the lack of recognition given to his own research, and is known for abrasively interrupting presentations at AI conferences and leveling accusations of a “conspiracy” to rewrite deep learning’s history, especially on the part of Hinton, LeCun and Bengio. For their part, these better-known researchers push back aggressively. LeCun told a New York Times reporter that “Jürgen is manically obsessed with recognition and keeps claiming credit he doesn’t deserve.” Though disagreements about the true origins of deep learning are likely to persist, there is no doubt that in the wake of the 2012 ImageNet competition, the technique rapidly took the field of artificial intelligence—and most of the technology industry’s largest companies—by storm. American tech behemoths like Google, Amazon, Facebook and Apple, as well as the Chinese companies Baidu, Tencent and Alibaba, immediately recognized the disruptive potential of deep neural networks and began to build research teams and incorporate the technology into their products and operations.

The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do
by Erik J. Larson
Published 5 Apr 2021

The systems aren’t perfect, largely because of the constant cat-and-mouse game between service providers and spammers endlessly trying new and different approaches to fool trained filters. Spam detection is not a particularly sexy example of supervised learning. Modern deep learning systems also perform classification for tasks like image recognition and visual object recognition. The well-known ImageNet competitions present contestants with a large-scale task in supervised learning, drawing on the millions of images that ImageNet has downloaded from websites like Flickr for use in training and testing the accuracy of deep learning systems. All these images have been labeled by humans (providing their services to the project through Amazon’s Mechanical Turk interface) and the terms they apply make up a structured database of English words known as WordNet.

A selected subset of words in WordNet represents a category to be learned, using common nouns (like dog, pumpkin, piano, house) and a selection of more obscure items (like Scottish terrier, hussar monkey, flamingo). The contest is to see which of the competing deep learning classifiers is able to label the most images correctly, as they were labeled by the humans. With over a thousand categories being used in ImageNet competitions, the task far exceeds the yes-or-no problem presented to spam detectors (or any other binary classification task, such as simply labeling whether an image is of a human face or not). Competing in this competition means performing a massive classification task using pixel data as input. Sequence classification is often used in natural language processing applications.

In truth it was because there was, initially, a hodgepodge of older statistical techniques in use for data science and machine learning in AI that the sought-after insights emerging from big data were mistakenly pinned to the data volume itself. This was a ridiculous proposition from the start; data points are facts and, again, can’t become insightful themselves. Although this has become apparent only in the rearview mirror, the early deep learning successes on visual object recognition, in the ImageNet competitions, signaled the beginning of a transfer of zeal from big data to the machine learning methods that benefit from it—in other words, to the newly explosive field of AI. Thus big data has peaked, and now seems to be receding from popular discussion almost as quickly as it appeared. The focus on deep learning makes sense, because after all, the algorithms rather than just the data are responsible for trouncing human champions at Go, mastering Atari games, driving cars, and the rest.

pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values
by Brian Christian
Published 5 Oct 2020

Hinton has come up with an idea called “dropout,” where during training certain portions of the network get randomly turned off. Krizhevsky tries this, and it seems, for various reasons, to help. He tries using neurons with a so-called “rectified linear” output function. This, too, seems to help. He submits his best model on the ImageNet competition deadline, September 30, and then the final wait begins. Two days later, Krizhevsky gets an email from Stanford’s Jia Deng, who is organizing that year’s competition, cc’d to all of the entrants. In plain, unemotional language, Deng says to click the link provided to see the results. Krizhevsky clicks the link provided and sees the results.
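The two techniques named in this passage can be sketched in a few lines. This is a minimal illustration on plain Python lists, not Krizhevsky's implementation: the function names are hypothetical, and the "inverted" scaling of surviving units (dividing by the keep probability during training) is a common modern convention assumed here rather than something the passage describes.

```python
import random


def relu(values):
    """Rectified linear output: negative activations become zero."""
    return [max(0.0, v) for v in values]


def dropout(values, drop_prob, rng, training=True):
    """Randomly turn off units during training.

    Each unit is zeroed with probability drop_prob. Survivors are scaled by
    1 / (1 - drop_prob) so the expected activation is unchanged (assumed
    "inverted dropout" convention). At inference time nothing is dropped.
    """
    if not training or drop_prob == 0.0:
        return list(values)
    keep = 1.0 - drop_prob
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(42)
pre_activations = [-1.5, 0.3, 2.0, -0.2, 0.9]
hidden = relu(pre_activations)
print("after ReLU:          ", hidden)
print("training (p = 0.5):  ", dropout(hidden, 0.5, rng))
print("inference (no drop): ", dropout(hidden, 0.5, rng, training=False))
```

Because a different random subset of units is silenced on every training step, no single unit can be relied on too heavily, which is one common explanation for why the technique helps.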

—ERNEST BURGESS
Your scientists were so preoccupied with whether or not they could . . . that they didn’t stop to think if they should.
—JEFF GOLDBLUM AS IAN MALCOLM, JURASSIC PARK
One of the most important things in any prediction is to make sure that you’re actually predicting what you think you’re predicting. This is harder than it sounds. In the ImageNet competition, for instance—in which AlexNet did so well in 2012—the goal is to train machines to identify what images depict. But this isn’t what the training data captures. The training data captures what human volunteers on Mechanical Turk said the image depicted. If a baby lion, let’s say, were repeatedly misidentified by human volunteers as a cat, it would become part of a system’s training data as a cat—and any system labeling it as a lion would be docked points and would have to adjust its parameters to correct this “error.”

By the fourth layer, the network was responding to configurations of eyes and nose, to tile floors, to the radial geometry of a starfish or a spider, to the petals of a flower or keys on a typewriter. By the fifth layer, the ultimate categories into which objects were being assigned seemed to exert a strong influence. The effect was dramatic, insightful. But was it useful? Zeiler popped the hood of the AlexNet model that had won the ImageNet competition in 2012 and started digging around, inspecting it using deconvolution. He noticed a bunch of flaws. Some low-level parts of the network had normalized incorrectly, like an overexposed photograph. Other filters had gone “dead” and weren’t detecting anything. Zeiler hypothesized that they weren’t correctly sized for the types of patterns they were trying to match.

pages: 477 words: 75,408

The Economic Singularity: Artificial Intelligence and the Death of Capitalism
by Calum Chace
Published 17 Jul 2016

In deep learning, the algorithms operate in several layers, each layer processing data from previous ones and passing the output up to the next layer. The output is not necessarily binary, just on or off: it can be weighted. The number of layers can vary too, with anything above ten layers seen as very deep learning – although in December 2015 a Microsoft team won the ImageNet competition with a system which employed a massive 152 layers. Deep learning, and especially artificial neural nets (ANNs), are in many ways a return to an older approach to AI which was explored in the 1960s but abandoned because it proved ineffective. While Good Old-Fashioned AI held sway in most labs, a small group of pioneers known as the Toronto mafia kept faith with the neural network approach.

In December 2015, Microsoft's chief speech scientist Xuedong Huang noted that speech recognition has improved 20% a year consistently for the last 20 years. He predicted that computers would be as good as humans at understanding human speech within five years. Geoff Hinton – the man whose team won the landmark 2012 ImageNet competition – went further. In May 2015 he said that he expects machines to demonstrate common sense within a decade. Common sense can be described as having a mental model of the world which allows you to predict what will happen if certain actions are taken. Professor Murray Shanahan of Imperial College uses the example of throwing a chair from a stage into an audience: humans would understand that members of the audience would throw up their hands to protect themselves, but some damage would probably be caused, and certainly some upset.

pages: 307 words: 88,180

AI Superpowers: China, Silicon Valley, and the New World Order
by Kai-Fu Lee
Published 14 Sep 2018

One of the clearest examples of these accelerating improvements is the ImageNet competition. In the competition, algorithms submitted by different teams are tasked with identifying thousands of different objects within millions of different images, such as birds, baseballs, screwdrivers, and mosques. It has quickly emerged as one of the most respected image-recognition contests and a clear benchmark for AI’s progress in computer vision. When the Oxford machine-learning experts made their estimates of technical capabilities in early 2013, the most recent ImageNet competition of 2012 had been the coming-out party for deep learning.

pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World
by Peter H. Diamandis and Steven Kotler
Published 3 Feb 2015

Fifty thousand different traffic signs are used—signs obscured by long distances, by trees, by the glare of sunlight. In 2011, for the first time, a machine-learning algorithm bested its makers, achieving a 0.5 percent error rate, compared to 1.2 percent for humans. Even more impressive were the results of the 2012 ImageNet Competition, which challenged algorithms to look at one million different images—ranging from birds to kitchenware to people on motor scooters—and correctly slot them into a thousand unique categories. Seriously, it’s one thing for a computer to recognize known objects (zip codes, traffic signs), but categorizing thousands of random objects is an ability that is downright human.

pages: 416 words: 112,268

Human Compatible: Artificial Intelligence and the Problem of Control
by Stuart Russell
Published 7 Oct 2019

This leads to a simple formula for propagating the error backwards from the output layer to the input layer, tweaking knobs along the way. Miraculously, the process works. For the task of recognizing objects in photographs, deep learning algorithms have demonstrated remarkable performance. The first inkling of this came in the 2012 ImageNet competition, which provides training data consisting of 1.2 million labeled images in one thousand categories, and then requires the algorithm to label one hundred thousand new images.4 Geoff Hinton, a British computational psychologist who was at the forefront of the first neural network revolution in the 1980s, had been experimenting with a very large deep convolutional network: 650,000 nodes and 60 million parameters.

For example, to learn the difference between the “situational superko” and “natural situational superko” rules, the learning algorithm would have to try repeating a board position that it had created previously by a pass rather than by playing a stone. The results would be different in different countries.
4. For a description of the ImageNet competition, see Olga Russakovsky et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision 115 (2015): 211–52.
5. The first demonstration of deep networks for vision: Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, ed.

pages: 2,466 words: 668,761

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

Although deep convolutional networks had been in use since the 1990s for tasks such as handwriting recognition, and neural networks had begun to surpass generative probability models for speech recognition by around 2010, it was the success of the AlexNet deep learning system in the 2012 ImageNet competition that propelled deep learning into the limelight. The ImageNet competition was a supervised learning task with 1,200,000 images in 1,000 different categories, and systems were evaluated on the “top-5” score—how often the correct category appears in the top five predictions. AlexNet achieved an error rate of 15.3%, whereas the next best system had an error rate of more than 25%.
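The “top-5” score described here can be written as a short function. This is a generic sketch of the metric, not the competition's official scoring code; the function name, the dictionary-of-scores input format, and the three-class toy data are assumptions made for illustration.

```python
def top_k_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is NOT among the k highest-scoring guesses.

    scores_per_image: one dict per image, mapping category name -> model score.
    true_labels: the correct category name for each image, in the same order.
    """
    misses = 0
    for scores, truth in zip(scores_per_image, true_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Toy example with three categories, so k = 2 stands in for "top-5".
predictions = [
    {"lion": 0.6, "cat": 0.3, "dog": 0.1},
    {"lion": 0.5, "cat": 0.1, "dog": 0.4},
    {"lion": 0.2, "cat": 0.1, "dog": 0.7},
]
labels = ["cat", "cat", "dog"]

print("top-1 error:", top_k_error(predictions, labels, k=1))
print("top-2 error:", top_k_error(predictions, labels, k=2))
```

Allowing several guesses is more forgiving than demanding the single best answer, which matters when many of the thousand categories are visually similar.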

Experiments were carried out with such networks as far back as the 1970s, and in the form of convolutional neural networks they found some success in hand-written digit recognition in the 1990s (LeCun et al., 1995). It was not until 2011, however, that deep learning methods really took off. This occurred first in speech recognition and then in visual object recognition. In the 2012 ImageNet competition, which required classifying images into one of a thousand categories (armadillo, bookshelf, corkscrew, etc.), a deep learning system created in Geoffrey Hinton’s group at the University of Toronto (Krizhevsky et al., 2013) demonstrated a dramatic improvement over previous systems, which were based largely on handcrafted features.

Their use in natural language processing is discussed in Chapter 25.
22.7 Unsupervised Learning and Transfer Learning
The deep learning systems we have discussed so far are based on supervised learning, which requires each training example to be labeled with a value for the target function. Although such systems can reach a high level of test-set accuracy—as shown by the ImageNet competition results, for example—they often require far more labeled data than a human would for the same task. For example, a child needs to see only one picture of a giraffe, rather than thousands, in order to be able to recognize giraffes reliably in a wide range of settings and views. Clearly, something is missing in our deep learning story; indeed, it may be the case that our current approach to supervised deep learning renders some tasks completely unattainable because the requirements for labeled data would exceed what the human race (or the universe) can supply.

pages: 590 words: 152,595

Army of None: Autonomous Weapons and the Future of War
by Paul Scharre
Published 23 Apr 2018

Hinton, “ImageNet Classification with Deep Convolutional Neural Networks.”
87 over a hundred layers: Christian Szegedy et al., “Going Deeper With Convolutions.”
87 error rate of only 4.94 percent: Richard Eckel, “Microsoft Researchers’ Algorithm Sets ImageNet Challenge Milestone,” Microsoft Research Blog, February 10, 2015; Kaiming He et al., “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.”
87 estimated 5.1 percent error rate: Olga Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge,” January 20, 2015.
87 3.57 percent rate: Kaiming He et al., “Deep Residual Learning for Image Recognition,” December 10, 2015.
6 Crossing the Threshold: Approving Autonomous Weapons
89 delineation of three classes of systems: Department of Defense, “Department of Defense Directive Number 3000.09.”
90 “minimize the probability and consequences”: Ibid, 1.
91 “We haven’t had anything that was even remotely close”: Frank Kendall, interview, November 7, 2016.
91 “We had an automatic mode”: Ibid.
91 “relatively soon”: Ibid.
91 “sort through all that”: Ibid.
91 “Are you just driving down”: Ibid.
92 “other side of the equation”: Ibid.
92 “a reasonable question to ask”: Ibid.
92 “where technology supports it”: Ibid.
92 “principles and obey them”: Ibid.
93 “Automation and artificial intelligence are”: Ibid.
93 Work explained in a 2014 monograph: Robert O.

pages: 499 words: 144,278

Coders: The Making of a New Tribe and the Remaking of the World
by Clive Thompson
Published 26 Mar 2019

By 2012, the field had a seismic breakthrough. Up at the University of Toronto, the British computer scientist Geoff Hinton had been beavering away for two decades on improving neural networks. That year he and a team of students showed off the most impressive neural net yet—by soundly beating competitors at an annual AI shootout. The ImageNet challenge, as it’s known, is an annual competition among AI researchers to see whose system is best at recognizing images. That year, Hinton’s deep-learning neural net got only 15.3 percent of the images wrong. The next-best competitor had an error rate almost twice as high, of 26.2 percent. It was an AI moon shot.

pages: 282 words: 63,385

Attention Factory: The Story of TikTok and China's ByteDance
by Matthew Brennan
Published 9 Oct 2020

* “Real stuff” is my imperfect translation of 干货 gānhuò, which could also be translated as “the real McCoy” or “something of substance”
Chapter 3 Recommendation, From YouTube to TikTok
Chapter Timeline
2009 – Netflix awards a $1 million prize for an algorithm that increased the accuracy of their video recommendation by 10%
2011 – YouTube introduces machine learning algorithmic recommendation engine, Sibyl, with immediate impact
2012 Aug – ByteDance launches news aggregation app Toutiao
2012 Sept – AlexNet breakthrough at the ImageNet challenge triggers a global explosion of interest in AI
2013 Mar – Facebook changes its newsfeed to a “personalized newspaper”
2014 April – Instagram begins using an “explore” tab of personalized content
2015 – Google Brain’s deep learning algorithms begin supercharging a wide variety of Google products, including YouTube recommendations
It was 2010, and YouTube had a big problem.

pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
by Aurélien Géron
Published 13 Mar 2017

You can often get the same effect as a 5 × 5 kernel by stacking two 3 × 3 kernels on top of each other, for a lot less compute. Over the years, variants of this fundamental architecture have been developed, leading to amazing advances in the field. A good measure of this progress is the error rate in competitions such as the ILSVRC ImageNet challenge. In this competition the top-5 error rate for image classification fell from over 26% to barely over 3% in just five years. The top-five error rate is the fraction of test images for which the system’s top 5 predictions did not include the correct answer. The images are large (256 pixels high) and there are 1,000 classes, some of which are really subtle (try distinguishing 120 dog breeds).
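The trade-off behind stacking small kernels can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with no dilation, and it counts weights for a single input channel and a single output channel with biases ignored; the function names are hypothetical. Under those assumptions, n stacked k × k layers see a window of width n(k − 1) + 1, so two 3 × 3 layers cover a 5 × 5 window and four cover a 9 × 9 window.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Width of the input window seen by num_layers stacked stride-1 convolutions."""
    return num_layers * (kernel_size - 1) + 1


def weights_per_channel_pair(kernel_size, num_layers=1):
    """Weights needed for one input channel and one output channel, ignoring biases."""
    return num_layers * kernel_size * kernel_size


for layers in (2, 4):
    width = stacked_receptive_field(3, layers)
    stacked = weights_per_channel_pair(3, layers)
    single = weights_per_channel_pair(width)
    print(f"{layers} stacked 3x3 layers cover {width}x{width}: "
          f"{stacked} weights vs {single} for one {width}x{width} kernel")
```

The stacked version also applies a nonlinearity between layers, so it is not mathematically identical to one large kernel, only comparable in how much of the image each output can see.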

pages: 566 words: 169,013

Nexus: A Brief History of Information Networks From the Stone Age to AI
by Yuval Noah Harari
Published 9 Sep 2024

In the prolific literature on the story of AI, two events pop up again and again. The first occurred when, on September 30, 2012, a convolutional neural network called AlexNet won the ImageNet Large Scale Visual Recognition Challenge. If you have no idea what a convolutional neural network is, and if you have never heard of the ImageNet challenge, you are not alone. More than 99 percent of us are in the same situation, which is why AlexNet’s victory was hardly front-page news in 2012. But some humans did hear about AlexNet’s victory and decoded the writing on the wall. They knew, for example, that ImageNet is a database of millions of annotated digital images.

The article was illustrated with an image of a robotic hand holding up a photo of a cat. All those cat images that tech giants had been harvesting from across the world, without paying a penny to either users or tax collectors, turned out to be incredibly valuable. The AI race was on, and the competitors were running on cat images. At the same time that AlexNet was preparing for the ImageNet challenge, Google too was training its AI on cat images, and even created a dedicated cat-image-generating AI called the Meow Generator. The technology developed by recognizing cute kittens was later deployed for more predatory purposes. For example, Israel relied on it to create the Red Wolf, Blue Wolf, and Wolf Pack apps used by Israeli soldiers for facial recognition of Palestinians in the Occupied Territories. The ability to recognize cat images also led to the algorithms Iran uses to automatically recognize unveiled women and enforce its hijab laws.

pages: 586 words: 186,548

Architects of Intelligence
by Martin Ford
Published 16 Nov 2018

In the end, science won out, and two of my students won a big public competition, and they won it dramatically. They got almost half the error rate of the best computer vision systems, and they were using mainly techniques developed in Yann LeCun’s lab but mixed in with a few of our own techniques as well.
MARTIN FORD: This was the ImageNet competition?
GEOFFREY HINTON: Yes, and what happened then was what should happen in science. One method that people used to think of as complete nonsense had now worked much better than the method they believed in, and within two years, they all switched. So, for things like object classification, nobody would dream of trying to do it without using a neural network now.

We released the entire 15 million images to the world and started to run international competitions for researchers to work on the ImageNet problems: not on the tiny small-scale problems but on the problems that mattered to humans and applications. Fast-forward to 2012, and I think we see the turning point in object recognition for a lot of people. The winner of the 2012 ImageNet competition created a convergence of ImageNet, GPU computing power, and convolutional neural networks as an algorithm. Geoffrey Hinton wrote a seminal paper that, for me, was Phase One in achieving the holy grail of object recognition.
MARTIN FORD: Did you continue this project?
FEI-FEI LI: For the next two years, I worked on taking object recognition a step further.

Four Battlegrounds
by Paul Scharre
Published 18 Jan 2023

: Xinmei Shen, “Microsoft’s Xiaoice Chatbot to Become Its Own Company in China,” Abacus (blog), South China Morning Post, July 13, 2020; Yizhou (Joe) Xu, “Programmatic Dreams: Technographic Inquiry into Censorship of Chinese Chatbots,” Social Media + Society 4, no. 4 (2018).
159 Xiaoice has since been programmed to sidestep questions: Xu, “Programmatic Dreams.”
160 Bing search engine, which is permitted in China: Tom Simonite, “US Companies Help Censor the Internet in China, Too,” Wired, June 3, 2019.
160 “we’re committed to providing our technology”: Colin Lecher, “Microsoft Workers’ Letter Demands Company Drop Army HoloLens Contract,” The Verge, February 22, 2019.
160 “Whampoa Academy of China’s Internet”: Jeffrey Ding, translator, “The Whampoa Academy of China’s Internet,” Google Docs, n.d.
160 over 500 Microsoft Research Asia alums work in China’s tech industry: Ding, “The Whampoa Academy of China’s Internet”; Will Knight, “Microsoft’s Roots in China Have Positioned It to Buy TikTok,” Wired, August 6, 2020.
160 surpass human-level performance in image classification: Kaiming He et al., Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (Microsoft Research, February 6, 2015); Richard Eckel, “Microsoft Researchers’ Algorithm Sets ImageNet Challenge Milestone,” Microsoft Research Blog, February 10, 2015.
160 team’s 2015 paper on “deep residual learning”: Bec Crew, “Google Scholar Reveals Its Most Influential Papers for 2019,” nature index, August 2, 2019; Kaiming He et al., Deep Residual Learning for Image Recognition (n.d.).
160 5,000 papers in top-tier journals: Tim Pan, interview by author, June 21, 2019.
160 “That’s, on average, every working day”: Pan, interview.
161 “very small number” of interns: Kevin Luo, interview by author, June 21, 2019.
161 approximately eleven such interns: Information in this section comes from multiple interviews with Microsoft representatives during July and August 2021.
161 also a PhD student in computer science at the PLA’s NUDT: 微软亚洲研究院 [Microsoft Research Asia] “实习派 | Hu Minghao: What Is It Like to Research Machine Reading Comprehension at MSRA?

Driverless: Intelligent Cars and the Road Ahead
by Hod Lipson and Melba Kurman
Published 22 Sep 2016

See also Mid-level controls
Consumer acceptance, 11–13
Controls engineering
  Overview of, 47, 75–77
  See also Low-level controls; Mid-level controls; High-level controls
Convolutional neural networks (CNNs), 214–218
Corner cases, 4, 5, 89, 154
Creative destruction, 261–263
Crime, 273, 274
DARPA Challenges, 149, 150
DARPA Grand Challenge 2004
DARPA Grand Challenge 2005, 151, 152
DARPA Urban Challenge 2007, 156–158
Data
  CAN bus protocol, 193, 194
  Data collection, 239, 240
  Training data for deep learning, 218–220
  See also Machine learning; Route-planning software; Traffic prediction software
Deep learning
  History of, 197, 199–202, 219, 223–226
  How deep learning works, 7, 8, 226–231
  See also ImageNet competition; Neocognitron; Perceptron; SuperVision
Demo 97, 134, 135
Digital cameras, 173–175
Disney Hall, Los Angeles, 36
Disney’s Magic Highway U.S.A.
Dog of War, 79
Downtowns, 32–37
Drive by wire, 191–194
Driver assist, 55–58. See also Human in the loop
Driverless-car reliability, 98–104, 195–196
Drive-PX, 225
E-commerce, 271, 272
Edge detectors, 229
Electronic Highway
  History of, 116–120
  Reasons for demise, 123, 124
  See also General Motors Corporation (GM)
Environment.

pages: 447 words: 111,991

Exponential: How Accelerating Technology Is Leaving Us Behind and What to Do About It
by Azeem Azhar
Published 6 Sep 2021

In 2012, a group of leading AI researchers – Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton – developed a ‘deep convolutional neural network’ which applied deep learning to the kinds of image-sorting tasks that AIs had long struggled with. It was rooted in extraordinary computing clout. The neural network contained 650,000 neurons and 60 million ‘parameters’, settings you could use to tune the system. It was a game-changer. Before AlexNet, as Krizhevsky’s team’s invention was called, most AIs that took on the ImageNet competition had stumbled, for years never scoring higher than 74 per cent. AlexNet had a success rate as high as 87 per cent. Deep learning worked. The triumph of deep learning sparked an AI feeding frenzy. Scientists rushed to build artificial intelligence systems, applying deep neural networks and their derivatives to a vast array of problems: from spotting manufacturing defects to translating between languages; from voice recognition to detecting credit card fraud; from discovering new medicines to recommending the next video we should watch.

pages: 424 words: 114,905

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again
by Eric Topol
Published 1 Jan 2019

IMAGES
ImageNet exemplified an adage about AI: datasets, not algorithms, might be the key limiting factor of human-level artificial intelligence.39 When Fei-Fei Li, a computer scientist now at Stanford and half time at Google, started ImageNet in 2007, she bucked the idea that algorithms ideally needed nurturing from Big Data and instead pursued the in-depth annotation of images. She recognized it wasn’t about Big Data; it was about carefully, extensively labeled Big Data. A few years ago, she said, “I consider the pixel data in images and video to be the dark matter of the Internet.”40 Many different convolutional DNNs were used to classify the images with annual ImageNet Challenge contests to recognize the best (such as AlexNet, GoogLeNet, VGG Net, and ResNet). Figure 4.6 shows the progress in reducing the error rate over several years, with ImageNet wrapping up in 2017, with significantly better than human performance in image recognition. The error rate fell from 30 percent in 2010 to 4 percent in 2016.