speech recognition

back to index

description: automatic conversion of spoken language into text

249 results

pages: 350 words: 98,077

Artificial Intelligence: A Guide for Thinking Humans
by Melanie Mitchell
Published 14 Oct 2019

Can deep learning, along with big data, produce machines that can flexibly and reliably deal with human language?

Speech Recognition and the Last 10 Percent

Automated speech recognition—the task of transcribing spoken language into text in real time—was deep learning’s first major success in NLP, and I’d venture to say that it is AI’s most significant success to date in any domain. In 2012, at the same time that deep learning was revolutionizing computer vision, a landmark paper on speech recognition was published by research groups at the University of Toronto, Microsoft, Google, and IBM.2 These groups had been developing deep neural networks for various aspects of speech recognition: recognizing phonemes from acoustic signals, predicting words from combinations of phonemes, predicting phrases from combinations of words, and so on.

According to a Google speech-recognition expert, the use of deep networks resulted in the “biggest single improvement in 20 years of speech research.”3 The same year, a new deep-network speech-recognition system was released to customers on Android phones; two years later it was released on Apple’s iPhone, with one Apple engineer commenting, “This was one of those things where the jump [in performance] was so significant that you do the test again to make sure that somebody didn’t drop a decimal place.”4 If you yourself happened to use any kind of speech-recognition technology both before and after 2012, you will have also noticed a very sharp improvement. Speech recognition, which before 2012 ranged from horribly frustrating to moderately useful, suddenly became very nearly perfect in some circumstances. I am now able to dictate all of my texts and emails on my phone’s speech-recognition app; just a few moments ago, I read the “Restaurant” story to my phone, using my normal speaking speed, and it correctly transcribed every word. What’s stunning to me is that speech-recognition systems are accomplishing all this without any understanding of the meaning of the speech they are transcribing. While the speech-recognition system on my phone can transcribe every word of my “Restaurant” story, I guarantee you that it doesn’t understand a thing about it, or about anything else.


pages: 392 words: 108,745

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think
by James Vlahos
Published 1 Mar 2019

Unlike at the board meeting, however, the voice interface belly flopped. Siri relied on a third-party company’s technology for speech recognition. But in a case of phenomenally bad timing, that company was experiencing technical problems on the day of the Apple showcase. “It was easily the worst demo that we ever did in the history of the company,” Kittlaus says. He told Siri, “Give me two tickets for the Cubs game,” and the speech recognition service interpreted his utterance as “The circus is going to be in town next week.” The founders were subsequently able to convince Apple that the speech-recognition glitch was only temporary. But they remained on edge in the months leading up to the launch of the Siri app.

The first group of challenges were those that required engineering—speech recognition and language understanding, for example. They weren’t easy, but with enough time, effort, and resources, they could be cracked using technological methods that were already known to the world. The second category of problems, however, were those that required invention—wholly new approaches. Topping that list was a challenge known as far-field speech recognition. Wherever you were in a room, and whatever else was happening acoustically—music playing, baby crying, Klingons attacking—the device should be able to hear you. “Far-field speech recognition did not exist in any commercial product when we started on this project,” Hart says.

The sound waves emanating from your mouth must be converted into words, a process known as automated speech recognition. Determining what you were trying to communicate with those words is called natural-language understanding. Formulating a suitable reply is natural-language generation. And finally, speech synthesis allows voice-computing devices to audibly reply. Each of these subprocesses has bedeviled computer scientists for decades. And each has been significantly advanced by deep learning, so we will spend the rest of this chapter examining how.

Automated speech recognition

The next time you get ready to swear at Siri for bungling your words, pump the brakes and consider the miracle of hearing.
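The four subprocesses described above can be pictured as a chain of functions. The sketch below is purely illustrative: every function body is a hypothetical stub standing in for what, in a real voice assistant, would be a large learned model.

```python
# A minimal sketch of the voice-computing pipeline: ASR -> NLU -> NLG -> TTS.
# All stage implementations are hypothetical stubs, not real models.

def automated_speech_recognition(audio: bytes) -> str:
    """Convert sound waves into words (ASR)."""
    return "what is the weather today"  # stub transcription

def natural_language_understanding(text: str) -> dict:
    """Determine what the words were trying to communicate (NLU)."""
    return {"intent": "get_weather", "slots": {"when": "today"}}

def natural_language_generation(meaning: dict) -> str:
    """Formulate a suitable reply (NLG)."""
    return "It looks sunny today."

def speech_synthesis(reply: str) -> bytes:
    """Audibly reply (TTS); here we just pretend text bytes are audio."""
    return reply.encode("utf-8")

def voice_assistant(audio: bytes) -> bytes:
    text = automated_speech_recognition(audio)
    meaning = natural_language_understanding(text)
    reply = natural_language_generation(meaning)
    return speech_synthesis(reply)

print(voice_assistant(b"\x00\x01"))  # -> b'It looks sunny today.'
```

The point of the chain is that each stage consumes exactly what the previous one produces, which is why an error early in the pipeline (a misheard word) cascades through everything downstream.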

pages: 414 words: 109,622

Genius Makers: The Mavericks Who Brought A. I. To Google, Facebook, and the World
by Cade Metz
Published 15 Mar 2021

Yann LeCun and Yoshua Bengio join him.
2007—Geoff Hinton coins the term “deep learning,” a way of describing neural networks.
2008—Geoff Hinton runs into Microsoft researcher Li Deng in Whistler, British Columbia.
2009—Geoff Hinton visits Microsoft Research lab in Seattle to explore deep learning for speech recognition.
2010—Abdel-rahman Mohamed and George Dahl, two of Hinton’s students, visit Microsoft. Demis Hassabis, Shane Legg, and Mustafa Suleyman found DeepMind. Stanford professor Andrew Ng pitches Project Marvin to Google chief executive Larry Page.
2011—University of Toronto researcher Navdeep Jaitly interns at Google in Montreal, building a new speech recognition system through deep learning. Andrew Ng, Jeff Dean, and Greg Corrado found Google Brain. Google deploys speech recognition service based on deep learning.
2012—Andrew Ng, Jeff Dean, and Greg Corrado publish the Cat Paper.

Working alongside Vanhoucke’s team, this new lab pushed the technology onto Android smartphones in less than six months. At first Google didn’t tell the world its speech recognition service had changed, and soon after it went live, Vanhoucke received a phone call from a small company that supplied a chip for the latest Android phones. This chip was supposed to remove background noise when you barked commands into your phone—a way of cleaning up the sound so that the speech system could more easily identify what was being said. But the company told Vanhoucke its chip had stopped working. It was no longer boosting the performance of the speech recognition service. As Vanhoucke listened to what this company was saying, it didn’t take long to realize what had happened.

In early December 2009, Li Deng had once again made the drive from the NIPS conference in Vancouver to the NIPS workshops in Whistler. A year after running into Geoff Hinton inside the Whistler Hilton and stumbling onto his research with deep learning and speech recognition, Deng had organized a new workshop around the idea at the same spot high in the Canadian mountains. He and Hinton would spend the next few days explaining the finer points of “neural speech recognition” to the other researchers gathered in Whistler, walking them through the prototype under way at the Microsoft lab in Redmond. As he drove north, winding through the mountain roads, Deng carried three of these researchers in his SUV.

pages: 372 words: 101,174

How to Create a Mind: The Secret of Human Thought Revealed
by Ray Kurzweil
Published 13 Nov 2012

To the surprise of my colleagues, our effort turned out to be very successful, having succeeded in recognizing speech comprising a large vocabulary with high accuracy. After that experiment, all of our subsequent speech recognition efforts have been based on hierarchical hidden Markov models. Other speech recognition companies appeared to discover the value of this method independently, and since the mid-1980s most work in automated speech recognition has been based on this approach. Hidden Markov models are also used in speech synthesis—keep in mind that our biological cortical hierarchy is used not only to recognize input but also to produce output, for example, speech and physical movement.

The technique we developed had substantially all of the attributes that I describe in the PRTM. It included a hierarchy of patterns with each higher level being conceptually more abstract than the one below it. For example, in speech recognition the levels included basic patterns of sound frequency at the lowest level, then phonemes, then words and phrases (which were often recognized as if they were words). Some of our speech recognition systems could understand the meaning of natural-language commands, so yet higher levels included such structures as noun and verb phrases. Each pattern recognition module could recognize a linear sequence of patterns from a lower conceptual level.

There was not a lot known about the neocortex in the early 1980s, but based on my experience with a variety of pattern recognition problems, I assumed that the brain was also likely to be reducing its multidimensional data (whether from the eyes, the ears, or the skin) using a one-dimensional representation, especially as concepts rose in the neocortex’s hierarchy. For the speech recognition problem, the organization of information in the speech signal appeared to be a hierarchy of patterns, with each pattern represented by a linear string of elements with a forward direction. Each element of a pattern could be another pattern at a lower level, or a fundamental unit of input (which in the case of speech recognition would be our quantized vectors). You will recognize this situation as consistent with the model of the neocortex that I presented earlier.
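Kurzweil’s hierarchical hidden Markov models are built from exactly this machinery: a linear, forward-directed sequence of hidden states emitting lower-level elements. As a toy illustration (not his actual system), the sketch below treats three hypothetical phonemes as hidden states, quantized acoustic vectors as observations, and uses Viterbi decoding to recover the most likely phoneme sequence. All probabilities are invented for the example.

```python
# Toy HMM: hidden states stand in for phonemes, observations for
# quantized acoustic vectors; Viterbi decoding finds the best state path.
import math

states = ["k", "ae", "t"]                       # hypothetical phoneme states
start = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans = {                                        # forward-directed transitions
    "k":  {"k": 0.4, "ae": 0.5, "t": 0.1},
    "ae": {"k": 0.05, "ae": 0.45, "t": 0.5},
    "t":  {"k": 0.1, "ae": 0.1, "t": 0.8},
}
emit = {                                         # P(quantized vector | phoneme)
    "k":  {"v1": 0.7, "v2": 0.2, "v3": 0.1},
    "ae": {"v1": 0.1, "v2": 0.7, "v3": 0.2},
    "t":  {"v1": 0.1, "v2": 0.2, "v3": 0.7},
}

def viterbi(observations):
    """Return the most probable hidden state path for the observations."""
    # scores[s] = best log-probability of any path ending in state s
    scores = {s: math.log(start[s]) + math.log(emit[s][observations[0]])
              for s in states}
    paths = {s: [s] for s in states}
    for obs in observations[1:]:
        new_scores, new_paths = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[p] + math.log(trans[p][s]))
            new_scores[s] = (scores[prev] + math.log(trans[prev][s])
                             + math.log(emit[s][obs]))
            new_paths[s] = paths[prev] + [s]
        scores, paths = new_scores, new_paths
    best = max(states, key=lambda s: scores[s])
    return paths[best]

print(viterbi(["v1", "v2", "v2", "v3"]))  # -> ['k', 'ae', 'ae', 't']
```

Real recognizers stack such models hierarchically, with the decoded output of one level serving as the observation sequence for the level above it, just as the text describes.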

pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence
by Ray Kurzweil
Published 31 Dec 1998

By late 1997 and early 1998, large-vocabulary CSR without a domain limitation for dictating written documents (like this book) was commercially introduced.16

• Prediction: The three technologies required for a translating telephone (where you speak and listen in one language such as English, and your caller hears you and replies in another language such as German)—speaker-independent (not requiring training on a new speaker), continuous, large-vocabulary speech recognition; language translation; and speech synthesis—will each exist in sufficient quality for a first generation system by the late 1990s. Thus, we can expect “translating telephones with reasonable levels of performance for at least the more popular languages early in the first decade of the twenty-first century.” What Happened: Effective, speaker-independent speech recognition,

MY LIFE WITH MACHINES: SOME HIGHLIGHTS

I walked onstage and played a composition on an old upright piano.

Artificial life: Simulated organisms, each including a set of behavior and reproduction rules (a simulated “genetic code”), and a simulated environment. The simulated organisms simulate multiple generations of evolution. The term can refer to any self-replicating pattern.
ASR: See Automatic speech recognition.
Automatic speech recognition (ASR): Software that recognizes human speech. In general, ASR systems include the ability to extract high-level patterns in speech data.
BGM: See Brain-generated music.
Big bang theory: A prominent theory on the beginning of the Universe: the cosmic explosion, from a single point of infinite density, that marked the beginning of the Universe billions of years ago.

A key question in the twenty-first century is whether computers will achieve consciousness (which their human creators are considered to have).
Continuous speech recognition (CSR): A software program that recognizes and records natural language.
Crystalline computing: A system in which data is stored in a crystal as a hologram, conceived by Stanford professor Lambertus Hesselink. This three-dimensional storage method requires a million atoms for each bit and could achieve a trillion bits of storage for each cubic centimeter. Crystalline computing also refers to the possibility of growing computers as crystals.
CSR: See Continuous speech recognition.
Cybernetic artist: A computer program that is able to create original artwork in poetry, visual art, or music.

pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together
by Nick Polson and James Scott
Published 14 May 2018

Today, though, it works shockingly well. At its tech conference in 2017, for example, Google boldly announced that machines had now reached parity with humans at speech recognition, with a per-word dictation error rate of 4.9%—drastically better than the 20–30% error rates common as recently as 2013. This quantum leap in linguistic performance is a huge reason why machines now seem so smart. One might argue, in fact, that human-level speech recognition is the last decade’s single most important breakthrough in AI. So when was the tipping point, and how did we get there? What are “word vectors,” and why are they so useful?
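The error rates quoted here are word error rates (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the system’s transcription into the reference transcript, divided by the number of reference words. A minimal implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference words,
    computed as a Levenshtein edit distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference words
    # and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# 20 reference words with one substituted word: WER = 1/20 = 0.05,
# in the neighborhood of the 4.9% figure quoted above.
ref = ("give me two tickets for the cubs game on friday afternoon "
       "and send the details to my email right away")
hyp = ("give me two tickets for the curbs game on friday afternoon "
       "and send the details to my email right away")
print(word_error_rate(ref, hyp))  # -> 0.05
```

One word wrong in twenty is a 5% WER; the pre-2013 systems quoted above were getting one word in every four or five wrong.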

From the 1950s through the 1970s, experts tried to get machines to understand natural language using this same top-down approach: (1) place constraints on human users, by restricting the grammar and vocabulary they can use; and (2) program the machines chock-full of translation rules: syntax, pronunciation, word choice … basically, all the rules you learned without trying as a child, together with all the grammar rules you learned from Mrs. Thistlebum in elementary school. This rules-based philosophy had worked great for programming languages. But it never worked very well for natural languages. A great example of how it went wrong is computer speech recognition. The very first speech-recognition systems were essentially toys. At the 1962 World’s Fair, for example, IBM showed off a machine that could recognize spoken English words—precisely 16 of them, and only if enunciated with painful clarity. In the 1970s there was a false dawn, in the form of a program called Harpy, created by researchers at Carnegie Mellon.

Language became a prediction-rule problem based on input/output pairs, similar to the problems solved by Henrietta Leavitt, or that farmer in Japan who uses deep learning to classify cucumbers: • For speech recognition, you pair a voice recording (input = “ahstinbrekfustahkoz”) with the correct transcription (output = “Austin breakfast tacos”). • For translating English to Russian, you pair an English word or sentence (“reset”) with the correct Russian translation (“perezagruzka”). • For predicting sentiment, you pair a sentence (“What a delightful morning spent in line at the DMV”) with a human annotation (). And so on. In each case, the machine must use the data to learn a prediction rule that correctly maps inputs to outputs. In the 1980s, speech-recognition software based on this principle began to hit the market.
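The input/output framing above can be made concrete with a deliberately tiny “prediction rule”: a nearest-neighbor lookup that maps a new input to the output paired with its most similar training input. The training pairs below are invented for illustration; real systems learn neural networks over millions of such pairs rather than anything this crude.

```python
import difflib

# Hypothetical (input, output) training pairs, echoing the book's example.
training_pairs = [
    ("ahstinbrekfustahkoz", "Austin breakfast tacos"),
    ("noozeelandwether", "New Zealand weather"),
    ("sanfranseeskotraffik", "San Francisco traffic"),
]

def predict(raw_input: str) -> str:
    """Map an input to the output of its most similar training input."""
    best_pair = max(
        training_pairs,
        key=lambda pair: difflib.SequenceMatcher(None, raw_input,
                                                 pair[0]).ratio(),
    )
    return best_pair[1]

# A slightly different rendering of the same utterance still maps correctly.
print(predict("ahstinbrekfustakos"))  # -> Austin breakfast tacos
```

The machinery is trivial, but the shape of the problem is the same one the deep networks solve: given enough labeled pairs, find a rule that generalizes from inputs it has seen to inputs it has not.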

pages: 307 words: 88,180

AI Superpowers: China, Silicon Valley, and the New World Order
by Kai-Fu Lee
Published 14 Sep 2018

International competitions frequently pit different computer vision or speech recognition teams against each other, with the competitors opening their work to scrutiny by other researchers. The speed of improvements in AI also drives researchers to instantly share their results. Many AI scientists aren’t trying to make fundamental breakthroughs on the scale of deep learning, but they are constantly making marginal improvements to the best algorithms. Those improvements regularly set new records for accuracy on tasks like speech recognition or visual identification. Researchers compete on the basis of these records—not on new products or revenue numbers—and when one sets a new record, he or she wants to be recognized and receive credit for the achievement.

I told him that he was a great young researcher but that China lagged too far behind American speech-recognition giants like Nuance, and there were fewer customers in China for this technology. To his credit, Liu ignored that advice and poured himself into building iFlyTek. Nearly twenty years and dozens of AI competition awards later, iFlyTek has far surpassed Nuance in capabilities and market cap, becoming the most valuable AI speech company in the world. Combining iFlyTek’s cutting-edge capabilities in speech recognition, translation, and synthesis will yield transformative AI products, including simultaneous translation earpieces that instantly convert your words and voice into any language.

At each step along the way, students’ time and performance on different problems feed into their student profiles, adjusting the subsequent problems to reinforce understanding. In addition, for classes such as English (which is mandatory in Chinese public schools), AI-powered speech recognition can bring top-flight English instruction to the most remote regions. High-performance speech recognition algorithms can be trained to assess students’ English pronunciation, helping them improve intonation and accent without the need for a native English speaker on site. From a teacher’s perspective, these same tools can be used to alleviate the burden of routine grading tasks, freeing up teachers to spend more time on the students themselves.

pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots
by John Markoff
Published 24 Aug 2015

Gruber had spent his time designing for personal computers and the World Wide Web, not mobile phones, so hearing Kittlaus describe the future of computing was a revelation. In the mid-2000s, keyboards on mobile phones were a limiting factor and so it made more sense to include speech recognition. SRI had been at the forefront of speech recognition research for decades. Nuance, the largest independent speech recognition firm, got its start as an SRI spin-off, so Cheyer understood the capabilities of speech recognition well. “It’s not quite ready yet,” he said. “But it will be.” Gruber was thrilled. Cheyer had been the chief architect of the CALO project at SRI, and Kittlaus had deep knowledge of the mobile phone industry.

“I’m here to announce”—he paused slightly for effect—“that the answer is still no.” The audience howled with laughter and broke into applause. He added, “But we’re getting there.” The Siri designers discovered early on that they could quickly improve cloud-based speech recognition. At that point, they weren’t using the SRI-inspired Nuance technology, but instead a rival system called Vlingo. Cheyer noticed that when speech recognition systems were placed on the Web, they were exposed to a torrent of data in the form of millions of user queries and corrections. This data set up a powerful feedback loop to train and improve Siri. The developers continued to believe that their competitive advantage would be that the Siri service represented a fundamental break with the dominant paradigm for finding information on the Web—the information search—exemplified by Google’s dramatically successful search engine.

[Back-of-book index, reproduced here only as a flattened fragment; the entry matching this topic reads: “speech recognition. see language and speech recognition; Siri (Apple).”]

The Deep Learning Revolution (The MIT Press)
by Terrence J. Sejnowski
Published 27 Sep 2018

His literal translation of Aleksandr Pushkin’s Eugene Onegin into English, annotated with explanatory footnotes on the cultural background of the verses, made his point.11 Perhaps Google Translate will be able to translate Shakespeare someday by integrating across all of his poetry.12

Learning How to Listen

Another holy grail of artificial intelligence is speech recognition. Until recently, speaker-independent speech recognition by computers was limited to narrow domains, such as airline reservations. Today, it is unlimited. A summer research project at Microsoft Research by an intern from the University of Toronto in 2012 dramatically improved the performance of Microsoft’s speech recognition system (figure 1.4).13 In 2016, a team at Microsoft announced that its deep learning network with 120 layers had achieved human-level performance on a benchmark test for multi-speaker speech recognition.14 The consequences of this breakthrough will ripple through society over the next few years, as computer keyboards are replaced by natural language interfaces.

Figure 1.4: Microsoft Chief Research Officer Rick Rashid in a live demonstration of automated speech recognition using deep learning on October 25, 2012, at an event in Tianjin, China. Before an audience of 2,000 Chinese, Rashid’s words, spoken in English, were recognized by the automated system, which first showed them in subtitles below Rashid’s screen image and then translated them into spoken Chinese. This high-wire act made newsfeeds worldwide. Courtesy of Microsoft Research.

Just as typewriters became obsolete with the widespread use of personal computers, so computer keyboards will someday become museum pieces. When speech recognition is combined with language translation, it will become possible to communicate across cultures in real time.

For an early foray along these lines, see Andrej Karpathy, “The Unreasonable Effectiveness of Recurrent Neural Networks,” Andrej Karpathy Blog, posted May 21, 2015. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 13. G. Hinton, L. Deng, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, et al., “Deep Neural Networks for Acoustic Modeling in Speech Recognition,” IEEE Signal Processing Magazine 29, no. 6 (2012): 82–97. 14. W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, et al., “Achieving Human Parity in Conversational Speech Recognition,” Microsoft Research Technical Report MSR-TR-2016-71, revised February 2017. https://arxiv.org/pdf/1610.05256.pdf. 15. A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S.

pages: 586 words: 186,548

Architects of Intelligence
by Martin Ford
Published 16 Nov 2018

One of the things it learned to do was to discover a pattern that would fire if there was a cat of some sort in the center of the frame because that’s a relatively common occurrence in YouTube videos, so that was pretty cool. The other thing we did was to work with the speech recognition team on applying deep learning and deep neural networks to some of the problems in the speech recognition system. At first, we worked on the acoustic model, where you try to go from raw audio waveforms to a part-of-word sound, like “buh,” or “fuh,” or “ss”—the things that form words. It turned out we could use neural networks to do that much better than the previous system they were using. That got very significant decreases in word error rate for the speech recognition system. We then just started to look and collaborate with other teams around Google about what kinds of interesting perception problems that it had in the speech space or in the image recognition or video processing space.

MARTIN FORD: That’s when it transitioned from being centered in universities to being in the mainstream domain at places like Google, Facebook, and Baidu? YOSHUA BENGIO: Exactly. The shift started slightly earlier, around 2010, with companies like Google, IBM, and Microsoft, who were working on neural networks for speech recognition. By 2012, Google had these neural networks on their Android smartphones. It was revolutionary for the fact that the same technology of deep learning could be used for both computer vision and speech recognition. It drove a lot of attention toward the field. MARTIN FORD: Thinking back to when you first started in neural networks, are you surprised at the distance things have come and the fact that they’ve become so central to what large companies, like Google and Facebook, are doing now?

They went as interns to IBM and Microsoft, and a third student took their system to Google. The basic system that they had built was developed further, and over the next few years, all these companies’ labs converted to doing speech recognition using neural nets. Initially, it was just using neural networks for the frontend of their system, but eventually, it was using neural nets for the whole system. Many of the best people in speech recognition had switched to believing in neural networks before 2012, but the big public impact was in 2012, when the vision community, almost overnight, got turned on its head and this crazy approach turned out to win.

pages: 317 words: 84,400

Automate This: How Algorithms Came to Rule Our World
by Christopher Steiner
Published 29 Aug 2012

Their hack was so revolutionary, in fact, that it not only changed speech translation software, but also speech recognition programs. Instead of trying to nail each word as it comes out of the speaker’s mouth, the latest and best speech recognition software looks for strings of words that make sense together. That way, it has an easy time distinguishing are from our. Are you going to the mall today won’t be mistaken with Our you going to the mall today because, simply, people never say our you going. Just as we learn grammar rules, so the machine-learning algorithm did as well. This method forms the backbone of the speech recognition programs we use today. Brown and Mercer’s breakthrough didn’t go unnoticed on Wall Street.
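The word-sequence trick Steiner describes is, in essence, a statistical language model that scores competing transcriptions. A minimal sketch using a bigram model; the tiny training corpus and smoothing choice below are invented for illustration, not taken from the systems he discusses:

```python
from collections import Counter

# Toy corpus (invented for illustration); real recognizers train on
# billions of words of transcribed speech and text.
tokens = ("are you going to the mall today . are you coming home . "
          "our house is near the mall . our team is going today .").split()

unigram = Counter(tokens)
bigram = Counter(zip(tokens, tokens[1:]))
V = len(unigram)  # vocabulary size, for add-one smoothing

def sequence_probability(words):
    """Add-one-smoothed bigram probability of a word sequence."""
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= (bigram[(prev, cur)] + 1) / (unigram[prev] + V)
    return p

# "are you ..." occurs in the corpus; "our you ..." never does,
# so the first hypothesis scores higher.
hyp_are = "are you going to the mall today".split()
hyp_our = "our you going to the mall today".split()
print(sequence_probability(hyp_are) > sequence_probability(hyp_our))  # True
```

Because people say "are you" and never "our you", the bigram counts alone settle the choice, just as the passage describes.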

For years, constructing a bot that could quantify spoken words and determine personalities and thoughts was impossible. The technology—software and hardware—just wasn’t ready. Speech recognition software—the ability of computers to capture and translate exactly what humans say—was a lost cause for decades. The software that did exist for the purpose was buggy and often wildly inaccurate. But in the early 1990s, two scientists at IBM’s research center dove into computerized speech recognition and translation, a field that had long failed to produce anything robust enough to be used in everyday situations. Peter Brown and Robert Mercer started by working on programs that translated one language to another, starting with French to English.

In their freshman-year programming classes, many college engineers design a simple algorithm to flawlessly play the game of tic-tac-toe.3 In their program, the opposing, or human, player’s move forms the input. With that information, the algorithm produces an output in the way of its own moves. A student expecting an A on such a problem will produce an algorithm that never loses a game (but often plays to a draw). The algorithms used by a high-frequency trader or a speech recognition program work the same way. They’re fed inputs—perhaps the movements of different stock indices, currency rate fluctuations, and oil prices—with which they produce an output: say, buy GE stock. Algorithmic trading is nothing more than relying on an algorithm for the answers of when and what to buy and sell.
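The never-losing tic-tac-toe assignment mentioned above is usually solved with minimax search: the algorithm takes the opponent's move as input and searches every continuation to pick the best reply. A minimal sketch (the board representation and helper names are mine, not from the book):

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Best (score, move) for `player` to move: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return (1, None) if w == player else (-1, None)
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None  # board full: draw
    opponent = "O" if player == "X" else "X"
    best_score, best_move = -2, None
    for m in moves:
        board[m] = player
        opp_score, _ = minimax(board, opponent)   # opponent's best reply
        board[m] = " "
        if -opp_score > best_score:
            best_score, best_move = -opp_score, m
    return best_score, best_move

# X completes the top row rather than doing anything else.
print(minimax(list("XX O O   "), "X")[1])  # 2
```

Against perfect play this search yields exactly the behavior the passage describes: it never loses, and often plays to a draw.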

pages: 252 words: 74,167

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future
by Luke Dormehl
Published 10 Aug 2016

Two members of Hinton’s lab, George Dahl and Abdel-rahman Mohamed, quickly demonstrated that it worked just as well for speech recognition as it did for image recognition. In 2009, the pair pitted their newly created speech recognition neural network up against the then-industry standard tools, which had been worked on for the past three decades. The deep learning net won. At this point, major companies began to take an interest. One of these was Google. In 2011, a PhD student of Hinton’s named Navdeep Jaitly was asked to tinker with Google’s speech recognition algorithms. He took one look at them and suggested gutting the entire system and replacing it with a deep neural network.

‘I want to eat in the same restaurant I ate in last week,’ is a straightforward enough sentence, but to make it into something useful, an AI assistant such as Siri must not only use natural language processing to understand the concept you are talking about, but also use context to find the right rule in its programming to follow. The speech recognition used in Siri is the creation of Nuance Communications, arguably the most advanced speech recognition company in the world. ‘Our job is to figure out the logical assertions inherent in the question that is being asked, or the command that is being given,’ Nuance’s Distinguished Scientist Ron Kaplan tells me. ‘From that, you then have to be able to interpret and turn it into an executable command.

For example, the Coca-Cola Bottling Company of Atlanta, Georgia, made headlines when it ‘hired’ an AI assistant called Hank to man its phone switchboard. Using what was then a state-of-the-art speech recognition system, Hank proved capable of answering some queries and redirecting calls for others. Like a prototype Siri, he was programmed with both an archive of useful information and a jovial personality. Ask him about Coca-Cola shareholder issues and he could tell you. Ask him about his personal life and he would answer that ‘virtual assistants are not allowed to have relationships’. (Alas, Hank’s speech recognition wasn’t perfect. Questioning him on whether he snorted coke would prompt him to say, ‘Of course!

pages: 424 words: 114,905

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again
by Eric Topol
Published 1 Jan 2019

Whether it’s in English or Chinese, talking is more than two to three times faster than typing (both at the initial speech-transcription stage and when edited via keyboard) and has a significantly lower error rate in Chinese, a difficult language to type (Figure 12.2). It wasn’t until 2016 that speech recognition by AI came into its own, when Microsoft’s and Google’s speech recognition technologies matched our skill at typing, achieving a 5 percent error rate. By now, AI has surpassed human performance. FIGURE 12.1: The time it takes from introduction of a new technology to adoption by one in four Americans. Source: Adapted from “Happy Birthday World Wide Web,” Economist (2014): www.economist.com/graphic-detail/2014/03/12/happy-birthday-world-wide-web.
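The "5 percent error rate" Topol cites is a word error rate (WER): substitutions, insertions, and deletions in the transcript, divided by the number of words in the reference. A minimal sketch via the standard edit-distance computation (the example sentences are my own):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with an edit-distance table over words."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            substitution = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1,   # insertion
                           substitution)
    return dp[len(r)][len(h)] / len(r)

# One substituted word in a twenty-word reference -> 5% WER.
ref = ("the quick brown fox jumps over the lazy dog while "
       "the small grey cat sleeps on the warm stone wall")
hyp = ("the quick brown fox jumps over the lazy dog while "
       "the small grey cat sleeps on the warm stone mall")
print(word_error_rate(ref, hyp))  # 0.05
```

A 5 percent WER means roughly one word in twenty is wrong, which is about what human transcribers achieve on conversational speech.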

Versus M.D.”17 The adversarial relationship between humans and their technology, which had a long history dating back to the steam engine and the first Industrial Revolution, had been rekindled.
1936—Turing paper (Alan Turing)
1943—Artificial neural network (Warren McCulloch, Walter Pitts)
1955—Term “artificial intelligence” coined (John McCarthy)
1957—Predicted ten years for AI to beat human at chess (Herbert Simon)
1958—Perceptron (single-layer neural network) (Frank Rosenblatt)
1959—Machine learning described (Arthur Samuel)
1964—ELIZA, the first chatbot
1964—We know more than we can tell (Michael Polanyi’s paradox)
1969—Question AI viability (Marvin Minsky)
1986—Multilayer neural network (NN) (Geoffrey Hinton)
1989—Convolutional NN (Yann LeCun)
1991—Natural-language processing NN (Sepp Hochreiter, Jürgen Schmidhuber)
1997—Deep Blue wins in chess (Garry Kasparov)
2004—Self-driving vehicle, Mojave Desert (DARPA Challenge)
2007—ImageNet launches
2011—IBM vs. Jeopardy! champions
2011—Speech recognition NN (Microsoft)
2012—University of Toronto ImageNet classification and cat video recognition (Google Brain, Andrew Ng, Jeff Dean)
2014—DeepFace facial recognition (Facebook)
2015—DeepMind vs. Atari (David Silver, Demis Hassabis)
2015—First AI risk conference (Max Tegmark)
2016—AlphaGo vs.

Even if Deep Blue didn’t have much of anything to do with deep learning, the technology’s day was coming. The founding of ImageNet by Fei-Fei Li in 2007 had historic significance. That massive database of 15 million labeled images would help catapult DNN into prominence as a tool for computer vision. In parallel, natural-language processing for speech recognition based on DNN at Microsoft and Google was moving into full swing. More squarely in the public eye was man versus machine in 2011, when IBM Watson beat the human Jeopardy! champions. Despite the relatively primitive AI that was used, which had nothing to do with deep learning networks and which relied on speedy access to Wikipedia’s content, IBM masterfully marketed it as a triumph of AI.

pages: 2,466 words: 668,761

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

James Baker’s DRAGON system (Baker, 1975) could be considered the first successful speech recognition system. It was the first to use HMMs for speech. After several decades of systems based on probabilistic language models, the field began to switch to deep neural networks (Hinton et al., 2012). Deng (2016) describes how the introduction of deep learning enabled rapid improvement in speech recognition, and reflects on the implications for other NLP tasks. Today deep learning is the dominant approach for all large-scale speech recognition systems. Speech recognition can be seen as the first application area that highlighted the success of deep learning, with computer vision following shortly thereafter.

Machine translation: Online machine translation systems now enable the reading of documents in over 100 languages, including the native languages of over 99% of humans, and render hundreds of billions of words per day for hundreds of millions of users. While not perfect, they are generally adequate for understanding. For closely related languages with a great deal of training data (such as French and English) translations within a narrow domain are close to the level of a human (Wu et al., 2016b). Speech recognition: In 2017, Microsoft showed that its Conversational Speech Recognition System had reached a word error rate of 5.1%, matching human performance on the Switchboard task, which involves transcribing telephone conversations (Xiong et al., 2017). About a third of computer interaction worldwide is now done by voice rather than keyboard; Skype provides real-time speech-to-speech translation in ten languages.

(For handwritten or typed communication, we have the problem of optical character recognition.) 24.6 Natural Language Tasks Natural language processing is a big field, deserving an entire textbook or two of its own (Goldberg, 2017; Jurafsky and Martin, 2020). In this section we briefly describe some of the main tasks; you can use the references to get more details. Speech recognition is the task of transforming spoken sound into text. We can then perform further tasks (such as question answering) on the resulting text. Current systems have a word error rate of about 3% to 5% (depending on details of the test set), similar to human transcribers. The challenge for a system using speech recognition is to respond appropriately even when there are errors on individual words. Top systems today use a combination of recurrent neural networks and hidden Markov models (Hinton et al., 2012; Yu and Deng, 2016; Deng, 2016; Chiu et al., 2017; Zhang et al., 2017).

pages: 284 words: 84,169

Talk on the Wild Side
by Lane Greene
Published 15 Dec 2018

His formal report concluded, in diplomatic language, that future American government funding “should be spent hardheadedly toward important, realistic, and relatively short-range goals”. The real message was clear: up until then, money had been spent thoughtlessly on trivial, unrealistic or long-range goals. Pierce wrote privately in 1969 about the progress in another field, speech recognition, to the Journal of the Acoustical Society of America. This time, he was frank: …a general phonetic typewriter [ie, a speech-recognition system that would take voice input and produce text output] is simply impossible unless the typewriter has an intelligence and a knowledge of language comparable to those of a native speaker of English … The typical recognizer … builds or programs an elaborate system that either does very little or flops in an obscure way.

In this case, instead of a mass of texts translated by humans from English to French, speech-recognition systems learn from a mass of recordings, paired with transcriptions of those recordings made by humans. Now the trick is to match not a string of English text to a string of French text, but a series of vibrations in the air to a string of words. Given enough data, computers can do exactly that, with increasing accuracy. Word-error rates have crept down gradually as computers have become more powerful, and are being fed more data. And the systems will get better if people continue using them. Every time a user uses a digital assistant with a speech-recognition system, the data becomes potential training data for the company that makes the system, as most requests are sent via the internet to the provider’s computers in the cloud.

Neural networks were introduced for a few language-pairs for Google Translate in late 2016, leading to an immediate and dramatic improvement in Translate’s performance. That same year, Microsoft announced a speech-recognition system that made as few errors as a human transcriber. The system was powered by six neural networks, each of which tackled some parts of the problem better than others. None of these systems are perfect at the time of writing, and they almost certainly won’t be any time soon. “Deep learning” brought a sudden jump in quality in many language technologies, but it still cannot flexibly handle language like humans can. Translation and speech-recognition systems perform much better when their tasks are limited to a single domain, like medicine or law.

pages: 477 words: 75,408

The Economic Singularity: Artificial Intelligence and the Death of Capitalism
by Calum Chace
Published 17 Jul 2016

We have seen before with the relative decline of seemingly invincible goliaths like IBM and Microsoft how fierce and fast-moving the competition is within the technology industry. This is one of the dynamics which is pushing AI forward so fast and so unstoppably. Image and speech recognition Deep learning has accelerated progress at tasks like image recognition, facial recognition, natural speech recognition and machine translation faster than anyone expected. In 2012, Google announced that an assembly of 16,000 processors looking at 10 million YouTube videos had identified – without being prompted – a particular class of objects. We call them cats.

t=33 [lxxxviii] http://news.sciencemag.org/social-sciences/2015/02/facebook-will-soon-be-able-id-you-any-photo [lxxxix] http://www.computerworld.com/article/2941415/data-privacy/is-facial-recognition-a-threat-on-facebook-and-google.html [xc] http://www.wired.com/2016/01/2015-was-the-year-ai-finally-entered-the-everyday-world/ [xci] At the time of writing, April 2016, Aipoly is impressive, but far from perfect. [xcii] http://www.bloomberg.com/news/2014-12-23/speech-recognition-better-than-a-human-s-exists-you-just-can-t-use-it-yet.html [xciii] http://www.forbes.com/sites/parmyolson/2014/05/28/microsoft-unveils-near-real-time-language-translation-for-skype/ [xciv] http://www.technologyreview.com/news/544651/baidus-deep-learning-system-rivals-people-at-speech-recognition/#comments [xcv] https://youtu.be/V1eYniJ0Rnk?t=1 [xcvi] http://edge.org/response-detail/26780 [xcvii] http://techcrunch.com/2016/03/19/how-real-businesses-are-using-machine-learning/ [xcviii] http://www.latimes.com/business/technology/la-fi-cutting-edge-ibm-20160422-story.html [xcix] http://www.wired.com/2016/04/openai-elon-musk-sam-altman-plan-to-set-artificial-intelligence-free/ [c] http://www.strategyand.pwc.com/global/home/what-we-think/innovation1000/top-innovators-spenders#/tab-2015 [ci] 2013 data: http://www.ons.gov.uk/ons/rel/rdit1/gross-domestic-expenditure-on-research-and-development/2013/stb-gerd-2013.html [cii] http://insights.venturescanner.com/category/artificial-intelligence-2/ [ciii] http://techcrunch.com/2015/12/25/investing-in-artificial-intelligence/ [civ] http://www.wired.com/2015/11/google-open-sources-its-artificial-intelligence-engine/ [cv] https://www.theguardian.com/technology/2016/apr/13/google-updates-tensorflow-open-source-artificial-intelligence [cvi] http://www.wired.com/2015/12/facebook-open-source-ai-big-sur/ [cvii] The name Parsey McParseFace is a play on a jokey name for a research ship which received a lot of votes in a poll run by the British government in April 2016. http://www.wsj.com/articles/googles-open-source-parsey-mcparseface-helps-machines-understand-english-1463088180 [cviii] Assuming you don't count the Vatican as a proper country. http://www.ibtimes.co.uk/google-project-loon-provide-free-wifi-across-sri-lanka-1513136 [cix] https://setandbma.wordpress.com/2013/02/04/who-coined-the-term-big-data/ [cx] http://www.pcmag.com/encyclopedia/term/37701/amara-s-law [cxi] http://www.lrb.co.uk/v37/n05/john-lanchester/the-robots-are-coming [cxii] Haitz's Law states that the cost per unit of useful light emitted decreases exponentially [cxiii] http://computationalimagination.com/article_cpo_decreasing.php [cxiv] http://www.nytimes.com/2006/06/07/technology/circuits/07essay.html [cxv] http://arstechnica.com/gadgets/2015/02/intel-forges-ahead-to-10nm-will-move-away-from-silicon-at-7nm/ [cxvi] .

In January 2016 Baidu (often described as China's Google) showed off a system called DuLight which uses a camera to capture an image of something in front of you, sends the image to an app on your smartphone, which identifies the object and announces what it is. One application of this is to help blind people know what they are “looking” at.[xc] You can download a similar app called Aipoly for free at iTunes.[xci] Speech recognition systems that exceed human performance will be available in your smartphone soon.[xcii] Microsoft-owned Skype introduced real-time machine translation in March 2014: it is not yet perfect, but it is improving all the time. Microsoft CEO Satya Nadella revealed an intriguing discovery which he called transfer learning: “If you teach it English, it learns English,” he said.

pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology
by Ray Kurzweil
Published 14 Jul 2005

Watts's software is capable of matching the intricacies that have been revealed in subtle experiments on human hearing and auditory discrimination. Watts has used his model as a preprocessor (front end) in speech-recognition systems and has demonstrated its ability to pick out one speaker from background sounds (the "cocktail party effect"). This is an impressive feat of which humans are capable but up until now had not been feasible in automated speech-recognition systems.90 Like human hearing, Watts's cochlea model is endowed with spectral sensitivity (we hear better at certain frequencies), temporal responses (we are sensitive to the timing of sounds, which create the sensation of their spatial locations), masking, nonlinear frequency-dependent amplitude compression (which allows for greater dynamic range—the ability to hear both loud and quiet sounds), gain control (amplification), and other subtle features.

Another method that is good at applying probabilistic networks to complex sequences of information involves Markov models.170 Andrei Andreyevich Markov (1856–1922), a renowned mathematician, established a theory of "Markov chains," which was refined by Norbert Wiener (1894–1964) in 1923. The theory provided a method to evaluate the likelihood that a certain sequence of events would occur. It has been popular, for example, in speech recognition, in which the sequential events are phonemes (parts of speech). The Markov models used in speech recognition code the likelihood that specific patterns of sound are found in each phoneme, how the phonemes influence each other, and likely orders of phonemes. The system can also include probability networks on higher levels of language, such as the order of words.
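The phoneme-sequence scoring Kurzweil describes can be sketched as a first-order Markov chain over phonemes. The probabilities below are made up purely for illustration; a real recognizer estimates them from large speech corpora:

```python
# Made-up initial and transition probabilities for a few phonemes.
initial = {"k": 0.3, "ae": 0.2, "t": 0.1}
transition = {
    ("k", "ae"): 0.40, ("ae", "t"): 0.50, ("t", "ae"): 0.20,
    ("ae", "k"): 0.10, ("k", "t"): 0.01, ("t", "k"): 0.05,
}

def chain_likelihood(phonemes, floor=1e-6):
    """P(sequence) under the Markov chain; unseen transitions get a
    small floor probability rather than zero."""
    p = initial.get(phonemes[0], floor)
    for prev, cur in zip(phonemes, phonemes[1:]):
        p *= transition.get((prev, cur), floor)
    return p

# "k ae t" (cat) is a far likelier phoneme order than "k t ae".
print(chain_likelihood(["k", "ae", "t"]))
print(chain_likelihood(["k", "t", "ae"]))
```

A hidden Markov model extends this idea: the phonemes themselves are unobserved states inferred from the acoustic signal, with the same chain of transition probabilities underneath.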

Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE 77 (1989): 257–86. For a mathematical treatment of Markov models, see http://jedlik.phy.bme.hu/~gerjanos/HMM/node2.html. 171. Kurzweil Applied Intelligence (KAI), founded by the author in 1982, was sold in 1997 for $100 million and is now part of ScanSoft (formerly called Kurzweil Computer Products, the author's first company, which was sold to Xerox in 1980), now a public company. KAI introduced the first commercially marketed large-vocabulary speech-recognition system in 1987 (Kurzweil Voice Report, with a ten-thousand-word vocabulary). 172.

pages: 502 words: 107,510

Natural Language Annotation for Machine Learning
by James Pustejovsky and Amber Stubbs
Published 14 Oct 2012

Programs such as Google Translate are getting better and better, but the real killer app will be the BabelFish that translates in real time when you’re looking for the right train to catch in Beijing. Speech Recognition This is one of the most difficult problems in NLP. There has been great progress in building models that can be used on your phone or computer to recognize spoken language utterances that are questions and commands. Unfortunately, while these Automatic Speech Recognition (ASR) systems are ubiquitous, they work best in narrowly defined domains and don’t allow the speaker to stray from the expected scripted input (“Please say or type your card number now”).

Key Word in Context (KWIC) is invented as a means of indexing documents and creating concordances.
1960s: Kucera and Francis publish A Standard Corpus of Present-Day American English (the Brown Corpus), the first broadly available large corpus of language texts. Work in Information Retrieval (IR) develops techniques for statistical similarity of document content.
1970s: Stochastic models developed from speech corpora make Speech Recognition systems possible. The vector space model is developed for document indexing. The London-Lund Corpus (LLC) is developed through the work of the Survey of English Usage.
1980s: The Lancaster-Oslo-Bergen (LOB) Corpus, designed to match the Brown Corpus in terms of size and genres, is compiled.

For your own corpus, you may find yourself wanting to cover a wide variety of text, but it is likely that you will have a more specific task domain, and so your potential corpus will not need to include the full range of human expression. The Switchboard Corpus is an example of a corpus that was collected for a very specific purpose—Speech Recognition for phone operation—and so was balanced and representative of the different sexes and all different dialects in the United States. Early Use of Corpora One of the most common uses of corpora from the early days was the construction of concordances. These are alphabetical listings of the words in an article or text collection with references given to the passages in which they occur.
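The concordance idea above is simple to sketch: a Key Word in Context (KWIC) listing pulls every occurrence of a word together with a window of surrounding context and a reference to its position. A toy version (function name and sample text are mine):

```python
def kwic(text, keyword, window=3):
    """Key Word in Context: (position, left context, keyword, right context)
    for every occurrence of `keyword`, with `window` words on each side."""
    tokens = text.split()
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((i, left, tok, right))
    return rows

sample = "the quick brown fox jumps over the lazy dog near the river"
for pos, left, word, right in kwic(sample, "the"):
    print(f"{pos:3d}  {left:>20}  {word}  {right}")
```

A real concordance would record document and sentence identifiers rather than token offsets, but the aligned-context layout is the same one early corpus tools printed.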

pages: 482 words: 121,173

Tools and Weapons: The Promise and the Peril of the Digital Age
by Brad Smith and Carol Ann Browne
Published 9 Sep 2019

See Sejnowski for a thorough history of the developments that have led to advances in neural networks over the past two decades. Back to note reference 10. Dom Galeon, “Microsoft’s Speech Recognition Tech Is Officially as Accurate as Humans,” Futurism, October 20, 2016, https://futurism.com/microsofts-speech-recognition-tech-is-officially-as-accurate-as-humans/; Xuedong Huang, “Microsoft Researchers Achieve New Conversational Speech Recognition Milestone,” Microsoft Research Blog, Microsoft, August 20, 2017, https://www.microsoft.com/en-us/research/blog/microsoft-researchers-achieve-new-conversational-speech-recognition-milestone/. Back to note reference 11. The rise of superintelligence was first raised by I.J.

Vision and speech recognition have long been among the holy grails for researchers in computer science. In 1995, when Bill Gates founded Microsoft Research, one of the first goals of Nathan Myhrvold, who headed the effort, was to recruit the top academics in vision and speech recognition. I still recall when Microsoft’s basic research team optimistically predicted in the 1990s that a computer would soon be able to understand speech as well as a human being. The optimism of Microsoft researchers was shared by experts across academia and the tech sector. The reality was that speech recognition took longer to improve than experts had predicted.

It also required new breakthroughs in techniques needed to train multilayer neural networks,9 which started to come to fruition about a decade ago.10 The collective impact of these changes led to rapid and impressive advances in AI-based systems. In 2016, the team at Microsoft Research’s vision recognition system matched human performance in a specific challenge to identify a large number of objects in a library called ImageNet. They then did the same thing with speech recognition in a specific challenge called the Switchboard data set, achieving a 94.1 percent accuracy rate.11 In other words, computers were starting to perceive the world as well as human beings. The same phenomenon happened with translation of languages, which requires in part that computers understand the meaning of different words, including nuance and slang.

pages: 187 words: 55,801

The New Division of Labor: How Computers Are Creating the Next Job Market
by Frank Levy and Richard J. Murnane
Published 11 Apr 2004

The explanation was plausible—a call center that used operators who read scripts on computer screens moved to a source of even cheaper labor. In fact, however, the work order had not been taken by a human operator but by a computer using speech recognition software. By reading menus to the caller, the software could prompt the caller to identify the problem as being in a refrigerator, specifically, the ice-maker. It could also prompt the caller to choose a time he would be at home from a list of times when technicians were available. The speech recognition software could recognize the caller’s phone number and establish that the home address was a “HOUSE” rather than an apartment. While the software was not yet good enough to recognize the home address itself, it had captured enough information to print up a work order and append it to the technician’s schedule.

To make sense of what she sees, she must extract features from this information, understanding where an adult’s legs end and where another object begins. In a complex visual field, this feature extraction is extremely difficult to program even though most four-year-olds do it without thinking. Perception is an equally difficult problem in speech recognition, determining where words begin and end, factoring out the “ummm’s” and “like’s.” The second problem, for both people and computers, involves interpreting what is perceived—recognizing what it is we are seeing, hearing, tasting, and so on. This recognition involves comparing what is perceived to concepts or schemas stored in memory.

In the case of Fannie Mae’s Desktop Underwriter, the analysis of previously issued mortgages provided the points assessed to different pieces of information in computing a total score. By contrast, in neural net software, the program is “trained” on samples of previously identified patterns to “learn” which characteristics of a set of information mark it as a pattern of interest. For example, training would enable speech recognition software to distinguish the digital pattern of the spoken word “BILL” from the digital pattern of “ROSE” and to distinguish each of them from the digital pattern for “SPAGHETTI.” But once software has identified “BILL,” there is still the problem of determining which meaning of “BILL” is intended.

pages: 416 words: 129,308

The One Device: The Secret History of the iPhone
by Brian Merchant
Published 19 Jun 2017

Yet if you ask Siri where she—sorry, it, but more on that in a second—comes from, the reply is the same: “I, Siri, was designed by Apple in California.” But that isn’t the full story. Siri is really a constellation of features—speech-recognition software, a natural-language user interface, and an artificially intelligent personal assistant. When you ask Siri a question, here’s what happens: Your voice is digitized and transmitted to an Apple server in the Cloud while a local voice recognizer scans it right on your iPhone. Speech-recognition software translates your speech into text. Natural-language processing parses it. Siri consults what tech writer Steven Levy calls the iBrain—around 200 megabytes of data about your preferences, the way you speak, and other details.

“So we now know a lot about what people want in life and what they want to say to a computer and what they say to an assistant. “We don’t give it to anyone outside the company—there’s a strong privacy policy. So we don’t even keep most of that data on the servers, if at all, for very long.… Speech recognition has gotten much better because we actually look at the data and run experiments on it.” He too is fully aware of Siri’s shortcomings. “Right now the illusion breaks down when either you have speech-recognition issue, or you have a question that isn’t a common question or a request with an uncommon way of saying it.… How chatty can it get? How companion-like could it really be? Who’s the audience for that?

Before that, it was a research project at Stanford backed by the Defense Department with the aim of creating an artificially intelligent assistant. Before that, it was an idea that had bounced around the tech industry, pop culture, and the halls of academia for decades; Apple itself had an early concept of a voice-interfacing AI in the 1980s. Before that there was the Hearsay II, a proto-Siri speech-recognition system. And Gruber says it was the prime inspiration for Siri. Dabbala Rajagopal “Raj” Reddy was born in 1937 in a village of five hundred people south of Madras, India. Around then, the region was hit with a seven-year drought and subsequent famine. Reddy learned to write, he says, by carving figures in the sand.

pages: 416 words: 112,268

Human Compatible: Artificial Intelligence and the Problem of Control
by Stuart Russell
Published 7 Oct 2019

N., 221–22 smart homes, 71–72 Smith, Adam, 227 snopes.com, 108 social aggregation theorem, 220–21 Social Limits to Growth, The (Hirsch), 230 social media, and content selection algorithms, 8–9 softbots, 64 software systems, 248 solutions, searching for, 257–66 abstract planning and, 264–66 combinatorial complexity and, 258 computational activity, managing, 261–62 15-puzzle and, 258 Go and, 259–61 map navigation and, 257–58 motor control commands and, 263–64 24-puzzle and, 258 “Some Moral and Technical Consequences of Automation” (Wiener), 10 Sophia (robot), 126 specifications (of programs), 248 “Speculations Concerning the First Ultraintelligent Machine” (Good), 142–43 speech recognition, 6 speech recognition capabilities, 74–75 Spence, Mike, 117 SpotMini, 73 SRI, 41–42, 52 standard model of intelligence, 9–11, 13, 48–61, 247 StarCraft, 45 Stasi, 103–4 stationarity, 24 statistics, 10, 176 Steinberg, Saul, 88 stimulus–response templates, 67 Stockfish (chess program), 47 striving and enjoying, relation between, 121–22 subroutines, 34, 233–34 Summers, Larry, 117, 120 Summit machine, 34, 35, 37 Sunstein, Cass, 244 Superintelligence (Bostrom), 102, 145, 150, 167, 183 supervised learning, 58–59, 285–93 surveillance, 104 Sutherland, James, 71 “switch it off” argument, 160–61 synapses, 15, 16 Szilard, Leo, 8, 77, 150 tactile sensing problem, robots, 73 Taobao, 106 technological unemployment.

Connections were made to the long-established disciplines of probability, statistics, and control theory. The seeds of today’s progress were sown during that AI winter, including early work on large-scale probabilistic reasoning systems and what later became known as deep learning. Beginning around 2011, deep learning techniques began to produce dramatic advances in speech recognition, visual object recognition, and machine translation—three of the most important open problems in the field. By some measures, machines now match or exceed human capabilities in these areas. In 2016 and 2017, DeepMind’s AlphaGo defeated Lee Sedol, former world Go champion, and Ke Jie, the current champion—events that some experts predicted wouldn’t happen until 2097, if ever.6 Now AI generates front-page media coverage almost every day.

Yann LeCun’s team at AT&T Labs didn’t write special algorithms to recognize “8” by searching for curvy lines and loops; instead, they improved on existing neural network learning algorithms to produce convolutional neural networks. Those networks, in turn, exhibited effective character recognition after suitable training on labeled examples. The same algorithms can learn to recognize letters, shapes, stop signs, dogs, cats, and police cars. Under the headline of “deep learning,” they have revolutionized speech recognition and visual object recognition. They are also one of the key components in AlphaZero as well as in most of the current self-driving car projects. If you think about it, it’s hardly surprising that progress towards general AI is going to occur in narrow-AI projects that address specific tasks; those tasks give AI researchers something to get their teeth into.

pages: 407 words: 104,622

The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution
by Gregory Zuckerman
Published 5 Nov 2019

One sees the results of the chain but not the “states” that help explain the progression of the chain. Those not acquainted with baseball might throw their hands up when receiving updates of the number of runs scored each inning—one run in this inning, six in another, with no obvious pattern or explanation. Some investors liken financial markets, speech recognition patterns, and other complex chains of events to hidden Markov models. The Baum-Welch algorithm provided a way to estimate probabilities and parameters within these complex sequences with little more information than the output of the processes. For the baseball game, the Baum-Welch algorithm might enable even someone with no understanding of the sport to guess the game situations that produced the scores.
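The inference problem the passage sets up — observed outputs generated by unseen states — can be illustrated with the forward algorithm, which computes the probability of an observation sequence under a hidden Markov model by summing over all possible hidden-state paths. The two-state "game situation" model and all its probabilities below are invented for illustration; Baum-Welch goes further and *estimates* such parameters from the observations alone.

```python
# Forward algorithm on a tiny invented HMM: hidden "game situations"
# ("calm" vs "rally") generate observed scoring ("low" vs "high").

states = ["calm", "rally"]
start = {"calm": 0.6, "rally": 0.4}
trans = {"calm":  {"calm": 0.7, "rally": 0.3},
         "rally": {"calm": 0.4, "rally": 0.6}}
emit = {"calm":  {"low": 0.8, "high": 0.2},
        "rally": {"low": 0.3, "high": 0.7}}

def forward(observations):
    """P(observations) summed over every hidden-state path."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())

print(round(forward(["low", "high", "high"]), 4))  # 0.1025
```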

Today, though, Baum’s algorithm, which allows a computer to teach itself states and probabilities, is seen as one of the twentieth century’s notable advances in machine learning, paving the way for breakthroughs affecting the lives of millions in fields from genomics to weather prediction. Baum-Welch enabled the first effective speech recognition system and even Google’s search engine. For all of the acclaim Baum-Welch brought Lenny Baum, most of the hundreds of other papers he wrote were classified, which grated on Julia. She came to believe her husband was getting neither the recognition nor the pay he deserved. The Baum children had little idea what their father was up to.

One last thing got Patterson especially excited: if a potential recruit was miserable in their current job. “I liked smart people who were probably unhappy,” Patterson says. One day, after reading in the morning paper that IBM was slashing costs, Patterson became intrigued. He was aware of the accomplishments of the computer giant’s speech-recognition group and thought their work bore similarity to what Renaissance was doing. In early 1993, Patterson sent separate letters to Peter Brown and Robert Mercer, deputies of the group, inviting them to visit Renaissance’s offices to discuss potential positions. Brown and Mercer both reacted the exact same way—depositing Patterson’s letter in the closest trash receptacle.

pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era
by James Barrat
Published 30 Sep 2013

In an essay defending this view and his predictions about technological milestones he wrote: Basically, we are looking for biologically inspired methods that can accelerate work in AI, much of which has progressed without significant insight as to how the brain performs similar functions. From my own work in speech recognition, I know that our work was greatly accelerated when we gained insights as to how the brain prepares and transforms auditory information. Back in the 1990s, Kurzweil Computer Technologies broke new ground in voice recognition with applications designed to let doctors dictate medical reports. Kurzweil sold the company, and it became one of the roots of Nuance Communications, Inc. Whenever you use Siri it is Nuance’s algorithms that perform the speech recognition part of its magic. Speech recognition is the art of translating the spoken word to text (not to be confused with NLP, extracting meaning from written words).

A force so unstable and mysterious, nature achieved it in full just once—intelligence. Chapter One The Busy Child artificial intelligence (abbreviation: AI) noun the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. —The New Oxford American Dictionary, Third Edition On a supercomputer operating at a speed of 36.8 petaflops, or about twice the speed of a human brain, an AI is improving its intelligence. It is rewriting its own program, specifically the part of its operating instructions that increases its aptitude in learning, problem solving, and decision making.

Through several well-funded projects, IBM pursues AGI, and DARPA seems to be backing every AGI project I look into. So, again, why not Google? When I asked Jason Freidenfelds, from Google PR, he wrote: … it’s much too early for us to speculate about topics this far down the road. We’re generally more focused on practical machine learning technologies like machine vision, speech recognition, and machine translation, which essentially is about building statistical models to match patterns—nothing close to the “thinking machine” vision of AGI. But I think Page’s quotation sheds more light on Google’s attitudes than Freidenfelds’s. And it helps explain Google’s evolution from the visionary, insurrectionist company of the 1990s, with the much touted slogan DON’T BE EVIL, to today’s opaque, Orwellian, personal-data-aggregating behemoth.

pages: 374 words: 114,600

The Quants
by Scott Patterson
Published 2 Feb 2010

Then the trail goes cold. At first blush, speech recognition and investing would appear to have little in common. But beneath the surface, there are striking connections. Computer models designed to map human speech depend on historical data that mimic acoustic signals. To operate most efficiently, speech recognition programs monitor the signals and, based on probability functions, try to guess what sound is coming next. The programs constantly make such guesses to keep up with the speaker. Financial models are also made up of data strings. By glomming complex speech recognition models onto financial data, say a series of soybean prices, Renaissance can discern a range of probabilities for the future directions of prices.

By glomming complex speech recognition models onto financial data, say a series of soybean prices, Renaissance can discern a range of probabilities for the future directions of prices. If the odds become favorable … if you have an edge … It’s obviously not so simple—if it were, every speech recognition expert in the world would be running a hedge fund. There are complicated issues involving the quality of the data and whether the patterns discovered are genuine. But there is clearly a powerful connection between speech recognition and investing that Renaissance is exploiting to the hilt. A clue to the importance of speech recognition to Renaissance’s broader makeup is that Brown and Mercer were named co-CEOs of Renaissance Technologies after Simons stepped down in late 2009.
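A bare-bones version of "discerning a range of probabilities for the future directions of prices" is to count transitions between up and down moves in a price series and read the counts as conditional probabilities. The soybean prices below are invented, and this two-state counting model is a caricature of what Renaissance actually does; it only illustrates the shared Markov-chain machinery the passage points to.

```python
# Toy Markov-style estimate: count up/down transitions in a price series
# and turn the counts into conditional probabilities of the next move.
# The price series is invented; real models are vastly more sophisticated.

def moves(prices):
    return ["up" if b > a else "down" for a, b in zip(prices, prices[1:])]

def transition_probs(prices):
    counts = {"up": {"up": 0, "down": 0}, "down": {"up": 0, "down": 0}}
    seq = moves(prices)
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values()) or 1
        probs[prev] = {nxt: c / total for nxt, c in row.items()}
    return probs

soybeans = [10.0, 10.2, 10.1, 10.3, 10.5, 10.4, 10.6, 10.7]
print(transition_probs(soybeans))
```

This is the same shape of model a speech recognizer uses to guess the next sound from the current one, which is the connection the passage draws.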

Renaissance has applied that skill to enormous strings of market numbers, such as tick-by-tick data in oil prices, while looking at other relationships the data have with assets such as the dollar or gold. Another clue can be found in the company’s decision in the early 1990s to hire several individuals with expertise in the obscure, decidedly non–Wall Street field of speech recognition. In November 1993, Renaissance hired Peter Brown and Robert Mercer, founders of a speech recognition group at IBM’s Thomas J. Watson Research Center in Yorktown Heights, New York, in the hills of Westchester County. Brown came to be known as a freakishly hard worker at the fund, often spending the night at Renaissance’s East Setauket headquarters on a Murphy bed with a whiteboard tacked to the bottom of it.

pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
by Pedro Domingos
Published 21 Sep 2015

We can go one step further with a model like this: The states form a Markov chain, as before, but we don’t get to see them; we have to infer them from the observations. This is called a hidden Markov model, or HMM for short. (Slightly misleading, because it’s the states that are hidden, not the model.) HMMs are at the heart of speech-recognition systems like Siri. In speech recognition, the hidden states are written words, the observations are the sounds spoken to Siri, and the goal is to infer the words from the sounds. The model has two components: the probability of the next word given the current one, as in a Markov chain, and the probability of hearing various sounds given the word being pronounced.
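Inferring the hidden words from the observed sounds, as described above, is classically done with the Viterbi algorithm, which tracks the most probable hidden-state path rather than summing over all of them. The two-word vocabulary, the "sound" symbols, and every probability below are invented for illustration.

```python
# Viterbi decoding over a tiny invented model: hidden states are words,
# observations are crude "sound" symbols. All probabilities are made up.

words = ["hi", "high"]
start = {"hi": 0.6, "high": 0.4}
trans = {"hi": {"hi": 0.5, "high": 0.5}, "high": {"hi": 0.5, "high": 0.5}}
emit = {"hi":   {"h-ay": 0.9, "h-ay-g": 0.1},
        "high": {"h-ay": 0.3, "h-ay-g": 0.7}}

def viterbi(sounds):
    """Most probable word sequence given the observed sounds."""
    # best[w] = (probability, path) of the best path ending in word w
    best = {w: (start[w] * emit[w][sounds[0]], [w]) for w in words}
    for snd in sounds[1:]:
        new = {}
        for w in words:
            p, path = max(((best[prev][0] * trans[prev][w], best[prev][1])
                           for prev in words), key=lambda t: t[0])
            new[w] = (p * emit[w][snd], path + [w])
        best = new
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["h-ay", "h-ay-g"]))  # ['hi', 'high']
```

Note how the two model components the passage names — word-to-word transition probabilities and word-to-sound emission probabilities — each appear once per step in the update.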

“The PageRank citation ranking: Bringing order to the Web,”* by Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd (Stanford University technical report, 1998), describes the PageRank algorithm and its interpretation as a random walk over the web. Statistical Language Learning,* by Eugene Charniak (MIT Press, 1996), explains how hidden Markov models work. Statistical Methods for Speech Recognition,* by Fred Jelinek (MIT Press, 1997), describes their application to speech recognition. The story of HMM-style inference in communication is told in “The Viterbi algorithm: A personal history,” by David Forney (unpublished; online at arxiv.org/pdf/cs/0504020v2.pdf). Bioinformatics: The Machine Learning Approach,* by Pierre Baldi and Søren Brunak (2nd ed., MIT Press, 2001), is an introduction to the use of machine learning in biology, including HMMs.

Ironically, Lenat has belatedly embraced populating Cyc by mining the web, not because Cyc can read, but because there’s no other way. Even if by some miracle we managed to finish coding up all the necessary pieces, our troubles would be just beginning. Over the years, a number of research groups have attempted to build complete intelligent agents by putting together algorithms for vision, speech recognition, language understanding, reasoning, planning, navigation, manipulation, and so on. Without a unifying framework, these attempts soon hit an insurmountable wall of complexity: too many moving parts, too many interactions, too many bugs for poor human software engineers to cope with. Knowledge engineers believe AI is just an engineering problem, but we have not yet reached the point where engineering can take us the rest of the way.

Designing Search: UX Strategies for Ecommerce Success
by Greg Nudelman and Pabini Gabriel-Petit
Published 8 May 2011

Figure 15-8: Custom sort control implemented via popover in the ThirstyPocket iPhone app Changing Search Paradigms Because of the unique mix of constraints and opportunities that mobile application design presents, this design space is rich with possibilities for changing the existing paradigms for search and finding. Consider speech recognition, for example. Although, on the desktop, speech recognition does not yet enjoy widespread popularity and use, mobile represents an entirely different context—where speech recognition can offer an ideal solution. Not interpreting a spoken word correctly on a mobile device might not be quite as big a deal as it is on the desktop because the accuracy of speech recognition may actually approach, if not exceed, that of typing on a mobile phone’s awkward mini-keyboard. For some mobile contexts, like driving, speech recognition may even offer a way to access full-featured search when typing is not available.

For some mobile contexts, like driving, speech recognition may even offer a way to access full-featured search when typing is not available. Combine speech recognition with the use of an accelerometer and magnetometer, enabling gestural input, and you have the Google Mobile search application for the iPhone, shown in Figure 15-9. Figure 15-9: Google Mobile iPhone app Google’s iPhone application recognizes the gesture of a person’s swinging the phone up to his ear to know when to record a search command. When the user speaks, the search engine accepts and interprets his voice commands and then serves up search results. This user interface implements what is literally a game-changing design paradigm because its designers have taken the time to truly consider the mobile context of use and map natural interactions such as speech and gestures to mobile device functions.

The Amazon Remembers feature discussed in Chapter 15 is already moving toward creating a high-touch experience in which human operators can augment an image-based search user interface. For the ultimate in service, these specially trained salespeople would be ready to chat the moment a customer opens an ecommerce application. Through speech recognition and speech synthesis technology, people could communicate with their virtual personal assistant, who’s a great multitasker and is always ready to chat, whether they need help shopping or tutoring on various life matters. References Clark, Josh. “iPad Design Headaches.” Design4Mobile, September 20–24, 2010.

pages: 315 words: 92,151

Ten Billion Tomorrows: How Science Fiction Technology Became Reality and Shapes the Future
by Brian Clegg
Published 8 Dec 2015

The bad news is that it could only handle digits—well behind the kind of speech recognition program we tend to curse in modern automated telephone systems. It is interesting that when speech-recognition pioneer Ray Kurzweil was writing about Hal’s capabilities back in the late 1990s, he expected that we would be using speech to dictate to personal computer applications as the norm before 2001. In reality, such systems are still not used on most computers today. The shift Kurzweil expected has been much slower than he anticipated and may never come. Although the speech recognition technology built into my iMac is quite good, I very rarely use it.

v=ajsCY8SjJ1Y, accessed September 3, 2014. The first computer music, produced at the University of Manchester, is described in B. Jack Copeland, Turing: Pioneer of the Information Age (Oxford, UK: Oxford University Press, 2012), pp. 163–64. Information in the history of speech recognition from the Raymond Kurzweil section, “When will HAL Understand what we are Saying? Computer Speech Recognition and Understanding,” in David G. Stork (ed.), Hal’s Legacy (Cambridge, MA: MIT Press, 2000), pp. 145–50. Apple’s Knowledge Navigator appears at a number of locations on YouTube including www.youtube.com/watch?v=QRH8eimU_20, accessed September 3, 2014.

Over and above a few novelties and handling the kind of request you might expect to make of an electronic PA—booking appointments, looking something up online, planning a route, or playing music—it rapidly becomes clear that Siri is not capable of a real conversation, falling down on both of the key challenges of parsing and of understanding speech. Siri’s voice recognition is surprisingly effective, but there are times when it can struggle. Nonstandard accents can throw it easily—there isn’t a speech-recognition system yet that doesn’t fail with some Glasgow or Downeast Maine accents. And the way we speak as a matter of course involves slurring, running words together in a way that we never notice, but which a machine is forced to encounter and deal with. This doesn’t mean that it is not possible for a machine to understand speech.

pages: 332 words: 93,672

Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy
by George Gilder
Published 16 Jul 2018

With clusters of supercomputers running at sufficient velocity, you could beat every short-term market you could access and measure. In 2009 Simons retired and named Mercer and Brown co-CEOs of his company. Mercer’s IBM boss, Fred Jelinek, was a protégé of the MIT information theorist Robert Fano and a student of Claude Shannon. He saw speech recognition as an information theory problem—an acoustic signal and a noisy channel. Citing the content-neutral concept behind his speech-recognition successes, Jelinek proudly declared, “Every time I fire a linguist, the performance improves.” Renaissance’s approach similarly spurns any direct kibitzing from fundamental analysts or anyone who knows anything special about particular companies.

These chips compute the car’s response to lidar, radar, ultrasound, and camera signals that free the missile to descend from the outer space of Elon Musk’s domains and enter the ever-changing high-entropy world beyond Google Maps. Dally barks his command: “Navigate to California Avenue Caltrain station,” and the car crisply responds. Dally comments, “In the last couple years speech recognition has become dramatically better. Thirty-percent better. Two years ago it was not really capable of getting it right. But now with machine learning on our Tegra chips, it gets it right every time.” Benefiting are all the users of Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, Google’s Go.

Simultaneously with the dogs and cats crisis in 2012, the leader of the Google Brain research team, Jeff Dean, raised the stakes by telling Urs Hölzle, Google’s data center dynamo, “We need another Google.” Dean meant that Google would have to double the capacity of its data centers just to accommodate new demand for its Google Now speech recognition services on Android smartphones. Late in the year, Bill Dally provided an answer. Over breakfast at Dally’s favorite Palo Alto café, his Stanford colleague Andrew Ng, who worked with Dean at Google Brain, was complaining about the naming of cats. Sixteen thousand costly microprocessor cores seemed inefficient.

pages: 472 words: 117,093

Machine, Platform, Crowd: Harnessing Our Digital Future
by Andrew McAfee and Erik Brynjolfsson
Published 26 Jun 2017

depth=1&hl=en&prev=search&rurl=translate.google.com&sl=ja&sp=nmt4&u=http://www.fukoku-life.co.jp/about/news/download/20161226.pdf. 84 In October of 2016: Allison Linn, “Historic Achievement: Microsoft Researchers Reach Human Parity in Conversational Speech Recognition,” Microsoft (blog), October 18, 2016, http://blogs.microsoft.com/next/2016/10/18/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition/#sm.0001d0t49dx0veqdsh21cccecz0e3. 84 “I must confess that I never thought”: Mark Liberman, “Human Parity in Conversational Speech Recognition,” Language Log (blog), October 18, 2016, http://languagelog.ldc.upenn.edu/nll/?p=28894. 84 “Every time I fire a linguist”: Julia Hirschberg, “ ‘Every Time I Fire a Linguist, My Performance Goes Up,’ and Other Myths of the Statistical Natural Language Processing Revolution” (speech, 15th National Conference on Artificial Intelligence, Madison, WI, July 29, 1998). 84 “AI-first world”: Julie Bort, “Salesforce CEO Marc Benioff Just Made a Bold Prediction about the Future of Tech,” Business Insider, May 18, 2016, http://www.businessinsider.com/salesforce-ceo-i-see-an-ai-first-world-2016-5. 84 “Many businesses still make important decisions”: Marc Benioff, “On the Cusp of an AI Revolution,” Project Syndicate, September 13, 2016, https://www.project-syndicate.org/commentary/artificial-intelligence-revolution-by-marc-benioff-2016-09.

But the hardest part of customer service to automate has not been finding an answer, but rather the initial step: listening and understanding. Speech recognition and other aspects of natural language processing have been tremendously difficult problems in artificial intelligence since the dawn of the field, for all of the reasons described earlier in this chapter. The previously dominant symbolic approaches have not worked well at all, but newer ones based on deep learning are making progress so quickly that it has surprised even the experts. In October of 2016, a team from Microsoft Research announced that a neural network they had built had achieved “human parity in conversational speech recognition,” as the title of their paper put it.

One of Logic Theorist’s proofs, in fact, was so much more elegant than the one in the book that Russell himself “responded with delight” to it. Simon announced that he and his colleagues had “invented a thinking machine.” Other challenges, however, proved much less amenable to a rule-based approach. Decades of research in speech recognition, image classification, language translation, and other domains yielded unimpressive results. The best of these systems achieved much worse than human-level performance, and the worst were memorably bad. According to a 1979 collection of anecdotes, for example, researchers gave their English-to-Russian translation utility the phrase “The spirit is willing, but the flesh is weak.”

Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data
by Dipanjan Sarkar
Published 1 Dec 2016

A question is posed to a computer and a human, and the test is passed if it is impossible to say which of the answers given was given by the human. Over time, a lot of progress has been made in this area by using techniques like speech synthesis, analysis, syntactic parsing, and contextual reasoning. But one chief limitation for speech recognition systems still remains: They are very domain specific and will not work if the user strays even a little bit from the expected scripted inputs needed by the system. Speech-recognition systems are now found in many places, from desktop computers to mobile phones to virtual assistance systems. Question Answering Systems Question Answering Systems (QAS) are built upon the principle of Question Answering, based on using techniques from NLP and information retrieval (IR).

The Philosophy of Language Language Acquisition and Usage Linguistics Language Syntax and Structure Words Phrases Clauses Grammar Word Order Typology Language Semantics Lexical Semantic Relations Semantic Networks and Models Representation of Semantics Text Corpora Corpora Annotation and Utilities Popular Corpora Accessing Text Corpora Natural Language Processing Machine Translation Speech Recognition Systems Question Answering Systems Contextual Recognition and Resolution Text Summarization Text Categorization Text Analytics Summary Chapter 2: Python Refresher Getting to Know Python The Zen of Python Applications: When Should You Use Python? Drawbacks: When Should You Not Use Python?

In his spare time he loves reading, gaming, and watching popular sitcoms and football. About the Technical Reviewer Shanky Sharma Currently leading the AI team at Nextremer India, Shanky Sharma’s work entails implementing various AI and machine learning–related projects and working on deep learning for speech recognition in Indic languages. He hopes to grow and scale new horizons in AI and machine learning technologies. Statistics intrigue him and he loves playing with numbers, designing algorithms, and giving solutions to people. He sees himself as a solution provider rather than a scripter or another IT nerd who codes.

pages: 480 words: 119,407

Invisible Women
by Caroline Criado Perez
Published 12 Mar 2019

_tid=f0a12b58-f81d-11e6-af6b-00000aab0f26&acdnat=1487671995_41cfe19ea98e87fb7e3e693bdddaba6e; http://www.sciencedirect.com/science/article/pii/S1050641108001909 18 https://www.theverge.com/circuitbreaker/2016/7/14/12187580/keecok1-hexagon-phone-for-women 19 https://www.theguardian.com/technology/askjack/2016/apr/21/can-speech-recognition-software-help-prevent-rsi 20 https://makingnoiseandhearingthings.com/2016/07/12/googles-speech-recognition-has-a-gender-bias/ 21 http://blog-archive.griddynamics.com/2016/01/automatic-speech-recognition-services.html 22 https://www.autoblog.com/2011/05/31/women-voice-command-systems/ 23 https://www.ncbi.nlm.nih.gov/pubmed/27435949 24 American Roentgen Ray Society (2007), ‘Voice Recognition Systems Seem To Make More Errors With Women’s Dictation’, ScienceDaily, 6 May 2007; Rodger, James A. and Pendharkar, Parag C. (2007), ‘A field study of database communication issues peculiar to users of a voice activated medical tracking application’, Decision Support Systems, 43:1 (1 February 2007), 168–80, https://doi.org/10.1016/j.dss.2006.08.005. 25 American Roentgen Ray Society (2007) 26 http://techland.time.com/2011/06/01/its-not-you-its-it-voice-recognition-doesnt-recognize-women/ 27 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2994697/ 28 http://www.aclweb.org/anthology/P08-1044 29 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2790192/ 30 http://www.aclweb.org/anthology/P08-1044 31 http://groups.inf.ed.ac.uk/ami/corpus/; http://www1.icsi.berkeley.edu/Speech/papers/gelbart-ms/numbers/; http://www.voxforge.org/ 32 http://www.natcorp.ox.ac.uk/corpus/index.xml?

Voice recognition has also been suggested as a solution to smartphone-associated RSI,19 but this actually isn’t much of a solution for women, because voice-recognition software is often hopelessly male-biased. In 2016, Rachael Tatman, a research fellow in linguistics at the University of Washington, found that Google’s speech-recognition software was 70% more likely to accurately recognise male speech than female speech20 – and it’s currently the best on the market.21 Clearly, it is unfair for women to pay the same price as men for products that deliver an inferior service to them. But there can also be serious safety implications.

After five failed attempts I suggested she tried lowering the pitch of her voice. It worked first time. As voice-recognition software has become more sophisticated, its use has branched out to numerous fields, including medicine, where errors can be just as grave. A 2016 paper analysed a random sample of a hundred notes dictated by attending emergency physicians using speech-recognition software, and found that 15% of the errors were critical, ‘potentially leading to miscommunication that could affect patient care’.23 Unfortunately these authors did not sex-disaggregate their data, but papers that have, report significantly higher transcription error rates for women than men.24 Dr Syed Ali, the lead author of one of the medical dictation studies, observed that his study’s ‘immediate impact’ was that women ‘may have to work somewhat harder’ than men ‘to make the [voice recognition] system successful’.25 Rachael Tatman agrees: ‘The fact that men enjoy better performance than women with these technologies means that it’s harder for women to do their jobs.

pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
by Aurélien Géron
Published 13 Mar 2017

This paper revived the interest of the scientific community and before long many new papers demonstrated that Deep Learning was not only possible, but capable of mind-blowing achievements that no other Machine Learning (ML) technique could hope to match (with the help of tremendous computing power and great amounts of data). This enthusiasm soon extended to many other areas of Machine Learning. Fast-forward 10 years and Machine Learning has conquered the industry: it is now at the heart of much of the magic in today’s high-tech products, ranking your web search results, powering your smartphone’s speech recognition, and recommending videos, beating the world champion at the game of Go. Before you know it, it will be driving your car. Machine Learning in Your Projects So naturally you are excited about Machine Learning and you would love to join the party! Perhaps you would like to give your homemade robot a brain of its own?

Caution Don’t jump into deep waters too hastily: while Deep Learning is no doubt one of the most exciting areas in Machine Learning, you should master the fundamentals first. Moreover, most problems can be solved quite well using simpler techniques such as Random Forests and Ensemble methods (discussed in Part I). Deep Learning is best suited for complex problems such as image recognition, speech recognition, or natural language processing, provided you have enough data, computing power, and patience. Other Resources Many resources are available to learn about Machine Learning. Andrew Ng’s ML course on Coursera and Geoffrey Hinton’s course on neural networks and Deep Learning are amazing, although they both require a significant time investment (think months).

In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention (Figure 1-3). Figure 1-3. Automatically adapting to change Another area where Machine Learning shines is for problems that either are too complex for traditional approaches or have no known algorithm. For example, consider speech recognition: say you want to start simple and write a program capable of distinguishing the words “one” and “two.” You might notice that the word “two” starts with a high-pitch sound (“T”), so you could hardcode an algorithm that measures high-pitch sound intensity and use that to distinguish ones and twos.
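The hardcoded rule the passage describes can be sketched using zero-crossing rate as a crude stand-in for "high-pitch sound intensity." The synthetic waveforms, the threshold, and the mapping to "one"/"two" are all invented for illustration; the point, as in the passage, is that the rule is hand-coded rather than learned.

```python
# Hand-coded rule from the passage: decide "one" vs "two" by whether the
# signal begins with high-pitched energy (the "T" onset of "two"). We use
# zero-crossing rate on a synthetic sine wave as a crude proxy for pitch.

import math

def tone(freq, n=200, rate=8000):
    """Synthetic stand-in for a recorded utterance: a pure sine wave."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

def zero_crossing_rate(signal):
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    return crossings / len(signal)

def guess_word(signal, threshold=0.2):
    # High zero-crossing rate at the start ~ high-pitched onset -> "two"
    return "two" if zero_crossing_rate(signal[:80]) > threshold else "one"

print(guess_word(tone(300)))   # low-pitched onset
print(guess_word(tone(1900)))  # high-pitched onset
```

A rule like this breaks immediately on noise, accents, and new words — which is exactly why, as the passage goes on to argue, learned models displaced hardcoded ones.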

pages: 569 words: 156,139

Amazon Unbound: Jeff Bezos and the Invention of a Global Empire
by Brad Stone
Published 10 May 2021

After graduating in the late 1990s, Prasad passed on the dot-com boom and worked for the Cambridge, Massachusetts–based defense contractor BBN Technologies (later acquired by Raytheon) on some of the first speech recognition and natural language systems. At BBN, he worked on one of the first in-car speech recognition systems and automated directory assistance services for telephone companies. In 2000, he worked on another system that automatically transcribed courtroom proceedings. Accurately recording conversation from multiple microphones placed around a courtroom introduced him to the challenges of far-field speech recognition. At the start of the project, he said that eighty out of every hundred words were incorrect; but within the first year, they cut it down to thirty-three.

By the time he was following Bezos around, the facial hair was gone and Hart was a rising corporate star. “You sort of feel like you’re an assistant coach watching John Wooden, you know, perhaps the greatest basketball coach ever,” Hart said of his time as the TA. Hart remembered talking to Bezos about speech recognition one day in late 2010 at Seattle’s Blue Moon Burgers. Over lunch, Hart demonstrated his enthusiasm for Google’s voice search on his Android phone by saying, “pizza near me,” and then showing Bezos the list of links to nearby pizza joints that popped up on-screen. “Jeff was a little skeptical about the use of it on phones, because he thought it might be socially awkward,” Hart remembered.

There Greg Hart finally described “this little device, about the size of a Coke can, that would sit on your table and you could ask it natural language questions and it would be a smart assistant,” recalled Yap’s VP of research, Jeff Adams, a two-decade veteran of the speech industry. “Half of my team were rolling their eyes, saying ‘oh my word, what have we gotten ourselves into.’ ” After the meeting, Adams delicately told Hart and Lindsay that their goals were unrealistic. Most experts believed that true “far-field speech recognition”—comprehending speech from up to thirty-two feet away, often amid crosstalk and background noise—was beyond the realm of established computer science, since sound bounces off surfaces like walls and ceilings, producing echoes that confuse computers. The Amazon executives responded by channeling Bezos’s resolve.

pages: 161 words: 39,526

Applied Artificial Intelligence: A Handbook for Business Leaders
by Mariya Yao , Adelyn Zhou and Marlene Jia
Published 1 Jun 2018

A Neural Network for Machine Translation, at Production Scale [blog post]. Retrieved from: https://research.googleblog.com/2016/09/a-neural-network-for-machine.html (5) Huang, X.D. (2017, August 20). Microsoft researchers achieve new conversational speech recognition milestone [blog post]. Retrieved from http://www.microsoft.com/en-us/research/blog/microsoft-researchers-achieve-new-conversational-speech-recognition-milestone/ (6) Customer Case Studies. (n.d.). Retrieved from http://blog.clarifai.com/customer-case-studies/ (7) http://probcomp.csail.mit.edu/ (8) Reading List. (n.d.). MIT Probabilistic Computing Project.

Deep Learning Deep learning is a subfield of machine learning that builds algorithms by using multi-layered artificial neural networks, which are mathematical structures loosely inspired by how biological neurons fire. Neural networks were invented in the 1950s, but recent advances in computational power and algorithm design—as well as the growth of big data—have enabled deep learning algorithms to approach human-level performance in tasks such as speech recognition and image classification. Deep learning, in combination with reinforcement learning, enabled Google DeepMind’s AlphaGo to defeat human world champions of Go in 2016, a feat that many experts had considered to be computationally impossible. Much media attention has been focused on deep learning, and an increasing number of sophisticated technology companies have successfully implemented deep learning for enterprise-scale products.

Much media attention has been focused on deep learning, and an increasing number of sophisticated technology companies have successfully implemented deep learning for enterprise-scale products. Google replaced previous statistical methods for machine translation with neural networks to achieve superior performance.(4) Microsoft announced in 2017 that they had achieved human parity in conversational speech recognition.(5) Promising computer vision startups like Clarifai employ deep learning to achieve state-of-the-art results in recognizing objects in images and video for Fortune 500 brands.(6) While deep learning models outperform older machine learning approaches to many problems, they are more difficult to develop because they require robust training data sets and specialized expertise in optimization techniques.

pages: 246 words: 81,625

On Intelligence
by Jeff Hawkins and Sandra Blakeslee
Published 1 Jan 2004

Let's begin with some near-term applications. These are the things that seem obvious, like replacing tubes in a radio with transistors or building calculators with a microprocessor. And we can start by looking at some areas that AI tried to tackle but couldn't solve— speech recognition, vision, and smart cars. * * * If you have ever tried to use speech recognition software to enter text on a personal computer, you know how dumb it can be. Like Searle's Chinese Room, the computer has no understanding of what is being said. The few times I tried these products, I grew frustrated. If there was any noise in the room, from a dropped pencil to someone speaking to me, extra words would appear on my screen.

However, the words overlap and interfere, and pieces of sound drop out because of noise. You would find it extremely difficult to separate words and recognize them. These obstacles are what speech recognition software struggles with today. Engineers have discovered that by using probabilities of word transitions, they can improve the software's accuracy somewhat. For example, they use rules of grammar to decide between homonyms. This is a very simple form of prediction, but the systems are still dumb. Today's speech recognition software succeeds only in highly constrained situations in which the number of words you might say at any given moment is limited.
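The "probabilities of word transitions" trick Hawkins mentions amounts to a simple bigram language model: pick whichever homonym is most likely to follow the preceding word. A minimal sketch, with illustrative probabilities rather than real corpus counts:

```python
# P(word | previous word) -- illustrative values only, not real corpus data
BIGRAM = {
    ("the", "two"): 0.30, ("the", "too"): 0.01, ("the", "to"): 0.01,
    ("going", "to"): 0.40, ("going", "two"): 0.01, ("going", "too"): 0.05,
}

def pick_homonym(prev_word, candidates):
    """Resolve acoustically identical candidates by transition probability,
    falling back to a tiny floor probability for unseen pairs."""
    return max(candidates, key=lambda w: BIGRAM.get((prev_word, w), 1e-6))
```

The same sound resolves differently by context: after "the" the model prefers "two", while after "going" it prefers "to". This is the "very simple form of prediction" the passage describes; the system has no idea what any of the words mean.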

Yet humans perform many language-related tasks easily, because our cortex understands not only words but sentences and the context within which they are spoken. We anticipate ideas, phrases, and individual words. Our cortical model of the world does this automatically. So we can expect that cortexlike memory systems will transform fallible speech recognition into robust speech understanding. Instead of programming in probabilities for single word transitions, a hierarchical memory will track accents, words, phrases, and ideas and use them to interpret what is being said. Like a person, such an intelligent machine could distinguish between various speech events— for example, a discussion between you and a friend in the room, a phone conversation, and editing commands for a book.

pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think
by Viktor Mayer-Schonberger and Kenneth Cukier
Published 5 Mar 2013

But what really interested Amazon, explains Andreas Weigend, Amazon’s former chief scientist, was getting hold of data on what AOL users were looking at and buying, which would improve the performance of its recommendation engine. Poor AOL never realized this. It only saw the data’s value in terms of its primary purpose—sales. Clever Amazon knew it could reap benefits by putting the data to a secondary use. Or take the case of Google’s entry into speech recognition with GOOG-411 for local search listings, which ran from 2007 to 2010. The search giant didn’t have its own speech-recognition technology so it needed to license it. It reached an agreement with Nuance, the leader in the field, which was thrilled to have landed such a prized client. But Nuance was then a big-data dunderhead: the contract didn’t specify who got to retain the voice-translation records, and Google kept them for itself.

Analyzing the data lets one score the probability that a given digitized snippet of voice corresponds to a specific word. This is essential for improving speech-recognition technology or creating a new service altogether. At the time Nuance perceived itself as in the business of software licensing, not data crunching. As soon as it recognized its error, it began striking deals with mobile operators and handset manufacturers to use its speech-recognition service—so that it could gather up the data. The value in data’s reuse is good news for organizations that collect or control large datasets but currently make little use of them, such as conventional businesses that mostly operate offline.


pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy
by Sharon Bertsch McGrayne
Published 16 May 2011

During the 1970s IBM had two competing teams working on a similar problem, speech recognition. One group, filled with linguists, studied the rules of grammar. The other group, led by Mercer and Brown, who later went to RenTech, was filled with mathematically inclined communications specialists, computer scientists, and engineers. They took a different tack, replaced logical grammar with Bayes’ rule, and were ignored for a decade. Mercer’s ambition was to make computers do intelligent things, and voice recognition seemed to be the way to make this happen. For both Mercer and Brown speech recognition was a problem about taking a signal that had passed through a noisy channel like a telephone and then determining the most probable sentence that the speaker had in mind.
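The noisy-channel formulation Mercer and Brown adopted picks the sentence W that maximizes P(W | signal), which by Bayes' rule is proportional to P(signal | W) · P(W). A schematic sketch (the candidate sentences and scores are made up for illustration):

```python
def decode(candidates):
    """candidates: list of (sentence, acoustic_likelihood, lm_prior) tuples,
    i.e. estimates of P(signal|W) and P(W). Return the sentence that
    maximizes their product -- the noisy-channel decoding rule."""
    return max(candidates, key=lambda c: c[1] * c[2])[0]

hypotheses = [
    ("recognize speech",   0.4, 0.01),    # slightly worse acoustic fit, plausible English
    ("wreck a nice beach", 0.5, 0.0001),  # better acoustic fit, implausible English
]
best = decode(hypotheses)  # language-model prior outweighs the acoustic edge
```

Even when the acoustically closest transcription is the wrong one, the language-model prior pulls the decoder back to the sentence a speaker would plausibly have said.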

It allows its users to assess uncertainties when hundreds or thousands of theoretical models are considered; combine imperfect evidence from multiple sources and make compromises between models and data; deal with computationally intensive data analysis and machine learning; and, as if by magic, find patterns or systematic structures deeply hidden within a welter of observations. It has spread far beyond the confines of mathematics and statistics into high finance, astronomy, physics, genetics, imaging and robotics, the military and antiterrorism, Internet communication and commerce, speech recognition, and machine translation. It has even become a guide to new theories about learning and a metaphor for the workings of the human brain. One of the surprises is that Bayes, as a buzzword, has become chic. Stanford University biologist Stephen H. Schneider wanted a customized cancer treatment, called his logic Bayesian, got his therapy, went into remission, and wrote a book about the experience.

At first they worked their way through old, out-of-copyright children’s books; 1,000 words from a U.S. Patent Office experiment with laser technology; and 60 million words of Braille-readable text from the American Printing House for the Blind. At an international acoustics, speech, and signal processing meeting the group wore identical T-shirts printed with the words “Fundamental Equation of Speech Recognition” followed by Bayes’ theorem. They developed “a bit of swagger, I’m ashamed to say,” Mercer recalled. “We were an obnoxious bunch back in those days.” In a major breakthrough in the late 1980s they gained access to French and English translations of the Canadian parliament’s daily debates, about 100 million words in computer-readable form.
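The "Fundamental Equation of Speech Recognition" on those T-shirts is Bayes' theorem applied to an acoustic signal A and a candidate word sequence W; since P(A) is constant across candidates, it drops out of the maximization:

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid A)
       \;=\; \arg\max_{W} \frac{P(A \mid W)\,P(W)}{P(A)}
       \;=\; \arg\max_{W} P(A \mid W)\,P(W)
```

Here P(A | W) is the acoustic model and P(W) the language model, the two components the IBM group trained separately on their growing text corpora.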

The Future of Technology
by Tom Standage
Published 31 Aug 2005

In computing, says Microsoft’s Mr Breese, “the holy grail of simplicity is I-just-wanna-talk-to-my-computer”, so that the computer can “anticipate my needs”. The technical term for this is speech recognition. “Speech makes the screen deeper,” says X.D. Huang, Microsoft’s expert on the subject. “Instead of a limited drop-down menu, thousands of functions can be brought to the foreground.” The only problem is that the idea is almost certainly unworkable. People confuse speech recognition with language understanding, argues Mr Norman. But to achieve language understanding, you first have to crack the problem of artificial intelligence (AI), which has eluded scientists for half a century.

Just think how difficult it would be to teach somebody to tie a shoelace or to fold an origami object by using words alone, without a diagram or a demonstration. “What we imagine systems of speech-understanding to be is really mind-reading,” says Mr Norman. “And not just mind-reading of thoughts, but of perfect thoughts, of solutions to problems that don’t yet exist.” The idea that speech recognition is the key to simplicity, Mr Norman says, is therefore “just plain silly”. He concludes that the only way to achieve simplicity is to have gadgets that explicitly and proudly do less (he calls these “information appliances”). Arguably, the iPod proves him right. Its success so far stems from its relative modesty of ambition: it plays songs but does little else.

Once work has moved abroad, however, it joins the same cycle of automation and innovation that pushes technology forward everywhere. Optical-character-recognition software is automating the work of Indian data-entry workers. Electronic airline tickets are eliminating some of the ticket-reconciliation work airlines carry out in India. Eventually, natural-language speech recognition is likely to automate some of the call-centre work that is currently going to India, says Steve Rolls of Convergys, the world’s largest call-centre operator. All this helps to promote outsourcing and the building of production platforms in India. GE is selling Gecis, its Indian financial-services administrator, and Citibank, Deutsche Bank and others have disposed of some of their Indian IT operations.

pages: 239 words: 70,206

Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else
by Steve Lohr
Published 10 Mar 2015

Returning to Harvard, Hammerbacher got his job back at the library, a condition of his financial aid, and attended classes more regularly. One was a small math seminar on probability. The hands-on project was to write a software program for speech recognition. It seemed a good test bed for math, since calculating probabilities and matching patterns in sound frequencies is crucial in speech recognition. Not incidentally, the instructor was Paul Bamberg, a cofounder of Dragon Systems, a commercial pioneer in speech recognition software. The programming involved tasks like implementing a fast Fourier transform algorithm, which converts time or space to frequency, and vice versa. The seminar was for students with serious math muscles, and there were only five students in the class.
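The fast Fourier transform the seminar had students implement maps a time-domain signal to its frequency components, the first step in matching sound-frequency patterns. A quick check with NumPy's built-in FFT (a stand-in for a hand-rolled implementation): a pure 5 Hz tone sampled for one second should produce a spectral peak at exactly 5 Hz.

```python
import numpy as np

sample_rate = 64                                  # samples per second
t = np.arange(sample_rate) / sample_rate          # one second of sample times
signal = np.sin(2 * np.pi * 5 * t)                # pure 5 Hz tone

spectrum = np.abs(np.fft.rfft(signal))            # magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
peak_hz = freqs[np.argmax(spectrum)]              # peak_hz -> 5.0
```

Speech front ends apply exactly this transform over short sliding windows, turning raw audio into the time-frequency features that recognizers compare against.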

Humans understand things in large part because of their experience of the real world. Computers lack that advantage. Advances in artificial intelligence mean that machines can increasingly see, read, listen, and speak, in their way. And a very different way, it is. As Frederick Jelinek, a pioneer in speech recognition and natural-language processing at IBM, once explained by way of analogy: “Airplanes don’t flap their wings.” To get a sense of how computers build knowledge, let’s look at Carnegie Mellon University’s Never-Ending Language Learning system, or NELL. Since 2010, NELL has been steadily scanning hundreds of millions of Web pages for text patterns that it uses to learn facts, more than 2.3 million so far, with an estimated accuracy of 87 percent.

By December of 2013, however, Krugman had become more impressed by advances in computing and he wrote an article, published on the Times’s Web site, explaining why he thinks Gordon is “probably wrong.” A decade ago, Krugman writes, “the field of artificial intelligence had marched from failure to failure. But something has happened—things that were widely regarded as jokes not long ago, like speech recognition, machine translation, self-driving cars, and so on, have suddenly become more or less working reality.” Data and software, Krugman observes, have forged the path to working artificial intelligence. “They’re using big data and correlations and so on,” he writes, “to implement algorithms—mindless algorithms, you might say.

pages: 285 words: 86,853

What Algorithms Want: Imagination in the Age of Computing
by Ed Finn
Published 10 Mar 2017

New York: Cambridge University Press, 2001. Baldwin, Roberto. “Netflix Gambles on Big Data to Become the HBO of Streaming.” WIRED, November 29, 2012. http://www.wired.com/2012/11/netflix-data-gamble. “Behind Apple’s Siri Lies Nuance’s Speech Recognition.” Forbes. Accessed May 28, 2014. http://www.forbes.com/sites/rogerkay/2014/03/24/behind-apples-siri-lies-nuances-speech-recognition. Belsky, Scott. “The Interface Layer: Where Design Commoditizes Tech.” Medium, May 30, 2014. https://medium.com/bridge-collection/the-interface-layer-when-design-commoditizes-tech-e7017872173a. Bendeich, Mark. “Foxconn Says Underage Workers Used in China Plant.”

Kay, Paul, Brent Berlin, Luisa Maffi, William R. Merrifield, and Richard Cook. The World Color Survey. 1st ed. Stanford, Calif.: Center for the Study of Language and Information, 2011. Kay, Roger. “Behind Apple’s Siri Lies Nuance’s Speech Recognition.” Forbes, March 24, 2014. http://www.forbes.com/sites/rogerkay/2014/03/24/behind-apples-siri-lies-nuances-speech-recognition/#3b1b09f8421c. Kim, Larry. “How Many Ads Does Google Serve in a Day?” Business 2 Community. Published November 2, 2012. Accessed May 30, 2014. http://www.business2community.com/online-marketing/how-many-ads-does-google-serve-in-a-day-0322253.

This is a classic computational pragmatist approach to a problem, charting an effective computability pathway through the morass of language by depending on trial and error, treating spoken language just like any other complex system. In this sense Siri is as much a listening service as it is an answering one. Over time Siri has presumably collected billions of records of successful and unsuccessful interactions, providing a valuable resource in improving speech recognition.15 Apple claims the data it retains is anonymized, but this policy is unsurprisingly troubling to privacy advocates.16 While we get personalized service, Siri is effectively a single collective machine, learning from these billions of data points under the supervision of its engineers. Like so many other big data, algorithmic machines, it depends on a deep well, a cistern of human attention and input that serves as an informational reservoir for computational inference.

pages: 370 words: 112,809

The Equality Machine: Harnessing Digital Technology for a Brighter, More Inclusive Future
by Orly Lobel
Published 17 Oct 2022

As it turns out, however, they do not always listen to everyone equally. Speech recognition exemplifies how partial training data has led machines to learn more about white men’s speech patterns and less about those of women and people of color. Case in point: Google’s speech recognition is 13 percent more accurate for men than it is for women.22 Testing a variety of speech activation technologies has shown that virtual assistants are more likely to understand male users than female users. If the user is a woman of color, the rate of accurately understanding her speech drops further. In one study testing speech recognition of different accents, English spoken with an Indian accent only had a 78 percent accuracy rate; recognition of English spoken with a Scottish accent was only 53 percent accurate.

In 2019, in partnership with the German Ministry for Economic Cooperation and Development, Mozilla increased its efforts to collect local language data in Africa through an initiative called Common Voice and Deep Speech.25 The data set is already being used in voice assistant technologies such as Mycroft, an open-source voice assistant named after Sherlock Holmes’s elder brother, and the Brazilian Portuguese medical transcription tool Iara Health. Kelly Davis, head of machine learning at Mozilla, describes the profound significance of focusing on underresourced languages and language preservation in correcting the imbalance of languages in mainstream speech recognition technology. He says that we should look at speech recognition as a public resource. This theme of conceptualizing advances in technologies, vastly aided through data collection, as a public good must become a recurring one as we strive to build equality machines. Voice and speech—like many other types of information that are making our machines smarter—are intimately tied to our autonomous selves, from our genetic makeup and our health information to our behavioral and emotional responses to different decision-making environments.

Still, undoubtedly the increased understanding of English-speaking males is something of a “Big Five” effect: most voice recognition platforms are made by five companies—Amazon, Apple, Google, Meta, and Microsoft, which themselves are disproportionately staffed and led by white males. This kind of deficiency in speech recognition is relatively easy to remedy. The fix involves increasing the range and diversity of the data that we feed technology. A more diverse range of voices in the video and sound fed to algorithms will result in those algorithms’ improved ability to interpret a broader range of speech patterns. Diversity in, diversity out.

pages: 193 words: 51,445

On the Future: Prospects for Humanity
by Martin J. Rees
Published 14 Oct 2018

The vein pattern in our eyes allows the use of ‘iris recognition’ software—a substantial improvement on fingerprints or facial recognition. This is precise enough to unambiguously identify individuals, among the 1.3 billion Indians. And it is a foretaste of the benefits that can come from future advances in AI. Speech recognition, face recognition, and similar applications use a technique called generalised machine learning. This operates in a fashion that resembles how humans use their eyes. The ‘visual’ part of human brains integrates information from the retina through a multistage process. Successive layers of processing identify horizontal and vertical lines, sharp edges, and so forth; each layer processes information from a ‘lower’ layer and then passes its output to other layers.8 The basic machine-learning concepts date from the 1980s; an important pioneer was the Anglo-Canadian Geoff Hinton.

But there are still limitations. The hardware underlying AlphaGo used hundreds of kilowatts of power. In contrast, the brain of Lee Sedol, AlphaGo’s Korean challenger, consumes about thirty watts (like a lightbulb) and can do many other things apart from play board games. Sensor technology, speech recognition, information searches, and so forth are advancing apace. So (albeit with a more substantial lag) is physical dexterity. Robots are still clumsier than a child in moving pieces on a real chessboard, tying shoelaces, or cutting toenails. But here too there is progress. In 2017, Boston Dynamics demonstrated a fearsome-looking robot called Handle (a successor to the earlier four-legged Big Dog), with wheels as well as two legs, that is agile enough to perform back flips.


pages: 661 words: 156,009

Your Computer Is on Fire
by Thomas S. Mullaney , Benjamin Peters , Mar Hicks and Kavita Philip
Published 9 Mar 2021

Many of the early speech technologies used the now thirty-year-old corpus Switchboard, from the University of Pennsylvania’s Linguistic Consortium, to train on; the accents represented there are largely American Midwestern. The result is that speech recognition error rates for marginalized voices are higher than others; as Paul notes, “a typical database of American voices, for example, would lack poor, uneducated, rural, non-white, non-native English voices. The more of those categories you fall into, the worse speech recognition is for you.”46 Work by Caliksan-Islam, Bryson, and Narayanan has empirically demonstrated what we have already suspected: that beyond just marginalization of people, “human-like semantic biases result from the application of standard machine learning to ordinary language.”47 Simply put, speech technologies replicate existing language biases.

“Usage of Content Languages for Websites,” W3Techs, accessed March 31, 2018, W3techs.com/technologies/overview/content_language/all. 37. Daniela Hernandez, “How Voice Recognition Systems Discriminate against People with Accents: When Will There be Speech Recognition for the Rest of Us?” Fusion (August 21, 2015), http://fusion.net/story/181498/speech-recognition-ai-equality/. 38. Braj B. Kachru, “The English Language in the Outer Circle,” World Englishes 3 (2006): 241–255. 39. Salikoko Mufwene, “The Legitimate and Illegitimate Offspring of English,” in World Englishes 2000, ed. Larry E. Smith and Michael L.

These include but are not limited to: • A variety of what I would describe as “first-order,” more rudimentary, blunt tools that are long-standing and widely adopted, such as keyword ban lists for content and user profiles, URL and content filtering, IP blocking, and other user-identifying mechanisms;13 • More sophisticated automated tools such as hashing technologies used in products like PhotoDNA (used to automate the identification and removal of child sexual exploitation content; other engines based on this same technology do the same with regard to terroristic material, the definitions of which are the province of the system’s owners);14 • Higher-order AI tools and strategies for content moderation and management at scale, examples of which might include: ◦ Sentiment analysis and forecasting tools based on natural language processing that can identify when a comment thread has gone bad or, even more impressive, when it is in danger of doing so;15 ◦ AI speech-recognition technology that provides automatic, automated captioning of video content;16 ◦ Pixel analysis (to identify, for example, when an image or a video likely contains nudity);17 ◦ Machine learning and computer vision-based tools deployed toward a variety of other predictive outcomes (such as judging potential for virality or recognizing and predicting potentially inappropriate content).18 Computer vision was in its infancy when I began my research on commercial content moderation.

pages: 288 words: 86,995

Rule of the Robots: How Artificial Intelligence Will Transform Everything
by Martin Ford
Published 13 Sep 2021

As always, competition between the cloud providers is a powerful driver of innovation, and Amazon’s deep learning tools for the AWS platform are likewise becoming easier to use. Along with the development tools, all the cloud services offer pre-built deep learning components that are ready to be used out of the box and incorporated into applications. Amazon, for example, offers packages for speech recognition and natural language processing and a “recommendation engine” that can make suggestions in the same way that online shoppers or movie watchers are shown alternatives that are likely to be of interest.16 The most controversial example of this kind of prepackaged capability is AWS’s Rekognition service, which makes it easy for developers to deploy facial recognition technology.

Much of this was funneled through the Advanced Research Projects Agency, or ARPA. One especially important center of ARPA-funded research was the Stanford Research Institute, which later separated from Stanford University to become SRI International. SRI’s Artificial Intelligence Center, established in 1966, did groundbreaking work in areas like language translation and speech recognition. The lab also created the first truly autonomous robot, a machine capable of turning AI-powered reasoning into physical interaction with the environment. Nearly half a century after its founding, SRI’s Artificial Intelligence Center would spin off a startup company with a new personal assistant called Siri that would be acquired by Apple in 2010.

In the 1990s, Schmidhuber and his students developed a special type of neural network that implemented “long short-term memory,” or LSTM. With LSTM, networks are able to “remember” data from the past and incorporate it into the current analysis. This has proven to be of critical importance in areas like speech recognition and language translation, where the context created by words that came previously has a huge impact on accuracy. Companies like Google, Amazon and Facebook all rely heavily on LSTM, and Schmidhuber feels that it is the work of his team, rather than that of the more celebrated North American researchers, that underlies much of AI’s recent progress.
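The “remembering” that LSTM performs happens in its cell state: a forget gate decides how much of the previous cell state to keep, an input gate decides how much new information to write, and an output gate decides how much of the result to expose. A minimal single-unit sketch in plain Python (the scalar weights are illustrative stand-ins, not trained values):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar inputs (illustrative only).

    Each gate sees both the current input x and the previous hidden
    state h_prev, which is how the cell carries context from earlier
    in the sequence into the current analysis.
    """
    f = sigmoid(w["wf_x"] * x + w["wf_h"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi_x"] * x + w["wi_h"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo_x"] * x + w["wo_h"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg_x"] * x + w["wg_h"] * h_prev + w["bg"])  # candidate
    c = f * c_prev + i * g        # cell state: additive long-range memory
    h = o * math.tanh(c)          # hidden state exposed to the next step
    return h, c

# Run a toy sequence through the cell with fixed illustrative weights.
weights = {k: 0.5 for k in ["wf_x", "wf_h", "bf", "wi_x", "wi_h", "bi",
                            "wo_x", "wo_h", "bo", "wg_x", "wg_h", "bg"]}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, weights)
```

In a real speech or translation model these scalars become weight matrices and the cell is unrolled over many time steps; the largely additive cell-state update is what lets context (and gradients during training) survive across long sequences.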

pages: 385 words: 111,113

Augmented: Life in the Smart Lane
by Brett King
Published 5 May 2016

By 1952, Bell had developed a system for single-digit speech recognition but it was extremely limited. In 1969, however, John Pierce, one of Bell’s leading engineers, wrote an open letter to the Acoustical Society of America criticising speech recognition at Bell and comparing it to “schemes for turning water into gasoline, extracting gold from the sea, curing cancer, or going to the moon”. Ironically, one month after Pierce published his open letter, Neil Armstrong landed on the moon. Regardless, Bell Labs still had its funding for speech recognition pulled soon after. By 1993, speech recognition systems developed by Ray Kurzweil could recognise 20,000 words (uttered one word at a time), but accuracy was limited to about 10 per cent.

In 1997, Bill Gates was pretty bullish on speech recognition, predicting that “In this 10-year time frame, I believe that we’ll not only be using the keyboard and the mouse to interact, but during that time we will have perfected speech recognition and speech output well enough that those will become a standard part of the interface.”25 In the year 2000, it was still a decade away. The big breakthroughs came with the application of Markov models and Deep Learning models or neural networks, basically better computer performance and bigger source databases.
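In a hidden Markov model of speech, the phonemes are hidden states and the acoustic signal is a sequence of noisy observations; the Viterbi algorithm recovers the most probable phoneme sequence. A toy sketch, with an invented three-phoneme model for the word “cat” (all probabilities are made up for illustration):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence."""
    # Forward pass: best probability of reaching each state at each time.
    v = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (v[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            v[t][s] = prob
            back[t][s] = prev
    # Backward pass: walk the backpointers from the best final state.
    last = max(v[-1], key=v[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Invented toy model: hidden phonemes of "cat", observed as
# coarse acoustic frame labels.
states = ["k", "ae", "t"]
start = {"k": 1.0, "ae": 0.0, "t": 0.0}
trans = {"k":  {"k": 0.1, "ae": 0.9, "t": 0.0},
         "ae": {"k": 0.0, "ae": 0.1, "t": 0.9},
         "t":  {"k": 0.0, "ae": 0.0, "t": 1.0}}
emit = {"k":  {"burst": 0.8, "vowel": 0.1, "silence": 0.1},
        "ae": {"burst": 0.1, "vowel": 0.8, "silence": 0.1},
        "t":  {"burst": 0.3, "vowel": 0.1, "silence": 0.6}}
path = viterbi(["burst", "vowel", "silence"], states, start, trans, emit)
```

Production recognizers of the era used the same idea at vastly larger scale, with Gaussian mixture emission models learned from the “bigger source databases” the author mentions.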

pages: 321 words: 113,564

AI in Museums: Reflections, Perspectives and Applications
by Sonja Thiel and Johannes C. Bernhardt
Published 31 Dec 2023

Technical Details and Functioning

Software Pipeline

The interactive installation Wishing Well features a software pipeline that automatically generates prompt-based images from multilingual speech input. The pipeline integrates several AI components: OpenAI’s automatic speech recognition (ASR) system Whisper handles voice activity detection, real-time speech enhancement, and transcription. The ASR system, released in 2022, can comprehend and transcribe nearly 100 languages automatically (Radford/Kim/Xu et al. 2022). Additionally, the pipeline utilizes DeepL to translate the transcribed text from the spoken languages into English.
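The chain described (speech in, transcription with language detection, translation into English, then prompt-based image generation) can be sketched as a simple composition. The component functions below are hypothetical stand-ins for the real Whisper, DeepL, and image-generation services, so the control flow can be shown without any external APIs:

```python
def wishing_well_pipeline(audio_path, transcribe, translate, generate_image):
    """Chain ASR -> translation -> text-to-image (illustrative sketch).

    transcribe:     audio file path -> (text, detected_language)
    translate:      (text, source_lang) -> English text
    generate_image: English prompt -> image object
    """
    text, lang = transcribe(audio_path)
    prompt = text if lang == "en" else translate(text, lang)
    return generate_image(prompt)

# Fake components stand in for the real services so the pipeline
# can be exercised offline.
def fake_transcribe(path):
    return "ein Schloss im Nebel", "de"

def fake_translate(text, lang):
    return "a castle in the fog"

def fake_generate(prompt):
    return {"prompt": prompt, "pixels": None}

image = wishing_well_pipeline("wish.wav", fake_transcribe,
                              fake_translate, fake_generate)
```

Injecting the components as functions also mirrors how such installations can swap one model for another without touching the surrounding plumbing.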

In his seminal 1950 study, Alan Turing argued that the thinking of intelligent humans cannot be precisely defined, and that any machine output that humans cannot recognize as machine-made should therefore be regarded as intelligent (Turing 1950; Vater 2023). A little later, the research field of artificial intelligence was established at the famous Dartmouth Workshop of 1956 (McCorduck 2004; Moor 2006). Since then, the concept of AI has changed again and again, been differentiated into subfields such as expert systems, speech recognition, and computer vision, and experienced booms and busts (Nilsson 2004; Seising 2021). AI functions as an umbrella term for a multitude of technical approaches that are often taken as a provocation to human intelligence and regularly trigger both fantasies and fears. If one instead speaks more modestly of systems that follow algorithmic rules, recognize patterns in data, and solve specific tasks, the challenge to human intelligence and to related categories such as thinking, consciousness, reason, creativity, or intentionality becomes less acute.

First, let us consider the prevalence of predominantly white stock images and the portrayal of white robots. This raises questions about the realistic depiction of technology and the people behind it. We must aim for images that accurately represent the technology, its strengths, weaknesses, context, and applications (Cave/ Dihal 2020). Similarly, speech recognition systems should be designed to recognize and accommodate various accents and dialects. The notion of what constitutes an accent itself deserves critical examination (Cave/Dihal 2020). It is evident that AI requires a cultural change, and we must acknowledge that everyone plays a role in driving this change.

pages: 661 words: 187,613

The Language Instinct: How the Mind Creates Language
by Steven Pinker
Published 1 Jan 1994

But this advantage can be enjoyed only by a high-tech speech recognizer, one that has some kind of knowledge of how vocal tracts blend sounds. The human brain, of course, is a high-tech speech recognizer, but no one knows how it succeeds. For this reason psychologists who study speech perception and engineers who build speech recognition machines keep a close eye on each other’s work. Speech recognition may be so hard that there are only a few ways it could be solved in principle. If so, the way the brain does it may offer hints as to the best way to build a machine to do it, and how a successful machine does it may suggest hypotheses about how the brain does it.

Ray Jackendoff and I think this is not the whole story, for reasons we explained in our paper debating Chomsky. Chapter 6: The Sounds of Silence. Speech recognition technology has advanced tremendously and is now inescapable in telephone information systems. But as everyone who has been trapped in “voice-mail jail” knows, the systems are far from foolproof (“I’m sorry, but I did not understand what you said”). And here is how the novelist Richard Powers described his recent experience with a state-of-the-art speech recognition program: “This machine is a master of speakos and mondegreens. Just as we might hear the…Psalms avow that ‘Shirley, good Mrs.

A finite inventory of phonemes is sampled and permuted to define words, and the resulting strings of phonemes are then massaged to make them easier to pronounce and understand before they are actually articulated. I will trace out these steps for you and show you how they shape some of our everyday encounters with speech: poetry and song, slips of the ear, accents, speech recognition machines, and crazy English spelling. One easy way to understand speech sounds is to track a glob of air through the vocal tract into the world, starting in the lungs. When we talk, we depart from our usual rhythmic breathing and take in quick breaths of air, then release them steadily, using the muscles of the ribs to counteract the elastic recoil force of the lungs.

pages: 523 words: 61,179

Human + Machine: Reimagining Work in the Age of AI
by Paul R. Daugherty and H. James Wilson
Published 15 Jan 2018


Neural networks that convert audio signals to text signals in a variety of languages. Applications include translation, voice command and control, audio transcription, and more. Natural language processing (NLP). A field in which computers process human (natural) languages. Applications include speech recognition, machine translation, and sentiment analysis. AI Applications Component Intelligent agents. Agents that interact with humans via natural language. They can be used to augment human workers working in customer service, human resources, training, and other areas of business to handle FAQ-type inquiries.


pages: 626 words: 167,836

The Technology Trap: Capital, Labor, and Power in the Age of Automation
by Carl Benedikt Frey
Published 17 Jun 2019

It is projected that 6.3 billion smartphone subscriptions will exist by the year 2021 and can therefore potentially provide low-cost universal access to vital diagnostic care.”16 Machines are not just turning into better translators and diagnosticians. They are becoming better listeners, too. Speech recognition technology is improving at staggering speed. In 2016, Microsoft announced a milestone in reaching human parity in transcribing conversations. And in August 2017, a research paper published by Microsoft’s AI team revealed additional improvements, reducing the error rate from 6 percent to 5 percent.17 And just as image recognition technology promises to replace doctors in diagnostic tasks, advances in speech recognition and user interfaces promise to replace workers in some interactive tasks. As we all know, Apple’s Siri, Google Assistant, and Amazon’s Alexa rely on natural user interfaces to recognize spoken words, interpret their meanings, and respond to them accordingly.
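The error rates quoted for conversational transcription are word error rates (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the system’s transcript into a human reference transcript, divided by the length of the reference. A straightforward sketch (the sample sentences are invented):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    via the classic Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # delete all remaining ref words
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # insert all remaining hyp words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[-1][-1] / len(ref)

wer = word_error_rate("i would like to reserve a table",
                      "i would like to reserve table")
```

By this measure, dropping one word from a seven-word reference yields a WER of 1/7, roughly 14 percent; the 6-to-5 percent improvement the text describes means one fewer word wrong in every hundred.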

According to Cisco, worldwide internet traffic will increase nearly threefold over the next five years, reaching 3.3 zettabytes per year by 2021.8 To put this number in perspective, researchers at the University of California, Berkeley estimate that the information contained in all books worldwide is around 480 terabytes, while a text transcript of all the words ever spoken by humans would amount to some five exabytes.9 Data can justly be regarded as the new oil. As big data gets bigger, algorithms get better. When we expose them to more examples, they improve their performance in translation, speech recognition, image classification, and many other tasks. For example, an ever-larger corpus of digitalized human-translated text means that we are able to better judge the accuracy of algorithmic translators in reproducing observed human translations. Every United Nations report, which is always translated by humans into six languages, gives machine translators more examples to learn from.10 And as the supply of data expands, computers do better.

As we all know, Apple’s Siri, Google Assistant, and Amazon’s Alexa rely on natural user interfaces to recognize spoken words, interpret their meanings, and respond to them accordingly. Using speech recognition technology and natural language processing, a company called Clinc is now developing a new AI voice assistant to be used in drive-through windows of fast-food restaurants like McDonald’s and Taco Bell.18 And in 2018, Google announced that it is building AI technology to replace workers in call centers. Virtual agents will answer the phone when a customer calls. If a customer request involves something the algorithm cannot yet do, he or she will automatically be rerouted to a human agent.

pages: 339 words: 88,732

The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson and Andrew McAfee
Published 20 Jan 2014

We could have pulled over, found the portable GPS and turned it on, typed in our destination, and waited for our routing, but we didn’t want to exchange information that way. We wanted to speak a question and hear and see (because a map was involved) a reply. Siri provided exactly the natural language interaction we were looking for. A 2004 review of the previous half-century’s research in automatic speech recognition (a critical part of natural language processing) opened with the admission that “Human-level speech recognition has proved to be an elusive goal,” but less than a decade later major elements of that goal have been reached. Apple and other companies have made robust natural language processing technology available to hundreds of millions of people via their mobile phones.10 As noted by Tom Mitchell, who heads the machine-learning department at Carnegie Mellon University: “We’re at the beginning of a ten-year period where we’re going to transition from computers that can’t understand language to a point where computers can understand quite a bit about language.”11 Digital Fluency: The Babel Fish Goes to Work Natural language processing software is still far from perfect, and computers are not yet as good as people at complex communication, but they’re getting better all the time.

Exponential progress has made possible many of the advances discussed in the previous chapter. IBM’s Watson draws on a plethora of clever algorithms, but it would be uncompetitive without computer hardware that is about one hundred times more powerful than Deep Blue, its chess-playing predecessor that beat the human world champion, Garry Kasparov, in a 1997 match. Speech recognition applications like Siri require lots of computing power, which became available on mobile phones like Apple’s iPhone 4S (the first phone that came with Siri installed). The iPhone 4S was about as powerful, in fact, as Apple’s top-of-the-line Powerbook G4 laptop had been a decade earlier. As all of these innovations show, exponential progress allows technology to keep racing ahead and makes science fiction reality in the second half of the chessboard.

These three forces are yielding breakthroughs that convert science fiction into everyday reality, outstripping even our recent expectations and theories. What’s more, there’s no end in sight. The advances we’ve seen in the past few years, and in the early sections of this book—cars that drive themselves, useful humanoid robots, speech recognition and synthesis systems, 3D printers, Jeopardy!-champion computers—are not the crowning achievements of the computer era. They’re the warm-up acts. As we move deeper into the second machine age we’ll see more and more such wonders, and they’ll become more and more impressive. How can we be so sure?

pages: 238 words: 46

When Things Start to Think
by Neil A. Gershenfeld
Published 15 Feb 1999

A mouse forces you to manipulate things with one hand alone; Bill develops interfaces that can use both hands. A perennial contender on the short list for the next big interface is speech recognition, promising to let us talk to our computers as naturally as we talk to each other. Appealing as that is, it has a few serious problems. It would be tiring if we had to spend the day speaking continuously to get anything done, and it would be intrusive if our conversations with other people had to be punctuated by our conversations with our machines. Most seriously, even if speech recognition systems worked perfectly (and they don't), the result is no better than if the commands had been typed.


When Computers Can Think: The Artificial Intelligence Singularity
by Anthony Berglas , William Black , Samantha Thalind , Max Scratchmann and Michelle Estes
Published 28 Feb 2015

Decision tables
6. Regression
7. Artificial Neural Networks
   1. Introduction
   2. Perceptrons
   3. Sigmoid perceptrons
   4. Using perceptron networks
   5. Hype and real neurons
   6. Support vector machines
   7. Unsupervised learning
   8. Competing technologies
Speech and Vision
   1. Speech recognition
   2. Hidden Markov models
   3. Words and language
   4. 3D graphics
   5. Machine vision
   6. 3D vs 2.5D
   7. Kinetics
Robots
   1. Automata
   2. Robotics
   3. Sensing environment
   4. Motion Planning
   5. Movement and Balance
   6. Robocup
   7. Other robots
   8. Humanistic
   9. Robots leaving the factory
Programs writing Programs
   1.

Instead, it contains patterns and images and loose associations that can either be analyzed directly in order to make predictions or be abstracted into symbolic knowledge which can then be reasoned about more deeply. The practical concerns of a robot are then addressed, namely to be able to hear, see and move. Speech recognition is now a practical technology that may see increased usage in small devices that lack keyboards. Machine vision is a critical aspect of understanding the environment in which a robot lives. It is a huge area of research in which much has been achieved but the problem is far from solved. A robot also has to move its limbs and body, which involves several non-trivial problems.

The best results can be achieved by using multiple decision trees and then averaging the results. When considering error rates, humans who carefully examine images are said to have an error rate of 0.2%, whereas post office workers quickly sorting mail had an error rate of 2.5%. So all of the automated systems had better-than-human performance in practice.

Speech and Vision

Speech recognition

One achievement of modern artificial intelligence research is the ability to understand spoken language. After a little work training a system, people may abandon their keyboards and simply talk to their computers. This is particularly useful for those with busy hands or disabilities.

pages: 407 words: 103,501

The Digital Divide: Arguments for and Against Facebook, Google, Texting, and the Age of Social Netwo Rking
by Mark Bauerlein
Published 7 Sep 2011

For example, if a majority of users start clicking on the fifth item on a particular search results page more often than the first, Google’s algorithms take this as a signal that the fifth result may well be better than the first, and eventually adjust the results accordingly. Now consider an even more current search application, the Google Mobile Application for the iPhone. The application detects the movement of the phone to your ear, and automatically goes into speech recognition mode. It uses its microphone to listen to your voice, and decodes what you are saying by referencing not only its speech recognition database and algorithms, but also the correlation to the most frequent search terms in its search database. The phone uses GPS or cell-tower triangulation to detect its location, and uses that information as well. A search for “pizza” returns the result you most likely want: the name, location, and contact information for the three nearest pizza restaurants.

And while some of the databases referenced by the application—such as the mapping of GPS coordinates to addresses—are “taught” to the application, others, such as the recognition of speech, are “learned” by processing large, crowdsourced data sets. Clearly, this is a “smarter” system than what we saw even a few years ago. Coordinating speech recognition and search, search results and location, is similar to the “hand-eye” coordination the baby gradually acquires. The Web is growing up, and we are all its collective parents. >>> cooperating data subsystems In our original Web 2.0 analysis, we posited that the future “Internet operating system” would consist of a series of interoperating data subsystems.

. >>> how the web learns: explicit vs. implicit meaning But how does the Web learn? Some people imagine that for computer programs to understand and react to meaning, meaning needs to be encoded in some special taxonomy. What we see in practice is that meaning is learned “inferentially” from a body of data. Speech recognition and computer vision are both excellent examples of this kind of machine learning. But it’s important to realize that machine learning techniques apply to far more than just sensor data. For example, Google’s ad auction is a learning system, in which optimal ad placement and pricing are generated in real time by machine learning algorithms.

pages: 411 words: 98,128

Bezonomics: How Amazon Is Changing Our Lives and What the World's Best Companies Are Learning From It
by Brian Dumaine
Published 11 May 2020

Chapter 7: Sexy Alexa

For centuries humans: William of Malmesbury, Chronicle of the Kings of England, Bk. II, Ch. x, 181, c. 1125.
The first breakthrough: Melanie Pinola, “Speech Recognition Through the Decades,” PC World, November 2, 2011.
Around that time: Andrew Myers, “Stanford’s John McCarthy, Seminal Figure of Artificial Intelligence, Dies at 84,” Stanford Report, October 25, 2011.
By the 1980s: Pinola, “Speech Recognition Through the Decades.”
A product called Dragon: Ibid.
By 2010, computing: Bianca Bosker, “Siri Rising: The Inside Story of Siri’s Origins—and Why She Could Overshadow the iPhone,” Huffington Post, December 6, 2017.

He defined it as machines that can perform human tasks, such as understanding language, recognizing objects and sounds, learning, and problem solving. By the 1980s, talking dolls, such as Worlds of Wonder’s Julie, could respond to simple questions from a child, but it wasn’t until the next decade that the first serious speech recognition software hit the market. A product called Dragon could process simple speech without the speaker having to pause awkwardly between each word. Despite this progress, over the next two decades, voice recognition as well as other types of AI programming largely disappointed its supporters, periodically entering into what the academic community dubbed AI winters—periods when progress and funding would dry up.

See also AI flywheel
 health-care industry and, 222
 job losses and, 143, 248, 267, 271
 Prime Video and, 104
 societal and ethical challenges of, 90–91
 vision recognition and, 109
 warehouse controls using, 128
artists, and automation, 143
ASOS, 9, 97, 117, 194
Atlantic, The (magazine), 172–73, 259
auctions, 42
Audrey speech recognition system, 107
Aurora, 175
Australia, drones used in, 179–80
automation
 air traffic control with, 179
 Amazon employees’ concerns about job loss from, 248
 business models and, 270
 customer data from shopping with, 191
 discontent from threat of, 240
 disruptive nature of, 127, 139
 economic growth and, 250
 grocery stores with, 139–41
 job losses and, 9, 12, 126–27, 141–43, 241–42, 248, 267
 warehouses with, 124, 128, 129–30, 135, 136–37, 143
automobiles.

pages: 731 words: 134,263

Talk Is Cheap: Switching to Internet Telephones
by James E. Gaskin
Published 15 Mar 2005

If you don't have a fairly recent computer or separate sound card, and have a choice of analog headset or USB headset in the price range you're comfortable with, try the USB headset first. Some experts disagree, but more lean toward USB. * * * Note: Stop MumblingIf you use speech recognition software, or are curious and want to try it, analog headsets will help. One of the strong recommendations by speech recognition software vendors is to use a quality microphone, and the headset in the medium and high range will do an excellent job. * * * 4.2.1. Quick and Cheap (Less Than $30) This category isn't the biggest, but you will have plenty of choices.


Everest chat defaults file transfers history instant messaging IRC (Internet Relay Chat) chat within teleconferencing combination Wi-Fi cellular phones conference calling 2nd confidentiality and configuring cordless phone costs Dutch hip-hop and encrypted connection encryption 2nd equipment requirements features firewalls and Forwarder enhancement future features Instant Messaging and file transfer Instant Messaging tricks KaZaA and MoneyBookers and operating systems and password and email, changing PayPal and PDA support Pocket PC 2nd presence feature ringtones signing up SIP versus SkypeOut service Sound Set Up Guide web site support forum Sysgration SkyGenie adapter technical answers web site technical details they don't mention traditional telephone network and (SkypeOut) troubleshooting Voicemail 2nd Vonage versus wireless USB headset Skype Answering Machine (SAM) Skype for Business features SkypeIn service Skypeing, verbified noun SkypeOut 2nd tracking usage SkypePlus SkypeVM softphones definition open source peer-to-peer telephone providers systems sound, measuring speech recognition software SummitCircle web site

Index [SYMBOL] [A] [B] [C] [D] [E] [F] [G] [H] [I] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Z]

teleconferences Telemarketer Block Teleo, business-oriented softphone integrated with Microsoft Office telephones adapters bandwidth systems versus Internet Telephony three-way calling Time Warner 2nd TiVo phone links toll-free numbers virtual Touchtone service fee TowerStream WiMAX vendor TPC (The Phone Company) traditional phone companies Internet Telephony from SBC, Qwest, and BellSouth Verizon and AT&T traditional phone services

Universal Service USB handsets 2nd phones

Verizon 2nd videophones 2nd services viral marketing virtual numbers 2nd 3rd Viseon VocalTec VoFi (VoIP over Wi-Fi) VoiceGlo USB handset voicemail 2nd Voicemail Skype and VoicePulse 2nd VoiceWing VoIP (Voice over Internet Protocol) VON (Voice on the Net) Vonage 2nd 3rd 911 service and Bandwidth Saver Basic 500 plan business details they don't mention competitors Dashboard interface encryption, lack of firewalls and Great Benefits Help pages rebooting router Skype versus standard features technical details they don't mention troubleshooting Viseon and voicemail voicemail alerts voicemail management Wi-Fi cell phone handset Vonage, Inc.

pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts
by Richard Susskind and Daniel Susskind
Published 24 Aug 2015

Because human beings were believed to understand speech by understanding context, the view was that artificial intelligence would in the end be achieved by, essentially, modelling human intelligence and human beings’ ways of processing information and of thinking about the world around them. This would require systems that had common sense and general knowledge. However, speech recognition was eventually cracked through brute-force processing, massive data retrieval and storage capability, and statistics. This means, for example, that a good speech recognition system that ‘hears’ the sentence ‘my last visit to the office took two hours too long’ can correctly spell the ‘to’, ‘two’, and ‘too’. It can do this not because it understands the context of the usage of these words as human beings do, but because it can determine, statistically, that ‘to’ is much more likely immediately to precede ‘the office’ than ‘two’ or ‘too’.
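The statistical trick the authors describe can be sketched with a toy bigram model: pick whichever homophone most often precedes the following word in a corpus. The miniature corpus below is invented, and production recognizers use vastly larger language models, but the principle is the same: no understanding of meaning is involved.

```python
# Choosing between homophones "to", "two", "too" purely by corpus
# statistics: which word most often precedes the next word observed.
# The tiny corpus is invented for illustration.
from collections import Counter

corpus = (
    "a visit to the office . walk to the office . drive to the office . "
    "the meeting ran two hours . it took two hours too long"
).split()

# Count adjacent word pairs (bigrams).
bigrams = Counter(zip(corpus, corpus[1:]))

def pick_homophone(candidates, next_word):
    """Return the candidate most often followed by next_word in the corpus."""
    return max(candidates, key=lambda w: bigrams[(w, next_word)])

print(pick_homophone(["to", "two", "too"], "the"))  # 'to'
```

In this tiny corpus, "to" precedes "the" three times while "two" and "too" never do, so the system "spells" the word correctly without any notion of what an office is.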

But some weak systems are becoming increasingly capable and can outperform human beings, even though they do not ‘think’ or operate in the same way as we think we do. We learned this many years ago in the context of speech recognition, another branch of AI (or at least it was regarded as such in the early days). As we explain in section 4.9, the challenge of developing systems that could recognize human speech was eventually met through a combination of brute-force processing and statistics. An advanced speech recognition system that can distinguish between ‘abominable’ and ‘a bomb in a bull’ does so not by understanding the broader context of these utterances in the way that human beings do, but by statistical analysis of a large database of documents that confirm, for instance, other words that are likely to be collocated or associated with ‘bull’.

The term ‘artificial intelligence’ was coined by John McCarthy in 1955, and in the thirty years or so that followed a wide range of systems, techniques, and technologies were brought under its umbrella (the terms used in the mid-1980s are included in parentheses): the processing and translation of natural language (natural language processing); the recognition of the spoken word (speech recognition); the playing of complex games such as chess (game-playing); the recognition of images and objects of the physical world (vision and perception); learning from examples and precedents (machine learning); computer programs that can themselves generate programs (automatic programming); the sophisticated education of human users (intelligent computer-aided instruction); the design and development of machines whose physical movements resembled those of human beings (robotics), and intelligent problem-solving and reasoning (intelligent knowledge-based systems or expert systems).103 Our project at the University of Oxford (1983–6) focused on theoretical and philosophical aspects of this last category—expert systems—as applied in the law.

Driverless: Intelligent Cars and the Road Ahead
by Hod Lipson and Melba Kurman
Published 22 Sep 2016

In contrast, deep-learning software would focus on the cat’s identifying visual features—perhaps its pointy ears and tail—and would quickly (and correctly) surmise that although the cat appears in an unusual setting, it’s still a cat. Deep learning has transformed the study of artificial perception and is being applied with great success to speech recognition and other activities that require software to deal with information that presents itself in quirky and imperfect ways. In the past few years, in search of deep-learning expertise, entire divisions of automotive companies have migrated to Silicon Valley. Deep learning is why software giants like Google and Baidu, already armed with expertise in managing huge banks of data and building intelligent software, are giving the once-invincible automotive giants a run for their money.

We may be finally seeing the resolution of Moravec’s paradox, as roboticists and computer scientists find creative new ways to apply deep learning to automate artificial perception and response. Since 2012, deep learning has given driverless cars the ability to “see,” and has improved the language comprehension of speech-recognition software. In a high-profile demonstration of its power and versatility, in 2016, deep-learning software enabled Google’s AlphaGo program to trounce the world’s best players of go, a board game considered by many to be more challenging than chess. To encourage third-party developers to build intelligent applications using their software tools, Google, Microsoft, and Facebook have each launched their own version of an open source deep-learning development platform.

The year following SuperVision’s triumph, the winner achieved an 11.2 percent error rate, with runners-up close behind at 12 percent and 13 percent; all used customized variants of deep-learning convolutional neural networks.10 In 2014, a team from Google achieved 6.66 percent error and a team from the University of Oxford achieved 7.1 percent error.11 In 2015, a team of researchers at Microsoft’s Beijing research lab (led by principal researcher Jian Sun) used a network that was 152 layers deep to win first place in all three categories.12 Most remarkably, Microsoft’s team achieved a 3.57 percent error rate, surpassing for the first time the previously unbeatable 5 percent error rate of human-level perception.13 After these triumphs, suddenly alternative research approaches in machine vision became obsolete. The string of watershed breakthroughs in object recognition quickly spilled out of the computer vision field into all other areas of artificial-intelligence research. Derivatives of the algorithm that ran SuperVision oozed into other AI fields, such as speech recognition and text generation. The final remaining barrier to the development of driverless cars—software capable of artificial perception—finally began to melt away. Soon after this big success, the pieces started coming together. Nvidia launched a deep-learning card that implemented a derivative of Krizhevsky’s SuperVision network on low-power hardware.

pages: 486 words: 132,784

Inventors at Work: The Minds and Motivation Behind Modern Inventions
by Brett Stern
Published 14 Oct 2012

One of the products we came out with about a decade after the Speak & Spell was the Julie doll. She was a doll that had speech recognition on it. Stern: And how did that go? Frantz: Well, there were other things that made it die a short death. It happened to be that they brought it out in late 1987, which if you remember, there was the crash in the stock market, and start-ups didn’t do very well through that crash. But I had been working with toy companies for years trying to add speech recognition to their products, and it really came down to this silly notion at that time that speech recognition did not work. Stern: Did the companies know how the technology worked in those situations?

Did the companies come to you looking for something? Or were you going to the companies, saying, “I have a great solution.”? Frantz: A little bit of both. I went after companies, saying, “We have this new technology—now what can you do with it?” And companies came to me saying things like, “We have a brilliant idea, and all we need is your speech recognition capability to make it work.” Stern: So back in the day, what was the prior art? Or what else was going on in the industry with this technology? Frantz: Well, that particular one was mostly used in military systems to do specific things, of which you spent more time training the user than training the product.

advice
artificial vision
baseball cards
brainstorming
chip
cloud
confidence
consumer
corporate setting
digital signal processing
DRAM
failures
family background and education
final words of wisdom
firing definition
IC-integrated circuit development
ideation process
inanimate object
innovation definition
inspiration
integrated circuit
intrapreneurs
Julie doll, speech recognition
manufacturing capacity
marketplace
mentor
military systems
moral judgment
plan to retire
positive and the negative balance
presentation
problem vs. solution
professional heroes
reputations
serial innovator
skill set
solution and implementation
spontaneity
Starbucks
team effort and responsibility
transistor-transistor logic (TTL)
US PTO
variations

G, H, I

Gass, S.

Future Files: A Brief History of the Next 50 Years
by Richard Watson
Published 1 Jan 2008

Take children’s dolls as an example. Historically these were inert, rather poor representations of the human form. They are already becoming more realistic and more intelligent. Owners of “Amazing Amanda” can chat with their doll and “intelligence” is available in the form of facial recognition, speech recognition and accessories impregnated with radio-frequency identification devices (RFID). If you’re a bit older (and presumably no wiser) you can even buy a physically realistic, life-sized “love partner” for US$7,000 from a company called realdoll.com. But you ain’t seen nothing yet. In a few years’ time you will be able to personalize your doll’s face (to your own choice or, more likely, to resemble a celebrity), communicate with your doll by telephone or email, have real conversations and experience your entire life history through the eyes, ears (and nose) of your doll.

The true test for artificial intelligence dates to 1950, when the British mathematician Alan Turing suggested the criterion of humans submitting statements through a machine and then not being able to tell whether the responses had come from another person or the machine. The 1960s and 1970s saw a great deal of progress in AI, but real breakthroughs failed to materialize. Instead, scientists and developers focused on specific problems such as speech recognition, text recognition and computer vision. However, we may be less than ten years away from seeing Turing’s AI vision become a reality. For instance, a company in Austin, Texas has developed a product called Cyc. It is much like a “chatbot” except that, if it answers a question incorrectly, you can correct it and Cyc will learn from its mistakes.

During the twentieth century people were paid to accumulate and apply information. The acquisition and analysis of data are logical left-brain activities, but, as Daniel Pink points out in his book A Whole New Mind, they are activities that are fast disappearing thanks to developments in areas such as computing. For instance, speech recognition and GPS systems are replacing people for taxi bookings, while sites like completemycase.com are giving mediocre lawyers a run for their money. So dump that MBA and get an arts education instead. Better still, do both. One fascinating statistic I came across recently is that 12 years ago 61% of McKinsey’s new US recruits had MBAs.

pages: 118 words: 35,663

Smart Machines: IBM's Watson and the Era of Cognitive Computing (Columbia Business School Publishing)
by John E. Kelly Iii
Published 23 Sep 2013

Cameras capture speakers for video conferencing no matter where they sit or stand in the room—and also record facial expressions and body language. (The video might be analyzed later by the computer system, and insights drawn from it can be given back to the speakers so they see how people reacted to specific moments of a presentation.) Acoustic arrays help with speech recognition and processing. The idea is to create the richest possible environment for people’s interactions with data and one another in order to enhance the group’s collective intelligence—a term used by MIT’s Thomas Malone. Dario says, “The goal is to have the system partner with the humans inside the cognitive environments to transform the decision-making activities of teams that are dealing with very complex problems.”10 Imagine how a large oil company might use cognitive capabilities in such a space a few years from now.

Experimental self-driving vehicles are capable of using vision and other senses to navigate through cities without colliding with buses, running over pedestrians, or getting speeding tickets. Yet, as of now, robots remain firmly in the von Neumann computing paradigm. They must be programmed in advance by people to deal with nearly every situation they encounter. Already, learning systems are playing important roles in advances in speech recognition and image recognition. Some voice-recognition systems, for instance, get better at understanding an individual’s manner of speech the more they interact. But the scientific community is just at the beginning of making machines that learn like humans do, so we’re just scratching the surface of the potential of computer sensing.

pages: 559 words: 157,112

Dealers of Lightning
by Michael A. Hiltzik
Published 27 Apr 2000

The Systems Science Lab was to take over development of a laser-driven computer printer whose inventor, a Webster engineer, had come west after failing to interest his bosses in its potential. SSL researchers would also investigate optical memories, a technology that would eventually give rise to today’s compact disc and CD-ROM, and speech recognition by computer. Taylor’s Computer Science Lab was to pursue his pet interest in graphics while developing specifications for a basic center-wide computer system. And GSL was assigned studies in solid-state technologies, including the electrical and optical qualities of crystals. Pake warned his superiors that under the projected growth curve the Porter Drive complex, which at the time housed everyone comfortably, would certainly burst its seams by the close of 1971.

“I just came to introduce you to your next-door neighbor,” Biegelsen said, leading Thornburg into the adjoining warren. “This is George. I thought you guys should get together because you shared a similar research interest in grad school.” Thornburg was perplexed. He understood George to be working on speech recognition and he had come in as a thin-film metallurgist. “Really?” the neighbor asked. “What did you do your work in?” That was all the voluble Thornburg needed to set off on a thorough explication of his doctoral career, not excepting the time he had to change themes in midcourse thanks to the preemptive publication of a thesis on the same subject by a guy from Oregon named George White.

Inside the building it was impossible to pass within a few yards of Kay’s door without sensing a gravitational tug. Perhaps his most important recruit was swept into his orbit that way, never to leave. Dan Ingalls had come to PARC on a temporary contract to help George White set up the SDS Sigma 3 he had acquired for his work in speech recognition. “My office ended up across the hall from Alan’s,” Ingalls said. “I kept noticing that I was more interested in what I was hearing across the hall than in the speech work I was hired to do. These conversations I was eavesdropping on were all about open-ended computer science stuff, which I was interested in.

pages: 250 words: 73,574

Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers
by John MacCormick and Chris Bishop
Published 27 Dec 2011

Yet the computers at web search companies are constantly performing these computations. In this chapter, on the other hand, we examine an area in which humans have a natural advantage: the field of pattern recognition. Pattern recognition is a subset of artificial intelligence and includes tasks such as face recognition, object recognition, speech recognition, and handwriting recognition. More specific examples would include the task of determining whether a given photograph is a picture of your sister, or determining the city and state written on a hand-addressed envelope. Thus, pattern recognition can be defined more generally as the task of getting computers to act “intelligently” based on input data that contains a lot of variability.
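One of the simplest techniques fitting this definition of pattern recognition is nearest-neighbor classification: label an unseen input with the label of its closest training example. The sketch below is not necessarily the book's approach, and the two-dimensional "feature vectors" and labels are invented stand-ins for real image or audio features.

```python
# Minimal nearest-neighbor classifier: an unseen point gets the label of
# the closest labeled training example. The 2-D feature vectors and the
# cat/dog labels are invented for illustration.
import math

training = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.8), "cat"),
    ((5.0, 5.0), "dog"),
    ((4.8, 5.2), "dog"),
]

def classify(point):
    """Return the label of the training example nearest to point."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])  # Euclidean distance
    nearest = min(training, key=lambda ex: dist(ex[0], point))
    return nearest[1]

print(classify((1.1, 0.9)))  # 'cat'
print(classify((4.9, 5.1)))  # 'dog'
```

Despite the crudeness of "find the closest example," this copes with exactly the kind of input variability the paragraph describes, because new inputs need only resemble, not match, the training data.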

Yet this apparent stumbling block for AI was convincingly eradicated in 1997, when IBM's Deep Blue computer beat world champion Garry Kasparov. Meanwhile, the success stories of AI were gradually creeping into the lives of ordinary people too. Automated telephone systems, servicing customers through speech recognition, became the norm. Computer-controlled opponents in video games began to exhibit human-like strategies, even including personality traits and foibles. Online services such as Amazon and Netflix began to recommend items based on automatically inferred individual preferences, often with surprisingly pleasing results.

See also digital signature; RSA
select operation
server; secure
SHA
Shakespeare, William
Shamir, Adi
Shannon, Claude
Shannon-Fano coding
shared secret; definition of; length of
shared secret mixture
shorter-symbol trick
signature: digital (see digital signature); handwritten
Silicon Valley
simple checksum. See checksum
simulation: of the brain; of random surfer
Singh, Simon
SizeChecker.exe
Sloane, N. J. A.
smartphone. See phone
snoop
social network
software; download; reliability of; signed
software engineering
sources
spam. See also web spam
speech recognition
spirituality
spreadsheet
SQL
staircase checksum. See checksum
Stanford University
Star Trek
statistics
Stein, Clifford
stochastic gradient descent
Strohman, Trevor
structure: in data; in a web page. See also database, table
structure query
sunglasses problem. See neural network
supercomputer
support vector machine
surfer authority score
symbol table.

pages: 269 words: 70,543

Tech Titans of China: How China's Tech Sector Is Challenging the World by Innovating Faster, Working Harder, and Going Global
by Rebecca Fannin
Published 2 Sep 2019

The Ministry of Science and Technology in China has earmarked specialties for each of these Chinese tech titans in its master plan for AI global dominance: Baidu for autonomous driving, Alibaba for smart-city initiatives, and Tencent for computer vision in medical diagnoses. The Chinese government also has designated two startups to lead AI development: SenseTime for facial recognition and iFlytek for speech recognition. Baidu, Alibaba, and Tencent are all powering up in autonomous driving, and each has a specialty focus area in AI. Baidu has its DuerOS line of smart household goods and Apollo, an open platform for self-driving technology solutions, and detoured on the AI journey several years before Google in 2015.

China has an advantage based on large numbers of well-trained AI talent, a supportive government policy, and access to a vast amount of data sets powering AI and gleaned from China’s world-leading number of internet and mobile phone users, he notes. In the age of AI, data is the new oil, so China is the new Saudi Arabia, says Lee, author of AI Superpowers.4 His venture investment firm in Beijing, Sinovation Ventures, which I’ve visited multiple times, is betting on AI’s future. Lee, who is widely known for his pioneering work in speech recognition and artificial intelligence, is an investor in five Chinese AI companies worth more than $1 billion. Two that are in the forefront are Megvii, a Chinese developer of facial recognition system Face++, and 4Paradigm, a machine learning software for detecting fraud in insurance and banking. I’ve known Lee since 2006, when he was running Google China, and I’ve watched his career flourish as a China tech investor from starting Sinovation Ventures in 2009 and as a world-leading AI expert.

The AI-based edtech startup was founded in 2012 by Wang Yi, a Princeton PhD in computer science and former Google product manager in charge of analytics and cloud computing. Yi’s startup is disrupting the online education sector by helping Chinese people learn to speak English through AI-powered interactive, customized courses accessed on mobile phones. Its AI technology crunches data to feed a speech recognition engine that can provide feedback on pronunciation, grammar, and vocabulary. This being China, LAIX integrates games and social sharing into its mobile app to make for a more fun, interactive learning experience. Riding high on China’s growing trend toward online knowledge platforms, LAIX attracted 110 million registered users in 2018, including 2.5 million who paid for courses for the full year.

Succeeding With AI: How to Make AI Work for Your Business
by Veljko Krunic
Published 29 Mar 2020

An ML pipeline could span the whole community Whole communities and subfields of ML are formed around the pipelines that emerge early, showing that ossification of the ML pipeline can affect not only a single organization, but a whole community. An example is the NLP community, which often uses a relatively standardized form of the pipeline. Historical efforts of the speech recognition community also have led to a standardized pipeline [90]. In some situations, whole AI communities might be facing the possibility that the current standard pipeline needs to be changed. For example, the advance of deep learning caused a need for significant changes in the traditional speech recognition pipeline [90]. The ML pipeline’s ossification is compounded if data science and data engineering are in separate groups in the organization.
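The kind of staged pipeline discussed here can be sketched as a list of named stages. The toy tokenizer, featurizer, and rule-based "classifier" below are invented, and the point is only the design property at stake: a loosely coupled stage can be swapped without rewriting its neighbors, which is exactly what an ossified pipeline makes painful.

```python
# Sketch of an ML pipeline as a chain of named, independent stages.
# Each stage consumes the previous stage's output; the stage functions
# are invented toys standing in for real NLP components.
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bag_of_words(tokens):
    return Counter(tokens)

def rule_classifier(features):
    # A stand-in for a trained model: count "good" vs. "bad" tokens.
    return "positive" if features.get("good", 0) > features.get("bad", 0) else "negative"

pipeline = [("tokenize", tokenize), ("featurize", bag_of_words), ("classify", rule_classifier)]

def run(stages, x):
    for _name, stage in stages:
        x = stage(x)
    return x

print(run(pipeline, "a good very good result"))  # 'positive'
```

Replacing the `featurize` entry with, say, an embedding lookup leaves the other stages untouched; ossification sets in when downstream stages quietly start depending on the internals of upstream ones.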

This section explains those new capabilities. What’s new with AI and big data is that automated analysis has become cheaper, faster, better, and (using big data systems) capable of operating on much larger datasets. Analysis that used to require human involvement is now possible to do with computers in areas like image and speech recognition. Thanks to this new AI-powered capability, whole Sense/Analyze/React loops became viable in these contexts, when it wasn’t economical to apply them before. Examples of AI making automation viable The following are some examples where an introduction of AI makes it possible to automate tasks that previously required a human to perform them:  Automated translations from one language to another—Language translation is nothing new and is something that humans have done since the beginning of time.

See machine learning (ML) pipelines mlr package 131 Modified National Institute of Standards and Technology (MNIST) 126–127 monotonic profit curves 154, 156 moonshots 58, 76–77 N natural language processing (NLP) 157–158, 202, 206–207 AutoML 211 machine learning pipeline 116–120 Nest 30, 198 Ng, Andrew 5 non-monotonic profit curves 154, 156, 158–159 non-unique (ambiguous) profit curves 155, 157 nonlinearity 176–178 O organizational silos 108–109 ossification of machine learning pipeline 118–120 addressing 123–125 causes of 118–120 example of 121–123 P pet monitors 43 pre-segmentation 151 product AI as fully autonomous product 41–42 AI as part of larger 38, 40–42 evolution of capabilities of 42 packaging AI as 38, 43–44 wide applicability of 44–45 profit curve accounting for influence of time on 187–188 arguments against 109 better defined by data science team 103–104 constructing 99–102 defined 97–98 evolution of over time 186–187 improvement over time 107 in academia 102–103 nonlinearity (convexity) in 176–178 not limited to supervised learning 106 precision depends on business problem 106 sophistication of mathematical analysis 100 profit curves in MinMax analysis categories of 157 categories of profit curves 154 complex profit curves 157–159 mental calculations 159 Provost, F. 
102 pseudo experiments 180 publishing industry 66, 71 R recommendation engines 17, 32, 83–84 reinforcement learning 205 research questions defining 69–73 best practices for business leaders 72–73 INDEX contractual language of technical domain 70–72 misaligned business and research questions 72 understanding business concepts 70 linking business problems and business decisions based on technical metric 88–90 questions answered by metric 87–88 right business metric 93–94 right research question 87 understandability of business metric 91–93 linking business questions and 70–73, 85–94 Roomba 41–42 root mean square error (RMSE) metric 19–20, 88–93, 99–101 rule engines 144, 199, 211 S safety and security issues 41–42, 117–118, 198–200 autonomous vehicles 198 disagreement between AI engineering and safety engineering 210 heuristics for building safe systems 199 importance of human involvement 200 in MinMax analysis 154 local vs. global models 198–199 non-monotonic profit curves 156 ossification of machine learning pipeline 121–123 Sculley, D. 118 Sense/Analyze/React loop 29–31 AI methods and data 63–65 applicability of 30–32 elements of 29–30 finding business questions for AI to answer 60–62 monetization with 33–37 not limited to AI 213 prioritizing projects 59 speed in closing 30 sensitivity analysis 166–172, 175–185 CLUE 172–175 common critiques to 181–183 defined 166 design of experiments 180–181 detecting nonlinearity 176–178 enhancing quality of data analyzing data-producing stage 184 collapsing two stages of pipeline into one 184 example of using results 171–172 global sensitivity analysis 170–171 increasing/decreasing accuracy by 1% 169 interactions between pipeline stages addressing 179 effect of 179 introducing errors 185 local sensitivity analysis 167–170 MinMax analysis vs. 
138 not limited to AI 214 recent advancements in field of 184–185 supervised vs. unsupervised learning 183 sentiment analysis 32 smart parking meters 142–146 smart speakers 40 smart thermostats 30, 198 smart, internet-connected oven 64–65 speech recognition 31, 120 stock market investments 137 streaming analytics 30 Support-Vector Machines (SVMs) 129, 134 T team dynamics 128–129 technical metrics 17 business metrics vs. 20 escaping into the wild 89 linking to business metrics 19–20, 96–106 need for 97 poorly understood 91–93 presenting directly 88–90 technology smokescreens 91 Tesla 202 three-state, yes/no/maybe classification of results 191–192 threshold 67 timebox approach 191 timing diagrams 188–190 transplant projects 75–76 trend estimates 152–153 U uncanny valley concept 156 unicorns 47–50 acquiring skillsets of 48 data engineers 48–49 data science 47–48 gap analysis 49–50 MinMax analysis 146–147 selecting 101–102, 132, 143 sensitivity analysis 172, 181–183 timing diagrams 189 video surveillance systems 62, 78 ossification of machine learning pipeline 121–123 voice recognition 35, 206 V W vacuum cleaning robots 38, 41–42 value threshold defined 98 evolution of over time 186–187, 189 improving machine learning pipeline over time 187–188 Wikimedia Foundation 197 Z zero-shot learning 205 DATA SCIENCE/AI Succeeding with AI See first page Veljko Krunic Succeeding with AI requires talent, tools, and money.

pages: 144 words: 43,356

Surviving AI: The Promise and Peril of Artificial Intelligence
by Calum Chace
Published 28 Jul 2015

The most obvious example is your smartphone. It is probably the last inanimate thing you touch before you go to sleep at night and the first thing you touch in the morning. It has more processing power than the computers that NASA used to send Neil Armstrong to the moon in 1969. It uses AI algorithms to offer predictive text and speech recognition services, and these features improve year by year as the algorithms are improved. Many of the apps we download to our phones also employ AI to make themselves useful to us. The AI in our phones becomes more powerful with each generation of phone as their processing power increases, the bandwidth of the phone networks improves, cloud storage becomes better and cheaper, and we become more relaxed about sharing enough of our personal data for the AIs to “understand” us better.

Andrey Markov was a Russian mathematician who died in 1922, and in the type of model that bears his name the next step depends only on the current step, and not on any previous steps. A Hidden Markov Model (often abbreviated to HMM because they are so useful) is one where the current state is only partially observable. They are particularly useful in speech recognition and handwriting recognition systems.

Deep learning
Deep learning is a subset of machine learning. Its algorithms use several layers of processing, each taking data from previous layers and passing an output up to the next layer. The nature of the output may vary according to the nature of the input, which is not necessarily binary, just on or off, but can be weighted.
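The Hidden Markov Model this excerpt describes can be made concrete with a short sketch. The states, observations, and probabilities below are invented for illustration (a toy "phoneme" recognizer); the Viterbi algorithm shown is the standard way such models recover the most likely hidden sequence from partially observable data:

```python
# Toy Hidden Markov Model: the hidden states are phonemes the speaker
# intends; the observations are noisy acoustic labels we actually measure.
# All names and probabilities here are invented for illustration.

states = ["s", "t"]
observations = ["hiss", "tap", "tap"]

start_p = {"s": 0.6, "t": 0.4}
trans_p = {"s": {"s": 0.7, "t": 0.3},
           "t": {"s": 0.4, "t": 0.6}}
emit_p = {"s": {"hiss": 0.9, "tap": 0.1},
          "t": {"hiss": 0.2, "tap": 0.8}}

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # V[t][state] = (probability of best path ending in state, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        prev = V[-1]
        V.append({
            s: max(
                ((prev[p][0] * trans_p[p][s] * emit_p[s][o], prev[p][1] + [s])
                 for p in states),
                key=lambda pair: pair[0],
            )
            for s in states
        })
    prob, path = max(V[-1].values(), key=lambda pair: pair[0])
    return path, prob

path, prob = viterbi(observations, states, start_p, trans_p, emit_p)
print(path)  # best guess at the hidden phoneme sequence
```

HMM-era speech recognizers worked the same way at vastly larger scale, with thousands of states and probabilities learned from recorded speech rather than set by hand.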

pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life
by Adam Greenfield
Published 29 May 2017

Chief among these are a multi-core central processing unit; a few gigabits of nonvolatile storage (and how soon that “giga-” will sound quaint); and one or more ancillary chips dedicated to specialized functions. Among the latter are the baseband processor, which manages communication via the phone’s multiple antennae; light and proximity sensors; perhaps a graphics processing unit; and, of increasing importance, a dedicated machine-learning coprocessor, to aid in tasks like speech recognition. The choice of a given chipset will determine what operating system the handset can run; how fast it can process input and render output; how many pictures, songs and videos it can store on board; and, in proportion to these capabilities, how much it will cost at retail. Thanks to its Assisted GPS chip—and, of course, the quarter-trillion-dollar constellation of GPS satellites in their orbits twenty million meters above the Earth—the smartphone knows where it is at all times.

The material form of such speakers is all but irrelevant, though, as their primary job is to function as the physical presence of and portal onto a service—specifically, a branded “virtual assistant.” Google, Microsoft, Amazon and Apple each offer their own such assistant, based on natural-language speech recognition; no doubt further competitors and market entrants will have appeared by the time this book sees print. Almost without exception, these assistants are given female names, voices and personalities, presumably based on research conducted in North America indicating that users of all genders prefer to interact with women.7 Apple’s is called Siri, Amazon’s Alexa; Microsoft, in dubbing their agent Cortana, has curiously chosen to invoke a character from their Halo series of games, polluting that universe without seeming to garner much in return.

And if this sense of effortlessness will never truly be achievable via handset, it is precisely what an emerging class of wearable mediators aims to provide for its users. The first of this class to reach consumers was the ill-fated Google Glass, which mounted a high-definition, forward-facing camera, a head-up reticle and the microphone required by its natural-language speech recognition interface on a lightweight aluminum frame. While Glass posed any number of aesthetic, practical and social concerns—all of which remain to be convincingly addressed, by Google or anyone else—it does at least give us a way to compare hands-free, head-mounted AR with the handset-based approach.

pages: 382 words: 120,064

Bank 3.0: Why Banking Is No Longer Somewhere You Go but Something You Do
by Brett King
Published 26 Dec 2012

UBank has the second-highest Facebook fan-base among the Australian banks, and ironically has more than its parent, NAB. The same can be said for First Direct in the UK, which has fantastic advocacy and is heavily engaged on Twitter, whereas its parent, HSBC, doesn’t yet have a brand Twitter account. In Bank 2.0 I also talked about the use of advances in speech recognition, enabling customers to issue spoken commands. Obviously the next generation of this speech recognition technology can be seen in Apple’s recent launch of Siri. In Siri’s patent application, various possibilities are hinted at, including being a voice agent providing assistance for “automated teller machines”.4 In fact, SRI (the creator of Siri™) and BBVA recently announced a collaboration to introduce Lola5, a Siri-like technology, to customers through the Internet and via voice.

They found the man’s conversation with his phone ‘creepy,’ without any of the natural pauses and voice inflections that occur in a discussion between two people.” —New York Times article, “Oh, for the Good Old Days of Rude Cellphone Gabbers”, 2 December 2011 The same problem presents itself with the use of IVR in the near term. Speech recognition is not yet good enough for natural speech. But clearly it’s getting better, and fast.

Avatars replacing IVRs
The key advantage to integrating natural speech recognition to replace current IVR menus will be that IVRs will start to become more human again. The logical extension of this technology married with avatars is automated customer service representatives that will look and sound like a real person and be able to answer simple “canned” questions and respond to issues such as “What is my account balance?”

pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies
by Nick Bostrom
Published 3 Jun 2014

There are robotic pets and cleaning robots, lawn-mowing robots, rescue robots, surgical robots, and over a million industrial robots.64 The world population of robots exceeds 10 million.65 Modern speech recognition, based on statistical techniques such as hidden Markov models, has become sufficiently accurate for practical use (some fragments of this book were drafted with the help of a speech recognition program). Personal digital assistants, such as Apple’s Siri, respond to spoken commands and can answer simple questions and execute commands. Optical character recognition of handwritten and typewritten text is routinely used in applications such as mail sorting and digitization of old documents.66 Machine translation remains imperfect but is good enough for many applications.

The DART tool for automated logistics planning and scheduling was used in Operation Desert Storm in 1991 to such effect that DARPA (the Defense Advanced Research Projects Agency in the United States) claims that this single application more than paid back their thirty-year investment in AI.68 Airline reservation systems use sophisticated scheduling and pricing systems. Businesses make wide use of AI techniques in inventory control systems. They also use automatic telephone reservation systems and helplines connected to speech recognition software to usher their hapless customers through labyrinths of interlocking menu options. AI technologies underlie many Internet services. Software polices the world’s email traffic, and despite continual adaptation by spammers to circumvent the countermeasures being brought against them, Bayesian spam filters have largely managed to hold the spam tide at bay.
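The Bayesian spam filters mentioned above can be sketched in a few lines. This is a minimal naive Bayes classifier over an invented six-message corpus, not any particular production filter:

```python
# A minimal naive Bayes spam filter in the spirit described above.
# The tiny training corpus below is invented for illustration.
import math
from collections import Counter

spam = ["win cash now", "cheap pills now", "win prizes"]
ham = ["meeting at noon", "lunch at cafe", "project update now"]

def train(docs):
    counts = Counter(w for d in docs for w in d.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(spam)
ham_counts, ham_total = train(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_prob(words, counts, total, prior):
    # Laplace smoothing so unseen words don't zero out the product.
    lp = math.log(prior)
    for w in words:
        lp += math.log((counts[w] + 1) / (total + len(vocab)))
    return lp

def classify(message):
    words = message.split()
    s = log_prob(words, spam_counts, spam_total, 0.5)
    h = log_prob(words, ham_counts, ham_total, 0.5)
    return "spam" if s > h else "ham"

print(classify("win cash prizes"))   # leans spam
print(classify("meeting at cafe"))   # leans ham
```

Log-probabilities are used so that multiplying many small word probabilities does not underflow, and the Laplace smoothing keeps a single unseen word from vetoing a whole class.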

One can readily imagine improved versions of this technology—perhaps a next-generation implant could plug into Broca’s area (a region in the frontal lobe involved in language production) and pick up internal speech.73 But whilst such a technology might assist some people with disabilities induced by stroke or muscular degeneration, it would hold little appeal for healthy subjects. The functionality it would provide is essentially that of a microphone coupled with speech recognition software, which is already commercially available—minus the pain, inconvenience, expense, and risks associated with neurosurgery (and minus at least some of the hyper-Orwellian overtones of an intracranial listening device). Keeping our machines outside of our bodies also makes upgrading easier.

pages: 533

Future Politics: Living Together in a World Transformed by Tech
by Jamie Susskind
Published 3 Sep 2018

AI systems are now close to surpassing humans in their ability to translate natural languages, recognize faces, and mimic human speech.3 Self-driving vehicles using AI are widely expected to become commonplace in the next few years (Ford is planning a mass-market model by 2021).4 In 2016 Microsoft unveiled a speech-recognition AI system that can transcribe human conversation with the same number of errors, or fewer, as professional human transcriptionists.5 Researchers at Oxford University developed an AI system capable of lip-reading with 93 per cent accuracy, as against a 60 per cent success rate among professional lip-readers.6 AI systems can already write articles about sports, business, and finance.7 In 2014, the Associated Press began using algorithms to computerize the production of hundreds of formerly handcrafted earnings reports, producing fifteen times as many as before.8 AI systems have directed films and created movie trailers.9 AI ‘chatbots’ (systems that can ‘chat’ to you) will soon be taking orders at restaurants.10 Ominously, engineers have even built an AI system capable of writing entire speeches in support of a specified political party.11 It’s bad enough that politicians frequently sound like soulless robots; now we have soulless robots that sound like politicians.

Taking one specific set of affordances, there are many people who are currently considered disabled whose freedom of action will be significantly enlarged in the digital lifeworld. Voice-controlled robots will do the bidding of people with limited mobility. Self-driving vehicles will make it easier to get around. Those unable to speak or hear will be able to use gloves that can turn sign language into writing.18 Speech recognition software embedded in ‘smart’ eyewear could allow all sounds—speech, alarms, sirens—to be captioned and read by the wearer.19 Brain interfaces will allow people with communication difficulties to ‘type’ messages to other people using only their thoughts.20 As for freedom of thought, in the short time we’ve lived with digital technology we’ve already witnessed an explosion in the creation and communication of information.

Peter Campbell, ‘Ford Plans Mass-market Self-driving Car by 2021’, Financial Times, 16 August 2016 <https://www.ft.com/content/d2cfc64e-63c0-11e6-a08a-c7ac04ef00aa#axzz4HOGiWvHT> (accessed 28 November 2017); David Millward, ‘How Ford Will Create a New Generation of Driverless Cars’, Telegraph, 27 February 2017 <http://www.telegraph.co.uk/business/2017/02/27/ford-seeks-pioneer-new-generation-driverless-cars/> (accessed 28 November 2017). 5. Wei Xiong et al., ‘Achieving Human Parity in Conversational Speech Recognition’, arXiv, 17 February 2017 <https://arxiv.org/abs/1610.05256> (accessed 28 November 2017). 6. Yannis M. Assael et al., ‘LipNet: End-to-End Sentence-level Lipreading’, arXiv, 16 December 2016 <https://arxiv.org/abs/1611.01599> (accessed 6 December 2017). 7. Laura Hudson, ‘Some Like it Bot’, FiveThirtyEight, 29 September 2016 <http://fivethirtyeight.com/features/some-like-it-bot/> (accessed 28 November 2017). 8.

pages: 509 words: 132,327

Rise of the Machines: A Cybernetic History
by Thomas Rid
Published 27 Jun 2016

A third problem identified by Licklider was the speedy storage and retrieval of vast quantities of information and data. Licklider suspected that graphical interfaces and speech recognition would be highly desirable. A military commander, for instance, would need fast decisions. The notion of a ten-minute war would be overstated, yes, but it would be dangerous to assume that leaders would have more than ten minutes for critical decisions in wartime. Only speech recognition was fast enough as a human-machine interface; an officer in battle or a senior executive in a company could hardly be taken “away from his work to teach him to type,” Licklider quipped.

It would probably take five years, he concluded in 1960, to achieve practically significant speech recognition on a “truly symbiotic level” of real-time man-machine interaction.84 In 1962, Licklider moved on to the Pentagon’s Advanced Research Projects Agency, ARPA. He became the first director of ARPA’s newly founded Information Processing Techniques Office, a research and funding organization tasked to improve military command-and-control systems. At ARPA, Licklider continued to work toward improved man-machine communication. He especially supported university-based research projects working on time-sharing over long distances, just as the air force had done.

[Extraction residue: the back-matter index of Rise of the Machines, flattened from its multi-column layout. Entry matching this search: speech recognition, 146.]

pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists
by Gary Marcus and Jeremy Freeman
Published 1 Nov 2014

The successful phase resetting of neuronal oscillations provides time constants (or optimal temporal integration windows) for parsing and decoding speech signals. It has been shown recently in both behavioral and physiological experiments that eliminating such oscillatory phase-resetting operations compromises speech intelligibility. Such studies connect the neural infrastructure provided by neural oscillations to well-known perceptual challenges in speech recognition. An emerging generalization suggests that acoustic signals must contain an “edge,” that is, an acoustic discontinuity that the listeners use to chunk the signal at the appropriate temporal granularity. Acoustic edges in speech are likely to play an important causal role in the successful perceptual analysis of complex auditory signals, and this type of perceptual analysis is closely linked to the existence and causal force of cortical oscillations.

In the 1990s, journals and conferences were filled with demonstrations that showed how it was supposedly possible to capture simple cognitive and linguistic phenomena in any number of fields (such as models of how children acquired English past-tense verbs). But as Steven Pinker and I showed, the details were rarely correct empirically; more than that, nobody was ever able to turn a neural network into a functioning system for understanding language. Today neural networks have finally found a valuable home—in machine learning, especially in speech recognition and image classification, due in part to innovative work by researchers such as Geoff Hinton and Yann LeCun. But the utility of neural networks as models of mind and brain remains marginal, useful, perhaps, in aspects of low-level perception but of limited utility in explaining more complex, higher-level cognition.

Many neurons specialize in detecting low-level properties of images, and some neurons that are further up the chain of command represent more abstract entities, like faces versus houses, and in some instances, even particular individuals (most notoriously, Jennifer Aniston, in work by Itzhak Fried, Christof Koch, and their collaborators). The “Aniston” cells even seem to respond cross-modally, responding to written words as well as to photographs. Hierarchies of feature detectors have now also found practical application, in the modern-day neural networks that I mentioned earlier, in speech recognition and image classification. So-called deep learning, for example, is a successful machine-learning variation on the theme of hierarchical feature detection, using many layers of feature detectors. But just because some of the brain is composed of feature detectors doesn’t mean that all of it is.

pages: 287 words: 95,152

The Dawn of Eurasia: On the Trail of the New World Order
by Bruno Macaes
Published 25 Jan 2018

As if to confirm my theory, I would be going to the Almaty Opera that night to see a performance of Bizet’s The Pearl Fishers, whose action takes place on the beach in Ceylon and whose main character is a priestess of Brahma called Leila. 5 Chinese Dreams TECHNO ORIENTALISM ‘But you need an algorithm for English and another for Chinese in speech recognition, so machines will have their own national identity.’ My interlocutor smiled. ‘Not at all. The algorithm is pretty much universal.’ ‘What do you mean?’ ‘It learns to recognize speech and the learning process works equally for every language. Feed it the data and it will learn Latin or Sanskrit.

Eventually, the top layer yields the output: a dog match in the example above. In this, machine intelligence comes to resemble the way a large array of neurons works in the human brain. Speech and image recognition are among the most immediate applications of deep learning. Yuanqing told me how Baidu had been able to develop practically infallible speech recognition applications, even if the user chooses to whisper to his device rather than speak. They were now concentrating their efforts on how to apply deep learning to automated driving. Applying it to prediction systems still lies considerably in the future, but the future is getting closer each day.
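The layer-by-layer picture in this excerpt, where each layer transforms the previous layer's output and hands it up until the top layer yields a result, can be sketched as a bare feed-forward pass. The weights below are fixed, invented numbers; in a real deep network they would be learned from data:

```python
# A bare-bones feed-forward pass: each layer takes the previous layer's
# output and passes its own output up to the next layer, as the excerpt
# describes. Weights and biases are invented for illustration.

def relu(x):
    # A common nonlinearity: negative values are clipped to zero, so the
    # layer's output is weighted rather than simply on or off.
    return [max(0.0, v) for v in x]

def dense(inputs, weights, biases):
    # One layer: each output unit is a weighted sum of all inputs plus a bias.
    return [sum(w * i for w, i in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    for weights, biases in layers:
        x = relu(dense(x, weights, biases))
    return x

layers = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]),   # layer 1: 2 inputs -> 2 units
    ([[1.0, 1.0]], [-0.3]),                    # layer 2: 2 units -> 1 output
]
print(forward([1.0, 2.0], layers))
```

Deep networks for speech or image recognition stack many such layers, so early layers pick up low-level patterns and later layers combine them into increasingly abstract ones.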

Be that as it may, the point will serve to illustrate how two scientific civilizations may differ substantially. Patterns and rules dictated by a scientific culture are still dependent on the everyday world from which they are abstracted. If you walk in a Chinese city today, applications from deep learning can be seen all around you. Speech recognition software is so reliable that lots of young people now dictate their university essays. If you take a picture of some object that has caught your fancy, special software can take you directly to a website selling it. If you have a car accident, it is easy to pull out your smartphone, take a photo, and use image recognition to determine the damage and file an insurance claim.

pages: 352 words: 96,532

Where Wizards Stay Up Late: The Origins of the Internet
by Katie Hafner and Matthew Lyon
Published 1 Jan 1996

It could “teach the kids French” and “continue teaching them, while they sleep.” At the advertised price of $4,000, the thing seemed a steal. Phil Karlton of Carnegie-Mellon was the first to alert the Msg-Group, on May 26, 1977. His site on the ARPANET was heavily involved in exploring artificial intelligence, speech recognition, and related research problems, so he knew a thing or two about robots. The android and its inventor had attracted a fair amount of national press attention, most of it favorable. Quasar’s sales pitch had also caught the attention of Consumer Reports, which ran a skeptical item on it in the June issue, just out.

Brian Reid and a colleague, Mark Fox, from the Carnegie-Mellon Artificial Intelligence Lab, posted an offbeat report to everyone in the MsgGroup, giving them a personal account of their inspection of the domestic robot, “Sam Strugglegear,” at a large department store in downtown Pittsburgh. People in the research community, knowing of CMU’s pioneering AI work, had been calling the Lab to ask how it was possible for Quasar’s robot to be so much better at speech recognition than anything CMU had produced. Rising to the challenge, a four-member team from CMU had done the fieldwork. “They found a frightening sight,” reported Reid and Fox. In the men’s department, among the three-piece suits, was a five-feet-two-inch “aerosol can on wheels, talking animatedly” to a crowd.

[Extraction residue: the back-matter index of Where Wizards Stay Up Late, flattened from its multi-column layout; page numbers did not survive extraction. Entries matching this search include “speech recognition”.]

pages: 463 words: 105,197

Radical Markets: Uprooting Capitalism and Democracy for a Just Society
by Eric Posner and E. Weyl
Published 14 May 2018

If simple, early problems have much greater value than later, more complex ones, data will have diminishing value. However, if later, harder problems are more valuable than earlier, easier ones, then data’s marginal value may increase as more data become available. A classic example of this is speech recognition. Early ML systems for speech recognition achieved gains in accuracy more quickly than did later systems. However, a speech recognition system with all but very high accuracy is mostly useless, as it takes so much time for the user to correct the errors it makes. This means that the last few percentage points of accuracy may make a bigger difference for the value of a system than the first 90% does.
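A toy model makes the excerpt's point about the last few percentage points concrete. All the numbers below (typing time, dictation time, time to fix an error) are invented for illustration:

```python
# Toy illustration of the claim above: early accuracy gains in a dictation
# system can be worth nothing, while the final few points are worth the
# most, because below some threshold fixing errors costs more time than
# dictation saves. All constants are invented for illustration.

WORDS = 100           # words per document
TYPING_TIME = 300.0   # seconds to type the document by hand
DICTATE_TIME = 60.0   # seconds to dictate it
FIX_TIME = 30.0       # seconds to find and fix one recognition error

def time_saved(accuracy):
    """Net seconds saved per document; never worse than just typing."""
    errors = (1.0 - accuracy) * WORDS
    return max(0.0, TYPING_TIME - DICTATE_TIME - FIX_TIME * errors)

for lo, hi in [(0.80, 0.85), (0.90, 0.95), (0.95, 1.00)]:
    gain = time_saved(hi) - time_saved(lo)
    print(f"{lo:.0%} -> {hi:.0%}: extra {gain:.0f}s saved per document")
```

In this toy model the break-even point sits at 92 per cent accuracy: below it, corrections eat all the time dictation saves, so each five-point gain is worthless, while the same five points near the top are worth the most.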

pages: 346 words: 97,330

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass
by Mary L. Gray and Siddharth Suri
Published 6 May 2019

The types of micro-tasks available to workers on UHRS are not surprising if you think about the products that Microsoft sells. Workers review voice recordings, rating the sound quality of the recorded clip. They check written text to ensure it’s not peppered with adult content. Another popular task is translation. Microsoft’s strength in speech recognition and machine translation comes from the ghost work of people training algorithms with accurate data sets. They create them by listening to short audio recordings of one sentence in one language, typically English, and entering the translation of the sentence in their mother tongue in an Excel file.

“You’re just basically hitting two keys to either start or stop your subtitle to sync it up with the video. It’s a great program.” You could say that it’s as easy as playing a video game. Automatically recognizing and translating language looks easy in some ways because people are accustomed to the everyday nature of tools like Siri, Cortana, and Alexa. Automating human speech recognition and translation is a fundamental part of artificial intelligence that grew into a field called natural language processing. Natural language processing was helped immensely by the internet’s capacity to amass tons of examples of people writing and speaking in various languages. Yet capturing dialogue in video, particularly action scenes that change the mood and meaning of an actor’s words, remains a difficult task for a computer program to understand, let alone translate into different languages.

[Flattened book index omitted; the matching entry is “speech recognition, 30”.]

Four Battlegrounds
by Paul Scharre
Published 18 Jan 2023

The Chinese surveillance industry continues to grow at a breakneck pace, expanding by an estimated 18 percent in 2019. Chinese firms participating in domestic surveillance are top-tier global technology companies. Hikvision specializes in surveillance cameras; telecom providers Huawei and ZTE in data storage and system integration; Alibaba in cloud computing services and big data analysis; iFLYTEK in speech recognition; SenseTime and SenseNets in facial recognition; and Huawei subsidiary HiSilicon in chips. Huawei, SenseTime, and Megvii have denied their technology is used to profile or target groups. Yet all eight technology companies are suppliers for Skynet or Sharp Eyes. Chinese facial recognition firms are some of the best in the world.

Inside, we were shown a small museum of iFLYTEK AI wonders: voice cloning, speech-to-text, text-to-speech, real-time language translation, and more. iFLYTEK is a global leader in AI voice technology. In 2017, they were named by MIT Technology Review the sixth smartest company in the world. In 2018, they won twelve awards in international AI and speech recognition competitions. iFLYTEK representatives gave a tour of the many applications for their AI systems, including in business, education, and health care. Their goal was to “enable machines to listen & speak, understand & think.” iFLYTEK had a voice-activated smart home system, like Amazon’s Alexa.

Lin spoke positively of iFLYTEK’s relationship with the government, saying that the company has joint labs with the government and that the government encourages innovation and likes iFLYTEK. Likewise, he said that the company advises the government on regulations and policy. Lin was very open about their partnership with the local government in Xinjiang on voice translation. He said that in 2011 they began a partnership with Xinjiang University to amass data on speech recognition for the Uighur language and, as a result of this partnership, had built a translation service to help Uighurs learn Chinese. He said that in Xinjiang the government was a “big customer.” Independent investigations of iFLYTEK operations in Xinjiang tell a more sinister story. iFLYTEK subsidiaries have signed deals to sell “voiceprint” collection systems to Xinjiang police and a “strategic cooperation framework agreement” with the Xinjiang prison administration bureau, according to investigations by Reuters and Human Rights Watch.

pages: 199 words: 56,243

Trillion Dollar Coach: The Leadership Playbook of Silicon Valley's Bill Campbell
by Eric Schmidt , Jonathan Rosenberg and Alan Eagle
Published 15 Apr 2019

There was the time AT&T offered to pay tens of millions of dollars to license Tellme’s software. Tellme made the first cloud-based speech recognition platform for large businesses and provided the service that answered the phones when you called companies like FedEx, Fidelity, and American Airlines. The problem with the AT&T offer was that they wanted to create a competitive product to Tellme’s; in fact, the offer was contingent upon Tellme getting out of the cloud speech recognition business altogether. Oh, and if the deal didn’t happen, AT&T, who was at the time Tellme’s largest customer, would pull all of its business.

pages: 215 words: 59,188

Seriously Curious: The Facts and Figures That Turn Our World Upside Down
by Tom Standage
Published 27 Nov 2018

Then a new approach emerged, based on machine learning – a technique in which computers are trained using lots of examples, rather than being explicitly programmed. For speech recognition, computers are fed sound files on the one hand, and human-written transcriptions on the other. The system learns to predict which sounds should result in what transcriptions. In translation, the training data are source-language texts and human-made translations. The system learns to match the patterns between them. One thing that improves both speech recognition and translation is a “language model” – a bank of knowledge about what (for example) English sentences tend to look like.
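The “language model” the excerpt mentions — a bank of knowledge about what English sentences tend to look like — can be sketched as a toy bigram scorer learned purely from example text. The corpus, the add-one smoothing, and the candidate transcriptions below are invented for illustration:

```python
from collections import Counter

# A tiny training corpus standing in for "lots of examples" of English.
corpus = ("it is hard to recognize speech "
          "systems that recognize speech need data "
          "we went to the beach").split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def score(sentence):
    """Product of smoothed bigram frequencies — higher means more English-like."""
    words = sentence.split()
    s = 1.0
    for a, b in zip(words, words[1:]):
        # Add-one smoothing so unseen pairs get a small nonzero probability.
        s *= (bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams))
    return s

# Two acoustically similar candidate transcriptions; the language model
# prefers the one that looks like sentences it has seen.
candidates = ["recognize speech", "wreck a nice speech"]
best = max(candidates, key=score)
```

A real recognizer combines a score like this with an acoustic model's guess about which sounds were uttered; the language model's job is just to tilt the choice toward word sequences that actually occur in the language.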

pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
by Eric Siegel
Published 19 Feb 2013

One must phrase questions in a simple form, since WolframAlpha is designed first to compute answers from tables of data, and only secondarily to attempt to handle complicated grammar. Siri processes spoken inquiries, whereas Watson processes transcribed questions. Researchers generally approach processing speech (speech recognition) as a separate problem from processing text. There is more room for error when a system attempts to transcribe spoken language before also interpreting it, as Siri does. Siri includes a dictionary of humorous canned responses. If you ask Siri about its origin with, “Who’s your daddy?” it will respond, “I know this must mean something . . . everybody keeps asking me this question.”

“Oh dear,” says God, “I hadn’t thought of that,” and promptly disappears in a puff of logic. AI faces analogous self-destruction because, once you get a computer to do something, you’ve necessarily trivialized it. We conceive of as yet unmet “intelligent” objectives that appear big, impressive, and unwieldy, such as transcribing the spoken word (speech recognition) or defeating the world chess champion. They aren’t easy to achieve, but once we do pass such benchmarks, they suddenly lose their charm. After all, computers can manage only mechanical tasks that are well understood and well specified. You might be impressed by its lightning-fast speed, but its electronic execution couldn’t hold any transcendental or truly humanlike qualities.

Apparently, they’ve predicted you’re going to switch to a competitor, because they are offering a huge discount on the iPhone 13. 7. Internet search. As it’s your colleague’s kid’s birthday, you query for a toy store that’s en route. Siri, available through your car’s audio, has been greatly improved—better speech recognition and proficiently tailored interaction. 8. Driver inattention. Your seat vibrates as internal sensors predict your attention has wavered—perhaps you were distracted by a personalized billboard a bit too long. 9. Collision avoidance. A stronger vibration plus a warning sound alert you to a potential imminent collision—possibly with a child running toward the curb or another car threatening to run a red light. 10.

pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations
by Nicholas Carr
Published 5 Sep 2016

The patent, as Amazon describes it, covers “a hybrid machine/human computing arrangement which advantageously involves humans to assist a computer to solve particular tasks, allowing the computer to solve the tasks more efficiently.” It specifies several applications of such a system, including speech recognition, text classification, image recognition, image comparison, speech comparison, transcription of speech, and comparison of music samples. Amazon also notes that “those skilled in the art will recognize that the invention is not limited to the embodiments described.” The patent goes into great detail about how the system might work in evaluating the skills and performance of the “human-operated nodes.”

The company’s new artificial-intelligence algorithms “decide for themselves which features of data to pay attention to, and which patterns matter, rather than having humans decide that, say, colors and particular shapes are of interest to software trying to identify objects.” Google has begun applying its neural nets to speech-recognition and image-recognition tasks. And, according to one of the company’s engineers, Jeff Dean, the technology can already outperform people at some jobs. “We are seeing better than human-level performance in some visual tasks,” [Dean] says, giving the example of labeling, where house numbers appear in photos taken by Google’s Street View car, a job that used to be farmed out to many humans.

[Flattened book index omitted; the matching entry is “speech recognition, 137”.]

pages: 918 words: 257,605

The Age of Surveillance Capitalism
by Shoshana Zuboff
Published 15 Jan 2019

The company describes itself “at the forefront of innovation in machine intelligence,” a term in which it includes machine learning as well as “classical” algorithmic production, along with many computational operations that are often referred to with other terms such as “predictive analytics” or “artificial intelligence.” Among these operations Google cites its work on language translation, speech recognition, visual processing, ranking, statistical modeling, and prediction: “In all of those tasks and many others, we gather large volumes of direct or indirect evidence of relationships of interest, applying learning algorithms to understand and generalize.”9 These machine intelligence operations convert raw material into the firm’s highly profitable algorithmic products designed to predict the behavior of its users.

As Urs Hölzle, Google’s senior vice president of technical infrastructure, put it, “The dirty secret behind [AI] is that they require an insane number of computations to just actually train the network.” If the company had tried to process the growing computational workload with traditional CPUs, he explained, “We would have had to double the entire footprint of Google—data centers and servers—just to do three minutes or two minutes of speech recognition per Android user per day.”27 With data center construction as the company’s largest line item and power as its highest operating cost, Google invented its way through the infrastructure crisis. In 2016 it announced the development of a new chip for “deep learning inference” called the tensor processing unit (TPU).

In 2004 he asserted that cell phones and other wearable devices with “computational horsepower” would provide the “foundation” for reality mining as an “exciting new suite of business applications.” The idea was always that businesses could use their privileged grasp of “reality” to shape behavior toward maximizing business objectives. He describes new experimental work in which speech-recognition technology generated “profiles of individuals based on the words they use,” thus enabling a manager to “form a team of employees with harmonious social behavior and skills.”17 In their 2006 article, Pentland and Eagle explained that their data would be “of significant value in the workplace,” and the two jointly submitted a patent for a “combined short range radio network and cellular telephone network for interpersonal communications” that would add to the stock of instruments available for businesses to mine reality.18 Eagle told Wired that year that the reality mining study represented an “unprecedented data set about continuous human behavior” that would revolutionize the study of groups and offer new commercial applications.

pages: 561 words: 157,589

WTF?: What's the Future and Why It's Up to Us
by Tim O'Reilly
Published 9 Oct 2017

Social media platforms like YouTube, Facebook, Twitter, Instagram, and Snapchat all gain their power by aggregating the contributions of billions of users. When people asked me what came after Web 2.0, I was quick to answer “collective intelligence applications driven by data from sensors rather than from people typing on keyboards.” Sure enough, advances in areas like speech recognition and image recognition, real-time traffic and self-driving cars, all depend on massive amounts of data harvested from sensors on connected devices. The current race in autonomous vehicles is a race not just to develop new algorithms, but to collect larger and larger amounts of data from human drivers about road conditions, and ever-more-detailed maps of the world created by millions of unwitting contributors.

It was Sunil Paul’s efforts to get the California Public Utilities Commission to accept the model that made it thinkable. Lyft jumped on the opportunity. Uber eventually followed. A more recent demonstration of how old thinking holds back even smart entrepreneurs is how long it took for the Amazon Echo to arrive, given that speech recognition has been a feature of smartphones since the 2011 launch of Apple’s Siri intelligent agent. Yet it was Amazon’s Alexa, not Siri or Google, that brought a seemingly minor change that made all the difference: Alexa was the first smart agent always listening to your commands without the need to first touch a button.

In their 2009 paper, “The Unreasonable Effectiveness of Data” (an homage in its title to Eugene Wigner’s classic 1960 talk, “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”), Google machine learning researchers Alon Halevy, Peter Norvig, and Fernando Pereira explained the growing effectiveness of statistical methods in solving previously difficult problems such as speech recognition and machine translation. Much of the previous work had been grammar based. Could you construct what was in effect a vast piston engine that used its knowledge of grammar rules to understand human speech? Success had been limited. But that changed as more and more documents came online. A few decades ago, researchers relied on carefully curated corpora of human speech and writings that, at most, contained a few million words.

pages: 228 words: 65,953

The Six-Figure Second Income: How to Start and Grow a Successful Online Business Without Quitting Your Day Job
by David Lindahl and Jonathan Rozek
Published 4 Aug 2010

You have three options for taking that recorded content and making it usable. First, you can simply listen to it and write down the sections you find the most helpful. I’m not a fan of that method but you may be. Second, you can use a tool like Dragon Naturally Speaking to have the computer transcribe your words. Speech-recognition software seems to be getting better by the month. What formerly was a pretty cumbersome and inaccurate process of using software to create transcripts has now become fairly workable. With the recent generation of software, the more you train it to understand your voice the more accurate it becomes.

See also Live events boot camps live tours and lunch/dinner one-day teleseminars/webinars videos of Shopping carts Size of business Slide charts Snagit software Snowball microphone Social media. See also Blogs Software Camtasia as content delivery tool domain name permutation software Dragon Naturally Speaking FTP tools HTML editing tools iPhone applications Joomla Microsoft Word MindManager shopping cart software Snagit speech-recognition software trial software Spam Special reports Specificity of claims Success advice on conventional approach to effort needed for excuses for lack of false barriers to implementation leading to proof of real dangers to Sullivan, Anthony “Sully,” Target market Techsmith.com Telephone services consulting hotlines customer contact by telephone iPhone applications teleseminars/webinars toll-free 24/7 recorded lines Templates, web site Terminology.

pages: 259 words: 67,456

The Mythical Man-Month
by Brooks, Jr. Frederick P.
Published 1 Jan 1975

Most of the work is problem-specific, and some abstraction or creativity is required to see how to transfer it.5 I agree completely with this critique. The techniques used for speech recognition seem to have little in common with those used for image recognition, and both are different from those used in expert systems. I have a hard time seeing how image recognition, for example, will make any appreciable difference in programming practice. The same is true of speech recognition. The hard thing about building software is deciding what to say, not saying it. No facilitation of expression can give more than marginal gains. Expert systems technology, AI-2, deserves a section of its own.

pages: 205 words: 20,452

Data Mining in Time Series Databases
by Mark Last , Abraham Kandel and Horst Bunke
Published 24 Jun 2004

Another, more flexible way to describe similar but out-of-phase sequences is Dynamic Time Warping (DTW) [38]. Berndt and Clifford [5] were the first to introduce this measure to the data-mining community. Recent works on DTW are [27,31,45]. DTW was first used to match signals in speech recognition [38]. DTW between two sequences A and B is defined by the following recursion: DTW(A, B) = Dbase(am, bn) + min{DTW(Head(A), Head(B)), DTW(Head(A), B), DTW(A, Head(B))}, where Dbase is some Lp norm and Head(A) of a sequence A consists of all the elements of A except the last one, am.
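The recursion quoted above translates directly into code. A minimal sketch (illustrative, not from the paper), taking Dbase as the L1 distance between the last elements and memoizing the recursion on prefix lengths:

```python
from functools import lru_cache

def dtw(a, b):
    """Dynamic Time Warping distance between numeric sequences a and b,
    following the recursive definition above with an L1 base distance."""
    @lru_cache(maxsize=None)
    def d(i, j):
        # i, j are the lengths of the prefixes of a and b being compared
        if i == 0 and j == 0:
            return 0.0
        if i == 0 or j == 0:
            return float("inf")  # a non-empty sequence cannot match an empty one
        base = abs(a[i - 1] - b[j - 1])          # Dbase(a_m, b_n)
        return base + min(d(i - 1, j - 1),        # both heads advance
                          d(i - 1, j),            # a advances (warp)
                          d(i, j - 1))            # b advances (warp)
    return d(len(a), len(b))
```

Because the warping steps let one sequence dwell on an element while the other advances, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0: the repeated 2 is absorbed at no cost, which is exactly the out-of-phase tolerance the excerpt describes.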

Keywords: String distance; set median string; generalized median string; online handwritten digits. 1. Introduction Strings provide a simple and yet powerful representation scheme for sequential data. In particular time series can be effectively represented by strings. Numerous applications have been found in a broad range of fields including computer vision [2], speech recognition, and molecular biology [13,34]. 173 174 X. Jiang, H. Bunke and J. Csirik A large number of operations and algorithms have been proposed to deal with strings [1,5,13,34,36]. Some of them are inherent to the special nature of strings such as the shortest common superstring and the longest common substring, while others are adapted from other domains.
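The string operations the excerpt surveys build on a string-distance measure. A minimal sketch of the standard Levenshtein edit distance (an illustration, not code from the book), using a single rolling row of the dynamic-programming table:

```python
def edit_distance(s, t):
    """Levenshtein distance: the minimum number of insertions,
    deletions, and substitutions needed to turn string s into t."""
    m, n = len(s), len(t)
    # dp[j] holds the distance between the current prefix of s and t[:j]
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i   # prev is the diagonal cell dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # delete s[i-1]
                        dp[j - 1] + 1,                      # insert t[j-1]
                        prev + (s[i - 1] != t[j - 1]))      # substitute or match
            prev = cur
    return dp[n]
```

For example, `edit_distance("kitten", "sitting")` is 3 (two substitutions and one insertion); a median string minimizes the sum of such distances to every string in the set.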

Cartesian Linguistics
by Noam Chomsky
Published 1 Jan 1966

Hall (Washington: American Physiological Society, 1960), vol III, chap. LXV. [Scientific research on perception since 1966 continues this theme; the literature is now massive. Chomsky sometimes refers to Marr 1981.] 124. For discussion and references in the areas of phonology and syntax respectively, see M. Halle and K. N. Stevens, “Speech Recognition: A Model and a Program for Research,” in Fodor and Katz (eds.), op. cit.; and G. A. Miller and N. Chomsky, “Finitary Models of Language Users,” part 2, in Handbook of Mathematical Psychology, ed. R. D. Luce, R. Bush, and E. Galanter (New York: John Wiley, 1963), vol. II. Bibliography: Aarslef, H.

Gregoire, ‘Petit traité de linguistique,’” Revue des langues romanes, vol. 60, 1920. ———. Traité de phonétique, Delagrave, Paris, 1933. Gunderson, K. “Descartes, La Mettrie, Language and Machines,” Philosophy, vol. 39, 1964. Gysi, L. Platonism and Cartesianism in the Philosophy of Ralph Cudworth. Herbert Lang, Bern, 1962. Halle, M., and K. N. Stevens. “Speech Recognition: A Model and a Program for Research,” in Fodor and Katz, Structure of Language. Harnois, G. “Les théories du langage en France de 1660 à 1821,” Études Françaises, vol. 17, 1929. Harris, J. Works, ed. Earl of Malmesbury. London, 1801. Harris, Z. S. “Co-occurrence and Transformation in Linguistic Structure,” Language, vol. 33, 1957. pp. 283–340.

pages: 1,201 words: 233,519

Coders at Work
by Peter Seibel
Published 22 Jun 2009

In the process of going around to these service bureaus, I wound up at the CDC service bureau in Stanford industrial park—typically you're working late at night because that's when it was less expensive—there was another guy there who had a Fortran program to do speech recognition. He had various speech samples and his program analyzed the spectra and grouped the phonemes and stuff like that. I started talking to him and I said, “Well, jeez, you want to run my program on yours?” So we did that and parted company. He called me up a couple of weeks later and said, “I've been hired by Xerox to do a speech-recognition project and I've got no one to help me with the nitty-gritty; would you like to work with me?” So I started consulting with him. That was George White, who went on for a long time to do speech recognition. That's how I got in with Xerox and also with Alan Kay, because it turned out that my office was across the hall from Alan's and I kept hearing conversations that I was more interested in than speech recognition.

That's how I got in with Xerox and also with Alan Kay, because it turned out that my office was across the hall from Alan's and I kept hearing conversations that I was more interested in than speech recognition. Seibel: Was the domain of speech recognition not that interesting or was it something about the programming involved? Ingalls: Oh, it was interesting—it was fascinating. I ended up building up a whole personal-computing environment on this Sigma 3 minicomputer. It used card decks and Fortran was the main thing I had to work with. Out of that I built an interactive environment. I wrote a text editor in Fortran and then something so we could start submitting stuff remotely from a terminal.

pages: 525 words: 116,295

The New Digital Age: Transforming Nations, Businesses, and Our Lives
by Eric Schmidt and Jared Cohen
Published 22 Apr 2013

All digital platforms will forge a common policy with respect to dangerous extremist videos online, just as they have coalesced in establishing policies governing child pornography. There is a fine line between censorship and security, and we must create safeguards accordingly. The industry will work as a whole to develop software that more effectively identifies videos with terrorist content. Some in the industry may even go so far as employing speech-recognition software that registers strings of keywords, or facial-recognition software that identifies known terrorists. Terrorism, of course, will never disappear, and it will continue to have a destructive impact. But as the terrorists of the future are forced to live in both the physical and the virtual world, their model of secrecy and discretion will suffer.

With registration and specialized platforms to address these concerns, IDPs will be able to receive alerts, navigate their new environment, and receive supplies and benefits from international aid organizations on the scene. Facial-recognition software will be heavily used to find lost or missing persons. With speech-recognition technology, illiterate users will be able to speak the names of relatives and the database will report if they are in the camp system. Online platforms and mobile phones will allow refugee camps to classify and organize their members according to their skills, backgrounds and interests. In today’s refugee camps, there are large numbers of people with relevant and needed skills (doctors, teachers, soccer coaches) whose participation is only leveraged in an ad hoc manner, mobilized slowly through word-of-mouth networks throughout the camps.

Sarkozy, Nicolas satellite positioning Saud, Alwaleed bin Talal al- Saudi Arabia, 2.1, 2.2, 3.1, 4.1, 6.1 “Saudi People Demand Hamza Kashgari’s Execution, The” (Facebook group) Save the Children scale effects Schengen Agreement Scott-Railton, John search-engine optimization (SEO), n secession movements secure sockets layer (SSL) security, 2.1, 2.2, 2.3, 2.4 in autocracies censorship and company policy on, 2.1, 2.2 privacy vs., itr.1, 5.1, 5.2 in schools selective memory self-control self-driving cars, itr.1, 1.1, 1.2 September 11, 2001, terrorist attacks of, 3.1, 5.1 Serbia, 4.1, 6.1 servers Shafik, Ahmed shanzhai network, 1.1 sharia Shia Islam Shia uprising Shiites Shock Doctrine, The (Klein), 7.1n short-message-service (SMS) platform, 4.1, 7.1 Shukla, Prakash Sichuan Hongda SIM cards, 5.1, 5.2, 5.3, 6.1, 6.2, nts.1 Singapore, 2.1, 4.1 Singer, Peter, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8 singularity SkyGrabber Skype, 2.1, 2.2, 2.3, 3.1, 5.1 sleeping rhythms Slim Helú, Carlos smart phones, itr.1, 1.1, 1.2, 5.1, 5.2, 7.1 in failed states peer-to-peer capability on Snapchat Snoad, Nigel social networking, 2.1, 4.1, 5.1 social-networking profiles social prosthetics social robots “socioeconomically at risk” people Solidarity Somalia, 2.1, 5.1, 5.2, 5.3, 6.1n, 210, 7.1, 7.2, 7.3 Sony South Africa, 4.1, 7.1 South Central Los Angeles Southern African Development Community (SADC) South Korea, 3.1, 3.2 South Sudan Soviet Union, 4.1, 6.1 Spain Speak2Tweet Special Weapons Observation Reconnaissance Detection System (SWORDS), 6.1, 6.2 speech-recognition technology spoofing Spotify Sputnik spyware, 3.1, 6.1 Stanford University statecraft State Department, U.S., 5.1, 7.1 states: ambition of future of Storyful, n Strategic Arms Limitation Talks (SALT) Stuxnet worm, 3.1, 3.2 suborbital space travel Sudan suggestion engines Summit Against Violent Extremism Sunni Web supersonic tube commutes supplements supply chains Supreme Council of the Armed Forces (SCAF) surveillance cameras 
Sweden switches Switzerland synthetic skin grafts Syria, 2.1, 3.1, 4.1, 4.2 uprising in Syrian Telecommunications Establishment tablets, 1.1, 1.2, 7.1 holographic Tacocopter Tahrir Square, 4.1, 4.2, 4.3 Taiwan Taliban, 2.1, 5.1, 7.1 TALON Tanzania technology companies, 2.1, 3.1 Tehran Telecom Egypt telecommunications, reconstruction of telecommunications companies Télécoms Sans Frontières television terrorism, terrorists, 4.1, 5.1, con.1 chat rooms of connectivity and cyber, 3.1n, 153–5, 5.1 hacking by Thailand Thomson Reuters Foundation thought-controlled robotic motion 3-D printing, 1.1, 2.1, 2.2, 5.1 thumbprints Tiananmen Square protest, 3.1, 4.1 Tibet time zones tissue engineers to-do lists Tor service, 2.1, 2.2, 2.3, 3.1, 5.1n Total Information Awareness (TIA) trade transmission towers transparency, 2.1, 4.1 “trespass to chattels” tort, n Trojan horse viruses, 2.1, 3.1 tsunami Tuareg fighters Tumblr Tunisia, 4.1, 4.2, 4.3, 4.4, 4.5 Turkey, 3.1, 3.2, 4.1, 5.1, 6.1 Tutsis Twa Twitter, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.1, 3.2, 4.1, 4.2, 5.1, 5.2, 6.1, 7.1, 7.2, nts.1 Uganda Uighurs, 3.1, 6.1 Ukraine unemployment UNESCO World Heritage Centre unique identification (UID) program United Arab Emirates, 2.1, 2.2, 2.3 United Kingdom, 2.1, 2.2, 2.3, 3.1 United Nations, 4.1, 5.1, 6.1, 7.1 United Nations Security Council, 3.1n, 214, 7.1 United Russia party United States, 3.1, 3.2, 3.3, 4.1, 5.1, 7.1 engineering sector in United States Agency for International Development (USAID) United States Cyber Command (USCYBERCOM) unmanned aerial vehicles (UAVs), 6.1, 6.2, 6.3, 6.4, 6.5 Ürümqi riots user-generated content Ushahidi vacuuming, 1.1, 1.2 Valspar Corporation Venezuela, 2.1, 2.2, 6.1 verification video cameras video chats video games videos Vietcong Vietnam vigilantism violence virtual espionage virtual governance virtual identities, itr.1, 2.1, 2.2 virtual juvenile records virtual kidnapping virtual private networks (VPNs), 2.1, 3.1 virtual reality virtual statehood 
viruses vitamins Vodafone, 4.1, 7.1 Vodafone/Raya voice-over-Internet-protocol (VoIP) calls, 2.1, 5.1 voice-recognition software, 1.1, 2.1, 5.1 Voilà VPAA statute, n Walesa, Lech walled garden Wall Street Journal, 97 war, itr.1, itr.2, 6.1 decline in Wardak, Abdul Rahim warfare: automated remote warlords, 2.1, 2.2 Watergate Watergate break-in Waters, Carol weapons of mass destruction wearable technology weibos, 62 Wen Jiabao Wenzhou, China West Africa whistle-blowers whistle-blowing websites Who Controls the Internet?

pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence
by John Brockman
Published 5 Oct 2015

A TURNING POINT IN ARTIFICIAL INTELLIGENCE STEVE OMOHUNDRO Scientist, Self-Aware Systems; cofounder, Center for Complex Systems Research, University of Illinois Last year appears to have been a turning point for AI and robotics. Major corporations invested billions of dollars in these technologies. AI techniques, like machine learning, are now routinely used for speech recognition, translation, behavior modeling, robotic control, risk management, and other applications. McKinsey predicts that these technologies will create more than $50 trillion of economic value by 2025. If this is accurate, we should expect dramatically increased investment soon. The recent successes are being driven by cheap computer power and plentiful training data.

After thirty years of research, a million-times improvement in computer power, and vast data sets from the Internet, we now know the answer to this question: Neural networks scaled up to twelve layers deep, with billions of connections, are outperforming the best algorithms in computer vision for object recognition and have revolutionized speech recognition. It’s rare for any algorithm to scale this well, which suggests that they may soon be able to solve even more difficult problems. Recent breakthroughs have been made that allow the application of deep learning to natural-language processing. Deep recurrent networks with short-term memory were trained to translate English sentences into French sentences at high levels of performance.

More flexibility means a greater ability to capture the patterns appearing in data but a greater risk of finding patterns that aren’t there. In artificial intelligence research, this tension between structure and flexibility manifests in different kinds of systems that can be used to solve challenging problems like speech recognition, computer vision, and machine translation. For decades, the systems that performed best on those problems came down on the side of structure: They were the result of careful planning, design, and tweaking by generations of engineers who thought about the characteristics of speech, images, and syntax and tried to build into the system their best guesses about how to interpret those particular kinds of data.

pages: 580 words: 125,129

Androids: The Team That Built the Android Operating System
by Chet Haase
Published 12 Aug 2021

But in high school, he realized he could just do programming as his main focus. So he changed direction and went to college in computer science, getting his master’s degree in 1998. Debajit spent a few years working on speech recognition, which combined a growing interest in mobile with the ability for users to get information on the go. In 2005, a colleague went to Google to form a speech recognition group. He reached out to Debajit to see if he was interested in coming to Google to work on mobile technology. At first, Debajit wasn’t interested, thinking, “Google? I don’t want to work at Google — it’s way too big of a company.”

Hiroshi Lockheimer was crucial in partnership relationships as well, in project-managing the development of Android on partner devices. Rich Miner, another Android co-founder, had come from Orange Telecom, where he had worked with carriers and had run a venture fund that invested in mobile and platform companies (including Danger). In addition to managing the engineering teams working on the Android browser and speech recognition, Rich was part of the business team that helped make the deal for the Motorola Droid, along with Hiroshi and Tom Moss. Tom Moss and the Business Deals Tom Moss worked on many of the key business deals in early Android, but he didn’t come into Google on the business side. “I actually started on Android as a lawyer.

pages: 297 words: 77,362

The Nature of Technology
by W. Brian Arthur
Published 6 Aug 2009

The first and most basic one is that a technology is a means to fulfill a human purpose. For some technologies—oil refining—the purpose is explicit. For others—the computer—the purpose may be hazy, multiple, and changing. As a means, a technology may be a method or process or device: a particular speech recognition algorithm, or a filtration process in chemical engineering, or a diesel engine. It may be simple: a roller bearing. Or it may be complicated: a wavelength division multiplexer. It may be material: an electrical generator. Or it may be nonmaterial: a digital compression algorithm. Whichever it is, it is always a means to carry out a human purpose.

INDEX accounting, 85, 153, 197 agriculture, 10, 25, 154, 196 air inlet system, 40, 41 Airbus, 91 aircraft, 7, 10, 22, 182 design of, 72–73, 77, 91, 92–94, 108, 111–12, 120, 133, 136–37 detection of, 22, 39, 49, 73–74, 132 navigation and control of, 25, 30, 72–73, 93–94, 96, 108, 111–12, 132, 206 people and cargo processed by, 30, 32, 92–94 piston-and-propeller, 108, 111, 113, 120, 140–41 propulsion of, 108, 111–12, 120 radar surveillance, 41 stealth, 39–42 see also jet engines; specific aircraft aircraft carriers, 39–42 air traffic control, 132 algorithms, 6, 24, 25, 50, 53, 55, 80, 167, 178, 180–81, 206 digital compression, 28 sorting, 17, 30–31, 98 speech recognition, 28 text-processing, 153 altruism, 142 amplifiers, 69, 83, 167–68 analog systems, 71 anatomy, 13, 14, 32, 43 animals, 9, 53 bones and organs of, 13, 45, 187 genus of, 13 natural selection among, 16 see also vertebrates; specific animals archaeology, 45–46, 88 archaeomagnetic dating, 45 Architectural Digest, 175 architecture, 10, 32, 35, 41–42, 71, 73, 79, 81, 84, 98, 101, 116, 212–13 arithmetic, 81, 108, 125, 182 Armstrong oscillator, 102, 130 Arpanet, 156 artificial intelligence, 12, 215 arts, 15, 72, 77, 79 see also music; painting; poetry Astronomical Society, 74 astronomy, 47–50, 74 Atanasoff-Berry machine, 87 Atomic Energy Commission, U.S., 104 atomic power, 10, 24, 80, 103–5, 114–15, 160, 200 automobiles, 2, 10, 176, 180 autopoiesis, 2–3, 21, 24, 59, 167–70, 188 Babbage, Charles, 74, 75, 126 bacteria, 10, 119, 148, 207 banking, 149, 153–55, 192, 201, 209 bar-codes, 48 barges, 81–83 barometers, 47 batteries, 58, 59, 63 Bauhaus architecture, 212 beekeeping, 25 Bernoulli effect, 52 Bessemer process, 14, 75, 152, 153 biochemistry, 61, 119–20, 123–24, 147 biology, 10, 13, 16, 17, 18, 53, 54, 147–48, 187–88 evolution and, 13, 16, 107, 127–28, 188, 204 molecular, 147, 161, 188 technology and, 28, 61, 206–8 BIOS chip, 13 Black, Fischer, 154 black-bellied plover (pluvialis squatarola), 31 black box 
concept, 14, 18, 178 blacksmithing, 180 Boeing 737, 96 Boeing 747, 92–94, 109 Boeing 787, 32 bones, 13, 45, 187–88 Boot, Henry, 113 bows, 171 Boyer, Herbert, 148 brain: imaging of, 10 implanting electrodes in, 9 mental processes of, 9, 23, 56, 97, 112, 121–22, 193 parts of, 9, 10, 56, 208 bridges, 29, 109, 150 cable-stayed, 31, 70, 91 concrete, 99–100 bridging technologies, 83–84 bronze, 185 Brown, John Seely, 210 buildings, 47 design and construction of, 10, 71, 72 business, 54, 148, 149, 192, 205 practices of, 80–81, 83, 153, 157, 158–59, 209 Butler, Paul, 47–48, 49–50 Butler, Samuel, 16, 17 cables, 31, 70, 91 fiber optic, 69, 83 calculating devices, 74 canals, 81–83, 85, 150, 192 canoes, 16, 171 capacitors, 59, 69, 169 carbon-14, 45 carrier: air wing, 40, 42 battlegroup, 40–41 Cathcart, Brian, 160 cathedrals, 10 cathode-ray tubes, 57, 59 Cavendish Laboratory, 160 cavity magnetron, 113 Chain, Ernst, 120 Chargaff, Erwin, 77 chemistry, 25, 57, 66, 69, 159, 202, 205 industrial, 75, 162, 171 polymer, 162 Chicago Board of Trade, 156 “chunking,” 36–37, 50 clocks, 33, 36, 38, 49, 158, 198 atomic, 24, 206 cloning, 70 cloud chamber, 61 coal, 82, 83 Cockburn, Lord, 149 Cohen, Stanley, 148 combustion systems, 17, 19, 34, 50, 52, 53, 120 common sense, 65 communication, 66, 78 see also language; telecommunications compressors, 18–19, 34, 51–52, 65, 136–37, 168 computers, 10, 28, 33, 64, 71–73, 75, 80–81, 82, 85, 96, 153–55, 181–83, 203 evolution of, 87, 108–9, 125–26, 146, 150–51, 159, 168–69, 171 intrinsic capabilities of, 88–89 operating systems of, 12–13, 34–35, 36, 72–73, 79–80, 88, 108–9, 150, 156 programming of, 34–35, 53, 71, 88–89 see also algorithms; Internet computer science, 38, 98 concrete, 10, 73, 99–100 contracts, 54, 55, 153–54, 193, 201 derivatives, 154–55, 209 cooling systems, 103–4, 134–35 Copernicus, Nicolaus, 61 copper, 9, 58 cotton, 139, 196 Crick, Francis, 58, 61 Crooke’s tube, 57 Cuvier, Georges, 13 cyclotron, 115, 131 Darwin, Charles, 16, 17–18, 89, 
102–3, 107, 127–28, 129, 132, 138, 142, 188, 203–4 “Darwin Among the Machines” (Butler), 16 Darwin’s mechanism, 18, 89, 138 data, 50, 146, 153 processing of, 70, 80–81, 83, 151 dating technologies, 45–46 David, Paul, 157–58 Dawkins, Richard, 102 deep craft, 159–60, 162, 164 de Forest, Lee, 167–68 Deligne, Pierre, 129 dendrochronology, 45 Descartes, René, 208, 211 diabetes, 175 Dickens, Charles, 197 digital technologies, 25, 28, 66, 71, 72, 79–80, 80–81, 82, 84, 117–18, 145, 154, 156, 206 “Digitization and the Economy” (Arthur), 4 DNA, 24, 77, 85, 169, 208 amplification of, 37, 70, 123–24 complementary base pairing in, 57–58, 61, 123–24 extraction and purification of, 61, 70 microarrays, 85 recombinant, 10, 148 replication of, 147 sequencing of, 6, 37, 70, 123–24 domaining, 71–76 definition of, 71–72 redomaining and, 72–74, 85, 151–56 domains, 69–85, 103, 108, 145–65, 171 choice of, 71–73, 101 deep knowledge of, 78–79 definitions of, 70, 80, 84, 145 discipline-based, 146 economy and, 149, 151–56, 163 effectiveness of, 75–76, 150 evolution and development of, 72, 84, 85, 88, 145–65 languages and grammar of, 69, 76–80, 147 mature, 149–50, 165 morphing of, 150–51 novel, 74–75, 152–53 styles defined by, 74–76 subdomains and sub-subdomains of, 71, 151, 165 worlds of, 80–85 Doppler effect, 48, 122 dynamo, 14 Eckert, J.

pages: 274 words: 73,344

Found in Translation: How Language Shapes Our Lives and Transforms the World
by Nataly Kelly and Jost Zetzsche
Published 1 Oct 2012

It exists today, and it’s a field that is growing. But machines cannot yet fully replace humans when it comes to converting spoken language. A human can interpret simultaneously—listening and speaking at nearly the same time. For now, a machine works much more slowly. Actually, the machine has to complete three separate processes. First, a speech recognition program comprehends what was spoken in one language, converting it into text. Then, using automatic translation, the written text gets translated into a second language. For the final step, the machine vocalizes or speaks the translated version of the text. Because there are so many variables involved, speech translation presents even more obstacles to developers than text translation.
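The three-stage pipeline the authors describe can be sketched in a few lines. The stage functions below are hypothetical stand-ins, not real APIs: an actual system would invoke a speech-recognition engine, a machine-translation engine, and a speech synthesizer at each step.

```python
def recognize_speech(audio):
    """Stage 1 (stand-in): speech recognition, audio to source-language text.
    Here the 'audio' is a dict already carrying its transcript."""
    return audio["transcript"]

def translate_text(text, lexicon):
    """Stage 2 (stand-in): word-by-word translation via a toy lexicon,
    leaving unknown words untouched."""
    return " ".join(lexicon.get(word, word) for word in text.split())

def synthesize_speech(text):
    """Stage 3 (stand-in): text-to-speech, here just tagging the output."""
    return f"<spoken>{text}</spoken>"

def speech_to_speech(audio, lexicon):
    """Chain the three stages; delay accumulates at every hand-off."""
    return synthesize_speech(translate_text(recognize_speech(audio), lexicon))
```

The serial hand-offs are the point: each stage must finish before the next begins, which is why the machine runs so much more slowly than a human interpreter listening and speaking almost simultaneously.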

Department of Defense has spent millions upon millions of dollars over the years on various projects to automate the translation of speech. There are some promising examples of technologies that do a decent job when limited to certain settings or specific languages. There are even some tools that work reasonably well (after significant time spent in training the speech recognition portion) with a single user’s voice. Yet, despite plenty of investment from government organizations and private-sector firms, automated speech translation today does not even come close to doing what human interpreters can do. Enabling human beings who speak different languages to communicate with each other in real time without relying on a human interpreter is one of the final frontiers of translation technology.

pages: 256 words: 73,068

12 Bytes: How We Got Here. Where We Might Go Next
by Jeanette Winterson
Published 15 Mar 2021

Usually we encounter them on response messages, asking us what’s the problem with our washing machine, or that our parcel is on the back porch, or how did we rate Pavel who just delivered a pizza? Chatbots use Natural Language Processing (NLP) to communicate with humans in a specific and limited way. These speech-recognition systems attempt to work out what it is you want. How can I help you? Problems start when we humans try to explain what it is we want. For instance: ‘Do you sell black shoes?’ is fine. But if you type, ‘Do you have black shoes?’ the chatbot might reply, ‘I don’t wear shoes.’ Natural language is trickier than it looks.
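The black-shoes exchange can be reproduced with a toy keyword matcher (hypothetical, not any real chatbot): keying on a single verb instead of the whole intent is exactly what makes natural language trickier than it looks.

```python
# Naive rule table: first keyword hit wins. "have" is wrongly treated
# as a question about the bot itself rather than about stock.
RULES = [
    ("sell", "Yes, we sell black shoes."),
    ("have", "I don't wear shoes."),
]

def naive_reply(utterance):
    """Return the canned reply for the first matching keyword."""
    words = utterance.lower().split()
    for keyword, reply in RULES:
        if keyword in words:
            return reply
    return "How can I help you?"
```

`naive_reply("Do you sell black shoes?")` answers sensibly, while `naive_reply("Do you have black shoes?")` produces the absurd "I don't wear shoes.", because the matcher never models what the question is actually about.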

And it’s not only ‘gaze’. In-car voice-recognition systems respond well to deeper, likely male voices, with standard accents. Speech scientists building voice-recognition data-sets often use TED talks. 70% of people giving TED talks are white men. This matters, because more and more of our daily lives use speech recognition – and it is estimated that voice-commerce will be an 80 billion dollar business by 2023. Does it need to be a gender binary business? Does it need to keep the world’s default as white men and the rest of us – every woman, and most people of colour – as atypical? * * * If you’re wondering why I haven’t added LGBTQQIP2SAA — lesbian, gay, bisexual, transgender, questioning, queer, intersex, pansexual, two-spirit (2S), androgynous and asexual – or even straight to my binaries, it’s because I see all types of homophobia, or sexual-identity discrimination, as gender discrimination.

pages: 296 words: 78,631

Hello World: Being Human in the Age of Algorithms
by Hannah Fry
Published 17 Sep 2018

It’s what led to the intriguing shopping suggestion that confronted Reddit user Kerbobotat after buying a baseball bat on Amazon: ‘Perhaps you’ll be interested in this balaclava?’11 Filtering: isolating what’s important Algorithms often need to remove some information to focus on what’s important, to separate the signal from the noise. Sometimes they do this literally: speech recognition algorithms, like those running inside Siri, Alexa and Cortana, first need to filter out your voice from the background noise before they can get to work on deciphering what you’re saying. Sometimes they do it figuratively: Facebook and Twitter filter stories that relate to your known interests to design your own personalized feed.
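Literal filtering of the kind described here can be illustrated with the crudest possible smoother, a moving average (a toy sketch, far simpler than anything Siri or Alexa actually runs):

```python
def moving_average(signal, window=3):
    """Smooth a noisy sample sequence by replacing each value with the
    average of itself and its neighbours inside the window."""
    out = []
    for i in range(len(signal)):
        lo = max(0, i - window // 2)
        hi = min(len(signal), i + window // 2 + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out
```

A lone spike in `[0, 0, 9, 0, 0]` gets spread and flattened into `[0.0, 3.0, 3.0, 3.0, 0.0]`: the isolated noise is suppressed while a sustained signal would survive.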

(TV show) 97–9 John Carter (film) 180 Johnson, Richard 50, 51 Jones Beach 1 Jones, Robert 13–14 judges anchoring effect 73 bail, factors for consideration 73 decision-making consistency in 51 contradictions in 52–3 differences in 52 discretion in 53 unbiased 77 judges (continued) discrimination and bias 70–1, 75 intuition and considered thought 72 lawyers’ preference over algorithms 76–7 vs machines 59–61 offenders’ preference over algorithms 76 perpetuation of bias 73 sentencing 53–4, 63 use of algorithms 63, 64 Weber’s Law 74–5 Jukebox 192 junk algorithms 200 Just Noticeable Difference 74 justice 49–78 algorithms and 54–6 justification for 77 appeals process 51 Brixton riots 49–51 by country Australia 53 Canada 54 England 54 Ireland 54 Scotland 54 United States 53, 54 Wales 54 discretion of judges 53 discrimination 70–1 humans vs machines 59–61, 62–4 hypothetical cases (UK research) 52–3 defendants appearing twice 52–3 differences in judgement 52, 53 hypothetical cases (US research) 51–2 differences in judgements 52 differences in sentencing 52 inherent injustice 77 machine bias 65–71 maximum terms 54 purpose of 77–8 re-offending 54, 55 reasonable doubt 51 rehabilitation 55 risk-assessment algorithms 56 sentencing consistency in 51 mitigating factors in 53 substantial grounds 51 Kadoodle 15–16 Kahneman, Daniel 72 Kanevsky, Dr Jonathan 93, 95 kangaroos 128 Kant, Immanuel 185 Kasparov, Gary 5-7, 202 Kelly, Frank 87 Kerner, Winifred 188–9 Kernighan, Brian x Killingbeck 145, 146 Larson, Steve 188–9 lasers 119–20 Leibniz, Gottfried 184 Leroi, Armand 186, 192–3 level 0 (driverless technology) 131 level 1 (driverless technology) 131 level 2 (driverless technology) 131, 136 careful attention 134–5 level 3 (driverless technology) 131 technical challenge 136 level 4 (driverless technology) 131 level 5 (driverless technology) 131 Li Yingyun 45 Lickel, Charles 97–8 LiDAR (Light Detection and Ranging) 119–20 life insurance 109 ‘Lockdown’ (52Metro) 177 logic 8 logical 
instructions 8 London Bridge 172 London School of Economics (LSE) 129 Loomis, Eric 217n38 Los Angeles Police Department 152, 155 Lucas, Teghan 161–2, 163 machine-learning algorithms 10–11 neural networks 85–6 random forests 58–9 machines art and 194 bias in 65–71 diagnostic 98–101, 110–11 domination of humans 5-6 vs humans 59–61, 62–4 paradoxical relationship with 22–3 recognising images 84–7 superior judgement of 16 symbolic dominance over humans 5-6 Magic Test 199 magical illusions 18 mammogram screenings 94, 96 manipulation 39–44 micro-manipulation 42–4 Maple, Jack 147–50 Marx, Gary 173 mastectomies 83, 84, 92, 94 maternity wards, deaths on 81 mathematical certainty 68 mathematical objects 8 McGrayne, Sharon Bertsch 122 mechanized weaving machines 2 Medicaid assistance 16–17 medical conditions, algorithms for 96–7 medical records 102–7 benefits of algorithms 106 DeepMind 104–5 disconnected 102–3 misuse of data 106 privacy 105–7 medicine 79–112 in ancient times 80 cancer diagnoses study 79–80 complexity of 103–4 diabetic retinopathy 96 diagnostic machines 98–101, 110–11 choosing between individuals and the population 111 in fifteenth-century China 81 Hippocrates and 80 magic and 80 medical records 102–6 neural networks 85–6, 95, 96, 219–20n11 in nineteenth-century Europe 81 pathology 79, 82–3 patterns in data 79–81 predicting dementia 90–2 scientific base 80 see also Watson (IBM computer) Meehl, Paul 21–2 MegaFace challenge 168–9 Mercedes 125–6 microprocessors x Millgarth 145, 146 Mills, Tamara 101–2, 103 MIT Technology Review 101 modern inventions 2 Moses, Robert 1 movies see films music 176–80 choosing 176–8 diversity of charts 186 emotion and 189 genetic algorithms 191–2 hip hop 186 piano experiment 188–90 algorithm 188, 189–91 popularity 177, 178 quality 179, 180 terrible, success of 178–9 Music Lab 176–7, 179, 180 Musk, Elon 138 MyHeritage 110 National Geographic ­Genographic project 110 National Highway Traffic Safety Administration 135 Navlab 117 Netflix 
8, 188 random forests 59 neural networks 85–6, 95, 119, 201, 219–20n11 driverless cars 117–18 in facial recognition 166–7 predicting performances of films 183 New England Journal of Medicine 94 New York City subway crime 147–50 anti-social behaviour 149 fare evasion 149 hotspots 148, 149 New York Police Department (NYPD) 172 New York Times 116 Newman, Paul 127–8, 130 NHS (National Health Service) computer virus in hospitals 105 data security record 105 fax machines 103 linking of healthcare records 102–3 paper records 103 prioritization of non-smokers for operations 106 nuclear war 18–19 Nun Study 90–2 obesity 106 OK Cupid 9 Ontario 169–70 openworm project 13 Operation Lynx 145–7 fingerprints 145 overruling algorithms correctly 19–20 incorrectly 20–1 Oxbotica 127 Palantir Technologies 31 Paris Auto Show (2016) 124–5 parole 54–5 Burgess’s forecasting power 55–6 violation of 55–6 passport officers 161, 164 PathAI 82 pathologists 82 vs algorithms 88 breast cancer research on corpses 92–3 correct diagnoses 83 differences of opinion 83–4 diagnosing cancerous tumours 90 sensitivity and 88 specificity and 88 pathology 79, 82 and biology 82–3 patterns in data 79–81, 103, 108 payday lenders 35 personality traits 39 advertising and 40–1 inferred by algorithm 40 research on 39–40 Petrov, Stanislav 18–19 piano experiment 188–90 pigeons 79–80 Pomerleau, Dean 118–19 popularity 177, 178, 179, 183–4 power 5–24 blind faith in algorithms 13–16 overruling algorithms 19–21 struggle between humans and algorithms 20–4 trusting algorithms 16–19 power of veto 19 Pratt, Gill 137 precision in justice 53 prediction accuracy of 66, 67, 68 algorithms vs humans 22, 59–61, 62–5 Burgess 55–6 of crime burglary 150–1 HunchLab algorithm 157–8 PredPol algorithm 152–7, 158 risk factor 152 Strategic Subject List algorithm 158 decision trees 56–8 dementia 90–2 prediction (continued) development of abnormalities 87, 95 homicide 62 of personality 39–42 of popularity 177, 178, 179, 180, 183–4 powers of
92–6 of pregnancy 29–30 re-offending criminals 55–6 recidivism 62, 63–4, 65 of successful films 180–1, 182–3, 183 superiority of algorithms 22 see also Clinical vs Statistical Prediction (Meehl); neural networks predictive text 190–1 PredPol (PREDictive POLicing) 152–7, 158, 228–9n27 assessing locations at risk 153–4 cops on the dots 155–6 fall in crime 156 feedback loop 156–7 vs humans, test 153–4 target hardening 154–5 pregnancy prediction 29–30 prescriptive sentencing systems 53, 54 prioritization algorithms 8 prisons cost of incarceration 61 Illinois 55, 56 reduction in population 61 privacy 170, 172 false sense of 47 issues 25 medical records 105–7 overriding of 107 sale of data 36–9 probabilistic inference 124, 127 probability 8 ProPublica 65–8, 70 quality 179, 180 ‘good’ changing nature of 184 defining 184 quantifying 184–8 difficulty of 184 Washington Post experiment 185–6 racial groups COMPAS algorithm 65–6 rates of arrest 68 radar 119–20 RAND Corporation 158 random forests technique 56–9 rape 141, 142 re-offending 54 prediction of 55–6 social types of inmates 55, 56 recidivism 56, 62, 201 rates 61 risk scores 63–4, 65 regulation of algorithms 173 rehabilitation 55 relationships 9 Republican voters 41 Rhode Island 61 Rio de Janeiro–Galeão International Airport 132 risk scores 63–4, 65 Robinson, Nicholas 49, 50, 50–1, 77 imprisonment 51 Rossmo, Kim 142–3 algorithm 145–7 assessment of 146 bomb factories 147 buffer zone 144 distance decay 144 flexibility of 146 stagnant water pools 146–7 Operation Lynx 145–7 Rotten Tomatoes website 181 Royal Free NHS Trust 222–3n48 contract with DeepMind 104–5 access to full medical histories 104–5 outrage at 104 Rubin’s vase 211n13 rule-based algorithms 10, 11, 85 Rutherford, Adam 110 Safari browser 47 Sainsbury’s 27 Salganik, Matthew 176–7, 178 Schmidt, Eric 28 School Sisters of Notre Dame 90, 91 Science magazine 15 Scunthorpe 2 search engines 14–15 experiment 14–15 Kadoodle 15–16 Semmelweis, Ignaz 81 sensitivity,
principle of 87, 87–8 sensors 120 sentencing algorithms for 62–4 COMPAS 63, 64 considerations for 62–3 consistency in 51 length of 62–3 influencing 73 Weber’s Law 74–5 mitigating factors in 53 prescriptive systems 53, 54 serial offenders 144, 145 serial rapists 141–2 Sesame Credit 45–6, 168 sexual attacks 141–2 shoplifters 170 shopping habits 28, 29, 31 similarity 187 Slash X (bar) 113, 114, 115 smallpox inoculation 81 Snowden, David 90–2 social proof 177–8, 179 Sorensen, Alan 178 Soviet Union detection of enemy missiles 18 protecting air space 18 retaliatory action 19 specificity, principle of 87, 87–8 speech recognition algorithms 9 Spotify 176, 188 Spotify Discover 188 Sreenivasan, Sameet 181–2 Stammer, Neil 172 Stanford University 39–40 STAT website 100 statistics 143 computational 12 modern 107 NYPD 172 Stilgoe, Jack 128–9, 130 Strategic Subject List 158 subway crime see New York City subway crime supermarkets 26–8 superstores 28–31 Supreme Court of Wisconsin 64, 217n38 swine flu 101–2 Talley, Steve 159, 162, 163–4, 171, 230n47 Target 28–31 analysing unusual data patterns 28–9 expectant mothers 28–9 algorithm 29, 30 coupons 29 justification of policy 30 teenage pregnancy incident 29–30 target hardening 154–5 teenage pregnancy 29–30 Tencent YouTu Lab algorithm 169 Tesco 26–8 Clubcard 26, 27 customers buying behaviour 26–7 knowledge about 27 loyalty of 26 vouchers 27 online shopping 27–8 ‘My Favourites’ feature 27–8 removal of revealing items 28 Tesla 134, 135 autopilot system 138 full autonomy 138 full self-driving hardware 138 Thiel, Peter 31 thinking, ways of 72 Timberlake, Justin 175–6 Timberlake, Justin (artist) 175–6 Tolstoy, Leo 194 TomTom sat-nav 13–14 Toyota 137, 210n13 chauffeur mode 139 guardian mode 139 trolley problem 125–6 true positives 67 Trump election campaign 41, 44 trust 17–18 tumours 90, 93–4 Twain, Mark 193 Twitter 36, 37, 40 filtering 10 Uber driverless cars 135 human intervention 135 uberPOOL 10 United Kingdom (UK) database of facial
images 168 facial recognition algorithms 161 genetic tests for Huntington’s disease 110 United States of America (USA) database of facial images 168 facial recognition algorithms 161 life insurance stipulations 109 linking of healthcare records 103 University of California 152 University of Cambridge research on personality traits 39–40 and advertising 40–1 algorithm 40 personality predictions 40 and Twitter 40 University of Oregon 188–90 University of Texas M.

Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
by Aurelien Geron
Published 14 Aug 2019

In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention (Figure 1-3). Figure 1-3. Automatically adapting to change Another area where Machine Learning shines is for problems that either are too complex for traditional approaches or have no known algorithm. For example, consider speech recognition: say you want to start simple and write a program capable of distinguishing the words “one” and “two.” You might notice that the word “two” starts with a high-pitch sound (“T”), so you could hardcode an algorithm that measures high-pitch sound intensity and use that to distinguish ones and twos.
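Géron's hypothetical hardcoded rule is concrete enough to sketch: measure how much of the signal's energy lies above some pitch cutoff and threshold it. The cutoff, threshold, and synthetic test tones below are all invented for illustration (a real recognizer would use FFTs over real recordings, and the chapter's point is precisely that such hand-written rules don't scale):

```python
import math

def high_freq_energy_ratio(samples, sample_rate, cutoff_hz=2000.0):
    """Share of spectral energy above cutoff_hz, via a naive DFT.

    O(n^2) and illustrative only; real systems use FFTs and mel filter banks.
    """
    n = len(samples)
    total = high = 0.0
    for k in range(1, n // 2):  # positive-frequency bins, skipping DC
        freq = k * sample_rate / n
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        total += power
        if freq >= cutoff_hz:
            high += power
    return high / total if total else 0.0

def classify(samples, sample_rate, threshold=0.5):
    # The hardcoded rule: a strong high-pitch component suggests the "T" of "two"
    return "two" if high_freq_energy_ratio(samples, sample_rate) > threshold else "one"

# Synthetic stand-ins: a 300 Hz tone ("one"-like) vs. a 3 kHz burst ("two"-like)
rate = 8000
low_tone = [math.sin(2 * math.pi * 300 * t / rate) for t in range(256)]
high_tone = [math.sin(2 * math.pi * 3000 * t / rate) for t in range(256)]
print(classify(low_tone, rate), classify(high_tone, rate))  # one two
```

Such a rule separates these two clean tones, but collapses in noisy rooms and across speakers and vocabularies, which is why learned models displaced hand-written ones.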

Now the child is able to recognize apples in all sorts of colors and shapes. Genius. Machine Learning is not quite there yet; it takes a lot of data for most Machine Learning algorithms to work properly. Even for very simple problems you typically need thousands of examples, and for complex problems such as image or speech recognition you may need millions of examples (unless you can reuse parts of an existing model). The Unreasonable Effectiveness of Data In a famous paper published in 2001, Microsoft researchers Michele Banko and Eric Brill showed that very different Machine Learning algorithms, including fairly simple ones, performed almost identically well on a complex problem of natural language disambiguation8 once they were given enough data (as you can see in Figure 1-20).

pages: 328 words: 77,877

API Marketplace Engineering: Design, Build, and Run a Platform for External Developers
by Rennay Dorasamy
Published 2 Dec 2021

One day, I will hopefully have a smart lock which will join that routine. To the layman, this may seem easy to achieve – however, this is integration engineering at its finest. It is simple, reliable, and easy to use. And it is all underpinned by APIs. The words I spoke to Alexa were sent to a speech recognition service which converted the speech to text. Thereafter a Natural Language Understanding (NLU) service converted the text to an Alexa intent. The intent triggered smart home skills which interface to my universal remote and light automation via APIs. For this seemingly simple request, it triggered a wave of activity that probably spanned continents.
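The chain described here, speech to text, text to intent, intent to a device API call, can be caricatured in a few lines. Every function below is an invented stand-in (keyword matching in place of a real NLU model, a string in place of a vendor API call), not Amazon's actual interfaces:

```python
from dataclasses import dataclass

@dataclass
class Intent:
    name: str
    slots: dict

def speech_to_text(audio: str) -> str:
    # Stand-in for the speech recognition service (identity over a transcript)
    return audio.lower().strip()

def understand(text: str) -> Intent:
    # Stand-in NLU: naive keyword matching instead of a trained model
    if "lights" in text:
        state = "on" if " on" in text else "off"
        return Intent("SetLightState", {"state": state})
    return Intent("Unknown", {})

def handle(intent: Intent) -> str:
    # Stand-in smart-home skill: this is where the vendor's API would be called
    if intent.name == "SetLightState":
        return "lights -> " + intent.slots["state"]
    return "sorry, I didn't catch that"

print(handle(understand(speech_to_text("Alexa, turn the lights on"))))  # lights -> on
```

Each stage hands a slightly richer structure to the next; the real pipeline distributes these stages across services on different continents, stitched together by APIs.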

Index A Account Information Service Providers (AISP) Account Servicing Payment Service Provider (ASPSP) Amazon Echo devices Application development administration authorization development guidance gateway product implementation hosted applications lightweight microservices operation architecture client/server elements integration logic components logic orchestration middleware component platform services portal applications principles rate limiting security third-party onboarding well-defined requests Application Performance Monitoring (APM) Application Programming Interface (API) aircraft carrier altruistic view Amazon Echo devices banking benefits banking industry See Banking industry benefits coding principle collaboration/sharing information connectedness counter-survival delivery lead democratizing technology developer centricity developer ecosystem digital channel engineering lead (EL) enterprise-wide impact Forensics teams foundational elements human-readable information security integration pattern interface Krypton lab environment manifesto concepts operational platform operations lead (OL) pivot product owner (PO) program executive requirements retrofit products sharing services speech recognition service speedboat streamline performance and execution team dynamic members technical debt third-party participation timeframe Automatic Teller Machines (ATMs) B Banking industry across territories Australia benefits China European Union facilitative/flexible approach Hong Kong India Japan objectives operational risk overview payment system benefits prescriptive approach reputational risk risks association significant risks Singapore South Africa terminology third-party providers United States (USA) visual representation Billing engineering event sequence monetization approach real-time sequences technical flow transaction bunq application Business development attract build trust case studies detailing collaboration consultation sessions customer testimonial
educate lead mailing lists non-technical overview partnership pricing reference customers technical jargon timelines transparency use-cases Business to Business (B2B) security Business to Consumer (B2C) security C CHIP application Client URL (cURL) capability code snippets education fiddler mandatory elements neutrality postman sharing variables and environments Commercial-Off-The-Shelf (COTS) Commercial proposition See Monetization Consumption advocacy business See Business development business/technical personas client URL See Client URL (cURL) ecosystem engagements interactions internal and external developers Marketplace vs. internal APIs personas support technical-developer portal attract beta program blog posts and podcasts blueprints collaborate developer certification educate hackathons impact assessment instructions/code lead/product owner (PO) messaging channels patterns product/service release notes status page transparency trust user groups/community sessions toolings User Experience (UX) Continuous Delivery (CD) Continuous integration (CI) D Demilitarized Zone (DMZ) Design Strategies access mechanisms availability bottom up build your own strategy business rules compliance requirements consideration consumer-driven approach definition documentation enterprise architecture (EA) error handling filtering/pagination governance guidelines integration approach lifecycle design process developer experience end of life versioning maintainability/onboarding patterns asynchronous callback pattern complex business logics event-driven architecture polling approach proxy vs.
tap debate synchronous tap-and-go strategy performance pre-defined/top-down regulatory requirements reporting and historical data requirements Software Development Kit (SDK) viability/feasibility E Elasticsearch, Logstash, and Kibana (ELK) Electronic funds transfer (EFT) Enterprise Application Integration (EAI) Enterprise Java Beans (EJBs) Extensible Markup Language (XML) F Financial Services (FS) APIs applications benefits commercial models customer financial data digital payment services direct vs. brokered integration dispute mechanisms open banking/finance payment initiation regulation screen scraping standardization wait-and-see approach Financial stability and security G, H Google Remote Procedure Call (gRPC) bidirectional streaming client streaming knee-jerk reaction proto definition server streaming/unary types GraphQL I Integration strategy code elements components dedicated adapter deployment architecture as-is configuration launch configuration to-be configuration deployment strategies duplicated framework gRPC logic review microservice middleware components overview platform as a service platform services auditing error handling property management shared libraries/packages snowflakes tracing port-forward connectivity shared library taxonomy business logic categories components connectivity component loose architectural pattern microservice middleware service assembly J Java Enterprise Edition (JEE) JavaScript Object Notation (JSON) K, L Kibana Query Language (KQL) Knee-jerk reaction M Marketplace vs. 
internal APIs Mobile Virtual Network Operators (MVNO) Monetization analytics/insight analytics collection dashboards email historical path/progress implementation instant messages operational metrics reporting platform billing engineering business models flywheel identification implementation leveraging economies Marketplace positioning notional income statement value/revenue strategies affiliate Business-to-business (B2B) customer data financial perspective free/freemium freemium gets paid indirect strategy information pays points based service referral program reputational process revenue share social media personalities tiered services transaction fee Monitoring capability alerting strategy application performance monitoring environment functional monitoring infrastructure overview service telemetry transaction tracing user interface (UI) Monolith vs. microservice N Notional Income Statement O OAuth (open-standard framework) actors/participants application registration client/resource server process flow server interaction authorization code flow client credentials client credentials/authorization credential phishing flow grant type/access tokens Lucidchart login open banking variation permission administration refresh token scenario vulnerability Open Web Application Security Project (OWASP) authentication mechanisms excessive data exposure function level authorization guidelines improper assets management injection flaws insufficient logging and monitoring mass assignment misconfiguration object level authorization resources and rate limiting working process Operational expenditure (OPEX) budget Operational universe change/release management high-level view implementation guide internal/supporting elements product iteration quality assurance team release funnel requirements third-party consumption updates DevOps process foundational elements logging process architecture principle contexts Elasticsearch Kibana console mounting persistent storage overview 
strategies monitoring overview parameters/conditions platform supporting systems approaches architecture backend dependencies managed services process flow application domain incident management issue tracking/reporting severity-response mapping sub-domains support capability supporting systems dependencies traffic/value transactions Optical Character Recognition (OCR) Organizational capability P, Q Payment Initiation Service Providers (PISP) Payment Services Directive (PSD2) Platform architecture API gateway (external/internal) authorization framework container deployment database docker configuration elements identification inherent benefit integration strategy See Integration strategy iterations Managed container platform microservices layer middleware platform snowflake environment spreadsheet time-lapsed view virtualized approach virtual Machines (VM) Platform as a Service (PaaS) Proof of Concept (PoC) R Remote Procedure Calls (RPC) See Google Remote Procedure Call (gRPC) Representational State Transfer (REST) Revolut application S Sandbox strategies access key areas/services backend simulation approach conditions design environment pros/cons system virtualiser beta approach design considerations inception phase pros/cons strategies use-cases environments full-blown operational environment functional testing interface definition live context objective onboarding process overview purpose quality assurance (QA) approach design foundational consumer pros/cons use-cases semi-live approach design environment pros/cons use-cases shallow approach backend simulation approach design flow diagram overview pros/cons use-cases third parties unique opportunity unique synergy virtualiser configuration customizations design philosophy implementation options integration components predicate parameters process requirements responses runtime configuration Screen scraping Security application code approaches Business to Business (B2B) Business to Consumer (B2C) client identifier
container cross-cutting concern dynamicity infrastructure meaning network OAuth See OAuth (open-standard framework) open API pattern OWASP review process Service-Level Agreements (SLAs) Simple Object Access Protocol (SOAP) Software Development Kit (SDK) Software development lifecycle application See Application development delivery approach agile methodology initial strategy ongoing delivery planning sprint approach squad strategies DevOps aspirational goal continuous delivery continuous integration (CI) implementation microservices integration requirement objectives philosophy borrows retrieve reference data team structure automated testing consumer/provider delivery leads developer journey development documentation finance/reporting orchestrating development ownership performance testing quality assurance (QA) responsibilities security team unit testing Speedboat operating model Standardization and neutral technology Storage Area Network (SAN) Support capability communications strategy details development focus call foundational principle full-stack squad level 1/2 support logging/error handling operations lead people/process/technology periodic updates third party communication transition war room T, U, V, W X Third-Party Provider (TPP) Transparency and public accountability TrueLayer application Y, Z YOLT’s application

The Book of Why: The New Science of Cause and Effect
by Judea Pearl and Dana Mackenzie
Published 1 Mar 2018

The owl can be a good hunter without understanding why the rat always goes from point A to point B. Some readers may be surprised to see that I have placed present-day learning machines squarely on rung one of the Ladder of Causation, sharing the wisdom of an owl. We hear almost every day, it seems, about rapid advances in machine learning systems—self-driving cars, speech-recognition systems, and, especially in recent years, deep-learning algorithms (or deep neural networks). How could they still be only at level one? The successes of deep learning have been truly remarkable and have caught many of us by surprise. Nevertheless, deep learning has succeeded primarily by showing that certain questions or tasks we thought were difficult are in fact not.

Thanks in part to Bonaparte’s accuracy and speed, the NFI managed to identify remains from 294 of the 298 victims by December 2014. As of 2016, only two victims of the crash (both Dutch citizens) have vanished without a trace. Bayesian networks, the machine-reasoning tool that underlies the Bonaparte software, affect our lives in many ways that most people are not aware of. They are used in speech-recognition software, in spam filters, in weather forecasting, in the evaluation of potential oil wells, and in the Food and Drug Administration’s approval process for medical devices. If you play video games on a Microsoft Xbox, a Bayesian network ranks your skill. If you own a cell phone, the codes that your phone uses to pick your call out of thousands of others are decoded by belief propagation, an algorithm devised for Bayesian networks.
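The kind of query a Bayesian network answers can be shown with the smallest possible example: a two-node spam network, Spam → contains-"offer", with invented probabilities. Real networks like Bonaparte's chain many such nodes together and use belief propagation, but the update at each node is just Bayes' rule:

```python
# Invented numbers: a prior and per-class likelihoods for one observed feature
p_spam = 0.4                 # P(Spam)
p_word_given_spam = 0.8      # P("offer" appears | Spam)
p_word_given_ham = 0.1       # P("offer" appears | not Spam)

# Bayes' rule: P(Spam | word) = P(word | Spam) * P(Spam) / P(word)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.842
```

Observing the word raises the belief in "spam" from 0.4 to about 0.84; propagating many such updates through a large graph is what the deployed systems above do at scale.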

All of this is exciting, and the results leave no doubt: deep learning works for certain tasks. But it is the antithesis of transparency. Even AlphaGo’s programmers cannot tell you why the program plays so well. They knew from experience that deep networks have been successful at tasks in computer vision and speech recognition. Nevertheless, our understanding of deep learning is completely empirical and comes with no guarantees. The AlphaGo team could not have predicted at the outset that the program would beat the best human in a year, or two, or five. They simply experimented, and it did. Some people will argue that transparency is not really needed.

pages: 296 words: 78,112

Devil's Bargain: Steve Bannon, Donald Trump, and the Storming of the Presidency
by Joshua Green
Published 17 Jul 2017

But pattern-hunting worked. A computer could learn to recognize patterns without regard for the rules of grammar and still produce a successful translation. “Statistical machine translation,” as the process became known, soon outpaced the old method and went on to become the basis of modern speech-recognition software and tools such as Google Translate. At Renaissance, Mercer and Brown applied this approach broadly to the markets, feeding all kinds of abstruse data into their computers in a never-ending hunt for hidden correlations. Sometimes they found them in strange places. Even by the paranoid standards of black-box quantitative hedge funds, Renaissance is notoriously secretive about its methods.
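The core of the statistical approach can be miniaturized: count which word pairs co-occur across a parallel corpus and let the counts, not grammar rules, pick the translation. The three-sentence corpus below is invented; real systems did the same thing, with far more sophisticated alignment models, over millions of sentence pairs:

```python
from collections import Counter

# Toy parallel corpus (invented English-French pairs)
corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]

counts = Counter()
for en, fr in corpus:
    for e in en.split():
        for f in fr.split():
            counts[(e, f)] += 1  # every source word "votes" for every target word

def translate_word(e):
    # The target word co-occurring with e most often wins -- no grammar involved
    return max((c, f) for (ew, f), c in counts.items() if ew == e)[1]

print(translate_word("house"), translate_word("the"))  # maison la
```

"house" pairs with "maison" twice but with "la" and "une" only once each, so the counts alone recover the right mapping, which is the pattern-hunting insight in miniature.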

According to The Washington Post, between 2009 and 2014, the family donated $35 million to conservative think tanks and at least $36.5 million to individual GOP races, www.washingtonpost.com/politics/pro-trump-megadonor-is-part-owner-of-breitbart-news-empire-ceo-reveals/2017/02/24/9f16eea4-fad8-11e6-9845-576c69081518_story.html?utm_term=.86e1f0f5e7c4. modern speech-recognition: Zachary Mider, “What Kind of Man Spends Millions to Elect Ted Cruz?,” Bloomberg Politics, January 20, 2016, https://www.bloomberg.com/politics/features/2016-01-20/what-kind-of-man-spends-millions-to-elect-ted-cruz-. “Bob told me he believed”: Jane Mayer, “The Reclusive Hedge-Fund Tycoon Behind the Trump Presidency,” New Yorker, March 27, 2017, www.newyorker.com/magazine/2017/03/27/the-reclusive-hedge-fund-tycoon-behind-the-trump-presidency.

pages: 590 words: 152,595

Army of None: Autonomous Weapons and the Future of War
by Paul Scharre
Published 23 Apr 2018

Deep learning neural networks, first mentioned in chapter 5 as one potential solution to improving military automatic target recognition in DARPA’s TRACE program, have been the driving force behind astounding gains in AI in the past few years. Deep neural networks have learned to play Atari, beat the world’s reigning champion at go, and have been behind dramatic improvements in speech recognition and visual object recognition. Neural networks are also behind the “fully automated combat module” that Russian arms manufacturer Kalashnikov claims to have built. Unlike traditional computer algorithms that operate based on a script of instructions, neural networks work by learning from large amounts of data.

AlphaGo can beat any human at go, but it can’t play a different game, drive a car, or make a cup of coffee. Still, the tools used to train AlphaGo are generalizable tools that can be used to build any number of special-purpose narrow AIs to solve various problems. Deep neural networks have been used to solve other thorny problems that have bedeviled the AI community for years, notably speech recognition and visual object recognition. A deep neural network was the tool used by the research team I witnessed autonomously find the crashed helicopter. The researcher on the project explained that he had taken an existing neural network that had already been trained on object recognition, stripped off the top few layers, then retrained the network to identify helicopters, which hadn’t originally been in its image dataset.
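The retraining trick the researcher describes, keeping the pretrained lower layers and refitting only a new top, can be miniaturized in pure Python. The frozen "feature extractor", the four-point dataset, and the learning rate below are invented toys standing in for a real convolutional network and helicopter images:

```python
import math

def features(x):
    # Stand-in for frozen pretrained lower layers (never updated below)
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1]), 1.0]

w = [0.0, 0.0, 0.0]  # the replacement top layer: the only weights we retrain

def predict(x):
    z = sum(wi * fi for wi, fi in zip(w, features(x)))
    return 1 / (1 + math.exp(-z))  # logistic output

# Tiny "new task" dataset (invented): inputs with labels for the new class
data = [((0.5, 0.8), 1), ((0.9, 0.4), 1), ((-0.7, -0.5), 0), ((-0.4, 0.9), 0)]

for _ in range(500):  # gradient steps on the top layer only
    for x, y in data:
        p = predict(x)
        for i, f in enumerate(features(x)):
            w[i] += 0.5 * (y - p) * f

print([round(predict(x)) for x, _ in data])  # [1, 1, 0, 0]
```

Because only three weights are fit, a handful of labeled examples suffices, the same economy that let the team repurpose an object-recognition network for helicopters without starting from scratch.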

This hack exploits their weakness on the extremes, however, in the space of all possible images, which is virtually infinite. Because this vulnerability stems from the basic structure of the neural net, it is present in essentially every deep neural network commonly in use today, regardless of its specific design. It applies to visual object recognition neural nets but also to those used for speech recognition or other data analysis. This exploit has been demonstrated with song-interpreting AIs, for example. Researchers fed specially evolved noise into the AI, which sounds like nonsense to humans, but which the AI confidently interpreted as music. In some settings, the consequences of this vulnerability could be severe.
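The "specially evolved noise" attack is a search problem, and its shape is easy to demonstrate against a toy model. The "classifier" below is an invented stand-in (a real attack hill-climbs or follows gradients against an actual network's confidence score), but the loop is the same: mutate the input, keep the mutations the model likes more:

```python
import random

random.seed(1)

def confidence(x):
    # Invented stand-in for a model's "this is music" score (peaks at all-0.7s)
    return 1 / (1 + sum((xi - 0.7) ** 2 for xi in x))

x = [random.random() for _ in range(8)]  # start from pure noise
start = confidence(x)
for _ in range(2000):                    # evolve: nudge one value, keep improvements
    i = random.randrange(len(x))
    candidate = x[:]
    candidate[i] += random.uniform(-0.1, 0.1)
    if confidence(candidate) > confidence(x):
        x = candidate

print(round(start, 3), "->", round(confidence(x), 3))
```

The evolved input stays meaningless to a human, but the model's score climbs toward its maximum, the gap the researchers exploited with their nonsense "music."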

pages: 324 words: 92,805

The Impulse Society: America in the Age of Instant Gratification
by Paul Roberts
Published 1 Sep 2014

“The ‘Me’ Decade and the Third Great Awakening.” New York Magazine, August 23, 1976. Wood, Allen W. “Hegel on Education.” In Philosophy as Education, edited by Amélie O. Rorty. London: Routledge, 1998. Notes Chapter 1: More Better 1. Andrew Nusca, “Say Command: How Speech Recognition Will Change the World,” SmartPlanet, Issue 7, at http://www.smartplanet.com/blog/smart-takes/say-command-how-speech-recognition-will-change-the-world/19895?tag=content;siu-container. 2. Apple video introducing Siri, at http://www.youtube.com/watch?v=8ciagGASro0. 3. The Independent, 86–87 (1916), at http://books.google.com/books?id=IZAeAQAAMAAJ&lpg=PA108&ots=L5W1-w9EDW&dq=Edward%20Earle%20Purinton&pg=PA246#v=onepage&q=Edward%20Earle%20Purinton&f=false. 4.

pages: 329 words: 93,655

Moonwalking With Einstein
by Joshua Foer
Published 3 Mar 2011

See also specific titles Borges, Jorge Luis Born on a Blue Day (Tammet) Bradwardine, Thomas Brain and Mind Brainman brain(s) capacity of of chess masters computer and, seamless connection between as energetically expensive experimenting on of Kim Peek of London cabdrivers of mental athletes mysteries of neuroplasticity of part used while memorizing physical structure of random-access indexing system of temporary turn off of brain training software Bruno, Giordano Buddha Buzan, Tony appearance of author’s interview with awakening to art of memory BBC series of on education home of on intelligence on memory Mind Mapping skills and talents of travel schedule of World Memory Championships and writings of Byblos calendar calculating Cambridge Autism Research Center Camillo, Giulio cards. See speed cards Carroll, Lewis Carruthers, Mary Carvello, Creighton Charmadas Chase, Bill chess masters chicken sexers chunking Cicero Clemens, Samuel L. Clemons, Alonzo cochlear implants computer and brain, seamless connection between memory speech recognition Confessions (Augustine) context Cooke, Ed as author’s coach birthday party of career plans of classroom demonstration of family life of home of intellectual pursuits of memorization projects of memorizing techniques of personality of on remembering experiences speed cards and Tony Buzan and at World Memory Championships Cooke, Rod Cooke, Teen corpus callosum creativity, memory and cricket cultural literacy Cyrus, King Darnton, Robert Dead Reckoning: Calculating Without Instruments Dean, John, memory of deliberate practice De Oratore (Cicero) Dewey, John digital information, externalization of memory and digit span author’s SF’s test Discover Doerfler, Ronald Dottino, Tony “Double Deck’r Bust” Down, John Langdon Draschl, Corinna drawing dreams Du Bois, W.

See also Talented Tenth Santos, Chester savant syndrome definition of reasons for S (case study) compulsive remembering of inability to forget personal life of regimented memory of synesthesia of Scipio, Lucius Scoville, William scriptio continua scrolls self/identity Seneca the Elder Seneca the Younger SenseCam Serrell, Orlando SF (case study) Shakespeare Shass Pollak (Talmud Pole) Siffre, Michel Simonides of Ceos Simplicius Simpson’s in the Strand singularity skill acquisition “Skilled Memory Theory” Slate Small, Jocelyn Penny Smith, Steven Snyder, Allan Socrates song, as structuring device for language spatial navigation. See also memory palace(s)/art of memory speech speech recognition speed cards author as U.S. record holder author’s work on Ben Pridmore and Ed Cooke and techniques for memorizing at U.S. Memory Championships at World Memory Championships speed numbers sports records Squire, Larry Stanislavski, Konstantin stardom, author’s Stoll, Maurice Stratton, George Stromeyer, Charles surgeons SWAT officers swimming synesthesia testing for table of contents Talented Tenth Tammet, Daniel Anders Ericsson and Asperger’s syndrome of author and Ben Pridmore on childhood of epileptic seizure of Kim Peek compared to as mental mathematician numbers and online memory course of as psychic skepticism about study of synesthesia of at World Memory Championships tannaim Test of Genuineness for Synesthesia text(s).

pages: 305 words: 93,091

The Art of Invisibility: The World's Most Famous Hacker Teaches You How to Be Safe in the Age of Big Brother and Big Data
by Kevin Mitnick , Mikko Hypponen and Robert Vamosi
Published 14 Feb 2017

Researchers carried out another experiment in which they created a mock video file and loaded it to a USB drive, then plugged it into their TV. When they analyzed network traffic, they found that the video file name was transmitted unencrypted within http traffic and sent to the address GB.smartshare.lgtvsdp.com. Sensory, a company that makes embedded speech-recognition solutions for smart products, thinks it can do even more. “We think the magic in [smart TVs] is to leave it always on and always listening,” says Todd Mozer, CEO of Sensory. “Right now [listening] consumes too much power to do that. Samsung’s done a really intelligent thing and created a listening mode.

The malware in this case, researchers say, can also pick up minute air vibrations, including those produced by human speech. Google’s Android operating system allows movements from the sensors to be read at 200 Hz, or 200 cycles per second. Most human voices range from 80 to 250 Hz. That means the sensor can pick up a significant portion of those voices. Researchers even built a custom speech-recognition program designed to interpret the 80–250 Hz signals further.9 Cui found something similar within the VoIP phones and printers. He found that the fine pins sticking out of just about any microchip within any embedded device today could be made to oscillate in unique sequences and therefore exfiltrate data over radio frequency (RF).
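A back-of-envelope check of the numbers quoted here: a sensor sampled at 200 Hz can directly represent frequencies only up to the Nyquist limit of 100 Hz, so of the 80–250 Hz voice band, the directly captured slice is 80–100 Hz. (The Nyquist framing is my addition, not from the text; the researchers recovered more than this slice by also exploiting aliased energy.)

```python
def nyquist_limit(sample_rate_hz):
    # Highest frequency representable without aliasing
    return sample_rate_hz / 2

def captured_band(band, sample_rate_hz):
    """Portion of a frequency band lying below the Nyquist limit, or None."""
    low, high = band
    limit = nyquist_limit(sample_rate_hz)
    if low >= limit:
        return None
    return (low, min(high, limit))

voice = (80.0, 250.0)  # fundamental-frequency range quoted in the text
print(captured_band(voice, 200.0))  # (80.0, 100.0): only the bottom of the band
```
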

pages: 340 words: 90,674

The Perfect Police State: An Undercover Odyssey Into China's Terrifying Surveillance Dystopia of the Future
by Geoffrey Cain
Published 28 Jun 2021

Back in the late 1990s, a promising young researcher named Liu Qingfeng had turned down an internship offer at Microsoft Research Asia and instead dedicated his career to his start-up, iFlyTek, with a mission to develop cutting-edge voice recognition technology. “I told him that he was a great young researcher but China lagged too far behind American speech-recognition giants like Nuance, and there were fewer customers in China for this technology,” Kai-fu Lee wrote. “To his credit, Liu ignored that advice and poured himself into building iFlyTek.”22 In 2010, iFlyTek set up a laboratory in Xinjiang to develop speech recognition technology for translating the Uyghur language to Mandarin Chinese,23 technology that would soon be used to track and monitor Uyghur populations.24 By 2016, iFlyTek was the supplier of twenty-five “voiceprint” systems in Kashgar that captured the unique signatures of a person’s voice in order to help identify and track people.25 “All these companies were coming to Xinjiang,” Irfan recalled.

pages: 504 words: 89,238

Natural language processing with Python
by Steven Bird , Ewan Klein and Edward Loper
Published 15 Dec 2009

As in other chapters, there will be many examples drawn from practical experience managing linguistic data, including data that has been collected in the course of linguistic fieldwork, laboratory work, and web crawling. 11.1 Corpus Structure: A Case Study The TIMIT Corpus was the first annotated speech database to be widely distributed, and it has an especially clear organization. TIMIT was developed by a consortium including Texas Instruments and MIT, from which it derives its name. It was designed to provide data for the acquisition of acoustic-phonetic knowledge and to support the development and evaluation of automatic speech recognition systems. The Structure of TIMIT Like the Brown Corpus, which displays a balanced selection of text genres and sources, TIMIT includes a balanced selection of dialects, speakers, and materials. For each of eight dialect regions, 50 male and female speakers having a range of ages and educational backgrounds each read 10 carefully chosen sentences.
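TIMIT's "especially clear organization" encodes dialect region, speaker sex, speaker id, and sentence type directly in each utterance's path (the NLTK distribution uses ids like dr1-fvmh0/sa1). A sketch of decoding that scheme; the helper name is hypothetical:

```python
import re

SENTENCE_TYPES = {"sa": "dialect", "si": "phonetically diverse", "sx": "phonetically compact"}

def parse_timit_id(utterance_id):
    """Decode a TIMIT utterance id like 'dr1-fvmh0/sa1' into its parts."""
    m = re.fullmatch(r"dr([1-8])-([mf])(\w+)/(sa|si|sx)(\d+)", utterance_id)
    if m is None:
        raise ValueError(f"not a TIMIT utterance id: {utterance_id!r}")
    region, sex, rest, stype, snum = m.groups()
    return {
        "dialect_region": int(region),            # one of the eight regions
        "sex": "female" if sex == "f" else "male",
        "speaker": sex + rest,                    # e.g. 'fvmh0'
        "sentence": stype + snum,
        "sentence_type": SENTENCE_TYPES[stype],
    }

print(parse_timit_id("dr1-fvmh0/sa1"))
```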

Curation Versus Evolution As large corpora are published, researchers are increasingly likely to base their investigations on balanced, focused subsets that were derived from corpora produced for 414 | Chapter 11: Managing Linguistic Data entirely different reasons. For instance, the Switchboard database, originally collected for speaker identification research, has since been used as the basis for published studies in speech recognition, word pronunciation, disfluency, syntax, intonation, and discourse structure. The motivations for recycling linguistic corpora include the desire to save time and effort, the desire to work on material available to others for replication, and sometimes a desire to study more naturalistic forms of linguistic behavior than would be possible otherwise.

For a certain period in the development of NLP, particularly during the 1980s, this premise provided a common starting point for both linguists and practitioners of NLP, leading to a family of grammar formalisms known as unification-based (or feature-based) grammar (see Chapter 9), and to NLP applications implemented in the Prolog programming language. Although grammar-based NLP is still a significant area of research, it has become somewhat eclipsed in the last 15–20 years due to a variety of factors. One significant influence came from automatic speech recognition. Although early work in speech processing adopted a model that emulated the kind of rule-based phonological processing typified by the Sound Pattern of English (Chomsky & Halle, 1968), this turned out to be hopelessly inadequate in dealing with the hard problem of recognizing actual speech in anything like real time.

The Singularity Is Nearer: When We Merge with AI
by Ray Kurzweil
Published 25 Jun 2024

These steps (except for the last one) are detailed below:

The Problem Input

The problem input to the neural net consists of a series of numbers. This input can be: in a visual pattern recognition system, a two-dimensional array of numbers representing the pixels of an image; or in an auditory (e.g., speech) recognition system, a two-dimensional array of numbers representing a sound, in which the first dimension represents parameters of the sound (e.g., frequency components) and the second dimension represents different points in time; or in an arbitrary pattern recognition system, an n-dimensional array of numbers representing the input pattern.
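The auditory input Kurzweil describes (frequency components along one axis, time along the other) is what a spectrogram provides. A minimal NumPy sketch, assuming a synthetic test signal; the function name and window parameters are illustrative:

```python
import numpy as np

def spectrogram_input(signal, frame_len=128, hop=64):
    """Build the 2-D input array described above:
    rows = frequency components, columns = points in time."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # One FFT magnitude vector per windowed frame -> one column per time step
    cols = [np.abs(np.fft.rfft(frame * np.hanning(frame_len))) for frame in frames]
    return np.stack(cols, axis=1)

# Synthetic 1,024-sample signal standing in for recorded audio
t = np.linspace(0, 1, 1024, endpoint=False)
x = np.sin(2 * np.pi * 50 * t)
net_input = spectrogram_input(x)
print(net_input.shape)   # (65, 15): 65 frequency bins x 15 time steps
```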

speech recognition system, 20

He was the principal inventor of the first CCD flat-bed scanner, omni-font optical character recognition, print-to-speech reading machine for the blind, text-to-speech synthesizer, music synthesizer capable of recreating the grand piano and other orchestral instruments, and commercially marketed large-vocabulary speech recognition software. Ray received a GRAMMY® Award for outstanding achievement in music technology; he is the recipient of the National Medal of Technology and was inducted into the National Inventors Hall of Fame. He has written five best-selling books including The Singularity Is Near and How to Create a Mind.

pages: 374 words: 97,288

The End of Ownership: Personal Property in the Digital Economy
by Aaron Perzanowski and Jason Schultz
Published 4 Nov 2016

This IoT Barbie looks like many of her predecessors but offers a unique feature. She can engage in conversation with a child and learn about them in the process. Barbie does this by recording her conversations and transmitting them via network connections to ToyTalk, a third-party cloud-based speech recognition service. ToyTalk then uses software and data analytics to analyze those conversations and deliver personalized responses. It’s an impressive trick, but the implications for our sense of ownership are quite shocking. For many children, talking to toy dolls is a way to share their unfiltered thoughts, dreams, and fears in a safe, private environment.

But according to the terms of the Hello Barbie EULA, ToyTalk and its unnamed partners have wide latitude to make use of information about your child’s conversations in ways that few parents would anticipate: All information, materials and content ... is owned by ToyTalk or is used with permission. ... You agree that ToyTalk and its licensors and contractors may use, transcribe and store. ... Recordings and any speech data contained therein, including your voice and likeness as may be captured therein, to provide and maintain the ToyTalk App, to develop, tune, test, enhance or improve speech recognition technology and artificial intelligence algorithms, to develop acoustic and language models and for other research and development purposes. ... By using any Service, you consent to ToyTalk’s collection, use and/or disclosure of your personal information as described in this Policy. By allowing other people to use the Service via your account, you are confirming that you have the right to consent on their behalf to ToyTalk’s collection, use and disclosure of their personal information as described below.

pages: 317 words: 101,074

The Road Ahead
by Bill Gates , Nathan Myhrvold and Peter Rinearson
Published 15 Nov 1995

The seeds of new competition are being sown constantly in research environments and garages around the world. For instance, the Internet is becoming so important that Windows will only thrive if it is clearly the best way to gain access to the Internet. All operating-system companies are rushing to find ways to have a competitive edge in providing Internet support. When speech recognition becomes genuinely reliable, this will cause another big change in operating systems. In our business things move too fast to spend much time looking back. I pay close attention to our mistakes, however, and try to focus on future opportunity. It's important to acknowledge mistakes and make sure you draw some lesson from them.

These innovations will first show up in the mainstream in the high-volume office-productivity packages: word processors, spreadsheets, presentation packages, databases, and electronic mail. Some proponents claim these tools are so capable already that there will never be a need for newer versions. But there were those who thought that about software five and ten years ago. Over the next few years, as speech recognition, social interfaces, and connections to the information highway are incorporated into core applications, I think individuals and companies will find the productivity enhancements these improved applications will bring extremely attractive. The greatest improvement in productivity, and the greatest change in work habits, will be brought about because of networking.

pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines
by Thomas H. Davenport and Julia Kirby
Published 23 May 2016

Humans are still better able to make subjective judgments on unstructured data, such as interpreting the meaning of a poem, or distinguishing between images of good neighborhoods and bad ones. But computers are making headway even on these fronts. Meanwhile, intelligent applications that already combine text, image, and speech recognition offer very welcome “human support” by making it easier for us to communicate with computers. As you probably know, it is very difficult for machines to deal with high levels of variation in speech accents, pronunciation, volume, background noise, and so forth. If you use Siri on your iPhone or have an Amazon Echo device, you know both the joy and the frustration.

This might involve translating words across languages, understanding questions posed by people in plain language, and answering in kind, or “reading” a text with sufficient understanding to summarize it—or create new passages in the same style. Machine translation has been around for a while, and like everything else digital, it gets better all the time. Written language translation has progressed much faster than spoken language, since no speech recognition is necessary, but both are becoming quite useful. Google Translate, for example, does a credible job of it using “statistical machine translation,” or looking at a variety of examples of translated work and determining which translation is most likely. IBM’s Watson is the first tool to be broadly capable of ingesting, analyzing, and “understanding” text to a sufficient degree to answer detailed questions on it.
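The "most likely translation" idea can be sketched with nothing more than co-occurrence counts: score each candidate by how often it appeared with the surrounding context words and take the argmax. All counts and words below are made up for illustration:

```python
from collections import Counter

# Hypothetical parallel-corpus statistics: how often each English candidate
# for the French word "banc" co-occurred with nearby context words.
CANDIDATES = {
    "bench": Counter({"park": 12, "wooden": 7}),
    "bank":  Counter({"river": 9, "money": 1}),
}

def most_likely_translation(candidates, context_words):
    """Pick the candidate whose observed contexts best match this sentence."""
    def score(candidate):
        return sum(candidates[candidate][w] for w in context_words)
    return max(candidates, key=score)

print(most_likely_translation(CANDIDATES, ["park"]))    # bench
print(most_likely_translation(CANDIDATES, ["river"]))   # bank
```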

pages: 332 words: 97,325

The Launch Pad: Inside Y Combinator, Silicon Valley's Most Exclusive School for Startups
by Randall Stross
Published 4 Sep 2013

And we’re going to find anyone else who tries to fuck with our customers!” The choice of verb draws extended laughter in the hall; someone even starts to clap. Tan resumes. “Now this is a really hard problem, but we’re the right team to solve this. My cofounder, Brandon, was most recently the tech lead for Android speech recognition—that’s a really hard machine-learning problem. I myself have worked at three startups and was CTO of BuzzLabs, acquired by IAC in April.” • Outside, the founders await their turns to present. They mill about, exchanging tidbits of news that dribble out from the show inside. These hours spent standing outside anxiously will turn out to be the time when members of the batch build bonds in a way they had not during the summer itself.

Android, 17, 122, 147, 212; speech recognition, 210

pages: 340 words: 97,723

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity
by Amy Webb
Published 5 Mar 2019

Winning and Losing

Hinton kept working, workshopping the idea with his students as well as with LeCun and Bengio, and published papers beginning in 2006. By 2009, Hinton’s lab had applied deep neural nets to speech recognition, and a chance meeting with a Microsoft researcher named Li Deng meant that the technology could be piloted in a meaningful way. Deng, a Chinese deep-learning specialist, was a pioneer in speech recognition using large-scale deep learning. By 2010, the technique was being tested at Google. Just two years later, deep neural nets were being used in commercial products. If you used Google Voice and its transcription services, that was deep learning, and the technique became the basis for all the digital assistants we use today.

Artificial Whiteness
by Yarden Katz

As veteran AI researcher Nils Nilsson pointed out, DARPA’s “command-and-control” category was “sufficiently vague” to encompass much of AI, however carved up.43 Moreover, AI’s subfields were from the start organized around a militaristic frame: vision research to detect “enemy” ships and spot resources of interest from satellite images, speech recognition for surveillance and voice-controlled aircraft control, robotics to develop autonomous weaponry, and so on. Indeed, already in the 1960s, AI’s various research strands were articulated using military tasks such as “tactical commander’s management aide” and “missile range picture analysis” (figure 1.3).44 This militaristic frame was accompanied by a capitalist view of nature as a place for “resource extraction.”

Yet criticisms of militarism as such were rare and hardly well-received, as to be expected in a field bankrolled by the Pentagon. For one, the originator of the “AI” label, John McCarthy, rejected the notion that the U.S. Department of Defense could have nefarious ends in mind, such as surveillance. He claimed that “Weizenbaum’s conjecture that the Defense Department supports speech recognition research in order to be able to snoop on telephone conversations is biased, baseless, false, and seems motivated by political malice.” Furthermore, he added, the critiques of the Pentagon by the likes of Weizenbaum are harmful to the nation: “The failure of many scientists to defend the Defense Department against attacks they know are unjustified, is unjust in itself, and furthermore has harmed the country.”71 It was far more acceptable for practitioners to criticize “military applications” of AI (as opposed to militarism or imperialism), but here the objections were different.

pages: 362 words: 103,087

The Elements of Choice: Why the Way We Decide Matters
by Eric J. Johnson
Published 12 Oct 2021

When we understand speech fluently, it’s very easy for us to listen to the radio or a conversation. However, objectively, it is very hard for other, nonhuman intelligences. In the early days of artificial intelligence (AI), I sat in a windowless room with one hundred other people watching a demonstration of Hearsay, one of the first speech recognition systems. The crowd was amazed when the system understood, after several minutes, the words “Rook to King’s Knight 4,” a phrase from chess notation, the very limited domain that the system recognized. The Department of Defense’s Defense Advanced Research Projects Agency (DARPA) had given millions of dollars in contracts to Carnegie Mellon to get this to happen.
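Hearsay could succeed because its language was tiny: an utterance counted as recognized only if it fit the chess-move grammar. That constraint can be sketched as a pattern over word sequences; the mini-grammar below is a hypothetical simplification, not Hearsay's actual grammar:

```python
import re

# A hypothetical mini-grammar for spoken chess moves such as
# "Rook to King's Knight 4": piece, destination file, rank.
PIECE = r"(?:King|Queen|Rook|Bishop|Knight|Pawn)"
MOVE = re.compile(rf"{PIECE} to (?:King's|Queen's) (?:{PIECE} )?[1-8]")

def recognize(utterance):
    """Accept an utterance only if it fits the restricted chess-move grammar."""
    return MOVE.fullmatch(utterance) is not None

print(recognize("Rook to King's Knight 4"))    # True
print(recognize("open the pod bay doors"))     # False
```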

speech recognition, 42

pages: 105 words: 34,444

The Open Revolution: New Rules for a New World
by Rufus Pollock
Published 29 May 2018

For example, information is not Open if it is available only to those in the United States, or if it may not be used to make a profit – or even used for military purposes. Distasteful though it may sometimes be, universality is especially important to the idea of Openness. An inventor may not want his speech-recognition software to be used to power drones that bomb people. However, rather as if Apple issued an edict that its computers were not to be used for trolling on the internet or posting terrorist videos, this would be impractical and unpoliceable. The power of Openness, like that of freedom of speech, lies in its being available, whatever people wish to do with it.

The Big Score
by Michael S. Malone
Published 20 Jul 2021

Although not a single product of this type has been built, much less sold, anywhere in the world, excited industry analysts have already begun calling this new “speech recognition” business one of the fastest-growing electronics industries of the 1980s. Some have pegged annual sales at the end of the decade at more than $1 billion. These men have heard rumors that IBM is looking at the business, as are a number of Japanese firms. But the scientist at the table is one of the world’s leading experts in computer semantics. He has assured them that his newly designed algorithm, the basic equation of a speech-recognition computer program, is unmatched anywhere. His partners take the scientist’s statement on faith; after all, they are businessmen, not engineers.

Speech-recognition devices, 26–27

Data Mining: Concepts and Techniques: Concepts and Techniques
by Jiawei Han , Micheline Kamber and Jian Pei
Published 21 Jun 2011

What adaptations can be made to allow for more than two classes? This question is addressed in Section 9.7.1 on multiclass classification. What can we do if we want to build a classifier for data where only some of the data are class-labeled, but most are not? Document classification, speech recognition, and information extraction are just a few examples of applications in which unlabeled data are abundant. Consider document classification, for example. Suppose we want to build a model to automatically classify text documents like articles or web pages. In particular, we want the model to distinguish between hockey and football documents.

We have a vast number of documents available, yet the documents are not class-labeled. Recall that supervised learning requires a training set, that is, a set of class-labeled data. To have a human examine and assign a class label to individual documents (to form a training set) is time consuming and expensive. Speech recognition requires the accurate labeling of speech utterances by trained linguists. It was reported that 1 minute of speech takes 10 minutes to label, and annotating phonemes (basic units of sound) can take 400 times as long. Information extraction systems are trained using labeled documents with detailed annotations.
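One standard answer to the scarce-labels problem described above is self-training: fit a classifier on the labeled data, label the unlabeled points it is most confident about, and refit. A toy sketch on one-dimensional points with a nearest-centroid "classifier"; all data and names are hypothetical:

```python
def centroid_fit(labeled):
    """'Fit' a nearest-centroid classifier: one mean per class."""
    centroids = {}
    for cls in set(labeled.values()):
        points = [p for p, c in labeled.items() if c == cls]
        centroids[cls] = sum(points) / len(points)
    return centroids

def self_train(labeled, unlabeled):
    """Self-training: repeatedly label the unlabeled point the current
    model is most confident about, then refit on the enlarged set."""
    labeled = dict(labeled)        # {feature value: class label}
    pool = list(unlabeled)
    while pool:
        centroids = centroid_fit(labeled)
        # Most confident = closest to some class centroid
        best = min(pool, key=lambda x: min(abs(x - m) for m in centroids.values()))
        labeled[best] = min(centroids, key=lambda c: abs(best - centroids[c]))
        pool.remove(best)
    return labeled

seed = {0.0: "hockey", 10.0: "football"}     # two labeled documents (as 1-D features)
result = self_train(seed, [1.0, 9.0, 4.0])   # three unlabeled ones
print(result)
```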

Artificial Intelligence (AAAI’96), Vol. 1 (Aug. 1996), pp. 725–730. [RA87] Rissland, E.L.; Ashley, K., HYPO: A case-based system for trade secret law, In: Proc. 1st Int. Conf. Artificial Intelligence and Law Boston, MA. (May 1987), pp. 60–66. [Rab89] Rabiner, L.R., A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (1989) 257–286. [RBKK95] Russell, S.; Binder, J.; Koller, D.; Kanazawa, K., Local learning in probabilistic networks with hidden variables, In: Proc. 1995 Joint Int. Conf. Artificial Intelligence (IJCAI’95) Montreal, Quebec, Canada. (Aug. 1995), pp. 1146–1152. [RC07] Ramakrishnan, R.; Chen, B.

pages: 492 words: 118,882

The Blockchain Alternative: Rethinking Macroeconomic Policy and Economic Theory
by Kariappa Bheemaiah
Published 26 Feb 2017

A Chatbot is essentially a service, powered by rules and artificial intelligence (AI), that a user can interact with via a chat interface. The service could be anything ranging from functional to fun, and it could exist in any chat product (Facebook Messenger, Slack, Telegram, text messages, etc.). Recent advancements in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR), coupled with crowdsourced data inputs and machine learning techniques, now allow AIs not just to understand groups of words but also to produce a corresponding natural response to a grouping of words. That’s essentially the base definition of a conversation, except this conversation is with a “bot.”
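The rules half of such a service can be sketched in a few lines: patterns over groups of words mapped to canned responses, with a fallback when nothing matches. The rules and replies below are hypothetical:

```python
import re

# Hypothetical rule table: a pattern over groups of words -> canned response
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help you?"),
    (re.compile(r"\border\b.*\bstatus\b", re.I), "Your order shipped yesterday."),
    (re.compile(r"\b(bye|goodbye)\b", re.I), "Goodbye!"),
]
FALLBACK = "Sorry, I didn't catch that."

def reply(message):
    """Return the response of the first rule whose pattern matches."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return FALLBACK

print(reply("Hey there"))                 # Hello! How can I help you?
print(reply("What's my order status?"))   # Your order shipped yesterday.
```

A production bot would replace the regex layer with the NLP/ASR models the passage mentions, but the dispatch structure stays the same.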


pages: 415 words: 114,840

A Mind at Play: How Claude Shannon Invented the Information Age
by Jimmy Soni and Rob Goodman
Published 17 Jul 2017

If there can be said to have been an old boys’ club of Silicon Valley in its initial days, then Claude Shannon was a card-carrying member—and he benefited from all the privileges therein. The club benefited from Shannon as well, in his roles as network node and informal consultant. For instance, when Teledyne received an acquisition offer from a speech recognition company, Shannon advised Singleton to turn it down. From his own experience at the Labs, he doubted that speech recognition would bear fruit anytime soon: the technology was in its early stages, and during his time at the Labs, he’d seen much time and energy fruitlessly sunk into it. The years of counsel paid off, for Singleton and for Shannon himself: his investment in Teledyne achieved an annual compound return of 27 percent over twenty-five years.

pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms
by Mehmed Kantardzić
Published 2 Jan 2003

The main idea of the SOMs is to project the n-dimensional input data into some representation that could be better understood visually, for example, in a 2-D image map. The SOM algorithm is a heuristic model used not only to visualize, but also to explore linear and nonlinear relationships in high-dimensional data sets. SOMs were first used in the 1980s for speech-recognition problems, but later they became a very popular and often used methodology for a variety of clustering and classification-based applications. The problem that data visualization attempts to solve is that humans simply cannot visualize high-dimensional data, and SOM techniques are created to help us visualize and understand the characteristics of such high-dimensional data.

Finally, selection of distance measure is important as in any clustering algorithm. Euclidean distance is almost standard, but that does not mean that it is always the best. For an improved quality (isotropy) of the display, it is advisable to select the grid of the SOM units as hexagonal. The SOMs have been used in large spectrum of applications such as automatic speech recognition, clinical data analysis, monitoring of the condition of industrial plants and processes, classification from satellite images, analysis of genetic information, analysis of electrical signals from the brain, and retrieval from large document collections. Illustrative examples are given in Figure 7.18.
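The core SOM training loop the excerpt describes—find the best-matching unit by Euclidean distance, then pull it and its grid neighbours toward the input—can be sketched as follows. This is a minimal illustration (rectangular rather than hexagonal grid, linear decay schedules, invented parameter names), not the full algorithm from the book:

```python
import math
import random

def train_som(data, rows=4, cols=4, iters=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a tiny Self-Organizing Map. Each grid unit holds a weight
    vector; the best-matching unit (BMU) and its grid neighbours are
    pulled toward each input sample, with the learning rate and
    neighbourhood radius decaying over time."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(iters):
        x = rng.choice(data)
        # BMU: the unit whose weight vector is closest to x (Euclidean).
        bmu = min(weights, key=lambda u: sum((wi - xi) ** 2
                  for wi, xi in zip(weights[u], x)))
        lr = lr0 * (1 - t / iters)
        radius = radius0 * (1 - t / iters)
        for u, w in weights.items():
            grid_dist = math.hypot(u[0] - bmu[0], u[1] - bmu[1])
            if grid_dist <= radius:
                for i in range(dim):
                    w[i] += lr * (x[i] - w[i])  # move toward the sample
    return weights

def bmu_of(weights, x):
    """Map a new sample to its best-matching grid unit."""
    return min(weights, key=lambda u: sum((wi - xi) ** 2
               for wi, xi in zip(weights[u], x)))
```

After training, `bmu_of` projects any n-dimensional sample onto a 2-D grid coordinate, which is what makes the map useful for visualization.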

For example, with the MM shown in Figure 12.21, the probability that the MM takes the horizontal path from starting node to S2 is 0.4 × 0.7 = 0.28. Figure 12.21. A simple Markov Model. MM is derived based on the memoryless assumption. It states that given the current state of the system, the future evolution of the system is independent of its history. MMs have been used widely in speech recognition and natural language processing. Hidden Markov Model (HMM) is an extension to MM. Similar to MM, HMM consists of a set of states and transition probabilities. In a regular MM, the states are visible to the observer, and the state-transition probabilities are the only parameters. In HMM, each state is associated with a state-probability distribution.
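The 0.4 × 0.7 = 0.28 calculation generalizes directly: under the memoryless assumption, the probability of any state path is the product of the transition probabilities along it. A minimal sketch (state names and the transition table are placeholders echoing Figure 12.21, not the book's actual figure data):

```python
def path_probability(transitions, path):
    """Probability of following a given state path in a Markov model:
    the product of the transition probabilities along the path. The
    memoryless assumption means each step depends only on the current
    state, so the probabilities simply multiply."""
    prob = 1.0
    for src, dst in zip(path, path[1:]):
        prob *= transitions.get((src, dst), 0.0)
    return prob

# Hypothetical transition table mirroring the 0.4 * 0.7 example.
transitions = {("Start", "S1"): 0.4, ("S1", "S2"): 0.7}
```

A Hidden Markov Model keeps this machinery but adds a per-state emission distribution, so the path itself is no longer directly observable.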

pages: 394 words: 118,929

Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software
by Scott Rosenberg
Published 2 Jan 2006

But it was also clear that he was having a hard time pulling a plan together because his life outside OSAF had begun to fall apart almost immediately after he joined the Chandler team. His marriage broke up, he grew depressed, and he came down with a miserable case of repetitive stress injury that forced him off the keyboard and onto a speech-recognition dictation system. Over the holidays he decided to change his first name from David to Rys, and soon he began to change his appearance, too: He let his hair grow long and dyed it blond. He began wearing lipstick. He blogged about his interest in an “alternative lifestyle,” though he warned his readers not to “speculate incorrectly”: “You almost certainly are getting something wrong.

(To pass a Turing Test, typically conducted via the equivalent of instant messaging, a computer program must essentially fool human beings into believing that they are conversing with a person rather than a machine.) Taking the other side of the bet was Ray Kurzweil, a prolific inventor responsible for breakthroughs in electronic musical instruments and speech recognition who had more recently become a vigorous promoter of an aggressive species of futurism. Kurzweil’s belief in a machine that could ace the Turing Test was one part of his larger creed—that human history was about to be kicked into overdrive by the exponential acceleration of Moore’s Law and a host of other similar skyward-climbing curves.

pages: 170 words: 45,121

Don't Make Me Think, Revisited: A Common Sense Approach to Web Usability
by Steve Krug
Published 1 Jan 2000

New technologies and form factors are going to be introduced all the time, some of them involving dramatically different ways of interacting.8 8 Personally, I think talking to your computer is going to be one of the next big things. Recognition accuracy is already amazing; we just need to find ways for people to talk to their devices without looking, sounding, and feeling foolish. Someone who’s seriously working on the problems should give me a call; I’ve been using speech recognition software for 15 years, and I have a lot of thoughts about why it hasn’t caught on. Just make sure that usability isn’t being lost in the shuffle. And the best way to do this is by testing. Chapter 11. Usability as common courtesy WHY YOUR WEB SITE SHOULD BE A MENSCH1 1 Mensch: a German-derived Yiddish word originally meaning “human being.”

pages: 160 words: 45,516

Tomorrow's Lawyers: An Introduction to Your Future
by Richard Susskind
Published 10 Jan 2013

I must also record my thanks to the various referees who anonymously assessed my book proposal and made a wide range of suggestions that led, I believe, to many significant improvements. Next is Patricia Cato, who helped me with innumerable initial drafts and still comfortably outperforms any speech recognition system in making sense of my rapid Glaswegian. I have also benefited greatly from the guidance, encouragement, and criticisms of a small group of friends and colleagues who generously spent many hours of their time reading an early draft of the book—Neville Eisenberg, Hazel Genn, Daniel Harris, Laurence Mills, David Morley, Alan Paterson, and Tony Williams.

pages: 372 words: 152

The End of Work
by Jeremy Rifkin
Published 28 Dec 1994

The ambitious effort, which has been dubbed the RealWorld Program, will attempt to develop what the Japanese call "flexible information processing" or "Soft Logic," the kind of intuitive thinking that people use when they make decisions. 3 Using new computers equipped with massive parallel processing, neural networks, and optical signals, the Japanese hope to create a new generation of intelligent machines that can read text, understand complex speech, interpret facial gestures and expressions, and even anticipate behavior. Intelligent machines equipped with rudimentary speech recognition already exist. Companies like BBN Systems and Technologies of Cambridge, Massachusetts, and Dragon Systems of Newton, Massachusetts, have developed computers with vocabularies of up to 30,000 words. 4 Some of the new thinking machines can recognize casual speech, carry on meaningful conversations, and even solicit additional information upon which to make decisions, provide recommendations, and answer questions.

The robot's arm then selects the CD from the shelves and delivers it to the customer along with his or her receipt. A twenty-three-year-old who shops regularly at the store says he prefers the robot to a human salesperson. "It's easy to use and it can't talk back to you."47 More sophisticated robots equipped with speech recognition and conversational abilities will likely be commonplace in department stores, convenience stores, fast-food restaurants, and other retail and service businesses by the early part of the next century. A large European super-discounter is experimenting with a new electronic technology that allows the customer to insert his or her credit card into a slot on the shelf holding the desired product.

pages: 588 words: 131,025

The Patient Will See You Now: The Future of Medicine Is in Your Hands
by Eric Topol
Published 6 Jan 2015

Wardrop, “Doctors Told to Prescribe Smartphone Apps to Patients,” Telegraph, February 22, 2012, http://www.telegraph.co.uk/health/healthnews/9097647/Doctors-told-to-prescribe-smartphone-apps-to-patients.html. 97. S. Curtis, “Digital Doctors: How Mobile Apps Are Changing Healthcare,” Telegraph, December 4, 2013, http://www.telegraph.co.uk/technology/news/10488778/Digital-doctors-how-mobile-apps-are-changing-healthcare.html. 98. “How Speech-Recognition Software Got So Good,” The Economist, April 22, 2014, http://www.economist.com/node/21601175/print. 99. J. Conn, “IT Experts Push Translator Systems to Convert Doc-Speak into ICD-10 Codes,” Modern Healthcare, May 3, 2014, http://www.modernhealthcare.com/article/20140503/MAGAZINE/305039969/1246/. 100.

“When Waiting Is Not an Option,” The Economist, May 13, 2012, http://www.economist.com/node/21554157/print. 10. J. Weiner, “What Big Data Can’t Tell Us, but Kolstad’s Paper Suggests,” Penn LDI, April 24, 2014, http://ldi.upenn.edu/voices/2014/04/24/what-big-data-can-t-tell-us-but-kolstad-s-paper-suggests. 11. “How Speech-Recognition Software Got So Good,” The Economist, April 22, 2014, http://www.economist.com/node/21601175/print. 12. T. Lewin, “Master’s Degree Is New Frontier of Study Online,” New York Times, August 18, 2013, http://www.nytimes.com/2013/08/18/education/masters-degree-is-new-frontier-of-study-online.html. 13.

AI 2041: Ten Visions for Our Future
by Kai-Fu Lee and Qiufan Chen
Published 13 Sep 2021

These provide a natural source of supervision for machines to learn to translate languages. The AI can be trained from the simple pairing of, say, each of the millions of sentences in English with its professionally translated counterpart in French. Using this approach, supervised learning can be extended to speech recognition (converting speech into text), optical character recognition (converting handwriting or images into text), or speech synthesis (converting text to speech). For these types of natural language recognition tasks where supervised training is feasible, AI already outperforms most humans. A more complex application of NLP goes from recognition to understanding.

Zoom and other videoconferencing services will go down in history as the tools that kept the world turning during COVID-19. They made possible productive team meetings, joyful weddings, and active classrooms for millions of students. We can anticipate that in the near future, business meetings will be archived and transcribed by automatic speech recognition. This will make past meetings searchable, and help us track commitments, schedules, and possible anomalies, significantly improving business efficiency and management. Pervasive video communications in the future will also enable AI-based avatars. As we learned in chapter 2, generating a realistic video of my talking head is much easier than replicating a human in real life.

pages: 474 words: 130,575

Surveillance Valley: The Rise of the Military-Digital Complex
by Yasha Levine
Published 6 Feb 2018

In 1960, he published a paper that outlined his vision for the coming “man-computer symbiosis” and described in simple terms the kinds of computer components that needed to be invented to make it happen. The paper essentially described a modern multipurpose computer, complete with a display, keyboard, speech recognition software, networking capabilities, and applications that could be used in real time for a variety of tasks.27 It seems obvious to us now, but back then Lick’s ideas were visionary. His paper was widely circulated in defense circles and earned him an invitation by the Pentagon to do a series of lectures on the topic.28 “My first experience with computers had been listening to a talk by [mathematician John] von Neumann in Chicago back in nineteen forty-eight.

Clients included, according to Brand’s Media Lab, ABC, NBC, CBS, PBS, HBO, Warner Brothers, 20th Century Fox, and Paramount. IBM, Apple, Hewlett-Packard, Digital Equipment Corporation, Sony, NEC, Mitsubishi, and General Motors were also members, as were major newspapers and news publishing businesses: Time Inc., the Washington Post, and the Boston Globe. 92. Among other things, DARPA funded lab research on speech recognition technology that promised to identify people by their voices or to visually read their lips from a distance. 93. Todd Hertz, “How Computer Nerds Describe God,” Christianity Today, November 20, 2002. 94. “Wired was meant to be a lifestyle magazine as well as a technology guide,” writes John Cassidy in Dot.Con, a book about the dot-com bubble.

pages: 502 words: 132,062

Ways of Being: Beyond Human Intelligence
by James Bridle
Published 6 Apr 2022

Neural computation – the brain’s processing of the world – is low-speed and low-precision compared to a computer, but it’s also massively parallel and real-time, operating more like the flow of a river than the ticking of a clock. As our brains have evolved to interface with the world as it is, this suggests that solving ‘real world’ problems – route-finding, speech recognition, economics – is better addressed by computers which are more like the world too. The second point is that the water in the bucket isn’t ‘thinking’ or ‘remembering’ – but it is processing. It’s computing information. The form of this information isn’t like the ones or zeros that pass through digital machines (including the crab computer): it’s analogue, which rather than old or fuzzy means complex, knotty and continuous.

In appearance, Sophia is humanoid, taking the form of a torso on a plinth, her face modelled on a combination of the ancient Egyptian Queen Nefertiti, Audrey Hepburn and her inventor’s wife. The skin on the back of her head is peeled back to reveal wires and blinking lights beneath a translucent cranium. In 2018, she was upgraded with legs, enabling her to move around, and the ability to emulate more than sixty facial expressions. She uses speech recognition technology developed by Google, while automatic facial recognition means she can track interlocutors and hold their gaze. In practice, she is capable of responding to specific questions or phrases with pre-scripted answers, and has been unfavourably compared to ‘a chatbot with a face’. While knowledgeable about the weather and stock market prices, Sophia’s conversational and critical skills have been less than impressive.

pages: 901 words: 234,905

The Blank Slate: The Modern Denial of Human Nature
by Steven Pinker
Published 1 Jan 2002

This, Weizenbaum told us, was really designed to help the Pentagon come up with counterinsurgency strategies in Vietnam. The Vietcong had been said to “move in the jungle as fish move in water.” If the program were fed this information, he said, it could deduce that just as you can drain a pond to expose the fish, you can denude the jungle to expose the Vietcong. Turning to research on speech recognition by computer, he said that the only conceivable reason to study speech perception was to allow the CIA to monitor millions of telephone conversations simultaneously, and he urged the students in the audience to boycott the topic. But, he added, it didn’t really matter if we ignored his advice because he was completely certain—there was not the slightest doubt in his mind—that by the year 2000 we would all be dead.

The rumors of our death turned out to be greatly exaggerated, and the other prophecies of the afternoon fared no better. The use of analogy in reasoning, far from being the work of the devil, is today a major research topic in cognitive science and is widely considered a key to what makes us smart. Speech-recognition software is routinely used in telephone information services and comes packaged with home computers, where it has been a godsend for the disabled and for people with repetitive strain injuries. And Weizenbaum’s accusations stand as a reminder of the political paranoia and moral exhibitionism that characterized university life in the 1970s, the era in which the current opposition to the sciences of human nature took shape.


The Complete Android Guide: 3Ones
by Kevin Purdy
Published 15 Apr 2011

Everyone in the U.S. or Canada who calls 1-800-GOOG-411 (1-800-466-4411) and says the business they're looking for, along with city and state, can be connected for free, or have additional information sent by text message. As you might have guessed, Google has been using all that speech—in particular, the phonemes of regional dialects—and the search results they're connected to in order to build a pretty huge speech recognition database. It is far, far from perfect, but it's also surprisingly good at times. Alternate Keyboards HTC's Keyboard, Included by Default on Phones with "Sense" Interfaces Not every phone comes with Google's own keyboard installed as the default, and not every phone has to keep it around.

pages: 174 words: 56,405

Machine Translation
by Thierry Poibeau
Published 14 Sep 2017

Speech processing has been the subject of intensive research in recent decades, and performance is now acceptable. However, the task remains difficult since speech processing as well as machine translation have to be performed in real time, and errors are cumulative (i.e., if a word has not been properly analyzed by the speech recognition system, it will not be properly translated). Large companies producing connected tools (Apple, Google, Microsoft, or Samsung, to name a few) develop their own solutions and regularly buy start-ups in technological domains. They need to be first on the technological front and propose new features that may be an important source of revenue in the future.

pages: 523 words: 148,929

Physics of the Future: How Science Will Shape Human Destiny and Our Daily Lives by the Year 2100
by Michio Kaku
Published 15 Mar 2011

The translation won’t be perfect, since there are always problems with idioms, slang, and colorful expressions, but it will be good enough so you will understand the gist of what that person is saying. There are several ways in which scientists are making this a reality. The first is to create a machine that can convert the spoken word into writing. In the mid-1990s, the first commercially available speech recognition machines hit the market. They could recognize up to 40,000 words with 95 percent accuracy. Since a typical, everyday conversation uses only 500 to 1,000 words, these machines are more than adequate. Once the transcription of the human voice is accomplished, then each word is translated into another language via a computer dictionary.
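The word-by-word dictionary lookup described here is easy to picture in code. This is a deliberately naive toy (invented dictionary entries, unknown words passed through unchanged), which also shows exactly why idioms and word order defeat this approach:

```python
# Toy word-by-word translator: look each word up in a bilingual
# dictionary; pass unknown words through untranslated.
EN_TO_FR = {"the": "le", "cat": "chat", "sleeps": "dort"}  # toy entries

def translate(sentence: str) -> str:
    return " ".join(EN_TO_FR.get(w, w) for w in sentence.lower().split())
```

Real systems add grammar models and reordering on top of the lexicon, precisely because a per-word lookup cannot handle agreement, idiom, or syntax.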

(His detractors say that he is whipping up a near-religious fervor around the singularity. However, his supporters say that he has an uncanny ability to correctly see into the future, judging by his track record.) Kurzweil cut his teeth on the computer revolution by starting up companies in diverse fields involving pattern recognition, such as speech recognition technology, optical character recognition, and electronic keyboard instruments. In 1999, he wrote a best seller, The Age of Spiritual Machines: When Computers Exceed Human Intelligence, which predicted when robots will surpass us in intelligence. In 2005, he wrote The Singularity Is Near and elaborated on those predictions.

Beautiful Data: The Stories Behind Elegant Data Solutions
by Toby Segaran and Jeff Hammerbacher
Published 1 Jul 2009

Using data collected from the API servers, user profiles, and activity data from the site itself, we were able to construct a model for scoring applications that allowed us to allocate invitations to the applications deemed most useful to users. The Unreasonable Effectiveness of Data In a recent paper, a trio of Google researchers distilled what they have learned from trying to solve some of machine learning’s most difficult challenges. When discussing the problems of speech recognition and machine translation, they state that, “invariably, simple models and a lot of data trump more elaborate models based on less data.” I don’t intend to debate their findings; certainly there are domains where elaborate models are successful. Yet based on their experiences, there does exist a wide class of problems for which more data and simple models are better.

N-gram counts have this property: we can easily harvest a trillion words of naturally occurring text from the Web. On the other hand, labeled spelling corrections do not occur naturally, and thus we found only 40,000 of them. It is not a coincidence that the two most successful applications of natural language—machine translation and speech recognition—enjoy large corpora of examples available in the wild. In contrast, the task of syntactic parsing of sentences remains largely unrealized, in part because there is no large corpus of naturally occurring parsed sentences. It should be mentioned that our probabilistic data-driven methodology—maximize the probability over all candidates—is a special case of the rational data-driven methodology— maximize expected utility over all candidates.
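The "maximize the probability over all candidates" recipe can be shown in miniature with unigram counts. This sketch uses invented toy counts, not a real corpus, and names (`COUNTS`, `best_candidate`) are illustrative:

```python
from collections import Counter

# Toy unigram counts standing in for counts harvested from a large
# naturally occurring corpus.
COUNTS = Counter({"the": 5000, "they": 800, "then": 600, "thew": 2})

def best_candidate(candidates):
    """Return the candidate with the highest corpus probability,
    i.e. argmax over candidates of count(w) / total."""
    total = sum(COUNTS.values())
    return max(candidates, key=lambda w: COUNTS[w] / total)
```

With a trillion-word corpus the same argmax, extended to n-grams in context, is what makes data-driven spelling correction work.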

pages: 660 words: 141,595

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking
by Foster Provost and Tom Fawcett
Published 30 Jun 2013

Working through case studies (either in theory or in practice) of data science applications helps prime the mind to see opportunities and connections to new problems that could benefit from data science. For example, in the late 1980s and early 1990s, one of the largest phone companies had applied predictive modeling—using the techniques we’ve described in this book—to the problem of reducing the cost of repairing problems in the telephone network and to the design of speech recognition systems. With the increased understanding of the use of data science for helping to solve business problems, the firm subsequently applied similar ideas to decisions about how to allocate a massive capital investment to best improve its network, and how to reduce fraud in its burgeoning wireless business.


pages: 189 words: 57,632

Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future
by Cory Doctorow
Published 15 Sep 2008

Of course, the Singularity isn't just a conceit for noodling with in the pages of the pulps: it's the subject of serious-minded punditry, futurism, and even science. Ray Kurzweil is one such pundit-futurist-scientist. He's a serial entrepreneur who founded successful businesses that advanced the fields of optical character recognition (machine-reading) software, text-to-speech synthesis, synthetic musical instrument simulation, computer-based speech recognition, and stock-market analysis. He cured his own Type-II diabetes through a careful review of the literature and the judicious application of first principles and reason. To a casual observer, Kurzweil appears to be the star of some kind of Heinlein novel, stealing fire from the gods and embarking on a quest to bring his maverick ideas to the public despite the dismissals of the establishment, getting rich in the process.

pages: 202 words: 59,883

Age of Context: Mobile, Sensors, Data and the Future of Privacy
by Robert Scoble and Shel Israel
Published 4 Sep 2013

A free, government-published mobile app lets users post to a heat map that shows real-time air quality data in the same way that Libelium, the Spanish sensor company, provided radiation-level data in Japan after the Fukushima nuclear disaster. Teenybopper lockets. The iLocket from Dano is a $25 little heart-shaped locket that is connected to an Apple iOS mobile app. Targeted to young teens, iLocket lets users whisper their most personal secrets into an iPhone or iPad. The app uses speech recognition. Put in a favorite photo of a secret heartthrob and the app prints a photo that fits perfectly inside the locket. The killer part of the app is how it treats user privacy. Press the locket and the diary entries disappear and remain protected until the iLocket owner unlocks it by pressing the locket again to send a unique code to the iPad app.

Speaking Code: Coding as Aesthetic and Political Expression
by Geoff Cox and Alex McLean
Published 9 Nov 2012

The choice of the song “Daisy Bell” is explained on the site: “originally written by Harry Dacre in 1892, was made famous in 1962 by John Kelly, Max Mathews, and Carol Lochbaum as the first example of musical speech synthesis. In contrast to the 1962 version, Bicycle Built for 2,000 was synthesized with a distributed system of human voices from all over the world.” 120. 2001: A Space Odyssey (1968, dir. Stanley Kubrick, Metro-Goldwyn-Mayer). HAL is a computer capable of speech, speech recognition, facial recognition, natural language processing, lip reading, art appreciation, interpreting and reproducing emotional behaviors, reasoning, and playing chess. 121. The full story was on the Forumwarz blog but is no longer available. See http://en.wikipedia.org/wiki/Forumwarz.

Demystifying Smart Cities
by Anders Lisdorf

But while these problems are hard for humans and indeed show immense human skills, they are comparatively easy for artificial intelligence. None of these are really hard problems for an AI. This has to do with how AI works. Notice that one thing is common for all three examples: the goals are very clear. Chess, Jeopardy, and Go: you either win or you don't. This is similar to other successful AI applications like facial and speech recognition: you either recognize the person or you don't. As we saw previously, AI is good at finding solutions to types of problems where you have a very well-defined correct solution. If only human life were so simple. Virtually all of human life does not have a well-defined right or wrong. This is why management and self-help literature is filled with advice around how important it is to set goals.

pages: 288 words: 66,996

Travel While You Work: The Ultimate Guide to Running a Business From Anywhere
by Mish Slade
Published 13 Aug 2015

You do need data access for this, though. Speak and translate. Speak in your own language and Google Translate will display the translation, then the other person just taps on their own language to speak back to you. You can therefore have a complete Google-mediated conversation with someone – all you need is to trust Google's speech recognition not to mishear you and end up accidentally insulting their mother. Make full screen. Once you've translated a word or phrase, you can just go into the options and tap "make full screen" to show someone what you want to say without sending them scrambling for their reading glasses. Save to phrasebook.

pages: 391 words: 71,600

Hit Refresh: The Quest to Rediscover Microsoft's Soul and Imagine a Better Future for Everyone
by Satya Nadella , Greg Shaw and Jill Tracie Nichols
Published 25 Sep 2017

See also mobile phones; and specific products Smith, Brad, 3, 131, 170–71, 173, 189 SMS, 216 Snapchat, 193 Snapdeal, 33 Snow Crash (Stephenson), 143 Snowden, Edward, 172–73, 179–80 Social Connector, 137 social contract, 239 socioeconomic change, 12–13 software design, 27, 49 software engineering, 74 solar, wind, and tidal power, 43, 228 Sony, 28 Sony Pictures Entertainment, 169–70, 177, 179, 189 Soul of a New Machine, The (Kidder), 68 South Zone, 37, 115 sovereignty, 170 space exploration, 145–46 Spain, 215 spam filters, 158 speech recognition, 76, 89, 142, 150–51, 164 Spencer, Phil, 106–7 sports franchises, 15 spreadsheets, 143 SQL (structured query language), 26 SQL Server, 53, 55 Stallone, Sylvester, 44 Stanford University, 64 One Hundred Year Study, 208 Start-up of You, The (Hoffman), 233 Station Q, 162–63, 166 Stephenson, Neal, 143 string theory, 164 Studio D, 65–66 success leadership, 120 Sun Microsystems, 26–29, 54 Super Bowl, 4 supercomputers, 161 superconducting, 162–65 supply-chain operations, 103 Surface, 2, 129 Surface Hub, 89, 137 Surface Pro 3, 85 Surface Studio, 137 Svore, Krysta, 164–65 Sway, 121 Sweden, 44 Swisher, Kara, 138 Sydney Opera House, 98 symbiotic intelligence, 204 Synopsys, 25 Syria, 218 tablets, 45, 85, 134, 141.

pages: 245 words: 64,288

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy
by Pistono, Federico
Published 14 Oct 2012

Robots will eventually steal your job, but before them something else is going to jump in. In fact, it already has, in a much more pervasive way than any physical machine ever could. I am of course talking about computer programs in general. Automated Planning and Scheduling, Machine Learning, Natural Language Processing, Machine Perception, Computer Vision, Speech Recognition, Affective Computing, Computational Creativity, these are all fields of Artificial Intelligence that do not have to face the cumbersome issues that Robotics has to. It is much easier to enhance an algorithm than it is to build a better robot. A more accurate title for the book would have been “Machine intelligence and computer algorithms are already stealing your job, and they will do so ever more in the future” – but that was not exactly a catchy title.

The Ethical Algorithm: The Science of Socially Aware Algorithm Design
by Michael Kearns and Aaron Roth
Published 3 Oct 2019

The technical name for the algorithmic framework we have been describing is a generative adversarial network (GAN), and the approach we’ve outlined above indeed seems to be highly effective: GANs are an important component of the collection of techniques known as deep learning, which has resulted in qualitative improvements in machine learning for image classification, speech recognition, automatic natural language translation, and many other fundamental problems. (The Turing Award, widely considered the Nobel Prize of computer science, was recently awarded to Yoshua Bengio, Geoffrey Hinton, and Yann LeCun for their pioneering contributions to deep learning.) Fig. 21. Synthetic cat images created by a generative adversarial network (GAN), from https://ajolicoeur.wordpress.com/cats.
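The adversarial game the authors describe can be reduced to a toy example: a one-parameter "generator" and a logistic "discriminator" trained against each other on one-dimensional data. The following is a deliberately minimal sketch of that training dynamic, not the deep-network GANs used for image synthesis; every function and hyperparameter here is invented for illustration.

```python
import math
import random

# Toy illustration of the GAN minimax game, with hand-derived gradients.
# "Real" data ~ N(4, 1). The generator G(z) = theta + z tries to mimic it;
# the discriminator D(x) = sigmoid(a*x + b) tries to tell real samples
# from generated ones. All parameters here are illustrative assumptions.

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

theta = 0.0      # generator parameter; should drift toward the real mean
a, b = 0.1, 0.0  # discriminator parameters
lr, batch = 0.05, 16

for _ in range(2000):
    reals = [random.gauss(4.0, 1.0) for _ in range(batch)]
    fakes = [theta + random.uniform(-1.0, 1.0) for _ in range(batch)]

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    ga = gb = 0.0
    for xr in reals:
        d = sigmoid(a * xr + b)
        ga += (1.0 - d) * xr
        gb += (1.0 - d)
    for xf in fakes:
        d = sigmoid(a * xf + b)
        ga -= d * xf
        gb -= d
    a += lr * ga / (2 * batch)
    b += lr * gb / (2 * batch)

    # Generator step: descend -log D(fake), i.e. make fakes look "real".
    gt = sum(-(1.0 - sigmoid(a * xf + b)) * a for xf in fakes)
    theta -= lr * gt / batch

print(round(theta, 2))  # should end near the real mean of 4.0
```

The equilibrium is the point the text describes: when the generator's output distribution matches the real one, the discriminator can no longer find a useful decision boundary and its gradient signal vanishes.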

pages: 244 words: 66,599

Insanely Great: The Life and Times of Macintosh, the Computer That Changed Everything
by Steven Levy
Published 2 Feb 1994

You might even be set free from the keyboard, entering commands by speaking to the Navigator. What you see on the large, flat display screen will likely be in full color, high-definition, television-quality images, full pages of text, graphics, computer-generated animations. What you hear will incorporate high-fidelity sound, speech synthesis, and speech recognition. You will be able to work in several of these windows at any time, giving you the possibility to simultaneously compare, for example, the animated structural system of living cells with the animated network of a global economy. Or you might want to explore the depths of Zen philosophy in which beauty is in the details, comparing it with examples of the architectural details of the Parthenon from ancient Greece and then contrasting these ideas with the design details of a Japanese camera.

pages: 233 words: 64,702

China's Disruptors: How Alibaba, Xiaomi, Tencent, and Other Companies Are Changing the Rules of Business
by Edward Tse
Published 13 Jul 2015

Lee has long been one of China’s most popular microbloggers, with more than 50 million people following his Sina Weibo postings. Born in 1961, Lee moved to the United States with his family in the 1970s. After studying computing at Columbia University and earning a PhD from Carnegie Mellon University for research into speech recognition technology, he worked at some of the biggest names in American computing, including Apple, Silicon Graphics, and Microsoft, before moving to Beijing with Google. There he established Google China and built a huge following as a commentator on China-related technology and social issues. In 2009, at almost the same time as Yihaodian’s Yu Gang, he ended his multinational career to start Innovation Works.

pages: 296 words: 66,815

The AI-First Company
by Ash Fontana
Published 4 May 2021

Government funding for the field was cut in the seventies because developing a general purpose AI was deemed to be an intractable problem, prone to combinatorial explosion. This was more of a political than technological problem: the field overpromised and under-delivered. Corporations picked up the slack, developing the first robots, speech recognition systems, and language translators. Corporations anticipated AI intersecting with another trend: process automation. This trend was based on a factory management system developed in the late nineteenth century by an engineer named Frederick W. Taylor. The system increased efficiency by breaking down each step of a manufacturing process into specialized tasks, repeatedly performed by the same person.

pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives
by Steven Levy
Published 12 Apr 2011

By not requiring native speakers, Google was free to provide translations to the most obscure language pairs. “You can always translate French to English or English to Spanish, but where else can you translate Hindi to Danish or Finnish or Norwegian?” A long-term problem in computer science had been speech recognition—the ability of computers to hear and understand natural language. Google applied Och’s techniques to teaching its vast clusters of computers how to make sense of the things humans said. It set up a telephone number, 1-800-GOOG-411, and offered a free version of what the phone companies used to call directory assistance.

J., 140 Playboy, 153–54, 155 pornography, blocking, 54, 97, 108, 173, 174 Postini, 241 Pregibon, Daryl, 118–19 Premium Sunset, 109, 112–13, 115 privacy: and Book Settlement, 363 and browsers, 204–12, 336–37 and email, 170–78, 211–12, 378 and Google’s policies, 10, 11, 145, 173–75, 333–35, 337–40 and Google Street View, 340–43 and government fishing expeditions, 173 and interest-based ads, 263, 334–36 and security breach, 268 and social networking, 378–79, 383 and surveillance, 343 Privacy International, 176 products: beta versions of, 171 “dogfooding,” 216 Google neglect of, 372, 373–74, 376, 381 in GPS meetings, 6, 135, 171 machine-driven, 207 marketing themselves, 77, 372 speed required in, 186 Project Database (PDB), 164 property law, 6, 360 Python, 18, 37 Qiheng, Hu, 277 Queiroz, Mario, 230 Rainert, Alex, 373, 374 Rajaram, Gokul, 106 Rakowski, Brian, 161 Randall, Stephen, 153 RankDex, 27 Rasmussen, Lars, 379 Red Hat, 78 Reese, Jim, 181–84, 187, 195, 196, 198 Reeves, Scott, 153 Rekhi, Manu, 373 Reyes, George, 70, 148 Richards, Michael, 251 robotics, 246, 351, 385 Romanos, Jack, 356 Rosenberg, Jonathan, 159–60, 281 Rosenstein, Justin, 369 Rosing, Wayne, 44, 55, 82, 155, 158–59, 186, 194, 271 Rubin, Andy, 135, 213–18, 220, 221–22, 226, 227–30, 232 Rubin, Robert, 148 Rubinson, Barry, 20–21 Rubinstein, Jon, 221 Sacca, Chris, 188–94 Salah, George, 84, 128, 129, 132–33, 166 Salinger Group, The, 190–91 Salton, Gerard, 20, 24, 40 Samsung, 214, 217 Samuelson, Pamela, 362, 365 Sandberg, Sheryl, 175, 257 and advertising, 90, 97, 98, 99, 107 and customer support, 231 and Facebook, 259, 370 Sanlu Group, 297–98 Santana, Carlos, 238 Schillace, Sam, 201–3 Schmidt, Eric, 107, 193 and advertising, 93, 95–96, 99, 104, 108, 110, 112, 114, 115, 117, 118, 337 and antitrust issues, 345 and Apple, 218, 220, 236–37 and applications, 207, 240, 242 and Book Search, 350, 351, 364 and China, 267, 277, 279, 283, 288–89, 305, 310–11, 313, 386 and cloud computing, 201 and financial issues, 
69–71, 252, 260, 376, 383 and Google culture, 129, 135, 136, 364 and Google motto, 145 and growth, 165, 271 and IPO, 147–48, 152, 154, 155–57 on lawsuits, 328–29 and management, 4, 80–83, 110, 158–60, 165, 166, 242, 254, 255, 273, 386, 387 and Obama, 316–17, 319, 321, 346 and privacy, 175, 178, 383 and public image, 328 and smart phones, 216, 217, 224, 236 and social networking, 372 and taxes, 90 and Yahoo, 344, 345 and YouTube, 248–49, 260, 265 Schrage, Elliot, 285–87 Schroeder, Pat, 361 search: decoding the intent of, 59 failed, 60 freshness in, 42 Google as synonymous with, 40, 41, 42, 381 mobile, 217 organic results of, 85 in people’s brains, 67–68 real-time, 376 sanctity of, 275 statelessness of, 116, 332 verticals, 58 see also web searches search engine optimization (SEO), 55–56 search engines, 19 bigram breakage in, 51 business model for, 34 file systems for, 43–44 and hypertext link, 27, 37 information retrieval via, 27 and licensing fees, 77, 84, 95, 261 name detection in, 50–52 and relevance, 48–49, 52 signals to, 22 ultimate, 35 upgrades of, 49, 61–62 Search Engine Watch, 102 SearchKing, 56 SEC regulations, 149, 150–51, 152, 154, 156 Semel, Terry, 98 Sengupta, Caesar, 210 Seti, 65–67 Shah, Sonal, 321 Shapiro, Carl, 117 Shazeer, Noam, 100–102 Sheff, David, 153 Sherman Antitrust Act, 345 Shriram, Ram, 34, 72, 74, 79 Siao, Qiang, 277 Sidekick, 213, 226 signals, 21–22, 49, 59, 376 Silicon Graphics (SGI), 131–32 Silverstein, Craig, 13, 34, 35, 36, 43, 78, 125, 129, 139 Sina, 278, 288, 302 Singh, Sanjeev, 169–70 Singhal, Amit, 24, 40–41, 48–52, 54, 55, 58 Siroker, Dan, 319–21 skunkworks, 380–81 Skype, 233, 234–36, 322, 325 Slashdot, 167 Slim, Carlos, 166 SMART (Salton’s Magical Retriever of Text), 20 smart phones, 214–16, 217–22 accelerometers on, 226–28 carrier contracts for, 230, 231, 236 customer support for, 230–31, 232 direct to consumer, 230, 232 Nexus One, 230, 231–32 Smith, Adam, 360 Smith, Bradford, 333 Smith, Christopher, 284–86 Smith, Megan, 141, 
158, 184, 258, 318, 350, 355–56 social graph, 374 social networking, 369–83 Sogou, 300 Sohu, 278, 300 Sony, 251, 264 Sooner (mobile operating system), 217, 220 Southworth, Lucinda, 254 spam, 53–57, 92, 241 Spector, Alfred, 65, 66–67 speech recognition, 65, 67 spell checking, 48 Spencer, Graham, 20, 28, 201, 375 spiders, 18, 19 Stanford University: and BackRub, 29–30 and Book Search, 357 Brin in, 13–14, 16, 17, 28, 29, 34 computer science program at, 14, 23, 27, 32 Digital Library Project, 16, 17 and Google, 29, 31, 32–33, 34 and MIDAS, 16 Page in, 12–13, 14, 16–17, 28, 29, 34 and Silicon Valley, 27–28 Stanley (robot), 246, 385 Stanton, Katie, 318, 321, 322, 323–25, 327 Stanton, Louis L., 251 State Department, U.S., 324–25 Steremberg, Alan, 18, 29 Stewart, Jon, 384 Stewart, Margaret, 207 Stricker, Gabriel, 186 Sullivan, Danny, 102 Sullivan, Stacy, 134, 140, 141, 143–44, 158–59 Summers, Larry, 90 Sun Microsystems, 28, 70 Swetland, Brian, 226, 228 Taco Town, 377 Tan, Chade-Meng, 135–36 Tang, Diane, 118 Taylor, Bret, 259, 370 Teetzel, Erik, 184, 197 Tele Atlas, 341 Tesla, Nikola, 13, 32, 106 Thompson, Ken, 241 3M, 124 Thrun, Sebastian, 246, 385–86 T-Mobile, 226, 227, 230 Tseng, Erick, 217, 227 Twentieth Century Fox, 249 Twitter, 309, 322, 327, 374–77, 387 Uline, 112 Universal Music Group, 261 Universal Search, 58–60, 294, 357 University of Michigan, 352–54, 357 UNIX, 54, 80 Upson, Linus, 210, 211–12 Upstartle, 201 Urchin Software, 114 users: in A/B tests, 61 data amassed about, 45–48, 59, 84, 144, 173–74, 180, 185, 334–37 feedback from, 65 focus on, 5, 77, 92 increasing numbers of, 72 predictive clues from, 66 and security breach, 268, 269 U.S.

pages: 619 words: 177,548

Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity
by Daron Acemoglu and Simon Johnson
Published 15 May 2023

In surveys, support for democracy goes together with a disdain for overbearing experts, and those who believe in democracy do not want to cede political voice in favor of the experts and their priorities. Such diversity is often maligned by experts who argue that regular people cannot provide valuable inputs into highly technical matters. We are not advocating that there should be a set of citizens from all backgrounds deciding the laws of thermodynamics or the best way to design speech-recognition algorithms. Rather, different technology choices—for example, on algorithms, financial products, and how we use the laws of physics—tend to have distinct social and economic consequences, and everybody should have a say on whether we find these consequences desirable or even acceptable. When a company decides to develop face-recognition technology to track the faces in a crowd, to better market products to them or to make sure that people do not participate in protests, their engineers are best placed to decide how to design the software.

Tracking this indelible footprint, we can see that AI investments and the hiring of AI specialists concentrate in organizations that rely on tasks that can be performed by these technologies, such as actuarial and accounting functions, procurement and purchasing analysis, and various other clerical jobs that involve pattern recognition, computation, and basic speech recognition. However, the same organizations also lower their overall hiring substantially—for example, reducing their postings for all sorts of other positions. Indeed, the evidence indicates that AI so far has been predominantly focused on automation. Moreover, claims that AI and RPAs are expanding into nonroutine, higher-skilled tasks notwithstanding, most of the burden of AI automation to date has fallen on less-educated workers, already disadvantaged by earlier forms of digital automation.

pages: 274 words: 75,846

The Filter Bubble: What the Internet Is Hiding From You
by Eli Pariser
Published 11 May 2011

What Translate is doing with foreign languages Google aims to do with just about everything. Cofounder Sergey Brin has expressed his interest in plumbing genetic data. Google Voice captures millions of minutes of human speech, which engineers are hoping they can use to build the next generation of speech recognition software. Google Research has captured most of the scholarly articles in the world. And of course, Google’s search users pour billions of queries into the machine every day, which provide another rich vein of cultural information. If you had a secret plan to vacuum up an entire civilization’s data and use it to build artificial intelligence, you couldn’t do a whole lot better.

pages: 381 words: 78,467

100 Plus: How the Coming Age of Longevity Will Change Everything, From Careers and Relationships to Family And
by Sonia Arrison
Published 22 Aug 2011

Incredibly prolific, Kurzweil was the principal developer of the first print-to-speech reading machine for the blind, the first CCD flat-bed scanner, the first text-to-speech synthesizer, the first music synthesizer capable of re-creating the grand piano and other orchestral instruments, and the first commercially marketed, large-vocabulary speech recognition technology.17 That is, it’s safe to say that he is good at collecting information and translating it into usable ideas and products. In his book The Singularity Is Near, Kurzweil discusses how exponentially growing technology will have many important effects, such as pushing the growth of AI that will help humans solve longevity problems.

pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More
by Luke Dormehl
Published 4 Nov 2014

Capable of scanning through political speeches in real time and informing us of when we are being lied to, the TruthTeller is an uncomfortable reminder of both our belief in algorithmic objectivity and our desire for easy answers. In a breathless article, Geek.com described it as “the most robust, automated way to tell whether a politician is lying or not, even more [accurate] than a polygraph test . . . because politicians are so delusional they end up genuinely believing their lies.” The algorithm works by using speech recognition technology developed by Microsoft, which converts audio signals into words, before handing the completed transcript over to a matching algorithm to comb through and compare alleged “facts” to a database of previously recorded, proven facts.35 Imagine the potential for manipulation should such a technology ever ascend beyond simple gimmickry to enjoy the ubiquity of, for instance, automated spell-checking algorithms.
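The pipeline described here, transcribe speech, then match the transcript's claims against a database of previously verified statements, can be sketched with a simple token-overlap matcher. This is a hypothetical illustration of the matching stage only, not the Post's actual algorithm; the speech-to-text step is assumed to have already produced a transcript, and the facts and threshold below are invented.

```python
# Hypothetical sketch of the matching stage of a TruthTeller-style checker:
# each transcribed claim is compared to a database of verified statements
# by token overlap (Jaccard similarity). Database contents and the 0.5
# threshold are illustrative assumptions.

def tokens(text):
    out = set()
    for w in text.split():
        w = w.strip(".,!?;:\"'").lower()
        if w:
            out.add(w)
    return out

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

FACT_DB = {
    "unemployment fell to five percent last year": True,
    "the budget deficit doubled over the last decade": False,
}

def check_claim(claim, threshold=0.5):
    """Return (verdict, matched fact) or (None, None) if nothing matches."""
    claim_tokens = tokens(claim)
    best, best_score = None, 0.0
    for fact in FACT_DB:
        score = jaccard(claim_tokens, tokens(fact))
        if score > best_score:
            best, best_score = fact, score
    if best is not None and best_score >= threshold:
        return FACT_DB[best], best
    return None, None

verdict, match = check_claim("Unemployment fell to five percent last year.")
print(verdict, "|", match)  # → True | unemployment fell to five percent last year
```

A matcher this crude is exactly why the passage's worry is apt: the verdict is only as good as the curated database behind it, and a near-miss phrasing either matches the wrong "fact" or nothing at all.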

pages: 232 words: 72,483

Immortality, Inc.
by Chip Walter
Published 7 Jan 2020

His many inventions included the K250 music synthesizer, also known as the Kurzweil piano, a machine that could flawlessly imitate the sound of a grand piano (and just about any other instrument); hardware and software that created the first flatbed scanners; the first machines to read and synthesize written words for the blind; some of the earliest speech-recognition software; and a variety of other futuristic and humanitarian technologies. People take these advancements for granted now, but they didn’t before Kurzweil created them. For his achievements, President Clinton awarded him the 1999 National Medal of Technology and Innovation, 14 years before Levinson received his medal and eight years before Venter got his.

pages: 284 words: 75,744

Death Glitch: How Techno-Solutionism Fails Us in This Life and Beyond
by Tamara Kneese
Published 14 Aug 2023

Bowles, “Thermostats, Locks and Lights.” 29. Maalsen and Sadowski, “The Smart Home on FIRE.” 30. Barassi, Child Data Citizen. 31. Conner, “10 Weirdest Things.” 32. Wilson, “Alexa’s Creepy Laughter.” 33. Canales, “Siri, Cortana, and Alexa.” For more on the problems with virtual assistants and speech recognition technology in relation to gender and disability, see Alper, “Talking Like a Princess.” 34. Kidder, The Soul of a New Machine. The computer historian Thomas Haigh published a brief, nostalgic note about Kidder’s book, pointing out that it is “unashamedly masculine” in its presentation; the lone woman on the team is hardly mentioned.

pages: 677 words: 206,548

Future Crimes: Everything Is Connected, Everyone Is Vulnerable and What We Can Do About It
by Marc Goodman
Published 24 Feb 2015

From Japan to Europe and the United States, there are unprecedented amounts of research-and-development dollars flowing into robotics. Admittedly, some of these new developments sound like something out of a Philip K. Dick novel. For instance, nanny bots have already been launched in South Korea and Japan. They can play games and carry out limited conversations with speech recognition. Many use the robot’s eyes to transmit live video of your children to your computer or smart phone. NEC’s PaPeRo robot nanny also allows you to speak with your children directly or via text messages, which the robot can read to your child, and SoftBank’s Pepper proclaims that “it can read your child’s emotions and facial expressions and respond appropriately.”

NOAM CHOMSKY When the computer scientist John McCarthy coined the term “artificial intelligence” in 1956, he defined it succinctly as “the science and engineering of making intelligent machines.” Today artificial intelligence (AI) more broadly refers to the study and creation of information systems capable of performing tasks that resemble human problem-solving capabilities, using computer algorithms to do things that would normally require human intelligence, such as speech recognition, visual perception, and decision making. These computers and software agents are not self-aware or intelligent in the way people are; rather, they are tools that carry out functionalities encoded in them and inherited from the intelligence of their human programmers. This is the world of narrow or weak AI, and it surrounds us daily.

pages: 720 words: 197,129

The Innovators: How a Group of Inventors, Hackers, Geniuses and Geeks Created the Digital Revolution
by Walter Isaacson
Published 6 Oct 2014

and it will have no clue, even though a toddler could tell you, after a bit of giggling.11 At Applied Minds near Los Angeles, you can get an exciting look at how a robot is being programmed to maneuver, but it soon becomes apparent that it still has trouble navigating an unfamiliar room, picking up a crayon, and writing its name. A visit to Nuance Communications near Boston shows the wondrous advances in speech-recognition technologies that underpin Siri and other systems, but it’s also apparent to anyone using Siri that you still can’t have a truly meaningful conversation with a computer, except in a fantasy movie. At the Computer Science and Artificial Intelligence Laboratory of MIT, interesting work is being done on getting computers to perceive objects visually, but even though the machine can discern pictures of a girl with a cup, a boy at a water fountain, and a cat lapping up cream, it cannot do the simple abstract thinking required to figure out that they are all engaged in the same activity: drinking.

P., ref1 Snow White, ref1 Snyder, Betty, ref1, ref2, ref3 ENIAC’s glitch fixed by, ref1 and public display of ENIAC, ref1 social networking, ref1, ref2 software, ref1 open-source, ref1, ref2 Software Publishing Industry, ref1n Sokol, Dan, ref1 solid circuit, ref1 solid-state physics, ref1, ref2, ref3 Solomon, Les, ref1, ref2 Somerville, Mary, ref1, ref2 sonic waves, ref1 Source, ref1, ref2, ref3 source code, ref1 Soviet Union, ref1, ref2 space program, ref1 Spacewar, ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9 speech-recognition technologies, ref1 Spence, Frances Bilas, see Bilas, Frances Spencer, Herbert, ref1 Sperry Rand, ref1, ref2, ref3n, ref4 Sputnik, ref1, ref2, ref3, ref4 SRI, ref1 Srinivasan, Srinija, ref1 Stallman, Richard, ref1, ref2, ref3, ref4, ref5, ref6 Stanford Artificial Intelligence Lab (SAIL), ref1, ref2, ref3, ref4 Stanford Industrial Park, ref1 Stanford Linear Accelerator Center, ref1 Stanford Research Institute, ref1, ref2, ref3 Stanford Research Park, ref1 steam engine, ref1 stepped reckoner, ref1 Stevenson, Adlai, ref1 Stibitz, George, ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9, ref10 store-and-forward switching, ref1 Strachey, Oliver, ref1 Strategic Air Command, ref1, ref2 subroutines, ref1, ref2 in ENIAC, ref1 of video games, ref1 Suess, Randy, ref1 “Summit Ridge Drive,” ref1 Sun Microsystem, ref1, ref2 surface states, ref1 Sutherland, Ivan, ref1, ref2, ref3, ref4, ref5 Swarthmore University, ref1 Swimming Across (Grove), ref1 Switchboards, ref1 “Symbolic Analysis of Relay and Switching Circuits, A,” ref1 symbolic logic, ref1, ref2, ref3, ref4, ref5, ref6, ref7 Symbolics, ref1 Syzygy, ref1 Tanenbaum, Andrew, ref1 Taylor, Bob, ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9, ref10 ARPA funding raised by, ref1 Internet designed as decentralized by, ref1 Kleinrocker criticized by, ref1 on nuclear weapons myth of Internet origin, ref1 On Distributed Communications read by, ref1 online communities and, ref1 PARC 
leadership style of, ref1 personality of, ref1, ref2 recruited to PARC, ref1 Robert’s hiring suggested by, ref1 TCP/IP protocols, ref1, ref2 Teal, Gordon, ref1 teamwork, innovation and, ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9 Tech Model Railroad Club, ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8 technotribalism, ref1 Teitelbaum, Ruth Lichterman, see Lichterman, Ruth Tektronix, ref1 Teledyne, ref1, ref2 telephones, ref1, ref2 television, ref1 Teller, Edward, ref1, ref2, ref3 Tennis for Two, ref1 Terman, Fred, ref1n, ref2, ref3, ref4, ref5 Terman, Lewis, ref1 Terrell, Paul, ref1 Tesla, Nikola, ref1, ref2 Texas Instruments, ref1, ref2, ref3, ref4, ref5, ref6, ref7 military microchips by, ref1 textiles, ref1 Thacker, Chuck, ref1, ref2, ref3 “That’s All Right,” ref1 Third Wave, The (Toffler), ref1 Thompson, Clive, ref1 Time, ref1, ref2, ref3, ref4, ref5, ref6, ref7 Time Inc., ref1, ref2 Times (London), ref1 time-sharing, ref1, ref2, ref3, ref4, ref5, ref6 Time Warner, ref1 Tocqueville, Alexis de, ref1, ref2 Toffler, Alvin, ref1 Tolkien, J.

pages: 305 words: 79,303

The Four: How Amazon, Apple, Facebook, and Google Divided and Conquered the World
by Scott Galloway
Published 2 Oct 2017

Echo is the speaker-like cylinder, and Alexa is its artificial intelligence, named after the library of Alexandria.40 Alexa is designed to operate like a personal communicator, enabling the user to call up music, search the web, and get answers to questions. Most of all, it takes gathering to the next level by ordering through powerful speech recognition software. Say, “Alexa, add Sensodyne to shopping cart” or (such a pain) push a Trojan Condoms Dash button41—and in an hour or less, it’s go time. And Alexa gets smarter every time you use it. That’s what the customer gets. For Amazon, the reward is greater: Amazon’s customers trust it so much that they’re allowing the company to listen in on their conversations and harvest their consumption data.

Toast
by Stross, Charles
Published 1 Jan 2002

We live in a world which, by the metrics of Victorian industrial consumption, is poverty stricken; nevertheless, we are richer than ever before. Apply our own metrics to the Victorian age and they appear poor. The definition of what is valuable changes over time, and with it change our social values. As AI and computer speech recognition pioneer Raymond Kurzweil pointed out in The Age of Spiritual Machines, the first decade of the twenty-first century will see more change than the latter half of the twentieth. To hammer the last nail into the coffin of predictive SF, our personal values are influenced by our social environment.

pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It)
by Salim Ismail and Yuri van Geest
Published 17 Oct 2014

After allowing it to browse ten million randomly selected YouTube video thumbnails for three days, the network began to recognize cats, without actually knowing the concept of “cats.” Importantly, this was without any human intervention or input. In the two years since, the capabilities of Deep Learning have improved considerably. Today, in addition to improving speech recognition, creating a more effective search engine (Ray Kurzweil is working on this within Google) and identifying individual objects, Deep Learning algorithms can also detect particular episodes in videos and even describe them in text, all without human input. Deep Learning algorithms can even play video games by figuring out the rules of the game and then optimizing performance.

pages: 304 words: 80,143

The Autonomous Revolution: Reclaiming the Future We’ve Sold to Machines
by William Davidow and Michael Malone
Published 18 Feb 2020

A second, but just as important, benefit of all this was that the new technologies replaced muscle power with mechanical power, freeing workers from much of the backbreaking labor that ruined bodies and shortened life expectancies. Non-monetizable productivity has different effects. The technologies driving it are intelligent machines, artificial intelligence, high bandwidth communication, low-cost sensors, lifelike visual displays, and speech recognition and control. These technologies can substitute for the human brain and mind, the senses of vision, touch, and smell, and give us new ways to interact with people and our surroundings. These technologies greatly increase the output of goods and services even as they reduce costs. Substitutional equivalents of some products and services are being produced and scaled with almost zero cost.

pages: 286 words: 87,401

Blitzscaling: The Lightning-Fast Path to Building Massively Valuable Companies
by Reid Hoffman and Chris Yeh
Published 14 Apr 2018

If you wait for an innovation to make its way into the English-language press, perhaps because a Silicon Valley company is now doing it, you might be giving China’s blitzscalers a one-year head start on the global market. The biggest opportunity is for Silicon Valley and China to work together and combine their respective strengths. According to Andrew Ng, it took a combination of ideas from both sides of the Pacific to drive breakthrough progress in speech recognition. Silicon Valley companies like Nvidia provided the graphics processing units (GPUs) to power machine learning networks, while progress came from combining Silicon Valley’s expertise in GPU programming with China’s expertise in supercomputing. As of November 2016, the world’s most powerful supercomputer was the Sunway TaihuLight at the National Supercomputing Center in Wuxi, China, while number two was the Tianhe-2.

pages: 301 words: 85,263

New Dark Age: Technology and the End of the Future
by James Bridle
Published 18 Jun 2018

‘HP cameras are racist’, YouTube video, username: wzamen01, December 10, 2009. 14.David Smith, ‘“Racism” of early colour photography explored in art exhibition’, Guardian, January 25, 2013, theguardian.com. 15.Phillip Martin, ‘How A Cambridge Woman’s Campaign Against Polaroid Weakened Apartheid’, WGBH News, December 9, 2013, news.wgbh.org. 16.Hewlett-Packard, ‘Global Citizenship Report 2009’, hp.com. 17.Trevor Paglen, ‘re:publica 2017 | Day 3 – Livestream Stage 1 – English’, YouTube video, username: re:publica, May 10, 2017. 18.Walter Benjamin, ‘Theses on the Philosophy of History’, in Walter Benjamin: Selected Writings, Volume 4: 1938–1940, Cambridge, MA: Harvard University Press, 2006. 19.PredPol, ‘5 Common Myths about Predictive Policing’, predpol.com. 20.G. O. Mohler, M. B. Short, P. J. Brantingham, et al., ‘Self-exciting point process modeling of crime’, JASA 106 (2011). 21.Daniel Jurafsky and James H. Martin, Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edition, Upper Saddle River, NJ: Prentice Hall, 2009. 22.Walter Benjamin, ‘The Task of the Translator’, in Selected Writings Volume 1 1913–1926, Marcus Bullock and Michael W. Jennings, eds, Cambridge, MA and London: Belknap Press, 1996. 23.Murat Nemet-Nejat, ‘Translation: Contemplating Against the Grain’, Cipher, 1999, cipherjournal.com. 24.Tim Adams, ‘Can Google break the computer language barrier?’

pages: 245 words: 83,272

Artificial Unintelligence: How Computers Misunderstand the World
by Meredith Broussard
Published 19 Apr 2018

Data-driven decisions rarely fit with these complex sets of rules. The same unreasonable effectiveness of data appears in translation, voice-controlled smart home gadgets, and handwriting recognition. Words and word combinations are not understood by machines the way that humans understand them. Instead, statistical methods for speech recognition and machine translation rely on vast databases full of short word sequences, or n-grams, and probabilities. Google has been working on these problems for decades and has the best scientific minds on these topics, and they have more data than anyone has ever before assembled. The Google Books corpus, the New York Times corpus, the corpus of everything everyone has ever searched for using Google: it turns out that when you load all of this in and assemble a massive database of how often words occur near each other, it’s unreasonably effective.
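The statistical n-gram approach Broussard describes can be sketched in a few lines of Python. This is a toy illustration of the general technique, not Google's actual pipeline: count adjacent word pairs (2-grams) in a small corpus and turn the counts into conditional probabilities for the next word.

```python
from collections import Counter, defaultdict

def bigram_model(corpus):
    """Count adjacent word pairs (2-grams) and convert the counts
    into conditional probabilities P(next_word | word)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        model[prev] = {w: c / total for w, c in nxt_counts.items()}
    return model

corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
]
model = bigram_model(corpus)
# "cat" follows "the" in 2 of the 4 places where "the" precedes a word
print(model["the"]["cat"])  # 0.5
```

Scaled up from two sentences to billions, this is the "unreasonably effective" counting the excerpt refers to: no understanding of meaning, just how often words occur near each other.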

pages: 321

Finding Alphas: A Quantitative Approach to Building Trading Strategies
by Igor Tulchinsky
Published 30 Sep 2019

International Journal of Central Banking 5, no. 4: 177–199. Piotroski, J. (2000) “Value Investing: The Use of Historical Financial Information to Separate Winners from Losers.” Journal of Accounting Research 38: 1–41. Rabiner, L. (1989) “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” Proceedings of the IEEE 77, no. 2: 257–286. Rodgers, J. and Nicewander, W.A. (1988) “Thirteen Ways to Look at the Correlation Coefficient.” American Statistician 42, no. 1: 59–66. Rosenberg, B., Reid, K., and Lanstein, R. (1985) “Persuasive Evidence of Market Inefficiency.” Journal of Portfolio Management 11, no. 3: 9–17.

pages: 276 words: 81,153

Outnumbered: From Facebook and Google to Fake News and Filter-Bubbles – the Algorithms That Control Our Lives
by David Sumpter
Published 18 Jun 2018

The important breakthrough of Alex’s research was that it demonstrated that convolutional neural networks could learn to solve a problem without having to be ‘told’ which problem it was they were solving. It soon became apparent that the same approach beat its competitors on handwriting and speech-recognition tasks. It could be used to recognise actions in short films and predict what was going to happen next. This was why convolutional neural networks that played Atari games were so much more exciting than the algorithm that had beaten Kasparov at chess. When Deep Blue won, the researchers had established that a computer could beat a human at an esoteric game.

pages: 263 words: 81,527

The Mind Is Flat: The Illusion of Mental Depth and the Improvised Mind
by Nick Chater
Published 28 Mar 2018

Computational intelligence has instead taken a very different tack: focusing on problems, like chess or arithmetic, that require no free interpretation at all, but which can be reduced to vast sequences of calculations, performed at lightning speed. In addition it has proved to be invaluable for things like speech recognition, machine translation and general knowledge tests, hoovering up solutions to almost unimaginably vast quantities of past problems to enable the machine to solve new problems, which are only a little different.6 Yet what is astonishing about human intelligence, and perhaps biological intelligence more broadly, is its spectacular flexibility.

pages: 295 words: 84,843

There's a War Going on but No One Can See It
by Huib Modderkolk
Published 1 Sep 2021

‘But for that, he’d certainly have been convicted. That’s a shocking discovery.’ When I set out all the details Robin gave me, he also immediately suspects an intelligence agency, and more specifically intelligence in the UK. ‘The British GCHQ taps dozens of fibre-optic cables and has experience with speech recognition software going back to 1981,’ Pluijmers explains. ‘Plus, they have a major stake in automating identification of call content and metadata. They might send Appen tapped calls to improve its software.’ That makes sense. And by choosing an Australian company rather than a British one, outwardly there’s less cause for suspicion.

pages: 328 words: 84,682

The Business of Platforms: Strategy in the Age of Digital Competition, Innovation, and Power
by Michael A. Cusumano , Annabelle Gawer and David B. Yoffie
Published 6 May 2019

When Amazon introduced the Echo speaker device with Alexa software in late 2014, it set in motion a war for platform domination among Google, Apple, Microsoft, Alibaba, Tencent, and a host of start-ups. Amazon’s strategy was to create a new platform powered by a combination of Amazon Web Services, speech recognition, and high-quality speech synthesis. CEO Jeff Bezos sought to bundle the technology with an affordable piece of dedicated hardware. Immediately identifying the potential for cross-side network effects, Amazon launched its Alexa Skills Kit (ASK)—a collection of self-service APIs and tools that made it easy for third-party developers to create new Alexa apps.

pages: 277 words: 81,718

Vassal State
by Angus Hanton
Published 25 Mar 2024

And US defence spending also created Kevlar, Teflon, better weather forecasting and nuclear power, each of which has been used widely in civilian life. But that is tip-of-the-iceberg stuff: military research has also empowered the big US tech companies through the sponsorship by the Department of Defense (DoD) of high-capacity batteries, digital cameras, speech recognition, faster computers, virtual reality, biometrics and GPS. Going back to 1969, military researchers also contributed to the ARPANET, the precursor to the internet. In 1983, the ARPANET was split into two networks: one for the military and one for civilian use, so that the word ‘internet’ had its first use as a way to describe the interlinking of these two early networks.

pages: 299 words: 91,839

What Would Google Do?
by Jeff Jarvis
Published 15 Feb 2009

I can imagine it using us to create a vast repository of our reviews and recommendations about establishments (“leave your review after the tone” or “rate the restaurant using your keypad”). Google may find yet another side door to make money. Tech publisher Tim O’Reilly speculated on his blog that Google wants to gather billions of voice samples as we ask for listings. That will make its speech recognition smarter, helping it get ready for the day when phones and computers respond to voice commands. Chris Anderson, editor of Wired magazine, projected that by 2012, Google could make $144 million in fees from users if it charged for directory assistance, but by foregoing that revenue it could instead make $2.5 billion in the voice-powered mobile search market.

pages: 509 words: 92,141

The Pragmatic Programmer
by Andrew Hunt and Dave Thomas
Published 19 Oct 1999

A blackboard system lets us decouple our objects from each other completely, providing a forum where knowledge consumers and producers can exchange data anonymously and asynchronously. As you might guess, it also cuts down on the amount of code we have to write. Blackboard Implementations Computer-based blackboard systems were originally invented for use in artificial intelligence applications where the problems to be solved were large and complex—speech recognition, knowledge-based reasoning systems, and so on. Modern distributed blackboard-like systems such as JavaSpaces and T Spaces [URL 50, URL 25] are based on a model of key/value pairs first popularized in Linda [CG90], where the concept was known as tuple space. With these systems, you can store active Java objects—not just data—on the blackboard, and retrieve them by partial matching of fields (via templates and wildcards) or by subtypes.
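The tuple-space retrieval Hunt and Thomas describe (partial matching via templates and wildcards) can be sketched in Python. The class and method names below are hypothetical stand-ins for illustration, not the JavaSpaces or T Spaces API; `None` plays the role of a wildcard field.

```python
class Blackboard:
    """Toy tuple space: producers write tuples anonymously,
    consumers take them by partial matching against a template."""

    def __init__(self):
        self.tuples = []

    def write(self, tup):
        self.tuples.append(tup)

    def take(self, template):
        # None in the template matches any value in that position
        for tup in self.tuples:
            if len(tup) == len(template) and all(
                t is None or t == v for t, v in zip(template, tup)
            ):
                self.tuples.remove(tup)  # "take" removes the tuple
                return tup
        return None

bb = Blackboard()
bb.write(("bid", "XYZ", 101.5))
bb.write(("ask", "XYZ", 102.0))
print(bb.take(("ask", None, None)))  # ('ask', 'XYZ', 102.0)
```

The producer and consumer never reference each other, only the shape of the data, which is the decoupling the excerpt is pointing at.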

pages: 313 words: 91,098

The Knowledge Illusion
by Steven Sloman
Published 10 Feb 2017

This makes human ignorance all the more surprising. If causality is so critical to selecting the best actions, why do individuals have so little detailed knowledge about how the world works? It’s because thought is masterful at extracting only what it needs and filtering out everything else. When you hear a sentence uttered, your speech recognition system goes to work extracting the gist, the underlying meaning of the utterance, and forgetting the specific words. When you encounter a complicated causal system, you similarly extract the gist and forget the details. If you’re someone who likes figuring out how things work, you might open up an old appliance on occasion, perhaps a coffee machine.

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
by Zdravko Markov and Daniel T. Larose
Published 5 Apr 2007

Now these four 2-grams are the features that represent our documents. In this representation the documents do not have any overlap. We have already mentioned n-grams as a technique for approximate string matching but they are also popular in many other areas where the task is detecting subsequences such as spelling correction, speech recognition, and character recognition. Shingled document representation can be used for estimating document resemblance. Let us denote the set of shingles of size w contained in document d as S(d,w). That is, the set S(d,w) contains all w-grams obtained from document d. Note that T (d) = S(d,1), because terms are in fact 1-grams.
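Markov and Larose's shingle sets S(d,w) are straightforward to compute directly. A minimal Python sketch, which also uses Jaccard similarity of the shingle sets as the resemblance estimate (the standard measure for this technique, though the excerpt stops short of defining it):

```python
def shingles(doc, w):
    """S(d, w): the set of all w-grams (shingles) of words in document d."""
    words = doc.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def resemblance(d1, d2, w):
    """Estimate document resemblance as the Jaccard similarity
    of the two documents' shingle sets."""
    s1, s2 = shingles(d1, w), shingles(d2, w)
    return len(s1 & s2) / len(s1 | s2)

d = "a rose is a rose is a rose"
# T(d) = S(d, 1): the distinct terms of the document
print(sorted(shingles(d, 1)))  # [('a',), ('is',), ('rose',)]
print(resemblance("the quick brown fox", "the quick red fox", 2))  # 0.2
```

With w = 2 the two test documents share one of five distinct bigrams, hence a resemblance of 0.2; larger w makes the match stricter.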

pages: 319 words: 90,965

The End of College: Creating the Future of Learning and the University of Everywhere
by Kevin Carey
Published 3 Mar 2015

For most people, information only moved in one direction through the airwaves—broadcast out—and the same was true for the growing network of coaxial cable. The only common way to exchange information in real time at a distance was over the copper wire telephone network, and that was designed for voice interaction. This happened to be one of the tasks computers couldn’t do very well. Although many advances have been made in electronic speech recognition, nobody has invented a computer that can listen and talk to you with the speed and facility of a person. When the Internet became available to ordinary people in the mid-1990s, many of these barriers began to fall. Connections to both powerful computers and to other people that had been limited to small populations in the defense and research sectors were suddenly available to millions and then billions worldwide.

pages: 312 words: 91,538

The Fear Index
by Robert Harris
Published 14 Aug 2011

‘Computers are increasingly reliable translators in the sectors of commerce and technology. In medicine they can listen to a patient’s symptoms and are diagnosing illnesses and even prescribing treatment. In the law they search and evaluate vast amounts of complex documents at a fraction of the cost of legal analysts. Speech recognition enables algorithms to extract the meaning from the spoken as well as the written word. News bulletins can be analysed in real time. ‘When Hugo and I started this fund, the data we used was entirely digitalised financial statistics: there was almost nothing else. But over the past couple of years a whole new galaxy of information has come within our reach.

pages: 339 words: 92,785

I, Warbot: The Dawn of Artificially Intelligent Conflict
by Kenneth Payne
Published 16 Jun 2021

This was a pioneering effort to use AI for something grander than controlling a particular weapon. And even if it was pretty basic, it anticipated the sorts of ‘strategic’ AI we’ll consider later. Perhaps most impressively, they even managed to design a ‘pilot’s associate’—featuring a functioning cockpit speech recognition system to interact with combat pilots. That was a remarkable achievement given the space constraints and the noise and turbulence of flight. All that rested on clunky 1980s computers and AI that relied on statistical processing of symbols. Less successful though was an autonomous land vehicle for the army.

pages: 404 words: 92,713

The Art of Statistics: How to Learn From Data
by David Spiegelhalter
Published 2 Sep 2019

‘Narrow’ AI refers to systems that can carry out closely prescribed tasks, and there have been some extraordinarily successful examples based on machine learning, which involves developing algorithms through statistical analysis of large sets of historical examples. Notable successes include speech recognition systems built into phones, tablets and computers; programs such as Google Translate which know little grammar but have learned to translate text from an immense published archive; and computer vision software that uses past images to ‘learn’ to identify, say, faces in photographs or other cars in the view of self-driving vehicles.

pages: 291 words: 88,879

Going Solo: The Extraordinary Rise and Surprising Appeal of Living Alone
by Eric Klinenberg
Published 1 Jan 2012

Although the machine is still a work in progress, its 2010 incarnation featured a short, sleek human form with a head, a cartoon face, and a video camera on top; a box with two large wheels at its base; and a large touch-screen monitor at its midsection, which its human companions could use for a wide range of Internet-mediated communications. The designers of Kompaï envision the family, friends, and health providers of the machine’s user contacting them through an Internet-based program such as Skype or Facebook or an instant messaging service. When completed, Kompaï will be able to locate and move to its owner, and its speech recognition software will allow physically impaired people to communicate by voice.8 Machines like Kompaï may appeal to homebound and elderly singletons because they provide greater access to a kind of communication that people who live alone are already enjoying. (And by the time the current generation of young adults reaches old age, their comfort with machines will make robotic companions even more attractive.)

pages: 442 words: 94,734

The Art of Statistics: Learning From Data
by David Spiegelhalter
Published 14 Oct 2019

‘Narrow’ AI refers to systems that can carry out closely prescribed tasks, and there have been some extraordinarily successful examples based on machine learning, which involves developing algorithms through statistical analysis of large sets of historical examples. Notable successes include speech recognition systems built into phones, tablets and computers; programs such as Google Translate which know little grammar but have learned to translate text from an immense published archive; and computer vision software that uses past images to ‘learn’ to identify, say, faces in photographs or other cars in the view of self-driving vehicles.

pages: 339 words: 94,769

Possible Minds: Twenty-Five Ways of Looking at AI
by John Brockman
Published 19 Feb 2019

Tech prophets often warn of a “surveillance state” in which a government empowered by technology will monitor and interpret all private communications, allowing it to detect dissent and subversion as it arises and make resistance to state power futile. Orwell’s telescreens are the prototype, and in 1976 Joseph Weizenbaum, one of the gloomiest tech prophets of all time, warned my class of graduate students not to pursue automatic speech recognition because government surveillance was its only conceivable application. Though I am on record as an outspoken civil libertarian, deeply concerned with contemporary threats to free speech, I lose no sleep over technological advances in the Internet, video, or artificial intelligence. The reason is that almost all the variation across time and space in freedom of thought is driven by differences in norms and institutions and almost none of it by differences in technology.

The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy
by Matthew Hindman
Published 24 Sep 2018

Google has even built new globally distributed database systems called Spanner and F1, in which operations across different data centers are synced using atomic clocks.22 The latest iteration of Borg, Google’s cluster management system, coordinates “hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.”23 In recent years Google’s data centers have expanded their capabilities in other ways, too. As Google has increasingly focused on problems like computer vision, speech recognition, and natural language processing, it has worked to deploy deep learning, a variant of neural network methods. Google’s investments in deep learning have been massive and multifaceted, including (among other things) major corporate acquisitions and the development of the TensorFlow high-level programming toolkit.24 But one critical component has been the development of a custom computer chip built specially for machine learning.

The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do
by Erik J. Larson
Published 5 Apr 2021

But simply asserting that the mind is a hierarchical-pattern recognizer by itself tells us too little: it doesn’t say why human beings are the sort of creatures that use language (rodents presumably have a capacity for hierarchical-pattern recognition, too, but don’t talk), and it doesn’t explain why many humans struggle constantly with issues of self-control, nor why we are the sort of creatures who leave tips in restaurants in towns to which we will never return.”4 Such generic theories are, ironically, also inspired by Big Data AI in a roundabout but very real way. Kurzweil is known for using hierarchical methods in machine learning for speech recognition applications; he worked on the original Siri application now owned by Apple and part of the iPhone. Hierarchical hidden Markov models are part of the data-analytic techniques that have merged with big data. And more recently, the now ever-present deep learning networks are arranged in hierarchies of layers.

pages: 336 words: 91,806

Code Dependent: Living in the Shadow of AI
by Madhumita Murgia
Published 20 Mar 2024

The genesis of the transformer and the story of its creators helps to account for how we got to this moment in artificial intelligence: an inflection point, comparable to our transition to the web or to smartphones, that has seeded a new generation of entrepreneurs building AI-powered consumer products for the masses. Today, the transformer underpins most cutting-edge applications of AI in development, from Google Search and Translate to mobile autocomplete and speech recognition by Alexa. It also paved the way for Californian company OpenAI to build ChatGPT.

The Transformer Chatbot

Nothing could have prepared Mira Murati and her colleagues for how ChatGPT would be used by the world. On 29 November 2022, Mira, who was OpenAI’s chief technology officer, was putting the finishing touches to a new release launching the next day.3 There hadn’t been much fanfare about it, as it was mostly an experimental prototype.

pages: 359 words: 96,019

How to Turn Down a Billion Dollars: The Snapchat Story
by Billy Gallagher
Published 13 Feb 2018

When they eventually sold the company to Snapchat, it was a reunion of sorts, as Rodriguez had lived with Evan (and Reggie) in the Donner dorm at Stanford back in their freshman year. Evan set up a new division of the company, dubbed Snap Lab, and filled it with the ex-Vergence team and engineers with experience working on computer vision, gaze tracking, and speech recognition. Over the next year, Snapchat recruited a dozen wearable technology experts, industrial designers, and people with experience in the fashion industry. Members of the Snap Lab team took frequent trips to Shenzhen, China, to prepare a potential supply chain for a Snapchat hardware product. Snapchat never announces its acquisitions.

Cataloging the World: Paul Otlet and the Birth of the Information Age
by Alex Wright
Published 6 Jun 2014

In this way, everyone from his armchair will be able to contemplate creation, in whole or in certain parts.10 Even more startling, Otlet also imagined that individuals would be able to upload files to central servers and communicate via wireless networks, anticipated the development of speech recognition tools, and described technologies for transmitting sense perceptions like taste and smell. He foresaw the possibilities of social networks, of letting users “participate, applaud, give ovations, sing in the chorus.” And while he likely would have been flummoxed by the chaotic mesh of present-day social media outlets like Facebook and Twitter, nonetheless he saw the possibilities of constructing a social space around individual pieces of media, and allowing a network of contributors to create links from one to another, much the way hyperlinks work on today’s Web.

pages: 371 words: 98,534

Red Flags: Why Xi's China Is in Jeopardy
by George Magnus
Published 10 Sep 2018

Indeed, the US National Science Foundation said in its annual report mandated by Congress for 2018 that while the US maintains a lead in research and development (R&D), venture capital, most advanced degrees, and production of high-tech manufacturing, its lead is slipping in certain important areas. The US is still ahead overall in AI technology, but Baidu, for example, is one of the top global firms in speech recognition. China was a good second to the US in R&D spending, and accounted for about half the US share of global venture capital investment and of knowledge and technology-intensive services provided to businesses. In both the latter cases, though, China’s progress since 2012 has been remarkable, albeit from a low base.

pages: 337 words: 103,522

The Creativity Code: How AI Is Learning to Write, Paint and Think
by Marcus Du Sautoy
Published 7 Mar 2019

Sure, the machine was programmed by humans, but that doesn’t really seem to make it feel better. AlphaGo has since retired from competitive play. The Go team at DeepMind has been disbanded. Hassabis proved his Cambridge lecturer wrong. DeepMind has now set its sights on other goals: health care, climate change, energy efficiency, speech recognition and generation, computer vision. It’s all getting very serious. Given that Go was always my shield against computers doing mathematics, was my own subject next in DeepMind’s cross hairs? To truly judge the potential of this new AI we are going to need to look more closely at how it works and dig around inside.

pages: 418 words: 102,597

Being You: A New Science of Consciousness
by Anil Seth
Published 29 Aug 2021

.* These images are created by cleverly mixing features from large databases of actual faces, employing techniques similar to those we used in our hallucination machine (described in chapter 6). When combined with ‘deepfake’ technologies, which can animate these faces to make them say anything, and when what they say is powered by increasingly sophisticated speech recognition and language production software, such as GPT-3, we are all of a sudden living in a world populated by virtual people who are effectively indistinguishable from virtual representations of real people. In this world, we will become accustomed to not being able to tell who is real and who is not.

pages: 340 words: 101,675

A New History of the Future in 100 Objects: A Fiction
by Adrian Hon
Published 5 Oct 2020

That fraction still counted for a lot, but it wasn’t quite enough, leading companies such as Dragon and Babylon to partner with massively multiplayer online language education games to put players to work translating content in return for free access and virtual currency. The process wasn’t perfect, but, combined with smarter forms of translation and speech recognition, it improved machine-translated speech such that it achieved over 99.8 percent accuracy within a single second; just about fast enough to be used in conversation if you had a bit of patience. Here’s what Alice Singh thinks: Babylon fascinated me. I remember being on holiday in Myanmar and just walking up to someone at a bus stop and talking to them.

The Art of SEO
by Eric Enge , Stephan Spencer , Jessie Stricchiola and Rand Fishkin
Published 7 Mar 2012

Learning a language based on grammar does not work, as it turns out that language is far too dynamic and changing. However, learning from real-world usage does. Using this technology, Google can offer instant translation across 58 languages. Voice search works much the same way. Historically, speech recognition solutions did not work very well and required the user to train the system to her voice. Google Voice uses a different approach, as noted by Peter Norvig in that same interview: “[For] Voice Search, where you speak your search to Google, we train this model on around 230 billion words from real-world search queries.”

Voice-Recognition Search

When users are mobile they must deal with the limitations of their mobile devices, specifically the small screen and small keyboard. These make web surfing and mobile searching more challenging than they are in the PC environment. Voice search could be a great way to improve the mobile search experience. It eliminates the need for the keyboard, and provides users with a simple and elegant interface. Speech recognition technology has been around for a long time, and the main challenge has always been that it requires a lot of computing power. Processing power continues to increase, though, even on mobile devices, and the feasibility of this type of technology is growing. This will doubtless be another major area of change in the mobile search landscape.

pages: 345 words: 104,404

Pandora's Brain
by Calum Chace
Published 4 Feb 2014

‘Not really: as I say, just the odd article here and there. If you are curious about their ideas, perhaps you should read one of Ray Kurzweil’s books: he seems to be the best-known proponent. He’s an interesting chap, actually. He was a successful software developer – made a lot of money out of speech recognition software, if I remember right. He’s also written several books about an event called a Singularity, when the rate of technological progress becomes so fast that mere humans are unable to keep up, and we will have to upload our minds into computers. He thinks that this will happen remarkably soon – within your lifetime.’

pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future
by Kevin Kelly
Published 6 Jun 2016

A future office worker is not going to be pecking at a keyboard—not even a fancy glowing holographic keyboard—but will be talking to a device with a newly evolved set of hand gestures, similar to the ones we now have of pinching our fingers in to reduce size, pinching them out to enlarge, or holding up two L-shaped pointing hands to frame and select something. Phones are very close to perfecting speech recognition today (including being able to translate in real time), so voice will be a huge part of interacting with devices. If you’d like to have a vivid picture of someone interacting with a portable device in the year 2050, imagine them using their eyes to visually “select” from a set of rapidly flickering options on the screen, confirming with lazy audible grunts, and speedily fluttering their hands in their laps or at their waist.

pages: 331 words: 104,366

Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins
by Garry Kasparov
Published 1 May 2017

This growth of machines from chess beginners to Grandmasters is also a progression that is being repeated by countless AI projects around the world. AI products tend to evolve from laughably weak to interesting but feeble, then to artificial but useful, and finally to transcendent and superior to human. We see this path with speech recognition and speech synthesis, with self-driving cars and trucks, and with virtual assistants like Apple’s Siri. There is always a tipping point at which they go from amusing diversions to essential tools. Then there comes another shift, when a tool becomes something more, something more powerful than even its creators had in mind.

pages: 389 words: 109,207

Fortune's Formula: The Untold Story of the Scientific Betting System That Beat the Casinos and Wall Street
by William Poundstone
Published 18 Sep 2006

Perception Technology was founded by an MIT physicist, Huseyin Yilmaz, whose training was largely in general relativity. During the visit with Shannon, Yilmaz spoke enthusiastically about physics, asserting that there was a “gap in Einstein’s equation” which Yilmaz had filled with an extra term. Yilmaz’s company, however, was involved in speech recognition. They had developed a secret “word spotter” that would allow intelligence agencies to automatically listen for key words like “missile” or “atomic” in tapped conversations. Another product allowed a computer to talk. Shannon’s pithy report warned Singleton that speech synthesis “is a very difficult field.

pages: 396 words: 107,814

Is That a Fish in Your Ear?: Translation and the Meaning of Everything
by David Bellos
Published 10 Oct 2011

A second threat to maintaining current language practice in international organizations is that some states may become unwilling to finance simultaneous interpretation into languages that are ceasing to be global vehicular tongues—but the replacement of Russian (for example) may prove politically impossible for many decades yet, and nobody has a clear idea of what might replace French. But the bigger threat looming on the horizon is something that’s going on right now in research labs in New Jersey and elsewhere. Using the technology of speech recognition that allows a widely available word processor to generate text from speech, alongside the speech synthesis systems that power today’s automated answering machines, the FAHQT target that current U.S. science policy encourages could well become FAHQST—fully automated, high-quality speech translation.

pages: 394 words: 108,215

What the Dormouse Said: How the Sixties Counterculture Shaped the Personal Computer Industry
by John Markoff
Published 1 Jan 2005

Kay took her under his wing, and before long she was writing intricate low-level software for his project. Others came to Xerox and then were pulled into Kay’s orbit, because his group was talking about the most “supercool things” in an already cool place. Dan Ingalls was working on a separate speech-recognition project across the hallway from Kay’s office and soon found he couldn’t resist Kay’s ideas. Ingalls had come to Stanford in 1966 as a graduate student in electrical engineering. He had grown up in Cambridge, steeped in both old-world wealth and intellectual scholarship. His family had been Virginia landowners for generations, but his father was a Harvard Sanskrit scholar.

pages: 484 words: 104,873

Rise of the Robots: Technology and the Threat of a Jobless Future
by Martin Ford
Published 4 May 2015

Artificial neural networks were first conceived and experimented with in the late 1940s and have long been used to recognize patterns. However, the last few years have seen a number of dramatic breakthroughs that have resulted in significant advances in performance, especially when multiple layers of neurons are employed—a technology that has come to be called “deep learning.” Deep learning systems already power the speech recognition capability in Apple’s Siri and are poised to accelerate progress in a broad range of applications that rely on pattern analysis and recognition. A deep learning neural network designed in 2011 by scientists at the University of Lugano in Switzerland, for example, was able to correctly identify more than 99 percent of the images in a large database of traffic signs—a level of accuracy that exceeded that of human experts who competed against the system.

pages: 688 words: 107,867

Python Data Analytics: With Pandas, NumPy, and Matplotlib
by Fabio Nelli
Published 27 Sep 2018

Finally, in the last few years, the concept of artificial intelligence has focused on visual and auditory recognition operations, which until recently were thought to be of "exclusive human relevance". These operations include:

- Image recognition
- Object detection
- Object segmentation
- Language translation
- Natural language understanding
- Speech recognition

These problems are still under study, thanks to deep learning techniques. Machine Learning Is a Branch of Artificial Intelligence In the previous chapter you saw machine learning in detail, with many examples of the different techniques for classifying or predicting data.

pages: 427 words: 112,549

Freedom
by Daniel Suarez
Published 17 Dec 2009

"My lady, Master Rakh will be very glad when he hears that you've arrived safely. May I ask you to remain here while I fetch him?" Philips knew what to do. She could either right-click on the servant and select from a list of options to respond--or . . . She decided to speak directly into the headset mic. "Yes." She knew Sobol's speech recognition was pretty good. The NPC nodded and smiled. "Excellent, my lady. I don't think the master will be long." With that he marched off in a hurry, placing his hat back on his head. That left Philips some time to explore the terrace. It appeared to be the garden of a several-story villa built into the rock face.

pages: 363 words: 105,039

Sandworm: A New Era of Cyberwar and the Hunt for the Kremlin's Most Dangerous Hackers
by Andy Greenberg
Published 5 Nov 2019

“This was a very significant wake-up call,” as Maersk’s chairman, Snabe, had said at his Davos panel. Then he’d added, with a Scandinavian touch of understatement, “You could say, a very expensive one.” * * * ■ But not all of NotPetya’s costs could be measured in dollars. Another of its collateral victims was a little-known company called Nuance, focused on speech-recognition software. Nuance’s code was used in the first version of the iPhone’s Siri, for instance, and the voice command system in Ford cars. By 2017, however, much of Nuance’s business came from a vast array of institutions that relied on its technology in matters of life and death: hospitals. As it had for so many other massive multinationals, NotPetya sprang out from Nuance’s Ukraine office to instantly paralyze the company’s digital systems across its seventy locations, from India to Korea to its headquarters in Burlington, Massachusetts.

Reset
by Ronald J. Deibert
Published 14 Aug 2020

Central Asian countries like Uzbekistan and Kazakhstan have even gone so far as to advertise for Bitcoin mining operations to be hosted in their jurisdictions because of cheap and plentiful coal and other fossil-fuelled energy sources.349 Some estimates put electric energy consumption associated with Bitcoin mining at around 83.67 terawatt-hours per year, more than that of the entire country of Finland, with carbon emissions estimated at 33.82 megatons, roughly equivalent to those of Denmark.350 To put it another way, the Cambridge Centre for Alternative Finance says that the electricity consumed by the Bitcoin network in one year could power all the teakettles used to boil water in the entire United Kingdom for nineteen years.351 A similar energy-sucking dynamic underlies other cutting-edge technologies, like “deep learning.” The latter refers to the complex artificial intelligence systems used to undertake the fine-grained, real-time calculations associated with the range of social media experiences, such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, and so on. Research undertaken at the University of Massachusetts, Amherst, in which the researchers performed a life-cycle assessment for training several common large AI models, found that training a single AI model can emit more than 626,000 pounds of carbon dioxide equivalent — or nearly five times the lifetime emissions of the average American car (including its manufacturing).352 It’s become common to hear that “data is the new oil,” usually meaning that it is a valuable resource.

pages: 387 words: 106,753

Why Startups Fail: A New Roadmap for Entrepreneurial Success
by Tom Eisenmann
Published 29 Mar 2021

For two decades, Breazeal’s research teams had studied how robots can provide companionship for the elderly, encourage autistic children to engage in social interaction, and spark collaborative creative learning, among other useful functions. In 2013, she and co-founder Jeri Asher, a healthcare entrepreneur, raised $2.2 million in seed funding to commercialize these inventions. To serve as CEO, they recruited Steve Chambers, then president of Nuance Communications, the leading provider of natural language understanding and speech recognition software. Because Breazeal’s research had shown how social robots could contribute to seniors’ emotional wellness, the team initially pitched Jibo as a companion for the elderly. However, mainstream VCs interested in consumer electronics and complex systems like robots weren’t interested in the elderly market, and the small set of investors who funded new ventures for aging consumers—accustomed to simpler concepts like cellphones with large keypads—were daunted by the program’s technological vision.

pages: 412 words: 104,864

Silence on the Wire: A Field Guide to Passive Reconnaissance and Indirect Attacks
by Michal Zalewski
Published 4 Apr 2005

A Markov Model is a method for describing a discrete system in which the next value depends only on its current state, and not on the previous values (Markov chain). The Hidden Markov Model is a variant that provides a method for describing a system for which each internal state generates an observation, but for which the actual state is not known. This model is commonly used in applications such as speech recognition, in which the goal is to obtain pure data (a textual representation of the spoken word) from its specific manifestation (sampled waveform). The authors conclude that the Hidden Markov Model is applicable to keystroke analysis, and they consider the internal state of the system to be the information about keys pressed; the observation in the Hidden Markov Model is the inter-keystroke timing.
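The Hidden Markov Model idea described above can be sketched in a few lines. The following is a purely illustrative toy, not the authors' model: the hidden states, timing buckets, and every probability below are invented for the example. Hidden states stand for classes of key pairs, observations are discretized inter-keystroke timings, and the forward algorithm computes the probability of an observed timing sequence by summing over all hidden state paths.

```python
# Toy HMM sketch: hidden states = classes of key pairs, observations =
# discretized inter-keystroke timings. All probabilities are made up.

def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: probability of an observation sequence under
    an HMM, summing over all possible hidden state paths."""
    # Initialize with the first observation.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Recurse over the remaining observations.
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())

# Two hypothetical hidden "key pair" classes with different timing profiles.
states = ["same_hand", "alternating"]
start_p = {"same_hand": 0.5, "alternating": 0.5}
trans_p = {
    "same_hand":   {"same_hand": 0.6, "alternating": 0.4},
    "alternating": {"same_hand": 0.4, "alternating": 0.6},
}
emit_p = {  # P(timing bucket | hidden state)
    "same_hand":   {"short": 0.7, "long": 0.3},
    "alternating": {"short": 0.2, "long": 0.8},
}

p = forward(["short", "short", "long"], states, start_p, trans_p, emit_p)
print(round(p, 4))  # 0.1115
```

Given one such model per candidate key sequence, an attacker would score the observed timings against each model and pick the likeliest, which is the essence of the keystroke analysis the authors describe.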

pages: 764 words: 261,694

The Elements of Statistical Learning (Springer Series in Statistics)
by Trevor Hastie , Robert Tibshirani and Jerome Friedman
Published 25 Aug 2009

FDA in this case can be shown to perform a penalized linear discriminant analysis in the enlarged space. We elaborate in Section 12.6. Linear boundaries in the enlarged space map down to nonlinear boundaries in the reduced space. This is exactly the same paradigm that is used with support vector machines (Section 12.3). We illustrate FDA on the speech recognition example used in Chapter 4, with K = 11 classes and p = 10 predictors. The classes correspond to eleven vowel sounds, each contained in eleven different words.

[FIGURE 12.10: Two-dimensional projections of the vowel training data, plotting Coordinate 2 against Coordinate 1 for the training data: linear discriminant analysis (left panel) and flexible discriminant analysis using BRUTO (right panel).]

Here are the words, preceded by the symbols that represent them:

    Vowel  Word     Vowel  Word     Vowel  Word     Vowel  Word
    i:     heed     O      hod      I      hid      C:     hoard
    E      head     U      hood     A      had      u:     who'd
    a:     hard     3:     heard    Y      hud

Each of eight speakers spoke each word six times in the training set, and likewise seven speakers in the test set. The ten predictors are derived from the digitized speech in a rather complicated way, but standard in the speech recognition world. There are thus 528 training observations, and 462 test observations. Figure 12.10 shows two-dimensional projections produced by LDA and FDA. The FDA model used adaptive additive-spline regression functions to model the ηℓ(x), and the points plotted in the right plot have coordinates η̂1(xi) and η̂2(xi).
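As a rough illustration of the data layout and the kind of projection plotted in Figure 12.10, here is a minimal NumPy sketch. The data below is entirely synthetic (not the actual vowel recordings, and not the book's code): it checks the 528/462 observation counts and computes the first two discriminant coordinates as the leading eigenvectors of Sw⁻¹Sb.

```python
import numpy as np

K, p = 11, 10            # 11 vowel classes, 10 predictors
n_train = 8 * K * 6      # 8 speakers x 11 words x 6 repetitions
n_test = 7 * K * 6       # 7 speakers x 11 words x 6 repetitions

# Synthetic stand-in for the vowel data: one Gaussian cloud per class.
rng = np.random.default_rng(0)
means = rng.normal(scale=3.0, size=(K, p))
X = np.vstack([rng.normal(m, 1.0, size=(48, p)) for m in means])
y = np.repeat(np.arange(K), 48)

# Within-class (Sw) and between-class (Sb) scatter matrices.
overall = X.mean(axis=0)
Sw = sum(np.cov(X[y == k].T, bias=True) * (y == k).sum() for k in range(K))
Sb = sum((y == k).sum() * np.outer(X[y == k].mean(0) - overall,
                                   X[y == k].mean(0) - overall)
         for k in range(K))

# Discriminant coordinates: leading eigenvectors of Sw^{-1} Sb.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:2]].real   # projection onto two coordinates
Z = (X - overall) @ W            # the kind of coordinates in Figure 12.10
print(Z.shape)                   # (528, 2)
```

Plotting the two columns of Z, colored by class, would reproduce the style of the left panel of Figure 12.10; FDA replaces the linear projection with flexible (e.g., additive-spline) regression fits.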

pages: 405 words: 117,219

In Our Own Image: Savior or Destroyer? The History and Future of Artificial Intelligence
by George Zarkadakis
Published 7 Mar 2016

This technology is valuable for Facebook as it aspires to increase the ways in which it serves its billions of customers – and the advertising industry – by extracting meaning from its colossal and ever-expanding archive of user-generated content. Google has a similar aspiration: it wants to use AI technology to understand context and meaning, and thus provide better search resources, video recognition, speech recognition and translation, increased security, and smarter services when it comes to Google’s social networks and e-commerce platforms. When Google spent half a billion dollars to acquire the British company Deep Mind, it was in fact hedging a bet that Artificial Intelligence will define the second machine age.

pages: 390 words: 114,538

Digital Wars: Apple, Google, Microsoft and the Battle for the Internet
by Charles Arthur
Published 3 Mar 2012

(Its forebears are seen on the original Star Trek series, and the Etch A Sketch.) As Gates explained to the New York Times that August, ‘We’re trying to see if we can produce a tablet PC and the software for it that will be sufficiently powerful and intuitive and inexpensive to capture the imagination and the marketplace.’1 He thought handwriting or speech recognition might replace the keyboard. Brass already knew he faced a huge challenge. It wasn’t over the quality of the idea; it was getting the right backing inside the company. He had experience of that already: when the group invented a system for displaying text on-screen with greater legibility, which they called ClearType, he was told by the Windows group that some of the colours made the display break, and by the Office group that the display wasn’t sharp, but ‘fuzzy’.

pages: 352 words: 120,202

Tools for Thought: The History and Future of Mind-Expanding Technology
by Howard Rheingold
Published 14 May 2000

Knowledge engineering is but one part of that ever-expanding area of hardware and software research that constitutes the field of AI. Unlike other artificial intelligence researchers, Avron Barr is not concerned with systems that can direct an optical sensor to recognize visual patterns, or to help a speech-recognition system to understand natural languages, or direct a robot in the task of climbing stairs. He and his colleagues are trying to build systems that can transfer knowledge from experts to novices and that can use the transferred knowledge to help people make decisions about specific problems. Barr's specialty seems to bridge the gap between those who see the future of computers in terms of "mind tools" and those who see it in terms of "the next step in the evolution of intelligence."

System Error: Where Big Tech Went Wrong and How We Can Reboot
by Rob Reich , Mehran Sahami and Jeremy M. Weinstein
Published 6 Sep 2021

Some analysts believe the future here may be more of a “human on the loop” model, where a human translator uses the results of a machine as a fast first pass and then works to make the translation more accurate and reflective of colloquial language usage. But would we still need as many translators? In a seemingly small industry, will such a specialized skill be needed? In other domains, such as automated customer support, AI systems that combine speech recognition and natural language understanding—much like those used in smart speakers such as Amazon Alexa and Google Home—have become a new front line for customer interaction. Only when the system isn’t able to get you the help you need will a human be brought in to intervene. That seems very likely to reduce the number of customer support representatives needed.

pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline
by Cathy O'Neil and Rachel Schutt
Published 8 Oct 2013

It’s not necessarily useful to argue over who the rightful owner is of these methods, but it’s worth pointing out here that it can get a little vague or ambiguous about what the actual difference is. In general, machine learning algorithms that are the basis of artificial intelligence (AI) such as image recognition, speech recognition, recommendation systems, ranking and personalization of content—often the basis of data products—are not usually part of a core statistics curriculum or department. They aren’t generally designed to infer the underlying generative process (e.g., to model something), but rather to predict or classify with the most accuracy.

Super Thinking: The Big Book of Mental Models
by Gabriel Weinberg and Lauren McCann
Published 17 Jun 2019

The modern equivalent is internet messaging services: they need to reach critical mass within a community to be useful. Once they pass this tipping point, they can rapidly make their way into the mainstream. Network effects have value beyond communication, however. Many modern systems gain network effects by simply being able to process more data. For example, speech recognition improves when more voices are added. Other systems gain advantages by being able to provide more liquidity or selection based on the volume or breadth of participants. Think of how more goods are available on Etsy and eBay when more people are participating on those sites. Network effects apply to person-to-person connections within a community as well.

pages: 404 words: 115,108

They Don't Represent Us: Reclaiming Our Democracy
by Lawrence Lessig
Published 5 Nov 2019

For a fantastic analysis of the antitrust problem raised by “free” data, see Dirk Bergemann and Alessandro Bonatti, “The Economics of Social Data” (working paper, January 15, 2019) (while identifying a competitive problem, the authors have no clear remedy beyond data portability). 94.Steven Levy, In the Plex: How Google Thinks, Works, and Shapes Our Lives (New York: Simon & Schuster, 2011), 172–73. 95.Microsoft acquired Skype for $8.5 billion on May 10, 2011. “Microsoft Officially Welcomes Skype,” Microsoft, October 13, 2011, available at link #124. Microsoft revealed its speech recognition capabilities through an announcement that demonstrated Star Trek–like technology (the “universal translator”). “Skype Translator Preview—An Exciting Journey to a New Chapter in Communication,” Skype, December 15, 2014, available at link #125. Microsoft assures users that no personally identifiable data is gathered from Skype and that the data is not used for advertising.

Financial Statement Analysis: A Practitioner's Guide
by Martin S. Fridson and Fernando Alvarez
Published 31 May 2011

That was despite questions raised about MicroStrategy's financials by accounting expert Howard Schilit six months earlier and by reporter David Raymond in an issue of Forbes ASAP distributed on February 21.6 It was reportedly only after reading Raymond's article that an accountant in the auditor's national office contacted the local office that had handled the audit, ultimately causing the firm to retract its previous certification of the 1998 and 1999 financials.7 No Straight Talk from Lernout & Hauspie On November 16, 2000, the auditor for Lernout & Hauspie Speech Products (L&H) withdrew its clean opinion of the company's 1998 and 1999 financials. The action followed a November 9 announcement by the Belgian producer of speech-recognition and translation software that an internal investigation had uncovered accounting errors and irregularities that would require restatement of results for those two years and the first half of 2000. Two weeks later, the company filed for bankruptcy. Prior to November 16, 2000, while investors were relying on the auditor's opinion that Lernout & Hauspie's financial statements were consistent with generally accepted accounting principles, several events cast doubt on that opinion.

pages: 382 words: 114,537

On the Clock: What Low-Wage Work Did to Me and How It Drives America Insane
by Emily Guendelsberger
Published 15 Jul 2019

If you don’t say some specific keywords you’re supposed to use. I’m initially skeptical of rumors that some software scans every call for proper (or improper) verbiage and flags calls in which a customer swears or sounds irritated. As someone who transcribes a lot of interviews, I’ve followed the progress of speech-recognition technology pretty closely: last time I checked, the day I could trust a computer to do my transcription for me was still at least a decade away. But later, I actually find a ton of companies claiming that their software will do exactly what the rumor mill says. One even can nag workers about tone in real time: We all know how it feels to be low on energy at the end of a long work day.

pages: 481 words: 121,669

The Invisible Web: Uncovering Information Sources Search Engines Can't See
by Gary Price , Chris Sherman and Danny Sullivan
Published 2 Jan 2003

Crawlers can fetch any page that can be displayed in a Web browser, regardless of whether it’s a static page stored on a server or generated dynamically. A good example of this type of Invisible Web site is Compaq’s experimental SpeechBot search engine, which indexes audio and video content using speech recognition, and converts the streaming media files to viewable text (http://www.speechbot.com). Somewhat ironically, one could make a good argument that most search engine result pages are themselves Invisible Web content, since they generate dynamic pages on the fly in response to user search terms.

pages: 519 words: 118,095

Your Money: The Missing Manual
by J.D. Roth
Published 18 Mar 2010

Here are a handful of ways to add to your cash flow: Research studies You can earn quick cash by participating in medical research and marketing studies. I once earned $120 for spending an hour inside an MRI scanner while answering questions about money. Other folks have earned $150 for giving opinions on food packaging, $50 to record 40 minutes of audio for a speech-recognition program—and even $35 for watching porn! Colleges and companies are always looking for people to join their experiments and focus groups. To find studies in your area, check Craigslist.org's "miscellaneous jobs" section or scope out college newspapers and bulletin boards. Here's a short video from MSN Money that describes one study: http://tinyurl.com/MSNmoneystudies.

User Friendly: How the Hidden Rules of Design Are Changing the Way We Live, Work & Play
by Cliff Kuang and Robert Fabricant
Published 7 Nov 2019

It seemed like one of those quirks about a place, like the faucets or the bus stops, that make it feel so foreign. “I thought, Okay, that’s strange, but maybe that’s just how they hold their phones here,” said Connell. It turned out that the Chinese were using their phones differently than almost anyone in the West, using voice as the main interface, letting speech recognition programs do their texting instead of tapping out things themselves. Partly, this was simply easier, because of the cumbersome nature of tapping out Chinese characters on a smartphone. But the more interesting fact was that they weren’t just chatting with humans. They were using chat as their entry point into the digital world; rather than tapping through menus to find the right app, they were using their voice.6 The smartphone is a different thing in China.

pages: 550 words: 124,073

Democracy and Prosperity: Reinventing Capitalism Through a Turbulent Century
by Torben Iversen and David Soskice
Published 5 Feb 2019

Alphabet (Google’s parent company) has recently created a new research unit, called Verily Life Sciences, to use AI approaches to medical data analysis that can assist in diagnosing and devising treatment plans, and Microsoft’s Healthcare NeXT is focused on collecting huge amounts of individual data from a variety of sources to cloud-based systems, including a virtual assistant that takes notes from patient-doctor meetings using speech recognition technologies (Singer 2017). In principle, doctors are not even required to arrive at an accurate diagnosis; all these systems need are individuals who can ask patients for the right information, a task computers could easily take over if not for the unease people might feel about revealing private information to a machine.

pages: 451 words: 125,201

What We Owe the Future: A Million-Year View
by William MacAskill
Published 31 Aug 2022

The highest-profile AI achievements in real-time strategy games were DeepMind’s AlphaStar defeat of human grandmasters in the game StarCraft II and the OpenAI Five’s defeat of human world champions in Dota 2 (OpenAI et al. 2019; Vinyals et al. 2019). Early successes in image classification (see, e.g., Krizhevsky et al. 2012) are widely seen as having been key for demonstrating the potential of deep learning. See also the following: speech recognition, Abdel-Hamid et al. (2014); Ravanelli et al. (2019); music, Briot et al. (2020); Choi et al. (2018); Magenta (n.d.); visual art, Gatys et al. (2016); Lecoutre et al. (2017). Building on astonishing progress demonstrated by Ramesh et al. (2021), the ability to create images from text descriptions by combining two AI systems known as VQGAN (Esser et al. 2021) and CLIP (OpenAI 2021b; Radford et al. 2021) caused a Twitter sensation (Miranda 2021). 38.

pages: 487 words: 124,008

Your Face Belongs to Us: A Secretive Startup's Quest to End Privacy as We Know It
by Kashmir Hill
Published 19 Sep 2023

They kept tinkering with their neural networks, going to conferences and publishing papers about their work, in the hope of recruiting others to their technological cause. And eventually, thanks to faster computers, new techniques, and loads more data, their neural networks started to work. Once they were up and running, neural networks blew away all the other approaches to AI. They made speech recognition better, image recognition more reliable, and facial recognition more accurate. Neural networks would be employed for all manner of tasks: recommending shows to watch on Netflix, populating playlists on Spotify, providing eyes to the autopilot in Tesla’s electric cars, and allowing ChatGPT to converse in a seemingly human way.

pages: 1,164 words: 309,327

Trading and Exchanges: Market Microstructure for Practitioners
by Larry Harris
Published 2 Jan 2003

As they spoke, it wrote their words and those of their counterparts onto screens on their desks so that they could see exactly what they said. This system allowed them to recognize and correct errors as they occurred. It also forced them to enunciate clearly. Lehman Brothers abandoned the system because the speech recognition technology then available was too primitive to transcribe speech accurately. With recent advances in this technology, attempts to use this system should be more successful. * * * In face-to-face oral negotiations, traders sometimes use hand signs to convey their intentions. These signals are especially common in futures pits.


pages: 476 words: 132,042

What Technology Wants
by Kevin Kelly
Published 14 Jul 2010

Sometimes we dismiss it by calling it “machine learning.” So while we weren’t watching, billions of tiny, insectlike artificial minds spawned deep into the technium, doing invisible, low-profile chores like reliably detecting credit-card fraud or filtering e-mail spam or reading text from documents. These proliferating microminds run speech recognition on the phone, assist in crucial medical diagnosis, aid stock-market analysis, power fuzzy-logic appliances, and guide automatic gearshifts and brakes in cars. A few experimental minds can even drive a car autonomously for a hundred miles. The future of the technium at first seems to point to bigger brains.

pages: 462 words: 142,240

Iron Sunrise
by Stross, Charles
Published 28 Oct 2004

“I want it back before someone else finds it,” she said, forcing a tone of spoiled pique. Trying to figure it out, whatever it was that Wednesday had stashed near the police station in Old Newfie, was infuriating, but he didn’t dare say so openly while they might be under surveillance. The combination of ultrawideband transceivers, reprogrammed liaison network nodes, and speech recognition software had turned the entire ship into a panopticon prison — one where mentioning the wrong words could get a passenger into a world of pain. Martin’s head hurt just thinking about it, and he had an idea from her tense, clipped answers to any questions he asked her that Rachel felt the same way.

pages: 573 words: 142,376

Whole Earth: The Many Lives of Stewart Brand
by John Markoff
Published 22 Mar 2022

Brand asked Shel Kaphan, one of the young computer hackers who worked at the Truck Store, to put him in touch with the researchers at computer scientist John McCarthy’s Stanford Artificial Intelligence Laboratory. SAIL had been established to build a working artificial intelligence and he had collected an eclectic group of young researchers exploring technologies like robotics, computer vision, natural language understanding, and speech recognition. Simultaneously, Bill English opened the doors to the Palo Alto Research Center. Xerox had created PARC to compete directly with IBM, and Robert Taylor, a young psychologist who had funded the development of the ARPANET while at the Pentagon, had been given the charter of rethinking the future of the office based upon computers and networks.

pages: 462 words: 129,022

People, Power, and Profits: Progressive Capitalism for an Age of Discontent
by Joseph E. Stiglitz
Published 22 Apr 2019

This happened to Blackberry, at one time one of the leading mobile phone companies, which, after extensive litigation, had to pay $612 million just to continue offering its services, whether the patents which it allegedly infringed were eventually held to be valid or not. For start-ups, such suits are even more daunting. For example, Vlingo was a start-up that worked on speech recognition technologies. However, it was hit by a series of lawsuits by a much larger firm called Nuance. Ultimately Vlingo agreed to be acquired by Nuance, but that was after $3 million legal expenses, despite winning the first lawsuit (there were six filed in total). See Charles Duhigg and Steve Lohr, “The Patent, Used as a Sword,” New York Times, Oct. 7, 2012.

Adam Smith: Father of Economics
by Jesse Norman
Published 30 Jun 2018

Take the capacity to communicate in writing: thousands of years ago hominids could draw on sand with their hands or with sticks, or scratch marks on stone; then came stylus and ink on papyrus, quill pens, pencils, the fountain pen, the ballpoint pen, the typewriter, the desk printer, the keyboard, the keypad, predictive text and automated speech recognition. The self-sufficient provision of one’s own writing tools yielded to markets, competition, specialization, huge falls in cost, rapidly accelerating product lifecycles and almost universal availability. But even in Smith’s day, with a much slower pace of change, markets changed and evolved in response to wider economic conditions, seasonal factors, government interventions, consumer or producer pressure, competition, taste, fashion and social norms, among a host of other things.

pages: 611 words: 130,419

Narrative Economics: How Stories Go Viral and Drive Major Economic Events
by Robert J. Shiller
Published 14 Oct 2019

Fears of the Singularity Gain Strength after the 2007–9 World Financial Crisis

According to Google Trends, the latest wave of automation/technology-based fears began around 2016 and continues unabated at the time of this writing. How do we explain this recent surge in automation fears? To answer this question, we must consider the advent of Apple’s Siri, the iPhone app launched in 2011 that uses automatic speech recognition (ASR) and natural language understanding (NLU) to (attempt to) answer the questions you’ve asked it. To many, Siri’s ability to talk, understand, and provide information looked like the advent of that long-awaited singularity when machines become as smart as, or smarter than, people. That same year, IBM presented its talking computer Watson as a competitor on the television quiz show Jeopardy, and Watson beat the human champions who played against it.

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps
by Valliappa Lakshmanan , Sara Robinson and Michael Munn
Published 31 Oct 2020

NLU is used by speech agents like Amazon’s Alexa, Apple’s Siri, and Google’s Assistant to understand sentences like, “What is the weather forecast this weekend?” There are many use cases that fall under the umbrella of NLU and it can be applied to a lot of processes, such as text classification (email filtering), entity extraction, question answering, speech recognition, text summarization, and sentiment analysis. (Applicable patterns: Embeddings, Hashed Feature, Neutral Class, Multimodal Input, Transfer Learning, Two-Phase Predictions, Cascade, Windowed Inference.)

Computer Vision

Computer vision is the broad parent name for AI that trains machines to understand visual input, such as images, videos, icons, and anything where pixels might be involved.

pages: 642 words: 141,888

Like, Comment, Subscribe: Inside YouTube's Chaotic Rise to World Domination
by Mark Bergen
Published 5 Sep 2022

In an interview back in 2002 he explained that effective web search, really giving people what they want, required understanding “everything in the world” and that this required an AI. A decade later he rightly predicted that machine learning would become all the rage. Amazon would release a speech-recognition gizmo called Echo. Facebook’s Mark Zuckerberg, who publicized his annual life-betterment goals, would spend one year inventing an AI butler. Tech companies threw around the slogan “mobile-first,” signposting their fitness for the smartphone world; Google would declare itself “AI-first.” Once the arcade boxing clip concluded on the TED stage, Page caught his breath.

pages: 310 words: 34,482

Makers at Work: Folks Reinventing the World One Object or Idea at a Time
by Steven Osborn
Published 17 Sep 2013

I’ve often said—I don’t have enough energy to do this myself yet—I always thought it would be nice to just grab a recorder and go to these old-timers and plop it down in front of them and just have them tell stories: “Tell us the nitty-gritty little stories about what you did and how you solved problems in engineering,” and get those recorded because it’s pretty exciting. All this media right now—we’re recording audio today. It’s not very searchable, but speech recognition is going to become more and more searchable in the future. If we could at least capture that information now and someday down the road it will become searchable. Video will become more searchable, too. Osborn: If I had one question for them, it would be, “Tell me about abusing some components to do things that they were not designed to do, and what did you use it for?”

pages: 478 words: 149,810

We Are Anonymous: Inside the Hacker World of LulzSec, Anonymous, and the Global Cyber Insurgency
by Parmy Olson
Published 5 Jun 2012

When the #press channel’s participants read over the press release, it sounded so dramatic and ominous that they decided something similar should be narrated in a video, too. A member of the group, whose nickname was VSR, created a YouTube account called Church0fScientology, and the group spent the next several hours finding uncopyrighted footage and music, then writing a video script that could be narrated by an automated voice. The speech recognition technology was so bad they had to go back and misspell most of the words—destroyed became “dee stroid,” for instance—to make it sound natural. The final script ended up looking like nonsense but sounding like normal prose. When they finally put it together, a Stephen Hawking–style robotic voice said over an image of dark clouds, “Hello, leaders of Scientology, we are Anonymous.”

pages: 667 words: 149,811

Economic Dignity
by Gene Sperling
Published 14 Sep 2020

Each of these experts sees a bigger government role as not only increasing the likelihood of game-changing breakthroughs but also as a pathway to dramatically increasing high-quality jobs. After all, DARPA-funded research formed the basis for the internet and contributed to the development of the personal computer, speech recognition software, and even Google. Nonetheless, outside some support for the National Institutes of Health, conservatives rarely support such research, and the Trump administration has consistently proposed eliminating ARPA-E altogether.

E. MISSING THE LAWS OF HEALTH-CARE ECONOMICS

There is no debate that reveals the inability of a “less government” dogma to support economic dignity more clearly than health care.

pages: 573 words: 157,767

From Bacteria to Bach and Back: The Evolution of Minds
by Daniel C. Dennett
Published 7 Feb 2017

And nothing compact and salient can be discerned among the physical properties of different occurrences of the phonemes that distinguish cat from bat from bad from bed and ball from bill and fall from full. The differences between these words seem simple and obvious, but that is an illusion engendered by our inbuilt competence, not by underlying simplicity in the signal. After decades of research and development, speech-recognition software is finally almost as competent as a five-year-old child at extracting the phonemes of casual speech from the acoustic maelstrom that arrives at an ear or a microphone. The digitization of phonemes has a profound implication: words play a role in cultural evolution that is similar to the role of DNA in genetic evolution, but, unlike the physically identical ladder rungs in the double helix made of Adenine, Cytosine, Guanine, and Thymine, words are not physically identical replicators; they are “identical” only at the user-illusion level of the manifest image.

pages: 579 words: 160,351

Breaking News: The Remaking of Journalism and Why It Matters Now
by Alan Rusbridger
Published 14 Oct 2018

Over dinner in a North London restaurant Negroponte started with convergence – the melting of all boundaries between TV, newspapers, magazines and the internet into a single media experience – and moved on to the death of copyright, possibly the nation state itself. There would be virtual reality, speech recognition, personal computers with inbuilt cameras, personalised news. The entire economic model of information was about to fall apart. The audience would pull rather than wait for old media to push things as at present. Information and entertainment would be on demand. Overly hierarchical and status-conscious societies would rapidly erode.

pages: 499 words: 144,278

Coders: The Making of a New Tribe and the Remaking of the World
by Clive Thompson
Published 26 Mar 2019

The National Cancer Institute is working on using it to detect cancer in CT scans. It’s even seeping into the world of culture: ByteDance, one of China’s hugest firms, uses neural nets to help curate news stories in its Toutiao news app, so successfully that users spend more than 74 minutes a day using it. A few years ago, Kai-Fu Lee, who invented the first “plain-talk speech recognition” and went on to be a veteran of Apple, Microsoft, and Google, put all his new financial investment decisions in the hands of AI. “I don’t trade with humans anymore” for those things, he told me. And, as with any software craze, the hunt for warm bodies exploded. Silicon Valley and China in particular grew ravenous for coders fluent in deep learning, with salaries reaching well into the six figures for anyone adept at teaching computers to see, hear, read, and predict.

pages: 592 words: 161,798

The Future of War
by Lawrence Freedman
Published 9 Oct 2017

Louis Del Monte envisaged a line of development from computers designing nanoweapons, within parameters set by humans, to a ‘singularity computer’, one more intelligent than the whole human race, in place by 2050. All this required enormous technical problems to be solved in miniature—including the furnishing of these tiny robots with a power source, antennae, communication, and steering. Well before such issues arose there were still troubling matters to be addressed. Artificial intelligence referred to computer systems capable of performing tasks normally requiring human intelligence, such as visual perception, speech recognition, and decision-making. This could involve quite mundane tasks. At issue therefore was the level of complexity that could be achieved. In war this would require selecting and engaging targets without meaningful human control, so that their behaviour would vary according to circumstances even in the same broad operating environment.

Turing's Cathedral
by George Dyson
Published 6 Mar 2012

We knew from the beginning that this logical, intelligent behavior evident in organisms was the result of fundamentally statistical, probabilistic processes, but we ignored that (or left the details to the biologists), while building “models” of intelligence—with mixed success. Through large-scale statistical, probabilistic information processing, real progress is being made on some of the hard problems, such as speech recognition, language translation, protein folding, and stock market prediction—even if only for the next millisecond, now enough time to complete a trade. How can this be intelligence, since we are just throwing statistical, probabilistic horsepower at the problem, and seeing what sticks, without any underlying understanding?

pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values
by Brian Christian
Published 5 Oct 2020

Behavior Analyst 28, no. 2 (2005): 143–59. Bain, Alexander. The Senses and the Intellect. London: John W. Parker and Son, 1855. Baldassarre, Gianluca, and Marco Mirolli, eds. Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, 2013. Balentine, Bruce. It’s Better to Be a Good Machine Than a Bad Person: Speech Recognition and Other Exotic User Interfaces in the Twilight of the Jetsonian Age. ICMI Press, 2007. Barabas, Chelsea, Madars Virza, Karthik Dinakar, Joichi Ito, and Jonathan Zittrain. “Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment.” In Proceedings of Machine Learning Research, 81 (2018): 62–76.

pages: 624 words: 180,416

For the Win
by Cory Doctorow
Published 11 May 2010

WHERE IS THE BATHROOM “What is it?” said Suzanne. Her hand wobbled a little and the distant letters danced. WHAT IS IT “This is a new artifact designed and executed by five previously out-of-work engineers in Athens, Georgia. They’ve mated a tiny Linux box with some speaker-independent continuous speech recognition software, a free software translation engine that can translate between any of twelve languages, and an extremely high-resolution LCD that blocks out words in the path of the laser-pointer. “Turn this on, point it at a wall, and start talking. Everything said shows up on the wall, in the language of your choosing, regardless of what language the speaker was speaking.”

pages: 598 words: 183,531

Hackers: Heroes of the Computer Revolution - 25th Anniversary Edition
by Steven Levy
Published 18 May 2010

And the harsh realities of funding hit Tech Square in the seventies: ARPA, adhering to the strict new Mansfield Amendment passed by Congress, had to ask for specific justification for many computer projects. The unlimited funds for basic research were drying up; ARPA was pushing some pet projects like speech recognition (which would have directly increased the government’s ability to mass-monitor phone conversations abroad and at home). Minsky thought the policy was a “losing” one, and distanced the AI lab from it. But there was no longer enough money to hire anyone who showed exceptional talent for hacking.

pages: 834 words: 180,700

The Architecture of Open Source Applications
by Amy Brown and Greg Wilson
Published 24 May 2011

Once phone calls are made to and from an Asterisk system, there are many additional features that can be used to customize the processing of the phone call. Some features are larger pre-built common applications, such as voicemail. There are other smaller features that can be combined together to create custom voice applications, such as playing back a sound file, reading digits, or speech recognition.

1.1. Critical Architectural Concepts

This section discusses some architectural concepts that are critical to all parts of Asterisk. These ideas are at the foundation of the Asterisk architecture.

1.1.1. Channels

A channel in Asterisk represents a connection between the Asterisk system and some telephony endpoint (Figure 1.1).

pages: 1,331 words: 183,137

Programming Rust: Fast, Safe Systems Development
by Jim Blandy and Jason Orendorff
Published 21 Nov 2017

Systems programming is for: Operating systems Device drivers of all kinds Filesystems Databases Code that runs in very cheap devices, or devices that must be extremely reliable Cryptography Media codecs (software for reading and writing audio, video, and image files) Media processing (for example, speech recognition or photo editing software) Memory management (for example, implementing a garbage collector) Text rendering (the conversion of text and fonts into pixels) Implementing higher-level programming languages (like JavaScript and Python) Networking Virtualization and software containers Scientific simulations Games In short, systems programming is resource-constrained programming.

Big Data and the Welfare State: How the Information Revolution Threatens Social Solidarity
by Torben Iversen and Philipp Rehm
Published 18 May 2022

Alphabet has recently created a new research unit, called Verily Life Sciences, to develop AI-based approaches to data analysis, and Microsoft’s Healthcare NExT is focused on collecting massive amounts of individual data from a variety of sources and transferring it to cloud-based systems, including a virtual assistant that takes notes at patient-doctor meetings using speech recognition technologies (Singer 2017). AI-enabled machine learning is used to make sense of the data for diagnostic purposes, which are of course also highly relevant for underwriting. One of the sources of data is independent laboratories, which have greatly proliferated over time, and these data can be combined with other health data to produce detailed profiles of individual health parameters with enormous predictive power.

pages: 698 words: 198,203

The Stuff of Thought: Language as a Window Into Human Nature
by Steven Pinker
Published 10 Sep 2007

Nor is it a reaction to learning that the speaker harbors an abominable attitude. These days someone who displayed the same attitude by simply saying “I hate African Americans, women, and Jews” would be stigmatizing himself far more than his targets, and would quickly be written off as a loathsome kook. I suspect that our sense of offense comes from the nature of speech recognition and from what it means to understand the connotation of a word. If you’re an English speaker, you can’t hear the words nigger or cunt or fucking without calling to mind what they mean to an implicit community of speakers, including the emotions that cling to them. To hear nigger is to try on, however briefly, the thought that there is something contemptible about African Americans, and thus to be complicit in a community that standardized that judgment by putting it into a word.

pages: 701 words: 199,010

The Crisis of Crowding: Quant Copycats, Ugly Models, and the New Crash Normal
by Ludwig B. Chincarini
Published 29 Jul 2012

The fund uses complex mathematical models to execute trades, which are often automated and generated on a model’s signal. One-third of the fund’s employees have PhDs in fields such as statistics, economics, mathematics, and physics. The firm depends on models built by mathematician Leonard Baum, co-author of the Baum-Welch algorithm, which determine probabilities in (among other things) biology, automated speech recognition, and statistical computing. Simons hoped to harness Baum’s mathematical models to trade currencies. The models and techniques were modified over time, but stayed rooted in the quantitative discipline. Some of Medallion’s staff eventually left to start their own hedge funds. Sandor Strauss, for instance, started Merfin LLC using a similar methodology. 10.
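For background on the algorithm named in this excerpt: Baum-Welch estimates the parameters of a hidden Markov model by repeated forward-backward passes. The sketch below shows only the forward pass on a toy two-state model; the state names, symbols, and probabilities are invented for illustration and have nothing to do with Medallion's actual models.

```python
# Forward algorithm for a toy 2-state hidden Markov model.
# Baum-Welch would alternate forward and backward passes like this
# one to re-estimate the start, transition, and emission tables.

def forward(obs, start_p, trans_p, emit_p):
    """Return P(obs) under the HMM by summing over all hidden paths."""
    states = list(start_p)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())

# Illustrative numbers only: a market that is secretly "up" or "down"
# and emits observable "gain" or "loss" ticks.
start_p = {"up": 0.5, "down": 0.5}
trans_p = {"up": {"up": 0.7, "down": 0.3}, "down": {"up": 0.4, "down": 0.6}}
emit_p = {"up": {"gain": 0.8, "loss": 0.2}, "down": {"gain": 0.3, "loss": 0.7}}

print(forward(["gain", "gain", "loss"], start_p, trans_p, emit_p))
```

The same dynamic-programming table, run backward and combined with the forward one, gives the expected state and transition counts that Baum-Welch uses in its re-estimation step.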

The Code: Silicon Valley and the Remaking of America
by Margaret O'Mara
Published 8 Jul 2019

SRI still did plenty of military work, but it now was grabbing national attention for what it was doing to make machines think. In November 1970, Life magazine invited its millions of readers to “Meet Shaky,” a robot that rolled the linoleum halls of SRI, capable of point-to-point navigation, “seeing” objects in its way, and rudimentary speech recognition. The mainframe-powered Shaky was a smartphone and a driverless car, fifty years before its time. Down the hall at SRI lay another future-tense lab, the Augmentation Research Center, led by a soft-spoken engineer in his forties named Douglas C. Engelbart. As academics and policymakers worried over the question of automation—the replacement of human workers with robotic machines, and human brains with artificially intelligent computers—Engelbart was one of a small and growing group of researchers interested in augmentation of human effort via technology.

pages: 797 words: 227,399

Wired for War: The Robotics Revolution and Conflict in the 21st Century
by P. W. Singer
Published 1 Jan 2010

Soon after, Kurzweil created such inventions as an automated college application program, the first print-to-speech reading machine for the blind (considered the biggest advancement for the visually impaired since the Braille language in 1829), the first computer flatbed scanner, and the first large-vocabulary speech recognition system. The musician Stevie Wonder, who used one of Kurzweil’s reading machines, then urged him to invent an electronic music synthesizer that could re-create the sounds of pianos and other orchestral instruments. So Kurzweil did. As his inventions piled up, Forbes magazine called him “the Ultimate Thinking Machine” and “rightful heir to Thomas Edison.”

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
Published 17 Apr 2017

Rushing toward an Internet of Things, we are rapidly approaching a world in which every inhabited space contains at least one internet-connected microphone, in the form of smartphones, smart TVs, voice-controlled assistant devices, baby monitors, and even children’s toys that use cloud-based speech recognition. Many of these devices have a terrible security record [95]. Even the most totalitarian and repressive regimes could only dream of putting a microphone in every room and forcing every person to constantly carry a device capable of tracking their location and movements. Yet we apparently voluntarily, even enthusiastically, throw ourselves into this world of total surveillance.

pages: 892 words: 91,000

Valuation: Measuring and Managing the Value of Companies
by Tim Koller , McKinsey , Company Inc. , Marc Goedhart , David Wessels , Barbara Schwimmer and Franziska Manoury
Published 16 Aug 2015

They do this because they can acquire the technology more quickly than developing it themselves, avoid royalty payments on patented technologies, and keep the technology away from competitors. For example, Apple bought Siri (the automated personal assistant) in 2010 to enhance its iPhones. More recently, in 2014, Apple purchased Novauris Technologies, a speech recognition technology company, to further enhance Siri’s capabilities. In 2014, Apple also purchased Beats Electronics, which had recently launched a music-streaming service. One reason for the acquisition was to quickly offer its customers a music-streaming service, as the market was moving away from Apple’s iTunes business model of purchasing and downloading music.

pages: 1,199 words: 332,563

Golden Holocaust: Origins of the Cigarette Catastrophe and the Case for Abolition
by Robert N. Proctor
Published 28 Feb 2012

In 1993, for example, just to receive calls and process orders for its Marlboro Adventure Team promotion, Philip Morris established a new 450,000-square-foot “fulfillment facility” in Lafayette, Indiana, staffed by 350 employees, and a new Customer Service Telemarketing Facility in Kankakee, Illinois, with a staff of 25 to handle phone orders. Philip Morris in the year 2000 expanded its call-receiving capabilities, implementing natural-language speech recognition, standby promotional and apology mail packages, and a “new attitude” tailoring personal service to the individual smoker. Callers were given a personalized consumer ID and PIN to allow personal logins, and email and fax programs were installed to reach consumers more quickly. For a time the industry hoped to replace its telephonic contacts with fax, email, and web-based interactions, though phone calls apparently still remain important, with texting and interactive web 2.0 advertising close on their heels. Philip Morris is not the only tobacco company to engage outside firms for such purposes.

Engineering Security
by Peter Gutmann

Another example of the mind’s ability to transparently fix up problems occurs with a synthesised speech form called sinewave speech, which is generated by using a formant tracker to detect the formant frequencies in normal speech and then generating sine waves that track the centres of these formants [316]. The first time that you hear this type of synthesised speech it sounds like an alien language. If you then listen to the same message as normal speech, the brain’s speech-recognition circuits are activated in a process known as perceptual insight, and from then on you can understand the previously unintelligible sinewave speech. In fact no matter how hard you try you can no longer “unhear” what was previously unintelligible, alien sounds. In another variant of this, it’s possible under the right stimuli of chaotic surrounding sounds for the brain to create words and even phrases that aren’t actually there as it tries to extract meaning from the surrounding cacophony [317].
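The resynthesis step this excerpt describes — replacing each formant with a single sine wave tracking its centre frequency — can be sketched in a few lines. This is a minimal illustration, assuming the formant tracks have already been extracted by a formant tracker; the tracks below are invented, not taken from real speech.

```python
import math

def sinewave_speech(formant_tracks, sample_rate=16000, frame_dur=0.01):
    """Resynthesize 'speech' as a sum of sine waves, one per formant track.

    formant_tracks: list of tracks; each track is a list of per-frame
    frequencies in Hz (the centre of one formant in each 10 ms frame).
    Returns a list of samples in [-1, 1].
    """
    samples_per_frame = int(sample_rate * frame_dur)
    n_frames = len(formant_tracks[0])
    phases = [0.0] * len(formant_tracks)
    out = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            sample = 0.0
            for i, track in enumerate(formant_tracks):
                # advance each oscillator's phase at this frame's frequency,
                # so the sine wave glides as the formant centre moves
                phases[i] += 2 * math.pi * track[frame] / sample_rate
                sample += math.sin(phases[i])
            out.append(sample / len(formant_tracks))  # keep within [-1, 1]
    return out

# Two invented "formant" tracks gliding over 5 frames (50 ms of audio).
f1 = [500, 550, 600, 650, 700]
f2 = [1500, 1450, 1400, 1350, 1300]
audio = sinewave_speech([f1, f2])
print(len(audio))
```

Played back, a signal built this way from real formant tracks carries none of the usual voicing or noise cues, which is why it initially sounds alien despite preserving the formant trajectories the brain can learn to hear as speech.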

pages: 1,737 words: 491,616

Rationality: From AI to Zombies
by Eliezer Yudkowsky
Published 11 Mar 2015

(See also: Truly Part Of You, Words as Mental Paintbrush Handles, Drew McDermott’s “Artificial Intelligence Meets Natural Stupidity.”) The essential driver of the Detached Lever Fallacy is that the lever is visible, and the machinery is not; worse, the lever is variable and the machinery is a background constant. You can all hear the word “apple” spoken (and let us note that speech recognition is by no means an easy problem, but anyway . . .) and you can see the text written on paper. On the other hand, probably a majority of human beings have no idea their temporal cortex exists; as far as I know, no one knows the neural code for it. You only hear the word “apple” on certain occasions, and not others.