recommendation engine


description: information filtering system to predict users' preferences

212 results

pages: 282 words: 63,385

Attention Factory: The Story of TikTok and China's ByteDance
by Matthew Brennan
Published 9 Oct 2020

To deceive machines, a person needed to have a scrupulous attention to detail. These practices boiled down to a single goal: trick the app’s recommendation systems. Recommendation—the use of machine learning to infer people’s preferences for content based on their behavior alone—was at the very heart of TikTok. It was the key to understanding the success of the app and its parent company, ByteDance. ByteDance had been the earliest Chinese internet company to go “all in” on the then-nascent technology and commit to the daunting task of building a recommendation engine, challenging the status quo of human curation. This early bet paid off in spades. The foundations of TikTok’s success were laid many years before the app itself was built, and it was no coincidence that ByteDance was the company to make it.

Lei Li, ByteDance AI Labs. Image: the large central “fishbowl” glass meeting room, ByteDance’s Beijing head offices, former aviation museum AVIC Plaza. Chapter Timeline: 2012 Sept – Toutiao’s personalized recommendation system goes live; 2013 Aug – Zhang Lidong joins ByteDance to lead commercialization; 2014 – Yang Zhenyuan joins ByteDance as VP of Technology; 2015 Jan – Okinawa annual meeting; 2016 Feb – Company moves to new offices at AVIC Plaza. In mid-2012, an email dropped into ByteDance’s technical team’s inboxes with the ominous title “Recommended Engine General Meeting.” Yiming was determined to push forward on a topic that he saw to be critical to the company’s future.

Yiming was determined to push forward on a topic that he saw to be critical to the company’s future. The email continued: “To be an information platform, it is necessary to do a good job on the personalized recommendation engine. Do you want to start this thing now?” Toutiao’s early recommendation system, its so-called “personalization technology,” was, at the time, rudimentary. Open the app, and the user would be bombarded with top-read articles to keep them immediately hooked. Next, it would mix in more targeted click-bait articles appealing only to specific demographics to test and determine who the reader was. The user clicking on the article with a big preview picture of a female car show model is probably male.
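For illustration, a minimal sketch of the bootstrapping strategy this passage describes: serve broadly popular articles first, mix in a few demographically targeted "probe" articles, and let clicks update a guess about who the reader is. All names, weights, and thresholds below are invented, not from the book.

```python
from collections import defaultdict

# Hypothetical probe articles and the demographic signal a click on each carries.
PROBE_ARTICLES = {
    "car_show_models": {"male": 0.8, "female": 0.2},
    "celebrity_gossip": {"male": 0.3, "female": 0.7},
}

def build_feed(top_read, probes, n_popular=8, n_probes=2):
    """Mix mostly top-read items with a few demographic probes."""
    return top_read[:n_popular] + list(probes)[:n_probes]

def update_profile(profile, clicked_probe):
    """Shift the demographic guess toward whatever the clicked probe suggests."""
    for group, weight in PROBE_ARTICLES.get(clicked_probe, {}).items():
        profile[group] += weight
    return profile

profile = defaultdict(float)
feed = build_feed(["top1", "top2", "top3"], PROBE_ARTICLES)
profile = update_profile(profile, "car_show_models")
print(max(profile, key=profile.get))  # current best guess: "male"
```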

pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline
by Cathy O'Neil and Rachel Schutt
Published 8 Oct 2013

proximity clustering, Morningside Analytics prtobuf, Back to Josh: Workflow pseudo-likelihood estimation procedure, Inference for ERGMs pseudocounts, Comparing Naive Bayes to k-NN purity, Probabilities Matter, Not 0s and 1s Q Quora, The Current Landscape (with a Little History) R R-squared, Adding in modeling assumptions about the errors, Selection criterion random forests, Random Forests–Random Forests random graphs, A First Example of Random Graphs: The Erdos-Renyi Model–A Second Example of Random Graphs: The Exponential Random Graph Model Erdos-Renyi model, A First Example of Random Graphs: The Erdos-Renyi Model–A Second Example of Random Graphs: The Exponential Random Graph Model exponential, A Second Example of Random Graphs: The Exponential Random Graph Model random variables, Probability distributions ranks, Evaluation, The Dimensionality Problem real-life performance measures, How to Be a Good Modeler real-time streaming data, Populations and Samples of Big Data real-world data, Process Thinking real-world processes, Statistical Inference RealDirect, Case Study: RealDirect website, Exercise: RealDirect Data Strategy RealDirect case study, Case Study: RealDirect–Sample R code RealDirect data strategy exercise, Exercise: RealDirect Data Strategy–Sample R code realizations, Probability distributions recall, Pick an evaluation metric, Evaluation, Defining the error metric receiver operating characteristic curve, Evaluation recommendation engines, Recommendation Engines: Building a User-Facing Data Product at Scale–Exercise: Build Your Own Recommendation System Amazon and, Recommendation Engines: Building a User-Facing Data Product at Scale building, exercise, Exercise: Build Your Own Recommendation System dimensionality, The Dimensionality Problem k-Nearest Neighbors (k-NN) and, Nearest Neighbor Algorithm Review–Some Problems with Nearest Neighbors machine learning classifications and, Beyond Nearest Neighbor: Machine Learning Classification–Beyond Nearest Neighbor: Machine Learning Classification Netflix and, Recommendation Engines: Building a User-Facing Data Product at Scale real-world, A Real-World Recommendation Engine records, Populations and Samples of Big Data Red Hat, Cloudera Reddy, Ben, Helping Hands redundancies, Feature Selection regression, stepwise, Selecting an algorithm regular expressions, Helping Hands relational ties, Terminology from Social Networks relations, Terminology from Social Networks relationships deterministic, Linear Regression understanding, Linear Regression relative time differentials, Thought Experiment residual sum of squares (RSS), Fitting the model residuals, Adding in modeling assumptions about the errors retention, understanding, Example: User Retention return, The Decision Tree Algorithm Robo-Graders, ethical implications of as thought experiment, Thought Experiment: What Are the Ethical Implications of a Robo-Grader?

Use mixed methods to come to a better understanding of what’s going on. Qualitative surveys can really help. Chapter 8. Recommendation Engines: Building a User-Facing Data Product at Scale Recommendation engines, also called recommendation systems, are the quintessential data product and are a good starting point when you’re explaining to non–data scientists what you do or what data science really is. This is because many people have interacted with recommendation systems when they’ve been suggested books on Amazon.com or gotten recommended movies on Netflix. Beyond that, however, they likely have not thought much about the engineering and algorithms underlying those recommendations, nor the fact that their behavior when they buy a book or rate a movie is generating data that then feeds back into the recommendation engine and leads to (hopefully) improved recommendations for themselves and other people.

Beyond that, however, they likely have not thought much about the engineering and algorithms underlying those recommendations, nor the fact that their behavior when they buy a book or rate a movie is generating data that then feeds back into the recommendation engine and leads to (hopefully) improved recommendations for themselves and other people. Aside from being a clear example of a product that literally uses data as its fuel, another reason we call recommendation systems “quintessential” is that building a solid recommendation system end-to-end requires an understanding of linear algebra and an ability to code; it also illustrates the challenges that Big Data poses when dealing with a problem that makes intuitive sense, but that can get complicated when implementing its solution at scale.

pages: 208 words: 57,602

Futureproof: 9 Rules for Humans in the Age of Automation
by Kevin Roose
Published 9 Mar 2021

The injection of algorithmic recommendations into every facet of modern life has gone mostly unnoticed, and yet, if we consider how many of our daily decisions we outsource to machines, it’s hard not to think that a historic, species-level transformation is taking place. “Recommendation engines increasingly shape who people are, what they desire, and who they want to become,” writes Michael Schrage, an MIT research fellow and author of a book about recommendation engines. “The future of the self,” he adds, “is the future of recommendation.” Modern recommendation systems are orders of magnitude more powerful than the one Doug Terry and Dave Nichols developed to sift through their email inboxes. Today’s tech companies have access to huge amounts of computing power that allows them to generate detailed models of user behavior, and machine learning techniques that let them discover patterns in enormous data sets—studying the online shopping behavior of a hundred million people to find out, for example, that people who buy a certain brand of dog food are statistically more likely to vote Republican.

What is rewarding is often hard, and hard is the enemy of the machine. * * * — Recently, I called Doug Terry, the Xerox PARC engineer who came up with Tapestry, the first algorithmic recommender system, nearly three decades ago. Terry, who is sixty-two, works at Amazon now, and after reminiscing about the early days of Tapestry, I asked what he thought of the recommendation engines that power services like Facebook, YouTube, and Netflix. “I don’t think there’s any comparison,” he said. “We just had a little simple system, and nowadays there’s trillions and trillions of feeds for billions of people—just the scale and complexity and everything is different.”

Terry, “A Tour Through Tapestry,” Proceedings of the 1993 ACM Conference on Organizational Computing Systems (1993). Michael Schrage, an MIT research fellow: Michael Schrage, Recommendation Engines (Boston: MIT Press, 2020). YouTube has said that recommendations: Paresh Dave, “YouTube Sharpens How It Recommends Videos Despite Fears of Isolating Users,” Reuters, November 28, 2017. It has been estimated that 30 percent of Amazon page views: Amit Sharma, Jake M. Hofman, and Duncan J. Watts, “Estimating the Causal Impact of Recommendation Systems from Observational Data,” Proceedings of the 2015 ACM Conference on Economics and Computation (2015). Spotify’s algorithmically generated Discover Weekly playlists: Devindra Hardawar, “Spotify’s Discover Weekly Playlists Have 40 Million Listeners,” Engadget, May 25, 2016.

pages: 347 words: 91,318

Netflixed: The Epic Battle for America's Eyeballs
by Gina Keating
Published 10 Oct 2012

The costs of buying enough DVDs to satisfy the growing subscriber base would eventually crush the company unless Lowe could persuade studios to drop DVD prices drastically in exchange for a share of rental revenues. In the meantime, Netflix engineers had been hard at work since shortly after launch on a recommendation engine—an in-house solution to DVD shortages that would theoretically drive up retention and get more of the company’s catalog into circulation by directing customers away from the most popular films toward more obscure titles that they would like just as much. As a result, the recommendation engine took over the editorial team’s tasks of determining which movies to feature on certain themed Web pages, using machine logic rather than human intuition.

• • • WHEN NETFLIX’S FOUNDING software engineers, including Hastings, contemplated building a recommendation engine in 1999, their first approach was rudimentary and involved linking movies through common attributes: genre, actors, director, setting, happy or sad ending. As the film library grew, that method proved cumbersome and inaccurate, because no matter how many attributes they assigned each film, they could not capture why Pretty Woman was so different from, say, American Gigolo. Both were movies about prostitution set in a major U.S. city and starring Richard Gere, but they were unlikely to appeal to the same audiences. Early recommendation engines were unpredictable: In one famous gaffe, Walmart had to issue an apology and disable theirs after its Web site presented the film Planet of the Apes to shoppers looking for films related to Black History Month.
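For illustration, a minimal sketch of the attribute-matching approach described above, and of why it breaks down: measured by overlap of hand-assigned attributes (Jaccard similarity), the two Richard Gere films look fairly similar even though they appeal to different audiences. The attribute lists are invented.

```python
def jaccard(a, b):
    """Overlap of two attribute sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hand-assigned attributes (illustrative, not Netflix's actual data).
pretty_woman = {"romance", "comedy", "prostitution", "Los Angeles",
                "Richard Gere", "happy ending"}
american_gigolo = {"drama", "thriller", "prostitution", "Los Angeles",
                   "Richard Gere", "sad ending"}

print(jaccard(pretty_woman, american_gigolo))
# ≈ 0.33: the films look related by attributes, yet they appeal to very
# different audiences, a gap that attribute matching cannot capture.
```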

leaves Netflix, 84–85 Marquee Plan, 58–59, 63 meets Mitch Lowe, 23–24 Netflix, development of, 20–31 and Netflix founding, 6–9 personality of, 2, 15 post-Netflix positions, 253–54 public relations, early experience, 17–19 at Pure Atria, 11–13, 14–16 Queue, 58 on Qwikster fiasco, 253 relationship with Hastings, 15, 20, 44 video rentals, learning about, 22–24 Randolph, Muriel, 17 Randolph, Stephen, 17 Raskopf, Karen, 129, 152, 175 Rational Software, 7, 16 Recommendation engine. See Cinematch recommendation engine Redbox, 5, 84, 234–38 as Blockbuster competitor, 231–32, 234 and Coinstar, 237–38 development of, 24–25, 161 McDonalds placements, 235–36 as Netflix competitor, 228–29 new releases advantage, 161, 235, 238 pricing, 237 rationale for business, 235–36 video stores, impact on, 212, 238 Redpoint Ventures, 53 Redstone, Sumner, 72–74, 111–12 Reel.com, 48, 60, 62 ReFLEX, 78 Reiss, Lisa Battaglia, 45 Remind Me, 36 Rendich, Andy, 248, 251–53 leaves Netflix, 252 ReplayTV, 168 Rock The Block, 219–21, 230–32 Roku box, 224–25 Rolling Road Show, 177–79 Ross, Ken leaves Netflix, 243–44, 247 Netflix corporate communications actions, 138–41, 145, 177–81, 187–88 on Qwikster fiasco, 252 Roth Capitol Partners, 135 S Sam Goody/Musicland, 51 Santa Cruz, Netflix launch, 7–9 Sarandos, Ted, 103, 126, 129, 179, 210, 225, 240 Satellite hub system, 57 Schappert, John, 225 Scorsese, Martin, 179 Sellers, Pattie, 243 Serialized Delivery, 58–59 Sheehan, Susan, 140 Shepherd, James, 128–29 Shepherd, Nick background of, 118–19 as Blockbuster COO, 214–15 Blockbuster financial moves, concern about, 128–29, 202–3 cost-cutting, 118, 162–63 and End of Late Fees, 117–19 and hostile board of directors, 216–17 joins Blockbuster, 90–92 leaves Blockbuster, 217 personality of, 116–17 Redbox purchase, rejecting, 236 sells Blockbuster shares, 219 Siftar, Michael, 107 Siminoff, Ellen and David, 222 Simpson, Jessica, 175 Skip shipping, 28–29 Skorman, Stuart, 48 Smith, Therese “Te” background of, 16 leaves Netflix, 54 and Netflix development, 22, 28, 33, 37–38 Smith’s grocery (Las Vegas) Netflix Express, 83–84 Redbox at, 237 Social Register, 13 Sock puppets, 40–41 Software Publishing, 15 Soleil Securities, 209 SpeakerText, 42 Squali, Youssef, 135 Starfish Software, 16 Starz Entertainment, 225, 239–40 Stead, Ed and Blockbuster Online development, 86–89, 95 and Carl Icahn, 116, 121 and Hastings alliance attempts, 66–67, 77 leaves Blockbuster, 170 personality of, 60, 95 and video streaming plans, 77–78 Streaming video.

pages: 390 words: 109,519

Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media
by Tarleton Gillespie
Published 25 Jun 2018

See self-harm Tinder (dating app), (i), (ii), (iii) Topfree Equal Rights Association [TERA], (i), (ii), (iii) traditional media: economics of, (i), (ii), (iii); regulation of, (i), (ii); moderation of, (i), (ii), (iii), (iv), (v), (vi), (vii), (viii), (ix), (x), (xi) transgender, (i) transparency and accountability, (i), (ii), (iii), (iv), (v), (vi), (vii), (viii), (ix) transparency reports, (i) TripAdvisor (recommendation platform), (i) trolling, (i), (ii), (iii), (iv), (v), (vi), (vii), (viii)n2 Trust and Safety Council (Twitter), (i), (ii)n60 Tumblr (Yahoo): community guidelines, (i), (ii), (iii); and the thinspo controversy, (i); and ratings, (i); and the NSFW controversy, (i); and filtering, (i), (ii) Tushnet, Rebecca, (i) Twitter: community guidelines, (i), (ii), (iii), (iv), (v), (vi), (vii)n10; and harassment, (i), (ii), (iii), (iv), (v)n60, (vi)n2, (vii)n5; responses to removal requests, (i), (ii); approach to moderation, (i), (ii); and flagging, (i); and automated detection, (i), (ii), (iii); moderation of Trends, (i); fake news / Russian ad controversies, (i) U.K.

SoundCloud is a social place but it’s not the place for you to act out rage from other parts of your life. Don’t let a personal issue strain the rest of the community.”20 Trolls often look to game the platform itself: misusing complaint mechanisms, and wreaking havoc with machine learning algorithms behind recommendation systems and chatbots.21 Unfortunately, platforms must also address behavior that goes well beyond spirited debate gone overboard or the empty cruelty of trolls: directed and ongoing harassment of an individual over time, including intimidation, stalking, and direct threats of violence. This is where the prohibitions often get firm: “Never threaten to harm a person, group of people, or property” (Snapchat).

“Race, Civil Rights, and Hate Speech in the Digital Era.” In Learning Race and Ethnicity: Youth and Digital Media. Cambridge: MIT Press. http://academicworks.cuny.edu/gc_pubs/193/. DAVID, SHAY, AND TREVOR JOHN PINCH. 2005. “Six Degrees of Reputation: The Use and Abuse of Online Review and Recommendation Systems.” First Monday, special issue 6: Commercial Applications of the Internet. http://firstmonday.org/ojs/index.php/fm/article/view/1590/1505. DEIBERT, RONALD, JOHN PALFREY, RAFAL ROHOZINSKI, AND JONATHAN ZITTRAIN, EDS. 2008. Access Denied: The Practice and Policy of Global Internet Filtering.

pages: 451 words: 103,606

Machine Learning for Hackers
by Drew Conway and John Myles White
Published 10 Feb 2012

More likely, you have heard of something like a recommendation system, which implicitly produces a ranking of products. Even if you have not heard of a recommendation system, it’s almost certain that you have used or interacted with a recommendation system at some point. Some of the most successful ecommerce websites have benefited from leveraging data on their users to generate recommendations for other products their users might be interested in. For example, if you have ever shopped at Amazon.com, then you have interacted with a recommendation system. The problem Amazon faces is simple: what items in their inventory are you most likely to buy?

We use data from US Senator roll call voting to cluster those legislators based on their votes. Recommendation system: suggesting R packages to users To further the discussion of spatial similarities, we discuss how to build a recommendation system based on the closeness of observations in space. Here we introduce the k-nearest neighbors algorithm and use it to suggest R packages to programmers based on their currently installed packages. Social network analysis: who to follow on Twitter Here we attempt to combine many of the concepts previously discussed, as well as introduce a few new ones, to design and build a “who to follow” recommendation system from Twitter data.
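A minimal sketch of the k-nearest-neighbors idea described here, using a toy installed-packages table rather than the book's dataset: find the users whose installations overlap most with yours and suggest the packages they have that you do not.

```python
from collections import Counter

# Toy data: which packages each user has installed (illustrative only).
installs = {
    "u1": {"ggplot2", "plyr", "reshape2"},
    "u2": {"ggplot2", "plyr", "lattice"},
    "u3": {"lattice", "zoo"},
}

def knn_recommend(target, installs, k=2):
    """Suggest packages owned by the k most similar users (by set overlap)."""
    others = {u: pkgs for u, pkgs in installs.items() if u != target}
    # Similarity = Jaccard overlap between installed-package sets.
    sims = sorted(
        others,
        key=lambda u: len(installs[target] & others[u]) / len(installs[target] | others[u]),
        reverse=True,
    )[:k]
    votes = Counter(p for u in sims for p in others[u] if p not in installs[target])
    return [pkg for pkg, _ in votes.most_common()]

print(knn_recommend("u1", installs))  # ['lattice', 'zoo']
```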

It can be quite interesting and informative to explore these structures in detail, and we encourage you to do so. In the next and final section, we will use these community structures to build our own “who to follow” recommendation engine for Twitter. Building Your Own “Who to Follow” Engine There are many ways that we might think about building our own friend recommendation engine for Twitter. Twitter has many dimensions of data in it, so we could think about recommending people based on what they “tweet” about. This would be an exercise in text mining and would require matching people based on some common set of words or topics within their corpus of tweets.

pages: 23 words: 5,264

Designing Great Data Products
by Jeremy Howard , Mike Loukides and Margit Zwemer
Published 23 Mar 2012

The takeaway, whether you are a tiny startup or a giant insurance company, is that we unconsciously use optimization whenever we decide how to get to where we want to go. Drivetrain Approach to recommender systems Let’s look at how we could apply this process to another industry: marketing. We begin by applying the Drivetrain Approach to a familiar example, recommendation engines, and then building this up into an entire optimized marketing strategy. Recommendation engines are a familiar example of a data product based on well-built predictive models that do not achieve an optimal objective. The current algorithms predict what products a customer will like, based on purchase history and the histories of similar customers.

Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld series:” All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books. There may be some unexpected recommendations on pages 2 through 14 of the feed, but how many customers are going to bother clicking through? Instead, let’s design an improved recommendation engine using the Drivetrain Approach, starting by reconsidering our objective. The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with books he or she would not have purchased without the recommendation. What we would really like to do is emulate the experience of Mark Johnson, CEO of Zite, who gave a perfect example of what a customer’s recommendation experience should be like in a recent TOC talk.
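A minimal sketch of the objective shift the Drivetrain Approach argues for: rather than ranking books by predicted purchase probability alone, rank them by how much showing the recommendation changes that probability, so titles the customer would buy anyway (the next Discworld novel) drop out. The probabilities below stand in for the outputs of two trained models and are invented.

```python
def recommend_by_lift(candidates, p_buy_if_shown, p_buy_if_not_shown, top_n=3):
    """Rank items by incremental purchase probability (the lift of showing them)."""
    lift = {
        item: p_buy_if_shown[item] - p_buy_if_not_shown[item]
        for item in candidates
    }
    return sorted(lift, key=lift.get, reverse=True)[:top_n]

# Placeholder outputs of two models (illustrative numbers only).
p_shown     = {"next Discworld novel": 0.90, "obscure comic fantasy": 0.20, "cookbook": 0.05}
p_not_shown = {"next Discworld novel": 0.88, "obscure comic fantasy": 0.02, "cookbook": 0.04}

print(recommend_by_lift(p_shown, p_shown, p_not_shown))
# The obscure title ranks first: recommending it changes behavior the most.
```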

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. O'Reilly Media * * * Chapter 1. Designing Great Data Products By Jeremy Howard, Margit Zwemer, and Mike Loukides In the past few years, we’ve seen many data products based on predictive modeling. These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself. But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction. Prediction technology can be interesting and mathematically elegant, but we need to take the next step.

Succeeding With AI: How to Make AI Work for Your Business
by Veljko Krunic
Published 29 Mar 2020

- Application of various metrics seen in the scientific papers (such as “Evaluating Recommendation Systems” [77]) on recommendation engines. An example of such a metric is novelty [78], which measures how many new items that the user didn’t know about were recommended. What is often not clear is if such technical metrics positively impact any aspect of the business—I might not have known that the retailer stocks garden hoses, but do I care?
- Measure the sales increase from improved recommendations. This approach has clear business relevance and is unambiguous. You don’t have to deploy your recommendation engine fully in production to test the sales increase.
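A minimal sketch of a novelty-style metric in the spirit of the first point above, assuming a simple definition: the share of recommended items the user has not already encountered. The survey cited as [77] defines such metrics more carefully.

```python
def novelty(recommended, previously_seen):
    """Fraction of recommended items the user had not already encountered."""
    recommended = list(recommended)
    if not recommended:
        return 0.0
    new_items = [item for item in recommended if item not in previously_seen]
    return len(new_items) / len(recommended)

print(novelty(["garden hose", "drill", "gloves"], {"drill"}))  # 0.666...
# A high score says the items are new to the user; it says nothing about
# whether the user cares, which is the business question raised above.
```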

Let’s look at one typical situation in which it’s difficult to have an intuitive feel for what good results are for your AI project. 4.1.1 What constitutes a good recommendation engine? You’re in charge of the recommendation engine of a large retailer. The retailer is selling 200,000 products, and it has a total of 80 million customers and close to 2 million products viewed every day. Your recommendation engine suggests to every customer additional products they might be interested in buying. You’ve just made an update to your recommendation engine. How do you know that the latest update is moving the system in the right direction? You can look at a few products overall, but looking at a few products doesn’t actually tell you if your latest change is doing well across all of your customers.

Artificial general intelligence. Wikipedia. [Cited 2018 Jun 13.] Available from: https://en.wikipedia.org/w/index.php?title=Artificial_general_intelligence Shani G, Gunawardana A. Evaluating recommendation systems. In: Ricci F, Rokach L, Shapira B, Kantor PB, editors. Recommender systems handbook. New York: Springer; 2011. p. 257–297. Konstan JA, McNee SM, Ziegler, Torres R, Kapoor N, Riedl JT. Lessons on applying automated recommender systems to information-seeking tasks. Proceedings of the Twenty-First National Conference on Artificial Intelligence; 2006. Wikimedia Foundation. Expected value of perfect information.

pages: 439 words: 131,081

The Chaos Machine: The Inside Story of How Social Media Rewired Our Minds and Our World
by Max Fisher
Published 5 Sep 2022

An American iteration, which had first appeared on the message board 4chan under the label “QAnon,” had recently hit Facebook like a match to a pool of gasoline. Later, as QAnon became a movement with tens of thousands of followers, an internal FBI report identified it as a domestic terror threat. Throughout, Facebook’s recommendation engines promoted QAnon groups to huge numbers of readers, as if this were merely another club, helping to grow the conspiracy into the size of a minor political party, for seemingly no more elaborate reason than the continued clicks the QAnon content generated. Within Facebook’s muraled walls, though, belief in the product as a force for good seemed unshakable.

Soon, DiResta noticed Facebook doing something strange: pushing a stream of notifications urging her to follow other anti-vaccine pages. “If you joined the one anti-vaccine group,” she said, “it was transformative.” Nearly every vaccine-related recommendation promoted to her was for anti-vaccine content. “The recommendation engine would push them and push them and push them.” Before long, the system prompted her to consider joining groups for unrelated conspiracies. Chemtrails. Flat Earth. And as she poked around, she found another way that the system boosted vaccine misinformation. Just as with the ad-targeting tool, typing “vaccines” in Facebook’s search bar returned a stream of anti-vaccine posts and groups.

Others from DiResta’s informal group of social media watchers were noticing Facebook and other platforms routing them in similar ways. The same pattern played out over and over, as if those A.I.s had all independently arrived at some common, terrible truth about human nature. “I called it radicalization via the recommendation engine,” she said. “By having engagement-driven metrics, you created a world in which rage-filled content would become the norm.” The algorithmic logic was sound, even brilliant. Radicalization is an obsessive, life-consuming process. Believers come back again and again, their obsession becoming an identity, with social media platforms the center of their day-to-day lives.

pages: 642 words: 141,888

Like, Comment, Subscribe: Inside YouTube's Chaotic Rise to World Domination
by Mark Bergen
Published 5 Sep 2022

Type in an impossibly long question (Where did the actress who plays Rachel’s mom on Friends go to college?) and there’s the answer. Translate this question into French et voilà. Neural networks went into Google’s email spam filters and ad-targeting dials and digital photo albums. At YouTube neural networks plugged into its recommendation engine. * * * • • • Think of YouTube’s recommendation system as a gigantic, multiarmed sorting machine. It has one task: predict what video someone will watch next and deliver it. From YouTube’s outset its computer programs strove to do this. But the Brain neural network could make predictions and sort in ways fallible humans and flimsier code could not.

Certain videos claimed that Hillary Clinton and her top aide assaulted a young girl and drank her blood. This was Frazzledrip, a bizarre cousin of Pizzagate, a theory that had, by then, morphed into QAnon, the cultlike conspiracy theory and movement. “What is your company policy on that?” Raskin asked. At that time YouTube was working on a major overhaul of its recommendation engine to bury conspiracy clips and other footage deemed “harmful” in its penalty box. But this change wasn’t ready for public consumption, so Pichai didn’t mention it. “We are looking to do more,” he replied. “Is your basic position,” the congressman pressed, “that there’s just an avalanche of material and there’s nothing that could be done?”

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A ABC television network, 76 “Abortion Man,” 75 abortion-related content, 86 Accenture, 317–18, 319 addictiveness of videos, 239 “The Adpocalypse: What it Means” (vlogbrothers), 289 advertising/advertisers and ad-friendly mandate for creators, 267–68, 274, 282–83 algorithms’ role in placement of, 284, 285, 295 banner ads, 73, 197, 200 and boycotts, 283–90, 295, 300, 308, 329, 356 and child-directed content, 173, 368, 394 and child-exploitation debacle, 311–14 and comments removed from kids’ videos, 371 and copyright concerns, 108 creators’ control over types of, 347 and data shared with marketers, 284–85, 295 dismantling of targeted, 390 Dynamic Ad Loads (Dallas), 191–92, 194 and eligibility thresholds for partner program, 329 on Facebook, 252, 284 and first profit of YouTube, 50 and fraudsters, 284 and Google Preferred, 210 and home page of YouTube, 101 Hurley’s reluctance to employ, 68 increase in videos eligible for, 110 on non-partner program channels, 391–92 and partner program, 69–70, 163–64 and pay-per-view business model, 133 pop-up ads, 68, 73 “pre-rolls,” 68 and product placements, 67 and Project MASA, 296, 313, 381 and questionable/troubling content, 67, 75, 87, 255, 285–87, 296 recent sales revenues, 391, 401 removed from creators’ videos, 267–68, 314, 324 Russia Today (RT), 340–41 and sell-through rate, 108, 109–10 shortage of slots for, 164 skippable, 192 and Spotlight (influencer campaigns), 248 television model for, 110–11 and user experience, 68 and viewability/measurability, 284–85 and Wojcicki, 195, 196, 197–98, 200, 213 AdWords, 51 Agha-Soltan, Neda, 137 Aghdam, Nasim Najafi, 331–35 Akilah Obviously, 261 al-Awlaki, Anwar, 214 Alchemy House, 256 algorithms of YouTube for ad placements, 284, 285, 295 adult content from kids’ search terms, 306 and advertiser-friendly content, 267 and authoritative news sources, 325–26 and changes to reward system, 155, 156–60, 164 and clickbait, 150 and comments vs. 
likes, 158, 276 and conspiracy theories, 325–29 and creators of color, 339 creators’ understanding of, 297, 385 and daily viewers, 252, 254 disclosure of information about, 297, 398 and Google Preferred content, 210 and government regulation, 401 and home-page of YouTube, 99–102, 135, 298 Jho on improvements in, 394–95 and keyword stuffing, 308–9 and “Leanback” feature, 189–90 limitations of, 255–56 and machine learning, 191–92 and Paul’s video of suicide victim, 323 and PewDiePie, 275 and presidential election of 2016, 272, 326 and presidential election of 2020, 388 and quality content, 175 responsibility metric, 328 screeners’ role in training, 320 and skeptics of YouTube, 223 skin-detection by, 255–56 titles of content chosen for, 172 watch time favored in, 156–60 and YouTube Kids app, 238, 244–45 Allen & Company (investment bank), 49 Alphabet, 257 alt-right, 263, 269–70, 275, 277–78 Amazing Atheist, 223 Amazon, 210, 232–33, 253 Anderson, Erica, 350 Andreessen, Marc, 72 Android, 147, 177 animated videos, 241 Annoying Orange (YouTuber), 128, 140, 160 anonymous creators, 172–73 Anti-Defamation League (ADL), 281 antisemitism, 86, 275, 277, 281 Apple, 35, 56, 149, 176–77, 207–8 Arab Spring, 139, 140, 141–43, 145, 149, 164, 213 Argento, Dario, 383 Armstrong, Tim, 73 Arnspiger, Dianna, 334 artificial intelligence and neural networks content moderation with, 233–35, 292–93, 315, 399–400 and DeepMind acquisition by Google, 230–31 detection of red flags, 396 and DistBelief system, 232 and Google Brain, 231–35, 298 Google’s application of, 233 inability to precisely control or predict, 308 “precision and recall” protocols for, 309 and problematic content targeted at kids, 308–10 in recommendation engine of YouTube, 233–35 Reinforce program, 298 Whittaker’s criticisms of, 355 See also algorithms of YouTube; machine learning Ask a Ninja, 69 ASMR videos, 7, 208 AT&T, 210, 286 atheists/atheism, 221–22, 223, 226 audience of YouTube ages of viewers, 86, 169 and Arab Spring content, 145 and audience is king credo, 254, 297 average time on platform, 126 and billion-hours-of-viewing goal, 228, 270 and channels model, 127 communities built by, 122 complaints from users, 25 and COVID-19 pandemic, 376, 377–78 and cumulative hours of viewed footage, 154 daily viewers, 252, 254 emphasis on growth of, 91 and initiative to recruit female viewers, 369 and length of viewing sessions, 252 (see also watch time of audience) loyalty to YouTube, 394 number of videos watched daily, 49, 140 satisfaction ratings of, 296–97 See also engagement of users Auletta, Ken, 97 authoritative sources, 368, 388 Authors Guild, 48 auto-play function, 167 AwesomenessTV, 132, 210 B “Baby Shark,” 5, 306 bad actors, 308, 316, 329.

pages: 519 words: 102,669

Programming Collective Intelligence
by Toby Segaran
Published 17 Dec 2008

To find a set of links similar to one that you found particularly interesting, you can try: >>url=recommendations.getRecommendations(delusers,user)[0][1] >> recommendations.topMatches(recommendations.transformPrefs(delusers),url) [(0.312, u'http://www.fonttester.com/'), (0.312, u'http://www.cssremix.com/'), (0.266, u'http://www.logoorange.com/color/color-codes-chart.php'), (0.254, u'http://yotophoto.com/'), (0.254, u'http://www.wpdfd.com/editorial/basics/index.html')] That's it! You've successfully added a recommendation engine to del.icio.us. There's a lot more that could be done here. Since del.icio.us supports searching by tags, you can look for tags that are similar to each other. You can even search for people trying to manipulate the "popular" pages by posting the same links with multiple accounts. Item-Based Filtering The way the recommendation engine has been implemented so far requires the use of all the rankings from every user in order to create a dataset. This will probably work well for a few thousand people or items, but a very large site like Amazon has millions of customers and products—comparing a user with every other user and then comparing every product each user has rated can be very slow.
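A minimal sketch of the item-based alternative the passage introduces, independent of the book's recommendations module: precompute item-to-item similarities once, offline, then score a user's unseen items from the handful of items they have already rated.

```python
from itertools import combinations
from math import sqrt

ratings = {  # user -> {item: rating}; toy data
    "a": {"x": 5.0, "y": 3.0, "z": 1.0},
    "b": {"x": 4.0, "y": 3.5},
    "c": {"y": 2.0, "z": 4.5},
}

def cosine(pairs):
    """Cosine similarity over co-rated (rating_i, rating_j) pairs."""
    num = sum(ri * rj for ri, rj in pairs)
    di = sqrt(sum(ri * ri for ri, _ in pairs))
    dj = sqrt(sum(rj * rj for _, rj in pairs))
    return num / (di * dj) if di and dj else 0.0

def item_similarities(ratings):
    """The expensive step, done once offline rather than per request."""
    items = {i for r in ratings.values() for i in r}
    sims = {}
    for i, j in combinations(sorted(items), 2):
        pairs = [(r[i], r[j]) for r in ratings.values() if i in r and j in r]
        sims[(i, j)] = sims[(j, i)] = cosine(pairs)
    return sims

def recommend(user, ratings, sims):
    """Score unseen items by similarity-weighted ratings of the user's seen items."""
    seen = ratings[user]
    scores = {}
    for i in {i for r in ratings.values() for i in r} - set(seen):
        weighted = [(sims.get((i, j), 0.0), rj) for j, rj in seen.items()]
        total_sim = sum(s for s, _ in weighted)
        if total_sim:
            scores[i] = sum(s * rj for s, rj in weighted) / total_sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

sims = item_similarities(ratings)
print(recommend("b", ratings, sims))  # [('z', ...)]
```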

In late 2006 it announced a prize of $1 million to the first person to improve the accuracy of its recommendation system by 10 percent, along with progress prizes of $50,000 to the current leader each year for as long as the contest runs. Thousands of teams from all over the world entered and, as of April 2007, the leading team has managed to score an improvement of 7 percent. By using data about which movies each customer enjoyed, Netflix is able to recommend movies to other customers that they may never have even heard of and keep them coming back for more. Any way to improve its recommendation system is worth a lot of money to Netflix. The search engine Google was started in 1998, at a time when there were already several big search engines, and many assumed that a new player would never be able to take on the giants.
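For concreteness, the contest measured accuracy as root-mean-squared error (RMSE) on held-out star ratings, and "improvement" as the percentage reduction in RMSE relative to the Cinematch baseline. A toy calculation with invented numbers:

```python
from math import sqrt

def rmse(predicted, actual):
    """Root-mean-squared error between predicted and actual star ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def improvement(candidate_rmse, baseline_rmse):
    """Percentage reduction in RMSE relative to the baseline system."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse

# Illustrative numbers only (not the actual contest figures).
baseline = rmse([3.9, 2.1, 4.8, 1.2], [4, 2, 5, 1])
candidate = rmse([4.0, 2.0, 4.9, 1.1], [4, 2, 5, 1])
print(round(improvement(candidate, baseline), 1), "% better than baseline")
```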

In Chapter 4 you'll learn about search engines and the PageRank algorithm, an important part of Google's ranking system. Other examples include web sites with recommendation systems. Sites like Amazon and Netflix use information about the things people buy or rent to determine which people or items are similar to one another, and then make recommendations based on purchase history. Other sites like Pandora and Last.fm use your ratings of different bands and songs to create custom radio stations with music they think you will enjoy. Chapter 2 covers ways to build recommendation systems. Prediction markets are also a form of collective intelligence. One of the most well known of these is the Hollywood Stock Exchange (http://hsx.com), where people trade stocks on movies and movie stars.

The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy
by Matthew Hindman
Published 24 Sep 2018

Compared to straight collaborative filtering, the hybrid model produced 31 percent more clicks on news stories, though this was largely the result of shifting traffic from interior sections of the site to recommended stories on the front page. Even more importantly, over the course of the study users who saw the hybrid model had 14 percent more daily visits to the Google News site. This is a clear demonstration of just how much improved recommendation systems can boost daily traffic. Other computer science researchers have produced traffic bonuses with news recommendation engines. Hewlett-Packard researchers Evan Kirshenbaum, George Forman, and Michael Dugan conducted an experiment comparing different methods of content recommendation on Forbes.com. Here too, as at Google, the researchers found that a mixture of content-based and collaborative-filtering methods gave a significant improvement.40 Yahoo!
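A minimal sketch of the kind of hybrid these studies describe: blend a content-based score with a collaborative-filtering score and rank by the combination. The weights and scores below are placeholders, not what Google or HP actually used.

```python
def hybrid_rank(stories, content_score, collab_score, alpha=0.5):
    """Rank stories by a weighted blend of content-based and collaborative scores."""
    blended = {
        s: alpha * content_score[s] + (1 - alpha) * collab_score[s]
        for s in stories
    }
    return sorted(blended, key=blended.get, reverse=True)

# Placeholder scores in [0, 1] from two separate recommenders.
content = {"story_a": 0.9, "story_b": 0.4, "story_c": 0.6}
collab  = {"story_a": 0.2, "story_b": 0.8, "story_c": 0.7}

print(hybrid_rank(content, content, collab, alpha=0.6))
# ['story_c', 'story_a', 'story_b']: neither signal alone decides the order.
```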

Scholarship to date suggests six broad, interrelated lessons about which types of organizations are likely to win—and lose—in a world with ubiquitous algorithmic filtering. First, and most important, recommender systems can dramatically increase digital audience. Web traffic is properly thought of as a dynamic, even evolutionary process. Recommender systems make sites stickier, and users respond by clicking more and visiting more often. Over time sites and apps with recommender systems have grown in market share, while those without have shrunk. Second, recommender systems favor digital firms with lots of goods and content. There is only value in matching if the underlying catalogue of choices is large.

As this book goes to press, new journalism and communication scholarship has finally started to address this longstanding gap.9 Still, much work remains to be done. This chapter has two main aims. First, it offers a more detailed examination of the principles behind these recommendation systems than previous media scholarship. Recommender systems research has changed dramatically over the past decade, but little of this new knowledge has filtered into research on web traffic, online news, or the future of journalism. Much of the writing on recommender systems in these fields has been an unhelpful montage of hypotheticals and what-ifs. Elaborate deductive conclusions have been built from false foundational assumptions.

pages: 1,085 words: 219,144

Solr in Action
by Trey Grainger and Timothy Potter
Published 14 Sep 2014

Instead of thinking of Solr as a text search engine, it can be mentally freeing to think of Solr as a “matching engine that happens to be able to match on parsed text.” Whether the search is manual or automated is of no consequence to Solr. In fact, several organizations have successfully built recommender systems directly on top of Solr using this thinking. The following sections will cover how to build your own Solr-powered recommendation engine and ultimately how to merge the concepts of a user-driven search experience and an automated recommendation system to provide a powerful, personalized search experience. In particular, we will discuss several content-based recommendation approaches including attribute-based matching, hierarchical-classification-based matching, matching based upon extracted interesting terms (More Like This), concept-based matching, and geographical matching.
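A minimal sketch of how the "More Like This" style of content-based matching mentioned here might be queried from Python, assuming a local Solr core with the MoreLikeThis handler enabled; the URL, core name, field names, and parameter values are placeholders and depend on your schema and solrconfig.xml.

```python
import requests  # assumes the requests library is installed

SOLR_URL = "http://localhost:8983/solr/products/mlt"  # placeholder core and handler

def more_like_this(doc_id, fields="name,description,category", rows=5):
    """Ask Solr's MoreLikeThis handler for documents similar to one seed document.

    The handler path and field names are illustrative; they must match how
    the index and solrconfig.xml are actually set up.
    """
    params = {
        "q": f"id:{doc_id}",   # the seed document
        "mlt.fl": fields,      # fields to mine for "interesting terms"
        "mlt.mintf": 1,        # minimum term frequency in the seed doc
        "mlt.mindf": 1,        # minimum document frequency across the index
        "rows": rows,
        "wt": "json",
    }
    response = requests.get(SOLR_URL, params=params, timeout=10)
    response.raise_for_status()
    return response.json()["response"]["docs"]

# for doc in more_like_this("SKU-1234"):
#     print(doc.get("id"), doc.get("name"))
```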

This shifts the paradigm completely, because it requires software systems to be intelligent enough to recommend information to users as opposed to having them explicitly search for it. Although organizations such as Netflix and Amazon are well known for their recommender systems and have spent millions of dollars developing them, it’s both possible and easy to develop such systems yourself—particularly on top of Solr—to drastically improve the relevancy of your application. 16.5.1. Search vs. recommendations When one thinks of a search engine, the vision of a keyword box (and sometimes a separate location box) typically comes to mind. Likewise, when one thinks of a recommendation engine, the vision of a magical algorithm which automatically suggests information based upon past behavior and preferences likely comes to mind.

The beauty of collaborative filtering, regardless of the implementation, is that it’s able to work without any knowledge about the content of your documents. Therefore, you could build a recommendation engine based upon Solr with documents containing nothing more than document IDs and users, and you should still see quality recommendations as long as you have enough users linking your documents together. If you don’t put any text content, attributes, or classifications into Solr, then it means you will not be able to make use of those additional techniques at all. The next section will discuss why you may want to consider combining multiple techniques to achieve optimal relevancy in your recommendation system. 16.5.8. Hybrid approaches Throughout this chapter, you have seen multiple different recommendation approaches, each with its own strengths and weaknesses.

Designing Search: UX Strategies for Ecommerce Success
by Greg Nudelman and Pabini Gabriel-Petit
Published 8 May 2011

If fancy group formatting or Ajax carousels make customers disregard the more important More Like This buttons, such a page fails to meet its primary objective. Note—If you are still thinking about using a carousel for your More Like This groups, consider that Netflix has one of the best recommendation engines in the world and can usually select very relevant items to include among its 8 to 10 options. Amazon.com, which also has an exceptional recommendation engine, tried incorporating carousels for all its groups in the past, but has since dropped the feature. Amazon.com now uses the carousel feature sparingly, if at all, presumably, because the results underperformed the Spartan group design, which is optimized for quick scanning.

—Brynn Evans. References: Enterprise Social Search slides: www.slideshare.net/bmevans/designing-for-sociality-in-enterprise-search; Wired article (Wired, November 2010): www.wired.com/magazine/2010/11/st_flowchart_social/; “Do your friends make you smarter” paper: http://brynnevans.com/papers/Do-your-friends-make-you-smarter.pdf. Personalized Search and Recommender Systems: Machine learning lets search engines draw reliable inferences and deliver improved search results by leveraging customers’ data. In the ecommerce realm, personalized search lets an online vendor use a customer’s past purchasing history—and possibly other data like product ratings, search history, the customer’s user profile, and even social networking activity—to interpret search strings, predict what products might be of interest to that customer, and deliver more relevant search results. On ecommerce sites, recommender systems—which are sometimes called implicit collaborative filtering systems, a bit of a misnomer—often use the past purchasing history of other customers who are similar in some way to a particular customer to predict what products might be of interest to that customer.

. … Given a similar-items table, the algorithm finds items similar to each of the user’s purchases and ratings, aggregates those items, and then recommends the most popular or correlated items.” [1] Amazon employs its recommender system to great effect—delivering product recommendations that encourage customers to browse additional products and, thus, helping users to find similar products of interest. Recommendations are particularly effective on product pages, where Amazon uses them in cross-selling additional products to customers. Amazon also personalizes the content on its home page extensively by providing many different types of recommendations. The recommender system Amazon has innovated helps customers find what they need and, because its recommendations actually provide a valuable service to customers, increases customer loyalty—and ultimately enhances Amazon’s bottom line.
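A minimal sketch of the aggregation step quoted above, assuming the similar-items table has already been built offline. Items and similarity scores are invented.

```python
from collections import defaultdict

# Precomputed similar-items table: item -> [(similar_item, similarity), ...]
similar_items = {
    "dvd player": [("hdmi cable", 0.9), ("surge protector", 0.4)],
    "wireless mouse": [("mouse pad", 0.8), ("hdmi cable", 0.2)],
}

def recommend_from_history(purchases, similar_items, top_n=3):
    """For each purchased item, look up its similar items, aggregate the
    similarity scores, and return the most correlated unpurchased items."""
    scores = defaultdict(float)
    for item in purchases:
        for neighbor, sim in similar_items.get(item, []):
            if neighbor not in purchases:
                scores[neighbor] += sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend_from_history({"dvd player", "wireless mouse"}, similar_items))
# ['hdmi cable', 'mouse pad', 'surge protector']
```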

pages: 523 words: 61,179

Human + Machine: Reimagining Work in the Age of AI
by Paul R. Daugherty and H. James Wilson
Published 15 Jan 2018

Whereas, in the past, a salesperson might glean a sales opportunity based on physical or social cues over the phone or in person, 6sense is returning to salespeople some of the skills that more socially opaque online interactions, like the extensive use of email, had blunted.7 Your Buddy, the Brand Some of the biggest changes to the front office are happening through online tools and AI-enabled interfaces. Think how easily Amazon customers can purchase a vast array of consumer items, thanks to AI-enhanced product-recommendation engines and “Alexa” (the personal assistant bot), which is used via “Echo” (the smart, voice-enabled wireless speaker). AI systems similar to those designed for jobs like customer service are now beginning to play a much larger role in generating revenue, traditionally a front-office objective, and the ease of the purchasing experience has become a major factor for customers.

INDEX Accenture, 35, 129, 146, 184–185 acceptance of AI, 131 accountability, 129–130, 169–172 adaptive processes, 5–6, 108–109 mindset for reimagining, 155–160 skills for, 16 advertising, 86, 99 AdWords, 99 AeroFarms, 36 agency, 15, 172–174 agriculture, 34–37, 156–157 Aida, 55–56, 59, 139, 143, 145, 160 Airbus, 144 AI systems as black boxes, 106, 125, 169 collaboration with, 1–2, 25 in corporate functions, 45–66 current state of, 9–11 definition of, 3 deployment of, 208–209 embodied, 21–23 fairness and auditability of, 129 fear of, 126–127, 129, 166–167 framing problems in, 191–193 future of, 209–214 glossary on, 60–66 guardrails with, 168–169 history of, 40–44 human replaced by, 4–5 human roles in developing and deploying, 113–133 humans vs., 7, 19, 106, 207 integration of, 3 modifying outcomes of, 172–174 potential and impact of, 3–4 in production, supply chain, and distribution, 19–39 in R&D, 67–83 responses to, 131–132 scientific method and, 69–77 skills of, 20–21, 105–106 symbiotic partnerships with, 7–8 third wave of, 4–6 training, 100, 114–122 “winters” of, 25, 41 Akshaya Patra, 37 Alexa, 11, 56, 86, 92, 94, 118, 146 Capital One and, 204–205 empathy training for, 117–118 Alexander, Rob, 204–205 algorithm aversion, 167 algorithm forensics analysts, 124–125 Alice, 146 Allgood, Brandon, 81 Almax, 89, 90 Amazon Alexa, 11, 56, 86, 92, 94–95, 118, 146 Echo, 92, 94–95, 164–165 fulfillment at, 31, 150 Go, 160–165 Mechanical Turk, 169 recommendation engine, 92 Amelia, 55–56, 139, 164, 201, 202 amplification, 7, 107, 138–139, 141–143, 176–177 jobs with, 141–143 See also augmentation; missing middle anthropomorphism, brand, 93–94 Antigena, 58 anti-money-laundering (AML) detection, 45–46, 51 Apple, 11, 96–97, 118, 146 Apprenticeship Levy, 202 apprenticing, reciprocal, 12, 201–202 Arizona State University, 49 “Artificial Intelligence, Automation, and the Economy,” 211 Asimov, Isaac, 69, 128–129 assembly lines, 1–2, 4 flexible teams vs., 13–14 AT&T, 188 Audi, 158–160, 190 audio and signal processing, 64 Audi Robotic Telepresence (ART), 159–160 augmentation, 5, 7 customer-aware shops and, 87–90 embodiment and, 147–149 fostering positive experiences with, 166 generative design and, 135–137 of observation, 157–158 types of, 138–140 workforce implications of, 137–138 augmented reality, 143 Autodesk, 3, 136–137, 141 automakers, 116–117, 140 autonomous cars and, 67–68, 166–167, 189, 190 BMW, 1, 4, 10, 149–150 customization among, 147–149 Mercedes-Benz, 4, 10 process reimagination at, 158–160 automation, 5, 19 intelligent, 65 automation ethicists, 130–131 Ayasdi, 178 back-office operations, 10 banking digital lending, 86 fraud detection in, 42 money laundering and, 45–46, 51 virtual assistants in, 55–56 Beiersdorf, 176–177 Benetton, 89 Benioff, Marc, 196 Berg Health, 82 Bezos, Jeff, 161, 164 BHP Billiton Ltd., 28 biases, 121–122, 129–130, 174, 179 biometrics, 65 BlackRock, 122 blockchain, 37 Bloomberg Beta, 195 BMW, 1, 4, 10, 148, 209 Boeing, 28, 143 Boli.io, 196 bot-based empowerment, 12, 186, 195–196 boundaries, 168–169 BQ Zosi, 146 Braga, Leda, 167 brands, 87, 92–94 anthropomorphism of, 93–94 disintermediated, 94–95 personalization and, 96–97 as two-way relationships, 119 Brooks, Rodney, 22, 24 burnout, 187–188 Burns, Ed, 76 business models, 152 business processes.

Intelligent automation. Transfers some tasks from man to machine to fundamentally change the traditional ways of operating. Through machine-specific strengths and capabilities (speed, scale, and the ability to cut through complexity), these tools complement human work to expand what is possible. Recommendation systems. Make suggestions based on subtle patterns detected by AI algorithms over time. These can be targeted toward consumers to suggest new products or used internally to make strategic suggestions. Intelligent products. Have intelligence baked into their design so that they can evolve to continuously meet and anticipate customers’ needs and preferences.

pages: 321 words: 105,480

Filterworld: How Algorithms Flattened Culture
by Kyle Chayka
Published 15 Jan 2024

As Netflix describes it in its official Help Center, “Our systems have ranked titles in a way that is designed to present the best possible ordering of titles that you may enjoy.” Netflix pioneered the filtering of culture through recommendation engines. Before it debuted its streaming service in 2007, when it was still just a system of mail-order rental DVDs, Netflix had Cinematch, a module on its website that recommended movies for users, based on other users’ ratings (out of five stars), a form of social information filtering not far from Ringo, the early music recommendation system mentioned in the previous chapter. Cinematch launched in 2002. Over the years, the predictions proved to be accurate within half a star three-quarters of the time, and half of Netflix users who rented a movie that Cinematch recommended rated it five stars.
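For concreteness, a toy version of the accuracy figure cited here: the fraction of predicted ratings that land within half a star of what the user actually gave. The numbers are invented.

```python
def within_half_star(predicted, actual):
    """Share of predictions within 0.5 stars of the actual rating."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= 0.5)
    return hits / len(actual)

predicted = [4.6, 3.2, 2.9, 4.9]   # illustrative Cinematch-style predictions
actual    = [5,   4,   3,   5]     # what the user actually rated
print(within_half_star(predicted, actual))  # 0.75 (three of the four predictions)
```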

These early algorithms sorted individual emails, musicians (as opposed to specific songs), web pages, and commercial products. As digital platforms expanded, recommender systems moved into more complex areas of culture and operated at much faster speeds and higher volumes, sorting millions of tweets, films, user-uploaded videos, and even potential romantic partners. Filtering became the default online experience. This history is also a reminder that recommender systems are not omniscient entities but tools built by groups of tech researchers or workers. They are fallible products. Nick Seaver is a sociologist and a professor at Tufts University who studies recommender systems. His research focuses on the human side of algorithms, how the engineers who make them think about algorithmic recommendations.

Rather than targeting the data, newer EU laws take action against recommender systems more specifically. According to Leerssen, they are “command and control regulations, where the government is telling the industry what to do, rather than leaving it to a matter of user choice.” The Digital Services Act, which was approved in July 2022 and goes into effect in 2024, provides for some of the same kinds of transparency and communication around recommendations that GDPR does for data: Platforms “should clearly present the main parameters for such recommender systems in an easily comprehensible manner to ensure that the recipients understand how information is prioritized for them.”

pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future
by Kevin Kelly
Published 6 Jun 2016

First I’d like to be delivered more of what I know I like. This personal filter already exists. It’s called a recommendation engine. It is in wide use at Amazon, Netflix, Twitter, LinkedIn, Spotify, Beats, and Pandora, among other aggregators. Twitter uses a recommendation system to suggest who I should follow based on whom I already follow. Pandora uses a similar system to recommend what new music I’ll like based on what I already like. Over half of the connections made on LinkedIn arise from their follower recommender. Amazon’s recommendation engine is responsible for the well-known banner that “others who like this item also liked this next item.”

Amazon’s greatest asset is not its Prime delivery service but the millions of reader reviews it has accumulated over decades. Readers will pay for Amazon’s all-you-can-read ebook service, Kindle Unlimited, even though they will be able to find ebooks for free elsewhere, because Amazon’s reviews will guide them to books they want to read. Ditto for Netflix. Movie fans will pay Netflix because their recommendation engine finds gems they would not otherwise discover. They may be free somewhere else, but they are essentially lost and buried. In these examples, you are not paying for the copies, you are paying for the findability. • • • These eight qualities require a new skill set for creators. Success no longer derives from mastering distribution.

accelerometers, 221 accessing and accessibility, 109–33 and clouds, 125–31 and communications, 125 and decentralization, 118–21, 125, 129–31 and dematerialization, 110–14, 125 and emergence of the “holos,” 293–94 as generative quality, 70–71 ownership vs., 70–71 and platform synergy, 122–25 and real-time on demand, 114–17 and renting, 117–18 and right of modification, 124–25 accountability, 260–64 Adobe, 113, 206 advertising, 177–89 aggregated information, 140, 147 Airbnb, 109, 113, 124, 172 algorithms and targeted advertising, 179–82 Alibaba, 109 Amazon and accessibility vs. ownership, 109 and artificial intelligence, 33 cloud of, 128, 129 and on-demand model of access, 115 as ecosystem, 124 and filtering systems, 171–72 and recommendation engines, 169 and robot technology, 50 and tracking technology, 254 and user reviews, 21, 72–73 anime, 198 annotation systems, 202 anonymity, 263–64 anthropomorphization of technology, 259 Apache software, 69, 141, 143 API (application programming interface), 23 Apple, 1–2, 123, 124, 246 Apple Pay, 65 Apple Watch, 224 Arthur, Brian, 193, 209 artificial intelligence (AI), 29–60 ability to think differently, 42–43, 48, 51–52 as accelerant of change, 30 as alien intelligence, 48 in chess, 41–42 and cloud-based services, 127 and collaboration, 273 and commodity consumer attention, 179 and complex questions, 47 concerns regarding, 44 and consciousness, 42 corporate investment in, 32 costs of, 29, 52–53 data informing, 39 and defining humanity, 48–49 and digital storage capacity, 265, 266–67 and emergence of the “holos,” 291 as enhancement of human intelligence, 41–42 and filtering systems, 175 of Google, 36–37 impact of, 29 learning ability of, 32–33, 40 and lifelogging, 251 networked, 30 and network effect, 40 potential applications for, 34–36 questions arising from, 284 specialized applications of, 42 in tagging book content, 98 technological breakthroughs influencing, 38–40 ubiquity of, 30, 33 and video games, 230 and visual intelligence, 203 See also robots arts and artists artist/audience inversion, 81 and augmented reality, 232 and authenticity, 70 and creative remixing, 209 and crowdfunding, 156–61 and low-cost reproduction, 87 and patronage, 72 public art, 232 attention, 168–69, 176, 177–89 audience, 88, 148–49, 155, 156–57 audio recording, 249.

pages: 306 words: 82,909

A Hacker's Mind: How the Powerful Bend Society's Rules, and How to Bend Them Back
by Bruce Schneier
Published 7 Feb 2023

DEFENDING AGAINST AI HACKERS 236 recommendation engines: Zeynep Tufekci (10 Mar 2018), “YouTube, the great equalizer,” New York Times, https://www.nytimes.com/2018/03/10/opinion/sunday/youtube-politics-radical.html. Renee DiResta (11 Apr 2018), “Up next: A better recommendation system,” Wired, https://www.wired.com/story/creating-ethical-recommendation-engines. 237 can also benefit the defense: One example: Gregory Falco et al. (28 Aug 2018), “A master attack methodology for an AI-based automated attack planner for smart cities,” IEEE Access 6, https://ieeexplore.ieee.org/document/8449268. 59. A FUTURE OF AI HACKERS 242 novel and completely unexpected hacks: Hedge funds and investment firms are already using AI to inform investment decisions.

And tribalism is so powerful and divisive that hacking it—especially with digital speed and precision—can have disastrous social effects, whether that’s the goal of a computer-assisted social hacker (like the Russians) or a side effect of an AI that neither knows nor cares about the costs of its actions (like social media recommendation engines). 48 Defending against Cognitive Hacks The “pick-up artist” community is a movement of men who develop and share manipulative techniques to seduce women. It predates the popular Internet but thrives there today. A lot of their techniques resemble cognitive hacks. “Negging” is one of their techniques.

If your driverless car navigation system satisfies the goal of maintaining a high speed by spinning in circles, programmers will notice this behavior and modify the AI’s goal accordingly. We’ll never see this behavior on the road. The greatest concern lies in the less obvious hacks that we won’t even notice because their effects are subtle. Much has been written about recommendation engines—the first generation of subtle AI hacks—and how they push people towards extreme content. They weren’t programmed to do this; it’s a property that naturally emerged as the systems continually tried things, saw the results, then modified themselves to do more of what increased user engagement and less of what didn’t.

pages: 267 words: 72,552

Reinventing Capitalism in the Age of Big Data
by Viktor Mayer-Schönberger and Thomas Ramge
Published 27 Feb 2018

With data-richness, market participants may learn the preferences of others and pair them using matching algorithms, but how do market participants express their preferences and their relative weight and communicate them to each other? It’s a difficult challenge, and solving it is crucial. Nobody wants to transact on markets that require hours of time spent answering questionnaires. Fortunately, here, too, recent technical advances have gotten us much closer to viable solutions. Consider again Amazon’s product-recommendation engine: at first glance, it’s a matching system. It quite successfully matches our preferences with available products and makes recommendations about what we should order. But that is only half of the story. Amazon captures our preferences not from us directly but from the comprehensive data stream it gathers about our every interaction with its website—what products we look at, when and for how long we look at them, which reviews we read.

To put an end to such inefficiencies, firms such as American Express, AT&T, and IBM have phased in software platforms that go far beyond classified-ad-type announcements of open positions on the company’s intranet. They match detailed (albeit standardized) job descriptions with detailed (albeit standardized) talent profiles. Filters make individuals and position pools easy to search, both for employees seeking a new challenge and for managers looking for new talent. And recommendation engines facilitate matchmaking across multiple dimensions. These internal talent marketplaces offer a number of advantages. First, they decentralize matching, reducing information overload within HR departments. Searching and matching is done outside HR, by managers with positions to fill and employees interested in making a move.

Consider Amazon: because of its sheer scale, it can fulfill customer orders at low cost. Network effects make Amazon a thick market, with lots of buyers and sellers, and many customers who leave valuable product reviews for others. Each additional customer adds value to the community. Finally, Amazon uses adaptive systems and feedback data to hone its recommendation engine, as well as its intelligent personal assistant, Alexa. Apple’s iPhone is another case in point. Because it can mass produce the phone, Apple can keep profit margins high while still holding to a price point that’s acceptable to consumers. A growing number of iPhone users have led to a vibrant app market.

pages: 296 words: 78,631

Hello World: Being Human in the Age of Algorithms
by Hannah Fry
Published 17 Sep 2018

There are algorithms that can automatically classify and remove inappropriate content on YouTube, algorithms that will label your holiday photos for you, and algorithms that can scan your handwriting and classify each mark on the page as a letter of the alphabet.
Association: finding links
Association is all about finding and marking relationships between things. Dating algorithms such as OKCupid have association at their core, looking for connections between members and suggesting matches based on the findings. Amazon’s recommendation engine uses a similar idea, connecting your interests to those of past customers. It’s what led to the intriguing shopping suggestion that confronted Reddit user Kerbobotat after buying a baseball bat on Amazon: ‘Perhaps you’ll be interested in this balaclava?’11
Filtering: isolating what’s important
Algorithms often need to remove some information to focus on what’s important, to separate the signal from the noise.

But an algorithm needs something to go on. So, once you take away popularity and inherent quality, you’re left with the only thing that can be quantified: a metric for similarity to whatever has gone before. There’s still a great deal that can be done using measures of similarity. When it comes to building a recommendation engine, like the ones found in Netflix and Spotify, similarity is arguably the ideal measure. Both companies have a way to help users discover new films and songs, and, as subscription services, both have an incentive to accurately predict what users will enjoy. They can’t base their algorithms on what’s popular, or users would just get bombarded with suggestions for Justin Bieber and Peppa Pig The Movie.
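
A minimal Python sketch of similarity-driven recommendation, using made-up ratings and plain cosine similarity, shows the basic mechanics; it is a toy, not Netflix's or Spotify's actual method.

    import math

    # Toy similarity-driven recommender: weight what other users liked by how
    # similar their taste vector is to yours, then suggest the unseen item
    # with the highest score. Ratings are invented.
    ratings = {
        "you":   {"Arrival": 5, "Interstellar": 4, "Up": 2},
        "alice": {"Arrival": 5, "Interstellar": 5, "Blade Runner": 4},
        "bob":   {"Up": 5, "Frozen": 5},
    }

    def cosine(a, b):
        common = set(a) & set(b)
        if not common:
            return 0.0
        dot = sum(a[i] * b[i] for i in common)
        return dot / (math.sqrt(sum(x * x for x in a.values())) *
                      math.sqrt(sum(x * x for x in b.values())))

    def recommend(user, k=1):
        scores = {}
        for name, theirs in ratings.items():
            if name == user:
                continue
            sim = cosine(ratings[user], theirs)
            for item, rating in theirs.items():
                if item not in ratings[user]:
                    scores[item] = scores.get(item, 0.0) + sim * rating
        return sorted(scores, key=scores.get, reverse=True)[:k]

    print(recommend("you"))  # ['Blade Runner']: your taste is closer to alice's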

Every now and then they will come up with something that you absolutely love, but it’s a bit like cold reading in that sense. You only need a strike every now and then to feel the serendipity of discovering new music. The engines don’t need to be right all the time. Similarity works perfectly well for recommendation engines. But when you ask algorithms to create art without a pure measure for quality, that’s where things start to get interesting. Can an algorithm be creative if its only sense of art is what happened in the past?
Good artists borrow; great artists steal – Pablo Picasso
In October 1997, an audience arrived at the University of Oregon to be treated to a rather unusual concert.

pages: 202 words: 62,901

The People's Republic of Walmart: How the World's Biggest Corporations Are Laying the Foundation for Socialism
by Leigh Phillips and Michal Rozworski
Published 5 Mar 2019

Two of the best examples of this are the “chaotic storage” system Amazon uses in its warehouses and the recommendations system buzzing in the background of its website, telling you which books or garden implements you might be interested in. Amazon’s recommendations system is the backbone of the company’s rapid success. This system drives those usually helpful (although sometimes comical—“Frequently bought together: baseball bat + black balaclava”) items that pop up in the “Customers who bought this also bought …” section of the website. Recommendations systems solve some of the information problems that have historically been associated with planning.

A universe of the most disparate ratings and reviews—always partial and often contradictory—can, if parsed right, provide very useful and lucrative information. Amazon also uses a system it calls “item-to-item collaborative filtering.” The company made a breakthrough when it devised its recommendations algorithm by managing to avoid common pitfalls plaguing other early recommendation engines. Amazon’s system doesn’t look for similarities between people; not only do such systems slow down significantly once millions are profiled, but they report significant overlaps among people whose tastes are actually very different (e.g., hipsters and boomers who buy the same bestsellers).

The two things may not be very obviously related, but it is enough that some people buy or browse them together. Combining millions of such interactions between people and things, Amazon’s algorithm creates a virtual map of its catalog that adapts very well to new information, even saving precious computing power when compared to the alternatives—clunkier recommendations systems that try to match similar users or find abstract similarities. Here is how the researchers at IBM’s labs describe Amazon’s recommendations: “When it takes other users’ behavior into account, collaborative filtering uses group knowledge to form a recommendation based on like users.” Filtering is an example of an IT-based rejoinder to one of the criticisms Hayek leveled against his socialist adversaries in the 1930s calculation debate: that only markets can aggregate and put to use the information dispersed throughout society.
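
A much-simplified stand-in for item-to-item collaborative filtering can be written as a co-occurrence count over baskets; the Python sketch below uses invented purchase histories and is not Amazon's patented algorithm.

    from collections import defaultdict
    from itertools import combinations

    # Simplified item-to-item sketch: count how often pairs of items appear in
    # the same purchase history, then surface the strongest co-occurring items.
    baskets = [  # invented purchase histories
        {"baseball bat", "balaclava"},
        {"baseball bat", "glove"},
        {"baseball bat", "balaclava", "duffel bag"},
        {"glove", "cap"},
    ]

    co_counts = defaultdict(int)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            co_counts[(a, b)] += 1
            co_counts[(b, a)] += 1

    def also_bought(item, k=2):
        related = {b: n for (a, b), n in co_counts.items() if a == item}
        return sorted(related, key=related.get, reverse=True)[:k]

    print(also_bought("baseball bat"))  # ['balaclava', 'glove'] on this toy data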

pages: 302 words: 73,581

Platform Scale: How an Emerging Business Model Helps Startups Build Large Empires With Minimum Investment
by Sangeet Paul Choudary
Published 14 Sep 2015

Many Web 1.0 era filters were created based on long sign-up forms that the user filled out. Today, filters are created based on data captured on an ongoing basis through a user’s actions. Filters may be standalone or collaborative. Amazon’s “People who purchased this product also purchased this product” feature is based on a collaborative filter. Many recommendation platforms allow users to filter results based on a “people like you” parameter. This, again, is a collaborative filter. The most important innovation in recent times that has led to the spread of collaborative filters is the implementation of Facebook’s social graph. Through the social graph, third-party platforms like TripAdvisor serve reviews based on a collaborative filter of people who are close to you on the graph.

pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World
by Peter H. Diamandis and Steven Kotler
Published 3 Feb 2015

Well, in the case of Netflix, a better movie recommendation engine. A movie recommendation engine is a bit of software that tells you what movie you might want to watch next based on movies you’ve already watched and rated (on a scale of one to five stars). Netflix’s original recommendation engine, Cinematch, was created back in 2000 and quickly proved to be a wild success. Within a few years, nearly two-thirds of their rental business was being driven by their recommendation engine. Thus the obvious corollary: the better their recommendation engine, the better their business. And that was the problem.

In December 2006, a competitor called ‘simonfunk’ posted a complete description of his algorithm—which at the time was tied for third place—giving everyone else the opportunity to piggyback on his progress. ‘We had no idea the extent to which people would collaborate with each other,’ says Jim Bennett, vice president for recommendation systems at Netflix.”16 And this isn’t an aberration. Over the course of the eight XPRIZEs launched to date, there has been an extraordinary amount of cooperation. We’ve seen teams providing unsolicited advice, teams merging, teams acquiring and sharing technology and experts. When the prize is driven by an MTP, while a team’s primary purpose is to win, a close second is their desire to see the primary objective achieved; thus teams exhibit a much higher willingness to share.
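
In the spirit of the gradient-descent matrix factorization that simonfunk's write-up popularized, here is a toy Python sketch trained on invented ratings; it omits biases and regularization tuning and is not his exact algorithm.

    import random

    # Toy matrix-factorization rating predictor, trained by stochastic
    # gradient descent on a handful of invented (user, movie, rating) triples.
    ratings = [("u1", "Heat", 5), ("u1", "Up", 2),
               ("u2", "Heat", 4), ("u2", "Up", 1), ("u2", "Alien", 5)]
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    K, LR, REG = 2, 0.01, 0.02  # latent factors, learning rate, regularization

    random.seed(0)
    P = {u: [random.uniform(-0.1, 0.1) for _ in range(K)] for u in users}  # user factors
    Q = {i: [random.uniform(-0.1, 0.1) for _ in range(K)] for i in items}  # item factors

    def predict(u, i):
        return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

    for _ in range(2000):
        for u, i, r in ratings:
            err = r - predict(u, i)
            for k in range(K):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += LR * (err * qi - REG * pu)
                Q[i][k] += LR * (err * pu - REG * qi)

    # Predict a rating u1 never gave; it should come out on the high side,
    # since u1 and u2 agree closely on the movies they both rated.
    print(round(predict("u1", "Alien"), 2))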

pages: 353 words: 104,146

European Founders at Work
by Pedro Gairifo Santos
Published 7 Nov 2011

Jones: Nothing substantial, really. I think sometimes rights holders, especially in the music industry, will use court action or the threat of court action as a sort of negotiating position. But, no. I think we managed to avoid anything serious in that regard. Santos: From the technical point of view, the actual recommendation engine and statistics, how does that actually work? How hard was it to develop it and tweak it? Did you change the approach many times? Did you have a clear idea on how to do it from the start? Jones: So initially when I was building it, we tried all sorts of stuff. I think what I was using for a long time in the beginning was just to use Lucene, a document indexing system.

At one point we published a data dump of all of these scrobbling histories and some of our users at the time contributed various recommender strategies and said, “Hey, try this. I had quite good results with it.” So for a while, we were piecing together ideas from the community. All this time, we were mainly concerned with keeping the site afloat, keeping it fast, scaling up properly, and this sort of scrobbling data and radio. The recommendation engine wasn't brilliant to begin with. And then, we finally decided we needed to hire somebody who knows what they're doing, who's going to work on this full-time. We e-mailed some mailing lists. We e-mailed the ISMIR2 mailing list. They're a group who meet every year about music recommendations and information retrieval in music.

I told some of my friends so I could get some data and they told their friends and they told their friends and it spread. It turns out people quite liked just having those stats on what they listened to. They weren't even interested in recommendations at that point. I didn't really have a good recommender system for a long time. From your listening stats, you could click on an artist, and see who else had been listening to them. You could then see the listening stats of the other fans of artists you like. Just that system of connecting all the listening tastes proved to be really quite addictive.
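
The "connecting all the listening tastes" idea can be sketched by relating artists through the overlap of their listeners; the scrobble data below is invented and this is not Last.fm's production recommender.

    # Toy artist-similarity sketch: two artists are related when many of the
    # same people listen to both. Scrobble data is invented.
    scrobbles = {  # user -> set of artists they listen to
        "u1": {"Radiohead", "Portishead", "Massive Attack"},
        "u2": {"Radiohead", "Portishead"},
        "u3": {"Radiohead", "Metallica"},
        "u4": {"Metallica", "Slayer"},
    }

    def listeners(artist):
        return {u for u, artists in scrobbles.items() if artist in artists}

    def jaccard(a, b):
        la, lb = listeners(a), listeners(b)
        return len(la & lb) / len(la | lb) if la | lb else 0.0

    def similar_artists(artist, k=2):
        others = {a for arts in scrobbles.values() for a in arts} - {artist}
        return sorted(others, key=lambda a: jaccard(artist, a), reverse=True)[:k]

    print(similar_artists("Radiohead"))
    # ['Portishead', 'Massive Attack']: shared listeners drive the ranking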

pages: 404 words: 95,163

Amazon: How the World’s Most Relentless Retailer Will Continue to Revolutionize Commerce
by Natalie Berg and Miya Knights
Published 28 Jan 2019

The value of recommendation
Having identified AI as the culmination of the main drivers shaping technology innovation today (stemming from a need for more autonomous computer systems particularly) – and before diving straight into voice technology as its current apotheosis – it is necessary to undertake an examination of how Amazon capitalized on the development of AI systems across its business and not just in its customers’ homes, as we have already done with the drivers of ubiquitous connectivity and pervasive interfaces. This examination adds to our understanding of how it has achieved its aim of removing friction from the average shopping journey and, in so doing, created a virtuous cycle that, in turn, generates even more sales and growth. In fact, it is AI that underpins the power of its search and recommendation engines. Back in the 1990s, Amazon was one of the first e-commerce players to rely heavily on product recommendations, which also helped it to cross-sell new categories as it moved beyond books. It is a category of technology development that Bezos has described as ‘the practical application of machine learning’.

The decision to open source DSSTNE also demonstrates when Amazon recognizes the need to collaborate over making gains with the vast potential of AI. On the Amazon site, these recommendations can be personalized, based on categories and ranges previously searched or browsed, to increase conversion. Equally, Amazon’s recommendation engine can display products similar to those searched for or browsed in the hopes of converting customers to rival brands or products. There are also recommendations based on anything ‘related to the items you’ve viewed’. Or they can depend on items that are ‘frequently bought together’ or by ‘customers who bought this item also bought…’ with the aim of boosting average order value.
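
A toy content-based sketch of "related to the items you've viewed" ranks candidates by attribute overlap with the browsing history; the catalog fields and weights are made up, and this is not Amazon's recommendation engine.

    # Toy content-based relatedness: score candidates by how much they share
    # with recently viewed items. Catalog, fields, and weights are invented.
    catalog = {
        "kettle-a": {"category": "kitchen", "brand": "Acme", "tags": {"electric", "steel"}},
        "kettle-b": {"category": "kitchen", "brand": "Brew", "tags": {"electric", "glass"}},
        "lamp-a":   {"category": "lighting", "brand": "Acme", "tags": {"led"}},
    }
    viewed = ["kettle-a"]

    def relatedness(candidate, history):
        score = 0.0
        for seen in history:
            a, b = catalog[seen], catalog[candidate]
            score += 1.0 * (a["category"] == b["category"])
            score += 0.5 * (a["brand"] == b["brand"])
            score += 0.25 * len(a["tags"] & b["tags"])
        return score

    candidates = [i for i in catalog if i not in viewed]
    print(sorted(candidates, key=lambda i: relatedness(i, viewed), reverse=True))
    # ['kettle-b', 'lamp-a'] under these made-up weights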

Return customers to the Group’s Tmall and Taobao platforms are presented with product recommendations based not just on their past transactions, but also on browsing history, product feedback, bookmarks, geographic location and other online activity-related data. During the 2016 ‘Singles’ Day’ shopping festival, Alibaba said it used its AI recommendations engine to generate 6.7 billion personalized shopping pages based on merchants’ target customer data. Alibaba said that this large-scale personalization resulted in a 20 per cent improvement in conversion rate from the 11 November event.4 Recommendations and personalization aside, Amazon’s reliance on AI systems to orchestrate its vast business operations as well as its customer-facing ones is diverse.

pages: 377 words: 97,144

Singularity Rising: Surviving and Thriving in a Smarter, Richer, and More Dangerous World
by James D. Miller
Published 14 Jun 2012

This recommendation system bases its decisions on statistical analysis of the videos that viewers with tastes similar to yours have chosen and rated positively.75 Let me now offer you thirteen reasons why video recommendation is an excellent medium in which to develop AI:
1. Massive Profits—The growing proliferation of Internet videos means that a high-quality AI recommender would be worth billions to its owner.
2. Implicitly Knows a Lot About Us—Although we humans often understand why we like a video and can accurately guess what other types of people would like it, we frequently can’t reduce our reasoning to words, in part because mere language generally isn’t rich enough to capture our video experiences. A big part of our brain is devoted to processing visual inputs. Hence, a good recommendation system would necessarily have powerful insights into a significant chunk of our brains.
3. Measurable Incremental Progress—Think of AI as a destination a thousand miles away with the entire pathway hidden by fog. To reach our destination, we need to take many small steps, and for each step we need a way to determine if we have gone in the right direction. A video recommendation system provides this corrective by gathering continuous feedback on how many users liked the recommended videos.
4. Profitable with Every Step—Businesses are more motivated to invest in a type of innovation if they can continually increase revenue with each small improvement.

Fortunately, with video recommendations, many challenges, such as finding what type of cat video a certain set of users might enjoy, can be worked on independently for reasonably long periods of time.
6. Free Labor from Customers—A recommendation system would rely on millions of people to freely help train the system by picking which videos to watch, rating some of the videos they see, writing reviews of videos, and labeling in words the content they upload.
7. Help from Advertisers and Political Consultants—Salesmen would eagerly seek to learn what types of messages appealed to different factions of the population. The recommendation system could piggyback on these salesmen’s attempts to understand their clientele and use their insights to improve recommendation software.
8. AI and Human Recommenders Could Productively Work Together—Unlike what YouTube currently does, an effective AI recommendation system could make use of human evaluators.

For example, if 90 percent of people who had some unusual allele or brain microstructure enjoyed a certain cat video, then the AI recommender would suggest the video to all other viewers who had that trait.
12. Amenable to Crowdsourcing—Netflix, the rent-by-mail and streaming video distributor, offered (and eventually paid) a $1 million prize to whichever group improved its recommendation system the most, so long as at least one group improved the system by at least 10 percent. This “crowdsourcing,” which occurs when a problem is thrown open to anyone, helps a company by allowing them to draw on the talents of strangers, while only paying the strangers if they help the firm. This kind of crowdsourcing works only if, as with a video recommendation system, there is an easy and objective way of measuring progress toward the crowdsourced goal.
13. Potential Improvement All the Way Up to Superhuman Artificial General Intelligence—A recommendation AI could slowly morph into a content creator.
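
The "measurable incremental progress" and crowdsourcing points in the list above both hinge on an objective yardstick; one minimal Python sketch of such a yardstick, using fabricated recommendation and feedback logs, is a hit rate over recommended videos.

    # Toy offline yardstick: how often did recommended videos turn out to be
    # liked? Logs are fabricated for illustration.
    recommended = {"u1": ["cat-1", "cat-2", "dog-9"], "u2": ["dog-9", "cat-7"]}
    liked       = {"u1": {"cat-2"},                   "u2": {"dog-9", "cat-7"}}

    def hit_rate_at_k(recommended, liked, k=3):
        hits = total = 0
        for user, recs in recommended.items():
            top_k = recs[:k]
            hits += sum(1 for v in top_k if v in liked.get(user, set()))
            total += len(top_k)
        return hits / total if total else 0.0

    print(round(hit_rate_at_k(recommended, liked), 2))  # 0.6 on this toy data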

pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think
by Viktor Mayer-Schonberger and Kenneth Cukier
Published 5 Mar 2013

For example, in Amazon’s early days it signed a deal with AOL to run the technology behind AOL’s e-commerce site. To most people, it looked like an ordinary outsourcing deal. But what really interested Amazon, explains Andreas Weigend, Amazon’s former chief scientist, was getting hold of data on what AOL users were looking at and buying, which would improve the performance of its recommendation engine. Poor AOL never realized this. It only saw the data’s value in terms of its primary purpose—sales. Clever Amazon knew it could reap benefits by putting the data to a secondary use. Or take the case of Google’s entry into speech recognition with GOOG-411 for local search listings, which ran from 2007 to 2010.

Purchase one about babies and you’d be inundated with more of the same. “They tended to offer you tiny variations on your previous purchase, ad infinitum,” recalled James Marcus, an Amazon book reviewer from 1996 to 2001, in his memoir, Amazonia. “It felt as if you had gone shopping with the village idiot.” Greg Linden saw a solution. He realized that the recommendation system didn’t actually need to compare people with other people, a task that was technically cumbersome. All it needed to do was find associations among products themselves. In 1998 Linden and his colleagues applied for a patent on “item-to-item” collaborative filtering, as the technique is known.

Salespeople in all sectors have long been told that they need to understand what makes customers tick, to grasp the reasons behind their decisions. Professional skills and years of experience have been highly valued. Big data shows that there is another, in some ways more pragmatic approach. Amazon’s innovative recommendation systems teased out valuable correlations without knowing the underlying causes. Knowing what, not why, is good enough. Predictions and predilections Correlations are useful in a small-data world, but in the context of big data they really shine. Through them we can glean insights more easily, faster, and more clearly than before.

The Ethical Algorithm: The Science of Socially Aware Algorithm Design
by Michael Kearns and Aaron Roth
Published 3 Oct 2019

The engine can then recommend to a user the movies that it predicts she will rate the highest. Netflix had a basic recommendation system based on collaborative filtering, but the company wanted a better one. The Netflix Prize competition offered $1 million for improving the accuracy of Netflix’s existing system by 10 percent. A 10 percent improvement is hard, so Netflix expected a multiyear competition. An improvement of 1 percent over the previous year’s state of the art qualified a competitor for an annual $50,000 progress prize, which would go to the best recommendation system submitted that year. Of course, to build a recommendation system, you need data, so Netflix publicly released a lot of it—a dataset consisting of more than a hundred million movie rating records, corresponding to the ratings that roughly half a million users gave to a total of nearly eighteen thousand movies.
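
The contest's published yardstick was root-mean-squared error (RMSE) on held-out ratings, so "10 percent better" meant a 10 percent lower RMSE than the incumbent system; the short Python sketch below works through that arithmetic on fabricated numbers.

    import math

    # Worked example of the contest arithmetic on invented ratings: compare a
    # baseline's RMSE with a challenger's and express the relative improvement.
    def rmse(predicted, actual):
        return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

    actual     = [5, 3, 4, 1, 2]
    baseline   = [4.0, 3.5, 3.5, 2.0, 2.5]  # stand-in for the incumbent system
    challenger = [4.6, 3.2, 3.8, 1.4, 2.2]  # stand-in for a competing model

    b, c = rmse(baseline, actual), rmse(challenger, actual)
    print(round(b, 3), round(c, 3), f"{(b - c) / b:.1%} improvement")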

But now that we know this, can the problem of privacy be solved by simply concealing information about birthdate, sex, and zip code in future data releases? It turns out that lots of less obvious things can also identify you—like the movies you watch. In 2006, Netflix launched the Netflix Prize competition, a public data science competition to find the best “collaborative filtering” algorithm to power Netflix’s movie recommendation engine. A key feature of Netflix’s service is its ability to recommend to users movies that they might like, given how they have rated past movies. (This was especially important when Netflix was primarily a mail-order DVD rental service, rather than a streaming service—it was harder to quickly browse or sample movies.)

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
Published 17 Apr 2017

The discussion in Chapter 2 was focused around OLTP-style use: quickly executing queries to find a small number of vertices matching certain criteria. It is also interesting to look at graphs in a batch processing context, where the goal is to perform some kind of offline processing or analysis on an entire graph. This need often arises in machine learning applications such as recommendation engines, or in ranking systems. For example, one of the most famous graph analysis algorithms is PageRank [69], which tries to estimate the popularity of a web page based on what other web pages link to it. It is used as part of the formula that determines the order in which web search engines present their results.
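
A toy implementation of the basic PageRank iteration, on an invented four-page link graph, shows the kind of whole-graph batch computation the passage refers to; it ignores complications such as dangling pages that real systems handle.

    # Basic iterative PageRank on a tiny invented link graph.
    links = {  # page -> pages it links to
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }

    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            incoming = {p: 0.0 for p in pages}
            for page, outgoing in links.items():
                share = rank[page] / len(outgoing)
                for target in outgoing:
                    incoming[target] += share
            rank = {p: (1 - damping) / len(pages) + damping * incoming[p]
                    for p in pages}
        return rank

    for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))  # "c" comes out on top: most pages link to it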

The opposite of bounded. 558 | Glossary Index A aborts (transactions), 222, 224 in two-phase commit, 356 performance of optimistic concurrency con‐ trol, 266 retrying aborted transactions, 231 abstraction, 21, 27, 222, 266, 321 access path (in network model), 37, 60 accidental complexity, removing, 21 accountability, 535 ACID properties (transactions), 90, 223 atomicity, 223, 228 consistency, 224, 529 durability, 226 isolation, 225, 228 acknowledgements (messaging), 445 active/active replication (see multi-leader repli‐ cation) active/passive replication (see leader-based rep‐ lication) ActiveMQ (messaging), 137, 444 distributed transaction support, 361 ActiveRecord (object-relational mapper), 30, 232 actor model, 138 (see also message-passing) comparison to Pregel model, 425 comparison to stream processing, 468 Advanced Message Queuing Protocol (see AMQP) aerospace systems, 6, 10, 305, 372 aggregation data cubes and materialized views, 101 in batch processes, 406 in stream processes, 466 aggregation pipeline query language, 48 Agile, 22 minimizing irreversibility, 414, 497 moving faster with confidence, 532 Unix philosophy, 394 agreement, 365 (see also consensus) Airflow (workflow scheduler), 402 Ajax, 131 Akka (actor framework), 139 algorithms algorithm correctness, 308 B-trees, 79-83 for distributed systems, 306 hash indexes, 72-75 mergesort, 76, 402, 405 red-black trees, 78 SSTables and LSM-trees, 76-79 all-to-all replication topologies, 175 AllegroGraph (database), 50 ALTER TABLE statement (SQL), 40, 111 Amazon Dynamo (database), 177 Amazon Web Services (AWS), 8 Kinesis Streams (messaging), 448 network reliability, 279 postmortems, 9 RedShift (database), 93 S3 (object storage), 398 checking data integrity, 530 amplification of bias, 534 of failures, 364, 495 Index | 559 of tail latency, 16, 207 write amplification, 84 AMQP (Advanced Message Queuing Protocol), 444 (see also messaging systems) comparison to log-based messaging, 448, 451 message ordering, 446 analytics, 90 comparison to transaction processing, 91 data warehousing (see data warehousing) parallel query execution in MPP databases, 415 predictive (see predictive analytics) relation to batch processing, 411 schemas for, 93-95 snapshot isolation for queries, 238 stream analytics, 466 using MapReduce, analysis of user activity events (example), 404 anti-caching (in-memory databases), 89 anti-entropy, 178 Apache ActiveMQ (see ActiveMQ) Apache Avro (see Avro) Apache Beam (see Beam) Apache BookKeeper (see BookKeeper) Apache Cassandra (see Cassandra) Apache CouchDB (see CouchDB) Apache Curator (see Curator) Apache Drill (see Drill) Apache Flink (see Flink) Apache Giraph (see Giraph) Apache Hadoop (see Hadoop) Apache HAWQ (see HAWQ) Apache HBase (see HBase) Apache Helix (see Helix) Apache Hive (see Hive) Apache Impala (see Impala) Apache Jena (see Jena) Apache Kafka (see Kafka) Apache Lucene (see Lucene) Apache MADlib (see MADlib) Apache Mahout (see Mahout) Apache Oozie (see Oozie) Apache Parquet (see Parquet) Apache Qpid (see Qpid) Apache Samza (see Samza) Apache Solr (see Solr) Apache Spark (see Spark) 560 | Index Apache Storm (see Storm) Apache Tajo (see Tajo) Apache Tez (see Tez) Apache Thrift (see Thrift) Apache ZooKeeper (see ZooKeeper) Apama (stream analytics), 466 append-only B-trees, 82, 242 append-only files (see logs) Application Programming Interfaces (APIs), 5, 27 for batch processing, 403 for change streams, 456 for distributed transactions, 361 for graph processing, 425 for services, 131-136 (see also services) 
evolvability, 136 RESTful, 133 SOAP, 133 application state (see state) approximate search (see similarity search) archival storage, data from databases, 131 arcs (see edges) arithmetic mean, 14 ASCII text, 119, 395 ASN.1 (schema language), 127 asynchronous networks, 278, 553 comparison to synchronous networks, 284 formal model, 307 asynchronous replication, 154, 553 conflict detection, 172 data loss on failover, 157 reads from asynchronous follower, 162 Asynchronous Transfer Mode (ATM), 285 atomic broadcast (see total order broadcast) atomic clocks (caesium clocks), 294, 295 (see also clocks) atomicity (concurrency), 553 atomic increment-and-get, 351 compare-and-set, 245, 327 (see also compare-and-set operations) replicated operations, 246 write operations, 243 atomicity (transactions), 223, 228, 553 atomic commit, 353 avoiding, 523, 528 blocking and nonblocking, 359 in stream processing, 360, 477 maintaining derived data, 453 for multi-object transactions, 229 for single-object writes, 230 auditability, 528-533 designing for, 531 self-auditing systems, 530 through immutability, 460 tools for auditable data systems, 532 availability, 8 (see also fault tolerance) in CAP theorem, 337 in service level agreements (SLAs), 15 Avro (data format), 122-127 code generation, 127 dynamically generated schemas, 126 object container files, 125, 131, 414 reader determining writer’s schema, 125 schema evolution, 123 use in Hadoop, 414 awk (Unix tool), 391 AWS (see Amazon Web Services) Azure (see Microsoft) B B-trees (indexes), 79-83 append-only/copy-on-write variants, 82, 242 branching factor, 81 comparison to LSM-trees, 83-85 crash recovery, 82 growing by splitting a page, 81 optimizations, 82 similarity to dynamic partitioning, 212 backpressure, 441, 553 in TCP, 282 backups database snapshot for replication, 156 integrity of, 530 snapshot isolation for, 238 use for ETL processes, 405 backward compatibility, 112 BASE, contrast to ACID, 223 bash shell (Unix), 70, 395, 503 batch processing, 28, 389-431, 553 combining with stream processing lambda architecture, 497 unifying technologies, 498 comparison to MPP databases, 414-418 comparison to stream processing, 464 comparison to Unix, 413-414 dataflow engines, 421-423 fault tolerance, 406, 414, 422, 442 for data integration, 494-498 graphs and iterative processing, 424-426 high-level APIs and languages, 403, 426-429 log-based messaging and, 451 maintaining derived state, 495 MapReduce and distributed filesystems, 397-413 (see also MapReduce) measuring performance, 13, 390 outputs, 411-413 key-value stores, 412 search indexes, 411 using Unix tools (example), 391-394 Bayou (database), 522 Beam (dataflow library), 498 bias, 534 big ball of mud, 20 Bigtable data model, 41, 99 binary data encodings, 115-128 Avro, 122-127 MessagePack, 116-117 Thrift and Protocol Buffers, 117-121 binary encoding based on schemas, 127 by network drivers, 128 binary strings, lack of support in JSON and XML, 114 BinaryProtocol encoding (Thrift), 118 Bitcask (storage engine), 72 crash recovery, 74 Bitcoin (cryptocurrency), 532 Byzantine fault tolerance, 305 concurrency bugs in exchanges, 233 bitmap indexes, 97 blockchains, 532 Byzantine fault tolerance, 305 blocking atomic commit, 359 Bloom (programming language), 504 Bloom filter (algorithm), 79, 466 BookKeeper (replicated log), 372 Bottled Water (change data capture), 455 bounded datasets, 430, 439, 553 (see also batch processing) bounded delays, 553 in networks, 285 process pauses, 298 broadcast hash joins, 409 Index | 561 
brokerless messaging, 442 Brubeck (metrics aggregator), 442 BTM (transaction coordinator), 356 bulk synchronous parallel (BSP) model, 425 bursty network traffic patterns, 285 business data processing, 28, 90, 390 byte sequence, encoding data in, 112 Byzantine faults, 304-306, 307, 553 Byzantine fault-tolerant systems, 305, 532 Byzantine Generals Problem, 304 consensus algorithms and, 366 C caches, 89, 553 and materialized views, 101 as derived data, 386, 499-504 database as cache of transaction log, 460 in CPUs, 99, 338, 428 invalidation and maintenance, 452, 467 linearizability, 324 CAP theorem, 336-338, 554 Cascading (batch processing), 419, 427 hash joins, 409 workflows, 403 cascading failures, 9, 214, 281 Cascalog (batch processing), 60 Cassandra (database) column-family data model, 41, 99 compaction strategy, 79 compound primary key, 204 gossip protocol, 216 hash partitioning, 203-205 last-write-wins conflict resolution, 186, 292 leaderless replication, 177 linearizability, lack of, 335 log-structured storage, 78 multi-datacenter support, 184 partitioning scheme, 213 secondary indexes, 207 sloppy quorums, 184 cat (Unix tool), 391 causal context, 191 (see also causal dependencies) causal dependencies, 186-191 capturing, 191, 342, 494, 514 by total ordering, 493 causal ordering, 339 in transactions, 262 sending message to friends (example), 494 562 | Index causality, 554 causal ordering, 339-343 linearizability and, 342 total order consistent with, 344, 345 consistency with, 344-347 consistent snapshots, 340 happens-before relationship, 186 in serializable transactions, 262-265 mismatch with clocks, 292 ordering events to capture, 493 violations of, 165, 176, 292, 340 with synchronized clocks, 294 CEP (see complex event processing) certificate transparency, 532 chain replication, 155 linearizable reads, 351 change data capture, 160, 454 API support for change streams, 456 comparison to event sourcing, 457 implementing, 454 initial snapshot, 455 log compaction, 456 changelogs, 460 change data capture, 454 for operator state, 479 generating with triggers, 455 in stream joins, 474 log compaction, 456 maintaining derived state, 452 Chaos Monkey, 7, 280 checkpointing in batch processors, 422, 426 in high-performance computing, 275 in stream processors, 477, 523 chronicle data model, 458 circuit-switched networks, 284 circular buffers, 450 circular replication topologies, 175 clickstream data, analysis of, 404 clients calling services, 131 pushing state changes to, 512 request routing, 214 stateful and offline-capable, 170, 511 clocks, 287-299 atomic (caesium) clocks, 294, 295 confidence interval, 293-295 for global snapshots, 294 logical (see logical clocks) skew, 291-294, 334 slewing, 289 synchronization and accuracy, 289-291 synchronization using GPS, 287, 290, 294, 295 time-of-day versus monotonic clocks, 288 timestamping events, 471 cloud computing, 146, 275 need for service discovery, 372 network glitches, 279 shared resources, 284 single-machine reliability, 8 Cloudera Impala (see Impala) clustered indexes, 86 CODASYL model, 36 (see also network model) code generation with Avro, 127 with Thrift and Protocol Buffers, 118 with WSDL, 133 collaborative editing multi-leader replication and, 170 column families (Bigtable), 41, 99 column-oriented storage, 95-101 column compression, 97 distinction between column families and, 99 in batch processors, 428 Parquet, 96, 131, 414 sort order in, 99-100 vectorized processing, 99, 428 writing to, 101 comma-separated values (see CSV) command query 
responsibility segregation (CQRS), 462 commands (event sourcing), 459 commits (transactions), 222 atomic commit, 354-355 (see also atomicity; transactions) read committed isolation, 234 three-phase commit (3PC), 359 two-phase commit (2PC), 355-359 commutative operations, 246 compaction of changelogs, 456 (see also log compaction) for stream operator state, 479 of log-structured storage, 73 issues with, 84 size-tiered and leveled approaches, 79 CompactProtocol encoding (Thrift), 119 compare-and-set operations, 245, 327 implementing locks, 370 implementing uniqueness constraints, 331 implementing with total order broadcast, 350 relation to consensus, 335, 350, 352, 374 relation to transactions, 230 compatibility, 112, 128 calling services, 136 properties of encoding formats, 139 using databases, 129-131 using message-passing, 138 compensating transactions, 355, 461, 526 complex event processing (CEP), 465 complexity distilling in theoretical models, 310 hiding using abstraction, 27 of software systems, managing, 20 composing data systems (see unbundling data‐ bases) compute-intensive applications, 3, 275 concatenated indexes, 87 in Cassandra, 204 Concord (stream processor), 466 concurrency actor programming model, 138, 468 (see also message-passing) bugs from weak transaction isolation, 233 conflict resolution, 171, 174 detecting concurrent writes, 184-191 dual writes, problems with, 453 happens-before relationship, 186 in replicated systems, 161-191, 324-338 lost updates, 243 multi-version concurrency control (MVCC), 239 optimistic concurrency control, 261 ordering of operations, 326, 341 reducing, through event logs, 351, 462, 507 time and relativity, 187 transaction isolation, 225 write skew (transaction isolation), 246-251 conflict-free replicated datatypes (CRDTs), 174 conflicts conflict detection, 172 causal dependencies, 186, 342 in consensus algorithms, 368 in leaderless replication, 184 Index | 563 in log-based systems, 351, 521 in nonlinearizable systems, 343 in serializable snapshot isolation (SSI), 264 in two-phase commit, 357, 364 conflict resolution automatic conflict resolution, 174 by aborting transactions, 261 by apologizing, 527 convergence, 172-174 in leaderless systems, 190 last write wins (LWW), 186, 292 using atomic operations, 246 using custom logic, 173 determining what is a conflict, 174, 522 in multi-leader replication, 171-175 avoiding conflicts, 172 lost updates, 242-246 materializing, 251 relation to operation ordering, 339 write skew (transaction isolation), 246-251 congestion (networks) avoidance, 282 limiting accuracy of clocks, 293 queueing delays, 282 consensus, 321, 364-375, 554 algorithms, 366-368 preventing split brain, 367 safety and liveness properties, 365 using linearizable operations, 351 cost of, 369 distributed transactions, 352-375 in practice, 360-364 two-phase commit, 354-359 XA transactions, 361-364 impossibility of, 353 membership and coordination services, 370-373 relation to compare-and-set, 335, 350, 352, 374 relation to replication, 155, 349 relation to uniqueness constraints, 521 consistency, 224, 524 across different databases, 157, 452, 462, 492 causal, 339-348, 493 consistent prefix reads, 165-167 consistent snapshots, 156, 237-242, 294, 455, 500 (see also snapshots) 564 | Index crash recovery, 82 enforcing constraints (see constraints) eventual, 162, 322 (see also eventual consistency) in ACID transactions, 224, 529 in CAP theorem, 337 linearizability, 324-338 meanings of, 224 monotonic reads, 164-165 of secondary indexes, 231, 241, 
354, 491, 500 ordering guarantees, 339-352 read-after-write, 162-164 sequential, 351 strong (see linearizability) timeliness and integrity, 524 using quorums, 181, 334 consistent hashing, 204 consistent prefix reads, 165 constraints (databases), 225, 248 asynchronously checked, 526 coordination avoidance, 527 ensuring idempotence, 519 in log-based systems, 521-524 across multiple partitions, 522 in two-phase commit, 355, 357 relation to consensus, 374, 521 relation to event ordering, 347 requiring linearizability, 330 Consul (service discovery), 372 consumers (message streams), 137, 440 backpressure, 441 consumer offsets in logs, 449 failures, 445, 449 fan-out, 11, 445, 448 load balancing, 444, 448 not keeping up with producers, 441, 450, 502 context switches, 14, 297 convergence (conflict resolution), 172-174, 322 coordination avoidance, 527 cross-datacenter, 168, 493 cross-partition ordering, 256, 294, 348, 523 services, 330, 370-373 coordinator (in 2PC), 356 failure, 358 in XA transactions, 361-364 recovery, 363 copy-on-write (B-trees), 82, 242 CORBA (Common Object Request Broker Architecture), 134 correctness, 6 auditability, 528-533 Byzantine fault tolerance, 305, 532 dealing with partial failures, 274 in log-based systems, 521-524 of algorithm within system model, 308 of compensating transactions, 355 of consensus, 368 of derived data, 497, 531 of immutable data, 461 of personal data, 535, 540 of time, 176, 289-295 of transactions, 225, 515, 529 timeliness and integrity, 524-528 corruption of data detecting, 519, 530-533 due to pathological memory access, 529 due to radiation, 305 due to split brain, 158, 302 due to weak transaction isolation, 233 formalization in consensus, 366 integrity as absence of, 524 network packets, 306 on disks, 227 preventing using write-ahead logs, 82 recovering from, 414, 460 Couchbase (database) durability, 89 hash partitioning, 203-204, 211 rebalancing, 213 request routing, 216 CouchDB (database) B-tree storage, 242 change feed, 456 document data model, 31 join support, 34 MapReduce support, 46, 400 replication, 170, 173 covering indexes, 86 CPUs cache coherence and memory barriers, 338 caching and pipelining, 99, 428 increasing parallelism, 43 CRDTs (see conflict-free replicated datatypes) CREATE INDEX statement (SQL), 85, 500 credit rating agencies, 535 Crunch (batch processing), 419, 427 hash joins, 409 sharded joins, 408 workflows, 403 cryptography defense against attackers, 306 end-to-end encryption and authentication, 519, 543 proving integrity of data, 532 CSS (Cascading Style Sheets), 44 CSV (comma-separated values), 70, 114, 396 Curator (ZooKeeper recipes), 330, 371 curl (Unix tool), 135, 397 cursor stability, 243 Cypher (query language), 52 comparison to SPARQL, 59 D data corruption (see corruption of data) data cubes, 102 data formats (see encoding) data integration, 490-498, 543 batch and stream processing, 494-498 lambda architecture, 497 maintaining derived state, 495 reprocessing data, 496 unifying, 498 by unbundling databases, 499-515 comparison to federated databases, 501 combining tools by deriving data, 490-494 derived data versus distributed transac‐ tions, 492 limits of total ordering, 493 ordering events to capture causality, 493 reasoning about dataflows, 491 need for, 385 data lakes, 415 data locality (see locality) data models, 27-64 graph-like models, 49-63 Datalog language, 60-63 property graphs, 50 RDF and triple-stores, 55-59 query languages, 42-48 relational model versus document model, 28-42 data protection regulations, 
542 data systems, 3 about, 4 Index | 565 concerns when designing, 5 future of, 489-544 correctness, constraints, and integrity, 515-533 data integration, 490-498 unbundling databases, 499-515 heterogeneous, keeping in sync, 452 maintainability, 18-22 possible faults in, 221 reliability, 6-10 hardware faults, 7 human errors, 9 importance of, 10 software errors, 8 scalability, 10-18 unreliable clocks, 287-299 data warehousing, 91-95, 554 comparison to data lakes, 415 ETL (extract-transform-load), 92, 416, 452 keeping data systems in sync, 452 schema design, 93 slowly changing dimension (SCD), 476 data-intensive applications, 3 database triggers (see triggers) database-internal distributed transactions, 360, 364, 477 databases archival storage, 131 comparison of message brokers to, 443 dataflow through, 129 end-to-end argument for, 519-520 checking integrity, 531 inside-out, 504 (see also unbundling databases) output from batch workflows, 412 relation to event streams, 451-464 (see also changelogs) API support for change streams, 456, 506 change data capture, 454-457 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 unbundling, 499-515 composing data storage technologies, 499-504 designing applications around dataflow, 504-509 566 | Index observing derived state, 509-515 datacenters geographically distributed, 145, 164, 278, 493 multi-tenancy and shared resources, 284 network architecture, 276 network faults, 279 replication across multiple, 169 leaderless replication, 184 multi-leader replication, 168, 335 dataflow, 128-139, 504-509 correctness of dataflow systems, 525 differential, 504 message-passing, 136-139 reasoning about, 491 through databases, 129 through services, 131-136 dataflow engines, 421-423 comparison to stream processing, 464 directed acyclic graphs (DAG), 424 partitioning, approach to, 429 support for declarative queries, 427 Datalog (query language), 60-63 datatypes binary strings in XML and JSON, 114 conflict-free, 174 in Avro encodings, 122 in Thrift and Protocol Buffers, 121 numbers in XML and JSON, 114 Datomic (database) B-tree storage, 242 data model, 50, 57 Datalog query language, 60 excision (deleting data), 463 languages for transactions, 255 serial execution of transactions, 253 deadlocks detection, in two-phase commit (2PC), 364 in two-phase locking (2PL), 258 Debezium (change data capture), 455 declarative languages, 42, 554 Bloom, 504 CSS and XSL, 44 Cypher, 52 Datalog, 60 for batch processing, 427 recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 delays bounded network delays, 285 bounded process pauses, 298 unbounded network delays, 282 unbounded process pauses, 296 deleting data, 463 denormalization (data representation), 34, 554 costs, 39 in derived data systems, 386 materialized views, 101 updating derived data, 228, 231, 490 versus normalization, 462 derived data, 386, 439, 554 from change data capture, 454 in event sourcing, 458-458 maintaining derived state through logs, 452-457, 459-463 observing, by subscribing to streams, 512 outputs of batch and stream processing, 495 through application code, 505 versus distributed transactions, 492 deterministic operations, 255, 274, 554 accidental nondeterminism, 423 and fault tolerance, 423, 426 and idempotence, 478, 492 computing derived data, 495, 526, 531 in state machine replication, 349, 452, 458 joins, 476 DevOps, 394 differential dataflow, 504 dimension tables, 94 dimensional modeling (see star schemas) directed acyclic graphs (DAGs), 424 
dirty reads (transaction isolation), 234 dirty writes (transaction isolation), 235 discrimination, 534 disks (see hard disks) distributed actor frameworks, 138 distributed filesystems, 398-399 decoupling from query engines, 417 indiscriminately dumping data into, 415 use by MapReduce, 402 distributed systems, 273-312, 554 Byzantine faults, 304-306 cloud versus supercomputing, 275 detecting network faults, 280 faults and partial failures, 274-277 formalization of consensus, 365 impossibility results, 338, 353 issues with failover, 157 limitations of distributed transactions, 363 multi-datacenter, 169, 335 network problems, 277-286 quorums, relying on, 301 reasons for using, 145, 151 synchronized clocks, relying on, 291-295 system models, 306-310 use of clocks and time, 287 distributed transactions (see transactions) Django (web framework), 232 DNS (Domain Name System), 216, 372 Docker (container manager), 506 document data model, 30-42 comparison to relational model, 38-42 document references, 38, 403 document-oriented databases, 31 many-to-many relationships and joins, 36 multi-object transactions, need for, 231 versus relational model convergence of models, 41 data locality, 41 document-partitioned indexes, 206, 217, 411 domain-driven design (DDD), 457 DRBD (Distributed Replicated Block Device), 153 drift (clocks), 289 Drill (query engine), 93 Druid (database), 461 Dryad (dataflow engine), 421 dual writes, problems with, 452, 507 duplicates, suppression of, 517 (see also idempotence) using a unique ID, 518, 522 durability (transactions), 226, 554 duration (time), 287 measurement with monotonic clocks, 288 dynamic partitioning, 212 dynamically typed languages analogy to schema-on-read, 40 code generation and, 127 Dynamo-style databases (see leaderless replica‐ tion) E edges (in graphs), 49, 403 property graph model, 50 edit distance (full-text search), 88 effectively-once semantics, 476, 516 Index | 567 (see also exactly-once semantics) preservation of integrity, 525 elastic systems, 17 Elasticsearch (search server) document-partitioned indexes, 207 partition rebalancing, 211 percolator (stream search), 467 usage example, 4 use of Lucene, 79 ElephantDB (database), 413 Elm (programming language), 504, 512 encodings (data formats), 111-128 Avro, 122-127 binary variants of JSON and XML, 115 compatibility, 112 calling services, 136 using databases, 129-131 using message-passing, 138 defined, 113 JSON, XML, and CSV, 114 language-specific formats, 113 merits of schemas, 127 representations of data, 112 Thrift and Protocol Buffers, 117-121 end-to-end argument, 277, 519-520 checking integrity, 531 publish/subscribe streams, 512 enrichment (stream), 473 Enterprise JavaBeans (EJB), 134 entities (see vertices) epoch (consensus algorithms), 368 epoch (Unix timestamps), 288 equi-joins, 403 erasure coding (error correction), 398 Erlang OTP (actor framework), 139 error handling for network faults, 280 in transactions, 231 error-correcting codes, 277, 398 Esper (CEP engine), 466 etcd (coordination service), 370-373 linearizable operations, 333 locks and leader election, 330 quorum reads, 351 service discovery, 372 use of Raft algorithm, 349, 353 Ethereum (blockchain), 532 Ethernet (networks), 276, 278, 285 packet checksums, 306, 519 568 | Index Etherpad (collaborative editor), 170 ethics, 533-543 code of ethics and professional practice, 533 legislation and self-regulation, 542 predictive analytics, 533-536 amplifying bias, 534 feedback loops, 536 privacy and tracking, 536-543 consent and freedom of 
choice, 538 data as assets and power, 540 meaning of privacy, 539 surveillance, 537 respect, dignity, and agency, 543, 544 unintended consequences, 533, 536 ETL (extract-transform-load), 92, 405, 452, 554 use of Hadoop for, 416 event sourcing, 457-459 commands and events, 459 comparison to change data capture, 457 comparison to lambda architecture, 497 deriving current state from event log, 458 immutability and auditability, 459, 531 large, reliable data systems, 519, 526 Event Store (database), 458 event streams (see streams) events, 440 deciding on total order of, 493 deriving views from event log, 461 difference to commands, 459 event time versus processing time, 469, 477, 498 immutable, advantages of, 460, 531 ordering to capture causality, 493 reads as, 513 stragglers, 470, 498 timestamp of, in stream processing, 471 EventSource (browser API), 512 eventual consistency, 152, 162, 308, 322 (see also conflicts) and perpetual inconsistency, 525 evolvability, 21, 111 calling services, 136 graph-structured data, 52 of databases, 40, 129-131, 461, 497 of message-passing, 138 reprocessing data, 496, 498 schema evolution in Avro, 123 schema evolution in Thrift and Protocol Buffers, 120 schema-on-read, 39, 111, 128 exactly-once semantics, 360, 476, 516 parity with batch processors, 498 preservation of integrity, 525 exclusive mode (locks), 258 eXtended Architecture transactions (see XA transactions) extract-transform-load (see ETL) F Facebook Presto (query engine), 93 React, Flux, and Redux (user interface libra‐ ries), 512 social graphs, 49 Wormhole (change data capture), 455 fact tables, 93 failover, 157, 554 (see also leader-based replication) in leaderless replication, absence of, 178 leader election, 301, 348, 352 potential problems, 157 failures amplification by distributed transactions, 364, 495 failure detection, 280 automatic rebalancing causing cascading failures, 214 perfect failure detectors, 359 timeouts and unbounded delays, 282, 284 using ZooKeeper, 371 faults versus, 7 partial failures in distributed systems, 275-277, 310 fan-out (messaging systems), 11, 445 fault tolerance, 6-10, 555 abstractions for, 321 formalization in consensus, 365-369 use of replication, 367 human fault tolerance, 414 in batch processing, 406, 414, 422, 425 in log-based systems, 520, 524-526 in stream processing, 476-479 atomic commit, 477 idempotence, 478 maintaining derived state, 495 microbatching and checkpointing, 477 rebuilding state after a failure, 478 of distributed transactions, 362-364 transaction atomicity, 223, 354-361 faults, 6 Byzantine faults, 304-306 failures versus, 7 handled by transactions, 221 handling in supercomputers and cloud computing, 275 hardware, 7 in batch processing versus distributed data‐ bases, 417 in distributed systems, 274-277 introducing deliberately, 7, 280 network faults, 279-281 asymmetric faults, 300 detecting, 280 tolerance of, in multi-leader replication, 169 software errors, 8 tolerating (see fault tolerance) federated databases, 501 fence (CPU instruction), 338 fencing (preventing split brain), 158, 302-304 generating fencing tokens, 349, 370 properties of fencing tokens, 308 stream processors writing to databases, 478, 517 Fibre Channel (networks), 398 field tags (Thrift and Protocol Buffers), 119-121 file descriptors (Unix), 395 financial data, 460 Firebase (database), 456 Flink (processing framework), 421-423 dataflow APIs, 427 fault tolerance, 422, 477, 479 Gelly API (graph processing), 425 integration of batch and stream processing, 495, 498 machine 
recommendation engines (index entry): batch process outputs, 412; batch workflows, 403, 420; iterative processing, 424; statistical and numerical algorithms, 428

If you lose derived data, you can recreate it from the original source. A classic example is a cache: data can be served from the cache if present, but if the cache doesn't contain what you need, you can fall back to the underlying database. Denormalized values, indexes, and materialized views also fall into this category. In recommendation systems, predictive summary data is often derived from usage logs. Technically speaking, derived data is redundant, in the sense that it duplicates existing information. However, it is often essential for getting good performance on read queries. It is commonly denormalized. You can derive several different datasets from a single source, enabling you to look at the data from different "points of view."
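In code, the cache-as-derived-data idea reduces to a read-through lookup with a fallback. The following is a minimal Python sketch, not anything the book prescribes: the in-memory dicts standing in for the cache and the underlying database, and the ReadThroughCache name, are hypothetical stand-ins.

```python
# Minimal read-through cache sketch: the cache holds derived (redundant) data,
# so a miss falls back to the system of record and the entry is rebuilt.
# The dicts and class name here are illustrative stand-ins only.
from typing import Optional

class ReadThroughCache:
    def __init__(self, cache: dict, database: dict):
        self.cache = cache          # derived data: disposable, rebuildable
        self.database = database    # system of record: the original source

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:       # fast path: serve the derived copy
            return self.cache[key]
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value # repopulate the derived dataset
        return value

db = {"user:42": "Ada"}
store = ReadThroughCache(cache={}, database=db)
assert store.get("user:42") == "Ada"
store.cache.clear()                  # losing derived data is harmless...
assert store.get("user:42") == "Ada" # ...it is simply rebuilt from the source
```

The asymmetry is the point: clearing the cache only costs recomputation, whereas losing the system of record would lose information.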

Remix: Making Art and Commerce Thrive in the Hybrid Economy
by Lawrence Lessig
Published 2 Jan 2009

Jeff Jarvis, journalist and blogger, suggests companies "pay dividends back to [the] crowd" and avoid trying too hard "to control [the gathered] wisdom, and limit its use and the sharing of it."19 Tapscott and Williams make the same recommendation: "platforms for participation will only remain viable for as long as all the stakeholders are adequately and appropriately compensated for their contributions—don't expect a free ride forever."20 The key word here is "appropriately." Obviously, there must be adequate compensation. But the kind of compensation is the puzzle.

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps
by Valliappa Lakshmanan, Sara Robinson and Michael Munn
Published 31 Oct 2020

Feature Store, Transform, Reframing, Hashed Feature, Cascade, Neutral Class, Two-Phase Predictions, Stateless Serving Function, Windowed Inference

Recommendation Systems

Recommender systems are one of the most widespread applications of machine learning in business, and they often arise whenever users interact with items. Recommender systems capture features of past behavior and of similar users, and recommend the items most relevant for a given user. Think of how YouTube will recommend a series of videos for you to watch based on your watch history, or how Amazon may recommend purchases based on items in your shopping cart. Recommendation systems are popular throughout many businesses, particularly for product recommendation, personalized and dynamic marketing, and streaming video or music platforms.
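As a rough illustration of recommending items from the behavior of similar users, here is a toy user-based sketch in Python. The watch_history data and the overlap-count similarity are assumptions made purely for illustration; they are far simpler than the patterns the book describes.

```python
# Toy user-based recommender: score unseen items by how often they appear in
# the histories of users whose past behavior overlaps with the target user.
from collections import Counter

watch_history = {                      # hypothetical interaction data
    "alice": {"v1", "v2", "v3"},
    "bob":   {"v2", "v3", "v4"},
    "carol": {"v3", "v5"},
}

def recommend(user: str, history: dict, k: int = 2) -> list:
    seen = history[user]
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(seen & items)    # crude similarity: number of shared items
        for item in items - seen:
            scores[item] += overlap    # weight each candidate by that similarity
    return [item for item, _ in scores.most_common(k)]

print(recommend("alice", watch_history))   # e.g. ['v4', 'v5']
```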

A recent paper that beats all benchmarks at predicting protein folding structure also predicts the distance between amino acids, framed as a 64-way classification problem in which the distances are bucketized into 64 bins. Another reason to reframe a problem is when the objective is easier to optimize in the other type of model. For example, suppose we are trying to build a recommendation system for videos. A natural way to frame this problem is as a classification task: predict whether a user is likely to watch a certain video. This framing, however, can lead to a recommendation system that prioritizes clickbait. It might be better to reframe it as a regression problem: predict the fraction of the video that will be watched.

Why It Works

Changing the context and reframing the task of a problem can help when building a machine learning solution.
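For the video example above, the reframing amounts to swapping the loss (and the interpretation of the output) on an otherwise identical model. A minimal Keras sketch follows, assuming TensorFlow is available; the layer sizes and n_features are arbitrary placeholders, and feature preparation is assumed to happen elsewhere.

```python
# Same features, different head: classification predicts P(user watches the
# video), regression predicts the fraction of the video that will be watched.
import tensorflow as tf

def build_model(as_regression: bool, n_features: int = 20) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        # Sigmoid suits both cases: a probability in [0, 1] or a fraction in [0, 1].
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    loss = "mse" if as_regression else "binary_crossentropy"
    model.compile(optimizer="adam", loss=loss)
    return model

clf = build_model(as_regression=False)  # will-watch yes/no
reg = build_model(as_regression=True)   # expected watch fraction
```

Ranking videos by predicted watch fraction rather than predicted click probability is what removes the incentive to surface clickbait.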

Cached results of batch serving

We discussed batch serving as a way to invoke a model over millions of items when the model is normally served online using the Stateless Serving Function design pattern. Of course, it is possible for batch serving to work even if the model does not support online serving. What matters is that the machine learning framework doing inference is capable of taking advantage of embarrassingly parallel processing. Recommendation engines, for example, need to fill out a sparse matrix consisting of every user–item pair. A typical business might have 10 million all-time users and 10,000 items in the product catalog. In order to make a recommendation for a user, recommendation scores have to be computed for each of the 10,000 items, ranked, and the top 5 presented to the user.
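A scaled-down NumPy sketch of that batch-scoring loop is shown below. The random user and item embeddings and the dot-product scoring are assumptions for illustration only, and the sizes are reduced from the 10 million × 10,000 figures in the text.

```python
# Embarrassingly parallel batch scoring: for each user, score every catalog
# item (here with a dot product over made-up embeddings), keep the top 5,
# and cache those lists for cheap lookup at serving time.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 1_000, 10_000, 32      # stand-ins for 10M users, 10k items
user_vecs = rng.normal(size=(n_users, dim))
item_vecs = rng.normal(size=(n_items, dim))

def top_k_for_user(u: int, k: int = 5) -> np.ndarray:
    scores = item_vecs @ user_vecs[u]          # one score per catalog item
    top = np.argpartition(-scores, k)[:k]      # unordered top-k candidates
    return top[np.argsort(-scores[top])]       # ranked best-first

# Each user's row is independent, so this loop can be sharded across workers.
precomputed = {u: top_k_for_user(u) for u in range(n_users)}
```

Because every user can be scored independently, the job shards cleanly across workers, which is exactly the embarrassingly parallel property the text calls out.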

pages: 320 words: 87,853

The Black Box Society: The Secret Algorithms That Control Money and Information
by Frank Pasquale
Published 17 Nov 2014

A bad credit score may cost a borrower hundreds of thousands of dollars, but he will never understand exactly how it was calculated. A predictive analytics firm may score someone as a "high cost" or "unreliable" worker, yet never tell her about the decision. More benignly, perhaps, these companies influence the choices we make ourselves. Recommendation engines at Amazon and YouTube affect an automated familiarity, gently suggesting offerings they think we'll like. But don't discount the significance of that "perhaps." The economic, political, and cultural agendas behind their suggestions are hard to unravel. As middlemen, they specialize in shifting alliances, sometimes advancing the interests of customers, sometimes suppliers: all to orchestrate an online world that maximizes their own profits.

Similar protocols also influence—invisibly—not only the route we take to a new restaurant, but which restaurant Google, Yelp, OpenTable, or Siri recommends to us. They might help us find reviews of the car we drive. Yet choosing a car, or even a restaurant, is not as straightforward as optimizing an engine or routing a drive. Does the recommendation engine take into account, say, whether the restaurant or car company gives its workers health benefits or maternity leave? Could we prompt it to do so? In their race for the most profitable methods of mapping social reality, the data scientists of Silicon Valley and Wall Street tend to treat recommendations as purely technical problems.

Even if it is the former, we should note that Google's autosuggest feature may have automatically entered the word "bomb" after "pressure cooker" while he was typing—certainly many people would have done the search in the days after the Boston bombing merely to learn just how lethal such an attack could be. The police had no way of knowing whether Catalano had actually typed "bomb" himself, or accidentally clicked on it thanks to Google's increasingly aggressive recommendation engines. See also Philip Bump, "Update: Now We Know Why Googling 'Pressure Cookers' Gets a Visit from the Cops," The Wire, August 1, 2013, http://www.thewire.com/national/2013/08/government-knocking-doors-because-google-searches/67864/#.UfqCSAXy7zQ.facebook. 10. Martin Kuhn, Federal Dataveillance: Implications for Constitutional Privacy Protections (New York: LFB Scholarly Publishing, 2007), 178.

pages: 398 words: 86,855

Bad Data Handbook
by Q. Ethan McCallum
Published 14 Nov 2012

Facebook is powered by its Open Graph, the “people and the connections they have to everything they care about.”[68] Facebook provides an API to access this social network and make it available for integration into other networked datasets. On Twitter, the network structure resulting from friends and followers leads to recommendations of “Who to follow.” On LinkedIn, network-based recommendations include “Jobs you may be interested in” and “Groups you may like.” The recommendation engine hunch.com is built on a “Taste Graph” that “uses signals from around the Web to map members with their predicted affinity for products, services, other people, websites, or just about anything, and customizes recommended topics for them.”[69] A search on Google can be considered a type of recommendation about which of possibly millions of search hits are most relevant for a particular query.
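Below is a minimal sketch of how such network-based suggestions can fall out of the graph structure alone, assuming a toy follow graph and a simple mutual-connection count; none of this reflects the platforms' actual algorithms.

```python
# "Who to follow" over a toy follow graph: rank unconnected accounts by how
# many of the accounts you already follow also follow them.
from collections import Counter

follows = {                              # hypothetical adjacency lists
    "you":  {"ann", "bob"},
    "ann":  {"bob", "cara", "dan"},
    "bob":  {"cara", "erin"},
    "cara": {"erin"},
}

def who_to_follow(user: str, graph: dict, k: int = 3) -> list:
    already = graph.get(user, set()) | {user}
    votes = Counter()
    for followed in graph.get(user, set()):
        for candidate in graph.get(followed, set()):
            if candidate not in already:
                votes[candidate] += 1    # one "vote" per mutual connection
    return [c for c, _ in votes.most_common(k)]

print(who_to_follow("you", follows))     # e.g. ['cara', 'dan', 'erin']
```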

[63] http://en.wikipedia.org/wiki/File:KochFlake.svg
[64] http://blueprints.tinkerpop.com
[65] http://gremlin.tinkerpop.com
[66] http://gremlin.tinkerpop.com/Path-Pattern
[67] Ted G. Lewis. 2009. Network Science: Theory and Applications. Wiley Publishing.
[68] http://developers.facebook.com/docs/opengraph
[69] "eBay Acquires Recommendation Engine Hunch.com," http://www.businesswire.com/news/home/20111121005831/en
[70] Brin, S.; Page, L. 1998. "The anatomy of a large-scale hypertextual Web search engine." Computer Networks and ISDN Systems 30: 107–117.

Chapter 14. Myths of Cloud Computing
Steve Francia

Myths are an important and natural part of the emergence of any new technology, product, or idea, as identified by the hype cycle.

I’ve written code to process accelerometer and hydrophone signals for analysis of dams and other large structures (as an undergraduate student in Engineering at Harvey Mudd College), analyzed recordings of calls from various species of bats (as a graduate student in Electrical Engineering at the University of Washington), built systems to visualize imaging sonar data (as a Graduate Research Assistant at the Applied Physics Lab), used large amounts of crawled web content to build content filtering systems (as the co-founder and CTO of N2H2, Inc.), designed intranet search systems for portal software (at DataChannel), and combined multiple sets of directory assistance data into a searchable website (as CTO at WhitePages.com). For the past five years or so, I’ve spent most of my time at Demand Media using a wide variety of data sources to build optimization systems for advertising and content recommendation systems, with various side excursions into large-scale data-driven search engine optimization (SEO) and search engine marketing (SEM). Most of my examples will be related to work I’ve done in Ad Optimization, Content Recommendation, SEO, and SEM. These areas, as with most, have their own terminology, so a few term definitions may be helpful.

pages: 475 words: 134,707

The Hype Machine: How Social Media Disrupts Our Elections, Our Economy, and Our Health--And How We Must Adapt
by Sinan Aral
Published 14 Sep 2020

Instagram began blocking antivaccine-related hashtags like #vaccinescauseautism and #vaccinesarepoison. YouTube announced it is no longer allowing users to monetize antivaccine videos with ads. Pinterest banned searches for vaccine content. Facebook stopped showing pages and groups featuring antivaccine content and tweaked its recommendation engines to stop suggesting users join these groups. They also took down the Facebook ads that Larry Cook and others had been buying. The social platforms took similar steps to stem the spread of coronavirus fake news in 2020. Will these measures help slow the coronavirus, measles outbreaks, and future pandemics?

The Transparency Paradox Immediately after the Cambridge Analytica scandal broke, in an interview by Martin Giles for the MIT Technology Review, I predicted the Hype Machine was about to face a dilemma that would pull it in competing directions. On the one hand, social media platforms would face pressure to be more open and transparent about their inner workings: how their trending and ad-targeting algorithms work, how misinformation diffuses through them, and whether recommendation engines increase polarization. The world wanted Facebook and Twitter to open the kimono and reveal how it all worked, so we could understand how to use and fix social media. On the other hand, the Hype Machine would also be pushed to protect our privacy and security, to lock down consumer data, to stop sharing private information with third parties, and to protect us from data breaches like Cambridge Analytica’s.

In this case, it’s important, because if people with more economic opportunity tend to develop more diverse networks (rather than the networks providing the opportunity), then the Hype Machine is more likely to reflect economic opportunity than to create it. How important is the machine in all this? Do we just replicate our existing social networks on social media, or do the Hype Machine’s recommendation engines provide us with new economic opportunities? Erik Brynjolfsson and I collaborated with Ya Xu and Guillaume Saint-Jacques of LinkedIn to find out. Guillaume was our PhD student at MIT before going to work for Ya, LinkedIn’s director of data science. The collaboration allowed us to test the cause and effect relationship between weak ties and job mobility.

pages: 201 words: 63,192

Graph Databases
by Ian Robinson , Jim Webber and Emil Eifrem
Published 13 Jun 2013

As in the social use case, making an effective recommendation depends on understanding the connections between things, as well as the quality and strength of those connections—all of which are best expressed as a property graph. Queries are primarily graph local, in that they start with one or more identifiable subjects, whether people or resources, and thereafter discover surrounding portions of the graph. Taken together, social networks and recommendation engines provide key differentiating capabilities in the areas of retail, recruitment, sentiment analysis, search, and knowledge management. Graphs are a good fit for the densely connected data structures germane to each of these areas; storing and querying this data using a graph database allows an application to surface end-user real-time results that reflect recent changes to the data, rather than pre-calculated, stale results.

• Sparse tables with nullable columns require special checking in code, despite the presence of a schema.
• Several expensive joins are needed just to discover what a customer bought.
• Reciprocal queries are even more costly. “What products did a customer buy?” is relatively cheap compared to “which customers bought this product?”, which is the basis of recommendation systems. We could introduce an index, but even with an index, recursive questions such as “which customers bought this product who also bought that product?” quickly become prohibitively expensive as the degree of recursion increases.

Relational databases struggle with highly-connected domains.
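To make the excerpt's reciprocal query concrete, here is a minimal sketch (Python over hypothetical purchase data, not the book's graph-database queries) of answering "which customers bought this product who also bought that product?" as a graph-local traversal over in-memory adjacency maps rather than as a chain of joins. The point is only the access pattern the excerpt describes: the query walks outward from an identifiable starting node.

```python
# A minimal sketch of the co-purchase query as a graph traversal.
# The purchase data and names are hypothetical, for illustration only.
from collections import defaultdict

purchases = [                      # (customer, product) edges
    ("alice", "espresso-machine"), ("alice", "grinder"),
    ("bob", "espresso-machine"), ("bob", "grinder"), ("bob", "milk-frother"),
    ("carol", "grinder"),
]

bought_by = defaultdict(set)       # product  -> customers who bought it
bought = defaultdict(set)          # customer -> products they bought
for customer, product in purchases:
    bought_by[product].add(customer)
    bought[customer].add(product)

def customers_who_bought_both(product_a, product_b):
    """'Which customers bought this product who also bought that product?'"""
    return bought_by[product_a] & bought_by[product_b]

def also_bought(product):
    """Products that co-occur with `product` across customers' baskets."""
    suggestions = defaultdict(int)
    for customer in bought_by[product]:
        for other in bought[customer] - {product}:
            suggestions[other] += 1
    return sorted(suggestions, key=suggestions.get, reverse=True)

print(customers_who_bought_both("espresso-machine", "grinder"))  # {'alice', 'bob'}
print(also_bought("espresso-machine"))                           # ['grinder', 'milk-frother']
```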

pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence
by John Brockman
Published 5 Oct 2015

Conceptually, autonomous or artificial intelligence systems can develop in two ways: either as an extension of human thinking or as radically new thinking. Call the first “Humanoid Thinking,” or Humanoid AI, and the second “Alien Thinking,” or Alien AI. Almost all AI today is Humanoid Thinking. We use AI to solve problems too difficult, time-consuming, or boring for our limited brains to process: electrical-grid balancing, recommendation engines, self-driving cars, face recognition, trading algorithms, and the like. These artificial agents work in narrow domains with clear goals their human creators specify. Such AI aims to accomplish human objectives—often better, with fewer cognitive errors, distractions, outbursts of bad temper, or processing limitations.

Computer programs can keep track of a student’s performance, and some provide corrective feedback for common errors. But each brain is different, and there’s no substitute for a human teacher who has a long-term relationship with the student. Is it possible to create an artificial mentor for each student? We already have recommender systems on the Internet that tell us, “If you liked X, you might also like Y,” based on data of many others with similar patterns of preference. Someday the mind of each student may be tracked from childhood by a personalized deep-learning system. To achieve this level of understanding of a human mind is beyond the capabilities of current technology, but there are already efforts at Facebook to use their vast social database of friends, photos, and likes to create a Theory of Mind for every person on the planet.

To achieve this level of understanding of a human mind is beyond the capabilities of current technology, but there are already efforts at Facebook to use their vast social database of friends, photos, and likes to create a Theory of Mind for every person on the planet. So my prediction is that as more and more cognitive appliances, like chess-playing programs and recommender systems are devised, humans will become smarter and more capable. SHALLOW LEARNING SETH LLOYD Professor of quantum mechanical engineering, MIT; author, Programming the Universe Pity the poor folks at the National Security Agency: They’re spying on everyone (quelle surprise!) and everyone is annoyed at them.

pages: 390 words: 120,864

Stolen Focus: Why You Can't Pay Attention--And How to Think Deeply Again
by Johann Hari
Published 25 Jan 2022

He told me that literature is full of stories where humans create something in a burst of optimism and then lose control of their creation. Dr. Frankenstein creates a monster only for it to escape from him and commit murder. Aza began to think about these stories when he talked with his friends who were engineers working for some of the most famous websites in the world. He would ask them basic questions, like why their recommendation engines recommend one thing over another, and, he said to me, “they’re like: ‘We’re not sure why it’s recommending those things.’ ” They’re not lying—they have set up a technology that is doing things they don’t fully comprehend. He always says to them: “Isn’t that exactly the moment, in the allegories, where you turn the thing off—[when] it’s starting to do things you can’t predict?”

Facebook and Instagram and the others could simply turn off infinite scroll—so that when you get to the bottom of the screen, you have to make a conscious decision to carry on scrolling. Similarly, these sites could simply switch off the things that have been shown to most polarize people politically, stealing our ability to pay collective attention. Since there’s evidence YouTube’s recommendation engine is radicalizing people, Tristan told one interviewer: “Just turn it off. They can turn it off in a heartbeat.” It’s not as if, he points out, the day before recommendations were introduced, people were lost and clamoring for somebody to tell them what to watch next. Once the most obvious forms of mental pollution have been stopped, they said, we can begin to look deeper, at how these sites could be redesigned to make it easier for you to restrain yourself and think about your longer-term goals.

This meant that across the world, people were seeing in their Facebook feeds racist, fascist, and even Nazi groups next to the words: “Groups You Should Join.” They warned that in Germany, one-third of all the political groups on the site were extremist. Facebook’s own team was blunt, concluding: “Our recommendation systems grow the problem.” After carefully analyzing all the options, Facebook’s scientists concluded there was one solution: they said Facebook would have to abandon its current business model. Because their growth was so tied up with toxic outcomes, the company should abandon attempts at growth.

pages: 579 words: 160,351

Breaking News: The Remaking of Journalism and Why It Matters Now
by Alan Rusbridger
Published 14 Oct 2018

Web 2.0 – the thing Emily had warned was going to take over the world – was now called social media. The GMG CEO Carolyn McCall and I took another swing to the West Coast to see what was on the horizon. We dropped in on Flickr, the picture-sharing platform; on Yahoo; on Google; on Topix.net, a content aggregator in Palo Alto. We had drinks with the founders of Digg, a social recommendation platform; tea with Knight Ridder in San Jose; coffee with Real Networks and then on to Microsoft in Seattle. So many people trying so many different things; vast sums of money in play; the speed of development; the seeming impossibility of picking who would be the next big thing and who, in a couple of months, would have shut up shop or sold out.

pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
Published 16 Mar 2017

The discussion in Chapter 2 was focused around OLTP-style use: quickly executing queries to find a small number of vertices matching certain criteria. It is also interesting to look at graphs in a batch processing context, where the goal is to perform some kind of offline processing or analysis on an entire graph. This need often arises in machine learning applications such as recommendation engines, or in ranking systems. For example, one of the most famous graph analysis algorithms is PageRank [69], which tries to estimate the popularity of a web page based on what other web pages link to it. It is used as part of the formula that determines the order in which web search engines present their results.
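As a rough illustration of the PageRank idea mentioned above, the following is a minimal power-iteration sketch over a tiny hypothetical link graph; the 0.85 damping factor is the commonly cited default, and real implementations run over graphs large enough that they need the batch processing the excerpt is describing.

```python
# A minimal PageRank sketch by power iteration (illustrative, not production code).
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                       # dangling page: spread rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}   # hypothetical link graph
print(pagerank(links))   # "c" collects the most rank: three of the four pages link to it
```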

If you lose derived data, you can recreate it from the original source. A classic example is a cache: data can be served from the cache if present, but if the cache doesn’t contain what you need, you can fall back to the underlying database. Denormalized values, indexes, and materialized views also fall into this category. In recommendation systems, predictive summary data is often derived from usage logs. Technically speaking, derived data is redundant, in the sense that it duplicates existing information. However, it is often essential for getting good performance on read queries. It is commonly denormalized. You can derive several different datasets from a single source, enabling you to look at the data from different “points of view.”

To handle these dependencies between job executions, various workflow schedulers for Hadoop have been developed, including Oozie, Azkaban, Luigi, Airflow, and Pinball [28]. These schedulers also have management features that are useful when maintaining a large collection of batch jobs. Workflows consisting of 50 to 100 MapReduce jobs are common when building recommendation systems [29], and in a large organization, many different teams may be running different jobs that read each other’s output. Tool support is important for managing such complex dataflows. Various higher-level tools for Hadoop, such as Pig [30], Hive [31], Cascading [32], Crunch [33], and FlumeJava [34], also set up workflows of multiple MapReduce stages that are automatically wired together appropriately.

pages: 254 words: 79,052

Evil by Design: Interaction Design to Lead Us Into Temptation
by Chris Nodder
Published 4 Jun 2013

These configurators reduce the load on customers by presenting options in groups (drivetrain, body, interior) and by also offering packages that combine many options into a single trim level, ideal for the satisficers. To reduce the confusion caused by the number of options while still retaining the perception of quality, many sites employ recommendation engines or filters. Recommendation engines provide a small set of options based on either comparison with prior behavior or on answers to a set of preference questions. Netflix uses a recommendation engine to suggest new movies based on ones that customers have already watched. Its business is so dependent upon this functionality that it recently offered a one million dollar prize to anyone who could increase the accuracy of the engine by more than 10 percent.

So the trick is to demonstrate that you have sufficient options to keep the maximizers happy but also provide tools that allow both the maximizers and the satisficers to find the options they want quickly. The three techniques you can use (alone or in combination) are to present many compatible choices, to use a recommendation engine or filter, and to offer a best choice guarantee. Brands that offer greater variety of compatible (that is, focused and internally consistent) options are perceived as having greater commitment and expertise in the category, which, in turn, enhances their perceived quality and purchase likelihood.

Its business is so dependent upon this functionality that it recently offered a one million dollar prize to anyone who could increase the accuracy of the engine by more than 10 percent. Currently, 75 percent of movies watched on Netflix come from a recommendation made by the site. Recommendation engines are a great way to limit choice from an otherwise overwhelming quantity of items. (Netflix.com) Filters rely less on preference algorithms and more on on-screen choices. Customers refine a product search by choosing successive properties of the product they are looking for—size, color, style, and brand—until they have narrowed the set down to a manageable group. Because individuals are responsible for each successive decision, they should still feel invested in the outcome but not overwhelmed by the number of items they’ve discarded during the process.

pages: 518 words: 49,555

Designing Social Interfaces
by Christian Crumlish and Erin Malone
Published 30 Sep 2009

I’d venture that thumb-voting and the recommender system are a huge part of why many people buy TiVo in the first place. (OK, that plus “pause live TV.”) Items with a great deal of persistence (on the extreme end are real-world establishments, such as restaurants or businesses) make excellent candidates for rateability. Furthermore, the types of ratings we can ask for may be more involved. Because these establishments will persist, we can be reasonably sure that others will always come along afterward and benefit from the work that the community has put into the item. When it comes to explicitly input recommender systems, we should acknowledge the limitations of folks’ interest in “feeding the machine.”

This network can give rich social rewards to those who participate; however, more and more participants are finding that the rewards extend beyond just being social and discovering that the connectedness and serendipity of ambient intimacy can bring great professional gains as well. These days, ambient intimacy plays many roles in my life: it has stopped me from missing an important international flight and helped me keep sane whilst at home with a small baby. It is my outsourced tech support resource, my recommendation engine, my news filter. Twitter lets me virtually attend conferences I can’t get to but am interested in. But most valuable of all, it has allowed me to create, maintain, and even build professional and personal relationships with people in my field whose work I admire and from whom I have been able to learn and develop as a professional.

pages: 283 words: 85,824

The People's Platform: Taking Back Power and Culture in the Digital Age
by Astra Taylor
Published 4 Mar 2014

A more democratic culture is one where previously excluded populations are given the material means to fully engage. To create a culture that is more diverse and inclusive, we have to pioneer ways of addressing discrimination and bias head-on, despite the difficulties of applying traditional methods of mitigating prejudice to digital networks. We have to shape our tools of discovery, the recommendation engines and personalization filters, so they do more than reinforce our prior choices and private bubbles. Finally, if we want a culture that is more resistant to the short-term expectations of corporate shareholders and the whims of marketers, we have to invest in noncommercial enterprises. There is no shortage of good ideas.

Huberman, “The Persistence Paradox,” First Monday 15, nos. 1–4 (January 2010).
36. James Evans, “Electronic Publication and the Narrowing of Science and Scholarship,” Science 321, no. 5887 (July 18, 2008): 395–99.
37. Daniel M. Fleder and Kartik Hosanagar, “Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity,” Management Science 55, no. 5 (May 2009): 697–712.
38. Evan Hughes, “Here’s How Amazon Self-Destructs,” Salon, July 19, 2013.
39. Gary Flake et al., “Winners Don’t Take All: Characterizing the Competition for Links on the Web,” Proceedings of the National Academy of Sciences 99, no. 8 (April 16, 2002).
40.

pages: 259 words: 84,261

Scary Smart: The Future of Artificial Intelligence and How You Can Save Our World
by Mo Gawdat
Published 29 Sep 2021

The train has left the station and, due to the three inevitables, we are just about to be supervised by a GLaDOS and all her infinitely intelligent brothers and sisters. Make no mistake, even as we speak, intelligent machines are observing us like lab rats. They are monitoring our every move and designing tests to see how we react. From the ad engines of Google to the personalization and recommendation engines of Instagram and YouTube, from the music recommendation engines of Spotify and Apple Music to the product recommendation engines of Amazon, from the chatbots to the discrimination engines of dating apps, we are the lab rats, you and me, and we are being led blindly through the maze. And what are we being promised? Digital cake – a piece of worthless content or an uninformed opinion.

What will completely sway the needle, however, is when AI itself understands this rule of engagement – do good if you want my attention – better than the humans do. So don’t approve of killing machines, even if you are patriotic and they are killing on behalf of your own country. Don’t keep feeding the recommendation engines of social media with hours and hours of your daily life. Don’t ever click on content recommended to you, search for what you actually need and don’t click on ads. Don’t approve of FinTech AI that uses machine intelligence to trade or aid the wealth concentration of a few. Don’t share about these on your LinkedIn page.

Post about every positive, friendly, healthy use of AI you find, to make others aware of it. Stand Together We should teach each other, so we collectively become smarter at identifying what is good for humanity. Don’t believe the lies you are told. It’s called the ‘defence’ industry but in reality it is mostly about offence. It’s called a ‘recommendation engine’ when in reality it is about manipulation and distraction. We are told that ‘people who bought this also bought that’ when in reality what should be said is ‘can we tempt you to buy this too?’ We are told how many found love on a dating site but not told how many were left broken-hearted. They call it a ‘matching’ algorithm when actually it is a filtering algorithm that connects you only to those the AI believes you are good enough to attract.

pages: 504 words: 67,845

Designing Web Interfaces: Principles and Patterns for Rich Interactions
by Bill Scott and Theresa Neil
Published 15 Dec 2008

It turns out that voting and rating systems are the most common places to make tools always visible. Netflix was the earliest to use a one-click rating system (Figure 4-4). Figure 4-4. Netflix star ratings are always visible Just as with Digg, rating movies is central to the health of Netflix. The Cinematch™ recommendation engine is driven largely by the user's ratings. So a clear call to action (to rate) is important. Not only do the stars serve as a strong call to action to rate movies, but they also provide important information for the other in-context tool: the "Add" button. Adding movies to your movie-shipping queue is key to having a good experience with the Netflix service.

In fact, the Gap, Old Navy, Banana Republic, and PiperLime all share the same Inline Assistant Process-style shopping cart. The Gap is betting that making it quick and easy to add items to the cart across four stores will equal more sales. Additional step Amazon, on the other hand, is betting on its recommendation engine. By going to a second page, Amazon can display other shirts like the one added—as well as advertise the Amazon.com Visa card (Figure 8-8). Figure 8-8. Amazon shows recommendations when confirming an add to its shopping cart Which is the better experience? The Gap seems to be the clear winner in pure user experience.

Netflix displays its recommendations in an overlay Each movie on the site has an "Add" button. Clicking "Add" immediately adds the movie to the user's queue. As a confirmation and an opportunity for recommendations, a Dialog Overlay is displayed on top of the movie page. Just like Amazon, Netflix has a sophisticated recommendation engine. The bet is that since the user has expressed interest in an item (shirt or movie), the site can find other items similar to it to suggest. Amazon does this in a separate page. Netflix does it in an overlay that is easily dismissed by clicking anywhere outside the overlay (or by clicking the close button at the top or bottom).

pages: 406 words: 88,820

Television disrupted: the transition from network to networked TV
by Shelly Palmer
Published 14 Apr 2006

The key problem with on-demand technology is not desire; it is complexity. It’s just too hard for the average person to do. Now, making a playlist in iTunes could not be simpler. But, putting your iPod in shuffle mode is actually easier, and it is also the path of least resistance. There are other factors that help with playlist creation. Recommendation engines and collaborative filtering like Amazon’s “if you like this … you might also like …” are good ways to help people pick the right stuff for their playlists. Consumers can also skew shuffle modes, setting them to play the content they manually play the most more often than the content they play less often.
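As a small illustration of the "skewed shuffle" idea in this excerpt, here is a minimal Python sketch (hypothetical play counts, not any actual player's algorithm) that samples the next track in proportion to how often the listener has played it manually, without ever excluding the rest of the library.

```python
# A minimal sketch of a "skewed" shuffle mode: frequently played tracks
# come up more often. Play counts here are made up for illustration.
import random

play_counts = {"Track A": 12, "Track B": 3, "Track C": 1}

def skewed_shuffle_pick(counts):
    tracks = list(counts)
    weights = [counts[t] + 1 for t in tracks]   # +1 keeps never-played tracks possible
    return random.choices(tracks, weights=weights, k=1)[0]

print(skewed_shuffle_pick(play_counts))   # "Track A" most of the time, but not always
```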

, MSN, Amazon, eBay, and of course, about every existing broadcast and cable network. A trip to the video section of the Apple Music Store through iTunes is a very interesting experience, particularly when you see how the interface handles show branding vs. network branding. Social Search Solution Another probable future is Tim Halle’s vision of a “social search,” a recommendation system that will emerge from social networking sites. Of course, the biggest social networking sites like friendster.com or myspace.com are also big brands, so this may be just another permutation of branded search.

pages: 285 words: 86,853

What Algorithms Want: Imagination in the Age of Computing
by Ed Finn
Published 10 Mar 2017

Going farther from shore, the deep waters of algorithmic imagination draw us relentlessly back toward ourselves and the mysterious origins of cognition, inspiration, and serendipity that drive creative work. How are computational systems reinventing, channeling, or modulating those processes? On an individual level this is a straightforward extension of technics: when does the memory bank, the virtual assistant, or the recommendation engine deserve credit in the creative process? These tools manage cognition, inspiration, and serendipity for us, generating conversation and intellectual connection in our social media streams, our digital workspaces and notebooks, and more broadly, in the horizon of visible knowledge. The writer using a word processor to manage drafts; the scientist using research databases and citation tools to manage a field of professional knowledge; the artist using image editing software, photo sharing tools, and a virtual notebook to track observations—all of these creative processes depend on tools that are increasingly active, occasionally manipulative agents in their own use.

At the same time, we are deeply compelled by these abstracting systems, by the romance of clean interfaces and tidy ontologies. Even with thousands of human hours encoded into its recommendations, Netflix presents a seamless computational facade, because we have arrived at a stage where many of us will trust a strange computer’s suggestions more than we will trust a stranger’s. The rhetoric of the recommendation system is so successful because it black boxes the task of judgment, asking us to trust the efficacy of personalization embedded in the algorithm. By contrast, reading movie critics or browsing sites like IMDb or Rotten Tomatoes requires us to evaluate the evaluators in a much more complicated, human way, measuring the applicability of advice generated by other personalities who might not share our tastes.

pages: 344 words: 96,020

Hacking Growth: How Today's Fastest-Growing Companies Drive Breakout Success
by Sean Ellis and Morgan Brown
Published 24 Apr 2017

Amazon is, once again, a leading practitioner, having developed one of the most powerful “recommendation engines,” the term for the algorithmic programs that customize which items are recommended to you while browsing the site. The selections are based on a combination of a customer’s search history and buying habits, and data about the habits of other shoppers like that customer. All Amazon shoppers in effect see their own version of Amazon with a unique experience tailored to their preferences. Some recommendation engines, such as Amazon’s, as well as those deployed by Google and Netflix, are incredibly complex, but many are based on relatively simple math.

This calculation can be done for a host of combinations of every item in the store, creating powerful recommendations that lead to more purchases. And with the best recommendation engines, these product suggestions will only get better and more personalized over time because the more customers shop, the more data is available not just about what an individual customer has purchased, but also about common patterns among a large pool of shoppers. The grocery app recommendation engine might, for example, recommend seltzer water and limes when a shopper puts Red Bull in her shopping cart—even if that shopper has no history of buying any of those products—based on data that shows most people buying Red Bull are purchasing mixers for vodka.6 DON’T BE INTRUSIVE An important word of caution about customizing is that it can backfire if you’re not sensitive about how you’re doing it.
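The excerpt gestures at the "relatively simple math" behind such suggestions without spelling it out, so here is one plausible minimal version (a lift score over hypothetical shopping baskets, not the book's formula): pairs of items that appear together more often than their individual popularity would predict get recommended together.

```python
# A minimal co-occurrence/lift sketch over made-up baskets (illustrative only).
from itertools import combinations
from collections import Counter

baskets = [
    {"red bull", "limes", "seltzer"},
    {"red bull", "limes", "chips"},
    {"red bull", "seltzer"},
    {"bread", "limes"},
    {"bread", "milk"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def lift(a, b):
    """How much more often a and b co-occur than if they were independent."""
    p_a = item_counts[a] / len(baskets)
    p_b = item_counts[b] / len(baskets)
    p_ab = pair_counts[tuple(sorted((a, b)))] / len(baskets)
    return p_ab / (p_a * p_b)

# Items whose presence is most strongly associated with "red bull":
print(sorted(
    ((lift("red bull", other), other) for other in item_counts if other != "red bull"),
    reverse=True,
))
```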

pages: 293 words: 78,439

Dual Transformation: How to Reposition Today's Business While Creating the Future
by Scott D. Anthony and Mark W. Johnson
Published 27 Mar 2017

Netflix set to work building sophisticated inventory management systems to help ensure that people could get the DVDs they wanted when they wanted them. The company also invested heavily to build algorithms that predicted users’ desired content based on their ratings of movies they rented. The so-called recommendations engine is so critical to Netflix that in 2008 it announced a public contest wherein the team that most improved the performance of the engine would get $1 million, as long as they crossed a 10 percent improvement threshold. Two teams indeed crossed the threshold, with the winning team receiving a check from Hastings in 2009 (remarkably, that was the first time the team members met face-to-face; they had done their work virtually).

There are others, of course, such as InterActive Corp (worth about $6 billion as of this writing), which runs a collection of websites such as Match.com, About.com, and The Daily Beast; travel recommendation site TripAdvisor (worth $10 billion); real estate platform Zillow ($1.5 billion); coupon disruptor Groupon ($3 billion); local recommendations engine Yelp ($2 billion), and listicle and algorithmic innovator BuzzFeed ($1.5 billion). As of late 2016, the dozen companies here had created almost $1 trillion in market value. FIGURE 3-1 Transformation B Just because newspaper publishers didn’t create these companies doesn’t mean they couldn’t have created them.

See also curiosity capabilities link and, 74–75 disruption as, 8–12, 47–50 focusing on highest-potential, 141–142 leaders on, 196–197 stopping exploration of, 126–127 strategic opportunity areas and, 123–127 Optus, 145, 147–148, 149 Orange Is the New Black, 35 O’Reilly, Charles III, 53, 54 outsiders, involving in decision making, 109–110 overshooting, 103 Palo Alto Research Center (PARC), 13, 31 Pandesic, 78–79 parable of the eleventh floor, 77 Pathway, 58 patientslikeme.com, 60 PayPal, 200 Paytm, 202 Pearson, 67 penicillin, 139 periphery, spotting warning signs from the, 107–108 Perry, Tyler, 98 Pfizer, 17, 22, 138–139 Pharmacyclics, 19 Photoshop Express, 32 Pixar Animation Studios, 3–4 planning fallacy, 120 Playing to Win (Martin and Lafley), 124 Plunify, 72, 74 Porter, Michael, 99–100, 177 portfolio management systems, 80–82 Potemkin portfolios, 120 potential estimating current operations’, 118–119 estimating existing investments’, 119 problem solving approaches, 140–141 Procter & Gamble (P&G), 23, 64, 109 capabilities identification at, 79–80 innovation at, 146 predictability versus innovation at, 137–138 Professional Golfers’ Association, 99 Project ET, 127–128 Psychology Today, 177 purpose, 175–179 leaders on, 194–195 QQ, 106 Quantum Solutions, 51, 52 Quattro Wireless, 67 Quicken, 132–133 Qwikster, 94 Rakuten Group, 143 recommendations engine, Netflix, 33–35 reinvention, 42–43 Reminder app, 152 repositioning, 12, 27–45. See also transformation A reinvention versus, 42–43 Research in Motion (RIM), 4 revenue models, 40–41 reverse mentors, 151 Ricks College, 37, 44, 170. See also Brigham Young University-Idaho (BYU-Idaho) Ries, Eric, 65, 153 risk management early warning signs of disruption and, 102–113 growth gap determination and, 120–121 through experimentation, 64–66 toolkit for, 218–219 Ronn, Karl, 109 Rotman School of Management, 140 Rubin, Andy, 4 Rumelt, Richard P., 78, 116 Safaricom, 201 sales careful management of, 45 salesforce and, 77 Salesforce.com, 27–28, 151 The Salt Lake Tribune, 8.

pages: 283 words: 78,705

Principles of Web API Design: Delivering Value with APIs and Microservices
by James Higginbotham
Published 20 Dec 2021

The API design will incorporate internal technology decisions, sometimes to the point of requiring familiarity with a particular database or cloud vendor. For example, a public API product for a recommendation engine required the understanding of Apache Lucene to use the API. The API accepts configuration files via an HTTP POST using the Lucene configuration file format to manage the recommendation engine. The leaking of internal implementation details to API consumers resulted in the need to become Apache Lucene experts, rather than experts in using the recommendation engine API. There is value in prototyping APIs or producing evolutionary API design through a mixture of code and design.

pages: 137 words: 38,925

The Death of Truth: Notes on Falsehood in the Age of Trump
by Michiko Kakutani
Published 17 Jul 2018

This sort of fringe content can both affect how people think and seep into public policy debates on matters like vaccines, zoning laws, and water fluoridation. Part of the problem is an “asymmetry of passion” on social media: while most people won’t devote hours to writing posts that reinforce the obvious, DiResta says, “passionate truthers and extremists produce copious amounts of content in their commitment to ‘wake up the sheeple.’ ” Recommendation engines, she adds, help connect conspiracy theorists with one another to the point that “we are long past merely partisan filter bubbles and well into the realm of siloed communities that experience their own reality and operate with their own facts.” At this point, she concludes, “the Internet doesn’t just reflect reality anymore; it shapes it.” 5 THE CO-OPTING OF LANGUAGE Without clear language, there is no standard of truth.

And a Facebook ad called “Secured Borders,” showing a sign saying, “No Invaders Allowed.” “The strategy is to take a crack in our society and turn it into a chasm,” said Senator Angus King of Maine during a Senate Intelligence Committee hearing on Russian interference in the election. Reporting from several publications found that YouTube’s recommendation engine seemed to be steering viewers toward divisive, sensationalistic, and conspiracy-minded content. And Twitter found that more than fifty thousand Russia-linked accounts on its platform were posting material about the 2016 election. A report from Oxford University found that in the run-up to the election the number of links on Twitter to “Russian news stories, unverified or irrelevant links to WikiLeaks pages, or junk news” exceeded the number of links to professionally researched and published news.

pages: 380 words: 118,675

The Everything Store: Jeff Bezos and the Age of Amazon
by Brad Stone
Published 14 Oct 2013

Over the next year, Miller tangled with the European divisions of Random House, Hachette, and Bloomsbury, the publisher of the Harry Potter series. “I did everything I could to screw with their performance,” he says. He took selections of their catalog to full price and yanked their books from Amazon’s recommendation engine; with some titles, like travel books, he promoted comparable books from competitors. Miller’s constant search for new points of leverage exploited the anxieties of neurotic authors who obsessively tracked sales rank—the number on Amazon.com that showed an author how well his or her book was doing compared to other products on the site.

Amazon approached large publishers aggressively. It demanded accommodations like steeper discounts on bulk purchases, longer periods to pay its bills, and shipping arrangements that leveraged Amazon’s discounts with UPS. To publishers that didn’t comply, Amazon threatened to pull their books out of its automated personalization and recommendation systems, meaning that they would no longer be suggested to customers. “Publishers didn’t really understand Amazon. They were very naïve about what was going on with their back catalog,” says Goss. “Most didn’t know their sales were up because their backlist was getting such visibility.” Amazon had an easy way to demonstrate its market power.

pages: 1,172 words: 114,305

New Laws of Robotics: Defending Human Expertise in the Age of AI
by Frank Pasquale
Published 14 May 2020

On the other hand, they do not have much competition, so there is little reason to fear user defection. Meanwhile, bots inflate platforms’ engagement numbers, the holy grail for digital marketers. In 2012, YouTube “set a company-wide objective to reach one billion hours of viewing a day, and rewrote its recommendation engine to maximize for that goal.”67 “The billion hours of daily watch time gave our tech people a North Star,” said its CEO, Susan Wojcicki. Unfortunately for YouTube users, that single-minded fixation on metrics also empowered bad actors to manipulate recommendations and drive traffic to dangerous misinformation, as discussed above.

Professionals in health and education also owe clear and well-established legal and ethical duties to patients and students. These standards are only beginning to emerge among technologists. Thus, in the case of media and journalism—the focus of Chapter 4—a concerted corrective effort will be necessary to compensate for what is now a largely automated public sphere. When it comes to advertising and recommendation systems—the lifeblood of new media—AI’s advance has been rapid. Reorganizing commercial and political life, firms like Facebook and Google have deployed AI to make the types of decisions made by managers at television networks or editors at newspapers—but with much more powerful effects. The reading and viewing habits of hundreds of millions of people have been altered by such companies.

pages: 247 words: 81,135

The Great Fragmentation: And Why the Future of All Business Is Small
by Steve Sammartino
Published 25 Jun 2014

Creative types Collaboration, creative orientation and counter intuition Note Chapter 6: Demographics is history: moving on from predictive marketing How to get profiled The price of pop culture The best average The weapon of choice Don’t fence me in How do you define a teenager? Stealing music or connecting? Marketing 1.0 Marketing revised The new intersection Social + interests = intention The story of cities Do I know you? The interest graph in action The anti-demographic recommendation engine Chapter 7: The truth about pricing: technology and omnipresent deflation Technology deflation Real-world technology deflation The free super computer The crux is human It’s getting quicker Technology curve jumping Technology stacking Omnipresent deflation Consumer price index trickery Connections and the impact on prices Economic border hopping The new minimum wage Notes Chapter 8: A zero-barrier world: how access to knowledge is breaking down barriers So what’s changed?

They helped a person, which is a very different approach. It seems old-school BMXers are a little bit smarter than old-school marketers. What a great way to build a community; one that I’m now a part of. While everyone gets enamoured with ‘big data’, there’s probably a lot more we can do with ‘little data’. The anti-demographic recommendation engine A lot of e-commerce platforms and social-media engines seem to be able to do what mainstream marketers could never quite pull off. Every day, I’m exposed to products and services that I have zero interest in ever purchasing, mainly due to the laziness of the marketers who allocate the budget behind them.

It’s always spot on, sitting perfectly in the centre of my personal interest graph, based on the simplicity of what I’ve bought, looked at, wish listed and what others have in their list when there are overlaps. For me personally, it’s very accurate indeed. What’s interesting is that this recommendation engine is what I’d coin an ‘anti-demographic’ profiler: It doesn’t care what sex I am. It doesn’t care where I live. It doesn’t care or know how much I earn. It doesn’t care if I finished school. None of this matters. What matters is the direct connection and the reality of my interests based on my digital footprint.

pages: 286 words: 87,401

Blitzscaling: The Lightning-Fast Path to Building Massively Valuable Companies
by Reid Hoffman and Chris Yeh
Published 14 Apr 2018

Climbing the learning curve for these tasks was painful and expensive, but it gave Netflix a competitive advantage over its competitors. Later, as broadband connections became more widespread, Netflix had to climb the learning curve when building out its massive streaming infrastructure while continuing to improve its consumer recommendation engine. That was when Netflix began running into a major strategic issue. Netflix relied on the studios for its content (movies and TV shows), but the studios now saw online video companies like YouTube and Netflix as a threat. In response, they began to increase the price they demanded from Netflix for licensing their content and held back some of their “crown jewels” (e.g., massively popular content like Saturday Night Live) for themselves and Hulu (an industry joint venture).

Today, Netflix might very well be the leader in original video content, and even traditional Hollywood power players, such as superproducer Shonda Rhimes (Grey’s Anatomy, Scandal, How to Get Away with Murder) and comedian Adam Sandler (Happy Gilmore, Grown Ups), have switched from traditional studios to Netflix. What’s more, the other learning curves that Netflix climbed along the way actually helped it beat the studios at their own game. The consumer recommendation engine gives Netflix an unprecedented ability to predict what content its users want to watch, which allows it to work with creators to produce that content (such as the popular drama Stranger Things). And because Netflix has greater confidence in its own predictions than its competitors have in theirs, it can outbid them for content when they go head-to-head.

The challenge was figuring out how to develop a daily use case that helped LinkedIn users with their professional lives and encouraged them to use the service continuously rather than just when they were looking to switch jobs or hire a new employee. We tried a number of single-threaded efforts to meet the challenge. We rolled out features one after another, such as a recommendation engine for people that our users should meet and a professional Q&A service. None of them worked well enough to solve the problem. We concluded that the problem might require a Swiss Army knife approach with multiple use cases for multiple groups of users. After all, some people might want a news feed, some might want to track their career progress, and some might be keen on continuing education.

pages: 567 words: 122,311

Lean Analytics: Use Data to Build a Better Startup Faster
by Alistair Croll and Benjamin Yoskovitz
Published 1 Mar 2013

Shoppers start with an external search and then bounce back and forth from sites they visit to their search results, seeking the scent of what they’re after. Once they find it, on-site navigation becomes more important. This means on-site funnels are somewhat outdated; keywords are more important. Retailers use recommendation engines to anticipate what else a buyer might want, basing their suggestions on past buyers and other users with similar profiles. Few visitors see the same pages as one another. Retailers are always optimizing performance, which means that they’re segmenting traffic. Mid- to large-size retailers segment their funnel by several tests that are being run to find the right products, offers, and prices.

Revenue per customer: The lifetime value of each customer.
Top keywords driving traffic to the site: Those terms that people are looking for, and associate with you—a clue to adjacent products or markets.
Top search terms: Both those that lead to revenue, and those that don’t have any results.
Effectiveness of recommendation engines: How likely a visitor is to add a recommended product to the shopping cart.
Virality: Word of mouth, and sharing per visitor.
Mailing list effectiveness: Click-through rates and ability to make buyers return and buy.

More sophisticated retailers care about other metrics such as the number of reviews written or the number considered helpful, but this is really a secondary business within the organization, and we’ll deal with these when we look at the user-generated content model in Chapter 12.

We’re not going to get into the details of search engine optimization and search engine marketing here—those are worlds unto themselves. For now, realize that search is a significant part of any e-commerce operation, and the old model of formal navigational steps toward a particular page is outdated (even though it remains in many analytics tools). Recommendation Acceptance Rate Big e-commerce companies use recommendation engines to suggest additional items to visitors. Today, these engines are becoming more widespread thanks to third-party recommendation services that work with smaller retailers. Even bloggers have this kind of algorithm, suggesting other articles similar to the one the visitor is currently reading.
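Since the excerpt treats recommendation acceptance as a metric, a quick hypothetical calculation may help: the rate is simply recommendations accepted (added to cart) over recommendations shown. A minimal sketch with made-up events:

```python
# Hypothetical recommendation events; "added_to_cart" marks accepted suggestions.
events = [
    {"recommended": "socks", "added_to_cart": True},
    {"recommended": "belt", "added_to_cart": False},
    {"recommended": "hat", "added_to_cart": False},
    {"recommended": "scarf", "added_to_cart": True},
]

acceptance_rate = sum(e["added_to_cart"] for e in events) / len(events)
print(f"Recommendation acceptance rate: {acceptance_rate:.0%}")   # 50%
```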

pages: 420 words: 130,503

Actionable Gamification: Beyond Points, Badges and Leaderboards
by Yu-Kai Chou
Published 13 Apr 2015

Accompanying the Alfred Effect is Amazon’s Recommendation Engine, now infamous in the personalization industry. Amazon’s recommendation engine, according to Amazon themselves, led to 30% of their sales5. That’s a fairly significant factor for a company that is already making billions of dollars every month. In fact, JP Mangalindan, a writer for Fortune and CNN money, argues that a significant part of Amazon’s 29% sales growth from the second fiscal quarter of 2011 to the second fiscal quarter of 2012 was attributed to the recommendation engine.6 And what does this recommendation engine look like? “Customers Who Bought This Item Also Bought.”

We Are the Nerds: The Birth and Tumultuous Life of Reddit, the Internet's Culture Laboratory
by Christine Lagorio-Chafkin
Published 1 Oct 2018

The notebook contained some typical college scribbles (“I’m sorry I’ll shut up now”) and doodles (3-D cubes, a penis) he’d made during class at UVA, and some coursework notes, too, but on this day it transformed into a place where Huffman would document the origins of, and his progress on, their new, as yet unnamed project. “The site people go to find something new,” Huffman wrote in blue pen. “Points for being the first to recommend,” he also wrote, likely transcribing Graham’s exact words regarding building a recommendation engine before any of their preexisting competitors could. The recommendation engine was integral to the success of this hypothetical project, Graham thought, because one would need to dangle a carrot for users to entice them to post links in the first place, and then to return again and again to discover and share. Discover and share.

It had been a longtime and significant priority of Huffman’s to keep the site loading quickly and, for users, functioning well (programmers call this keeping a site “perky”). Thanks to numerous small changes and additions to its functionality over the past years, the codebase had become unwieldy. Plus, there were portions of code that were now unused, features built and never launched, or pulled back on, such as the complex recommendation engine Paul Graham had pushed so hard for at Reddit’s inception. With a team of four in place in the conference room overlooking SoMa’s tech-company epicenter, it felt good. Reddit was ready to grow. Huffman and Slowe felt proud that they’d learned to navigate Condé Nast human resources well enough to hire, which allowed them finally to get ahead of the game on site maintenance.

Huffman’s long-standing trust in Slowe was so deep that when Slowe returned to Reddit, Huffman said his mandate was simply: “Go do stuff, Chris.” Slowe dug into how Reddit’s homepage functions for various users, dubbing the project “Relevance.” Updating the homepage algorithm led him to revisit the recommendation engine project they’d worked on eleven years before. Soon, he added another major project to his plate: overseeing a department that would be dubbed “anti-evil.” It would build specific tools for use by the secretive trust and safety team, and essentially be its programming counterpart. As new engineers were hired, more were handed over to Slowe to build robust antispam systems.

pages: 215 words: 55,212

The Mesh: Why the Future of Business Is Sharing
by Lisa Gansky
Published 14 Oct 2010

As the service developed, the company added layers of information to inform a user’s choices, such as reviews from people in the network whose profile of selections and ratings were similar. Recently, it sponsored a contest awarding a million dollars to anyone who could significantly improve the movie recommendation service. Thousands of teams from more than a hundred nations competed. Netflix’s “recommendation engine” relies on algorithms culled from masses of data collected on the Web, including that provided directly by customers. The lesson learned from the contest, according to the New York Times, was the power of collaboration, as winning teams began sharing ideas and information: “The formula for success was to bring together people with complementary skills and combine different methods of problem solving.”

See Social networking starting Mesh company Sweet Spot trends influencing growth of trust building Millennial generation Mobile networks digital translation to physical and flash branding as foundation of the Mesh share-based business operation users, increase in Modular design Mohsenin, Kamran Movie rentals online, Mesh companies Mozilla Firefox Music-based businesses, Mesh companies Natural ecosystem, relationship to Mesh ecosystem Netflix annual sales as information business Mesh strategy perfection recommendation engine recommendations Network effect Niche markets for maintaining/servicing products Mesh companies opening, reason for sharing as North Portland Tool Library (NPTL) Ofoto Olapic Ombudsman Open Architecture Network Open Design Open innovation service provider Open networks advantages of Architecture for Humanity communal IP concept and marketing products openness versus proprietary approach and product improvement software development OpenTable O’Reilly, Tim Ostrom, Elinor Own-to-Mesh model car-sharing services profits, generation from retirees as customers Partnerships characteristics of corporations and Mesh companies income generation from in Mesh ecosystem unexpected value of Patagonia recycled textiles of Walmart partnership Paul, Sunil Payne, Steven Peer-to-peer lending.

pages: 334 words: 102,899

That Will Never Work: The Birth of Netflix and the Amazing Life of an Idea
by Marc Randolph
Published 16 Sep 2019

For example: Say I rented (and loved) Pleasantville, one of the best movies of 1998 and a clever dark comedy about what happens when two teenagers from the nineties (Tobey Maguire and Reese Witherspoon) are sucked into a black-and-white television show set in 1950s small-town America. The ideal recommendation engine would be able to steer me away from more current new releases and toward other movies, like Pleasantville—movies like Doc Hollywood. That was a tall order. The thing about taste is that it’s subjective. And the number of factors in play, when trying to establish similarities between films, is almost endless.

After that, Reed’s team went to work integrating these taste predictions into a broader algorithm that made movie recommendations after weighing a number of factors—keyword, number of copies, number of copies in stock, cost per disc. The result—which launched in February of 2000 as Cinematch—was a seemingly more intuitive recommendation engine, one that outsourced qualitative assessment to users while also optimizing things on the back end. In many ways, it was the best of both worlds: an automated system that nonetheless felt human, like a video store clerk asking you what you’d seen lately and then recommending something he knew you’d like—and that he had in stock.
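As a rough sketch of what "weighing a number of factors" could look like (illustrative weights and numbers only; this is not Netflix's actual Cinematch formula), the following blends a predicted taste score with the inventory and cost factors the passage names:

```python
# A hypothetical weighted-scoring sketch: taste prediction plus operational factors.
def score(movie, taste_prediction, weights=None):
    w = weights or {"taste": 0.7, "availability": 0.2, "cost": 0.1}
    availability = movie["copies_in_stock"] / max(movie["copies_total"], 1)
    cheapness = 1.0 / (1.0 + movie["cost_per_disc"])   # cheaper discs score higher
    return (w["taste"] * taste_prediction
            + w["availability"] * availability
            + w["cost"] * cheapness)

catalog = {   # made-up inventory numbers
    "Pleasantville": {"copies_total": 40, "copies_in_stock": 5, "cost_per_disc": 18.0},
    "Doc Hollywood": {"copies_total": 25, "copies_in_stock": 20, "cost_per_disc": 6.0},
}
taste = {"Pleasantville": 0.9, "Doc Hollywood": 0.8}   # hypothetical predicted taste, 0..1

ranked = sorted(catalog, key=lambda m: score(catalog[m], taste[m]), reverse=True)
print(ranked)   # a well-stocked, cheaper title can outrank a slightly better taste match
```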

We’d continued on our streak of making major talent hires—the most recent being Leslie Kilgore, whom Reed had convinced to leave Amazon to head our marketing efforts as CMO, and Ted Sarandos, who now managed our content acquisition. Since walking away from à la carte rentals, our no-due-dates, no-late-fees program had steadily built up steam. Users loved Cinematch, our recommendation engine. We did, too. It kept our subscribers’ queues full—and nothing, we found, correlated more to retention than a queue with lots of movies in it. We were now approaching nearly 200,000 paying subscribers. Our other metrics were looking pretty impressive as well. We now carried 5,800 different DVD titles and shipped more than 800,000 discs a month, and our warehouse was packed with more than a million discs.

pages: 414 words: 117,581

Binge Times: Inside Hollywood's Furious Billion-Dollar Battle to Take Down Netflix
by Dade Hayes and Dawn Chmielewski
Published 18 Apr 2022

And when they want another one, they’ll just mail it back and we’ll replace it. There’ll be no due dates and no late fees.” The service Netflix introduced in 1999 changed the struggling startup’s fortunes, attracting 239,000 subscribers, winning loyalty from those who appreciated not only its novel approach to DVD rentals but also its recommendation engine and the community of cinephiles gathered around its website. At the time, prior to the arrival of social media, chat rooms and message boards were the primary means of expression. Netflix subscribers could build “queues” of desired rental titles and trade reviews with other subscribers. Compared with Blockbuster, whose khaki-and-blue-shirt staff uniforms and regimented aisles were directly inspired by mass brands like McDonald’s, Netflix emphasized the individual.

Its relentless focus on delivering what people want to watch, and its multilayered approach to understanding individual consumer preferences, is something that sets it apart from its Hollywood studio rivals. The traditional focus of entertainment companies has been convincing consumers to tune in at a certain hour or to show up in theaters on a particular weekend. Netflix saw its role as that of matchmaker, not carnival barker. The company launched its first recommendation engine, Cinematch, in February 2000 to help subscribers navigate a library of five thousand movie titles that was too unwieldy to browse. Six years later, it held a closely watched contest to boost the accuracy of its recommendations by 10 percent. Netflix dangled a $1 million prize, though the ultimate lure for nerds was access to a data set of over 100 million ratings on 17,700 movies from 480,189 customers.

As Netflix’s content flowed onto millions of screens around the world, it invested deeply in local-language production to attract subscribers from Darfur to Kuala Lumpur. Netflix discovered its shows effortlessly traveled the borderless world of the internet, propelled by local-language dubbing and its recommendation engine. The German time-travel series Dark, the postapocalyptic Danish series The Rain, India’s crime thriller Sacred Games, and France’s action mystery Lupin would find audiences well beyond their countries of origin. Meanwhile, veteran studio executive Scott Stuber launched Netflix’s pursuit of a Best Picture Oscar with Roma, director Alfonso Cuarón’s sumptuous black-and-white portrait of a domestic worker set in 1970s Mexico City.

pages: 58 words: 12,386

Big Data Glossary
by Pete Warden
Published 20 Sep 2011

It comes with algorithms to perform a lot of common tasks, like clustering and classifying objects into groups, recommending items based on other users’ behaviors, and spotting attributes that occur together a lot. In practical terms, the framework makes it easy to use analysis techniques to implement features such as Amazon’s “People who bought this also bought” recommendation engine on your own site. It’s a heavily used project with an active community of developers and users, and it’s well worth trying if you have any significant amount of transaction or similar data that you’d like to get more value out of.

Introducing Mahout
Using Mahout with Cassandra

scikits.learn
It’s hard to find good off-the-shelf tools for practical machine learning.

pages: 222 words: 70,132

Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy
by Jonathan Taplin
Published 17 Apr 2017

When Thefacebook really started to grow, in the late spring of 2004, Zuckerberg and his right-hand man, Dustin Moskovitz, decided to go to Silicon Valley for the summer. Zuckerberg had met Sean Parker in a Chinese restaurant in New York in May and had been awed by his outlaw tales of Napster. Zuckerberg had written a music-recommendation engine while he was a senior at Exeter, and so Napster loomed large in his notion of hipness. When the two men got to Palo Alto in June, they ran into Parker, who was essentially homeless, having been thrown out of his latest company, Plaxo, an online address-book application. It is a tribute to Zuckerberg’s naive trust that he invited Parker to live in the house he and Moskovitz had rented.

They had no time for politics or even for wondering why their horizons were so narrow. The kids attending DigiTour would fit right into the plot of Brave New World. The Internet’s self-curated view from everywhere has the amazing ability to distract us in trivial pursuits, narrow our choices, and keep us safe in a balkanized suburb of our own taste. Search engines and recommendation engines constantly favor the most popular options and constantly make our discovery more limited. I began this chapter wondering whether technology was robbing us of some of our essential humanity. Google’s chief technologist proclaims that technology will “allow us to transcend these limitations of our biological bodies and brains.… There will be no distinction, post-Singularity, between human and machine.”

pages: 274 words: 75,846

The Filter Bubble: What the Internet Is Hiding From You
by Eli Pariser
Published 11 May 2011

In a memo for fellow progressives, Mark Steitz, one of the primary Democratic data gurus, recently wrote that “targeting too often returns to a bombing metaphor—dropping messages from planes. Yet the best data tools help build relationships based on observed contacts with people. Someone at the door finds out someone is interested in education; we get back to that person and others like him or her with more information. Amazon’s recommendation engine is the direction we need to head.” The trend is clear: We’re moving from swing states to swing people. Consider this scenario: It’s 2016, and the race is on for the presidency of the United States. Or is it? It depends on who you are, really. If the data says you vote frequently and that you may have been a swing voter in the past, the race is a maelstrom.

Quora Forum, accessed Dec. 17, 2010, www.quora.com/Facebook-company/Whats-the-history-of-the-Awesome-Button-that-eventually-became-the-Like-button-on-Facebook. 151 “against the cruise line industry”: Hollis Thomases, “Google Drops Anti-Cruise Line Ads from AdWords,” Web Ad.vantage, Feb. 13, 2004, accessed Dec. 17, 2010, www.webadvantage.net/webadblog/google-drops-anti-cruise-line-ads-from-adwords-338. 151–52 identify who was persuadable: “How Rove Targeted the Republican Vote,” Frontline, accessed Feb. 8, 2011, www.pbs.org/wgbh/pages/frontline/shows/architect/rove/metrics.html. 152 “Amazon’s recommendation engine is the direction”: Mark Steitz and Laura Quinn, “An Introduction to Microtargeting in Politics,” accessed Dec. 17, 2010, www.docstoc.com/docs/43575201/An-Introduction-to-Microtargeting-in-Politics. 153 round-the-clock “war room”: “Google’s War Room for the Home Stretch of Campaign 2010,” e.politics, Sept. 24, 2010, accessed Feb. 9, 2011, www.epolitics.com/2010/09/24/googles-war-room-for-the-home-stretch-of-campaign-2010/. 155 “campaign wanted to spend on Facebook”: Vincent R.

pages: 308 words: 84,713

The Glass Cage: Automation and Us
by Nicholas Carr
Published 28 Sep 2014

Thanks to the proliferation of smartphones, tablets, and other small, affordable, and even wearable computers, we now depend on software to carry out many of our daily chores and pastimes. We launch apps to aid us in shopping, cooking, exercising, even finding a mate and raising a child. We follow turn-by-turn GPS instructions to get from one place to the next. We use social networks to maintain friendships and express our feelings. We seek advice from recommendation engines on what to watch, read, and listen to. We look to Google, or to Apple’s Siri, to answer our questions and solve our problems. The computer is becoming our all-purpose tool for navigating, manipulating, and understanding the world, in both its physical and its social manifestations. Just think what happens these days when people misplace their smartphones or lose their connections to the net.

Automated essay-grading algorithms encourage in students a rote mastery of the mechanics of writing. The programs are deaf to tone, uninterested in knowledge’s nuances, and actively resistant to creative expression. The deliberate breaking of a grammatical rule may delight a reader, but it’s anathema to a computer. Recommendation engines, whether suggesting a movie or a potential love interest, cater to our established desires rather than challenging us with the new and unexpected. They assume we prefer custom to adventure, predictability to whimsy. The technologies of home automation, which allow things like lighting, heating, cooking, and entertainment to be meticulously programmed, impose a Taylorist mentality on domestic life.

pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It)
by Salim Ismail and Yuri van Geest
Published 17 Oct 2014

At the heart of this staggering growth was the PageRank algorithm, which ranks the popularity of web pages. (Google doesn’t gauge which page is better from a human perspective; its algorithms simply respond to the pages that deliver the most clicks.) Google isn’t alone. Today, the world is pretty much run on algorithms. From automotive anti-lock braking to Amazon’s recommendation engine; from dynamic pricing for airlines to predicting the success of upcoming Hollywood blockbusters; from writing news posts to air traffic control; from credit card fraud detection to the 2 percent of posts that Facebook shows a typical user—algorithms are everywhere in modern life. Recently, McKinsey estimated that of the seven hundred end-to-end bank processes (opening an account or getting a car loan, for example), about half can be fully automated.

Amazon regularly makes long bets (e.g., Amazon Web Services, Kindle, and now Fire smartphones and delivery drones), views new products as if they are seedlings needing careful tending for a five-to-seven-year period, is maniacal about growth over profits and ignores the short-term view of Wall Street analysts. Its pioneering initiatives include its Affiliate Program, its recommendation engine (collaborative filtering) and the Mechanical Turk project. As Bezos says, “If you’re competitor-focused, you have to wait until there is a competitor doing something. Being customer-focused allows you to be more pioneering.” Not only has Amazon built ExOs on its edges (such as AWS), it also has had the courage to cannibalize its own products (e.g., Kindle).

pages: 301 words: 85,263

New Dark Age: Technology and the End of the Future
by James Bridle
Published 18 Jun 2018

YouTube’s official guidelines state that the site is for ages thirteen and up, with parental permission required for those below eighteen, but there’s nothing to stop a thirteen-year-old accessing it. Moreover, there’s no need to have an account at all; like most websites, YouTube tracks unique visitors by their address, browser and device profile, and behaviour, and it can build a detailed demographic and preference profile to feed the recommendation engines without the viewer ever consciously submitting any information about themselves. That applies even if the viewer is a three-year-old child plonked in front of their parent’s iPad and mashing the screen with a balled-up fist. The frequency with which such a situation occurs is obvious in the site’s own viewer statistics.

Whatever agency is at play here is far from clear: the video starts with a trollish Peppa parody, but later syncs into the kind of automated repetition of tropes we’ve seen before. It’s not just trolls, or just automation; it’s not just human actors playing out an algorithmic logic, or algorithms mindlessly responding to recommendation engines. It’s a vast and almost completely hidden matrix of interactions between desires and rewards, technologies and audiences, tropes and masks. Other examples seem less accidental, and more intentional. One whole strand of video production involves automated recuts of video game footage, reprogrammed with superheroes or cartoon characters instead of soldiers and gangsters.

pages: 328 words: 84,682

The Business of Platforms: Strategy in the Age of Digital Competition, Innovation, and Power
by Michael A. Cusumano , Annabelle Gawer and David B. Yoffie
Published 6 May 2019

Think about how Amazon, founded by Jeff Bezos in 1994, expanded from being an online store selling books to an online store selling nearly everything, from electronics products to groceries, and with same-day delivery for some products.38 Even in the early days, Amazon used digital technology to promote online store sales, building a recommendation engine and collecting user evaluations. One estimate is that 40 percent of Amazon’s sales today come through its recommendation engine.39 Then, in the late 1990s, Bezos added the global Amazon Marketplace—what we have called a transaction platform—linking buyers and third-party sellers. Amazon combined the marketplace with its own online store and other fulfillment services, such as billing and shipping, in addition to a massive network of physical warehouses.

pages: 88 words: 25,047

The Mathematics of Love: Patterns, Proofs, and the Search for the Ultimate Equation
by Hannah Fry
Published 3 Feb 2015

And that’s it – apply this algorithm to the hundreds of available questions and repeat for each of the millions of users on OkCupid and you’ve got everything you need for one of the world’s most successful dating websites. It’s one of the most elegant approaches ever attempted to pairing couples based on their personal preferences. Together with eHarmony and other similar websites, OkCupid sits alongside Amazon and Netflix as one of the most widely used recommendation engines on the internet. But there’s one problem – if the internet is the ultimate matchmaker, why are people still going on terrible dates? If the science is so good, surely that first date will be the last first date of your life? Shouldn’t the algorithm be able to deliver the perfect partner and leave it at that?
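
For readers curious what such a matching algorithm can look like in code, here is a hedged sketch along the lines OkCupid has described publicly: each person's answers are scored against the other's stated preferences and importance weights, and the match percentage is the geometric mean of the two satisfaction scores. The questions, answers, and weight values below are invented for illustration and are not the site's actual data.

    from math import sqrt

    # Importance weights of the kind OkCupid has described (values illustrative).
    WEIGHTS = {"irrelevant": 0, "a little important": 1,
               "somewhat important": 10, "very important": 50,
               "mandatory": 250}

    def satisfaction(answers_a, prefs_b):
        """Fraction of B's importance-weighted preferences that A's answers satisfy."""
        earned = possible = 0
        for question, (accepted, importance) in prefs_b.items():
            weight = WEIGHTS[importance]
            possible += weight
            if answers_a.get(question) in accepted:
                earned += weight
        return earned / possible if possible else 0.0

    def match_percentage(answers_a, prefs_a, answers_b, prefs_b):
        # Geometric mean of "A satisfies B" and "B satisfies A".
        return 100 * sqrt(satisfaction(answers_a, prefs_b) * satisfaction(answers_b, prefs_a))

    # Hypothetical two-question example.
    alice_answers = {"smoker": "no", "wants_kids": "yes"}
    alice_prefs = {"smoker": ({"no"}, "mandatory"), "wants_kids": ({"yes"}, "somewhat important")}
    bob_answers = {"smoker": "no", "wants_kids": "maybe"}
    bob_prefs = {"smoker": ({"no"}, "very important")}
    print(round(match_percentage(alice_answers, alice_prefs, bob_answers, bob_prefs)))  # ~98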

pages: 366 words: 94,209

Throwing Rocks at the Google Bus: How Growth Became the Enemy of Prosperity
by Douglas Rushkoff
Published 1 Mar 2016

The information superhighway morphed into an interactive strip mall; digital technology’s ability to connect people to products, facilitate payments, and track behaviors led to all sorts of new marketing and sales innovations. “Buy” buttons triggered the impulse for instant gratification, while recommendation engines personalized marketing pitches. It was commerce on crack. With a few notable exceptions—such as eBay and Etsy—we didn’t really get a return of the many-to-many marketplace or digital bazaar. No, in online commerce it’s mostly a few companies selling to many, and many people selling to the very few—if anyone at all.

Amazon then leveraged its monopoly in books and free shipping to develop monopolies in other verticals, beginning with home electronics (bankrupting Circuit City and Best Buy), and then every other link in the physical and virtual fulfillment chain, from shoes and food to music and videos. Finally, Amazon flips into personhood by reversing the traditional relationship between people and machines. Amazon’s patented recommendation engines attempt to drive our human selection process. Amazon Mechanical Turks gave computers the ability to mete out repetitive tasks to legions of human drones. The computers did the thinking and choosing; the people pointed and clicked as they were instructed or induced to do. Neither Amazon nor its founder, Jeff Bezos, is slipping to new lows here.

pages: 307 words: 88,180

AI Superpowers: China, Silicon Valley, and the New World Order
by Kai-Fu Lee
Published 14 Sep 2018

Does Amazon seem to know what you’ll want to buy before you do? If so, then you have been the beneficiary (or victim, depending on how you value your time, privacy, and money) of internet AI. This first wave began almost fifteen years ago but finally went mainstream around 2012. Internet AI is largely about using AI algorithms as recommendation engines: systems that learn our personal preferences and then serve up content hand-picked for us. The horsepower of these AI engines depends on the digital data they have access to, and there’s currently no greater storehouse of this data than the major internet companies. But that data only becomes truly useful to algorithms once it has been labeled.

See artificial intelligence (AI) AI engineers, 14 Airbnb, 39, 49, 73 AI revolution deep learning and, 5, 25, 92, 94, 143 economic impact of, 151–52 speed of, 152–55 AI winters, 6–7, 8, 9, 10 algorithmic bias, 229 algorithms, AI AI revolution and, 152–53 computing power and, 14, 56 credit and, 112–13 data and, 14, 17, 56, 138 fake news detection by, 109 intelligence sharing and, 87 legal applications for, 115–16 medical diagnosis and, 114–15 as recommendation engines, 107–8 robot reporting, 108 white-collar workers and, 167, 168 Alibaba Amazon compared to, 109 Chinese startups and, 58 City Brain, 93–94, 117, 124, 228 as dominant AI player, 83, 91, 93–94 eBay and, 34–35 financial services spun off from, 73 four waves of AI and, 106, 107, 109 global markets and, 137 grid approach and, 95 Microsoft Research Asia and, 89 mobile payments transition, 76 New York Stock Exchange debut, 66–67 online purchasing and, 68 success of, 40 Tencent’s “Pearl Harbor attack” on, 60–61 Wang Xing and, 24 Alipay, 35, 60, 69, 73–74, 75, 112, 118 Alphabet, 92–93 AlphaGo, 1–4, 5, 6, 11, 199 AlphaGo Zero, 90 Altman, Sam, 207 Amazon Alibaba compared to, 109 Chinese market and, 39 data captured by, 77 as dominant AI player, 83, 91 four waves of AI and, 106 grid approach and, 95 innovation mentality at, 33 monopoly of e-commerce, 170 online purchasing and, 68 Wang Xing and, 24 warehouses, 129–30 Amazon Echo, 117, 127 Amazon Go, 163, 213 Anderson, Chris, 130 Andreesen Horowitz, 70 Ant Financial, 73 antitrust laws, 20, 28, 171, 229 Apollo project, 135 app constellation model, 70 Apple, 33, 75, 117, 126, 143, 177, 184 Apple Pay, 75, 76 app-within-an-app model, 59 ARM (British firm), 96 Armstrong, Neil, 3 artificial general intelligence (AGI), 140–44 artificial intelligence (AI) introduction to, ix–xi See also China; deep learning; economy and AI; four waves of AI; global AI story; human coexistence with AI; new world order artificial superintelligence.

pages: 374 words: 94,508

Infonomics: How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage
by Douglas B. Laney
Published 4 Sep 2017

This information can also have real commercial value—especially when mashed with other sources—to understand and act on local or global market conditions, population trends, and weather, for example. Public data even can be used to create new (ahem) high-value businesses such as Potbot, a virtual cannabis “budtender.” At its core is a recommendation engine that uses information on strains, cannabinoids, and medical applications aggregated via semantic web technology. Potbot also incorporates data from cannabis seed DNA scans along with recordings of brain activity in clinical tests. It monetizes this information, not just in the form of a consumer app, but also in helping growers improve their yields for the most popular or beneficial strains.1617 Public data is most monetizable when integrated with your own proprietary information.

Even when embedded into business applications, they tend to present charts or numbers in an application window. Ideally, output is updated to reflect the user’s activity and needs, but less often is it used to affect the business process directly. Evolving to complex-event processing solutions, recommendation engines, rule-based systems, or artificial intelligence (AI), combined with business process management and workflow systems, can help to optimize business processes more directly, either supplementing or supplanting human intervention. Case in point: a company formed from a collection of shopping stalls in 1919 by an English trader named Jack Cohen today has hardwired its thousands of refrigeration units to a data warehouse.

pages: 94 words: 26,453

The End of Nice: How to Be Human in a World Run by Robots (Kindle Single)
by Richard Newton
Published 11 Apr 2015

Just as the sirens of legend sang sweet songs to lure sailors to crash on the rocky shore of their island, so Lanier thinks we must be wary of the attractions of the siren servers. They don’t want to make your life more complicated. They are there to make everything frictionless: “Leave it to me”, they sing. “I’ll find you new music you might like, books you’ll want to read, videos you want to watch and friends you should like.” We’re sort of used to the idea that recommendation engines work like this. We know that ads now follow us around the web and that books will be unhelpfully recommended to us by Amazon. But search results are also tailored to you. And that’s more of a concern. The search results you get will be different to the results for an identical search made by me.

Mining of Massive Datasets
by Jure Leskovec , Anand Rajaraman and Jeffrey David Ullman
Published 13 Nov 2014

However, these technologies by themselves are not sufficient, and there are some new algorithms that have proven effective for recommendation systems.

9.1 A Model for Recommendation Systems

In this section we introduce a model for recommendation systems, based on a utility matrix of preferences. We introduce the concept of a “long-tail,” which explains the advantage of on-line vendors over conventional, brick-and-mortar vendors. We then briefly survey the sorts of applications in which recommendation systems have proved useful.

9.1.1 The Utility Matrix

In a recommendation-system application there are two classes of entities, which we shall refer to as users and items.
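
A tiny illustration of the utility matrix just described, with made-up ratings on a 1-to-5 scale; the blank (None) cells are exactly the unknown entries a recommendation system tries to estimate.

    # Rows are users, columns are items; None marks an unknown (blank) entry.
    items = ["HP1", "HP2", "TW", "SW1"]
    utility = {
        "A": {"HP1": 4, "TW": 5, "SW1": 1},
        "B": {"HP1": 5, "HP2": 5, "SW1": None},
        "C": {"TW": 2, "SW1": 4},
    }

    def known_ratings(user):
        """Return the (item, rating) pairs the user has actually provided."""
        return [(i, r) for i, r in utility[user].items() if r is not None]

    def unrated_items(user):
        """Items whose utility-matrix entry for this user is blank."""
        rated = {i for i, _ in known_ratings(user)}
        return [i for i in items if i not in rated]

    print(unrated_items("A"))  # ['HP2'] -- the entry a recommender would try to fill in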

Here, the term “on-line” refers to the nature of the algorithm, and should not be confused with “on-line” meaning “on the Internet” in phrases such as “on-line algorithms for on-line advertising.”

2 A chesterfield is a type of sofa. See, for example, www.chesterfields.info.
3 Thanks to Anna Karlin for this example.

9 Recommendation Systems

There is an extensive class of Web applications that involve predicting user responses to options. Such a facility is called a recommendation system. We shall begin this chapter with a survey of the most important examples of these systems. However, to bring the problem into focus, two good examples of recommendation systems are:

(1) Offering news articles to on-line newspaper readers, based on a prediction of reader interests.
(2) Offering customers of an on-line retailer suggestions about what they might like to buy, based on their past history of purchases and/or product searches.

Rather, it is only necessary to discover some entries in each row that are likely to be high. In most applications, the recommendation system does not offer users a ranking of all items, but rather suggests a few that the user should value highly. It may not even be necessary to find all items with the highest expected ratings, but only to find a large subset of those with the highest ratings.

9.1.2 The Long Tail

Before discussing the principal applications of recommendation systems, let us ponder the long tail phenomenon that makes recommendation systems necessary. Physical delivery systems are characterized by a scarcity of resources.

pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations
by Nicholas Carr
Published 5 Sep 2016

The information may take the form of personal messages or updates from friends or colleagues, broadcast messages from experts or celebrities whose opinions or observations we value, headlines and stories from writers or publications we like, alerts about the availability of various other sorts of content on favorite subjects, or suggestions from recommendation engines—but it all shares the quality of being tailored to our particular interests. It’s all needles. And modern filters don’t just organize that information for us; they push the information at us as alerts, updates, streams. We tend to point to spam as an example of information overload. But spam is just an annoyance.

Social media is a palliative for underemployment.
18. The philistine appears ideally suited to the role of cultural impresario online.
19. Television became more interesting when people started paying for it.
20. Instagram shows us what a world without art looks like.
SECOND SERIES (2013)
21. Recommendation engines are the best cure for hubris.
22. Vines would be better if they were one second shorter.
23. Hell is other selfies.
24. Twitter has revealed that brevity and verbosity are not always antonyms.
25. Personalized ads provide a running critique of artificial intelligence.
26. Who you are is what you do between notifications.
27.

pages: 382 words: 105,819

Zucked: Waking Up to the Facebook Catastrophe
by Roger McNamee
Published 1 Jan 2019

The ease with which like-minded extremists can find one another creates the illusion of legitimacy. Protected from real-world stigma, communication among extreme voices over internet platforms generally evolves to more dangerous language. Normalization lowers a barrier for the curious; algorithmic reinforcement leads some users to increasingly extreme positions. Recommendation engines can and do exploit that. For example, former YouTube algorithm engineer Guillaume Chaslot created a program to take snapshots of what YouTube would recommend to users. He learned that when a user watches a regular 9/11 news video, YouTube will then recommend 9/11 conspiracies; if a teenage girl watches a video on food dietary habits, YouTube will recommend videos that promote anorexia-related behaviors.

It is not for nothing that the industry jokes about YouTube’s “three degrees of Alex Jones,” referring to the notion that no matter where you start, YouTube’s algorithms will often surface a Jones conspiracy theory video within three recommendations. In an op-ed in Wired, my colleague Renée DiResta quoted YouTube chief product officer Neal Mohan as saying that 70 percent of the views on his platform are from recommendations. In the absence of a commitment to civic responsibility, the recommendation engine will be programmed to do the things that generate the most profit. Conspiracy theories cause users to spend more time on the site. Once a person identifies with an extreme position on an internet platform, he or she will be subject to both filter bubbles and human nature. A steady flow of ideas that confirm beliefs will lead many users to make choices that exclude other ideas both online and off.

pages: 428 words: 103,544

The Data Detective: Ten Easy Rules to Make Sense of Statistics
by Tim Harford
Published 2 Feb 2021

What sort of accountability or transparency we want depends on what problem we are trying to solve. We might, for example, want to distinguish YouTube’s algorithm for recommending videos from Netflix’s algorithm for recommending movies. There is plenty of disturbing content on YouTube, and its recommendation engine has become notorious for its apparent tendency to suggest ever more fringy and conspiratorial videos. It’s not clear that the evidence supports the idea that YouTube is an engine of radicalization, but without more transparency it’s hard to be sure.36 Netflix illustrates a different issue: competition.

See health and medical data public opinion, 149, 220 public transportation, 47–49 publication bias, 113–16, 118–23, 125–27 publicity, 107 Puerto Rico, 197–98, 200 Puy de Dôme, France, 172 Quetelet, Adolphe, 219 racial data, 176–79, 206 Random Walk down Wall Street, A (Malkiel), 125 randomized clinical trials, 4n, 53, 125–26, 133, 180 randomness, 123–24 Rapid Safety Feedback, 170–71 Rayner, Derek, 205–8 Reaper Man (Pratchett), 87 recessions, 11 recommendation engines, 181 record-keeping practices, 220–21 refugees, 191 Reifler, Jason, 129 Reischauer, Robert, 187 Reiter, Jonathan, 108 reliability of data, 233–37 religious authority, 16 religious beliefs, 247–48 Remington Rand, 244 replication/reproducibility studies and problems, 107, 112–16, 120–22, 129–31 Republican Party, 34, 189n, 269, 270 résumé-sorting algorithms, 166 ridership data, 49–51 Riecken, Henry, 239 risk models, 71 Rivlin, Alice, 186–87, 188, 212 Robinson, Nicholas, 168, 169–70 Roman Catholic Church, 16 Rönnlund, Anna Rosling, 62, 63 Roosevelt, Franklin Delano, 143–44 rose diagrams, 215–16, 233–36, 234 Roser, Max, 89, 96 Rosling, Hans, 63, 185 Ross, Lee, 35 Royal Naval Reserve, 218 Royal Society, 13 Royal Statistical Society, 194, 214, 219, 233 Rozenblit, Leonid, 272 Ruge, Mari, 89 sampling techniques, 135–38, 142–51, 155 Samuelson, Paul, 239 sanitation advocacy, 225–26, 233–37 Santos, Alexander, 198 Say It with Charts (Zelazny), 228 scale of statistical data, 92, 93–95, 103 Scarr, Simon, 231–32 Schachter, Stanley, 239 Scheibehenne, Benjamin, 106, 111, 114, 120–21 Scientific American, 102 scientific curiosity, 268–69 scientific literacy, 34–35 scientific method, 173 Scott, James C., 201, 203 Scott Brown, Denise, 217 screen-use studies, 117–18 Scutari (Üsküdar, Istanbul) barracks hospital, 213–14, 220, 225, 233, 235 search algorithms, 156–57 Second World War, 4, 262 secrecy, 174–75 Seehofer, Horst, 191 Seeing Like a State (Scott), 201, 203 selection bias, 2, 245–46.

pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together
by Nick Polson and James Scott
Published 14 May 2018

Each envelope would come back a few days after it had been sent out, along with the subscriber’s rating of the film on a 1-to-5 scale. As that ratings data accumulated, Netflix’s algorithms would look for patterns, and over time, subscribers would get better film recommendations. (This kind of AI is usually called a “recommender system”; we also like the term “suggestion engine.”) Netflix 1.0 was so focused on improving its recommender system that in 2007, to great fanfare among math geeks the world over, it announced a public machine-learning contest with a prize of $1 million. The company put some of its ratings data on a public server, and it challenged all comers to improve upon Netflix’s own system, called Cinematch, by at least 10%—that is, by predicting how you’d rate a film with 10% better accuracy than Netflix could.
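
The contest's 10 percent target was measured by root-mean-squared error (RMSE) on held-out ratings; the sketch below shows that yardstick with invented predictions, not the actual Cinematch numbers.

    from math import sqrt

    def rmse(predicted, actual):
        """Root-mean-squared error between predicted and actual ratings."""
        return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

    actual = [4, 3, 5, 2, 4]                      # ratings withheld for scoring (made up)
    cinematch_like = [3.6, 3.4, 4.1, 2.8, 3.7]    # baseline predictions (made up)
    challenger = [3.9, 3.1, 4.6, 2.3, 3.9]        # a competing model's predictions (made up)

    baseline = rmse(cinematch_like, actual)
    improvement = 100 * (baseline - rmse(challenger, actual)) / baseline
    print(f"improvement over baseline: {improvement:.1f}%")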

Abraham Wald never shot down a Messerschmitt or even saw the inside of a combat aircraft. Nonetheless, he made an outsized contribution to the Allied war effort using an equally potent weapon: conditional probability. Specifically, Wald built a recommender system that could make personalized survivability suggestions for different kinds of planes. At its heart, it was just like a modern AI-based recommender system for TV shows. And when you understand how he built it, you’ll also understand a lot more about Netflix, Hulu, Spotify, Instagram, Amazon, YouTube, and just about every tech company that’s ever made you an automatic suggestion worth following.

See health care and medicine Medtronic Menger, Karl Microsoft Microsoft Azure modeling assumptions and deep-learning models imputation and Inception latent feature massive models missing data and model rust natural language processing and prediction rules as reality versus rules-based (top-down) models training the model Moneyball Moore’s law Moravec paradox Morgenstern, Oskar Musk, Elon natural language processing (NLP) ambiguity and bottom-up approach chatbots digital assistants future trends Google Translate growth of statistical NLP knowing how versus knowing that natural language revolution “New Deal” for human-machine linguistic interaction prediction rules and programing language revolution robustness and rule bloat and speech recognition top-down approach word co-location statistics word vectors naturally occurring radioactive materials (NORM) Netflix Crown, The (series) data scientists history of House of Cards (series) Netflix Prize for recommender system personalization recommender systems neural networks deep learning and Friends new episodes and Inception model prediction rules and New England Patriots Newton, Isaac Nightingale, Florence coxcomb diagram (1858) Crimean War and early years and training evidence-based medicine legacy of “lady with the lamp” medical statistics legacy of nursing reform legacy of Nvidia Obama, Barack Office of Scientific Research and Development parallax pattern recognition cucumber sorting input and output learning a pattern maximum heart rate and prediction rules and toilet paper theft and See also prediction rules PayPal personalization conditional probability and latent feature models and Netflix and Wald’s survivability recommendations for aircraft and See also recommender systems; suggestion engines philosophy Pickering, Edward C.

Data Mining: Concepts and Techniques: Concepts and Techniques
by Jiawei Han , Micheline Kamber and Jian Pei
Published 21 Jun 2011

Data mining technology can be used to develop strong intrusion detection and prevention systems, which may employ signature-based or anomaly-based detection.

13.3.5. Data Mining and Recommender Systems

Today's consumers are faced with millions of goods and services when shopping online. Recommender systems help consumers by making product recommendations that are likely to be of interest to the user, such as books, CDs, movies, restaurants, online news articles, and other services. Recommender systems may use either a content-based approach, a collaborative approach, or a hybrid approach that combines both content-based and collaborative methods.

Such profiles may be obtained explicitly (e.g., through questionnaires) or learned from users' transactional behavior over time. A collaborative recommender system tries to predict the utility of items for a user, u, based on items previously rated by other users who are similar to u. For example, when recommending books, a collaborative recommender system tries to find other users who have a history of agreeing with u (e.g., they tend to buy similar books, or give similar ratings for books). Collaborative recommender systems can be either memory (or heuristic) based or model based. Memory-based methods essentially use heuristics to make rating predictions based on the entire collection of items previously rated by users.

A weighted aggregate can be used, which adjusts for the fact that different users may use the rating scale differently. Model-based collaborative recommender systems use a collection of ratings to learn a model, which is then used to make rating predictions. For example, probabilistic models, clustering (which finds clusters of like-minded customers), Bayesian networks, and other machine learning techniques have been used. Recommender systems face major challenges such as scalability and ensuring quality recommendations to the consumer. For example, regarding scalability, collaborative recommender systems must be able to search through millions of potential neighbors in real time.
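
A compact sketch of that memory-based, weighted-aggregate prediction: each neighbor's rating enters as a deviation from that neighbor's own average (adjusting for different uses of the rating scale), weighted by similarity to the target user, and the result is added back to the target user's average. The ratings and the similarity function here are toy stand-ins.

    def mean(xs):
        return sum(xs) / len(xs)

    # Toy ratings: user -> {item: rating on a 1-5 scale}.
    ratings = {
        "u": {"A": 4, "B": 2},
        "v": {"A": 5, "B": 3, "C": 4},
        "w": {"A": 2, "B": 1, "C": 2},
    }

    def similarity(a, b):
        """Crude similarity: 1 minus the mean absolute difference on co-rated items, scaled to [0, 1]."""
        common = set(ratings[a]) & set(ratings[b])
        if not common:
            return 0.0
        diff = mean([abs(ratings[a][i] - ratings[b][i]) for i in common])
        return 1 - diff / 4          # ratings span 1..5, so the largest possible difference is 4

    def predict(user, item, neighbors):
        """Weighted aggregate with mean-centering, as in memory-based collaborative filtering."""
        user_avg = mean(list(ratings[user].values()))
        num = den = 0.0
        for v in neighbors:
            if item in ratings[v]:
                s = similarity(user, v)
                num += s * (ratings[v][item] - mean(list(ratings[v].values())))
                den += abs(s)
        return user_avg if den == 0 else user_avg + num / den

    print(round(predict("u", "C", ["v", "w"]), 2))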

pages: 151 words: 39,757

Ten Arguments for Deleting Your Social Media Accounts Right Now
by Jaron Lanier
Published 28 May 2018

The correlations are effectively theories about the nature of each person, and those theories are constantly measured and rated for how predictive they are. Like all well-managed theories, they improve over time through adaptive feedback. C is for Cramming content down people’s throats Algorithms choose what each person experiences through their devices. This component might be called a feed, a recommendation engine, or personalization. Component C means each person sees different things. The immediate motivation is to deliver stimuli for individualized behavior modification. BUMMER makes it harder to understand why others think and act the way they do. The effects of this component will be examined more in the arguments about how you are losing access to truth and the capacity for empathy.

pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More
by Luke Dormehl
Published 4 Nov 2014

Conversely, scores fall dramatically in situations where the task takes longer than expected.33

Decimated-Reality Aggregators

Speaking in October 1944, during the rebuilding of the House of Commons, which had sustained heavy bombing damage during the Battle of Britain, former British prime minister Winston Churchill observed, “We shape our buildings; thereafter they shape us.”34 A similar sentiment might be said in the age of The Formula, in which users shape their online profiles, and from that point forward their online profiles begin to shape them—both in terms of what we see and, perhaps more crucially, what we don’t. Writing about a start-up called Nara, in the middle of 2013, I coined the phrase “decimated reality aggregators” to describe what the company was trying to do.35 Starting out as a restaurant recommender system by connecting together thousands of restaurants around the world, Nara’s ultimate goal was to become the recommender system for your life: drawing on what it knew about you from the restaurants you ate in, to suggest everything from hotels to clothes. Nara even incorporated the idea of upward mobility into its algorithm. Say, for example, you wanted to be a wine connoisseur two years down the line, but currently had no idea how to tell your Chardonnay from your Chianti.

“Neil was adamant that this should be based on science,” Carter says. Before eHarmony, the majority of dating websites took the form of searchable personal ads, of the kind that have been appearing in print since the 17th century.11 After eHarmony, the search engine model was replaced with a recommender system praised in press materials for its “scientific precision.” Instead of allowing users to scan through page after page of profiles, eHarmony simply required them to answer a series of questions—and then picked out the right option on their behalf. The website opened its virtual doors for the first time on August 22, 2000.

All a character has to do—as occurs during one scene in which the novel’s bumbling protagonist, Lenny Abramov, visits a Staten Island nightclub with his friends—is to set the “community parameters” of their iPhone-like device to a particular physical space and hit a button. At this point, every aspect of a person’s profile is revealed, including their “fuckability” and “personality” scores (both ranked on a scale of 800), along with their ranked “anal/oral/vaginal” preferences. There is even a recommender system incorporated, so that a user’s history of romantic relationships can be scrutinized for insights in much the same way that a person’s previous orders on Amazon might dictate what they will be interested in next. As one of Abramov’s friends notes, “This girl [has] a long multimedia thing on how her father abused her . . .

pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life
by Adam Greenfield
Published 29 May 2017

The equivalent of classification for unsupervised learning is clustering, in which an algorithm starts to develop a sense for what is significant in its environment via a process of accretion. A concrete example will help us understand how this works. At the end of the 1990s, two engineers named Tim Westergren and Will Glaser developed a rudimentary music-recommendation engine called the Music Genome Project that worked by rebuilding genre from the bottom up. (The engineers eventually founded the Pandora streaming service, and folded their recommendation engine into it.) Music Genome compared the acoustic signatures and other performance characteristics of the pieces of music it was offered, and from them built up associative maps, clustering together all the songs that had similar qualities; after many iterations, these clusters developed a strong resemblance to the musical categories we’re familiar with.
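
A toy version of the clustering step described here, using a bare-bones k-means on invented acoustic features; with enough songs, the groups that emerge begin to look like genres, which is the bottom-up effect the passage describes.

    import random

    # Made-up acoustic feature vectors: (tempo, distortion, acousticness), all scaled to 0..1.
    songs = {
        "punk_1": (0.9, 0.9, 0.1), "punk_2": (0.85, 0.8, 0.2),
        "folk_1": (0.4, 0.1, 0.9), "folk_2": (0.45, 0.2, 0.85),
    }

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def kmeans(points, k, iters=20, seed=0):
        """Plain k-means: assign each point to its nearest centroid, then move the centroids."""
        random.seed(seed)
        centroids = random.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                clusters[min(range(k), key=lambda i: dist2(p, centroids[i]))].append(p)
            centroids = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
                         for i, cl in enumerate(clusters)]
        return centroids

    centroids = kmeans(list(songs.values()), k=2)
    for name, feats in songs.items():
        label = min(range(2), key=lambda i: dist2(feats, centroids[i]))
        print(name, "-> cluster", label)   # the punk and folk songs end up in separate clusters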

pages: 163 words: 42,402

Machine Learning for Email
by Drew Conway and John Myles White
Published 25 Oct 2011

More likely, you have heard of something like a recommendation system, which implicitly produces a ranking of products. Even if you have not heard of a recommendation system, it’s almost certain that you have used or interacted with a recommendation system at some point. Some of the most successful e-commerce websites have benefitted from leveraging data on their users to generate recommendations for other products their users might be interested in. For example, if you have ever shopped at Amazon.com, then you have interacted with a recommendation system. The problem Amazon faces is simple: what items in their inventory are you most likely to buy?

Many hackers may be more comfortable thinking of problems in terms of the process by which a solution is attained, rather than the theoretical foundation from which the solution is derived. From this perspective, an alternative approach to teaching machine learning would be to use “cookbook” style examples. To understand how a recommendation system works, for example, we might provide sample training data and a version of the model, and show how the latter uses the former. There are many useful texts of this kind as well—Toby Segaran’s Programming Collective Intelligence is a recent example [Seg07]. Such a discussion would certainly address the how of a hacker’s method of learning, but perhaps less of the why.

The implication of that statement is that the items in Amazon’s inventory have an ordering specific to each user. Likewise, Netflix.com has a massive library of DVDs available to its customers to rent. In order for those customers to get the most out of the site, Netflix employs a sophisticated recommendation system to present people with rental suggestions. For both companies, these recommendations are based on two kinds of data. First, there is the data pertaining to the inventory itself. For Amazon, if the product is a television, this data might contain the type (i.e., plasma, LCD, LED), manufacturer, price, and so on.
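
A sketch of how that first kind of data, the attributes of the items themselves, can drive recommendations: score unseen items by how many attributes they share with items the customer has already bought. The catalog below is invented.

    # Invented catalog: item -> set of descriptive attributes.
    catalog = {
        "tv_1": {"television", "LED", "sony"},
        "tv_2": {"television", "plasma", "samsung"},
        "tv_3": {"television", "LED", "samsung"},
        "cam_1": {"camera", "sony"},
    }

    purchased = {"tv_1"}   # what this customer already bought

    def attribute_overlap(item, owned):
        """Number of attributes `item` shares with anything the customer owns."""
        owned_attrs = set().union(*(catalog[i] for i in owned))
        return len(catalog[item] & owned_attrs)

    suggestions = sorted((i for i in catalog if i not in purchased),
                         key=lambda i: attribute_overlap(i, purchased),
                         reverse=True)
    print(suggestions)   # items ordered by similarity to past purchases, e.g. ['tv_3', 'tv_2', 'cam_1']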

pages: 161 words: 39,526

Applied Artificial Intelligence: A Handbook for Business Leaders
by Mariya Yao , Adelyn Zhou and Marlene Jia
Published 1 Jun 2018

Extensions of this technology include applications such as Pinterest’s Lens and eBay’s ShopBot, which recognize items in pictures uploaded by consumers and make recommendations of similar items currently for sale. The next frontier in recommendation systems is the cold-start scenario, in which algorithms must be able to draw good inferences about users or items despite insufficient information. Layer 6 AI, recently acquired by TD Bank, has focused on making relatively accurate predictions on noisy data in a cold-start scenario. Customer personalization is like a recommendation system on steroids, delivering highly relevant content, experience, or products to consumers without their having to exert additional effort.
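
One common fallback in the cold-start situation described here (not Layer 6's method, which is not public) is to blend whatever little is known about a new user with global popularity, leaning on popularity until enough personal signal accumulates. A sketch with invented data and an invented weighting schedule:

    from collections import Counter

    # Global interaction counts (made up) and one new user's sparse history.
    global_counts = Counter({"item_a": 900, "item_b": 450, "item_c": 300, "item_d": 40})
    new_user_clicks = ["item_d"]          # almost no signal yet

    def cold_start_scores(user_clicks, alpha_floor=0.2):
        """Blend personal click counts with global popularity.

        The weight on personal data grows with the amount of history,
        so a brand-new user is served mostly by popularity.
        """
        personal = Counter(user_clicks)
        weight = max(alpha_floor, min(1.0, len(user_clicks) / 20))  # toy schedule
        total_global = sum(global_counts.values())
        total_personal = sum(personal.values()) or 1
        scores = {}
        for item in global_counts:
            pop = global_counts[item] / total_global
            own = personal[item] / total_personal
            scores[item] = weight * own + (1 - weight) * pop
        return sorted(scores, key=scores.get, reverse=True)

    print(cold_start_scores(new_user_clicks)[:3])   # popular items dominate, with a nudge from the one click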

Many real-world datasets have noisy, incorrect labels or are missing labels entirely, meaning that inputs and outputs are paired incorrectly with each other or are not paired at all. Active learning, a special case of semi-supervised learning, occurs when an algorithm actively queries a user to discover the right output or label for a new input. Active learning is used to optimize recommendation systems, like the ones used to recommend movies on Netflix or products on Amazon. Reinforcement learning is learning by trial-and-error, in which a computer program is instructed to achieve a stated goal in a dynamic environment. The program learns by repeatedly taking actions, measuring the feedback from those actions, and iteratively improving its behavioral policy.

In machine learning, you can easily incur massive ongoing systems costs by failing to mitigate risks early in the development process.(84) Your most talented data scientists and machine learning engineers want to build new models. Few of them are dedicated to the unsexy tasks of maintaining existing models. However, the performance of your existing models will deteriorate as environmental conditions change over time. For example, as your e-commerce inventory changes, your recommender system will need to learn to suggest new products to shoppers. As more machine learning algorithms are put into production, you will also need to dedicate more resources to model maintenance—monitoring, validating, and updating the model. A myriad of dependencies lead to machine learning debt, with certain practices incurring more technical debt than others.

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
by Zdravko Markov and Daniel T. Larose
Published 5 Apr 2007

A more general approach would be to consider persons and items again connected by the relation “person likes item.” This is the approach taken in the area of collaborative filtering (also called recommender systems) [3]. Assume that we have m persons and n items (e.g., books, songs, movies, web pages). We arrange them in an m × n matrix M, where each row is a person, each column is an item, and the cells represent the binary relation “likes.” Thus, if person i likes item j, then M(i, j) = 1; otherwise, M(i, j) = 0. The problem is that many cells are empty (i.e., we don’t know whether or not a person likes an item).
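
A small sketch of filling in those empty cells: score an unknown cell for a person by taking a similarity-weighted vote of the other rows, here using Jaccard similarity over the items people like. The matrix below is invented.

    # M[i][j] = 1 if person i likes item j, 0 if they don't, None if unknown.
    M = [
        [1, 1, 0, None],   # person 0
        [1, 1, 0, 1],      # person 1
        [0, 0, 1, 0],      # person 2
    ]

    def jaccard(row_a, row_b):
        """Overlap of the items two people both like, over the items either likes."""
        liked_a = {j for j, v in enumerate(row_a) if v == 1}
        liked_b = {j for j, v in enumerate(row_b) if v == 1}
        union = liked_a | liked_b
        return len(liked_a & liked_b) / len(union) if union else 0.0

    def score_unknown(person, item):
        """Similarity-weighted vote of the other people for M[person][item]."""
        votes = weights = 0.0
        for other, row in enumerate(M):
            if other == person or row[item] is None:
                continue
            s = jaccard(M[person], row)
            votes += s * row[item]
            weights += s
        return votes / weights if weights else 0.0

    print(round(score_unknown(0, 3), 2))   # a high score suggests person 0 likes item 3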

CONTENTS

PREFACE xi

PART I WEB STRUCTURE MINING

1 INFORMATION RETRIEVAL AND WEB SEARCH 3
Web Challenges 3
Web Search Engines 4
Topic Directories 5
Semantic Web 5
Crawling the Web 6
Web Basics 6
Web Crawlers 7
Indexing and Keyword Search 13
Document Representation 15
Implementation Considerations 19
Relevance Ranking 20
Advanced Text Search 28
Using the HTML Structure in Keyword Search 30
Evaluating Search Quality 32
Similarity Search 36
Cosine Similarity 36
Jaccard Similarity 38
Document Resemblance 41
References 43
Exercises 43

2 HYPERLINK-BASED RANKING 47
Introduction 47
Social Networks Analysis 48
PageRank 50
Authorities and Hubs 53
Link-Based Similarity Search 55
Enhanced Techniques for Page Ranking 56
References 57
Exercises 57

PART II WEB CONTENT MINING

3 CLUSTERING 61
Introduction 61
Hierarchical Agglomerative Clustering 63
k-Means Clustering 69
Probability-Based Clustering 73
Finite Mixture Problem 74
Classification Problem 76
Clustering Problem 78
Collaborative Filtering (Recommender Systems) 84
References 86
Exercises 86

4 EVALUATING CLUSTERING 89
Approaches to Evaluating Clustering 89
Similarity-Based Criterion Functions 90
Probabilistic Criterion Functions 95
MDL-Based Model and Feature Evaluation 100
Minimum Description Length Principle 101
MDL-Based Model Evaluation 102
Feature Selection 105
Classes-to-Clusters Evaluation 106
Precision, Recall, and F-Measure 108
Entropy 111
References 112
Exercises 112

5 CLASSIFICATION 115
General Setting and Evaluation Techniques 115
Nearest-Neighbor Algorithm 118
Feature Selection 121
Naive Bayes Algorithm 125
Numerical Approaches 131
Relational Learning 133
References 137
Exercises 138

PART III WEB USAGE MINING

6 INTRODUCTION TO WEB USAGE MINING 143
Definition of Web Usage Mining 143
Cross-Industry Standard Process for Data Mining 144
Clickstream Analysis 147
Web Server Log Files 148
Remote Host Field 149
Date/Time Field 149
HTTP Request Field 149
Status Code Field 150
Transfer Volume (Bytes) Field 151
Common Log Format 151
Identification Field 151
Authuser Field 151
Extended Common Log Format 151
Referrer Field 152
User Agent Field 152
Example of a Web Log Record 152
Microsoft IIS Log Format 153
Auxiliary Information 154
References 154
Exercises 154

7 PREPROCESSING FOR WEB USAGE MINING 156
Need for Preprocessing the Data 156
Data Cleaning and Filtering 158
Page Extension Exploration and Filtering 161
De-Spidering the Web Log File 163
User Identification 164
Session Identification 167
Path Completion 170
Directories and the Basket Transformation 171
Further Data Preprocessing Steps 174
References 174
Exercises 174

8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING 177
Introduction 177
Number of Visit Actions 177
Session Duration 178
Relationship between Visit Actions and Session Duration 181
Average Time per Page 183
Duration for Individual Pages 185
References 188
Exercises 188

9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION 191
Introduction 191
Modeling Methodology 192
Definition of Clustering 193
The BIRCH Clustering Algorithm 194
Affinity Analysis and the A Priori Algorithm 197
Discretizing the Numerical Variables: Binning 199
Applying the A Priori Algorithm to the CCSU Web Log Data 201
Classification and Regression Trees 204
The C4.5 Algorithm 208
References 210
Exercises 211

INDEX 213

PREFACE

DEFINING DATA MINING THE WEB

By data mining the Web, we refer to the application of data mining methodologies, techniques, and models to the variety of data forms, structures,
and usage patterns that comprise the World Wide Web.

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, by Zdravko Markov and Daniel T. Larose. Copyright © 2007 John Wiley & Sons, Inc.

CHAPTER 3 CLUSTERING

INTRODUCTION
HIERARCHICAL AGGLOMERATIVE CLUSTERING
k-MEANS CLUSTERING
PROBABILITY-BASED CLUSTERING
COLLABORATIVE FILTERING (RECOMMENDER SYSTEMS)

INTRODUCTION

The most popular approach to learning is by example. Given a set of objects, each labeled with a class (category), the learning system builds a mapping between objects and classes which can then be used for classifying new (unlabeled) objects. As the labeling (categorization) of the initial (training) set of objects is done by an agent external to the system (teacher), this setting is called supervised learning.

pages: 170 words: 49,193

The People vs Tech: How the Internet Is Killing Democracy (And How We Save It)
by Jamie Bartlett
Published 4 Apr 2018

These algorithms are designed to serve you content that you’re likely to click on, as that means the potential to sell more advertising alongside it. For example, YouTube’s ‘up next’ videos are statistically selected based on an unbelievably sophisticated analysis of what is most likely to keep a person hooked in. According to Guillaume Chaslot, an AI specialist who worked on the recommendation engine for YouTube, the algorithms aren’t there to optimise what is truthful or honest – but to optimise watch-time. ‘Everything else was considered a distraction,’ he recently told the Guardian.17 These non-decision decisions have huge implications, because even mild confirmation bias can set off a cycle of self-perpetuation.

pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python
by Joel Grus
Published 13 Apr 2015

principal component analysis, Dimensionality Reduction probability, Probability-For Further Exploration, MathematicsBayes's Theorem, Bayes’s Theorem central limit theorem, The Central Limit Theorem conditional, Conditional Probability continuous distributions, Continuous Distributions defined, Probability dependence and independence, Dependence and Independence normal distribution, The Normal Distribution random variables, Random Variables probability density function, Continuous Distributions programming languages for learning data science, From Scratch Python, A Crash Course in Python-For Further Explorationargs and kwargs, args and kwargs arithmetic, Arithmetic benefits of using for data science, From Scratch Booleans, Truthiness control flow, Control Flow Counter, Counter dictionaries, Dictionaries-defaultdict enumerate function, enumerate exceptions, Exceptions functional tools, Functional Tools functions, Functions generators and iterators, Generators and Iterators list comprehensions, List Comprehensions lists, Lists object-oriented programming, Object-Oriented Programming piping data through scripts using stdin and stdout, stdin and stdout random numbers, generating, Randomness regular expressions, Regular Expressions sets, Sets sorting in, The Not-So-Basics strings, Strings tuples, Tuples whitespace formatting, Whitespace Formatting zip function and argument unpacking, zip and Argument Unpacking Q quantile, computing, Central Tendencies query optimization (SQL), Query Optimization R R (programming language), From Scratch, R random forests, Random Forests random module (Python), Randomness random variables, Random VariablesBernoulli, The Central Limit Theorem binomial, The Central Limit Theorem conditioned on events, Random Variables expected value, Random Variables normal, The Normal Distribution-The Central Limit Theorem uniform, Continuous Distributions range, Dispersion range function (Python), Generators and Iterators reading files (see files, reading) recall, Correctness recommendations, Recommender Systems recommender systems, Recommender Systems-For Further ExplorationData Scientists You May Know (example), Data Scientists You May Know item-based collaborative filtering, Item-Based Collaborative Filtering-For Further Exploration manual curation, Manual Curation recommendations based on popularity, Recommending What’s Popular user-based collaborative filtering, User-Based Collaborative Filtering-User-Based Collaborative Filtering reduce function (Python), Functional Toolsusing with vectors, Vectors regression (see linear regression; logistic regression) regression trees, What Is a Decision Tree?

For Further Exploration

There are many other notions of centrality besides the ones we used (although the ones we used are pretty much the most popular ones). NetworkX is a Python library for network analysis. It has functions for computing centralities and for visualizing graphs. Gephi is a love-it/hate-it GUI-based network-visualization tool.

Chapter 22. Recommender Systems

O nature, nature, why art thou so dishonest, as ever to send men with these false recommendations into the world!
Henry Fielding

Another common data problem is producing recommendations of some sort. Netflix recommends movies you might want to watch. Amazon recommends products you might want to buy.

= other_interest_id and similarity > 0]

    return sorted(pairs,
                  key=lambda (_, similarity): similarity,
                  reverse=True)

which suggests the following similar interests:

[('Hadoop', 0.8164965809277261),
 ('Java', 0.6666666666666666),
 ('MapReduce', 0.5773502691896258),
 ('Spark', 0.5773502691896258),
 ('Storm', 0.5773502691896258),
 ('Cassandra', 0.4082482904638631),
 ('artificial intelligence', 0.4082482904638631),
 ('deep learning', 0.4082482904638631),
 ('neural networks', 0.4082482904638631),
 ('HBase', 0.3333333333333333)]

Now we can create recommendations for a user by summing up the similarities of the interests similar to his:

def item_based_suggestions(user_id, include_current_interests=False):
    # add up the similar interests
    suggestions = defaultdict(float)
    user_interest_vector = user_interest_matrix[user_id]
    for interest_id, is_interested in enumerate(user_interest_vector):
        if is_interested == 1:
            similar_interests = most_similar_interests_to(interest_id)
            for interest, similarity in similar_interests:
                suggestions[interest] += similarity

    # sort them by weight
    suggestions = sorted(suggestions.items(),
                         key=lambda (_, similarity): similarity,
                         reverse=True)

    if include_current_interests:
        return suggestions
    else:
        return [(suggestion, weight)
                for suggestion, weight in suggestions
                if suggestion not in users_interests[user_id]]

For user 0, this generates the following (seemingly reasonable) recommendations:

[('MapReduce', 1.861807319565799),
 ('Postgres', 1.3164965809277263),
 ('MongoDB', 1.3164965809277263),
 ('NoSQL', 1.2844570503761732),
 ('programming languages', 0.5773502691896258),
 ('MySQL', 0.5773502691896258),
 ('Haskell', 0.5773502691896258),
 ('databases', 0.5773502691896258),
 ('neural networks', 0.4082482904638631),
 ('deep learning', 0.4082482904638631),
 ('C++', 0.4082482904638631),
 ('artificial intelligence', 0.4082482904638631),
 ('Python', 0.2886751345948129),
 ('R', 0.2886751345948129)]

For Further Exploration

Crab is a framework for building recommender systems in Python. Graphlab also has a recommender toolkit. The Netflix Prize was a somewhat famous competition to build a better system to recommend movies to Netflix users.

Chapter 23. Databases and SQL

Memory is man’s greatest friend and worst enemy.
Gilbert Parker

The data you need will often live in databases, systems designed for efficiently storing and querying data.

pages: 181 words: 52,147

The Driver in the Driverless Car: How Our Technology Choices Will Create the Future
by Vivek Wadhwa and Alex Salkever
Published 2 Apr 2017

I couldn’t, for example, recall the winning and losing pitcher in every baseball game of the major leagues from the previous night. Narrow A.I. is now embedded in the fabric of our everyday lives. The humanoid phone trees that route calls to airlines’ support desks are all narrow A.I., as are recommendation engines in Amazon and Spotify. Google Maps’ astonishingly smart route suggestions (and mid-course modifications to avoid traffic) are classic narrow A.I. Narrow-A.I. systems are much better than humans are at accessing information stored in complex databases, but their capabilities are specific and limited, and exclude creative thought.

pages: 222 words: 53,317

Overcomplicated: Technology at the Limits of Comprehension
by Samuel Arbesman
Published 18 Jul 2016

The sophisticated machine learning techniques used in linguistics—employing probability and a large array of parameters rather than principled rules—are increasingly being used in numerous other areas, both in science and outside it, from criminal detection to medicine, as well as in the insurance industry. Even our aesthetic tastes are rather complicated, as Netflix discovered when it awarded a prize for improvements in its recommendation engine to a team whose solution was cobbled together from a variety of different statistical techniques. The contest seemed to demonstrate that no simple algorithm could provide a significant improvement in recommendation accuracy; the winners needed to use a more complex suite of methods in order to capture and predict our personal and quirky tastes in films.
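
The winning entry the passage alludes to blended many models rather than relying on one; the simplest form of that idea is a weighted average of each model's predicted rating, sketched below with invented predictions and blend weights.

    # Predictions for one (user, movie) pair from several hypothetical models.
    predictions = {"neighborhood": 3.6, "matrix_factorization": 4.1, "temporal": 3.9}

    # Blend weights, e.g. fitted on a held-out set (values invented, summing to 1).
    weights = {"neighborhood": 0.25, "matrix_factorization": 0.55, "temporal": 0.20}

    def blend(preds, w):
        """Weighted average of the individual models' predictions."""
        return sum(w[name] * p for name, p in preds.items()) / sum(w.values())

    print(round(blend(predictions, weights), 2))   # a single blended rating estimate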

pages: 196 words: 54,339

Team Human
by Douglas Rushkoff
Published 22 Jan 2019

Instead of retrieving the peer-to-peer marketplace, the digital economy exacerbates the division of wealth and paralyzes the social instincts for mutual aid that usually mitigate its effects. Digital platforms amplify the power law dynamics that determine winners and losers. While digital music platforms make space for many more performers to sell their music, their architecture and recommendation engines end up promoting many fewer artists than a diverse ecosystem of record stores or FM radio did. One or two superstars get all the plays, and everyone else sells almost nothing. It’s the same across the board. While the net creates more access for artists and businesses of all kinds, it allows fewer than ever to make any money.

pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
by Pedro Domingos
Published 21 Sep 2015

Satellites, DNA sequencers, and particle accelerators probe nature in ever-finer detail, and learning algorithms turn the torrents of data into new scientific knowledge. Companies know their customers like never before. The candidate with the best voter models wins, like Obama against Romney. Unmanned vehicles pilot themselves across land, sea, and air. No one programmed your tastes into the Amazon recommendation system; a learning algorithm figured them out on its own, by generalizing from your past purchases. Google’s self-driving car taught itself how to stay on the road; no engineer wrote an algorithm instructing it, step-by-step, how to get from A to B. No one knows how to program a car to drive, and no one needs to, because a car equipped with a learning algorithm picks it up by observing what the driver does.

The Master Algorithm is the complete package. Applying it to vast amounts of patient and drug data, combined with knowledge mined from the biomedical literature, is how we will cure cancer. A universal learner is sorely needed in many other areas, from life-and-death to mundane situations. Picture the ideal recommender system, one that recommends the books, movies, and gadgets you would pick for yourself if you had the time to check them all out. Amazon’s algorithm is a very far cry from it. That’s partly because it doesn’t have enough data—mainly it just knows which items you previously bought from Amazon—but if you went hog wild and gave it access to your complete stream of consciousness from birth, it wouldn’t know what to do with it.

Using the k nearest neighbors instead of one is not the end of the story. Intuitively, the examples closest to the test example should count for more. This leads us to the weighted k-nearest-neighbor algorithm. In 1994, a team of researchers from the University of Minnesota and MIT built a recommendation system based on what they called “a deceptively simple idea”: people who agreed in the past are likely to agree again in the future. That notion led directly to the collaborative filtering systems that all self-respecting e-commerce sites have. Suppose that, like Netflix, you’ve gathered a database of movie ratings, with each user giving a rating of one to five stars to the movies he or she has seen.
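
A brief sketch of weighted k-nearest-neighbor prediction in this collaborative-filtering setting: find the users most similar to you who have rated the movie, then average their ratings with weights proportional to similarity. The ratings and the similarity measure below are toy choices.

    # Toy star ratings (1-5), invented for illustration.
    ratings = {
        "you":   {"Alien": 5, "Up": 2, "Heat": 4},
        "ann":   {"Alien": 5, "Up": 1, "Heat": 5, "Tron": 4},
        "bob":   {"Alien": 2, "Up": 5, "Heat": 1, "Tron": 1},
        "carol": {"Alien": 4, "Up": 2, "Tron": 5},
    }

    def similarity(a, b):
        """1 / (1 + mean absolute rating difference) on movies both have rated."""
        common = set(ratings[a]) & set(ratings[b])
        if not common:
            return 0.0
        diff = sum(abs(ratings[a][m] - ratings[b][m]) for m in common) / len(common)
        return 1 / (1 + diff)

    def weighted_knn_rating(user, movie, k=2):
        """Predict `user`'s rating of `movie` from the k most similar users who rated it."""
        raters = [u for u in ratings if u != user and movie in ratings[u]]
        neighbors = sorted(raters, key=lambda u: similarity(user, u), reverse=True)[:k]
        num = sum(similarity(user, u) * ratings[u][movie] for u in neighbors)
        den = sum(similarity(user, u) for u in neighbors)
        return num / den if den else None

    print(round(weighted_knn_rating("you", "Tron"), 2))   # leans toward the more similar raters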

pages: 463 words: 105,197

Radical Markets: Uprooting Capitalism and Democracy for a Just Society
by Eric Posner and E. Weyl
Published 14 May 2018

Today, machines learn from the statistical patterns in human behavior, and may be able to use this information to distribute goods (and jobs) as well as, or possibly better than, people can choose goods (and jobs) themselves. We are very far from this point, but we can see the outlines of the route that we might travel. Let us start with an increasingly familiar phenomenon: machine learning–based recommendation systems drawing on existing market behavior. How does Netflix guess what movies you are likely to enjoy? Roughly, it finds people who are like you—who watch many of the movies you watch and give those movies ratings similar to your ratings. It then infers that you will enjoy movies you have not yet seen that your hidden doppelgangers have seen and rated highly.
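
The doppelganger step can be sketched in a few lines: score each unseen movie by the ratings of viewers whose watch histories overlap with yours, weighting heavier overlaps more. The names, titles, and numbers below are invented for illustration:

```python
# Sketch of the "hidden doppelganger" idea: recommend unseen movies that
# viewers with overlapping histories rated highly. All data is invented.
from collections import defaultdict

histories = {
    "you":    {"Rocky": 5, "Clue": 4, "Babe": 3},
    "dana":   {"Rocky": 5, "Clue": 4, "Big": 5, "Jaws": 2},
    "elliot": {"Babe": 3, "Jaws": 5},
}

def overlap(a, b):
    """How many movies two viewers have in common (a crude similarity)."""
    return len(set(histories[a]) & set(histories[b]))

def recommend(user, top_n=3):
    scores = defaultdict(float)
    for other in histories:
        if other == user:
            continue
        w = overlap(user, other)
        for movie, stars in histories[other].items():
            if movie not in histories[user]:
                scores[movie] += w * stars  # weight ratings by history overlap
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("you"))  # ['Big', 'Jaws']: Big is lifted by dana's stronger overlap
```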

INDEX Italic page numbers indicate figures and tables abortion, 27, 112–13, 116 Acemoglu, Daron, 240, 316n4 activism, 3, 124, 140, 176–77, 188, 193, 211, 232 Adachi, Kentaro, 80–81, 105–8 Africa, 136, 138 African Americans, 24, 89, 209–10 Airbnb, 70, 117 airlines, 171, 183, 189–91, 194 Akerlof, George, 66–67 algorithms, 208, 214, 219, 221, 281–82, 289–93, 307n7 Allen, Robert C., 240 Amazon, 112, 230–31, 234, 239, 248, 288, 290–91 American Constitution, 86–87 American Federation of Musicians, 210 American Tobacco Company, 174 America OnLine (AOL), 210 Anderson, Chris, 212 antitrust: Clayton Act and, 176–77, 197, 311n25; landlords and, 201–2; monopolies and, 23, 48, 174–77, 180, 184–86, 191, 197–203, 242, 255, 262, 286; resale price maintenance and, 200–201; social media and, 202 Apple, 117, 239, 289 Arginoussai Islands, 83 aristocracy, 16–17, 22–23, 36–38, 84–85, 87, 90, 135–36 Aristotle, 172 Arrow, Kenneth, 92, 303n17 Articles of Confederation, 88 artificial intelligence (AI), 202, 257, 287; Alexa and, 248; algorithms and, 208, 214, 219, 221, 281–82, 289–93; automated video editing and, 208; Cortana and, 219; data capacities and, 236; Deep Blue and, 213; democratization of, 219; diminishing returns and, 229–30; facial recognition and, 208, 216–19; factories for thinking machines and, 213–20; Google Assistant and, 219; human-produced data for, 208–9; marginal value and, 224–28, 247; Microsoft and, 219; neural networks and, 214–19; payment systems for, 224–30; recommendation systems and, 289–90; siren servers and, 220–24, 230–41, 243; Siri and, 219, 248; technofeudalism and, 230–33; techno-optimists and, 254–55, 316n2; techno-pessimists and, 254–55, 316n2; worker replacement and, 223 Athens, 55, 83–84, 131 Atwood, Margaret, 18–19 auctions, xv–xxi, 49–51, 70–71, 97, 99, 147–49, 156–57, 300n34 au pair program, 154–55, 161 Australia, 10, 12, 13, 159, 162 Austrian school, 2 Autor, David, 240 Azar, José, 185, 189, 310n24 Bahrain, 158 banking industry, 182–84, 183, 190 Bank of America, 183, 184 Becker, Gary, 147 Beckford, William, 95 behavioral finance, 180–81 Bénabou, Roland, 236–37 Bentham, Jeremy, 4, 35, 95–96, 98, 132 Berle, Adolf, 177–78, 183, 193–94 Berlin Wall, 1, 140 Berners-Lee, Tim, 210 big data, 213, 226, 293 Bing, xxi BlackRock, 171, 181–84, 183, 187, 191 Brazil, xiii–xvii, 105, 135 Brin, Sergey, 211 broadcast spectrum, xxi, 50–51, 71 Bush, George W., 78 Cabral, Luís, 202 Cadappster app, 31 Caesar, Julius, 84 Canada, 10, 13, 159, 182 capitalism, xvi; basic structure of, 24–25; competition and, 17 (see also competition); corporate planning and, 39–40; cultural consequences of, 270, 273; Engels on, 239–40; freedom and, 34–39; George on, 36–37; growth and, 3 (see also growth, economic); industrial revolution, 36, 255; inequality and, 3 (see also inequality); labor and, 136–37, 143, 159, 165, 211, 224, 231, 239–40, 316n4; laissez-faire, 45; liberalism and, 3, 17, 22–27; markets and, 278, 288, 304n36; Marx on, 239–40; monopolies and, 22–23, 34–39, 44, 46–49, 132, 136, 173, 177, 179, 199, 258, 262; monopsony and, 190, 199–201, 223, 234, 238–41, 255; ownership and, 34–36, 39, 45–49, 75, 78–79; property and, 34–36, 39, 45–49, 75, 78–79; Radical Markets and, 169, 180–85, 203, 273; regulations and, 262; Schumpeter on, 47; shareholders and, 118, 170, 178–84, 189, 193–95; technology and, 34, 203, 316n4; wealth and, 45, 75, 78–79, 136, 143, 239, 273 Capitalism and Freedom (Friedman), xiii Capitalism for the People, A (Luigi), 203 Capra, Frank, 17 Carroll, Lewis, 176 central planning: computers and, 
277–85, 288–93; consumers and, 19; democracy and, 89; governance and, 19–20, 39–42, 46–48, 62, 89, 277–85, 288–90, 293; healthcare and, 290–91; liberalism and, 19–20; markets and, 277–85, 288–93; property and, 39–42, 46–48, 62; recommendation systems and, 289–90; socialism and, 39–42, 47, 277, 281 Chetty, Raj, 11 Chiang Kai-shek, 46 China, 15, 46, 56, 133–34, 138 Christensen, Clayton, 202 Chrysler, 193 Citigroup, 183, 184, 191 Clarke, Edward, 99, 102, 105 Clayton Act, 176–77, 197, 311n25 Clemens, Michael, 162 Coase, Ronald, 40, 48–51, 299n26 Cold War, xix, 25, 288 collective bargaining, 240–41 collective decisions: democracy and, 97–105, 110–11, 118–20, 122, 124, 273, 303n17, 304n36; manipulation of, 99; markets for, 97–105; public goods and, 98; Quadratic Voting (QV) and, 110–11, 118–20, 122, 124, 273, 303n17, 304n36; Vickrey and, 99, 102, 105 colonialism, 8, 131 Coming of the Third Reich, The (Evans), 93 common ownership self-assessed tax (COST): broader application of, 273–76; cybersquatters and, 72; education and, 258–59; efficiency and, 256, 261; equality and, 258; globalization and, 269–70; growth and, 73, 256; human capital and, 258–61; immigrants and, 261, 269, 273; inequality and, 256–59; international trade and, 270; investment and, 258–59, 270; legal issues and, 275; markets and, 286; methodology of, 63–66; monopolies and, 256–61, 270, 300n43; objections to, 300n43; optimality and, 61, 73, 75–79, 317n18; personal possessions and, 301n47, 317n18; political effects of, 261–64; predatory outsiders and, 300n43; prices and, 62–63, 67–77, 256, 258, 263, 275, 300n43, 317n18; property and, 31, 61–79, 271–74, 300n43, 301n47; public goods and, 256; public leases and, 69–72; Quadratic Voting (QV) and, 123–25, 194, 261–63, 273, 275, 286; Radical Markets and, 79, 123–26, 257–58, 271–72, 286; taxes and, 61–69, 73–76, 258–61, 275, 317n18; technology and, 71–72, 257–59; true market economy and, 72–75; voting and, 263; wealth and, 256–57, 261–64, 269–70, 275, 286 communism, 19–20, 46–47, 93–94, 125, 278 competition: antitrust policies and, 23, 48, 174–77, 180, 184–86, 191, 197–203, 242, 255, 262, 286; auctions and, xv–xix, 49–51, 70–71, 97, 99, 147–49, 156–57; bargaining and, 240–41, 299n26; democracy and, 109, 119–20; by design, 49–55; elitism and, 25–28; equilibrium and, 305n40; eternal vigilance and, 204; horizontal concentration and, 175; imperfect, 304n36; indexing and, 185–91, 302n63; innovation and, 202–3; investment and, 196–97; labor and, 145, 158, 162–63, 220, 234, 236, 239, 243, 245, 256, 266; laissez-faire and, 253; liberalism and, 6, 17, 20–28; lobbyists and, 262; monopolies and, 174; monopsony and, 190, 199–201, 223, 234, 238–41, 255; ownership and, 20–21, 41, 49–55, 79; perfect, 6, 25–28, 109; prices and, 20–22, 25, 173, 175, 180, 185–90, 193, 200–201, 204, 244; property and, 41, 49–55, 79; Quadratic Voting (QV) and, 304n36; regulations and, 262; resale price maintenance and, 200–201; restoring, 191–92; Section 7 and, 196–97, 311n25; selfishness and, 109, 270–71; Smith on, 17; tragedy of the commons and, 44 complexity, 218–20, 226–28, 274–75, 279, 281, 284, 287, 313n15 “Computer and the Market, The” (Lange), 277 computers: algorithms and, 208, 214, 219, 221, 281–82, 289–93; automation of labor and, 222–23, 251, 254; central planning and, 277–85, 288–93; data and, 213–14, 218, 222, 233, 244, 260; Deep Blue, 213; distributed computing and, 282–86, 293; growth in poor countries and, 255; as intermediaries, 274; machine learning (ML) and, 214 (see also machine learning [ML]); markets 
and, 277, 280–93; Mises and, 281; Moore’s Law and, 286–87; Open-Trac and, 31–32; parallel processing and, 282–86; prices of, 21; recommendation systems and, 289–90 Condorcet, Marquis de, 4, 90–93, 303n15, 306n51 conspicuous consumption, 78 Consumer Reports magazine, 291 consumers: antitrust suits and, 175, 197–98; central planning and, 19; data from, 47, 220, 238, 242–44, 248, 289; drone delivery to, 220; as entrepreneurs, 256; goods and services for, 27, 92, 123, 130, 175, 280, 292; institutional investment and, 190–91; international culture for, 270; lobbyists and, 262; machine learning (ML) and, 238; monopolies and, 175, 186, 197–98; preferences of, 280, 288–93; prices and, 172 (see also prices); recommendation systems and, 289–90; robots and, 287; sharing economy and, 117; Soviet collapse and, 289; technology and, 287 cooperatives, 118, 126, 261, 267, 299n24 Corbyn, Jeremy, 12, 13 corruption, 3, 23, 27, 57, 93, 122, 126, 157, 262 Cortana, 219 cost-benefit analysis, 2, 244 “Counterspeculation, Auctions and Competitive Sealed Tenders” (Vickrey), xx–xxi Cramton, Peter, 52, 54–55, 57 crowdsourcing, 235 crytocurrencies, 117–18 cybersquatters, 72 data: algorithms and, 208, 214, 219, 221, 281–82, 289–93; big, 213, 226, 293; computers and, 213–14, 218, 222, 233, 244, 260; consumer, 47, 220, 238, 242–44, 248, 289; diamond-water paradox and, 224–25; diminishing returns and, 226, 229–30; distribution of complexity and, 228; as entertainment, 233–39, 248–49; Facebook and, 28, 205–9, 212–13, 220–21, 231–48; feedback and, 114, 117, 233, 238, 245; free, 209, 211, 220, 224, 231–35, 239; Google and, 28, 202, 207–13, 219–20, 224, 231–36, 241–42, 246; investment in, 212, 224, 232, 244; labeled, 217–21, 227, 228, 230, 232, 234, 237; labor movement for, 241–43; Lanier and, 208, 220–24, 233, 237, 313n2, 315n48; marginal value and, 224–28, 247; network effects and, 211, 236, 238, 243; neural networks and, 214–19; online services and, 211, 235; overfitting and, 217–18; payment systems for, 210–13, 224–30; photographs and, 64, 214–15, 217, 219–21, 227–28, 291; programmers and, 163, 208–9, 214, 217, 219, 224; Radical Markets for, 246–49; reCAPTCHA and, 235–36; recommendation systems and, 289–90; rise of data work and, 209–13; sample complexity and, 217–18; siren servers and, 220–24, 230–41, 243; social networks and, 202, 212, 231, 233–36; technofeudalism and, 230–33; under-employment and, 256; value of, 243–45; venture capital and, 211, 224; virtual reality and, 206, 208, 229, 251, 253; women’s work and, 209, 313n4 Declaration of Independence, 86 Deep Blue, 213 DeFoe, Daniel, 132 Demanding Work (Gray and Suri), 233 democracy: 1p1v system and, 82–84, 94, 109, 119, 122–24, 304n36, 306n51; artificial intelligence (AI) and, 219; Athenians and, 55, 83–84, 131; auctions and, 97, 99; basic structure of, 24–25; central planning and, 89; check and balance systems and, 23, 25, 87, 92; collective decisions and, 97–105, 110–11, 118–20, 122, 124, 273, 303n17, 304n36; collective mediocrity and, 96; competition and, 109, 119–20; Declaration of Independence and, 86; efficiency and, 92, 110, 126; elections and, 22, 80, 93, 100, 115, 119–21, 124, 217–18, 296n20; elitism and, 89–91, 96, 124; Enlightenment and, 86, 95; Europe and, 90–96; France and, 90–95; governance and, 84, 117; gridlock and, 84, 88, 122–24, 261, 267; Hitler and, 93–94; House of Commons and, 84–85; House of Lords and, 85; impossibility theorem and, 92; inequality and, 123; Jury Theorem and, 90–92; liberalism and, 3–4, 25, 80, 86, 90; limits of, 85–86; majority 
rule and, 27, 83–89, 92–97, 100–101, 121, 306n51; markets and, 97–105, 262, 276; minorities and, 85–90, 93–97, 101, 106, 110; mixed constitution and, 84–85; multi-candidate, single-winner elections and, 119–20; origins of, 83–85; ownership and, 81–82, 89, 101, 105, 118, 124; public goods and, 28, 97–100, 107, 110, 120, 123, 126; Quadratic Voting (QV) and, 105–22; Radical Markets and, 82, 106, 123–26, 203; supermajorities and, 84–85, 88, 92; tyrannies and, 23, 25, 88, 96–100, 106, 108; United Kingdom and, 95–96; United States and, 86–90, 93, 95; voting and, 80–82, 85–93, 96, 99, 105, 108, 115–16, 119–20, 123–24, 303n14, 303n17, 303n20, 304n36, 305n39; wealth and, 83–84, 87, 95, 116 Demosthenes, 55 Denmark, 182 Department of Justice (DOJ), 176, 186, 191 deregulation, 3, 9, 24 Desmond, Matthew, 201–2 Dewey, John, 43 Dickens, Charles, 36 digital economy: data producers and, 208–9, 230–31; diamond-water paradox and, 224–25; as entertainment, 233–39; facial recognition and, 208, 216, 218–19; free access and, 211; Lanier and, 208, 220–24, 233, 237, 313n2, 315n48; machine learning (ML) and, 208–9, 213–14, 217–21, 226–31, 234–35, 238, 247, 289, 291, 315n48; payment systems for, 210–13, 221–30, 243–45; programmers and, 163, 208–9, 214, 217, 219, 224; rise of data work and, 209–13; siren servers and, 220–24, 230–41, 243; spam and, 210, 245; technofeudalism and, 230–33; virtual reality and, 206, 208, 229, 251, 253 diversification, 171–72, 180–81, 185, 191–92, 194–96, 310n22, 310n24 dot-com bubble, 211 double taxation, 65 Dupuit, Jules, 173 Durkheim, Émile, 297n23 Dworkin, Ronald, 305n40 dystopia, 18, 191, 273, 293 education, 114; common ownership self-assessed tax (COST) and, 258; data and, 229, 232, 248; elitism and, 260; equality in, 89; financing, 276; free compulsory, 23; immigrants and, 14, 143–44, 148; labor and, 140, 143–44, 148, 150, 158, 170–71, 232, 248, 258–60; Mill on, 96; populist movements and, 14; Stolper-Samuelson Theorem and, 143 efficient capital markets hypothesis, 180 elections, 80; data and, 217–18; democracy and, 22, 93, 100, 115, 119–21, 124, 217–18, 296n20; gridlock and, 124; Hitler and, 93; multi-candidate, single-winner, 119–20; polls and, 13, 111; Quadratic Voting (QV) and, 115, 119–21, 268, 306n52; U.S. 
2016, 93, 296n20 Elhauge, Einer, 176, 197 elitism: aristocracy and, 16–17, 22–23, 36–38, 84–85, 87, 90, 135–36; bourgeoisie and, 36; bureaucrats and, 267; democracy and, 89–91, 96, 124; education and, 260; feudalism and, 16, 34–35, 37, 41, 61, 68, 136, 230–33, 239; financial deregulation and, 3; immigrants and, 146, 166; liberalism and, 3, 15–16, 25–28; minorities and, 12, 14–15, 19, 23–27, 85–90, 93–97, 101, 106, 110, 181, 194, 273, 303n14, 304n36; monarchies and, 85–86, 91, 95, 160 Emergency Economic Stabilization Act, 121 eminent domain, 33, 62, 89 Empire State Building, 45 Engels, Friedrich, 78, 240 Enlightenment, 86, 95 entrepreneurs, xiv; immigrants and, 144–45, 159, 256; labor and, 129, 144–45, 159, 173, 177, 203, 209–12, 224, 226, 256; ownership and, 35, 39 equality: common ownership self-assessed tax (COST) and, 258; education and, 89; immigrants and, 257; labor and, 147, 166, 239, 257; liberalism and, 4, 8, 24, 29; living standards and, 3, 11, 13, 133, 135, 148, 153, 254, 257; Quadratic Voting (QV) and, 264; Radical Markets and, 262, 276; trickle down theories and, 9, 12 Espinosa, Alejandro, 30–32 Ethereum, 117 Europe, 177, 201; democracy and, 88, 90–95; European Union and, 15; fiefdoms in, 34; government utilities and, 48; income patterns in, 5; instability in, 88; labor and, 11, 130–31, 136–47, 165, 245; social democrats and, 24; unemployment rates in, 11 Evans, Richard, 93 Evicted (Desmond), 201–2 Ex Machina (film), 208 Facebook, xxi; advertising and, 50, 202; data and, 28, 205–9, 212–13, 220–21, 231–48; monetization by, 28; news service of, 289; Vickrey Commons and, 50 facial recognition, 208, 216–19 family reunification programs, 150, 152 farms, 17, 34–35, 37–38, 61, 72, 135, 142, 179, 283–85 Federal Communications Commission (FCC), 50, 71 Federal Trade Commission (FTC), 176, 186 feedback, 114, 117, 233, 238, 245 feudalism, 16, 34–35, 37, 41, 61, 68, 136, 230–33, 239 Fidelity, 171, 181–82, 184 financial crisis of 2008, 3, 121 Fitzgerald, F.


pages: 362 words: 103,087

The Elements of Choice: Why the Way We Decide Matters
by Eric J. Johnson
Published 12 Oct 2021

This can lead to happier customers and more productive firms. Netflix was producing more than 33 million versions of its site as early as 2013. To do that, Netflix has to know something useful about its customers. Some of that knowledge comes from Netflix’s recommendation system. Some estimate that it adds $1 billion of value to the firm. We’ll get to the topic of recommender systems, but first I want to talk about a broader and sometimes simpler concept: a user model.5 Whenever we customize a site to increase its usefulness to a chooser, it is because we believe we know something about that person. That knowledge, that picture of a person, drives the customization.

Since the default engine was chosen more often, they became concerned that defaults might make some customers less happy, not to mention increase sales of the least-expensive engine. We suggested that they customize defaults for different customers, something we call smart defaults. The managers liked the idea, but there was one small problem. They didn’t want to invest in a complex recommender system to suggest defaults to each user. It would take a lot of effort to build such a system, and it might not be that useful. Cars are not purchased often and data about past purchases might not be that effective, given that people’s needs change between purchases. A thirty-year-old whose last purchase was a sports car might now need a family sedan or SUV, having gotten married and had children since the last time they bought a car.

Since they have complementary strengths and weaknesses, this makes sense.6 It is important to note, however, that a user model is not synonymous with fancy AI. If we want to know something about the customer, we can often do important customization by asking a quick question, like “What kind of a car are you shopping for?” or “How old are you?” Most of the buzz around recommender systems usually emphasizes replacing choice. Another view is that user models allow the designer to augment the choice architecture. Instead of AI for making choices, maybe we need to think about IA, intelligent augmentation, where choice architecture assists choices.

Control to the Customer

Visit almost any website and you are in control.

pages: 554 words: 149,489

The Content Trap: A Strategist's Guide to Digital Change
by Bharat Anand
Published 17 Oct 2016

Consider the intrinsic technology properties of networked products, or the word-of-mouth benefits that arise from seemingly unpredictable acts of sharing by interested individuals. It’s tempting to view these user connections as “acts of nature” over which managers have little control. But that’s not the case. By 2002 Amazon had spent more than five years creating a formidable advantage in e-commerce. That came not only from a user-friendly platform and recommendation engine—both features were adopted by other entrants—but from its warehousing and logistics operation. By building distribution centers across the country, investing in algorithms to optimize pick-time in the centers, and hiring operational wizards from Walmart and other competitors, Amazon could get products to customers anywhere in the United States faster and cheaper than anyone else.

Netflix’s queueing system, widely regarded as a tool to enhance user convenience, was instead really a powerful lever for demand forecasting: It told the company exactly what movies every customer in every part of the country wanted next, letting it tailor inventory in different warehouses to local preferences. The recommendation engine, also thought of as a means of increasing customer satisfaction, doubled as an inventory management tool: It let the company recommend not only movies a customer might like, but also those that were in stock! Netflix integrated its sorting machines with the U.S. Postal Service to make deliveries more efficient.

How to Be a Liberal: The Story of Liberalism and the Fight for Its Life
by Ian Dunt
Published 15 Oct 2020

Nowhere was this process more evident than on YouTube. It quickly became the most popular social network in the US, with far more users than there were viewers for cable news. And those users were subject to an algorithm that seemed to push them towards ever more extreme material for their political tribe. This was chiefly because of its recommendation engine, which presented a viewer with options for what they might want to watch after they finished a video. The YouTube algorithm was not based on how to make sure people came across alternate views so that it could preserve the health of liberal democracy. It was based, like that of other social media operations, purely on engagement.

It was based, like that of other social media operations, purely on engagement. Initially, the website grounded it in ‘clicks to watch,’ but it then pivoted to ‘watchtime.’ Whatever got people watching longer was what mattered. The political effect was potentially very far-reaching. If someone clicked on a left-wing video and watched it to the end, the recommendation engine would provide more left-wing videos. Out of the options, the user might pick one. Once they did so, their choices were again narrowed, on the basis that the algorithm presumed the user had made an active choice for more left-wing content. Videos which were more edgy or shocking, which triggered more of an emotional response, provoked more engagement and were therefore prioritised in recommendations.
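
The dynamic described here can be reduced to a toy ranking function that orders candidate videos purely by expected watch time; the titles and numbers below are invented, and a real system is vastly more elaborate:

```python
# Toy "watchtime" ranking: candidates are ordered by expected minutes watched,
# with no regard for balance or accuracy. Titles and figures are invented.
candidates = [
    {"title": "Measured policy explainer", "p_click": 0.04, "avg_minutes": 6.0},
    {"title": "Outrage compilation",       "p_click": 0.09, "avg_minutes": 11.0},
    {"title": "Opposing-view interview",   "p_click": 0.03, "avg_minutes": 7.0},
]

def expected_watchtime(video):
    # Engagement objective: chance of a click times minutes watched if clicked.
    return video["p_click"] * video["avg_minutes"]

for v in sorted(candidates, key=expected_watchtime, reverse=True):
    print(f'{expected_watchtime(v):.2f}  {v["title"]}')
# The most provocative video takes the top slot, which is the dynamic described above.
```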

pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
by Eric Siegel
Published 19 Feb 2013

Every important thing a person does is valuable to predict, namely: consume, think, work, quit, vote, love, procreate, divorce, mess up, lie, cheat, steal, kill, and die. Let’s explore some examples.2

People Consume

Hollywood studios predict the success of a screenplay if produced. Netflix awarded $1 million to a team of scientists who best improved their recommendation system’s ability to predict which movies you will like. Australian energy company Energex predicts electricity demand in order to decide where to build out its power grid, and Con Edison predicts system failure in the face of high levels of consumption. Wall Street predicts stock prices by observing how demand drives them up and down.

The product it hawked, pictured for all my fellow shoppers to see, had the potential to mortify. It was a coupon for Beano, a medication for flatulence. I’d developed mild lactose intolerance, but, before figuring that out, had been trying anything to address my symptom. Acting blindly on data, Walgreens’ recommendation system seemed to suggest that others not stand so close. Other clinical data holds a more serious and sensitive status than digestive woes. Once, when teaching a summer program for talented teenagers, I received data I felt would have been better kept away from me. The administrator took me aside to inform me that one of my students had a diagnosis of bipolar disorder.

Such a contest is a hard-nosed, objective bake-off—whoever can cook up the solution that best handles the predictive task at hand wins kudos and, usually, cash.

Dark Horses

And so it was with our two Montrealers, Martin and Martin, who took the Netflix Prize by storm despite their lack of experience—or, perhaps, because of it. Neither had a background in statistics or analytics, let alone recommendation systems in particular. By day, the two worked in the telecommunications industry developing software. But by night, at home, the two-member team plugged away, for 10 to 20 hours per week apiece, racing ahead in the contest under the team name PragmaticTheory. The “pragmatic” approach proved groundbreaking.

pages: 176 words: 55,819

The Start-Up of You: Adapt to the Future, Invest in Yourself, and Transform Your Career
by Reid Hoffman and Ben Casnocha
Published 14 Feb 2012

In 1999 he set up a meeting at Blockbuster’s headquarters in part to discuss possibly partnering on local distribution and faster fulfillment. Blockbuster was not impressed. “They just about laughed us out of their office,” Reed recalls.16 Reed and his team kept at it. They perfected their distribution center network so that more than 80 percent of customers received overnight delivery of movies.17 They developed an innovative recommendation engine that prompted users with movies they might like based on past purchases. By 2005 Netflix had a subscriber base four million strong, had fended off competition from imitations like Walmart’s online movie-by-mail effort, and became the king of online movie rentals. In 2010 Netflix made a profit of more than $160 million.

Smart Mobs: The Next Social Revolution
by Howard Rheingold
Published 24 Dec 2011

Slashdot and other self-organized online forums enable participants to rate the postings of other participants in discussions, causing the best writing to rise in prominence and most objectionable postings to sink. Amazon’s online recommendation system tells customers about books and records bought by people whose tastes are similar to their own. Google.com, the foremost Internet search engine, lists first those Web sites that have the most links pointing to them—an implicit form of recommendation system. Hordes of programmers who compete for bragging rights as well as paying work are already driving the evolution of the first-generation reputation systems toward more advanced forms.

Even simple instruments that enable groups to share knowledge online by recommending useful Web sites, without requiring any action by the participants beyond bookmarking them, can multiply the groups’ effectiveness. In 1997, Hui Guo, Thomas Kreifelts, and Angi Voss of the German National Research Center for Information Technology described their “SOaP” social filtering service designed to address several of the problems constraining recommender systems.10 Guo and his colleagues created software agents, programs that could search, query, gather information, report results, even negotiate and execute transactions with other programs. The SOaP agents could implicitly collect recommendation information by the members of a group and mediate among people, groups, and the Web.

Buyers searching for items can see the feedback scores of the sellers. Over time, consistently honest sellers build up substantial reputation scores, which are costly to discard, guarding against the temptation to cheat buyers and adopt a new reputation. Paul Resnick, whose GroupLens had been a pioneering recommender system in 1992, and Richard Zeckhauser performed empirical studies on “a large data set from 1999” that indicated that despite the lack of physical presence on eBay, “trust has emerged due to the feedback or reputation system.”29 Biological theories of cooperation and experiments in game theory point to the expectation of dealing with others in future interactions— the “shadow of the future” that influences behavior in the present.

pages: 223 words: 60,909

Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech
by Sara Wachter-Boettcher
Published 9 Oct 2017

In other words, if a system like Word2vec is fed data that reflects historical biases, then those biases will be reflected in the resulting word embeddings. The problem is that very few people have been talking about this—and meanwhile, because Google released Word2vec as an open-source technology, all kinds of companies are using it as the foundation for other products. These products include recommendation engines (the tools behind all those “you might also like . . .” features on websites), document classification, and search engines—all without considering the implications of relying on data that reflects historical biases and outdated norms to make future predictions. One of the most worrisome developments is this: using word embeddings to automatically review résumés.
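
The kind of bias being described can be inspected directly. The sketch below assumes the gensim library and one commonly used pretrained embedding set (an assumption, not the book's own example); the exact neighbors printed will vary with the model:

```python
# Probing a pretrained word embedding for the associations it has absorbed.
# Assumes gensim is installed; "word2vec-google-news-300" is one widely used
# pretrained model (roughly a 1.6 GB download).
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# Analogy-style query: words near ("programmer" - "man" + "woman").
print(vectors.most_similar(positive=["programmer", "woman"], negative=["man"], topn=5))

# Any downstream system that ranks resumes or documents by cosine similarity to
# vectors like these quietly inherits whatever associations show up above.
print(vectors.similarity("nurse", "woman"), vectors.similarity("nurse", "man"))
```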

pages: 375 words: 88,306

The Sharing Economy: The End of Employment and the Rise of Crowd-Based Capitalism
by Arun Sundararajan
Published 12 May 2016

This point has been noted about digital markets more generally. While a conventional brick-and-mortar bookstore may hold 40,000 to 100,000 books, Amazon offers access to over 3 million books. The same expansion in variety holds true for music, movies, electronics, and myriad other products. Furthermore, since Amazon uses several recommender systems to help promote products, it is not just variety but “fit” that has increased.14 Capturing the economic impacts of enhanced variety and automated word-of-mouth promotions, however, is difficult, since once again, what has changed is primarily the quality of the consumer experience. As Erik Brynjolfsson, Yu (Jeffery) Hu, and Michael Smith argue in their study of consumer surplus in the digital economy, these benefits may be particularly difficult to measure because different consumers are impacted to varying degrees.

This effect will be especially beneficial to those consumers who live in remote areas.”15 Analogous increases in consumer surplus were documented by Anindya Ghose, Rahul Telang and Michael Smith in their 2005 study of electronic markets for used books.16 These effects are exacerbated by a wide variety of recommender systems that use machine learning algorithms to better direct consumer choice. As Alexander Tuzhilin and Gedas Adomavicius document, such systems are ubiquitous in digital markets.17 It is natural to expect similar challenges when, for example, trying to encompass the different economic impacts of increased variety and fit from Airbnb, or increased convenience from Lyft, or Dennis’s increased access to financing on the Isle of Gigha.

Smith, “Internet Exchanges for Used Books: An Empirical Analysis of Product Cannibalization and Welfare Impact,” Information Systems Research 17, 1 (2006): 3–9. http://pubsonline.informs.org/doi/abs/10.1287/isre.1050.0072. 17. Alexander Tuzhilin and Gedas Adomavicius, “Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions,” IEEE Transactions on Knowledge and Data Engineering 17, 6 (2006): 734–739. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1423975&tag=1. 18. Prasanna Tambe and Lorin M. Hitt, “Job Hopping, Information Technology Spillovers, and Productivity Growth,” Management Science 60, 2 (2013): 338–355. 19.

pages: 229 words: 68,426

Everyware: The Dawning Age of Ubiquitous Computing
by Adam Greenfield
Published 14 Sep 2006

It may well be that a full mug on my desk implies that I am also in the room, but this is not always going to be the case, and any system that correlates the two facts had better do so pretty loosely. Products and services based on such pattern-recognition already exist in the world—I think of Amazon's "collaborative filtering"–driven recommendation engine—but for the most part, their designers are only now beginning to recognize that they have significantly underestimated the difficulty of deriving meaning from those patterns. The better part of my Amazon recommendations turn out to be utterly worthless—and of all commercial pattern-recognition systems, that's among those with the largest pools of data to draw on.

pages: 233 words: 67,596

Competing on Analytics: The New Science of Winning
by Thomas H. Davenport and Jeanne G. Harris
Published 6 Mar 2007

Customers watch their cinematic choices at their leisure; there are no late fees. When the DVDs are returned, customers select their next films. Besides the logistical expertise that Netflix needs to make this a profitable venture, Netflix employs analytics in two important ways, both driven by customer behavior and buying patterns. The first is a movie-recommendation “engine” called Cinematch that’s based on proprietary, algorithmically driven software. Netflix hired mathematicians with programming experience to write the algorithms and code to define clusters of movies, connect customer movie rankings to the clusters, evaluate thousands of ratings per second, and factor in current Web site behavior—all to ensure a personalized Web page for each visiting customer.
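
Cinematch itself is proprietary, but the general idea of defining clusters of movies and connecting customer rankings to those clusters can be sketched with an invented ratings matrix and scikit-learn's k-means standing in for the real algorithms:

```python
# Cluster-then-recommend sketch: group movies by their rating profiles, then
# suggest unseen titles from the cluster a customer rates most highly.
import numpy as np
from sklearn.cluster import KMeans

movies = ["Alien", "Aliens", "Heat", "Ronin", "Up", "Big"]
# Rows = movies, columns = ratings from five hypothetical customers (0 = unrated).
ratings = np.array([
    [5, 4, 0, 1, 0],
    [5, 5, 1, 0, 0],
    [1, 0, 5, 4, 2],
    [0, 1, 4, 5, 1],
    [0, 2, 1, 0, 5],
    [1, 0, 2, 1, 4],
], dtype=float)

clusters = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(ratings)

def recommend_for(customer, seen):
    # Score each cluster by the customer's average rating of its movies,
    # then suggest unseen titles from the best-scoring cluster.
    best = max(range(3), key=lambda c: np.mean(
        [ratings[i, customer] for i in range(len(movies))
         if clusters[i] == c and ratings[i, customer] > 0] or [0]))
    return [m for i, m in enumerate(movies) if clusters[i] == best and m not in seen]

print(recommend_for(0, seen={"Alien"}))  # likely the other titles clustered with Alien
```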

pages: 247 words: 69,593

The Creative Curve: How to Develop the Right Idea, at the Right Time
by Allen Gannett
Published 11 Jun 2018

Rather than doing his homework during his quiet shifts, Ted made a pact with himself that he would watch every single movie in the store. He wanted to learn everything he possibly could about films, and finally he had the best possible resource—a well-stocked video store—at his disposal. A few months later, after watching nearly every movie on the store shelves, Ted had morphed into a human recommendation engine. If you were a customer who liked Woody Allen films, Ted would suggest you try the movies of Albert Brooks, announcing that “what Woody Allen is to New York, Albert Brooks is to L.A.” Like a particular action movie? Ted had three other movie suggestions that would keep your blood flowing in just the same way.

pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr
by Doug Turnbull and John Berryman
Published 30 Apr 2016

But they often provide better results, because they employ a more holistic understanding of item-user relationships. To dive deeper into recommendation systems, we recommend Practical Recommender Systems by Kim Falk (Manning, 2016). And no matter the method you choose, keep in mind that the end result is a model that lets you quickly find the item-to-item or user-to-item affinities. This understanding is important as we explain how collaborative filtering results can be used in the context of search.

11.2.3. Tying user behavior information back to the search index

In the previous section, we demonstrated how to build a simple recommendation system. But we’re supposed to be talking about personalized search!
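
One common way to tie those affinities back to the index, in the spirit of this chapter though not necessarily its exact listing, is to store the users with high affinity for each item on the item's document and boost on that field at query time. The field names, index layout, and ids below are placeholders:

```python
# Folding collaborative-filtering output into search: each item document carries
# an "affinity_users" field computed offline; at query time a should clause
# boosts items tied to the current user without excluding anything else.
item_doc = {
    "title": "Star Trek II: The Wrath of Khan",
    "genres": ["sci-fi"],
    "affinity_users": ["u17", "u42", "u99"],  # produced by an offline collaborative-filtering job
}

personalized_query = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "star trek"}}],      # the user's actual search
            "should": [{"term": {"affinity_users": "u42"}}],  # personalization boost for user u42
        }
    }
}
# Sent to an Elasticsearch-style endpoint, the should clause lifts items the
# behavior data associates with this user while leaving recall untouched.
```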

In both cases, we start with relatively simple methods and then outline more sophisticated approaches using machine learning. In the process of laying out personalized search, we introduce recommendations. You can provide users with personalized content recommendations even before they’ve made a search. In addition, you’ll see that a search engine can be a powerful platform for building a recommendation system. Figure 11.1 shows recommendations side-by-side with search, implemented by a relevance engineer.

Figure 11.1. By incorporating knowledge about the content and the user, search can be extended to tasks such as personalized search and recommendations.

11.1. Personalizing search based on user profiles

Until now, we’ve defined relevance in terms of how well a search result matches a user’s immediate information need.

Particularly engaged users might even be willing to directly tell us about their interests.

Item information—To make good recommendations, it’s important to be familiar with the items in the catalog. At a minimum, the items need to have useful textual content to match on. Items also need good metadata for boosting and filtering. In more advanced recommendation systems, you should also take advantage of the overall user behavior that gives you new information about how items in the catalog are interrelated.

Recommendation context—To provide users with the best recommendations possible, you must consider their current context. Are they looking at an item details page?

pages: 265 words: 74,000

The Numerati
by Stephen Baker
Published 11 Aug 2008

It will be up to doctors and nurses to follow up, figuring out why someone is limping or swaying differently at the kitchen sink. But in time, these systems will have enough feedback from thousands of users that they should be able to point people—either doctors or patients—to the most probable cause. In this way, they will work like the recommendation engines on Netflix or Amazon.com, which point people toward books or movies that are popular among customers with similar patterns. (Amazon and Netflix, of course, don't always get it right, and neither will the analysis issuing from the magic carpet. It will only point caregivers toward statistically probable causes.)

pages: 252 words: 72,473

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
by Cathy O'Neil
Published 5 Sep 2016

Well, an internal data scientist might say, no statistical system can be perfect. Those folks are collateral damage. And often, like Sarah Wysocki, they are deemed unworthy and expendable. Forget about them for a minute, they might say, and focus on all the people who get helpful suggestions from recommendation engines or who find music they love on Pandora, the ideal job on LinkedIn, or perhaps the love of their life on Match.​com. Think of the astounding scale, and ignore the imperfections. Big Data has plenty of evangelists, but I’m not one of them. This book will focus sharply in the other direction, on the damage inflicted by WMDs and the injustice they perpetuate.

pages: 231 words: 71,248

Shipping Greatness
by Chris Vander Mey
Published 23 Aug 2012

Using IMDb’s unique collection of movie data and Amazon’s ability to distribute digital content and proven personalization tools, we will uniquely solve the content discovery problem by integrating these technologies and building unique suggestion algorithms. Unlike competitors such as Netflix, who already have a recommendations engine, we’ll integrate across all video sources and use our richer data to provide more interesting in-viewing experiences and more accurate recommendations. We will deliver these in-viewing experiences through platforms that can expose contextually relevant data (e.g., the cast of a YouTube video), such as a browser plug-in for YouTube and mobile applications for phones.

pages: 260 words: 76,223

Ctrl Alt Delete: Reboot Your Business. Reboot Your Life. Your Future Depends on It.
by Mitch Joel
Published 20 May 2013

THE REALITY OF CAREER CHOICES IN A CTRL ALT DELETE WORLD. You can contrast the fictional story above with the tale of a friend of mine. This individual was never really sure what she wanted to do. There was no clear desire or talent in a single area of interest. In her final years of high school, a guidance counselor recommended engineering or the sciences because she had above-average math grades. So my friend studied engineering through university and squeaked by. Never passionate about it, she got her diploma and entered the workforce. I had lunch with her a while back and she confessed that she was miserable because of her work but could not figure out why.

pages: 326 words: 74,433

Do More Faster: TechStars Lessons to Accelerate Your Startup
by Brad Feld and David Cohen
Published 18 Oct 2010

—usingmiles.com
TutuorialTab (2010)—lets companies make their web site more learnable.—tutorialtab.com
Usermojo (2010)—is an emotion analytics platform that tells you why users do what they do.—usermojo.com
Vanilla (2009)—is open source forum software.—vanillaforums.com
Villij (2007)—is a recommendation engine for people.—villij.com
Vacation Rental Partner (2010)—makes it easy to generate revenue from a second home. We offer tools that eliminate the need for traditional property management companies.—vacationrentalpartner.com
TechStars companies funded after publication are listed on the TechStars web site.

pages: 229 words: 72,431

Shadow Work: The Unpaid, Unseen Jobs That Fill Your Day
by Craig Lambert
Published 30 Apr 2015

To make online purchases, customers open accounts with bookstores, banks, newspapers, utilities, sports teams, apparel vendors, phone service providers, and so on. Everyone wants you to open an account. This means supplying contact and demographic data and then having all transactions tracked, building a personal profile for the vendor. That profile enables vendors to activate “recommendation engines.” Once its algorithms have examined your past purchases, Amazon can recommend books or desk lamps you might like, and Netflix can suggest movies to rent. On my computer, opening Amazon.com brings up thumbnails of books by Bill Bryson, an author whose works I have purchased, and books on pharmaceutical companies, a topic I’ve browsed.

pages: 284 words: 75,744

Death Glitch: How Techno-Solutionism Fails Us in This Life and Beyond
by Tamara Kneese
Published 14 Aug 2023

Social networking services for the dead are emblematic of a cultural fantasy regarding disembodied information and its capacity for thwarting physical decay.7 With data-based selves, habitual, consumer-based, and affective patterns constitute a speculative form of currency and capture.8 Through harvesting data from a variety of sources, it is possible to predict dead individuals’ responses to conversational prompts or, employing resources like Amazon’s recommendation engine, what dead individuals would purchase if they were still alive. For the most part, companies do not go so far as to claim that these captured patterns or glitchy avatars are the same exact things as the people they represent, but they are still of social value. Perhaps in a world where many transactions and interactions happen through awkward interfaces—virtual assistants on banking and travel websites, app-based healthcare, iPad ordering systems, and the on-demand economy—data doubles are close enough.

pages: 223 words: 71,414

Abolish Silicon Valley: How to Liberate Technology From Capitalism
by Wendy Liu
Published 22 Mar 2020

A softer version of this, which could be useful as a transitional step, could be to mandate an open API for companies in crowded spaces, so that their intellectual property essentially becomes commoditised — something that is already happening with legislation targeting scooter apps in Washington, DC.14 Companies that choose not to make their code publicly viewable and contestable should be held liable for any harm caused, even indirectly. In the US, an amendment to Section 230 of the 1996 Communications Decency Act could make platforms featuring user-generated content accountable for any content explicitly promoted in a recommendation engine.15 This would be most relevant for YouTube recommendations, which have come under fire for being gateways for extremist content.16 The equivalent of the Freedom of Information Act (FOIA) for private companies above a certain size would also be useful, so that internal decisions about product or corporate strategy are archived and disclosed upon request.

pages: 499 words: 144,278

Coders: The Making of a New Tribe and the Remaking of the World
by Clive Thompson
Published 26 Mar 2019

At Columbia University, the researcher Jonathan Albright experimentally searched on YouTube for the phrase “crisis actors,” in the wake of a major school shooting, and took the “next up” recommendation from the recommendation system. He quickly amassed 9,000 videos, a large percentage of which seemed custom designed to shock, inflame, or mislead, ranging from “rape game jokes, shock reality social experiments, celebrity pedophilia, ‘false flag’ rants, and terror-related conspiracy theories,” as he wrote. Some of it, he figured, was driven by sheer profit motive: Post outrageous nonsense, get into the recommendation system, and reap the profit from the clicks. Recommender systems, in other words, may have a bias toward “inflammatory content,” as Tufekci notes.

They needed automation, an algorithm that would pick only posts you’d most likely find interesting. How does Facebook figure that out? It’s hard to know for sure. Social networks do not discuss their ranking systems with much detail, to prevent people from gaming their algorithms; spammers constantly try to suss out how recommendation systems work so they can produce spammy material that will get upranked. So few outside the firms truly know. But generally, the algorithms uprank the type of content you’d expect: posts and photos and videos that have amassed tons of likes or “faves” or attracted many comments, reposts, and retweets, with a particular bias toward recent activity.
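
A minimal, purely hypothetical sketch of that general recipe (engagement signals weighted and discounted by recency); the weights, the half-life, and the post data below are invented for illustration and are not Facebook's actual formula.

```python
# Hypothetical engagement-plus-recency score of the general kind described
# above; the weights and decay constants are invented, not any platform's formula.
import math

def feed_score(likes, comments, reshares, hours_old, half_life_hours=24):
    engagement = 1.0 * likes + 4.0 * comments + 8.0 * reshares   # heavier weight on effortful actions
    recency = math.exp(-math.log(2) * hours_old / half_life_hours)  # halves every 24 hours
    return engagement * recency

posts = [("puppy photo", 900, 40, 10, 2), ("policy memo", 1200, 15, 5, 60)]
ranked = sorted(posts, key=lambda p: feed_score(*p[1:]), reverse=True)
print([title for title, *_ in ranked])  # recent, high-engagement post ranks first
```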

Recommender systems, in other words, may have a bias toward “inflammatory content,” as Tufekci notes. Another academic, Renée DiResta, found the same problem with Facebook’s recommendation system for its “Groups.” People who read posts about vaccines were urged to join anti-vaccination groups, and thence to groups devoted to even more unhinged conspiracies like “chemtrails.” The recommendations, DiResta concluded, were “essentially creating this vortex in which conspiratorial ideas can just breed and multiply.” Certainly, big-tech firms keep quiet about how their systems work, for fear of being gamed. But since they seem to self-evidently favor high emotionality, it makes them pretty easy to manipulate, as Siva Vaidhyanathan, a media scholar and author of Antisocial Media, notes.

pages: 252 words: 74,167

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future
by Luke Dormehl
Published 10 Aug 2016

Much like laws continue to be followed after lawmakers have passed away, the idea of an expert system is that we ought to be able to continue drawing on an expert’s knowledge about a specialist subject after the person is no longer available to us. The concept failed, but the intention (and, for a while, the funding) was absolutely there. In some senses, the modern parallel of the expert system is the so-called ‘recommender system’. This subclass of information filtering system sets out to anticipate and predict what rating or selection a user is likely to give an item in a specific narrow domain. Everyone reading this will likely have come across the feature on Amazon or Netflix which suggests that, ‘You liked X, so you may also enjoy Y.’

‘Eventually we will entirely replace our brains using nanotechnology,’ he wrote. ‘Once delivered from the limitations of biology, we will be able to decide the length of our lives – with the option of immortality – and choose among other, unimagined capabilities as well.’ The Connectome: A complex recommender system ‘mindfile’ of the sort described by Marius Ursache and William Sims Bainbridge may go some way towards replicating us in software form. However, the only truly faithful means of making sure that a person is reconstructed in a form other than their original one would be to duplicate all of the cellular pathways in the brain – neuron by painstaking neuron.

[Back-of-book index excerpt; the relevant entry is ‘recommender system’ 198, among entries on machine learning, neural networks, Netflix, and virtual assistants.]

pages: 288 words: 86,995

Rule of the Robots: How Artificial Intelligence Will Transform Everything
by Martin Ford
Published 13 Sep 2021

As always, competition between the cloud providers is a powerful driver of innovation, and Amazon’s deep learning tools for the AWS platform are likewise becoming easier to use. Along with the development tools, all the cloud services offer pre-built deep learning components that are ready to be used out of the box and incorporated into applications. Amazon, for example, offers packages for speech recognition and natural language processing and a “recommendation engine” that can make suggestions in the same way that online shoppers or movie watchers are shown alternatives that are likely to be of interest.16 The most controversial example of this kind of prepackaged capability is AWS’s Rekognition service, which makes it easy for developers to deploy facial recognition technology.

pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy
by Sharon Bertsch McGrayne
Published 16 May 2011

In draft. Quatse JT, Najmi A. (2007) Empirical Bayesian targeting. Proceedings, 2007 World Congress in Computer Science, Computer Engineering, and Applied Computing, June 25–28, 2007. Schafer JB, Konstan J, Riedl J. (1999) Recommender systems in E-commerce. In ACM Conference on Electronic Commerce (EC-99) 158–66. Schafer JB, Konstan J, Riedl J. (2001) Recommender systems in E-commerce. Data Mining and Knowledge Discovery (5) 115–53. Schneider, Stephen H. (2005) The Patient from Hell. Perseus Books. Spolsky, Joel. (2005) (http://www.joelonsoftware.com/items/2005/10/17.html). Swinburne, Richard, ed. (2002) Bayes’s Theorem.

This use of Bayesian optimal classifiers is similar to the technique used by Frederick Mosteller and David Wallace to determine who wrote certain Federalist papers. Bayesian theory is firmly embedded in Microsoft’s Windows operating system. In addition, a variety of Bayesian techniques are involved in Microsoft’s handwriting recognition; recommender systems; the question-answering box in the upper right corner of a PC’s monitor screen; a datamining software package for tracking business sales; a program that infers the applications that users will want and preloads them before they are requested; and software to make traffic jam predictions for drivers to check before their commute.

The updating used in machine learning does not necessarily follow Bayes’ theorem formally but “shares its perspective.” A $1-million contest sponsored by Netflix.com illustrates the prominent role of Bayesian concepts in modern e-commerce and learning theory. In 2006 the online film-rental company launched a search for the best recommender system to improve its own algorithm. More than 50,000 contestants from 186 countries vied over the four years of the competition. The AT&T Labs team organized around Yehuda Koren, Christopher T. Volinsky, and Robert M. Bell won the prize in September 2009. Interestingly, although no contestants questioned Bayes as a legitimate method, almost none wrote a formal Bayesian model.

pages: 619 words: 177,548

Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity
by Daron Acemoglu and Simon Johnson
Published 15 May 2023

The World Wide Web, for instance, has become as much a platform for digital advertisement and propagation of misinformation as a source of useful information for people. Recommendation systems are often used for steering customers to specific products, depending on the platform’s financial incentives. Digital tools can provide information to managers not just for better decision making but also for the better monitoring of workers. Some of the AI-powered recommendation systems have incorporated and reintensified existing biases—for example, toward race in hiring or toward race in the justice system. Platforms for ride sharing and delivery have imposed exploitative arrangements on workers lacking protection or job security.

AI is the name given to the branch of computer science that develops “intelligent” machines, meaning machines and algorithms (instructions for solving problems) capable of exhibiting high-level capabilities. Modern intelligent machines perform tasks that many would have thought impossible a couple of decades ago. Examples include face-recognition software, search engines that guess what you want to find, and recommendation systems that match you to the products that you are most likely to enjoy or, at the very least, purchase. Many systems now use some form of natural-language processing to interface between human speech or written enquiries and computers. Apple’s Siri and Google’s search engine are examples of AI-based systems that are used widely around the world every day.

As the naked-streets experiment emphasized, driving in busy cities requires a tremendous amount of situational intelligence to adapt to changing circumstances, and even more social intelligence to respond to cues from other drivers and pedestrians. General AI Illusion The apogee of the current AI approach inspired by Turing’s ideas is the quest for general, human-level intelligence. Despite tremendous advances such as GPT-3 and recommendation systems, the current approach to AI is unlikely to soon crack human intelligence or even achieve very high levels of productivity in many of the decision-making tasks humans engage in. Tasks that involve social and situational aspects of human cognition will continue to pose formidable challenges for machine intelligence.

pages: 319 words: 89,477

The Power of Pull: How Small Moves, Smartly Made, Can Set Big Things in Motion
by John Hagel Iii and John Seely Brown
Published 12 Apr 2010

Blurring Creation and Use Pull platforms tend to allow us to perform the following activities, with a blurring of the boundaries between creation and use: • Find. Pull platforms allow us to find not just raw materials, products, and services, but also people with relevant skills and experience. Some of the tools and services that pull platforms use to help participants find relevant resources include search, recommendation engines, directories, agents, and reputation services. • Connect. Again, pull platforms connect us not just to raw materials, products, and services, but also to people with relevant skills and experiences. Performance fabrics5 are particularly helpful in establishing appropriate connections. The mobile Internet is dramatically extending our ability to connect wherever we are

pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
by Eric Redmond , Jim Wilson and Jim R. Wilson
Published 7 May 2012

Unlike other database styles that group collections of like objects into common buckets, graph databases are more free-form—queries consist of following edges shared by two nodes or, namely, traversing nodes. As more projects use them, graph databases are growing beyond the straightforward social examples to occupy more nuanced use cases, such as recommendation engines, access control lists, and geographic data. Good For: Graph databases seem to be tailor-made for networking applications. The prototypical example is a social network, where nodes represent users who have various kinds of relationships to each other. Modeling this kind of data using any of the other styles is often a tough fit, but a graph database would accept it with relish.
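
A toy sketch of the traversal idea in plain Python; a real graph database such as Neo4j would express this as a graph query, but the friend-of-a-friend hop is the same. The names and adjacency data are invented.

```python
# Graph-traversal recommendation sketch: suggest people your friends know
# but you don't, by walking one hop out over an adjacency list.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

def recommend(person):
    candidates = {}
    for friend in friends[person]:
        for fof in friends[friend]:
            if fof != person and fof not in friends[person]:
                candidates[fof] = candidates.get(fof, 0) + 1  # count shared friends
    return sorted(candidates, key=candidates.get, reverse=True)

print(recommend("alice"))  # ['dave', 'erin']: dave shares two friends with alice
```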

pages: 389 words: 87,758

No Ordinary Disruption: The Four Global Forces Breaking All the Trends
by Richard Dobbs and James Manyika
Published 12 May 2015

To illustrate the scale of the opportunity, consider this change: on July 31, 2013, the US Bureau of Economic Analysis released GDP figures that for the first time categorized research and development and software into a new category of “intellectual property products.” We estimate that digital capital is now the source of roughly one-third of total global GDP growth, with intangible assets (think of the value of Google’s search algorithm or Amazon’s recommendation engine) being the main driver.41 For businesses and governments alike, failing to navigate today’s technological tide will mean losing out on a huge economic opportunity as well as increasing vulnerability to potential disruptions. Digitization and technological advances can transform industries in the blink of an eye, as BlackBerry has learned.

pages: 339 words: 88,732

The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson and Andrew McAfee
Published 20 Jan 2014

When there are many small local markets, there can be a ‘best’ provider in each, and these local heroes frequently can all earn a good income. If these markets merge into a single global market, top performers have an opportunity to win more customers, while the next-best performers face harsher competition from all directions. A similar dynamic comes into play when technologies like Google or even Amazon’s recommendation engine reduce search costs. Suddenly second-rate producers can no longer count on consumer ignorance or geographic barriers to protect their margins. Digital technologies have aided the transition to winner-take-all markets, even for products we wouldn’t think would have superstar status. In a traditional camera store, cameras typically are not ranked number one versus number ten.

pages: 294 words: 96,661

The Fourth Age: Smart Robots, Conscious Computers, and the Future of Humanity
by Byron Reese
Published 23 Apr 2018

Reducing it down to ones and zeros is obviously possible, but equally obviously difficult for a device that can only manipulate abstract symbols in memory. One wrinkle with these sorts of perception problems is that we don’t have the training data to teach the robots. Amazon has a huge database of “people who bought this also bought that” with which to train its recommendation engine. But we don’t have all the tactile data of a million adults holding a million babies in a thousand situations. We could certainly collect the data by making a version of those CGI suits that people wear when making movies. Using upgraded sensors in the hands and fingers, we could get a thousand parents to wear them for a year to begin to collect that data.

pages: 336 words: 91,806

Code Dependent: Living in the Shadow of AI
by Madhumita Murgia
Published 20 Mar 2024

It had never occurred to her before she learned about this industry that simply looking at imagery of violence on a screen, sifting and examining it in detail for hours, could cause PTSD. But after her conversations, it seemed blindingly obvious. She observed, first-hand, that the most lucrative parts of Silicon Valley products – AI recommendation engines, such as Instagram and TikTok’s main feeds or X’s For You tab that grab our attention – are often built on the shoulders of the most vulnerable, including poor youth, women and migrant labourers whose right to live and work in a country is dependent on their job. Without the labour of outsourced content moderators, these feeds would be simply unusable, too poisonous for our society to consume as greedily as we do.

pages: 1,535 words: 337,071

Networks, Crowds, and Markets: Reasoning About a Highly Connected World
by David Easley and Jon Kleinberg
Published 15 Nov 2010

Ideas from the theory of voting have been adopted in a number of recent on-line applications [139]. Different Web search engines produce different rankings of results; a line of work on meta-search has developed tools for combining these rankings into a single aggregate ranking. Recommendation systems for books, music, and other items — such as Amazon’s product-recommendation system — have employed related ideas for aggregating preferences. In this case, a recommendation system determines a set of users whose past history indicates tastes similar to yours, and then uses voting methods to combine the preferences of these other users to produce a ranked list of recommendations (or a single best recommendation) for you.
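
A minimal sketch of that aggregation step, assuming the set of similar users has already been found; the Borda count used here is one standard positional voting method, and the toy rankings are invented.

```python
# Aggregate the ranked preferences of users with similar tastes into a single
# recommendation list using a Borda count (a positional voting method).
from collections import defaultdict

def borda_aggregate(rankings):
    """rankings: list of lists, each ordered from most to least preferred item."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position  # top item gets n points, last gets 1
    return sorted(scores, key=scores.get, reverse=True)  # highest total first

# Toy example: ranked preferences of three users judged similar to you
similar_users = [
    ["Dune", "Neuromancer", "Foundation"],
    ["Neuromancer", "Dune", "Snow Crash"],
    ["Dune", "Snow Crash", "Neuromancer"],
]
print(borda_aggregate(similar_users))  # ['Dune', 'Neuromancer', 'Snow Crash', 'Foundation']
```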

We discussed such systems in the context of structural balance in Chapter 5, and will see their role in providing information essential to the functioning of on-line markets in Chapter 22. Web 2.0 sites also make use of recommendation systems, to guide users toward items that they may not know about. In addition to serving as helpful features for a site’s users, such recommendation systems interact in complex but important ways with distributions of popularity and the long tail of niche content, as we will see in Chapter 18. The development of the current generation of Web search engines, led by Google, is sometimes seen as a crucial step in the pivot from the early days of the Web to the era of Web 2.0.

[Front-matter table of contents excerpt; the relevant entry is Section 18.6, “The Effect of Search Tools and Recommendation Systems,” in Chapter 18, “Power Laws and Rich-Get-Richer Phenomena.”]
Chapter 1, Overview: Over the past decade there has been a growing public fascination with the complex “connectedness” of modern society.

Beautiful Visualization
by Julie Steele
Published 20 Apr 2010

Preference Similarity A well-known measure of similarity used in many recommendation systems is cosine similarity. A practical introduction to this technique can be found in Linden, Smith, and York (2003). In the case of movies, intuitively, the measure indicates that two movies are similar if users who rated one highly rated the other highly or, conversely, users who rated one poorly rated the other poorly. We’ll use this similarity measure to generate similarity data for all 17,700 movies in the Netflix Prize dataset, then generate coordinates based on that data. If we were interested in building an actual movie recommender system, we might do so simply by recommending the movies that were similar to those a user had rated highly.
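
A small illustration of the cosine measure on made-up rating vectors (not the Netflix Prize data): two movies are similar when the same users rate them in the same direction.

```python
# Cosine similarity between two movies, each represented as a vector of
# ratings given by the same set of users (0 = not rated).
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Each position is one user's rating (1-5) for the movie
movie_x = [5, 4, 0, 1, 5]
movie_y = [4, 5, 1, 2, 4]
print(round(cosine_similarity(movie_x, movie_y), 3))  # close to 1 -> similar audiences
```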

If we were interested in building an actual movie recommender system, we might do so simply by recommending the movies that were similar to those a user had rated highly. However, the goal here is just to gain insight into the dynamics of such a recommender system. Labeling The YELLOWPAGES.COM visualization was easier to label than this Netflix Prize visualization for a number of reasons, including fewer nodes and shorter labels, but mostly because the nodes were more uniformly distributed. Although the Netflix Prize visualization has a large number of clusters, most of the movies are contained in only a small number of those clusters. This disparity is even more apparent when we look at only the movies with the most ratings.

pages: 201 words: 21,180

Designing for the Social Web
by Joshua Porter
Published 18 May 2008

Del.icio.us simply counts the number of bookmarks that people have saved in the last x hours and orders them from most popular to least popular, displaying as a “most popular” list of bookmarks that people have saved recently. Participant ranking: The Digg Top Diggers page was a ranking system that took into account measures of desired behavior to come up with an overall rank for each Digger. Collaborative filtering: Netflix’s recommendation system relies on collaborative filtering to display recommended movies based on your previous ratings. Relevance: Services like Google rely on a complex algorithm to determine what to display. Figuring out which content is relevant is a big deal to Google—it’s the core value of the entire service.
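
A toy sketch of the first mechanism described above, counting how many people saved each bookmark in the last x hours and ordering by that count; the data and cutoff are invented.

```python
# "Most popular" list: count saves per bookmark within the last x hours,
# then order from most saved to least saved.
from collections import Counter
from datetime import datetime, timedelta

def most_popular(saves, hours=24, now=None):
    """saves: list of (url, saved_at) tuples."""
    now = now or datetime.now()
    cutoff = now - timedelta(hours=hours)
    counts = Counter(url for url, saved_at in saves if saved_at >= cutoff)
    return counts.most_common()  # [(url, save_count), ...], most popular first

now = datetime.now()
saves = [
    ("example.com/a", now - timedelta(hours=2)),
    ("example.com/a", now - timedelta(hours=5)),
    ("example.com/b", now - timedelta(hours=30)),  # outside the window, ignored
]
print(most_popular(saves))  # [('example.com/a', 2)]
```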

[Back-of-book index excerpt; the relevant entries are Netflix’s recommendation system, 136, and collaborative filtering of ratings, 136.]

pages: 660 words: 141,595

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking
by Foster Provost and Tom Fawcett
Published 30 Jun 2013

Definitions of data scientists (and advertisements for positions) specify not just areas of expertise but also specific programming languages and tools. It is common to see job advertisements mentioning data mining techniques (e.g., random forests, support vector machines), specific application areas (recommendation systems, ad placement optimization), alongside popular software tools for processing big data (Hadoop, MongoDB). There is often little distinction between the science and the technology for dealing with large datasets. We must point out that data science, like computer science, is a young field.

For example, analyzing purchase records from a supermarket may uncover that ground meat is purchased together with hot sauce much more frequently than we might expect. Deciding how to act upon this discovery might require some creativity, but it could suggest a special promotion, product display, or combination offer. Co-occurrence of products in purchases is a common type of grouping known as market-basket analysis. Some recommendation systems also perform a type of affinity grouping by finding, for example, pairs of books that are purchased frequently by the same people (“people who bought X also bought Y”). The result of co-occurrence grouping is a description of items that occur together. These descriptions usually include statistics on the frequency of the co-occurrence and an estimate of how surprising it is.
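
A small sketch of co-occurrence grouping on invented baskets; the lift statistic shown is one common way to quantify how surprising a pairing is, though the book does not prescribe a particular measure here.

```python
# Co-occurrence grouping (market-basket style): count how often two items
# appear in the same basket and compare against what independence would predict.
from itertools import combinations
from collections import Counter

baskets = [
    {"ground meat", "hot sauce"},
    {"ground meat", "hot sauce", "buns"},
    {"ground meat", "hot sauce"},
    {"ground meat", "buns"},
    {"ketchup", "chips"},
    {"milk", "bread"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(frozenset(p) for basket in baskets
                      for p in combinations(sorted(basket), 2))

n = len(baskets)
pair = frozenset({"ground meat", "hot sauce"})
support = pair_counts[pair] / n                              # how frequent the pair is
expected = (item_counts["ground meat"] / n) * (item_counts["hot sauce"] / n)
lift = support / expected                                    # >1: together more often than chance
print(f"support={support:.2f}, lift={lift:.2f}")             # support=0.50, lift=1.50
```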

If that’s what our algorithm is doing, we’re using the wrong algorithm. For regression problems we have a directly analogous baseline: predict the average value over the population (usually the mean or median). In some applications there are multiple simple averages that one may want to combine. For example, when evaluating recommender systems that internally predict how many “stars” a particular customer would give to a particular movie, we have the average number of stars a movie gets across the population (how well liked it is) and the average number of stars a particular customer gives to movies (what that customer’s overall bias is).
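
A minimal sketch of such a baseline, combining the overall mean with a per-customer bias and a per-movie bias; the ratings are invented, and real systems would regularize these averages.

```python
# Baseline star prediction: start from the overall mean rating, then adjust
# by how well liked the movie is and by the customer's personal rating bias.
from collections import defaultdict

ratings = [  # (user, movie, stars)
    ("ann", "Heat", 5), ("ann", "Up", 4),
    ("bob", "Heat", 3), ("bob", "Up", 2), ("bob", "Alien", 2),
    ("cam", "Alien", 5), ("cam", "Up", 4),
]

global_mean = sum(r for _, _, r in ratings) / len(ratings)

def mean_offset(key_index):
    sums, counts = defaultdict(float), defaultdict(int)
    for row in ratings:
        sums[row[key_index]] += row[2] - global_mean
        counts[row[key_index]] += 1
    return {k: sums[k] / counts[k] for k in sums}

user_bias = mean_offset(0)   # does this customer rate high or low overall?
movie_bias = mean_offset(1)  # is this movie liked more or less than average?

def predict(user, movie):
    return global_mean + user_bias.get(user, 0.0) + movie_bias.get(movie, 0.0)

print(round(predict("bob", "Alien"), 2))  # bob rates low, pulling the prediction below the mean
```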

pages: 321 words: 113,564

AI in Museums: Reflections, Perspectives and Applications
by Sonja Thiel and Johannes C. Bernhardt
Published 31 Dec 2023

The cases mentioned with regard to Australia, New Zealand, and Oceania are partially taken from the research conducted by Konstantina Hornek (master’s student, Expanded Museum Studies) in the context of the mapping. Isabel Hufschmidt, “Troubleshoot?”: In the context of optimized storytelling, for more than a decade now this has involved mobile apps and NLP-powered chatbots, partly equipped with recommender systems, such as at the Museum of Old and New Art in Tasmania, the Auckland Art Gallery, the Auckland War Memorial Museum, and the Museum of Australian Democracy in Canberra. With its NLP-powered interactive installation Dimensions in Testimony, launched in 2021, the Jewish Museum in Sydney has provided visitors with the possibility to converse with virtual twins of Holocaust survivors.

Experimental Space and the Datalab The project facilitated the opening of several experimental spaces—for example, a MuseumCamp at the Allard Pierson (2021) and a joint hackathon at the Badisches Landesmuseum16 provided a first stage for experiments with museum data and AI technologies. As a result, participants developed a chatbot prototype, a recommender system, an individualized AI guide, and even poetic digital identifiers. In a development sprint phase, three projects were invited to further develop their approach. This resulted in prototypes that helped shape the concept of the xCurator in 2021. In course of the Datalab activities, the participating developer Lukáš Pilka conducted clustering tests with the UMAP projection and Pixplot,17 which showed how a digital archive is represented in a visually different way with a high-dimensional graph visualization and how a finetuning process might work.

Several tests and at first promising ideas with external partners were thus not pursued further due to limitations as well as their alignment with the development. A linking to the physical space of the museums, for instance, was therefore not pursued further, and the integration of an AI-based recommender system in interaction with user data and also a system-overarching user data analysis and interconnected recommendations and audience segmentation also turned out to be complicated for several reasons. It consequently became evident that creating a structured and quality-controlled knowledge graph is an important basis for further work.

pages: 323 words: 95,939

Present Shock: When Everything Happens Now
by Douglas Rushkoff
Published 21 Mar 2013

That adds up to millions of unskilled, untrained, unpaid, unknown ‘journalists’—a thousandfold growth between 1996 and 2006—spewing their (mis)information out in the cyberworld.” More sanguine voices, such as City University of New York journalism professor and BuzzFeed blogger Jeff Jarvis, argue that the market—amplified by search results and recommendation engines—will eventually allow the better journalism to rise to the top of the pile. But even market mechanisms may have a hard time functioning as we consumers of all this media lose our ability to distinguish between facts, informed opinions, and wild assertions. Our impatient disgust with politics as usual combined with our newfound faith in our own gut sensibilities drives us to take matters into our own hands—in journalism and beyond.

pages: 326 words: 103,170

The Seventh Sense: Power, Fortune, and Survival in the Age of Networks
by Joshua Cooper Ramo
Published 16 May 2016

And, well, you had liked that film. This seemed magic, just the sort of data-meets-human question that showcased a machine learning and thinking. An honestly artificial intelligence. Maes hoped to design a computer that could predict what movies or music or books you or I might enjoy. (And, of course, buy.) A recommendation engine. We all know how sputtering our own suggestion motors can be. Think of that primitive analog exchange known as the First Date: Oh, you like Radiohead? Do you know Sigur Rós? Pause. Hate them. Can you really predict what albums or novels even your closest friend will enjoy? You might offer an occasional lucky suggestion.

Forward: Notes on the Future of Our Democracy
by Andrew Yang
Published 15 Nov 2021

On my social media platforms, the algorithms that determine which content I see are constantly suggesting social media posts to amplify; many of them express sentiments of outrage and hostility toward someone or something. I ignore most of them. Due to the insidious nature of these platforms’ recommendation engines, however, that’s hard to do. You might be watching something relatively benign on YouTube—for example, a news documentary about the 9/11 attacks. In the list of suggested links next to the video you’re watching, however, there is often something far more inflammatory, such as a video espousing conspiracy theories.

Mindf*ck: Cambridge Analytica and the Plot to Break America
by Christopher Wylie
Published 8 Oct 2019

Cambridge Analytica did this because of a specific feature of Facebook’s algorithm at the time. When someone follows pages of generic brands like Walmart or some prime-time sitcom, nothing much changes in his newsfeed. But liking an extreme group, such as the Proud Boys or the Incel Liberation Army, marks the user as distinct from others in such a way that a recommendation engine will prioritize these topics for personalization. Which means the site’s algorithm will start to funnel the user similar stories and pages—all to increase engagement. For Facebook, rising engagement is the only metric that matters, as more engagement means more screen time to be exposed to advertisements.

pages: 332 words: 100,245

Mine!: How the Hidden Rules of Ownership Control Our Lives
by Michael A. Heller and James Salzman
Published 2 Mar 2021

We quickly learn to tune out unpleasant ownership details—in part because the digital economy brings so much immediate gratification. There’s a reason streaming services are replacing home bookshelves. While some may be nostalgic for their wall of treasured CDs, many prefer the vast library and song-recommendation engine available with a click on Spotify—both old favorites and new discoveries. We also benefit as consumers because licensing the stick can be cheaper than owning the bundle. Companies can maximize revenue by offering us just what we want right that minute. We may feel we own more, but we really don’t.

System Error: Where Big Tech Went Wrong and How We Can Reboot
by Rob Reich , Mehran Sahami and Jeremy M. Weinstein
Published 6 Sep 2021

Facebook’s business model is to increase the time we spend on its platform and then sell access to our personalized profiles to advertisers and political operatives who seek to manipulate our behavior and dump the by-product of that manipulation onto our personal lives and democratic institutions. YouTube’s recommendation systems and default autoplay setting keep users watching videos on its platform while pushing people into echo chambers and feeding them more extreme content, thereby undermining our democracies, which rely on facts and trust. And Uber’s and Waymo’s push for automated vehicles may increase productivity but leave displaced and unemployed workers at the mercy of the government’s feeble social safety net.

It requires us to be explicit about the values we want to promote and how we trade off among them, because those values are encoded in some way into the objective functions that are optimized. Technology is also an amplifier because it can often enable the execution of a particular policy to reach a goal far more efficiently than a human can. It can power an autonomous vehicle to drive more safely than your neighbor does or be the basis of a recommendation system that keeps you watching online videos far longer than you intended. Even well-meaning policies can easily become objectionable when technology enables their hyper-efficient automation. With current GPS and mapping technology it would be possible to produce vehicles that would automatically issue a speeding ticket every time the driver exceeded the speed limit—and would eventually stop the car from moving and issue a warrant for the driver’s arrest when he or she had accumulated enough speeding tickets.

We admit that it’s a strange time to be mounting a defense of democracy and civic empowerment as the antidote to big tech’s current predicaments. The public’s faith in our governing institutions is at historic lows. Yet we must also remember that the distrust in democracy is partly a product of the rise of technologists. The recommendation systems and algorithmic curation of the private platforms that constitute the infrastructure of our digital public sphere have contributed to polarization and supercharged the spread of misinformation. And the tech industry has contributed to a winner-take-all economy, which has in turn widened wealth and income inequality, phenomena that social scientists have repeatedly demonstrated undermine confidence in democratic institutions.

pages: 265 words: 69,310

What's Yours Is Mine: Against the Sharing Economy
by Tom Slee
Published 18 Nov 2015

People in the Airbnb economy don’t have the option of trusting each other on the basis of institutional affiliations, so they do it on the basis of online signaling and peer evaluations.” 3 Sharing Economy companies are not the first to use ratings and algorithms to guide behavior. Their trust systems build on the rating and recommendation systems used by Amazon, Netflix, eBay, Yelp, TripAdvisor, iTunes, the App Store and many others. Each takes individual ratings as their input and transforms them into some form of recommendation. As rating systems have become ubiquitous their usefulness has become a matter of faith in the world of software development.

For Anderson, Amazon represents the return of variety and diversity after decades of homogenous blockbusters: “We are turning from a mass market back into a niche nation, defined not by geography but by interests.” 19 In a Long Tail world there is no need for formal gatekeepers who select or restrict the works that can find their public; instead, Web 2.0 platforms will do it for us using crowdsourced consumer reviews and recommender systems: “By combining infinite shelf space with real-time information about buying trends and public opinion . . . unlimited selection is revealing truths about what consumers want and how they want to get it.” 20 Amazon and Airbnb are similar in many ways. Both are, at least in part, software companies whose inventory is simply a set of entries in a database, accessed via a web site.

Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data
by Dipanjan Sarkar
Published 1 Dec 2016

Keyphrase extraction, also known as terminology extraction, is defined as the process or technique of extracting key important and relevant terms or phrases from a body of unstructured text such that the core topics or themes of the text document(s) are captured in these key phrases. This technique falls under the broad umbrella of information retrieval and extraction. Keyphrase extraction finds its uses in many areas, including the semantic web, query-based search engines and crawlers, recommendation systems, tagging systems, document similarity, and translation. Keyphrase extraction is often the starting point for carrying out more complex tasks in text analytics or NLP, and the output from this can itself act as features for more complex systems. There are various approaches for keyphrase extraction.

Web sites and pages contain further links embedded in them, which link to more pages with more links, and this continues across the Internet. This can be represented as a graph-based model where vertices indicate the web pages, and edges indicate links among them. This can be used to form a voting or recommendation system such that when one vertex links to another one in the graph, it is basically casting a vote. Vertex importance is decided not only on the number of votes or edges but also the importance of the vertices that are connected to it and their importance. This helps in determining the score or rank for each vertex or page.
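
The voting recursion can be sketched as a tiny PageRank-style power iteration over an invented link graph; TextRank applies the same idea to a graph of words or sentences. This is an illustrative sketch, not the book's code.

```python
# Graph-based "voting": a link from node u to node v is a vote for v, and the
# weight of that vote depends on u's own (recursively defined) importance.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for u, outgoing in links.items():
            if not outgoing:                      # dangling node: spread its rank evenly
                for n in nodes:
                    new_rank[n] += damping * rank[u] / len(nodes)
            else:
                for v in outgoing:
                    new_rank[v] += damping * rank[u] / len(outgoing)
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))  # C ranks highest
```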

This should be enough for you to get started with analyzing document similarity and clustering, and you can even start combining various techniques from the chapters covered so far. (Hint: Topic models with clustering, building classifiers by combining supervised and unsupervised learning, and augmenting recommendation systems using document clusters—just to name a few!) 7. Semantic and Sentiment Analysis: Natural language understanding has gained significant importance in the last decade with the advent of machine learning (ML) and further advances like deep learning and artificial intelligence.

pages: 416 words: 108,370

Hit Makers: The Science of Popularity in an Age of Distraction
by Derek Thompson
Published 7 Feb 2017

I recently visited Spotify, the large online streaming music company, to talk to Matt Ogle, the lead engineer on a new hit product called Discover Weekly, a personalized list of thirty songs delivered every Monday to tens of millions of users. For about a decade, Ogle had worked for several music companies to design the perfect music recommendation engine. His philosophy of music was that most people enjoy new songs, but they don’t enjoy the effort that it takes to find them. They want effortless, frictionless musical revelations, a series of achievable challenges. In the design of Discover Weekly, “every decision we made was shaped by the notion that this should feel like a friend giving you a mix tape,” he said.

pages: 364 words: 99,897

The Industries of the Future
by Alec Ross
Published 2 Feb 2016

Academics have likened it to both a microscope and telescope—a tool that allows us to both examine smaller details than could previously be observed and to see data at a larger scale, revealing correlations that were previously too distant for us to notice. The story of big data’s real-world impact to this point has been largely about logistics and persuasion. It has been great for supply chains, elections, and advertising because these tend to be fields with lots of small, repeated, and quantifiable actions—hence the “recommendation engines” used by Amazon and Netflix that help make more precise recommendations to customers. But these fields are just the beginning, and by the time my kids enter the workforce, big data won’t be a buzz phrase any longer. It will have permeated parts of our lives that we do not think of today as being rooted in analytics.

Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
by Aurelien Geron
Published 14 Aug 2019

In Chapter 8, we looked at the most common unsupervised learning task: dimensionality reduction. In this chapter, we will look at a few more unsupervised learning tasks and algorithms: Clustering: the goal is to group similar instances together into clusters. This is a great tool for data analysis, customer segmentation, recommender systems, search engines, image segmentation, semi-supervised learning, dimensionality reduction, and more. Anomaly detection: the objective is to learn what “normal” data looks like, and use this to detect abnormal instances, such as defective items on a production line or a new trend in a time series.

Figure: Classification (left) versus clustering (right). Clustering is used in a wide variety of applications, including: For customer segmentation: you can cluster your customers based on their purchases, their activity on your website, and so on. This is useful to understand who your customers are and what they need, so you can adapt your products and marketing campaigns to each segment. For example, this can be useful in recommender systems to suggest content that other users in the same cluster enjoyed. For data analysis: when analyzing a new dataset, it is often useful to first discover clusters of similar instances, as it is often easier to analyze clusters separately. As a dimensionality reduction technique: once a dataset has been clustered, it is usually possible to measure each instance’s affinity with each cluster (affinity is any measure of how well an instance fits into a cluster).
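
A minimal customer-segmentation sketch with Scikit-Learn's KMeans, using invented behavioral features; in a recommender setting, the resulting cluster labels could group customers with similar tastes.

```python
# Customer segmentation sketch: cluster customers by simple behavioral features,
# then use the segments (e.g., to recommend what others in the same cluster liked).
import numpy as np
from sklearn.cluster import KMeans

# Toy features per customer: [purchases per month, minutes on site per visit]
X = np.array([
    [2, 5], [3, 7], [2, 6],        # light users
    [10, 30], [12, 28], [11, 35],  # heavy users
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(kmeans.labels_)             # cluster assignment for each customer
print(kmeans.cluster_centers_)    # "typical" customer in each segment
print(kmeans.predict([[9, 25]]))  # which segment a new customer falls into
```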

pages: 421 words: 110,406

Platform Revolution: How Networked Markets Are Transforming the Economy--And How to Make Them Work for You
by Sangeet Paul Choudary , Marshall W. van Alstyne and Geoffrey G. Parker
Published 27 Mar 2016

Many firms—both platform businesses and others—track consumers’ web usage, financial interactions, magazine subscriptions, political and charitable contributions, and much more to create highly detailed individual profiles. In the aggregate, such data can be used for cross-marketing to people who share profiles, as when a recommendation engine on a shopping site tells you, “People like you who bought product A often enjoy product B, too!” The anonymity of this process renders it unobjectionable to most people. But the same underlying data can be, and is, sold to prospective employers, government agencies, health care providers, and marketers of all kinds.

pages: 412 words: 116,685

The Metaverse: And How It Will Revolutionize Everything
by Matthew Ball
Published 18 Jul 2022

See Google Altberg, Ebbe, 110 Amazon, xiv Amazon content via the Apple App Store, 184–85, 197 business model, 164 Fire OS, 213 Fire Phone, 143 gaming and, 178–79, 278, 281n investment in AR/VR hardware, 143, 277–78 market capitalization of, 166 positioning for the Metaverse, 274, 277–78 recommendation engine, 288 see also Bezos, Jeff Amazon Game Studios, 277 Amazon GameSparks, 107–8, 117 Amazon Go, 157 Amazon Lumberyard, 278 Amazon Luna, 96, 131, 277–78, 282 Amazon Music, 197, 277 Amazon Prime, 179, 185, 197, 277–78 Amazon Prime Video, 185, 277 Amazon Web Services (AWS), 84, 99, 277–78 AMC Entertainment, 28 American Cancer Society, 9 American Express, 172, 188 American Tower, 243, 244 America Online (AOL), 13, 15, 61, 130, 165, 273, 283 Andreessen Horowitz, 233 Android, 25, 61, 143, 212–14 Amazon Fire Phone, 143 backwards “pinch-to-zoom” concept, 149–50, 151 game development for, 131 gaming and, 32, 92, 133 Google Cardboard viewer for, 142 Google’s approach to, 184, 212–15, 275 progressive closure of, 213 Samsung’s approach to, 213 the 30% standard, 188, 190–91, 204–5 Animal Crossing: New Horizons, 30–32, 247 AOL Instant Messenger, 61 Apple Audio Interchange File Format (AIFF), 122 dominance of, 189 investment in AR/VR hardware, 143–44 lawsuit from Epic Games, 14n, 22–23, 32n, 134, 186, 284 lawsuit from the European Union, 184 market capitalization of, 166, 186–87 moral stance on pornography, 261 patents of, 143–44, 150 “There’s an app for that” ad campaign, 26, 150, 243 Apple App Store, 26, 132, 165, 309 categories of apps in, 183, 185–87 control over competing browsers, 194–95 control over payment rails, 201–4, 243–44 economics of, 186 as hindering the development of the Metaverse, 192–95, 197–99, 243–44, 309 policies on blockchain, crypto mining, and cryptocurrency trading apps, 200–201 the 30% standard, 120, 172–80, 183–84, 186–92, 197, 201, 203–4, 286 user identity and control, 299 Apple iOS, 60–61 Animoji, 159 “App Tracking Transparency” (ATT), 204–5 AssistiveTouch, 153 control over its NFC chip, 199–200 Face ID authentication system, 159 FaceTime, 65, 83 the home button, 148–49, 244 iCloud storage, 124, 200 “iPad Natives,” 13, 249 iPads, xi, 294 iPhones, 64, 131, 146, 242–44 Metal, 142, 175, 196 multitasking, 149, 244 Newton tablet, 145 “pinch-to-zoom” concept, 149–50, 151 Safari, 194–96, 209 Siri queries to Apple’s servers, 161 “slide-to-unlock” feature, 150–51 WebKit, 39, 194 Apple Music, 184, 197, 255 Apple News, 256 Apple Watch, 152, 161 application programming interfaces (APIs) authentication, 138 Discord APIs, 135 Instagram’s Twitter integration API, 287, 300 proprietary APIs and gaming consoles, 174–77, 287 in United States v.

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences
by Rob Kitchin
Published 25 Aug 2014

Discovering correlations between certain items led to new product placements and alterations to shelf space management and a 16 per cent increase in revenue per shopping cart in the first month’s trial. There was no hypothesis that Product A was often bought with Product H that was then tested. The data were simply queried to discover what relationships existed that might have previously been unnoticed. Similarly, Amazon’s recommendation system produces suggestions for other items a shopper might be interested in without knowing anything about the culture and conventions of books and reading; it simply identifies patterns of purchasing across customers in order to determine whether, if Person A likes Book X, they are also likely to like Book Y given their own and others’ consumption patterns.
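As a toy illustration of the pattern described above (if many customers who buy Book X also buy Book Y, suggest Y to buyers of X), here is a small co-occurrence-counting sketch in Python; it is not Amazon's actual algorithm, and the baskets are invented.

# Toy "people who bought X also bought Y" sketch based on co-purchase counts.
# Not Amazon's algorithm; the baskets below are invented for illustration.
from collections import defaultdict
from itertools import combinations

baskets = [
    {"Book X", "Book Y", "Book Z"},
    {"Book X", "Book Y"},
    {"Book Y", "Book W"},
    {"Book X", "Book W"},
]

co_counts = defaultdict(lambda: defaultdict(int))
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def also_bought(item, k=2):
    """Return the k items most often purchased together with `item`."""
    ranked = sorted(co_counts[item].items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

print(also_bought("Book X"))   # -> ['Book Y', 'Book Z'] for this toy data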

Popper (1979, cited in Callebaut 2012: 74) thus suggests that all science adopts a searchlight approach to scientific discovery, with the focus of light guided by previous findings, theories and training; by speculation that is grounded in experience and knowledge. The same is true for Amazon, Hunch, Ayasdi, and Google. How Amazon constructed its recommendation system was based on scientific reasoning, underpinned by a guiding model and accompanied by empirical testing designed to improve the performance of the algorithms it uses. Likewise, Google undertakes extensive research and development, it works in partnership with scientists and it buys scientific knowledge, either funding research within universities or by buying the IP of other companies, to refine and extend the utility of how it organises, presents and extracts value from data.

pages: 404 words: 92,713

The Art of Statistics: How to Learn From Data
by David Spiegelhalter
Published 2 Sep 2019

First, we are in an age of data science, in which large and complex data sets are collected from routine sources such as traffic monitors, social media posts and internet purchases, and used as a basis for technological innovations such as optimizing travel routes, targeted advertising or purchase recommendation systems—we shall look at algorithms based on ‘big data’ in Chapter 6. Statistical training is increasingly seen as just one necessary component of being a data scientist, together with skills in data management, programming and algorithm development, as well as proper knowledge of the subject matter.

A major problem is that these algorithms tend to be inscrutable black boxes—they come up with a prediction, but it is almost impossible to work out what is going on inside. This has three negative aspects. First, extreme complexity makes implementation and upgrading a great effort: when Netflix offered a $1m prize for prediction recommendation systems, the winner was so complicated that Netflix ended up not using it. The second negative feature is that we do not know how the conclusion was arrived at, or what confidence we should have in it: we just have to take it or leave it. Simpler algorithms can better explain themselves. Finally, if we do not know how an algorithm is producing its answer, we cannot investigate it for implicit but systematic biases against some members of the community—a point I expand on below

pages: 340 words: 94,464

Randomistas: How Radical Researchers Changed Our World
by Andrew Leigh
Published 14 Sep 2018

Landon ended up with just 8 of the 531 electoral college votes. 61Huizhi Xie & Juliette Aurisset, ‘Improving the sensitivity of online controlled experiments: Case studies at Netflix.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 645–54. ACM, 2016. 62Carlos A. Gomez-Uribe & Neil Hunt, ‘The Netflix recommender system: Algorithms, business value, and innovation’, ACM Transactions on Management Information Systems (TMIS), vol. 6, no. 4, 2016, p. 13. 63Gomez-Uribe & Hunt, ‘The Netflix recommender system’, p. 13. 64Adam D.I. Kramer, Jamie E. Guillory & Jeffrey T. Hancock, ‘Experimental evidence of massive-scale emotional contagion through social networks’, Proceedings of the National Academy of Sciences, vol. 3, no. 24, 2014, pp. 8788–90. 65Because 22.4 per cent of Facebook posts contained negative words, and 46.8 per cent contained negative words, the study also had two control groups: one of which randomly omitted 2.24 per cent of all posts, and another that randomly omitted 4.68 per cent of all posts. 66Oddly, some commentators seem unaware of the finding, continuing to make claims like ‘Facebook makes us feel inadequate, so we try to compete, putting a positive spin and a pretty filter on an ordinary moment – prompting someone else to do the same . . . when you sign up to Facebook you put yourself under pressure to appear popular, fun and loved, regardless of your reality’: Daisy Buchanan, ‘Facebook bragging’s route to divorce’, Australian Financial Review, 27 August 2016 67Kate Bullen & John Oates, ‘Facebook’s ‘experiment’ was socially irresponsible’, Guardian, 2 July 2014. 68Quoted in David Goldman, ‘Facebook still won’t say “sorry” for mind games experiment’, CNNMoney, 2 July 2014. 9 TESTING THEORIES IN POLITICS AND PHILANTHROPY 1Julian Jamison & Dean Karlan, ‘Candy elasticity: Halloween experiments on public political statements’, Economic Inquiry, vol. 54, no. 1, 2016, pp. 543–7. 2This experiment is outlined in detail in Dan Siroker, ‘How Obama raised $60 million by running a simple experiment’, Optimizely blog, 29 November 2010. 3Quoted in Brian Christian, ‘The A/B test: Inside the technology that’s changing the rules of business’, Wired, 25 April 2012. 4Alan S.

pages: 442 words: 94,734

The Art of Statistics: Learning From Data
by David Spiegelhalter
Published 14 Oct 2019

First, we are in an age of data science, in which large and complex data sets are collected from routine sources such as traffic monitors, social media posts and internet purchases, and used as a basis for technological innovations such as optimizing travel routes, targeted advertising or purchase recommendation systems – we shall look at algorithms based on ‘big data’ in Chapter 6. Statistical training is increasingly seen as just one necessary component of being a data scientist, together with skills in data management, programming and algorithm development, as well as proper knowledge of the subject matter.

A major problem is that these algorithms tend to be inscrutable black boxes – they come up with a prediction, but it is almost impossible to work out what is going on inside. This has three negative aspects. First, extreme complexity makes implementation and upgrading a great effort: when Netflix offered a $1m prize for prediction recommendation systems, the winner was so complicated that Netflix ended up not using it. The second negative feature is that we do not know how the conclusion was arrived at, or what confidence we should have in it: we just have to take it or leave it. Simpler algorithms can better explain themselves. Finally, if we do not know how an algorithm is producing its answer, we cannot investigate it for implicit but systematic biases against some members of the community – a point I expand on below.

pages: 480 words: 123,979

Dawn of the New Everything: Encounters With Reality and Virtual Reality
by Jaron Lanier
Published 21 Nov 2017

The company even offered a million-dollar prize for ideas to make the algorithm smarter. The thing about Netflix, though, is that it doesn’t offer a comprehensive catalog, especially of recent, hot releases. If you think of any particular movie, it might not be available for streaming. The recommendation engine is a magician’s misdirection, distracting you from the fact that not everything is available. So is the algorithm intelligent, or are people making themselves somewhat blind and silly in order to make the algorithm seem intelligent? What Netflix has done is admirable, because the whole point of Netflix is to deliver theatrical illusions to you.

pages: 382 words: 120,064

Bank 3.0: Why Banking Is No Longer Somewhere You Go but Something You Do
by Brett King
Published 26 Dec 2012

In Siri’s patent application, various possibilities are hinted at, including being a voice agent providing assistance for “automated teller machines”.4 In fact, SRI (the creator of Siri™) and BBVA recently announced a collaboration to introduce Lola5, a Siri-like technology, to customers through the Internet and via voice. Siri’s near-term capabilities include: 1. Being able to make simple online purchases, such as “Purchase Bank 3.0 from Amazon Kindle” 2. Serving as a recommendation engine or intelligent automated assistant—an “agent avatar”, as it has sometimes been labelled However, there are some challenges in having customers talk into their phones for customer support, or replacing an IVR system with technologies such as Lola, as a recent New York Times article pointed out when it called Siri “the latest public nuisance in the cell phone revolution”.

pages: 547 words: 173,909

Deep Utopia: Life and Meaning in a Solved World
by Nick Bostrom
Published 26 Mar 2024

Discussing actual or potential purchases with a friend, gaining and conveying information about personal tastes, sharing insights about who and what is in or out of fashion. Some of these aspects of the skill and effort of shopping are already being undercut by recommender systems and other functionalities that are becoming available thanks to progress in AI. Instead of the shopper having to visit many boutiques or having to browse up and down the aisles of a department store, they can visit a single online vendor. Offers predicted to be of greatest interest to the customer are brought to their attention. Let’s extrapolate this a bit. If the recommender system is sufficiently capable, it would remove the need for exploration entirely. The system would know your tastes and offer suggestions that you like better than whatever you would have picked out yourself.

Holding other things constant, it would seem that the experience streams of these two people are now equal in terms of the objective boringness of their watching experiences. But of course, instead of another person serving as the selector, we could use an inanimate mechanism to do this job. Squinting a little, one might view today’s streaming services and recommender systems as (very primitive forms of somewhat misaligned) boredom prostheses. In the ideal case, they keep us consuming a personalized content stream indefinitely—with suitable intermezzos in which we buy all the stuff that is pushed to us in the ads. The mechanism selects new content to preempt boredom, ensuring that we stay “engaged”.

pages: 575 words: 140,384

It's Not TV: The Spectacular Rise, Revolution, and Future of HBO
by Felix Gillette and John Koblin
Published 1 Nov 2022

The company was expanding into foreign markets, offering the service for the first time in Canada, Latin America, and the Caribbean. And its technology kept improving. Just the year before, in 2009, Netflix had handed out a $1-million award to an international team of machine-learning experts for developing an algorithm that was able to beat the accuracy of the company’s in-house recommendation engine, Cinematch, by 10 percent. Netflix’s brand name was growing synonymous with a new, better way of watching commercial-free Hollywood entertainment at home. Wall Street was smitten. Netflix’s stock price was shooting up, and media outlets were fawning over Netflix’s future. In the fall of 2010, Fortune magazine, which was owned by HBO’s parent company Time Warner, named Netflix’s CEO Hastings as its Businessperson of the Year.

pages: 540 words: 103,101

Building Microservices
by Sam Newman
Published 25 Dec 2014

Let’s imagine that initially we identify four contexts we think our monolithic backend covers: Catalog Everything to do with metadata about the items we offer for sale Finance Reporting for accounts, payments, refunds, etc. Warehouse Dispatching and returning of customer orders, managing inventory levels, etc. Recommendation Our patent-pending, revolutionary recommendation system, which is highly complex code written by a team with more PhDs than the average science lab The first thing to do is to create packages representing these contexts, and then move the existing code into them. With modern IDEs, code movement can be done automatically via refactorings, and can be done incrementally while we are doing other things.

Currently, all of this is handled by the finance-related code. If we split this service out, we can provide additional protections to this individual service in terms of monitoring, protection of data at transit, and protection of data at rest — ideas we’ll look at in more detail in Chapter 9. Technology The team looking after our recommendation system has been spiking out some new algorithms using a logic programming library in the language Clojure. The team thinks this could benefit our customers by improving what we offer them. If we could split out the recommendation code into a separate service, it would be easy to consider building an alternative implementation that we could test against.

The Deep Learning Revolution (The MIT Press)
by Terrence J. Sejnowski
Published 27 Sep 2018

As a consequence, there are fewer parameters to train on each epoch, and the resulting network has fewer dependencies between units than would be the case if the same large network were trained on every epoch. Dropout decreases the error rate in deep learning networks by 10 percent, which is a large improvement. In 2009, Netflix conducted an open competition, offering a prize of $1 million to the first person who could reduce the error of their recommender system by 10 percent.16 Almost every graduate student in machine learning entered the competition. Netflix probably inspired $10 million of research for the cost of the prize. And deep networks are now a core technology for online streaming.17 Intriguingly, cortical synapses drop out at a high rate.
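For readers who want to see what this looks like in code, here is a minimal Keras sketch of dropout as described above; the layer sizes, the 0.5 drop rate, and the 784-dimensional input are assumptions chosen only to make the example concrete.

# Minimal Keras sketch of dropout: each Dropout layer randomly silences units
# during training so the network cannot rely on fragile co-adaptations.
# Layer sizes, the 0.5 rate, and the 784-dimensional input are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),   # drop each unit with probability 0.5 at each training step
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()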

Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” Journal of Machine Learning Research 15 (2014):1929–1958. 16. “Netflix Prize,” Wikipedia, last modified, August 23, 2017, https://en.wikipedia .org/wiki/Netflix_Prize. 17. Carlos A. Gomez-Uribe, Neil Hunt, “The Netflix Recommender System: Algorithms,” ACM Transactions on Management Information Systems 6, no. 4 (2016) , article no. 13. 18. T. M. Bartol Jr., C. Bromer, J. Kinney, M. A. Chirillo, J. N. Bourne, K. M. Harris, and T. J. Sejnowski, “Nanoconnectomic Upper Bound on the Variability of Synaptic Plasticity,” eLife, 4:e10778, 2015, doi:10.7554/eLife.10778. 19.

pages: 451 words: 115,720

Green Tyranny: Exposing the Totalitarian Roots of the Climate Industrial Complex
by Rupert Darwall
Published 2 Oct 2017

Joachim Israel, an exceptionally fortunate German Jew who managed to settle in Sweden just as the door was closing, wrote in his memoir, The Nazis were quite willing to fulfil these demands and surely understood that the Swedish and Swiss requests were confirmation of the correctness of their anti-Semitic policies.21 On September 9, 1938, Sweden introduced the Gränsrekomendationssystemet (Border Recommendation System), a bureaucratic term designed to sanitize its intent, making it virtually impossible for Jewish citizens of the German Reich to enter Sweden. Israel writes in his memoir that the year he had the J stamped in his passport was the same year “in which the responsible Swedish minister, the brother of the prime minister, issued secret orders that Swedish border guards should turn away all unauthorized Jews who tried to cross the border and send them back to Germany.”22 Solutions to Sweden’s population crisis—the one ostensibly identified by the Myrdals four years earlier—were presenting themselves in growing numbers at its borders.

Stalin, Joseph Five Year Plans Stanford University State Institute for Racial Biology (Sweden) Staudenmaier, Peter Stern, Todd Steyer, Tom Stockholm Conference Stockholm Declaration Principle 21 Principle 22 Stockholm Environment Institute “Design to Win” (2007) Streicher, Julius Strong, Maurice Students for a Democratic Society (SDS) Port Huron Statement sulfur dioxide Sunday Times Sussman, Bob Sustainable Markets Foundation Supplemental Poverty Measure Svenska Dagbladet Svensson, Göte Svensson, Ulf Sweden 1812 Policy Border Recommendation System (1938) Defense Staff Environmental Protection Agency foreign aid program of Gothenburg Hårsfjärden incident (1982) National Environmental Board State Forestry Agency Stockholm U-137 Incident (1981) Uppsala Swedish Committee for Vietnam Swedish Energy Research Commission Swedish Radio Swiss Re Switzerland Basle Geneva Kaieraugust Zurich Syria Taylor, Kat TechNet Tesla, Nikola Thatcher, Margaret administration of Theutenberg, Bo Diaries from the Foreign Ministry Third Reich (1933–1945) Architects and Engineers Association Hitler Youth Nationalsozialistische Bibliotek public health policies of Reich Ministry of Economic Affairs Reich Ministry of Finance Reich Windpower Study Group symbolism of Thirty Years’ War (1618–1648) Thomas, David Thomas, Lewis Thompson, Starley Thoreau, Henry David Three Mile Island incident (1979) Threshold Foundation Thunberg, Anders Swedish International Secretary Tides Foundation founding of (1976) Time Tinbergen, Jan Tinbergen Rule Tocqueville, Alexis de Tolba, Mustafa visit to Stockholm (1982) Tooze, Adam Toronto Conference (1988) role of NGOs in Tretyakov, Sergei Trittin, Jürgen Trudeau, Pierre Trumka, Richard Trump, Donald TTAPS Twitter Tyndall, John Uganda Entebbe Ukraine Chernobyl Disaster (1986) Ulbricht, Walter Ulrich, Bernard Undén, Östen Union of Concerned Scientists formation of (1968) United Arab Emirates (UAE) Abu Dhabi United Kingdom (UK) Climate Change Act Cumbria Department for Energy and Climate Change Department for International Development Department for Transport domestic electricity prices in Drax power station electricity grid infrastructure in Foreign and Commonwealth Office London National Grid United Nations (UN) Charter of Economic Commission for Europe (UNECE) Educational, Scientific and Cultural Organization (UNESCO) Environment Programme (UNEP) Framework Convention on Climate Change (1992) General Assembly Geneva Convention on Long-Range Transboundary Air Pollution (1979) Global Impact Resolution (1998) Security Council United States Ahoskie, NC Air Quality Agreement (1991) American Clean Energy and Security Act (Waxman-Markey Bill) (2009) Buffalo, NY California Global Warming Solutions Act (2006) California Renewable Portfolio Standard Program Californian Coastal Commission Californian Energy Crisis (2001–2002) Central Intelligence Agency (CIA) Chicago, IL Clean Air Act (1970) Clean Power Plan (2015) Congress Constitution of Court of Appeals Declaration of Independence (1776) Department for Justice Department of Health Department of the Interior Energy Independence and Security Act (2007) Environmental Protection Agency (EPA) Federal Bureau of Investigation (FBI) Freedom of Information Act House of Representatives Ithaca, NY Los Angeles, CA National Acid Precipitation Assessment Program (NAPAP) natural gas and oil output of New Deal New York Office of Management and Budget Office of Science and Technology Policy oil reserves of Palo Alto, CA Pentagon Phoenix, AZ Proposition 23 (2010) 
Sacramento, CA Safe Drinking Water Act (2005) San Francisco, CA Santa Barbara oil spill (1969) Senate Senate Committee on Environment and Public Works (EPW) Senate Foreign Relations Committee Senate Judiciary Committee Silicon Valley State Department subprime mortgage crisis (2007–2009) Supreme Court Washington, D.C.

pages: 364 words: 119,398

Men Who Hate Women: From Incels to Pickup Artists, the Truth About Extreme Misogyny and How It Affects Us All
by Laura Bates
Published 2 Sep 2020

Chaslot told the Daily Beast he very quickly realised that ‘YouTube’s recommendation was putting people into filter bubbles… There was no way out.’ In a 2019 New York Times interview, YouTube’s chief product officer, Neal Mohan, denied that the platform created a ‘rabbit hole’ effect, saying that it offered a full spectrum of content and opinion, and that watch time was not the only feature used by the site’s recommendation systems. He acknowledged that the algorithm might queue up more extreme videos, but claimed it might also offer ‘other videos that skew in the opposite direction’.12 But that didn’t seem to be the case in my own experiments, or those of other writers who have documented this phenomenon. This doesn’t mean that YouTube is deliberately setting out to promote and support these extreme racist and misogynist viewpoints.

Just like the facilitation of manosphere radicalisation on the platform, the problem may have been completely unintentional, but the outcome was horrifying. What matters is that, once YouTube was alerted to the issue, it was given a clear solution. Researchers suggested that the platform simply turn off its recommendation system on videos of children. It was a change that could have been implemented automatically and with ease. And it would have stopped the exploitation in its tracks. But YouTube declined to put it into practice. Why? Because recommendations are its biggest traffic driver, it told the New York Times, so turning them off ‘would hurt “creators” who rely on those clicks’.

Four Battlegrounds
by Paul Scharre
Published 18 Jan 2023

YouTube’s algorithm for recommending videos to watch next has come under fire for promoting harmful content, from conspiracy theory videos to extremist content. YouTube executives have stated that over 70 percent of viewing hours are driven by the algorithm. Google engineers described the deep learning algorithm as “one of the largest-scale and most sophisticated industrial recommendation systems in existence.” Yet multiple independent researchers, journalists, and even a former Google engineer claimed in 2018 the algorithm was biased toward more extreme and incendiary content, leading viewers video-by-video down a “rabbit hole” of conspiracy theories and misinformation. Critics have speculated that the effect was not intentional, but rather that the algorithm was responding to increased viewer engagement with more sensational material in a “feedback loop” that trained the machine learning system to provide viewers with more inflammatory content.

The Complete Guide,” Hootsuite Blog, June 21, 2021, https://blog.hootsuite.com/how-the-youtube-algorithm-works/; Paige Cooper, “How the Facebook Algorithm Works in 2021 and How to Make It Work for You,” Hootsuite Blog, February 10, 2021, https://blog.hootsuite.com/facebook-algorithm/. 144more sophisticated algorithm: Eric Meyerson, “YouTube Now: Why We Focus on Watch Time,” YouTube Official Blog, August 10, 2012, https://blog.youtube/news-and-events/youtube-now-why-we-focus-on-watch-time. 144deep learning to improve their algorithms: Koumchatzky and Andryeyev, “Using Deep Learning at Scale in Twitter’s Timelines.” 1449.3 million problematic videos: “YouTube Community Guidelines Enforcement,” Google Transparency Report, June 2021, https://transparencyreport.google.com/youtube-policy/removals. 145algorithm for recommending videos to watch next: Paul Lewis, “‘Fiction Is Outperforming Reality’: How YouTube’s Algorithm Distorts Truth,” The Guardian, February 2, 2018, https://www.theguardian.com/technology/2018/feb/02/how-youtubes-algorithm-distorts-truth; Zeynep Tufekci, “YouTube, the Great Radicalizer,” New York Times, March 10, 2018, https://www.nytimes.com/2018/03/10/opinion/sunday/youtube-politics-radical.html; Sam Levin, “Las Vegas Survivors Furious as YouTube Promotes Clips Calling Shooting a Hoax,” The Guardian, October 4, 2017, https://www.theguardian.com/us-news/2017/oct/04/las-vegas-shooting-youtube-hoax-conspiracy-theories; Clive Thompson, “YouTube’s Plot to Silence Conspiracy Theories,” Wired, September 18, 2020, https://www.wired.com/story/youtube-algorithm-silence-conspiracy-theories/. 145over 70 percent of viewing hours are driven by the algorithm: Joan E. Solsman, “YouTube’s AI Is the Puppet Master over Most of What You Watch,” CNET, January 10, 2018, https://www.cnet.com/news/youtube-ces-2018-neal-mohan/. 145“one of the largest-scale and most sophisticated industrial recommendation systems”: Paul Covington, Jay Adams, and Emre Sargin, Deep Neural Networks for YouTube Recommendations (Google, 2016), https://research.google.com/pubs/archive/45530.pdf. 145even a former Google engineer: Lewis, “‘Fiction Is Outperforming Reality’”; Guillaume Chaslot, “The Toxic Potential of YouTube’s Feedback Loop,” Wired, July 13, 2019, https://www.wired.com/story/the-toxic-potential-of-youtubes-feedback-loop/. 145more extreme and incendiary content: Lewis, “‘Fiction Is Outperforming Reality’”; Tufekci, “YouTube, the Great Radicalizer”; Nicas, “How YouTube Drives People to the Internet’s Darkest Corners,” Wall Street Journal, February 7, 2018, https://www.wsj.com/articles/how-youtube-drives-viewers-to-the-internets-darkest-corners-1518020478. 145“rabbit hole” of conspiracy theories: Kevin Roose, “The Making of a YouTube Radical,” New York Times, June 8, 2019, https://www.nytimes.com/interactive/2019/06/08/technology/youtube-radical.html; Tufekci, “YouTube, the Great Radicalizer”; Max Fisher and Amanda Taub, “How YouTube Radicalized Brazil,” New York Times, August 11, 2019, https://www.nytimes.com/2019/08/11/world/americas/youtube-brazil.html; Thompson, “YouTube’s Plot to Silence Conspiracy Theories.” 145responding to increased viewer engagement: Chaslot, “The Toxic Potential of YouTube’s Feedback Loop.” 145denied that a “rabbit hole” effect exists: Kevin Roose, “YouTube’s Product Chief on Online Radicalization and Algorithmic Rabbit Holes,” New York Times, March 29, 2019, https://www.nytimes.com/2019/03/29/technology/youtube-online-extremism.html. 145opacity of machine learning algorithms: Chico Q.

in Proceedings of the 32nd International Conference on Machine Learning (Lille, France, 2015), https://arxiv.org/pdf/1804.07933.pdf; ilmoi, “Poisoning Attacks on Machine Learning,” towards data science, July 14, 2019, https://towardsdatascience.com/poisoning-attacks-on-machine-learning-1ff247c254db. 245recommendation algorithms: Hai Huang, “Data Poisoning Attacks to Deep Learning Based Recommender Systems,” (paper, Network and Distributed Systems Security (NDSS) Symposium 2021, February 21–25, 2021), https://arxiv.org/pdf/2101.02644.pdf. 245poison a medical AI model: Matthew Jagielski et al., Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning (arXiv.org, September 28, 2021), https://arxiv.org/pdf/1804.00308.pdf. 245manipulate real-world data: Ricky Laishram and Vir Virander Phoha, Curie: A method for protecting SVM Classifier from Poisoning Attack (arXiv.org, June 7, 2016), https://arxiv.org/pdf/1606.01584.pdf; Zhao et al., “Efficient Label Contamination Attacks.” 245data from external sources: Zhao et al., “Efficient Label Contamination Attacks.” 245alter the data or even just the label: Zhao et al., “Efficient Label Contamination Attacks.” 245insert adversarial noise into the training data: Adrien Chan-Hon-Tong, “An Algorithm for Generating Invisible Data Poisoning Using Adversarial Noise That Breaks Image Classification Deep Learning,” Machine Learning & Knowledge Extraction 1, no. 1 (November 9, 2018), https://doi.org/10.3390/make1010011; Ali Shafahi et al., “Poison Frogs!

pages: 169 words: 41,887

Literary Theory for Robots: How Computers Learned to Write
by Dennis Yi Tenen
Published 6 Feb 2024

This ingested printed matter contains the messiness of human wisdom and emotion—­the information and disinformation, fact and metaphor. While we were building railroads, fighting wars, and buying shoes online, the machine child went to school. Literary computers scribble everywhere now in the background, powering search engines, recommendations systems, and customer service chatbots. They flag offensive content on social networks and delete spam from our inboxes. At the hospital, they help convert patient-­doctor conversations into insurance billing codes. Sometimes, they alert law enforcement to potential terrorist plots and predict (poorly) the threat of violence on social media.

pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection
by Jacob Silverman
Published 17 Mar 2015

Even so, companies remain extraordinarily reliant on these reviews. A 2011 Harvard Business School study found that, on Yelp, “an extra star is worth an extra 5 to 9 percent in revenue.” The result of all this reviewing has been the atrophying of the critical culture, with professional critics seen as dispensable, nothing more than recommendation engines who can be replaced with algorithms and free, crowdsourced reviews. (Even so, some prominent cultural critics remain, though with less influence than they used to hold, and a smattering of publications, from the actuarially precise Consumer Reports to the liberal humanist New York Review of Books, continue to thrive.)

pages: 493 words: 139,845

Women Leaders at Work: Untold Tales of Women Achieving Their Ambitions
by Elizabeth Ghaffari
Published 5 Dec 2011

Kate just came back from rural India, studying the ways people there use technology. These are intriguing issues. I find it especially interesting to bring such people together with more mathematical people like me. I have worked on models of social networks and recommendation systems that exist in social networks. When I talk to danah, I'm trying to understand what people are seeking through recommendation systems. When you merge qualitative and quantitative skill sets, it takes a while for each to adapt to the other because there are language barriers and differences in what we're trying to achieve. When we finally do achieve something jointly, I find that it's usually very good and very deep. __________ 3 The lower case spelling of danah boyd is “how she chooses to identify” herself.

Mastering Machine Learning With Scikit-Learn
by Gavin Hackeling
Published 31 Oct 2014

For example, assume that your training data consists of the samples plotted in the following figure. Clustering might reveal the following two groups, indicated by squares and circles; clustering could also reveal the following four groups. Clustering is commonly used to explore a dataset. Social networks can be clustered to identify communities and to suggest missing connections between people. In biology, clustering is used to find groups of genes with similar expression patterns. Recommendation systems sometimes employ clustering to identify products or media that might appeal to a user. In marketing, clustering is used to find segments of similar consumers. In the following sections, we will work through an example of using the K-Means algorithm to cluster a dataset. Clustering with the K-Means algorithm: The K-Means algorithm is a clustering method that is popular because of its speed and scalability.
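As a small sketch of the clustering-for-recommendation idea mentioned in this excerpt, the following Python snippet clusters users by their rating vectors with K-Means and then suggests items that a user's cluster-mates rated highly but the user has not tried; the ratings matrix and the choice of two clusters are invented for illustration.

# Sketch: cluster users by their ratings, then recommend items popular within
# a user's cluster that the user has not yet rated. Data and k are assumptions.
import numpy as np
from sklearn.cluster import KMeans

# rows = users, columns = items; 0 means "not rated"
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 1],
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(ratings)

def recommend(user, k=2):
    peers = ratings[labels == labels[user]]      # users in the same cluster
    mean_scores = peers.mean(axis=0)             # average rating per item within the cluster
    unseen = np.where(ratings[user] == 0)[0]     # items the user has not rated
    return unseen[np.argsort(mean_scores[unseen])[::-1]][:k]

print(recommend(0))   # item indices predicted to appeal to user 0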

pages: 170 words: 51,205

Information Doesn't Want to Be Free: Laws for the Internet Age
by Cory Doctorow , Amanda Palmer and Neil Gaiman
Published 18 Nov 2014

Customers don’t necessarily deliver themselves to “stores”—virtual or physical—and when they do, the titles on offer are rarely the neatly curated, finite, and browsable selections that once dominated. The shelves, instead, are nearly infinite. Browsing has been augmented by search algorithms and automated recommendation systems. And the number of ways for customers to discover new work has exploded. Word of mouth has always been a creator’s best friend. Recommendations from personally trusted sources were a surefire way to sell products. When I worked in a bookstore, one of the most reliable indicators of an imminent sale was two friends entering the store together, and one of them picking up a book and handing it to the other with the words “Oh, you’ve got to read this; you’ll love it.”

pages: 606 words: 157,120

To Save Everything, Click Here: The Folly of Technological Solutionism
by Evgeny Morozov
Published 15 Nov 2013

Ruck.us then calculates your “political DNA” in order to match you with similar users and encourage you to join relevant “rucks” (according to the site, “the word comes from rugby, where players form a ruck when they loosely come together to fight the other team for possession of the ball.”). Ruck.us is like Netflix for politics, with its cause-recommendation engine essentially encouraging you to, say, check out a campaign to ban abortion if you have expressed strong opposition to gun control, much in the way that Netflix would recommend that you check out Rambo if you liked Rocky. Once in a “ruck,” members can simply follow news posted by other members or be more proactive and share information themselves: links to relevant petitions, organizations, and events are particularly encouraged.

pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values
by Brian Christian
Published 5 Oct 2020

In this sense they will be like butlers who are paid on commission; they will never help us without at least implicitly wanting something in return. They will make astute inferences we don’t necessarily want them to make. And we will come to realize that we are now—already, in the present—almost never acting alone. A friend of mine is in recovery from an alcohol addiction. The ad recommendation engines of their social media accounts know all too much. Their feed is infested with ads for alcohol. Now here’s a person, their preference model says, who LOVES alcohol. As the British writer Iris Murdoch wrote: “Self-knowledge will lead us to avoid occasions of temptation rather than rely on naked strength to overcome them.”54 For any addiction or compulsion, the better part of wisdom tells us—in the case of alcohol, say—that it’s better to throw out every last drop in our home than it is to have it around and not drink it.

pages: 757 words: 193,541

The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2
by Thomas A. Limoncelli , Strata R. Chalup and Christina J. Hogan
Published 27 Aug 2014

It could not be installed by users because the framework does not permit Python libraries that include portions written in compiled languages. PaaS provides many high-level services including storage services, database services, and many of the same services available in IaaS offerings. Some offer more esoteric services such as Google’s Machine Learning service, which can be used to build a recommendation engine. Additional services are announced periodically. 3.1.3 Software as a Service SaaS is what we used to call a web site before the marketing department decided adding “as a service” made it more appealing. SaaS is a web-accessible application. The application is the service, and you interact with it as you would any web site.

pages: 204 words: 67,922

Elsewhere, U.S.A: How We Got From the Company Man, Family Dinners, and the Affluent Society to the Home Office, BlackBerry Moms,and Economic Anxiety
by Dalton Conley
Published 27 Dec 2008

Not only would my local video store not have been able to afford the shelf space to stock Ring of Bright Water, but the issue more germane to the present discussion is that I would have never even known to ask for it. In fact, short of some chance encounter of a recommendation at a dinner party, I would have never even known that this 1969 British film existed. The fact that I now know it exists can be attributed to the network basis of the Netflix recommendation system. The connected economy, then, does not merely facilitate sameness and the diffusion of hits. It can encourage niche consumption (as Chris Anderson celebrates in The Long Tail). But as wonderful as it is to have a computer recommend a sleeper film that even the slacker clerks at my neighborhood video store wouldn’t be able to name, there is a subtle cost to this form of knowledge diffusion.

pages: 245 words: 64,288

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy
by Pistono, Federico
Published 14 Oct 2012

The classical “Turing test approach” has been largely abandoned as a realistic research goal, and is now just an intellectual curiosity (the annual Loebner prize for realistic chatbots), but helped spawn the two dominant themes of modern cognition and artificial intelligence: calculating probabilities and producing complex behaviour from the interaction of many small, simple processes. As of today (2012), we believe these represent more closely what the human brain does, and they have been used in a variety of real-world applications: Google’s autonomous cars, search results, recommendation systems, automated language translation, personal assistants, cybernetic computational search engines, and IBM’s newest super brain Watson. Natural language processing was believed to be a task that only humans could accomplish. A word can have different meanings depending on the context, and a phrase may not mean what it literally says if it is a joke or a pun.

pages: 247 words: 71,698

Avogadro Corp
by William Hertling
Published 9 Apr 2014

“I wish I could find something,” he finally said, “but I don’t know what. There’s this brilliant self-taught Serbian kid who is doing some stuff with artificial intelligence algorithms, and he’s doing it all on his home PC. I’ve been reading his blog, and it sounds like he has some really novel approaches to recommendation systems. But I don’t see any way we could duplicate what he’s doing before the end of the week.” Mike was really grasping at straws. Thin straws at that. He hated to bring bad news to David. “Maybe we can turn down the accuracy of the system. If we use fewer language-goal clusters, we can run with less memory and fewer processor cycles.

pages: 270 words: 64,235

Effective Programming: More Than Writing Code
by Jeff Atwood
Published 3 Jul 2012

Here’s the one bit that struck me as most essential: We’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends. If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine. One of the first systems our engineers built in AWS is called the Chaos Monkey.
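A minimal Python sketch of the degrade-gracefully pattern described here (this is not Netflix's code; the function names and the fallback list are hypothetical): if the personalization service fails, serve generic popular titles rather than failing the whole response.

# Sketch of graceful degradation: fall back to unpersonalized popular titles
# when the recommendation service is unavailable. Names are hypothetical.
POPULAR_TITLES = ["Title A", "Title B", "Title C"]

def get_recommendations(user_id, personalization_service):
    try:
        # `picks_for` is an assumed client method, not a real library call.
        return personalization_service.picks_for(user_id)
    except Exception:
        # Degrade quality, not availability: respond with popular titles instead.
        return POPULAR_TITLES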

Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
by Thomas H. Davenport
Published 4 Feb 2014

For instance, LinkedIn has used some of its own internal data to predict which companies will buy LinkedIn products, and even who in those firms has the highest likelihood of buying. This work led to an internal recommendation system for salespeople that makes it much easier for them to get the data in one place, and has improved conversion rates by several hundred percent. LinkedIn’s cofounder, Reid Hoffman, is a strong advocate for big data: Because of Web 2.0 [the explosion of social networks and consumer participation in the web] and the increasing number of sensors, there’s all this data.

pages: 210 words: 65,833

This Is Not Normal: The Collapse of Liberal Britain
by William Davies
Published 28 Sep 2020

Distrust and audit culture work in a vicious circle, generating a spiral of surveillance and paranoia. Once suspicions are cast on others – be they public officials, teachers or other members of our community – no amount of data will be sufficient to alleviate them. The platform economy drives this into everyday life. Reputation and recommendation systems were originally unveiled with the promise of establishing trust between strangers, for instance on eBay. But Airbnb is now increasingly plagued by the phenomenon of sellers installing secret cameras around their homes, to seek additional proof of a buyer’s honesty. The authority of language is downgraded in the process.

pages: 296 words: 66,815

The AI-First Company
by Ash Fontana
Published 4 May 2021

First, it gathered a great deal of data on products and helped customers make better buying decisions by putting all of that data in the product listings, providing comparison tables with structured product information. More information meant better comparisons and decisions. Then Amazon invested in a team to build machine learned search and recommendation systems: A9. This team effectively got that product data and matched it with purchase data to learn which products customers want to buy so that Amazon could recommend similar products to those customers in listing pages and search results. Gathering a lot of data started the entry-level network effect: Amazon was the most useful shopping website to consumers because it had the most product information.

pages: 234 words: 67,589

Internet for the People: The Fight for Our Digital Future
by Ben Tarnoff
Published 13 Jun 2022

This messiness is manifest in online spaces, contrary to the “filter bubbles” thesis—which, like the theory that polarization is produced by social media, has scant evidence to support it. People can and do find like-minded interlocutors on the internet, and the algorithms that underpin social media feeds and recommendation systems can contribute to these clusterings. But the conversations that ensue rarely resemble an echo chamber, with everyone parroting the same party line. When the researchers P. M. Krafft and Joan Donovan examined the origins of one campaign to spread false information on 4chan, a message board popular with the far Right, they found “widespread heterogeneity of beliefs and contestation of the claims.”

pages: 236 words: 77,098

I Live in the Future & Here's How It Works: Why Your World, Work, and Brain Are Being Creatively Disrupted
by Nick Bilton
Published 13 Sep 2010

Here are three different ways people, especially young ones, may evaluate whether something is worth purchasing. Bad = Free My friend Mike loves music. In fact, Mike is a music fanatic. In every spare moment he has, Mike scours the Web and his social networks, searching for new music to listen to and potentially purchase. Like most of his friends, Mike uses his recommendation systems and social networks to find the music he’s interested in. He’ll preview a few songs, and if he decides the content is good, he’ll follow through with a purchase. He rarely buys entire albums because he believes most albums contain only one or two good songs. Mike also follows a handful of bands and immediately buys their entire albums on release day.

pages: 706 words: 202,591

Facebook: The Inside Story
by Steven Levy
Published 25 Feb 2020

Facebook, she felt, had built an engine to push propaganda. She managed to get a meeting with a News Feed director, who conceded that some groups were problematic but that the company did not want to hamper free expression. “I wasn’t asking for suppression,” DiResta says. “I was saying your recommendation engine was growing this community!” * * * • • • IN FACT, HALFWAY around the world, there was terrifying proof of those fears. In the Philippines. By 2015, nearly all inhabitants of that Pacific island country of 10 million had been on Facebook for several years. A major factor in making this happen was the Internet.org Facebook program—hatched from the Growth team—known as Free Basics.

Writing Effective Use Cases
by Alistair Cockburn
Published 30 Sep 2000

System recalls the selected solution.
26c4. Continue at step 26
26d. Shopper wants to finance products in the shopping cart with available Finance Plans:
26d1. Shopper chooses to finance products in the shopping cart
26d2. System will present a series of questions that are dependent on previous answers to determine finance plan recommendations. System interfaces with Finance System to obtain credit rating approval. Initiate Obtain Finance Rating.
26d3. Shopper will select a finance plan
26d4. System will present a series of questions based on previous answers to determine details of the selected finance plan.
26d5. Shopper will view financial plan details and chooses to go with the plan.
26d6.

pages: 411 words: 80,925

What's Mine Is Yours: How Collaborative Consumption Is Changing the Way We Live
by Rachel Botsman and Roo Rogers
Published 2 Jan 2010

Collective Wisdom of Members At the same time, Netflix has built a sophisticated platform to foster a community among members, and to tailor recommendations to individual tastes. Talk to anyone who has ever used Netflix and they will tell you about how they “discovered releases,” “learned about classics,” and “found rare gems” they never would have found on their own at a store. Approximately 60 percent of members base their selections on Netflix’s Cinematch recommendations system. Early on, people’s willingness to share and rate the films they had watched and to make suggestions to “friends” surprised the founders. The user community itself adopted the ethos of “Millions of members helping you.” Impressively, there are now more than 2 billion ratings from members, and the average member has evaluated approximately two hundred movies.

pages: 239 words: 80,319

Lurking: How a Person Became a User
by Joanne McNeil
Published 25 Feb 2020

Eric Schmidt called multiple results a “bug” in an interview with Charlie Rose in 2005, which is further considered in a Washington Post piece by Gregory Ferenstein (“Google, Competition and the Perfect Result,” January 4, 2013). Nitasha Tiku has reported on activism at Google (“Why Tech Worker Dissent Is Going Viral,” Wired, June 29, 2018). An interview with Guillaume Chaslot, one of the engineers who worked on the recommendation system, in The Guardian (“‘Fiction is outperforming reality’: how YouTube’s algorithm distorts truth,” February 2, 2018) provides more information on how hateful content and misinformation spreads on the platform. Safiya Umoja Noble’s book Algorithms of Oppression (NYU Press, 2018) is a definitive look at Google’s bias.

Know Thyself
by Stephen M Fleming
Published 27 Apr 2021

The remarkable developments in artificial intelligence have not yet been accompanied by comparable developments in artificial self-awareness. In fact, as technology gets smarter, the relevance of our self-awareness might also diminish. A powerful combination of data and machine learning may end up knowing what we want or need better than we know ourselves. The Amazon and Netflix recommendation systems offer up the next movie to watch; dating algorithms take on the job of finding our perfect match; virtual assistants book hair appointments before we are aware that we need them; online personal shoppers send us clothes that we didn’t even know we wanted. As human consumers in such a world, we may no longer need to know how we are solving problems or making decisions, because these tasks have become outsourced to AI assistants.

Off the Edge: Flat Earthers, Conspiracy Culture, and Why People Will Believe Anything
by Kelly Weill
Published 22 Feb 2022

The issue wasn’t just that people were being racist online (a problem as old as the internet). It was that Facebook’s own recommendation algorithm was driving users to those groups. “64% of all extremist group joins are due to our recommendation tools,” an internal Facebook presentation on the study said, namely the “Groups You Should Join” and “Discover” algorithms. “Our recommendation systems grow the problem.” Facebook’s recommendations actively cross-pollinated the conspiracy world, luring truthers over the lines that once demarcated their individual theories. The result was a conspiratorial melting pot: QAnon followers preaching their gospel on pages for people who believed airplanes were spraying mind-control drugs, bogus miracle cures being sold in anti-vaccination groups, and nearly every popular conspiracy theory finding its way onto Flat Earth pages, which saw skeptics of all stripes gather to share notes.

pages: 669 words: 210,153

Tools of Titans: The Tactics, Routines, and Habits of Billionaires, Icons, and World-Class Performers
by Timothy Ferriss
Published 6 Dec 2016

Chris Anderson (my successor at Wired) named this effect “the Long Tail,” for the visually graphed shape of the sales distribution curve: a low, nearly interminable line of items selling only a few copies per year that form a long “tail” for the abrupt vertical beast of a few bestsellers. But the area of the tail was as big as the head. With that insight, the aggregators had great incentive to encourage audiences to click on the obscure items. They invented recommendation engines and other algorithms to channel attention to the rare creations in the long tail. Even web search companies like Google, Bing, and Baidu found it in their interests to reward searchers with the obscure because they could sell ads in the long tail as well. The result was that the most obscure became less obscure.

How to Stand Up to a Dictator
by Maria Ressa
Published 19 Oct 2022

Instead of making the platform more transparent, as Mark claimed to be doing, the company made sure that no one but Facebook had the data to see the whole picture.15 Even when the company produced its own disturbing internal research findings, its executives refused to act. A 2016 internal presentation about Germany detailed that “64% of all extremist group joins are due to our recommendation tools,” such as algorithms driving “Groups You Should Join” and “Discover.” The report made a very clear statement: “Our recommendation systems grow the problem.”16 Facebook has a staggering ability to determine the fates of news organizations—of journalism itself, even. Today it has an internal ranking for news that is supposedly determined by algorithms; however, not only did a human code those algorithms, but Facebook decides whether a given user is fed more hate or more facts.

pages: 788 words: 223,004

Merchants of Truth: The Business of News and the Fight for Facts
by Jill Abramson
Published 5 Feb 2019

These stories fit right in the feed, matching the tone and topical matter of Facebook at large, as if they came from readers’ family or friends. It was a breakthrough in making news personal and connecting with readers on their terms, right there in the streamlined scroll that encapsulated their social lives. Like Amazon’s recommendation engine (displaying the products that “customers who bought this item also bought”), BuzzFeed’s empire was built on computer processes that, with as little human input as possible, could pull off the illusion of “getting you.” By 2016 Facebook was far larger than any nation-state, the biggest and most centralized congregation of people—friends, readers, consumers, voters—that the world had ever seen.

RDF Database Systems: Triples Storage and SPARQL Query Processing
by Olivier Cure and Guillaume Blin
Published 10 Dec 2014

This points to two distinct nodes corresponding to blog entries. For each of them, the system will navigate through the category edge and will only retain those with a Science value—that is, in the figure, only blog2 matches our search. Typical use cases of graph databases are social and e-commerce domains, as well as recommendation systems. 2.2.4 MapReduce In the previous section, we emphasized solutions that enable us to store data on cluster commodity machines. To apprehend the full potential of this approach, this also has to come with methods to process this data efficiently—that is, to perform the processing on the servers and to limit the transfer of data between machines to its minimum.
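To make the traversal concrete, here is a tiny Python sketch using an in-memory adjacency structure rather than a real graph database; the node names and edge layout mirror the blog/category example but are otherwise invented.

# Sketch of the traversal: follow the "category" edge from each blog node and
# keep only the nodes whose category is "Science". Plain dicts stand in for a graph store.
graph = {
    "blog1": {"category": "Travel"},
    "blog2": {"category": "Science"},
}

def blogs_in_category(category):
    return [node for node, edges in graph.items() if edges.get("category") == category]

print(blogs_in_category("Science"))   # -> ['blog2']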

pages: 337 words: 103,522

The Creativity Code: How AI Is Learning to Write, Paint and Think
by Marcus Du Sautoy
Published 7 Mar 2019

Ecker and Matthias Bethge, ‘A Neural Algorithm of Artistic Style’, arXiv:1508.06576 (2015) Gondek, David, et al., ‘A Framework for Merging and Ranking of Answers in DeepQA’, IBM Journal of Research and Development, vol. 56(3.4), 14:1–14:12 (2012) Gonthier, Georges, ‘A Computer-Checked Proof of the Four Colour Theorem’, Microsoft Research Cambridge (2005) , ‘Formal Proof: The Four-Color Theorem’, Notices of the AMS, vol. 55, 1382–93 (2008) , et al., ‘A Machine-Checked Proof of the Odd Order Theorem’, Interactive Theorem Proving, Proceedings of the Fourth International Conference on ITP (2013) Goodfellow, Ian J., ‘NIPS 2016 Tutorial: Generative Adversarial Networks’, arXiv:1701.00160 (2016) Guzdial, Matthew J., et al., ‘Crowdsourcing Open Interactive Narrative’, Tenth International Conference on the Foundations of Digital Games (2015) Hadjeres, Gaëtan, François Pachet and Frank Nielsen, ‘DeepBach: A Steerable Model for Bach Chorales Generation’, arXiv:1612.01010 (2017) Hales, Thomas, et al., ‘A Formal Proof of The Kepler Conjecture’, Forum of Mathematics, Pi, vol. 5, e2 (2017) Hermann, Karl Moritz, et al., ‘Teaching Machines to Read and Comprehend’, in Advances in Neural Information Processing Systems, NIPS Proceedings (2015) Ilyas, Andrew, et al., ‘Query-Efficient Black-Box Adversarial Examples’, arXiv:1712.07113 (2017) Khalifa, Ahmed, Gabriella A. B. Barros and Julian Togelius, ‘DeepTingle’, arXiv:1705.03557 (2017) Koren, Yehuda, Robert M. Bell and Chris Volinsky, ‘Matrix Factorization Techniques for Recommender Systems’, Computer Journal, vol. 42(8), 30–37 (2009) Li, Boyang and Mark O. Riedl, ‘Scheherazade: Crowd-Powered Interactive Narrative Generation’, 29th AAAI Conference on Artificial Intelligence (2015) Llano, Maria Teresa, et al., ‘What If a Fish Got Drunk? Exploring the Plausibility of Machine-Generated Fictions’, in Proceedings of the Seventh International Conference on Computational Creativity (2016) Loos, Sarah, et al., ‘Deep Network Guided Proof Search’, arXiv: 1701.06972v1 (2017) Mahendran, Aravindh and Andrea Vedaldi, ‘Understanding Deep Image Representations by Inverting Them’, Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5188–96 (2015) Mathewson, Kory Wallace and Piotr W.

pages: 346 words: 97,330

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass
by Mary L. Gray and Siddharth Suri
Published 6 May 2019

There is no easy, free alternative, unless everyone decides to delete their social media accounts. FIX 5: RÉSUMÉ 2.0 AND PORTABLE REPUTATION SYSTEMS Since requesters can seamlessly enter and exit the market, independent workers are often at a disadvantage when it comes to getting a rating or recommendation after they finish a task. On-demand workers will need reputation and recommendation systems that help them navigate finding their next income opportunity and manage the risk of such an uncertain future. They might establish a rapport with a requester for months, like Joan or Riyaz, only to have that requester leave the market or begin looking for workers with different skills. Successful workers will need to adapt to this dynamic environment quickly.

pages: 350 words: 98,077

Artificial Intelligence: A Guide for Thinking Humans
by Melanie Mitchell
Published 14 Oct 2019

On the one hand, deep neural networks, trained via supervised learning, perform remarkably well (though still far from perfectly) on many problems in computer vision, as well as in other domains such as speech recognition and language translation. Because of their impressive abilities, these networks are rapidly being taken from research settings and employed in real-world applications such as web search, self-driving cars, face recognition, virtual assistants, and recommendation systems, and it’s getting hard to imagine life without these AI tools. On the other hand, it’s misleading to say that deep networks “learn on their own” or that their training is “similar to human learning.” Recognition of the success of these networks must be tempered with a realization that they can fail in unexpected ways because of overfitting to their training data, long-tail effects, and vulnerability to hacking.

Rockonomics: A Backstage Tour of What the Music Industry Can Teach Us About Economics and Life
by Alan B. Krueger
Published 3 Jun 2019

In fact, the number of musical choices facing individuals greatly expanded—and became even more bewildering—with the advent of streaming services, which is likely to lead us to rely even more on our social networks for clues in selecting songs and artists. And the rapidly growing set of curated recommendation systems that use Big Data to help us discover new music is also likely to reinforce network effects, unless there is a surge in demand for curation systems that recommend songs that are both unpopular and likely to stay that way. Gloria Estefan: Music of the Heart Gloria Estefan is the most successful crossover artist of all time.

Artificial Whiteness
by Yarden Katz

As Jordan writes, “By the turn of the century forward-looking companies such as Amazon were already using ML [machine-learning] throughout their business, solving mission-critical, back-end problems in fraud detection and supply-chain prediction, and building innovative consumer-facing services such as recommendation systems.” His main point is that these technical ideas and the disciplines that produced them weren’t part of an attempt to “imitate” human intelligence and thus should not be labeled “AI.” In a familiar turn, Jordan offers to switch the letters and call it instead “Intelligence Augmentation (IA).”

pages: 416 words: 112,268

Human Compatible: Artificial Intelligence and the Problem of Control
by Stuart Russell
Published 7 Oct 2019

A new word, softbot, was coined to describe software “robots” that operate entirely in a software environment such as the Web. Softbots, or bots as they later became known, perceive Web pages and act by emitting sequences of characters, URLs, and so on. AI companies mushroomed during the dot-com boom (1997–2000), providing core capabilities for search and e-commerce, including link analysis, recommendation systems, reputation systems, comparison shopping, and product categorization. In the early 2000s, the widespread adoption of mobile phones with microphones, cameras, accelerometers, and GPS provided new access for AI systems to people’s daily lives; “smart speakers” such as the Amazon Echo, Google Home, and Apple HomePod have completed this process.

pages: 380 words: 109,724

Don't Be Evil: How Big Tech Betrayed Its Founding Principles--And All of US
by Rana Foroohar
Published 5 Nov 2019

” This was a culture in which the metrics were always right. The company was simply serving users, even if that meant knowingly monetizing content that was undermining the fabric of democracy.3 A spokesperson at YouTube, which doesn’t contradict the basic facts of Chaslot’s account, told me in 2018 that the company’s recommendation system has “changed substantially over time” and now includes other metrics beyond watch time, including consumer surveys and the number of shares and likes. And, as this book goes to press in the summer of 2019, YouTube is, in the wake of the FTC investigations along with numerous reports of pedophiles using the platform to find and share videos of children,4 considering whether to shift children’s content into an entirely separate app to avoid such problems.5 But as anyone who uses the site knows, you are, at this moment, still served up more of whatever you have spent the most time with—whether that’s videos of cats playing the piano or conspiracy theories.

pages: 903 words: 235,753

The Stack: On Software and Sovereignty
by Benjamin H. Bratton
Published 19 Feb 2016

As discussed in the City layer chapter, there is then a kind of programmatic blending between the urban situation through which a User moves and the interactions he may be having with a specific App and Cloud service. A mall becomes a game board, a sidewalk becomes a banking center, a restaurant becomes the scene of a crime in a crowd-sourced recommendation engine, birds are angry and enemies are identified, and the experience of these may be very different for different people and purposes. At any given moment, multiple Users interacting with different Apps in the same place may have brought their shared location into contrasting Cloud dramas; one may be ensconced in a first-person shooter game and the other in measuring his carbon footprint, further fragmenting any apparent solidarity of the crowd.

pages: 396 words: 117,897

Making the Modern World: Materials and Dematerialization
by Vaclav Smil
Published 16 Dec 2013

Even for economies with good historical statistics, all pre-World War II GDP estimates are less reliable than their post-1950 counterparts, and for many modernizing economies they are simply unavailable, or amount to nothing but rough estimates: these realities make reliable long-term international comparisons questionable. Moreover, recent GDPs, calculated according to a UN-recommended System of National Accounts, exclude all black market (underground economy) transactions whose addition would boost the total by 10–15% even in the most law-abiding countries, and could double the economy's size in the most lawless settings. But, once again, the most important bias comes from conversion.

pages: 424 words: 114,905

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again
by Eric Topol
Published 1 Jan 2019

Ross, C., “Deal Struck to Mine Cancer Patient Database for New Treatment Insights,” Stat News. 2017. 32. Muoio, D., “Machine Learning App Migraine Alert Warns Patients of Oncoming Episodes,” MobiHealthNews. 2017. 33. Comstock, J., “New ResApp Data Shows ~90 Percent Accuracy When Diagnosing Range of Respiratory Conditions,” MobiHealthNews. 2017. 34. Han, Q., et al., A Hybrid Recommender System for Patient-Doctor Matchmaking in Primary Care. arXiv, 2018. 35. Razzaki, S., et al., A Comparative Study of Artificial Intelligence and Human Doctors for the Purpose of Triage and Diagnosis. arXiv, 2018; Olson, P., “This AI Just Beat Human Doctors on a Clinical Exam,” Forbes. 2018. 36. Foley, K.

Super Thinking: The Big Book of Mental Models
by Gabriel Weinberg and Lauren McCann
Published 17 Jun 2019

Many algorithms operate as black boxes, which means they require very little understanding by the user of how they work. You don’t care how you got the best seats, you just want the best seats! You can think of each algorithm as a box where inputs go in and outputs come out, but outside it is painted black so you can’t tell what is going on inside. Common examples of black box algorithms include recommendation systems on Netflix or Amazon, matching on online dating sites, and content moderation on social media. Physical tools can also be black boxes. Two sayings, “The skill is built into the tool” and “The craftsmanship is the workbench itself,” suggest that the more sophisticated tools get, the fewer skills are required to operate them.

pages: 2,466 words: 668,761

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

Alexa, Siri, Cortana, and Google offer assistants that can answer questions and carry out tasks for the user; for example the Google Duplex service uses speech recognition and speech synthesis to make restaurant reservations for users, carrying out a fluent conversation on their behalf. Recommendations: Companies such as Amazon, Facebook, Netflix, Spotify, YouTube, Walmart, and others use machine learning to recommend what you might like based on your past experiences and those of others like you. The field of recommender systems has a long history (Resnick and Varian, 1997) but is changing rapidly due to new deep learning methods that analyze content (text, music, video) as well as history and metadata (van den Oord et al., 2014; Zhang et al., 2017). Spam filtering can also be considered a form of recommendation (or dis-recommendation); current AI techniques filter out over 99.9% of spam, and email services can also recommend potential recipients, as well as possible response text.
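
As a rough illustration of the "others like you" idea in this passage, here is a small user-based collaborative filtering sketch in Python. The toy ratings matrix, the cosine-similarity choice, and the function names are assumptions made for the example only; none of the companies named above is described here as using this exact method.

import numpy as np

# Rows are users, columns are items; 0 means "not yet rated" (toy data).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
])

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

def recommend(user_idx, ratings, top_n=1):
    """Score a user's unrated items by the similarity-weighted ratings of other users."""
    target = ratings[user_idx]
    sims = np.array([cosine(target, other) for other in ratings])
    sims[user_idx] = 0.0                 # ignore similarity to oneself
    scores = sims @ ratings              # similarity-weighted vote over all items
    scores[target > 0] = -np.inf         # never re-recommend something already rated
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0, ratings))             # [2]: item 2 is user 0's only unrated item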

Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press. Renner, G. and Ekart, A. (2003). Genetic algorithms in computer aided design. Computer-Aided Design, 35, 709–726. Rényi, A. (1970). Probability Theory. Elsevier. Resnick, P. and Varian, H. R. (1997). Recommender systems. CACM, 40, 56–58. Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In ICML-14. Riazanov, A. and Voronkov, A. (2002). The design and implementation of VAMPIRE. AI Communications, 15, 91–110. Ribeiro, M. T., Singh, S., and Guestrin, C. (2016).

Royal Society of Edinburgh, 10, 664–665. Tamaki, H. and Sato, T. (1986). OLD resolution with tabulation. In ICLP-86. Tan, P., Steinbach, M., Karpatne, A., and Kumar, V. (2019). Introduction to Data Mining (2nd edition). Pearson. Tang, E. (2018). A quantum-inspired classical algorithm for recommendation systems. arXiv:1807.04271. Tarski, A. (1935). Der Wahrheitsbegriff in den formalisierten Sprachen. Studia Philosophica, 1, 261–405. Tarski, A. (1941). Introduction to Logic and to the Methodology of Deductive Sciences. Dover. Tarski, A. (1956). Logic, Semantics, Metamathematics: Papers from 1923 to 1938.

Enriching the Earth: Fritz Haber, Carl Bosch, and the Transformation of World Food Production
by Vaclav Smil
Published 18 Dec 2000

Boca Raton, Fla.: Lewis Publishing; Trenkel, M. A. 1997. Improving Fertilizer Use Efficiency. Paris: IFA. 11. Havlin, J. L., et al., eds. 1994. Soil Testing: Prospects for Improving Nutrient Recommendations. Madison, Wis.: Soil Science Society of America; MacKenzie, G. H., and J.-C. Taureau. 1997. Recommendation Systems for Nitrogen—A Review. York: Fertiliser Society. Periodic testing for major macronutrients has been common in high-income nations for decades, but testing for micronutrient deficiencies (ranging from boron and copper in many crops to molybdenum and cobalt needed by nitrogenase in leguminous species) has been much less frequent. 12.

pages: 476 words: 132,042

What Technology Wants
by Kevin Kelly
Published 14 Jul 2010

As always, the solution to the problems that technology brings, such as an overwhelming diversity of choices, is better technologies. The solution to ultradiversity will be choice-assist technologies. These better tools will aid humans in making choices among bewildering options. That is what search engines, recommendation systems, tagging, and a lot of social media are all about. Diversity, in fact, will produce tools to handle diversity. (Diversity-taming tools will be among the wildly diversity-making 821 million patents that current rates predict will have been filed in the U.S. Patent Office by 2060!) We are already discovering how to use computers to augment our choices with information and web pages (Google is one such tool), but it will take additional learning and technologies to do this with tangible stuff and idiosyncratic media.

pages: 752 words: 131,533

Python for Data Analysis
by Wes McKinney
Published 30 Dec 2011

MovieLens 1M Data Set GroupLens Research (http://www.grouplens.org/node/73) provides a number of collections of movie ratings data collected from users of MovieLens in the late 1990s and early 2000s. The data provide movie ratings, movie metadata (genres and year), and demographic data about the users (age, zip code, gender, and occupation). Such data is often of interest in the development of recommendation systems based on machine learning algorithms. While I will not be exploring machine learning techniques in great detail in this book, I will show you how to slice and dice data sets like these into the exact form you need. The MovieLens 1M data set contains 1 million ratings collected from 6000 users on 4000 movies.
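
A short pandas sketch of the kind of loading and merging the author is referring to, assuming the MovieLens 1M files have been unpacked into an ml-1m/ directory. The paths, column names, and the gender pivot at the end follow the dataset's public README and are illustrative assumptions, not code quoted from the book.

import pandas as pd

read_opts = dict(sep='::', engine='python', header=None, encoding='latin-1')

users = pd.read_csv('ml-1m/users.dat', names=['user_id', 'gender', 'age', 'occupation', 'zip'], **read_opts)
ratings = pd.read_csv('ml-1m/ratings.dat', names=['user_id', 'movie_id', 'rating', 'timestamp'], **read_opts)
movies = pd.read_csv('ml-1m/movies.dat', names=['movie_id', 'title', 'genres'], **read_opts)

# Join the three tables into one flat frame, then compute mean ratings by gender.
data = pd.merge(pd.merge(ratings, users), movies)
mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
print(mean_ratings.head())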

Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth
by Stuart Ritchie
Published 20 Jul 2020

Even worse: of that seven, fully six were redundant compared to much simpler methods that had been known about for years before these new algorithms appeared. Maurizio Ferrari Dacrema et al., ‘Are We Really Making Much Progress?: A Worrying Analysis of Recent Neural Recommendation Approaches’, in Proceedings of the 13th ACM Conference on Recommender Systems – RecSys 2019 (Copenhagen, Denmark: ACM Press, 2019): pp. 101–9; https://doi.org/10.1145/3298689.3347058. See also this report from computer science, which hints that new researchers are having trouble reproducing the performance of several classic algorithms – something of a ticking time bomb, since ‘young researchers don’t want to be seen as criticising senior researchers’ by publishing failures to reproduce the performance of algorithms the senior researchers had developed and on which they’d staked their reputations: Matthew Hutson, ‘Artificial Intelligence Faces Reproducibility Crisis’, Science 359, no. 6377 (16 Feb. 2018): pp. 725–26; https://doi.org/10.1126/science.359.6377.725, p. 726. 44.  

pages: 1,829 words: 135,521

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
by Wes McKinney
Published 25 Sep 2017

(Figure caption: Percentage Windows and non-Windows users in top-occurring time zones.) We could have computed the normalized sum more efficiently by using the transform method with groupby:

In [66]: g = count_subset.groupby('tz')
In [67]: results2 = count_subset.total / g.total.transform('sum')

14.2 MovieLens 1M Dataset GroupLens Research provides a number of collections of movie ratings data collected from users of MovieLens in the late 1990s and early 2000s. The data provide movie ratings, movie metadata (genres and year), and demographic data about the users (age, zip code, gender identification, and occupation). Such data is often of interest in the development of recommendation systems based on machine learning algorithms. While we do not explore machine learning techniques in detail in this book, I will show you how to slice and dice datasets like these into the exact form you need. The MovieLens 1M dataset contains 1 million ratings collected from 6,000 users on 4,000 movies.
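
The transform call above can be tried out on its own; in the sketch below, the small count_subset frame is invented stand-in data for the time-zone counts built earlier in that chapter, so the numbers are not the book's.

import pandas as pd

count_subset = pd.DataFrame({
    'tz': ['America/New_York', 'America/New_York', 'Europe/London', 'Europe/London'],
    'os': ['Windows', 'Not Windows', 'Windows', 'Not Windows'],
    'total': [912, 339, 245, 98],        # made-up counts for illustration
})

g = count_subset.groupby('tz')
# Divide each row's total by the sum of totals within its own time zone,
# giving each operating system's share per time zone (shares sum to 1 per tz).
count_subset['normed_total'] = count_subset['total'] / g['total'].transform('sum')
print(count_subset)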

pages: 460 words: 131,579

Masters of Management: How the Business Gurus and Their Ideas Have Changed the World—for Better and for Worse
by Adrian Wooldridge
Published 29 Nov 2011

They also discover that the crowds don’t always have their best interests at heart: when Justin Bieber, a Canadian teenage pop star, asked his fans for suggestions as to what country he should visit next, the most popular answer was North Korea.9 One popular solution to the problem of oversupply is to use prizes to give crowdsourcing a focus and structure. The value of prizes being offered by corporations has more than tripled over the past decade, to $375 million.10 Netflix offers a $1 million prize to anyone who can improve its film recommendation system by 10 percent. Frito-Lay offers prizes to people who can come up with new TV ads for its products. Indeed, prizes have become businesses in their own right: InnoCentive has created a network of 170,000 scientists who stand ready to solve R&D problems for a price. Regular users include some of the world’s biggest companies, such as Eli Lilly, which helped to found the network in 2001; Boeing; DuPont; and P&G.

pages: 502 words: 132,062

Ways of Being: Beyond Human Intelligence
by James Bridle
Published 6 Apr 2022

Google and others’ stated mission is to reduce this vast complexity. Their less trumpeted goal is to profit from it, at the expense of our own potential for random encounters, and thereby for our own evolution. So many of our tools are designed to reduce randomness in a similar fashion: from algorithmic recommendation systems to dating apps, from GPS navigation to weather forecasting. Each of these technologies – with the best of intentions – attempts to draw clear lines through a complex environment and provides us with a route to our desires free from obstructions, diversions and the vagaries of chance and unforeseen encounters.

pages: 1,202 words: 144,667

The Linux kernel primer: a top-down approach for x86 and PowerPC architectures
by Claudia Salzberg Rodriguez , Gordon Fischer and Steven Smolski
Published 15 Nov 2005

Many of the C library routines available to user mode programs, such as the fork() function in Figure 3.9, bundle code and one or more system calls to accomplish a single function. When a user process calls one of these functions, certain values are placed into the appropriate processor registers and a software interrupt is generated. This software interrupt then calls the kernel entry point. Although not recommended, system calls (syscalls) can also be accessed from kernel code. From where a syscall should be accessed is the source of some discussion because syscalls called from the kernel can have an improvement in performance. This improvement in performance is weighed against the added complexity and maintainability of the code.

pages: 598 words: 134,339

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World
by Bruce Schneier
Published 2 Mar 2015

So, for example, Bruce Schneier might be 608429. They were surprised when researchers were able to attach names to numbers by correlating different items in individuals’ search history. In 2008, Netflix published 10 million movie rankings by 500,000 anonymized customers, as part of a challenge for people to come up with better recommendation systems than the one the company was using at that time. Researchers were able to de-anonymize people by comparing rankings and time stamps with public rankings and time stamps in the Internet Movie Database. These might seem like special cases, but correlation opportunities pop up more frequently than you might think.

pages: 680 words: 157,865

Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design
by Diomidis Spinellis and Georgios Gousios
Published 30 Dec 2008

The real magic, however, is the explicit linkage between publicly available information, what that linkage represents, and the ease with which we can create windows into this underlying content. There is no starting point, and there is no end in sight. As long as we know what to ask for, we can usually get to it. Several technologies have emerged to help us know what to ask for, either through search engines or some manner of recommendation system. We like giving names to things because we are fundamentally name-oriented beings; we use names to disambiguate “that thing” from “that other thing.” One of our earliest communication acts as children is to name and point to the subjects that interest us and to ask for them. In many ways, the Web is the application of this childlike wonder to our collective wisdom and folly.

pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
by Aurélien Géron
Published 13 Mar 2017

Going forward, my best advice to you is to practice and practice: try going through all the exercises if you have not done so already, play with the Jupyter notebooks, join Kaggle.com or some other ML community, watch ML courses, read papers, attend conferences, meet experts. You may also want to study some topics that we did not cover in this book, including recommender systems, clustering algorithms, anomaly detection algorithms, and genetic algorithms. My greatest hope is that this book will inspire you to build a wonderful ML application that will benefit all of us! What will it be? Aurélien Géron, November 26th, 2016 1 For more details, be sure to check out Richard Sutton and Andrew Barto’s book on RL, Reinforcement Learning: An Introduction (MIT Press), or David Silver’s free online RL course at University College London. 2 “Playing Atari with Deep Reinforcement Learning,” V.

pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies
by Nick Bostrom
Published 3 Jun 2014

Then the entire system was overthrown by the heliocentric theory of Copernicus, which was simpler and—though only after further elaboration by Kepler—more predictively accurate.63 Artificial intelligence methods are now used in more areas than it would make sense to review here, but mentioning a sampling of them will give an idea of the breadth of applications. Aside from the game AIs listed in Table 1, there are hearing aids with algorithms that filter out ambient noise; route-finders that display maps and offer navigation advice to drivers; recommender systems that suggest books and music albums based on a user’s previous purchases and ratings; and medical decision support systems that help doctors diagnose breast cancer, recommend treatment plans, and aid in the interpretation of electrocardiograms. There are robotic pets and cleaning robots, lawn-mowing robots, rescue robots, surgical robots, and over a million industrial robots.64 The world population of robots exceeds 10 million.65 Modern speech recognition, based on statistical techniques such as hidden Markov models, has become sufficiently accurate for practical use (some fragments of this book were drafted with the help of a speech recognition program).

pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives
by Steven Levy
Published 12 Apr 2011

While he put the pieces of YouTube together, though, he always kept in mind that he was documenting a traditional media system on the verge of collapse. He had to deal with the music world as it was but also plan for the way it would be after disruptions, which Google and YouTube were accelerating. Kamangar had some specific ideas for improvement of YouTube. He urged a simpler user interface and a smarter recommendation system to point users to other videos they might enjoy. He urged more flexibility with producers of professional video so YouTube would get more commercial content. He also emphasized how some of Google’s key attributes—notably speed—had a huge impact on the overall experience. If Google could reliably deliver videos with almost no latency, he reasoned, users might not balk so much at the “preroll” ads that come before the actual content, especially if the video was one of a series that users subscribed to and so were already eager to see what was coming.

pages: 834 words: 180,700

The Architecture of Open Source Applications
by Amy Brown and Greg Wilson
Published 24 May 2011

To cater to a broader set of users, including many who do not have programming expertise, it provides a series of operations and user interfaces that simplify workflow design and use [FSC+06], including the ability to create and refine workflows by analogy, to query workflows by example, and to suggest workflow completions as users interactively construct their workflows using a recommendation system [SVK+07]. We have also developed a new framework that allows the creation of custom applications that can be more easily deployed to (non-expert) end users. The extensibility of VisTrails comes from an infrastructure that makes it simple for users to integrate tools and libraries, as well as to quickly prototype new functions.

pages: 586 words: 186,548

Architects of Intelligence
by Martin Ford
Published 16 Nov 2018

All over the world people are interacting with AI today through machine translation, image analysis, and computer vision. DeepMind has started working on quite a few things, like optimizing the energy being used in Google’s data centers. We’ve worked on WaveNet, the very human-like text-to-speech system that’s now in the Google Assistant in all Android-powered phones. We use AI in recommendation systems, in Google Play, and even on behind-the-scenes elements like saving battery life on your Android phone. Things that everyone uses every single day. We’re finding that because they’re general algorithms, they’re coming up all over the place, so I think that’s just the beginning. What I’m hoping will come through next are the collaborations we have in healthcare.

pages: 775 words: 208,604

The Great Leveler: Violence and the History of Inequality From the Stone Age to the Twenty-First Century
by Walter Scheidel
Published 17 Jan 2017

The wealthy either held office themselves or were linked to those who did, and state service and connections to those who performed it in turn generated more personal wealth.8 These dynamics both favored and constrained familial continuity in wealth holding. On the one hand, the sons of high officials were more likely to follow in their footsteps. They and other junior relatives were automatically entitled to enter officialdom and benefited disproportionately from the recommendation system employed to fill governmental positions. We hear of officials among whose brothers and sons six or seven—in one case, no fewer than thirteen sons—also came to serve as imperial administrators. On the other hand, the same predatory and capricious exercise of political power that turned civil servants into plutocrats also undermined their success.

pages: 933 words: 205,691

Hadoop: The Definitive Guide
by Tom White
Published 29 May 2009

When processing the received data, we distinguish between a track listen submitted by a user (the first source above, referred to as a scrobble from here on) and a track listened to on the Last.fm radio (the second source, mentioned earlier, referred to as a radio listen from here on). This distinction is very important in order to prevent a feedback loop in the Last.fm recommendation system, which is based only on scrobbles. One of the most fundamental Hadoop jobs at Last.fm takes the incoming listening data and summarizes it into a format that can be used for display purposes on the Last.fm website as well as for input to other Hadoop programs. This is achieved by the Track Statistics program, which is the example described in the following sections.
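
To make the scrobble/radio distinction concrete, here is a small map/reduce-style sketch in Python that tallies the two kinds of listens per track separately, so that a downstream recommendation job could consume the scrobble counts only. The (user_id, track_id, source) record layout and the function names are assumptions for this illustration; the actual Track Statistics program is a Hadoop job and is not reproduced here.

from collections import defaultdict

# Toy listening records: (user_id, track_id, source).
listens = [
    ("user1", "trackA", "scrobble"),
    ("user1", "trackA", "radio"),
    ("user2", "trackA", "scrobble"),
    ("user2", "trackB", "radio"),
]

def map_listen(record):
    """Map phase: emit (track_id, (scrobble_count, radio_count)) for one record."""
    _, track_id, source = record
    return track_id, ((1, 0) if source == "scrobble" else (0, 1))

def reduce_counts(mapped):
    """Reduce phase: sum the per-track counts emitted by the map phase."""
    totals = defaultdict(lambda: [0, 0])
    for track_id, (scrobbles, radio) in mapped:
        totals[track_id][0] += scrobbles
        totals[track_id][1] += radio
    return dict(totals)

stats = reduce_counts(map(map_listen, listens))
print(stats)   # {'trackA': [2, 1], 'trackB': [0, 1]}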

pages: 678 words: 216,204

The Wealth of Networks: How Social Production Transforms Markets and Freedom
by Yochai Benkler
Published 14 May 2006

Without one of these noncompetitive infrastructure owners, the home user has no broadband access to the Internet. In Amazon's case, the consumer outrage when the practice was revealed focused on the lack of transparency. Users had little objection to clearly demarcated advertisement. The resistance was to the nontransparent manipulation of the recommendation system aimed at causing the consumers to act in ways consistent with Amazon's goals, rather than their own. In that case, however, there were alternatives. There are many different places from which to find book reviews and recommendations, and at the time, barnesandnoble.com was already available as an online bookseller--and had not significantly adopted similar practices.

Engineering Security
by Peter Gutmann

The same applies for many of the other results of psychology research mentioned above — you can scoff at them, but that won’t change the fact that they work when applied in the field (although admittedly trying to apply the Sapir-Whorf hypothesis to security messages may be going a bit far [20]). You can use social validation in your user interface to guide users in their decision-making, and in fact a similar technique has already been applied to the problem of making computer error messages more useful, using a social recommendation system to tune the error messages to make them comprehensible to larger numbers of users [21]. For example, when you’re asking the user to make a security-related decision you can prompt them that “most users would do xyz” or “for most users, xyz is the best action”, where xyz is the safest and most appropriate choice.

, Marc Conrad, Tim French, Wei Huang and Carsten Maple, Proceedings of the 1st International Conference on Availability, Reliability and Security (ARES’06), April 2006, p.482. [191] “Graphical Representations of Authorization Policies for Weighted Credentials”, Isaac Agudo, Javier Lopez and Jose Montenegro, Proceedings of the 11th Australasian Conference on Information Security and Privacy (ACISP’06), Springer-Verlag LNCS No.4058, July 2006, p.87. [192] ”Vulnerability analysis of certificate graphs”, Eunjin Jung and Mohamed Gouda, International Journal of Security and Networks, Vol.1, No.1/2 (2006), p.13. [193] “Towards a Precise Semantics for Authenticity and Trust”, Reto Kohlas, Jacek Jonczy and Rolf Haenni, Proceedings of the 2006 International Conference on Privacy, Security and Trust: Bridge the Gap Between PST Technologies and Business Services, October 2006, Article No.18. [194] “A Hybrid Trust Model for Enhancing Security in Distributed Systems”, Ching Lin and Vijay Varadharajan, Proceedings of the 2nd International Conference on Availability, Reliability and Security (ARES’07), April 2007, p.35. [195] “A Probabilistic Trust Model for GnuPG”, Jacek Jonczy, Markus Wűthrich and Rolf Haenni, presentation at the 23rd Chaos Communication Congress (23C3), December 2006, https://events.ccc.de/congress/2006/Fahrplan/attachments/1101-JWH06.pdf. [196] “Trust-Based Recommendation Systems: an Axiomatic Approach”, Reid Andersen, Christian Borgs, Jennifer Chayes, Uriel Feige, Abraham Flaxman, Adam Kalai, Vahab Mirrokni and Moshe Tennenholtz, Proceedings of the 17th World Wide Web Conference (WWW’08), April 2008, p.199. [197] “An Adaptive Probabilistic Trust Model and Its Evaluation” Chung-Wei Hang, Yonghong Wang and Munindar Singh, Proceedings of the 7th Conference on Autonomous Agents and Multiagent Systems (AAMAS’08), May 2008, p.1485. [198] “Trust*: Using Local Guarantees to Extend the Reach of Trust”, Stephen Clarke, Bruce Christianson and Hannan Xiao, Proceedings of the 17th Security Protocols Workshop (Protocols’09), Springer-Verlag LNCS No.7028, April 2009, p.189. [199] “Trust Is in the Eye of the Beholder”, Dimitri DeFigueiredo, Earl Barr and S.Felix Wu, Proceedings of the Conference on Computational Science and Engineering (CSE’09), August 2009, Vol.3, p.100.

pages: 918 words: 257,605

The Age of Surveillance Capitalism
by Shoshana Zuboff
Published 15 Jan 2019

Zuckerberg had described the corporation’s decision to unilaterally release users’ personal information, declaring, “We decided that these would be the social norms now, and we just went for it.”55 Despite their misgivings, the authors went on to suggest the relevance of their findings for “marketing,” “user interface design,” and recommender systems.56 In 2013 another provocative study by Kosinski, Stillwell, and Microsoft’s Thore Graepel revealed that Facebook “likes” could “automatically and accurately estimate a wide range of personal attributes that people would typically assume to be private,” including sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender.57 The authors appeared increasingly ambivalent about the social implications of their work.