book scanning

description: process of converting physical media into digital media

33 results

pages: 380 words: 109,724

Don't Be Evil: How Big Tech Betrayed Its Founding Principles--And All of Us
by Rana Foroohar
Published 5 Nov 2019

After all, just as no small innovator with a patent could battle Big Tech, there was no way that an individual writer or musician, for example, could wage a legal battle to try to get royalties from the likes of Facebook or Google—or even to understand how much money the companies were making by linking to the content or leading advertisers to it as part of their search or social business models.18 As an example, consider the decade-long battle between Google and myriad authors and publishers over the Google Print project, later renamed Google Books. Scanning every single page of every single book in the world had long been an obsession of Page and Brin—it was, after all, a typically Google-sized ambition. They knew that the majority of the world’s books were protected under copyright from such unauthorized copying and distribution. But the Googlers felt, in typical form, that such pesky rules didn’t apply to them.

Plus, they couldn’t understand why anyone would think it was better for authors to make money on books than for the entire world to have free access to information. So in 2002, they simply began scanning pages, albeit covertly. As tech writer Steven Levy put it in his book, In the Plex, which devotes twenty pages to the book-scanning project, “The secrecy was yet another expression of the paradox of a company that sometimes embraced transparency and other times seemed to model itself on the NSA.”19 Schmidt, who had by then decided that “evil is what Sergey says is evil,”20 was all for the project, which he declared “genius.”21 The publishing industry disagreed.

Google, which was earning about $10 billion in yearly revenue at that point, would pay the relatively tiny sum of $125 million to establish a registry of book rights holders and pay lawyers to organize the system and the payouts. It was a complete coup for Big Tech. Brewster Kahle, the head of the nonprofit Internet Archive, which wanted to do its own book-scanning project, claimed (not incorrectly) that Google had become an information monopolist. Even Lawrence Lessig, the digital law expert who favors many of the policies that the platforms support, said that Google’s deal was the equivalent of a “digital bookstore, not a digital library.”23 What he means is that even as Google was presenting the entire project as being done for the benefit of users, Google itself would ultimately benefit the most.

pages: 117 words: 30,654

Kindle Formatting: The Complete Guide to Formatting Books for the Amazon Kindle
by Joshua Tallent
Published 1 Apr 2009

There are a variety of options available to the do-it-yourself person or to the pay-someone-else person. The main benefit to doing the process yourself is saving money, but you may find that having some help in the process is easier and faster. The first step in the OCR process is to have your book scanned. This is a process where each page of your book is turned into an image that can be loaded into the OCR program. There are a variety of places that will do scanning for you, or you can tackle the process yourself. Some copy and print stores (like FedEx/Kinko’s) offer scanning services, but you will often find the best prices at companies that specialize in scanning documents onto microfiche.

Be aware that the easiest way to scan a book on regular consumer scanners is to cut off the binding, which will effectively ruin the book. If your book is rare and you want to keep it intact, you should make sure the scanning company knows to handle it gently and to not cut off the binding. There is one consumer scanner called the OpticBook 3600 that is specifically designed for book scanning. That device is built in a way that allows a good scan of the pages without cutting the binding off or breaking the binding by forcing the book into unnatural positions on a flat surface. If you decide to scan the book yourself, you will need a flatbed or feed scanner. These devices are available at most electronics and computer stores and at various retailers online.

pages: 413 words: 106,479

Because Internet: Understanding the New Rules of Language
by Gretchen McCulloch
Published 22 Jul 2019

Even those of us who know that a single book isn’t the sole repository of a language and that dictionaries are records of how people are already using the language, not providers of words for us to start using—we still often think of the English language as contained within a sufficiently large quantity of books. We think of it as “the language of Shakespeare,” or the twenty volumes of the second edition of the Oxford English Dictionary, or the entire Library of Congress, or the millions of books scanned and made searchable by Google Books. This association isn’t accidental. If we look at how frequently people wrote the phrase “English language” across all the books scanned by Google, from 1500 to 2000, we see a major upswing between 1750 and 1800. It’s consistently low beforehand, and consistently high thereafter. “English” and “language” by themselves are pretty much steady—it’s just the two words together that go up.
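The measurement described above, how often a phrase appears in each year relative to all text from that year, can be sketched in a few lines of Python. This is only a minimal illustration over a hypothetical toy corpus, not the actual pipeline behind Google Books or its Ngram Viewer; the names `tokenize` and `relative_frequency` and the sample sentences are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(corpus, phrase):
    """Return {year: share of n-word windows in that year equal to `phrase`}.

    `corpus` is an iterable of (year, text) pairs (a made-up format for this sketch).
    """
    target = tuple(tokenize(phrase))
    n = len(target)
    hits = defaultdict(int)     # windows that match the phrase, per year
    windows = defaultdict(int)  # all n-word windows seen, per year
    for year, text in corpus:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            windows[year] += 1
            if tuple(tokens[i:i + n]) == target:
                hits[year] += 1
    return {year: hits[year] / windows[year] for year in windows}


# Hypothetical two-"book" corpus, one text per year.
toy_corpus = [
    (1700, "the english tongue and the french language"),
    (1800, "the english language is the language of england"),
]

print(relative_frequency(toy_corpus, "english language"))
print(relative_frequency(toy_corpus, "language"))
```

Dividing by the number of windows per year matters because far more text survives from later years; raw counts alone would rise even if usage stayed flat.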

By the time computers did start supporting lowercase characters, we were faced with two competing standards: one group of people assumed that all caps is just how you write on a computer, while another group insisted that it stood for yelling. Ultimately, the emotional meaning won out. The shift in function happened in parallel with a shift in name: according to the millions of books scanned in Google Books, the terms “all caps” and “all uppercase” started rising sharply in the early 1990s. By contrast, in the earlier part of the century, the preferred terms were “block letters” or “block capitals.” People tended to use “all caps” to talk about the loud kind, while block capitals more often referred to the official kind, on signs and on forms.

In Library of Congress, ed., Chronicling America: Historic American Newspapers. chroniclingamerica.loc.gov/lccn/sn84026925/1856-04-17/ed-1/seq-4/.

at one point it did: Thanks to Guy English (personal communication) for confirming that this was the case for FORTRAN and COBOL.

millions of books scanned: Search for block capitals,block letters,all caps,all uppercase,caps lock in Google Books Ngram Viewer with date parameter 1800 to 2000. books.google.com/ngrams/graph?content=block+capitals%2Cblock+letters%2Call+caps%2Call+uppercase%2Ccaps+lock&year_start=1800&year_end=2000&corpus=15&smoothing=3.

pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives
by Steven Levy
Published 12 Apr 2011

Or they could simply be borrowed in the first place. “We came up with all these numbers,” says Mayer. “We were emailing them around, the right cost per hour, the right number of pages per hour—debate, debate, debate. After one thread hinged on how many pages an hour we could do, we decided we should just scan one.” They set up a makeshift book scanning device. They tried several sizes of books, the first one, appropriately enough, being The Google Book, an illustrated children’s story by V. C. Vickers. (The “Google” in the title was an odd creature with aspects of mammal, reptile, and fish.) They then tested a photo book, Ancient Forests by David Middleton; a dense text, Algorithms in C by Robert Sedgewick; and a general-interest book, Startup, by Jerry Kaplan.

So it commissioned some of its best wizards to build a machine that, presumably, would work much more accurately and at a somewhat brisker rate than Marissa Mayer turning pages one by one. Though Google wasn’t known for actually building machines, its data center needs had generated a lot of engineering expertise in that area: remember, it was the world’s biggest manufacturer of computer servers. One of the difficulties in book scanning rested in producing high-quality images from the printed page, so that OCR software could accurately translate the shapes of the letters on the page to computer-readable text. The problem was that, on their own, books did not sit flat on the platform: they presented a 3-D problem requiring a 2-D solution.

And as far as user information was concerned, Google made it easy for people not to become locked into using its products. It even had an initiative called the Data Liberation Front to make sure that users could easily move information they created with Google documents off Google’s servers. It would seem that book scanning was a good candidate for similar transparency. If Google had a more efficient way to scan books, sharing the improved techniques could benefit the company in the long run—inevitably, much of the output would find its way onto the web, bolstering Google’s indexes. But in this case, paranoia and a focus on short-term gain kept the machines under wraps.

pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think
by Viktor Mayer-Schonberger and Kenneth Cukier
Published 5 Mar 2013

Its system sucked in every translation it could find, in order to train the computer. In went corporate websites in multiple languages, identical translations of official documents, and reports from intergovernmental bodies like the United Nations and the European Union. Even translations of books from Google’s book-scanning project were included. Where Candide had used three million carefully translated sentences, Google’s system harnessed billions of pages of translations of widely varying quality, according to the head of Google Translate, Franz Josef Och, one of the foremost authorities in the field. Its trillion-word corpus amounted to 95 billion English sentences, albeit of dubious quality.

Transforming words into data unleashes numerous uses. Yes, the data can be used by humans for reading and by machines for analysis. But as the paragon of a big-data company, Google knows that information has multiple potential purposes that can justify its collection and datafication. So Google cleverly used the datafied text from its book-scanning project to improve its machine-translation service. As explained in Chapter Three, the system would take books that are translations and analyze what words and phrases the translators used as alternatives from one language to another. Knowing this, it could then treat translation as a giant math problem, with the computer figuring out probabilities to determine what word best substitutes for another between languages.
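The idea of treating translation as a probability problem over paired texts can be illustrated with a small sketch. The code below is not Google's system; it swaps in IBM Model 1, a classic textbook word-alignment method, and runs it on a hypothetical three-sentence German-English corpus. The names `train_ibm_model1` and `best_translation` and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate P(target_word | source_word) from aligned sentence pairs.

    `pairs` is a list of (source_tokens, target_tokens). Returns a dict
    keyed by (target_word, source_word). Uses expectation-maximization.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in pairs:
            for tw in tgt:
                # Share the credit for `tw` among the source words present.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # Re-estimate the probabilities from the expected counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, source_word):
    """Pick the target word with the highest estimated probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical parallel corpus: (German tokens, English tokens).
toy_pairs = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

model = train_ibm_model1(toy_pairs)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

No dictionary is supplied: the pairings emerge only because words that translate each other keep turning up in matching sentences, which is why a larger volume of paired text tends to help even when its quality varies.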

Quantifying the world: Much of the authors’ thinking on the history of datafication has been inspired by Crosby, The Measure of Reality.

Europeans were never exposed to abacuses: Ibid., 112.

Calculating faster using Arabic numerals: Alexander Murray, Reason and Society in the Middle Ages (Oxford University Press, 1978), p. 166.

Total number of books published and Harvard study on Google book-scanning project: Jean-Baptiste Michel et al., “Quantitative Analysis of Culture Using Millions of Digitized Books,” Science 331 (January 14, 2011), pp. 176–182 (http://www.sciencemag.org/content/331/6014/176.abstract). For a video lecture on the paper, see Erez Lieberman Aiden and Jean-Baptiste Michel, “What We Learned from 5 Million Books,” TEDx, Cambridge, MA, 2011 (http://www.ted.com/talks/what_we_learned_from_5_million_books.html).

On wireless modules in cars and insurance: See Cukier, “Data, Data Everywhere.”

pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
by Eric Siegel
Published 19 Feb 2013

The demands of open question answering reach far beyond the computer’s traditional arena of storing and accessing data for flight reservations and bank records. We’re going to need a smarter robot.

The Ultimate Knowledge Source

We are not scanning all those books to be read by people. We are scanning them to be read by an AI.
—A Google employee regarding Google’s book scanning, as quoted by George Dyson in Turing’s Cathedral: The Origins of the Digital Universe

A bit of good news: IBM didn’t need to create comprehensive databases for the Jeopardy! challenge because the ultimate knowledge source already exists: the written word. I am pleased to report that people like to report; we write down what we know in books, web pages, Wikipedia entries, blogs, and newspaper articles.

McKeown, “Learning Methods to Combine Linguistic Indicators: Improving Aspectual Classification and Revealing Linguistic Insights,” Computational Linguistics 26, issue 4 (December 2000). doi:10.1162/089120100750105957, http://dl.acm.org/citation.cfm?id=971886. Googling only 30 percent of the Jeopardy! questions right: Stephen Baker, Final Jeopardy: Man vs. Machine and the Quest to Know Everything (Houghton Mifflin Harcourt, 2011), 212–224. Quote about Google’s book scanning project: George Dyson, Turing’s Cathedral: The Origins of the Digital Universe (Pantheon Books, 2012). Natural language processing: Dursun Delen, Andrew Fast, Thomas Hill, Robert Nisbet, John Elder, and Gary Miner, Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications (Academic Press, 2012).

pages: 465 words: 109,653

Free Ride
by Robert Levine
Published 25 Oct 2011

Perhaps most important, the settlement would have set an informal precedent that scanning books requires an agreement with publishers or authors. “The alternative was to take our chances on winning the lawsuit, and we probably would have,” Aiken says. “But if we didn’t, it would have been a catastrophe because [Google would have] millions of books scanned that authors and publishers would have no legal control over.” Like Amazon and Apple, Google sees books as a means to an end—in this case giving its search engine access to more information. “Probably the highest-quality knowledge is captured in books,” Sergey Brin said.16 Like record labels, publishers have become arms suppliers in a cold war between technology companies.

Roy MacLeod, The Library of Alexandria: Centre of Learning in the Ancient World (New York: I. B. Tauris, 2000), p. 5. According to MacLeod, customs officials confiscated texts from passing ships, as well as visitors. They took originals for the library and returned copies to the owners. 12. There are two common views of whether Google’s book-scanning project qualifies as fair use. One, held by copyright reform activists, is that scanning books in order to create an index is no different from a card catalog, so it obviously falls under fair use. The other is that such a big project by a private company couldn’t possibly qualify. A court would probably find the issue less obvious than either side makes it out to be.

pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future
by Kevin Kelly
Published 6 Jun 2016

Legal tussles over the right to sample—to remix—snippets of music, particularly when either the sampled song or the borrowing song make a lot of money, are ongoing. The appropriateness of remixing, reusing material from one news source for another is a major restraint for new journalistic media. Legal uncertainty about Google’s reuse of snippets from the books it scanned was a major reason it closed down its ambitious book scanning program (although the court belatedly ruled in Google’s favor in late 2015). Intellectual property is a slippery realm. There are many aspects of contemporary intellectual property laws that are out of whack with the reality of how the underlying technology works. For instance, U.S. copyright law gives a temporary monopoly to a creator for his or her creation in order to encourage further creation, but the monopoly has been extended for at least 70 years after the death of the creator, long after a creator’s dead body can be motivated by anything.

., 62 extraordinary events, 277–79 eye tracking, 219–20 Facebook and aggregated information, 147 and artificial intelligence, 32, 39, 40 and “click-dreaming,” 280 cloud of, 128, 129 and collaboration, 273 and consumer attention system, 179, 184 and creative remixing, 199, 203 face recognition of, 39, 254 and filtering systems, 170, 171 flows of posts through, 63 and future searchability, 24 and interactivity, 235 and intermediation of content, 150 and lifestreaming, 246 and likes, 140 nonhierarchical infrastructure of, 152 number of users, 143, 144 as platform ecosystem, 123 and sharing economy, 139, 144, 145 and tracking technology, 239–40 and user-generated content, 21–22, 109, 138 facial recognition, 39, 40, 43, 220, 254 fan fiction, 194, 210 fear of technology, 191 Felton, Nicholas, 239–40 Fifield, William, 288 films and film industry, 196–99, 201–2 filtering, 165–91 and advertising, 179–89 differing approaches to, 168–75 filter bubble, 170 and storage capacity, 165–67 and superabundance of choices, 167–68 and value of attention, 175–79 findability of information, 203–7 firewalls, 294 first-in-line access, 68 first-person view (FPV), 227 fitness tracking, 238, 246, 255 fixity, 78–81 Flickr, 139, 199 Flows and flowing, 61–83 and engagement of users, 81–82 and free/ubiquitous copies, 61–62, 66–68 and generative values, 68–73 move from fixity to, 78–81 in real time, 64–65 and screen culture, 88 and sharing, 8 stages of, 80–81 streaming, 66, 74–75, 82 and users’ creations, 73–74, 75–78 fluidity, 66, 79, 282 food as service (FaS), 113–14 footnotes, 201 411 information service, 285 Foursquare, 139, 246 fraud, 184 freelancers (prosumers), 113, 115, 116–17, 148, 149 Freeman, Eric, 244–45 fungibility of digital data, 195 future, blindness to, 14–22 Galaxy phones, 219 gatekeepers, 167 Gates, Bill, 135, 136 gaze tracking, 219–20 Gelernter, David, 244–46 General Electric, 160 generatives, 68–73 genetics, 69, 238, 284 Gibson, William, 214 gifs, 195 global connectivity, 275, 
276, 292 gluten, 241 GM, 185 goods, fixed, 62, 65 Google AdSense ads, 179–81 and artificial intelligence, 32, 36–37, 40 book scanning projects, 208 cloud of, 128, 129 and consumer attention system, 179, 184 and coveillance, 262 and facial recognition technology, 254 and filtering systems, 172, 188 and future searchability, 24 Google Drive, 126 Google Glass, 217, 224, 247, 250 Google Now, 287 Google Photo, 43 and intellectual property law, 208–9 and lifelogging, 250–51, 254 and lifestreaming, 247–48 and photo captioning, 51 quantity of searches, 285–86 and smart technology, 223–25 translator apps of, 51 and users’ usage patterns, 21, 146–47 and virtual reality technology, 215, 216–17 and visual intelligence, 203 government, 167, 175–76, 252, 255, 261–64 GPS technology, 226, 274 graphics processing units (GPU), 38–39, 40 Greene, Alan, 31–32, 238 grocery shopping, 62, 253 Guinness Book of World Records, 278 hackers, 252 Hall, Storrs, 264–65 Halo, 227 Hammerbacher, Jeff, 280 hand motion tracking, 222 haptic feedback, 233–34 harassment, online, 264 hard singularity, 296 Harry Potter series, 204, 209–10 Hartsell, Camille, 252 hashtags, 140 Hawking, Stephen, 44 health-related websites, 179–81 health tracking, 173, 238–40, 250 heat detection, 226 hierarchies, 148–54, 289 High Fidelity, 219 Hinton, Geoff, 40 historical documents, 101 hive mind, 153, 154, 272, 281 Hockney, David, 155 Hollywood films, 196–99 holodeck simulations, 211–12 HoloLens, 216 the “holos,” 292–97 home surveillance, 253 HotWired, 18, 149, 150 humanity, defining, 48–49 hyperlinking antifacts highlighted by, 279 of books, 95, 99 of cloud data, 125–26 and creative remixing, 201–2 early theories on, 18–19, 21 and Google search engines, 146–47 IBM, 30–31, 40, 41, 128, 287 identity passwords, 220, 235 IMAX technology, 211, 217 implantable technology, 225 indexing data, 258 individualism, 271 industrialization, 49–50, 57 industrial revolution, 189 industrial robots, 52–53 information production, 257–64.

pages: 629 words: 142,393

The Future of the Internet: And How to Stop It
by Jonathan Zittrain
Published 27 May 2009

The act of creating a search engine, like the act of surfing itself, is something so commonplace that it would be difficult to imagine deeming it illegal—but this is not to say that search engines rest on any stronger of a legal basis than the practice of using robots.txt to determine when it is and is not appropriate to copy and archive a Web site.114 Only recently, with Google’s book scanning project, have copyright holders really begun to test this kind of question.115 That challenge has arisen over the scanning of paper books, not Web sites, as Google prepares to make them searchable in the same way Google has indexed the Web.116 The long-standing practice of Web site copying, guided by robots.txt, made that kind of indexing uncontroversial even as it is, in theory, legally cloudy.

., 188–92; and procrastination principle, 152, 164, 180, 242, 245; security in, 166; stability of, 153–74; use of term, 74; as what we make them, 242, 244–46; as works in progress, 152 generative technology: accessibility of, 72–73, 93; and accountability, 162–63; adaptability of, 71–72, 93, 125; affordance theory, 78; Apple II, 2; benefits of, 64, 79–80, 84–85; blending of models for innovation, 86–90; control vs. anarchy in, 98, 150, 157–62; design features of, 43; ease of mastery, 72; end-to-end neutrality of, 165; expansion of, 34; features of, 71–73; freedom vs. security in, 3–5, 40–43, 151; free software philosophy, 77; and generative content, 245; group creativity, 94, 95; hourglass architecture, 67–71, 99; innovation as output of, 80–84, 90; input/participation in, 90–94; leverage in, 71, 92–93; non-generative compared to, 73–76; openness of, 19, 150, 156–57, 178; pattern of, 64, 67, 96–100; as platform, 2, 3; recursive, 95–96; success of, 42–43; theories of the commons, 78–79; transferability of, 73; vulnerability of, 37–51, 54–57, 60–61, 64–65 generative tools, 74–76 generativity: extra-legal solutions for, 168–73; Libertarian model of, 131; and network neutrality, 178–81; paradox of, 99; recursive, 94; reducing, and increasing security, 97, 102, 165, 167, 245; repurposing via, 212; use of term, 70; and Web 2.0, 123–26 Geocities.com, 119, 189 GNE, 132–33, 134, 135 GNU/Linux, 64, 77, 89, 114, 190, 192 GNUpedia, 132 goldfish bowl cams, 158 “good neighbors” system, 160 Google: and advertising, 56; book scanning project of, 224–25, 242; Chinese censorship of, 113, 147; clarification available on, 230; data gathering by, 160, 221; death penalty of, 218, 220; image search on, 214–15; innovation in, 84; map service of, 124, 184, 185; privacy policy on, 306n47; and procrastination principle, 242; as search engine, 223, 226; and security, 52, 171; and spam, 170–73 Google Desktop, 185 Google News, 242 Google Pagerank, 160 Google 
Video, 124 governments: abuse of power by, 117–19, 187; oppressive, monitoring by, 33; PCs investigated by, 186–88; research funding from, 27, 28 GPS (Global Positioning Systems), 109, 214 graffiti, 45 Griffith, Virgil, and Wikiscanner, 151 Gulf Shipbuilding Corporation, 172 gun control legislation, 117 hackers: ethos of, 43, 45, 53; increasing skills of, 245 Harvard University, Berkman Center, 159, 170 HD-DVDs, 123 Health, Education, and Welfare (HEW) Department, U.S., privacy report of (1973), 201–5, 222, 233–34 Herdict, 160, 163, 167–68, 173, 241 Hippel, Eric von, 86–87, 98, 146 Hollerith, Herman, 11–12, 13; business model of, 17, 20, 24 Hollerith Tabulating Machine Company, 11–12 home boxes, 180–81 honor codes, 128–29 Horsley, Neal, 215 Hotmail, 169 “How’s My Driving” programs, 219, 229 HTML (hypertext markup language), 95 Hunt, Robert, 190 Hush-A-Phone, 21–22, 81, 82, 121 hyperlinks, 56, 89 hypertext, coining of term, 226 IBM (International Business Machines): antitrust suit against, 12; business model of, 12, 23, 30, 161; competitors of, 12–13; and generative technology, 64; Internet Security Systems, 47–48; mainframe computers, 12, 57; OS/2, 88; and risk aversion, 17, 57; System 360, 174 identity tokens, unsheddable, 228 image recognition, 215–16 immigration, illegal, 209 information appliances: accessibility of, 29, 232; code thickets, 188–92; content thickets, 192–93; and data portability, 176–78; generative systems compared to, 73–76; limitations of, 177; and network neutrality, 178–85; PCs as, 4, 59–61, 102, 185–88; PCs vs., 18, 29, 57–59; and perfect enforcement, 161; and privacy, 185–88; regulatory interventions in, 103–7, 125, 197; remote control of, 161; remote updates of, 106–7, 176; security dilemma of, 42, 106–7, 123–24, 150, 176–88; specific injunction, 108–9; variety of designs for, 20; Web 2.0 and, 102; See also specific information appliances information overload, 230 information services, early forms, 9 InnoCentive, 246 innovation: blending 
models for, 86–90; generativity as parent of, 80–84, 90; group, 94; and idiosyncrasy, 90–91; inertia vs., 83–84; “sustaining” vs.

pages: 189 words: 57,632

Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future
by Cory Doctorow
Published 15 Sep 2008

More importantly, the free e-book skeptics have no evidence to offer in support of their position — just hand-waving and dark muttering about a mythological future when book-lovers give up their printed books for electronic book-readers (as opposed to the much more plausible future where book lovers go on buying their fetish objects and carry books around on their electronic devices). I started giving away e-books after I witnessed the early days of the "bookwarez" scene, wherein fans cut the binding off their favorite books, scanned them, ran them through optical character recognition software, and manually proofread them to eliminate the digitization errors. These fans were easily spending 80 hours to rip their favorite books, and they were only ripping their favorite books, books they loved and wanted to share. (The 80-hour figure comes from my own attempt to do this — I'm sure that rippers get faster with practice.)

The Orbital Perspective: Lessons in Seeing the Big Picture From a Journey of 71 Million Miles
by Astronaut Ron Garan and Muhammad Yunus
Published 2 Feb 2015

The driver or homeowner has demonstrated a track record of living up to agreements, and the collective wisdom of the crowd can point to a high level of dependability. This is similar to Duolingo’s use of beginning language students to provide translations or ReCAPTCHA’s ability to crowdsource the accuracy of book scans.

Community-Based Trust

These examples relate to personal trust, but there are countless similar examples of communities that form online for a specific purpose and operate in a coordinated way for the greater good. Wikipedia, for instance, was built on the premise that people enjoy interacting within a community, which in the case of Wikipedia, is a global village documenting human knowledge.

pages: 173 words: 14,313

Peers, Pirates, and Persuasion: Rhetoric in the Peer-To-Peer Debates
by John Logie
Published 29 Dec 2006

Google intended to display small portions of the books, limiting users to reviewing a page at a time, and blocking printing. Less than a year later some members of the American Association of University Presses were petitioning the courts, demanding the right to opt out of having their authors’ books scanned. Other publishers are now demanding that Google request and receive permissions for each book it scans. And, for good measure, free speech advocates are encouraging Google to refuse to honor the publishers’ wishes and publish everything based on a hard-line fair use claim. Once again, U.S. Copyright Law has magically transformed an attempt to build Borges’s Library of Babel into the Tower of Babel, wherein the participants are unable to communicate with one another, and progress toward lofty goals is impossible.

Not That Kind of Girl: A Young Woman Tells You What She's "Learned"
by Lena Dunham
Published 28 Sep 2014

I know I shouldn’t drink anymore, or should at least temper it with a few handfuls of the crisps they are passing around. No one can explain how they came to live here. Nellie hops up, discarding her coat while announcing that it’s freezing. “Let me show you round,” she says. I take in every detail of the house like I’m six again and reading a picture book, scanning the illustrations carefully. Next to a marble fireplace lies an issue of Elle, a torn thigh-high stocking, an empty pack of Marlboros, a half-eaten pudding cup. And each room leads to another, like one of those New York real-estate dreams where you open a hidden door and discover massive rooms you didn’t even know you had.

pages: 252 words: 74,167

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future
by Luke Dormehl
Published 10 Aug 2016

Compare that number to the 25,000 books read by the woman who has laid claim to the title of Britain’s most avid reader, having read around a dozen books each week since 1946. In an entire lifetime, even the most prolific reader is unable to read one-thousandth of the books Google has absorbed since it started its book-scanning project in just October 2004. With increasingly large datasets, computers are getting better and better at performing tasks like textual analysis, which is why they are being used for tasks like identifying who wrote particular books in cases where this is unknown. But generating novelty is not enough.

pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything
by Gordon Bell and Jim Gemmell
Published 15 Feb 2009

Other publications papers and reports
g. People, references, recommendations, vitae
h. Archived company and organizational folders (X)
   i) Digital Equipment Corp. . . .
   ii) NSF
i. Archived calendars and correspondence (t)
j. Archived files (e.g., DEC WPS, e-mail)
3. My Books books authored, books scanned
4. My Voice Conversations and Notes (telephone conversations are held in MyLifeBits database)
5. My Media, i.e., song collections from ripped CDs
6. My Videos including c. 1950s 8mm movies and lectures

Psychologists have identified “lifetime periods” as an important way that autobiographical memories work.

pages: 369 words: 80,355

Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room
by David Weinberger
Published 14 Jul 2011

Even before books, the hundreds of thousands of scrolls in the Library of Alexandria were more than could be carried out to safety from the great fire, much less be read in a lifetime. Only about 2 percent of the Harvard University library system’s physical holdings circulate every year, and most of those are the same works that circulated the previous year.1 The new abundance makes the old abundance look like scarcity. The Google book-scanning project alone has over 15 million scanned books, which you can search through more easily than you can look up an item in the index of the book on your night table.2 Harvard’s Robert Darnton, whom we met in Chapter 6, is among those proposing a Digital Public Library of America,3 a call that has excited interest among public and research librarians, the government, and some large Internet projects.

pages: 253 words: 79,595

The Joy of Less, A Minimalist Living Guide: How to Declutter, Organize, and Simplify Your Life
by Francine Jay

Invest in an electronic reader, and buy digital books instead of physical ones. A single, paperback-size device can hold hundreds of titles (and give you access to thousands of others), eliminating the need for entire bookshelves. Use the power of technology to downsize your photo albums as well. Instead of storing those bulky books, scan the contents into digital format. You can print the ones you’d like to display, one by one, when it strikes your fancy. The benefits of digital photographs are numerous. First, they’re much easier to access. If you want to view pictures from your trip to Paris or the office Christmas party, they’re right at your fingertips on your computer.

pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations
by Nicholas Carr
Published 5 Sep 2016

In 2002, the Google cofounder decided that it was time for his young company to scan all the world’s books into its database. If printed texts weren’t brought online, he feared, Google would never fulfill its mission of making the world’s information “universally accessible and useful.” After doing some book-scanning tests in his office—he manned the camera while Marissa Mayer, then a product manager, turned pages to the beat of a metronome—he concluded that Google had the smarts and the money to get the job done. He set a team of engineers and programmers to work. In a matter of months, they had invented an ingenious scanning device that used a stereoscopic infrared camera to correct for the bowing of pages that occurs when a book is opened.

pages: 240 words: 109,474

Masters of Doom: How Two Guys Created an Empire and Transformed Pop Culture
by David Kushner
Published 2 Jan 2003

Romero, in pure Melvin mode, imagined all the crazy stuff they could do in a game where the object was, as he said, “to mow down Nazis.” He wanted to have the suspense of an Apple II game pumped up with the shock and horror of storming a Nazi bunker. There would be SS soldiers and Hitler. Adrian hit the history books, scanning images of the German leader to include throughout the game. But that wasn’t enough. “How about,” Romero suggested, “we throw in guard dogs? Dogs that you can shoot! Fucking German shepherds!” Adrian cracked up, sketching out a dog that, in a death animation, could yelp back. “And there should be blood,” Romero said, “lots of blood, blood like you never see in games.

pages: 864 words: 272,918

Palo Alto: A History of California, Capitalism, and the World
by Malcolm Harris
Published 14 Feb 2023

Though computer programs didn’t crawl these real-life surfaces by themselves, Google could afford to contract low-wage workers to drive cameras around and to turn pages. Mostly these workers disappeared behind user interfaces, but there were predictable glitches, like the reflection of a Street View worker captured in a shiny window. Artist Benjamin Shaykin’s project Google Hands features problem pages from Google Books scans, including accidentally scanned worker fingers. The fingers periodically get caught, a consistent malfunction in the scraper’s cyborg apparatus. Andrew Norman Wilson’s 2011 short film, Workers Leaving the Googleplex, focuses on the same ScanOps contractors. In the grand NorCal tradition of labor-market segregation, these laborers carried unique yellow badges, though that was hardly necessary to mark them, Wilson writes: “It was the same group of workers, mostly black and Latino, on a campus of mostly white and Asian employees, walking out of the exit like a factory bell had just gone off.”25 They entered and exited at their own special scheduled times—4:00 a.m. and 2:15 p.m.

It’s a plan that backfired with Wilson’s movie, which shows the yellow-badge exodus as Wilson tells the audience how he lost his red-badge job editing film for Google’s on-campus contractor Transvideo after being reported for speaking with ScanOps workers and recording the scene. As Google grew, it combined the monopolistic business strategy of Microsoft with the disrupting scraper speed of Napster. It’s a potent combination, and it left Google strong enough to defend its book scanning from the Authors Guild all the way to the top courts. Not even Bill Gates himself could have conceived of a business plan in which his company extracted value from every word accessed or typed on a Windows machine. Google belonged to a different era. In the closing decades of the twentieth century, as output growth slowed and capital hunted for low-commitment bets, global advertising increased dramatically.

pages: 380 words: 118,675

The Everything Store: Jeff Bezos and the Age of Amazon
by Brad Stone
Published 14 Oct 2013

It was considered a Jeff project, which meant that the product manager met with Bezos every few weeks and received a constant stream of e-mail from the CEO, usually containing extraordinarily detailed recommendations and frequently arriving late at night. Amazon started using Mechanical Turk internally in 2005 to have humans do things like review Search Inside the Book scans and check product images uploaded to Amazon by customers to ensure they were not pornographic. The company also used Mechanical Turk to match the images with the corresponding commercial establishments in A9’s fledgling Block View tool. Bezos himself became consumed with this task and used it as a way to demonstrate the service.

pages: 385 words: 112,842

Arriving Today: From Factory to Front Door -- Why Everything Has Changed About How and What We Buy
by Christopher Mims
Published 13 Sep 2021

Maybe it contained a USB charger. I know we’d already delivered electronics earlier in the day—one woman was delighted to receive her brand-new iPhone on the same day that model was released. So let’s say it was a USB charger, after all. No one was home. We walked to the front door, doing everything by the book—scanning our walk path for obstructions, walking briskly. (“A brisk pace commands attention,” notes the manual, in a tiny haiku that sums up every impression I’ve ever gotten from a UPS driver, but especially as they strode up to my desk or my front door, martial in their uniform and bearing.) We left the package tucked up against the side of the house, in case of rain, but not directly in front of the door, because you never want the customer to trip over it.

pages: 387 words: 119,409

Work Rules!: Insights From Inside Google That Will Transform How You Live and Lead
by Laszlo Bock
Published 31 Mar 2015

Susan Wojcicki and Sheryl Sandberg, a sales VP at the time and now COO of Facebook, were instrumental in growing the concept behind these talks, using their networks and interests to recruit a range of speakers to Google to speak about leadership, women’s issues, and politics. Googlers first self-organized these events into a more formal program in 2006, when they noticed more and more authors visiting to speak with our book-scanning teams. The volunteers asked visiting authors to stick around for a conversation, and our first official Authors@Google guest was none other than Malcolm Gladwell. This grew into today’s broader program called Talks at Google, a speaker series where authors, scientists, business leaders, performers, politicians, and other thought-provoking figures are invited to campus to share their thoughts.

pages: 510 words: 120,048

Who Owns the Future?
by Jaron Lanier
Published 6 May 2013

The real people from whom the initial answers were gathered deserve to be paid for each new answer given by the machine. Consider too the act of scanning a book into digital form. The historian George Dyson has written that a Google engineer once said to him: “We are not scanning all those books to be read by people. We are scanning them to be read by an AI.” While we have yet to see how Google’s book scanning will play out, a machine-centric vision of the project might encourage software that treats books as grist for the mill, decontextualized snippets in one big database, rather than separate expressions from individual writers. In this approach, the contents of books would be atomized into bits of information to be aggregated, and the authors themselves, the feeling of their voices, their differing perspectives, would be lost.

pages: 642 words: 141,888

Like, Comment, Subscribe: Inside YouTube's Chaotic Rise to World Domination
by Mark Bergen
Published 5 Sep 2022

A hip-hop devotee, he noticed artists were rising not on MTV, the old kingmaker, but on YouTube, the new cultural barometer. He pulled up Google Video and saw Charlie Rose. Google had another apparent handicap: the company was reluctant to publish videos without screening them first. (The Authors Guild had sued Google over its book-scanning project in 2005, which left Google lawyers nervous.) That screening process worked well during the week, when Google Video teams were clocked in, but not on weekends, when they weren’t. If someone posted a video Friday night, why would they want to wait two days to see it? Especially if they could see it immediately on YouTube.

pages: 496 words: 154,363

I'm Feeling Lucky: The Confessions of Google Employee Number 59
by Douglas Edwards
Published 11 Jul 2011

He dubbed it "product review." Google had birthed a process. Product review met in Larry and Sergey's office. I arrived early to get a seat on the black pleather couch. Otherwise, I'd have had to balance my laptop while sitting on a three-foot rubber ball. A large metal exoskeleton—the prototype for Larry's book-scanning project—held a camera and an array of lights pointing down at the coffee table in front of me. Karen White, Marissa Mayer, Jen McGrath from the front-end team, and Craig Silverstein worked around it, connecting cables to a projector so we could display mockups against the office wall. Sergey leaned back in his desk chair across from us, reading and eating a sandwich.

pages: 541 words: 173,676

Generations: The Real Differences Between Gen Z, Millennials, Gen X, Boomers, and Silents—and What They Mean for America's Future
by Jean M. Twenge
Published 25 Apr 2023

Treating people as individuals means setting aside the idea of group membership as destiny, which gave rise to movements for individual rights based on gender, race, and class, enshrining equality as a core value of the culture. With so much reliance on the self, it was important that people feel good about themselves, so viewing the self positively received more emphasis. Between 1980 and 2019, individualistic phrases promoting self-expression and positivity became steadily more common in the 25 million books scanned in by Google (see Figure 1.3; you can try this database yourself by googling “ngram viewer”). Assuming verbal language mirrored written language, Boomers growing up in the 1950s were only rarely told “just be yourself” or “you’re special,” but Millennials and Gen Z’ers heard these phrases much more often.

pages: 645 words: 184,311

American Gods
by Neil Gaiman
Published 30 Jun 2001

Shadow tore off a strip of a paper towel and placed it into the book as a bookmark. He could imagine Hinzelmann's pleasure in seeing the reference to his grandfather. He wondered if the old man knew that his family had been instrumental in building the lake. Shadow flipped forward through the book, scanning for more references to the lake-building project. They had dedicated the lake in a ceremony in the spring of 1876, as a precursor to the town's centennial celebrations. A vote of thanks to Mr. Hinzelmann was taken by the council. Shadow checked his watch. It was five-thirty. He went into the bathroom, shaved, combed his hair.

How to Hide an Empire: A History of the Greater United States
by Daniel Immerwahr
Published 19 Feb 2019

Hawai‘i appeared seven times that year, Guam not once. In contrast, the Times ran 639 articles about India, Britain’s largest colony. That was nearly three times as many as it ran about all U.S. territories combined, territories in which more than 10 percent of the U.S. population lived. It wasn’t much different in the realm of books. Scanning the library shelves, it’s easy to find high-profile books from the interwar period depicting Native Americans and the western frontier (Little House on the Prairie is one), but prominent treatments of overseas territories are rare. The only one with a truly large audience was Coming of Age in Samoa (1928) by the anthropologist Margaret Mead, a wildly popular ethnography that featured frank discussions of Samoan sexuality and launched Mead’s career as one of the most famous scholars in the country.

EuroTragedy: A Drama in Nine Acts
by Ashoka Mody
Published 7 May 2018

Germans led the intellectual inquiry into “flexible exchange rates.” (Frequency of reference to “flexible exchange rate” in books digitized by Google) Note: The graph was created using the Google Books Ngram Viewer (https://books.google.com/ngrams/info). It reports the frequency with which the phrase “flexible exchange rate” is mentioned in the books scanned by Google. The term “flexible Wechselkurs” was used for German books, and “taux de change flexible” was used for French books. The English variation “floating exchange rate,” the German variation “schwankender Wechselkurs,” and the French variation “taux de change flottant” yielded similar trends.

Robert Hetzel, economist at the Federal Reserve Bank of Richmond, would later explain: “Germany’s commitment to a free market economy pushed it to reject fixed exchange rates and adopt floating exchange rates.”82 Thus, in proposing a monetary union, Pompidou was defying not only the global experience that was causing fixed-exchange-rate systems to break down, but he was also ignoring the clash between the French dirigiste temperament and the German market-oriented economic ideology.

pages: 918 words: 257,605

The Age of Surveillance Capitalism
by Shoshana Zuboff
Published 15 Jan 2019

“European Commission—Press Release—Antitrust: Commission Sends Statement of Objections to Google on Android Operating System and Applications,” European Commission, April 20, 2016, http://europa.eu/rapid/press-release_IP-16-1492_en.htm. 22. “Complaint of Disconnect, Inc.,” 40. 23. Marc Rotenberg, phone interview with author, June 2014. 24. Jennifer Howard, “Publishers Settle Long-Running Lawsuit Over Google’s Book-Scanning Project,” Chronicle of Higher Education, October 4, 2012, https://chronicle.com/article/Publishers-Settle-Long-Running/134854; “Google Books Settlement and Privacy,” EPIC.org, October 30, 2016, https://epic.org/privacy/googlebooks; Juan Carlos Perez, “Google Books Settlement Proposal Rejected,” PCWorld, March 22, 2011, http://www.pcworld.com/article/222882/article.html; Eliot Van Buskirk, “Justice Dept. to Google Books: Close, but No Cigar,” Wired, February 5, 2010, http://www.wired.com/2010/02/justice-dept-to-google-books-close-but-no-cigar; Miguel Helft, “Opposition to Google Books Settlement Jells,” New York Times—Bits Blog, April 17, 2009, https://bits.blogs.nytimes.com/2009/04/17/opposition-to-google-books-settlement; Brandon Butler, “The Google Books Settlement: Who Is Filing and What Are They Saying?”

pages: 936 words: 85,745

Programming Ruby 1.9: The Pragmatic Programmer's Guide
by Dave Thomas , Chad Fowler and Andy Hunt
Published 15 Dec 2000

Let’s give ourselves a simple problem to solve. Let’s say that we’re running a secondhand bookstore. Every week, we do stock control. A gang of clerks uses portable bar-code scanners to record every book on our shelves. Each scanner generates a simple comma-separated value (CSV) file containing one row for each book scanned. The row contains (among other things) the book’s ISBN and price. An extract from one of these files looks something like this:

"Date","ISBN","Amount"
"2008-04-12","978-1-9343561-0-4",39.45
"2008-04-13","978-1-9343561-6-6",45.67
"2008-04-14","978-1-9343560-7-4",36.95

Our job is to take all the CSV files and work out how many of each title we have, as well as the total list price of the books in stock.
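The task described in this excerpt can be sketched in a few lines of Ruby. This is only an illustration, not the book's own solution: the helper name `summarize_stock` and the `SAMPLE` constant are hypothetical, and the sketch assumes the scanner data is already in a string (the sample rows are the ones quoted in the excerpt) rather than spread across several files.

```ruby
require "csv"

# Sample rows copied from the extract quoted in the excerpt.
SAMPLE = <<~DATA
  "Date","ISBN","Amount"
  "2008-04-12","978-1-9343561-0-4",39.45
  "2008-04-13","978-1-9343561-6-6",45.67
  "2008-04-14","978-1-9343560-7-4",36.95
DATA

# Hypothetical helper: counts copies per ISBN and sums the list prices.
# Returns a two-element array: [counts_by_isbn, total_price].
def summarize_stock(csv_text)
  counts = Hash.new(0) # a missing ISBN starts at zero copies
  total = 0.0
  # headers: true makes each row addressable by column name.
  CSV.parse(csv_text, headers: true) do |row|
    counts[row["ISBN"]] += 1
    total += Float(row["Amount"])
  end
  [counts, total]
end

counts, total = summarize_stock(SAMPLE)
puts "#{counts.size} distinct titles, total list price #{format('%.2f', total)}"
```

To handle one file per scanner, the same helper could be called on the contents of each file and the results merged. Floating-point addition is adequate for a sketch, but real stock accounting would more likely keep prices in integer cents or use `BigDecimal`.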

pages: 1,205 words: 308,891

Bourgeois Dignity: Why Economics Can't Explain the Modern World
by Deirdre N. McCloskey
Published 15 Nov 2011

In Massinger’s A New Way to Pay Old Debts (mid-1620s) everyone, high and low, speaks in blank verse. 7. Thus: “For he today that sheds his blood with me,” iambic pentameter. 8. Magnusson 1999, p. 120: the lower orders “lack the mastery to assimilate the prestige forms successfully to their actual performance.” 9. Google Books scan of the reprinted 1698 edition, p. 117. The first public edition had been 1664, well after Mun’s death. Bizarrely, this famous remark (and “One man’s necessity becomes another man’s opportunity,” p. 116) is in aid of showing that expenditure on a suit at law is a good thing, because at least the money “is still in the kingdom,” and so foreign trade is unaffected, and so all is well in the crucial matter of acquiring bullion from abroad.