Computational Linguistics

76 results

pages: 502 words: 107,510

Natural Language Annotation for Machine Learning
by James Pustejovsky and Amber Stubbs
Published 14 Oct 2012

Organizations and Conferences

Much of the work on annotation that is available to the public is being done at universities, making conference proceedings the best place to start looking for information about tasks that might be related to your own. Here is a list of some of the bigger conferences that examine annotation and corpora, as well as some organizations that are interested in the same topic:

• Association for Computational Linguistics (ACL)
• Institute of Electrical and Electronics Engineers (IEEE)
• Language Resources and Evaluation Conference (LREC)
• European Language Resources Association (ELRA)
• Conference on Computational Linguistics (COLING)
• American Medical Informatics Association (AMIA)

The LINGUIST List is not an organization that sponsors conferences and workshops itself, but it does keep an excellent up-to-date list of calls for papers and dates of upcoming conferences.

Some workshops that you may want to look into are:

• SemEval: This is a workshop held every three years as part of the Association for Computational Linguistics. It involves a variety of challenges including word sense disambiguation, temporal and spatial reasoning, and machine translation.
• Conference on Natural Language Learning (CoNLL) Shared Task: This is a yearly NLP challenge held as part of the Special Interest Group on Natural Language Learning of the Association for Computational Linguistics. Each year a new NLP task is chosen for the challenge. Past challenges include uncertainty detection, extracting syntactic and semantic dependencies, and multilingual processing.

In this chapter our goal was to show you the role that annotation is playing in cutting-edge developments in computational linguistics and machine learning. We pointed out how the different components of the MATTER development cycle are being experimented with and improved, including new ways to collect annotations, better methods for training algorithms, ways of leveraging cloud computing and distributed data, and stronger communities for resource sharing and collaboration. Because of a lack of understanding of the role that annotation plays in the development of computational linguistics systems, there is always some discussion that the role of annotation is outdated; that with enough data, accurate clusters can be found without the need for human-vetted categories or other labels.

pages: 174 words: 56,405

Machine Translation
by Thierry Poibeau
Published 14 Sep 2017

In Proceedings of the Twelfth Conference on Computational Linguistics, Vol. 1, 71–76. Association for Computational Linguistics, Stroudsburg, PA. http://dx.doi.org/10.3115/991635.991651/. Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin (1990). “A statistical approach to machine translation.” Computational Linguistics 16 (2): 79–85. Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer (1993). “The mathematics of statistical machine translation: Parameter estimation.” Computational Linguistics 19 (2): 263–311.

“The state of the art of automatic language translation: an appraisal” In Beiträge zur Sprachkunde und Informationsverarbeitung, n°2, 17–29. William A. Gale and Kenneth W. Church (1993). “A program for aligning sentences in bilingual corpora.” Journal of Computational Linguistics 19 (1): 75–102. Martin Kay and Martin Röscheisen (1993). “Text-translation alignment.” Journal of Computational Linguistics 19 (1): 121–142. Makoto Nagao (1984). “A framework of a mechanical translation between Japanese and English by analogy principle.” In Artificial and Human Intelligence (A. Elithorn and R. Banerji, eds.). Elsevier Science Publishers, Amsterdam.

Fortieth Annual Meeting of the Association for Computational Linguistics, 311–318. Philadelphia. George Doddington (2002). “Automatic evaluation of machine translation quality using n-gram cooccurrence statistics.” Proceedings of the Human Language Technology Conference, 128–132. San Diego. Satanjeev Banerjee and Alon Lavie (2005). “METEOR: An automatic metric for MT evaluation with improved correlation with human judgments.” Proceedings of Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization at the Forty-Third Annual Meeting of the Association of Computational Linguistics. Ann Arbor, MI. Martin Kay (2013).

pages: 504 words: 89,238

Natural Language Processing with Python
by Steven Bird, Ewan Klein and Edward Loper
Published 15 Dec 2009

Journal of Natural Language Engineering, 1:29–81, 1995. [Artstein and Poesio, 2008] Ron Artstein and Massimo Poesio. Inter-coder agreement for computational linguistics. Computational Linguistics, pages 555–596, 2008. [Baayen, 2008] Harald Baayen. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press, 2008. [Bachenko and Fitzpatrick, 1990] J. Bachenko and E. Fitzpatrick. A computational grammar of discourse-neutral prosodic phrasing in English. Computational Linguistics, 16:155–170, 1990. [Baldwin & Kim, 2010] Timothy Baldwin and Su Nam Kim. Multiword Expressions. In Nitin Indurkhya and Fred J.

For more information on the topics covered in Section 1.5, and on NLP more generally, you might like to consult one of the following excellent books:

• Indurkhya, Nitin and Fred Damerau (eds., 2010) Handbook of Natural Language Processing (second edition), Chapman & Hall/CRC.
• Jurafsky, Daniel and James Martin (2008) Speech and Language Processing (second edition), Prentice Hall.
• Mitkov, Ruslan (ed., 2002) The Oxford Handbook of Computational Linguistics. Oxford University Press. (second edition expected in 2010).

The Association for Computational Linguistics is the international organization that represents the field of NLP. The ACL website hosts many useful resources, including: information about international and regional conferences and workshops; the ACL Wiki with links to hundreds of useful resources; and the ACL Anthology, which contains most of the NLP research literature from the past 50 years, fully indexed and freely downloadable.

Some excellent introductory linguistics textbooks are: (Finegan, 2007), (O’Grady et al., 2004), (OSU, 2007).

[Earley, 1970] Jay Earley. An efficient context-free parsing algorithm. Communications of the Association for Computing Machinery, 13:94–102, 1970. [Emele and Zajac, 1990] Martin C. Emele and Rémi Zajac. Typed unification grammars. In Proceedings of the 13th Conference on Computational Linguistics, pages 293–298. Association for Computational Linguistics, Morristown, NJ, 1990. [Farghaly, 2003] Ali Farghaly, editor. Handbook for Language Engineers. CSLI Publications, Stanford, CA, 2003. [Feldman and Sanger, 2007] Ronen Feldman and James Sanger. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data.

pages: 268 words: 109,447

The Cultural Logic of Computation
by David Golumbia
Published 31 Mar 2009

By focusing on written exemplars, CL and NLP have pursued a program that has much in common with the “Strong AI” programs of the 1960s and 1970s that Hubert Dreyfus (1992), John Haugeland (1985), John Searle (1984, 1992), and others have so effectively critiqued. This program has two distinct aspects, which, although they are joined intellectually, are often pursued with apparent independence from each other; yet at the same time, the mere presence of the phrase “computational linguistics” in a title is often not at all enough to distinguish which program the researcher has in mind.

SHRDLU and the State of the Art in Computational Linguistics

The two faces of CL and NLP in its strong mode are (1) to make computers use language in a fully human fashion, generally via conversational agents that can interact “in the same way as a human being” with human or other language-using interlocutors; and (2) to demonstrate that human language is itself a computational system, and therefore can be made algorithmically tractable for computation.

Despite the repetition of such claims throughout the history of computers, these computationalists presume that CL researchers simply have not experimented with obvious strategies for comprehension and translation, or that insufficient processing power has been brought to bear on the problem. While the presence of professional computational linguists in industry today means that companies who focus on language products make such claims less often than in the past, they can still be found with unexpected frequency outside professional CL circles. Yet the views of professional computational linguists have a surprisingly restricted sphere of influence. They are rarely referenced even by the computer scientists who are working to define future versions of the World Wide Web, especially in the recent project to get the web to “understand” meanings called the Semantic Web.

This is what early computer scientists, especially those unfamiliar with the history of 20th-century logical work that produced the computer, believed computers might do. The “device” of which Chomsky speaks so often in his early work can be most accurately understood as a computer program, as no less prominent a computational linguist than John Goldsmith (2004) has recently suggested. In the 1950s, the main U.S. centers for the study of automated communications and information systems were established in Cambridge, Massachusetts (especially the Psycho-Acoustic and Electro-Acoustic Laboratories, PAL and EAL respectively).

pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values
by Brian Christian
Published 5 Oct 2020

It was understood that, in principle, a big-enough neural network, with enough training examples and time, can learn almost anything.16 But no one had fast-enough computers, enough data to train on, or enough patience to make good on that theoretical potential. Many lost interest, and the field of computer vision, along with computational linguistics, largely moved on to other things. As Hinton would later summarize, “Our labeled datasets were thousands of times too small. [And] our computers were millions of times too slow.”17 Both of these things, however, would change. With the growth of the web, if you wanted not fifty but five hundred thousand “flash cards” for your network, suddenly you had a seemingly bottomless repository of images.

Now imagine you’re a computer with a complete lack of such common sense—let alone the ability to put yourself in the shoes of a prospective treasure burier—but what you do have is an extremely large sample (a “corpus”) of real-world texts to scan for patterns. How good a job could you do at predicting the missing word purely based on the statistics of the language itself? Constructing these kinds of predictive models has long been a grail for computational linguists.54 (Indeed, Claude Shannon founded information theory in the 1940s on a mathematical analysis of this very sort, noticing that some missing words are more predictable than others, and attempting to quantify by how much.55) Early methods involved what are known as “n-grams,” which meant simply counting up every single chain of, say, two words in a row that appeared in a particular corpus—“appeared in,” “in a,” “a particular,” “particular corpus”—and tallying them in a huge database.56 Then it was simple enough, given a missing word, to look at the preceding word and find which n-gram in the database beginning with that preceding word had appeared most often.
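The bigram counting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular system: the function names `build_bigram_counts` and `predict_next` and the toy corpus are hypothetical, chosen only for this example.

```python
from collections import Counter, defaultdict


def build_bigram_counts(tokens):
    """Tally how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the word seen most often after `prev`, or None if `prev` has no followers."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, already lowercased and split on whitespace.
corpus = "the cat sat on the mat and the cat slept".split()
bigrams = build_bigram_counts(corpus)
print(predict_next(bigrams, "the"))
```

In this toy corpus "the" is followed by "cat" twice and by "mat" once, so the lookup favors "cat". A real system would tally counts over a far larger corpus and would need some strategy for preceding words it has never seen.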

When our model guesses wrong, we’ll adjust the coordinates of our word representations to slightly nudge the correct word toward the context words in our mathematical space and slightly nudge any incorrect guesses away. After we make this tiny adjustment, we’ll pick another phrase at random and go through this process again. And again. And again. And again. And again.62 “At this point,” explains Stanford computational linguist Christopher Manning, “sort of a miracle occurs.” In his words: It’s sort of surprising—but true—that you can do no more than set up this kind of prediction objective, make it the job of every word’s word vectors to be such that they’re good at predicting the words that appear in their context or vice-versa—you just have that very simple goal—and you say nothing else about how this is going to be achieved—but you just pray and depend on the magic of deep learning. . . .
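The nudging step described before the quotation can be pictured geometrically. The sketch below is a deliberately simplified illustration, not the gradient update used by real word-embedding models: the helper names `nudge` and `squared_distance`, the two-dimensional vectors, and the step size are all assumptions made for the example.

```python
def nudge(vec, target, rate):
    """Move `vec` a fraction `rate` of the way toward `target` (away if rate is negative)."""
    return [v + rate * (t - v) for v, t in zip(vec, target)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical 2-D coordinates; real models use hundreds of dimensions.
context = [1.0, 0.0]      # stands in for the context words
correct = [0.0, 1.0]      # the word that actually appeared
wrong_guess = [0.8, 0.2]  # the word the model guessed instead

step = 0.1  # assumed small adjustment size
new_correct = nudge(correct, context, step)    # pulled slightly toward the context
new_wrong = nudge(wrong_guess, context, -step) # pushed slightly away from the context

print(squared_distance(correct, context), squared_distance(new_correct, context))
print(squared_distance(wrong_guess, context), squared_distance(new_wrong, context))
```

Each call changes the coordinates only slightly; the effect the passage describes comes from repeating such small adjustments over a very large number of randomly chosen phrases.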

pages: 370 words: 94,968

The Most Human Human: What Talking With Computers Teaches Us About What It Means to Be Alive
by Brian Christian
Published 1 Mar 2011

Early approaches were about building huge “dictionaries” of word-to-word pairings, based on meaning, and algorithms for turning one syntax and grammar into another (e.g., if going to Spanish from English, move the adjectives that come before a noun so that they come after it). To get a little more of the story, I spoke on the phone with computational linguist Roger Levy of UCSD. Related to the problem of translation is the problem of paraphrase. “Frankly,” he says, “as a computational linguist, I can’t imagine trying to write a program to pass the Turing test. Something I might do as a confederate is to take a sentence, a relatively complex sentence, and say, ‘You said this. You could also express the meaning with this, this, this, and this.’

Some judges, I would discover, would be startled or confused at this jumping of the gun, and I saw them pause, hesitate, yield, even start backspacing what they had half written. Other judges cottoned on immediately, and leaped right in after.4 In the first round of the 2009 contest, judge Shalom Lappin—computational linguist at King’s College London—spoke with Cleverbot, and then myself. My strategy of verbosity was clearly in evidence: I made 1,089 keystrokes in five minutes (3.6 keystrokes a second) to Cleverbot’s 356 (1.2/sec), and Lappin made 548 keystrokes (1.8/sec) in my conversation, compared to 397 (1.3/sec) with Cleverbot.

They are, in other words, insensitive.

Deformation as Mastery

In his famous 1946 essay “Politics and the English Language,” George Orwell says that any speaker repeating “familiar phrases” has “gone some distance towards turning himself into a machine.” The Turing test would seem to corroborate that. UCSD’s computational linguist Roger Levy: “Programs have gotten relatively good at what is actually said. We can devise complex new expressions, if we intend new meanings, and we can understand those new meanings. This strikes me as a great way to break the Turing test [programs] and a great way to distinguish yourself as a human.

pages: 239 words: 64,812

Geek Sublime: The Beauty of Code, the Code of Beauty
by Vikram Chandra
Published 7 Nov 2013

But in one of the villages, “people learnt to speak the language with much hope and now wait in vain for the gains that were to follow.”3 The Congress government that followed dropped the project, so the villagers’ ambitions of being appointed Sanskrit teachers for other villages remain frustrated. The Special Centre for Sanskrit Studies at the Jawaharlal Nehru University in New Delhi has more explicit aims: Sanskrit computational linguistics, Sanskrit informatics, Sanskrit computing, Sanskrit language processing. There has also been an effort over the past two decades to reintroduce the Indian scholastic tradition into humanities departments, and students have responded with enthusiasm. Controversies have flared over some of the more clumsy attempts by academic nationalists to proclaim—by fiat—the continuing relevance and accuracy of “Vedic astrological science” and similar subjects.

Jha, Ganganatha, and Mammaṭācārya. The Kāvyapṛakāsha of Mammaṭa. Varanasi: Bharatiya Vidya Prakashan, 1995. Jong89 [pseud.]. “Razorlength—1036 Early Winter by Jong89.” Dwarf Fortress Map Archive, 2009. http://mkv25.net/dfma/poi-22127-dwarvencomputer. Joshi, S. D. “Background of the Aṣṭādhyāyī.” In Sanskrit Computational Linguistics, 1–5. Springer, 2009. http://link.springer.com/chapter/10.1007/978-3-540-93885-9_1. Kapoor, Kapil. Dimensions of Pāṇini Grammar: The Indian Grammatical System. New Delhi: D.K. Printworld, 2005. ______. Text and Interpretation: The Indian Tradition. New Delhi: D.K. Printworld, 2005. Kapoor, Kapil and Nalini M.

Khan, Taslima. “40% of Startups in Silicon Valley Are Headed by India-Based Entrepreneurs.” Business Today, March 21, 2013. http://businesstoday.intoday.in/story/google-executive-chairman-eric-schmidt-on-india/1/193496.html. Kiparsky, Paul. “On the Architecture of Pāṇini’s Grammar.” In Sanskrit Computational Linguistics, 33–94. Springer, 2009. http://link.springer.com/chapter/10.1007/978-3-642-00155-0_2. ______. “Paninian Linguistics.” The Encyclopedia of Language and Linguistics 6 (1995): 59–65. Knuth, Donald E. “All Questions Answered.” Notices of the AMS 49, no. 3 (2002): 318–24. ______. “Literate Programming.”

pages: 350 words: 98,077

Artificial Intelligence: A Guide for Thinking Humans
by Melanie Mitchell
Published 14 Oct 2019

Chen et al., “Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol. 1, Long Papers (2018), 2587–97.
28. N. Carlini and D. Wagner, “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text,” in Proceedings of the First Deep Learning and Security Workshop (2018).
29. R. Jia and P. Liang, “Adversarial Examples for Evaluating Reading Comprehension Systems,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017).
30. C. D. Manning, “Last Words: Computational Linguistics and Deep Learning,” Nautilus, April 2017.
14: On Understanding
1.

Figure 43: Photographs and captions from H. Chen et al., “Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol. 1, Long Papers (2018), 2587–97. Reprinted with permission of Hongge Chen and the Association for Computational Linguistics.
Figure 44: Dorothy Alexander / Alamy Stock Photo.
Figure 45: From www.foundalis.com/res/bps/bpidx.htm. Original images are from M. Bongard, Pattern Recognition (New York: Spartan Books, 1970).
Figure 46: From www.foundalis.com/res/bps/bpidx.htm.

Packer, “Understanding the Language of Facebook,” EmTech Digital video lecture, May 23, 2016, events.technologyreview.com/video/watch/alan-packer-understanding-language.
15. DeepL Pro, press release, March 20, 2018, www.deepl.com/press.html.
16. K. Papineni et al., “BLEU: A Method for Automatic Evaluation of Machine Translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (2002), 311–18.
17. Wu et al., “Google’s Neural Machine Translation System”; H. Hassan et al., “Achieving Human Parity on Automatic Chinese to English News Translation,” arXiv:1803.05567 (2018).
18. Google Translate’s French translation of the “Restaurant” story: Un homme est entré dans un restaurant et a commandé un hamburger, cuit rare.

pages: 407 words: 104,622

The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution
by Gregory Zuckerman
Published 5 Nov 2019

Later, Brown realized probabilistic mathematical models also could be used for translation. Using data that included thousands of pages of Canadian parliamentary proceedings featuring paired passages in French and English, the IBM team made headway toward translating text between languages. Their advances partly laid the groundwork for a revolution in computational linguistics and speech processing, playing a role in future speech-recognition advances, such as Amazon’s Alexa, Apple’s Siri, Google Translate, text-to-speech synthesizers, and more. Despite that progress, the researchers were frustrated by IBM’s lack of a clear plan to let the group commercialize its advances.

For whatever reason, she thought Penn was a Jewish university, so she left that one untouched. Magerman thrived at Penn, partly because he had embraced a new cause—proving the other schools had made a mistake turning him down. He excelled in his majors, computer science and mathematics. Chosen to be a teaching assistant in a computational-linguistics course, he lapped up the resulting attention and respect of his fellow students, especially the coeds. His senior-year thesis also gained some recognition. Magerman, an adorable, if insecure, teddy bear of a kid, was finally in his element. At Stanford University, Magerman’s doctoral thesis tackled the exact topic Brown, Mercer, and other IBM researchers were struggling with: how computers could analyze and translate language using statistics and probability.

The team improved its predictive algorithms by developing a rather simple measure of how many times a company was mentioned in a news feed—no matter if the mentions were positive, negative, or even pure rumors. It became clear to Mercer and others that trading stocks bore similarities to speech recognition, which was part of why Renaissance continued to raid IBM’s computational linguistics team. In both endeavors, the goal was to create a model capable of digesting uncertain jumbles of information and generating reliable guesses about what might come next—while ignoring traditionalists who employed analysis that wasn’t nearly as data driven. As more trading became electronic, with human market-makers and middlemen elbowed out of the business, Medallion spread its moves among an expanding number of electronic networks, making it easier and more efficient to buy and sell.

pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
by Eric Siegel
Published 19 Feb 2013

Ask IBM researchers whether their question answering Watson system is anything like HAL, which goes famously rogue in the film, and they’ll quickly reroute your comparison toward the obedient computers of Star Trek. The field of research that develops technology to work with human language is natural language processing (NLP, aka computational linguistics). In commercial application, it’s known as text analytics. These fields develop analytical methods especially designed to operate across the written word. If data is all Earth’s water, textual data is the part known as “the ocean.” Often said to compose 80 percent of all data, it’s everything we the human race know that we’ve bothered to write down.

In the field of medicine, most clinical studies do this same thing: compare two treatments and see which tends to work better overall. For knee surgery after a ski accident, I had to select a graft source from which to reconstruct my busted anterior cruciate ligament (ACL, the knee’s central ligament, previously known to me as the Association for Computational Linguistics). I based my decision on a study that showed subsequent knee-walking was rated “difficult or impossible” by twice as many patients who donated their own patellar tissue rather than hamstring tissue.3 It’s good but it’s not personalized. I can never know if my choice for knee surgery was the best for my particular case (although my knee does seem great now).

Regarding “Time flies like an arrow”: Gilbert Burck, The Computer Age and Its Potential for Management (Harper & Row, 1965). My (the author’s) PhD research pertained to the “have a car/baby” example (temporal meaning in verbs): Eric V. Siegel and Kathleen R. McKeown, “Learning Methods to Combine Linguistic Indicators: Improving Aspectual Classification and Revealing Linguistic Insights,” Computational Linguistics 26, issue 4 (December 2000). doi:10.1162/089120100750105957, http://dl.acm.org/citation.cfm?id=971886. Googling only 30 percent of the Jeopardy! questions right: Stephen Baker, Final Jeopardy: Man vs. Machine and the Quest to Know Everything (Houghton Mifflin Harcourt, 2011), 212–224.

pages: 2,466 words: 668,761

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

Work on applications of language processing is presented at the biennial Applied Natural Language Processing conference (ANLP), the conference on Empirical Methods in Natural Language Processing (EMNLP), and the journal Natural Language Engineering. A broad range of NLP work appears in the journal Computational Linguistics and its conference, ACL, and in the International Computational Linguistics (COLING) conference. Jurafsky and Martin (2020) give a comprehensive introduction to speech and NLP.

1 And even computer vision applications: WordNet provides the set of categories used by ImageNet.
2 Sometimes the authors are credited in the order CKY.
3 The subjective case is also sometimes called the nominative case and the objective case is sometimes called the accusative case.

B.3 Online Supplemental Material

The book has a Web site with supplemental material, instructions for sending suggestions, and opportunities for joining discussion lists:

• aima.cs.berkeley.edu

The algorithms in the book, and multiple additional programming exercises, have been implemented in Python and Java (and some in other languages) at the online code repository, accessible from the Web site and currently hosted at:

• github.com/aimacode

Bibliography

The following abbreviations are used for frequently cited conferences and journals:

AAAI: Proceedings of the AAAI Conference on Artificial Intelligence
AAMAS: Proceedings of the International Conference on Autonomous Agents and Multi-agent Systems
ACL: Proceedings of the Annual Meeting of the Association for Computational Linguistics
AIJ: Artificial Intelligence (Journal)
AIMag: AI Magazine
AIPS: Proceedings of the International Conference on AI Planning Systems
AISTATS: Proceedings of the International Conference on Artificial Intelligence and Statistics
BBS: Behavioral and Brain Sciences
CACM: Communications of the Association for Computing Machinery
COGSCI: Proceedings of the Annual Conference of the Cognitive Science Society
COLING: Proceedings of the International Conference on Computational Linguistics
COLT: Proceedings of the Annual ACM Workshop on Computational Learning Theory
CP: Proceedings of the International Conference on Principles and Practice of Constraint Programming
CVPR: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
EC: Proceedings of the ACM Conference on Electronic Commerce
ECAI: Proceedings of the European Conference on Artificial Intelligence
ECCV: Proceedings of the European Conference on Computer Vision
ECML: Proceedings of the European Conference on Machine Learning
ECP: Proceedings of the European Conference on Planning
EMNLP: Proceedings of the Conference on Empirical Methods in Natural Language Processing
FGCS: Proceedings of the International Conference on Fifth Generation Computer Systems
FOCS: Proceedings of the Annual Symposium on Foundations of Computer Science
GECCO: Proceedings of the Genetics and Evolutionary Computing Conference
HRI: Proceedings of the International Conference on Human-Robot Interaction
ICAPS: Proceedings of the International Conference on Automated Planning and Scheduling
ICASSP: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing
ICCV: Proceedings of the International Conference on Computer Vision
ICLP: Proceedings of the International Conference on Logic Programming
ICLR: Proceedings of the International Conference on Learning Representations
ICML: Proceedings of the International Conference on Machine Learning
ICPR: Proceedings of the International Conference on Pattern Recognition
ICRA: Proceedings of the IEEE International Conference on Robotics and Automation
ICSLP: Proceedings of the International Conference on Speech and Language Processing
IJAR: International Journal of Approximate Reasoning
IJCAI: Proceedings of the International Joint Conference on Artificial Intelligence
IJCNN: Proceedings of the International Joint Conference on Neural Networks
IJCV: International Journal of Computer Vision
ILP: Proceedings of the International Workshop on Inductive Logic Programming
IROS: Proceedings of the International Conference on Intelligent Robots and Systems
ISMIS: Proceedings of the International Symposium on Methodologies for Intelligent Systems
ISRR: Proceedings of the International Symposium on Robotics Research
JACM: Journal of the Association for Computing Machinery
JAIR: Journal of Artificial Intelligence Research
JAR: Journal of Automated Reasoning
JASA: Journal of the American Statistical Association
JMLR: Journal of Machine Learning Research
JSL: Journal of Symbolic Logic
KDD: Proceedings of the International Conference on Knowledge Discovery and Data Mining
KR: Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning
LICS: Proceedings of the IEEE Symposium on Logic in Computer Science
NeurIPS: Advances in Neural Information Processing Systems
PAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
PNAS: Proceedings of the National Academy of Sciences of the United States of America
PODS: Proceedings of the ACM International Symposium on Principles of Database Systems
RSS: Proceedings of the Conference on Robotics: Science and Systems
SIGIR: Proceedings of the Special Interest Group on Information Retrieval
SIGMOD: Proceedings of the ACM SIGMOD International Conference on Management of Data
SODA: Proceedings of the Annual ACM–SIAM Symposium on Discrete Algorithms
STOC: Proceedings of the Annual ACM Symposium on Theory of Computing
TARK: Proceedings of the Conference on Theoretical Aspects of Reasoning about Knowledge
UAI: Proceedings of the Conference on Uncertainty in Artificial Intelligence

Aaronson, S. (2014).

The design of a high-performance cache controller: A case study in asynchronous synthesis. Integration: The VLSI Journal, 15, 241–262. Och, F. J. and Ney, H. (2003). A systematic comparison of various statistical alignment models. Computational Linguistics, 29, 19–51. Och, F. J. and Ney, H. (2004). The alignment template approach to statistical machine translation. Computational Linguistics, 30, 417–449. Och, F. J. and Ney, H. (2002). Discriminative training and maximum entropy models for statistical machine translation. In COLING–02. Ogawa, S., Lee, T.-M., Kay, A. R., and Tank, D. W. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation.

pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines
by Thomas H. Davenport and Julia Kirby
Published 23 May 2016

It views this capability as a “cognitive operating system” that would function like Windows for a variety of cognitive apps. All of these apps employ machine learning to improve the quality of the results over time. Other systems that handle text take a “computational linguistics” approach, and focus on understanding the underlying grammatical structures of sentences and paragraphs. RAGE Frameworks, a company that also has tools for rapid development of a wide variety of computer applications, has tools that use computational linguistics to understand, for example, a wide range of information about companies and their operational and financial performance. The objective is to churn through documents on companies, identify the key statements in them, and diagnose the implications for investors or analysts.

See also artificial intelligence abilities of, 34–35 as a big-picture perspective, 100 cognitive cloud, 45 content analysis, 20 context awareness and learning, 52–54 creating new, 176–200 education in, 230–37 future of human work and, 250–51 Great Convergence, 35–36, 50 higher learning for machines, 41–52 how smart are smart machines, 33–36 image recognition, 34, 46–47, 50, 54, 57 intelligent personal assistants, 167 language recognition, 34, 37, 39–40, 43, 44–46, 50, 53, 56, 178, 212 Lawton and, 182 newer ways to support humans, 39–41 self-awareness and, 54–57 Shiller’s warning, 7 steady advance of, 36–37 Types of Cognitive Technology and Their Sophistication (Figure 3.1), 34 weaponry and, 243–44, 248, 250 as wheels for the mind, 63–65 where humans fit in, 57–58 Colton, Simon, 125 Colvin, Geoff, 127, 244–45 complex communication, 27, 28, 63 computational linguistics, 45–46 computers, 165. See also artificial intelligence; cognitive technologies deep learning, 39, 47, 50, 56, 165, 178, 181, 237 Engelbart’s point and click interface, 64 Jobs’s “bicycle for our minds,” 63–64 neural networks, 37, 39, 57, 165, 178 Wiener and, 64 Co-operative Bank, 156–57 Coursera, 178 Cowan, Alister, 205 Cowen, Tyler, 74 creativity, 24, 68, 115, 129, 164–65 augmentation and, 122, 123 computational creativity, 125–28, 129 “design thinking” skills, 120 Lascaux II and, 127–28 “low self-monitors” and, 171 in organizations, 170–71 Credit Suisse, 186 crowdfunding, 247 CSC, 83 Csikszentmihalyi, Mihaly, 164–65 Cunningham, Merce, 123 customer service, 48, 74, 148, 195 Daley, Andrew, 100–102, 195 Dalio, Ray, 92–93 Darwin, Charles, 115 DataXu, 179, 180, 193, 195, 197–98 Davidow, Bill, 6–7 Day, Dr.

pages: 222 words: 53,317

Overcomplicated: Technology at the Limits of Comprehension
by Samuel Arbesman
Published 18 Jul 2016

To avoid losing our exceptions and edge cases, we need models that can handle the complexity of these exceptions and details. As Peter Norvig, Google’s director of research, put it, “What constitutes a language is not an eternal ideal form, represented by the settings of a small number of parameters, but rather is the contingent outcome of complex processes.” So, computational linguists incorporate edge cases and try to build a robust and rich technological model of a complex system—in this case, language. What do they end up with? A complex technological system. For a clear example of the necessary complexity of a machine model for language, we need only look at how computers are used to translate one language into another.

abstraction, 163 biological thinking’s avoidance of, 115–16 in complexity science, 133, 135 in physics thinking, 115–16, 121–22, 128 specialization and, 24, 26–27 technological complexity and, 23–28, 81, 121–22 accretion, 65 in complex systems, 36–43, 51, 62, 65, 191 in genomes, 156 in infrastructure, 42, 100–101 legacy systems and, 39–42 in legal system, 40–41, 46 in software, 37–38, 41–42, 44 in technological complexity, 130–31 unexpected behavior and, 38 aesthetics: biological thinking and, 119 and physics thinking, 113, 114 aggregation, diffusion-limited, 134–35 algorithm aversion, 5 Amazon, 5 American Philosophical Society, 90 Anaximander of Miletus, 139 Apple, 161, 163 Apple II computer, 77 applied mathematics, 143 arche, 140 Ariane 5 rocket, 1996 explosion of, 11–12 Aristotle, 151 Ascher, Kate, 100 Asimov, Isaac, 124 atomic nucleus, discovery of, 124, 141 Audubon, John James, 109 autocorrect, 5, 16 automobiles: self-driving, 91, 231–32 software in, 10–11, 13, 45, 65, 100, 174 see also Toyota automobiles Autonomous Technology (Winner), 22 Average Is Over (Cowen), 84 awe, as response to technological complexity, 6, 7, 154–55, 156, 165, 174 bacteria, 124–25 Balkin, Jack, 60–61 Ball, Philip, 12, 87–88, 136, 140 Barr, Michael, 10 Barrow, Isaac, 89 BASIC, 44–45 Bayonne Bridge, 46 Beacock, Ian, 12–13 Benner, Steven, 119 “Big Ball of Mud” (Foote and Yoder), 201 binary searches, 104–5 biological systems, 7 accretion in, 130–31 complexity of, 116–20, 122 digital technology and, 49 kluges in, 119 legacy code in, 118, 119–20 modules in, 63 tinkering in, 118 unexpected behavior in, 109–10, 123–24 biological thinking, 222 abstraction avoided in, 115–16 aesthetics and, 119 as comfortable with diversity and complexity, 113–14, 115 concept of miscellaneous in, 108–9, 140–41, 143 as detail oriented, 121, 122, 128 generalization in, 131–32 humility and, 155 physics thinking vs., 114–16, 137–38, 142–43, 222 technological complexity and, 116–49, 158, 174 Blum, Andrew, 101–2 
Boeing 777, 99 Bogost, Ian, 154 Bookout, Jean, 10 Boorstin, Daniel, 89 Borges, Jorge Luis, 76–77, 131 Boston, Mass., 101, 102 branch points, 80–81 Brand, Stewart, 39–40, 126, 198–99 Brookline, Mass., 101 Brooks, David, 155 Brooks, Frederick P., Jr., 38, 59, 93 bugs, in software, see software bugs bureaucracies, growth of, 41 cabinets of curiosities (wunderkammers), 87–88, 140 calendar application, programming of, 51–53 Cambridge, Mass., 101 cancer, 126 Carew, Diana, 46 catastrophes, interactions in, 126 Challenger disaster, 9, 11, 12, 192 Chandra, Vikram, 77 Chaos Monkey, 107, 126 Chekhov, Anton, 129 Chekhov’s Gun, 129 chess, 84 Chiang, Ted, 230 clickstream, 141–42 Clock of the Long Now, The (Brand), 39–40 clouds, 147 Code of Federal Regulations, 41 cognitive processing: of language, 73–74 limitations on, 75–76, 210 nonlinear systems and, 78–79 outliers in, 76–77 working memory and, 74 see also comprehension, human collaboration, specialization and, 91–92 Commodore VIC-20 computer, 160–61 complexity, complex systems: acceptance of, see biological thinking accretion in, 36–43, 51, 62, 65, 191 aesthetics of, 148–49, 156–57 biological systems and, 116–17, 122 buoys as examples of, 14–15, 17 complication vs., 13–15 connectivity in, 14–15 debugging of, 103–4 edge cases in, 53–62, 65, 201, 205 feedback and, 79, 141–45 Gall on, 157–58, 227 hierarchies in, 27, 50–51 human interaction with, 163 infrastructure and, 100–101 inherent vs. 
accidental, 189 interaction in, 36, 43–51, 62, 65, 146 interconnectivity of, see interconnectivity interpreters of, 166–67, 229 kluges as inevitable in, 34–36, 62–66, 127 in legal systems, 85 and limits of human comprehension, 1–7, 13, 16–17, 66, 92–93 “losing the bubble” and, 70–71, 85 meaning of terms, 13–20 in natural world, 107–10 scientific models as means of understanding, 165–67 specialization and, 85–93 unexpected behavior in, 27, 93, 96–97, 98–99, 192 see also diversity; technological complexity complexity science, 132–38, 160 complication, complexity vs., 13–15 comprehension, human: educability of, 17–18 mystery and, 173–74 overoptimistic view of, 12–13, 152–53, 156 wonder and, 172 see also cognitive processing comprehension, human, limits of, 67, 212 complex systems and, 1–7, 13, 16–17, 66, 92–93 humility as response to, 155–56 interconnectivity and, 78–79 kluges and, 42 legal system and, 22 limitative theorems and, 175 “losing the bubble” in, 70–71, 85 Maimonides on, 152 stock market systems and, 26–27 technological complexity and, 18–29, 69–70, 80–81, 153–54, 169–70, 175–76 unexpected behavior and, 18–22, 96–97, 98 “Computational Biology” (Doyle), 222 computational linguistics, 54–57 computers, computing: complexity of, 3 evolutionary, 82–84, 213 impact on technology of, 3 see also programmers, programming; software concealed electronic complexity, 164 Congress, U.S., 34 Constitution, U.S., 33–34 construction, cost of, 48–50 Cope, David, 168–69, 229–30 corpus, in linguistics, 55–56 counting: cognitive limits on, 75 human vs. 
computer, 69–70, 97, 209 Cowen, Tyler, 84 Cryptonomicon (Stephenson), 128–29 “Crystalline Structure of Legal Thought, The” (Balkin), 60–61 Curiosity (Ball), 87–88 Dabbler badge, 144–45 dark code, 21–22 Darwin, Charles, 115, 221, 227 Daston, Lorraine, 140–41 data scientists, 143 datasets, massive, 81–82, 104–5, 143 debugging, 103–4 Deep Blue, 84 diffusion-limited aggregation (DLA), 134–35 digital mapping systems, 5, 49, 51 Dijkstra, Edsger, 3, 50–51, 155 “Divers Instances of Peculiarities of Nature, Both in Men and Brutes” (Fairfax), 111–12 diversity, 113–14, 115 see also complexity, complex systems DNA, see genomes Doyle, John, 222 Dreyfus, Hubert, 173 dwarfism, 120 Dyson, Freeman, on unity vs. diversity, 114 Dyson, George, 110 Economist, 41 edge cases, 53–62, 65, 116, 128, 141, 201, 205, 207 unexpected behavior and, 99–100 see also outliers Einstein, Albert, 114 Eisen, Michael, 61 email, evolution of, 32–33 emergence, in complex systems, 27 encryption software, bugs in, 97–98 Enlightenment, 23 Entanglement, Age of, 23–29, 71, 92, 96, 97, 165, 173, 175, 176 symptoms of, 100–102 Environmental Protection Agency, 41 evolution: aesthetics and, 119 of biological systems, 117–20, 122 of genomes, 118, 156 of technological complexity, 127, 137–38 evolutionary computation, 82–84, 213 exceptions, see edge cases; outliers Facebook, 98, 189 failure, cost of, 48–50 Fairfax, Nathanael, 111–12, 113, 140 fear, as response to technological complexity, 5, 7, 154–55, 156, 165 Federal Aviation Administration (FAA), Y2K bug and, 37 feedback, 14–15, 79, 135 Felsenstein, Lee, 21 Fermi, Enrico, 109 Feynman, Richard, 9, 11 field biologists, 122 for complex technologies, 123, 126, 127, 132 financial sector: interaction in, 126 interconnectivity of, 62, 64 see also stock market systems Firthian linguistics, 206 Flash Crash (2010), 25 Fleming, Alexander, 124 Flood, Mark, 61, 85 Foote, Brian, 201 Fortran, 39 fractals, 60, 61, 136 Frederick the Great, king of Prussia, 89 fruit flies, 109–10 
“Funes the Memorious” (Borges), 76–77, 131 Galaga, bug in, 95–96, 97, 216–17 Gall, John, 157–58, 167, 227 game theory, 210 garden path sentences, 74–75 generalists, 93 combination of physics and biological thinking in, 142–43, 146 education of, 144, 145 explosion of knowledge and, 142–49 specialists and, 146 as T-shaped individuals, 143–44, 146 see also Renaissance man generalization, in biological thinking, 131–32 genomes, 109, 128 accretion in, 156 evolution of, 118, 156 legacy code (junk) in, 118, 119–20, 222 mutations in, 120 RNAi and, 123–24 Gibson, William, 176 Gingold, Chaim, 162–63 Girl Scouts, 144–45 glitches, see unexpected behavior Gmail, crash of, 103 Gödel, Kurt, 175 “good enough,” 27, 42, 118, 119 Goodenough, Oliver, 61, 85 Google, 32, 59, 98, 104–5 data centers of, 81–82, 103, 189 Google Docs, 32 Google Maps, 205 Google Translate, 57 GOTO command, 44–45, 81 grammar, 54, 57–58 gravitation, Newton’s law of, 113 greeblies, 130–31 Greek philosophy, 138–40, 151 Gresham College, 89 Guide of the Perplexed, The (Maimonides), 151 Haldane, J.

pages: 321 words: 113,564

AI in Museums: Reflections, Perspectives and Applications
by Sonja Thiel and Johannes C. Bernhardt
Published 31 Dec 2023

Qurator: Innovative Technologies for Content and Data Curation. CEUR Workshop Proceedings, 2535. Available online at https://ceur-ws.org/Vol-2535/paper_17.pdf. Schaefer, Robin/Neudecker, Clemens (2020). A Two-Step Approach for Automatic OCR Post-Correction. Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 52–57. Available online at https://aclanthology.org/2020.latechclfl-1.6/. Shen, Zejiang/Zhang, Ruochen/Dell, Melissa et al. (2021). LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis. arXiv:2103.15348. https://doi.org/10.48550/arXiv.2103.15348.

Springmann, Uwe/Reul, Christian/Dipper, Stefanie et al. (2018). Ground Truth for Training OCR Engines on Historical Documents in German Fraktur and Early Modern Latin. Journal for Language Technology and Computational Linguistics 33 (1), 97–114. https://doi.org/10.21248/jlcl.33.2018.220. Suominen, Osma (2019). Annif: DIY Automated Subject Indexing Using Multiple Algorithms. LIBER Quarterly: The Journal of the Association of European Research Libraries, 29 (1), 1–25. https://doi.org/10.18352/lq.10285. Wick, Christoph/Reul, Christian/Puppe, Frank (2018).

This annotation process, which is ultimately undertaken by experts, aims to train AI to automatically replicate the same work performed by humans and follow the same logic, albeit on a much larger scale. References Finkel, Jenny Rose/Manning, Christopher D. (2009). Nested Named Entity Recognition. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Singapore, Association for Computational Linguistics, 141–50. https://doi.org/10.3115/1699510.1699529 (all URLs here accessed in August 2023). IFAR (2023). International Foundation for Art Research (IFAR) Provenance Guide. Available online at https://www.ifar.org/Provenance_Guide.pdf. Rother, Lynn/Koss, Max/Mariani, Fabio (2022). Taking Care of History: Toward a Politics of Provenance Linked Open Data in Museums.

Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data
by Dipanjan Sarkar
Published 1 Dec 2016

Natural Language Processing I’ve mentioned the term natural language processing (NLP) several times in this chapter. By now, you may have formed some idea about what NLP means. NLP is defined as a specialized field of computer science and engineering and artificial intelligence with roots in computational linguistics. It is primarily concerned with designing and building applications and systems that enable interaction between machines and natural languages evolved for use by humans. This also makes NLP related to the area of Human-Computer Interaction (HCI). NLP techniques enable computers to process and understand natural human language and utilize it further to provide useful output.

The following libraries and frameworks are some of the most popular for text analytics, and we will be utilizing several of them throughout the course of the book:
NLTK: The Natural Language Toolkit is a complete platform that contains more than 50 corpora and lexical resources. It also provides the necessary tools, interfaces, and methods to process and analyze text data.
pattern: The pattern project started out as a research project at the Computational Linguistics & Psycholinguistics research center at the University of Antwerp. It provides tools and interfaces for web mining, information retrieval, NLP, machine learning, and network analysis. The pattern.en module contains most of the utilities for text analytics.
gensim: The gensim library has a rich set of capabilities for semantic analysis, including topic modeling and similarity analysis.

Also notice that we are able to correctly predict positive sentiment for 6434 out of 7510 positive movie reviews, and negative sentiment correctly for 4080 out of 7490 negative movie reviews. Pattern Lexicon The pattern package is a complete package for NLP, text analytics, and information retrieval. We discussed it in detail in previous chapters and have also used it several times to solve several problems. This package is developed by CLiPS (Computational Linguistics & Psycholinguistics), a research center associated with the Linguistics Department of the Faculty of Arts of the University of Antwerp. It has a sentiment module associated with it, along with modules for analyzing mood and modality of a body of text. For sentiment analysis, it analyzes any body of text by decomposing it into sentences and then tokenizing it and tagging the various tokens with necessary parts of speech.
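The two counts quoted at the start of this excerpt are enough to reconstruct the rest of the confusion matrix. The following is a minimal sketch of that arithmetic; the variable names are illustrative and do not come from the book, and only the four input counts are taken from the text.

```python
# Counts quoted in the excerpt; every other number below is derived from them.
true_positive = 6434    # positive reviews predicted as positive
total_positive = 7510   # all positive reviews
true_negative = 4080    # negative reviews predicted as negative
total_negative = 7490   # all negative reviews

# Misclassified reviews follow by subtraction.
false_negative = total_positive - true_positive   # positive reviews called negative
false_positive = total_negative - true_negative   # negative reviews called positive

# Standard evaluation ratios.
positive_recall = true_positive / total_positive
negative_recall = true_negative / total_negative
accuracy = (true_positive + true_negative) / (total_positive + total_negative)
positive_precision = true_positive / (true_positive + false_positive)

print(f"false negatives: {false_negative}, false positives: {false_positive}")
print(f"recall (positive): {positive_recall:.4f}")
print(f"recall (negative): {negative_recall:.4f}")
print(f"precision (positive): {positive_precision:.4f}")
print(f"accuracy: {accuracy:.4f}")
```

Read this way, the figures suggest a model that leans toward predicting positive sentiment: it recovers most positive reviews but only a little over half of the negative ones.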

The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy
by Matthew Hindman
Published 24 Sep 2018

In Proceedings of the 23rd International Conference on the World Wide Web (pp. 283–92). ACM. Banko, M., and Brill, E. (2001). Scaling to very very large corpora for natural language disambiguation. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, FR (pp. 26–33). Association for Computational Linguistics. Barabási, A., and Albert, R. (1999). Emergence of scaling in random networks. Science, 286 (5439), 509. Barlow, J. P. (1996). A declaration of the independence of cyberspace. Bart, Y., Shankar, V., Sultan, F., and Urban, G. L. (2005). Are the drivers and role of online trust the same for all websites and consumers?

pages: 319 words: 95,854

You Are What You Speak: Grammar Grouches, Language Laws, and the Politics of Identity
by Robert Lane Greene
Published 8 Mar 2011

Descriptivists like Pullum do believe people can misuse those rules; when they do, they miscommunicate or just sound silly. But linguists imagine rules very differently than do Wallace-style grouchy grammarians, deriving them from observation of some of the billions of words spoken each day and analyses of the trillions that have been written down (and that are, today, conveniently searchable by computer). Linguists might observe native speakers undetected, record them in casual conversation, ask them what sounds grammatical to them, or observe what educated people write. If a sentence strikes the vast majority of speakers of a language as well formed, it is well formed. This doesn’t rule out variation, say, by dialect or region.

Research has shown that search engines such as Google return results comparable to traditional corpora such as, for example, newspaper databases. See “Corpus Colossal,” The Economist, January 25, 2005, and Frank Keller and Mirella Lapata, “Using the Web to Obtain Frequencies for Unseen Bigrams,” Computational Linguistics 29 (2003), pp. 459–84. 13 another borrowing is far more prominent: Searches performed through Google, using Advanced Search and selecting only pages in French, in October 2009. 14 The Ministry of Culture commissioned: Ager, Identity, Insecurity and Image, p. 154. 15 which is the “real” Norwegian: Kristin Grøntoft, “Brenner nynorsk-bok i tønne,” Dagbladet, available at www.dagbladet.no/nyheter/2005/08/17/440490.html. 16 When not inventing words: George Thomas, Linguistic Purism (London: Longmans, 1995), p. 78. 17 “I cannot help it”: Quoted by E.

pages: 346 words: 97,330

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass
by Mary L. Gray and Siddharth Suri
Published 6 May 2019

Social Science History 17, no. 1 (1993): 135–60. https://doi.org/10.2307/1171247. Foroohar, Rana. “We’re About to Live in a World of Economic Hunger Games.” Time, July 19, 2016. http://time.com/4412410/andy-stern-universal-basic-income/. Fort, Karën, Gilles Adda, and K. Bretonnel Cohen. “Amazon Mechanical Turk: Gold Mine or Coal Mine?” Computational Linguistics 37, no. 2 (2011): 413–20. Foster, John Bellamy, Robert W. McChesney, and R. Jamil Jonna. “The Global Reserve Army of Labor and the New Imperialism.” Monthly Review 63, no. 6 (2011): 1. Frahm, Jill. “The Hello Girls: Women Telephone Operators with the American Expeditionary Forces During World War I.”

Jesse Chandler, Pam Mueller, and Gabriele Paolacci, “Nonnaïveté Among Amazon Mechanical Turk Workers: Consequences and Solutions for Behavioral Researchers,” Behavior Research Methods 46, no. 1 (March 2014): 112–30, https://doi.org/10.3758/s13428-013-0365-7. 20. Stewart et al., “The Average Laboratory Samples a Population of 7,300 Amazon Mechanical Turk Workers,” Judgment and Decision Making 10, no. 5 (2015): 13; Karën Fort, Gilles Adda, and K. Bretonnel Cohen, “Amazon Mechanical Turk: Gold Mine or Coal Mine?,” Computational Linguistics 37, no. 2 (2011): 413–20. 21. Ruth Schwartz Cowan, More Work for Mother: The Ironies of Household Technology from the Open Hearth to the Microwave, 2nd ed. (New York: Basic Books, 1985). 22. Arlie Hochschild and Anne Machung, The Second Shift: Working Families and the Revolution at Home, rev. ed. (New York: Penguin, 2012); Gregg, Work’s Intimacy.

The Deep Learning Revolution (The MIT Press)
by Terrence J. Sejnowski
Published 27 Sep 2018

Deep learning has shown us that, like the neural networks of the brain itself, model neural networks are capable of “generalization” of the sort that Chomsky dismissed as “mysticism,” and that they can be trained to selectively recognize speech from many languages, to translate between languages and to generate captions for images, with perfectly good syntax. The ultimate irony is that machine learning has solved the problem of automatically parsing sentences, something that Chomsky’s “abstract theories” of syntax never accomplished despite strenuous efforts by computational linguists. When coupled with reinforcement learning, whose study in animals Skinner pioneered, complex problems can be solved that depend on making a sequence of choices to achieve a goal. This is the essence of problem solving and ultimately the basis of intelligence. Dripping with disdain, Chomsky’s essay went far beyond taking down B.

In “Why Natural Language Processing is Now Statistical Natural Language Processing,” Eugene Charniak explained that a basic part of grammar is to tag parts of speech in a sentence. This is something that humans can be trained to do much better than the extant parsing programs. The field of computational linguistics initially tried to apply the generative grammar approach pioneered by Noam Chomsky in the 1980s, but the results were disappointing. What eventually worked was to hire Brown undergraduates to hand-label the parts of speech for thousands of articles from the Wall Street Journal, and then to apply statistical techniques to identify the most likely part of speech for a particular word in the neighborhood of other specific words.
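The statistical idea in this excerpt, picking the most likely part of speech for a word given its neighbors, can be sketched with simple counting. The example below is a toy illustration under loud assumptions: the four hand-labelled sentences are invented here (they are not Wall Street Journal data), the tag names are arbitrary, and the "neighborhood" is reduced to the single preceding word. It is not the method Charniak or Sejnowski describe in detail.

```python
from collections import Counter, defaultdict

START = "<s>"  # marker for "no previous word" (an assumption of this sketch)

# Invented, hand-labelled training data for illustration only.
TAGGED_SENTENCES = [
    [("the", "DET"), ("bank", "NOUN"), ("raised", "VERB"), ("rates", "NOUN")],
    [("investors", "NOUN"), ("fear", "VERB"), ("higher", "ADJ"), ("rates", "NOUN")],
    [("the", "DET"), ("agency", "NOUN"), ("rates", "VERB"), ("the", "DET"), ("bank", "NOUN")],
    [("rates", "NOUN"), ("rose", "VERB")],
]


def train(tagged_sentences):
    """Count tags per word, and per (previous word, word) pair."""
    by_word = defaultdict(Counter)
    by_context = defaultdict(Counter)
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            by_word[word][tag] += 1
            by_context[(previous, word)][tag] += 1
            previous = word
    return by_word, by_context


def tag(words, by_word, by_context, default="NOUN"):
    """Pick the most frequent tag, preferring counts that match the previous word."""
    tags = []
    previous = START
    for word in words:
        if (previous, word) in by_context:
            counts = by_context[(previous, word)]   # seen in this exact context
        elif word in by_word:
            counts = by_word[word]                  # back off to the word alone
        else:
            counts = None                           # unknown word
        tags.append(counts.most_common(1)[0][0] if counts else default)
        previous = word
    return tags


by_word, by_context = train(TAGGED_SENTENCES)
print(tag(["the", "agency", "rates", "the", "bank"], by_word, by_context))
print(tag(["the", "bank", "rates", "rose"], by_word, by_context))
```

In the first sentence "rates" is tagged as a verb because that is how it was labelled after "agency"; in the second, the pair ("bank", "rates") was never seen, so the tagger falls back to the word's most frequent label overall. Real systems use far larger corpora and richer context, but the counting principle is the same.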

pages: 413 words: 106,479

Because Internet: Understanding the New Rules of Language
by Gretchen McCulloch
Published 22 Jul 2019

top twenty most lengthened words: Samuel Brody and Nicholas Diakopoulos. 2011. “Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs.” Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. pp. 562–570. expressive lengthening: Tyler Schnoebelen. January 8, 2013. “Aww, hmmm, ohh heyyy nooo omggg!” Corpus Linguistics. corplinguistics.wordpress.com/2013/01/08/aww-hmmm-ohh-heyyy-nooo-omggg/. Jen Doll. 2016. “Why Drag It Out?” The Atlantic. www.theatlantic.com/magazine/archive/2013/03/dragging-it-out/309220/.

Proceedings of the International Joint Conference on Work Activities, Coordination, and Collaboration (WACC’99). pp. 227–235. www.psychology.stonybrook.edu/sbrennan-/papers/brenwacc.pdf. Wikipedia administrators: Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. “A Computational Approach to Politeness with Application to Social Factors.” Presented at 51st Annual Meeting of the Association for Computational Linguistics. arxiv.org/abs/1306.6078. study by Carol Waseleski: Carol Waseleski. 2006. “Gender and the Use of Exclamation Points in Computer-Mediated Communication: An Analysis of Exclamations Posted to Two Electronic Discussion Lists.” Journal of Computer-Mediated Communication 11(4). pp. 1012–1024.

pages: 392 words: 108,745

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think
by James Vlahos
Published 1 Mar 2019

At this point in the history of conversational computing, the field diverged. Most government, academic, and corporate researchers heeded the report’s advice. They specialized, focusing on problems such as automatic speech recognition, which is the process of converting the audio waveforms of speech into written words, and computational linguistics, which is the practice of statistically analyzing patterns in language use. (Only in the past decade have researchers begun to unite the subdisciplines into full dialogue systems, and we will explore how that happened in the next chapter.) But at around the same time, in the mid-1960s, another, more renegade camp began to form.

See also talking toys abuse of AI and, 241, 244 chatbots for, xiii friendship with, 12, 169–82, 190–92, 194, 195, 196 robots and, 190–93, 244 smart home devices and, 192, 195, 235 surveillance of, 234–37, 240, 249 Children’s Online Privacy Protection Act, 235 chitchat, 146–47, 187 Chronicle of the Kings of England (William of Malmesbury), 64–65 Clark, Peter, 162–63 Clarke, Arthur C., 110 classification, 94, 144, 152 Clegg, Dan, 173–74 Clinton, Hillary, 115 Clippy, 123 cloud computing, 4–5, 26 CloudPets, 229 Cog (robot), 191–92 Cognitive Assistant that Learns and Organizes (CALO), 22–23, 24, 25 cognitive behavioral therapy, 246–47 CogniToys, 234–35 Colby, Kenneth, 75 Cold War, 9, 71 Collins, Victor, 222–24 Colloquis, xiii Colossal Cave Adventure (video game), 78–79, 98, 253 common sense, 161–62 computational linguistics, 72 computational propaganda, 216–20 Computel, 107 Computer Power and Human Reason (Weizenbaum), 73 Concept Graph, 204–5, 212 concierge chatbots, 58 Connell, Derek, 130 Consumer Electronics Show (CES), xiii–xvi, 7, 43 Consumer Watchdog, 232 conversational AI. See also chatbots; natural-language systems; socialbots; talking toys; voice AI Amazon and, 50, 54 (See also Alexa Prize competition) Apple and, 35, 40 (See also Siri) business development pattern, 59 common sense and, 161–62 Facebook and, 51–52, 56, 213 Google’s incremental approach to, 48–49 introduction to, 4–7 knowledge and, 161–63 lifelike qualities of, 249–50 machine learning and, 163–64 propaganda bots and, 217 recent popularity of, 59–60 tech world’s embrace of, 50–56 text-based, 56–57 variability challenge, 82 Winograd Schema Challenge, 160–61 conversational computing.

pages: 194 words: 36,223

Smart and Gets Things Done: Joel Spolsky's Concise Guide to Finding the Best Technical Talent
by Joel Spolsky
Published 1 Jun 2007

These people can be identified because they love to point out the theoretical similarity between two widely divergent concepts. For example, they will say, “Spreadsheets are really just a special case of programming language,” and then go off for a week and write a thrilling, brilliant whitepaper about the theoretical computational linguistic attributes of a spreadsheet as a programming language. Smart, but not useful. The other way to identify these people is that they have a tendency to show up at your office, coffee mug in hand, and try to start a long conversation about the relative merits of Java introspection vs. COM type libraries, on the day you are trying to ship a beta.

pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack
by Matthew A. Russell
Published 15 Jan 2011

NLTK can interface with drawing toolkits so that you can inspect the chunked output in a more intuitive visual form than the raw text output you see in the interpreter. Unless otherwise noted, the remainder of this chapter assumes you’ll be using NLTK “as-is” as well. (If you had a PhD in computational linguistics or something along those lines, you’d be more than capable of modifying NLTK for your own needs and would probably be reading much more scholarly material than this chapter.) With that brief introduction to NLP concluded, let’s get to work mining some blog data. Sentence Detection in Blogs with NLTK Given that sentence detection is probably the first task you’ll want to ponder when building an NLP stack, it makes sense to start there.
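To make the sentence detection task concrete, here is a deliberately naive, standard-library-only sketch. It is not NLTK's sentence tokenizer and is not taken from the book: the function name `split_sentences` and the boundary rule (terminal punctuation, whitespace, then a capital letter or quote) are assumptions chosen for illustration.

```python
import re

# Assumed rule: a sentence ends at ., ! or ? when followed by whitespace
# and then an uppercase letter or an opening quote.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")


def split_sentences(text):
    """Split text at the naive boundaries defined above."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


print(split_sentences("NLP is hard. Is it worth it? Yes!"))
# The same rule wrongly breaks after an abbreviation:
print(split_sentences("Dr. Smith arrived. He sat down."))
```

The second call splits after "Dr.", which is the kind of error that motivates a trained sentence detector such as the one a toolkit like NLTK provides, rather than a hand-written pattern.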

Closing Remarks This chapter introduced the bare essentials of advanced unstructured data analytics, and demonstrated how to use NLTK to go beyond the sentence parsing that was introduced in Chapter 7, putting together the rest of an NLP pipeline and extracting entities from text. The field of computational linguistics is still quite nascent, and nailing the problem of NLP for most of the world’s most commonly spoken languages is arguably the problem of the century. Push NLTK to its limits, and when you need more performance or quality, consider rolling up your sleeves and digging into some of the academic literature.

pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
by Pedro Domingos
Published 21 Sep 2015

Learners like it are now used in just about every speech recognizer, including Siri. Fred Jelinek, head of the speech group at IBM, famously quipped that “every time I fire a linguist, the recognizer’s performance goes up.” Stuck in the knowledge-engineering mire, computational linguistics had a near-death experience in the late 1980s. Since then, learning-based methods have swept the field, to the point where it’s hard to find a paper devoid of learning in a computational linguistics conference. Statistical parsers analyze language with accuracy close to that of humans, where hand-coded ones lagged far behind. Machine translation, spelling correction, part-of-speech tagging, word sense disambiguation, question answering, dialogue, summarization: the best systems in these areas all use learning.

pages: 479 words: 113,510

Fed Up: An Insider's Take on Why the Federal Reserve Is Bad for America
by Danielle Dimartino Booth
Published 14 Feb 2017

It felt like election night to see which statement would emerge victorious. When I ran up to the break room to watch CNBC’s Steve Liesman read the FOMC statement, I was on tenterhooks, wondering which words had prevailed. I fully grasped the ridiculous pageantry because I knew the markets would parse every single word. A Fed “computational linguistics” study of FOMC statements released in 2015 concluded: “natural language processing can strip away false impressions and uncover hidden truths about complex communications such as those of the Federal Reserve.” The Street had it right all along. Depressingly, the option Fisher preferred was rarely the one that came out of Liesman’s mouth.

CHAPTER 17: A TURNING POINT The plausible outcomes: Christopher Swann, “US Current Account Deficit ‘Unsustainable’—NY Fed Chief,” Financial Times, January 23, 2006. Wall Street legend Arthur Cashin: Linette Lopez, “Art Cashin Just Gave a Hilarious Wall Street History Lesson Going Back to the 1950s,” BusinessInsider.com, October 8, 2013. A Fed “computational linguistics”: FRB: Miguel Acosta and Ellen Meade, “Hanging on Every Word: Semantic Analysis of the FOMC’s Postmeeting Statement,” FEDS Notes, September 30, 2015, www.federalreserve.gov/econresdata/notes/feds-notes/2015/semantic-analysis-of-the-FOMCs-postmeeting-statement-20150930.html. The thought police: FRBD: Danielle DiMartino Booth and David Luttrell, “The Fallacy of a Pain-Free Path to a Healthy Housing Market,” Economic Letter, Vol. 5, No. 14, December 2010, www.dallasfed.org/assets/documents/research/eclett/2010/el1014.pdf.

The Kingdom of Speech
by Tom Wolfe
Published 30 Aug 2016

Berwick, came up with the Integration Hypothesis…which says that human language has two components: E for “expressive,” as in birdsong, and L for “lexical,” as in monkey cries. In man, E and L come together to create human language. Why were they eager to have Berwick on the team? Because he was an MIT “computational linguistics” star who knew how to parameterize—that’s the word, parameterize—any linguistic theory into modules and press a button and run them through his Prolog system and just like that determine how the theory works for any or all of several dozen languages in terms of “psycholinguistic fidelity” and “logical adequacy.”

The Science of Language
by Noam Chomsky
Published 24 Feb 2012

He earlier construed his contribution as not only the construction of a plausible theory of language within the rationalist tradition, but a criticism of empiricist dogma about how to construct theories of the mind. The criticism of behaviorism is generalized now to criticism of the naïve form of evolution one finds in at least some views of the role of selection and adaptation, backed up by a developed biolinguistic (or as indicated elsewhere, bio-physico-computational linguistic) account of language, its evolution in the species, and its development/growth in individuals. See also Appendix II. Chapter 13 Page 72, Chomsky, simplicity, and Goodman It is useful to read this section and the next (on Chomsky's relationship to Nelson Goodman) together, for Chomsky got some of his formal techniques from Goodman's “constructionist” project (one that goes back to Carnap and his Aufbau) and his appreciation for and pursuit of the notion of simplicity from Goodman.

Corpuscularism As used in this volume, any theory that postulates a set of elements that are taken as primitives for the purposes of whatever combinatory principles the theory deals with. The theory states how these elements can be put together to make complexes. For chemistry, the primitives are atoms, the complexes molecules. For computational linguistics, the primitives are (combinable) lexical items and the complexes are sentences/expressions. Distributed morphology Any of several versions of ways to conceive of how words might in the course of a derivation be put together out of theoretically defined primitives to yield the complexes we hear or (with sign) see.

pages: 611 words: 130,419

Narrative Economics: How Stories Go Viral and Drive Major Economic Events
by Robert J. Shiller
Published 14 Oct 2019

KDD ’09 Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 497–506. Lin, Yuri, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, Will Brockman, and Slav Petrov. 2012. “Syntactic Annotations for the Google Books Ngram Corpus.” Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, July 8–14, 169–74. Jeju Island, Korea, http://aclweb.org/anthology/P12-3029. Lindbeck, Assar, and Dennis J. Snower. 2001. “Insiders versus Outsiders.” Journal of Economic Perspectives 15(1):165–88. Litman, Barry R. 1983. “Predicting Success of Theatrical Movies: An Empirical Study.” Journal of Popular Culture 16(4):159–75.

“Regret Theory: An Alternative Theory of Rational Choice under Uncertainty.” Economic Journal 92(368):805–24. Lorayne, Harry. 2007. Ageless Memory: The Memory Expert’s Prescription for a Razor-Sharp Mind. New York: Black Dog and Leventhal Publishers. Losh, Molly, and Peter C. Gordon. 2014. “Quantifying Narrative Ability in Autism Spectrum Disorder: A Computational Linguistic Analysis of Narrative Coherence.” Journal of Autism and Developmental Disorders 44 (12): 3016–25. Lowen, Anice C., Samira Mubareka, John Steel, and Peter Palese. 2007. “Influenza Virus Transmission Is Dependent on Relative Humidity and Temperature.” PLOS Pathogens, https://doi.org/10.1371/journal.ppat.0030151.

pages: 214 words: 14,382

Monadic Design Patterns for the Web
by L.G. Meredith

In fact, Noam Chomsky’s formalization of the notion of grammar is one of the first compositional accounts of computation (in contrast with Turing or von Neumann’s models of computation, which are decidedly not compositional) and, as such, is also one of the most practical and most influential accounts. Chomsky’s work has heavily influenced parsing, rewrite systems, and many areas of computational linguistics. I am not a number, I am a free monoid In mathematics, concatenation is the most basic means of representing composition forms: from the product of numbers to the operation of a group to function composition. Thus, it should not be surprising that the notion of words finds its way in the development of the algebra field.
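The "free monoid" named in the heading above can be illustrated with a small sketch. It is written in Python purely for illustration and is not taken from the book; the names `EMPTY`, `concat`, and `lift` are assumptions introduced here. Words are modeled as tuples of letters, concatenation is the monoid operation, and the empty tuple is the identity.

```python
from functools import reduce
from operator import add

# Hypothetical names for illustration; words are tuples of letters.
EMPTY = ()  # the empty word, identity element for concatenation


def concat(u, v):
    """Monoid operation on words: plain concatenation."""
    return u + v


def lift(letter_map, op, identity):
    """Extend a map on single letters to a map on whole words.

    This is the 'free' property: any assignment of letters to elements of
    another monoid (op, identity) extends to a structure-preserving map.
    """
    def on_word(word):
        return reduce(op, (letter_map(x) for x in word), identity)
    return on_word


u, v, w = ("a", "b"), ("c",), ("d", "e")

# Word length is the lift of "every letter counts as 1" into (int, +, 0).
length = lift(lambda _letter: 1, add, 0)

print(concat(concat(u, v), w) == concat(u, concat(v, w)))  # associativity
print(concat(EMPTY, u) == u == concat(u, EMPTY))           # identity laws
print(length(concat(u, v)) == length(u) + length(v))       # homomorphism
```

Tuples are used rather than strings so that the letters can be any values, not only characters.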

pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions
by Brian Christian and Tom Griffiths
Published 4 Apr 2016

Using this rule, if you had previously seen 100 lottery drawings with only 10 winning tickets (w = 10, n = 100), your estimate after seeing a single winning draw for this new lottery would be a much more reasonable 12/103 (not far from 10%). Variants on Laplace’s Law are used extensively in computational linguistics, where they provide a way to estimate the probabilities of words that have never been seen before (Chen and Goodman, “An Empirical Study of Smoothing Techniques for Language Modeling”). or last for five millennia: For a quantity like a duration, which ranges from 0 to ∞, the uninformative prior on times t is the probability density p(t) ∝ 1/t.
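The arithmetic in this note can be checked with a short sketch. It assumes the usual statement of Laplace's Law, (w + 1) / (n + 2), and the add-one generalization to a vocabulary of V word types; the function names, the toy counts, and the vocabulary size are hypothetical.

```python
from fractions import Fraction


def laplace_estimate(wins: int, attempts: int) -> Fraction:
    """Laplace's Law: estimate a success probability as (w + 1) / (n + 2)."""
    return Fraction(wins + 1, attempts + 2)


def add_one_probability(word, counts, vocabulary_size) -> Fraction:
    """Add-one smoothing: the same idea with V possible outcomes instead of 2."""
    total = sum(counts.values())
    return Fraction(counts.get(word, 0) + 1, total + vocabulary_size)


# 10 wins in 100 earlier drawings, plus one winning draw of the new lottery:
# w = 11, n = 101.
print(laplace_estimate(11, 101))  # 12/103

# Toy corpus (hypothetical): an unseen word still gets nonzero probability.
counts = {"the": 3, "cat": 1}
print(add_one_probability("dog", counts, vocabulary_size=5))  # 1/9
```

Exact fractions are used so that the result can be compared directly with the 12/103 quoted in the note.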

Charles, Susan T., and Laura L. Carstensen. “Social and Emotional Aging.” Annual Review of Psychology 61 (2010): 383–409. Chen, Stanley F., and Joshua Goodman. “An Empirical Study of Smoothing Techniques for Language Modeling.” In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 1996, 310–318. Chen, Xi, and Xiaotie Deng. “Settling the Complexity of Two-Player Nash Equilibrium.” In Foundations of Computer Science, 2006, 261–272. Chow, Y. S., and Herbert Robbins. “A Martingale System Theorem and Applications.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability.

Data Mining: Concepts and Techniques
by Jiawei Han, Micheline Kamber and Jian Pei
Published 21 Jun 2011

[Bay98] Bayardo, R.J., Efficiently mining long patterns from databases, In: Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98) Seattle, WA. (June 1998), pp. 85–93. [BB98] Bagga, A.; Baldwin, B., Entity-based cross-document coreferencing using the vector space model, In: Proc. 1998 Annual Meeting of the Association for Computational Linguistics and Int. Conf. Computational Linguistics (COLING-ACL’98) Montreal, Quebec, Canada. (Aug. 1998). [BB01] Baldi, P.; Brunak, S., Bioinformatics: The Machine Learning Approach. 2nd ed. (2001) MIT Press, Cambridge, MA. [BB02] Borgelt, C.; Berthold, M.R., Mining molecular fragments: Finding relevant substructures of molecules, In: Proc. 2002 Int.

Other topics in multimedia mining include classification and prediction analysis, mining associations, and video and audio data mining (Section 13.2.3). Mining Text Data Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. A substantial portion of information is stored as text such as news articles, technical papers, books, digital libraries, email messages, blogs, and web pages. Hence, research in text mining has been very active. An important goal is to derive high-quality information from text. This is typically done through the discovery of patterns and trends by means such as statistical pattern learning, topic modeling, and statistical language modeling.

pages: 236 words: 67,823

Hacking Vim 7.2
by Kim Schulz
Published 29 Apr 2010

Without her positive attitude and help, I would never have gotten this book ready. I would also like to add a great thank you to Bram Moolenaar for developing the Vim Editor—I appreciate the fruits of your work every day. About the Reviewers Boris Fersing is an amateur photographer and student in computational linguistics at the University of Saarland, Germany. For his studies, he participated in many projects and used many programming languages (SML, C/C++, Java, Ruby, Prolog) and Vim was always his editor of choice. He also worked as system administrator for a department of the University of Saarland. With this job he learned how to use some Unix tools and improved his knowledge about the Vim editor.

pages: 212 words: 68,649

Wordslut: A Feminist Guide to Taking Back the English Language
by Amanda Montell
Published 27 May 2019

In 2017 sociolinguist Eliza Scruton conducted a study in which she examined a corpus (that’s a large collection of language samples) containing more than fifty million words from the internet to determine exactly how gender-specific words like nasty, bossy, and nag really are. In short? Very. Her data revealed that these terms skew strongly female in usage, often appearing before the words wife and mother. Chi Luu, a computational linguist and language columnist at JSTOR Daily, once made the point that the purpose of name-calling is to accuse a person of not behaving as they should in the eyes of the speaker. The end goal of the insult is to shape the recipient’s actions to fit the speaker’s desired image of a particular group.

pages: 584 words: 187,436

More Money Than God: Hedge Funds and the Making of a New Elite
by Sebastian Mallaby
Published 9 Jun 2010

One of the few exceptions to the Renaissance rule of not hiring from Wall Street was Robert Frey, a mathematician who had been at Morgan Stanley. 20. Wadhwani interview. 21. See, for example, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer, “The Mathematics of Statistical Machine Translation: Parameter Estimation,” Computational Linguistics 19, no. 2 (1993). As noted below, the Della Pietra brothers followed Brown and Mercer from IBM to Renaissance Technologies. 22. As far back as 1949, code breakers had wondered about the application of their technique to translation. But they lacked computing power; statistical translation depended on feeding a vast number of pairs of sentences into a computer, so that the computer had enough data from which to extract meaningful patterns.

It is also interesting that Brown and Mercer’s coauthors who followed them to Renaissance, Stephen and Vincent Della Pietra, explicitly presented their experience with statistical machine translation as relevant to finding order in other types of data, including financial data. See Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra, “A Maximum Entropy Approach to Natural Language Processing,” Computational Linguistics 22, no. 1 (March 1996): pp. 39–71. 30. To manage the potential linguistic chaos resulting from this permissiveness, neologisms had to be submitted to a review. Mercer interview. 31. The Russian employees were Pavel Volfbeyn and Alexander Belopolsky. The firm that they defected to was Millennium.

pages: 586 words: 186,548

Architects of Intelligence
by Martin Ford
Published 16 Nov 2018

That’s not a problem for a sprained ankle, but if you’re asking about a heart attack because you think someone’s had one, it could actually lead to death. People would assume a system that can answer one of those questions you can answer the other. A related problem arises with dialogue systems based on learning from data. Last summer (2017), I was given the Association for Computational Linguistics Lifetime Achievement Award and almost all the people listening to my talk at the conference work on deep learning based natural-language systems. I told them, “if you want to build a dialogue system, you have to recognize that Twitter is not a real dialogue.” To build a dialogue system that can handle dialogues of the sort people actually engage in, you need to have real data of real people having real dialogues, and that’s much harder to get than Twitter data.

Her many awards and distinctions include election to the National Academy of Engineering, the American Philosophical Society, and the American Academy of Arts and Sciences, and as a fellow of the Association for the Advancement of Artificial Intelligence and the Association for Computing Machinery. She received the 2009 ACM/AAAI Allen Newell Award, the 2015 IJCAI Award for Research Excellence, and the 2017 Association for Computational Linguistics Lifetime Achievement Award. She is also known for her leadership of interdisciplinary institutions and contributions to the advancement of women in science. Chapter 16. JUDEA PEARL The current machine learning concentration on deep learning and its non-transparent structures is a hang-up.

pages: 265 words: 74,000

The Numerati
by Stephen Baker
Published 11 Aug 2008

Intelligence services are often flummoxed by even the most basic piece of data in a person's file: his or her name. This is one crucial area where our cultural diversity defies the sorting and counting magic of the computer. Jack Hermansen knows this all too well. He's been working on the electronic recognition of names since 1984, when he got his doctorate in computational linguistics from Georgetown University. The U.S. State Department called on him back then to help figure out which names belonged to which people. It seemed like a simple enough task. Figure out the variations, from culture to culture, on the spelling of a name like Sean, Mohammed, or Chang, and stick them into a computer.

pages: 238 words: 77,730

Final Jeopardy: Man vs. Machine and the Quest to Know Everything
by Stephen Baker
Published 17 Feb 2011

Early in his tenure at IBM he and a friend tried, in their spare time, to teach a machine to write fiction by itself. They trained it for various literary themes, from love to betrayal, and they named it Brutus, for Julius Caesar’s traitorous comrade. Ferrucci was comfortable talking about everything from the details of computational linguistics to the evolution of life on earth and the nature of human thought. This made him an ideal ambassador for a Jeopardy machine. After all, his project would raise a broad range of issues, and fears, about the role of brainy machines in society. Would they compete for jobs? Could they establish their own agendas, like the infamous computer HAL, in 2001: A Space Odyssey, and take control?

pages: 284 words: 79,265

The Half-Life of Facts: Why Everything We Know Has an Expiration Date
by Samuel Arbesman
Published 31 Aug 2012

While scientific progress isn’t necessarily correlated with a single publication—some papers might have multiple discoveries, and others might simply be confirming something we already know—it is often a good unit of study. Focusing on the scientific paper gives us many pieces of data to measure and study. We can look at the title and text and, using sophisticated algorithms from computational linguistics or text mining, determine the subject area. We can look at the authors themselves and create a web illustrating the interactions between scientists who write papers together. We can examine the affiliations of each of the authors and try to see which collaborations between individuals at different institutions are more effective.

pages: 337 words: 86,320

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
by Seth Stephens-Davidowitz
Published 8 May 2017

They analyzed the keystroke logs and noted any time someone corrected a word. More details can be found in Yukino Baba and Hisami Suzuki, “How Are Spelling Errors Generated and Corrected? A Study of Corrected and Uncorrected Spelling Errors Using Keystroke Logs,” Proceedings of the Fiftieth Annual Meeting of the Association for Computational Linguistics, 2012. The data, code, and a further description of this research can be found at sethsd.com.
51 Consider all searches of the form “I want to have sex with my”: The full data—warning: graphic—is as follows:
“I WANT TO HAVE SEX WITH . . .”    MONTHLY GOOGLE SEARCHES WITH THIS EXACT PHRASE
my mom           720
my son           590
my sister        590
my cousin        480
my dad           480
my boyfriend     480
my brother       320
my daughter      260
my friend        170
my girlfriend    140
52 cartoon porn: For example, “porn” is one of the most common words included in Google searches for various extremely popular animated programs, as seen below.
52 babysitters: Based on author’s calculations, these are the most popular female occupations in porn searches by men, broken down by the age of men:
CHAPTER 3: DATA REIMAGINED
56 algorithms in place: Matthew Leising, “HFT Treasury Trading Hurts Market When News Is Released,” Bloomberg Markets, December 16, 2014; Nathaniel Popper, “The Robots Are Coming for Wall Street,” New York Times Magazine, February 28, 2016, MM56; Richard Finger, “High Frequency Trading: Is It a Dark Force Against Ordinary Human Traders and Investors?”

pages: 317 words: 84,400

Automate This: How Algorithms Came to Rule Our World
by Christopher Steiner
Published 29 Aug 2012

Sebastian Mallaby, More Money Than God: Hedge Funds and the Making of a New Elite (New York: Penguin Press, 2010). 6. Peter Brown, Robert Mercer, Stephen Della Pietra, and Vincent J. Della Pietra, “The Mathematics of Statistical Machine Translation: Parameter Estimation,” Journal of Computational Linguistics 19, no. 2 (1993): 263–311. 7. Ingfei Chen, “Scientist at Work: Nick Patterson,” New York Times, December 12, 2006. CHAPTER 8: WALL STREET VERSUS SILICON VALLEY 1. Rana Foroohar, “Wall Street: Aiding the Economic Recovery, or Strangling It?” Time, April 4, 2011. 2.

pages: 301 words: 85,263

New Dark Age: Technology and the End of the Future
by James Bridle
Published 18 Jun 2018

‘HP cameras are racist’, YouTube video, username: wzamen01, December 10, 2009. 14. David Smith, ‘“Racism” of early colour photography explored in art exhibition’, Guardian, January 25, 2013, theguardian.com. 15. Phillip Martin, ‘How A Cambridge Woman’s Campaign Against Polaroid Weakened Apartheid’, WGBH News, December 9, 2013, news.wgbh.org. 16. Hewlett-Packard, ‘Global Citizenship Report 2009’, hp.com. 17. Trevor Paglen, ‘re:publica 2017 | Day 3 – Livestream Stage 1 – English’, YouTube video, username: re:publica, May 10, 2017. 18. Walter Benjamin, ‘Theses on the Philosophy of History’, in Walter Benjamin: Selected Writings, Volume 4: 1938–1940, Cambridge, MA: Harvard University Press, 2006. 19. PredPol, ‘5 Common Myths about Predictive Policing’, predpol.com. 20. G. O. Mohler, M. B. Short, P. J. Brantingham, et al., ‘Self-exciting point process modeling of crime’, JASA 106 (2011). 21. Daniel Jurafsky and James H. Martin, Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edition, Upper Saddle River, NJ: Prentice Hall, 2009. 22. Walter Benjamin, ‘The Task of the Translator’, in Selected Writings Volume 1 1913–1926, Marcus Bullock and Michael W. Jennings, eds, Cambridge, MA and London: Belknap Press, 1996. 23. Murat Nemet-Nejat, ‘Translation: Contemplating Against the Grain’, Cipher, 1999, cipherjournal.com. 24. Tim Adams, ‘Can Google break the computer language barrier?’

pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together
by Nick Polson and James Scott
Published 14 May 2018

See Tomas Mikolov et al., “Distributed Representations of Words and Phrases and Their Compositionality,” Advances in Neural Information Processing Systems 26 (NIPS, 2013), https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality. 26.  Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig, “Linguistic Regularities in Continuous Space Word Representations,” in Proceedings of NAACL-HLT, 2013 (Stroudsburg, PA: Association for Computational Linguistics, 2013), 746–51. CHAPTER 5   1.  We distinctly remember hearing this piece of commentary on a TV show in the wake of the coin-flip incident, but we have been unable to find a transcript of the show online. Our apologies to the witty and sadly anonymous commentator.   2.  Stephen Quinn, “Gold, Silver, and the Glorious Revolution: Arbitrage Between Bills of Exchange and Bullion,” The Economic History Review 49, no. 3 (1996): 479–90.   3.  

pages: 315 words: 93,628

Is God a Mathematician?
by Mario Livio
Published 6 Jan 2009

For instance, there is a Journal of Mathematical Sociology (which in 2006 was in its thirtieth volume) that is oriented toward a mathematical understanding of complex social structures, organizations, and informal groups. The journal articles address topics ranging from a mathematical model for predicting public opinion to one predicting interaction in social groups. Going in the other direction—from mathematics into the humanities—the field of computational linguistics, which originally involved only computer scientists, has now become an interdisciplinary research effort that brings together linguists, cognitive psychologists, logicians, and artificial intelligence experts, to study the intricacies of languages that have evolved naturally. Is this some mischievous trick played on us, such that all the human struggles to grasp and comprehend ultimately lead to uncovering the more and more subtle fields of mathematics upon which the universe and we, its complex creatures, were all created?

pages: 375 words: 88,306

The Sharing Economy: The End of Employment and the Rise of Crowd-Based Capitalism
by Arun Sundararajan
Published 12 May 2016

For a more recent discussion of their place in computer-mediated transactions, see Hal Varian, “Computer-Mediated Transactions,” American Economic Review 100, 2 (2010): 1–10. For a study on the importance of the content in feedback text, see Anindya Ghose, Panagiotis G. Ipeirotis, and Arun Sundararajan, “Opinion Mining Using Econometrics: A Case Study on Reputation Systems,” Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007, http://www.cs.brandeis.edu/~marc/misc/proceedings/acl-2007/ACLMain/pdf/ACLMain53.pdf. For a study of the bias introduced because of the fear of retaliation, see Chrysanthos Dellarocas and Charles A. Wood, “The Sound of Silence in Online Feedback: Estimating Trading Risks in the Presence of Reporting Bias,” Management Science 54, 3 (2008): 460–476. 14.

pages: 313 words: 92,053

Places of the Heart: The Psychogeography of Everyday Life
by Colin Ellard
Published 14 May 2015

For example, the social media application Twitter, which has taken the world by storm by providing a free “microblogging” service by which users can share news, discoveries, insights, or photos of their lunches, can also be used to map the world’s feelings. Twitter feeds can be mined for emotional content using computational linguistics. So-called “sentiment analyses” have become big business as corporations can use them to measure attitudes of users about a particular product (by, say, analyzing the emotional content of all the tweets that contain the word Starbucks). But tweets can also be geocoded, which means that they could be used to map the frequency of use of emotion words in different locations.
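A minimal sketch of the kind of analysis described above: keep the tweets that mention a keyword, score them against an emotion-word lexicon, and average the scores by location. The tiny word lists, the place names, and the sample tweets are invented for illustration; real sentiment systems use far larger lexicons or trained classifiers.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicons; production systems use much larger resources.
POSITIVE = {"love", "great", "happy", "delicious"}
NEGATIVE = {"hate", "awful", "sad", "burnt"}


def sentiment_score(text):
    """Count positive words minus negative words in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def mean_sentiment_by_place(tweets, keyword):
    """Average the scores of keyword-bearing tweets for each location."""
    totals = defaultdict(lambda: [0, 0])  # place -> [score sum, tweet count]
    for place, text in tweets:
        if keyword.lower() in text.lower():
            totals[place][0] += sentiment_score(text)
            totals[place][1] += 1
    return {place: s / n for place, (s, n) in totals.items()}


# Invented (place, text) pairs standing in for geocoded tweets.
TWEETS = [
    ("Toronto", "I love Starbucks, great latte"),
    ("Toronto", "Starbucks coffee was burnt and awful"),
    ("Waterloo", "happy at Starbucks"),
    ("Waterloo", "sad weather today"),
]

print(mean_sentiment_by_place(TWEETS, "Starbucks"))
# {'Toronto': 0.0, 'Waterloo': 1.0}
```

Word counting of this kind ignores negation and sarcasm, which is one reason it should be read as a rough signal rather than a measurement.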

pages: 398 words: 86,855

Bad Data Handbook
by Q. Ethan McCallum
Published 14 Nov 2012

* * * [8] http://www.weotta.com [9] http://en.wikipedia.org/wiki/Natural_language_processing [10] http://citygrid.com/ [11] http://bit.ly/X9sqWR [12] http://nltk.org [13] http://en.wikipedia.org/wiki/Text_classification [14] http://www.cs.cornell.edu/people/pabo/movie-review-data/ [15] http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf [16] http://bit.ly/QibGfE [17] http://en.wikipedia.org/wiki/Naive_Bayes_classifier [18] http://en.wikipedia.org/wiki/Maximum_entropy_classifier [19] https://github.com/japerk/nltk-trainer [20] http://en.wikipedia.org/wiki/Part-of-speech_tagging [21] http://en.wikipedia.org/wiki/Chunking_(computational_linguistics) [22] http://www.slideshare.net/japerk/corpus-bootstrapping-with-nltk Chapter 7. Will the Bad Data Please Stand Up? Philipp K. Janert Among hikers and climbers, they say that “there is no such thing as bad weather—only inappropriate clothing.” And as anybody who has spent some time outdoors can attest, it is often precisely trips undertaken under more challenging circumstances that lead to the most noteworthy memories.

The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do
by Erik J. Larson
Published 5 Apr 2021

Or we could instruct all contestants to play as if interviewing for the job of understanding conversational English—not a bad test for future voice-activated personal assistants! In such cases, tricks would be immediate violations of conversational rules baked into the test. Goostman would be sunk. Computational linguists and AI researchers have known all along that engaging in open-ended dialogue is formally more difficult than interpreting monologue, as in understanding a newspaper article. Yet another way to preserve Turing’s intuition that natural language ability is a suitable test of human-level intelligence is to consider a simplification of his original test, requiring only monologue.

pages: 327 words: 103,336

Everything Is Obvious: *Once You Know the Answer
by Duncan J. Watts
Published 28 Mar 2011

IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences E87A (9):2379–86. Snow, Rion, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. “Cheap and Fast—But Is It Good? Evaluating Non-Expert Annotations for Natural Language Tasks.” In Empirical Methods in Natural Language Processing. Honolulu, Hawaii: Association for Computational Linguistics. Somers, Margaret R. 1998. “ ‘We’re No Angels’: Realism, Rational Choice, and Relationality in Social Science.” American Journal of Sociology 104 (3):722–84. Sorkin, Andrew Ross (ed). 2008. “Steve & Barry’s Files for Bankruptcy.” New York Times, July 9. Sorkin, Andrew Ross. 2009a. Too Big to Fail: The Inside Story of How Wall Street and Washington Fought to Save the Financial System from Crisis—and Themselves.

pages: 340 words: 97,723

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity
by Amy Webb
Published 5 Mar 2019

New York: Viking, 2005. Libicki, R. Cyberspace in Peace and War. Annapolis: Naval Institute Press, 2016. Lin, J. Y. Demystifying the Chinese Economy. Cambridge, UK: Cambridge University Press, 2011. Marcus, M. P., et al. “Building a Large Annotated Corpus of English: The Penn Treebank.” Computational Linguistics 19, no. 2 (1993): 313–330. Massaro, T. M., and H. Norton. “Siri-ously? Free Speech Rights and Artificial Intelligence.” Northwestern University Law Review 110, no. 5 (2016): 1169–1194, Arizona Legal Studies Discussion Paper No. 15–29. Minsky, M., P. Singh, and A. Sloman. “The St. Thomas Common Sense Symposium: Designing Architectures for Human-Level Intelligence.”

pages: 338 words: 106,936

The Physics of Wall Street: A Brief History of Predicting the Unpredictable
by James Owen Weatherall
Published 2 Jan 2013

and try to deduce an answer based on a list of facts it had stored in a database. A major part of this project was devoted to simply parsing the question, trying to determine what the questioner was even after. Black’s work represented an important early contribution to the field known as computational linguistics, in which people try to figure out how to make computers understand and produce natural language. Word spread quickly around Cambridge of Black’s work at BBN. In the spring of 1963, Minsky heard about Black’s question-answering program. He was sufficiently impressed — and sufficiently influential — that he negotiated readmission to Harvard on Black’s behalf.

pages: 370 words: 105,085

Joel on Software
by Joel Spolsky
Published 1 Aug 2004

These kind of people can be identified because they love to point out the theoretical similarity between two widely divergent concepts. For example, they will say, "Spreadsheets are really just a special case of programming language," and then go off for a week and write a thrilling, brilliant whitepaper about the theoretical computational linguistic attributes of a spreadsheet as a programming language. Smart, but not useful. The other way to identify these people is that they have a tendency to show up at your office, coffee mug in hand, and try to start a long conversation about the relative merits of Java introspection vs. COM type libraries on the day you are trying to ship a beta.

pages: 370 words: 107,983

Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All
by Robert Elliott Smith
Published 26 Jun 2019

Like an expert system, one can run the rules of a formal language from a starting point, chaining through rule-after-rule, until one reaches a final rule, a destination, which is the completion of the sentence. One can also run the rules over sentences, to determine whether they are valid in that language. Chomsky’s theories about language hierarchies created the entire field of computational linguistics, and the study of human language in terms of computational syntax. It also created the area of computer science concerned with the power of programming languages (code). Like languages, computer codes require a formal system of rules on which to operate. However, unlike human languages, computers can come up with nonsensical outcomes because the code doesn’t contain a representation of meaning.
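The two uses of a rule set described above, generating a sentence by chaining through rules and checking whether a given sentence is valid, can be sketched with a toy context-free grammar. The grammar and the function names are hypothetical, and the recognizer is a naive top-down search that only works for small grammars without left recursion.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to its alternative expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def generate(symbol="S", rng=random):
    """Run the rules forward from a start symbol until only words remain."""
    if symbol not in GRAMMAR:  # terminal: an actual word
        return [symbol]
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(part, rng))
    return words


def derives(symbols, words):
    """True if the symbol sequence can be rewritten into exactly `words`."""
    if not symbols:
        return not words
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:  # terminal must match the next word
        return bool(words) and words[0] == first and derives(rest, words[1:])
    return any(derives(rhs + rest, words) for rhs in GRAMMAR[first])


def is_valid(sentence):
    """Run the rules over a sentence to decide whether the grammar accepts it."""
    return derives(["S"], sentence.split())


print(" ".join(generate()))              # a random sentence from the grammar
print(is_valid("the dog sees the cat"))  # True
print(is_valid("dog the sees"))          # False
```

The grammar checks form only: it accepts any string the rules can produce, whether or not the result means anything.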

pages: 477 words: 106,069

The Sense of Style: The Thinking Person's Guide to Writing in the 21st Century
by Steven Pinker
Published 1 Jan 2014

Language and Linguistics Compass, 6/7, 403–415. Grice, H. P. 1975. Logic and conversation. In P. Cole & J. L. Morgan (eds.), Syntax & semantics (Vol. 3, Speech acts). New York: Academic Press. Grosz, B. J., Joshi, A. K., & Weinstein, S. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21, 203–225. Haidt, J. 2012. The righteous mind: Why good people are divided by politics and religion. New York: Pantheon. Haussaman, B. 1993. Revising the rules: Traditional grammar and modern linguistics. Dubuque, Iowa: Kendall/Hunt. Hayes, J. R., & Bajzek, D. 2008. Understanding and reducing the knowledge effect: Implications for writers.

pages: 390 words: 109,519

Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media
by Tarleton Gillespie
Published 25 Jun 2018

Television and New Media, 1527476414554402. WAGNER, R. POLK. 1998. “Filters and the First Amendment.” Minnesota Law Review 83: 755. WARNER, WILLIAM, AND JULIA HIRSCHBERG. 2012. “Detecting Hate Speech on the World Wide Web.” In Proceedings of the Second Workshop on Language in Social Media, 19–26. Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=2390377. WAUTERS, E., E. LIEVENS, AND P. VALCKE. 2014. “Towards a Better Protection of Social Media Users: A Legal Perspective on the Terms of Use of Social Networking Sites.” International Journal of Law and Information Technology 22 (3): 254–94. WEBER, STEVEN. 2009.

pages: 414 words: 109,622

Genius Makers: The Mavericks Who Brought A. I. To Google, Facebook, and the World
by Cade Metz
Published 15 Mar 2021

He’d been impressed by a talk Girshick had given in which he had described a system that pushed image recognition beyond what Hinton and his students demonstrated the previous December. Among those joining them was a young researcher named Meg Mitchell, who began applying similar techniques to language. Mitchell, a Southern Californian who had studied computational linguistics in Scotland, would later become a key figure in the deep learning movement after she told Bloomberg News that artificial intelligence suffered from a “sea of dudes” problem—that this new breed of technology would fall short of its promise because it was built almost entirely by men. It was an issue that would come to haunt the big Internet companies, including Microsoft.

pages: 272 words: 103,638

Unit X: How the Pentagon and Silicon Valley Are Transforming the Future of War
by Raj M. Shah and Christopher Kirchhoff
Published 8 Jul 2024

At the retreat he met Trae Stephens, a former intelligence analyst and Palantir executive who’d joined Founders Fund with the goal of investing in venture-backed companies aimed at the defense sector. Like a lot of people in Silicon Valley, Stephens had taken an unusual route into the world of technology. He majored in Regional and Comparative Studies at Georgetown before going to work in the intelligence community on computational linguistics. Having imagined futurist command centers and real-time analytics, he became dismayed at the actual state of the tech that agencies were using. Just like what Raj had discovered at the air force command center in Qatar, Stephens was taken aback to see U.S. intelligence analysts working with incompatible databases that couldn’t connect to each other.

pages: 429 words: 114,726

The Computer Boys Take Over: Computers, Programmers, and the Politics of Technical Expertise
by Nathan L. Ensmenger
Published 31 Jul 2010

Within two years of their publication the Curriculum ’68 guidelines had been implemented in at least twenty-six universities.69 The special committee assembled by the ACM to produce the Curriculum ’68 report, the Curriculum Committee on Computer Science (C3S), followed up with a series of articles in the Communications of the ACM highlighting specific topics from the recommendations, including computational linguistics, formal languages, automata, and abstract switching and computability. In collaboration with the National Science Foundation, the C3S also hosted a series of conferences aimed at enabling smaller universities and teaching colleges to implement Curriculum ’68.70 Over the course of the next decade, the C3S would continue to refine and monitor its recommendations.

pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets
by David J. Leinweber
Published 31 Dec 2008

The technologically innovative firms describe increasingly sophisticated trading strategies. They show expansion into ever more markets. Recruiting information and old press releases show a staff of Ph.D.’s drawn largely from areas outside finance: Computer science, AI, mathematics, and physics are popular. More surprising is the presence of computational linguists, people whose expertise lies in the area of textual, qualitative information. With these clues, we can move on to areas where technology will be more important to the broader investment community in the future. Future Technological Stars As Niels Bohr stated: “Prediction is very difficult, especially about the future.”

pages: 426 words: 117,027

Mind in Motion: How Action Shapes Thought
by Barbara Tversky
Published 20 May 2019

The system of comics. Translated by B. Beaty & N. Nguyen. Jackson: University Press of Mississippi. Adding information to words and images Clark, H. H. (1975). Bridging. In Proceedings of the 1975 Workshop on Theoretical Issues in Natural Language Processing (pp. 169–174). Cambridge, MA: Association for Computational Linguistics. Intraub, H., Bender, R. S., & Mangels, J. A. (1992). Looking at pictures but remembering scenes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(1), 180. Segmenting events and stories McCloud, S. (1993). Understanding comics. New York, NY: William Morrow Paperbacks.

pages: 481 words: 121,669

The Invisible Web: Uncovering Information Sources Search Engines Can't See
by Gary Price, Chris Sherman and Danny Sullivan
Published 2 Jan 2003

Search Form URL: See Main Page Mathematics and Physics arXiv.org e-Print archive http://xxx.lanl.gov/ Since August 1991, arXiv.org (formerly xxx.lanl.gov) has been a fully automated electronic archive and distribution server for research papers. Covered areas include physics and related disciplines, mathematics, nonlinear sciences, computational linguistics, and neuroscience. Search Form URL: http://xxx.lanl.gov/form Related Resources: High Energy Physics Conference Database http://www.slac.stanford.edu/spires/conferences/ Sloane’s Online Encyclopedia of Integer Sequences http://www.research.att.com/~njas/sequences/index.html Search for integer sequences.

pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots
by John Markoff
Published 24 Aug 2015

Dismayed by the claustrophobic family business, he soon desperately needed to do something different and he remembered both a programming class he had taken at Chicago and his obsession with A Space Odyssey. He enrolled as a graduate student in computer science at the University of Pennsylvania. Once there he studied with Aravind Krishna Joshi, an early specialist in computational linguistics. Even though he had come in with a liberal arts background he quickly became a star. He went through the program in five years, getting perfect scores in all of his classes and writing his graduate thesis on the subject of building natural language front ends to databases. As a newly minted Ph.D., Kaplan gave job audition lectures at Stanford and MIT, visited SRI, and spent an entire week being interviewed at Bell Labs.

pages: 348 words: 119,358

The Long History of the Future: Why Tomorrow's Technology Still Isn't Here
by Nicole Kobie
Published 3 Jul 2024

But Gebru left Google years before, in 2020 – also over ethical concerns.19 Gebru and five co-authors were set to publish a paper that examined the downsides of large language models – it raised concerns about environmental impact and bias, nothing too controversial, really – but managers at Google asked her to remove her name and those of her colleagues, leaving Emily Bender, director of the University of Washington’s Computational Linguistics Laboratory, as the only author. Gebru refused, and got told her job was gone, though Google’s version of the story differs and suggests she left of her own accord. Jeffrey Dean, the head of Google AI, told employees in an email that the paper didn’t ‘meet our bar for publication’. Whether Gebru quit or was forced out, her departure was followed by that of her colleague, Margaret Mitchell, co-leader of the Ethical Artificial Intelligence team, reportedly because she ‘exfiltrated thousands of files’.

pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts
by Richard Susskind and Daniel Susskind
Published 24 Aug 2015

Using technology borrowed from facial-recognition software, they swiftly found 1,000 new joins.115 They have used the same technology to piece together fragments of Tibetan manuscripts116 and the Dead Sea Scrolls.117 Nachum Dershowitz, also a computer scientist at Tel Aviv University, used technology in a different way. For centuries, Bible scholars have argued furiously about who wrote different parts of the Bible. Using advances in ‘computational linguistics’, Nachum Dershowitz found that he could predict with ‘over 90 percent accuracy’ the scholarly consensus on the authorship of different parts of the texts.118 Many online religious communities exist, often functioning without any external direction from traditional religious authorities.

How I Became a Quant: Insights From 25 of Wall Street's Elite
by Richard R. Lindsey and Barry Schachter
Published 30 Jun 2007

Today 6.251 is called Introduction to Mathematical Programming, still part of the Electrical Engineering degree, but with not even a hint of the prior incarnation of the course. My electrical engineering aspirations have likewise long since disappeared, and I rely on others to plug in equipment and change light bulbs. I followed 6.251 with classes in computer linguistics and programming (which I loved), but then found that I neither liked nor had any aptitude for circuit boards, electricity, or the mechanical facets of the electrical engineer’s trade. Thus disillusioned, I was delighted to learn that MIT allowed students to petition for a customized course of study.

pages: 643 words: 131,673

How to Invent Everything: A Survival Guide for the Stranded Time Traveler
by Ryan North
Published 17 Sep 2018

See also poisons bright coloration and, 58 of plants and animals, 56–57 of potatoes, 71 of soybeans, 73 vitamin A and, 112n of yams, 79 traction in position, 338 transformers, 200–203 transportation, 258–305 treadle, 226, 227f treadles (bicycle), 261n triangular sail, 284 trigonometry, 395–96, 397t–98t trillo, 350 trip hammer, 177, 405 true north, 205 truth tables, 304t, 362, 363t, 364t, 365t, 367t, 368t, 372n tube (for musical instrument), 342 tuning forks, 346n turkeys, 85, 101 turnips, 51 twine, 227 two-field crop rotation, 49, 49t, 51 two-point perspective, 319–20, 320f two-rotation screw propeller, 287, 288f two-terminal battery, 195–96 type cases, 254n type metal, 254 ultraviolet radiation, 277–78, 277n units of measurement, 39–44, 129n universal constants, 399t–401t Universal Edibility Test, 58–59, 112n uppercase, 254n urine in processing steel, 244–45 in tanning, 222 vaccination, 327–28, 328n vacuum, 199n valves in internal combustion engines, 191, 191f for wind instruments, 343f vegetables, history of, 53–54 venom, 56–57 vermin, 88 vinegar, 138–39 violin, 342 viruses, 327, 327n Viscott, David, 161 visible light, 275–76, 275n visual arts, 316–23, 321f vitamins A, 109, 111t, 112n B, 111t, 143 C, 74–75, 109, 109n–10n, 139–40 D, 110n, 112t E, 112t in human diet, 108–10 K, 110n, 112t Vitruvius, 239 volcanic ash, 238 volt, 200–201, 200n–201n voltage, 202 volume, measurement of, 44 Vonnegut, Kurt, 234 vulcanization, 73 Walker, Alice, 127 “wandering womb” theory, 326n washtub bass, 342 water boiling and freezing points of, 211, 211n centigrade temperature system and, 40 chemical structure of, 308 computer that runs on, 371–73, 371f distillation of, 124 heating, 331n nutrition and, 110 in thermometers, 212 in treatment of disease, 330–31 water-based ink, 255–56 water clocks, 205 waterproofing, 282–83 waterwheels, 176–79, 178t–79t watt, 44 wax, 94, 137 weather prediction, 214 wedging, 159 weight in aircraft, 297 combustion and, 31 measurement of, 40–41 welding, 
245–46 wells, 160 wheat, 51, 76–77 wheelbarrow, 134, 134n wheel(s) for bicycles, 260–61, 263f pottery, 163n, 183 whey, 139 white dwarf star, 311 white mulberry, 77–78, 225n white willow, 78–79 wild cabbage, 79 Williams, Robin, 245 willow, 102 wind instruments, 342–43 windmills, 176–79, 178t–79t wind tunnel, 294 wine monks producing, 80n natural fermentation of, 62n, 68 in thermometers, 212 wine-making, 255f wings, 295–97, 295f, 297f wire electrical, 199 for solenoids, 280 wire drawing, 244, 244f wired telegraph, 278, 278n withdrawal method, 231t wolves, 85, 91–92 women’s liberation, 262 woodblock printing, 252 wool, 225 Woolf, Virginia, 135 wounds, 338–39 Wright brothers, 298n written language, 11t, 16–20 written materials, 247–57 written numbers, 24t XOR (gate), 366t, 367–70, 367f, 367t, 369f, 370f, 371f, 372, 372n xylophone, 341n yam, 79 Y-chromosomal Adam, 53 yeast, 141–42 yokes, 128 zero, 26t, 28 ABOUT THE AUTHOR Ryan North is the New York Times-bestselling author of Romeo and/or Juliet and To Be or Not To Be. He's the creator of Dinosaur Comics and the Eisner Award-winning writer of Adventure Time, Jughead, and The Unbeatable Squirrel Girl for Marvel Comics, and he has a master's in computational linguistics from the University of Toronto. Ryan lives in Toronto with his wife, Jenn, and their dog, Noam Chompsky. * And this is actually a simple example, dealing as it does in physical things like sisters and egged houses that can actually be imagined.

AI 2041: Ten Visions for Our Future
by Kai-Fu Lee and Qiufan Chen
Published 13 Sep 2021

A famous test of machine intelligence known as the Turing Test hinges on whether NLP conversational software is capable of fooling humans into thinking that it, too, is human. Scientists have been developing NLP to analyze, understand, and even generate human language for a long time. Starting in the 1950s, computational linguists attempted to teach natural language to computers according to naïve views of human language acquisition (starting from vocabulary sets, conjugation patterns, and grammatical rules). Recently, however, deep learning has superseded these early approaches. The reason, as you may have guessed, is that advances in deep learning have demonstrated the capability of modeling complex relationships and patterns in ways that are uniquely suitable for computers, and scalable with the increasing availability of very large training data sets.

pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence
by Ray Kurzweil
Published 31 Dec 1998

He cites the following sentence: “What number of products of products of products of products of products of products of products of products was the number of products of products of products of products of products of products of products of products?” as having 1,430 × 1,430 = 2,044,900 interpretations. 4 These and other theoretical aspects of computational linguistics are covered in Mary D. Harris, Introduction to Natural Language Processing (Reston, VA: Reston Publishing Co., 1985). CHAPTER 6: BUILDING NEW BRAINS ... 1 Hans Moravec is likely to make this argument in his 1998 book Robot: Mere Machine to Transcendent Mind (Oxford University Press; not yet available as of this writing). 2 One hundred fifty million calculations per second for a 1998 personal computer doubling twenty-seven times by the year 2025 (this assumes doubling both the number of components, and the speed of each component every two years) equals about 20 million billion calculations per second.
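The two figures in these notes can be checked with a few lines of Python. This is only a sketch: the excerpt gives the products 1,430 × 1,430 and "about 20 million billion" without showing how they were obtained. Reading 1,430 as the eighth Catalan number (the number of ways to bracket a chain of nine nouns joined by eight "of" attachments) is an assumption made here, not something the excerpt states, and the names `catalan`, `per_phrase`, `start_cps`, and `doublings` are illustrative.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number, i.e. the number of ways
    to fully bracket a sequence of n + 1 items."""
    return comb(2 * n, n) // (n + 1)


# Assumption: each "number of products of ... products" phrase has eight
# "of" attachments, so its bracketings are counted by catalan(8).
per_phrase = catalan(8)

# The sentence contains two such phrases, so the counts multiply.
interpretations = per_phrase * per_phrase

# Endnote 2: 150 million calculations per second, doubled 27 times.
start_cps = 150_000_000
doublings = 27
cps_2025 = start_cps * 2 ** doublings

print(f"{per_phrase:,} x {per_phrase:,} = {interpretations:,}")
print(f"{cps_2025:,} calculations per second (~{cps_2025 / 1e15:.1f} million billion)")
```

Under these assumptions the first line reproduces the 2,044,900 figure, and the second gives roughly 2 × 10^16, which matches "about 20 million billion" if a billion is taken as 10^9.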

pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection
by Jacob Silverman
Published 17 Mar 2015

In addition to fake accounts, people also post things that are intentionally insincere and misleading, including in their profiles, which further complicates the effort to divide people into the kinds of highly specific categories (e.g., single dads from major cities who don’t belong to gyms) that market researchers like. Of course, these analytical tools are getting better, incorporating the latest discoveries in computational linguistics and deep learning, a form of artificial intelligence in which computers are taught to understand colloquial speech and recognize objects (such as people’s faces). Some sentiment analysis software now applies several different filters to each piece of text in order to consider not only the tone and meaning of the utterance but also whether the source is reliable or somehow biased.

pages: 661 words: 187,613

The Language Instinct: How the Mind Creates Language
by Steven Pinker
Published 1 Jan 1994

Here are the more common ones: -able -age -al -an -ant -ance -ary -ate -ed -en -er -ful -hood -ic -ify -ion -ish -ism -ist -ity -ive -ize -ly -ment -ness -ory -ous -y In addition, English is free and easy with “compounding,” which glues two words together to form a new one, like toothbrush and mouse-eater. Thanks to these processes, the number of possible words, even in morphologically impoverished English, is immense. The computational linguist Richard Sproat compiled all the distinct words used in the forty-four million words of text from Associated Press news stories beginning in mid-February 1988. Up through December 30, the list contained three hundred thousand distinct word forms, about as many as in a good unabridged dictionary.

pages: 698 words: 198,203

The Stuff of Thought: Language as a Window Into Human Nature
by Steven Pinker
Published 10 Sep 2007

Only if the language interpreter in the brain stupidly interprets a phrase by looking for the intersection of each of its components, such as the things in the world that are both “knives” and “good things.” A more incisive interpreter could wiggle a probe inside the noun and prise out the component of meaning that is modified by good, sparing it from having to saddle the word good with dozens of meanings. What is this meaning component? The computational linguist James Pustejovsky argues that Aristotle got it right when he proposed that the mind understands every entity in terms of four causes: who or what brought it about; what it’s made of; what shape it has; and what it’s for.56 The interpreter for adjectives like good and fast (a good road, a fast road) and for verbs like begin (He began his sandwich, She began the book) digs into the part of the noun’s conceptual structure that specifies how the object is intended to be used (roads are for driving on, sandwiches are for eating, books for reading) and concludes that good and begin refer to that part.

pages: 647 words: 43,757

Types and Programming Languages
by Benjamin C. Pierce
Published 4 Jan 2002

New languages for querying and manipulating XML provide powerful static type systems based directly on these schema languages (Hosoya and Pierce, 2000; Hosoya, Vouillon, and Pierce, 2001; Hosoya and Pierce, 2001; Relax, 2000; Shields, 2001). A quite different application of type systems appears in the field of computational linguistics, where typed lambda-calculi form the basis for formalisms such as categorial grammar (van Benthem, 1995; van Benthem and Meulen, 1997; Ranta, 1995; etc.). [1]Static elimination of array-bounds checking is a long-standing goal for type system designers. In principle, the necessary mechanisms (based on dependent types—see §30.5) are well understood, but packaging them in a form that balances expressive power, predictability and tractability of typechecking, and complexity of program annotations remains a significant challenge.

pages: 913 words: 219,078

The Marshall Plan: Dawn of the Cold War
by Benn Steil
Published 13 Feb 2018

“Europe: The New Dark Continent.” March 18, 1945. New York Times Magazine. “No.1 Envoy to Europe.” September 21, 1947. Niculae, Vlad, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. “Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game.” Proceedings of the Association of Computational Linguistics. June 2015. http://vene.ro/betrayal/niculae15betrayal.pdf. Norman, Laurence, and Julian E. Barnes. “EU Pushes Deeper Defense Cooperation.” Wall Street Journal. September 13, 2016. Nover, Barnet. “It’s Up to Us: Investing in World Freedom.” Washington Post. March 13, 1947. ———. “Marshall vs.

pages: 846 words: 232,630

Darwin's Dangerous Idea: Evolution and the Meanings of Life
by Daniel C. Dennett
Published 15 Jan 1995

In much the way there was biology before Darwin — natural history and physiology and taxonomy and such — all united by Darwin into what we know as biology today, so there was linguistics before Chomsky. The contemporary scientific field of linguistics, with its subdisciplines of phonology, syntax, semantics, and pragmatics, its warring schools and renegade offshoots (computational linguistics in AI, for instance), its subdisciplines of psycholinguistics and neurolinguistics, grows out of various scholarly traditions going back to pioneer language sleuths and theorists from the Grimm brothers to Ferdinand de Saussure and Roman Jakobson, but it was all unified into a richly interrelated family of scientific investigations by the theoretical advances first proposed by one pioneer, Noam Chomsky.

pages: 394 words: 110,352

The Art of Community: Building the New Age of Participation
by Jono Bacon
Published 1 Aug 2009

Striving for Clarity As communication is critical to the success of your community, it is stunning how many communities just get it plain wrong. Many are plagued with long-winded, overly complex, and difficult-to-use communication channels, and it seems you need a degree in rocket science to understand how to join these channels. Then, it seems you need to go back to school to get a degree in computational linguistics to fit into the culture and expectations of these communication channels. Many have an unwritten rule book stapled to the side of the channel, and if you are unfamiliar with its scriptures, the response can be terse, forthright, and sometimes outright rude. This is the last thing you want.

pages: 918 words: 257,605

The Age of Surveillance Capitalism
by Shoshana Zuboff
Published 15 Jan 2019

Andrew Schwartz et al., “Predicting Individual Well-Being Through the Language of Social Media,” in Biocomputing 2016: Proceedings of the Pacific Symposium, 2016, 516–27, https://doi.org/10.1142/9789814749411_0047; H. Andrew Schwartz et al., “Extracting Human Temporal Orientation from Facebook Language,” Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015, http://www.academia.edu/15692796/Extracting_Human_Temporal_Orientation_from_Facebook_Language; David M. Greenberg et al., “The Song Is You: Preferences for Musical Attribute Dimensions Reflect Personality,” Social Psychological and Personality Science 7, no. 6 (2016): 597–605, https://doi.org/10.1177/1948550616641473; Michal Kosinski, David Stillwell, and Thore Graepel, “Private Traits and Attributes Are Predictable from Digital Records of Human Behavior,” Proceedings of the National Academy of Sciences of the United States of America 110, no. 15 (2013): 5802–5. 61.

pages: 1,263 words: 371,402

The Year's Best Science Fiction: Twenty-Sixth Annual Collection
by Gardner Dozois
Published 23 Jun 2009

He said, “Keep Sapphire frozen, and study your records of the Phites who first performed this boost. If they understood what they were doing, we can work it out too.” At the end of the week, Daniel signed the licensing deal and flew back to San Francisco. Lucien briefed him daily, and at Daniel’s urging hired a dozen new computational linguists to help with the problem. After six months, it was clear that they were getting nowhere. The Phites who’d invented the boost had had one big advantage as they’d tinkered with each other’s brains: it had not been a purely theoretical exercise for them. They hadn’t gazed at anatomical diagrams and then reasoned their way to a better design.