description: use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials
77 results
Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data
by
Dipanjan Sarkar
Published 1 Dec 2016
Automated Text Classification Text Classification Blueprint Text Normalization Feature Extraction Bag of Words Model TF-IDF Model Advanced Word Vectorization Models Classification Algorithms Multinomial Naïve Bayes Support Vector Machines Evaluating Classification Models Building a Multi-Class Classification System Applications and Uses Summary Chapter 5: Text Summarization Text Summarization and Information Extraction Important Concepts Documents Text Normalization Feature Extraction Feature Matrix Singular Value Decomposition Text Normalization Feature Extraction Keyphrase Extraction Collocations Weighted Tag–Based Phrase Extraction Topic Modeling Latent Semantic Indexing Latent Dirichlet Allocation Non-negative Matrix Factorization Extracting Topics from Product Reviews Automated Document Summarization Latent Semantic Analysis TextRank Summarizing a Product Description Summary Chapter 6: Text Similarity and Clustering Important Concepts Information Retrieval (IR) Feature Engineering Similarity Measures Unsupervised Machine Learning Algorithms Text Normalization Feature Extraction Text Similarity Analyzing Term Similarity Hamming Distance Manhattan Distance Euclidean Distance Levenshtein Edit Distance Cosine Distance and Similarity Analyzing Document Similarity Cosine Similarity Hellinger-Bhattacharya Distance Okapi BM25 Ranking Document Clustering Clustering Greatest Movies of All Time K-means Clustering Affinity Propagation Ward’s Agglomerative Hierarchical Clustering Summary Chapter 7: Semantic and Sentiment Analysis Semantic Analysis Exploring WordNet Understanding Synsets Analyzing Lexical Semantic Relations Word Sense Disambiguation Named Entity Recognition Analyzing Semantic Representations Propositional Logic First Order Logic Sentiment Analysis Sentiment Analysis of IMDb Movie Reviews Setting Up Dependencies Preparing Datasets Supervised Machine Learning Technique Unsupervised Lexicon-based Techniques Comparing Model Performances Summary Index Contents 
at a Glance About the Author About the Technical Reviewer Acknowledgments Introduction Chapter 1: Natural Language Basics Chapter 2: Python Refresher Chapter 3: Processing and Understanding Text Chapter 4: Text Classification Chapter 5: Text Summarization Chapter 6: Text Similarity and Clustering Chapter 7: Semantic and Sentiment Analysis Index About the Author and About the Technical Reviewer About the Author Dipanjan Sarkar is a data scientist at Intel, the world’s largest silicon company, which is on a mission to make the world more connected and productive.
…
The key aspect of sentiment analysis is to analyze a body of text to understand the opinion it expresses, along with other factors like mood and modality. Usually sentiment analysis works better on text that has a subjective context than on text with only an objective context. This is because when a body of text has an objective context or perspective to it, the text usually depicts some normal statements or facts without expressing any emotion, feelings, or mood. Subjective text contains text that is usually expressed by a human having typical moods, emotions, and feelings. Sentiment analysis is widely used, especially as a part of social media analysis for any domain, be it a business, a recent movie, or a product launch, to understand its reception by the people and what they think of it based on their opinions or, you guessed it, sentiment.
…
I encourage you to experiment with more propositions and FOL expressions by building your own assumptions, domain, and rules. Sentiment Analysis We will now discuss several concepts, techniques, and examples with regard to our second major topic in this chapter, sentiment analysis. Textual data, even though unstructured, mainly has two broad types of data points: factual based (objective) and opinion based (subjective). We briefly talked about these two categories at the beginning of this chapter when I introduced the concept of sentiment analysis and how it works best on text that has a subjective context. In general, social media, surveys, and feedback data all are heavily opinionated and express the beliefs, judgement, emotion, and feelings of human beings.
Terms of Service: Social Media and the Price of Constant Connection
by
Jacob Silverman
Published 17 Mar 2015
This insight might be useful for therapists, doctors, and public health professionals, but the company’s CEO told the Wall Street Journal that he drew on this information to advise drug companies in their ad targeting. The most likely application of sentiment analysis, then, is to give a slight edge to hedge funds and advertisers. At the very least, a gaggle of digital media consultants are pulling down hefty fees selling these services to deep-pocketed corporate clients. But what happens when sentiment analysis is not just spilling out reports for an executive’s consumption but is actually linked to potentially vital systems? And what happens then if a network becomes seeded with misinformation?
…
To become part of the social web, then, is to join the networks of surveillance, tracking, and data circulation that now support a vast informational economy and increasingly shape our social and cultural lives. Few aspects of contemporary life have gone unaffected by this shift, by the ability to publish immediately, freely, and to a massive audience. Shareability, and the drive to rack up likes and other metrics, guides the agendas of magazine editors and the budgets of marketers. Sentiment analysis—the mining of social-network data to determine the attitudes of individuals or whole populations—helps intelligence analysts learn where potential extremists are becoming radicalized. Advertisers collect social-media data and form consumer profiles with tens of thousands of pieces of information.
…
These companies will tinker with policies, especially after every public outrage and class-action lawsuit, but the end point remains the same: to retain rights over your data and expressions, and to make the transition from a status update to a related, paid advertisement as smooth as possible. CONVERTING EMOTIONS INTO PROFITABLE DATA Like buttons and taggable emotions are just two features of what has become a like economy, which depends on the growth of sentiment analysis, the examination of huge data sets to find out how people are reacting to news, products, or the events of their own lives. Retailers and advertisers want to know what individual consumers are thinking and buying, but they, along with investors, banks, consultants, and others, also want to be able to take the pulse of public opinion.
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
by
Seth Stephens-Davidowitz
Published 8 May 2017
The most negative words include “sad,” “death,” and “depression.” They thus have built an index of the mood of a huge set of words. Using this index, they can measure the average mood of words in a passage of text. If someone writes “I am happy and in love and feeling awesome,” sentiment analysis would code that as extremely happy text. If someone writes “I am sad thinking about all the world’s death and depression,” sentiment analysis would code that as extremely sad text. Other pieces of text would be somewhere in between. So what can you learn when you code the mood of text? Facebook data scientists have shown one exciting possibility. They can estimate a country’s Gross National Happiness every day.
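The word-index approach described in this excerpt can be illustrated with a minimal Python sketch. The scores in `WORD_MOOD` are invented for illustration (the excerpt gives no numbers), and `average_mood` is a hypothetical helper name, not something from the book.

```python
import re

# Hypothetical mood scores on a 1 (saddest) to 9 (happiest) scale.
# These values are made up for illustration; a real index would come
# from many human raters coding tens of thousands of words.
WORD_MOOD = {
    "happy": 8.3, "love": 8.4, "awesome": 7.9,
    "sad": 2.4, "death": 1.5, "depression": 2.0,
}


def average_mood(text, lexicon=WORD_MOOD):
    """Return the mean score of the words found in the lexicon, or None if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)


happy = average_mood("I am happy and in love and feeling awesome")
sad = average_mood("I am sad thinking about all the world's death and depression")
print(happy, sad)
```

Words missing from the index are simply skipped, so a passage with no indexed words gets no score rather than a neutral one.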
…
But, once upon a time, not so long ago, human beings read stories, sometimes in books. Sentiment analysis can teach us a lot here, too. A team of scientists, led by Andy Reagan, now at the University of California at Berkeley School of Information, downloaded the text of thousands of books and movie scripts. They could then code how happy or sad each point of the story was. Consider, for example, the book Harry Potter and the Deathly Hallows. Here, from that team of scientists, is how the mood of the story changes, along with a description of key plot points. Note that the many rises and falls in mood that the sentiment analysis detects correspond to key events.
…
I call this graphic “Drink. Work. Pray.” In people’s teens, they’re drinking. In their twenties, they are working. In their thirties and onward, they are praying. [Figure: “Drink. Work. Pray.” Panels: 19- to 22-year-olds; 23- to 29-year-olds; 30- to 65-year-olds.] A powerful new tool for analyzing text is something called sentiment analysis. Scientists can now estimate how happy or sad a particular passage of text is. How? Teams of scientists have asked large numbers of people to code tens of thousands of words in the English language as positive or negative. The most positive words, according to this methodology, include “happy,” “love,” and “awesome.”
Applied Artificial Intelligence: A Handbook for Business Leaders
by
Mariya Yao
,
Adelyn Zhou
and
Marlene Jia
Published 1 Jun 2018
Retrieved from http://www.businessinsider.com/google-ai-images-raise-100000-at-auction-2016-2 (15) Goleman, D. (2008, March 24). When Emotional Intelligence Does Not Matter More Than IQ. Retrieved from http://www.danielgoleman.info/whenemotional-intelligence-does-not-matter-more-than-iq (16) Sentiment analysis. (n.d.). In Wikipedia. Retrieved on November 17, 2017, from http://en.wikipedia.org/wiki/Sentiment_analysis (17) Knight, W. (2016, June 13). Emotional intelligence might be a virtual assistant’s secret weapon. MIT Technology Review. Retrieved from http://www.technologyreview.com/s/601654/amazon-working-on-making-alexarecognize-your-emotions/ (18) Talbot, D. (2014, September 19).
…
In 2016, Google hosted an exhibition of AI-generated art that collectively sold for $97,605.(14) Systems That Relate Daniel Goleman, a psychologist and author of the book Emotional Intelligence, believes that our emotional intelligence quotient (EQ) is more important than our intelligence quotient (IQ) in determining our success and happiness.(15) As human employees increasingly collaborate with AI tools at work and digital assistants like Apple’s Siri and Amazon Echo’s Alexa permeate our personal lives, machines will also need emotional intelligence to succeed in our society. Sentiment analysis, also known as opinion mining or emotion AI, extracts and quantifies emotional states from our text, voice, facial expressions, and body language.(16) Knowing a user’s affective state enables computers to respond empathetically and dynamically, as our friends do. The applications to digital assistants are obvious, and companies like Amazon are already prioritizing emotional recognition for voice products like the Echo.(17) Emotional awareness can also improve interpersonal business functions such as sales, marketing, and communications.
…
Data Connection Does the prospective product offer seamless connections with the other enterprise tools on which you depend, such as your data and analytics provider or CRM system? Is the integration built-in, and if so, is it offered via an application programming interface (API) or platform? If not, will it require custom development? Language Support If you’re working on a consumer-facing global product, such as a conversational agent or sentiment analysis, your solution may need to support additional languages. How many languages and types of voices does the prospective product support? Professional Support Most AI systems will need to be continually trained and updated. How accessible and competent is the vendor’s professional services team to help onboard and maintain your AI system?
Succeeding With AI: How to Make AI Work for Your Business
by
Veljko Krunic
Published 29 Mar 2020
(That part of AI which determines whether a customer is happy or not is technically called sentiment analysis.) You already have an AI software library that performs sentiment analysis. The data is in your customer support system, which is a web application. Answer to question 1: More than one result is an acceptable answer to this question; after all, there’s no universal ML pipeline that works the best in all cases! Figure B.1 shows one ML pipeline I’d start with. [Figure B.1: ML pipeline for sentiment analysis of the customer feedback. Elements: web app handling customer support; support cases database; sentiment analysis; alerting system.] Question 2: Suppose you implement the ML pipeline from the previous example in your organization.
…
Question 1: Construct an ML pipeline for this AI project: the project takes feedback from your customers and analyzes it. If a customer appears unhappy, an alert is issued so that you can contact the customer and try to appease them before they decide to leave. (That part of AI which determines whether a customer is happy or not is technically called sentiment analysis.) You already have an AI software library that performs sentiment analysis. The data is in your customer support system, which is a web application. Question 2: Suppose you implement the ML pipeline from the previous example in your organization. Which departments would be responsible for the implementation of which parts of the pipeline?
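One possible shape for the pipeline in Question 1 is sketched below in Python. It is only an illustration under stated assumptions: `SupportCase`, `customers_to_alert`, the `threshold` value, and the keyword-based `toy_sentiment` scorer are all hypothetical stand-ins, and in practice the scorer would be the existing sentiment analysis library the question mentions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SupportCase:
    """One feedback record pulled from the customer support system (hypothetical schema)."""
    customer_id: str
    feedback: str


def toy_sentiment(text: str) -> float:
    """Stand-in scorer returning a value in [-1, 1]; a real library would replace this."""
    positive = {"great", "thanks", "love"}
    negative = {"terrible", "refund", "cancel"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return (pos - neg) / max(1, pos + neg)


def customers_to_alert(
    cases: Iterable[SupportCase],
    score_sentiment: Callable[[str], float],
    threshold: float = -0.5,  # assumed cut-off for "appears unhappy"
) -> List[str]:
    """Return the IDs of customers whose feedback scores below the threshold."""
    return [c.customer_id for c in cases if score_sentiment(c.feedback) < threshold]


cases = [
    SupportCase("c1", "Thanks, the new release is great!"),
    SupportCase("c2", "Terrible service. I want a refund and will cancel."),
    SupportCase("c3", "How do I reset my password?"),
]
alerts = customers_to_alert(cases, toy_sentiment)
print(alerts)
```

Passing the scorer in as a function keeps the data-access, scoring, and alerting stages separable, which is useful when different departments own different parts of the pipeline.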
…
You have to run all your changes by a regulator, and changes are evaluated (almost exclusively) based on legal compliance, with a typical change taking five years to be approved. You plan to use AI to understand online customer feedback and your customers’ satisfaction. The technical term for this process is sentiment analysis. Question 6: What are some problems with the following proposal? We’ll use this AI and feed it patterns of our customer behavior, and it will reveal to us the causes of our customers’ decisions. Question 7: You’re working in a domain in which it isn’t easy to define business metrics that you can use to measure the business result.
When McKinsey Comes to Town: The Hidden Influence of the World's Most Powerful Consulting Firm
by
Walt Bogdanich
and
Michael Forsythe
Published 3 Oct 2022
Saudi Arabia’s population is one of the youngest in the world and one of the most engaged on platforms like Twitter and Facebook. A new technique—“sentiment analysis”—mined social media posts for keywords, allowing companies to measure attitudes about their products. McKinsey got excited about the technique, mentioning it in multiple reports. So did the Saudis; a group of Saudi scholars called it “opinion mining.” The Saudis latched onto the fact that sentiment analysis had potential way beyond determining how people felt about their pizza delivery experience. In a country like Saudi Arabia, where it seemed everyone was chatting on Facebook, Instagram, or Twitter, it could be used by the government to take the public’s temperature and smoke out influential malcontents.
…
Around the same time McKinsey was working alongside SCL, one of its Dubai-based senior partners, Enrico Benni, was seeking out people across the firm for a new potential Saudi assignment: to perform sentiment analysis in Arabic, one former employee said. McKinsey’s Saudi-based employees highlighted this work on their public profiles. One Saudi-based Elixir employee, Ahmad Alattas, listed “social media monitoring to conduct and study public sentiment analysis” among his jobs. Soon, this new line of work yielded results, but perhaps not the kind McKinsey had in mind. * * * — In early 2018 a person from Saudi Arabia phoned Omar Abdulaziz to check up on him.
…
“I thought, ‘Oh, that’s great,’ ” Abdulaziz recalled more than two years later. “In the beginning I didn’t know that it would be such an important thing. I couldn’t imagine how MBS would be interested in my work. So I thought nothing would happen.” The banal title of the nine-page report, “Austerity Measures in Saudi Arabia,” belies its explosive content. It was sentiment analysis: weaponized. “Omar has a multitude of negative tweets on topics such as austerity and the royal decrees,” read one McKinsey bullet point. That May, Saudi emissaries traveled to Montreal to urge Abdulaziz to return to his homeland. As a young, hip YouTube star, he’d be a celebrity, they told him.
Nervous States: Democracy and the Decline of Reason
by
William Davies
Published 26 Feb 2019
We communicate them physically in our facial expressions and body language. They tell us important things about our relationships, lifestyles, desires and identities. Feelings of this sort present themselves to our minds, such that we actually notice them, even if we can’t control them. Emotions can now be captured and algorithmically analyzed (“sentiment analysis”) thanks to the behavioral data that digital technologies collect. And yet feelings of this sort are not welcome everywhere. In public life, an accusation of being “emotional” traditionally carries the implication that someone has lost objectivity and given way to irrational forces. Feelings are how we orient ourselves, while also providing a reminder of shared humanity.
…
As more of our behavior and communication is digitally captured, and with rapid advances in “emotional artificial intelligence” (or “affective computing”), it is becoming possible to study the movement of emotions and sentiments through crowds with increasing scientific precision. Techniques of digital “sentiment analysis,” algorithmically trained upon social media content, facial movements, and other bodily cues, are taking Le Bon’s biological approach to psychology, and turning it into a whole industry of market research. The emotional content of a tweet, eye movement, or tone of voice can now be captured and analyzed.
…
Unlike statisticians or social scientists, Silicon Valley is not seeking to create an accurate portrait of society, but to provide the infrastructure on which we all depend, which will then capture our movements and sentiments with the utmost sensitivity. Advances in machine-learning techniques have improved sensitivity beyond that of human consciousness. “Sentiment analysis” involves training algorithms to detect different types of emotion in a given sentence, and can be used to monitor the emotions being expressed on Twitter, Facebook, email, or (due to voice-recognition technology) phones. “Facial analytics” does something similar to detect how someone might be feeling from the movements in their face, and can now apparently be used to detect a person’s sexuality.11 The entire field of “affective computing,” which is transforming market research, uses machine learning to enable computers to identify emotions by means of body language and behavior.
Python Data Analytics: With Pandas, NumPy, and Matplotlib
by
Fabio Nelli
Published 27 Sep 2018
doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">\r\n<html>\r\n<hea'
Now, however, the conversion into NLTK corpus requires an additional library, bs4 (BeautifulSoup), which provides you with suitable parsers that can recognize HTML tags and extract the text contained in them.
    # Assumes nltk is already imported and html holds the page downloaded earlier.
    from bs4 import BeautifulSoup
    raw = BeautifulSoup(html, "lxml").get_text()   # strip the tags, keep the text
    tokens = nltk.word_tokenize(raw)               # split the text into tokens
    text = nltk.Text(tokens)                       # wrap the tokens as an NLTK Text
Now you also have a corpus in this case, even if you often have to perform more complex cleaning operations than the previous case to eliminate the words that do not interest you.
Sentimental Analysis
Sentimental analysis is a new field of research that has developed very recently in order to evaluate people’s opinions about a particular topic. This discipline is based on different techniques that use text analysis, and its field of work is the world of social media and forums (opinion mining). Thanks to comments and reviews by users, sentimental analysis algorithms can evaluate the degree of appreciation or evaluation based on certain keywords. This degree of appreciation is called opinion and has three possible values: positive, neutral, or negative.
…
Population in 2014 Conclusions Chapter 12: Recognizing Handwritten Digits Handwriting Recognition Recognizing Handwritten Digits with scikit-learn The Digits Dataset Learning and Predicting Recognizing Handwritten Digits with TensorFlow Learning and Predicting Conclusions Chapter 13: Textual Data Analysis with NLTK Text Analysis Techniques The Natural Language Toolkit (NLTK) Import the NLTK Library and the NLTK Downloader Tool Search for a Word with NLTK Analyze the Frequency of Words Selection of Words from Text Bigrams and Collocations Use Text on the Network Extract the Text from the HTML Pages Sentimental Analysis Conclusions Chapter 14: Image Analysis and Computer Vision with OpenCV Image Analysis and Computer Vision OpenCV and Python OpenCV and Deep Learning Installing OpenCV First Approaches to Image Processing and Analysis Before Starting Load and Display an Image Working with Images Save the New Image Elementary Operations on Images Image Blending Image Analysis Edge Detection and Image Gradient Analysis Edge Detection The Image Gradient Theory A Practical Example of Edge Detection with the Image Gradient Analysis A Deep Learning Example: The Face Detection Conclusions Appendix A: Writing Mathematical Expressions with LaTeX With matplotlib With IPython Notebook in a Markdown Cell With IPython Notebook in a Python 2 Cell Subscripts and Superscripts Fractions, Binomials, and Stacked Numbers Radicals Fonts Accents Appendix B: Open Data Sources Political and Government Data Health Data Social Data Miscellaneous and Public Data Sets Financial Data Climatic Data Sports Data Publications, Newspapers, and Books Musical Data Index About the Author and About the Technical Reviewer About the Author Fabio Nelli is a data scientist and Python consultant, designing and developing Python applications for data analysis and visualization.
…
Analyzing these texts has therefore become a source of enormous interest, and there are many techniques that have been introduced for this purpose, creating a real discipline in itself. Some of the more important techniques are the following:
Analysis of the frequency distribution of words
Pattern recognition
Tagging
Analysis of links and associations
Sentiment analysis
The Natural Language Toolkit (NLTK)
If you program in Python and want to analyze data in text form, one of the most commonly used tools at the moment is the Python Natural Language Toolkit (NLTK). NLTK is nothing more than a Python library (https://www.nltk.org/) in which there are many tools specialized in processing and text data analysis.
Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps
by
Valliappa Lakshmanan
,
Sara Robinson
and
Michael Munn
Published 31 Oct 2020
To convert the data from timestamp to day of the week, you’ll need to do some data preprocessing. This preprocessing step can also be referred to as data transformation. An instance is an item you’d like to send to your model for prediction. An instance could be a row in your test dataset (without the label column), an image you want to classify, or a text document to send to a sentiment analysis model. Given a set of features about the instance, the model will calculate a predicted value. In order to do that, the model is trained on training examples, which associate an instance with a label. A training example refers to a single instance (row) of data from your dataset that will be fed to your model.
…
Accurate data labels are just as important as feature accuracy. Your model relies solely on the ground truth labels in your training data to update its weights and minimize loss. As a result, incorrectly labeled training examples can cause misleading model accuracy. For example, let’s say you’re building a sentiment analysis model and 25% of your “positive” training examples have been incorrectly labeled as “negative.” Your model will have an inaccurate picture of what should be considered negative sentiment, and this will be directly reflected in its predictions. To understand data completeness, let’s say you’re training a model to identify cat breeds.
…
Although it depends on your data and prediction task, when we say “small dataset” here, we’re referring to datasets with hundreds or a few thousand training examples. Another factor to take into account when deciding whether to fine-tune is how similar your prediction task is to that of the original pre-trained model you’re using. When the prediction task is similar or a continuation of the previous training, as it was in our movie review sentiment analysis model, fine-tuning can produce higher-accuracy results. When the task is different or the datasets are significantly different, it’s best to freeze all the layers of the pre-trained model instead of fine-tuning them. Table 4-1 summarizes the key points.4
Table 4-1. Criteria to help choose between feature extraction and fine-tuning
Criterion | Feature extraction | Fine-tuning
How large is the dataset?
The Happiness Industry: How the Government and Big Business Sold Us Well-Being
by
William Davies
Published 11 May 2015
There are surely ample political and material problems to deal with right now, before we divert quite so much attention towards the mental and neural conditions through which we individually experience them. There is also a sense that when the doyens of the World Economic Forum seize an agenda with so much gusto, there is at least some cause for suspicion. The mood-tracking technologies, sentiment analysis algorithms and stress-busting meditation techniques are put to work in the service of certain political and economic interests. They are not simply gifted to us for our own Aristotelian flourishing. Positive psychology, which repeats the mantra that happiness is a personal ‘choice’, is as a result largely unable to provide the exit from consumerism and egocentricity that its gurus sense many people are seeking.
…
Companies such as Nike are now exploring ways in which health and fitness products can be sold alongside quantified self apps, which will allow individuals to make constant reports of their behaviour (such as jogging), generating new data sets for the company in the process. There is a third development, the political and philosophical implications of which are potentially the most radical of all. This concerns the capability to ‘teach’ computers how to interpret human behaviour in terms of the emotions that are conveyed. For example, the field of ‘sentiment analysis’ involves the design of algorithms to interpret the sentiment that is expressed in a given sentence, for example, a single tweet. The MIT Affective Computing research centre is dedicated to exploring new ways in which computers might read people’s moods through evaluating their facial expressions, or might carry out ‘emotionally intelligent’ conversations with people, to provide them with therapeutic support or friendship.
…
There are those who possess the power of algorithmic analysis and data mining to navigate a world in which there are too many pieces of data to be studied individually. These include market research agencies, social media platforms and the security services. But for the rest of us, impulse and emotion have become how we orientate and simplify our decisions. Hence the importance of fMRI and sentiment analysis in the digital age: tools which visualize, measure and codify our feelings become the main conduit between an esoteric, expert discourse of mathematics and facts, and a layperson’s discourse of mood, mystical belief and feeling. ‘We’ simply feel our way around, while ‘they’ observe and algorithmically analyse the results.
Finding Alphas: A Quantitative Approach to Building Trading Strategies
by
Igor Tulchinsky
Published 30 Sep 2019
Look-Ahead Bias in Machine Learning Researchers using machine learning techniques also can introduce look-ahead bias. In particular, they may tune some hyper-parameters on the entire data sample and then use those parameters in the backtest. Hyper-parameters should always be tuned using only backward-looking data. Similarly, in the area of sentiment analysis, researchers should take note of vendor-supplied sentiment dictionaries that may have been trained on forward-looking data. Data Mining A researcher may data mine a signal by tinkering with its construction until it has favorable in-sample performance; this is commonly called overfitting. The standard approach to controlling data mining involves a holdout, which withholds data in the simulation and takes one of two broad forms: a time-series holdout or an asset holdout.
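A small Python sketch may help make the time-series holdout concrete. Everything here is illustrative: the function names, the toy `momentum_score` objective, the dates, and the return series are assumptions made for the example, not taken from the book. The point is only that the hyper-parameter (a lookback window) is chosen from data strictly before the cutoff.

```python
from datetime import date


def time_series_holdout(dates, cutoff):
    """Split observation indices into in-sample (before cutoff) and holdout (on or after)."""
    in_sample = [i for i, d in enumerate(dates) if d < cutoff]
    holdout = [i for i, d in enumerate(dates) if d >= cutoff]
    return in_sample, holdout


def momentum_score(past_returns, lookback):
    """Toy objective: total return from going long after a positive trailing sum."""
    total = 0.0
    for t in range(lookback, len(past_returns)):
        if sum(past_returns[t - lookback:t]) > 0:
            total += past_returns[t]
    return total


def tune_on_past(returns, in_sample, candidates, score):
    """Pick the candidate hyper-parameter using only the in-sample observations."""
    past = [returns[i] for i in in_sample]
    return max(candidates, key=lambda c: score(past, c))


# Made-up monthly data for illustration only.
dates = [date(2019, m, 1) for m in range(1, 9)]
returns = [0.01, 0.02, -0.01, 0.03, -0.02, 0.01, 0.04, -0.03]

in_sample, holdout = time_series_holdout(dates, cutoff=date(2019, 6, 1))
best_lookback = tune_on_past(returns, in_sample, candidates=[1, 2, 3], score=momentum_score)
print(in_sample, holdout, best_lookback)
```

Because `tune_on_past` never receives the holdout indices, the withheld returns cannot leak into the choice of lookback; the holdout is then used once, to evaluate the chosen setting.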
…
In the pursuit of alphas, the meaningful interpretation and analysis of financial statements can be a solid basis for informed investment decisions. 20 Fundamental Analysis and Alpha Research By Xinye Tang and Kailin Qi Along with techniques such as pairs trading, momentum investing, event-driven investing, and news sentiment analysis, fundamental analysis is an important tool used in designing quantitative alphas. By examining relevant economic and financial factors, fundamental analysts attempt to reveal a security’s value and determine whether it is undervalued or overvalued. A potentially profitable portfolio can then be constructed by going long the relatively undervalued securities and/or going short the overvalued ones.
…
Since then, key research areas include the prediction power of various forms of social media; social media applied to individual stocks; the discussion of noise in social media; finding valuable tweets by observing retweets and tweets from celebrities; and social media sentiment with long-term firm value. SENTIMENT Simply speaking, sentiment measures the quality of news. The most basic definition of sentiment is the polarity of the news: good, bad, or neutral. Advanced sentiment analysis can express more sophisticated emotional details, such as “anger,” “surprise,” or “beyond expectations.” The construction of news sentiment usually involves natural language processing and statistical/machine learning algorithms (for example, naive Bayes and support vector machines).
Bad Data Handbook
by
Q. Ethan McCallum
Published 14 Nov 2012
I knew unethical people would lie in online reviews in order to inflate ratings or attack competitors, but what I didn’t know, and only learned by accident, is that individuals will sometimes write reviews that completely contradict their associated rating, without any regard to how it affects a business’s online reputation. And often this is for businesses that an individual likes. How did I learn this? By using ratings and reviews to create a sentiment corpus, I trained a sentiment analysis classifier that could reliably determine the sentiment of a review. While evaluating this classifier, I discovered that it could also detect discrepancies between the review sentiment and the corresponding rating, thereby finding liars and confused reviewers. Here’s the whole story of how I used text classification to identify an unexpected source of bad data...
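The discrepancy check described here can be sketched in a few lines of Python. This is an assumption-laden illustration: `find_mismatches`, the `high`/`low` star thresholds, and the keyword-based `toy_classify` stand-in are hypothetical, and the author's trained classifier would take the place of the stand-in.

```python
def toy_classify(text):
    """Stand-in for a trained sentiment classifier; returns 'pos' or 'neg'."""
    negative_markers = ("terrible", "awful", "rude")
    return "neg" if any(m in text.lower() for m in negative_markers) else "pos"


def find_mismatches(rated_reviews, classify, high=4, low=2):
    """Return reviews whose predicted sentiment contradicts their star rating."""
    mismatches = []
    for stars, text in rated_reviews:
        predicted = classify(text)
        if (stars >= high and predicted == "neg") or (stars <= low and predicted == "pos"):
            mismatches.append((stars, predicted, text))
    return mismatches


# Invented reviews for illustration only.
reviews = [
    (5, "Terrible wait and rude staff"),
    (1, "Best tacos in town"),
    (5, "Great coffee"),
    (3, "It was fine"),
]
suspects = find_mismatches(reviews, toy_classify)
print(suspects)
```

Mid-scale ratings are left out of the comparison because neither label clearly contradicts them.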
…
We wanted to do this for our data, as well as aggregate the overall positive sentiment from all the reviews for a business, independent of any average rating. With that in mind, I figured I could create a sentiment classifier,[11] using rated reviews as a training corpus. A classifier works by taking a feature set and determining a label. For sentiment analysis, a feature set is a piece of text, like a review, and the possible labels can be pos for positive text, and neg for negative text. Such a sentiment classifier could be run over a business’s reviews in order to calculate an overall sentiment, and to make up for any missing rating information.
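To make "a feature set in, a label out" concrete, here is a from-scratch naive Bayes sketch in Python. It is not the author's implementation (the excerpt does not say which library or algorithm was used); the class name, the bag-of-words features, and the four training sentences are all assumptions for illustration. It expects at least one training example per label.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Turn a piece of text into a feature set: lowercase word -> count."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class NaiveBayesSentiment:
    """Tiny multinomial naive Bayes with add-one smoothing over 'pos' and 'neg'."""

    def __init__(self):
        self.word_counts = {"pos": Counter(), "neg": Counter()}
        self.doc_counts = Counter()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.word_counts[label].update(bag_of_words(text))
            self.doc_counts[label] += 1

    def classify(self, text):
        vocab = set(self.word_counts["pos"]) | set(self.word_counts["neg"])
        total_docs = sum(self.doc_counts.values())
        features = bag_of_words(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for word, n in features.items():
                if word in vocab:
                    score += n * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training reviews for illustration only.
clf = NaiveBayesSentiment()
clf.train([
    ("great food and friendly staff", "pos"),
    ("loved it, great service", "pos"),
    ("terrible food and rude staff", "neg"),
    ("awful service, never again", "neg"),
])
label_a = clf.classify("great friendly service")
label_b = clf.classify("rude and awful")
print(label_a, label_b)
```

Running such a classifier over all of a business's reviews and counting the `pos` share would give one simple version of the overall sentiment the excerpt describes.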
…
The assumption behind this is that high rated reviews will have positive language, and low rated reviews will have more negative language. Polarized language is ideal for text classification, because the classifier can learn much more precisely those words that indicate pos and those words that indicate neg. Because I needed sentiment analysis for local businesses, not movies, I used a similar method to create my own sentiment training corpus for local business reviews. From a selection of businesses, I produced a corpus where the pos text came from 5 star reviews, and the neg text came from 1 star reviews. I actually started by using both 4 and 5 star reviews for pos, and 1 and 2 star reviews for neg, but after a number of training experiments, it was clear that the 2 and 4 star reviews had less polarizing language, and therefore introduced too much noise, decreasing the accuracy of the classifier.
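The corpus construction described above amounts to a filter over rated reviews, sketched below in Python. The function name, the `(stars, text)` tuple layout, and the sample reviews are assumptions for illustration; the default arguments mirror the final choice in the excerpt (5 stars for pos, 1 star for neg), and widening them reproduces the noisier first attempt.

```python
def build_sentiment_corpus(rated_reviews, pos_stars=(5,), neg_stars=(1,)):
    """Label reviews from their star ratings, dropping ratings outside both sets."""
    corpus = []
    for stars, text in rated_reviews:
        if stars in pos_stars:
            corpus.append((text, "pos"))
        elif stars in neg_stars:
            corpus.append((text, "neg"))
    return corpus


# Invented reviews for illustration only.
rated_reviews = [
    (5, "Fantastic brunch, will be back"),
    (4, "Pretty good overall"),
    (2, "Not great, a bit slow"),
    (1, "Worst haircut of my life"),
]

strict = build_sentiment_corpus(rated_reviews)
loose = build_sentiment_corpus(rated_reviews, pos_stars=(4, 5), neg_stars=(1, 2))
print(len(strict), len(loose))
```

Comparing classifier accuracy when trained on `strict` versus `loose` is one way to rerun the kind of experiment the excerpt mentions.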
Grouped: How Small Groups of Friends Are the Key to Influence on the Social Web
by
Paul Adams
Published 1 Nov 2011
The good news is that research has shown that when businesses are transparent about what data they have on people, and people have control over that data, they tell advertisers more about themselves.10 If trustworthiness and expertise are requirements for credibility, then transparency is becoming increasingly critical for building trustworthiness. Why negative comments are good for your brand The emergence of the social web means that more people are talking openly about businesses, and many businesses are nervous about any negative commentary. Most want sentiment analysis in the advertising products they use so they can hide the negative comments and only promote the positive comments. But this is the wrong approach. People can easily differentiate between a natural conversation and something that is controlled, and they won’t react well to the latter. Hiding negative comments is not transparent; it will dramatically decrease credibility.
…
Quick Tips
Building credibility with a business is similar to building trust with someone you just met. It is a slow process, often taking months and even years, and marketers need to be patient. There is no quick solution to creating a credible brand. One way to fast-track it is to be recommended by people’s friends. Don’t use sentiment analysis to filter out negative comments, and don’t delete negative comments on your Facebook page. Look at it as an opportunity to learn and respond. If people have something negative to say, it’s because they had a poor experience with your brand. This is something you should want to rectify rather than hide.
…
See social networks New York Times 19 News Feed 134, 135 Nickerson, Raymond 127 nonconscious brain 107–111 decision making by 103–104, 107, 109–110, 148 processing capacity of 107, 108 Nordgren, Loran 115 Nudge (Thaler and Sunstein) 97 O On Intelligence (Hawkins) 114 100 Things Every Designer Needs to Know About People (Weinschenk) 114 overconfidence 96 Owyang, Jeremiah 69, 144 P Pahl, Ray 52, 55, 66, 67 passive sharing 138 patterns 105, 110, 114 Pedigree community 122 Penenberg, Adam 67, 144 permission marketing 12, 14, 133–138 friends and 137–138, 143 word of mouth and 135–137 Permission Marketing (Godin) 14, 143 personal information 139–140 Persuasive Technology (Fogg) 98 photos, Facebook 3, 4 Politics of Happiness, The (Bok) 27 polls, business 22 Predictably Irrational (Ariely) 98, 128, 144 predictions 105 preferential attachment 32 priming 125 problem-solving 105 Proctor & Gamble 109, 121 public ratings 26 push marketing 137 R rational thinking 102–104 reductive thinking 102 relationships changes in 66 patterns of 55–58 strong ties 53, 54, 59–62 types of 52–54 uniqueness of 52 weak ties 53, 54, 62–65 relevance 138 reputation management 17 Rethinking Friendships (Spencer and Pahl) 67 S Salganik, Matthew 98 Science of Influence, The (Hogan) 115, 144 Searching for a Corporate Savior (Khurana) 82 sentiment analysis 140, 142 Sephora marketing campaign 18 serendipitous audience 25 Sernovitz, Andy 68 sharing feelings 19 information 41, 146 passive 138 similarity bias 118 Simon, Herbert 98 Simonson, Itamar 126, 128 six degrees of separation 43–44, 73 Six Degrees (Watts) 49, 82 Smart Lists 32 Social Animal, The (Brooks) 48 social behavior 150 social bonds 16–17, 18 social cognitive theory 128 social networks communication patterns on 23–24 consumer behavior and 106 decision making using 90–93 evolution of 31–32 groups connected through 39 historical overview of 9, 146 importance of understanding 150 influence within 94–95 information communicated on 24–26 pattern of 
connections in 33–35, 47 strong ties on 23, 60–61 structure of 30–35, 42–46, 81, 147–148 social norms 88 social proof 86–89 social web future of 149–151 how to think of 8 importance of 11–12 next great challenge on 93 summary points about 146–149 society, influence of 87–88 soulmates 53 Spencer, Liz 52, 55, 66, 67 Sponsored Stories 142 status updates 16–17 Strangers to Ourselves (Wilson) 99 strong ties 53, 54, 59–62 average number of 60 buying decisions and 61–62 communications with 60–61 disproportionate influence of 61, 147 importance of having 59 structure of social networks 30–35 connection patterns and 33–35 homophily principle and 32, 45–46 idea spreading and 76, 147–148 influence related to 42–46 laws governing 31–32 Stumbling on Happiness (Gilbert) 115 Sunstein, Cass 97 Surowiecki, James 92, 98 survival mechanism 16 sympathy group 34 T tagging photos 3 Target, poll example 22 targeted ads 80, 138, 139 technology human behavior and 9–10 interruption marketing and 130 Tetlock, Philip 95, 99 Thaler, Richard 97 Think Outside In blog 153 thinking rational 102–104 understanding of 151 three degrees of separation 43, 45, 46, 94 Ticketmaster 35 Tipping Point, The (Gladwell) 11, 14, 72, 82 transparency 139–141 trust building 139–142 levels of 91, 93 marketing and 131, 137 Twitter 73 U useful contacts 52 user ratings/reviews 137 V Viral Loop (Penenberg) 67, 144 visibility of products 21 W Watts, Duncan 10, 49, 73, 76, 82, 87, 98 weak ties 53, 54, 62–65 interactions with 62–64 sourcing information from 64–65 web, the how it’s changing 2–8 people-based rebuilding of 7, 8 phases of development 8 why it’s changing 9–10 See also social web Web Strategy blog 69 Weinschenk, Susan 114 Western cultures 88 Wikipedia 34, 90 Wilson, Timothy 99 Winning Decisions (Russo and Schoemaker) 99 word of mouth 135–137 Word of Mouth Marketing (Sernovitz) 68 Z Zynga games 2–3
Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
by
Thomas H. Davenport
Published 4 Feb 2014
Therefore, a more continuous approach to sampling, analyzing, and acting on data is necessary. This is particularly at issue for applications involving ongoing monitoring of data, as in social media sentiment analysis. Sentiment analysis allows an organization to assess whether the comments about its brands and products in blogs, tweets, and Facebook pages are positive or negative on balance. One potential problem with such monitoring applications is the tendency for managers to view a continuing stream of analysis and reports without making any decisions or taking any action.
…
Sometimes it’s important to admit that the data and analyses are not definitive. I’ve already talked about the HunchWorks project at the United Nations, which seeks to identify trends and hunches at an early stage in order to decide whether they merit further attention. This could also be the right approach for social sentiment analysis—to use it as a tipoff that further investigation is required, rather than a specific action. If you’re a little more certain—but not entirely—that something important is going on based on your big data analysis, you might consider an automated recommendation. If necessary, a human could override it.
…
See analytics business intelligence (BI), 7, 10, 10t, 14, 18, 23, 93, 102, 124, 128, 129, 130 business models, 41–42, 57, 168, 173, 188 business-to-business (B2B) firms, 42t, 43, 45–46 business-to-business-to-consumer (B2B2C) firms, 43, 46 business view, in big data stack, 119t, 123 Caesars Entertainment, 42, 179–180 Cafarella, Mike, 157 Capital One, 42 Carolinas HealthCare, 121, 122 cars driving data on, 52, 198 self-driving, 35, 41, 42, 65, 83, 148 Carter, John, 143 casino industry, 42, 179–180 Charles Schwab Corporation, 143 chief analytics officers, 143, 202 chief data officers (CDOs), 142–143 chief science officers, 142 Chrysler, 83 CIA, 19 Cisco Systems, 47 Citigroup, 187–188 Index.indd 219 Cleveland, William S., 195 cloud-based computing, 55, 89, 117, 163, 169, 192, 200, 208 Cloudera Hadoop, 115 commitment, culture of, 148 communication skills, 88, 92, 93, 99, 102–103 Competing on Analytics (Davenport and Harris), 2, 43 Compute Engine, 163 Concept 2, 12 conservative approach to big data adoption, 80, 81 consultants, data scientists as, 81, 98–99, 103–104, 112, 209 consumer products companies, 42, 42t, 43, 46, 54, 71, 82 Consumers Union, 67 Corporate Insight, 109 cost-reduction, 21, 60–63, 145 Coursera, 41 cows, data from, 11–12 credit card data, 37, 38, 42, 42t, 46, 164 culture for big data in organizations, 147–149, 152 customer relationship management (CRM), 54, 129f customers banking industry and, 9, 44, 49, 133 big data’s effect on relationships with, 26–27 business-to-business (B2B) firms and, 43, 45–46 business-to-business-to-consumer (B2B2C) firms and, 43, 46 data-based products and services for, 16, 23–24, 26, 66, 106, 155, 195 as focus of big data efforts, 16 future scenario of big data’s effect on relationships with, 35–38, 41–42, 58 identification of dissatisfaction and possible attrition of, 23, 48, 67, 68, 72, 78, 96, 179, 180, 181, 191 intermediaries reporting information about, 46 managers’ attention to, 21 marketing efforts targeted to, 27, 
55, 63–64, 65, 67, 72, 79, 107, 108–109, 128, 142, 144, 179, 180, 197 media and entertainment firms and, 48, 49 03/12/13 2:04 PM 220 Index customers (continued) multichannel relationships with, 51, 67, 177, 186 Netflix Prize’s focus on, 16, 22, 66 overachievers and, 42, 42t, 46 regulatory environment for data from, 27 research on website behavior of, 164 sentiment analysis of, 17, 27, 107, 118, 123 service transaction histories from, 23 sharing data with, 167–168 social media and, 48, 50–51, 107 travel industry and, 75–76 underachievers and, 42t, 43–44 unstructured data from, 51, 67, 68, 69, 180, 186 volume of data warehoused from, 116–117, 168 Cutting, Doug, 157 CycleOps, 12 dashboards, 109, 128, 129, 130, 137, 167, 185, 198 data in big data stack, 119t, 121–122 success of big data initiatives and, 136–138 data disadvantaged organizations, 42t, 43 data discovery process big data strategy and, 70–72, 74–75, 75f, 84 enterprise orientation for, 139 focus of architecture on, 20, 201 GE’s experience with, 75 leadership and, 140 management orientation toward, 18–19 model generation for, 64 moderately aggressive approach to big data and, 82 objectives and, 75, 75f, 84 research on, 3 responsibility locus for, 76–77, 77f technical platform for, 131, 201 Data Lab product, 160 data mining, 122–123, 128, 183, 184 data production process big data strategy and, 70, 72–75, 75f, 84 data scientists and teams and, 201 enterprise orientation for, 139 Index.indd 220 GE’s experience with, 74–75 highly ambitious approach to big data and, 83 moderately aggressive approach to big data and, 82 objectives and, 75, 75f, 84 responsibility locus for, 76–77, 77f technical platform for, 74, 127, 129–130, 132, 133, 201 Data Science Central, 97 data scientists activities performed by, 15, 137–138, 148, 159–160, 199 analysts differentiated from, 15 background to, 86–87, 196–197 business expert traits of, 88 classic model of, 87–97 collaboration by, 165–167, 173, 176 development of products and 
services and, 16, 18, 20, 24, 61–62, 65, 66, 71, 79–80, 106, 161 education and training of, 14, 91, 92, 104, 184, 209 future for, 110–111 hacker traits of, 88–91 horizontal versus vertical, 97–99 job growth for, 111, 111f, 184–185 in large companies, 201 LinkedIn’s use of, 158, 160, 161 motivation of, 106 organizational structure with, 16, 61, 82, 140, 141, 142, 152, 153, 158, 173, 180, 187, 202, 207, 209 quantitative analyst traits of, 88, 93–97 research on, 3 retention of, 104–106, 112, 161 role of, 14, 209 scientist traits of, 88, 91–92 skills of, 71, 79, 88, 145, 147, 182–184, 185 sources of, for hiring, 101–105 start-ups using, 16, 157–158 team approach using, 99–101, 165–167, 181, 201, 209 traits of, 87, 88 trusted adviser traits of, 88, 92–93 data visualization, 124–125, 125f Davis, Jim, 163–164 DB2, 183.
Radical Technologies: The Design of Everyday Life
by
Adam Greenfield
Published 29 May 2017
As the capacity to detect and characterize emotional states has grown, these reasonably traditional, Taylorist notions of time-and-motion efficiency have been supplemented by a concern for the worker’s affective performance.38 Japan’s Keikyu Corporation, for example, began measuring the quality of its frontline employees’ smiles in 2009, scanning their “eye movements, lip curves and wrinkles,” and rating them on a 0-100 scale.39 As intrusive as this may seem, smiling is at least something under an employee’s conscious control, which cannot be said for all of the measurements of “body posture, facial expressions, physiology, semantics [and] who a person talks to and when” that the management consultancy Accenture recommends to ensure employees are “exhibiting effective social behaviors.”40 Such subconscious tells are picked up by the People Analytics suite the “emotion-aware sentiment analysis company” Kanjoya offers, which uses unstructured voice and text data to calibrate an employee’s “Attrition Risk” and “Workplace Value,” in addition to the expected “Performance.” The concern for retention implies something that a review of similar sentiment analysis systems makes entirely explicit: the demand that inner states be measured and used to determine the conditions of labor now applies to the white-collar workforce every bit as much as it does to checkout clerks or line workers.
…
A Snaptrends brochure for prospective customers in the law enforcement sector makes the proposition explicit: “From angry Facebook posts to suggestive Instagram uploads, today’s would-be criminals often leave A STRING OF CLUES across social media,” and a public-safety agency made aware of those CLUES can deploy its resources in time to preempt the commission of crime.19 Such tools use sentiment analysis, a facet of the emerging pseudoscience of “intent recognition,” to extract actionable intelligence from utterances.20 But it’s astonishing that anyone takes sentiment analysis seriously in any but the most trivial applications, let alone what is all too often the life-or-death context of a police stop. The algorithms involved are notoriously crude and simple-minded, stumbling when confronted with sarcasm and other common modes of expression.
…
As we’ve seen, however, the ability to detect behavioral anomalies and departures from acceptable performance profiles algorithmically and remotely is already well advanced. Though they presently stumble at precisely the kind of coded speech that the marginalized have always used to establish and maintain spaces free from oversight—Verlan, Cockney rhyming slang, Polari, 3arabizi—it would be foolish to assume that sentiment analysis and intent recognition will not develop further in the years ahead. And of course totalizing systems like the Chinese social-credit scheme now under active development propose to weave a net capable of capturing, characterizing and punishing all such insurgent acts and utterances, whether public or private; whether or not, indeed, they are conscious at all.
Natural Language Annotation for Machine Learning
by
James Pustejovsky
and
Amber Stubbs
Published 14 Oct 2012
This is where annotation comes in. The point of linguistic annotation is to identify textual components of your document that can be associated with particular features for the phenomena for which you want to develop learning algorithms. Let’s take some examples beyond the spam-ham distinction. Consider sentiment analysis applied to movie reviews or hotel ratings. The most expedient method for classifying movie reviews is to set up the learning problem with n-gram features. The words in the reviews are taken as independent features (lexical clues), and thrown into a description of the target function. While this works remarkably well in general, this approach will fail to capture properties that show up as nonlocal dependencies, such as the ways that negation and modality are often expressed in language.
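The limitation of n-gram features mentioned above can be seen in a few lines. This is a minimal sketch with invented sentences; the helper name `ngrams` is an assumption.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Local negation: a bigram feature is enough to see "not good".
local = "the plot was not good".split()
print(("good",) in ngrams(local, 1))        # the unigram alone looks positive
print(("not", "good") in ngrams(local, 2))  # the bigram captures the negation

# Nonlocal negation: "not" and "good" are several tokens apart,
# so no bigram contains both words.
nonlocal_case = "i would not say the plot was good".split()
captured = any("not" in gram and "good" in gram
               for gram in ngrams(nonlocal_case, 2))
print(captured)
```

Raising n widens the window but multiplies the number of features, which is one reason annotation-based features become attractive for such cases.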
…
The main idea behind SVMs is to find the best-fitting decision boundary between two classes, one that is maximally far from any point in the training data. Nonlinearly separable data can be handled elegantly by using a technique called the kernel trick, which maps the data into a higher dimension where it behaves in a linear fashion. SVMs have been applied very successfully to sentiment analysis (Pang et al. 2002). We won’t be going into detail about these, however; other books on machine learning (see the list at the start of the chapter) provide excellent guides for how these classifiers work, and the ones we’ve already discussed are enough to get you started in training algorithms on your annotated data.
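As a small worked illustration of the kernel trick (not an SVM implementation), the sketch below checks that a degree-2 polynomial kernel computed in the original 2-D space equals an ordinary dot product in a 3-D feature space. The specific kernel, the mapping `phi`, and the sample points are choices made for this example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2, computed in 2-D."""
    return dot(x, y) ** 2


def phi(x):
    """Explicit map to 3-D: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x = (1.0, 2.0)
y = (3.0, 0.5)

implicit = poly_kernel(x, y)     # never leaves the original 2-D space
explicit = dot(phi(x), phi(y))   # same value via the higher-dimensional map
print(implicit, explicit, math.isclose(implicit, explicit))
```

Because the kernel gives the higher-dimensional dot product directly, a classifier that only needs dot products never has to build the mapped vectors.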
…
They can be applied at a document, sentence, phrase, word, or any other level of language that is appropriate for your task. Using n-gram features is the simplest way to start with a classification system, but structure-dependent features and annotation-dependent features will help with more complex tasks such as event recognition or sentiment analysis. Decision trees are a type of ML algorithm that essentially ask “20 questions” of a corpus to determine what label should be applied to each item. The hierarchy of the tree determines the order in which the classifications are applied. The “questions” asked at each branch of a decision tree can be structure-dependent, annotation-dependent, or any other type of feature that can be discovered about the data.
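The "20 questions" picture of a decision tree can be sketched as nested yes/no questions over a tokenized text. The tree below is hand-built for illustration rather than learned from data, and the names `has_word`, `TREE`, and `classify` are hypothetical.

```python
def has_word(word):
    """Build a question: does the token list contain this word?"""
    return lambda tokens: word in tokens


# Each internal node is (question, yes_branch, no_branch); each leaf is a label.
# The order of the questions in the hierarchy decides which test runs first.
TREE = (
    has_word("not"),
    (has_word("good"), "neg", "pos"),  # "not" present: "good" flips to neg
    (has_word("good"), "pos", "neg"),  # "not" absent: "good" means pos
)


def classify(tree, tokens):
    """Walk from the root, answering one question per level, until a leaf."""
    while not isinstance(tree, str):
        question, yes_branch, no_branch = tree
        tree = yes_branch if question(tokens) else no_branch
    return tree


print(classify(TREE, "the food was good".split()))
print(classify(TREE, "the food was not good".split()))
print(classify(TREE, "the food was bad".split()))
```

A learned tree would choose its questions automatically, and those questions could just as well test structure-dependent or annotation-dependent features instead of single words.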
Big Data: A Revolution That Will Transform How We Live, Work, and Think
by
Viktor Mayer-Schonberger
and
Kenneth Cukier
Published 5 Mar 2013
There are a myriad of ways to refer to IBM, notes the big-data expert DJ Patil, from I.B.M. to T. J. Watson Labs, to International Business Machines. And messiness can arise when we extract or process the data, since in doing so we are transforming it, turning it into something else, such as when we perform sentiment analysis on Twitter messages to predict Hollywood box office receipts. Messiness itself is messy. Suppose we need to measure the temperature in a vineyard. If we have only one temperature sensor for the whole plot of land, we must make sure it’s accurate and working at all times: no messiness allowed.
…
Yet the company enables the datafication of people’s thoughts, moods, and interactions, which could never be captured previously. Twitter has struck deals with two firms, DataSift and Gnip, to sell access to the data. (Although all tweets are public, access to the “firehose” comes at a cost.) Many businesses parse tweets, sometimes using a technique called sentiment analysis, to garner aggregate customer feedback or judge the impact of marketing campaigns. Two hedge funds, Derwent Capital in London and MarketPsych in California, started analyzing the datafied text of tweets as signals for investments in the stock market. (Their actual trading strategies were kept secret: rather than investing in firms that were ballyhooed, they may have bet against them.)
…
Importantly, their study used the metadata of who was connected to whom among Twitter followers to go a step further still. They noticed that subgroups of unvaccinated people may exist. What marks this research as particularly special is that where other studies, such as Google Flu Trends, used aggregated data to consider the state of individuals’ health, the sentiment analysis performed by Salathé actually predicted health behaviors. These early findings indicate where datafication will surely go next. Like Google, a gaggle of social media networks such as Facebook, Twitter, LinkedIn, Foursquare, and others sit on an enormous treasure chest of datafied information that, once analyzed, will shed light on social dynamics at all levels, from the individual to society at large.
Smarter Than You Think: How Technology Is Changing Our Minds for the Better
by
Clive Thompson
Published 11 Sep 2013
analyzed the color usage in Van Gogh’s major paintings: Cory Doctorow, “Van Gogh Pie-Charts,” Boing Boing, January 29, 2011, accessed March 23, 2013, boingboing.net/2011/01/29/van-gogh-pie-charts.html. a “sentiment analysis” of the Bible: “Applying Sentiment Analysis to the Bible,” OpenBible.info, October 10, 2011, accessed March 22, 2013, www.openbible.info/blog/2011/10/applying-sentiment-analysis-to-the-bible/. how characters interact in Hamlet: Richard Beck, “Hamlet and the Region of Death,” Boston Globe, May 29, 2011, accessed March 23, 2013, www.boston.com/bostonglobe/ideas/articles/2011/05/29/hamlet_and_the_region_of_death/.
…
Mind you, he’s confident they will. As he points out, fifteen years ago you couldn’t find much on the Web because the search engines were dreadful. “And the first MP3 players were horrendous for finding songs,” he adds. The most promising trends in search algorithms include everything from “sentiment analysis” (you could hunt for a memory based on how happy or sad it is) to sophisticated ways of analyzing pictures, many of which are already emerging in everyday life: detecting faces and locations or snippets of text in pictures, allowing you to hunt down hard-to-track images by starting with a vague piece of half recall, the way we interrogate our own minds.
…
Some of the biggest viral hits in recent years have been witty data crunches from odd, unexpected sources. Arthur Buxton, a young British Web designer, analyzed the color usage in Van Gogh’s major paintings and transformed them into pie charts, challenging viewers to figure out which was which. A group of Christian data nerds did a “sentiment analysis” of the Bible, using algorithms that determine whether a piece of text contains positive or negative language. (“Things start off well with creation, turn negative with Job and the patriarchs, improve again with Moses. . . . In the New Testament, things start off fine with Jesus, then quickly turn negative as opposition to his message grows.”)
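One simple way an algorithm can decide "whether a piece of text contains positive or negative language" is to count words from small positive and negative lexicons. The sketch below assumes that approach; the excerpt does not say how the Bible analysis actually worked, and the word lists and function name are invented for illustration.

```python
# Tiny hypothetical lexicons; real ones contain thousands of entries.
POSITIVE = {"good", "well", "fine", "love", "joy"}
NEGATIVE = {"bad", "negative", "opposition", "hate", "sorrow"}


def sentiment_score(text):
    """Count positive minus negative lexicon hits in the text."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg


print(sentiment_score("Things start off well."))
print(sentiment_score("Hate and sorrow, then joy"))
```

Scoring each passage this way and plotting the scores in order would give a rough positive/negative arc of the kind quoted above.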
Applied Text Analysis With Python: Enabling Language-Aware Data Products With Machine Learning
by
Benjamin Bengfort
,
Rebecca Bilbro
and
Tony Ojeda
Published 10 Jun 2018
In the example below, we see that districtdatalabs.com is the website, or the collection of pages, and each of the individual documents listed below it (courses, projects, etc.) represents an individual web page.
districtdatalabs.com
├── /
├── /courses
├── /projects
├── /corporate-offerings
├── /about
└── blog.districtdatalabs.com
|   ├── /an-introduction-to-machine-learning-with-python
|   ├── /the-age-of-the-data-product
|   └── /building-a-classifier-from-census-data
|   └── /modern-methods-for-sentiment-analysis
...
The predictability of a common domain name makes systematic data collection simpler and more convenient. However, most ingested HTML does not arrive clean, ordered, and ready for analysis. For one thing, a raw HTML document collected from the web will include much that is not text: advertisements, headers and footers, navigation bars, etc.
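As a rough sketch of why raw HTML needs cleaning, the example below uses Python's standard `html.parser` module to keep visible text while skipping navigation, scripts, and footers. This is not the book's own pipeline; the class name, the set of skipped tags, and the sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything nested inside SKIP tags."""

    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up page standing in for a collected blog post.
page = """<html><body>
<nav><a href="/courses">Courses</a></nav>
<h1>Modern Methods for Sentiment Analysis</h1>
<p>Sentiment analysis is a common NLP task.</p>
<script>track();</script>
<footer>Copyright</footer>
</body></html>"""

print(extract_text(page))
```

Real pages are messier than this, so a tag blacklist is only a starting point.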
…
In fact, most natural language processing uses machine learning in one form or another, from tokenization and part of speech tagging, as we saw in the previous chapter, to named entity recognition, entailment, and parsing. More recently, textual machine learning has enabled applications that utilize sentiment analysis, word sense disambiguation, automatic translation and tagging, scene recognition, captioning, chatbots, and more! Because of Python’s unique role in data science, it is rich in third party machine learning tools, from Scikit-Learn to TensorFlow, as well as language processing tools like NLTK and Gensim.
The Filter Bubble: What the Internet Is Hiding From You
by
Eli Pariser
Published 11 May 2011
He points to DirectLife, a wearable coaching device by Philips that figures out which arguments get people eating more healthily and exercising more regularly. But he told me he’s troubled by some of the possibilities. Knowing what kinds of appeals specific people respond to gives you power to manipulate them on an individual basis. With new methods of “sentiment analysis,” it’s now possible to guess what mood someone is in. People use substantially more positive words when they’re feeling up; by analyzing enough of your text messages, Facebook posts, and e-mails, it’s possible to tell good days from bad ones, sober messages from drunk ones (lots of typos, for a start).
…
Given all that, I was a bit surprised when the first weapon he referred me to was a very quotidian one: a thesaurus. The key to changing public opinion, Rendon said, is finding different ways to say the same thing. He described a matrix, with extreme language or opinion on one side and mild opinion on the other. By using sentiment analysis to figure out how people in a country felt about an event—say, a new arms deal with the United States—and identify the right synonyms to move them toward approval, you could “gradually nudge a debate.” “It’s a lot easier to be close to what reality is” and push it in the right direction, he said, than to make up a new reality entirely.
…
PayPal PeekYou persuasion profiling Phantom Public, The (Lippmann) Philby, Kim Phorm Piaget, Jean Picasa Picasso, Pablo PK List Management Plato politics electoral districts and partisans and programmers and voting Popper, Karl postmaterialism predictions present bias priming effect privacy Facebook and facial recognition and genetic Procter & Gamble product recommendations Proulx, Travis Pulitzer, Joseph push technology and pull technology Putnam, Robert Qiang, Xiao Rapleaf Rather, Dan Raz, Guy reality augmented Reality Hunger (Shields) Reddit Rendon, John Republic.com (Sunstein) retargeting RFID chips robots Rodriguez de Montalvo, Garci Rolling Stone Roombas Rotenberg, Marc Rothstein, Mark Rove, Karl Royal Caribbean Rubel, Steve Rubicon Project Rumsfeld, Donald Rushkoff, Douglas Salam, Reihan Sandberg, Sheryl schemata Schmidt, Eric Schudson, Michael Schulz, Kathryn science Scientific American Scorpion sentiment analysis Sentry serendipity Shields, David Shirky, Clay Siegel, Lee signals click Simonton, Dean Singhal, Amit Sleepwalkers, The (Koestler) smart devices Smith, J. 
Walker social capital social graph Social Graph Symposium Social Network, The Solove, Daniel solution horizon Startup School Steitz, Mark stereotyping Stewart, Neal Stryker, Charlie Sullivan, Danny Sunstein, Cass systematization Taleb, Nassim Nicholas Tapestry TargusInfo Taylor, Bret technodeterminism technology television advertising on mean world syndrome and Tetlock, Philip Thiel, Peter This American Life Thompson, Clive Time Tocqueville, Alexis de Torvalds, Linus town hall meetings traffic transparency Trotsky, Leon Turner, Fred Twitter Facebook compared with Últimas Noticias Unabomber uncanny valley Upshot Vaidhyanathan, Siva video games Wales, Jimmy Wall Street Journal Walmart Washington Post Web site morphing Westen, Drew Where Good Ideas Come From (Johnson) Whole Earth Catalog WikiLeaks Wikipedia Winer, Dave Winner, Langdon Winograd, Terry Wired Wiseman, Richard Woolworth, Andy Wright, David Wu, Tim Yahoo News Upshot Y Combinator Yeager, Sam Yelp You Tube LeanBack Zittrain, Jonathan Zuckerberg, Mark Table of Contents Title Page Copyright Page Dedication Introduction Chapter 1 - The Race for Relevance Chapter 2 - The User Is the Content Chapter 3 - The Adderall Society Chapter 4 - The You Loop Chapter 5 - The Public Is Irrelevant Chapter 6 - Hello, World!
Picnic Comma Lightning: In Search of a New Reality
by
Laurence Scott
Published 11 Jul 2018
It seeks to decipher our feelings in all their modes of expression: in how we communicate online, in our external body language and, even more intimately, in the codes to our emotions that we keep inside our bodies. On social media, our words can often be the only evidence of our feelings. As a result, coders are busy improving their software’s capacity for ‘sentiment analysis’. Also known as opinion-mining, this genre of computer program attempts to discern our moods and feelings in the linguistic patterns of our social-media content. One of the obvious problems is that we don’t always say what we mean. Lotem Peled and Roi Reichart, two researchers in this burgeoning field, have given themselves ‘the novel task of sarcasm interpretation’.
…
Thermal-imaging cameras can analyse our heart rates, detecting that flutter of excitement at the verge of a purchase. There are countless commercial applications to these technologies, and it has been estimated that the business of detecting and interpreting emotion will be worth more than $36 billion by 2021. But Big Emotion is not satisfied with the remote scrutiny of sentiment analysis and facial-recognition cameras. The branch of this industry that deals in wearables is devoted to a new kind of empathy, an intimate exchange of information between the human body and the biosensors voluntarily strapped to it. One goal of wearable technologies is to judge our moods from quantifiable physiological responses.
…
As a result, in addition to all the relevant offers streaming into our wearables, Mustafa wonders whether, in this disappearing space between our feelings and our actions, ‘we might become more aware of ourselves, and hopefully more tolerant to others’. There is no doubt that small personal voices, silenced for so long, are rightfully being heard, and that the culture that brings us sentiment analysis has also enabled a more coordinated, sustained and empathetic movement towards social justice to mobilise in response to these voices. With this vanishing in-between, some of the shelters for unambiguous immorality and clear abuses of power are being torn down. When BBC Radio’s ethical debate programme, The Moral Maze, discussed the early days of the Harvey Weinstein scandal, the journalist Tim Dowling remarked: ‘Where’s the maze?’
Designing Great Data Products
by
Jeremy Howard
,
Mike Loukides
and
Margit Zwemer
Published 23 Mar 2012
In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters. Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses. These disaster applications are a particularly good example of why data products need simple, well-designed interfaces that produce concrete recommendations.
Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass
by
Mary L. Gray
and
Siddharth Suri
Published 6 May 2019
They create them by listening to short audio recordings of one sentence in one language, typically English, and entering the translation of the sentence in their mother tongue in an Excel file. Other common types of work on UHRS are market surveys—often restricted by demographics like age, gender, and location—and a task called “sentiment analysis.” In sentiment analysis, workers may look at a series of words, selfies, videos, or audio files and add a word to each data point that describes their sense of the mood of the word, person, action, or sound in front of them. These human insights become the training data for algorithms later shown the same materials.
…
See ratings or reputation score requesters API effect on, 171 bait-and-switch strategy, 83 collaboration of, 132 communication of tasks, 83–84 fluctuations in, 14 identity of, 31–32 inequality of power in, 91–93 information about, 223 n18, 233 n6, 236 n25 Microsoft employees as, 18 needs of, xvii–xviii origin of, 5–6 transaction costs, 70–75 vetting of, 76–77 Reuther, Walter, 47 rider-customers (Uber), 145–46 risk of entrepreneurship, 95 Industrial Revolution, 45–46 mitigation by requesters, 74 reputation score, 70–71, 81–82 scams, 104, 122, 125 workplace safety, xxiii–xxiv, 60, 86, 97, 190, 193–94 See also transaction costs Riyaz, 86–90 Roberts, Sarah T., 19 robots, xviii–xxiii Romney, Mitt, xii Rosie the Riveter, 47 S S&P Global Market Intelligence, 62 safety, workplace algorithmic cruelty, 86 Bangladesh Accord, 193–94 for full-time employment, 60, 97 Good Work Code, 157 industrial era, 45–46 unraveling of, xxiii–xxiv workspaces, 190 safety net, for workers, 189–92 Sanjay, 128–29 Sanjeev, 126 scaffolding technique, 149–50, 164, 240 n11 scams, 104, 122, 125 scheduling 80/20 rule, 103, 118 always-on workers, 104, 105, 126, 150–51, 158–59, 170, 190 control over, 96, 99–100, 108, 157 employer control over, xxvi, 48 experimentalists, 104, 126, 150–51 just-in-time scheduling, 100, 235 n11 MTurk, 5, 79 as priority, 147, 150, 155, 164 Treaty of Detroit, 48 Sears, Mark, 141, 143, 149 self-improvement, 100, 110–13 sentiment analysis, 19 Service Employees International Union, 158–59, 191 service jobs, growth of, 97 Shah, Palak, 157 shared workspaces, 180–81 Singh, Manmohan, 55 skilled work, 39, 51, 97 skills, learning, 100, 110–13 skills gap, 230 n26 Skype, 23, 132, 179 slavery, 40–41, 226 n2 Smart Glasses, 167–68 Smith, Aaron, 219 n2, 242 n2 Smith, Adam, 58 social consequences, algorithmic cruelty, 68–69 social entrepreneurship, 147–55 social environment forums as, 132–33, 164, 239 n8 job validation, 95 need for, 178–80, 233 n6 requesters on, 73–74 in workplaces, 121–23, 
173–74 See also collaboration Software Technology Parks of India (STPI), 55 SpaceX, xviii Sparrow Cycling, 142 speech recognition, 30 spinning jenny, 43, 173 Star, Susan Leigh, 238 n2 Starbucks, 28, 100 Stern, Andy, 191 Strauss, Anselm, 238 n1 strikes, 47, 48 subcontracting, Industrial Revolution, 41–42 success, changing definition of, 97–98 Suchman, Lucy, 238 n3 support collaboration, 121–23, 133–37 for on-demand work, 105 as requirement, 162 of workers, 21, 140–43, 149, 240 n11 See also double bottom line; forums Suri, Siddharth, xxvii–xxix, 221 n23 surveys LeadGenius, 224 n27 market surveys, 3, 19 on payment, 90–91 as task, 87, 116, 219 n2, 242 n2 worker motivation, 100 T Taft, Robert A., 48 Taft-Hartley Act, 48–49, 54, 228 n20 Taste of the World, 14 Taylor, Frederick, 227 n6 Team Genius, 88–90 teamwork, 24, 28, 160–61, 164, 182–83 technology AI. see artificial intelligence (AI) APIs. see application programming interface (API) automation, xviii–xxiii, 173–77, 176–77, 243 n5 computers. see computers machinery, 42, 43–44, 58–59, 227 n5 paradox of automation, xxii, 36, 170, 173, 175 Technology, Entertainment and Design (TED).
Artificial Intelligence: A Guide for Thinking Humans
by
Melanie Mitchell
Published 14 Oct 2019
What’s more, such information may have predictive power about other aspects of a person’s life, such as likely voting patterns and responsiveness to certain types of news stories or political ads.8 Furthermore, there have been several efforts, with varying success, to apply “sentiment mining” of, say, economics-related tweets on Twitter to predict stock prices and election outcomes. Putting aside the ethics of these applications of sentiment analysis, let’s focus on how AI systems might be able to classify the sentiment of sentences like the ones above. While it’s quite easy for humans to see that these mini-reviews are all negative, getting a program to do this kind of classification in a general way is much harder than it might seem at first glance.
…
Looking at single words or short sequences in isolation is generally not sufficient to glean the overall sentiment; it’s necessary to capture the semantics of words in the context of the whole sentence. Soon after deep networks started to excel in computer vision and speech recognition, NLP practitioners experimented with applying them to sentiment analysis. As usual, the idea is to train the network on many human-labeled examples of sentences with both positive and negative sentiment and have the network itself learn useful features that allow it to output a classification confidence for “positive” or “negative” on a new sentence. But first, how can we get a neural network to process a sentence?
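The point about isolated words can be made concrete with a minimal sketch. It is not from the book: the lexicon, its weights, and the function names below are hypothetical, chosen only for illustration. It scores a sentence by summing per-word values, ignoring order and context, and so mislabels negated sentences.

```python
# Hypothetical toy lexicon: the words and weights are illustrative assumptions,
# not taken from the book or from any published sentiment lexicon.
LEXICON = {"good": 1, "great": 1, "loved": 1, "bad": -1, "boring": -1, "awful": -1}


def word_count_sentiment(sentence: str) -> int:
    """Sum per-word scores, ignoring word order and context."""
    cleaned = sentence.lower().replace(",", " ").replace(".", " ")
    return sum(LEXICON.get(word, 0) for word in cleaned.split())


def label(score: int) -> str:
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["The acting was great.", "The plot was not good at all."]:
        print(text, "->", label(word_count_sentiment(text)))
```

The second sentence is negative to a human reader, but the word "good" alone pushes its score above zero. This is the kind of error that motivates models that read the whole sentence.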
…
On the other hand, the “black hat” attackers—hackers who are actually trying to fool deployed systems for nefarious purposes—don’t publish the tricks they have come up with, so there might be many additional kinds of vulnerabilities of these systems of which we’re not yet aware. As far as I know, to date there has not been a real-world attack of these kinds on deep-learning systems, but I’d say it’s only a matter of time until we hear about such attacks. While deep learning has produced some very significant advances in speech recognition, language translation, sentiment analysis, and other areas of NLP, human-level language processing remains a distant goal. Christopher Manning, a Stanford professor and NLP luminary, noted this in 2017: “So far, problems in higher-level language processing have not seen the dramatic error rate reductions from deep learning that have been seen in speech recognition and in object recognition in vision.… The really dramatic gains may only have been possible on true signal processing tasks.”30 It seems to me to be extremely unlikely that machines could ever reach the level of humans on translation, reading comprehension, and the like by learning exclusively from online data, with essentially no real understanding of the language they process.
Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
by
Aurélien Géron
Published 13 Mar 2017
More generally, they can work on sequences of arbitrary lengths, rather than on fixed-sized inputs like all the nets we have discussed so far. For example, they can take sentences, documents, or audio samples as input, making them extremely useful for natural language processing (NLP) systems such as automatic translation, speech-to-text, or sentiment analysis (e.g., reading movie reviews and extracting the rater’s feeling about the movie). Moreover, RNNs’ ability to anticipate also makes them capable of surprising creativity. You can ask them to predict which are the most likely next notes in a melody, then randomly pick one of these notes and play it.
…
Besides the long training time, a second problem faced by long-running RNNs is the fact that the memory of the first inputs gradually fades away. Indeed, due to the transformations that the data goes through when traversing an RNN, some information is lost after each time step. After a while, the RNN’s state contains virtually no trace of the first inputs. This can be a showstopper. For example, say you want to perform sentiment analysis on a long review that starts with the four words “I loved this movie,” but the rest of the review lists the many things that could have made the movie even better. If the RNN gradually forgets the first four words, it will completely misinterpret the review. To solve this problem, various types of cells with long-term memory have been introduced.
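The fading-memory effect can be seen in a minimal sketch, assuming a single-unit basic RNN with hand-picked weights. This is not the book's code: the function names and weight values are hypothetical, and the recurrent weight is deliberately set below 1 so that the state contracts at every step. The sketch measures how much the final state changes when only the first input is flipped.

```python
import math


def final_state(inputs, w_h=0.5, w_x=1.0):
    """Run a one-unit RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t), starting from h = 0."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
    return h


def first_input_influence(n_steps):
    """Absolute change in the final state when the first input flips from +1 to -1.

    All later inputs are zero (an assumption made to isolate the first input).
    """
    rest = [0.0] * (n_steps - 1)
    return abs(final_state([1.0] + rest) - final_state([-1.0] + rest))


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 30):
        print(n, first_input_influence(n))
```

With these weights the influence of the first input shrinks by at least half at every step, so after a few dozen steps the final state is practically the same whatever the sequence began with. Gated cells such as LSTM and GRU are designed to counter this.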
…
GRU computations
Creating a GRU cell in TensorFlow is trivial:
gru_cell = tf.contrib.rnn.GRUCell(num_units=n_neurons)
LSTM or GRU cells are one of the main reasons behind the success of RNNs in recent years, in particular for applications in natural language processing (NLP).
Natural Language Processing
Most of the state-of-the-art NLP applications, such as machine translation, automatic summarization, parsing, sentiment analysis, and more, are now based (at least in part) on RNNs. In this last section, we will take a quick look at what a machine translation model looks like. This topic is very well covered by TensorFlow’s awesome Word2Vec and Seq2Seq tutorials, so you should definitely check them out.
Word Embeddings
Before we start, we need to choose a word representation.
This Is Not Normal: The Collapse of Liberal Britain
by
William Davies
Published 28 Sep 2020
If risk modelling (using notions of statistical normality) was the defining research technique of the nineteenth and twentieth centuries, sentiment analysis is the defining one of the emerging digital era. We no longer have stable, ‘factual’ representations of the world, but unprecedented new capacities to sense and monitor what is bubbling up where, who’s feeling what, what’s the general vibe. Financial markets are themselves far more like tools of sentiment analysis (representing the mood of investors) than producers of ‘facts’. This is why it was so absurd to look to currency markets and spread-betters for the truth of what would happen in the referendum: they could only give a sense of what certain people felt would happen in the referendum at certain times.
Targeted: The Cambridge Analytica Whistleblower's Inside Story of How Big Data, Trump, and Facebook Broke Democracy and How It Can Happen Again
by
Brittany Kaiser
Published 21 Oct 2019
And testing done by the data scientists and digital strategists, such as putting money behind a controlled set of ads versus targeted issues, could show (by measuring everything from the percentage increase in viewers’ favorability for Donald Trump to the percentage increase in the viewers’ intention to vote for him) if that campaign was working to convert impressions into votes. Besides Molly’s sets of dashboards, the team had access to data from “sentiment analysis platforms” such as Synthesio and Crimson Hexagon, which measured the effect, positive or negative, that all the campaign’s tweets, including Trump’s, were having.2 For example, if the campaign put out a video of Hillary calling Trump supporters “deplorables,” it could put money behind a few different versions of the ad and watch its performance in real time to determine how many people were watching, whether they paused the video, and whether they finished watching the video.
…
In this tumultuous moment, I pursued the former idea quietly, by reaching out to social justice and human rights contacts. I saw more clearly than ever that CA might be able to use Big Data to help diplomats manage crises in conflict zones. I brainstormed ways that AI and new language recognition and sentiment analysis could assist us in processing massive amounts of war crimes testimony, finding patterns in it. Perhaps psychographic modeling, which had been deployed on the U.S. population—to, I felt, disastrous effect—could be used to create regime change where it was most needed. I worked with Robert Murtfeld to reach out to Fatou Bensouda, the prosecutor of the International Criminal Court, and the U.S. ambassador-at-large for war crimes, Stephen Rapp, and we began to explore some options.
…
Trump Make America Great Again; Understanding the Voting Electorate,” PowerPoint presentation, Cambridge Analytica office, New York, December 7, 2016.
5. Lauren Etter, Vernon Silver, and Sarah Frier, “How Facebook’s Political Unit Enables the Dark Art of Digital Propaganda,” Bloomberg.com, December 21, 2017, https://www.bloomberg.com/news/features/2017-12-21/inside-the-facebook-team-helping-regimes-that-reach-out-and-crack-down.
6. Nancy Scola, “How Facebook, Google, and Twitter ‘Embeds’ Helped Trump in 2016,” Politico, October 26, 2017, https://www.politico.com/story/2017/10/26/facebook-google-twitter-trump-244191.
11: BREXIT BRITTANY
1. Jeremy Herron and Anna-Louise Jackson, “World Markets Roiled by Brexit as Stocks, Pound Drop; Gold Soars,” Bloomberg.com, June 23, 2016, https://www.bloomberg.com/news/articles/2016-06-23/pound-surge-builds-as-polls-show-u-k-to-remain-in-eu-yen-slips.
2. Aaron Wherry, “Canadian Company Linked to Data Scandal Pushes Back at Whistleblower’s Claims: AggregateIQ Denies Links to Scandal-Plagued Cambridge Analytica,” CBC, April 24, 2018, https://www.cbc.ca/news/politics/aggregate-iq-mps-cambridge-wylie-brexit-1.4633388.
13: POSTMORTEM
1. Nancy Scola, “How Facebook, Google, and Twitter ‘Embeds’ Helped Trump in 2016,” Politico, October 26, 2017, https://www.politico.com/story/2017/10/26/facebook-google-twitter-trump-244191.
2. Sentiment analysis has its roots, interestingly enough, in the innovations Robert Mercer pioneered years before at IBM. For the campaign, it measured not only if people liked tweets or retweeted them but something more nuanced: whether tweeters were feeling positive or negative when composing their tweets.
3. Glenn Kessler, “Did Michelle Obama Throw Shade at Hillary Clinton?”
Humans Are Underrated: What High Achievers Know That Brilliant Machines Never Will
by
Geoff Colvin
Published 3 Aug 2015
More advanced software can examine those faces and spot the muscle movements from Ekman’s system. The possibilities of such technology prompted six PhDs at the University of California at San Diego to form Emotient and to recruit Ekman to their advisory board. Point a video camera at any person’s face, and the company’s Sentiment Analysis software can tell you that person’s overall sentiment (positive, negative, neutral) plus display a continually updating bar chart showing levels of seven primary emotions—joy, surprise, sadness, fear, disgust, contempt, anger—and two advanced emotions, frustration and confusion (advanced because they’re combinations of other emotions).
…
Incorporate the software into Google Glass, as the company has done, and the emotion readouts for anyone you’re looking at appear before your eyes (and yes, several people quickly noted that the emotion you may very well detect is contempt for you because you’re wearing Google Glass). Emotient’s initial target market for selling the Sentiment Analysis system was retailers, but the possibilities are obviously much broader. Affectiva, a spin-off from MIT’s Media Lab, also uses Ekman’s research to analyze facial expressions, selling its software to marketers and advertisers so they can conduct consumer research online using webcams. No need to get your research subjects into a focus group and guess what they’re thinking; just have them talk to you online and let their faces tell the story.
The Twittering Machine
by
Richard Seymour
Published 20 Aug 2019
This is about a social industry. As an industry it is able, through the production and harvesting of data, to objectify and quantify social life in numerical form. As William Davies has argued, its unique innovation is to make social interactions visible and susceptible to data analytics and sentiment analysis.6 This makes social life eminently susceptible to manipulation on the part of governments, parties and companies who buy data services. But more than that, it produces social life; it programmes it. This is what it means when we spend more hours tapping on the screen than talking to anyone face to face; that our social life is governed by algorithm and protocol.
…
It is for cultural reasons, external to the logic of the platform, that such content can pose a threat, by inviting government regulation or encouraging users to disconnect. Even then, there is little the platforms can do without upsetting the ecologies of attention and data creation. For example, Facebook’s efforts to demonstrate conscientious engagement include changing the content of someone’s feed if the machine’s sentiment analysis discloses that they might be at risk of suicide. A page offering help for suicidal people might appear in the feed. Friends of the possible suicide might see an enlarged ‘report post’ button. But what if there are perverse incentives that arise from features that are intrinsic to the profit model?
The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences
by
Rob Kitchin
Published 25 Aug 2014
Examples would include entity extraction that automatically extracts metadata from text by searching for particular types of text and phrasing, such as person names, locations, dates, specialised terms and product terminology, and entity relation extraction that automatically identifies the relationships between semantic entities, linking them together (e.g., person name to birth date or location, or an opinion to an item) (McCreary 2009). A typical application of such techniques is sentiment analysis which seeks to determine the general nature and strength of opinions about an issue, for example, what people are saying about a product on social media. By using placemark metadata it is also possible to track where such sentiment is expressed (Graham et al. 2013) and to mine the dissemination of information within social media, for example, how widely Web addresses are favourited and shared between multiple users (Ohlhorst 2013).
…
Index A/B testing 112 abduction 133, 137, 138–139, 148 accountability 34, 44, 49, 55, 63, 66, 113, 116, 165, 171, 180 address e-mail 42 IP 8, 167, 171 place 8, 32, 42, 45, 52, 93, 171 Web 105 administration 17, 30, 34, 40, 42, 56, 64, 67, 87, 89, 114–115, 116, 124, 174, 180, 182 aggregation 8, 14, 101, 140, 169, 171 algorithm 5, 9, 21, 45, 76, 77, 83, 85, 89, 101, 102, 103, 106, 109, 111, 112, 118, 119, 122, 125, 127, 130, 131, 134, 136, 142, 146, 154, 160, 172, 177, 179, 181, 187 Amazon 72, 96, 131, 134 Anderson, C. 130, 135 Andrejevic, M. 133, 167, 178 animation 106, 107 anonymity 57, 63, 79, 90, 92, 116, 167, 170, 171, 172, 178 apophenia 158, 159 Application Programming Interfaces (APIs) 57, 95, 152, 154 apps 34, 59, 62, 64, 65, 78, 86, 89, 90, 95, 97, 125, 151, 170, 174, 177 archive 21, 22, 24, 25, 29–41, 48, 68, 95, 151, 153, 185 archiving 23, 29–31, 64, 65, 141 artificial intelligence 101, 103 Acxiom 43, 44 astronomy 34, 41, 72, 97 ATM 92, 116 audio 74, 77, 83 automatic meter reading (AMR) 89 automatic number plate recognition (ANPR) 85, 89 automation 32, 51, 83, 85, 87, 89–90, 98, 99, 102, 103, 118, 127, 136, 141, 146, 180 Ayasdi 132, 134 backup 29, 31, 40, 64, 163 barcode 74, 85, 92, Bates, J. 56, 61, 62, 182 Batty, M. 90, 111, 112, 140 Berry, D. 134, 141 bias 13, 14, 19, 28, 45, 101, 134–136, 153, 154, 155, 160 Big Brother 126, 180 big data xv, xvi, xvii, 2, 6, 13, 16, 20, 21, 27–29, 42, 46, 67–183, 186, 187, 188, 190, 191, 192 analysis 100–112 characteristics 27–29, 67–79 enablers 80–87 epistemology 128–148 ethical issues 165–183 etymology 67 organisational issues 160–163 rationale 113–127 sources 87–99 technical issues 149–160 biological sciences 128–129, 137 biometric data 8, 84, 115 DNA 8, 71, 84 face 85, 88, 105 fingerprints 8, 9, 84, 87, 88, 115 gait 85, 88 iris 8, 84, 88 bit-rot 20 blog 6, 95, 170 Bonferroni principle 159 born digital 32, 46, 141 Bowker, G. 2, 19, 20, 22, 24 Borgman, C. 2, 7, 10, 20, 30, 37, 40, 41 boyd, D. 
68, 75, 151, 152, 156, 158, 160, 182 Brooks, D. 130, 145 business 1, 16, 42, 45, 56, 61, 62, 67, 79, 110, 113–127, 130, 137, 149, 152, 161, 166, 172, 173, 187 calculative practices 115–116 Campbell’s Law 63, 127 camera 6, 81, 83, 87, 88, 89, 90, 107, 116, 124, 167, 178, 180 capitalism 15, 16, 21, 59, 61, 62, 86, 95, 114, 119–123, 126, 136, 161, 184, 186 capta 2 categorization 6, 8, 12, 19, 20, 102, 106, 176 causation 130, 132, 135, 147 CCTV 87, 88, 180 census 17, 18, 19, 22, 24, 27, 30, 43, 54, 68, 74, 75, 76, 77, 87, 102, 115, 157, 176 Centro De Operações Prefeitura Do Rio 124–125, 182 CERN 72, 82 citizen science 97–99, 155 citizens xvi, 45, 57, 58, 61, 63, 71, 88, 114, 115, 116, 126, 127, 165, 166, 167, 174, 176, 179, 187 citizenship 55, 115, 170, 174 classification 6, 10, 11, 23, 28, 104, 105, 157, 176 clickstream 43, 92, 94, 120, 122, 154, 176 clustering 103, 104, 105, 106, 110, 122 Codd, E. 31 competitiveness xvi, 16, 114, computation 2, 4, 5, 6, 29, 32, 68, 80, 81–82, 83, 84, 86, 98, 100, 101, 102, 110, 129, 136, 139–147, 181 computational social science xiv, 139–147, 152, 186 computing cloud xv, 81, 86 distributed xv, 37, 78, 81, 83, 98 mobile xv, 44, 78, 80, 81, 83, 85, 139 pervasive 81, 83–84, 98, 124 ubiquitous 80, 81, 83–84, 98, 100, 124, 126 confidence level 14, 37, 133, 153, 160 confidentiality 8, 169, 175 control creep 126, 166, 178–179 cookies 92, 119, 171 copyright 16, 30, 40, 49, 51, 54, 96 correlation 105, 110, 130, 131, 132, 135, 145, 147, 157, 159 cost xv, 6, 11, 16, 27, 31, 32, 37, 38, 39, 40, 44, 52, 54, 57, 58, 59, 61, 66, 80, 81, 83, 85, 93, 96, 100, 116, 117, 118, 120, 127, 150 Crawford, K. 68, 75, 135, 151, 152, 155, 156, 158, 160, 182 credit cards 8, 13, 42, 44, 45, 85, 92, 167, 171, 176 risk 42, 63, 75, 120, 176, 177 crime 55, 115, 116, 123, 175, 179 crowdsourcing 37, 73, 93, 96–97, 155, 160 Cukier, K. 
68, 71, 72, 91, 114, 128, 153, 154, 161, 174 customer relationship management (CRM) 42, 99, 117–118, 120, 122, 176 cyber-infrastructure 33, 34, 35, 41, 186 dashboard 106, 107, 108 data accuracy 12, 14, 110, 153, 154, 171 administrative 84–85, 89, 115, 116, 125, 150, 178 aggregators see data brokers amplification 8, 76, 99, 102, 167 analogue 1, 3, 32, 83, 88, 140, 141 analytics 42, 43, 63, 73, 80, 100–112, 116, 118, 119, 120, 124, 125, 129, 132, 134, 137, 139, 140, 145, 146, 149, 151, 159, 160, 161, 176, 179, 186, 191 archive see archive assemblage xvi, xvii, 2, 17, 22, 24–26, 66, 80, 83, 99, 117, 135, 139, 183, 184–192 attribute 4, 8–9, 31, 115, 150 auditing 33, 40, 64, 163 authenticity 12, 153 automated see automation bias see bias big see big data binary 1, 4, 32, 69 biometric see biometric data body 177–178, 187 boosterism xvi, 67, 127, 187, 192 brokers 42–45, 46, 57, 74, 75, 167, 183, 186, 187, 188, 191 calibration 13, 20 catalogue 32, 33, 35 clean 12, 40, 64, 86, 100, 101, 102, 152, 153, 154, 156 clearing house 33 commodity xvi, 4, 10, 12, 15, 16, 41, 42–45, 56, 161 commons 16, 42 consolidators see data brokers cooked 20, 21 corruption 19, 30 curation 9, 29, 30, 34, 36, 57, 141 definition 1, 2–4 deluge xv, 28, 73, 79, 100, 112, 130, 147, 149–151, 157, 168, 175 derived 1, 2, 3, 6–7, 8, 31, 32, 37, 42, 43, 44, 45, 62, 86, 178 deserts xvi, 28, 80, 147, 149–151, 161 determinism 45, 135 digital 1, 15, 31, 32, 67, 69, 71, 77, 82, 85, 86, 90, 137 directories 33, 35 dirty 29, 154, 163 dive 64–65, 188 documentation 20, 30, 31, 40, 64, 163 dredging 135, 147, 158, 159 dump 64, 150, 163 dynamic see dynamic data enrichment 102 error 13, 14, 44, 45, 101, 110, 153, 154, 156, 169, 175, 180 etymology 2–3, 67 exhaust 6–7, 29, 80, 90 fidelity 34, 40, 55, 79, 152–156 fishing see data dredging formats xvi, 3, 5, 6, 9, 22, 25, 30, 33, 34, 40, 51, 52, 54, 65, 77, 102, 153, 156, 157, 174 framing 12–26, 133–136, 185–188 gamed 154 holding 33, 35, 64 infrastructure xv, xvi, xvii, 2, 
21–24, 25, 27–47, 52, 64, 102, 112, 113, 128, 129, 136, 140, 143, 147, 148, 149, 150, 156, 160, 161, 162, 163, 166, 184, 185, 186, 188, 189, 190, 191, 192 integration 42, 149, 156–157 integrity 12, 30, 33, 34, 37, 40, 51, 154, 157, 171 interaction 43, 72, 75, 85, 92–93, 94, 111, 167 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 156–157, 163, 184 interval 5, 110 licensing see licensing lineage 9, 152–156 linked see linked data lost 5, 30, 31, 39, 56, 150 markets xvi, 8, 15, 25, 42-45, 56, 59, 75, 167, 178 materiality see materiality meta see metadata mining 5, 77, 101, 103, 104–106, 109, 110, 112, 129, 132, 138, 159, 188 minimisation 45, 171, 178, 180 nominal 5, 110 ordinal 5, 110 open see open data ontology 12, 28, 54, 150 operational 3 ownership 16, 40, 96, 156, 166 preparation 40, 41, 54, 101–102 philosophy of 1, 2, 14, 17–21, 22, 25, 128–148, 185–188 policy 14, 23, 30, 33, 34, 37, 40, 48, 64, 160, 163, 170, 172, 173, 178 portals 24, 33, 34, 35 primary 3, 7–8, 9, 50, 90 preservation 30, 31, 34, 36, 39, 40, 64, 163 protection 15, 16, 17, 20, 23, 28, 40, 45, 62, 63, 64, 167, 168–174, 175, 178, 188 protocols 23, 25, 30, 34, 37 provenance 9, 30, 40, 79, 153, 156, 179 qualitative 4–5, 6, 14, 146, 191 quantitative 4–5, 14, 109, 127, 136, 144, 145, 191 quality 12, 13, 14, 34, 37, 40, 45, 52, 55, 57, 58, 64, 79, 102, 149, 151, 152–156, 157, 158 raw 1, 2, 6, 9, 20, 86, 185 ratio 5, 110 real-time 65, 68, 71, 73, 76, 88, 89, 91, 99, 102, 106, 107, 116, 118, 121, 124, 125, 139, 151, 181 reduction 5, 101–102 representative 4, 8, 13, 19, 21, 28 relational 3, 8, 28, 44, 68, 74–76, 79, 84, 85, 87, 88, 99, 100, 119, 140, 156, 166, 167, 184 reliability 12, 13–14, 52, 135, 155 resellers see data brokers resolution 7, 26, 27, 28, 68, 72, 73–74, 79, 84, 85, 89, 92, 133–134, 139, 140, 150, 180 reuse 7, 27, 29, 30, 31, 32, 39, 40, 41, 42, 46, 48, 49–50, 52, 56, 59, 61, 64, 102, 113, 163 scaled xvi, xvii 32, 100, 101, 112, 138, 149, 150, 163, 186 scarcity xv, xvi, 28, 80, 149–151, 161 
science xvi, 100–112, 130, 137–139, 148, 151, 158, 160–163, 164, 191 secondary 3, 7–8 security see security selection 101, 176 semi-structured 4, 5–6, 77, 100, 105 sensitive 15, 16, 45, 63, 64, 137, 151, 167, 168, 171, 173, 174 shadow 166–168, 177, 179, 180 sharing 9, 11, 20, 21, 23, 24, 27, 29–41, 48–66, 80, 82, 95, 113, 141, 151, 174, 186 small see small data social construction 19–24 spatial 17, 52, 63, 68, 73, 75, 84–85, 88–89 standards xvi, 9, 14, 19, 22, 23, 24, 25, 31, 33, 34, 38, 40, 52, 53, 64, 102, 153, 156, 157 storage see storage stranded 156 structures 4, 5–6, 12, 21, 23, 30, 31, 40, 51, 68, 77, 86, 103, 106, 156 structured 4, 5–6, 11, 32, 52, 68, 71, 75, 77, 79, 86, 88, 105, 112, 163 tertiary 7–8, 9, 27, 74 time-series 68, 102, 106, 110 transient 6–7, 72, 150 transactional 42, 43, 71, 72, 74, 75, 85, 92, 93–94, 120, 122, 131, 167, 175, 176, 177 uncertainty see uncertainty unstructured 4, 5–6, 32, 52, 68, 71, 75, 77, 86, 100, 105, 112, 140, 153, 157 validity 12, 40, 72, 102, 135, 138, 154, 156, 158 variety 26, 28, 43, 44, 46, 68, 77, 79, 86, 139, 140, 166, 184 velocity 26, 28, 29, 68, 76–77, 78, 79, 86, 88, 102, 106, 112. 
117, 140, 150, 153, 156, 184 veracity 13, 79, 102, 135, 152–156, 157, 163 volume 7, 26, 27, 28, 29, 32, 46, 67, 68, 69–72, 74, 76, 77, 78, 79, 86, 102, 106, 110, 125, 130, 135, 140, 141, 150, 156, 166, 184 volunteered 87, 93–98, 99, 155 databank 29, 34, 43 database NoSQL 6, 32, 77, 78, 86–87 relational 5, 6, 8, 32–33, 43, 74–75, 77, 78, 86, 100, 105 data-driven science 133, 137–139, 186 data-ism 130 datafication 181 dataveillance 15, 116, 126, 157, 166–168, 180, 181, 182, 184 decision tree 104, 111, 122, 159, deconstruction 24, 98, 126, 189–190 decontextualisation 22 deduction 132, 133, 134, 137, 138, 139, 148 deidentification 171, 172, 178 democracy 48, 55, 62, 63, 96, 117, 170 description 9, 101, 104, 109, 143, 147, 151, 190 designated community 30–31, 33, 46 digital devices 13, 25, 80, 81, 83, 84, 87, 90–91, 167, 174, 175 humanities xvi, 139–147, 152, 186 object identifier 8, 74 serendipity 134 discourse 15, 20, 55, 113–114, 117, 122, 127, 192 discursive regime 15, 20, 24, 56, 98, 113–114, 116, 123, 126, 127, 190 disruptive innovation xv, 68, 147, 184, 192 distributed computing xv, 37, 78, 81, 83, 98 sensors 124, 139, 160 storage 34, 37, 68, 78, 80, 81, 85–87, 97 division of labour 16 Dodge, M. 
2, 21, 68, 73, 74, 76, 83, 84, 85, 89, 90, 92, 93, 96, 113, 115, 116, 124, 154, 155, 167, 177, 178, 179, 180, 189 driver’s licence 45, 87, 171 drone 88, Dublin Core 9 dynamic data xv, xvi, 76–77, 86, 106, 112 pricing 16, 120, 123, 177 eBureau 43, 44 ecological fallacy 14, 102, 135, 149, 158–160 Economist, The 58, 67, 69, 70, 72, 128 efficiency 16, 38, 55, 56, 59, 66, 77, 93, 102, 111, 114, 116, 118, 119, 174, 176 e-mail 71, 72–73, 82, 85, 90, 93, 116, 174, 190 empiricism 129, 130–137, 141, 186 empowerment 61, 62–63, 93, 115, 126, 165 encryption 171, 175 Enlightenment 114 Enterprise Resource Planning (ERP) 99, 117, 120 entity extraction 105 epistemology 3, 12, 19, 73, 79, 112, 128–148, 149, 185, 186 Epsilon 43 ethics 12, 14–15, 16, 19, 26, 30, 31, 40, 41, 64, 73, 99, 128, 144, 151, 163, 165–183, 186 ethnography 78, 189, 190, 191 European Union 31, 38, 45, 49, 58, 59, 70, 157, 168, 173, 178 everyware 83 exhaustive 13, 27, 28, 68, 72–73, 79, 83, 88, 100, 110, 118, 133–134, 140, 150, 153, 166, 184 explanation 101, 109, 132, 133, 134, 137, 151 extensionality 67, 78, 140, 184 experiment 2, 3, 6, 34, 75, 78, 118, 129, 131, 137, 146, 150, 160 Facebook 6, 28, 43, 71, 72, 77, 78, 85, 94, 119, 154, 170 facts 3, 4, 9, 10, 52, 140, 159 Fair Information Practice Principles 170–171, 172 false positive 159 Federal Trade Commission (FTC) 45, 173 flexibility 27, 28, 68, 77–78, 79, 86, 140, 157, 184 Flickr 95, 170 Flightradar 107 Floridi, L. 3, 4, 9, 10, 11, 73, 112, 130, 151 Foucault, M. 16, 113, 114, 189 Fourth paradigm 129–139 Franks, B. 6, 111, 154 freedom of information 48 freemium service 60 funding 15, 28, 29, 31, 34, 37, 38, 40, 41, 46, 48, 52, 54–55, 56, 57–58, 59, 60, 61, 65, 67, 75, 119, 143, 189 geographic information systems 147 genealogy 98, 127, 189–190 Gitelman, L. 
2, 19, 20, 21, 22 Global Positioning System (GPS) 58, 59, 73, 85, 88, 90, 121, 154, 169 Google 32, 71, 73, 78, 86, 106, 109, 134, 170 governance 15, 21, 22, 23, 38, 40, 55, 63, 64, 66, 85, 87, 89, 117, 124, 126, 136, 168, 170, 178–182, 186, 187, 189 anticipatory 126, 166, 178–179 technocratic 126, 179–182 governmentality xvi, 15, 23, 25, 40, 87, 115, 127, 168, 185, 191 Gray, J. 129–130 Guardian, The 49 Gurstein, M. 52, 62, 63 hacking 45, 154, 174, 175 hackathon 64–65, 96, 97, 188, 191 Hadoop 87 hardware 32, 34, 40, 63, 78, 83, 84, 124, 143, 160 human resourcing 112, 160–163 hype cycle 67 hypothesis 129, 131, 132, 133, 137, 191 IBM 70, 123, 124, 143, 162, 182 identification 8, 44, 68, 73, 74, 77, 84–85, 87, 90, 92, 115, 169, 171, 172 ideology 4, 14, 25, 61, 113, 126, 128, 130, 134, 140, 144, 185, 190 immutable mobiles 22 independence 3, 19, 20, 24, 100 indexical 4, 8–9, 32, 44, 68, 73–74, 79, 81, 84–85, 88, 91, 98, 115, 150, 156, 167, 184 indicator 13, 62, 76, 102, 127 induction 133, 134, 137, 138, 148 information xvii, 1, 3, 4, 6, 9–12, 13, 23, 26, 31, 33, 42, 44, 45, 48, 53, 67, 70, 74, 75, 77, 92, 93, 94, 95, 96, 100, 101, 104, 105, 109, 110, 119, 125, 130, 138, 140, 151, 154, 158, 161, 168, 169, 171, 174, 175, 184, 192 amplification effect 76 freedom of 48 management 80, 100 overload xvi public sector 48 system 34, 65, 85, 117, 181 visualisation 109 information and communication technologies (ICTs) xvi, 37, 80, 83–84, 92, 93, 123, 124 Innocentive 96, 97 INSPIRE 157 instrumental rationality 181 internet 9, 32, 42, 49, 52, 53, 66, 70, 74, 80, 81, 82, 83, 86, 92, 94, 96, 116, 125, 167 of things xv, xvi, 71, 84, 92, 175 intellectual property rights xvi, 11, 12, 16, 25, 30, 31, 40, 41, 49, 50, 56, 62, 152, 166 Intelius 43, 44 intelligent transportation systems (ITS) 89, 124 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 149, 156–157, 163, 184 interpellation 165, 180, 188 interviews 13, 15, 19, 78, 155, 190 Issenberg, S. 
75, 76, 78, 119 jurisdiction 17, 25, 51, 56, 57, 74, 114, 116 Kafka 180 knowledge xvii, 1, 3, 9–12, 19, 20, 22, 25, 48, 53, 55, 58, 63, 67, 93, 96, 110, 111, 118, 128, 130, 134, 136, 138, 142, 159, 160, 161, 162, 187, 192 contextual 48, 64, 132, 136–137, 143, 144, 187 discovery techniques 77, 138 driven science 139 economy 16, 38, 49 production of 16, 20, 21, 24, 26, 37, 41, 112, 117, 134, 137, 144, 184, 185 pyramid 9–10, 12, situated 16, 20, 28, 135, 137, 189 Latour, B. 22, 133 Lauriault, T.P. 15, 16, 17, 23, 24, 30, 31, 33, 37, 38, 40, 153 law of telecosm 82 legal issues xvi, 1, 23, 25, 30, 31, 115, 165–179, 182, 183, 187, 188 levels of measurement 4, 5 libraries 31, 32, 52, 71, 141, 142 licensing 14, 25, 40, 42, 48, 49, 51, 53, 57, 73, 96, 151 LIDAR 88, 89, 139 linked data xvii, 52–54, 66, 156 longitudinal study 13, 76, 140, 149, 150, 160 Lyon, D. 44, 74, 87, 167, 178, 180 machine learning 5, 6, 101, 102–104, 106, 111, 136, 188 readable 6, 52, 54, 81, 84–85, 90, 92, 98 vision 106 management 62, 88, 117–119, 120, 121, 124, 125, 131, 162, 181 Manovich, L. 141, 146, 152, 155 Manyika, J. 6, 16, 70, 71, 72, 104, 116, 118, 119, 120, 121, 122, 161 map 5, 22, 24, 34, 48, 54, 56, 73, 85, 88, 93, 96, 106, 107, 109, 115, 143, 144, 147, 154, 155–156, 157, 190 MapReduce 86, 87 marginal cost 11, 32, 57, 58, 59, 66, 151 marketing 8, 44, 58, 73, 117, 119, 120–123, 131, 176 marketisation 56, 61–62, 182 materiality 4, 19, 21, 24, 25, 66, 183, 185, 186, 189, 190 Mattern, S. 137, 181 Mayer-Schonberger, V. 68, 71, 72, 91, 114, 153, 154, 174 measurement 1, 3, 5, 6, 10, 12, 13, 15, 19, 23, 69, 97, 98, 115, 128, 166 metadata xvi, 1, 3, 4, 6, 8–9, 13, 22, 24, 29, 30, 31, 33, 35, 40, 43, 50, 54, 64, 71, 72, 74, 78, 85, 91, 93, 102, 105, 153, 155, 156 methodology 145, 158, 185 middleware 34 military intelligence 71, 116, 175 Miller, H.J. xvi, 27, 100, 101, 103, 104, 138, 139, 159 Minelli, M. 
101, 120, 137, 168, 170, 171, 172, 174, 176 mixed methods 147, 191 mobile apps 78 computing xv, 44, 78, 80, 81, 83, 85, 139 mapping 88 phones 76, 81, 83, 90, 93, 151, 168, 170, 175 storage 85 mode of production 16 model 7, 11, 12, 24, 32, 37, 44, 57, 72, 73, 101, 103, 105, 106, 109, 110–112, 119, 125, 129, 130, 131, 132, 133, 134, 137, 139, 140, 144, 145, 147, 158–159, 166, 181 agent-based model 111, business 30, 54, 57–60, 61, 95, 118, 119, 121 environmental 139, 166 meteorological 72 time-space 73 transportation 7 modernity 3 Moore’s Law 81, moral philosophy 14 Moretti, F. 141–142 museum 31, 32, 137 NASA 7 National Archives and Records Administration (NARA) 67 National Security Agency (NSA) 45, 116 natural language processing 104, 105 near-field communication 89, 91 neoliberalism 56, 61–62, 126, 182 neural networks 104, 105, 111 New Public Management 62, non-governmental organisations xvi, 43, 55, 56, 73, 117 non-excludable 11, 151 non-rivalrous 11, 57, 151 normality 100, 101 normative thinking 12, 15, 19, 66, 99, 127, 144, 182, 183, 187, 192 Obama, B. 
53, 75–76, 78, 118–119 objectivity 2, 17, 19, 20, 62, 135, 146, 185 observant participation 191 oligopticon 133, 167, 180 ontology 3, 12, 17–21, 22, 28, 54, 79, 128, 138, 150, 156, 177, 178, 184, 185 open data xv, xvi, xvii, 2, 12, 16, 21, 25, 48–66, 97, 114, 124, 128, 129, 140, 149, 151, 163, 164, 167, 186, 187, 188, 190, 191, 192 critique of 61–66 economics of 57–60 rationale 54–56 Open Definition 50 OpenGovData 50, 51 Open Knowledge Foundation 49, 52, 55, 58, 189, 190 open science 48, 72, 98 source 48, 56, 60, 87, 96 OpenStreetMap 73, 93, 96, 154, 155–156 optimisation 101, 104, 110–112, 120, 121, 122, 123 Ordnance Survey 54, 57 Organization for Economic Cooperation and Development (OECD) 49, 50, 59 overlearning 158, 159 panoptic 133, 167, 180 paradigm 112, 128–129, 130, 138, 147, 148, 186 participant observation 190, 191 participation 48, 49, 55, 66, 82, 94, 95, 96, 97–98, 126, 155, 165, 180 passport 8, 45, 84, 87, 88, 115 patent 13, 16, 41, 51 pattern recognition 101, 104–106, 134, 135 personally identifiable information 171 philanthropy 32, 38, 58 philosophy of science 112, 128–148, 185–188 phishing 174, 175 phone hacking 45 photography 6, 43, 71, 72, 74, 77, 86, 87, 88, 93, 94, 95, 105, 115, 116, 141, 155, 170 policing 80, 88, 116, 124, 125, 179 political economy xvi, 15–16, 25, 42–45, 182, 185, 188, 191 Pollock, R. 
49, 54, 56, 57 58, 59 positivism 129, 136–137, 140, 141, 144, 145, 147 post-positivism 140, 144, 147 positionality 135, 190 power/knowledge 16, 22 predictive modelling 4, 7, 12, 34, 44, 45, 76, 101, 103, 104, 110–112, 118, 119, 120, 125, 132, 140, 147, 168, 179 profiling 110–112, 175–178, 179, 180 prescription 101 pre-analytical 2, 3, 19, 20, 185 pre-analytics 101–102, 112 pre-factual 3, 4, 19, 185 PRISM 45, 116 privacy 15, 28, 30, 40, 45, 51, 57, 63, 64, 96, 117, 163, 165, 166, 168–174, 175, 178, 182, 187 privacy by design 45, 173, 174 probability 14, 110, 153, 158 productivity xvi, 16, 39, 55, 66, 92, 114, 118 profiling 12, 42–45, 74, 75, 110–112, 119, 166, 168, 175–178, 179, 180, 187 propriety rights 48, 49, 54, 57, 62 prosumption 93 public good 4, 12, 16, 42, 52, 56, 58, 79, 97 –private partnerships 56, 59 sector information (PSI) 12, 48, 54, 56, 59, 61, 62 quantified self 95 redlining 176, 182 reductionism 73, 136, 140, 142, 143, 145 regression 102, 104, 105, 110, 111, 122 regulation xvi, 15, 16, 23, 25, 40, 44, 46, 83, 85, 87, 89–90, 114, 115, 123, 124, 126, 168, 174, 178, 180, 181–182, 187, 192 research design 7, 13, 14, 77–78, 98, 137–138, 153, 158 Renaissance xvi, 129, 141 repository 29, 33, 34, 41 representativeness 13, 14, 19, 21 Resource Description Framework (RDF) 53, 54 remote sensing 73–74, 105 RFID 74, 85, 90, 91, 169 rhetorical 3, 4, 185 right to be forgotten 45, 172, 187 information (RTI) 48, 62 risk 16, 44, 58, 63, 118, 120, 123, 132, 158, 174, 176–177, 178, 179, 180 Rosenberg, D. 1, 3 Ruppert, E. 
22, 112, 157, 163, 187 sampling 13, 14, 27, 28, 46, 68, 72, 73, 77, 78, 88, 100, 101, 102, 120, 126, 133, 138, 139, 146, 149–150, 152, 153, 154, 156, 159 scale of economy 37 scanners 6, 25, 29, 32, 83, 85, 88, 89, 90, 91, 92, 175, 177, 180 science xvi, 1, 2, 3, 19, 20, 29, 31, 34, 37, 46, 65, 67, 71, 72, 73, 78, 79, 97, 98, 100, 101, 103, 111, 112, 128–139, 140, 147, 148, 150, 158, 161, 165, 166, 181, 184, 186 scientific method 129, 130, 133, 134, 136, 137–138, 140, 147, 148, 186 security data 28, 33, 34, 40, 45, 46, 51, 57, 126, 157, 166, 169, 171, 173, 174–175, 182, 187 national 42, 71, 88, 116–117, 172, 176, 178, 179 private 99, 115, 118, 151 social 8, 32, 45, 87, 115, 171 segmentation 104, 105, 110, 119, 120, 121, 122, 176 semantic information 9, 10, 11, 105, 157 Web 49, 52, 53, 66 sensors xv, 6, 7, 19, 20, 24, 25, 28, 34, 71, 76, 83, 84, 91–92, 95, 124, 139, 150, 160 sentiment analysis 105, 106, 121, Siegel, E. 103, 110, 111, 114, 120, 132, 158, 176, 179 signal 9, 151, 159 Silver, N. 136, 151, 158 simulation 4, 32, 37, 101, 104, 110–112, 119, 129, 133, 137, 139, 140 skills 37, 48, 52, 53, 57, 63, 94, 97, 98, 112, 149, 160–163, 164 small data 21, 27–47, 68, 72, 75, 76, 77, 79, 100, 103, 110, 112, 146, 147, 148, 150, 156, 160, 166, 184, 186, 188, 191 smart cards 90 cities 91, 92, 99, 124–125, 181–182 devices 83 metering 89, 123, 174 phones 81, 82, 83, 84, 90, 94, 107, 121, 155, 170, 174 SmartSantander 91 social computing xvi determinism 144 media xv, 13, 42, 43, 76, 78, 90, 93, 94–95, 96, 105, 119, 121, 140, 150, 151, 152, 154, 155, 160, 167, 176, 180 physics 144 security number 8, 32, 45, 87, 115, 171 sorting 126, 166, 168, 175–178, 182 sociotechnical systems 21–24, 47, 66, 183, 185, 188 software 6, 20, 32, 34, 40, 48, 53, 54, 56, 63, 80, 83, 84, 86, 88, 96, 132, 143, 160, 161, 163, 166, 170, 172, 175, 177, 180, 189 Solove, D. 
116, 120, 168, 169, 170, 172, 176, 178, 180 solutionism 181 sousveillance 95–96 spatial autocorrelation 146 data infrastructure 34, 35, 38 processes 136, 144 resolution 149 statistics 110 video 88 spatiality 17, 157 Star, S.L. 19, 20, 23, 24 stationarity 100 statistical agencies 8, 30, 34, 35, 115 geography 17, 74, 157 statistics 4, 8, 13, 14, 24, 48, 77, 100, 101, 102, 104, 105, 109–110, 111, 129, 132, 134, 135, 136, 140, 142, 143, 145, 147, 159 descriptive 4, 106, 109, 147 inferential 4, 110, 147 non-parametric 105, 110 parametric 105, 110 probablistic 110 radical 147 spatial 110 storage 31–32, 68, 72, 73, 78, 80, 85–87, 88, 100, 118, 161, 171 analogue 85, 86 digital 85–87 media 20, 86 store loyalty cards 42, 45, 165 Sunlight Foundation 49 supervised learning 103 Supply Chain Management (SCM) 74, 99, 117–118, 119, 120, 121 surveillance 15, 71, 80, 83, 87–90, 95, 115, 116, 117, 123, 124, 151, 165, 167, 168, 169, 180 survey 6, 17, 19, 22, 28, 42, 68, 75, 77, 87, 115, 120 sustainability 16, 33, 34, 57, 58, 59, 61, 64–66, 87, 114, 123–124, 126, 155 synchronicity 14, 95, 102 technological handshake 84, 153 lock-in 166, 179–182 temporality 17, 21, 27, 28, 32, 37, 68, 75, 111, 114, 157, 160, 186 terrorism 116, 165, 179 territory 16, 38, 74, 85, 167 Tesco 71, 120 Thrift, N. 
83, 113, 133, 167, 176 TopCoder 96 trading funds 54–55, 56, 57 transparency 19, 38, 44, 45, 48–49, 55, 61, 62, 63, 113, 115, 117, 118, 121, 126, 165, 173, 178, 180 trust 8, 30, 33, 34, 40, 44, 55, 84, 117, 152–156, 163, 175 trusted digital repository 33–34 Twitter 6, 71, 78, 94, 106, 107, 133, 143, 144, 146, 152, 154, 155, 170 uncertainty 10, 13, 14, 100, 102, 110, 156, 158 uneven development 16 Uniform Resource Identifiers (URIs) 53, 54 United Nations Development Programme (UNDP) 49 universalism 20, 23, 133, 140, 144, 154, 190 unsupervised learning 103 utility 1, 28, 53, 54, 55, 61, 63, 64–66, 100, 101, 114, 115, 134, 147, 163, 185 venture capital 25, 59 video 6, 43, 71, 74, 77, 83, 88, 90, 93, 94, 106, 141, 146, 170 visual analytics 106–109 visualisation 5, 10, 34, 77, 101, 102, 104, 106–109, 112, 125, 132, 141, 143 Walmart 28, 71, 99, 120 Web 2.0 81, 94–95 Weinberger, D. 9, 10, 11, 96, 97, 132, 133 White House 48 Wikipedia 93, 96, 106, 107, 143, 154, 155 Wired 69, 130 wisdom 9–12, 114, 161 XML 6, 53 Zikopoulos, P.C. 6, 16, 68, 70, 73, 76, 119, 151
Data Action: Using Data for Public Good
by
Sarah Williams
Published 14 Sep 2020
Longley, “The Geography of Twitter Topics in London,” Computers, Environment and Urban Systems 58 (2016): 85–96; Bernd Resch et al., “Citizen-Centric Urban Planning through Extracting Emotion Information from Twitter in an Interdisciplinary Space-Time-Linguistics Algorithm,” Urban Planning 1, no. 2 (2016): 114–127, https://doi.org/10.17645/up.v1i2.617. 42 Anna Kovacs-Gyori et al., “#London2012: Towards Citizen-Contributed Urban Planning Through Sentiment Analysis of Twitter Data,” Urban Planning 3, no. 1 (2018): 75–99, https://doi.org/10.17645/up.v3i1.1287; Ayelet Gal-Tzur et al., “The Potential of Social Media in Delivering Transport Policy Goals,” Transport Policy 32 (2014): 115–123, https://doi.org/10.1016/j.tranpol.2014.01.007. 43 Gary W. Evans, Environmental Stress (Cambridge: Cambridge University Press, 1984). 44 Luca Maria Aiello, Rossano Schifanella, Danielle Quercia, and Francesco Aietta, “Chatty Maps: Constructing Sound Maps of Urban Areas from Social Media Data,” Royal Society Open Science 3, no. 3 (March 1, 2016): 150690, https://doi.org/10.1098/rsos.150690. 45 Ibid. 46 Daniele Quercia et al., “Smelly Maps: The Digital Life of Urban Smellscapes,” ArXiv:1505.06851 [Cs.SI], May 26, 2015, http://arxiv.org/abs/1505.06851. 47 Aiello et al., “Chatty Maps.” 48 Anselin and Williams, “Digital Neighborhoods.” 49 Justin Cranshaw et al., “The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City,” Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, 2012. 50 T.
…
Electronic Journal of Information Systems in Developing Countries 25, no. 1 (June 2006): 1–12. https://doi.org/10.1002/j.1681-4835.2006.tb00169.x. Kovacs-Gyori, Anna, Alina Ristea, Clemens Havas, Bernd Resch, and Pablo Cabrera-Barona. “#London2012: Towards Citizen-Contributed Urban Planning Through Sentiment Analysis of Twitter Data.” Urban Planning 3, no. 1 (2018): 75–99. https://doi.org/10.17645/up.v3i1.1287. Krambeck, Holly. Interview with Holly Krambeck, transportation specialist, World Bank. Phone call, August 2018. Krambeck, Holly. “The Open Transport Partnership.” presented at the Transforming Transportation, Washington, DC.
AI in Museums: Reflections, Perspectives and Applications
by
Sonja Thiel
and
Johannes C. Bernhardt
Published 31 Dec 2023
Another source of data used by museums to identify potential visitors or analyse the quality of an exhibit are assessments of social media posts or tourist website ratings. These analyses can inform strategic decisions within organizations, particularly around communication and operational activities. As reported by French and Villaespesa (2019), some museums are already assessing comments posted on platforms such as TripAdvisor using sentiment analysis techniques and topic modelling. These techniques, which can be grouped under the category of AI techniques, enable museums to analyse feedback from thousands of visitors and provide insights on how to improve exhibits, visitor experience, orientation in the museum, and their communication of the events organized.
…
Ana Müller, Michael Schiffmann, Anke Neumeister, Anja Richert: Exploring Beyond the Exhibits
This is nonetheless particularly challenging because of varying user expectations, and some users may attempt to test the intelligence and capabilities of the system, such as its ability to understand natural language, process text, and perform specific tasks like counting or sentiment analysis, driven by motivations other than a need for information and service, such as curiosity about the system’s abilities. Besides the challenges posed by producing knowledge and acknowledging the diversity and varying demands of different users, additional phenomena of human-robot interaction must also be addressed and recognized from the perspectives of various users and stakeholders.
The Age of Surveillance Capitalism
by
Shoshana Zuboff
Published 15 Jan 2019
The rendition of “personality” was an important milestone in this quest: a frontier, yes, but not the final frontier. III. Machine Emotion In 2015 an eight-year-old startup named Realeyes won a 3.6 million euro grant from the European Commission for a project code-named “SEWA: Automatic Sentiment Analysis in the Wild.” The aim was “to develop automated technology that will be able to read a person’s emotion when they view content and then establish how this relates to how much they liked the content.” The director of video at AOL International called the project “a huge leap forward in video ad tech” and “the Holy Grail of video marketing.”86 Just a year later, Realeyes won the commission’s Horizon 2020 innovation prize thanks to its “machine learning-based tools that help market researchers analyze the impact of their advertising and make it more relevant.”87 The SEWA project is a window on a burgeoning new domain of rendition and behavioral surplus supply operations known as “affective computing,” “emotion analytics,” and “sentiment analysis.”
…
The personalization project descends deeper toward the ocean floor with these new tools, where they lay claim to yet a new frontier of rendition trained not only on your personality but also on your emotional life. If this project of surplus from the depths is to succeed, then your unconscious, where feelings form before there are words to express them, must be recast as simply one more source of raw-material supply for machine rendition and analysis, all of it for the sake of more-perfect prediction.
…
Reddy, “A Real Time Facial Emotion Recognition Using Depth Sensor and Interfacing with Second Life Based Virtual 3D Avatar,” in International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014), 2014, 1–7, https://doi.org/10.1109/ICRAIE.2014.6909153. 90. “Sewa Project: Automatic Sentiment Analysis in the Wild,” SEWA, April 25, 2017, https://sewaproject.eu/description. 91. Mihkel Jäätma, “Realeyes—Emotion Measurement,” Realeyes Data Services, 2016, https://www.realeyesit.com/Media/Default/Whitepaper/Realeyes_White paper.pdf. 92. Mihkel Jäätma, “Realeyes—Emotion Measurement.” 93. Alex Browne, “Realeyes—Play Your Audience Emotions to Stay on Top of the Game,” Realeyes, February 21, 2017, https://www.realeyesit.com/blog/play-your-audience-emotions. 94.
Big Data Analytics: Turning Big Data Into Big Money
by
Frank J. Ohlhorst
Published 28 Nov 2012
Such talent is in high demand, and the need for data analysts and data scientists continues to grow at an almost exponential rate. Finding this talent means that organizations will have to focus on data science and hire statistical modelers and text data–mining professionals as well as people who specialize in sentiment analysis. Success with Big Data analytics requires solid data models, statistical predictive models, and test analytic models, since these will be the core applications needed to do Big Data. Locating the appropriate talent takes more than just a typical IT job placement; the skills required for a good return on investment are not simple and are not solely technology oriented.
Relevant Search: With Examples Using Elasticsearch and Solr
by
Doug Turnbull
and
John Berryman
Published 30 Apr 2016
Or they may find 20 duplicates of the same document, which would have the effect of pushing other relevant documents off the end of the search results page. Second, often the existing data can be post-processed to augment the features already there. For instance, machine-learning techniques can be used to classify or cluster documents. Or sentiment analysis can be used to determine whether the text in a document is more positive or negative in tone. The possibilities are endless. After this new metadata is attached to the documents, it can serve as a valuable feature for users to search upon. Finally, new information can be merged into the documents from external sources.
…
relevance-blind enterprise, 2nd relevance-centered enterprise business and domain awareness content curation risk of miscommunication with content curator role of content curator feedback learning to rank paired relevance tuning test-driven relevance using with user behavioral data user-focused culture vs. data-driven culture relevance-focused search application deploying designing combine and balance signals combining and balancing signals defining and modeling signals user experience improving information and requirements gathering business needs required and available information users and information needs law of diminishing returns monitoring requests library reranking rescoring response page retail_analyzer filter retail_syn_filter filter retention reweighting boosts S salient features scale variable scorable units score boost, 2nd score shaping boosting additive, with Boolean queries multiplicative, with function queries, 2nd signals defined filtering Solr strategies for achieving users’ recency goals capturing general-quality metrics combining function queries high-value tiers scored with function queries ignoring TF × IDF modeling boosting signals ranking scored documents scoring tiers, 2nd script scoring, 2nd search content exploring providing to search engine searching document search and retrieval aggregations Boolean search facets filtering Lucene-based search positional and phrase matching ranked results relevance sorting documents inverted index data structure analysis enrichment extraction indexing search antipattern search completion choosing method for from documents being searched from user input via specialized search indexes search engineer search relevance collaboration and curation and defined difficulty of class of search and lack of single solution feedback and gaining skills of relevance engineer information retrieval research into systematic approach for improving search-as-you-type searchable data semantic expansion sentiment analysis 
sentinel tokens, 2nd sharding short-tail application SHOULD clause, 2nd, 3rd, 4th, 5th signal construction signal discordance, 2nd avoiding combining fields into custom all fields mechanics of solving with cross_fields search signal measuring signal modeling best_fields calibrating controlling field preference in results more-precise signals field synchronicity and most_fields, 2nd boosting in when additional matches don’t matter signals boosting, 2nd combining and balancing behavior of signal weights building queries for related signals combining subqueries tuning and testing overall search tuning relevance parameters concept defined defining and modeling implementing source data model silli token similarity simple constants SimpleText data structure, 2nd snippet highlighting Solr analyzers analysis and mapping features building custom field mappings boosting additive, with Boolean queries boosting feature mappings multiplicative, with function queries feedback faceted browsing field collapsing match phrase prefix relevance feedback feature mappings suggestion and highlighting components multifield search all fields cross_fields search ergonomics query differences between Solr and Elasticsearch query feature mappings term-centric and field-centric search with edismax query parser sorting source data model span queries specificity, modeling with paths with synonyms standard analyzer, 2nd, 3rd, 4th standard filter, 2nd standard tokenizer, 2nd, 3rd, 4th, 5th standard_clone analyzer stemming stop filter stop words, 2nd, 3rd stored fields storing metadata string types subdivided text subobjects subquadrants suggest clause suggest endpoint, 2nd suggestion field sum_other_doc_count synonyms augmenting content with modeling specificity with overview, 2nd T term dictionary, 2nd term filter term frequency.
Artificial Intelligence: A Modern Approach
by
Stuart Russell
and
Peter Norvig
Published 14 Jul 2019
RNNs can also be used for sentence-level (or document-level) classification tasks, in which a single output comes at the end, rather than having a stream of outputs, one per time step. In sentiment analysis, for instance, the goal is to classify a text as having either Positive or Negative sentiment. For example, “This movie was poorly written and poorly acted” should be classified as Negative. (Some sentiment analysis schemes use more than two categories, or use a numeric scalar value.) Using RNNs for a sentence-level task is a bit more complex, since we need to obtain an aggregate whole-sentence representation, y, from the per-word outputs y_t of the RNN.
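The aggregation step described in this excerpt can be sketched in a few lines of Python. This is only an illustration, not the book's implementation: the per-word outputs, the weights, and the bias below are made-up numbers, and mean pooling and last-state selection are shown as two common ways (assumed here, not stated in the excerpt) to turn the per-word outputs y_t into a single sentence vector y.

```python
import math

def mean_pool(outputs):
    """Average the per-word output vectors y_t into one sentence vector y."""
    steps = len(outputs)
    dim = len(outputs[0])
    return [sum(y_t[i] for y_t in outputs) / steps for i in range(dim)]

def last_state(outputs):
    """Alternative aggregate: use only the output at the final time step."""
    return list(outputs[-1])

def classify(sentence_vec, weights, bias):
    """Logistic layer over the sentence vector; returns (label, P(Positive))."""
    score = sum(w * x for w, x in zip(weights, sentence_vec)) + bias
    prob_positive = 1.0 / (1.0 + math.exp(-score))
    label = "Positive" if prob_positive >= 0.5 else "Negative"
    return label, prob_positive

# Hypothetical per-word RNN outputs for a three-word sentence (not real model output).
per_word_outputs = [[0.2, -0.1], [-0.6, 0.3], [-0.8, 0.5]]
# Hypothetical classifier parameters; a real system would learn these.
weights, bias = [1.5, -1.0], 0.0

y = mean_pool(per_word_outputs)
label, prob = classify(y, weights, bias)
print(label, round(prob, 3))
```

Swapping `mean_pool` for `last_state` changes only how y is built; the classification layer stays the same.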
…
This is known as an n-gram model (from the Greek root gramma meaning “written thing”): a sequence of written symbols of length n is called an n-gram, with special cases “unigram” for 1-gram, “bigram” for 2-gram, and “trigram” for 3-gram. In an n-gram model, the probability of each word is dependent only on the n – 1 previous words; that is: P(w_j | w_{1:j-1}) = P(w_j | w_{j-n+1:j-1}). N-gram models work well for classifying newspaper sections, as well as for other classification tasks such as spam detection (distinguishing spam email from non-spam), sentiment analysis (classifying a movie or product review as positive or negative) and author attribution (Hemingway has a different style and vocabulary than Faulkner or Shakespeare). 24.1.3 Other n-gram models An alternative to an n-gram word model is a character-level model in which the probability of each character is determined by the n – 1 previous characters.
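One way to use n-gram models for sentiment classification, as the excerpt suggests, is to train one model per class and pick the class whose model gives the text the highest probability. The sketch below assumes bigrams, add-one smoothing, equal class priors, and a tiny invented training set; the class name `NGramModel` and the helper names are hypothetical and not taken from the book.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return one n-gram per token, padding the left side with n-1 start markers."""
    padded = ["<s>"] * (n - 1) + tokens
    return [tuple(padded[i:i + n]) for i in range(len(tokens))]

class NGramModel:
    """Word n-gram language model with add-one (Laplace) smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = set()

    def train(self, sentences):
        for sentence in sentences:
            tokens = sentence.lower().split()
            self.vocab.update(tokens)
            for gram in ngrams(tokens, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def log_prob(self, sentence):
        """Sum of log P(word | previous n-1 words) over the sentence."""
        tokens = sentence.lower().split()
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for gram in ngrams(tokens, self.n):
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + vocab_size
            total += math.log(numerator / denominator)
        return total

def classify(text, models):
    """Pick the label whose model assigns the text the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))

# Invented toy training data, for illustration only.
models = {"Positive": NGramModel(2), "Negative": NGramModel(2)}
models["Positive"].train(["a great movie", "great acting and a great script"])
models["Negative"].train(["a poorly written movie", "poorly acted and poorly written"])

print(classify("poorly written and poorly acted", models))
```

With realistic amounts of data, unequal class priors would normally be added to the score as log P(class).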
…
Recently we have seen other test sets, such as the AI2 ARC test set of basic science questions (Clark et al., 2018).
Summary
The main points of this chapter are as follows:
• Probabilistic language models based on n-grams recover a surprising amount of information about a language. They can perform well on such diverse tasks as language identification, spelling correction, sentiment analysis, genre classification, and named-entity recognition.
• These language models can have millions of features, so preprocessing and smoothing the data to reduce noise is important.
• In building a statistical language system, it is best to devise a model that can make good use of available data, even if the model seems overly simplistic.
Graph Databases
by
Ian Robinson
,
Jim Webber
and
Emil Eifrem
Published 13 Jun 2013
Queries are primarily graph local, in that they start with one or more identifiable subjects, whether people or resources, and thereafter discover surrounding portions of the graph. Taken together, social networks and recommendation engines provide key differentiating capabilities in the areas of retail, recruitment, sentiment analysis, search, and knowledge management. Graphs are a good fit for the densely connected data structures germane to each of these areas; storing and querying this data using a graph database allows an application to surface end-user realtime results that reflect recent changes to the data, rather than pre-calculated, stale results.
Keeping Up With the Quants: Your Guide to Understanding and Using Analytics
by
Thomas H. Davenport
and
Jinho Kim
Published 10 Jun 2013
While unstructured prints are an input to the process, the actual analysis to match them up doesn’t use the unstructured images, but rather structured information extracted from them. An example everyone will appreciate is the analysis of text. Let’s consider the now popular approach of social media sentiment analysis. Are tweets, Facebook postings, and other social comments directly analyzed to determine their sentiment? Not really. The text is parsed into words or phrases. Then, those words and phrases are flagged as good or bad. In a simple example, perhaps a “good” word gets a 1, a “bad” word gets a –1, and a “neutral” word gets a 0.
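The word-flagging scheme described in this excerpt (1 for a "good" word, –1 for a "bad" word, 0 for anything else) can be sketched as follows. The lexicon is a small hypothetical example, and the tokenizer is a deliberately simple assumption; real systems typically use much larger word lists and handle negation and phrases.

```python
import re

# Hypothetical mini-lexicon: +1 for "good" words, -1 for "bad" words.
# Any word not listed is treated as neutral (0).
LEXICON = {
    "great": 1, "love": 1, "happy": 1,
    "terrible": -1, "hate": -1, "slow": -1,
}

def score_text(text, lexicon=LEXICON):
    """Parse text into lowercase words and sum their lexicon values."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

post = "I love the camera but the battery is terrible and slow"
print(score_text(post), label(score_text(post)))
```

The point the excerpt makes is visible here: the analysis runs on the structured word scores, not on the raw text itself.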
The Gig Economy: A Critical Introduction
by
Jamie Woodcock
and
Mark Graham
Published 17 Jan 2020
Figure 3(a) The availability of cloudwork Source: https://geonet.oii.ox.ac.uk/blog/mapping-the-availability-of-online-labour-in-2019/
Figure 3(b) The location of cloudworkers on the five largest English-language platforms Source: https://geonet.oii.ox.ac.uk/blog/mapping-the-availability-of-online-labour-in-2019/
Amazon’s Mechanical Turk – the world’s most well-known microwork platform – refers to these tasks as ‘artificial artificial intelligence’. These are tasks that usually rely on a distinctly human ability to interpret things (for instance image recognition or sentiment analysis). These are tasks that might, in theory, be performed by AI, but are cheaper and/or quicker to simply outsource to human workers. For some types of task, it may not be a simple case of humans or artificial intelligence, but rather human microworkers embedded into otherwise automated systems through application programming interfaces (APIs).
Data Mining: Concepts and Techniques
by
Jiawei Han
,
Micheline Kamber
and
Jian Pei
Published 21 Jun 2011
This is followed by deriving patterns within the structured data, and evaluation and interpretation of the output. “High quality” in text mining usually refers to a combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity-relation modeling (i.e., learning relations between named entities). Other examples include multilingual data mining, multidimensional text analysis, contextual text mining, and trust and evolution analysis in text data, as well as text mining applications in security, biomedical literature analysis, online media analysis, and analytical customer relationship management.
…
Cormack [BCC10]; Manning, Raghavan, and Schütze [MRS08]; Grossman and Frieder [GR04]; Baeza-Yates and Ribeiro-Neto [BYRN11]; Zhai [Zha08]; Feldman and Sanger [FS06]; Berry [Ber03]; and Weiss, Indurkhya, Zhang, and Damerau [WIZD04]. Text mining is a fast-developing field with numerous papers published in recent years, covering many topics such as topic models (e.g., Blei and Lafferty [BL09]); sentiment analysis (e.g., Pang and Lee [PL07]); and contextual text mining (e.g., Mei and Zhai [MZ06]). Web mining is another focused theme, with books like Chakrabarti [Cha03a], Liu [Liu06] and Berry [Ber03]. Web mining has substantially improved search engines with a few influential milestone works, such as Brin and Page [BP98]; Kleinberg [Kle99]; Chakrabarti, Dom, Kumar, et al.
…
Uncertainty in Artificial Intelligence (UAI’99) Stockholm, Sweden. (1999), pp. 541–550. [PKZT01] Papadias, D.; Kalnis, P.; Zhang, J.; Tao, Y., Efficient OLAP operations in spatial data warehouses, In: Proc. 2001 Int. Symp. Spatial and Temporal Databases (SSTD’01) Redondo Beach, CA. (July 2001), pp. 443–459. [PL07] Pang, B.; Lee, L., Opinion mining and sentiment analysis, Foundations and Trends in Information Retrieval 2 (2007) 1–135. [Pla98] Platt, J.C., Fast training of support vector machines using sequential minimal optimization, In: (Editors: Schölkopf, B.; Burges, C.J.C.; Smola, A.) Advances in Kernel Methods—Support Vector Learning (1998) MIT Press, Cambridge, MA, pp. 185–208.
Human + Machine: Reimagining Work in the Age of AI
by
Paul R. Daugherty
and
H. James Wilson
Published 15 Jan 2018
Neural networks that convert audio signals to text signals in a variety of languages. Applications include translation, voice command and control, audio transcription, and more. Natural language processing (NLP). A field in which computers process human (natural) languages. Applications include speech recognition, machine translation, and sentiment analysis. AI Applications Component Intelligent agents. Agents that interact with humans via natural language. They can be used to augment human workers working in customer service, human resources, training, and other areas of business to handle FAQ-type inquiries. Collaborative robotics (cobots).
Likewar: The Weaponization of Social Media
by
Peter Warren Singer
and
Emerson T. Brooking
Published 15 Mar 2018
In 2015, China’s official military strategy would put the challenge even more starkly: “War is accelerating its evolution to informatization.” Even the United States, birthplace of the free and open internet, has started to accept netwar as a matter of policy. In 2011, DARPA’s research division, which once created the internet itself, launched the new Social Media in Strategic Communications program to study online sentiment analysis and manipulation. Around the same time, the U.S. military’s Central Command began overseeing Operation Earnest Voice, a several-hundred-million-dollar effort to fight jihadists across the Middle East by distorting Arabic social media conversations. One part of this initiative was the development of an “online persona management service”—essentially sockpuppet software—“to allow one U.S. serviceman or woman to control up to 10 separate identities based all over the world.”
…
Some at these companies believe the next stage is to “hack harassment,” teaching neural networks to understand the flow of online conversation in order to identify trolls and issue them stern warnings before a human moderator needs to get involved. A Google system intended to detect online abuse—not just profanity, but toxic phrases and veiled hostility—has learned to rate sentences on an “attack scale” of 1 to 100. Its conclusions align with those of human moderators about 90 percent of the time. Such neural network–based sentiment analysis can be applied not just to individual conversations, but to the combined activity of every social media user on a platform. In 2017, Facebook began testing an algorithm intended to identify users who were depressed and at risk for suicide. It used pattern recognition to monitor user posts, tagging those suspected to include thoughts of suicide and forwarding them to its content moderation teams.
Humans Need Not Apply: A Guide to Wealth and Work in the Age of Artificial Intelligence
by
Jerry Kaplan
Published 3 Aug 2015
One of my favorite examples is that the number of prepaid cell phone cards purchased is an indicator of the size of certain crops in Africa, because the individual farmers, watching their crops grow, are preparing to contact potential buyers. The more optimistic they are, the more they spend on talk minutes. The latest foray in this arena uses what’s called “sentiment analysis.” Yes, that kind of sentiment— programs at investment banks scour the Internet for positive or negative comments about products and companies, then trade on the information. The typical justification proffered for doing all this is that HFT programs are providing a service to society. They are simply cleaning up inefficiencies in the markets.
The Creative Curve: How to Develop the Right Idea, at the Right Time
by
Allen Gannett
Published 11 Jun 2018
Here is what Vonnegut said: There is no reason why the simple shapes of stories can’t be fed into computers, they are beautiful shapes. Well, could the shapes of stories be fed into computers? Was there any way of proving that these stories possessed recurring patterns? The researcher soon brought together a team of academic superheroes, experts in sentiment analysis, statistics, and computer science. Based at the University of Vermont, they decided to use some of the latest data analysis tools to see if they could uncover patterns in the emotional arcs of stories, just as Vonnegut had suggested. To this end, the researchers downloaded novels from an online database, which also featured public statistics showing how many copies had been downloaded, which allowed the researchers to understand which books were the most popular.
The Formula: How Algorithms Solve All Our Problems-And Create More
by
Luke Dormehl
Published 4 Nov 2014
“If you spend a lot of time blogging it suggests that you’re not quite as good a programmer as someone who spends their time on Quora,” Ming says, referring to the question-and-answer website founded by two former Facebook employees. Even Twitter feeds are mined for their insights, using semantic and sentiment analysis. At the end, factors are combined to give prospective employees a “Gild Score” out of 100. “It’s very cool if you’re geeky about algorithms, but the really important take-away is that what we end up with is truly independent dimensions for describing people out in the world,” she says. “We’re talking about algorithms whose entire intent and purpose is to aggregate across your entire life to build up a very accurate representation of who you are.”
Mining of Massive Datasets
by
Jure Leskovec
,
Anand Rajaraman
and
Jeffrey David Ullman
Published 13 Nov 2014
For example, we might want to estimate the number of stars that would be assigned to reviews or tweets about a product, even if those reviews do not have star ratings. If we use star-labeled reviews as a training set, we can deduce the words that are most commonly associated with positive and negative reviews (called sentiment analysis). The presence of these words in other reviews can tell us the sentiment of those reviews. 12.1.5 Exercises for Section 12.1 EXERCISE 12.1.1 Redo Example 12.2 for the following different forms of f(x). (a) Require f(x) = ax; i.e., a straight line through the origin. Is the line that we discussed in the example optimal?
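The idea in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the book's method: the training reviews, the star thresholds, and the function names are all assumptions made up for the example. It counts how often each word appears in high-star versus low-star reviews, turns the counts into a smoothed log ratio, and scores unlabeled text by summing the ratios of its words.

```python
import math
import re
from collections import Counter

# Hypothetical star-labeled training reviews (invented for illustration).
TRAINING = [
    ("great phone, love the screen", 5),
    ("love it, great battery", 4),
    ("terrible battery, awful screen", 1),
    ("awful phone, terrible support", 2),
]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_scores(reviews, positive_min=4, negative_max=2):
    """Score each word by how strongly it is tied to high or low star ratings.

    The thresholds are assumptions: reviews with stars >= positive_min count
    as positive, stars <= negative_max as negative, the rest are ignored.
    """
    pos, neg = Counter(), Counter()
    for text, stars in reviews:
        if stars >= positive_min:
            pos.update(tokenize(text))
        elif stars <= negative_max:
            neg.update(tokenize(text))
    vocabulary = set(pos) | set(neg)
    # Add-one smoothing keeps the ratio finite for words seen on one side only.
    return {w: math.log((pos[w] + 1) / (neg[w] + 1)) for w in vocabulary}


def sentiment(text, scores):
    """Sum the learned word scores; unknown words contribute nothing."""
    return sum(scores.get(w, 0.0) for w in tokenize(text))


scores = word_scores(TRAINING)
```

With such a tiny training set, neutral words that happen to appear only in positive reviews also pick up a positive score, so a realistic version would need far more data and probably a stop-word list.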
The Facebook era: tapping online social networks to build better products, reach new audiences, and sell more stuff
by
Clara Shih
Published 30 Apr 2009
Lexicon tracks the frequency and sentiment of a particular keyword (like your brand) across Facebook wall posts, status messages, and comments (see Figure 8.4).
Figure 8.4: Lexicon is a keyword frequency and sentiment analysis tool that looks across all Facebook wall posts, status messages, and comments.
Lexicon measures frequency in three ways: the number of posts in which the keyword appears, the number of Facebook members who have referenced the keyword in a post, and what percentage of all members reference this keyword.
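The three frequency measures described above could be computed roughly as follows. This is only a sketch under stated assumptions: the `(member_id, text)` post format, the case-insensitive substring match, and the function name are hypothetical and say nothing about how Lexicon itself was implemented.

```python
def keyword_frequency(posts, keyword, total_members):
    """Return (post_count, member_count, member_percentage) for a keyword.

    posts is an iterable of (member_id, text) pairs (an assumed format).
    A post matches if it contains the keyword, ignoring case.
    """
    keyword = keyword.lower()
    matching = [(member, text) for member, text in posts if keyword in text.lower()]
    post_count = len(matching)                          # posts mentioning the keyword
    member_count = len({member for member, _ in matching})  # distinct members
    percentage = 100.0 * member_count / total_members if total_members else 0.0
    return post_count, member_count, percentage


# Invented sample data for illustration.
sample_posts = [
    ("a", "I love Acme"),
    ("a", "acme again"),
    ("b", "nothing here"),
    ("c", "ACME rocks"),
]
result = keyword_frequency(sample_posts, "acme", total_members=10)
```

Keeping the post count and the member count separate matters because one very active member can inflate the first without changing the second.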
The Four: How Amazon, Apple, Facebook, and Google Divided and Conquered the World
by
Scott Galloway
Published 2 Oct 2017
Facebook analyzes any resulting behavioral changes on the network whenever a customer switches his or her relationship information. As the following graph shows, single people communicate more on Facebook. It’s part of the preening of courtship. But once they enter a relationship, communication plummets. The Facebook machine tracks this and runs it through a process called “sentiment analysis”—categorizing positive and negative opinions, in words and photos, of each person’s level of happiness. And as you might expect, coupling significantly increases happiness (though there appears to be a dip following the initial euphoria).13 Meyer, Robinson. “When You Fall in Love This Is What Facebook Sees.”
Pax Technica: How the Internet of Things May Set Us Free or Lock Us Up
by
Philip N. Howard
Published 27 Apr 2015
“Entering the new century,” he wrote recently, “whoever controls the internet, especially micro-blog resources, will have the right to control opinions.”44 The Party is aware that political conversations over social media have real-world consequences and can provide a metric of public opinion. Senior officials get exclusive access to social media sentiment analysis through the Party’s media research team. One Chinese pollster blames a 10 percent drop in confidence in the Party on the rapid spread of microblogs.45 When moderates and ideologues are given equal access to digital media, people tend to use social media to marginalize extremism, hate speech, and radical ideas.
AIQ: How People and Machines Are Smarter Together
by
Nick Polson
and
James Scott
Published 14 May 2018
As a result, by the 2000s, speech recognition again hit a plateau, at about 75–80% word-level accuracy. For nearly a decade, progress was discouragingly slow—and not just for speech recognition but also for other tasks in natural language processing that were hampered by a lack of data, from machine translation to sentiment analysis. Post 2010: The Natural Language Revolution Around 2010, everything started to change—slowly at first, then at a startling pace. What drove this change was a massive infusion of data. Jorge Luis Borges once wrote a story called “The Library of Babel,” about a library whose books contained all possible works of prose: that is, all possible orderings of the letters of the alphabet and the basic punctuation marks.
A Hacker's Mind: How the Powerful Bend Society's Rules, and How to Bend Them Back
by
Bruce Schneier
Published 7 Feb 2023
This is all a microcosm of the hacks that will be described in chapters to come. 19 Hacking Computerized Financial Exchanges Today, financial exchanges are all computerized, and the rise of computerization has led to all sorts of novel hacks. Front running, for example, is now much easier to implement and much harder to detect. Tying automated trading to “sentiment analysis”—so that trading programs buy when a stock becomes a meme or sell when bad news goes viral—can make pump-and-dumps and smear campaigns much more profitable. But the most virulent of all modern exchange hacks is high-frequency trading, or HFT. Instead of making use of true, albeit secret, information or disseminating disinformation, HFT exploits public information at lightning speed.
Places of the Heart: The Psychogeography of Everyday Life
by
Colin Ellard
Published 14 May 2015
Theoretically, it’s possible to tag the location at which a tweet occurred with city block precision, but this depends on the privacy settings set by the user of the application. It’s more common for tweets to be coded only to the home city of the tweeter. Nevertheless, the possibilities for using sentiment analysis, or even intention analysis, where text information is mined for clues as to what you plan to do next, will probably play an increasing role in the uses of social media to probe our inner states. With geographical variables added into the mix, this will make available to a wide range of commercial and institutional interests access to the emotional fabric of places.
AI Superpowers: China, Silicon Valley, and the New World Order
by
Kai-Fu Lee
Published 14 Sep 2018
The parents can use this information to enlist a remote tutor through services such as VIPKid, which connects American teachers with Chinese students for online English classes. Remote tutoring has been around for some time, but perception AI now allows these platforms to continuously gather data on student engagement through expression and sentiment analysis. That data continually feeds into a student’s profile, helping the platforms filter for the kinds of teachers that keep students engaged. Almost all of the tools described here already exist, and many are being implemented in different classrooms across China. Taken together, they constitute a new AI-powered paradigm for education, one that merges the online and offline worlds to create a learning experience tailored to the needs and abilities of each student.
Bold: How to Go Big, Create Wealth and Impact the World
by
Peter H. Diamandis
and
Steven Kotler
Published 3 Feb 2015
I used Amazon’s site Mechanical Turk (www.mturk.com) to get those magazine covers analyzed. While MTURK isn’t all that useful for more complicated jobs, it is where to go to get simple, quick tasks done fast. Aggregation and classification jobs tend to be popular uses. Aggregate photographs of red trucks, for example, or write product descriptions, or perform sentiment analysis exercises on thousands of Tweets. Requesters (you) post tasks known as HITs (human intelligence tasks) while workers (called providers) browse among existing tasks and complete them for a monetary payment.16 Another microtask site that I’ve previously relied upon (and with great result) is Fiverr (www.fiverr.com), an online marketplace offering microtasks starting at $5.
Data and the City
by
Rob Kitchin
,
Tracey P. Lauriault
and
Gavin McArdle
Published 2 Aug 2017
The stickiness of social media data resists the operationalization in automatic pipelines for knowledge extraction and manifests itself in false positives that can only be identified and resolved by a close reading of the source. This has consequences for the use of big data in urban governance, urban operation centres and predictive policing – applications that often rely on decontextualized data and reductive modes of analysis, such as text mining based on trigger words or dictionary-based sentiment analysis. Ignoring stickiness of context can lead to cases where a terrorism suspect identified by unsupervised text analysis turns out to be the journalist who reported on the issue (Currier et al. 2015). In this sense, stickiness points to issues of privacy even within the realm of publicly accessible data sources.
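A toy example may make the false-positive problem concrete. Everything below is invented for illustration: the trigger words, the sentiment lexicon, and the two posts are assumptions, and real systems are far larger. The sketch shows how context-free matching flags a journalist who is reporting on an incident, and how a dictionary-based score reads a sarcastic complaint as positive.

```python
import re

# Hypothetical trigger words and sentiment dictionary.
TRIGGER_WORDS = {"bomb", "attack"}
LEXICON = {"great": 1, "love": 1, "awful": -1, "terrible": -1}

# Invented posts.
posts = {
    "journalist": "Reporting from the scene: police say the bomb threat was a hoax.",
    "commuter": "Stuck on the train again, great start to the day.",
}


def tokens(text):
    """Lowercase alphabetic tokens, with all surrounding context discarded."""
    return re.findall(r"[a-z]+", text.lower())


def is_flagged(text):
    """Trigger-word matching: flag the post if any trigger word appears."""
    return bool(set(tokens(text)) & TRIGGER_WORDS)


def lexicon_score(text):
    """Dictionary-based sentiment: sum the scores of the words found."""
    return sum(LEXICON.get(word, 0) for word in tokens(text))


flags = {author: is_flagged(text) for author, text in posts.items()}
scores = {author: lexicon_score(text) for author, text in posts.items()}
```

Here the journalist is flagged and the commuter's complaint scores as positive; both errors only become visible on a close reading of the source text, which is the point the excerpt makes.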
IRL: Finding Realness, Meaning, and Belonging in Our Digital Lives
by
Chris Stedman
Published 19 Oct 2020
Yet the “Bernie bro” perception itself is a result of the internet’s problem of proportion. Yes, there are over-the-top and abusive tweets from some white male Sanders supporters. But they’re not representative; computational social scientist Jeff Winchell looked at tweets from the supporters of each 2020 Democratic candidate, using sentiment analysis to determine how many of them were positive or negative, and told Salon’s Keith A. Spencer that “Bernie followers act pretty much the same on Twitter as any other follower.” Yet while by all available evidence they only represent a sliver of his base, the image of toxic Bernie bros has come to form the basis of how many people understand Sanders’s support more broadly.
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
by
Eric Siegel
Published 19 Feb 2013
Eric Gilbert and Karrie Karahalios released the data, code, and models for this research: Eric Gilbert, “Update: Widespread Worry and the Stock Market,” Social.CS.UIUC.EDU, March 13, 2010. http://social.cs.uiuc.edu/people/gilbert/38. Predicting by social media: Sitaram Asur and Bernardo A. Huberman, “Predicting the Future with Social Media,” Cornell University Library, March 29, 2010, arXiv.org, arXiv:1003.5699. http://arxiv.org/abs/1003.5699/. Anshul Mittal and Arpit Goel, “Stock Prediction Using Twitter Sentiment Analysis,” Stanford University Libraries, December 16, 2011. http://cs229.stanford.edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis.pdf. Allison Aubrey, “Happiness: It Really Is Contagious,” NPR News, All Things Considered, December 5, 2008. www.npr.org/templates/story/story.php?
Zero to Sold: How to Start, Run, and Sell a Bootstrapped Business
by
Arvid Kahl
Published 24 Jun 2020
At that point, you should stay in close contact with them and see what they need, and if they find something that helps them. Then, learn how you can enable your own product to do that. Misalignment could be caused by something simple, like the wording of your messaging. For example, do your customers understand the phrase "heuristic-based statistical sentiment analysis," or would "find the tone of a message" be clearer? You don't need to dumb it down, but you also shouldn't overcomplicate it. As an engineer, I feel that I need to be as precise as possible. Customers don't necessarily value this as much as you think. Maybe your product is confusing. Your customers don't want to be confused.
Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think
by
James Vlahos
Published 1 Mar 2019
Advances in natural-language processing “could lead to superficially convincing conversations between robots and children in the near future,” the Sharkeys wrote. But there is a vast gulf between “superficially convincing” responses and those of a good human caregiver capable of true understanding and compassion. Affective computing—sentiment analysis from facial expressions, word choice, and tone—would bolster the quality of interaction but only to a limited degree. “A good carer’s response is based on grasping the cause of emotions rather than simply acting on the emotions displayed,” the Sharkeys wrote. “We should respond differently to a child crying because she has lost her toy than because she has been abused.”
The Data Detective: Ten Easy Rules to Make Sense of Statistics
by
Tim Harford
Published 2 Feb 2021
It’s not just the name and credit card details that you gave to Netflix: it’s everything you ever watched on the streaming service, when you watched it—or stopped watching it—and much else besides. When data like these are opportunistically scraped from cyberspace, they may be skewed in all sorts of awkward ways. If we want to put our finger on the pulse of public opinion, for example, we might run a sentiment analysis algorithm on Twitter rather than going to the expense of commissioning an opinion poll. Twitter can supply every message for analysis, although in practice most researchers use a subset of that vast firehose of data. But even if we analyzed every Twitter message—N = All—we would still learn only what Twitter users think, not what the wider world thinks.
Traffic: Genius, Rivalry, and Delusion in the Billion-Dollar Race to Go Viral
by
Ben Smith
Published 2 May 2023
Americans were talking about only two people: Hillary Clinton and Donald Trump. The conversation about Clinton tended to be negative. And nobody could get enough of Donald Trump. In September, a Facebook employee managing the program called Miller to tell her that Facebook were pausing it to upgrade its sentiment analysis, and promised to get back to delivering the data soon. By then, Donald Trump was running wild on the platform. I was a little surprised, but I assumed they’d stopped sending over the news because they found it a little embarrassing: Facebook, whose young employees in Palo Alto prided themselves on being the home of the liberal populism of Oscar Morales and Barack Obama, had been overrun by the angry baby boomers who loved Donald Trump.
Understanding Sponsored Search: Core Elements of Keyword Advertising
by
Jim Jansen
Published 25 Jul 2011
Sponsored-search analytics. With the increased use of check-in and mobile apps, one would expect to see geo-location-based metrics to measure the increase in foot traffic to brick-and-mortar stores based on sponsored-search advertisements, similar to click-to-call metrics now. Certainly, given the increased availability of consumer data, the future will hold sponsored-search metrics beyond impressions, clicks, and conversions. For example, the increasingly social aspects of Web sites, such as reviews and consumer comments, will likely lead to sentiment-analysis metrics that measure the tone of consumer comments about a brand or ad. This data can potentially affect how quality score is calculated. Already, sponsored-search platforms are offering searchers and consumers the ability to rate ads, so integration of reviews from other sites cannot be far behind.
New Laws of Robotics: Defending Human Expertise in the Age of AI
by
Frank Pasquale
Published 14 May 2020
It is effectively the elevation of an alien, non-human intelligence in a system in which human meaning and communication are crucial to legitimacy.39 If we can actually tell people that there is some way in which changing their faces can reduce their likelihood of being criminal, then big data intervention presages exceptionally intense and granular social control. This double bind—between black box manipulation and intimate control—counsels against further work in the area. The same caution should also govern applications of affective computing. The firm Affectiva prides itself on having some of the best “sentiment analysis” in the world. Using databases of millions of faces, coded for emotions, Affectiva says its AI can read sorrow, joy, disgust, and many other feelings from video of faces. The demand for such analysis is great. Beset by a backlog of manual security clearances, the US military is looking for AI that can flag suspect expressions.40 There are numerous police, security, and military applications for emotion detectors, going all the way back to polygraph lie detectors (which are themselves the subject of significant controversy, and banned in US employment contexts).41 Rana el Kaliouby, Affectiva’s CEO and cofounder, has refused to license its technology to governments, but acknowledges that emotion recognition is a double-edged sword, wherever it is used.
Billionaire, Nerd, Savior, King: Bill Gates and His Quest to Shape Our World
by
Anupreeta Das
Published 12 Aug 2024
GatesNotes is a more casual platform for his musings, book recommendations, and updates on his work and thinking, typically written with a staff member. The posts are written in an accessible and engaging style, often laced with humor. Gates Ventures employees commission polls and surveys, usually called “sentiment analysis” in the public relations industry, to gauge what people wanted to hear from Gates about on GatesNotes. The GatesNotes blog was built to “dimensionalize” him, to give readers a better sense of his personality and make him appear more well-rounded, according to one person who worked on these strategies.
Digital Disconnect: How Capitalism Is Turning the Internet Against Democracy
by
Robert W. McChesney
Published 5 Mar 2013
Now they have better options, and consequently much of the media can get thrown overboard. The profit motive pushes this process into new and dangerous frontiers quickly. Increasingly, research—“persuasion profiling”—determines what types of sales pitches are most effective with each individual, and ads are tailored accordingly. Moreover, researchers are now working on “sentiment analysis,” to see what mood a person is in at a particular moment and what products and sales pitches would be most effective.188 Advertisers are at work developing emotional analysis software so webcams can monitor how one’s face responds to what is on the screen. “One way to persuade internet users to grant access to their images,” The Economist notes, “would be to offer them discounts or subscriptions to websites.”189 Pariser chronicles a range of developments on the horizon, including making machines more “human.”
The New Digital Age: Transforming Nations, Businesses, and Our Lives
by
Eric Schmidt
and
Jared Cohen
Published 22 Apr 2013
This allows them to either block a website altogether (e.g., YouTube in Iran) or process web content through “deep-packet inspection.” With deep-packet inspection, special software allows the router to look inside the packets of data that pass through it and check for forbidden words, among other things (the use of sentiment-analysis software to screen out negative statements about politicians, for example), which it can then block. Neither technique is foolproof; users can access blocked sites with circumvention technologies like proxy servers (which trick the routers) or by using secure https encryption protocols (which enable private Internet communication that, at least in theory, cannot be read by anyone other than your computer and the website you are accessing), and deep-packet inspection rarely catches every instance of banned content.
Lean Analytics: Use Data to Build a Better Startup Faster
by
Alistair Croll
and
Benjamin Yoskovitz
Published 1 Mar 2013
Lean Canvas and relevant metrics
Lean Canvas box: Some relevant metrics
Problem: Respondents who have this need, respondents who are aware of having the need
Solution: Respondents who try the MVP, engagement, churn, most-used/least-used features, people willing to pay
Unique value proposition: Feedback scores, independent ratings, sentiment analysis, customer-worded descriptions, surveys, search, and competitive analysis
Customer segments: How easy it is to find groups of prospects, unique keyword segments, targeted funnel traffic from a particular source
Channels: Leads and customers per channel, viral coefficient and cycle, net promoter score, open rate, affiliate margins, click-through rate, PageRank, message reach
Unfair advantage: Respondents’ understanding of the UVP (Unique Value Proposition), patents, brand equity, barriers to entry, number of new entrants, exclusivity of relationships
Revenue streams: Lifetime customer value, average revenue per user, conversion rate, shopping cart size, click-through rate
Cost structure: Fixed costs, cost of customer acquisition, cost of servicing the nth customer, support costs, keyword costs
Sean Ellis’s Startup Growth Pyramid
Sean Ellis is a well-known entrepreneur and marketer.
Python for Data Analysis
by
Wes McKinney
Published 30 Dec 2011
Even though it may not always be obvious, a large percentage of data sets can be transformed into a structured form that is more suitable for analysis and modeling. If not, it may be possible to extract features from a data set into a structured form. As an example, a collection of news articles could be processed into a word frequency table which could then be used to perform sentiment analysis. Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data. Why Python for Data Analysis? For many people (myself among them), the Python language is easy to fall in love with.
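The news-article example in this excerpt can be sketched as follows. This is a minimal illustration using only the standard library; the articles, the tiny sentiment lexicon, and the function names are assumptions invented here, and the excerpt itself does not prescribe any particular scoring method. In practice the table would more likely live in a pandas DataFrame.

```python
import re
from collections import Counter

# Invented articles and sentiment lexicon, for illustration only.
ARTICLES = {
    "a1": "Markets rally as profits rise. Investors cheer the rally.",
    "a2": "Profits fall and investors worry.",
}
LEXICON = {"rally": 1, "rise": 1, "cheer": 1, "fall": -1, "worry": -1}


def word_frequency_table(articles):
    """Turn free text into a structured table: one row per article,
    one column per vocabulary word, each cell a word count."""
    counts = {
        doc_id: Counter(re.findall(r"[a-z]+", text.lower()))
        for doc_id, text in articles.items()
    }
    vocabulary = sorted(set().union(*counts.values()))
    table = {doc_id: [c[word] for word in vocabulary] for doc_id, c in counts.items()}
    return vocabulary, table


def score_rows(vocabulary, table, lexicon):
    """Score each row as the count-weighted sum of lexicon values."""
    weights = [lexicon.get(word, 0) for word in vocabulary]
    return {
        doc_id: sum(count * weight for count, weight in zip(row, weights))
        for doc_id, row in table.items()
    }


vocabulary, table = word_frequency_table(ARTICLES)
article_scores = score_rows(vocabulary, table, LEXICON)
```

Once the text is in tabular form, the scoring step is ordinary column arithmetic, which is why the structured representation is convenient for analysis.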
Surveillance Valley: The Rise of the Military-Digital Complex
by
Yasha Levine
Published 6 Feb 2018
Started in 2007 and built by Lockheed Martin, the system ultimately grew into a full-fledged operational military prediction machine that had modules ingesting all sorts of open source network data—news wires, blogs, social media and Facebook posts, various Internet chatter, and “other sources of information”—and routing it through “sentiment analysis” in an attempt to predict military conflicts, insurgencies, civil wars, coups, and revolutions.18 DARPA’s ICEWS proved to be a success. Its core technology was spun off into a classified, operational version of the same system called ISPAN and absorbed into the US Strategic Command.19 The dream of building a global computer system that could watch the world and predict the future—it had a long and storied history in military circles.
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
by
Wes McKinney
Published 25 Sep 2017
Even though it may not always be obvious, a large percentage of datasets can be transformed into a structured form that is more suitable for analysis and modeling. If not, it may be possible to extract features from a dataset into a structured form. As an example, a collection of news articles could be processed into a word frequency table, which could then be used to perform sentiment analysis. Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data. 1.2 Why Python for Data Analysis? For many people, the Python programming language has strong appeal. Since its first appearance in 1991, Python has become one of the most popular interpreted programming languages, along with Perl, Ruby, and others.
The Hype Machine: How Social Media Disrupts Our Elections, Our Economy, and Our Health--And How We Must Adapt
by
Sinan Aral
Published 14 Sep 2020
The main goal is to understand, second by second, what’s in a video, what it’s about, its context, feelings, and sentiment, and to compare the presence or absence of these elements to key performance indicators (KPIs) like video view-throughs, retention, drop-off rates, clicks, engagement, brand recognition, and satisfaction. By closing the loop of video production, analytics, optimization, and publishing, VidMob can improve its clients’ return on marketing investment. ACS automatically extracts video metadata and performs sentiment analysis. It uses deep learning and computer vision to identify the emotions, objects, logos, people, and words in videos; it can detect facial expressions like delight, surprise, or disgust. It then analyzes how each of these elements corresponds, for instance, to moments when viewers are dropping off from watching the video, and it recommends (and automates) editing that improves retention.
Beautiful Data: The Stories Behind Elegant Data Solutions
by
Toby Segaran
and
Jeff Hammerbacher
Published 1 Jul 2009
Figure 20-1 shows a couple of pie charts I made demonstrating this particular data mashup. Figure 20-1. Pie charts resulting from a data mashup of SEC industry data and Center for Responsible Politics political contribution data. (See Color Plate 70.) I haven’t even touched things like linking stock prices to sentiment analysis of message boards, trying to tie together genetics and drug data, or determining whether restaurants in low-income neighborhoods are dirtier (according to the health inspector), but this should give you just a small taste of what’s possible when different data sources are connected. Unfortunately, the difficulty of automatically connecting sets ranges from nontrivial to nearly impossible.
Your Computer Is on Fire
by
Thomas S. Mullaney
,
Benjamin Peters
,
Mar Hicks
and
Kavita Philip
Published 9 Mar 2021
These include but are not limited to:
• A variety of what I would describe as “first-order,” more rudimentary, blunt tools that are long-standing and widely adopted, such as keyword ban lists for content and user profiles, URL and content filtering, IP blocking, and other user-identifying mechanisms;13
• More sophisticated automated tools such as hashing technologies used in products like PhotoDNA (used to automate the identification and removal of child sexual exploitation content; other engines based on this same technology do the same with regard to terroristic material, the definitions of which are the province of the system’s owners);14
• Higher-order AI tools and strategies for content moderation and management at scale, examples of which might include:
◦ Sentiment analysis and forecasting tools based on natural language processing that can identify when a comment thread has gone bad or, even more impressive, when it is in danger of doing so;15
◦ AI speech-recognition technology that provides automatic, automated captioning of video content;16
◦ Pixel analysis (to identify, for example, when an image or a video likely contains nudity);17
◦ Machine learning and computer vision-based tools deployed toward a variety of other predictive outcomes (such as judging potential for virality or recognizing and predicting potentially inappropriate content).18
Computer vision was in its infancy when I began my research on commercial content moderation.
Architects of Intelligence
by
Martin Ford
Published 16 Nov 2018
People are pretty good at monitoring the mental states of the people around them, and we know that about 55% of the signals we use are in facial expression and your gestures, while about 38% of the signal we respond to is from tone of voice. So how fast someone is speaking, the pitch, and how much energy is in the voice. Only 7% of the signal is in the text and the actual choice of words that someone uses! Now when you think of the entire industry of sentiment analysis, the multi-billion-dollar industry of people listening to tweets and analyzing text messages and all that, it only accounts for 7% of how humans communicate. What I like to think about what we’re doing here, is trying to capture the other 93% of non-verbal communication. So, back to your questions: about eighteen months ago I started a speech team that looks at these prosodic paralinguistic features.
Seeking SRE: Conversations About Running Production Systems at Scale
by
David N. Blank-Edelman
Published 16 Sep 2018
Automate workflows around “situations,” not individual alerts. Automate ticket categorization based on patterns of behavior. Forecast short-term for service levels and long-term for capacity planning. In addition to these are other existing solutions — for example, for text analysis like spam filtering, sentiment analysis, and information extraction. All of these will hopefully reduce toil and alerts for humans by letting the machine do its job. The Awakening of Applied AI As senior site reliability engineer at my organization, I tend to search for long-term solutions that make the machine do the work for us — the best path to reach durable automation.
Merchants of Truth: The Business of News and the Fight for Facts
by
Jill Abramson
Published 5 Feb 2019
They feared it would give BuzzFeed an unfair advantage, not only in reporting on how voters felt but in keeping that information to themselves and using it, as it did CrowdTangle, to determine what content would perform best with certain audiences. They warned that this might signal the end of the journalistic ideal of objective distance. Despite the flimsy science of “sentiment analysis,” BuzzFeed would read Facebook’s data as an earnest map of American political sentiment and assume that any clues they gleaned reflected truths about the electorate, when what they were actually observing was likely to be a fold in the fabric of Facebook’s own apparatus, a self-fulfilling reflection of its own hand in the conversation.
The Art of SEO
by
Eric Enge
,
Stephan Spencer
,
Jessie Stricchiola
and
Rand Fishkin
Published 7 Mar 2012
Given the nonreporting of many desktop and mobile clients, bitly’s tracking has become a must for those seeking accurate analytics on the pages they share.
Radian6: Probably the best known of the social media monitoring tools, Radian6 is geared toward enterprises and large budgets and has impressive social tracking, sentiment analysis, and reporting capabilities.
Klout: Measures an author’s authority by tracking activity related to many social accounts, including Twitter, Google+, LinkedIn, and others (http://corp.klout.com/blog/2011/08/measuring-klout-on-10-networks/).
BackType: Another fantastic tool for tracking social metrics, which was acquired by Twitter in 2011.