data science

back to index

description: an interdisciplinary field that uses scientific methods, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

428 results

pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline
by Cathy O'Neil and Rachel Schutt
Published 8 Oct 2013

datafication, Datafication employment opportunities in, Data Science Jobs Facebook and, The Current Landscape (with a Little History) Harvard Business Review, The Current Landscape (with a Little History) history of, The Current Landscape (with a Little History)–Data Science Jobs industry vs. academia in, Getting Past the Hype LinkedIn and, The Current Landscape (with a Little History) meta-definition thought experiment, Thought Experiment: Meta-Definition privacy and, Privacy process of, The Data Science Process–A Data Scientist’s Role in This Process RealDirect case study, Case Study: RealDirect–Sample R code scientific method and, A Data Scientist’s Role in This Process scientists, A Data Science Profile–Thought Experiment: Meta-Definition sociology and, Gabriel Tarde teams, A Data Science Profile Venn diagram of, The Current Landscape (with a Little History) data science competitions, Background: Data Science Competitions Kaggle and, A Single Contestant data scientists, A Data Science Profile–Thought Experiment: Meta-Definition as problem solvers, Being Problem Solvers chief, The Life of a Chief Data Scientist defining, A Data Science Profile ethics of, Being an Ethical Data Scientist–Being an Ethical Data Scientist female, On Being a Female Data Scientist hubris and, Being an Ethical Data Scientist–Being an Ethical Data Scientist in academia, In Academia in industry, In Industry next generation of, What Are Next-Gen Data Scientists?–Being Question Askers questioning as, Being Question Askers role of, in data science process, A Data Scientist’s Role in This Process soft skills of, Cultivating Soft Skills data visualization, Data Visualization and Fraud Detection–Data Visualization Exercise at Square, Data Visualization at Square–Data Visualization at Square Before Us is the Salesman’s House (Thorp/Hansen), eBay Transactions and Books–eBay Transactions and Books Cronkite Plaza (Thorp/Rubin/Hansen), Cronkite Plaza distant reading, Franco Moretti fraud and, The Risk Challenge–Detecting suspicious activity using machine learning Hansen, Mark, Data Visualization and Fraud Detection–Goals of These Exhibits history of, Data Visualization History Lives on a Screen (Thorp/Hansen), Project Cascade: Lives on a Screen machine learning and, Data Science and Risk Moveable Type (Rubin/Hansen), New York Times Lobby: Moveable Type–New York Times Lobby: Moveable Type personal data collection thought experiment, Mark’s Thought Experiment Processing programming language, Processing risk and, Data Science and Risk–Ian’s Thought Experiment samples of, A Sample of Data Visualization Projects–Mark’s Data Visualization Projects Shakespeare Machine (Rubin/Hansen), Public Theater Shakespeare Machine sociology and, Gabriel Tarde tutorials for, Data Visualization for the Rest of Us–Data Visualization for the Rest of Us data visualization exercise, Data Visualization Exercise data-generating processes, Statistical Inference DataEDGE, Challenges in features and learning datafication, Why Now?

There was quite a bit of variation, which is cool—lots of people in the class were coming from social sciences, for example. Where is your data science profile at the moment, and where would you like it to be in a few months, or years? As we mentioned earlier, a data science team works best when different skills (profiles) are represented across different people, because nobody is good at everything. It makes us wonder if it might be more worthwhile to define a “data science team”—as shown in Figure 1-3—than to define a data scientist.

Figure 1-3. Data science team profiles can be constructed from data scientist profiles; there should be alignment between the data science team profile and the profile of the data problems they try to solve.

Thought Experiment: Meta-Definition

Every class had at least one thought experiment that the students discussed in groups.

pages: 660 words: 141,595

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking
by Foster Provost and Tom Fawcett
Published 30 Jun 2013

as craft, Superior Data Scientists as strategic asset, Data and Data Science Capability as a Strategic Asset–Data and Data Science Capability as a Strategic Asset baseline methods of, Summary behavior predictions based on past actions, Example: Hurricane Frances Big Data and, Data Processing and “Big Data”–Data Processing and “Big Data” case studies, examining, Examine Data Science Case Studies classification modeling for issues in, Generalizing Beyond Classification cloud labor and, Final Example: From Crowd-Sourcing to Cloud-Sourcing–Final Example: From Crowd-Sourcing to Cloud-Sourcing customer churn, predicting, Example: Predicting Customer Churn data mining about individuals, Privacy, Ethics, and Mining Data About Individuals–Privacy, Ethics, and Mining Data About Individuals data mining and, The Ubiquity of Data Opportunities, Data Mining and Data Science, Revisited–Data Mining and Data Science, Revisited data processing vs., Data Processing and “Big Data”–Data Processing and “Big Data” data science engineers, Deployment data-analytic thinking in, Data-Analytic Thinking–Data-Analytic Thinking data-driven business vs., Data Processing and “Big Data” data-driven decision-making, Data Science, Engineering, and Data-Driven Decision Making–Data Science, Engineering, and Data-Driven Decision Making engineering, Data Science, Engineering, and Data-Driven Decision Making–Data Science, Engineering, and Data-Driven Decision Making engineering and, Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist evolving uses for, From Big Data 1.0 to Big Data 2.0–From Big Data 1.0 to Big Data 2.0 fitting problem to available data, Changing the Way We Think about Solutions to Business Problems–Changing the Way We Think about Solutions to Business Problems fundamental principles, The Ubiquity of Data Opportunities history, Machine Learning and Data Mining human interaction and, What Data Can’t Do: Humans in the Loop, Revisited–What Data Can’t Do: Humans in the Loop, Revisited human knowledge and, What Data Can’t Do: Humans in the Loop, Revisited–What Data Can’t Do: Humans in the Loop, Revisited Hurricane Frances example, Example: Hurricane Frances learning path for, Superior Data Scientists limits of, What Data Can’t Do: Humans in the Loop, Revisited–What Data Can’t Do: Humans in the Loop, Revisited mining mobile device data example, Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data–Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data opportunities for, The Ubiquity of Data Opportunities–The Ubiquity of Data Opportunities principles, Data Science, Engineering, and Data-Driven Decision Making, Business Problems and Data Science Solutions privacy and ethics of, Privacy, Ethics, and Mining Data About Individuals–Privacy, Ethics, and Mining Data About Individuals processes, Data Science, Engineering, and Data-Driven Decision Making software development vs., A Firm’s Data Science Maturity structure, Machine Learning and Data Mining techniques, Data Science, Engineering, and Data-Driven Decision Making technology vs. 
theory of, Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist–Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist understanding, The Ubiquity of Data Opportunities, Data Processing and “Big Data” data science maturity, of firms, A Firm’s Data Science Maturity–A Firm’s Data Science Maturity data scientists academic, Attracting and Nurturing Data Scientists and Their Teams as scientific advisors, Attracting and Nurturing Data Scientists and Their Teams attracting/nurturing, Attracting and Nurturing Data Scientists and Their Teams–Attracting and Nurturing Data Scientists and Their Teams evaluating, Superior Data Scientists–Superior Data Scientists managing, Superior Data Science Management–Superior Data Science Management Data Scientists, LLC, Attracting and Nurturing Data Scientists and Their Teams data sources, Evaluation, Baseline Performance, and Implications for Investments in Data data understanding, Data Understanding–Data Understanding expected value decomposition and, From an Expected Value Decomposition to a Data Science Solution–From an Expected Value Decomposition to a Data Science Solution expected value framework and, The Expected Value Framework: Structuring a More Complicated Business Problem–The Expected Value Framework: Structuring a More Complicated Business Problem data warehousing, Data Warehousing data-analytic thinking, Data-Analytic Thinking–Data-Analytic Thinking and unbalanced classes, Problems with Unbalanced Classes for business strategies, Thinking Data-Analytically, Redux–Thinking Data-Analytically, Redux data-driven business data science vs., Data Processing and “Big Data” understanding, Data Processing and “Big Data” data-driven causal explanations, Data-Driven Causal Explanation and a Viral Marketing Example–Data-Driven Causal Explanation and a Viral Marketing Example data-driven decision-making, Data Science, Engineering, and Data-Driven Decision Making–Data Science, Engineering, and Data-Driven Decision Making benefits, Data Science, Engineering, and Data-Driven Decision Making discoveries, Data Science, Engineering, and Data-Driven Decision Making repetition, Data Science, Engineering, and Data-Driven Decision Making database queries, as analytic technique, Database Querying–Database Querying database tables, Models, Induction, and Prediction dataset entropy, Example: Attribute Selection with Information Gain datasets, Models, Induction, and Prediction analyzing, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation attributes of, Overfitting in Mathematical Functions cross-validation and, From Holdout Evaluation to Cross-Validation limited, From Holdout Evaluation to Cross-Validation Davis, Miles, Example: Jazz Musicians, Example: Jazz Musicians Deanston single malt scotch, Understanding the Results of Clustering decision boundaries, Visualizing Segmentations, Classification via Mathematical Functions decision lines, Visualizing Segmentations decision nodes, Supervised Segmentation with Tree-Structured Models decision stumps, Evaluation, Baseline Performance, and Implications for Investments in Data decision surfaces, Visualizing Segmentations decision trees, Supervised Segmentation with Tree-Structured Models decision-making, automatic, Data Science, Engineering, and Data-Driven Decision Making deduction, induction vs., Models, Induction, and Prediction Dell, Data preparation, Achieving Competitive Advantage with Data Science demand, local, Example: Hurricane 
Frances dendrograms, Hierarchical Clustering, Hierarchical Clustering dependent variables, Models, Induction, and Prediction descriptive attributes, Data Mining and Data Science, Revisited descriptive modeling, Models, Induction, and Prediction Dictionary of Distances (Deza & Deza), * Other Distance Functions differential descriptions, * Using Supervised Learning to Generate Cluster Descriptions Digital 100 companies, Data-Analytic Thinking Dillman, Linda, Data Science, Engineering, and Data-Driven Decision Making dimensionality, of nearest-neighbor reasoning, Dimensionality and domain knowledge–Dimensionality and domain knowledge directed marketing example, Targeting the Best Prospects for a Charity Mailing–A Brief Digression on Selection Bias discoveries, Data Science, Engineering, and Data-Driven Decision Making discrete (binary) classifiers, ROC Graphs and Curves discrete classifiers, ROC Graphs and Curves discretized numeric variables, Selecting Informative Attributes discriminants, linear, Linear Discriminant Functions discriminative modeling methods, generative vs., Summary disorder, measuring, Selecting Informative Attributes display advertising, Example: Targeting Online Consumers With Advertisements distance functions, for nearest-neighbor reasoning, * Other Distance Functions–* Other Distance Functions distance, measuring, Similarity and Distance distribution Gaussian, Regression via Mathematical Functions Normal, Regression via Mathematical Functions distribution of properties, Selecting Informative Attributes Doctor Who (television show), Example: Evidence Lifts from Facebook “Likes” document (term), Representation domain knowledge data mining processes and, Dimensionality and domain knowledge nearest-neighbor reasoning and, Dimensionality and domain knowledge–Dimensionality and domain knowledge domain knowledge validation, Associations Among Facebook Likes domains, in association discovery, Associations Among Facebook Likes Dotcom Boom, Results, Formidable Historical Advantage double counting, Costs and benefits draws, statistical, * Logistic Regression: Some Technical Details E edit distance, * Other Distance Functions, * Other Distance Functions Einstein, Albert, Conclusion Elder Research, Attracting and Nurturing Data Scientists and Their Teams Ellington, Duke, Example: Jazz Musicians, Example: Jazz Musicians email, Why Text Is Important engineering, Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist, Business Understanding engineering problems, business problems vs., Other Data Science Tasks and Techniques ensemble method, Bias, Variance, and Ensemble Methods–Bias, Variance, and Ensemble Methods entropy, Selecting Informative Attributes–Selecting Informative Attributes, Selecting Informative Attributes, Example: Attribute Selection with Information Gain, Summary and Inverse Document Frequency, * The Relationship of IDF to Entropy change in, Selecting Informative Attributes equation for, Selecting Informative Attributes graphs, Example: Attribute Selection with Information Gain equations cosine distance, * Other Distance Functions entropy, Selecting Informative Attributes Euclidean distance, Similarity and Distance general linear model, Linear Discriminant Functions information gain (IG), Selecting Informative Attributes Jaccard distance, * Other Distance Functions L2 norm, * Other Distance Functions log-odds linear function, * Logistic Regression: Some Technical Details logistic function, * Logistic Regression: Some Technical Details 
majority scoring function, * Combining Functions: Calculating Scores from Neighbors majority vote classification, * Combining Functions: Calculating Scores from Neighbors Manhattan distance, * Other Distance Functions similarity-moderated classification, * Combining Functions: Calculating Scores from Neighbors similarity-moderated regression, * Combining Functions: Calculating Scores from Neighbors similarity-moderated scoring, * Combining Functions: Calculating Scores from Neighbors error costs, ROC Graphs and Curves error rates, Plain Accuracy and Its Problems, Error rates errors absolute, Regression via Mathematical Functions computing, Regression via Mathematical Functions false negative vs. false positive, Evaluating Classifiers squared, Regression via Mathematical Functions estimating generalization performance, From Holdout Evaluation to Cross-Validation estimation, frequency based, Probability Estimation ethics of data mining, Privacy, Ethics, and Mining Data About Individuals–Privacy, Ethics, and Mining Data About Individuals Euclid, Similarity and Distance Euclidean distance, Similarity and Distance evaluating models, Decision Analytic Thinking I: What Is a Good Model?

area under ROC curves (AUC), The Area Under the ROC Curve (AUC), Example: Performance Analytics for Churn Modeling, Example: Performance Analytics for Churn Modeling Armstrong, Louis, Example: Jazz Musicians assessing overfitting, Overfitting association discovery, Co-occurrences and Associations: Finding Items That Go Together–Associations Among Facebook Likes among Facebook Likes, Associations Among Facebook Likes–Associations Among Facebook Likes beer and lottery example, Example: Beer and Lottery Tickets–Example: Beer and Lottery Tickets eWatch/eBracelet example, Co-occurrences and Associations: Finding Items That Go Together–Co-occurrences and Associations: Finding Items That Go Together Magnum Opus system for, Associations Among Facebook Likes market basket analysis, Associations Among Facebook Likes–Associations Among Facebook Likes surprisingness, Measuring Surprise: Lift and Leverage–Measuring Surprise: Lift and Leverage AT&T, From an Expected Value Decomposition to a Data Science Solution attribute selection, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation, Selecting Informative Attributes–Supervised Segmentation with Tree-Structured Models, Example: Attribute Selection with Information Gain–Example: Attribute Selection with Information Gain, The Fundamental Concepts of Data Science attributes, Models, Induction, and Prediction finding, Introduction to Predictive Modeling: From Correlation to Supervised Segmentation heterogeneous, Dimensionality and domain knowledge, Heterogeneous Attributes variable features vs., Models, Induction, and Prediction Audubon Society Field Guide to North American Mushrooms, Example: Attribute Selection with Information Gain automatic decision-making, Data Science, Engineering, and Data-Driven Decision Making average customers, profitable customers vs., Answering Business Questions with These Techniques B bag of words approach, Bag of Words bags, Bag of Words base rates, Class Probability Estimation and Logistic “Regression”, Holdout Data and Fitting Graphs, Problems with Unbalanced Classes baseline classifiers, Advantages and Disadvantages of Naive Bayes baseline methods, of data science, Summary Basie, Count, Example: Jazz Musicians Bayes rate, Bias, Variance, and Ensemble Methods Bayes, Thomas, Bayes’ Rule Bayesian methods, Bayes’ Rule, Summary Bayes’ Rule, Bayes’ Rule–A Model of Evidence “Lift” beer and lottery example, Example: Beer and Lottery Tickets–Example: Beer and Lottery Tickets Beethoven, Ludwig van, Example: Evidence Lifts from Facebook “Likes” beginning cross-validation, From Holdout Evaluation to Cross-Validation behavior description, From Business Problems to Data Mining Tasks Being John Malkovich (film), Data Reduction, Latent Information, and Movie Recommendation Bellkors Pragmatic Chaos (Netflix Challenge team), Data Reduction, Latent Information, and Movie Recommendation benefit improvement, calculating, Costs and benefits benefits and underlying profit calculation, ROC Graphs and Curves data-driven decision-making, Data Science, Engineering, and Data-Driven Decision Making estimating, Costs and benefits in budgeting, Ranking Instead of Classifying nearest-neighbor methods, Computational efficiency bi-grams, N-gram Sequences bias errors, ensemble methods and, Bias, Variance, and Ensemble Methods–Bias, Variance, and Ensemble Methods Big Data data science and, Data Processing and “Big Data”–Data Processing and “Big Data” evolution of, From Big Data 1.0 to Big Data 2.0–From Big Data 1.0 to Big Data 
2.0 on Amazon and Google, Thinking Data-Analytically, Redux big data technologies, Data Processing and “Big Data” state of, From Big Data 1.0 to Big Data 2.0 utilizing, Data Processing and “Big Data” Big Red proposal example, Example Data Mining Proposal–Flaws in the Big Red Proposal Bing, Why Text Is Important, Representation Black-Sholes model, Models, Induction, and Prediction blog postings, Why Text Is Important blog posts, Example: Targeting Online Consumers With Advertisements Borders (book retailer), Achieving Competitive Advantage with Data Science breast cancer example, Example: Logistic Regression versus Tree Induction–Example: Logistic Regression versus Tree Induction Brooks, David, What Data Can’t Do: Humans in the Loop, Revisited browser cookies, Example: Targeting Online Consumers With Advertisements Brubeck, Dave, Example: Jazz Musicians Bruichladdich single malt scotch, Understanding the Results of Clustering Brynjolfsson, Erik, Data Science, Engineering, and Data-Driven Decision Making, Data Processing and “Big Data” budget, Ranking Instead of Classifying budget constraints, Profit Curves building modeling labs, From Holdout Evaluation to Cross-Validation building models, Data Mining and Its Results, Business Understanding, From Holdout Evaluation to Cross-Validation Bunnahabhain single malt whiskey, Example: Whiskey Analytics, Hierarchical Clustering business news stories example, Example: Clustering Business News Stories–The news story clusters business problems changing definition of, to fit available data, Changing the Way We Think about Solutions to Business Problems–Changing the Way We Think about Solutions to Business Problems data exploration vs., Stepping Back: Solving a Business Problem Versus Data Exploration–Stepping Back: Solving a Business Problem Versus Data Exploration engineering problems vs., Other Data Science Tasks and Techniques evaluating in a proposal, Be Ready to Evaluate Proposals for Data Science Projects expected value framework, structuring with, The Expected Value Framework: Structuring a More Complicated Business Problem–The Expected Value Framework: Structuring a More Complicated Business Problem exploratory data mining vs., The Fundamental Concepts of Data Science unique context of, What Data Can’t Do: Humans in the Loop, Revisited using expected values to provide framework for, The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces–The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces business strategy, Data Science and Business Strategy–A Firm’s Data Science Maturity accepting creative ideas, Be Ready to Accept Creative Ideas from Any Source case studies, examining, Examine Data Science Case Studies competitive advantages, Achieving Competitive Advantage with Data Science–Achieving Competitive Advantage with Data Science, Sustaining Competitive Advantage with Data Science–Superior Data Science Management data scientists, evaluating, Superior Data Scientists–Superior Data Scientists evaluating proposals, Be Ready to Evaluate Proposals for Data Science Projects–Flaws in the Big Red Proposal historical advantages and, Formidable Historical Advantage intangible collateral assets and, Unique Intangible Collateral Assets intellectual property and, Unique Intellectual Property managing data scientists effectively, Superior Data Science Management–Superior Data Science Management maturity of the data science, A Firm’s Data Science Maturity–A Firm’s Data 
Science Maturity thinking data-analytically for, Thinking Data-Analytically, Redux–Thinking Data-Analytically, Redux C Caesars Entertainment, Data and Data Science Capability as a Strategic Asset call center example, Profiling: Finding Typical Behavior–Profiling: Finding Typical Behavior Capability Maturity Model, A Firm’s Data Science Maturity Capital One, Data and Data Science Capability as a Strategic Asset, From an Expected Value Decomposition to a Data Science Solution Case-Based Reasoning, How Many Neighbors and How Much Influence?

This is facilitated tremendously by strong and deep professional contacts. Data scientists call on each other for help in steering toward the right solutions. The better the professional network, the better the solution. And the best data scientists have the best connections.

Superior Data Science Management

Possibly even more critical to the success of data science in business is good management of the data science team. Good data science managers are especially hard to find. They need to understand the fundamentals of data science well, possibly even being competent data scientists themselves. They also must possess a set of other abilities that are rare in a single individual: they need to truly understand and appreciate the needs of the business.

Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
by Thomas H. Davenport
Published 4 Feb 2014

But if you’re interviewing one from another industry, make sure that the candidate shows interest and demonstrated business problem-solving ability in the industry from which he or she comes.

Horizontal versus Vertical Data Scientists

There are, of course, many types of data scientists. One way to characterize an important set of differences between types has been coined by Vincent Granville, who operates Data Science Central, a social network for data scientists like himself. In a blog post (with some analytical jargon you can toss around at cocktail parties), he described the difference between vertical and horizontal data scientists:

• Vertical data scientists have very deep knowledge in some narrow field.

Where Do You Get Data Scientists?

Data Scientists from Universities

I noted earlier that data scientists today often have advanced degrees in science, but that won’t always be the most efficient way to procure the necessary skills. How soon will there be more direct educational paths to data science? Well, as I write I believe there is no university that has yet issued a degree in data science. But there are: (a) a growing number of courses in the field and (b) a growing number of institutions that are planning data science degree programs.

And as I hinted in chapter 3, firms such as Accenture, Deloitte, and IBM have begun to hire and train data scientists in larger numbers. Predominantly offshore firms such as Mu Sigma, a “math factory” with thousands of quants as employees, are also hiring data scientists in considerable numbers. One data scientist has come up with a creative approach to training new data scientists. The Insight Data Science Fellows Program, started by Jake Klamka (whose academic background is high-energy physics), takes scientists for six weeks and teaches them the skills to be a data scientist. The program includes mentoring by local companies with big data challenges (e.g., Facebook, Twitter, Google, LinkedIn).

Succeeding With AI: How to Make AI Work for Your Business
by Veljko Krunic
Published 29 Mar 2020

Closely related fields that are sometimes considered part of data science include bioinformatics and quantitative analysis. While AI and data science closely overlap, they aren’t identical, because AI includes fields such as robotics, which are traditionally not considered part of data science. Harris, Murphy, and Vaisman’s book [66] provides a good summary of the state of data science before the advancement of deep learning.

Data scientist—A practitioner of the field of data science. Many sources (including this book) classify AI practitioners as data scientists.

Database administrator (DBA)—A professional responsible for the maintenance of a database. Most commonly, a DBA would be responsible for maintaining an RDBMS-based database.

As a manager, you should look for two things when hiring data scientists for your team. You should look for a candidate who has skills in the core domain that your initial AI project is likely to use, but you also need them to have a demonstrated ability to learn new skills. Chances are good that, along the way, your data scientist will need to learn many new methods. When hiring senior data science team members, don’t just look for a strong background in one set of AI methods. Senior data scientists should have a history of solving concrete problems using a diverse set of methods. Data science is a team sport. To completely cover all of the knowledge that’s part of data science, you need a whole team, so you must assemble a team with complementary skillsets.

They were the best decisions you could have made based on what you knew then.

Types of data scientists to hire

Leading experts focused on a narrow class of algorithms are worth their weight in gold if they know how to improve by 0.1% the performance of an algorithm that’s bringing $1 billion per year to your organization. However, if your question is, “What can AI do to help my business?” you’re probably better off with a data scientist who has a command of a wide range of data science methods. A data scientist with that profile has the best chance of finding a use case in which the profit margin would be large and you don’t need that last 0.1% improvement for the use case to be viable.

pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python
by Joel Grus
Published 13 Apr 2015

Data scientist has been called “the sexiest job of the 21st century,” presumably by someone who has never visited a fire station. Nonetheless, data science is a hot and growing field, and it doesn’t take a great deal of sleuthing to find analysts breathlessly prognosticating that over the next 10 years, we’ll need billions and billions more data scientists than we currently have. But what is data science? After all, we can’t produce data scientists if we don’t know what data science is. According to a Venn diagram that is somewhat famous in the industry, data science lies at the intersection of:

• Hacking skills
• Math and statistics knowledge
• Substantive expertise

Although I originally intended to write a book covering all three, I quickly realized that a thorough treatment of “substantive expertise” would require tens of thousands of pages.

From Scratch

There are lots and lots of data science libraries, frameworks, modules, and toolkits that efficiently implement the most common (as well as the least common) data science algorithms and techniques. If you become a data scientist, you will become intimately familiar with NumPy, with scikit-learn, with pandas, and with a panoply of other libraries. They are great for doing data science. But they are also a good way to start doing data science without actually understanding data science. In this book, we will be approaching data science from scratch. That means we’ll be building tools and implementing algorithms by hand in order to better understand them.
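
To make that “from scratch” idea concrete, here is a minimal sketch in the same spirit (not code from the book): a few summary statistics written in plain Python, with no NumPy or pandas. The function names and the tiny dataset are illustrative.

import math
from typing import List

def mean(xs: List[float]) -> float:
    # Average of a list of numbers.
    return sum(xs) / len(xs)

def standard_deviation(xs: List[float]) -> float:
    # Population standard deviation: square root of the mean squared deviation.
    mu = mean(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def correlation(xs: List[float], ys: List[float]) -> float:
    # Pearson correlation, built from the pieces above.
    mu_x, mu_y = mean(xs), mean(ys)
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)
    return cov / (standard_deviation(xs) * standard_deviation(ys))

hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]      # illustrative data
exam_scores = [52.0, 55.0, 61.0, 70.0, 74.0]   # illustrative data
print(mean(exam_scores), standard_deviation(exam_scores), correlation(hours_studied, exam_scores))

A library call would do each of these in one line, but writing them out makes the definitions impossible to gloss over, which is the point of the from-scratch approach.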

Now, before you start feeling too jaded: some data scientists also occasionally use their skills for good — using data to make government more effective, to help the homeless, and to improve public health. But it certainly won’t hurt your career if you like figuring out the best way to get people to click on advertisements.

Motivating Hypothetical: DataSciencester

Congratulations! You’ve just been hired to lead the data science efforts at DataSciencester, the social network for data scientists. Despite being for data scientists, DataSciencester has never actually invested in building its own data science practice. (In fairness, DataSciencester has never really invested in building its product either.)

pages: 296 words: 66,815

The AI-First Company
by Ash Fontana
Published 4 May 2021

Before creating a bunch of chatter between equations, have a single conversation with one equation to see if it answers customers’ questions. We’re not here to build a data science consulting firm, but DLEs start with data science. Most AI models are based on statistical methods. Starting with statistics allows for a smooth transition into AI when there’s enough time and money from customers to build it.

Starting Small: Data Science

Starting with a data scientist solving a well-defined problem saves time and money when compared to starting with a team big enough to solve an amorphous problem with machine learning. Dedicate a data scientist to serve as a consultant to customers and provide personalized, data-driven answers to a single question in order to demonstrate return on investment (ROI).

Linking databases, cleaning data, creating data pipelines, building features, and designing interfaces is a lot of work, but avoidable work. There’s also evidence that starting with data science works on Kaggle, where the largest community of data scientists and ML engineers compete to win prizes for solving problems. The summary is that data science methods get to the Pareto optimal solution (achieving 80 percent of the optimal solution for 20 percent of the work). Often, only the last 20 percent requires data science. Specifically, data science methods such as ensembles of decision trees—whether random forest or gradient boosted—combined with manual feature engineering win most of the competitions on structured data, and neural networks win most of the competitions on unstructured data.
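
As a rough sketch of that recipe, the following pairs one hand-engineered feature with a gradient-boosted tree classifier in scikit-learn; the file name, column names, and the engineered ratio are hypothetical stand-ins rather than anything from the book.

# Sketch of the tabular-data recipe described above: manual feature engineering
# plus a gradient-boosted tree ensemble. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("customers.csv")  # hypothetical structured dataset

# Manual feature engineering: a ratio the raw table does not contain directly.
df["spend_per_visit"] = df["total_spend"] / df["visits"].clip(lower=1)

features = ["total_spend", "visits", "tenure_months", "spend_per_visit"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=0
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

A baseline like this is usually enough to show whether the signal is there before anyone invests in neural networks or heavier infrastructure.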

Hiring sequence:

1. Data analyst (Business; Low; Yes)
2. Data scientist (Statistics; Low; Partially)
3. Data engineer (Databases; Medium; Yes)
4. Machine learning engineer (Computer science; Medium; No)
5. Data product manager (Product management; Medium; No)
6. Data infrastructure engineer (Distributed systems; High; Partially)
7. Machine learning researcher (Machine learning; High; Maybe)

WHERE TO FIND THEM

Starting with statistics means hiring analysts and data scientists before engineers and ML researchers. Essentially, by decoupling data science and software engineering, hiring can focus on data scientists without software engineering experience, thus broadening the pool of candidates to include every discipline in which manipulating data is part of the research process. One can find analysts and data scientists in the fields of economics, econometrics, accounting, actuarial science, biology, biostatistics, geology, geostatistics, epidemiology, demographics, engineering, and physics because these areas require high levels of mathematics and statistics.

pages: 239 words: 70,206

Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else
by Steve Lohr
Published 10 Mar 2015

So, Hammerbacher says, “We decided to mush those two titles together and call them data scientists.” At first, a few PhDs resisted, viewing the change as a loss of title prestige. “But ultimately everyone embraced it, and it took on a life of its own,” he observes. And to him, it seemed natural. “Data science is what we did.” The origins of data science reach back half a century or more. Hammerbacher’s choice of terms wasn’t mere happenstance. As soon as he accepted the job at Facebook, Hammerbacher began poring through technical papers and books that provided clues to the evolution of data science. In the spring of 2012, he taught a course in data science at the University of California at Berkeley.

Alex Pentland, a computational social scientist at the Massachusetts Institute of Technology Media Lab, sees the promise of “a transition on a par with the invention of writing or the Internet.” The ranks of data scientists—people who wield their math and computing smarts to make sense of data—are modest compared to the workforce as a whole, but they loom large. Data science is hailed as the field of the future. Universities are rushing to establish data science centers, institutes, and courses, and companies are scrambling to hire data scientists. There is a trend-chasing side to the current data frenzy that invites ridicule. But it is hard to argue the direction. Jeffrey Hammerbacher was always a numbers kind of guy.

To explain, I think of a conversation with Claudia Perlich, the chief scientist of Dstillery, a data-science start-up in New York that specializes in ad targeting. Perlich is a former research scientist at IBM, a winner of prestigious data science contests, and a lecturer at New York University’s Stern School of Business. When I ask why she is using her skills to deliver ads, Perlich replies that digital marketing is a large, real-world testing ground where practitioners in a young field can safely learn valuable lessons. The online advertising marketplace, she says, is “a wonderful place for data scientists to experiment now. What happens if my algorithm is wrong?

pages: 398 words: 86,855

Bad Data Handbook
by Q. Ethan McCallum
Published 14 Nov 2012

While they are willing and able to work on many tasks across the data science process, from munging and modeling to visualizing and presenting, it’s quite rare to find talent with extensive experience in all aspects of data science. Organizations and managers would do well to adjust their expectations accordingly. A successful data science function is made up not of one person but of at least two or three individuals whose broad skills overlap heavily while their unique expertise does not.

Where Do Data Scientists Live Within the Organization?

Finding a place for data scientists can be a bit tricky. Sometimes you’ll find them living within an engineering organization, sometimes within a product organization, sometimes within a research organization, and other times they live under some other umbrella or on their own.

His skills as a programmer began while assisting with the development of the Sahana Disaster Management System and were refined helping Sugar Labs, makers of the software which runs the One Laptop Per Child XO. Tim has recently moved into the eScience field, where he works to support the research community’s uptake of technology. Marck Vaisman is a data scientist and claims he’s been one since before the term was en vogue. He is also a consultant, entrepreneur, master munger, and hacker. Marck is the principal data scientist at DataXtract, LLC, where he helps clients ranging from startups to Fortune 500 firms with all kinds of data science projects. His professional experience spans the management consulting, telecommunications, Internet, and technology industries. He is the co-founder of Data Community DC, an organization focused on building the Washington DC area data community and promoting data and statistical sciences by running Meetup events (including Data Science DC and R Users DC) and other initiatives.

Spillover can happen for any number of reasons, including inadequate accuracy in the partitioning scheme. While you may not be able to completely eliminate spillover, you can at least be aware of it. Don’t expect that the data is partitioned perfectly.

Thou Shalt Provide Your Data Scientists with a Single Tool for All Tasks

There is no single tool that allows you to perform all of your data science tasks. Many different tools exist, and each tool has a specific purpose. In order to be successful, data scientists should have access to the tools they need and also the ability to configure these tools as needed—at least in a research and development (R&D) environment—without having to jump through hoops to do their work.

pages: 337 words: 86,320

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
by Seth Stephens-Davidowitz
Published 8 May 2017

In other words, she spotted patterns and predicted how one variable will affect another. Grandma is a data scientist. You are a data scientist, too. When you were a kid, you noticed that when you cried, your mom gave you attention. That is data science. When you reached adulthood, you noticed that if you complain too much, people want to hang out with you less. That is data science, too. When people hang out with you less, you noticed, you are less happy. When you are less happy, you are less friendly. When you are less friendly, people want to hang out with you even less. Data science. Data science. Data science. Because data science is so natural, the best Big Data studies, I have found, can be understood by just about any smart person.

In other words, the Columbia and Microsoft researchers wrote a groundbreaking study by utilizing the natural, obvious methodology that everybody uses to make health diagnoses. But wait. Let’s slow down here. If the methodology of the best data science is frequently natural and intuitive, as I claim, this raises a fundamental question about the value of Big Data. If humans are naturally data scientists, if data science is intuitive, why do we need computers and statistical software? Why do we need the Kolmogorov-Smirnov test? Can’t we just use our gut? Can’t we do it like Grandma does, like nurses and doctors do? This gets to an argument intensified after the release of Malcolm Gladwell’s bestselling book Blink, which extols the magic of people’s gut instincts.
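
For readers wondering what a tool like the Kolmogorov-Smirnov test adds over gut feel, here is a small illustrative sketch on simulated data (not an example from the book): it compares two samples and reports how surprising their difference would be if both came from the same distribution.

# Illustrative only: a two-sample Kolmogorov-Smirnov test on simulated data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=500)  # e.g., outcomes under condition A
group_b = rng.normal(loc=0.2, scale=1.0, size=500)  # e.g., slightly shifted outcomes under condition B

statistic, p_value = ks_2samp(group_a, group_b)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.4f}")
# A small p-value means the two samples are unlikely to share one distribution,
# a call that is hard to make reliably by eyeballing a thousand raw numbers.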

We are often wrong, in other words, about how the world works when we rely just on what we hear or personally experience. While the methodology of good data science is often intuitive, the results are frequently counterintuitive. Data science takes a natural and intuitive human process—spotting patterns and making sense of them—and injects it with steroids, potentially showing us that the world works in a completely different way from how we thought it did. That’s what happened when I studied the predictors of basketball success. When I was a little boy, I had one dream and one dream only: I wanted to grow up to be an economist and data scientist. No. I’m just kidding. I wanted desperately to be a professional basketball player, to follow in the footsteps of my hero, Patrick Ewing, all-star center for the New York Knicks.

pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together
by Nick Polson and James Scott
Published 14 May 2018

It turns out that when she wasn’t caring for soldiers, Nightingale was also a skilled data scientist who successfully convinced hospitals that they could improve health care using statistics. In fact, no other data scientist in history can claim to have saved so many lives as Florence Nightingale. In 1859, in honor of these achievements, she became the first woman ever elected to the U.K.’s Royal Statistical Society. Nightingale’s path to unlocking the power of health-care data offers three distinct lessons for today. First, it illustrates the kind of institutional commitment necessary for a data-science revolution to take hold in a given field.

Her ideas formed a clear model for the international system of disease classification used today, which serves as the bedrock for all of modern epidemiology and medical data science.37

Preventable Mischiefs in the Age of AI

Nightingale’s three legacies all have clear parallels today. They also raise some sharp questions. She spoke of the “foul air and preventable mischiefs” that killed the soldiers of the Crimea, and while the air in modern hospitals may be less foul, there are still mischiefs aplenty. One big question is how to staff and train a modern health-care team. After Nightingale, no hospital could function without nurses. When will the same be true of data scientists and experts in artificial intelligence, who now play almost no day-to-day role in health care?

Data Sharing

That brings us to another big question: Will data-science teams get access to the data they’ll need to improve the existing AI systems and build new ones? If you work for a single hospital, you might have access to thousands of patient records. But wouldn’t millions of records from lots of hospitals be much better? After all, a big reason that tech firms like Google and Facebook have such good AI is the sheer scale of their data sets. There are surely millions of clinical histories of kidney disease scattered across the medical databases of the world. In principle, these could be brought together, and teams of data scientists could be hired to analyze them using cutting-edge AI tools, in a way that still ensured patient privacy.

pages: 161 words: 39,526

Applied Artificial Intelligence: A Handbook for Business Leaders
by Mariya Yao , Adelyn Zhou and Marlene Jia
Published 1 Jun 2018

These specialized engineers deploy models, manage infrastructure, and run operations related to machine learning projects. They are assisted by data scientists and data engineers to manage databases and build the data infrastructure necessary to support the products and services used by their customers.

Data Scientists

Data scientists typically work in an offline setting and do not deal directly with the production experience, which is what the end user would see. Data scientists collect data, spend most of their time cleaning it, and the rest of their time looking for patterns in the data and building predictive models. They often have degrees in statistics, data science, or a related discipline. Alternatively, many have programming backgrounds and hold degrees in computer science, math, or physics.

Recruit from Specialized Training Programs

To meet the rising demand for machine learning talent, education programs have emerged to train junior talent and help them find job placements. Abhi Jha, Director of Advanced Analytics at McKesson, initially hired data science students from Galvanize, a technical skills training provider. “We’ve had a lot of success hiring from career fairs that Galvanize organizes, where we present the unique challenges our company tackles in healthcare,” he adds.(57)

Experienced Scientists and Researchers

Hiring experienced data scientists and machine learning researchers requires a different approach. For these positions, employers typically look for a doctorate or extensive experience in machine learning, statistical modeling, or related fields.

Understand Different Job Titles

Many companies struggle just to understand what “artificial intelligence” is, much less the myriad of titles, roles, skills, and technologies used to describe a prospective hire. Titles and descriptions vary from company to company, and terms are not well-standardized in the industry. However, most of the roles you encounter will resemble the following:

Data Science Team Manager

A data science team manager understands how best to deploy the expertise of his team in order to maximize their productivity on a project. This manager should have sufficient technical knowledge to understand what his team members are doing and how best to support them; at the same time, this manager must also have good communications skills in order to liaise with the leadership or non-technical units.

Thinking with Data
by Max Shron
Published 15 Aug 2014

Praise for Thinking with Data

“Thinking with Data gets to the essence of the process, and guides data scientists in answering that most important question—what’s the problem we’re really trying to solve?”
— Hilary Mason, Data Scientist in Residence at Accel Partners; co-founder of the DataGotham Conference

“Thinking with Data does a wonderful job of reminding data scientists to look past technical issues and to focus on making an impact on the broad business objectives of their employers and clients. It’s a useful supplement to a data science curriculum that is largely focused on the technical machinery of statistics and computer science

Statistics as a whole is concerned with generalizing from old to new data; causal analysis is concerned with generalizing from old to new scenarios where we have deliberately altered something. Generally speaking, because data science as a field is primarily concerned with generalizing knowledge only in highly specific domains (such as for one company, one service, or one type of product), it is able to sidestep many of the issues that snarl causal analysis in more scientific domains. As of today, data scientists do little work building theories intended to capture causal relationships in entirely new scenarios. For those that do, especially if their subject matter concerns human behavior, a more thorough grounding in topics such as construct validity and quasi-experimental design is highly recommended.[10]

Defining Causality

Different schools of thought have defined causality differently, but a particularly simple interpretation, suitable to many of the problems that are solvable with data, is the alternate universe perspective.
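
One standard way to write down that “alternate universe” reading is the potential-outcomes notation; this is a common formalization consistent with the perspective described above, not the author’s own wording:

% Potential outcomes: Y_i(1) is unit i's outcome if treated, Y_i(0) if left alone.
% The unit-level causal effect compares the same unit across the two universes;
% the average treatment effect (ATE) averages it over the population.
\[
\tau_i = Y_i(1) - Y_i(0),
\qquad
\mathrm{ATE} = \mathbb{E}\left[ Y(1) - Y(0) \right].
\]

Only one of Y_i(1) or Y_i(0) is ever observed for a given unit, which is why causal claims lean on deliberate intervention or additional assumptions rather than on the data alone.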

The fields of design, argument studies, critical thinking, national intelligence, problem-solving heuristics, education theory, program evaluation, various parts of the humanities—each of them has insights that data science can learn from. Data science is already a field of bricolage. Swaths of engineering, statistics, machine learning, and graphic communication are already fundamental parts of the data science canon. They are necessary, but they are not sufficient. If we look further afield and incorporate ideas from the “softer” intellectual disciplines, we can make data science successful and help it be more than just this decade’s fad. A focus on why rather than how already pervades the work of the best data professionals.

pages: 168 words: 49,067

Becoming Data Literate: Building a great business, culture and leadership through data and analytics
by David Reed
Published 31 Aug 2021

Table 2.3: Data roles within conventional organisational structures

Function | Role | Task
Analytics | Customer churn analyst | Churn propensity modelling
Customer management | Retention manager | Churn propensity modelling
Marketing | Customer marketing manager | Churn propensity modelling
Board | Chief customer officer | Creating single view of the customer
Data management | Customer database manager | Creating single view of the customer
Business intelligence | Customer analyst | Net customer figure report
Finance | Chief financial officer | Net customer figure report
Compliance | KYC manager | Identity validation
Ecommerce | Channel manager | Identity validation
Information security | Information security officer | Identity validation
Customer experience | Cx manager | Behavioural modelling
Data science | Data scientist | Behavioural modelling

Centralisation of roles, for example into a data and analytics centre of excellence, removes role and task duplication while supporting multiple internal customers (see Table 2.4). A similar multi-stakeholder effect can be achieved by using a virtual data and analytics organisation where roles are based within a specific business function, but serve multiple stakeholders across functions.

This is a sector that is very risk-averse, yet the new function created in 2016 was given a mission to “put a rocket up the company”. To accelerate its impact, 70 data scientists were recruited and tasked to “move fast and break things”. Significant changes did start to be realised, including a reduction in the friction during insurance pricing and quotation through predictive modelling based on customer data. Robust data underpinnings and platforms were put in place to support both customer science and data science with strong growth in internal demand as business units – which had been set against each other to create a competitive culture – looked for support for ongoing innovation and optimisation.

Aviva started on this journey in 2016 with the creation of its customer science team as part of a vision to create engaging, relevant customer experiences through intelligent use of data science. A key moment occurred in 2019 when this team realised it needed to make customer data central and relevant to the whole organisation. This would enable data science to avoid being just another siloed department by bringing data science to life for colleagues and providing a common language for the business to talk about customers. Of course, data foundations were important. In this example, a unified customer view brought together over 70 databases from multiple business units.

pages: 391 words: 123,597

Targeted: The Cambridge Analytica Whistleblower's Inside Story of How Big Data, Trump, and Facebook Broke Democracy and How It Can Happen Again
by Brittany Kaiser
Published 21 Oct 2019

The Mercers’ generosity to conservative causes is well known, but Bob saw in Alexander a marriage between his love for data science and his political motivations. Alexander recalled Bob’s reaction as something like “How much do you want, and where should I send it?” From there, Steve and Bekah and Bob formed the triumvirate that was the board of directors of the new company known as Cambridge Analytica, with Alexander Nix at the helm. Alexander had already hired some data scientists by then, but he went on to hire more, and he began to instruct SCL Group employees to split their time between international work and building the U.S. business. The data scientists began to purchase as much data as they could get their hands on, and within months, Cambridge Analytica had taken off.

That’s because San Antonio was home to Brad Parscale, of Giles-Parscale. Parscale had been a longtime website designer for Trump, and Trump had picked him to run his digital operations. The problem was that Parscale had no data science or data-driven communications experience, so Bekah knew that Trump needed Cambridge. When the early Cambridge Analytica team (which consisted of Matt Oczkowski, Molly Schweickert, and a handful of data scientists) arrived on the scene in San Antonio in June, they found Brad and the Trump campaign’s digital operations in an alarming state of disarray. Oczkowski—“Oz” for short—wrote to me on June 17, when I asked him a question about a commercial client, saying he had no time to help me, as he would need all his energy for working with Brad and getting their analytics up and running.

And for years, the SCL Group, Cambridge Analytica’s parent company, had been identifying and sorting people using the most sophisticated method in behavioral psychology, which gave it the capability of turning what was otherwise just a mountain of information about the American populace into a gold mine. Nix told us about his in-house army of data scientists and psychologists who had learned precisely how to know whom they wanted to message, what messaging to send them, and exactly where to reach them. He had hired the most brilliant data scientists in the world, people who could laser in on individuals wherever they were to be found (on their cell phones, computers, tablets, on television) and through any kind of medium you could imagine (from audio to social media), using “microtargeting.”

pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
by Eric Siegel
Published 19 Feb 2013

The machine learning process is designed to accomplish this task, to mechanically develop new capabilities from data. This automation is the means by which PA builds its predictive power. The hunter returns back to the tribe, proudly displaying his kill. So, too, a data scientist posts her model on the bulletin board near the company ping-pong table. The hunter hands over the kill to the cook, and the data scientist cooks up her model, translates it to a standard computer language, and e-mails it to an engineer for integration. A well-fed tribe shows the love; a psyched executive issues a bonus. The tribe munches and the scientist crunches.

Special Forces: Dean Abbott, “Hiring and Selecting Key Personnel Using Predictive Analytics,” Predictive Analytics World San Francisco Conference, March 4, 2012, San Francisco, CA. www.predictiveanalyticsworld.com/sanfrancisco/2012/agenda.php#day1–1040a.

LinkedIn: Manu Sharma, LinkedIn, “Data Science at LinkedIn: Iterative, Big Data Analytics and You,” Predictive Analytics World New York Conference, October 19, 2011, New York, NY. www.predictiveanalyticsworld.com/newyork/2011/agenda.php#track1-lnk.

Scott Nicholson, LinkedIn, “Beyond Big Data: Better Living through Data Science,” Predictive Analytics World Boston Conference, October 1, 2012, Boston, MA. www.predictiveanalyticsworld.com/boston/2012/agenda.php#keynote-900.

Scott Nicholson, LinkedIn, “Econometric Applications & Extracting Economic Insights from the LinkedIn Dataset,” Predictive Analytics World San Francisco Conference, March 5, 2012, San Francisco, CA. www.predictiveanalyticsworld.com/sanfrancisco/2012/agenda.php#day1–20a.

The stories range from inspiring to downright scary—read them and find out what we’ve been up to while you weren’t paying attention.” —Michael J. A. Berry, author of Data Mining Techniques, Third Edition “Eric Siegel is the Kevin Bacon of the predictive analytics world, organizing conferences where insiders trade knowledge and share recipes. Now, he has thrown the doors open for you. Step in and explore how data scientists are rewriting the rules of business.” —Kaiser Fung, VP, Vimeo; author of Numbers Rule Your World “Written in a lively language, full of great quotes, real-world examples, and case studies, it is a pleasure to read. The more technical audience will enjoy chapters on The Ensemble Effect and uplift modeling—both very hot trends.

pages: 287 words: 69,655

Don't Trust Your Gut: Using Data to Get What You Really Want in Life
by Seth Stephens-Davidowitz
Published 9 May 2022

The artists who make it big, in contrast, present to a far wider set of places, allowing themselves to stumble upon a big break. Many people have talked about the importance in your career of showing up. But data scientists have found it’s about showing up to a wide range of places. This book isn’t meant to give advice only for single people, new parents, or aspiring artists—though there will be more lessons here for all of them. My goal is to offer some lessons in new, big datasets that are useful for you, no matter what stage of life you are in. There will be lessons recently uncovered by data scientists in how to be happier, look better, advance your career, and much more. And the idea for the book all came to me one evening while . . .

In the past few years, other teams of researchers have mined online dating sites, combing through large, new datasets on the traits and swipes of tens of thousands of single people to determine what predicts romantic desirability. The findings from the research on romantic desirability, unlike the research on romantic happiness, have been definitive. While data scientists have found that it is surprisingly difficult to detect the qualities in romantic partners that lead to happiness, data scientists have found it strikingly easy to detect the qualities that are catnip in the dating scene. A recent study, in fact, found that not only is it possible to predict with great accuracy whether someone will swipe left or right on a particular person on an online dating site.

See rich people weather and happiness, 261–262 websites athletic scholarships stats, 95, 98 equestrianism on a budget, 108 neighborhood information, 77, 87 ScenicOrNot, 259 trackyourhappiness.org, 235 twin basketball players models, 103, 104 wholesale distribution business startup, 122 West Point cadets’ success, 195–196 wholesale beverage distribution, 111–113, 122, 130 website on business startup, 122 Wikipedia entries in contrasting counties, 73 work misery factor, 221, 238–239 misery factor lessened, 239–242 quitting job per coin flip, 242 Y you are drawn to you, 15, 39–40 Youkilis, Kevin, 46–48 young entrepreneurs, 140. See also age of typical entrepreneur Z zero-profit condition, 128 Zuckerberg, Mark, 140, 142, 149 About the Author SETH STEPHENS-DAVIDOWITZ is a data scientist, author, and keynote speaker. His 2017 book, Everybody Lies, was a New York Times bestseller and an Economist Book of the Year. He has worked as a contributing op-ed writer for the New York Times, a lecturer at the Wharton School, and a Google data scientist. He received a BA in philosophy from Stanford, where he graduated Phi Beta Kappa, and a PhD in economics from Harvard. He lives in Brooklyn and is a passionate fan of the Mets, Knicks, Jets, and Leonard Cohen.

pages: 23 words: 5,264

Designing Great Data Products
by Jeremy Howard , Mike Loukides and Margit Zwemer
Published 23 Mar 2012

Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Someone using Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work. But as data scientists build increasingly sophisticated products, they need a systematic design approach. We don’t claim that the Drivetrain Approach is the best or only method; our goal is to start a dialog within the data science and business communities to advance our collective vision. Objective-based data products We are entering the era of data as drivetrain, where we use data not just to generate more data (in the form of predictions), but use data to produce actionable outcomes.

In an emergency, a data product that just produces more data is of little use. Data scientists now have the predictive tools to build products that increase the common good, but they need to be aware that building the models is not enough if they do not also produce optimized, implementable outcomes. The future for data products We introduced the Drivetrain Approach to provide a framework for designing the next generation of great data products and described how it relies at its heart on optimization. In the future, we hope to see optimization taught in business schools as well as in statistics departments. We hope to see data scientists ship products that are designed to produce desirable business outcomes.
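A rough illustration of what "optimization at its heart" can mean in practice: a model predicts an outcome as a function of the levers a business controls, and an optimizer then searches over those levers for the best action. The sketch below is an invented minimal example, not taken from Howard, Loukides, and Zwemer; the demand curve and price bounds are toy stand-ins for a model learned from data.

# Hedged sketch: choose the price (a "lever") that maximizes predicted revenue.
# predicted_demand() is a toy stand-in for a fitted demand model.
from scipy.optimize import minimize_scalar

def predicted_demand(price):
    # Toy model: demand falls linearly as price rises.
    return max(0.0, 1000.0 - 12.0 * price)

def negative_revenue(price):
    return -(price * predicted_demand(price))

result = minimize_scalar(negative_revenue, bounds=(1.0, 80.0), method="bounded")
print(f"Revenue-maximizing price under this toy model: {result.x:.2f}")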

This is still the dawn of data science. We don’t know what design approaches will be developed in the future, but right now, there is a need for the data science community to coalesce around a shared vocabulary and product design process that can be used to educate others on how to derive value from their predictive models. If we do not do this, we will find that our models only use data to create more data, rather than using data to create actions, disrupt industries and transform lives. About the Authors Mike Loukides is an editor for O'Reilly & Associates.

pages: 252 words: 72,473

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
by Cathy O'Neil
Published 5 Sep 2016

,” Boston Globe, November 7, 2015, www.bostonglobe.com/2015/11/07/childwelfare-bostonglobe-com/AZ2kZ7ziiP8cBMOite2KKP/story.html. ABOUT THE AUTHOR Cathy O’Neil is a data scientist and the author of the blog mathbabe.org. She earned a PhD in mathematics from Harvard and taught at Barnard College before moving to the private sector, where she worked for the hedge fund D. E. Shaw. She then worked as a data scientist at various start-ups, building models that predict people’s purchases and clicks. O’Neil started the Lede Program in Data Journalism at Columbia and is the author of Doing Data Science. She appears weekly on the Slate Money podcast. What’s next on your reading list?

For diploma mills like the University of Phoenix, I think it’s safe to say, the goal is to recruit the greatest number of students who can land government loans to pay most of their tuition and fees. With that objective in mind, the data scientists have to figure out how best to manage their various communication channels so that together they generate the most bang for each buck. The data scientists start off with a Bayesian approach, which in statistics is pretty close to plain vanilla. The point of Bayesian analysis is to rank the variables with the most impact on the desired outcome. Search advertising, TV, billboards, and other promotions would each be measured as a function of their effectiveness per dollar.
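O'Neil's description (rank each channel by its estimated impact per dollar) can be sketched with a simple Bayesian regression. The code below is a hedged illustration, not taken from the book: the channel names, spend figures, and outcome are synthetic, and a real marketing-mix analysis would be far more careful about confounding and diminishing returns.

# Hedged sketch: estimate each channel's effect per dollar with Bayesian ridge regression,
# then rank the channels. All data here is synthetic.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
channels = ["search_ads", "tv", "billboards", "direct_mail"]
spend = rng.uniform(0, 10_000, size=(500, len(channels)))      # weekly spend per channel
true_effect = np.array([0.9, 0.3, 0.05, 0.4])                  # hidden per-dollar effects
signups = spend @ true_effect + rng.normal(0, 500, size=500)   # observed outcome

model = BayesianRidge().fit(spend, signups)
for name, effect in sorted(zip(channels, model.coef_), key=lambda kv: -kv[1]):
    print(f"{name:12s} estimated effect per dollar: {effect:.2f}")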

And if it wants to find out what drives shopping recidivism, it carries out research. Its data scientists don’t just study zip codes and education levels. They also inspect people’s experience within the Amazon ecosystem. They might start by looking at the patterns of all the people who shopped once or twice at Amazon and never returned. Did they have trouble at checkout? Did their packages arrive on time? Did a higher percentage of them post a bad review? The questions go on and on, because the future of the company hinges upon a system that learns continually, one that figures out what makes customers tick. If I had a chance to be a data scientist for the justice system, I would do my best to dig deeply to learn what goes on inside those prisons and what impact those experiences might have on prisoners’ behavior.

Data Action: Using Data for Public Good
by Sarah Williams
Published 14 Sep 2020

So it is essential we investigate potential bias created by the technology used to collect data. Analysis of data acquired through creative means must include subject experts and the community represented in the data. Working with policy experts helps data scientists ask the right questions. This type of multidisciplinary team can generate more accurate and ethical results from the data. There are numerous examples of data scientists who try to predict the dynamics of human life, successfully at first; but without continued inclusion of subject experts their data models are quickly outdated. This is the hubris of big data. Why do we often think the data analyst can find the right questions to ask without asking those who have in-depth knowledge of the topics we seek to understand?

Method Throughout Data Action, I argue that unlocking data for policy change works best when the process engages multidisciplinary teams that include policy experts, data scientists, and data visualizers, among others. In the book's conclusion, “It's How We Work with Data That Really Matters” I stress that bringing together these experts allows the creative expression of data to truly blossom. Policy experts understand the issues, data scientists know how to develop algorithms, and graphic designers can share the results through compelling visuals. Working together these specialists extend their findings beyond the walls of academia or city hall—and reach the hands and minds of the public.

Kubzansky, “A Framework for Examining Social Stress and Susceptibility to Air Pollution in Respiratory Health,” Environmental Health Perspectives 117, no. 9 (September 1, 2009): 1351–1358, https://doi.org/10.1289/ehp.0900612. 54 Interview with Iyad Kheirbek of [New York City Department of Mental Health and Hygiene], January 2015. 55 “2014 West Africa Ebola Response—OpenStreetMap Wiki,” accessed January 25, 2019, https://wiki.openstreetmap.org/wiki/2014_West_Africa_Ebola_Response. 56 “Ushahidi,” accessed January 25, 2019, https://www.ushahidi.com/. 57 Ida Norheim-Hagtun and Patrick Meier, “Crowdsourcing for Crisis Mapping in Haiti,” Innovations: Technology, Governance, Globalization 5, no. 4 (2010): 81–89. 58 Jessica Ramirez, “‘Ushahidi’ Technology Saves Lives in Haiti and Chile,” Newsweek, March 3, 2010, https://www.newsweek.com/ushahidi-technology-saves-lives-haiti-and-chile-210262. 59 Norheim-Hagtun and Meier, “Crowdsourcing for Crisis Mapping in Haiti.” 60 Patrick Meier, “Crowdsourcing the Evaluation of Post-Sandy Building Damage Using Aerial Imagery,” IRevolutions (blog), November 1, 2012, https://irevolutions.org/2012/11/01/crowdsourcing-sandy-building-damage/. 61 Introduction to Data Science (IDA), “About,” IDS webpage, https://www.idsucla.org/about. 62 Ron Eglash, Juan E. Gilbert, and Ellen Foster, “Toward Culturally Responsive Computing Education,” Commun. ACM 56, no. 7 (July 2013): 33–36, https://doi.org/10.1145/2483852.2483864; Amelia McNamara and Mark Hansen, “Teaching Data Science to Teenagers,” in Proceedings of the Ninth International Conference on Teaching Statistics, 2014. 63 Nicole Lazar and Christine Franklin, “The Big Picture: Preparing Students for a Data-Centric World,” Chance 28, no. 4 (2015): 43–45.

pages: 1,409 words: 205,237

Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale
by Jan Kunigk , Ian Buss , Paul Wilkinson and Lars George
Published 8 Jan 2019

The architect is responsible for technical coordination and should have extensive experience working with big data and Hadoop. Data scientist Although the term “data scientist” has been around for some time, the current surge in its use in corporate IT reflects how much big data changes the complexity of data management in IT. It also reflects a shift in academia, where data science has evolved to become a fully qualified discipline at many renowned universities.1 Sometimes we see organizations that are largely indifferent to data science per se, or that simply try to rebrand all existing analyst staff as data scientists. The data scientist, however, actually does more: Statistics and classic BI The data scientist depends on classic tools to present and productize the result of his work, but before these tools can be used, a lot of exploration, cleansing, and modeling is likely to be required on the Hadoop layer.

End users commonly access the web interfaces of the YARN ResourceManager, MapReduce Job History Server, and Spark History Server. In addition, the Hue project offers a comprehensive user interface for many components in the stack, including HDFS, Hive, Impala, Oozie, and Solr. Additional user-oriented or specialized web UIs are also available, such as Jupyter Notebook, Apache Zeppelin, or Cloudera Data Science Workbench for data scientists, as shown in Table 11-1.
Table 11-1. A summary of access mechanisms (Project: programmatic / command line / web UI)
HDFS: Java, REST (WebHDFS/HttpFS) / hdfs / NameNode and DataNode
YARN: Java, REST (RM) / yarn / ResourceManager and NodeManager
ZooKeeper: Java/C++ / zookeeper-client / -
HBase: Java, HBase REST/Thrift server [a] / hbase shell / Master and RegionServer
Hive: Thrift, JDBC, ODBC / beeline / HiveServer2
Oozie: Java, REST / oozie / Server via extension
Spark: Java/Scala/Python, JDBC (via Thrift server) / spark-shell, spark-submit, pyspark / History Server
Impala: JDBC, ODBC / impala-shell / Statestore, catalog server, daemon
Solr: Java, REST / solrctl / Server
Kudu: Java/C++/Python / kudu admin utility / Master and tablet server
Hue: Python SDK / - / Hue Server
[a] Apache Phoenix provides a JDBC interface to Apache HBase.
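As one concrete instance of the programmatic column above, HDFS can be reached over its WebHDFS REST interface without a local Hadoop client. The snippet below is a generic illustration rather than an excerpt from the book; the host, port, and path are hypothetical, and it assumes WebHDFS is enabled on the cluster.

# Hedged sketch: list an HDFS directory over WebHDFS (REST).
# Host, port (9870 is the Hadoop 3.x NameNode HTTP default), and path are placeholders.
import requests

namenode = "http://namenode.example.com:9870"
path = "/user/analytics"
resp = requests.get(f"{namenode}/webhdfs/v1{path}", params={"op": "LISTSTATUS"})
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["type"], entry["length"])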

Coding Whereas the typical analyst or statistician understands methods and models mathematically, a good data scientist also has a solid background in parallel algorithms to build large-scale distributed applications around such models. As we already mentioned, the data scientist is well versed in coding in third-generation and functional programming languages, such as Scala, Java, and Python, in addition to the domain-specific languages of the classic analytics world. In this function, the data scientist collaborates with development departments to build fully fledged distributed applications that can be productively deployed to Hadoop.

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences
by Rob Kitchin
Published 25 Aug 2014

The worry for many commentators is that the potential benefits of data-driven business and science will not be fully realised due to a shortage of human talent, in particular ‘data scientists, who combine the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data’ (Cukier 2010), and managers who understand how to convert such nuggets into wise decisions. With respect to the latter, as Shah et al. (2012: 23) note, ‘[i]nvestments in analytics can be useless, even harmful, unless employees can incorporate that data into complex decision making’. Universities are now starting to create new data science programmes and research centres, and to adapt existing courses to include training in new skills sets, in an effort to ameliorate some skills gaps.

He argues that ‘shifting the analysis to digital data... opens up epistemic questions as to who is the most legitimate producer of knowledge – the museum collector (the clinician, or the molecular biologist) or the statistician analyzing the data’ or producing the simulation or model (2012: 87). Some data scientists are thus undoubtedly ignoring the observations of Porway (2013): Without subject matter experts available to articulate problems in advance, you get [poor] results... Subject matter experts are doubly needed to assess the results of the work, especially when you’re dealing with sensitive data about human behavior. As data scientists, we are well equipped to explain the ‘what’ of data, but rarely should we touch the question of ‘why’ on matters we are not experts in. As Porway notes, what is really needed is for data scientists and domain experts to work with each other to ensure that the data analytics used make sense and that the results from such analytics are sensibly and contextually interpreted.

Berry, D. (2011) ‘The computational turn: thinking about the digital humanities’, Culture Machine, 12, http://www.culturemachine.net/index.php/cm/article/view/440/470 (last accessed 3 December 2012). Bertolucci, J. (2013) ‘IBM, universities team up to build data scientists’, InformationWeek, 15 January. http://www.informationweek.com/big-data/big-data-analytics/ibm-universities-team-upto-build-data-scientists/ (last accessed 16 January 2014). Bettencourt, L.M.A., Lobo, J., Helbing, D., Kuhnert, C. and West, G.B. (2007) ‘Growth, innovation, scaling, and the pace of life in cities’, Proceedings of the National Academy of Sciences, 104(17): 7301–06.

pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines
by Thomas H. Davenport and Julia Kirby
Published 23 May 2016

If you have a background in one of these (slightly) ancillary aspects of software, you can probably find the same kind of job related to automated software. Data Scientists —A couple of years ago, Tom and D. J. Patil, now chief data scientist of the White House Office of Science and Technology Policy, wrote (with Julia’s editing help) an article suggesting that data scientists held the “sexiest job of the 21st century.”1 It’s not that the people themselves were necessarily sexy, but that the jobs were difficult and hard to fill. They still are, though the shortage may be easing a bit, with the introduction of a number of new master’s programs in data science at U.S. universities. Data scientists are likely to be highly valued when the data used by cognitive systems are highly unstructured (voice or text or human genome records, as opposed to rows and columns of numbers) or difficult to extract from its source.

He does that with his team for about 60 percent of his time. He also spends a lot of time with customers—roughly a couple of days a week. He hears what their needs for new capabilities are, and translates that into data science activity by his team. Whenever he can squeeze it in, he interviews and hires new data scientists, and meets with other DataXu executives. When he hires other data scientists, Catanzaro looks for three types of skills, only one of which is technical. The first is “data science smarts”—being good with big data technologies, statistics, and so forth. He’s not so much interested in knowledge of a particular set of tools, but rather the “raw horsepower” to be able to master new tools.

And they also are likely to have either quantitative modeling skills or natural language processing skills. What do data scientists do day to day in the development of automated decision systems? Automated systems typically use a lot of data, so the data scientist might be scouting around to figure out the next great external data source. After a promising source is identified, he or she might be determining how to get the data into the right format, or how to combine it with the data that the organization already has. The data scientist might also be working on an algorithm to extract insights from the data. Or, since data scientists tend to be good at computational skills, too, they might be architecting or helping to develop the new or modified system.
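The "combine it with the data that the organization already has" step usually comes down to joins of the kind sketched below. The file names and the join key are hypothetical, offered only to make the idea concrete; they are not drawn from the book.

# Hedged sketch: fold a newly acquired external dataset into existing customer records.
# File names and the join key ("customer_id") are illustrative assumptions.
import pandas as pd

internal = pd.read_csv("crm_customers.csv")              # data the organization already has
external = pd.read_csv("purchased_demographics.csv")     # the promising external source

external = external.rename(columns={"cust_ref": "customer_id"})    # normalize the key
combined = internal.merge(external, on="customer_id", how="left")  # keep every internal record

# Quick check of how much of the new data actually matched.
print(combined.isna().mean().sort_values(ascending=False).head())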

pages: 241 words: 70,307

Leadership by Algorithm: Who Leads and Who Follows in the AI Era?
by David de Cremer
Published 25 May 2020

Second, some departments will employ algorithms more than others, thereby creating differences in work attitudes towards automation, which will make the process of digital transformation more difficult. Integrating teams of data scientists in the daily operations of the company With the work environment gradually being automated, organizations will increasingly hire more people with an engineering and data-science background. These new hires will have an expert understanding of the new technology and the ability to work with big data. A problem, however, is that those experts usually do not share the same mindset as the people who are not trained in the fields of engineering and data science. Organizations often fail to recognize this difference in mindset and put little effort into ensuring that data scientists are integrated into the organization.

In a way, leaders in the 21st century show competence by bringing the right teams of experts together to optimise the use of data to bring the value that is expected. For example, leaders connecting teams of data scientists with HR and finance teams in transparent and effective ways can help to increase the success rate of digital transformation strategies. The team of data scientists will help their colleagues to see what possibilities are available to digitalize information. Equally, the other teams can help data scientists understand their needs and thus provide input in designing a more user-friendly digital environment. Finally, because successes are rarely achieved immediately, it is important that leaders provide regular updates to the different parties involved on how the challenges are being approached.

Becoming an automated organization means that all operations will be affected. Teams of data scientists thus need to understand the goals of the finance department, human resource department, sales department and so forth. Likewise, organizations need to prepare all the other departments to be open and collaborative with the team of data scientists. It is only with an open-minded attitude that successful integration and implementation of algorithms within the context of each department can be achieved.²⁰² Promoting transparency in communication and exchange of data Today there is no longer any doubt that to succeed in the future, organizations will need to have the ability to deal with data and use algorithms.

pages: 374 words: 94,508

Infonomics: How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage
by Douglas B. Laney
Published 4 Sep 2017

These types of algorithms can help to monetize almost any kind of information, be it granular IoT data or macro-level economic figures. Expect these kinds of algorithms to be a standard component in the vast majority of data scientist toolboxes and increasingly accepted by business leaders, despite their “black box” models. As a result, many data science tasks will become automated, increasing the productivity of data scientists and enabling a class of “citizen data scientists” to emerge. This will put significant pressure on information asset supply chains and information curation efforts, and engender a boom in information monetization ideas from all corners of the organization across all industries.

For example, the real estate aggregator Trulia discovered that 90 percent of its web traffic is from people clicking on photos of homes. But Trulia had no information about what was in the photos. The photos had no descriptions or tags. So Trulia’s data science team trained a one-billion-node neural network to learn what is depicted in them. Now, according to Todd Holloway, who started Trulia’s data science program, “The system can find you a home in the Hamptons with photos of wine cellars.”11 Helping a buyer find a home is one thing, but now Trulia can correlate sales data with what site users are looking at, and license this information and insight to realtors, homebuilders, appliance manufacturers, and any type of company within the periphery of the real estate market.

And value-focused CDOs will deploy information assets to generate supplemental and significant revenue streams. Advanced Analytics, Data Science, and Artificial Intelligence Not just a global trend, but also a technology trend, advanced analytics solutions are becoming increasingly popular in driving business innovation and experimentation—and creating competitive advantage by monetizing available information assets inside and outside the organization. Over the foreseeable future, enterprises will be seeking to adopt advanced analytics and adapt their business models, establish specialist data science teams, and rethink their overall strategies to keep pace with the competition.

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps
by Valliappa Lakshmanan , Sara Robinson and Michael Munn
Published 31 Oct 2020

Data engineers implement infrastructure and pipelines around data. Machine learning engineers do similar tasks to data engineers, but for ML models. They take models developed by data scientists, and manage the infrastructure and operations around training and deploying those models. ML engineers help build production systems to handle updating models, model versioning, and serving predictions to end users. The smaller the data science team at a company and the more agile the team is, the more likely it is that the same person plays multiple roles. If you are in such a situation, it is very likely that you read the above three descriptions and saw yourself partially in all three categories.

His team builds software solutions for business problems using Google Cloud’s data analytics and machine learning products. He founded Google’s Advanced Solutions Lab ML Immersion program. Before Google, Lak was a Director of Data Science at Climate Corporation and a Research Scientist at NOAA. Sara Robinson is a Developer Advocate on Google’s Cloud Platform team, focusing on machine learning. She inspires developers and data scientists to integrate ML into their applications through demos, online content, and events. Sara has a bachelor’s degree from Brandeis University. Before Google, she was a Developer Advocate on the Firebase team.

Roles Within an organization, there are many different job roles relating to data and machine learning. Below we’ll define a few common ones referenced frequently throughout the book. This book is targeted primarily at data scientists, data engineers, and ML engineers, so let’s start with those. A data scientist is someone focused on collecting, interpreting, and processing datasets. They run statistical and exploratory analysis on data. As it relates to machine learning, a data scientist may work on data collection, feature engineering, model building, and more. Data scientists often work in Python or R in a notebook environment, and are usually the first to build out an organization’s machine learning models.

pages: 366 words: 76,476

Dataclysm: Who We Are (When We Think No One's Looking)
by Christian Rudder
Published 8 Sep 2014

I transported the data to the previous Voronoi partition in order to maintain consistency with the previous Craigslist map. Years ago, an enterprising hacker The hacker is Pete Warden, and his post is “How to Split Up the US,” which you can find here: petewarden.com/2010/02/06/how-to-split-up-the-us/. As Warden notes in a later post, “Why You Should Never Trust a Data Scientist,” his grouping of the United States into the seven new zones is arbitrary—the data science version of “for entertainment purposes only.” I reference them here in that spirit. Matthew Zook, a geographer Professor Zook and his team maintain a fantastic geography blog called Floating Sheep, and that blog was my primary source for his work: floatingsheep.org.

For more information and the full study, please refer to the Facebook Data Science post on Coordinated Migration: www.facebook.com/notes/facebook-data-science/coordinated-migration/10151930946453859. As you’ll see when you visit the link, in reproducing their work, I modified their original map by removing the labels and focusing on a smaller part of the region, to make the map more readable in print. Thank you to Mike Develin, also at Facebook, for helping facilitate permission for this reproduction. All Facebook Data Science work is done on anonymized and aggregated data. Chapter 13: Our Brand Could Be Your Life But what they don’t tell you See Clare Baker, “Behind the Red Triangle: The Bass Pale Ale Brand and Logo” Logoworks.com, November 8, 2013, logoworks.com/blog/bass-pale-ale-brand-and-logo/.

And it’s because these few letters are such a concise description that Shazam is so fast: instead of a guitar, Paul McCartney, and just the right amount of reverb, “Yesterday” starts with •DRUUUUUUDDR. That’s a lot easier to understand. Like an app straining for a song, data science is about finding patterns. Time after time, I—and the many other people doing work like me—have had to devise methods, structures, even shortcuts to find the signal amidst the noise. We’re all looking for our own Parsons code. Something so simple and yet so powerful is a once-in-a-lifetime discovery, but luckily there are a lot of lifetimes out there. And for any problem that data science might face, this book has been my way to say: I like our odds. 1 For more on the Kafkaesque implications of the CFAA, please see “Until Today, If You Were 17, It Could Have Been Illegal to Read Seventeeen.com Under the CFAA” and “Are You a Teenager Who Reads News Online?
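The Parsons code Rudder alludes to reduces a melody to whether each note moves up (U), down (D), or repeats (R) relative to the note before it, with a start symbol in front. A minimal sketch of that reduction is below; the pitch values are made-up MIDI-style numbers, not data from the book.

# Hedged sketch: compute a Parsons-style contour code ("*" start, then U/D/R) from pitches.
def parsons_code(pitches):
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)

# Hypothetical pitch sequence (MIDI note numbers), just to show the shape of the output.
print(parsons_code([67, 65, 65, 65, 67, 69, 71, 72, 74, 72, 71]))  # prints "*DRRUUUUUDD"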

pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money
by Frank J. Ohlhorst
Published 28 Nov 2012

Determining those skills is one of the first steps in putting a team together. THE DATA SCIENTIST One of the first concepts to become acquainted with is the data scientist; a relatively new title, it is not readily recognized or accepted by many organizations, but it is here to stay. A data scientist is normally associated with an employee or a business intelligence (BI) consultant who excels at analyzing data, particularly large amounts of data, to help a business gain a competitive edge. The data scientist is usually the de facto team leader during a Big Data analytics project. The title data scientist is sometimes disparaged because it lacks specificity and can be perceived as an aggrandized synonym for data analyst.

Much like the data themselves, the team should not be static in nature and should be able to evolve and adapt to the needs of the business. CHALLENGES REMAIN Locating the right talent to analyze data is the biggest hurdle in building a team. Such talent is in high demand, and the need for data analysts and data scientists continues to grow at an almost exponential rate. Finding this talent means that organizations will have to focus on data science and hire statistical modelers and text data–mining professionals as well as people who specialize in sentiment analysis. Success with Big Data analytics requires solid data models, statistical predictive models, and text analytic models, since these will be the core applications needed to do Big Data.

Nevertheless, the position is gaining acceptance with large enterprises that are interested in deriving meaning from Big Data, the voluminous amount of structured, unstructured, and semistructured data that a large enterprise produces or has access to. A data scientist must possess a combination of analytic, machine learning, data mining, and statistical skills as well as experience with algorithms and coding. However, the most critical skill a data scientist should possess is the ability to translate the significance of data in a way that can be easily understood by others. THE TEAM CHALLENGE Finding and hiring talented workers with analytics skills is the first step in creating an effective data analytics team.

pages: 204 words: 58,565

Keeping Up With the Quants: Your Guide to Understanding and Using Analytics
by Thomas H. Davenport and Jinho Kim
Published 10 Jun 2013

For example, at Intuit, George Roumeliotis heads a data science group that analyzes and creates product features based on the vast amount of online data that Intuit collects. For every project in which his group engages with an internal customer, he recommends a methodology for doing and communicating about the analysis. Most of the steps have a strong business orientation:
1. My understanding of the business problem
2. How I will measure the business impact
3. What data is available
4. The initial solution hypothesis
5. The solution
6. The business impact of the solution
Data scientists using this methodology are encouraged to create a wiki so that they can post the results of each step.

In the online information industry, companies have big data involving many petabytes of information. New information comes in at such volume and speed that it would be difficult for humans to comprehend it all. In this environment, the data scientists that work in such organizations (basically quantitative analysts with higher-than-normal IT skills) are often located in product development organizations. Their goal is to develop product prototypes and new product features, not reports or presentations. For example, the Data Science group at the business networking site LinkedIn is a part of the product organization, and has developed a variety of new product features and functions based on the relationships between social networks and jobs.

Patil eventually graduated and became a faculty member at Maryland, and worked on modeling the complexity of weather. He then worked for the US government on intelligence issues. Research funding was limited at the time, so he left to work for Skype, then owned by eBay. He next became the leader of data scientists at LinkedIn, where the people in that highly analytical position have been enormously influential in product development. Now Patil is a data scientist in residence (perhaps the first person with that title as well) at the venture capital firm Greylock Partners, helping the firm’s portfolio companies to think about data and analytics. He’s perhaps the world’s best example of latent math talent.

pages: 918 words: 257,605

The Age of Surveillance Capitalism
by Shoshana Zuboff
Published 15 Jan 2019

.… Who knows what other research they’re doing.”14 In other words, Fiske recognized that the experiment was merely an extension of Facebook’s standard practices of behavioral modification, which already flourish without sanction. Facebook data scientist and principal researcher Adam Kramer was deluged with hundreds of media queries, leading him to write on his Facebook page that the corporation really does “care” about its emotional impact. One of his coauthors, Cornell’s Jeffrey Hancock, told the New York Times that he didn’t realize that manipulating the news feeds, even modestly, would make some people feel violated.15 The Wall Street Journal reported that the Facebook data science group had run more than 1,000 experiments since its inception in 2007 and operated with “few limits” and no internal review board.

In studying the surveillance capitalist practices of Google, Facebook, Microsoft, and other corporations, I have paid close attention to interviews, patents, earnings calls, speeches, conferences, videos, and company programs and policies. In addition, between 2012 and 2015 I interviewed 52 data scientists from 19 different companies with a combined 586 years of experience in high-technology corporations and startups, primarily in Silicon Valley. These interviews were conducted as I developed my “ground truth” understanding of surveillance capitalism and its material infrastructure. Early on I approached a small number of highly respected data scientists, senior software developers, and specialists in the “internet of things.” My interview sample grew as scientists introduced me to their colleagues.

Surveillance capitalists adapted many of the highly contestable assumptions of behavioral economists as one cover story with which to legitimate their practical commitment to a unilateral commercial program of behavior modification. The twist here is that nudges are intended to encourage choices that accrue to the architect, not to the individual. The result is data scientists trained on economies of action who regard it as perfectly normal to master the art and science of the “digital nudge” for the sake of their company’s commercial interests. For example, the chief data scientist for a national drugstore chain described how his company designs automatic digital nudges that subtly push people toward the specific behaviors favored by the company: “You can make people do things with this technology.

Mindf*ck: Cambridge Analytica and the Plot to Break America
by Christopher Wylie
Published 8 Oct 2019

But that, of course, was not why Nix gave them full access to the private data of hundreds of millions of American citizens. Nix’s dream, as he had confided in our very first meeting, was to become the “Palantir of propaganda.” One lead data scientist from Palantir began making regular trips to the Cambridge Analytica office to work with the data science team on building profiling models. He was occasionally accompanied by colleagues, but the entire arrangement was kept secret from the rest of the CA teams—and perhaps Palantir itself. I can’t speculate about why, but the Palantir staff received Cambridge Analytica database logins and emails with fairly obvious pseudonyms like “Dr.

Fresh out of university, I had taken a job at a London firm called SCL Group, which was supplying the U.K. Ministry of Defence and NATO armies with expertise in information operations. As Western militaries grappled with how to tackle radicalization online, the firm wanted me to help build a team of data scientists to create new tools to identify and combat extremism online. It was fascinating, challenging, and exciting all at once. We were about to break new ground for the cyber defenses of Britain, America, and their allies and confront bubbling insurgencies of radical extremism with data, algorithms, and targeted narratives online.

It seemed entirely hypocritical to, on the one hand, frustrate jihadist groups in places like Pakistan and then, on the other, assist an autocratic and Islamist-backed regime in Egypt in creating its own tyranny of people. But Nix didn’t care. Business was business; he just wanted to clinch the deal. The main challenge for me and the growing team of psychologists and data scientists at SCL was in the objective substance of extremism itself. What does it mean to be an extremist? What exactly is extremism, and how can you model it? These were subjective definitions, and clearly the Egyptian government had one idea, while we had another. But if you want to be able to quantify and predict a trait, you have to be able to create a definition of it.

pages: 245 words: 83,272

Artificial Unintelligence: How Computers Misunderstand the World
by Meredith Broussard
Published 19 Apr 2018

I’m going to take you through a tutorial from a site called DataCamp, which was recommended as a first step for competing in data-science competitions by a different site, Kaggle.16 Kaggle, which is owned by Google’s parent company, Alphabet, is a site in which people compete to get the highest score for analyzing a dataset. Data scientists use it to compete in teams, sharpen their skills, or practice collaborating. It’s also useful for teaching students about data science or for finding datasets. We’re going to do a DataCamp Titanic tutorial using Python and a few popular Python libraries: pandas, scikit-learn, and numpy.
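For a sense of what that tutorial involves, the sketch below shows the typical shape of a first Titanic model: load the passenger table with pandas, encode a few features, and fit a scikit-learn classifier. The file name, feature choices, and model are assumptions for illustration, not the DataCamp exercise's exact steps.

# Hedged sketch of a first-pass Titanic model. "train.csv" is the usual Kaggle file name,
# but treat the path, features, and model choice as assumptions, not the tutorial itself.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("train.csv")
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})
train["Age"] = train["Age"].fillna(train["Age"].median())

features = ["Pclass", "Sex", "Age", "Fare"]
X, y = train[features], train["Survived"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
print("Cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())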

Pilhofer, “A Note to Users of DocumentCloud.” Acknowledgments Thank you to all the people who helped to bring this book to reality. I am grateful to my colleagues at New York University’s (NYU’s) Arthur L. Carter Journalism Institute, my colleagues at the Moore-Sloan Data Science Environment at NYU’s Center for Data Science, the faculty and staff at the Tow Center for Digital Journalism at Columbia Journalism School, and my former colleagues at Temple University and the University of Pennsylvania. For reading, consulting on, or otherwise midwifing this manuscript, I am eternally indebted to Elena Lahr-Vivaz, Rosalie Siegel, Jordan Ellenberg, Cathy O’Neil, Miriam Peskowitz, Samira Baird, Lori Tharps, Kira Baker-Doyle, Jane Dmochowski, Josephine Wolff, Solon Barocas, Hanna Wallach, Katy Boss, Janet Alteveer, Leslie Hunt, Elizabeth Hunt, Kay Kinsey, Karen Masse, Stevie Santangelo, Jay Kirk, Claire Wardle, Gita Manaktala, Melinda Rankin, Kathleen Caruso, Kyle Gipson, my writers’ group, and the talented team at the MIT Press.

There has never been, nor will there ever be, a technological innovation that moves us away from the essential problems of human nature. Why, then, do people persist in thinking there’s a sunny technological future just around the corner? I started thinking about technochauvinism one day when I was talking with a twenty-something friend who works as a data scientist. I mentioned something about Philadelphia schools that didn’t have enough books. “Why not just use laptops or iPads and get electronic textbooks?” asked my friend. “Doesn’t technology make everything faster, cheaper, and better?” He got an earful. (You’ll get one too in a later chapter.) However, his assumption stuck with me.

pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It)
by Salim Ismail and Yuri van Geest
Published 17 Oct 2014

For talented workers, working on and getting paid for multiple projects is a particularly welcome opportunity. But there’s another angle as well: an increase in the diversity of ideas. The data science company Kaggle, for example, offers a platform that hosts private and public algorithm contests in which more than 185,000 data scientists around the world vie for prizes and recognition. In 2011, insurance giant Allstate, with forty of the best actuaries and data scientists money could buy, wanted to see if its claims algorithm could be improved upon, so it ran a contest on Kaggle. It turned out that the Allstate algorithm, which had been carefully optimized for over six decades, was bested within three days by 107 competing teams.

In fact, in every one of Kaggle’s 150 contests to date, external data scientists have beaten the internal algorithms, often by a wide margin. And in most cases outsiders (non-experts) have beaten the experts in a particular domain, which shows the power of fresh thinking and diverse perspectives. In years past, having a large workforce differentiated your enterprise and allowed it to accomplish more. Today, that same large workforce can become an anchor that reduces maneuverability and slows you down. Moreover, traditional industries have great difficulty attracting on-demand high-skill workers such as data scientists because the available positions are perceived as being low in terms of opportunity and high in terms of bureaucratic obstacles.

Interfaces are critical if an organization is to scale seamlessly while growing exponentially as a company, especially on a global level. The same is true of other firms that coordinate data and oversee everything from prizes to personnel. Kaggle has its own unique mechanisms to manage its 200,000 data scientists. The X Prize Foundation has created mechanisms and dedicated teams for each of its prizes. TED has strict guidelines that help its many “franchised” TEDx events around the world deliver with consistency. And Uber has its own ways of handling its army of drivers. Most of these Interface processes are unique and proprietary to the organization that developed them, and as such comprise a unique type of intellectual property that can be of considerable market value.

pages: 50 words: 13,399

The Elements of Data Analytic Style
by Jeff Leek
Published 1 Mar 2015

A large number of these are summarized in Karl Broman's excellent presentation on displaying data badly. 11. Presenting data Giving data science talks can help you:
1. Meet people
2. Get people excited about your ideas/software/results
3. Help people understand your ideas/software/results
The importance of the first point can't be overstated. The primary reason you are giving talks is for people to get to know you. Being well known and well regarded can make a huge range of parts of your job easier. So first and foremost make sure you don't forget to talk to people before, after, and during your talk. Point 2 is more important than point 3. As a data scientist, it is hard to accept that the primary purpose of a talk is advertising, not data science.

See for example Hilary Mason's great presentation Entertain, don't teach. Here are reasons why entertainment is more important: That being said, be very careful to avoid giving a TED talk. If you are giving a data science presentation, the goal is to communicate specific ideas. So while you are entertaining, don't forget why you are entertaining. 11.1 Tailor your talk to your audience It depends on the event and the goals of the event. Here is a non-comprehensive list: Small group meeting: Goal: Update people you work with on what you are doing and get help.

Additional resources 15.1 Class lecture notes Johns Hopkins Data Science Specialization and Additional resources Data wrangling, exploration, and analysis with R Tools for Reproducible Research Data carpentry 15.2 Tutorials Git/github tutorial Make tutorial knitr in a knutshell Writing an R package from scratch 15.3 Leek group guides To data sharing To giving talks To developing R packages 15.4 Books An introduction to statistical learning Advanced data analysis from an elementary point of view Advanced R programming OpenIntro Statistics Statistical inference for data science

pages: 271 words: 62,538

The Best Interface Is No Interface: The Simple Path to Brilliant Technology (Voices That Matter)
by Golden Krishna
Published 10 Feb 2015

People like Gordon Bell—through his MyLifeBits project at Microsoft Research Silicon Valley Laboratory—started collecting information about their own lives to beat memory loss.11 Over time, the digital medium’s incredible ability to remember built up to what today we call “big data.” Enter the world of the data scientist to find relevant meaning in all that digital data, to power solutions that adjust to your wonderful uniqueness. To think of you. What do you want to eat tonight? What’s the best way home? Data science is one way we can find meaning in all that cheaply stored information—whether big data or even small, relevant, searchable sets—and develop real insights and accurate answers to valuable individual questions.

In 2006, Jonathan Goldman, a PhD graduate in physics from Stanford working at the professional social network LinkedIn, wondered if, by studying you, the service could suggest connections you might know. Could he create an interface of smart connections built around you and your friends rather than putting you in a box as part of a generic database? Even at LinkedIn, where data scientist celebrity DJ Patil, um, coined the term “data scientist” with Jeff Hammerbacher back in 2008, the concept of creating platforms that adapted to your wonderfully distinct qualities seemed foreign just two years earlier.17 According to Patil, outspoken stakeholders in the company were said to have been “openly dismissive” of Goldman’s concept.
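Goldman's idea (suggesting connections from the structure of your existing network) is often illustrated with simple friend-of-friend counting. The sketch below is that textbook version, invented here for illustration; it is not LinkedIn's actual algorithm, and the toy graph is made up.

# Hedged sketch: rank "people you may know" by the number of mutual connections.
# The graph is a toy example, not real LinkedIn data.
from collections import Counter

connections = {
    "ana": {"bo", "cy", "dee"},
    "bo":  {"ana", "cy", "eli"},
    "cy":  {"ana", "bo", "fay"},
    "dee": {"ana", "eli"},
    "eli": {"bo", "dee"},
    "fay": {"cy"},
}

def people_you_may_know(user):
    counts = Counter()
    for friend in connections[user]:
        for friend_of_friend in connections[friend]:
            if friend_of_friend != user and friend_of_friend not in connections[user]:
                counts[friend_of_friend] += 1      # one shared connection found
    return counts.most_common()

print(people_you_may_know("ana"))                  # [('eli', 2), ('fay', 1)]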

Rosenwald, “For Tablet Computer Visionary Roger Fidler, a Lot of What-Ifs,” Washington Post, March 10, 2012. http://www.washingtonpost.com/business/for-tablet-computer-visionary-roger-fidler-a-lot-of-what-ifs/2012/02/28/gIQAM0kN1R_story.html 17 “Some colleagues were openly dismissive of Goldman’s ideas. Why would users need LinkedIn to figure out their networks for them?” Thomas H. Davenport and D.J. Patil, “Data Scientist: The Sexiest Job of the 21st Century,” Harvard Business Review, October 2012. http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1 Chapter 15 Proactive Computing 1 “During the second World War, the robot was stored in the basement of the Weeks’s family home in Ohio, where he became 8-year-old Jack’s playmate.” Noel Sharkey, “Sign In To Read: The Return of Elektro, the First Celebrity Robot,” New Scientist, December 25, 2008. http://www.newscientist.com/article/mg20026873.000-the-return-of-elektro-the-first-celebrity-robot.html?

pages: 276 words: 81,153

Outnumbered: From Facebook and Google to Fake News and Filter-Bubbles – the Algorithms That Control Our Lives
by David Sumpter
Published 18 Jun 2018

He describes the Big Five personality traits; he outlines how surveys can be replaced by Facebook profiles; he claims that the results of one of his regression models reveals our conscientiousness and neuroticism; he talks about how political messages can be targeted to the individual and then he closes by claiming that ‘my model, given your Facebook likes, your age and your gender, can predict how agreeable you are just as well as your spouse’. One day, he says, we might fall in love with a computer that understands us better than our partner. I start to doubt whether the data scientist in the video truly believes what he is saying. I’m not sure he even expects his audience to believe it. His ‘research’ is the product of an eight-week training programme at ASI Data Science, a programme for aspiring data scientists. But even if this is just some sort of practice talk, I am deeply disturbed by what I see. This is a young man with the highest level of scientific training: a PhD in theoretical physics from Cambridge University.

The story of Cambridge Analytica took me deep into a web of blogs and privacy activists’ websites. Following these links, I found my way to a YouTube video of a young data scientist, who now works for Cambridge Analytica, presenting a research project he had carried out when working as an intern at the company. He starts his presentation with a reference to the film Her, in which the lead character Theodore, played by Joaquin Phoenix, falls in love with his operating system (OS). In the film, the computer forms a deep understanding of Theodore’s personality, and the human and the OS fall in love. The young data scientist uses this story to set up his own five-minute presentation: ‘Can a computer ever understand us better than a human?’

Glenn told me that the process of making recommendations is far from a pure science, ‘half of my job is trying to work out which computer-generated responses make sense’. When Glenn chose his job title, he asked to be called ‘data alchemist’ instead of ‘data scientist’. He sees his job not as searching for abstract truths about musical styles, but as providing classifications that make sense to people. This process requires humans and computers to work together. Given the vast scope of ‘Every Noise at Once’, Glenn’s modesty resonated strongly with me. Like many of the data scientists I had spoken to, he saw his job as navigating a very high-dimensional space. But he was the first person I had talked to who openly acknowledged the deeply personal and unknowable dimensions of our minds.

pages: 482 words: 121,173

Tools and Weapons: The Promise and the Peril of the Digital Age
by Brad Smith and Carol Ann Browne
Published 9 Sep 2019

More than in the past this will require that those who create technology come not only from disciplines such as computer and data science but also from the social and natural sciences and humanities. If we’re to ensure that artificial intelligence makes decisions based on the best that humanity has to offer, its development must result from a multidisciplinary process. And as we think about the future of higher education, we’ll need to make certain that every computer and data scientist is exposed to the liberal arts, just as everyone who majors in the liberal arts will need a dose of computer and data science. We’ll also need to see more focus on ethics in computer and data science courses themselves.

Back to note reference 8. In 2018, we created a dedicated data science team to help us advance our work on key societal issues. We recruited one of Microsoft’s most experienced data scientists, John Kahan, to lead the team. He had led a large team that applied data analytics to track and analyze the company’s sales and product usage, and I had seen first-hand in weekly Senior Leadership Team meetings how this had improved our business performance. He also had a much broader set of interests, based in part on the work he and his team had pursued to use data science to better diagnose the causes of Sudden Infant Death Syndrome, or SIDS, to which John and his wife had lost their infant son, Aaron, more than a decade before.

Dina Bass, “Bereaved Father, Microsoft Data Scientists Crunch Numbers to Combat Infant Deaths,” Seattle Times, June 11, 2017, https://www.seattletimes.com/business/bereaved-father-microsoft-data-scientists-crunch-numbers-to-combat-infant-deaths/. One of the first projects we gave to the new team was to dig into the concerns we had developed regarding the FCC’s national data map on broadband availability. Within a few months, the team had used multiple data sets to analyze the broadband gap across the country, including data from the FCC and the Pew Research Center, as well as anonymized Microsoft data collected as part of ongoing work to improve the performance and security of our software and services.

pages: 208 words: 57,602

Futureproof: 9 Rules for Humans in the Age of Automation
by Kevin Roose
Published 9 Mar 2021

Just by tweaking its algorithms, Netflix can steer users to its original shows, Amazon can steer users to its house brands, and Apple can recommend its own apps in the App Store, even when other apps might be preferable. The power to change users’ preferences at scale has made some technologists uncomfortable. Rachel Schutt, a data scientist, said as much in a 2012 interview with the Times: “Models do not just predict,” she said, “but they can make things happen.” A former product manager at Facebook went even further, telling BuzzFeed News that Facebook’s recommendation algorithms amounted to an attempt to “reprogram humans.” “It’s hard to believe that you could get humans to override all of their values that they came in with,” the former Facebook employee said.

They’re discouraging us from building the kind of personal autonomy that will protect us in the age of AI and automation, by allowing us to think and act for ourselves. And they’re doing it under the guise of helping us. In a 2017 paper about the history of Amazon’s recommendation algorithms, Amazon engineer Brent Smith and Microsoft data scientist Greg Linden sketched out a vision of the AI-driven future that feels, to me, both deeply dystopian and very, very plausible. “Every interaction should reflect who you are and what you like, and help you find what other people like you have already discovered,” they wrote. “It should feel hollow and pathetic when you see something that’s obviously not you; do you not know me by now?”

Near the end of the talk, LeCun made an unexpected prediction about the effects all of this AI and machine learning technology would have on the job market. Despite being a technologist himself, he said that the people with the best chances of coming out ahead in the economy of the future were not programmers and data scientists, but artists and artisans. To illustrate his point, he projected a slide with two photos: one of a Blu-Ray DVD player, which was being sold on Amazon for $47, and another of a handmade ceramic bowl, which was selling for $750. The difference in complexity between the two objects, he said, was stark.

pages: 588 words: 131,025

The Patient Will See You Now: The Future of Medicine Is in Your Hands
by Eric Topol
Published 6 Jan 2015

While there are a relatively small number of such professionals in a world inundated with data challenges in every sector, health care has not been able to attract enough gifted individuals proportional to the size and importance of this field. The irony now is that data scientists are the ones being referred to as the “high priests.”75b The iMedicine Galaxy Patients, companies, employers, doctors, government, data scientists: these major forces are like stars gravitationally bound in a galaxy, the movement of any one of them affecting all the others. What I have tried to convey here is that we have a new galaxy in the making. Just as the printing press was the great object around which modern culture has orbited, the smartphone and iMedicine are forcing a comparable transformation.

That seems a pretty grim prognosis. But medicine is morphing into a data science, now that big data, unsupervised algorithms, predictive analytics, machine learning, augmented reality, and neuromorphic computing are coming in. There’s still an opportunity to change medicine for the better and at least a chance for prevention. That is, if there was a surefire signal before a disease had ever manifested itself in a person—and this information was highly actionable—the individual’s illness might be preempted. This dream isn’t simply one of better data science, however. It is inextricably linked to the democratization of medicine.

Madrigal, “I’m Being Followed: How Google—and 104 Other Companies—Are Tracking Me on the Web,” The Atlantic, February 2014, http://www.theatlantic.com/technology/print/2012/02/im-being-foll%C951-and-104-other-companies-151-are-tracking-me-on-the-web/253758/. 26. S. Wolfram, “Data Science of the Facebook World,” Stephen Wolfram Blog, April 24, 2014, http://blog.stephenwolfram.com/2013/04/data-science-of-the-facebook-world/. 27. D. Mann, “1984 in 2014,” EP Studios Software, April 21, 2014, http://www.epstudiossoftware.com/?p=1411. 28. H. Kelly, “After Boston: The Pros and Cons of Surveillance Cameras,” CNN, April 26, 2013, http://www.cnn.com/2013/04/26/tech/innovation/security-cameras-boston-bombings/. 29.

Scikit-Learn Cookbook
by Trent Hauck
Published 3 Nov 2014

ISBN 978-1-78398-948-5, www.packtpub.com. Credits: Author, Trent Hauck; Project Coordinator, Harshal Ved; Reviewers, Anoop Thomas Mathew and Xingzhong; Proofreaders, Simran Bhogal, Bridget Braund, and Amy Johnson; Commissioning Editor, Kunal Parikh; Acquisition Editor, Owen Roberts; Content Development Editor, Dayan Hyames; Technical Editors, Mrunal M. Chavan and Dennis John; Copy Editors, Janbal Dharmaraj, Ronak Dhruv, and Sayanee Mukherjee; Indexer, Tejal Soni; Graphics, Sheetal Aute and Abhinash Sahu; Production Coordinator, Manu Joseph; Cover Work, Manu Joseph. About the Author: Trent Hauck is a data scientist living and working in the Seattle area. He grew up in Wichita, Kansas, and received his undergraduate and graduate degrees from the University of Kansas. He is the author of the book Instant Data Intensive Apps with pandas How-to, Packt Publishing—a book that can get you up to speed quickly with pandas and other associated technologies.

It's a pretty simple function: f(x) = 1 / (1 + e^(-t)). Let's use the make_classification method, create a dataset, and get to classifying:
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=1000, n_features=4)
How to do it... The LogisticRegression object works in the same way as the other linear models:
>>> from sklearn.linear_model import LogisticRegression
>>> lr = LogisticRegression()
Since we're good data scientists, we will pull out the last 200 samples to test the trained model on. Since this is a random dataset, it's fine to hold out the last 200; if you're dealing with structured data, don't do this (for example, if you deal with time series data):
>>> X_train = X[:-200]
>>> X_test = X[-200:]
>>> y_train = y[:-200]
>>> y_test = y[-200:]
We'll discuss more on cross-validation later in the book.
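The excerpt stops before the held-out split is actually used. A minimal sketch, not from the book, of how that split might be fitted and scored with scikit-learn; the variable names mirror the recipe:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Recreate the random dataset and the last-200 hold-out split from the recipe
    X, y = make_classification(n_samples=1000, n_features=4)
    X_train, X_test = X[:-200], X[-200:]
    y_train, y_test = y[:-200], y[-200:]

    # Fit on the first 800 samples, then score on the 200 held-out samples
    lr = LogisticRegression()
    lr.fit(X_train, y_train)
    print(lr.score(X_test, y_test))  # mean accuracy on the held-out set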

Okay, so now that we've looked at how we can classify points based on distribution, let's look at how we can do this in scikit-learn:
>>> from sklearn.mixture import GMM
>>> gmm = GMM(n_components=2)
>>> X = np.row_stack((class_A, class_B))
>>> y = np.hstack((np.ones(100), np.zeros(100)))
Since we're good little data scientists, we'll create a training set:
>>> train = np.random.choice([True, False], 200)
>>> gmm.fit(X[train])
GMM(covariance_type='diag', init_params='wmc', min_covar=0.001, n_components=2, n_init=1, n_iter=100, params='wmc', random_state=None, thresh=0.01)
Fitting and predicting is done in the same way as fitting is done for many of the other objects in scikit-learn:
>>> gmm.fit(X[train])
>>> gmm.predict(X[train])[:5]
array([0, 0, 0, 0, 0])
There are other methods worth looking at now that the model has been fit.
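The GMM estimator shown above comes from an older scikit-learn release; in current versions the equivalent class is sklearn.mixture.GaussianMixture. A hedged sketch of the same fit-and-predict flow with the modern class, where class_A and class_B (and their shapes) are stand-ins for the two simulated clusters in the recipe:

    import numpy as np
    from sklearn.mixture import GaussianMixture  # modern replacement for the old GMM class

    # Two toy clusters standing in for class_A and class_B from the recipe (assumed shapes)
    class_A = np.random.normal(0, 1, size=(100, 2))
    class_B = np.random.normal(5, 1, size=(100, 2))
    X = np.vstack((class_A, class_B))

    # Random boolean mask as a rough training subset, as in the excerpt
    train = np.random.choice([True, False], 200)

    gmm = GaussianMixture(n_components=2)
    gmm.fit(X[train])
    print(gmm.predict(X[train])[:5])  # cluster labels for the first five training points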

pages: 125 words: 27,675

Applied Text Analysis With Python: Enabling Language-Aware Data Products With Machine Learning
by Benjamin Bengfort, Rebecca Bilbro and Tony Ojeda
Published 10 Jun 2018

By specifically targeting text data about baseball or basketball, we reduce this ambiguity, but we also reduce the overall size of our corpus. This is a significant tradeoff, because we will need a very large corpus in order to provide sufficient training examples to our language models, thus we must find a balance between domain specificity and corpus size. Data Ingestion of Text As data scientists, we rely heavily on structure and patterns, not only in the content of our data, but in its history and provenance. In general, good data sources have a determinable structure, where different pieces of content are organized according to some schema and can be extracted systematically via the application of some logic to that schema.

Most modern web and social media services have APIs that developers can access, and they are typically accompanied by documentation with instructions on how to access and obtain the data. Note As a web service evolves, both the API and the documentation are usually updated as well, and as developers and data scientists, we need to stay current on changes to the APIs we use in our data products. A RESTful API is a type of web service API that adheres to representational state transfer (REST) architectural constraints. REST is a simple way to organize interactions between independent systems, allowing for lightweight interaction with clients such as mobile phones and other websites.

The primary reason they do this is so that they can prevent abuse of their service. Many service providers allow for registration using OAuth, which is an open authentication standard that allows a user’s information to be communicated to a third party without exposing confidential information such as their password. APIs are popular data sources among data scientists because they provide us with a source of ingestion that is authorized, structured, and well-documented. The service provider is giving us permission and access to retrieve and use the data they have in a responsible manner. This isn’t true of crawling/scraping or RSS, and for this reason, obtaining data via API is preferable whenever it is an option.
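A minimal sketch of the authorized, structured ingestion pattern described here, using the Python requests library; the endpoint, token, and query parameters below are placeholders rather than any real service's API:

    import requests

    # Hypothetical endpoint and OAuth bearer token; substitute a real API and credentials
    API_URL = "https://api.example.com/v1/posts"
    TOKEN = "YOUR_OAUTH_ACCESS_TOKEN"

    response = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"q": "baseball", "count": 100},  # query parameters defined by the provider
    )
    response.raise_for_status()  # fail loudly on auth errors or rate limits
    documents = response.json()  # structured JSON, ready for a text-analysis pipeline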

pages: 706 words: 202,591

Facebook: The Inside Story
by Steven Levy
Published 25 Feb 2020

For the 2010 midterm election, Facebook expanded the program, with a prominent button proclaiming “I Voted” visible to users. But not all visitors. Facebook used the midterms to conduct an elaborate experiment. Two of Facebook’s top data scientists, working with researchers at the University of California at San Diego, decided to test whether the voter button actually affected voter turnout. If you saw that your friends voted, would it influence you to do the same? Cameron Marlow, who headed Data Science for Facebook at the time, says the experiment was an innocent exercise: “We had a product that had run in every single election and we were starting to run in other countries’ elections—the goal was to get people out to vote.”

The work still continued—after all, research led to Growth!—but the company did not want to be misunderstood again. “I don’t think that they stopped doing experiments—they just stopped publishing them,” says Cameron Marlow, who headed Data Science but left shortly before the emotion paper was published. “So is that a good thing for society? Probably not.” As I found by informal conversations at a Data Science conference on campus in 2019, though, most of its researchers stuck around. They feel that their work is important. * * * • • • BEGINNING IN 2013, Kogan was visiting the Facebook campus regularly. He ate a lot of free lunches.

He became a co-founder of Cloudera, a company that stored data in the Internet cloud, and later became involved in trying to solve cancer with data analysis. Though he felt no animosity toward his former employer, at times he has expressed its motivations in phrases that speak volumes. In 2011, assessing the jobs of data scientists at Facebook and its peers, he made a remark to a BusinessWeek journalist that would reverberate for years: “The best minds of my generation are thinking about how to make people click ads,” he said. “That sucks.” * * * • • • IN THE OLD television show Mission: Impossible, each episode began when the leader, Jim, flipped through the dossiers of spies, strongmen, and honeypots, putting rejects in one pile and tossing the photos of the talented ones who were perfect for the mission into a stack on his coffee table.

pages: 388 words: 111,099

Democracy for Sale: Dark Money and Dirty Politics
by Peter Geoghegan
Published 2 Jan 2020

“We could say, for example, we will target women between 35 and 45 who live in these particular geographical entities, who don’t have a degree,” Cummings later explained. He boasted of using physicists and experts in “quantum information” to crunch voter data. Vote Leave recorded spending over £70,000 with a firm called Advanced Skills Initiative.47 The company is better known as ASI Data Science, a tech start-up that marketed itself as a world leader in artificial intelligence, and which employed a number of data scientists that worked for Cambridge Analytica. Cummings also came up with clever ruses to find data on the “missing three million”. He ran an online competition during the 2016 European Football Championship. Win £50 million, the advertisement proclaimed, by successfully predicting the winner of all 51 games.

See also https://firstdraftnews.org/latest/thousands-of-misleading-conservative-ads-side-step-scrutiny-thanks-to-facebook-policy/; accessed 26 Jan. 2020. 44 Ella Hollowood and Matthew D’Ancona, ‘Big little lies’, Tortoise, December 2019. See also https://members.tortoisemedia.com/2019/12/11/lies-191211/content.html; accessed 26 Jan. 2020. 45 Dominic Cummings, ‘“Two hands are a lot” – we’re hiring data scientists, project managers, policy experts, assorted weirdos…’, Dominic Cummings’s Blog, January 2020. See also https://dominiccummings.com/2020/01/02/two-hands-are-a-lot-were-hiring-data-scientists-project-managers-policy-experts-assorted-weirdos/; accessed 26 Jan. 2020. 46 Rowland Manthorpe, ‘General election: WhatsApp messages urge British Hindus to vote against Labour’, Sky News, November 2019. 47 Jim Waterson, ‘What we learned about the media this election’, Guardian, December 2019. 48 James Cusick, ‘New evidence that LibDems sold voter data for £100,000 held back till after election’, openDemocracy, November 2019. 49 Rowland Manthorpe, ‘Data protection experts want watchdog to investigate Conservative and Labour parties’, Sky News, October 2019. 50 ‘General election 2019: Zac Goldsmith loses seat to Lib Dems again’, BBC, December 2019. 51 Kate Proctor, ‘Johnson accused of “rewarding racism” after Zac Goldsmith peerage’, Guardian, December 2019. 52 Isobel Thompson, ‘How Irish anti-abortion activists are drawing on Brexit and Trump campaigns to influence referendum’, openDemocracy, May 2018. 53 ‘Republicans Overseas UK: An Evening with GOP Strategist Matt Mackowiak’, Republicans Overseas UK.

Dominic Cummings Said It’s “TOP PRIORITY”’, Buzzfeed, September 2019. 3 William Norton, White Elephant: How the North East Said No (London, 2008), p. 200. 4 Dominic Cummings, ‘On the referendum #20: the campaign, physics and data science – Vote Leave’s ‘Voter Intention Collection System’ (VICS) now available for all’, Dominic Cummings’s Blog, October 2016. See also https://dominiccummings.com/2016/10/29/on-the-referendum-20-the-campaign-physics-and-data-science-vote-leaves-voter-intention-collection-system-vics-now-available-for-all/; accessed 19 Jan. 2020. 5 Alice Thomson and Rachel Sylvester, ‘Sir Nicholas Soames interview: “Johnson is nothing like Churchill and Jacob Rees-Mogg is an absolute fraud”’, The Times, September 2019. 6 Sam Knight, ‘The man who brought you Brexit’, Guardian, September 2016. 7 Tim Shipman, All Out War: The Full Story of Brexit (London, 2017), p. 27. 8 George Eaton, ‘Vote Leave head Matthew Elliott: “The Brexiteers won the battle but we could lose the war”’, New Statesman, September 2018. 9 Chloe Farand and Mat Hope, ‘Matthew and Sarah Elliott: How a UK Power Couple Links US Libertarians and Fossil Fuel Lobbyists to Brexit’, DeSmog UK, November 2018. 10 Robert Booth, ‘Who is behind the Taxpayers’ Alliance?’

pages: 344 words: 96,020

Hacking Growth: How Today's Fastest-Growing Companies Drive Breakout Success
by Sean Ellis and Morgan Brown
Published 24 Apr 2017

The growth hacking practices innovated by these early practitioners and others who have followed have been honed into a finely tuned business methodology—and spawned a powerful movement with hundreds of thousands of practitioners (and growing) across the globe. This vibrant community of growth hackers includes entrepreneurs, marketers, engineers, product managers, data scientists, and more, not just from the tech start-up world but from all walks of industry, from technology, retail, business-to-business, professional services, entertainment, and even the political arena. And while the details of how it is implemented vary somewhat from company to company, the core elements of the method are:
• the creation of a cross-functional team, or a set of teams that break down the traditional silos of marketing and product development and combine talents;
• the use of qualitative research and quantitative data analysis to gain deep insights into user behavior and preferences; and
• the rapid generation and testing of ideas, and the use of rigorous metrics to evaluate—and then act on—those results.

Recognizing that Walmart’s greatest asset is its data, Brian Monahan, the company’s former VP of marketing, pushed forward a unification of the company’s data platforms across all divisions, one that would allow all teams, from engineering, to merchandising, to marketing, and even external agencies and suppliers, to capitalize on the data generated and collected. Growth hacking cultivates the maximization of big data through collaboration and information sharing. Monahan highlighted the business need this approach solves: “You need marketers who can appreciate what it takes to actually write software and you need data scientists who can really appreciate consumer insights and understand business problems,” he explained.19 THE RISING COSTS AND DUBIOUS RETURNS OF TRADITIONAL MARKETING The techniques of traditional marketing—both print and television advertising, and the newer online versions that have become essential parts of the traditional marketing toolkit—are in crisis, as markets are becoming more and more fragmented and ephemeral, while advertising is becoming both more expensive and less viewed.

Depending on the degree of sophistication of the experiments a team is running, it might be possible for the marketing or engineering team member to play this role, as in both of those fields, a certain level of data analytics aptitude has become important. At more technically advanced companies, analysts with expertise in reporting of experiments as well as data scientists, who are mining for deep insight, should both play a role. What is essential is that data analysis not be farmed out to the intern who knows how to use Google Analytics or to a digital agency, to cite extreme but all too frequent realities. As we will discuss in detail coming up in Chapter Three, too many companies do not place enough emphasis on data analysis, and rely too heavily on prepackaged programs, such as Google Analytics, with limited capacity for combining various pools of data, such as from sales and from customer service, and limited ability to delve into that data to make discoveries.

pages: 349 words: 98,868

Nervous States: Democracy and the Decline of Reason
by William Davies
Published 26 Feb 2019

Behind the scenes, this is gobbled up and mathematically processed. As the math has become more and more sophisticated, the user no longer even experiences it as mathematical. From science to data science A curiosity of big-data analytics is that its specialists are relatively uninterested in whether the data is generated by people, particles in the atmosphere, cars, financial prices, or bacteria. Data scientists are more often trained in mathematics or physics than in social science. They generate knowledge about our behavior, but they don’t profess any expertise about people, or shopping, or finance, or cities.

They don’t study nature or society, in the way that the archetypal expert does, but seek patterns in data that computers have already captured. As opposed to a scientist, a data scientist might better be compared to a librarian, someone who is skilled in navigating a vast collection of already-recorded information. The difference is that the data archive is growing at great speed, thanks to the mass of nonhuman sensors that gather it, and can only be sifted algorithmically. Take the example of psychology. Data science reveals a great deal that is of interest to psychologists, given the ability of algorithms to detect emotions, behaviors, and anxieties across populations.

The analyst’s value lies in pruning vast quantities of useless data, leaving only that which deserves our attention.16 But if they lack any intrinsic interest in the topic at hand (other than the mathematics), they also have no view of their own regarding what “something meaningful” means—they are therefore in the service of a client. Alternatively, their biases and assumptions creep in, without being consciously reflected on or criticized.17 The clients for data science are multiplying all the time. “Quants” can make big money working for Wall Street banks and hedge funds, building algorithms to analyze price movements. “Smart city” projects depend on data scientists to extract patterns of activity from the frenetic movements of urban populations, resources, and transport. Firms such as Peter Thiel’s Palantir help security services identify potential security threats, by isolating dangerous patterns of behavior.

Data and the City
by Rob Kitchin, Tracey P. Lauriault, Gavin McArdle
Published 2 Aug 2017

Since the architecture provides flexible techniques for data and analysis sharing, communication within organizations, between organizations, and between citizens and organizations can be improved. This is an important requirement for cities and citizens. With the rise of the citizen data scientist and the corporate use of urban data, sharing data about cities through various bindings is advantageous. The architecture can also communicate efficiently with big data technologies. Current city dashboards are useful tools for now-casting; with big data technologies and data science, cities also need systems that share data using polyglot bindings and provide forward-looking indicators and metrics about the city (forecasting) using predictive analytics.

British Academy (2012) ‘Society Counts – Quantitative Studies in the Social Sciences and Humanities’, A British Academy Position Statement available from: www.britac.ac.uk/ policy/Society_Counts.cfm [accessed 9 December 2016]. Burkert, H. (1992) ‘The legal framework of public sector information: Recent legal policy developments in the EC’, Government Publications Review 19(5): 483–496. Cabinet Office (2015) ‘Open policy making toolkit: Data science’, available from: www. gov.uk/open-policy-making-toolkit-data-science [accessed 9 December 2016]. Clarke, R. (1988) ‘Information technology and dataveillance’, Communications of the ACM 31(5): 498–512. Crampton, J. and Krygier, J. (2005) ‘An introduction to critical cartography’, ACME: An International E-Journal for Critical Geographies 4(1), available from: http://ojs.unbc. ca/index.php/acme/article/view/723/585 [accessed 9 December 2016].

Moreover, further research is required to understand how data influence digital labour, investigating issues such as how institutional and organizational structures change with the introduction of new databased regimes, how data ecosystems change government and corporate work practices, and how the database managers and data scientists become more important within institutions with their knowledge and expertise becoming privileged over others. Epistemology As the chapters make clear, there are a diverse set of epistemologies being deployed to make sense of urban data, data-driven systems, and the relationship between data and the city.

Beautiful Data: The Stories Behind Elegant Data Solutions
by Toby Segaran and Jeff Hammerbacher
Published 1 Jul 2009

The most critical human component for accelerating the learning process and making use of the Information Platform is taking the shape of a new role: the Data Scientist. The Data Scientist In a recent interview, Hal Varian, Google’s chief economist, highlighted the need for employees able to extract information from the Information Platforms described earlier. As Varian puts it, “find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap. So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis.” At Facebook, we felt that traditional titles such as Business Analyst, Statistician, Engineer, and Research Scientist didn’t quite capture what we were after for our team.

The workload for the role was diverse: on any given day, a team member could author a multistage processing pipeline in Python, design a hypothesis test, perform a regression analysis over data samples with R, design and implement an algorithm for some data-intensive product or service in Hadoop, or communicate the results of our analyses to other members of the organization in a clear and concise fashion. To capture the skill set required to perform this multitude of tasks, we created the role of “Data Scientist.” In the financial services domain, large data stores of past market activity are built to serve as the proving ground for complex new models developed by the Data Scientists of their domain, known as Quants. Outside of industry, I’ve found that grad students in many scientific domains are playing the role of the Data Scientist. One of our hires for the Facebook Data team came from a bioinformatics lab where he was building data pipelines and performing offline data analysis of a similar kind.

Recent books such as Davenport and Harris’s Competing on Analytics (Harvard Business School Press, 2007), Baker’s The Numerati (Houghton Mifflin Harcourt, 2008), and Ayres’s Super Crunchers (Bantam, 2008) have emphasized the critical role of the Data Scientist across industries in enabling an organization to improve over time based on the information it collects. In conjunction with the research community’s investigation of dataspaces, further definition for the role of the Data Scientist is needed over the coming years. By better articulating the role, we’ll be able to construct training curricula, formulate promotion hierarchies, organize conferences, write books, and fill in all of the other trappings of a recognized profession. In the process, the pool of available Data Scientists will expand to meet the growing need for expert pilots for the rapidly proliferating Information Platforms, further speeding the learning process across all organizations.

pages: 442 words: 94,734

The Art of Statistics: Learning From Data
by David Spiegelhalter
Published 14 Oct 2019

This common view of statistics as a basic ‘bag of tools’ is now facing major challenges. First, we are in an age of data science, in which large and complex data sets are collected from routine sources such as traffic monitors, social media posts and internet purchases, and used as a basis for technological innovations such as optimizing travel routes, targeted advertising or purchase recommendation systems – we shall look at algorithms based on ‘big data’ in Chapter 6. Statistical training is increasingly seen as just one necessary component of being a data scientist, together with skills in data management, programming and algorithm development, as well as proper knowledge of the subject matter.

cox regression: See hazard ratio. data literacy: the ability to understand the principles behind learning from data, carry out basic data analyses, and critique the quality of claims made on the basis of data. data science: the study and application of techniques for deriving insights from data, including constructing algorithms for prediction. Traditional statistical science forms part of data science, which also includes a strong element of coding and data management. deep learning: a machine-learning technique that extends standard artificial neural network models to many layers representing different levels of abstraction, say going from individual pixels of an image through to recognition of objects.

Any conclusions generally raise more questions, and so the cycle starts over again, as when we started looking at the time of day when Shipman’s patients died. Although in practice the PPDAC cycle laid out in Figure 0.3 may not be followed precisely, it underscores that formal techniques for statistical analysis play only one part in the work of a statistician or data scientist. Statistical science is a lot more than a branch of mathematics involving esoteric formulae with which generations of students have (often reluctantly) struggled. This Book When I was a student in Britain in the 1970s, there were just three TV channels, computers were the size of a double wardrobe, and the closest thing we had to Wikipedia was on the imaginary handheld device in Douglas Adams’ (remarkably prescient) Hitchhiker’s Guide to the Galaxy.

pages: 404 words: 92,713

The Art of Statistics: How to Learn From Data
by David Spiegelhalter
Published 2 Sep 2019

This common view of statistics as a basic ‘bag of tools’ is now facing major challenges. First, we are in an age of data science, in which large and complex data sets are collected from routine sources such as traffic monitors, social media posts and internet purchases, and used as a basis for technological innovations such as optimizing travel routes, targeted advertising or purchase recommendation systems—we shall look at algorithms based on ‘big data’ in Chapter 6. Statistical training is increasingly seen as just one necessary component of being a data scientist, together with skills in data management, programming and algorithm development, as well as proper knowledge of the subject matter.

cox regression: See hazard ratio. data literacy: the ability to understand the principles behind learning from data, carry out basic data analyses, and critique the quality of claims made on the basis of data. data science: the study and application of techniques for deriving insights from data, including constructing algorithms for prediction. Traditional statistical science forms part of data science, which also includes a strong element of coding and data management. deep learning: a machine-learning technique that extends standard artificial neural network models to many layers representing different levels of abstraction, say going from individual pixels of an image through to recognition of objects.

Any conclusions generally raise more questions, and so the cycle starts over again, as when we started looking at the time of day when Shipman’s patients died. Although in practice the PPDAC cycle laid out in Figure 0.3 may not be followed precisely, it underscores that formal techniques for statistical analysis play only one part in the work of a statistician or data scientist. Statistical science is a lot more than a branch of mathematics involving esoteric formulae with which generations of students have (often reluctantly) struggled. This Book When I was a student in Britain in the 1970s, there were just three TV channels, computers were the size of a double wardrobe, and the closest thing we had to Wikipedia was on the imaginary handheld device in Douglas Adams’ (remarkably prescient) Hitchhiker’s Guide to the Galaxy.

Calling Bullshit: The Art of Scepticism in a Data-Driven World
by Jevin D. West and Carl T. Bergstrom
Published 3 Aug 2020

* * * — WE HAVE DEVOTED OUR careers to teaching students how to think logically and quantitatively about data. This book emerged from a course we teach at the University of Washington, also titled “Calling Bullshit.” We hope it will show you that you do not need to be a professional statistician or econometrician or data scientist to think critically about quantitative arguments, nor do you need extensive data sets and weeks of effort to see through bullshit. It is often sufficient to apply basic logical reasoning to a problem and, where needed, augment that with information readily discovered via search engine. We have civic motives for wanting to help people spot and refute bullshit.

If you do know some of these things, you still probably don’t remember all of the details. We, the authors, use statistics on a daily basis, but we still have to look up this sort of stuff all the time. As a result, you can’t unpack the black box; you can’t go into the details of the analysis in order to pick apart possible problems. Unless you’re a data scientist, and probably even then, you run into the same kind of problem you encounter when you read about a paper that uses the newest ResNet algorithm to reveal differences in the facial features of dog and cat owners. Whether or not this is intentional on the part of the author, this kind of black box shields the claim against scrutiny.

If we continue the time series across the intervening years since Vigen published this figure, the correlation completely falls apart. Vigen finds his spurious correlation examples by collecting a large number of data sets about how things change over time. Then he uses a computer program to compare each trend with every other trend. This is an extreme form of what data scientists call data dredging. With a mere one hundred data series, one can compare nearly ten thousand pairs. Some of these pairs are going to show very similar trends—and thus high correlations—just by chance. For example, check out the correlation between the numbers of deaths caused by anticoagulants and the number of sociology degrees awarded in the US: You look at these two trends and think, Wow—what are the chances they would line up that well?
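A small simulation, not from the book, that makes the data-dredging point concrete: correlate enough unrelated time series and some pairs will line up impressively by chance alone.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)

    # 100 unrelated "trends": independent random walks over 30 time points
    series = pd.DataFrame(rng.normal(size=(30, 100)).cumsum(axis=0))

    # Correlate every series with every other one (roughly 4,950 distinct pairs)
    corr = series.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

    print(int((upper > 0.9).sum().sum()), "pairs correlate above 0.9 purely by chance")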

pages: 475 words: 134,707

The Hype Machine: How Social Media Disrupts Our Elections, Our Economy, and Our Health--And How We Must Adapt
by Sinan Aral
Published 14 Sep 2020

Whether you simply want to understand the how and why, or need to make business or policy decisions, this is a must-read!” —FOSTER PROVOST, NYU Stern School of Business, author of Data Science for Business “Sinan Aral is a scientist and entrepreneur, and his unique perspective makes him the perfect guide to the world we live in today. From ads to fake news, The Hype Machine is the best critical foundation for understanding the connected world and how we might navigate through it to a better future.” —HILARY MASON, founder and CEO of Fast Forward Labs, data scientist in residence at Accel, and former chief scientist at Bitly “If you want the truth about falsehoods, real information about misinformation, and rigorous analysis of hype, this is the book for you.

Moreover, while 92 percent of consumers read reviews, only 6 percent write reviews, which means a vocal minority is influencing the opinions of the overwhelming majority. The potential consequences of ratings herding are significant because the 6 percent have an outsize impact on how the rest of us shop. Sean Taylor, my PhD student at the time, who’s now a senior data scientist at Lyft and the former head of the statistics team in Facebook’s Core Data Science group, overheard my conversation with Lev and wandered across the hall. “Hey, what are you guys talking about?” This is how social science starts—it’s sparked by everyday puzzles that evolve into investigations of how and why things happen. Lev, Sean, and I took this nagging question about herd behavior and embarked on a research project to uncover the truth about population-scale opinion dynamics in the crowd.

He’s the David Austin Professor of Management, Marketing, IT, and Data Science at MIT, director of the MIT Initiative on the Digital Economy, and head of MIT’s Social Analytics Lab. He was the chief scientist of Social Amp and Humin before co-founding Manifest Capital, a VC fund that grows startups into the Hype Machine. Aral has worked closely with Facebook, Yahoo!, Twitter, LinkedIn, Snapchat, WeChat, and The New York Times, among others, and currently serves on the advisory boards of the Alan Turing Institute, the British national institute for data science, in London; the Centre for Responsible Media Technology and Innovation in Norway; and C6 Bank, one of the first all-digital banks of Brazil.

pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
by Pedro Domingos
Published 21 Sep 2015

M., 138–139 Baldwin effect, 139, 140, 304 Bandit problems, 129–130 Barto, Andy, 221 Bayes, Thomas, 144–145 Bayesian learning, 166–170, 174–175 Bayesian methods, cell model and, 114 Bayesian model averaging, 166–167 Bayesian models, tweaking probabilities, 170–173 Bayesian networks, 24, 156–161, 305–306 Alchemy and, 250 gene regulation and, 159 inference problem and, 161–166 Master Algorithm and, 240, 245 relational learning and, 231 Bayesians, 51, 52–53, 54, 143–175 Alchemy and, 253 further reading, 304–305 hidden Markov model, 154–155 If . . . then . . . rules and, 155–156 inference problem, 161–166 learning and, 166–170 logic and probability and, 173–175 Markov chain, 153–155 Markov networks, 170–173 Master Algorithm and, 240–241, 242 medical diagnosis and, 149–150 models and, 149–153 nature and, 141 probabilistic inference and, 52, 53 See also Bayesian networks Bayes’ theorem, 31–32, 52–53, 143–149, 253 Beam search, 135 “Beer and diapers” rule, 69–70 Belief, probability and, 149 Belief propagation, 161–164, 242, 253 Bell Labs, 190 Bellman, Richard, 188, 220 Bellman’s equation, 220 Berkeley, George, 58 Berlin, Isaiah, 41 Bias, 78–79 Bias-free learning, futility of, 64 Bias-variance decomposition, 301 The Bible Code (Drosnin), 72 Big data, 21 A/B testing and, 227 algorithms and, 7 clustering and, 206–207 relational learning and, 232–233 science, machine learning, and, 14–16 scientific truth and, 40 Big-data systems, 258 Bing, 12 Biology, learning algorithms and, 15 Black swans, 38–39, 158, 232 The Black Swan (Taleb), 38 Blessing of nonuniformity, 189 Board games, reinforcement learning and, 219 Bohr, Niels, 178, 199 Boltzmann distribution, 103–104 Boltzmann machines, 103–104, 117, 250 Boole, George, 104, 175 Boolean circuits, 123, 136 Boolean variable, 149 Boosting, 238 Borges, Jorge Luis, 71 Box, George, 151 Brahe, Tycho, 14, 131 Brahe phase of science, 39–40 Brain learning algorithms and, 26–28 mapping, 118 number of connections in, 94–95 reverse engineering the, 52, 302 S curves and, 105 simulating with computer, 95 spin glasses and, 102–103 BRAIN initiative, 118 Breiman, Leo, 238 Brin, Sergey, 55, 227, 274 Bryson, Arthur, 113 Bucket brigade algorithm, 127 Building blocks, 128–129, 134 Buntine, Wray, 80 Burglar alarms, Bayesian networks and, 157–158 Burks, Arthur, 123 Burns, Bob, 206 Business, machine learning and, 10–13 C. 
elegans, 118 Cajal, Santiago Ramón y, 93–94 Caltech, 170 CancerCommons.org, 261 Cancer cure algorithm for, 53–54 Bayesian learning and, 174 inverse deduction and, 83–85 Markov logic network and, 249 program for (CanceRx), 259–261, 310 Cancer diagnosis, 141 Cancer drugs predicting efficacy of, 83–84 relational learning and models for, 233 selection of, 41–42 CanceRx, 259–261, 310 Capital One, 272 Carbonell, Jaime, 69 Carnap, Rudolf, 175 Cars driverless, 113, 166, 172, 306 learning to drive, 113 Case-based reasoning, 198, 307 Catch Me If You Can (film), 177 Cause and effect, Bayes’ theorem and, 145–149 Cell model of, 114–115 relational learning and workings of, 233 Cell assembly, 94 Cell phone, hidden Markov models and, 155 Centaurs, 277 Central Dogma, 83 Cerebellum, 27, 118 Chance, Bayes and, 145 Chaos, study of, 30 Checkers-playing program, 219 Cholera outbreak, London’s, 182–183 Chomsky, Noam, 36–38 Chrome, 266 Chunking, 223–227, 254, 309 Circuit design, genetic programming and, 135–136 Classes, 86–87, 209, 257 Classifiers, 86–87, 127 Master Algorithm and, 240 Naïve Bayes, 151–153 nearest-neighbor algorithm and, 183 Clinton, Bill, 18 Clustering, 205–210, 254, 257 hierarchical, 210 Cluster prototypes, 207–208 Clusters, 205–210 “Cocktail party” problem, 215 Cognition, theory of, 226 Coin toss, 63, 130, 167–168 Collaborative filtering systems, 183–184, 306–307 Columbus test, 113 Combinatorial explosion, 73–74 Commoner, Barry, 158 Commonsense reasoning, 35, 118–119, 145, 276–277, 300 Complexity monster, 5–6, 7, 43, 246 Compositionality, 119 Computational biologists, use of hidden Markov models, 155 Computers decision making and, 282–286 evolution of, 286–289 human interaction with, 264–267 as learners, 45 logic and, 2 S curves and, 105 as sign of Master Algorithm, 34 simulating brain using, 95 as unifier, 236 writing own programs, 6 Computer science, Master Algorithm and, 32–34 Computer vision, Markov networks and, 172 Concepts, 67 conjunctive, 66–68 set of rules and, 68–69 sets of, 86–87 Conceptual model, 44, 152 Conditional independence, 157–158 Conditional probabilities, 245 Conditional random fields, 172, 306 Conference on Neural Information Processing Systems (NIPS), 170, 172 Conjunctive concepts, 65–68, 74 Connectionists/connectionism, 51, 52, 54, 93–119 Alchemy and, 252 autoencoder and, 116–118 backpropagation and, 52, 107–111 Boltzmann machine and, 103–104 cell model, 114–115 connectomics, 118–119 deep learning and, 115 further reading, 302–303 Master Algorithm and, 240–241 nature and, 137–142 neural networks and, 112–114 perceptron, 96–101, 107–108 S curves and, 104–107 spin glasses and, 102–103 symbolist learning vs., 91, 94–95 Connectomics, 118–119 Consciousness, 96 Consilience (Wilson), 31 Constrained optimization, 193–195, 241, 242 Constraints, support vector machines and, 193–195 Convolutional neural networks, 117–119, 303 Cope, David, 199, 307 Cornell University, Creative Machines Lab, 121–122 Cortex, 118, 138 unity of, 26–28, 299–300 Counterexamples, 67 Cover, Tom, 185 Crawlers, 8–9 Creative Machines Lab, 121–122 Credit-assignment problem, 102, 104, 107, 127 Crick, Francis, 122, 236 Crossover, 124–125, 134–136, 241, 243 Curse of dimensionality, 186–190, 196, 201, 307 Cyber Command, 19 Cyberwar, 19–21, 279–282, 299, 310 Cyc project, 35, 300 DARPA, 21, 37, 113, 121, 255 Darwin, Charles, 28, 30, 131, 235 algorithm, 122–128 analogy and, 178 Hume and, 58 on lack of mathematical ability, 127 on selective breeding, 123–124 variation and, 124 Data accuracy of held-out, 75–76 Bayes’ 
theorem and, 31–32 control of, 45 first principal component of the, 214 human intuition and, 39 learning from finite, 24–25 Master Algorithm and, 25–26 patterns in, 70–75 sciences and complex, 14 as strategic asset for business, 13 theory and, 46 See also Big data; Overfitting; Personal data Database engine, 49–50 Databases, 8, 9 Data mining, 8, 73, 232–233, 298, 306. See also Machine learning Data science, 8. See also Machine learning Data scientist, 9 Data sharing, 270–276 Data unions, 274–275 Dawkins, Richard, 284 Decision making, artificial intelligence and, 282–286 Decision theory, 165 Decision tree induction, 85–89 Decision tree learners, 24, 301 Decision trees, 24, 85–90, 181–182, 188, 237–238 Deduction induction as inverse of, 80–83, 301 Turing machine and, 34 Deductive reasoning, 80–81 Deep learning, 104, 115–118, 172, 195, 241, 302 DeepMind, 222 Democracy, machine learning and, 18–19 Dempster, Arthur, 209 Dendrites, 95 Descartes, René, 58, 64 Descriptive theories, normative theories vs., 141–142, 304 Determinism, Laplace and, 145 Developmental psychology, 203–204, 308 DiCaprio, Leonardo, 177 Diderot, Denis, 63 Diffusion equation, 30 Dimensionality, curse of, 186–190, 307 Dimensionality reduction, 189–190, 211–215, 255 nonlinear, 215–217 Dirty Harry (film), 65 Disney animators, S curves and, 106 “Divide and conquer” algorithm, 77–78, 80, 81, 87 DNA sequencers, 84 Downweighting attributes, 189 Driverless cars, 8, 113, 166, 172, 306 Drones, 21, 281 Drugs, 15, 41–42, 83.

If you’re curious what all the hubbub surrounding big data and machine learning is about and suspect that there’s something deeper going on than what you see in the papers, you’re right! This book is your guide to the revolution. If your main interest is in the business uses of machine learning, this book can help you in at least six ways: to become a savvier consumer of analytics; to make the most of your data scientists; to avoid the pitfalls that kill so many data-mining projects; to discover what you can automate without the expense of hand-coded software; to reduce the rigidity of your information systems; and to anticipate some of the new technology that’s coming your way. I’ve seen too much time and money wasted trying to solve a problem with the wrong learning algorithm, or misinterpreting what the algorithm said.

At the end of the day, a browser is just a standard piece of software, but a search engine requires a different mind-set. The other reason machine learners are the über-geeks is that the world has far fewer of them than it needs, even by the already dire standards of computer science. According to tech guru Tim O’Reilly, “data scientist” is the hottest job title in Silicon Valley. The McKinsey Global Institute estimates that by 2018 the United States alone will need 140,000 to 190,000 more machine-learning experts than will be available, and 1.5 million more data-savvy managers. Machine learning’s applications have exploded too suddenly for education to keep up, and it has a reputation for being a difficult subject.

pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts
by Richard Susskind and Daniel Susskind
Published 24 Aug 2015

When this term was first used, it was confined to techniques for the handling of vast bodies of data—for example, the masses of data recorded by the Large Hadron Collider. Now, Big Data is also used to refer to the use of technology to analyse much smaller bodies of information. Some speak instead of ‘data analytics’, ‘data science’, and ‘predictive analytics’, all of which seem to mean roughly the same thing.36 Specialists in the area, whatever label is preferred, are often called ‘data scientists’. There has been no shortage of hype about Big Data. There are commentators who argue, with some justification, that its claims are too extravagant and that its methodology is underdeveloped.37 What is hard to deny is the volume of data that are swilling around.

There is less need for the ‘sage on the stage’ and more of a job for the ‘guide on the side’—those who help students navigate through alternative sources of expertise. There are new roles and new disciplines, like education software designers who build the ‘adaptive’ learning systems, the content curators who compile and manage online content, and the data scientists who collect large data sets and develop ‘learning analytics’ to interpret them. It is not surprising, therefore, that Larry Summers, former chair of the White House Council of Economic Advisers and past President of Harvard, has said that ‘the next quarter century will see more change in higher education than the last three combined’,93 and that Sir Michael Barber, a former Downing Street adviser, anticipates transformations in education in his aptly named report, An Avalanche is Coming.94 2.3.

But the competition is also advancing from outside the traditional boundaries of the professions—from new people and different institutions. In Chapter 2 we see a recurring need to draw on people with very different skills, talents, and ways of working. Practising doctors, priests, teachers, and auditors did not, for example, develop the software that supports the systems that we describe. Stepping forward instead are data scientists, process analysts, knowledge engineers, systems engineers, and many more (see Chapter 6). Today, professionals still provide much of the content, but in time they may find themselves down-staged by these new specialists. We also see a diverse set of institutions entering the fray—business process outsourcers, retail brands, Internet companies, major software and service vendors, to name a few.

pages: 251 words: 80,831

Super Founders: What Data Reveals About Billion-Dollar Startups
by Ali Tamaseb
Published 14 Sep 2021

As a result, Lake had to learn how to be incredibly efficient with the money she had. Stitch Fix sold clothes before paying its vendors, and it turned over inventory as quickly as possible. The company changed its cash cycle so that it didn’t have to sit on unsold products. By hiring very talented and senior data scientists, she also turned Stitch Fix into a technology powerhouse and a talent magnet for the best data scientists. Stitch Fix was able to better understand a customer’s style through a series of sophisticated algorithms, which could augment the workforce of human stylists and significantly lower costs. “If I had been handed $100 million, I don’t know that I would have understood the business as well as I did,” says Lake.

A business like Stitch Fix—which involves buying and holding inventory, storage, shipping, and a lot of physical labor, from stylists to warehouse workers—would be considered capital intensive and inefficient at first sight. Lake managed to become more efficient by renegotiating contracts with vendors and hiring data scientists to augment stylists to scale the business. It’s true that many capital-efficient companies are capital light. (They have low capital expenditures, abbreviated as “low CapEx.”) But in practice, companies with high capital needs can also be successful in both raising funding and reaching multibillion-dollar outcomes.

David Vélez was a partner at Sequoia Capital looking into Latin American investment opportunities before venturing out to start Nubank, the Brazilian online bank unicorn, and Andy Rachleff co-founded Benchmark Capital before starting Wealthfront, a consumer financial advisory company. Abe Othman, head of data science at AngelList—a website enabling angel investors to invest in early-stage startups and to diversify through index investments—told me, “One of the most significant characteristics we’ve observed among successful startups is when a founder had experience as an investor or angel investor.” This could be because former investors have an easier time raising money, and because they’re better at filtering their ideas and betting on the right one to spend time on.

pages: 424 words: 114,905

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again
by Eric Topol
Published 1 Jan 2019

Among today’s medical specialties, it will be the radiologist who, having a deep understanding of the nuances of such image-based diagnostic algorithms, is best positioned to communicate results to patients and provide guidance for how to respond to them. Nevertheless, although some have asserted that “radiologists of the future will be essential data scientists of medicine,” I don’t think that’s necessarily the direction we’re headed.44 Instead, they likely will be connecting far more with patients, acting as real doctors. FIGURE 6.2: Predicting longevity from a deep neural network of CT scans. Source: Adapted from L. Oakden-Rayner et al., “Precision Radiology: Predicting Longevity Using Feature Engineering and Deep Learning Methods in a Radiomics Framework,” Sci Rep (2017): 7(1), 1648.

With a heightened awareness of the opportunity to prevent such tragedies, in 2017, CEO Mark Zuckerberg announced new algorithms that look for patterns of posts and words for rapid review by dedicated Facebook employees: “In the future, AI will be able to understand more of the subtle nuances of language and will be able to identify different issues beyond suicide as well, including quickly spotting more kinds of bullying and hate.” Unfortunately, and critically, Facebook has refused to disclose the algorithmic details, but the company claims to have interceded with more than a hundred people intending to commit self-harm.42 Data scientists are now using machine learning on the Crisis Text Line’s 75 million texts43 to try to unravel text or emoji risk factors.44 Overall, even these very early attempts at using AI to detect depression and suicidal risk show some promising signs that we can do far better than the traditional subjective and clinical risk factors.

Cheng, Q., et al., “Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study.” J Med Internet Res, 2017. 19(7): p. e243. 42. McConnon, “AI Helps Identify Those at Risk for Suicide.” 43. “Crisis Trends,” July 19, 2018. https://crisistrends.org/#visualizations. 44. Resnick, B., “How Data Scientists Are Using AI for Suicide Prevention,” Vox. 2018. 45. Anthes, E., “Depression: A Change of Mind.” Nature, 2014. 515(7526): pp. 185–187. 46. Firth, J., et al., “The Efficacy of Smartphone-Based Mental Health Interventions for Depressive Symptoms: A Meta-Analysis of Randomized Controlled Trials.”

pages: 523 words: 61,179

Human + Machine: Reimagining Work in the Age of AI
by Paul R. Daugherty and H. James Wilson
Published 15 Jan 2018

This way, customer satisfaction doesn’t take a hit, and the company saves on energy costs.22 Enable Discovery What kind of conversations are you having with your data? Are only analysts and data scientists benefiting from analysis tools? Your goal should be to extract insights in such a way that anyone, especially less-technical business users can take advantage of the story that the data is trying to tell. Ayasdi is democratizing discovery, providing software that’s useful to data scientists and non-technical business leaders alike. One of its customers, Texas Medical Center (TMC), focuses on the analysis of high-volume, high-dimensional data sets such as data from breast-cancer patients.

When you start typing search terms, Google not only considers the most generally popular associations for its autocomplete feature, but also considers your geographic location, previous search terms, and other factors. It can feel as if the software is reading your thoughts. Leveraging AI to Find a Job Bot-based empowerment skills also come in handy for job searches. If there’s one guarantee for workers in these AI days, it’s that the job landscape is quickly changing. Positions such as data scientist, which barely existed five years ago, are now all the rage. And positions that focus on rote tasks like data entry are quickly fading from job listings. How can people forge new career paths, find new training opportunities, or boost their online presence or personal brand on social media? The answer is bot-based empowerment.

Cade Metz, “DARPA Goes Full Tron with Its Brand Battle of the Hack Bots,” Wired, July 5, 2016, https://www.wired.com/2016/07/_trashed-19/. Various security companies have their own approaches to the problem. SparkCognition, for instance, offers a product called Deep Armor, which uses a combination of AI techniques including neural networks, heuristics, data science, and natural-language processing to detect threats never seen before and remove malicious files. Another company called Darktrace offers a product called Antigena, which is modeled on the human immune system, identifying and neutralizing bugs as they’re encountered.7 Behavioral analysis of network traffic is key to another company called Vectra.

pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think
by Viktor Mayer-Schonberger and Kenneth Cukier
Published 5 Mar 2013

So far, the first two of these elements get the most attention: the skills, which today are scarce, and the data, which seems abundant. A new profession has emerged in recent years, the “data scientist,” which combines the skills of the statistician, software programmer, infographics designer, and storyteller. Instead of squinting into a microscope to unlock a mystery of the universe, the data scientist peers into databases to make a discovery. The McKinsey Global Institute proffers dire predictions about the dearth of data scientists now and in the future (which today’s data scientists like to cite to feel special and to pump up their salaries). Hal Varian, Google’s chief economist, famously calls statistician the “sexiest” job around.

The value in data’s reuse is good news for organizations that collect or control large datasets but currently make little use of them, such as conventional businesses that mostly operate offline. They may sit on untapped informational geysers. Some companies may have collected data, used it once (if at all), and just kept it around because of low storage cost—in “data tombs,” as data scientists call the places where such old info resides. Internet and technology companies are on the front lines of harnessing the data deluge, since they collect so much information just by being online and are ahead of the rest of industry in analyzing it. But all firms stand to gain. The consultants at McKinsey & Company point to a logistics company, whose name they keep anonymous, which noticed that in the process of delivering goods, it was amassing reams of information on product shipments around the globe.

Data exhaust is the mechanism behind many services like voice recognition, spam filters, language translation, and much more. When users indicate to a voice-recognition program that it has misunderstood what they said, they in effect “train” the system to get better. Many businesses are starting to engineer their systems to collect and use information in this way. In Facebook’s early days, its first “data scientist,” Jeff Hammerbacher (and among the people credited with coining the term), examined its rich trove of data exhaust. He and the team found that a big predictor that people would take an action (post content, click an icon, and so on) was whether they had seen their friends do the same thing. So Facebook redesigned its system to put greater emphasis on making friends’ activities more visible, which sparked a virtuous circle of new contributions to the site.

pages: 317 words: 87,566

The Happiness Industry: How the Government and Big Business Sold Us Well-Being
by William Davies
Published 11 May 2015

But possibilities for psychological and behavioural data are heavily shaped by the power structures which facilitate them. The current explosion in happiness and well-being data is really an effect of new technologies and practices of surveillance. In turn, these depend on pre-existing power inequalities. Building the new laboratory In 2012, Harvard Business Review declared that ‘data scientist’ would be the ‘sexiest job of the twenty-first century’.4 We live during a time of tremendous optimism regarding the possibilities for data collection and analysis that is refuelling the behaviourist and utilitarian ambition to manage society purely through careful scientific observation of mind, body and brain.

But as that science becomes ever more advanced, eventually the subjective element of it starts to drop out of the picture altogether. Bentham’s presumption, that pleasure and pain are the only real dimensions of psychology, is now leading squarely towards the philosophical riddle whereby a neuroscientist or data scientist can tell me that I am objectively wrong about my own mood. We are reaching the point where our bodies are more trusted communicators than our words. If one way of ‘seeing’ happiness as a physiological event is via the face, the other way is to get even closer to its supposed locus: the brain.

, theatlantic.com, 2 April 2012. 25Jeremy Gilbert, ‘Capitalism, Creativity and the Crisis in the Music Industry’, opendemocracy.net, 14 September 2012. 7 Living in the Lab 1Jennifer Scanlon, ‘Mediators in the International Marketplace: US Advertising in Latin America in the Early Twentieth Century’, The Business History Review 77: 3, 2003. 2Jeff Merron, ‘Putting Foreign Consumers on the Map: J. Walter Thompson’s Struggle with General Motors’ International Advertising Account in the 1920s’, The Business History Review 73: 3, 1999. 3Ibid. 4Thomas Davenport and D. J. Patil, ‘Data Scientist: The Sexiest Job of the 21st Century’, Harvard Business Review, October 2012. 5Viktor Mayer-Schönberger, and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work and Think, London: John Murray, 2013. 6Anthony Townsend, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, New York: W.

pages: 292 words: 94,660

The Loop: How Technology Is Creating a World Without Choices and How to Fight Back
by Jacob Ward
Published 25 Jan 2022

With enough time to refine its process, the system will, eventually, astound us with its gift for telling us that this panting poodle is a dog, and that this heifer nosing the curtains is a cow. But how did the system get there? It matters because, as we’ll see, this technology goes about answering stupid questions and world-changing questions the same way. A data scientist handed this task would want to know more about the goal of the project in order to employ the most suitable flavor of machine learning to accomplish it. If you wanted the system to identify the cows and dogs in a photograph of the stage, for instance, you’d probably feed it into a convolutional neural network, a popular means of recognizing objects in a photograph these days.

We knew we probably would not win the competition that way, but there was a bigger point that we needed to make.2 And they were right: they didn’t win. The organizers of the competition would not allow the judges to play with and evaluate the Duke team’s visualization tool, and a trio from IBM Research won the competition for a system based on Boolean rules that would help a human data scientist inspect the black box. In its paper describing the winning system, the IBM team rightly pointed out the need for explainability “as machine learning pushes further into domains such as medicine, criminal justice, and business, where such models complement human decision-makers and decisions can have major consequences on human lives.”

Wald is credited with helping to identify something that statisticians now work hard to fight off: survivorship bias, the tendency to optimize our behavior for the future based on what may in fact be the rare instances in which someone or something survives extremely long odds and only as a result comes to our attention. Forecasting probability based on limited data is something statisticians now know not to do, thanks to Wald—data scientists often post his report’s famous diagram of a shot-up plane as an online meme—but Anna Todd is the spear-tip of an industry that forecasts future success by studying survivors. When Todd and I met, roughly four million writers were posting more than twenty-four hours of reading material to WattPad every sixty seconds of every day, according to the company.

pages: 589 words: 69,193

Mastering Pandas
by Femi Anthony
Published 21 Jun 2015

He enjoys analyzing data and solving complex business problems using SAS, R, EViews/Gretl, Minitab, SQL, and Python. Opeyemi is also an adjunct at Northwood University where he designs and teaches undergraduate courses in microeconomics and macroeconomics. Louis Hénault is a data scientist at OgilvyOne Paris. He loves combining mathematics and computer science to solve real-world problems in an innovative way. After getting a master's degree in engineering with a major in data sciences and another degree in applied mathematics in France, he entered into the French start-up ecosystem, working on several projects. Louis has gained experience in various industries, including geophysics, application performance management, online music platforms, e-commerce, and digital advertising.

Note: For more information, refer to the Wikipedia page on Python at http://en.wikipedia.org/wiki/Python_%28programming_language%29.

Among the characteristics that make Python popular for data science are its very user-friendly (human-readable) syntax, the fact that it is interpreted rather than compiled (leading to faster development time), and its very comprehensive library for parsing and analyzing data, as well as its capacity for doing numerical and statistical computations. Python has libraries that provide a complete toolkit for data science and analysis. The major ones are as follows:

NumPy: The general-purpose array functionality with emphasis on numeric computation
SciPy: Numerical computing
Matplotlib: Graphics
pandas: Series and data frames (1D and 2D array-like types)
Scikit-Learn: Machine learning
NLTK: Natural language processing
Statstool: Statistical analysis

For this book, we will be focusing on the fourth library in the preceding list, pandas.
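A minimal sketch of how a few of these libraries fit together (the values below are invented for illustration, not taken from the book):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy supplies the fast numeric array...
values = np.array([37, 61, 45, 41])

# ...and pandas wraps it in a labeled Series with descriptive-statistics methods.
ages = pd.Series(values, index=['a', 'b', 'c', 'd'])
print(ages.mean(), ages.std())

# Matplotlib handles the graphics.
ages.plot(kind='bar')
plt.show()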

Application of machine learning – Kaggle Titanic competition

In order to illustrate how we can use pandas to assist us at the start of our machine learning journey, we will apply it to a classic problem, which is hosted on the Kaggle website (http://www.kaggle.com). Kaggle is a competition platform for machine learning problems. The idea behind Kaggle is to enable companies that are interested in solving predictive analytics problems with their data to post that data on Kaggle and invite data scientists to come up with proposed solutions. A competition can run over a period of time, and the rankings of the competitors are posted on a leader board. At the end of the competition, the top-ranked competitors receive cash prizes. The problem that we will study in order to illustrate the use of pandas for machine learning with scikit-learn is Titanic: Machine Learning from Disaster, hosted on Kaggle as its classic introductory machine learning problem.
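As a rough illustration of how pandas and scikit-learn might be combined on this problem (not the book's own solution), one might load the Kaggle training file and score a simple baseline; the column names follow the Titanic dataset, and the train.csv path is an assumption:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load the Kaggle training data (path assumed).
train = pd.read_csv('train.csv')

# Two simple features: passenger class and sex, plus age with missing values filled.
train['Age'] = train['Age'].fillna(train['Age'].median())
train['IsFemale'] = (train['Sex'] == 'female').astype(int)

X = train[['Pclass', 'Age', 'IsFemale']]
y = train['Survived']

# Cross-validated accuracy of a baseline logistic regression.
model = LogisticRegression(max_iter=1000)
print(cross_val_score(model, X, y, cv=5).mean())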

pages: 688 words: 147,571

Robot Rules: Regulating Artificial Intelligence
by Jacob Turner
Published 29 Oct 2018

An AI system learns to play the game independently of human training, but when its actions are matched to the human descriptions, a narrative can be generated by sewing together these descriptors. Taking a different route to Riedl et al., data scientist Daniel Whitenack identifies three general capabilities required for transparency in AI: data provenance (knowing the source of all data); reproducibility (the ability to recreate a given result); and data versioning (saving snapshot copies of the AI in particular states with a view to recording which input led to which output). Whitenack suggests that in order to make these three desiderata “standards within data science, we need proper tools to integrate these characteristics into workflows”. He says that ideally, AI transparency tools will be: Language agnostic—The language wars in data science between python, R, scala, and others will continue on forever.

See also James Vincent, “Tencent Says There Are Only 300,000 AI Engineers Worldwide, but Millions Are Needed”, The Verge, 5 December 2017, https://www.theverge.com/2017/12/5/16737224/global-ai-talent-shortfall-tencent-report, accessed 1 June 2018. By contrast, PWC estimate that in the USA alone, there will be 2.9 m people with data science and analytics skills by 2018. Not all will be AI professionals per se, but many of their skills will overlap. “What’s Next for the 2017 Data Science and Analytics Job Market?”, PWC Website, https://www.pwc.com/us/en/library/data-science-and-analytics.html, accessed 1 June 2018. 144 Katja Grace, “The Asilomar Conference: A Case Study in Risk Mitigation”, MIRI Research Institute, Technical Report, 2015–9 (Berkeley, CA: MIRI, 15 July 2015), 15. 145 A constantly-updated database of tech ethics curricula is available at: https://docs.google.com/spreadsheets/d/1jWIrA8jHz5fYAW4h9CkUD8gKS5V98PDJDymRf8d9vKI/edit#gid=0, accessed 1 June 2018. 146 Microsoft, The Future Computed: Artificial Intelligence and Its Role in Society (Redmond, WA: Microsoft Corporation, U.S.A., 2018), 55, https://msblob.blob.core.windows.net/ncmedia/2018/01/The-Future_Computed_1.26.18.pdf, accessed 1 June 2018. 147 See, for example, s. 1 of the UK Road Traffic Act 1988, or s. 249(1)(a) of the Canadian Criminal Code. 148 “About TensorFlow”, Website of TensorFlow, https://www.tensorflow.org/, accessed 1 June 2018. 149 See, for example, the UK Government’s “Guidance: Wine Duty”, 9 November 2009, https://www.gov.uk/guidance/wine-duty, accessed 1 June 2018. 150 See, for example, Max Weber, “Politics as a Vocation”, in From Max Weber: Essays in Sociology, translated by H.H.

He says that ideally, AI transparency tools will be:

Language agnostic—The language wars in data science between python, R, scala, and others will continue on forever. We will always need a mix of languages and frameworks to enable advancements in a field as broad as data science. However, if tools enabling data versioning/provenance are language specific, they are unlikely to be integrated as standard practice.

Infrastructure Agnostic—The tools should be able to be deployed on your existing infrastructure—locally, in the cloud, or on-prem.

Scalable/distributed—It would be impractical to implement changes to a workflow if they were not able to scale up to production requirements.

pages: 237 words: 65,794

Mining Social Media: Finding Stories in Internet Data
by Lam Thuy Vo
Published 21 Nov 2019

To that end, what follows are a few helpful resources on writing clean code with Python and pandas, as well as producing reproducible data analysis in Jupyter Notebook. While by no means a comprehensive guide, they’re a good starting point:

The general Python style guide (https://docs.python-guide.org/writing/style/) and a style guide for data scientists (http://columbia-applied-data-science.github.io/pages/lowclass-python-style-guide.html)

Think Python, 2nd Edition, a book by Allen B. Downey (O’Reilly, 2015), available for free under the Creative Commons license on the author’s site (https://greenteapress.com/wp/think-python-2e/)

“A Beginner’s Guide to Optimizing Pandas Code for Speed,” an article by Sofia Heisler (https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6/)

“What We’ve Learned About Sharing Our Data Analysis,” an article by Jeremy Singer-Vine (https://source.opennews.org/articles/what-weve-learned-about-sharing-our-data-analysis/)

The libraries and tools we’ve used in this book have stood the test of time among Python users, but new libraries pop up all the time and may do certain things better than what is already available.

The Jupyter Notebook web app, which evolved out of the web app IPython Notebooks, was created to accommodate three programming languages—Julia, Python, and R (Ju-Pyt-R)—but has since evolved to support many other coding languages. Jupyter Notebook is also used by many data scientists in a diverse range of fields, including people crunching numbers to improve website performance, sociologists studying demographic information, and journalists searching for trends and anomalies in data obtained through Freedom of Information Act requests. One huge benefit of this is that many of these data scientists and researchers put their notebooks—often featuring detailed and annotated analyses—online on code-sharing platforms like GitHub, making it easier for beginning learners like you to replicate their studies.

But the findings in summary data come from rather messy, wildly varying replies to surveys or other databases of raw data—that is, data that has not been processed yet. Data tables provided by organizations like the US Census Bureau often have been cleaned, processed, and aggregated from thousands—if not millions—of raw data entries, many of which may contain several inconsistencies that data scientists worked to resolve. For example, in a simple table listing people’s occupations, these organizations may have resolved different but essentially equivalent responses like “attorney” and “lawyer.” Likewise, the raw data we look at in this book—data from the social web—can be quite irregular and challenging to process because it’s produced by real people, each with unique quirks and posting habits.

Pandas for Everyone: Python Data Analysis
by Unknown

Numpy

3 http://pandas.pydata.org/pandas-docs/stable/basics.html#descriptivestatistics

print(ages.mean())
49.0
print(ages.min())
37
print(ages.max())
61
print(ages.std())
16.97056274847714

The mean, min, max, and std are also methods in the numpy.ndarray.

Series methods and descriptions:
append: Concatenates 2 or more Series
corr: Calculate a correlation with another Series*
cov: Calculate a covariance with another Series*
describe: Calculate summary statistics*
drop_duplicates: Returns a Series without duplicates
equals: Sees if a Series has the same elements
get_values: Get values of the Series, same as the values attribute
hist: Draw a histogram
min: Return the minimum value
max: Returns the maximum value
mean: Returns the arithmetic mean
median: Returns the median
mode: Returns the mode(s)
quantile: Returns the value at a given quantile
replace: Replaces values in the Series with a specified value
sample: Returns a random sample of values from the Series
sort_values: Sort values
to_frame: Converts Series to DataFrame
transpose: Return the transpose
unique: Returns a numpy.ndarray of unique values
* indicates missing values will be automatically dropped

2.5.2 Boolean subsetting Series

Chapter 1 showed how we can use specific indices to subset our data. However, it is rare that we know the exact row or column index to subset the data. Typically you are looking for values that meet (or don’t meet) a particular calculation or observation. First, let’s use a larger dataset:

scientists = pd.read_csv('../data/scientists.csv')

We just saw how we can calculate basic descriptive metrics of vectors.

4 http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html

ages = scientists['Age']
print(ages)
0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64

print(ages.mean())
59.125

print(ages.describe())
count     8.000000
mean     59.125000
std      18.325918
min      37.000000
25%      44.000000
50%      58.500000
75%      68.750000
max      90.000000
Name: Age, dtype: float64

What if we wanted to subset our ages by those above the mean?
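Completing that thought, a minimal sketch of boolean subsetting (assuming the same scientists.csv file with an Age column used in the excerpt above):

import pandas as pd

scientists = pd.read_csv('../data/scientists.csv')
ages = scientists['Age']

# Comparing a Series to a scalar yields a boolean Series of True/False values...
above_mean = ages > ages.mean()

# ...which can then index the original Series, keeping only the True rows.
print(ages[above_mean])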

print(taxi_loop_concat_comp.equals(taxi_loop_concat))
True

6.7 Summary

Here I showed you how we can reshape data to a format that is conducive for data analysis, visualization, and collection. We followed Hadley Wickham's Tidy Data paper to show the various functions and methods to reshape our data. This is an important skill since various functions will need data in a certain shape, tidy or not, in order to work. Knowing how to reshape your data will be an important skill as a data scientist and analyst.
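As a rough illustration of the wide-to-long reshaping described here (not the book's taxi example; the data and column names below are made up):

import pandas as pd

# Hypothetical wide-format data: one column per year.
wide = pd.DataFrame({
    'country': ['A', 'B'],
    '1999': [100, 200],
    '2000': [110, 210],
})

# melt gathers the year columns into a variable column and a value column (tidy form).
tidy = wide.melt(id_vars='country', var_name='year', value_name='cases')
print(tidy)

# pivot_table goes back the other way, from long to wide.
wide_again = tidy.pivot_table(index='country', columns='year', values='cases')
print(wide_again)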

You can see TODO USING FUNCTIONS if you need more information on using function parameters.

7 http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_csv.html
8 http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
9 http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
10 http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

2.8.3 Excel

Excel is probably the most common data type (or second most common, next to CSVs). Excel has a bad reputation within the data science community. I discussed some of the reasons why in Chapter 1.1. The goal of this book isn't to bash Excel, but to teach you a reasonable alternative tool for data analytics. In short, the more you can do your work in a scripting language, the easier it will be to scale up to larger projects, catch and fix mistakes, and collaborate.
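A small, self-contained sketch of the CSV and Excel round trips these notes point to (file names are placeholders; writing and reading Excel files assumes an engine such as openpyxl is installed alongside pandas):

import pandas as pd

df = pd.DataFrame({'name': ['Ada', 'Grace'], 'age': [36, 45]})

# Write and re-read a CSV; index=False keeps the row index out of the file.
df.to_csv('people.csv', index=False)
csv_df = pd.read_csv('people.csv')

# Excel works the same way through to_excel and read_excel.
df.to_excel('people.xlsx', index=False)
excel_df = pd.read_excel('people.xlsx')
print(csv_df.equals(excel_df))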

pages: 156 words: 15,746

Personal Finance with Python
by Max Humber

The second edition of this hands-on guide—updated for Python 3.5 and pandas 1.0—is packed with practical case studies that show you how to effectively solve a broad set of data analysis problems, using Python libraries such as NumPy, pandas, Matplotlib, and IPython.

Python Data Science Handbook: Tools and Techniques for Developers by Jake VanderPlas

For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with this book do you get them all—IPython, NumPy, pandas, Matplotlib, scikit-learn, and other related tools.

Finally, please e-mail me if you have any concerns, questions, or comments about this book.

Conclusion
Chapter 3: Convert - openexchangerates.org, Secrets, Documentation, Encapsulate, show_alternative, .apply, Conclusion
Chapter 4: Amortize - Banks, Amortization, Payment, Loop A, Loop B, Functionize, Evaluate, Conclusion
Chapter 5: Budget - Dates, datetime, Timestamp, .normalize, Horizon, Flows, Totals, Visualization, Updating, Vacation I, English, get_dates, Fun, YAML, Functionize, Vacation II, Loading YAML, Conclusion
Chapter 6: Invest - Trade-Offs, Instantiate, Prices, Orders, Deposit, Simulate, Quotes, get_price, get_historical, Portfolio, Rebalance, Conclusion
Chapter 7: Spend - Prophet, Purchases, Forecast, Visualize, Conclusion
Appendix: Next
Index
About the Author and About the Technical Reviewer

About the Author
Max Humber is a Data Engineer interested in improving finance with technology. He works for Wealthsimple and previously served as the first data scientist for the online lending platform Borrowell. He has spoken at Pycon, ODSC, PyData, useR, and BigDataX in Colombia, London, Berlin, Brussels, and Toronto.

About the Technical Reviewer
Michael Thomas has worked in software development for more than 20 years as an individual contributor, team lead, program manager, and vice president of engineering.

pages: 25 words: 5,789

Data for the Public Good
by Alex Howard
Published 21 Feb 2012

If you want a deep look at what the work of digitizing data really looks like, read Carl Malamud’s interview with Slashdot on opening government data. Data for the public good, however, goes far beyond government’s own actions. In many cases, it will happen despite government action — or, often, inaction — as civic developers, data scientists and clinicians pioneer better analysis, visualization and feedback loops. For every civic startup or regulation, there’s a backstory that often involves a broad number of stakeholders. Governments have to commit to open up themselves but will, in many cases, need external expertise or even funding to do so.

To create public good from public goods — the public sector data that governments collect, the private sector data that is being collected and the social data that we generate ourselves — we will need to collectively forge new compacts that honor existing laws and visionary agreements that enable the new data science to put the data to work. About the Author Alexander B. Howard is the Government 2.0 Correspondent for O’Reilly Media, where he reports on technology, open government and online civics. Before joining O’Reilly, Howard was the associate editor of SearchCompliance.com at TechTarget. His work there focused on how regulations affect IT operations, including issues of data protection, privacy, security and enterprise IT strategy.

pages: 296 words: 78,112

Devil's Bargain: Steve Bannon, Donald Trump, and the Storming of the Presidency
by Joshua Green
Published 17 Jul 2017

Trump’s campaign rejected the attack and criticized the ADL for involving itself in partisan politics. “Darkness is good,” Bannon counseled Trump. “Don’t let up.” By this point, the campaign had curtailed most of its polling. But it wasn’t quite flying blind. A few days earlier, Trump’s team of data scientists, squirreled away in an office down in San Antonio, had delivered a report titled “Predictions: Five Days Out,” which contained stunning news that contradicted the widespread assumption that Clinton would win easily. It was suddenly clear that Comey’s FBI investigation was roiling the electorate.

Bannon was fixated on Michigan, constantly urging Stepien to click back over and zoom in on bellwethers like Macomb County—and with 30 percent, 40 percent, then 50 percent of precincts reporting, Trump’s lead was holding steady. Or growing. As the night wore on, he even led in Wisconsin, a scenario none of Trump’s data scientists had ever imagined. Later on, several of those crammed into the room would recall a moment when Stepien’s manic patter flagged for just a second and the room fell quiet. Then somebody—no one could remember who—muttered, “Holy shit. This is happening.” Drudge was right. The corporate media blew it.

The unmarked entrance is framed by palmetto trees and sits beneath a large, second-story veranda with sweeping overhead fans, where the (mostly male) staff gathers in the afternoons to smoke cigars and brainstorm. Established in 2012 to study crony capitalism and governmental malfeasance, GAI is staffed with lawyers, data scientists, and forensic investigators and has collaborated with such mainstream news outlets as Newsweek, ABC News, and CBS’s 60 Minutes on stories ranging from insider trading in Congress to credit-card fraud among presidential campaigns. It’s a mining operation for political scoops that, for two years, had trained its investigative firepower on the Clintons.

pages: 197 words: 35,256

NumPy Cookbook
by Ivan Idris
Published 30 Sep 2012

By night, he cultivates his academic interests in mathematics and computer science, and plays with mathematical and scientific software. Ryan R. Rosario is a Doctoral Candidate at the University of California, Los Angeles. He works at Riot Games as a Data Scientist, and he enjoys turning large quantities of massive, messy data into gold. He is heavily involved in the open source community, particularly with R, Python, Hadoop, and Machine Learning, and has also contributed code to various Python and R projects. He maintains a blog dedicated to Data Science and related topics at http://www.bytemining.com. He has also served as a technical reviewer for NumPy 1.5 Beginner's Guide. www.PacktPub.com Support files, eBooks, discount offers and more You might want to visit www.PacktPub.com for support files and downloads related to your book.

You should pay extra attention to the one line where we call the Canny filter function:

from sklearn.datasets import load_sample_images
from matplotlib.pyplot import imshow, show, axis
import numpy
import skimage.filter

dataset = load_sample_images()
img = dataset.images[0]
edges = skimage.filter.canny(img[..., 0], 2, 0.3, 0.2)
axis('off')
imshow(edges)
show()

The code produces an image of the edges within the original picture, as shown in the following screenshot:

Installing Pandas

Pandas is a Python library for data analysis. It has some similarities with the R programming language, which are not coincidental. R is a specialized programming language popular with data scientists. For instance, the core DataFrame object is inspired by R.

How to do it...

On PyPi, the project is called pandas. So, for instance, run either of the following two commands:

sudo easy_install -U pandas
pip install pandas

If you are using a Linux package manager, you will need to install the python-pandas project.

pages: 586 words: 186,548

Architects of Intelligence
by Martin Ford
Published 16 Nov 2018

So, it’s not surprising that when we have real robots, they’re going to be able to do those jobs. I also think that the current mindset among governments is: “Oh, well then. I guess we really need to start training people to be data scientists, because that’s the job of the future—or robot engineers.” This clearly isn’t the solution because we don’t need a billion data scientists and robot engineers: we just need a few million. This might be a strategy for a small country like Singapore; or where I am currently, in Dubai, it might also be a viable strategy. But it’s not a viable strategy for any major country because there is simply not going to be enough jobs in those areas.

These are not often done within a single company. Does that integration pose new challenges? DAPHNE KOLLER: Absolutely. I think the biggest challenge is actually cultural, in getting scientists and data scientists to work together as equal partners. In many companies, one group sets the direction, and the other takes a back seat. At insitro, we really need to build a culture in which scientists, engineers, and data scientists work closely together to define problems, design experiments, analyze data, and derive insights that will lead us to new therapeutics. We believe that building this team and this culture well is as important to the success of our mission as the quality of the science or the machine learning that these different groups will create.

At the end of the day, we’re designing these systems, and we get to say how they are deployed, we can turn the switch off. CEO & CO-FOUNDER OF AFFECTIVA Rana el Kaliouby is the co-founder and CEO of Affectiva, a startup company that specializes in AI systems that sense and understand human emotions. Affectiva is developing cutting-edge AI technologies that apply machine learning, deep learning, and data science to bring new levels of emotional intelligence to AI. Rana is an active participant in international forums that focus on ethical issues and the regulation of AI to help ensure the technology has a positive impact on society. She was selected as a Young Global Leader by the World Economic Forum in 2017.

pages: 561 words: 157,589

WTF?: What's the Future and Why It's Up to Us
by Tim O'Reilly
Published 9 Oct 2017

CHAPTER 8: MANAGING A WORKFORCE OF DJINNS 155 breakthroughs and business processes: Steve Lohr, “The Origins of ‘Big Data’: An Etymological Detective Story,” New York Times, February 1, 2013, https://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/. 155 speech recognition and machine translation: Alon Halevy, Peter Norvig, and Fernando Pereira, “The Unreasonable Effectiveness of Data,” IEEE Intelligent Systems, 1541–1672/09, retrieved March 31, 2017, https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf. 156 “the sexiest job of the 21st century”: Thomas Davenport and D. J. Patil, “Data Scientist: The Sexiest Job of the 21st Century,” Harvard Business Review, October 2012, https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century. Hal Varian had used this same phrase about statistics in 2009. See “Hal Varian on How the Web Challenges Managers,” McKinsey & Company, January 2009, http://www.mckinsey.com/industries/high-tech/our-insights/hal-varian-on-how-the-web-challenges-managers. 157 “the right values for these parameters is something of a black art”: Sergey Brin and Larry Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Stanford University, retrieved March 31, 2017, http://infolab.stanford.edu/~backrub/google.html. 158 as many as 50,000 subsignals: Danny Sullivan, “FAQ: All About the Google RankBrain Algorithm,” Search Engine Land, June 23, 2016, http://searchengineland.com/faq-all-about-the-new-google-rankbrain-algorithm-234440. 158 “new synapses for the global brain”: Tim O’Reilly, “Freebase Will Prove Addictive,” O’Reilly Radar, March 8, 2007, http://radar.oreilly.com/2007/03/freebase-will-prove-addictive.html. 158 “10 experiments for every successful launch”: Matt McGee, “BusinessWeek Dives Deep into Google’s Search Quality,” Search Engine Land, October 6, 2009, http://searchengineland.com/businessweek-dives-deep-into-googles-search-quality-27317. 159 the manual that they provide: Search Quality Evaluator Guide, Google, March 14, 2017, http://static.googleusercontent.com/media/www.google.com/en//insidesearch/howsearchworks/assets/searchqualityevaluatorguidelines.pdf. 160 “Another big difference”: Brin and Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Section 3.2.

Its insight, that “simple models and a lot of data trump more elaborate models based on less data,” has been fundamental to progress in field after field, and is at the heart of many Silicon Valley companies. It is even more central to the latest breakthroughs in artificial intelligence. In 2008, D. J. Patil at LinkedIn and Jeff Hammerbacher at Facebook coined the term data science to describe their jobs, naming a field that a few years later was dubbed by Harvard Business Review as “the sexiest job of the 21st century.” Understanding the data science mindset and approach and how it differs from older methods of programming is critical for anyone who is grappling with the challenges of the twenty-first century. How Google deals with search quality provides important lessons.

Tellingly, Jeff Hammerbacher, who worked on Wall Street before leading the data team at Facebook, once said, “The best minds of my generation are thinking about how to make people click ads. That sucks.” Jeff left Facebook and now plays a dual role as chief scientist and cofounder at big data company Cloudera and faculty member of the Icahn School of Medicine at Mount Sinai, in New York, where he runs the Hammer Lab, a team of software developers and data scientists trying to understand how the immune system battles cancer. The choice of the problems to which we apply the superpowers of our new digital workforce is ultimately up to us. We are creating a race of djinns, eager to do our bidding. What shall we ask them to do? 9 “A HOT TEMPER LEAPS O’ER A COLD DECREE” I SPOKE IN EARLY 2017 AT A GATHERING OF MINISTERS FROM the Organisation for Economic Co-operation and Development (OECD) and G20 nations to discuss the digital future.

pages: 788 words: 223,004

Merchants of Truth: The Business of News and the Fight for Facts
by Jill Abramson
Published 5 Feb 2019

Her team created: Mathew Ingram, “BuzzFeed Opens Up Access to Its Viral Dashboard,” Gigaom, September 2, 2010, https://gigaom.com/2010/09/02/buzzfeed-opens-up-access-to-its-viral-dashboard/. This was called Social Lift: Dao Nguyen and Ky Harlin, “How BuzzFeed Thinks about Data Science,” BuzzFeed, September 24, 2014, https://www.buzzfeed.com/daozers/how-buzzfeed-thinks-about-data-science?utm_term=.cuPg946z1#.uqLpMYjOL. The dashboard offered more than: Felix Oberholzer-Gee, “BuzzFeed—The Promise of Native Advertising,” Harvard Business School Case 714-512, June 2014 (revised August 2014), 539; Christine Lagorio-Chafkin, “Meet BuzzFeed’s Secret Weapon,” Inc., September 2, 2014, https://www.inc.com/christine-lagorio/buzzfeed-secret-growth-weapon.html.

When Zuckerberg shared a photo of himself giving out full-size candy bars for Halloween, Peretti made known that he “liked” it and shared a link to BuzzFeed’s coverage of a trick-or-treat stunt, led by Matt Stopera’s fawning confession, “I personally really admire him for this.” He kept in touch with Cameron Marlow, his grad-school friend who had since become Facebook’s chief data scientist, by posting public messages back and forth on each other’s “walls.” He made “Facebook-official” friendships with the company’s heads of product, media partnerships, and global creative strategy, one of its cofounders, and a board member. Facebook was growing fast. By 2008 the number of monthly users had nearly tripled, and over the next two years it would more than quadruple to over 600 million.

“News might not be as big a business as entertainment, but news is the best way to have a big impact on the world.” He disliked wishy-washy barometers like “impact,” but for the time being the term would have to do. The practical value of journalism was too nebulous for his quantitatively inclined mind. Then he received a gift from Nguyen and her team of data scientists. It was a new tool built into the dashboard, which they called a Heat Map. The software noted how far down the page a reader scrolled, then collated those data into one simple visual showing whether the story lost readers’ attention and where. It was just what BuzzFeed needed to apply its analytical rigor to stories that were longer and larger in scope.

pages: 467 words: 149,632

If Then: How Simulmatics Corporation Invented the Future
by Jill Lepore
Published 14 Sep 2020

Although, notably, Google abandoned “Don’t be evil” in 2015, and Alphabet Inc. adopted a code of conduct that urged employees to “do the right thing,” http://blogs.wsj.com/digits/2015/10/02/as-google-becomes-alphabet-dont-be-evil-vanishes/. Mathematicians-turned-businessmen Jeff Hammerbacher and D. J. Patil claim to have coined the term “data scientist” in 2008, and not long after, data science was also embraced both within and outside the academy as a new scientific method, a “fourth paradigm,” following the earlier paradigms of empirical, theoretical, and computational analysis. Thomas H. Davenport and D. J. Patil, “Data Scientist: The Sexiest Job of the 21st Century,” Harvard Business Review, October 2012. Tony Hey, Stewart Tansley, and Kristin Michele Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery (Redmond, WA: Microsoft Research, 2009).

“Don’t be evil,” the motto of Google, marked the limit of a swaggering, devil-may-care ethical ambition; doing good did not come into it.7 Incubated decades before, beneath a honeycombed, geodesic dome in Wading River, this work found a place, too, in universities. In the 2010s, a flood of money into universities attempted to make the study of data a science, with data science initiatives, data science programs, data science degrees, data science centers.8 Much academic research that fell under the label “data science” produced excellent and invaluable work, across many fields of inquiry, findings that would not have been possible without computational discovery.9 And no field should be judged by its worst practitioners. Still, the shadiest data science, like the shadiest behavioral science, grew in influence by way of self-mystification, exaggerated claims, and all-around chicanery, including fast-changing, razzle-dazzle buzzwords, from “big data” to “data analytics.”

Much of university life had by the 2010s followed the model of the Media Lab, collapsing the boundaries between corporate commissions, academic inquiry, and hucksterism; at its worst, behavioral data science’s self-mystification was meant to boggle the mind, daunt critics, and entice corporate sponsors and venture capitalists. “Data science is in its infancy,” wrote one MIT computer scientist in 2015. “Few individuals or organizations understand the potential of and the paradigm shift associated with Data science, let alone understand it conceptually.”13 The more mystification, the wealthier the donors. The number of data science programs stretched into the hundreds, even though little consensus had been reached on the meaning or purpose of “data science.” New disciplines and methods take time to find their way; that’s all to the good.

pages: 320 words: 87,853

The Black Box Society: The Secret Algorithms That Control Money and Information
by Frank Pasquale
Published 17 Nov 2014

To have been prominent at a critical point in Internet development was a similar piece of luck. Google or Facebook were once in the right place at the right time. It’s not clear whether they are still better than anyone else at online data science, or whether their prominence is such that they’ve become the permanent “default.” We also have to ask whether data science is still key here, or just the data itself. When intermediaries like Google and Facebook leverage their enormous databases of personalized information to target advertising, how much value do they add in the process? This is a matter of some dispute.

Yet choosing a car, or even a restaurant, is not as straightforward as optimizing an engine or routing a drive. Does the recommendation engine take into account, say, whether the restaurant or car company gives its workers health benefits or maternity leave? Could we prompt it to do so? In their race for the most profitable methods of mapping social reality, the data scientists of Silicon Valley and Wall Street tend to treat recommendations as purely technical problems. The values and prerogatives that the encoded rules enact are hidden within black boxes.23 The most obvious question is: Are these algorithmic applications fair? Why, for instance, does YouTube (owned by Google) so consistently beat out other video sites in Google’s video search results?

“Competition is one click away,” chant the Silicon Valley antitrust lawyers when someone calls out a behemoth firm for unfair or misleading business practices.149 It’s not so. Alternatives are demonstrably worse, and likely to remain so as long as the dominant firms’ self-reinforcing data advantage grows.

Search and Compensation

At the 2013 Governing Algorithms conference at New York University, a data scientist gave a dazzling presentation of how her company maximized ad revenue for its clients. She mapped out information exchanges among networks, advertisers, publishers, and the other stars of the Internet universe, emphasizing how computers are taught by skilled programmers like herself to find unexpected correlations in click-through activity.

pages: 205 words: 71,872

Whistleblower: My Journey to Silicon Valley and Fight for Justice at Uber
by Susan Fowler
Published 18 Feb 2020

It was that infrastructure—the servers, the operating systems, the networks, and all of the code that connected the applications together—that I would be working on, that I would need to make better, more reliable, and more fault-tolerant. After Engucation came more specialized training to prepare new hires for their particular roles within the company. New data scientists spent time with their data science teams, front-end developers learned how to work with the front-end code, and I would embed with one of the site reliability engineering (SRE) teams and learn the basics before I could join my permanent team. Eamon assigned me to one of his SRE teams for this phase of training.

The entire time I’d been a student at Penn, I’d watched as my undergraduate classmates in both physics and philosophy interviewed with companies like Facebook and Google, did software engineering internships in San Francisco over the summers, and boasted about their high-paying job offers from companies like Palantir and Microsoft. Almost all of the graduate students in my physics lab left academia after they completed their PhDs or finished their postdocs and moved to New York or San Francisco to work as data scientists, software engineers, and product managers. Jumping from physics into the technology industry wasn’t a very big leap: physics was a technical field, and physics students were technically inclined, comfortable with mathematics and computer science. I suspected that the leap from physics into software engineering wouldn’t be that difficult for me.

Everyone went out for drinks that night to celebrate the two new employees: me and the new office manager, Heidi. We were the only women in the office; as I later learned, they had us start on the same day so that we wouldn’t feel “alone.” I was the only woman in the office in a technical role (the only other technical woman, a data scientist, lived and worked in Boston), but that didn’t seem strange or unusual to me; at Penn, I had been the only woman on the floor where my lab was located in the David Rittenhouse Laboratory, and there were only men’s bathrooms on the floor where I worked. As the happy hour came to an end, we all stood outside the bar while each person called an Uber ride home.

pages: 290 words: 90,057

Billion Dollar Brand Club: How Dollar Shave Club, Warby Parker, and Other Disruptors Are Remaking What We Buy
by Lawrence Ingrassia
Published 28 Jan 2020

A “decision tree” algorithm, as its name implies, correlates a cascading series of dozens or hundreds or thousands of data points (like branches on a tree) to winnow out items you probably won’t like and add things you probably will. “Random forest” algorithms are ensembles of dozens or even hundreds of different simpler algorithms that work together and correct for possible errors. Stitch Fix, which employs a chief algorithms officer and more than a hundred data scientists, probably is the only consumer product company ever to post on its website an “Algorithms Tour.” An elaborate and lengthy online graphic, it explains how Stitch Fix uses data (some not having an obvious correlation or connection) to act as a clothing matchmaker. “Each attribute that describes a piece of merchandise can be represented as data and reconciled to each client’s unique preferences,” Eric Colson, Stitch Fix’s chief algorithms officer, explains on the company’s MultiThreaded blog.
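To make the ensemble idea concrete, here is a generic scikit-learn sketch (not Stitch Fix's actual system) in which a random forest averages the votes of many decision trees; the features and labels are synthetic stand-ins for client preferences:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import numpy as np

# Synthetic stand-in data: rows are client/item pairs, columns are style scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # 1 = kept the item, 0 = sent it back

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree in the forest sees a random slice of rows and features;
# the ensemble's prediction averages their individual votes.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))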

As Stitch Fix has developed more sophisticated algorithms, it has incorporated the use of computer vision to help select clothing. “We have our machines look at photos of clothing that customers like (e.g., from Pinterest), and look for visually similar items,” the website explains. And while the company initially sold apparel and accessories made by others, its data scientists in 2017 started designing “Stitch Fix exclusive brand” items by combining different style characteristics from popular clothing. In-house designers create these “Hybrid Designs” by taking ideas generated by artificial intelligence about the kinds of clothing its customers might like. Predictive analysis is now employed by a wide variety of digitally native brands.

Then the women were asked if they liked the color better than the results they got with a do-it-yourself kit at a drugstore. “We recruited people who were willing to put this product on their hair, with no idea who we really are. The results were positive enough for us to say we have something. Now what?” he recalls. The next step: tapping their knowledge of tech and data science to figure out how to replicate color customization for sale to fifty thousand women, not just fifty. “We wanted to do something that was really innovative, not bullshit innovative,” Mourad says. “Really different and value added.” Omar Mourad, Tamim’s younger brother and another of PriceGrabber’s cofounders, read everything he could about dyeing hair.

pages: 248 words: 73,689

Age of the City: Why Our Future Will Be Won or Lost Together
by Ian Goldin and Tom Lee-Devlin
Published 21 Jun 2023

The combination of globalization and rapid technological progress has led to the disappearance of many manufacturing and clerical jobs – long anchors of the middle class – as work was either eliminated entirely or sent overseas to be performed more cheaply. As these jobs have disappeared, a barbell-shaped economy has emerged, characterized by high-paid knowledge jobs, such as management consultants and data scientists, on the one hand and low-paid service jobs, like baristas and warehouse workers, on the other. In the words of economists Maarten Goos and Alan Manning, we have seen the workforce split into ‘lousy’ and ‘lovely’ jobs.23 Manufacturing jobs in particular have long been the life force behind smaller cities and also many rural towns.

Company value chains often cross industry boundaries. Proximity to financial and professional services, for example, is an important drawcard for many large cities. The presence of top universities is another factor. The same universities that produce investment bankers also produce lawyers, data scientists and medics. Lastly, university-educated professionals typically choose to live in places where other university-educated professionals live. Partly this is about humans’ natural tendency to gravitate towards people with similar outlooks to themselves, what sociologists call ‘homophily’. Increasingly, it is also about romance.

As automation makes it increasingly challenging for poor countries to follow the traditional path of industrial development, a significant expansion of investment in education will be required. While we are sceptical that the future of knowledge work is entirely remote, we do see scope for certain activities to be shifted to well-educated workers in poor countries in cases where the gains to companies from lower wages overseas outweigh the disadvantages of remote collaboration. Data science, financial analysis and doubtless other areas of knowledge work could probably be performed from abroad. The experience of Bangalore suggests that a hub for offshoring knowledge work can also mature into a global centre of innovation in its own right. It is vital that, as they become more prosperous, the cities of the developing world do not adopt the car-based sprawl of many rich world cities.

pages: 389 words: 87,758

No Ordinary Disruption: The Four Global Forces Breaking All the Trends
by Richard Dobbs and James Manyika
Published 12 May 2015

As big data emerged as the next big opportunity in sectors ranging from finance to government, both the talent supply and employers’ understanding of the skills they need struggled to keep up. “There aren’t enough data scientists, not even close,” said Sandy Pentland, a computer scientist and management thinker at MIT. “We tend to teach people that everything that matters happens between your ears when in fact it actually mostly happens between people.” Pentland argues that the lack of data scientists makes it more difficult to fully apply the technology.5 More than two-thirds of companies are struggling against limited or no capabilities in data analytics techniques.6 The story is not restricted to data analytics positions.

Richard Dobbs, Anu Madgavkar, Dominic Barton, Eric Labaye, James Manyika, Charles Roxburgh, Susan Lund, and Siddarth Madhav, “The world at work: Jobs, pay, and skills for 3.5 billion people,” June 2012, McKinsey & Company, www.mckinsey.com/insights/employment_and_growth/the_world_at_work. 5. Danny Palmer, “Not enough data scientists, MIT expert tells Computing,” Computing, September 4, 2013, www.computing.co.uk/ctg/news/2292485/not-enough-data-scientists-mit-expert-tells-computing. 6. Thomas Wailgum, “Monday metric: 68% of companies struggle with big data analytics,” ASUG News, March 18, 2013, www.asugnews.com/article/monday-metric-68-of-companies-struggle-with-big-data-analytics. 7.

Upgrading to premium membership—monthly prices start at $59.99 per month for the Business Plus account—affords the user greater insight into who has been looking at his or her profile, the ability to send more messages to potential leads, and the use of more advanced search filters.60 A third model is monetization of big data, either through innovative business-to-business offerings (for example, crowd-sourcing business intelligence or outsourced data science services) or through developing more relevant products, services, or content for which consumers are willing to pay. LinkedIn, for example, makes 20 percent of its revenue from subscriptions, 30 percent from marketing, and 50 percent from talent solutions, a core part of which is selling targeted talent intelligence and tools to recruiters.61 You will have to keep experimenting in order to capture more consumer surplus for your business.

pages: 321 words: 113,564

AI in Museums: Reflections, Perspectives and Applications
by Sonja Thiel and Johannes C. Bernhardt
Published 31 Dec 2023

Another crucial aspect is the prevailing ideology of tech solutionism, which insists that innovation inherently relies on digitization (Lanier/ Weyl 2020). As a result, traditional approaches to problem-solving should be considered before technology. Furthermore, the knowledge informing this field is heavily concentrated in Silicon Valley, primarily among data scientists and AI engineers. So, we must ask ourselves: Who shapes the understanding within this field? (Stilgoe 2020). Why do we trust technical experts to make far-reaching decisions that profoundly impact our societies? Why do we grant such extensive power to privately owned companies? It is time to reclaim power and give it back to the people.

The collection data is, moreover, linked to high-level ontologies, vocabularies, or thesauri systems such as AAT, GND, Geonames, Wikidata, or ICONCLASS, which ensure the correct use of terms and provide additional context. These sources of knowledge representation provide a high-quality source for machine learning tasks, but so far nevertheless seem to be underrated. At the same time, the efforts to transfer formats and facilitate communication between domain experts and data scientists and developers should also not be underestimated. In recent years, there has been a shift in the understanding of collection data—from single object representations to open access, downloadable datasets, pre-curated datasets, or even data labs with an open application programming interface (API), where digital users are given new access to museum data—which is publicly available not only for viewing, but also for reuse or research.

Sonja Thiel: Managing AI The varying quality of object datasets and the different quality requirements of domain experts and machine learning experts remain a problem when compiling data for machine learning tasks. While, for curators, every single word, context, and a multifactorial and detailed description of an object are of paramount importance depending on the machine learning task, for data scientists or developers a lot of this information is not usable and therefore quickly removed from or simplified for a training dataset—thus leading to a potential contextual loss of cultural heritage information, the consequences of which have not yet been well studied. As pointed out by Srinivasan et al. (2021), many of the ethical concerns about machine learning technologies in creative fields are related to the underlying datasets.

pages: 345 words: 75,660

Prediction Machines: The Simple Economics of Artificial Intelligence
by Ajay Agrawal , Joshua Gans and Avi Goldfarb
Published 16 Apr 2018

Cardiogram, in its preliminary study, used six thousand people, including just two hundred with an irregular heart rhythm. Over time, one way to collect further data is through feedback on whether the app’s users have or develop irregular heart rhythms. Where did the six thousand come from? Data scientists have excellent tools for assessing the amount of data required given the expected reliability of the prediction and the need for accuracy. These tools are called “power calculations” and tell you how many units you need to analyze to generate a useful prediction.5 The salient management point is that you must make a trade-off: more accurate predictions require more units to study, and acquiring these additional units can be costly.
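As an illustration of the kind of power calculation described here (the statsmodels call, effect size, and thresholds are assumptions made for this sketch, not Cardiogram's actual analysis):

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# How many units per group are needed to detect a small effect (Cohen's d = 0.2)
# at the 5% significance level with 80% power?
n_per_group = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(n_per_group)  # about 393.4, so roughly 394 people per group in practice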

A sabermetric analyst develops measures for the rewards that the team would receive from signing different players. Sabermetric analysts are baseball’s reward function engineers. Now, most teams have at least one such analyst, and the role has appeared, under different names, in other sports. Better prediction created a new high-level position on the org chart. The research scientists, data scientists, and vice presidents of analytics are listed as key roles in the online front office directories. The Houston Astros even have a separate decision sciences unit headed by former NASA engineer Sig Mejdal. The strategic change also means a switch in who the team employs to pick the players. These analytics experts have mathematical skills, but the finest of them understand best what to tell the prediction machine to do.

Joshua holds a PhD in economics from Stanford University and, in 2008, was awarded the Economic Society of Australia’s Young Economist Award (the Australian equivalent of the John Bates Clark medal). AVI GOLDFARB is the Ellison Professor of Marketing at the Rotman School of Management, University of Toronto. Avi is also chief data scientist at the Creative Destruction Lab, senior editor at Marketing Science, and a research associate at the National Bureau of Economic Research. Avi’s research focuses on the opportunities and challenges of the digital economy, with funding from Google, Industry Canada, Bell Canada, AIMIA, SSHRC, the National Science Foundation, the Sloan Foundation, and others.

pages: 392 words: 114,189

The Ransomware Hunting Team: A Band of Misfits' Improbable Crusade to Save the World From Cybercrime
by Renee Dudley and Daniel Golden
Published 24 Oct 2022

Concerned about whether the HTCU could provide extra attention that people with autism might need, even broad-minded Marijn had his doubts about Peter’s pitch. But an HTCU program coordinator named Yvonne Horst was a believer. At her urging, the HTCU agreed to give Mark a six-month trial internship. Mark started as a junior data science intern at the HTCU at the same time as three professional data scientists hired by the unit. His new colleagues had university degrees and career experience at major companies such as accounting giant KPMG. Mark’s inexperience starkly contrasted with their advanced skills. “My six-month introductory course at ITvitae is not comparable to four years of university studies,” he conceded.

However, the Dutch National Police also sought candidates to work in its ten regional cyber squads, which handled more routine cases. Over time, ITvitae sent about two dozen additional students to work on those regional squads. Some handled tedious yet essential tasks such as reviewing crime-scene footage from security cameras, and were able to home in on small details that others might have missed. Others worked as data scientists or digital investigators. “If Mark did not advance the cold case, then Tom wouldn’t have fulfilled his dream to work at the police, and twenty-five others probably wouldn’t have had the opportunity,” Peter said, his voice quaking with emotion. “They are misfits. But they are very, very important to have at your organization

After crippling the DCH Regional Medical Center in Tuscaloosa, Alabama, and other hospitals in 2019, it doubled down on healthcare attacks in October 2020, sowing anxiety and confusion among patients and providers across the United States. The timing suggests that Ryuk was avenging one of the biggest and most damaging actions taken against ransomware. Since 2018, Microsoft’s Digital Crimes Unit—consisting of more than forty full-time investigators, analysts, data scientists, engineers, and attorneys—had been investigating TrickBot, the Russian malware that delivered Ryuk into victims’ computers. Concerns that the Putin regime might use TrickBot to disrupt the 2020 U.S. presidential election added urgency to the task, though the fears would prove unfounded. Microsoft investigators analyzed 61,000 samples of TrickBot malware, as well as the infrastructure underpinning the network of infected computers.

pages: 370 words: 112,809

The Equality Machine: Harnessing Digital Technology for a Brighter, More Inclusive Future
by Orly Lobel
Published 17 Oct 2022

For example, the Bread for the World Institute recently issued a report showing that 92 percent of gender-specific economic data is missing from Africa. We are not measuring the struggles of those who are most in need. Millions remain in the shadows. But we can tackle the problem today thanks to technology, and the institute is working with volunteer coders, data scientists, statisticians, and graphic designers to begin to systematically collect what matters—to materialize the missing data and bring problems like these to light. Supreme Court Justice Louis Brandeis famously said that sunlight is the best of disinfectants, and electric light the most efficient policeman.

An equality machine mindset actively charts the course of the future, anticipating the many ways in which that future is unknown. This means designing governance systems and infrastructure that will continue to channel technological advancement down a progressive path. Inside Out, Outside In In late 2021, Frances Haugen, a thirty-seven-year-old data scientist, became one of the most famous whistleblowers in recent times. Testifying before both American and European legislatures, Haugen revealed that Facebook, her former employer, was time and again choosing profit over the safety and well-being of its users. Haugen asserted that Facebook consistently valued profit over safety by allowing algorithms to favor hateful content in order to bring users back to the social media platform for more traffic.

As we move beyond traditional litigation frameworks, government agencies also become research and development arms that incentivize, test, approve, and monitor proactive prevention programs. The immense challenge of harnessing technology for equality is one that must involve people from all disciplines and sectors. Social scientists, for example, must work with data scientists to provide context and ask the critical questions about definitions, the sources of data, and the interpretation of patterns. There are accelerated, heated debates and numerous legislative proposals to tighten the regulation of digital technology, including to amend Section 230 of the U.S. Communications Decency Act of 1996 in ways that would limit digital platform immunity and require platforms to moderate illegal content.

pages: 571 words: 105,054

Advances in Financial Machine Learning
by Marcos Lopez de Prado
Published 2 Feb 2018

These features were discovered by different analysts studying a wide range of instruments and asset classes. The goal of the strategist is to make sense of all these observations and to formulate a general theory that explains them. Therefore, the strategy is merely the experiment designed to test the validity of this theory. Team members are data scientists with a deep knowledge of financial markets and the economy. Remember, the theory needs to explain a large collection of important features. In particular, a theory must identify the economic mechanism that causes an agent to lose money to us. Is it a behavioral bias? Asymmetric information? Regulatory constraints?

One of the scenarios of interest is how the strategy would perform if history repeated itself. However, the historical path is merely one of the possible outcomes of a stochastic process, and not necessarily the most likely going forward. Alternative scenarios must be evaluated, consistent with the knowledge of the weaknesses and strengths of a proposed strategy. Team members are data scientists with a deep understanding of empirical and experimental techniques. A good backtester incorporates in his analysis meta-information regarding how the strategy came about. In particular, his analysis must evaluate the probability of backtest overfitting by taking into account the number of trials it took to distill the strategy.
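
A rough illustration of why the number of trials matters (a toy simulation in Python, not the evaluation procedure the book itself develops): among many strategy variants with no real edge, the best in-sample backtest still looks impressive purely by chance, so an honest assessment has to account for how many variants were tried.

# Toy illustration of backtest overfitting: the trial count and return parameters
# below are arbitrary assumptions for demonstration only.
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_days = 100, 252                  # 100 strategy variants, one year of daily returns
returns = rng.normal(0.0, 0.01, size=(n_trials, n_days))   # zero-mean: no skill at all
sharpe = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)

print(f"best in-sample Sharpe across {n_trials} skill-free trials: {sharpe.max():.2f}")
# The more variants are tried, the better the best one looks by luck alone,
# which is why the number of trials must enter the backtester's evaluation.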

I have listed a few of them in the references section. The core audience of this book is investment professionals with a strong ML background. My goals are that you monetize what you learn in this book, help us modernize finance, and deliver actual value for investors. This book also targets data scientists who have successfully implemented ML algorithms in a variety of fields outside finance. If you have worked at Google and have applied deep neural networks to face recognition, but things do not seem to work so well when you run your algorithms on financial data, this book will help you. Sometimes you may not understand the financial rationale behind some structures (e.g., meta-labeling, the triple-barrier method, fracdiff), but bear with me: Once you have managed an investment portfolio long enough, the rules of the game will become clearer to you, along with the meaning of these chapters. 1.5 Requisites Investment management is one of the most multi-disciplinary areas of research, and this book reflects that fact.

pages: 602 words: 177,874

Thank You for Being Late: An Optimist's Guide to Thriving in the Age of Accelerations
by Thomas L. Friedman
Published 22 Nov 2016

This approach has driven a lot of innovation in education, most notably the partnership between Udacity, AT&T, and Georgia Tech to create an online master’s degree in computer science for $6,600 for the entire course—as compared with the $45,000 it would cost for two years on campus at Georgia Tech. Coursera has partnered with Johns Hopkins and Rice to create a similar certificate in data science. This is driving down the cost of education for everyone. The education “pie just got bigger,” said Blase. “We can now assist you to get the job of your dreams.” That’s intelligent assistance. “We do $250 million of training a year,” said Blase. A lot is teaching people to climb poles, install services, and run retail stores, but now a lot more is in data science, software-defined networks, Web development, introduction to programming, machine learning, and the Internet of Things.

It can’t just be the advocacy of abstract principles. When you put your value set together with your analysis of how the Machine works and your understanding of how it is affecting people and culture in different contexts, you have a worldview that you can then apply to all kinds of situations to produce your opinions. Just as a data scientist needs an algorithm to cut through all the unstructured data and all the noise to see the relevant patterns, an opinion writer needs a worldview to create heat and light. But to keep that worldview fresh and relevant, I suggested to Bojia, you have to be constantly reporting and learning—more so today than ever.

“The idea,” he explained, “was that if you know exactly how the gas turbine and combustion engine work, you can use the laws of physics and say: ‘This is how it is going to work and when it is going to break.’ There was not a belief in the traditional engineering community that the data had much to offer. They used the data to verify their physics models and then act upon them. The new breed of data scientists here say: ‘You don’t need to understand the physics to look for and find the patterns.’ There are patterns that a human mind could not find, because the signals are so weak early on that you won’t see them. But now that we have all this processing power, those weak signals just pop out at you.

pages: 170 words: 49,193

The People vs Tech: How the Internet Is Killing Democracy (And How We Save It)
by Jamie Bartlett
Published 4 Apr 2018

As I walked into the ordinary-looking office in central London – all offices are normal-looking, except those of tech firms – I spotted a framed poster with a picture of Trump and a quote from famed US pollster Frank Luntz: ‘There are no longer any experts except Cambridge Analytica. They were Trump’s digital team who figured out how to win.’ Rows of employees were sitting staring at screens: project managers, IT specialists and data scientists.29 On a shelf in Nix’s glass office were copies of The Bad Boys of Brexit, the book written by UKIP donor Arron Banks, and Stealing Elections by John Fund. He seemed perfectly happy with these techniques, and said that micro-targeting was just getting started and represented the future of campaigning.

Although he rejects any single ‘why’, it’s clear that he thinks data was instrumental: One of our central ideas was that the campaign had to do things in the field of data that have never been done before. This included a) integrating data from social media, online advertising, websites, apps, canvassing, direct mail, polls, online fundraising, activist feedback and some new things we tried such as a new way to do polling . . . and b) having experts in physics and machine learning do proper data science in the way only they can – i.e. far beyond the normal skills applied in political campaigns. We were the first campaign in the UK to put almost all our money into digital communication then have it partly controlled by people whose normal work was subjects like quantum information . . . If you want to make big improvements in communication, my advice is – hire physicists, not communications people from normal companies

This did not seem to bother investors, since this 11-person business, with ambitions to revolutionise the entire trucking industry by building a self-driving fleet, managed to raise millions of dollars of funding from venture capitalists. ‘Everyone thought I was mad,’ 27-year-old Stefan told me when I visited Starsky’s Florida headquarters, a large rented property in a gated community, a few months ago. These days however, like many other industries, trucking is being disrupted by data science, artificial intelligence and venture capital. Stefan had agreed to let me drive in Starsky’s newest and shiniest truck with their resident driver Tony Hughes, a diminutive and friendly man with 20 years’ experience, who is perhaps better described as part-driver, part-machine supervisor. Tony is in his fifties, with a high school diploma in general studies from Shawnee Mission Northwest (Kansas) and a ‘solid track record of achieving efficient, cost-effective transportation operations’, but now finds himself training the machines that might eventually put him out of a job.

pages: 241 words: 43,252

Modern Vim: Craft Your Development Environment With Vim 8 and Neovim
by Drew Neil
Published 2 May 2018

Venkat Subramaniam (280 pages) ISBN: 9781934356760 $35 Data Science Essentials in Python Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyze data using statistical and network-theoretical methods; and see actual examples of data analysis at work. This one-stop solution covers the essential data science you need in Python. Dmitry Zinoviev (224 pages) ISBN: 9781680501841 $29 Practical Programming, Third Edition Classroom-tested by tens of thousands of students, this new edition of the best-selling intro to programming book is for anyone who wants to understand computer science.

pages: 458 words: 116,832

The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism
by Nick Couldry and Ulises A. Mejias
Published 19 Aug 2019

Take any aspect of the social world, even ones for which we currently lack a causal model, and simply generate a proxy for it; there is no limit to what might work as such a proxy, and indeed the vagueness as to what is a “proxy variable” is a problem in legal proceedings that increasingly rely on them.64 So data scientists may ask: Could visual cues in Google Street View scenes be proxies of the likelihood of nearby crime? Could patterns in the distribution of more-expensive car models in Google Earth pictures be proxy demographic variables (income levels, relative poverty/wealth)? The temptation to pursue such proxy hunts is considerable, especially when public census data is costly and only intermittently collected.65 The scope for social experimentation that data relations provide to parts of the social quantification sector is huge but has sometimes proved controversial.66 Privacy concerns may act as a constraint, but, if so, China’s assumed lower sensitivity to privacy concerns works as a market advantage for its AI industry.67 Collect Everything If Big Data reasoning relies on the predictive power that comes from repetitive processing of unstructured data, this data can be generated, directly or indirectly, from whole populations.

This claim might shock those who see in “datafication . . . an essential enrichment in human comprehension” or “a great infrastructural project that rivals” the Enlightenment’s Encyclopédie.134 But let’s get beyond the hype and look at what is actually going on in the social sciences today. A good example is the work of celebrated data scientist Alex Pentland at the MIT Media Lab. Pentland contributed to the World Economic Forum’s Global Information Technology Reports in 2008, 2009, and 2014;135 his research team won the Defense Advanced Research Projects Agency’s (DARPA) prize to commemorate the internet’s fortieth anniversary.136 In his book Social Physics, Pentland reaches back to the origins of sociology.

There is, however, one bright note: the rise of critical information science that seeks to systematically establish the distortions woven into social caching’s presentations of the world. There is now an emerging movement for algorithmic justice, which has raised awareness of many specific issues. The critical movement within data science concerned with “Fairness, Accountability and Transparency” has regular conferences, and a number of universities have focused programs for investigating how algorithms cover the social domain.160 More generally, an important intersection between critical information science, legal theory, and social theory is opening up the question of how the social quantification sector presents the social world for action by powerful institutions.161 US civil society has generated some effective campaigning.

pages: 1,172 words: 114,305

New Laws of Robotics: Defending Human Expertise in the Age of AI
by Frank Pasquale
Published 14 May 2020

At its beginning, credit scoring was confined to one area of life (determining eligibility for loans) and based on a limited set of data (repayment history). Over decades, credit scores and similar measures have come to inform other determinations, including insurance rates and employment opportunities. More recently, data scientists have proposed more data sources for credit scoring, ranging from the way people type, to their political affiliation, to the types of websites they visit online. The Chinese government has also expanded the stakes of surveillance, proposing that “social credit scores” play a role in determining what trains or planes a citizen can board, what hotels a person can stay in, and what schools a family’s children can attend.

Malpractice law is designed to give patients reassurance that if their physician falls below a standard of care, a penalty will be imposed and some portion of it dedicated to their recovery.22 If providers fail to use sufficiently representative datasets to develop their medical AI, lawsuits should help hold them accountable, to ensure that everyone benefits from AI in medicine (not just those lucky enough to belong to the most studied groups). Data scientists sometimes joke that AI is simply a better-marketed form of statistics. Certainly narrow AI, designed to make specific predictions, is based on quantifying probability.23 It is but one of many steps taken over the past two decades to modernize medicine with a more extensive evidence base.24 Medical researchers have seized on predictive analytics, big data, artificial intelligence, machine learning, and deep learning as master metaphors for optimizing system performance.

Once a single perfect way of doing some task was found, there was little rationale for a human to continue doing it. Taylorism dovetailed with the psychological school of behaviorism, which sought to develop for human beings a blend of punishments and reinforcements reminiscent of animal training. The rise of data-driven predictive analytics has given behaviorism new purchase. The chief data scientist of a Silicon Valley e-learning firm once stated, “The goal of everything we do is to change people’s actual behavior at scale. When people use our app, we can capture their behaviors, identify good and bad behaviors, and develop ways to reward the good and punish the bad. We can test how actionable our cues are for them and how profitable for us.”72 A crowded field of edtech innovators promises to drastically reduce the cost of primary, secondary, and tertiary education with roughly similar methods: broadcast courses, intricate labyrinths of computerized assessment tools, and 360-degree surveillance tools to guarantee that students are not cheating.

pages: 848 words: 227,015

On the Edge: The Art of Risking Everything
by Nate Silver
Published 12 Aug 2024

Having once attracted eccentrics like Howard Hughes and Kirk Kerkorian, the industry is now highly corporatized—and it’s grown more profitable in large part because it’s become more data driven in figuring out how to track its customers and get them to gamble and spend more. The casino business is not alone in this regard—the Algorithmization of Everything is contributing to record corporate profits as data scientists also figure out how to get you to spend more on, say, a fast food delivery order. But “gaming”—the euphemism for gambling that the industry prefers for itself—offers a particularly clear case study of modern American algorithmic capitalism. Plus, since much of this book takes place inside casinos, I want to give them due consideration and not just treat them as scenery—partly because they’re fascinating places, and partly because it’s easy for someone like me to strut past the roulette tables and the rows of slot machines[*5] en route to the poker room and take for granted the experience that other patrons are having.

In a time of intense partisanship in the United States, this tends to make the Village a risk-averse place, especially when it comes to saying things that might offend others on your “team.” “If you’ve ever spent any time in D.C., it’s like a city of rule followers,” said David Shor, a data scientist and political consultant who works for Democratic campaigns. Shor has a theory for why this is the case. Election campaigns are like the minor league systems for Washington’s social hierarchy: bright kids in their twenties aspire to move up the ranks and get a White House job in their thirties before cashing out with a cushy life in lobbying, consulting, or media.

But the stock’s value declined, part of a so-called tech wreck in the market in summer 2017. He made other losing bets too—and in fact, he has pretty much been chasing that initial high ever since. Li is obviously a smart guy—he got a master’s degree in economics at the University of Toronto and has been employed as a data scientist at Meta and other companies. But he’s lost around $1 million in options trading, he told me. (Li cold-contacted me by email when he heard about my book; he wanted me to share his story. “I think if I can help at least one other person avoid the pitfalls that I face, that would be a win for me.

The Data Journalism Handbook
by Jonathan Gray , Lucy Chambers and Liliana Bounegru
Published 9 May 2012

Less guessing, less looking for quotes; instead, a journalist can build a strong position supported by data, and this can affect the role of journalism greatly. Additionally, getting into data journalism offers a future perspective. Today, when newsrooms downsize, most journalists hope to switch to public relations. Data journalists or data scientists, though, are already a sought-after group of employees, not only in the media. Companies and institutions around the world are looking for “sensemakers” and professionals who know how to dig through data and transform it into something tangible. There is a promise in data, and this is what excites newsrooms, making them look for a new type of reporter.

Far from it. In the information age, journalists are needed more than ever to curate, verify, analyze, and synthesize the wash of data. In that context, data journalism has profound importance for society. Today, making sense of big data, particularly unstructured data, will be a central goal for data scientists around the world, whether they work in newsrooms, Wall Street, or Silicon Valley. Notably, that goal will be substantially enabled by a growing set of common tools, whether they’re employed by government technologists opening Chicago, healthcare technologists, or newsroom developers. — Alex Howard, O’Reilly Media Our Lives Are Data Good data journalism is hard, because good journalism is hard.

INSEAD Working Paper, 2010 Business Models for Data Journalism Amidst all the interest and hope regarding data-driven journalism, there is one question that newsrooms are always curious about: what are the business models? While we must be careful about making predictions, a look at the recent history and current state of the media industry can give us some insight. Today there are many news organizations who have gained by adopting new approaches. Terms like “data journalism” and the newest buzzword, “data science,” may sound like they describe something new, but this is not strictly true. Instead these new labels are just ways of characterizing a shift that has been gaining strength over decades. Many journalists seem to be unaware of the size of the revenue that is already generated through data collection, data analytics, and visualization.

Work in the Future: The Automation Revolution
by Robert Skidelsky and Nan Craig
Published 15 Mar 2020

She then switched over to the private sector, working as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis, and then for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. She left finance in 2011 and started working as a data scientist in the New York start-up scene, building models that predicted people’s purchases and clicks. She wrote Doing Data Science in 2013 and launched the Lede Program in Data Journalism at Columbia in 2014. She is a regular contributor to Bloomberg View and wrote the book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. She recently founded ORCAA, an algorithmic auditing company.

and next you scan your memories, as well as your closet, for outfits that can optimize to that definition of success. In the case of a formal algorithm, the definition of success does not waver; it’s codified in computer code, and that precise concept of “success,” as well as the associated concept of the cost of failure, are embedded in a mathematical object called the objective function. Once the data scientist decides on the objective function, and the historical training data, the ensuing algorithm is largely determined. Sounds simple, and it sometimes is. But when the output of the algorithm (the prediction itself ) is used in a powerful way, a feedback loop is created: the algorithm doesn’t just predict the future, it causes the future.
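
A minimal sketch of that determinism in code, using scikit-learn and made-up data (my choice of tooling and names, not the author's): once the definition of success, the cost of failure (here, log-loss), and the historical training data are fixed, the fitted algorithm largely follows.

# Sketch: objective function + historical training data -> a largely determined model.
# Feature names, data, and the choice of logistic regression are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_history = rng.normal(size=(200, 3))           # e.g. formality, colour, comfort scores
y_history = (X_history[:, 0] > 0).astype(int)   # the codified definition of "success"

model = LogisticRegression()                    # objective: minimise log-loss on the history
model.fit(X_history, y_history)

new_case = rng.normal(size=(1, 3))
print(model.predict_proba(new_case))            # the prediction that then acts on the world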

pages: 234 words: 68,798

The Science of Storytelling: Why Stories Make Us Human, and How to Tell Them Better
by Will Storr
Published 3 Apr 2019

Jockers (Allen Lane, 2016) p. 163. 4.1 Researchers downloaded 1,327: ‘The emotional arcs of stories are dominated by six basic shapes’, Andrew J. Reagan, Lewis Mitchell, Dilan Kiley, Christopher M. Danforth, Peter Sheridan Dodds, EPJ Data Science, 5:31, 4 November 2016. 4.2 For the neuroscientist Professor Beau Lotto: Deviate, Beau Lotto (W&N, 2017) Kindle location 685. When the data scientist David Robinson: Examining the arc of 100,000 stories: a tidy analysis by David Robinson, http://varianceexplained.org/r/tidytext-plots, 26 April 2017. The psychologist and story analyst Professor Jordan Peterson: Maps of Meaning video lectures.

It’s only by being active, and having the courage to take on the external world with all its challenges and provocations, that these core mechanisms can ever be broken down and rebuilt. For the neuroscientist Professor Beau Lotto it’s ‘not just important to be active, it is neurologically necessary’. It’s the only way we grow. When the data scientist David Robinson analysed an enormous tranche of 112,000 plots including books, movies, television episodes and video games, his algorithm found one common story shape. Robinson described this as, ‘Things get worse and worse until, at the last minute, they get better.’ The pattern he detected reveals that many stories have a point, just prior to their resolution, in which the hero endures some deeply significant test.

pages: 326 words: 103,170

The Seventh Sense: Power, Fortune, and Survival in the Age of Networks
by Joshua Cooper Ramo
Published 16 May 2016

These systems run faster and better and more profitably because they are a shared system. They are gated by technology standards and by common connection. When we say that networks crave gates, this is the sort of gate we mean. If you had to look for your friends one by one on Facebook, Friendster, MySpace, and Google Plus, you’d exhaust yourself. So one winner emerges. Data scientists attribute the success of these winning nodes to preferential attachment—the idea that if Brian Arthur is using Microsoft Word, and I’m using it, you are likely to do so too. But there’s another secret: More widespread adoption makes the whole system faster. Think of five mechanics trying to fix a broken engine.
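
The preferential-attachment idea can be shown with a short simulation (a sketch under simple assumptions, not a model taken from the book): each newcomer links to an existing node with probability proportional to how many links that node already has, so early winners keep pulling ahead.

# Toy preferential-attachment simulation; all parameters are illustrative assumptions.
import random
from collections import Counter

random.seed(42)
degrees = [1, 1]                 # start with two connected nodes
edges = [(0, 1)]

for new_node in range(2, 10_000):
    # attach to an existing node chosen with probability proportional to its degree
    target = random.choices(range(len(degrees)), weights=degrees, k=1)[0]
    edges.append((new_node, target))
    degrees.append(1)
    degrees[target] += 1

# A handful of early nodes end up holding a large share of all connections.
print(Counter(dict(enumerate(degrees))).most_common(5))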

“seven friends in ten days”: Chamath Palihapitiya, “How We Put Facebook on the Path to 1 Billion Users” (lecture for the Udemy course “Growth Hacking: An Introduction,” published January 9, 2013, and available at https://www.youtube.com/watch?v=raIUQP71SBU). Pretty soon: Eman Yasser Daraghmi and Shyan-Ming Yuan, “We Are So Close, Less Than 4 Degrees Separating You and Me!,” Computers in Human Behavior 30 (January 2014), 273–85. Data scientists: Laurent Hébert-Dufresne et al. “Complex Networks as an Emerging Property of Hierarchical Preferential Attachment,” Physical Review E 92, 062809 (2015). But there’s another secret: Albert-László Barabási, “Network Science,” Philosophical Transactions of the Royal Society A: Mathematical, Physical, and Engineering Sciences 371, no. 1987 (March 2013).

Their appeal was both the potential of the new and the chance to get away from the rotting smell of old politics. This is one reason it is wrong to look at the world and consider it filled merely with random events, with so-called black swans. In fact, patterns appear everywhere. They can be searched and mapped and studied with the tools of data science, but, of course, they can also be felt. They may surprise you if you don’t know how to look for them. But they are there. There is more to human history than earthquakes alone. Even if it can’t be predicted, complexity in any system, whether it is an Indonesian coral reef or a Russian computer network, can at least be measured.

pages: 337 words: 103,522

The Creativity Code: How AI Is Learning to Write, Paint and Think
by Marcus Du Sautoy
Published 7 Mar 2019

Was Rembrandt’s considerable output sufficient for an algorithm to be able to learn how to create a new portrait that would be recognisably his? The internet contains millions of images of cats, but Shakespeare wrote thirty-seven plays and Beethoven nine symphonies. Will creative genius be protected from machine learning by a lack of data? Data scientists at Microsoft and Delft University of Technology were of the view that there was enough data for an algorithm to learn how to paint like Rembrandt. Ron Augustus from Microsoft, who worked on the project, believed the old master himself would approve of their project: ‘We are using technology and data like Rembrandt uses his paints and brushes to create something new.’

By layering the negatives on top of each other and exposing the resulting image Galton was rather shocked to see the array of distorted and ugly faces he had used transform into a handsome composite. It seems that when you smooth out the asymmetries, you end up with something quite attractive. The data scientists would have to devise a more clever plan if they were going to produce a painting that might be taken for a Rembrandt. Their algorithm would have to create new eyes, a new nose and a new mouth, as if it could see the world through Rembrandt’s eyes. Having created these features, they then investigated the proportions Rembrandt used to place these features on the faces he painted.

The current drive by humans to create algorithmic creativity is in the most part not one fuelled by the desire for extending artistic creation but rather enlarging a company bank balance. There is a huge amount of hype about AI. There are too many initiatives that are branded as AI but which are little more than statistics or data science. Just as any company wishing to make it at the turn of the millennium would put .com on the end of its name, today it is the addition of the tag AI or Deep which is what companies are using to jump on the bandwagon. Companies would love to be able to convince an audience that this AI is so great that it can write articles on its own, that it can compose music, paint Rembrandts.

pages: 420 words: 100,811

We Are Data: Algorithms and the Making of Our Digital Selves
by John Cheney-Lippold
Published 1 May 2017

Big data needs to be analyzed at a distance, because as Franco Moretti claims, “distance, let me repeat it, is a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the [singular] text.”66 Big-data practitioners like Viktor Mayer-Schönberger and Kenneth Cukier aren’t really concerned with analytical rigor: “moving to a large scale changes not only the expectations of precision but the practical ability to achieve exactitude.”67 Big data is instead about understanding a new version of the world, one previously unavailable to us because in an antiquated era before digital computers and ubiquitous surveillance, we didn’t have the breadth and memory to store it all. Moretti, though, is no data scientist. He’s a humanist, best known for coining the concept of “distant reading.” Unlike the disciplinary English practice of “close reading,” in which a section of a certain text is painstakingly parsed for its unique semantic and grammatological usage, distant reading finds utility in the inability to do a close reading of large quantities of texts.68 One human cannot read and remember every last word of Hemingway, and she would be far less able to do the same for other U.S.

Nidhi Makhija-Chimnani, “People’s Insights Volume 1, Issue 52: Vicks Mobile Ad Campaign,” MSLGroup, 2013, asia.mslgroup.com. 59. David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani, “The Parable of Google Flu: Traps in Big Data Analysis,” Science 343 (March 14, 2014): 1203–1205. 60. Steve Lohr, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” New York Times, August 17, 2014. 61. Lazer, Kennedy, and Vespignani, “Parable of Google Flu.” 62. Foucault, “Society Must Be Defended,” 249. 63. Mauricio Santillana, D. Wendong Zhang, Benjamin Althouse, and John Ayers, “What Can Digital Disease Detection Learn from (an External Revision to) Google Flu Trends?

It suggests, “We can stop looking for models. We can analyze the data without hypotheses about what it might show.”119 Of course, as discussed in the preceding pages, data is never “just” data. But this antihypothetical interpretation, with all its methodological baggage, has set the trajectory for big-data science itself. Data-mining and machine-learning research is at a fever pitch. The data culled from the surveillant assemblages of our networked society has been dubbed “oil of the 21st century,” while algorithmic analytics are “the combustion engine.”120 Even with powerful critiques of this hyperpositivism coming from scholars like David Ribes, Steven J.

pages: 1,082 words: 87,792

Python for Algorithmic Trading: From Idea to Cloud Deployment
by Yves Hilpisch
Published 8 Dec 2020

The book by Hilpisch (2020) focuses exclusively on the application of algorithms for machine and deep learning to the problem of identifying statistical inefficiencies and exploiting economic inefficiencies through algorithmic trading: Guido, Sarah, and Andreas Müller. 2016. Introduction to Machine Learning with Python: A Guide for Data Scientists. Sebastopol: O’Reilly. Hilpisch, Yves. 2020. Artificial Intelligence in Finance: A Python-Based Guide. Sebastopol: O’Reilly. VanderPlas, Jake. 2016. Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol: O’Reilly. The books by Hastie et al. (2008) and James et al. (2013) provide a thorough, mathematical overview of popular machine learning techniques and algorithms: Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2008.

Background information about Python as applied to finance, financial data science, and artificial intelligence can be found in the following books: Hilpisch, Yves. 2018. Python for Finance: Mastering Data-Driven Finance. 2nd ed. Sebastopol: O’Reilly. ⸻. 2020. Artificial Intelligence in Finance: A Python-Based Guide. Sebastopol: O’Reilly. McKinney, Wes. 2017. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. 2nd ed. Sebastopol: O’Reilly. Ramalho, Luciano. 2021. Fluent Python: Clear, Concise, and Effective Programming. 2nd ed. Sebastopol: O’Reilly. VanderPlas, Jake. 2016. Python Data Science Handbook: Essential Tools for Working with Data.

The reader can post questions and comments in the user forum on the Quant Platform at any time (accounts are free). Online/video training (paid subscription) The Python Quants offer comprehensive online training programs that make use of the contents presented in the book and that add additional content, covering important topics such as financial data science, artificial intelligence in finance, Python for Excel and databases, and additional Python tools and skills. Contents and Structure Here’s a quick overview of the topics and contents presented in each chapter. Chapter 1, Python and Algorithmic Trading The first chapter is an introduction to the topic of algorithmic trading—that is, the automated trading of financial instruments based on computer algorithms.

pages: 340 words: 97,723

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity
by Amy Webb
Published 5 Mar 2019

There is another reason we should be concerned about China’s plans, and that brings us back to that place where AI’s tribes form: education. China is actively draining professors and researchers away from AI’s hubs in Canada and the United States, offering them attractive repatriation packages. There’s already a shortage of trained data scientists and machine-learning specialists. Siphoning off people will soon create a talent vacuum in the West. By far, this is China’s smartest long-term play—because it deprives the West of its ability to compete in the future. China’s talent pipeline is draining researchers back into the mainland as part of its Thousand Talents Plan.

My second-grader will be 30, and by then she may be reading a New York Times bestseller written entirely by a machine. My dad will be in his late 90s, and all of his medical specialists (cardiologists, nephrologists, radiologists) will be AGIs, directed and managed by a highly trained general practitioner, who is both an MD and a data scientist. The advent of ASI could follow soon or much longer after, between the 2040s and 2060s. It doesn’t mean that by 2070 superintelligent AIs will have crushed all life on Earth under the weight of quintillions of paperclips. But it doesn’t mean they won’t have either. The Stories We Must Tell Ourselves Planning for the futures of AI requires us to build new narratives using data from the real world.

In the United States, the G-MAFIA can commit to recalibrating its own hiring processes, which at present prioritize a prospective hire’s skills and whether they will fit into company culture. What this process unintentionally overlooks is someone’s personal understanding of ethics. Hilary Mason, a highly respected data scientist and the founder of Fast Forward Labs, explained a simple process for ethics screening during interviews. She recommends asking pointed questions and listening intently to a candidate’s answers. Questions like: “You’re working on a model for consumer access to a financial service. Race is a significant feature in your model, but you can’t use race.

pages: 562 words: 153,825

Dark Mirror: Edward Snowden and the Surveillance State
by Barton Gellman
Published 20 May 2020

Whatever that number, dozens or hundreds, you multiply it by itself to measure the growth at each hop. The NSA’s deputy director, John C. Inglis, had testified in Congress just the day before Negroponte and Blair joined me onstage. Inglis said NSA analysts typically “go out two or three hops” when they chain through the call database. For context, data scientists estimated decades ago that it would take no more than six hops to trace a path between any two people on Earth. Their finding made its way into popular culture in Six Degrees of Separation, the play and subsequent film by John Guare. Three students at Albright College refashioned the film as a parlor game, “Six Degrees of Kevin Bacon.”
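
To make the multiplication concrete (the contacts-per-number figure below is an assumption chosen from the "dozens or hundreds" range in the text, not a figure from the NSA material): the candidate pool grows by roughly that factor at every hop.

# Rough illustration of how a contact chain grows per hop.
contacts_per_number = 100          # assumed; "dozens or hundreds" in practice

for hops in range(1, 4):
    reach = contacts_per_number ** hops
    print(f"{hops} hop(s): up to ~{reach:,} numbers in the chain")

# 1 hop(s): up to ~100 numbers in the chain
# 2 hop(s): up to ~10,000 numbers in the chain
# 3 hop(s): up to ~1,000,000 numbers in the chain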

MAINWAY’s analytic engine traced hidden paths across the map, looking for relationships that human analysts could not detect. MAINWAY had to produce that map on demand, under pressure of time, whenever its operators asked for a new contact chain. No one could predict the name or telephone number of the next Tsarnaev. From a data scientist’s point of view, the logical remedy was clear. If anyone could become an intelligence target, MAINWAY should try to get a head start on everyone. “You have to establish all those relationships, tag them, so that when you do launch the query you can quickly get them,” Rick Ledgett, the former NSA deputy director, told me years later.

He gave a valedictory interview to National Public Radio on January 10, 2014, archived at https://archive.is/5j5Yg. “go out two or three hops”: Testimony of John C. Inglis, “The Administration’s Use of FISA Authorities,” House Committee on the Judiciary, July 17, 2013, https://fas.org/irp/congress/2013_hr/fisa.pdf. data scientists estimated: Among the early works in this field was Michael Gurevitch, whose 1991 doctoral dissertation, “The Social Structure of Acquaintanceship Networks,” may be found at https://dspace.mit.edu/handle/1721.1/11312. Six Degrees of Separation: The play, which opened in previews on October 30, 1990, won the New York Drama Critics’ Circle Best Play of 1990.

Traffic: Genius, Rivalry, and Delusion in the Billion-Dollar Race to Go Viral
by Ben Smith
Published 2 May 2023

Jonah proposed moving to Palo Alto to oversee Facebook’s News Feed even as he continued to try his ideas out on BuzzFeed. He’d bring in Duncan Watts, the six degrees of separation expert, as Facebook’s “chief sociologist.” They’d work together with Jonah’s old MIT friend Cameron Marlow, who was already in the process of creating Facebook’s data science team, and “take the best from data science and machine learning, sociology and behavioral economics, and build enhanced publisher relationships” to make News Feed—the ever-changing column of content that still rules most people’s experience of Facebook—“live up to its name.” Jonah had other ideas too. He thought Facebook could track incoming and outgoing traffic to the rest of the web more deeply.

It turned out that Mark Wilkie, the developer, had created a domain called BuzzFed, with a single e, and was using it to host embedded content. The tricky-looking similar domain triggered Google’s new screens for malware and knocked the site out of the search engine. Five days later, all the Google traffic came roaring back. September 14 was “the biggest traffic day ever, by all metrics!” one of Jonah’s new hires, a data scientist, wrote him triumphantly. Suddenly the site was setting new records every few weeks. BuzzFeed’s best traffic day yet came on December 5, 2011. The post, one of Stopera’s, was a simple list titled “The 45 Most Powerful Images of 2011,” bringing readers back through a series of emotional public moments: Riots in London.

Elsewhere on the internet, the Los Angeles BuzzFeed employee Baked Alaska had revived the style of dumb stunt that had made him a star on Vine, and by 2018 was roving the streets of Los Angeles, recording a video livestream while speakers broadcast comments from viewers. Within minutes, the speakers were blasting out the N-word. Facebook employees knew what they’d done. “Our approach has had unhealthy side effects on important slices of public content, such as politics and news,” a team of data scientists wrote in a memo that a whistleblower, Frances Haugen, provided to The Wall Street Journal. But top executives couldn’t countenance the loss of engagement that might come with actually tamping down on the divisive speech that seemed to attract, in the late 2010s, most of Americans’ attention. The company did figure out how to dial back the toxicity of its platform, but it would reserve that for extreme situations—the run-up to elections around the world, for instance—and usually left the tap of anger open, keeping its users glued to the site.

Data Wrangling With Python: Tips and Tools to Make Your Life Easier
by Jacqueline Kazil
Published 4 Feb 2016

In her career, she has worked in technology focusing on finance, government, and journalism. Most notably, she is a former Presidential Innovation Fellow and cofounded a technology organization in government called 18F. Her career has consisted of many data science and wrangling projects including Geoq, an open source mapping workflow tool; a Congress.gov remake; and Top Secret America. She is active in Python and data communities—Python Software Foundation, PyLadies, Women Data Science DC, and more. She teaches Python in Washington, D.C. at meetups, conferences, and mini bootcamps. She often pairs programs with her sidekick, Ellie (@ellie_the_brave). You can find her on Twitter @jackiekazil or follow her blog, The coderSnorts.

Praise for Data Wrangling with Python: “This should be required reading for any new data scientist, data engineer or other technical data professional. This hands-on, step-by-step guide is exactly what the field needs and what I wish I had when I first started manipulating data in Python. If you are a data geek that likes to get their hands dirty and that needs a good definitive source, this is your book.” —Dr. Tyrone Grandison, CEO, Proficiency Labs Intl. “There’s a lot more to data wrangling than just writing code, and this well-written book tells you everything you need to know.

She would like to thank all four of her parents for their patience with endless book updates and dong bells. She would also like to thank Frau Hoffmann for her endless patience during countless conversations in German about this book. CHAPTER 1: Introduction to Python. Whether you are a journalist, an analyst, or a budding data scientist, you likely picked up this book because you want to learn how to analyze data programmatically, summarize your findings, and clearly communicate those findings to others. You might show your findings in a report, a graphic, or summarized statistics. Essentially, you are trying to tell a story.

Digital Transformation at Scale: Why the Strategy Is Delivery
by Andrew Greenway,Ben Terrett,Mike Bracken,Tom Loosemore
Published 18 Jun 2018

Open-minded, multidisciplinary teams can deliver a lot more than just elegant websites. The internet era has arguably created some genuinely new roles, or at least redefined existing roles to the extent that they will be taken by different people applying a new attitude. Given the chance, statisticians will say data scientists are just chancers with good PR blagging the same job they’ve been unfashionably plugging away with for years. That’s an argument for another book. Many of the skills needed for digital transformation are not new. The UK government has achieved some proud moments in design, for example (Henry Beck’s famous Underground map and Margaret Calvert and Jock Kinneir’s work on road signs in the 1960s37 were both emulated worldwide), and couldn’t have done that without employing people who understood its value.

Be as iterative with your approach to communicating as you are with the products you build. The GDS began with one blog for the whole organisation and made that part of government communications infrastructure. From there, the team created many more tightly focused blogs, each with discrete and defined audiences, covering a huge variety of topics from user research to data science and HR. These created bounded spaces for experts to write to an audience they knew was interested, starting a conversation rather than a broadcast. They opened up networks, and left a legacy of knowledge that is still available for anyone to draw on. In many large organisations, hoarding information in emails and memos is a common form of controlling power.

If the culture, people and working practices of the institution are still grounded in principles that were set down in the age of the telegraph, the chances of responding with the requisite flexibility and agility to machine learning are slim. How can you be sure you’re not buying snake oil? Which roles and professions should be part of the conversation? Which start looking obsolete? Can you buy into the business models that AI or data science services will use? If you’ve failed to get through the first digital transformation of your organisation, you will also fail to make the best of the second. Retrospective: paperless driving For all the cold water we are pouring on them in this chapter, AI and machine learning represent a new frontier.

pages: 444 words: 118,393

The Nature of Software Development: Keep It Simple, Make It Valuable, Build It Piece by Piece
by Ron Jeffries
Published 14 Aug 2015

Michael Keeling (358 pages) ISBN: 9781680502091 $41.95 Data Science Essentials in Python Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyze data using statistical and network-theoretical methods; and see actual examples of data analysis at work. This one-stop solution covers the essential data science you need in Python. Dmitry Zinoviev (224 pages) ISBN: 9781680501841 $29 A Common-Sense Guide to Data Structures and Algorithms If you last saw algorithms in a university course or at a job interview, you’re missing out on what they can do for your code.

If the system can determine in advance that it will fail at an operation, it’s always better to fail fast. That way, the caller doesn’t have to tie up any of its capacity waiting and can get on with other work. How can the system tell whether it will fail? Do we need Deep Learning? Don’t worry, you won’t need to hire a cadre of data scientists. It’s actually much more mundane than that. There’s a large class of “resource unavailable” failures. For example, when a load balancer gets a connection request but not one of the servers in its service pool is functioning, it should immediately refuse the connection. Some configurations have the load balancer queue the connection request for a while in the hopes that a server will become available in a short period of time.
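
A minimal sketch of that fail-fast check (class and method names are hypothetical, not taken from the book): when the pool holds no healthy server, the balancer refuses at once instead of making the caller tie up capacity waiting.

# Fail-fast sketch for a load balancer; names and structure are illustrative only.
class NoHealthyServerError(Exception):
    """Raised immediately rather than letting the caller wait on a doomed request."""

class LoadBalancer:
    def __init__(self, servers):
        # each server is assumed to expose a boolean .healthy flag and a .handle(request) method
        self.servers = servers

    def dispatch(self, request):
        healthy = [s for s in self.servers if s.healthy]
        if not healthy:
            # We already know the operation will fail, so refuse the connection now.
            raise NoHealthyServerError("no servers available in the pool")
        return healthy[0].handle(request)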

pages: 254 words: 61,387

This Could Be Our Future: A Manifesto for a More Generous World
by Yancey Strickler
Published 29 Oct 2019

Using your skills to maximize financial value seems like a waste when a whole new frontier of value awaits. Led by some of the best and brightest of the Millennial and Z generations, these people become the pioneers of the new Values Maximizing Class. Accountants, carpenters, community organizers, construction workers, data scientists, designers, ecologists, economists, engineers, entrepreneurs, financial analysts, journalists, lawyers, line cooks, meteorologists, politicians, social scientists, teachers, truck drivers, venture capitalists, waitresses, students, and retirees dedicate themselves to the mission of identifying, measuring, and growing rational, nonfinancial values.

Coaches and TV announcers condemned it as selfish. It wasn’t how the game was played. But in the first decade of the 2000s, the way people thought about sports began to change. In the wake of Moneyball, the 2003 Michael Lewis book about an underdog baseball team using data analysis to outperform better-resourced competitors, data science became a new focus in sports. Including basketball. Trailblazing analysts started to ask new questions. Things like: where are the most efficient places on the court to shoot? This was a new kind of question. To know the most efficient shot, new forms of measurement were needed. To get the necessary data, new kinds of technology were required.

K., 54 Chick-fil-A, 165–66, 169, 175, 264 Chile, 198–99 China, ix, xii, 58–59, 71 Chouinard, Yvon, 172 Clear Channel Communications, 39–40 climate change, 144, 191–92 Cold War, 27–28, 31, 105 communitarianism, 236–37 community, xv, 48, 135, 243 companies contribute to, 51, 213, 216–17 as governing value, 142, 145 highly valued, xi–xiii, 45, 48 in pursuit of, 162–63 companies on hypergrowth path, 95–97, 100, 236 and public service, 60, 62, 101–2 purpose-oriented, 100–101 secular missions of, 212–13, 217–18 share-holder centric, 82–85, 169–70 values-minded, 165–66, 210–18 See also specific names; specific topics competition, 33, 39, 53–54, 83, 98–99, 104, 153, 172–74, 196 Compleat Strategyst, The (Williams), 29–31 compound interest, xiv, 191, 194 Confederacy of Dunces, A (Toole), 11 Conrad, Parker, 95–96 consumerism, xii, 51, 120, 168, 187, 217 cooperation/collaboration, 32–34, 102, 198–99, 213 Creative Independent, The, 170–71, 270 creativity and creating value, 12, 171, 175 highly valued, 45, 48 investment in, 5, 7, 10–13, 88, 170–71 and producing profits, 43–44, 134, 170 credit cards, 65–66, 74 crowdfunding, 4–13, 15, 247–48. See also specific companies cultural heritage, 180–81 Curtis, Adam, 270–71 data/data science, xv, 97, 123, 150, 159–62, 215 decision making, 28, 96–97 affects other people, 127–28, 144 and Bentoism, 130, 134, 138–40, 206–11 best-case outcome for, 126, 152, 243 guided by defaults, 22–23, 34–35 and making money, x–xi, 23, 135 of Maximizing Class, 62–63 rational, xiii, 134–35, 137, 139 values-driven, xiii, 132, 138–39, 152–53, 174–76, 209–10, 223 See also autonomy defaults (hidden) and bento box, 129–30 explanation of, 19–23 and financial maximization, x, 22–26, 64, 83 and game theory, 32, 34–35 and maximizing here/now, 136–37 set what’s normal, 34–35 and values, 214–15, 223 defaults (visible), 22 Defense Advanced Research Projects Agency (DARPA), 78–79 deregulation, 77–78, 83, 257 disrupting, in business, 87–88, 95–99, 103 downtown, demise of, 48, 51–52, 54 Drive (Pink), 117–19 drugs, xii, 23, 81, 249 Dublin, Ireland, 9–10 e-commerce, 47, 162 “Economic Possibilities for Our Grandchildren” (Keynes), 193–94 economy, 261 downturn in, 61–62, 71, 120 and financial maximization, x, xiii, 70, 72, 116 growth of, 120, 151, 193–95, 267 “Mullet,” 66–74, 77, 84, 110, 163 shareholder-centric, 60–61, 67–73, 82–85 See also gross domestic product (GDP); stock buybacks education, 24–26, 74–75, 110, 170–71, 197, 216, 259 electric cars, 173–75, 183 Ellison, Larry, 109–10 emotions, 22–23, 103, 113–15, 195, 260 Enron, 78, 210 Entrepreneurial State, The (Mazzucato), 78 entrepreneurship, 52–54, 75, 78–81, 196, 241 environmental issues, 14–15, 77, 172–74, 201, 212 Etsy, 212 “evergreen” model, 217 exercise, xiv, 177–78, 184–87, 189–90, 265, 267 Facebook, 53–54, 98, 109 fairness, xii–xiii, xv, 102, 142, 145, 158, 163, 195, 202, 216 family, the, xiii, xv, 26–27, 90–91, 111, 127, 138, 142, 222 Federal Reserve, 49 Federal Reserve Bank of Dallas, 72–73 financial crises, 77–78 debt, 65–66, 74–75 growth, xv–xvi instability, 110, 112 security, xi, 109–14, 116, 141, 201, 205 Financial Independence Retire Early (FIRE), 166–69 financial maximization, 236 becomes mainstream, 59–61, 63–65, 91, 180 case against, 110–11, 115–16, 119 dominance of, x–xiii, xvi, 23–25, 27, 37, 73, 97, 104–5, 133–35 downsides to, xi–xiii, 45, 196–97, 199, 243 ending its reign, xiii, 13–14, 225 four phases of, 83–85 growth of, xi, 92, 123, 196, 265 moving beyond it, 163, 168–69, 206, 209, 212 normalization of, 91, 183 not the 
goal, 9–10, 43–44 origins of, x, xvi, 26–32, 255 prioritizing it, 14, 63, 82–83 Financial Times, 70–71 First Amendment, 39 Fonda, Jane, 187 Food and Drug Administration, 188 Fortune, 68 Fox News, xii Friedman, Milton, 59–61, 63, 82, 90–91, 105, 180, 255, 261 Future Me examples of, 138, 167–69, 202–6, 211 explanation of, 132, 143, 206 and values helix, 218–23 Future Us, 196 examples of, 138, 169, 171–75, 201–6, 211 explanation of, 132, 144–45, 206 and values helix, 218–19 game theory, 237, 250 and collaboration, 32–34 and Community Game, 33–35, 98 its notion of rationality, 29–35, 97 and Prisoner’s Dilemma, 28–34, 130–33, 198–99 and Stag Hunt, 32–34 and Wall Street Game, 33–35, 98 Garfield, James, 146–47, 149, 179 Gates, Bill, 109–10 generational change, xiv, 180–84, 187, 191–92, 266–67 influence, xi, 152, 218–24, 271 generosity, xii, 7, 118, 134, 175 Gomory, Ralph, 82 Google, 53–54, 110, 123 Great Depression, 71, 77, 120, 193 Greatest Generation, 192 grit, 135, 143–45 gross domestic product (GDP), xii, 23, 83, 120–24, 196, 235 gross domestic value (GDV), 217–18 Groupon, 96–97, 100 gyms, xiv, 20, 186–87 Hanchett, Thomas, 49 happiness.

pages: 533

Future Politics: Living Together in a World Transformed by Tech
by Jamie Susskind
Published 3 Sep 2018

Mid-range cars already contain multiple microprocessors and sensors, allowing them to upload performance data to carmakers when the vehicle is serviced.24 The proportion of the world’s data drawn from machine sensors was 11 per cent in 2005; it is predicted to increase to 42 per cent in 2020.25 Data scientists have always wrestled with the challenge of turning raw data into information (by cleaning, processing, and organizing it), then into knowledge (by analysing and interpreting it).26 The arrival of big data has required some methodological innovation. As Mayer-Schönberger and Cukier explain, the benefit of analysing vast amounts of data about a topic rather than using a small representative sample has depended upon data scientists’ willingness to accept ‘data’s real-world messiness’ rather than seeking precision.27 In the 1990s IBM launched Candide, its effort to automate language translation using ten years’ worth of high-quality transcripts from the Canadian parliament.

Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (New York: Crown, 2016), 114. 21. O’Neil, Weapons, 120. 22. Laurence Mills, ‘Numbers, Data and Algorithms: Why HR Professionals and Employment Lawyers Should Take Data Science and Analytics Seriously’, Future of Work Hub, 4 April 2017 <http://www.futureofworkhub.info/comment/2017/4/4/numbers-data-and-algorithms-why-hr-professionals-and-employment-lawyers-should-take-data-science-seriously> (accessed 1 December 2017); Ifeoma Ajunwa, Kate Crawford, and Jason Schultz, ‘Limitless Worker Surveillance’, California Law Review 105, no. 3, 13 March 2016 <https://papers.ssrn.com/sol3/papers.cfm?

Miller, David and Larry Siedentop, eds. The Nature of Political Theory. Oxford: Oxford University Press, 1983. Mills, Laurence. ‘Numbers, Data and Algorithms—Why HR Professionals and Employment Lawyers Should Take Data Science and Analytics Seriously’. Future of Work Hub, 4 Apr. 2017 <http://www.futureofworkhub.info/comment/2017/4/4/numbers-data-and-algorithms-why-hr-professionals-and-employment-lawyers-should-take-data-science-seriously> (accessed 1 Dec. 2017). Millward, David. ‘How Ford Will create a new generation of driverless cars’. Telegraph, 27 Feb. 2017 <http://www.telegraph.co.uk/business/2017/02/27/ford-seeks-pioneer-new-generation-driverless-cars/> (accessed 28 Nov. 2017).

pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values
by Brian Christian
Published 5 Oct 2020

Rudin, Cynthia, and Joanna Radin. “Why Are We Using Black Box Models in AI When We Don’t Need To? A Lesson From An Explainable AI Competition.” Harvard Data Science Review, 2019. Rudin, Cynthia, and Berk Ustun. “Optimized Scoring Systems: Toward Trust in Machine Learning for Healthcare and Criminal Justice.” Interfaces 48, no. 5 (2018): 449–66. Rudin, Cynthia, Caroline Wang, and Beau Coker. “The Age of Secrecy and Unfairness in Recidivism Prediction.” Harvard Data Science Review 2, no. 1 (2020). Rumelhart, D. E., G. E. Hinton, and R. J. Williams. “Learning Internal Representations by Error Propagation.” In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1:318–62.

So how risky did those defendants turn out to be? There was only one way to find out. “I had the sad realization,” Angwin recounts, “that we had to look up the criminal records of every one of those eighteen thousand people. Which we did. And it sucked.”26 To link the set of COMPAS scores to the set of criminal records—what data scientists call a “join”—would take Angwin, and her team, and the county staff, almost an entire additional year of work. “We used, obviously, a lot of automated scraping of the criminal records,” she explains. “And then we had to match them on name and date of birth, which is the most terrible thing you could possibly ever imagine.
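
A sketch of the kind of join described here, using pandas with made-up names and columns (the real record linkage ProPublica did was far messier, involving scraping and fuzzy matching): rows from the score table and the criminal-record table are linked on name and date of birth.

# Illustrative join of risk scores to criminal records on name and date of birth.
# All data and column names are fabricated for demonstration.
import pandas as pd

scores = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "dob": ["1985-02-11", "1990-07-30"],
    "risk_score": [8, 3],
})
records = pd.DataFrame({
    "name": ["A. Smith", "C. Lee"],
    "dob": ["1985-02-11", "1979-12-01"],
    "reoffended": [True, False],
})

# An inner join keeps only the people found in both tables.
linked = scores.merge(records, on=["name", "dob"], how="inner")
print(linked)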

In 2014, United States Defense Advanced Research Projects Agency (DARPA) program manager Dave Gunning was talking to Dan Kaufman, director of DARPA’s Information Innovation Office. “We were just trying to kick around different ideas on what to do in AI,” Gunning tells me.15 “They had had a whole effort where they had sent a whole group of data scientists to Afghanistan to analyze data, try to find patterns that would be useful to the war fighters. And they were already beginning to see that these machine-learning techniques were learning interesting patterns, but the users often didn’t get an explanation for why.” A rapidly evolving set of tools was able to take in financial records, movement records, cell phone logs, and more to determine whether some group of people might be planning to strike.

pages: 499 words: 144,278

Coders: The Making of a New Tribe and the Remaking of the World
by Clive Thompson
Published 26 Mar 2019

You have to deal with uncertainty, weirdness. You guide the system toward doing what it’s supposed to do, like herding the cats of cognition. Maybe you’ll get them where you want; maybe you won’t. It’s like a point my friend Hilary Mason, a top data and machine-learning scientist, made about data science in the Harvard Business Review: “At the outset of a data science project, you don’t know if it’s going to work. At the outset of a software engineering project, you know it’s going to work.” On top of that, there’s the black-box problem. Once a neural net has been trained and you’re recognizing those cat photos, great! But if you ask the coder who built it, “How is this thing working?”

She thinks the reputation comes partly from their being comfortable and fluent with machines that intimidate and mystify most of the rest of the population. “If I had to characterize the programmers I know, I’d say there’s a certain confidence that comes with being infused with technology. It’s that confidence in actually understanding what this device in our hands is doing.” Mason is a pioneering data scientist and as committed a nerd as they come; when I first met her years earlier, she enthusiastically told me how she had “replaced myself with a bunch of small shell scripts”—she’d written dozens of short little programs to reply to dull, rote emails (students of hers asking “Will this be on the exam?”)

Mason is a pioneering data scientist and as committed a nerd as they come; when I first met her years earlier, she enthusiastically told me how she had “replaced myself with a bunch of small shell scripts”—she’d written dozens of short little programs to reply to dull, rote emails (students of hers asking “Will this be on the exam?”) because she’d rather save her time for more important stuff. But she’s also a connector who’s founded or helped start all manner of organizations designed to help bootstrap newbies to tech, including a Brooklyn “hackerspace” and hackNY, which runs hackathons for students. In a very data-scientist fashion, she rebels at the idea that a single archetype can hold true across an ever-larger cohort of coders worldwide. The population has grown so huge that you can’t generalize across the entire field anymore when it comes to personality. She’s right that the population of programmers has exploded.

pages: 510 words: 120,048

Who Owns the Future?
by Jaron Lanier
Published 6 May 2013

The code in some standardized form or framework that makes it reusable and tweakable? • Must analysis be performed in a way that anticipates standard practices of meta-analysis? • What documentation of the chain of custody of data must be standardized? • Must there be new practices established, analogous to double-blind tests or placebos, that help prevent big data scientists from fooling themselves? Should there be multiple groups developing code to analyze big data that remain completely insulated from each other in order to arrive at independent results? Before long, all these questions will be answered, but for now, practices are still in flux. Though the details need to mature, the core commitment to testing hypotheses unites all scientists whether their data is big or small.

So I am arguing both from the perspective of a big-time macher and from the perspective of a more typical person, because any solution has to be a solution from both perspectives. Big human data, that vase-shaped gap, is the arbiter of influence and power in our times. Finance is no longer about the case-by-case judgment of financiers, but about how good they are at locking in the best big-data scientists and technologists into exclusive contracts. Politicians target voters using similar algorithms to those that evaluate people for access to credit or insurance. The list goes on and on. As technology advances, Siren Servers will be ever more the objects of the struggle for wealth and power, because they are the only links in the chain that will not be commoditized.

We have become used to treating big business data as legitimate, even though it might really only seem so because of its special position in a network. Such data is valid by dint of tautology to an unknowable degree. Science demands a different approach to big data, but we don’t know as much about that approach as we will soon. Scientific method for big data is not yet entirely codified. Once practices are established for big data science, there will be uncontroversial answers to questions like: • What standard would have to be met to allow for the publication of replication of a result? To what degree must replication require the gathering of different, but similar big data, and not just the reuse of the same data with different algorithms?

The Deep Learning Revolution (The MIT Press)
by Terrence J. Sejnowski
Published 27 Sep 2018

But the terabyte-scale data sets collected by the Sloan Digital Sky Survey will themselves be outstripped a thousandfold by the petabyte-scale data sets to be collected by the Large Synoptic Sky Survey Telescope (https://www.lsst.org/) under construction. When Yann LeCun founded the Center for Data Science at New York University in 2013, faculty from every department came knocking on his door with data in hand. In 2018 UCSD dedicated a new Halıcıoğlu Data Science Institute. Master’s in Data Science degrees (MDSs) are becoming as popular as MBAs. Deep Learning at the Gaming Table Deep learning came of age at the 2012 NIPS Conference at Lake Tahoe (figure 11.3).

George Orwell, Nineteen Eighty-Four (London: Secker & Warburg, 1949). This book has recently taken on new meaning. 6. Founded in 2006, Women in Machine Learning has been creating opportunities for women in machine learning to present and promote their research. See http://wimlworkshop.org. Chapter 12 1. The Kaggle website has a million data scientists who vie with each other to win the prize with the best performance. Cade Metz, “Uncle Sam Wants Your Deep Neural Networks,” New York Times, June 22, 2017, https://www.nytimes.com/2017/06/22/technology/homeland-security-artificial-intelligence-neural-network.html. 2. For a video of my lecture “Cognitive Computing: Past and Present,” see https://www.youtube.com/watch?

In the 1980s, there was hostility from faculty in their department toward neural networks, which was common at many institutions, but this did not deter either Ben or Andreas. Indeed, Andreas would go on to become a full professor at Hopkins and to cofound the Johns Hopkins University Center for Language and Speech Processing. Ben has a consulting group on data science for political and corporate clients. Learning to Recognize Handwritten Zip Codes More recently, Geoffrey Hinton and his students at the University of Toronto trained a Boltzmann machine with three layers of hidden units to classify handwritten zip codes with high accuracy (figure 7.6).20 Because the Boltzmann network had feedback as well as feedforward connections, it was possible to run the network in reverse, clamping one of the output units and generating input patterns that corresponded to the clamped output unit (figure 7.7).

pages: 232 words: 71,237

Kill It With Fire: Manage Aging Computer Systems
by Marianne Bellotti
Published 17 Mar 2021

The team maintaining the complete system has about 11 people on it. Four people are on operations, maintaining the servers and building tooling to help enforce standards. Four people are on the data science team, designing models and writing the code to implement them, and the remaining three people build the web services. That three-person team maintains Service B but also another service elsewhere in the system. The data science team maintains Service A, but also two other services. Both of those teams are a bit overloaded for their staffing levels, but the usage of the system is low, so the pressure isn’t too great.

Organizations tend to have responsibility gaps in the following areas: • So-called 20 percent projects, or tools and services built (usually by a single engineer) as a side project. • Interfaces. Not so much visual design but common components that were built to standardize experience or style before the organization was large enough to run a team to maintain them. • New specializations. Is the role of a data engineer closer to a database administrator or a data scientist? • Product engineering versus whatever the product runs on. Dev-Ops/site reliability engineering (SRE) didn’t solve that problem; this just moved it under more abstraction layers. If you’ve automated your infrastructure configuration, great—who maintains the automation tools? When there’s a responsibility gap, the organization has a blind spot.

pages: 339 words: 94,769

Possible Minds: Twenty-Five Ways of Looking at AI
by John Brockman
Published 19 Feb 2019

That goal was abandoned as they evolved into mathematical abstractions unrelated to how neurons actually function. But now there’s a kind of convergence that can be thought of as forward- rather than reverse-engineering biology, as the results of deep learning echo brain layers and regions. One of the most difficult research projects I’ve managed paired what we’d now call data scientists with AI pioneers. It was a miserable experience in moving goalposts. As the former progressed in solving long-standing problems posed by the latter, this was deemed to not count because it wasn’t accompanied by corresponding leaps in understanding the solutions. What’s the value of a chess-playing computer if you can’t explain how it plays chess?

By imposing statistical prediction, she continues, law enforcement in Camden during her tenure was able to reduce murders by 41 percent, saving thirty-seven lives, while dropping the total crime rate by 26 percent. After joining the Arnold Foundation as its vice president for criminal justice, she established a team of data scientists and statisticians to create a risk-assessment tool; fundamentally, she construed the team’s mission as deciding how to put “dangerous people” in jail while releasing the nondangerous. “The reason for this,” Milgram contended, “is the way we make decisions. Judges have the best intentions when they make these decisions about risk, but they’re making them subjectively.

Such an experiment would never have occurred to a Babylonian data fitter. Model-blind approaches impose intrinsic limitations on the cognitive tasks that Strong AI can perform. My general conclusion is that human-level AI cannot emerge solely from model-blind learning machines; it requires the symbiotic collaboration of data and models. Data science is a science only to the extent that it facilitates the interpretation of data—a two-body problem, connecting data to reality. Data alone are hardly a science, no matter how “big” they get and how skillfully they are manipulated. Opaque learning systems may get us to Babylon, but not to Athens.

Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth
by Stuart Ritchie
Published 20 Jul 2020

Meta-science experiments, in which multiple research groups are tasked with analysing the same dataset or designing their own study from scratch to test the same hypothesis, have found a high degree of variation in method and results.70 Endless choices offer endless opportunities for scientists who begin their analysis without a clear idea of what they’re looking for. But as should now be clear, more analyses mean more chances for false-positive results. As the data scientists Tal Yarkoni and Jake Westfall explain, ‘The more flexible a[n] … investigator is willing to be – that is, the wider the range of patterns they are willing to ‘see’ in the data – the greater the risk of hallucinating a pattern that is not there at all.’71 It gets worse. So far, I’ve made it sound as though all p-hacking is done explicitly – running lots of analyses and publishing only those that give p-values lower than 0.05.
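Yarkoni and Westfall's point about flexibility is easy to make concrete with a small simulation. The sketch below treats the analyst's choices as independent tests run on pure noise, which is a simplification of real p-hacking, but it shows how quickly the chance of at least one false positive grows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_datasets, n_analyses = 1000, 20

datasets_with_false_positive = 0
for _ in range(n_datasets):
    hits = 0
    for _ in range(n_analyses):
        # Two groups drawn from the same distribution: any "effect" is a false positive.
        a, b = rng.normal(size=30), rng.normal(size=30)
        p = stats.ttest_ind(a, b).pvalue
        if p < 0.05:
            hits += 1
    datasets_with_false_positive += hits > 0

# With 20 analyses per dataset, roughly 64% of pure-noise datasets yield at
# least one "significant" result, versus 5% for a single pre-specified test.
print(datasets_with_false_positive / n_datasets)
```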

But the advocates of 0.005 are making the case that the problem of false positives, which their method would likely reduce, is a more pressing concern than that of false negatives. Here’s another way to deal with statistical bias and p-hacking: take the analysis completely out of the researchers’ hands. In this scenario, upon collecting their data, scientists would hand them over for analysis to independent statisticians or other experts, who would presumably be mostly free of the specific biases and desires of those who designed and performed the experiment.33 Such a system would be tricky to run and one can imagine it leading to conflict when scientists disagree with the analysis or interpretation that their assigned statistician has imposed on their precious data.34 As with some of the radical ideas for reforms that we’ll see later in the chapter, it could still be worth trying at small scale.

The authors wryly concluded that by ‘extrapolating the upward trend of positive words over the past forty years to the future, we predict that the word ‘novel’ will appear in every [abstract] by the year 2123’.57 It seems doubtful that scientific innovation has genuinely accelerated alongside the dramatic upsurge in hyperbolic language.58 A more likely explanation is that scientists are using this kind of language more frequently because it’s a great way to make their results appeal to readers and, perhaps more importantly, to the reviewers and editors of big-name journals. The most glamorous journals state on their websites that they want papers that have ‘great potential impact’ (Nature); that are ‘most influential in their fields’ and ‘present novel and broadly important data’ (Science); and that are of ‘unusual significance’ (Cell) or ‘exceptional importance’ (Proceedings of the National Academy of Sciences).59 Conspicuous by their absence from this list are any words about rigour or replicability – though hats off to the New England Journal of Medicine, the world’s top medical journal, for stating that it’s looking for ‘scientific accuracy, novelty, and importance’, in that order.60 The steep rise in positive-sounding phrases in scientific journals tells us that hype isn’t just restricted to press releases and popular-science books: it has seeped into the way scientists write their papers.

The Book of Why: The New Science of Cause and Effect
by Judea Pearl and Dana Mackenzie
Published 1 Mar 2018

The rest of statistics, including the many disciplines that looked to it for guidance, remained in the Prohibition era, falsely believing that the answers to all scientific questions reside in the data, to be unveiled through clever data-mining tricks. Much of this data-centric history still haunts us today. We live in an era that presumes Big Data to be the solution to all our problems. Courses in “data science” are proliferating in our universities, and jobs for “data scientists” are lucrative in the companies that participate in the “data economy.” But I hope with this book to convince you that data are profoundly dumb. Data can tell you that the people who took a medicine recovered faster than those who did not take it, but they can’t tell you why.

Chapter 1 assembles the three steps of observation, intervention, and counterfactuals into the Ladder of Causation, the central metaphor of this book. It will also expose you to the basics of reasoning with causal diagrams, our main modeling tool, and set you well on your way to becoming a proficient causal reasoner—in fact, you will be far ahead of generations of data scientists who attempted to interpret data through a model-blind lens, oblivious to the distinctions that the Ladder of Causation illuminates. Chapter 2 tells the bizarre story of how the discipline of statistics inflicted causal blindness on itself, with far-reaching effects for all sciences that depend on data.

I have watched its progress take shape in students’ cubicles and research laboratories, and I have heard its breakthroughs resonate in somber scientific conferences, far from the limelight of public attention. Now, as we enter the era of strong artificial intelligence (AI) and many tout the endless possibilities of Big Data and deep learning, I find it timely and exciting to present to the reader some of the most adventurous paths that the new science is taking, how it impacts data science, and the many ways in which it will change our lives in the twenty-first century. When you hear me describe these achievements as a “new science,” you may be skeptical. You may even ask, Why wasn’t this done a long time ago? Say when Virgil first proclaimed, “Lucky is he who has been able to understand the causes of things” (29 BC).

pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
by Aurélien Géron
Published 13 Mar 2017

Then, before we set out to explore the Machine Learning continent, we will take a look at the map and learn about the main regions and the most notable landmarks: supervised versus unsupervised learning, online versus batch learning, instance-based versus model-based learning. Then we will look at the workflow of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune a Machine Learning system. This chapter introduces a lot of fundamental concepts (and jargon) that every data scientist should know by heart. It will be a high-level overview (the only chapter without much code), all rather simple, but you should make sure everything is crystal-clear to you before continuing to the rest of the book. So grab a coffee and let’s get started! Tip If you already know all the Machine Learning basics, you may want to skip directly to Chapter 2.

Poor-Quality Data Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well. It is often well worth the effort to spend time cleaning up your training data. The truth is, most data scientists spend a significant part of their time doing just that. For example: If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually. If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it, and so on.
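As a concrete illustration of the options listed above (discarding obvious outliers, or filling missing values with the median), here is a small pandas sketch with invented customer data; it is not taken from the book's own notebooks:

```python
import pandas as pd

# Invented data: 'age' is missing for some rows, and one value is an obvious outlier.
customers = pd.DataFrame({
    "age": [34, 29, None, 41, None, 250],   # 250 is clearly a data-entry error
    "spend": [120, 80, 95, 130, 60, 110],
})

# Option 1: discard the obvious outlier while keeping rows with missing ages.
cleaned = customers[customers["age"].isna() | (customers["age"] < 120)].copy()

# Option 2: fill the remaining missing values with the median age instead of dropping rows.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

print(cleaned)
```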

In practice it often creates a few clusters per person, and sometimes mixes up two people who look alike, so you need to provide a few labels per person and manually clean up some clusters. 5 By convention, the Greek letter θ (theta) is frequently used to represent model parameters. 6 The code assumes that prepare_country_stats() is already defined: it merges the GDP and life satisfaction data into a single Pandas dataframe. 7 It’s okay if you don’t understand all the code yet; we will present Scikit-Learn in the following chapters. 8 For example, knowing whether to write “to,” “two,” or “too” depending on the context. 9 Figure reproduced with permission from Banko and Brill (2001), “Learning Curves for Confusion Set Disambiguation.” 10 “The Unreasonable Effectiveness of Data,” Peter Norvig et al. (2009). 11 “The Lack of A Priori Distinctions Between Learning Algorithms,” D. Wolperts (1996). Chapter 2. End-to-End Machine Learning Project In this chapter, you will go through an example project end to end, pretending to be a recently hired data scientist in a real estate company.1 Here are the main steps you will go through: Look at the big picture. Get the data. Discover and visualize the data to gain insights. Prepare the data for Machine Learning algorithms. Select a model and train it. Fine-tune your model. Present your solution. Launch, monitor, and maintain your system.
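Footnote 6 above refers to a prepare_country_stats() helper that merges GDP and life-satisfaction data into a single dataframe. The book supplies its own version; a rough sketch of what such a merge might look like, with toy inputs and assumed column names, is:

```python
import pandas as pd

def prepare_country_stats(life_sat: pd.DataFrame, gdp: pd.DataFrame) -> pd.DataFrame:
    # Join the two tables on country and keep just the columns a simple model needs.
    merged = life_sat.merge(gdp, on="Country", how="inner")
    cols = ["Country", "GDP per capita", "Life satisfaction"]
    return merged[cols].sort_values("GDP per capita")

# Invented toy inputs, only to show the shape of the data.
life_sat = pd.DataFrame({"Country": ["Hungary", "France"], "Life satisfaction": [5.6, 6.5]})
gdp = pd.DataFrame({"Country": ["Hungary", "France"], "GDP per capita": [12240, 37675]})
print(prepare_country_stats(life_sat, gdp))
```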

Seeking SRE: Conversations About Running Production Systems at Scale
by David N. Blank-Edelman
Published 16 Sep 2018

I strongly believe that we are now, in 2018, very close to 2001’s incredible imagined achievements. Data science and machine learning software have been available for some time now. The languages they use make them more reachable by engineers to explore new ways to apply machine learning. In Figure 18-1, we see Python and the R language being the clear winners in this field. Notably, Anaconda, TensorFlow, and scikit-learn all use Python for interfacing with the user. For the hands-on section later in this chapter, we use TensorFlow, Python, and some Keras. Figure 18-1. Different software for machine learning (source: https://www.kdnuggets.com/2017/05/poll-analytics-data-science-machine-learning-software-leaders.html) What Is Machine Learning?
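The hands-on code itself is not part of this excerpt. Purely as a hint of the kind of TensorFlow/Keras model such a section builds on, here is a minimal sketch in which the data, layer sizes, and task are all invented:

```python
import numpy as np
import tensorflow as tf

# Invented toy data: 100 samples, 4 features, binary label.
X = np.random.rand(100, 4).astype("float32")
y = (X.sum(axis=1) > 2.0).astype("float32")

# A minimal Keras model, not the chapter's actual example.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)
print(model.evaluate(X, y, verbose=0))   # [loss, accuracy]
```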

First, we need to share a little context about our engineering culture: at Spotify, we organize into small, autonomous teams. The idea is that every team owns a certain feature or user experience from front to back. In practice, this means a single engineering team consists of a cross-functional set of developers — from designer to backend developer to data scientist — working together on the various Spotify clients, backend services, and data pipelines. To support our feature teams, we created groups centered around infrastructure. These infrastructure teams in turn also became small, cross-functional, and autonomous to provide self-service infrastructure products.

You can start small by ensuring that service level indicators are inclusively measuring the experiences of all users, not just able-bodied users with fast, low-latency internet connections. Expanding your reach further, you can advocate for equity in products you contribute to. And your SRE skills come in useful if you choose to participate in social movements. Contributor Bio Emily Gorcenski is a data scientist and anti-racist activist from Charlottesville, Virginia, who now lives in Berlin. Her passion is the intersection of technology, regulation, and society, and she is a tireless advocate of transgender rights. Liz Fong-Jones is a developer advocate, activist, and site reliability engineer (SRE) with 14+ years of experience based out of Brooklyn, New York, and San Francisco, California.

pages: 411 words: 98,128

Bezonomics: How Amazon Is Changing Our Lives and What the World's Best Companies Are Learning From It
by Brian Dumaine
Published 11 May 2020

The company has gotten so good at applying computer technology that it has started to learn and get smarter on its own. No corporation has ever done this as successfully as Amazon. A lot of CEOs pay lip service to AI and hire a handful of data scientists in an effort to tack this technology onto their business model. At Amazon, technology is the key driver to everything it does. Consider that for the development and upgrading of its magic genie, Alexa, which runs on AI voice software, the company as of 2019 had deployed some ten thousand workers, the lion’s share of which were data scientists, engineers, and programmers. From day one, Amazon has been a technology company that just happens to sell books. Since those early days, Bezos has made big data and AI the heart of the company.

It comes as little surprise: “IDC Survey Finds Artificial Intelligence to Be a Priority for Organizations, but Few Have Implemented an Enterprise-Wide Strategy,” Business Wire, July 8, 2019. Today an estimated 35 percent: Stephen Cohn and Matthew W. Granade, “Models Will Run the World,” Wall Street Journal, August 19, 2018. This is why a computer science graduate in the U.S.: Entry Level Data Scientist Salaries, Glassdoor, https://www.glassdoor.com/Salaries/entry-level-data-scientist-salary-SRCH_KO0,26.htm. Facebook’s algorithms keep getting better: “Number of Monthly Active Facebook Users Worldwide as of 2nd Quarter 2019 (in Millions),” Statista, 2019, https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/.

Smart algorithms every day, every hour, every second learn how to please Amazon’s customers by figuring out ways to lower prices or speed up a delivery or suggest the appropriate songs or movies or have Alexa answer a question correctly in a few milliseconds. Think of this new iteration as the AI flywheel. The tens of thousands of engineers, data scientists, and programmers whom Bezos has hired have made the AI flywheel a learning machine, a cyber contraption with its own intelligence that takes all the data that Amazon collects on its 300 million customers and then analyzes it in minute detail. The machine makes decisions about what items to purchase, how much to charge for them, and where in the world to stock them.

Industry 4.0: The Industrial Internet of Things
by Alasdair Gilchrist
Published 27 Jun 2016

Adequately Skilled and Trained Staff This is imperative if you expect to benefit from serious analytics work as you will certainly need skilled data scientists, process engineers, and electromechanical engineers. Securing talent with the correct skills is proving to be a daunting task as colleges and universities seem to be behind the curve and are still pushing school leavers into careers as programmers rather than data scientists. This doesn’t seem to be changing anytime soon. This is despite the huge demand for data scientists and electro-mechanical engineers predicted over the next decade. The harsh financial reality is that the better the data analytical skills, the more likely the company can produce the algorithms required to distil information from their vast data lakes.

The whole point of intelligent devices in the Industrial Internet context is to harvest raw data and then manage the data flow, from device to the data store, to the analytic systems, to the data scientists, to the process, and then back to the device. This is the data flow cycle, where data flows from intelligent devices, through the gathering and analytical apparatus before perhaps returning as control feedback into the device. It is within this cycle where data scientists can extract prime value from the information. Key Opportunities and Benefits Not unexpectedly, when asked which key benefits most IIoT adopters want from the Industrial Internet, they say increased profits, increased revenue flows, and lower operational expenditures, in that order.

The answer is that currently they cannot, yet they can collect data in huge quantities and store it in distributed data storage facilities such as the cloud, and even take advantage of advanced analytical software to try to determine trends and correlation. However, we are not able to actually achieve this feat now, as we do not know the right questions to ask of the data. What we will require are data scientists, people skilled in understanding and trolling through vast quantities of unstructured data in search of sense and order, to distinguish patterns that ultimately will deliver value. Data scientists can use their skills in data analysis to determine patterns in the data, which is the core of M2M communication and understanding, while at the same time ask the relevant questions that derive true value from the data that will empower business strategy.

pages: 277 words: 70,506

We Are Bellingcat: Global Crime, Online Sleuths, and the Bold Future of News
by Eliot Higgins
Published 2 Mar 2021

But beyond sensible precautions, I find no reason to panic. Our opponents could try to harm me. Yet Bellingcat has become far more than a single person. In the year after our first Skripal investigations, Bellingcat opened our first office, a proper workspace in The Hague added to the mailing address in Leicester. We hired a business director, a data scientist and administrative experts, too, nearing twenty staffers – still nimble and innovative but with the heft of an established enterprise. While it’s true that I could do little to stop an attack, our opponents could do nothing to stop what we are becoming. 5 Next Steps The future of justice and the power of AI Eighteen men in orange jumpsuits knelt on the ground, hands tied behind their backs, hoods over their heads.

While the security services had sixty staffers struggling to advance their investigation, he watched Bellingcat swiftly pull together the route of the Buk launcher from open sources alone. When he pointed this out to colleagues, some were receptive. But the police force was conservative and preferred traditional methods of investigation. Soon thereafter, he completed a Ph.D. on anticipating criminal behaviour; with ambitions to use data science for better policing, he quit and founded his own company, Pandora Intelligence. One of its notable projects revamps the emergency-dispatch call using OSINT. In the traditional scenario, a dispatcher answers, notes down what is deemed significant, then forwards a briefing to an emergency service unit.

Yet at the executive level, many organisations – in the news media, human-rights activism, humanitarian law and beyond – still do not realise what is possible. In order to seed these powers among the younger generation, Bellingcat has launched a pilot training programme for university students in the Netherlands, attempting to build a grassroots movement among those studying journalism, data science and visualisation. With Dutch government funding, we conducted our first project in Utrecht, instructing a score of university students aged eighteen to twenty-five. The programme proved so successful that we expanded to five bootcamps. Each involved five days of training, spread over several weeks, allowing students to consolidate and practise the Bellingcat method: geolocation, chronolocation, social-media trawling, and so on.

pages: 296 words: 78,631

Hello World: Being Human in the Age of Algorithms
by Hannah Fry
Published 17 Sep 2018

Those rules had previously been approved in October 2016 by the Federal Communications Commission; but, after the change in government at the end of that year, they were opposed by the FCC’s new Republican majority and Republicans in Congress.15 So what does all this mean for your privacy? Well, let me tell you about an investigation led by German journalist Svea Eckert and data scientist Andreas Dewes that should give you a clear idea.16 Eckert and her team set up a fake data broker and used it to buy the anonymous browsing data of 3 million German citizens. (Getting hold of people’s internet histories was easy. Plenty of companies had an abundance of that kind of data for sale on British or US customers – the only challenge was finding data focused on Germany.)

As Rogier Creemers, an academic specializing in Chinese law and governance at the Van Vollenhoven Institute at Leiden University, puts it: ‘The best way to understand it is as a sort of bastard love child of a loyalty scheme.’29 I don’t have much comfort to offer in the case of Sesame Credit, but I don’t want to fill you completely with doom and gloom, either. There are glimmers of hope elsewhere. However grim the journey ahead appears, there are signs that the tide is slowly turning. Many in the data science community have known about and objected to the exploitation of people’s information for profit for quite some time. But until the furore over Cambridge Analytica these issues hadn’t drawn sustained, international front-page attention. When that scandal broke in early 2018 the general public saw for the first time how algorithms are silently harvesting their data, and acknowledged that, without oversight or regulation, it could have dramatic repercussions.

For more on this, see James Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter than the Few (New York: Doubleday, 2004), p. 4. 22. Netflix Technology Blog, https://medium.com/netflix-techblog/netflix-recommendations-beyond-the-5-stars-part-2-d9b96aa399f5. 23. Shih-ho Cheng, ‘Unboxing the random forest classifier: the threshold distributions’, Airbnb Engineering and Data Science, https://medium.com/airbnb-engineering/unboxing-the-random-forest-classifier-the-threshold-distributions-22ea2bb58ea6. 24. Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig and Sendhil Mullainathan, Human Decisions and Machine Predictions, NBER Working Paper no. 23180 (Cambridge, MA: National Bureau of Economic Research, Feb. 2017), http://www.nber.org/papers/w23180.

pages: 290 words: 87,549

The Airbnb Story: How Three Ordinary Guys Disrupted an Industry, Made Billions...and Created Plenty of Controversy
by Leigh Gallagher
Published 14 Feb 2017

Within that framework, a community-defense team performs proactive work to try to identify suspicious activity in advance, conducting spot checks on reservations and looking for signs that might suggest fraud or bad actors, while a community-response team handles incoming issues. The product team includes data scientists, who create behavioral models to help identify whether a reservation has a higher likelihood of, say, resulting in someone throwing a party or committing a crime (reservations are assigned a credibility score similar to a credit score), and engineers, who use machine learning to develop tools that analyze reservations to help detect risk.

Guesty, a professional management service for hosts, started by Israeli twin brothers, is one of the largest: hosts give Guesty access to their Airbnb accounts, and it handles booking management, all guest communication, calendar updating, and scheduling and coordinating with cleaners and other local service providers, for a fee of 3 percent of the booking charge. San Francisco–based Pillow creates a listing, hires cleaners, handles keys, and employs an algorithm to determine best pricing options. HonorTab brings a minifridge concept to Airbnb. Everbooked was founded by a self-described yield-management geek with expertise in data science who saw the need for dynamic pricing tools for Airbnb hosts. One of the biggest chores that hosts often need help with, for example, is turning over keys to guests. It can be hard to always arrange to be home when the guest arrives, especially if the host has a full-time job, or is out of town, or when travelers’ flights are delayed.

When the entire executive team took the Myers-Briggs Type Indicator personality test at an off-site one year, Blecharczyk registered as an ISTJ personality type, which correlated with the “inspector” role in the related Keirsey Temperament Sorter personality questionnaire. The characterization made the executive team laugh in recognition. (“That’s how they know me,” he says, “as someone who probes the details.”) Over time, Blecharczyk developed an interest in strategy, especially when, as CTO, he began to see more of the insights that were coming out of the data-science department, which reported directly to him. In the summer of 2014, after the executive team was beginning to realize the company wasn’t fully aligned on its many initiatives and goals, Blecharczyk started an “activity map” to document every project being worked on throughout the company. He identified 110 of them, but they were extremely fragmented, with different executives overseeing multiple projects in the same area.

pages: 286 words: 92,521

How Medicine Works and When It Doesn't: Learning Who to Trust to Get and Stay Healthy
by F. Perry Wilson
Published 24 Jan 2023

But we are not here to gawk; we are here to make money. Because tonight, for once, the odds are in our favor. You see, one of my data science interns, in an effort to please his cantankerous boss, has hacked into the servers controlling the slot machines and adjusted their payouts to our benefit. Two machines were successfully hacked. We just need to decide which one to play. Hacked slot machine #1 costs $1 per pull on the lever. The payout for a jackpot is $200. And thanks to my data science intern, we will hit that jackpot, on average, one out of every one hundred times. These are great odds, as you can see—chances are that by spending $100, we will win $200.

I spend most of my days at Yale’s Clinical and Translational Research Accelerator with members of my work family, who have heard about this book for the past two years in one form or another. I am thankful for their indulgence as I tried to get this project over the finish line. The team—researchers, statisticians, data scientists, technicians, students, coordinators, administrators—are all consummate scientific professionals, but they are also wonderful, compassionate individuals. They are the very engine of progress. Thanks especially to Deb Kearns, who somehow managed to keep my schedule intact during this period. Butterfly to the moon.

These are great odds, as you can see—chances are that by spending $100, we will win $200. It’s not guaranteed, of course, but it’s a hell of a lot better than the normal house odds. Hacked slot machine #2 is in the high-roller area. It costs $1,000 per pull on the lever. Like slot machine #1, the data science intern has hacked it to pay out, on average, one out of every one hundred plays. But this machine has an even better jackpot: $300,000. That’s right. We have a one in one hundred chance for a jackpot on both machines, but the jackpot for machine #1 is two hundred times the price to play, whereas it is three hundred times the price to play for machine #2.
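The arithmetic behind the comparison is worth writing out. Using only the numbers given in the text, the expected profit per pull is:

```python
# Expected profit per pull for each hacked machine, from the numbers in the text.
p_jackpot = 1 / 100                     # both machines hit one time in a hundred

cost_1, payout_1 = 1, 200
cost_2, payout_2 = 1_000, 300_000

ev_1 = p_jackpot * payout_1 - cost_1    # 0.01 * 200     - 1     = +$1 per pull
ev_2 = p_jackpot * payout_2 - cost_2    # 0.01 * 300,000 - 1,000 = +$2,000 per pull

print(f"Machine #1: expected profit ${ev_1:,.2f} per $1 pull")
print(f"Machine #2: expected profit ${ev_2:,.2f} per $1,000 pull")
```

On expected value alone machine #2 looks far better per pull; the trade-off the text sets up is that each of its pulls also costs a thousand times more.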

Mastering Machine Learning With Scikit-Learn
by Gavin Hackeling
Published 31 Oct 2014

His work has appeared at top dependability conferences—DSN, ISSRE, ICAC, Middleware, and SRDS—and he has been awarded grants to attend DSN, ICAC, and ICNP. Fahad has also been an active contributor to security research while working as a cybersecurity engineer at NEEScomm IT. He has recently taken on a position as a systems engineer in the industry. Sarah Guido is a data scientist at Reonomy, where she's helping build disruptive technology in the commercial real estate industry. She loves Python, machine learning, and the startup world. She is an accomplished conference speaker and an O'Reilly Media author, and is very involved in the Python community. Prior to joining Reonomy, Sarah earned a Master's degree from the University of Michigan School of Information.

Packed with several machine learning libraries available in the Clojure ecosystem. Machine Learning with R ISBN: 978-1-78216-214-8 Paperback: 396 pages Learn how to use R to apply powerful machine learning methods and gain an insight into real-world applications 1. Harness the power of R for statistical computing and data science. 2. Use R to apply common machine learning algorithms with real-world applications. 3. Prepare, examine, and visualize data for analysis. 4. Understand how to choose between machine learning models. Please check www.PacktPub.com for information on our titles www.it-ebooks.info

Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data
by Dipanjan Sarkar
Published 1 Dec 2016

Automated Text Classification Text Classification Blueprint Text Normalization Feature Extraction Bag of Words Model TF-IDF Model Advanced Word Vectorization Models Classification Algorithms Multinomial Naïve Bayes Support Vector Machines Evaluating Classification Models Building a Multi-Class Classification System Applications and Uses Summary Chapter 5:​ Text Summarization Text Summarization and Information Extraction Important Concepts Documents Text Normalization Feature Extraction Feature Matrix Singular Value Decomposition Text Normalization Feature Extraction Keyphrase Extraction Collocations Weighted Tag–Based Phrase Extraction Topic Modeling Latent Semantic Indexing Latent Dirichlet Allocation Non-negative Matrix Factorization Extracting Topics from Product Reviews Automated Document Summarization Latent Semantic Analysis TextRank Summarizing a Product Description Summary Chapter 6:​ Text Similarity and Clustering Important Concepts Information Retrieval (IR) Feature Engineering Similarity Measures Unsupervised Machine Learning Algorithms Text Normalization Feature Extraction Text Similarity Analyzing Term Similarity Hamming Distance Manhattan Distance Euclidean Distance Levenshtein Edit Distance Cosine Distance and Similarity Analyzing Document Similarity Cosine Similarity Hellinger-Bhattacharya Distance Okapi BM25 Ranking Document Clustering Clustering Greatest Movies of All Time K-means Clustering Affinity Propagation Ward’s Agglomerative Hierarchical Clustering Summary Chapter 7:​ Semantic and Sentiment Analysis Semantic Analysis Exploring WordNet Understanding Synsets Analyzing Lexical Semantic Relations Word Sense Disambiguation Named Entity Recognition Analyzing Semantic Representations Propositional Logic First Order Logic Sentiment Analysis Sentiment Analysis of IMDb Movie Reviews Setting Up Dependencies Preparing Datasets Supervised Machine Learning Technique Unsupervised Lexicon-based Techniques Comparing Model Performances Summary Index Contents at a Glance About the Author About the Technical Reviewer Acknowledgments Introduction Chapter 1:​ Natural Language Basics Chapter 2:​ Python Refresher Chapter 3:​ Processing and Understanding Text Chapter 4:​ Text Classification Chapter 5:​ Text Summarization Chapter 6:​ Text Similarity and Clustering Chapter 7:​ Semantic and Sentiment Analysis Index About the Author and About the Technical Reviewer About the Author Dipanjan Sarkar is a data scientist at Intel, the world’s largest silicon company, which is on a mission to make the world more connected and productive. He primarily works on analytics, business intelligence, application development, and building large-scale intelligent systems. He received his master’s degree in information technology from the International Institute of Information Technology, Bangalore, with a focus on data science and software engineering. He is also an avid supporter of self-learning, especially through massive open online courses, and holds a data science specialization from Johns Hopkins University on Coursera.

SSBM Finance Inc is a Delaware corporation. This book is dedicated to my parents, partner, well-wishers, and especially to all the developers, practitioners, and organizations who have created a wonderful and thriving ecosystem around analytics and data science. Introduction I have been into mathematics and statistics since high school, when numbers began to really interest me. Analytics, data science, and more recently text analytics came much later, perhaps around four or five years ago when the hype about Big Data and Analytics was getting bigger and crazier. Personally I think a lot of it is over-hyped, but a lot of it is also exciting and presents huge possibilities with regard to new jobs, new discoveries, and solving problems that were previously deemed impossible to solve.

Sarkar has been an analytics practitioner for over four years, specializing in statistical, predictive, and text analytics. He has also authored a couple of books on R and machine learning, reviews technical books, and acts as a course beta tester for Coursera. Dipanjan’s interests include learning about new technology, financial markets, disruptive startups, data science, and more recently, artificial intelligence and deep learning. In his spare time he loves reading, gaming, and watching popular sitcoms and football. About the Technical Reviewer Shanky Sharma Currently leading the AI team at Nextremer India, Shanky Sharma’s work entails implementing various AI and machine learning–related projects and working on deep learning for speech recognition in Indic languages.

pages: 567 words: 122,311

Lean Analytics: Use Data to Build a Better Startup Faster
by Alistair Croll and Benjamin Yoskovitz
Published 1 Mar 2013

They’re uneasy with their companies being optimized without a soul, and see the need to look at the bigger picture of the market, the problem they’re solving, and their fundamental business models. Ultimately, quantitative data is great for testing hypotheses, but it’s lousy for generating new ones unless combined with human introspection. How to Think Like a Data Scientist Monica Rogati, a data scientist at LinkedIn, gave us the following 10 common pitfalls that entrepreneurs should avoid as they dig into the data their startups capture. Assuming the data is clean. Cleaning the data you capture is often most of the work, and the simple act of cleaning it up can often reveal important patterns.

Who This Book Is For This book is for the entrepreneur trying to build something innovative. We’ll walk you through the analytical process, from idea generation to achieving product/market fit and beyond, so this book both is for those starting their entrepreneurial journey as well as those in the middle of it. Web analysts and data scientists may also find this book useful, because it shows how to move beyond traditional “funnel visualizations” and connect their work to more meaningful business discussions. Similarly, business professionals involved in product development, product management, marketing, public relations, and investing will find much of the content relevant, as it will help them understand and assess startups.

Remember, however, that you may still be able to invite them back to the service later if you have significant feature upgrades—as Path did when it redesigned its application—or if you’ve found a way to reach them with daily content, as Memolane did when it sent users memories from past years. As Shopify data scientist Steven H. Noble[32] explains in a detailed blog post,[33] the simple formula for churn is: Table 9-1 shows a simple example of a freemium SaaS company’s churn calculations.

Table 9-1. Example of churn calculations

                     Jan      Feb      Mar      Apr      May      Jun
Users
  Starting with    50,000   53,000   56,300   59,930   63,923   68,315
  Newly acquired    3,000    3,600    4,320    5,184    6,221    7,465
  Total            53,000   56,600   60,920   66,104   72,325   79,790
Active users
  Starting with    14,151   15,000   15,900   16,980   18,276   19,831
  Newly active        849      900    1,080    1,296    1,555    1,866
  Total            15,000   15,900   16,980   18,276   19,831   21,697
Paying users
  Starting with     1,000    1,035    1,035    1,049    1,079    1,128
  Newly acquired       60       72       86      104      124      149
  Lost               (25)     (26)     (27)     (29)     (30)     (33)
  Total             1,035    1,081    1,140    1,216    1,310    1,426

Table 9-1 shows users, active users, and paying users.
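The formula itself does not survive in this excerpt. The sketch below therefore uses the standard simple definition (customers lost during the period divided by customers at the start of the period), which is consistent with Table 9-1's paying-user numbers but is an assumption rather than Noble's exact formulation:

```python
# Simple churn for Table 9-1's January column, using the common definition:
# churn rate = customers lost during the period / customers at the start.
starting_paying_users = 1_000
newly_acquired = 60
lost = 25

churn_rate = lost / starting_paying_users
ending_paying_users = starting_paying_users + newly_acquired - lost

print(f"January churn: {churn_rate:.1%}")                    # 2.5%
print(f"Paying users at month end: {ending_paying_users:,}") # 1,035, matching the table
```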

Artificial Whiteness
by Yarden Katz

This position is echoed by other not-for-profits in the sphere of critical AI commentary, such as the American Civil Liberties Union (ACLU). At an AI policy conference hosted at MIT, which included participants from the American intelligence community and the White House, the ACLU’s executive director stated that “AI has tremendous promise, but it really depends if the data scientists and law enforcement work together.”24 These recommendations highlight the expansionist dimension of carceral-positive logic. Reforms such as data “curation,” “maintenance,” or “auditing” all entail allocating more resources for computing systems used in surveillance and policing and the infrastructure around them, and hence to those profiting from the prison-industrial complex.

AI’s rebranding was an opportunity for companies to privatize more scientific research. Google and Microsoft have used the frenzy around AI to file patents on commonly used algorithmic techniques.23 The label’s nebulosity makes for broad patents: a MasterCard–owned company filed for a patent on a “Method for Providing Data Science, Artificial Intelligence and Machine Learning As-A-Service.” AI made for hazy patents in the past, too: a patent in 1985, simply titled “Artificial Intelligence System,” claimed “an artificial intelligence system for accepting a statement, understanding the statement and making a response to the statement based upon at least a partial understanding of the statement.”

The ideological nucleus of AI discourse stays intact.   15.   As computer scientist and statistician Michael I. Jordan has observed, “Such [re]labeling may come as a surprise to optimization or statistics researchers, who find themselves suddenly called AI researchers.” Jordan, “Artificial Intelligence—The Revolution Hasn’t Happened Yet,” Harvard Data Science Review, June 23, 2019. But while Jordan recognizes a rebranding of sorts, he does not question the possibility and coherence of AI nor the political forces capitalizing on its rise. Rather, he argues that the real workhorses behind the recent advances were his own fields of statistics and optimization, which produced “ideas hidden behind the scenes” that “have powered companies such as Google, Netflix, Facebook, and Amazon.”

pages: 291 words: 90,771

Upscale: What It Takes to Scale a Startup. By the People Who've Done It.
by James Silver
Published 15 Nov 2018

‘If you don’t have good people taking control of those areas, who are able to step up and be accountable for key issues like hiring, product and growth, then you won’t really know where to turn as a founder.’ ‘Communication and ongoing dialogue are critical.’ As a founder, you need to make sure that your teams - right down to specialists, if you’re a software company, such as your back-end and front-end developers, data scientists and DevOps person - are talking to one another, and that people are pointed in the same direction. Obviously the more people you have in the organisation, the more challenging that becomes to manage and orchestrate. ‘That’s where things like strategy, culture, goals and objectives become really important because the larger, the more complex the organisation becomes, the more you need things that really bind people together.’

It even got to the point where we were considering putting trackers in the boxes, but it was a time of heightened security concerns and we thought radio transmitters were not a good idea.’ He smiles. ‘Then someone asked: “Can’t you just pay to have these things scanned?” It turned out you could, at seven different places. You just chunked up the money and did it once and did a data science piece [based on] posting them on different days, different strategies, and posting them in different places.’ As a UK-headquartered startup on a limited budget exploring logistics across a vast and complex country, the team ultimately took an imaginative, entrepreneurial approach: namely, one that involved cardboard rabbits.

She explains: ‘When you’re using machine learning, every time you get an answer right, that’s good. However, every time you get an answer wrong, it’s called a “false positive”. There were very few people in the world at that stage that we met that even understood the terminology. So finding the guys that were fighting the fraud battles initially was brilliant, because we spoke the same data-science language, that was the first thing. ‘Then they were able to tell us about the problems they were having with their existing suppliers. We replied: “If we were able to build you something that actually worked better, would that be of interest to you?” And the answer to that was: “Yes, please.”’ King and her team were thus in the enviable position of designing and building a product with continuous input from the customer.

pages: 237 words: 74,109

Uncanny Valley: A Memoir
by Anna Wiener
Published 14 Jan 2020

* * * I moved into an apartment in the Castro, joining a man and a woman in their late twenties, roommates who had wiggled their way onto a hand-me-down lease. They were tech workers, too. The woman worked as a midlevel product manager at the social network everyone hated; the man as a data scientist at a struggling solar-energy startup. They were both endurance runners; the data scientist kept a road bike in his bedroom. They had no body fat. They had no art in the apartment, either. On the refrigerator was an impressive collection of novelty magnets arranged in a perfect grid. The apartment was gigantic, a duplex with two living rooms and a view of the bay.

It could be integrated into online boutiques, digital megamalls, banks, social networks, streaming and gaming websites. It gathered data for platforms that enabled people to book flights or hotels or restaurant reservations or wedding venues; platforms for buying a house or finding a house cleaner, ordering takeout or arranging a date. Engineers and data scientists and product managers would inject snippets of our code into their own codebases, specify which behaviors they wanted to track, and begin collecting data immediately. Anything an app or website’s users did—tap a button, take a photograph, send a payment, swipe right, enter text—could be recorded in real time, stored, aggregated, and analyzed in those beautiful dashboards.

A pay-as-you-wish yoga studio shared a creaky walk-up with the headquarters of an encrypted-communications platform. A bodega selling loosies sat below an anarchistic hacker space. The older office buildings, regal and unkempt with marble floors and peeling paint, housed orthodontists and rare-book dealers alongside four-person companies trying to gamify human resources or commoditize meditation. Data scientists smoked weed in Dolores Park with Hula-Hoopers and blissed-out suburban teenagers. The independent movie theaters played ads for networked appliances and B2B software before projecting seventies cult classics. Even racks at the dry cleaner suggested a city in transition: starched police uniforms and synthetic neon furs, sheathed in plastic, hung beside custom-made suits and machine-washable pullovers.

pages: 157 words: 53,125

The Fifth Risk
by Michael Lewis
Published 1 Oct 2018

The company was about to go public, and they wanted to clean up the organization chart. To that end DJ sat down with his counterpart at Facebook, who was dealing with the same problem. What could they call all these data people? “Data scientist,” his Facebook friend suggested. “We weren’t trying to create a new field or anything, just trying to get HR off our backs,” said DJ. He replaced the job titles for some openings with “data scientist.” To his surprise, the number of applicants for the jobs skyrocketed. “Data scientists” were what people wanted to be. In the fall of 2014 someone from the White House called him. Obama was coming to San Francisco and wanted to meet with him.

“How do we know if any of this will be of any use?” she asked. “If your husband is as good as everyone says he is, he’ll figure it out,” said Obama. Which of course made it even harder for DJ to refuse. DJ went to Washington. His assignment was to figure out how to make better use of the data created by the U.S. government. His title: Chief Data Scientist of the United States. He’d be the first person to hold the job. He made his first call at the Department of Commerce, to meet with Penny Pritzker, the commerce secretary, and Kathy Sullivan, the head of the National Oceanic and Atmospheric Administration. They were pleased to see him but also a bit taken aback that he had come.

“We’re going to open all the data and go to every economics department and say, ‘Hey, you want a PhD?’ In every agency there were questions to be answered. Most of the answers we have gotten have not come from government. They’ve come from the broad American public who has access to the data.” The opioid crisis was a case in point. The data scientists in the Department of Health and Human Services had opened up the Medicaid and Medicare data, which held information about prescription drugs. Journalists at ProPublica had combed through it and discovered odd concentrations of opioid prescriptions. “We would never have figured out that there was an opioid crisis without the data,” said DJ.

pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack
by Matthew A. Russell
Published 15 Jan 2011

Further analysis of the graph is left as a voluntary exercise for the reader, as the primary objective of this chapter was to get your development environment squared away and whet your appetite for more interesting topics. Graphviz appears elsewhere in this book, and if you consider yourself to be a data scientist (or are aspiring to be one), it is a tool that you’ll want to master. That said, we’ll also look at many other useful approaches to visualizing graphs. In the chapters to come, we’ll cover additional outlets of social web data and techniques for analysis. Synthesis: Visualizing Retweets with Protovis A turn-key example script that synthesizes much of the content from this chapter and adds a visualization is how we’ll wrap up this chapter.

The approach introduced in this section is to use graph-like structures, where a link between documents encodes a measure of the similarity between them. This situation presents an excellent opportunity to introduce more visualizations from Protovis, a cutting-edge HTML5-based visualization toolkit under development by the Stanford Visualization Group. Protovis is specifically designed with the interests of data scientists in mind, offers a familiar declarative syntax, and achieves a nice middle ground between high-level and low-level interfaces. A minimal (uninteresting) adaptation to Example 7-7 is all that’s needed to emit a collection of nodes and edges that can be used to produce visualizations of our data examples gallery.
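Example 7-7 itself isn't reproduced in this excerpt. The general shape of the adaptation described, turning pairwise document similarities into a node/edge collection that a toolkit like Protovis can consume, can be sketched as follows, with the documents, the similarity measure, and the cutoff all invented for illustration:

```python
import json
from itertools import combinations

# Invented documents and a toy similarity measure (shared-word Jaccard score).
docs = {
    "doc1": "social web data mining",
    "doc2": "mining data from the social web",
    "doc3": "cooking with cast iron",
}

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

nodes = [{"id": name} for name in docs]
edges = [
    {"source": a, "target": b, "weight": round(jaccard(docs[a], docs[b]), 2)}
    for a, b in combinations(docs, 2)
    if jaccard(docs[a], docs[b]) > 0.2   # keep only reasonably similar pairs
]

# Emit the node/edge collection as JSON for a browser-side visualization to load.
print(json.dumps({"nodes": nodes, "edges": edges}, indent=2))
```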

Levitt is the co-author of Freakonomics: A Rogue Economist Explores the Hidden Side of Everything (Harper), a book that systematically uses data to answer seemingly radical questions such as, “What do school teachers and sumo wrestlers have in common?” [34] This question was partly inspired by the interesting Radar post, “Data science democratized”, which mentions a presentation that investigated the same question. [35] A “long tail” or “heavy tail” refers to a feature of statistical distributions in which a significant portion (usually 50 percent or more) of the area under the curve exists within its tail. This concept is revisited as part of a brief overview of Zipf’s law in Data Hacking with NLTK.
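As a rough numerical illustration of that footnote (not taken from the book), an idealized Zipf distribution gives the item at rank k a frequency proportional to 1/k, and even for a modest vocabulary roughly half of the total mass sits outside the top-ranked items:

```python
# Illustrative only: under an idealized Zipf's law, frequency at rank k ~ 1/k,
# so the many rare items in the "tail" jointly account for a large share.
N = 10_000                              # hypothetical vocabulary size
freqs = [1.0 / k for k in range(1, N + 1)]
total = sum(freqs)
head = sum(freqs[:100])                 # mass in the top 100 ranks
print(f"tail share of total mass: {(total - head) / total:.0%}")  # roughly half
```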

pages: 100 words: 15,500

Getting Started with D3
by Mike Dewar
Published 26 Jun 2012

The data for this book has been gathered and made publicly available by the New York Metropolitan Transit Authority (MTA) and details various aspects of New York’s transit system, comprising historical tables, live data streams, and geographical information. By the end of the book, we will have visited some of the core aspects of D3, and will be properly equipped to build basic, interactive data visualizations on the Web. Who This Book Is For This is a little book aimed at the data scientist: someone who has data to visualize and who wants to use the power of the modern web browser to give his visualizations additional impact. This might be an academic who wants to escape the confines of the printed article, a statistician who needs to share their impressive results with the rest of her company, or the designer who wants to get his info-viz out far and wide on the Internet.

Note Time spent forming clean, well-structured JSON can save you a lot of heartache down the road. Make sure any JSON you use satisfies http://jsonlint.com at the very least. Performing cleaning or data analysis in the browser is not only a frustrating programming task, but can also make your visualization less responsive. Micha’s Golden Rule Micha Gorelick, a data scientist in NYC, coined the following rule: Do not store data in the keys of a JSON blob. This is Micha’s Golden Rule; it should always be followed when forming JSON for use in D3, and will save you many confusing hours. This means that one should never form JSON like the following: { "bob": 20, "alice": 23, "drew": 30 } Here we are storing data in both the key (name) and the value (age).
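A minimal sketch of what following Micha’s Golden Rule looks like in practice: lift the names out of the keys and into explicit fields, so D3 receives a plain array of records. The field names "name" and "age" are simply one reasonable choice.

```python
# Reshape key-keyed JSON into a list of records that obeys Micha's Golden Rule.
import json

bad = {"bob": 20, "alice": 23, "drew": 30}  # data hidden in the keys

good = [{"name": name, "age": age} for name, age in bad.items()]

print(json.dumps(good, indent=2))
# [{"name": "bob", "age": 20}, {"name": "alice", "age": 23}, ...]
```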

This is an essential resource, both for reference and inspiration. Finally, the community around D3 is very active and friendly, and growing fast. The d3-js user group is a great resource for conversation and the d3.js tag on Stack Overflow should be used for specific questions. About the Author Mike Dewar is a data scientist at Bitly, a New York tech company that makes long URLs shorter. He has a PhD in modelling dynamic systems from data, from the University of Sheffield in the UK, and has worked as a Machine Learning post-doc at the University of Edinburgh and Columbia University. He has been drawing graphs regularly since he was in high school, and is starting to get the hang of it.

pages: 350 words: 90,898

A World Without Email: Reimagining Work in an Age of Communication Overload
by Cal Newport
Published 2 Mar 2021

To help understand the true scarcity of uninterrupted time, the RescueTime data scientists also calculated the longest interval that each user worked with no inbox checks or instant messaging. For half the users studied, this longest uninterrupted interval was no more than forty minutes, with the most common length clocking in at a meager twenty minutes. More than two thirds of the users never experienced an hour or more of uninterrupted time during the period studied. To make these observations more concrete, Madison Lukaczyk, one of the data scientists involved in this report, published a chart capturing one full week of her own communication tool usage data.
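A small sketch of the kind of metric described here, computing each user’s longest stretch between consecutive checks of a communication app; the DataFrame, column names, and timestamps are invented, since RescueTime’s actual pipeline is not public.

```python
# For each user, find the largest gap between consecutive communication checks.
import pandas as pd

checks = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2", "u2"],
    "checked_at": pd.to_datetime([
        "2018-06-04 09:00", "2018-06-04 09:20", "2018-06-04 10:05",
        "2018-06-04 09:00", "2018-06-04 09:06",
    ]),
})

longest_gap = (
    checks.sort_values(["user", "checked_at"])
          .groupby("user")["checked_at"]
          .apply(lambda ts: ts.diff().max())  # largest gap between checks
)
print(longest_gap)  # u1: 45 minutes, u2: 6 minutes
```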

But the studies cited provide only small snapshots of our current predicament, with the typical experiment observing at most a couple dozen employees for just a handful of days. For a more comprehensive picture of what’s going on in the standard networked office, we’ll turn to a small productivity software firm called RescueTime, which in recent years, with the help of a pair of dedicated data scientists, has been quietly producing a remarkable data set that allows an unprecedented look into the details of the communication habits of contemporary knowledge workers. * * * — The core product of RescueTime is its eponymous time-tracking tool, which runs in the background on your devices and records how much time you spend using various applications and websites.

Because the tool is a web application, however, all this data is stored in central servers, which makes it possible to aggregate and analyze the time use habits of tens of thousands of users. After a few false starts, RescueTime got serious about getting these analyses right. In 2016 they hired a pair of full-time data scientists, who transformed the data into the right format to study trends and properly protect privacy, and then got to work trying to understand how these modern, productivity-minded knowledge workers were actually spending their time. The results were staggering. A report from the summer of 2018 analyzed anonymized behavior data from over fifty thousand active users of the tracking software.9 It reveals that half these users were checking communication applications like email and Slack every six minutes or less.

pages: 252 words: 71,176

Strength in Numbers: How Polls Work and Why We Need Them
by G. Elliott Morris
Published 11 Jul 2022

Major media outlets, such as the Washington Post, now regularly work with online pollsters, such as Ipsos. Public opinion polling, at long last, has entered the twenty-first century. STIRRING THE POT In the Obama campaign’s data cave on Election Day 2012, things were not looking so good. “It was 10:30am . . . and my numbers were telling me that President Obama might lose Ohio,” Yair Ghitza, a data scientist for a voter-file vendor called Catalist, recounts in his PhD dissertation. Data on turnout had come in, and his modeling showed that young people and minorities were turning out at lower rates than they had expected: I remember one senior analyst’s ominous interpretation: “it looked like this could be real.”

According to Nolan McCaskill, a reporter for Politico, even Trump did not envision a victory.3 TWO DAYS AFTER THE 2016 ELECTION, the New York Times published a story titled “How Data Failed Us in Calling an Election.” In it, technology journalists Steve Lohr and Natasha Singer chastised election forecasters for getting the contest “wrong.” They decried the media’s growing reliance on data to handicap the horse race. “Data science is a technology advance with trade-offs,” they wrote. “It can see things as never before, but also can be a blunt instrument, missing context and nuance.” Lohr and Singer charged election forecasters and handicappers with taking their eyes off the ball, focusing on the data without considering other sources for prognostication.4 Election forecasters, in turn, blamed the pollsters.

pages: 294 words: 77,356

Automating Inequality
by Virginia Eubanks

The movement manufactures and circulates misleading stories about the poor: that they are an undeserving, fraudulent, dependent, and immoral minority. Conservative critics of the welfare state continue to run a very effective propaganda campaign to convince Americans that the working class and the poor must battle each other in a zero-sum game over limited resources. More quietly, program administrators and data scientists push high-tech tools that promise to help more people, more humanely, while promoting efficiency, identifying fraud, and containing costs. The digital poorhouse is framed as a way to rationalize and streamline benefits, but the real goal is what it has always been: to profile, police, and punish the poor. 2 AUTOMATING ELIGIBILITY IN THE HEARTLAND A little white donkey is chewing on a fencepost where we turn toward the Stipes house on a narrow utility road paralleling the train tracks in Tipton, Indiana.

A 14-year-old living in a cold and dirty house gets a risk score almost three times as high as a 6-year-old whose mother suspects he may have been abused and who may now be homeless. In these cases, the model does not seem to meet a commonsense standard for providing information useful enough to guide call screeners’ decision-making. Why might that be? Data scientist Cathy O’Neil has written that “models are opinions embedded in mathematics.”8 Models are useful because they let us strip out extraneous information and focus only on what is most critical to the outcomes we are trying to predict. But they are also abstractions. Choices about what goes into them reflect the priorities and preoccupations of their creators.

But its outwardly neutral classifications mask discriminatory outcomes that rob whole communities of wealth, compounding cumulative disadvantage. The digital poorhouse replaces the sometimes-biased decision-making of frontline social workers with the rational discrimination of high-tech tools. Administrators and data scientists focus public attention on the bias that enters decision-making systems through caseworkers, property managers, service providers, and intake center workers. They obliquely accuse their subordinates, often working-class people, of being the primary source of racist and classist outcomes in their organizations.

Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
by Aurelien Geron
Published 14 Aug 2019

Then, before we set out to explore the Machine Learning continent, we will take a look at the map and learn about the main regions and the most notable landmarks: supervised versus unsupervised learning, online versus batch learning, instance-based versus model-based learning. Then we will look at the workflow of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune a Machine Learning system. This chapter introduces a lot of fundamental concepts (and jargon) that every data scientist should know by heart. It will be a high-level overview (the only chapter without much code), all rather simple, but you should make sure everything is crystal-clear to you before continuing to the rest of the book. So grab a coffee and let’s get started! Tip If you already know all the Machine Learning basics, you may want to skip directly to Chapter 2.

Poor-Quality Data Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well. It is often well worth the effort to spend time cleaning up your training data. The truth is, most data scientists spend a significant part of their time doing just that. For example: If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually. If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it, and so on.
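A minimal sketch of the “fill in the missing values with the median” option, using scikit-learn’s SimpleImputer; the toy age column is made up.

```python
# Impute missing ages with the median of the observed values.
import numpy as np
from sklearn.impute import SimpleImputer

ages = np.array([[25.0], [31.0], [np.nan], [47.0], [np.nan]])  # 2 of 5 missing

imputer = SimpleImputer(strategy="median")
ages_filled = imputer.fit_transform(ages)

print(imputer.statistics_)  # learned median: [31.]
print(ages_filled.ravel())  # [25. 31. 31. 47. 31.]
```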

It’s just boring Pandas code that joins the life satisfaction data from the OECD with the GDP per capita data from the IMF. 7 It’s okay if you don’t understand all the code yet; we will present Scikit-Learn in the following chapters. 8 For example, knowing whether to write “to,” “two,” or “too” depending on the context. 9 Figure reproduced with permission from Banko and Brill (2001), “Learning Curves for Confusion Set Disambiguation.” 10 “The Unreasonable Effectiveness of Data,” Peter Norvig et al. (2009). 11 “The Lack of A Priori Distinctions Between Learning Algorithms,” D. Wolpert (1996). Chapter 2. End-to-End Machine Learning Project In this chapter, you will go through an example project end to end, pretending to be a recently hired data scientist in a real estate company.1 Here are the main steps you will go through: Look at the big picture. Get the data. Discover and visualize the data to gain insights. Prepare the data for Machine Learning algorithms. Select a model and train it. Fine-tune your model. Present your solution.

pages: 255 words: 78,207

Web Scraping With Python: Collecting Data From the Modern Web
by Ryan Mitchell
Published 14 Jun 2015

All tables in MySQL must have at least one primary key (the key column that MySQL sorts on), so that MySQL knows how to order it, and it can often be difficult to choose these keys intelligently. The debate over whether to use an artificially created id column for this key or some unique attribute such as username has raged among data scientists and software engineers for years, although I tend to lean on the side of creating id columns. The reasons for doing it one way or the other are complicated but for nonenterprise systems, you should always be using an id column as an autoincremented primary key. Second, use intelligent indexing.
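A sketch of the convention argued for here: a surrogate, auto-incremented id column as the primary key, with the natural attribute kept under a separate unique constraint. The table and column names are hypothetical, and the DDL can be run through whichever MySQL client or driver you prefer.

```python
# Example DDL using an auto-incremented surrogate key as the primary key.
CREATE_PAGES = """
CREATE TABLE pages (
    id INT NOT NULL AUTO_INCREMENT,    -- surrogate key MySQL sorts on
    username VARCHAR(200),             -- natural attribute, not the key
    content TEXT,
    created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    UNIQUE KEY uq_username (username)
);
"""

if __name__ == "__main__":
    print(CREATE_PAGES)  # paste into your MySQL client of choice
```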

Appendix C includes case studies, as well as a breakdown of key issues that might affect how you can legally run scrapers in the United States and use the data that they produce. Technical books are often able to focus on a single language or technology, but web scraping is a relatively disparate subject, with practices that require the use of databases, web servers, HTTP, HTML, Internet security, image processing, data science, and other tools. This book attempts to cover all of these to an extent for the purpose of gathering data from remote sources across the Internet. Part I covers the subject of web scraping and web crawling in depth, with a strong focus on a small handful of libraries used throughout the book. Part I can easily be used as a comprehensive reference for these libraries and techniques (with certain exceptions, where additional references will be provided).

pages: 318 words: 73,713

The Shame Machine: Who Profits in the New Age of Humiliation
by Cathy O'Neil
Published 15 Mar 2022

In their massive research labs, mathematicians work closely with psychologists and anthropologists, using our behavioral data to train their machines. Their objective is to spur customer participation and to mine advertising gold. When it comes to this type of intense engagement, shame is one of the most potent motivators. It’s right up there with sex. So even if the data scientists and their bosses in the executive suites might not map out a strategy based on shaming, their automatic algorithms zero in on it. It spurs traffic and boosts revenue. You could argue that the people mocking Joanna McCabe didn’t intend to hurt her. They were just having a laugh. The photo of her tumble at Walmart provided an opportunity to preen on social media and to drive up reputations, gaining likes and followers.

See also labor and employment work requirements for government benefits, 60–61, 66–67, 76 work therapy rehab programs, 48, 54–56 Z Zaki, Jamil, 213 BY CATHY O’NEIL Doing Data Science (with Rachel Schutt) On Being a Data Skeptic Weapons of Math Destruction The Shame Machine About the Author CATHY O’NEIL is the author of the bestselling Weapons of Math Destruction, which won the Euler Book Prize and was longlisted for the National Book Award. She received her PhD in mathematics from Harvard and has worked in finance, tech, and academia.

pages: 267 words: 72,552

Reinventing Capitalism in the Age of Big Data
by Viktor Mayer-Schönberger and Thomas Ramge
Published 27 Feb 2018

This wouldn’t work in every market—getting a good deal on a car can save a person thousands of dollars and may well be worth the effort. But for clothes, as Stitch Fix shows, it’s a model that can be successful. To achieve service at scale, Stitch Fix analyzes rich and comprehensive data streams. By 2016, the company employed more than seventy data scientists on a team headed by chief algorithm officer Eric Colson. Colson ran data science at Netflix, one of the rich-data pioneers. As it turns out, though, picking the right clothes is much harder than suggesting films to watch. Stitch Fix employs vastly more sophisticated data analytics than the standard social-filtering recommendation engines (“people who liked this movie also enjoyed this one”).

pages: 250 words: 79,360

Escape From Model Land: How Mathematical Models Can Lead Us Astray and What We Can Do About It
by Erica Thompson
Published 6 Dec 2022

If modellers, because of who they are, are able to work from home on a laptop while other people maintain the supply chains that bring them their lunchtime sandwich, they simply may not have access to all of the possible harms of the kinds of actions that are being proposed. This is not a conspiracy theory. Instead, this is what data scientists Catherine D’Ignazio and Lauren Klein term ‘privilege hazard’. It reflects no malice aforethought on the part of the modellers, who I am sure were doing their level best at a time of national crisis. Nor does it necessarily mean that the wrong information was given: after all, the modellers provided what the politicians said they wanted, and in a democracy it is the job of the political representative (not the scientist) to combine the available information with the values and interests of citizens to come to a decision.

The Dissemination of Reliable Knowledge, Cambridge University Press, 2010 Mansnerus, Erika, Modelling in Public Health Research: How Mathematical Techniques Keep us Healthy, Springer, 2014 Neustadt, Richard, and Harvey Fineberg, The Swine Flu Affair: Decision-making on a Slippery Disease, US Dept of Health, Education and Welfare, 1978 Rhodes, Tim, Kari Lancaster and Marsha Rosengarten, ‘A Model Society: Maths, Models and Expertise in Viral Outbreaks’, Critical Public Health, 30(3), 2020 ——, and Kari Lancaster, ‘Mathematical Models as Public Troubles in COVID-19 Infection Control: Following the Numbers’, Health Sociology Review, 29(2), 2020 Richardson, Eugene, ‘Pandemicity, COVID-19 and the Limits of Public Health Science’, BMJ Global Health, 5(4), 2020 Spinney, Laura, Pale Rider: The Spanish Flu of 1918 and How it Changed the World, Vintage, 2018 Chapter 10: Escaping from Model Land Dhami, Mandeep, ‘Towards an Evidence-Based Approach to Communicating Uncertainty in Intelligence Analysis’, Intelligence and National Security, 33(2), 2018 Harding, Sandra, Objectivity and Diversity: Another Logic of Scientific Research, University of Chicago Press, 2015 Marchau, Vincent, Warren Walker, Pieter Bloemen and Steven Popper (eds), Decision Making under Deep Uncertainty, Springer, 2019 Scoones, Ian, and Andy Stirling, The Politics of Uncertainty: Challenges of Transformation, Taylor & Francis, 2020 Chris Vernon Erica Thompson is a senior policy fellow at the London School of Economics’ Data Science Institute and a fellow of the London Mathematical Laboratory. With a PhD from Imperial College, she has recently worked on the limitations of models of COVID-19 spread, humanitarian crises, and climate change. She lives in West Wales.

Likewar: The Weaponization of Social Media
by Peter Warren Singer and Emerson T. Brooking
Published 15 Mar 2018

Russian sockpuppets ran rampant on services like Instagram, an image-sharing platform with over 800 million users (larger than Twitter and Snapchat combined) and more popular among youth than its Facebook corporate parent. Here, the pictorial nature of Instagram made the disinformation even more readily shareable and reproducible. In 2017, data scientist Jonathan Albright conducted a study of just twenty-eight accounts identified as having been operated by the Russian government. He found that this handful of accounts had drawn an astounding 145 million “likes,” comments, and plays of their embedded videos. They’d also provided the visual ammunition subsequently used by other trolls who stalked Facebook and Twitter.

One was labeled a “pornographer,” and another was accused of harassment. Such attacks can be doubly effective, not only silencing the direct targets but also discouraging others from doing the sort of work that earned such abuse. While the sockpuppets were extremely active in the 2016 election, it was far from their only campaign. In 2017, data scientists searched for patterns in accounts that were pushing the theme of #UniteTheRight, the far-right protests that culminated in the killing of a young woman in Charlottesville, Virginia, by a neo-Nazi. The researchers discovered that one key account in spreading the messages of hate came to life each day at 8:00 A.M.

As psychologist Sander van der Linden has written, belief in online conspiracy theories makes one more supportive of “extremism, racist attitudes against minority groups (e.g., anti-Semitism) and even political violence.” Modest lies and grand conspiracy theories have been weapons in the political arsenal for millennia. But social media has made them more powerful and more pervasive than ever before. In the most comprehensive study of its kind, MIT data scientists charted the life cycles of 126,000 Twitter “rumor cascades”—the first hints of stories before they could be verified as true or false. The researchers found that the fake stories spread about six times faster than the real ones. “Falsehood diffused significantly farther, faster, deeper, and more broadly than the truth in all categories of information,” they wrote.

pages: 416 words: 112,268

Human Compatible: Artificial Intelligence and the Problem of Control
by Stuart Russell
Published 7 Oct 2019

Faced with the socioeconomic equivalent of becoming pet food, humans will be rather unhappy with their governments. Faced with potentially unhappy humans, governments around the world are beginning to devote some attention to the issue. Most have already discovered that the idea of retraining everyone as a data scientist or robot engineer is a nonstarter—the world might need five or ten million of these, but nowhere close to the billion or so jobs that are at risk. Data science is a very tiny lifeboat for a giant cruise ship.27 Some are working on “transition plans”—but transition to what? We need a plausible destination in order to plan a transition—that is, we need a plausible picture of a desirable future economy where most of what we currently call work is done by machines.

The progress of automation in legal analytics, describing the results of a contest: Jason Tashea, “AI software is more accurate, faster than attorneys when assessing NDAs,” ABA Journal, February 26, 2018. 26. A commentary by a distinguished economist, with a title explicitly evoking Keynes’s 1930 article: Lawrence Summers, “Economic possibilities for our children,” NBER Reporter (2013). 27. The analogy between data science employment and a small lifeboat for a giant cruise ship comes from a discussion with Yong Ying-I, head of Singapore’s Public Service Division. She conceded that it was correct on the global scale, but noted that “Singapore is small enough to fit in the lifeboat.” 28. Support for UBI from a conservative viewpoint: Sam Bowman, “The ideal welfare system is a basic income,” Adam Smith Institute, November 25, 2013. 29.

pages: 245 words: 71,886

Spike: The Virus vs The People - The Inside Story
by Jeremy Farrar and Anjana Ahuja
Published 15 Jan 2021

In meetings at Number 10 and elsewhere, he was always curious, asked the right probing questions of the science and had the capacity to spot the difference between good-quality evidence and bogus science. He listened intently and seemed to be one of the few people who could make things happen at speed across government. Ben Warner is a data scientist who trained at University College London and worked with Cummings at Vote Leave. During February and March 2020, Warner frequently attended SAGE meetings. Cummings also called upon the expertise of Marc Warner, Ben’s brother and also a data scientist, who founded a company called Faculty. Marc had also been involved with Vote Leave and had recently been drafted in to work with NHS Digital. Cummings offers his own account of what happened in this important period, which echoes but also adds to the seven hours of evidence he gave to UK MPs on 26 May 2021.

By that time, Cummings said, there was unanimity among Patrick, Chris, Ben Warner and John Edmunds that intervention was needed. The Prime Minister did not want to act. Cummings claims that, in order to try to change Johnson’s mind, he organised a meeting on Tuesday 22 September 2020, at which Johnson was presented with the case numbers and infections rates by Catherine Cutts, a data scientist newly recruited to Number 10. She showed Johnson the current data and then fast-forwarded a month, to role-play the scenario of infections and deaths projected for October. Cummings says: ‘We presented it all as if we were about six weeks in the future. This was my best attempt to get people to actually see sense and realise that it would be better for the economy as well as for health to get on top of it fast.

He has been praised for his loyal support for other SAGE members and enthusiasm for initiatives like COG-UK. Jonathan Van-Tam An expert on respiratory viruses, Van-Tam is (with Jenny Harries) deputy chief medical officer for England. He commented on Cummings’s Durham excursion that the rules applied to everyone: public adherence might depend on this principle. Ben Warner A data scientist with a doctorate from University College London, Warner worked with Cummings at Vote Leave. He attended SAGE meetings as a Number 10 observer and raised early concerns about the UK’s coronavirus response. Chris Whitty The UK government’s chief medical adviser, Whitty trained in infectious diseases and did a period of study in Vietnam.

pages: 575 words: 140,384

It's Not TV: The Spectacular Rise, Revolution, and Future of HBO
by Felix Gillette and John Koblin
Published 1 Nov 2022

If HBO failed to keep pace with streaming technology it would soon be rendered obsolete. While Otto Berkes lobbied internally for more resources, his team opened up a new HBO office in Seattle, not far from his old stomping grounds at Microsoft. There, he began assembling a new team of engineers, product designers, and data scientists. Everybody who came on board knew the ultimate mission. Beat Netflix. “At the end of the day, I think my story, and the team that I built there, is really about modern tech meets old media,” Berkes says. “We think differently. We’re data driven. It’s not about relationships—couldn’t give a shit.

“We weren’t trying to copy HBO,” says Jonathan Friedland, the former Netflix communications executive. “We were trying to do it better than them. We were trying to do it with a different approach, a data-driven approach, as opposed to pure touch. That’s the fundamental thing to understand about Netflix: It’s a data science and technology company. Every step we took was guided by data. We aspired to the quality level of execution but based on a totally different set of tools.” The data analysis would prove to be spot on. House of Cards would go on to air for six seasons, becoming a defining series for the streaming service, proof that Netflix could master the same “It’s Not TV” game played by HBO.

Now when she talks to screenwriters trying to figure out the ideal home for their projects, the answer, almost surprisingly, remains the same. “HBO still is the one they want to go to,” Strauss says. “There is still a way that they look at things and a process that they go through, which is a cut above.” As the cable era drew to a close, Bloys recognized that the streamniks were right about one thing, data science was immensely useful for certain tasks like marketing, customer retention, and optimization of budgets on a broad scale. But great television, he also knew, would never come from listening to the customers or mining their preferences on the internet. “People don’t know they need the Roys until they meet the Roys,” he says of Succession.

pages: 372 words: 100,947

An Ugly Truth: Inside Facebook's Battle for Domination
by Sheera Frenkel and Cecilia Kang
Published 12 Jul 2021

For the past year, the company’s data scientists had been quietly running experiments that tested how Facebook users responded when shown content that fell into one of two categories: good for the world or bad for the world. The experiments, which were posted on Facebook under the subject line “P (Bad for the world),” had reduced the visibility of posts that people considered “bad for the world.” But while they had successfully demoted them in the News Feed, therefore prompting users to see more posts that were “good for the world” when they logged into Facebook, the data scientists found that users opened Facebook far less after the changes were made.

The experiment laid bare both Facebook’s power to reach deep into the psyche of its users and its willingness to test the boundaries of that power without users’ knowledge. “Emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness,” Facebook data scientists wrote in a research paper published in the Proceedings of the National Academy of Sciences. They described how, over the course of one week in 2012, they had tampered with what almost 700,000 Facebook users saw when they logged on to the platform. In the experiment, some Facebook users were shown content that was overwhelmingly “happy,” while others were shown content that was overwhelmingly “sad.”

“We were announcing all these changes to long-standing policy but treating them as ad hoc, isolated decisions.” Other data were equally alarming. Internal reports also showed a steady rise in extremist groups and conspiracy movements. Facebook’s security team reported incidents of real-world violence, as well as frightening comments made in private groups. Facebook’s data scientists and security officials noted a 300 percent increase, from June through August 2020, in content related to the conspiracy theory QAnon. QAnon believers perpetuated a false theory that liberal elites and celebrities like Bill Gates, Tom Hanks, and George Soros ran a global child trafficking ring.

The Smartphone Society
by Nicole Aschoff

Stories of self-teaching algorithms, autonomous cars, expert reports predicting the disappearance of at least half the world’s jobs in the next couple of decades due to automation and robots abound. It’s a wonder we don’t stay in bed, scrolling through our feeds, awaiting the Singularity in dignified repose. Indeed, a chorus of warnings from tech naysayers and handwringers suggests we should be terrified of what a Silicon Valley future holds. Data scientists paint a picture of a future in which algorithms determine everything, and in some accounts this future is now. Companies and state and federal agencies have used algorithms to determine whether someone will get parole, get a job, get a loan, get a raise, get public assistance, be accepted to a school, get fired, or get a promotion.

Atkinson, president of the Information Technology and Innovation Foundation, says, “No matter how many times a purported expert claims we are facing an epochal technology revolution that will destroy tens of millions of jobs and leave large swathes of human workers permanently unemployed, it still isn’t true.”39 Data scientists are also speaking up to assert that algorithms don’t take people out of the equation and they aren’t unbiased or neutral. Algorithms are designed by people (who have their own biases) and are trained on datasets that themselves often reflect bias and discrimination in their collection and design.

The GDPR’s creation originated in complaints by a former law student, Max Schrems, about Silicon Valley firms’ violations of European privacy laws. Other advocates for digital justice point to how tech companies do more than just invade our privacy—they develop and deploy algorithms that can cause substantial harm to individuals and communities. A growing number of data scientists advocate shining a light into the “black-box algorithms” that are being rapidly integrated into decision-making processes in myriad spheres of life. They call for “algorithmic accountability,” which the nonprofit research institute Data & Society defines as “the assignment of responsibility for how an algorithm is created and its impact on society; if harm occurs, accountable systems include a mechanism for redress.”20 Nicholas Diakopoulos, director of the Computational Journalism Lab at Northwestern University, and Sorelle Friedler, a computer science professor at Haverford College, suggest five dimensions of algorithmic accountability: responsibility, explainability, accuracy, auditability, and fairness.

We Are the Nerds: The Birth and Tumultuous Life of Reddit, the Internet's Culture Laboratory
by Christine Lagorio-Chafkin
Published 1 Oct 2018

Slowe was given a Reddit hoodie, black with subtle orange-red accents, and began working on managing the effort to update the site’s very infrastructure, update the homepage algorithm, modernize anticheating measures, and keep users’ data safe. It wasn’t long before a handful of other Hipmunk engineers also joined Reddit. Some rejoined, including early Reddit employees David King and Jason Harvey. Later, once Reddit started furiously hiring programmers, more than a half dozen former Hipmunk data scientists and engineers would join their ranks. (Hires of his trusted team weren’t limited to engineering; Huffman later hired Hipmunk’s marketing executive, Roxy Young, as VP of marketing.) Ricky Ramirez and Neil Williams had never left Reddit. Combining the forces from essentially all eras of Huffman’s career made it feel like he’d created a reunion show of star employees.

A Reddit representative later said no advertisements connected to the Russian Internet Research Agency were detected. In the midst of all this, Reddit’s legal team proactively reached out to Warner’s office to introduce themselves. As part of the internal investigation, Reddit dug into an isolated conspiracy theory, Pizzagate. Its data scientists found no evidence at the time of suspicious Russian domains orchestrating the dissemination or analysis of Pizzagate on the subreddit. That was all regular Redditors. The spread of political disinformation by regular users wasn’t what Congress was investigating—it was primarily looking into easier-to-track and -grasp advertising—but a representative from Warner’s office said at the time, “No one denies that Reddit has been a hub of anti-Semitic and white nationalist expression.

As new engineers were hired, more were handed over to Slowe to build robust antispam systems. As Slowe’s team grew, he proved an adept manager and was handed an even larger team—eighteen engineers—leading the group in charge of maintaining and developing the full Reddit site’s architecture. Before long, data science was spun out as its own team and also placed under Slowe’s purview. By September 2016, he had four teams of engineers reporting to him. They’d put in motion a longer-term customization of Reddit’s homepage, busted spam by 90 percent, and hired like mad—giving Reddit the future capability not just to maintain the status quo, but rather to build well-functioning systems on top of the site’s old code, rendering the last version Slowe had touched so many years earlier, at long last, mostly useless.

System Error: Where Big Tech Went Wrong and How We Can Reboot
by Rob Reich , Mehran Sahami and Jeremy M. Weinstein
Published 6 Sep 2021

Indeed, most of us will not face algorithmic decision-making in the criminal justice system, though we will grapple with algorithms in many other aspects of our daily lives. But those who will face it are among society’s most vulnerable—and often the victims of historical injustice and systemic inequalities. Cathy O’Neil, a former mathematics professor turned data scientist, emphasizes this in her seminal book Weapons of Math Destruction, writing that algorithmic decision-making models “tended to punish the poor and the oppressed in our society, while making the rich richer.” She recounts this phenomenon in the criminal justice system as well as many other domains such as credit scoring, college admissions, and employment decisions.

There was even a time when computer science departments struggled to attract students. But over the past thirty years, the programmer Davids have defeated the industrial Goliaths and become the new masters of the universe. Enrollments in computer science classes are booming almost everywhere. The reason is obvious: programming and data science are hugely valuable, and students want a chance to contribute to the digital revolution that is profoundly reshaping our world, changing individual human experience, social connection, community, and politics at a national and global level. Of course, the salary premium and chance of amassing start-up riches don’t hurt, either.

Big respect for the team for their great work,” Twitter, February 15, 2019, https://twitter.com/yoavgo/status/1096471273050382337. “humans find GPT-2 outputs convincing”: Irene Solaiman, Jack Clark, and Miles Brundage, “GPT-2: 1.5B Release,” OpenAI, November 5, 2019, https://openai.com/blog/gpt-2-1-5b-release/. “extremist groups can use”: Ibid. the training data: Stephen Ornes, “Explainer: Understanding the Size of Data,” Science News for Students, December 13, 2013, https://www.sciencenewsforstudents.org/article/explainer-understanding-size-data. “Kanye West Exclusive”: Arram Sabeti, “GPT-3,” Arram Sabeti (blog), July 9, 2020, https://arr.am/2020/07/09/gpt-3-an-ai-thats-eerily-good-at-writing-almost-anything/. “Why deep learning will never”: Gwern Branwen, “GPT-3 Creative Fiction,” gwern.net, June 19, 2020, https://www.gwern.net/GPT-3#why-deep-learning-will-never-truly-x; Kelsey Piper, “GPT-3, Explained: This New Language AI Is Uncanny, Funny—and a Big Deal,” Vox, August 13, 2020, https://www.vox.com/future-perfect/21355768/gpt-3-ai-openai-turing-test-language.

pages: 222 words: 53,317

Overcomplicated: Technology at the Limits of Comprehension
by Samuel Arbesman
Published 18 Jul 2016

Creating generalists who are able to serve this function well in our society first involves the construction of what have become known as T-shaped individuals, a term that appears to have first originated in computing education. T-shaped individuals have deep expertise in one area—the stem of the T shape—but breadth of knowledge as well: the bar of the T. What do these types of people look like? One example is the data scientist, who uses the tools of computer science and statistics to find meaning in large datasets, no matter what the discipline. Data scientists have to know a lot about many different areas in order to do their job successfully. We see something similar in applied mathematicians, who use quantitative tools to cut across disciplines and find commonalities, acting as generalists.

abstraction, 163 biological thinking’s avoidance of, 115–16 in complexity science, 133, 135 in physics thinking, 115–16, 121–22, 128 specialization and, 24, 26–27 technological complexity and, 23–28, 81, 121–22 accretion, 65 in complex systems, 36–43, 51, 62, 65, 191 in genomes, 156 in infrastructure, 42, 100–101 legacy systems and, 39–42 in legal system, 40–41, 46 in software, 37–38, 41–42, 44 in technological complexity, 130–31 unexpected behavior and, 38 aesthetics: biological thinking and, 119 and physics thinking, 113, 114 aggregation, diffusion-limited, 134–35 algorithm aversion, 5 Amazon, 5 American Philosophical Society, 90 Anaximander of Miletus, 139 Apple, 161, 163 Apple II computer, 77 applied mathematics, 143 arche, 140 Ariane 5 rocket, 1996 explosion of, 11–12 Aristotle, 151 Ascher, Kate, 100 Asimov, Isaac, 124 atomic nucleus, discovery of, 124, 141 Audubon, John James, 109 autocorrect, 5, 16 automobiles: self-driving, 91, 231–32 software in, 10–11, 13, 45, 65, 100, 174 see also Toyota automobiles Autonomous Technology (Winner), 22 Average Is Over (Cowen), 84 awe, as response to technological complexity, 6, 7, 154–55, 156, 165, 174 bacteria, 124–25 Balkin, Jack, 60–61 Ball, Philip, 12, 87–88, 136, 140 Barr, Michael, 10 Barrow, Isaac, 89 BASIC, 44–45 Bayonne Bridge, 46 Beacock, Ian, 12–13 Benner, Steven, 119 “Big Ball of Mud” (Foote and Yoder), 201 binary searches, 104–5 biological systems, 7 accretion in, 130–31 complexity of, 116–20, 122 digital technology and, 49 kluges in, 119 legacy code in, 118, 119–20 modules in, 63 tinkering in, 118 unexpected behavior in, 109–10, 123–24 biological thinking, 222 abstraction avoided in, 115–16 aesthetics and, 119 as comfortable with diversity and complexity, 113–14, 115 concept of miscellaneous in, 108–9, 140–41, 143 as detail oriented, 121, 122, 128 generalization in, 131–32 humility and, 155 physics thinking vs., 114–16, 137–38, 142–43, 222 technological complexity and, 116–49, 158, 174 Blum, Andrew, 101–2 Boeing 777, 99 Bogost, Ian, 154 Bookout, Jean, 10 Boorstin, Daniel, 89 Borges, Jorge Luis, 76–77, 131 Boston, Mass., 101, 102 branch points, 80–81 Brand, Stewart, 39–40, 126, 198–99 Brookline, Mass., 101 Brooks, David, 155 Brooks, Frederick P., Jr., 38, 59, 93 bugs, in software, see software bugs bureaucracies, growth of, 41 cabinets of curiosities (wunderkammers), 87–88, 140 calendar application, programming of, 51–53 Cambridge, Mass., 101 cancer, 126 Carew, Diana, 46 catastrophes, interactions in, 126 Challenger disaster, 9, 11, 12, 192 Chandra, Vikram, 77 Chaos Monkey, 107, 126 Chekhov, Anton, 129 Chekhov’s Gun, 129 chess, 84 Chiang, Ted, 230 clickstream, 141–42 Clock of the Long Now, The (Brand), 39–40 clouds, 147 Code of Federal Regulations, 41 cognitive processing: of language, 73–74 limitations on, 75–76, 210 nonlinear systems and, 78–79 outliers in, 76–77 working memory and, 74 see also comprehension, human collaboration, specialization and, 91–92 Commodore VIC-20 computer, 160–61 complexity, complex systems: acceptance of, see biological thinking accretion in, 36–43, 51, 62, 65, 191 aesthetics of, 148–49, 156–57 biological systems and, 116–17, 122 buoys as examples of, 14–15, 17 complication vs., 13–15 connectivity in, 14–15 debugging of, 103–4 edge cases in, 53–62, 65, 201, 205 feedback and, 79, 141–45 Gall on, 157–58, 227 hierarchies in, 27, 50–51 human interaction with, 163 infrastructure and, 100–101 inherent vs. 
accidental, 189 interaction in, 36, 43–51, 62, 65, 146 interconnectivity of, see interconnectivity interpreters of, 166–67, 229 kluges as inevitable in, 34–36, 62–66, 127 in legal systems, 85 and limits of human comprehension, 1–7, 13, 16–17, 66, 92–93 “losing the bubble” and, 70–71, 85 meaning of terms, 13–20 in natural world, 107–10 scientific models as means of understanding, 165–67 specialization and, 85–93 unexpected behavior in, 27, 93, 96–97, 98–99, 192 see also diversity; technological complexity complexity science, 132–38, 160 complication, complexity vs., 13–15 comprehension, human: educability of, 17–18 mystery and, 173–74 overoptimistic view of, 12–13, 152–53, 156 wonder and, 172 see also cognitive processing comprehension, human, limits of, 67, 212 complex systems and, 1–7, 13, 16–17, 66, 92–93 humility as response to, 155–56 interconnectivity and, 78–79 kluges and, 42 legal system and, 22 limitative theorems and, 175 “losing the bubble” in, 70–71, 85 Maimonides on, 152 stock market systems and, 26–27 technological complexity and, 18–29, 69–70, 80–81, 153–54, 169–70, 175–76 unexpected behavior and, 18–22, 96–97, 98 “Computational Biology” (Doyle), 222 computational linguistics, 54–57 computers, computing: complexity of, 3 evolutionary, 82–84, 213 impact on technology of, 3 see also programmers, programming; software concealed electronic complexity, 164 Congress, U.S., 34 Constitution, U.S., 33–34 construction, cost of, 48–50 Cope, David, 168–69, 229–30 corpus, in linguistics, 55–56 counting: cognitive limits on, 75 human vs. computer, 69–70, 97, 209 Cowen, Tyler, 84 Cryptonomicon (Stephenson), 128–29 “Crystalline Structure of Legal Thought, The” (Balkin), 60–61 Curiosity (Ball), 87–88 Dabbler badge, 144–45 dark code, 21–22 Darwin, Charles, 115, 221, 227 Daston, Lorraine, 140–41 data scientists, 143 datasets, massive, 81–82, 104–5, 143 debugging, 103–4 Deep Blue, 84 diffusion-limited aggregation (DLA), 134–35 digital mapping systems, 5, 49, 51 Dijkstra, Edsger, 3, 50–51, 155 “Divers Instances of Peculiarities of Nature, Both in Men and Brutes” (Fairfax), 111–12 diversity, 113–14, 115 see also complexity, complex systems DNA, see genomes Doyle, John, 222 Dreyfus, Hubert, 173 dwarfism, 120 Dyson, Freeman, on unity vs. 
diversity, 114 Dyson, George, 110 Economist, 41 edge cases, 53–62, 65, 116, 128, 141, 201, 205, 207 unexpected behavior and, 99–100 see also outliers Einstein, Albert, 114 Eisen, Michael, 61 email, evolution of, 32–33 emergence, in complex systems, 27 encryption software, bugs in, 97–98 Enlightenment, 23 Entanglement, Age of, 23–29, 71, 92, 96, 97, 165, 173, 175, 176 symptoms of, 100–102 Environmental Protection Agency, 41 evolution: aesthetics and, 119 of biological systems, 117–20, 122 of genomes, 118, 156 of technological complexity, 127, 137–38 evolutionary computation, 82–84, 213 exceptions, see edge cases; outliers Facebook, 98, 189 failure, cost of, 48–50 Fairfax, Nathanael, 111–12, 113, 140 fear, as response to technological complexity, 5, 7, 154–55, 156, 165 Federal Aviation Administration (FAA), Y2K bug and, 37 feedback, 14–15, 79, 135 Felsenstein, Lee, 21 Fermi, Enrico, 109 Feynman, Richard, 9, 11 field biologists, 122 for complex technologies, 123, 126, 127, 132 financial sector: interaction in, 126 interconnectivity of, 62, 64 see also stock market systems Firthian linguistics, 206 Flash Crash (2010), 25 Fleming, Alexander, 124 Flood, Mark, 61, 85 Foote, Brian, 201 Fortran, 39 fractals, 60, 61, 136 Frederick the Great, king of Prussia, 89 fruit flies, 109–10 “Funes the Memorious” (Borges), 76–77, 131 Galaga, bug in, 95–96, 97, 216–17 Gall, John, 157–58, 167, 227 game theory, 210 garden path sentences, 74–75 generalists, 93 combination of physics and biological thinking in, 142–43, 146 education of, 144, 145 explosion of knowledge and, 142–49 specialists and, 146 as T-shaped individuals, 143–44, 146 see also Renaissance man generalization, in biological thinking, 131–32 genomes, 109, 128 accretion in, 156 evolution of, 118, 156 legacy code (junk) in, 118, 119–20, 222 mutations in, 120 RNAi and, 123–24 Gibson, William, 176 Gingold, Chaim, 162–63 Girl Scouts, 144–45 glitches, see unexpected behavior Gmail, crash of, 103 Gödel, Kurt, 175 “good enough,” 27, 42, 118, 119 Goodenough, Oliver, 61, 85 Google, 32, 59, 98, 104–5 data centers of, 81–82, 103, 189 Google Docs, 32 Google Maps, 205 Google Translate, 57 GOTO command, 44–45, 81 grammar, 54, 57–58 gravitation, Newton’s law of, 113 greeblies, 130–31 Greek philosophy, 138–40, 151 Gresham College, 89 Guide of the Perplexed, The (Maimonides), 151 Haldane, J.

pages: 339 words: 88,732

The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson and Andrew McAfee
Published 20 Jan 2014

Another interesting fact is that the majority of Kaggle contests are won by people who are marginal to the domain of the challenge—who, for example, made the best prediction about hospital readmission rates despite having no experience in health care—and so would not have been consulted as part of any traditional search for solutions. In many cases, these demonstrably capable and successful data scientists acquired their expertise in new and decidedly digital ways. Between February and September of 2012 Kaggle hosted two competitions about computer grading of student essays, which were sponsored by the Hewlett Foundation.* Kaggle and Hewlett worked with multiple education experts to set up the competitions, and as they were preparing to launch many of these people were worried.

The first contest was to consist of two rounds. Eleven established educational testing companies would compete against one another in the first round, with members of Kaggle’s community of data scientists invited to join in, individually or in teams, in the second. The experts were worried that the Kaggle crowd would simply not be competitive in the second round. After all, each of the testing companies had been working on automatic grading for some time and had devoted substantial resources to the problem. Their hundreds of person-years of accumulated experience and expertise seemed like an insurmountable advantage over a bunch of novices.

Economics reporter Catherine Rampell points out that college graduates are the only group that has seen employment growth since the start of the Great Recession in 2007, and in October of 2011 the unemployment rate for bachelor’s degree holders, at 5.8 percent, was only about half that of those with associate’s degrees (10.6 percent) and a third that of those who stopped after high school (16.2 percent).18 The college premium exists in part because so many types of raw data are getting dramatically cheaper, and as data get cheaper, the bottleneck increasingly is the ability to interpret and use data. This reflects the career advice that Google chief economist Hal Varian frequently gives: seek to be an indispensable complement to something that’s getting cheap and plentiful. Examples include data scientists, writers of mobile phone apps, and genetic counselors, who have come into demand as more people have their genes sequenced. Bill Gates has said that he chose to go into software when he saw how cheap and ubiquitous computers, especially microcomputers, were becoming. Jeff Bezos systematically analyzed the bottlenecks and opportunities created by low-cost online commerce, particularly the ability to index large numbers of products, before he set up Amazon.

Demystifying Smart Cities
by Anders Lisdorf

Here a data asset should exist only in one version sanctioned by the data owner. The discovery zone is specific to a unit or organization, and users can bring in their own data or create their own versions of data sets. This is not meant for general consumption but is more like a sandbox for data scientists where they can prepare new data sets and run ad hoc experiments. It is the only zone in which users can create and upload data themselves. The operational zone is like a traditional operational data store and is in essence a read replica of an operational database. It is used so that queries do not unnecessarily affect the operational, transactional database.
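A hypothetical sketch (not from the book) of how zones like these could be expressed as configuration, with read and write permissions per zone; the zone and role names are loose paraphrases of the text, since the excerpt begins mid-discussion.

```python
# Map each zone to the roles allowed to write into it and to read from it.
ZONES = {
    "sanctioned":  {"write": ["data_owner"],      "read": ["everyone"]},
    "discovery":   {"write": ["data_scientists"], "read": ["owning_unit"]},
    "operational": {"write": ["replication_job"], "read": ["analysts"]},
}

def can_write(role: str, zone: str) -> bool:
    """Return True if the given role may create or upload data in the zone."""
    return role in ZONES[zone]["write"]

assert can_write("data_scientists", "discovery")       # users may upload here
assert not can_write("data_scientists", "sanctioned")  # only the data owner
```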

At a workshop hosted by 100 Resilient Cities and the city of New York, the topic was how we could use data to improve the resilience of our cities. The brainstorming session produced multiple suggestions, but the one with most votes was the data catalog. Making it possible to find the data you need is crucial for realizing the value of data. Promote data lineage transparency – Data scientists around the world spend an inordinate amount of time and worry about where their data comes from and try to track down all the steps it goes through before it ends up in their data source. This is with good reason since this is crucial to the quality and nature of the data. Promoting data lineage transparency can be solved with tools, but they typically cover only the particular vendor stack.

A good example of a famous scientist is Stephen Hawking, who investigated black holes that are of little direct practical relevance. The same can be said of Charles Darwin who also worked on problems that at the time had little practical relevance. This type is fairly rare but can be found in an R&D division of a product-oriented company. However, it has recently resurfaced in the form of the data scientist. While they often work on more applied science type of problems, they also occasionally supply real scientific results based on their own research agenda. This can be seen in larger tech companies. For cities it is rare to have scientists employed, but they can frequently be supplied by a close-by university system.

pages: 335 words: 97,468

Uncharted: How to Map the Future
by Margaret Heffernan
Published 20 Feb 2020

A simplistic commercial view of the future is being forced onto a world as though there are no alternative possibilities, when in fact there are many. The same sleight of hand is intrinsic to almost all discussions of artificial intelligence. Wild promises are made about the capacity of AI to predict disease, crime, recidivism, career trajectories, lifespan. Inevitablism discourages practical questions. Data scientists know that, with a large enough dataset, projecting trends with gross accuracy is easy, but it’s near impossible to reduce from that to pinpoint accuracy for an individual. Philosophical questions abound – and not only about who owns the data and for what purposes. The rhetoric flowing from Silicon Valley casually assumes that a person is simply an aggregation of data.

It is, as one commentator noticed, one more step towards cutting out human agency altogether.44 Pervasive monitoring devices – smartphones, wearables, voice-enabled speakers and smart meters – allow companies to track and manage consumer behaviour. The Harvard business scholar Shoshana Zuboff quotes an unnamed chief data scientist who explains: ‘The goal of everything we do is to change people’s actual behavior at scale . . . we can capture their behaviours and identify good and bad [ones]. Then we develop “treatments” or “data pellets” that select good behaviours.’45 MIT’s Alex Pentland seems more interested in enhancing machines than human understanding.

One scenario planner, Angela Wilkinson, compared computer models in scenario planning to ‘a heavy axe in the hand of a fireman – even if it hinders his escape from a fire, a fireman is reluctant to drop his axe. The axe is a help most of the time, but a dangerous burden in extreme events.’ If you entrust scenario planning to data scientists, there is a real risk that strategy and data bifurcate so severely that the plastic, creative exercise of integrating them fails. And while it is clear that scenario planning demands a diverse range of people who are open-minded with deep intellectual curiosity, expertise and a capacity to think freely, those people can be hard to find.

pages: 244 words: 66,977

Subscribed: Why the Subscription Model Will Be Your Company's Future - and What to Do About It
by Tien Tzuo and Gabe Weisert
Published 4 Jun 2018

International Data Corporation (IDC) predicts that by 2020, 50 percent of the world’s largest enterprises will see the majority of their business depend on their ability to create digitally enhanced products, services, and experiences. Focusing on services over products is also a sound business strategy. Zuora’s Subscription Economy Index, which you’ll find at the end of this book, shows that subscription-based companies are growing eight times faster than the S&P 500 and five times faster than US retail sales. Our chief data scientist, Carl Gold, put this report together using anonymized, aggregated, system-generated activity on our platform. I urge you to read it—it’s a fascinating document, based on billions of dollars of revenue and millions of financial transactions, that has all sorts of industry benchmarks and insights.

At the same time, we’re also hearing a lot more about how all these companies have in-house teams of “growth hackers,” which on a surface level sounds a lot like, well, marketing. They’re trying to come up with smarter ways to drive sales. But these folks tend to reject that label. Stitch Fix has more than ninety data scientists on its payroll. These people aren’t thinking of snappier punch lines for billboards; they’re looking for ways to optimize growth within the service itself. It’s almost as if the engineers have taken over the marketing shop: building freemium models, creating upgrade incentives, offering in-app purchases.

Over a period of just under six years (January 1, 2012, to September 30, 2017), the SEI grew at an average annual rate of 17.6%. S&P 500 sales grew at an average annual rate of 2.2%, while US retail sales grew at an average annual rate of 3.6%. This study was conducted by Zuora chief data scientist Carl Gold. Subscription business sales have grown substantially faster than two key public benchmarks—S&P 500 sales and US retail sales. Overall, the SEI reveals that subscription businesses grew revenues about eight times faster than S&P 500 company revenues (17.6 percent versus 2.2 percent) and about five times faster than US retail sales (17.6 percent versus 3.6 percent) from January 1, 2012, to September 30, 2017.
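The “eight times” and “five times” multiples follow directly from the reported average annual growth rates; a quick check:

```python
# Ratio of the SEI's reported growth rate to each benchmark's growth rate.
sei, sp500, retail = 17.6, 2.2, 3.6          # percent per year
print(round(sei / sp500, 1))   # 8.0  -> "about eight times faster"
print(round(sei / retail, 1))  # 4.9  -> "about five times faster"
```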

Four Battlegrounds
by Paul Scharre
Published 18 Jan 2023

Eshoo, “Preeminent Universities and Leading Tech Companies Announce Support for Bipartisan, Bicameral Bill to Develop National AI Research Cloud,” news release, June 29, 2020, https://eshoo.house.gov/media/press-releases/preeminent-universities-and-leading-tech-companies-announce-support-bipartisan. 32increasingly concentrated in the hands of corporate-backed labs: Deep Ganguli et al., Predictability and Surprise in Large Generative Models (arXiv.org, February 15, 2022), https://arxiv.org/pdf/2202.07785.pdf, 11. 32“do not have the luxury of a large amount of compute and engineering resources”: Anima Kumar, “An Open and Shut Case on OpenAI,” anima-ai.org, January 1, 2021, https://anima-ai.org/2019/02/18/an-open-and-shut-case-on-openai/. 32help academics stay engaged: Gil Alterovitz et al., Recommendations for Leveraging Cloud Computing Resources for Federally Funded Artificial Intelligence Research and Development (Select Committee on Artificial Intelligence, National Science & Technology Council, November 17, 2020), https://www.nitrd.gov/pubs/Recommendations-Cloud-AI-RD-Nov2020.pdf; Hatef Monajemi et al., “Ambitious Data Science Can Be Painless,” Harvard Data Science Review, no. 1.1 (July 1, 2019), https://doi.org/10.1162/99608f92.02ffc552. 32National Artificial Intelligence Research Resource: Division E, “National Artificial Intelligence Research Resource,” in William M. (Mac) Thornberry National Defense Authorization Act for Fiscal Year 2021, H.R. 6395, 116th Cong. (2019), https://www.congress.gov/bill/116th-congress/house-bill/6395/text; The White House, “The Biden Administration Launches the National Artificial Intelligence Research Resource Task Force,” news release, June 10, 2021, https://www.whitehouse.gov/ostp/news-updates/2021/06/10/the-biden-administration-launches-the-national-artificial-intelligence-research-resource-task-force/; Interim NAIRR Task Force, Envisioning a National Artificial Intelligence Research Resource (NAIRR): Preliminary Findings and Recommendations, May 2022, https://www.ai.gov/wp-content/uploads/2022/05/NAIRR-TF-Interim-Report-2022.pdf; Lynne Parker, “Bridging the Resource Divide for Artificial Intelligence Research,” OSTP blog, May 22, 2022, https://www.whitehouse.gov/ostp/news-updates/2022/05/25/bridging-the-resource-divide-for-artificial-intelligence-research/. 32China’s Thousand Talents Plan: Threats to the U.S.

Under the old method, the top academically performing cadets got their first pick, and then it would go down the list of cadets, in academic rank, until each job position filled up. Easley explained that the Army has known for a while that it wasn’t optimally aligning people to jobs that might be the best fit for them. “They just didn’t have a better way of implementing it,” he said. AI is changing that. Lieutenant Colonel Isaac Faber, chief data scientist for the Army AI Task Force, outlined how they are in the process of building an AI model that uses five years’ worth of officer performance data—“tens of thousands of data points”—to predict how well West Point cadets are likely to do in a given career field. In 2020, “for the first time,” Faber said, “a machine learning algorithm will be part of the fabric that makes up the branching recommendations for cadets at West Point.”

Rosenfeld et al., A Constructive Prediction of the Generalization Error Across Scales (arXiv.org, December 20, 2019), https://arxiv.org/pdf/1909.12673.pdf; Tom Henighan et al., Scaling Laws for Autoregressive Generative Modeling (arXiv.org, November 6, 2020), https://arxiv.org/pdf/2010.14701.pdf. 20“Data is the new oil”: Joris Toonders, “Data Is the New Oil of the Digital Economy,” Wired, n.d., https://www.wired.com/insights/2014/07/data-new-oil-digital-economy/; Kiran Bhageshpur, “Data Is the New Oil—and That’s a Good Thing,” Forbes, November 15, 2019, https://www.forbes.com/sites/forbestechcouncil/2019/11/15/data-is-the-new-oil-and-thats-a-good-thing/?sh=10eefed73045; Adeola Adesina, “Data Is the New Oil,” Medium, November 13, 2018, https://medium.com/@adeolaadesina/data-is-the-new-oil-2947ed8804f6; Will Murphy, “Data Is the New Oil,” Towards Data Science, May 7, 2017, https://towardsdatascience.com/data-is-the-new-oil-f11440e80dd0; Giuliano Giacaglia, “Data Is the New Oil,” Hackernoon, February 9, 2019, https://hackernoon.com/data-is-the-new-oil-1227197762b2. 20data-is-not-the-new-oil articles: Antonio Garcia Martinez, “No, Data Is Not the New Oil,” Wired, February 26, 2019, https://www.wired.com/story/no-data-is-not-the-new-oil/; Bernard Marr, “Here’s Why Data Is Not the New Oil,” Forbes, March 5, 2018, https://www.forbes.com/sites/bernardmarr/2018/03/05/heres-why-data-is-not-the-new-oil/?

pages: 343 words: 91,080

Uberland: How Algorithms Are Rewriting the Rules of Work
by Alex Rosenblat
Published 22 Oct 2018

Its results, published in a 2018 report, state that on the national level in the United States, 93 percent of its drivers drive fewer than twenty hours per week, and 93 percent are employed, seeking employment, full-time students, or retired.9 In February 2018, Uber published a blog post stating that “nearly 60% of U.S. drivers use Uber less than 10 hours a week.”10 Uber confirmed in an email to me that the latter statistic accounted for drivers who drove fewer than ten hours a week in a typical workweek over the previous three months, according to data scientists on Uber’s policy team.11 However, the Uber and Lyft reports on how much drivers work for either company tell only part of the story: a typical driver I met in New York City worked full time for multiple apps (often two to three), such as some combination of Uber, Lyft, Juno, Via, and Gett. Indeed, a 2016 report by the Office of the Mayor in New York City states that, of taxi and for-hire drivers (which includes ridehail drivers), “about three-quarters of all drivers say that driving a taxi or other for-hire vehicle is their full-time job.”12 Lyft’s 2018 report also offers a city-by-city breakdown of driver statistics, and it states that in New York City, 91 percent of drivers work fewer than twenty hours per week13—but that may simply reflect the fact that drivers who work full time are giving some of their hours to local competitors, like Uber, Juno, or Via.

The app is known to display information about local happenings like sporting events to drivers so they can anticipate demand. While nudges are not necessarily manipulative and do inherently provide nudge-recipients with a sense of choice or agency,50 they are nevertheless highly influential in setting expectations. The recommendations that individual drivers receive from Uber may be the product of mathematical data science, but it’s not clear that Uber is an honest broker of that data. Moreover, when drivers use that data to their own advantage, such as by rejecting a nonsurge dispatch when they are located in a surge-pricing zone, in order to wait for a surge-priced dispatch, they risk being fired. In other words, drivers are not merely consumers of free data-driven analysis, like users of GPS navigation services.

How to Work Without Losing Your Mind
by Cate Sevilla
Published 14 Jan 2021

She acknowledges that she was comfortable in her corporate job, and would never have considered a change of career otherwise. She says that she’s trying to see her redundancy as a chance to try something new, and as a tremendous opportunity: I’m thinking about training as a data scientist. I’ve done a course in my spare time to try to keep my skills going, so I might spend the money consolidating what I’ve taught myself and then try and change career into data science. It wasn’t even around as a job when I graduated. But because I’ve got this opportunity of being redundant, it’s given me a bit of a kick to something else. If you’re going to get a bit of money, why not do it now?

pages: 379 words: 109,223

Frenemies: The Epic Disruption of the Ad Business
by Ken Auletta
Published 4 Jun 2018

“We really have to get into these walled gardens to really understand what people are doing and how they’re behaving,” he says. Mobile phones pose obstacles as well; marketing on mobile phones is complex. Salama cautions, “How do we test ads on mobile?” And if the mobile phone lacks a flash drive, they can’t show the ad. Talent may be big data’s most crucial impediment, Salama thinks. “Everyone wants to hire data scientists and engineers.” The supply is limited, the competition intense. * * * ■ ■ ■ In the data arms race, Irwin Gotlieb set out to build a state-of-the-art data weapon, known internally as the Secret Sauce project. Gotlieb was intent on building GroupM’s own proprietary data system because, as he would publicly complain in 2015, Facebook and Google had their own ad tech vehicles to target ads and were, in effect, muscling agencies and clients by warning: “If you want to buy ads on our properties, you have to use our ad tech tools.”

Third, I need to match that to an advertiser that’s trying to reach you. Fourth, I need to value [price] that. And fifth, I need the systems to be able to get you the right ad within two hundred milliseconds.” Agencies have to change, he says, and will have to recruit more “software developers and data scientists and analysts. . . . Tech people are going to take over.” Like colleagues at other agencies, Xaxis executives obsess about Facebook. On any given day, a Xaxis vice president confides, they see 130 billion Internet pages that carry an ad. Facebook and Google see trillions. With its walled garden, he says Facebook tries to “curtail people from understanding what’s happening in their environments.

The reason that progress in AI has seemed so pronounced in the past few years is that technological advances in all three areas have accelerated.”* The race to dominate AI reveals companies with deep pockets—Google, Facebook, Amazon, IBM, Oracle, Apple, Salesforce.com, among others—vying to hire engineers and data scientists. “The bulk of Fox advertising will be sold by machines,” predicts James Murdoch, who goes on to say this will threaten the existence of the advertising holding companies. “The bulk of their business, the buying of media and the analysis of how to generate reach at a low incremental cost, it’s hard to see what their role is twenty years from now.

pages: 428 words: 103,544

The Data Detective: Ten Easy Rules to Make Sense of Statistics
by Tim Harford
Published 2 Feb 2021

Clearly this was a ponderous method, although it is unlikely that it introduced enough errors to explain the huge disparity between my experience and the official occupancy figures. In any case, in the age of contactless payments it’s much easier to estimate passenger numbers. The vast majority of bus journeys are made by people tapping an identifiable contactless chip on a bank card, a TfL Oyster card, or a smartphone. The data scientists at TfL can see where and when these devices are being used. They still have to make an educated guess as to when you get off the bus, but this is often possible—for example, they might see you make the return journey from the same area later. Or they might see that you had used your card on a connecting service: whenever I tapped into the tube network at Bethnal Green, one minute after the bus I’d been riding on arrived in the area, TfL could conclude with confidence that I’d been on the bus until the stop at Bethnal Green, but no farther.
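
The alighting-stop inference Harford describes can be sketched as a simple heuristic over a card's tap history: if another tap follows the bus boarding soon enough, assume the rider got off near that later tap. This is an illustrative toy with made-up records and a made-up time window, not TfL's actual method:

from datetime import datetime, timedelta

# Hypothetical tap records for one card: (time, mode, location).
# London buses record only the boarding tap, so the alighting stop must be inferred.
taps = [
    (datetime(2020, 1, 6, 8, 1),   "bus",  "Hackney Wick"),
    (datetime(2020, 1, 6, 8, 24),  "tube", "Bethnal Green"),   # connecting tap soon after the bus ride
    (datetime(2020, 1, 6, 18, 10), "bus",  "Bethnal Green"),   # evening return journey
]

def infer_alighting(taps, max_gap=timedelta(minutes=45)):
    """For each bus boarding, guess the alighting stop from the next tap:
    if another tap follows within max_gap, assume the rider got off nearby."""
    guesses = {}
    for i, (t, mode, loc) in enumerate(taps):
        if mode != "bus":
            continue
        nxt = taps[i + 1] if i + 1 < len(taps) else None
        if nxt is not None and nxt[0] - t <= max_gap:
            guesses[(t, loc)] = nxt[2]    # e.g. alighted near Bethnal Green
        else:
            guesses[(t, loc)] = None      # no later evidence; leave unknown
    return guesses

print(infer_alighting(taps))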

I’ve discussed this story with many people and I’m struck by a disparity. Most people are wide-eyed with astonishment. But two of the groups I hang out with a lot take a rather different view. Journalists are often cynical; some suspect Duhigg of inventing, exaggerating, or passing on an urban myth. (I suspect them of professional jealousy.) Data scientists and statisticians, on the other hand, yawn. They regard the tale as both unsurprising and uninformative. And I think the statisticians have it right. First, let’s think for a moment about just how amazing it might be to predict that someone is pregnant based on their shopping habits: not very.

We’re awestruck by the algorithm in part because we don’t appreciate the mundanity of what’s happening underneath the magician’s silk handkerchief. And there’s another way in which Duhigg’s story of Target’s algorithm invites us to overestimate the capabilities of data-fueled computer analytics. “There’s a huge false positive issue,” says Kaiser Fung, a data scientist who has spent years developing similar approaches for retailers and advertisers. What Fung means is that we don’t get to hear stories about women who receive coupons for baby wear but who aren’t pregnant. Hearing the anecdote, it’s easy to assume that Target’s algorithms are infallible—that everybody receiving coupons for onesies and wet wipes is pregnant.
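
Fung's "huge false positive issue" is a base-rate effect: when few shoppers are pregnant, even a decent model flags mostly non-pregnant people. A worked example with invented numbers (the base rate, sensitivity, and false positive rate below are assumptions, not Target's figures):

# Illustrative base-rate arithmetic with made-up numbers, not Target's real model.
base_rate = 0.02             # assume 2% of shoppers in the mailing pool are pregnant
sensitivity = 0.80           # assume the model catches 80% of pregnant shoppers
false_positive_rate = 0.05   # assume it wrongly flags 5% of everyone else

flagged_true = base_rate * sensitivity                   # 0.016
flagged_false = (1 - base_rate) * false_positive_rate    # 0.049
precision = flagged_true / (flagged_true + flagged_false)

print(f"Share of flagged shoppers actually pregnant: {precision:.0%}")  # ~25%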

pages: 484 words: 114,613

No Filter: The Inside Story of Instagram
by Sarah Frier
Published 13 Apr 2020

When Instagram launched new features, she tried to make sure they demonstrated them with teen digital-first influencers. The data showed that these kinds of stars, who had become famous on Vine, YouTube, or Instagram, were much more popular than anyone in the office expected them to be. She made a list of 500 of them, then asked Facebook data scientists for help understanding their impact. They found that about a third of Instagram’s user base followed at least one of the people on her list. Perle, like Porch, thought that Instagram should have a role in creating future mainstream celebrities—and that it would be important to build relationships with the ones who hadn’t quite become stars yet but had high interest from their audiences.

In the case of Myspace, the disruptor was Facebook. Paranoia over obsolescence festered at Facebook’s very core, and was the reason they’d bought Instagram and attempted to buy Snapchat in the first place. The anecdotal evidence from Third Thursday Teens was backed up by the data. When Nayak first heard about finstas, she asked Instagram’s data scientists to look into how many people had multiple accounts. After weeks of pestering, she got the numbers back. Between 15 and 20 percent of users had multiple accounts, and among teens, that proportion was much higher. She wrote out a report to explain the phenomenon for the Instagram team, since she couldn’t find anything about it in a Google search.

But Zuckerberg gave the same dismissive opinion at a question-and-answer session with employees the next day. He also told his workers that there was another, more positive way of looking at it. If people were blaming Facebook for the election’s outcome, it showed how important the social network was to their everyday lives. Not long after Zuckerberg’s talk, a data scientist posted a study internally on the difference between Trump’s campaign and Clinton’s. That was when employees realized there was another, maybe even bigger way their company had helped ensure the election outcome. In their attempt to be impartial, Facebook had given much more advertising strategy help to Trump.

pages: 409 words: 112,055

The Fifth Domain: Defending Our Country, Our Companies, and Ourselves in the Age of Cyber Threats
by Richard A. Clarke and Robert K. Knake
Published 15 Jul 2019

By formally defining and verifying modular components of code, these pieces of code may be used as trusted building blocks, providing strong footing for less secure software running on top of them. Meanwhile, as we wait for our AI overlords to start writing better code, Zatko and his wife, the data scientist Sarah Zatko, have started rating today’s software for how well it is constructed. At the request of the White House in 2015, they set up the Cyber Independent Testing Lab to automate the process of rating software quality to, in his words, “quantify the resilience of software against future exploitation.”

Evan Wolff is one of the leading cybersecurity attorneys in Washington. What that means is that by day he helps his clients respond to cyber incidents, including directing investigations and advising on notifications under state, federal, and international requirements. By night, as a former MITRE data scientist and Global Fellow at the Wilson Center, he thinks and writes about how his clients can mitigate the threat of cyber incidents in the first place, including what can be done to build an effective collective defense. From experience, Wolff recognizes that only rarely would the teams behind security incidents stop at nothing to reach their targets.

For most AI/ML programs to work well, that data all needs to be swimming in the same place, in what big corporations call their data lake. It seldom is. It’s scattered. Or sometimes it is not even collected or stored, or not stored for very long, or not stored in the right format. Then it has to be converted into a usable format through what data scientists politely call “manicuring” (and behind the scenes call “data mangling”) for the AI/ML engine to perform its work. Capturing all the data and storing it for six weeks or more to catch the “low-and-slow” attacks (ones that take each step in the attack days or weeks apart so as not to be noticed) would be a very expensive proposition for any company.

pages: 1,034 words: 241,773

Enlightenment Now: The Case for Reason, Science, Humanism, and Progress
by Steven Pinker
Published 13 Feb 2018

I am grateful as well to Marian Tupy of HumanProgress and to Ola Rosling and Hans Rosling of Gapminder, two other invaluable resources for understanding the state of humanity. Hans was an inspiration, and his death in 2017 a tragedy for those who are committed to reason, science, humanism, and progress. My gratitude goes as well to the other data scientists I pestered and to the institutions that collect and maintain their data: Karlyn Bowman, Daniel Cox (PRRI), Tamar Epner (Social Progress Index), Christopher Fariss, Chelsea Follett (HumanProgress), Andrew Gelman, Yair Ghitza, April Ingram (Science Heroes), Jill Janocha (Bureau of Labor Statistics), Gayle Kelch (US Fire Administration/FEMA), Alaina Kolosh (National Safety Council), Kalev Leetaru (Global Database of Events, Language, and Tone), Monty Marshall (Polity Project), Bruce Meyer, Branko Milanović (World Bank), Robert Muggah (Homicide Monitor), Pippa Norris (World Values Survey), Thomas Olshanski (US Fire Administration/FEMA), Amy Pearce (Science Heroes), Mark Perry, Therese Pettersson (Uppsala Conflict Data Program), Leandro Prados de la Escosura, Stephen Radelet, Auke Rijpma (OECD Clio Infra), Hannah Ritchie (Our World in Data), Seth Stephens-Davidowitz (Google Trends), James X.

One consequence is that many Americans today have difficulty imagining, valuing or even believing in the promise of incremental system change, which leads to a greater appetite for revolutionary, smash-the-machine change.30 Bornstein and Rosenberg don’t blame the usual culprits (cable TV, social media, late-night comedians) but instead trace it to the shift during the Vietnam and Watergate eras from glorifying leaders to checking their power—with an overshoot toward indiscriminate cynicism, in which everything about America’s civic actors invites an aggressive takedown. If the roots of progressophobia lie in human nature, is my suggestion that it is on the rise itself an illusion of the Availability bias? Anticipating the methods I will use in the rest of the book, let’s look at an objective measure. The data scientist Kalev Leetaru applied a technique called sentiment mining to every article published in the New York Times between 1945 and 2005, and to an archive of translated articles and broadcasts from 130 countries between 1979 and 2010. Sentiment mining assesses the emotional tone of a text by tallying the number and contexts of words with positive and negative connotations, like good, nice, terrible, and horrific.
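
Sentiment mining of the kind described, tallying words with positive and negative connotations, can be illustrated with a toy lexicon scorer. This is a sketch of the general idea only; Leetaru's actual pipeline is far more sophisticated, and the word lists below are invented:

import re

# Toy sentiment lexicon; real systems use lexicons with thousands of scored words.
POSITIVE = {"good", "nice", "progress", "improve", "peace"}
NEGATIVE = {"terrible", "horrific", "crisis", "war", "decline"}

def sentiment_score(text):
    """Return (positive hits - negative hits) / total tokens, a crude tone measure."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / max(len(tokens), 1)

print(sentiment_score("Good progress toward peace"))        # positive tone
print(sentiment_score("A terrible, horrific year of war"))  # negative tone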

The technological advances that have propelled this progress should only gather speed. Stein’s Law continues to obey Davies’s Corollary (Things that can’t go on forever can go on much longer than you think), and genomics, synthetic biology, neuroscience, artificial intelligence, materials science, data science, and evidence-based policy analysis are flourishing. We know that infectious diseases can be extinguished, and many are slated for the past tense. Chronic and degenerative diseases are more recalcitrant, but incremental progress in many (such as cancer) has been accelerating, and breakthroughs in others (such as Alzheimer’s) are likely.

pages: 393 words: 91,257

The Coming of Neo-Feudalism: A Warning to the Global Middle Class
by Joel Kotkin
Published 11 May 2020

Stanford graduates had already founded Hewlett-Packard in 1939, and an engineering professor who became provost of the university, Frederick Terman, nurtured tech companies in the area.6 In the ensuing decades, the Bay Area, including San Francisco, became the world’s leading technology hub. This rapid technological growth resulted in a consolidation of wealth and power in a handful of companies. A relatively small cadre of engineers, data scientists, and marketers—a tiny sliver of humanity—began reshaping the world’s economy, and its culture as well.7 In the Middle Ages, the power of the nobility rested on the control of land and the right to bear arms; the power of today’s ascendant tech aristocracy comes mainly from exploiting “natural monopolies” in web-based business.

They will thereby create new godlings, who might be as different from us Sapiens as we are different from Homo erectus.38 Clearly the tech elites’ search for immortality does not address issues that affect those still living within nature’s limits. Someone needing assistance in a disaster is more likely to look toward a church member than a data scientist for help. Organized faiths at their best serve as powerful instruments of social improvement, with particular concern for the needy. The secular social justice warriors may be passionately committed to their causes, but often it is groups like the Baptists or the Church of Jesus Christ of Latter-Day Saints who come to the rescue faster and more effectively in a crisis.39 Religious institutions have long brought together people of disparate backgrounds and economic status, building social bonds between them and serving as unifying transmitters of tradition and cultural identity.

“By 2022, it’s possible that your personal device will know more about your emotional state than your own family,” said Annette Zimmermann, research vice president at the consulting company Gartner.13 This emotional reliance on technology provides more opportunity for the oligarchy and the clerisy to gain access to our inner feelings and profit from them.14 No matter how strongly a public relations staffer at Facebook or Google contends otherwise, the algorithms that govern social media are not neutral or objective, but reflect the assumptions of those who create the programs. “Algorithms are opinions embedded in code,” writes Cathy O’Neil, a data scientist.15 The most concerning effects of the new intrusive technology can be seen in younger people. Research published in 2017 by Jean Twenge, a psychologist at San Diego State University, indicates that more screen time and social media activity correlate with a higher rate of depression and elevated suicide risk among American adolescents.

pages: 661 words: 156,009

Your Computer Is on Fire
by Thomas S. Mullaney , Benjamin Peters , Mar Hicks and Kavita Philip
Published 9 Mar 2021

Technology analysts Luke Stark and Anna Hoffmann, in a mid-2019 opinion piece published while ethical debates raged in data science, suggested ways in which metaphors matter. Data-driven work is already complicit, they argue, “in perpetuating racist, sexist, and other oppressive harms.”4 Stark and Hoffmann argued, however, that a solution was hidden in the very articulation of this problem: “The language we use to describe data can also help us fix its problems.” They drew on a skill commonly believed appropriate only in literature departments: the analysis of metaphor, imagery, and other narrative tropes. This route to data ethics, they suggested, would help data scientists to be “explicit about the power dynamics and historical oppressions that shape our world.”

Often, the fact that data—which is the output on some level of human activity and thought—is not typically seen as a social construct by a whole host of data makers makes intervening upon the dirty or flawed data even more difficult.14 In fact, I would say this is a major point of contention when humanists and social scientists come together with colleagues from other domains to try to make sense of the output of these products and processes. Even social scientists currently engaging in data science projects, like anthropologists developing predictive policing technologies, are using logics and frameworks that are widely disputed as discriminatory.15 The concepts of the purity and neutrality of data are so deeply embedded in the training and discourses about what data is that there is great difficulty moving away from the reductionist argument that “math can’t discriminate because it’s math,” which patently avoids the issue of application of predictive mathematical modeling to the social dimensions of human experience.

pages: 364 words: 99,897

The Industries of the Future
by Alec Ross
Published 2 Feb 2016

In very competitive elections, the Obama campaign used big data to gain insights into how to raise money, where to campaign, and how to advertise, which none of its opponents could rival. From fund-raising to field operations to the analytics in its polling operation, a group of several hundred digital operatives and data scientists crushed their Republican opponents. In 2012, the Obama campaign’s voter targeting and turnout programs performed brilliantly, while the Romney campaign’s crashed. Over the course of the 2012 campaign, Obama’s 18-person email team tested over 10,000 versions of email messages. In one instance, the campaign ran 18 variations of a single email, all with different subject lines, to determine which would be most effective.
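
Testing 18 subject lines is an A/B/n experiment: send each variant to a random slice of the list, then compare conversion rates and check that the winner's lead is larger than sampling noise. A minimal sketch with invented counts, no relation to the campaign's real data:

import math

# Invented results for three of many subject-line variants: (sends, sign-ups).
variants = {"A": (10000, 212), "B": (10000, 251), "C": (10000, 198)}

rates = {name: conv / sent for name, (sent, conv) in variants.items()}
best = max(rates, key=rates.get)
runner_up = sorted(rates, key=rates.get)[-2]

# Two-proportion z-test between the top two variants.
(n1, x1), (n2, x2) = variants[best], variants[runner_up]
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(f"Best variant: {best} ({p1:.2%}) vs {runner_up} ({p2:.2%}), z = {z:.2f}")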

Google chairman Eric Schmidt recruited an Israeli entrepreneur, Dror Berman, to move to Silicon Valley and head up Innovation Endeavors, a large venture firm that invests Schmidt’s money. Israel is home to many of the 20th century’s great innovations in farming. Berman brought the intellectual curiosity about agriculture with him to Silicon Valley and developed Farm2050, a partnership that aspires to combine data science and robotics to improve farming with a group of partners as diverse as Google, DuPont, and 3D Robotics. Dror recognized that Silicon Valley can be a little too navel-gazing, and told me that 90 percent of the region’s entrepreneurs focus on 10 percent of the world’s problems. With Farm2050, he is trying to bring Silicon Valley’s A game to agriculture.

pages: 484 words: 104,873

Rise of the Robots: Technology and the Threat of a Jobless Future
by Martin Ford
Published 4 May 2015

Tools that provide new ways to visualize data collected from social media interactions as well as sensors built into doors, turnstiles, and escalators offer urban planners and city managers graphic representations of the way people move, work, and interact in urban environments, a development that may lead directly to more efficient and livable cities. There is a potential dark side, however. Target, Inc., provided a far more controversial example of the ways in which vast quantities of extraordinarily detailed customer data can be leveraged. A data scientist working for the company found a complex set of correlations involving the purchase of about twenty-five different health and cosmetic products that were a powerful early predictor of pregnancy. The company’s analysis could even estimate a woman’s due date with a high degree of accuracy. Target began bombarding women with offers for pregnancy-related products at such an early stage that, in some cases, the women had often not yet shared the news with their immediate families.

Quentin Hardy, “Active in Cloud, Amazon Reshapes Computing,” New York Times, August 27, 2012, http://www.nytimes.com/2012/08/28/technology/active-in-cloud-amazon-reshapes-computing.html. 31. Mark Stevenson, An Optimist’s Tour of the Future: One Curious Man Sets Out to Answer “What’s Next?” (New York: Penguin Group, 2011), p. 101. 32. Michael Schmidt and Hod Lipson, “Distilling Free-Form Natural Laws from Experimental Data,” Science 324 (April 3, 2009), http://creativemachines.cornell.edu/sites/default/files/Science09_Schmidt.pdf. 33. Stevenson, An Optimist’s Tour of the Future, p. 104. 34. National Science Foundation Press Release: “Maybe Robots Dream of Electric Sheep, But Can They Do Science?,” April 2, 2009, http://www.nsf.gov/mobile/news/news_summ.jsp?

pages: 688 words: 107,867

Python Data Analytics: With Pandas, NumPy, and Matplotlib
by Fabio Nelli
Published 27 Sep 2018

Population in 2014 Conclusions Chapter 12: Recognizing Handwritten Digits Handwriting Recognition Recognizing Handwritten Digits with scikit-learn The Digits Dataset Learning and Predicting Recognizing Handwritten Digits with TensorFlow Learning and Predicting Conclusions Chapter 13: Textual Data Analysis with NLTK Text Analysis Techniques The Natural Language Toolkit (NLTK) Import the NLTK Library and the NLTK Downloader Tool Search for a Word with NLTK Analyze the Frequency of Words Selection of Words from Text Bigrams and Collocations Use Text on the Network Extract the Text from the HTML Pages Sentimental Analysis Conclusions Chapter 14: Image Analysis and Computer Vision with OpenCV Image Analysis and Computer Vision OpenCV and Python OpenCV and Deep Learning Installing OpenCV First Approaches to Image Processing and Analysis Before Starting Load and Display an Image Working with Images Save the New Image Elementary Operations on Images Image Blending Image Analysis Edge Detection and Image Gradient Analysis Edge Detection The Image Gradient Theory A Practical Example of Edge Detection with the Image Gradient Analysis A Deep Learning Example: The Face Detection Conclusions Appendix A: Writing Mathematical Expressions with LaTeX With matplotlib With IPython Notebook in a Markdown Cell With IPython Notebook in a Python 2 Cell Subscripts and Superscripts Fractions, Binomials, and Stacked Numbers Radicals Fonts Accents Appendix B: Open Data Sources Political and Government Data Health Data Social Data Miscellaneous and Public Data Sets Financial Data Climatic Data Sports Data Publications, Newspapers, and Books Musical Data Index About the Author and About the Technical Reviewer About the Author Fabio Nelli is a data scientist and Python consultant, designing and developing Python applications for data analysis and visualization. He has experience with the scientific world, having performed various data analysis roles in pharmaceutical chemistry for private research companies and universities. He has been a computer consultant for many years at IBM, EDS, and Hewlett-Packard, along with several banks and insurance companies.

About the Technical Reviewer Raul Samayoa is a senior software developer and machine learning specialist with many years of experience in the financial industry. An MSc graduate from the Georgia Institute of Technology, he’s never met a neural network or dataset he did not like. He’s fond of evangelizing the use of DevOps tools for data science and software development. Raul enjoys the energy of his hometown of Toronto, Canada, where he runs marathons, volunteers as a technology instructor with the University of Toronto coders, and likes to work with data in Python and R. © Fabio Nelli 2018. Fabio Nelli, Python Data Analytics, https://doi.org/10.1007/978-1-4842-3913-1_1, Chapter 1.

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
by Ralph Kimball and Margy Ross
Published 30 Jun 2013

Here are a couple of examples to illustrate this recommendation: The production environment for custom analytic programming might be MatLab within PostgreSQL or SAS within a Teradata RDBMS, but the data scientists might be building their proofs of concept in a wide variety of their own preferred languages and architectures. The key insight here: IT must be uncharacteristically tolerant of the range of technologies the data scientists use and be prepared in many cases to re-implement the data scientists' work in a standard set of technologies that can be supported over the long haul. The sandbox development environment might be custom R code directly accessing Hadoop, but controlled by a metadata-driven ETL tool. Then when the data scientist is ready to hand over the proof of concept, much of the logic could immediately be redeployed under the ETL tool to run in a grid computing environment that is scalable, highly available, and secure.

Build From Sandbox Results Consider embracing sandbox silos and building a practice of productionizing sandbox results. Allow data scientists to construct their data experiments and prototypes using their preferred languages and programming environments. Then, after proof of concept, systematically reprogram these implementations with an IT turnover team. Here are a couple of examples to illustrate this recommendation: The production environment for custom analytic programming might be MatLab within PostgreSQL or SAS within a Teradata RDBMS, but the data scientists might be building their proofs of concept in a wide variety of their own preferred languages and architectures.

pages: 223 words: 60,909

Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech
by Sara Wachter-Boettcher
Published 9 Oct 2017

After all, if women interested in technology don’t exist, how could employers hire them? This is theoretical, sure: I don’t know how often Google got gender wrong back then, and I don’t know how much that affected the way the tech industry continued to be perceived. But that’s the problem: neither does Google. Proxies are naturally inexact, writes data scientist Cathy O’Neil in Weapons of Math Destruction. Even worse, they’re self-perpetuating: they “define their own reality and use it to justify their results.” 12 Now, Google doesn’t think I’m a man anymore. Sometime in the last five years, it sorted that out (not surprising, since Google now knows a lot more about me, including how often I shop for dresses and search for haircut ideas).

Mostly, the algorithm is somewhere in the middle: it finds just what you want a lot of the time, but sends you somewhere mediocre some of the time too. Yelp is also able to tune its model and improve results over time, by looking at things like how often users search, don’t like the results, and then search again. The algorithm is the core of Yelp’s product—it’s what connects users to businesses—so you can bet that data scientists are tweaking and refining this model all the time. A product like COMPAS, the criminal recidivism software, doesn’t just affect whether you opt for tacos or try a new ramen place tonight, though. It affects people’s lives: whether they can get bail, how long they will spend in prison, whether they’ll be eligible for parole.

pages: 205 words: 61,903

Survival of the Richest: Escape Fantasies of the Tech Billionaires
by Douglas Rushkoff
Published 7 Sep 2022

We would liberate ourselves from obsolete notions of God, and all become part of the same “supreme cybernetic system” he called Mind. Throughout the 1950s and 1960s, government and corporate leaders alike hoped that computers would offer new ways of measuring public opinion and then developing appropriate “mass communications’’ strategies for controlling all these people. Data scientists at companies from RAND to Simulmatics sought and failed to predict and steer the behavior of consumers and voters. It wasn’t until the first intentionally “sticky” websites in the mid-nineties—websites designed to keep users from surfing away—that digital technology provided the sort of controlled environment and live feedback mechanisms required to do operant conditioning en masse.

Skinner, Science and Human Behavior (New York: Macmillan, 1953). 103 “a servosystem coupled”: Fred Turner, The Democratic Surround: Multimedia and American Liberalism from World War II to the Psychedelic Sixties (Chicago: University of Chicago Press, 2013), 123. 103 “How would we rig”: Gregory Bateson, quoted in Mark Stahlman, “The Inner Senses and Human Engineering,” Dianoetikon 1 (2020): 1–26. 104 didn’t come off as nefarious: For the rich history of Bateson and Mead’s efforts in this regard, see Fred Turner’s terrific The Democratic Surround. 104 Data scientists: Jill Lepore, If Then: How the Simulmatics Corporation Invented the Future (New York: Liveright, 2020). 106 “the purpose of Behavior Design”: Stanford University, “Welcome | Behavior Design Lab,” https://captology.stanford.edu/, accessed June 18, 2018. 106 Chatbots engage: “Smartphone App to Change Your Personality,” Das Fachportal für Biotechnologie, Pharma und Life Sciences, February 15, 2021, https://www.bionity.com/en/news/1169863/smartphone-app-to-change-your-personality.html. 107 Amazon incentivizes productivity: Nick Statt, “Amazon Expands Gamification Program That Encourages Warehouse Employees to Work Harder,” Verge, March 15, 2021, https://www.theverge.com/2021/3/15/22331502/amazon-warehouse-gamification-program-expand-fc-games. 107 promote environmentally friendly behavior: Markus Brauer and Benjamin D.

pages: 447 words: 111,991

Exponential: How Accelerating Technology Is Leaving Us Behind and What to Do About It
by Azeem Azhar
Published 6 Sep 2021

She started her project over the summer holidays, learning how to build and fine-tune convolutional neural networks and how to find and clean the data. Fortunately, Herlev Hospital in Denmark had an open-source data set of cervical smears she could use. It wasn’t straightforward. The data set was, in the jargon of a data scientist, unbalanced. It contained too many supposedly abnormal, potentially cancerous screens, and not enough healthy ones. Real-world data would be the reverse: most women have healthy smears, and a tiny number have problematic ones. This kind of inconsistency, or lack of balance, in the data Laura was using could cause problems for her system.
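
The imbalance problem described here is usually handled by reweighting classes or resampling so the rare class is not drowned out by the common one. A minimal scikit-learn sketch on synthetic stand-in data (not the Herlev data, and not the convolutional network described in the excerpt):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Synthetic stand-in data: 950 "healthy" (0) vs 50 "abnormal" (1) samples, 20 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = np.array([0] * 950 + [1] * 50)
X[y == 1] += 0.75   # shift the minority class so there is something to learn

# Weight classes inversely to their frequency so the rare class is not ignored.
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))   # the rare class gets a much larger weight

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print("Recall on the minority class:", clf.score(X[y == 1], y[y == 1]))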

By 2013, Alipay had become the world’s largest mobile payment service, and its parent company spun the business out as Ant Financial. This was the first step in horizontal expansion: giving this new finance company its own autonomy. Ant Financial drew upon the huge amounts of transaction data Alibaba had been collecting, channelling the power of data science – through what it described in hyperbolic marketing speak as the ‘Ant Brain’ – to become an ever more dominant force. It was a classic example of the power of network effects: the more data Alibaba had, the more powerful and effective it became; the more powerful and effective it became, the more customers joined up – and the more data it had.

pages: 425 words: 112,220

The Messy Middle: Finding Your Way Through the Hardest and Most Crucial Part of Any Bold Venture
by Scott Belsky
Published 1 Oct 2018

Of course, to build upon ideas, everyone must understand them. Seek people who make the impossible-to-understand more accessible. One of the greater challenges leaders face on the hiring front is evaluating people with a different technical expertise. For example, how do you evaluate the skills of a cryptocurrency expert or a data scientist if you have no expertise in either one? Sure, you can get third-party opinions from others in the industry, but sometimes recruitment is confidential and your candidates have jobs elsewhere, which restricts how many people you can involve in the recruitment process. But all skills, no matter how scientific, can be explained in layman’s terms—it’s just extremely hard to do it.

The word delegate suggests that a leader is single-handedly deciding who should do what, assigning tasks, and then holding everyone responsible as any prototypical boss would. But among high-performing teams, delegation is as much sought as it is received. In such teams, there is a genuine collective drive to free up those with the rarest or least scalable talents. For example, you want your data science experts or programmers to be analyzing data and programming—not expending their precious energy on administrative work. If everyone is aligned with the mission and the market forces and is determined to do whatever they can to make the greatest impact, then the pressure to delegate should come from below as well as above.

Exploring Everyday Things with R and Ruby
by Sau Sheong Chang
Published 27 Jun 2012

Introducing R Programmers are trained in logic, and our daily work mostly involves controlling and moving bits and bytes around. So when we’re faced with a chunk of data and asked to do something with it, our reactions usually involve either bolting for the nearest exit or stuffing the data into a relational database and running SQL SELECT statements on it. I’m exaggerating, of course. Most, if not all, data scientists are also programmers, and you can hardly get away with data analysis without doing some programming work. However, not all programming platforms and languages are suitable for data analysis and manipulation. There are a number of languages built for this rather specialized purpose, including MATLAB and S, as well as packages like SAS and SPSS.

The other reason why R is getting increasingly popular is that it is free. The existing batch of tools for data analysis—S, MATLAB, SPSS, and SAS—can be quite expensive, and R is a cost-effective way to achieve the same goals. Also, R has a very vibrant and active community of domain experts and developers, including statisticians and data scientists who contribute many very useful packages that enhance its overall capabilities. R is available in most major platforms, and installing it is quite straightforward. Just visit the R website (http://www.r-project.org/), download the necessary binaries or installer for your platform, and then install it accordingly.

pages: 52 words: 14,333

Growth Hacker Marketing: A Primer on the Future of PR, Marketing, and Advertising
by Ryan Holiday
Published 2 Sep 2013

Whereas marketing was once brand based, with growth hacking it becomes metric and ROI driven. Suddenly, finding customers and getting attention for your product become no longer a guessing game. But this is more than just marketing with better metrics. Growth hackers trace their roots back to programmers—and that’s how they see themselves. They are data scientists meets design fiends meets marketers. They welcome this information, process it and utilize it differently, and see it as desperately needed clarity in a world that has been dominated by gut instincts and artistic preference for too long. Ultimately that’s why this new approach is better suited to the future.

pages: 533 words: 125,495

Rationality: What It Is, Why It Seems Scarce, Why It Matters
by Steven Pinker
Published 14 Oct 2021

Paul Slovic, a collaborator of Tversky and Kahneman, showed that people also overestimate the danger from threats that are novel (the devil they don’t know instead of the devil they do), out of their control (as if they can drive more safely than a pilot can fly), human-made (so they avoid genetically modified foods but swallow the many toxins that evolved naturally in plants), and inequitable (when they feel they assume a risk for another’s gain).23 When these bugbears combine with the prospect of a disaster that kills many people at once, the sum of all fears becomes a dread risk. Plane crashes, nuclear meltdowns, and terrorist attacks are prime examples. * * * • • • Terrorism, like other losses of life with malice aforethought, brews up a different chemistry of fear. Body-counting data scientists are often perplexed at the way that highly publicized but low-casualty killings can lead to epochal societal reactions. The worst terrorist attack in history by far was 9/11, and it claimed 3,000 lives; in most bad years, the United States suffers a few dozen terrorist deaths, a rounding error in the tally of homicides and accidents.

Increasingly, “randomistas” are urging policymakers to test their nostrums in one set of randomly selected villages, classes, or neighborhoods, and compare the results against a control group which is put on a waitlist or given some meaningless make-work program.26 The knowledge gained is likely to outperform traditional ways of evaluating policies, like dogma, folklore, charisma, conventional wisdom, and HiPPO (highest-paid person’s opinion). Randomized experiments are no panacea (since nothing is a panacea, which is a good reason to retire that cliché). Laboratory scientists snipe at each other as much as correlational data scientists, because even in an experiment you can’t do just one thing. Experimenters may think that they have administered a treatment and only that treatment to the experimental group, but other variables may be confounded with it, a problem called excludability. According to a joke, a sexually unfulfilled couple consults a rabbi with their problem, since it is written in the Talmud that a husband is responsible for his wife’s sexual pleasure.

While a low channel number (I) can cause people to watch Fox News (A), and watching Fox News may or may not cause them to vote Republican (B), neither having conservative views (C) nor voting Republican can cause someone’s favorite television station to skitter down the cable dial. Sure enough, in a comparison across cable markets, the lower the channel number of Fox News relative to other news networks, the larger the Republican vote.29 From Correlation to Causation without Experimentation When a data scientist finds a regression discontinuity or an instrumental variable, it’s a really good day. But more often they have to squeeze what causation they can out of the usual correlational tangle. All is not lost, though, because there are palliatives for each of the ailments that enfeeble causal inference.
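
The channel-number argument is an instrumental-variables design: channel position shifts viewership but cannot itself be caused by viewers' politics. A two-stage least squares sketch on synthetic data (purely illustrative; the numbers and setup are invented, not the cited study):

import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Synthetic setup: an unobserved confounder (prior conservatism) drives both
# Fox viewing and Republican voting, so a naive regression overstates the effect.
conservatism = rng.normal(size=n)
channel_position = rng.uniform(1, 80, size=n)   # instrument: affects viewing only
viewing = 2.0 - 0.02 * channel_position + 0.8 * conservatism + rng.normal(size=n)
vote = 0.30 * viewing + 1.0 * conservatism + rng.normal(size=n)   # true causal effect = 0.30

def ols_slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x)

naive = ols_slope(viewing, vote)                  # biased upward by the confounder
first_stage = ols_slope(channel_position, viewing)
reduced_form = ols_slope(channel_position, vote)
iv_estimate = reduced_form / first_stage          # IV estimate with one instrument

print(f"naive OLS: {naive:.2f}, IV estimate: {iv_estimate:.2f} (true effect 0.30)")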

pages: 240 words: 74,182

This Is Not Propaganda: Adventures in the War Against Reality
by Peter Pomerantsev
Published 29 Jul 2019

Throughout the book I will travel, some of the time through space, but not always. The physical and political maps delineating continents, countries and oceans, the maps I grew up with, can be less important than the new maps of information flows. These ‘network maps’ are generated by data scientists. They call the process ‘surfacing’. One takes a keyword, a message, a narrative and casts it into the ever-expanding pool of the world’s data. The data scientist then ‘surfaces’ the people, media outlets, social media accounts, bots, trolls and cyborgs pushing or interacting with those keywords, narratives and messages. These network maps, which look like fields of pin mould or photographs of distant galaxies, show how outdated our geographic definitions are, revealing unexpected constellations where anyone from anywhere can influence everyone everywhere.
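
A crude version of that "surfacing" step can be sketched with networkx: collect posts that match a keyword, draw an edge whenever one account mentions another, and see who the interactions converge on. The posts and handles below are invented, and real network maps are built from far richer interaction data:

import re
import networkx as nx

# Invented posts matching some keyword; each is (author, text).
posts = [
    ("@alpha", "This narrative is everywhere, thanks @beta"),
    ("@beta",  "Amplifying @gamma on the same story"),
    ("@gamma", "Replying to @alpha and @beta"),
]

G = nx.DiGraph()
for author, text in posts:
    for mentioned in re.findall(r"@\w+", text):
        G.add_edge(author, mentioned)   # edge: author interacts with the mentioned account

# Accounts ranked by how often others point at them, a crude view of who is central.
print(sorted(G.in_degree(), key=lambda kv: kv[1], reverse=True))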

pages: 472 words: 117,093

Machine, Platform, Crowd: Harnessing Our Digital Future
by Andrew McAfee and Erik Brynjolfsson
Published 26 Jun 2017

The members of Topcoder’s global community include not only programmers, but also people who identify as designers, students, data scientists, and physicists. Topcoder offers this crowd a series of corporate projects, lets them self-select into teams and into roles, stitches all their work together, and monitors quality. It uses monetary and competitive rewards, along with a bit of oversight, to create Linux-like efforts for its clients. Kaggle does the same thing for data science competitions. Finding the right resource. Sometimes you don’t want to bring together an entire crowd; you simply want to find, as quickly and efficiently as possible, the right person or team to help with something.

pages: 424 words: 123,180

Democracy's Data: The Hidden Stories in the U.S. Census and How to Read Them
by Dan Bouk
Published 22 Aug 2022

Reading the data deeply now, we can see her and acknowledge her, and her choice to live outside the bounds of a patriarchal household, and the unseen negotiations that made her, maybe only briefly, a “partner.” MAPS REVEAL MARGINS, WHERE PARTNERS CLUSTER This map, prepared for me by the data scientist Stephanie Jordan, nicely illustrates the occurrence of “partner” labels across the nation (but excluding territories, like Hawaii): they come in clusters. I went looking for “partners” in marginal (edgy) neighborhoods, which could be seen on maps created by risk evaluators for the Home Owners’ Loan Corporation (HOLC) to support federal mortgage lending.

At Johns Hopkins University, a team of digital humanists (including Kim Gallon, Jeremy Greene, Jessica Marie Johnson, and Alexandré White) in the Black Beyond Data project work to uncover the racist histories of data sets while also imagining new and better ways to make meaningful data about Black lives. 11.  A useful text for those who want a more thorough introduction to doing data (science) differently is D’Ignazio and Klein, Data Feminism. 12.  Jesse Jones to Franklin D. Roosevelt, November 29, 1940, in Folder “Speeches, Articles, and Papers,” Box 4, Entry 229, RG 29, NARA, D.C. 13.  For the history of apportionment and the methods used, see Michel L. Balinski and H. Peyton Young, Fair Representation: Meeting the Ideal of One Man, One Vote (Washington, D.C.: Brookings Institution Press, 2001). 14.  

pages: 301 words: 89,076

The Globotics Upheaval: Globalisation, Robotics and the Future of Work
by Richard Baldwin
Published 10 Jan 2019

A recent study by the consulting firm Forrester suggests that 16 percent of all US jobs will be displaced by automation in the next ten years.9 That is one out of every six jobs. The professions hardest hit are forecast to be those that employ office workers. Forrester, however, notes that about half of the job destruction will be matched by job creation equal to 9 percent of today’s jobs. The study points to “robot monitoring professionals,” data scientists, automation specialists, and content curators as the biggest sources of new tech-related jobs. On net, Forrester forecasts that the impact will be a loss of 7 percent of jobs. That is still one out of every fourteen jobs. A recent World Economic Forum study, which is based on a survey of high-level corporate human resource types, put the number much lower.

Another aspect of RPA may dial-up the outrage factor even more. The workers being replaced will be training their robot replacements. Here is how one RPA software company explains it. “WorkFusion automates the time-consuming process of training and selecting machine learning algorithms . . . WorkFusion’s Virtual Data Scientist uses historical data and real-time human actions to train models to automate judgment work in a business process, like categorizing and extracting unstructured information.” This thing, in other words, is a white-collar robot that figures out what parts of the job can be done by a white-collar robot.

AI 2041: Ten Visions for Our Future
by Kai-Fu Lee and Qiufan Chen
Published 13 Sep 2021

For example, the future doctor will still be the primary point of contact trusted by the patient but will rely on AI diagnostic tools to determine the best treatment. This will redirect the doctor’s role to that of a compassionate caregiver, giving them more time with their patients. Just as the mobile Internet led to roles like the Uber driver, the coming of AI will create jobs we cannot even conceive of yet. Examples today include AI engineers, data scientists, data-labelers, and robot mechanics. But we don’t yet know and cannot predict many of these new professions, just as in 2001 we couldn’t have known about Uber drivers. We should watch for the emergence of these roles, make people aware of them, and provide training for them. RENAISSANCE Finally, with the right training and the right tools, we can expect an AI-led renaissance that will enable and celebrate creativity, compassion, and humanity.

As a result, human life expectancy increased from thirty-one years in 1900 to seventy-two years in 2017. Today, I believe we are at the cusp of another revolution for healthcare, in which digitization will enable the application of all data technologies from computing, communications, mobile, robotics, data science, and, most important, AI. First, existing healthcare databases and processes will be digitized, including patient records, drug efficacy, medical instruments, wearable devices, clinical trials, quality-of-care surveillance, infectious-disease-spread data, as well as supplies of drugs and vaccines.

pages: 330 words: 91,805

Peers Inc: How People and Platforms Are Inventing the Collaborative Economy and Reinventing Capitalism
by Robin Chase
Published 14 May 2015

The challenges included tracking asteroids given a specific set of NASA data; delivering email and calendar updates between Earth and the International Space Station (44,000 miles distant) reliably, safely, and securely; tracking food intake for space travelers; and using algorithms to crunch seventeen years’ worth of data from the Saturn-orbiting Cassini rocket and uncover interesting patterns in ring phenomena and structure, or detect new moons. The TopCoder community, consisting of 630,000 data scientists, developers, and designers, was offered these and other challenges in 2013 and 2014.30 So far, NASA has received thousands of different submissions from more than twenty countries. By mid-2014, the contest winners had taken home over $1.5 million. Jason Crusan, director of the Advanced Exploration Systems Division, said that “tapping into a diverse pool of the world’s top technical talent has not only resulted in new and innovative ways to advance technologies to further space exploration, but has also led to a whole new way of thinking for NASA, and other government agencies, providing us with an additional set of on-demand tools to tackle complex projects.”31 Karim Lakhani, who has extensively investigated the way communities and contests can be used for innovation and has run large-scale experiments for both Harvard Medical School and NASA, told me that his analysis of more than 150 scientific contests revealed that “the best solutions came from solvers who had expertise that was quite far from the problem domain.”

I’ll talk more in Chapter 7 about how and when government might play a role in protecting the rights of peers. Last year Facebook experienced a public relations fiasco: the aftermath from the results of an experiment it had let researchers conduct inside the Facebook community. For one week in January 2012, data scientists skewed the news feed of almost 700,000 Facebook users so that they saw either happier or sadder news. At the end of the week, those who had seen happier news feeds themselves posted more upbeat status updates; those who had seen the more pessimistic news feeds posted more negative updates. The report, “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks,” was published in the June 2014 issue of the Proceedings of the National Academy of Sciences of the United States of America.32 Danah Boyd, a principal researcher at Microsoft in the area of social media, commented on the fallout that had filled Twitter, Facebook, blogs, and the mainstream media for days: “What’s at stake is the underlying dynamic of how Facebook runs its business, operates its system, and makes decisions that have nothing to do with how its users want Facebook to operate.

pages: 358 words: 93,969

Climate Change
by Joseph Romm
Published 3 Dec 2015

If you fill these data gaps using satellite measurements, the warming trend is more than doubled in the widely used HadCRUT4 data, and the much-discussed “warming pause” has virtually disappeared. Figure 1.3: The corrected data (bold lines) are shown compared to the uncorrected ones (thin lines). Source: Kevin Cowtan and Robert Way. When you include all of the data scientists have (through 2012), surface air temperatures have continued to rise globally in the last decade (see Figure 1.3), but at what appears to be a slightly slower rate than in previous decades. Why is that? A 2011 study removed the “noise” of natural climate variability from the temperature record to reveal the true global warming signal.11 That noise is “the estimated impact of known factors on short-term temperature variations (El Niño/southern oscillation, volcanic aerosols and solar variability).”

However, in fact, the coming multidecadal megadroughts will be much worse than the Dust Bowl of the 1930s—“worse than anything seen during the last 2000 years,” as a major 2014 Cornell-led study put it. They will be the kind of megadroughts that in the past destroyed entire civilizations.30 In that 2014 Journal of Climate study, “Assessing the Risk of Persistent Drought Using Climate Model Simulations and Paleoclimate Data,” scientists quantified the risk of devastating, prolonged drought in the southwestern U.S. and the world due to global warming. Researchers from Cornell, the University of Arizona, and the U.S. Geological Survey concluded “the risk of a decade-scale megadrought in the coming century [in the Southwest] is at least 80%, and may be higher than 90% in certain areas.”

pages: 98 words: 25,753

Ethics of Big Data: Balancing Risk and Innovation
by Kord Davis and Doug Patterson
Published 30 Dec 2011

They run a Hadoop cluster of nearly 100 machines, process near real-time analytics reporting with Pentaho, and are experimenting with ways to enhance their customers’ ability to analyze their own datasets using R for statistical analysis and graphics. Their combined customer data sets exceed 100 terabytes and are growing daily. Further, they are especially excited about a powerful new opportunity their data scientists have uncovered that would integrate some of the data in their customers’ databases with other customer data to enhance and expand the value of the services they offer for everyone. They are aware, however, that performing such correlations must be done in a highly secure environment and that a rigorous test plan must be designed and implemented.

pages: 317 words: 100,414

Superforecasting: The Art and Science of Prediction
by Philip Tetlock and Dan Gardner
Published 14 Sep 2015

It would be facile to reduce superforecasting to a bumper-sticker slogan, but if I had to, that would be it. 6 Superquants? We live in the era of Big Data. Vast, proliferating information-technology networks churn out staggering quantities of information that can be analyzed by data scientists armed with powerful computers and arcane math. Order and meaning are extracted. Reality is seen and foreseen like never before. And most of us—let’s be honest with ourselves—don’t have the dimmest idea how data scientists do what they do. We find it a little intimidating, if not dazzling. As the scientist and science fiction writer Arthur C. Clarke famously observed, “Any sufficiently advanced technology is indistinguishable from magic.”

pages: 391 words: 99,963

The Weather of the Future
by Heidi Cullen
Published 2 Aug 2010

Given the 1.3°F of warming we’ve already put into the system, it’s a temperature range that could easily be in the cards during this century. A fundamental problem is that existing models of ice sheets are unable to explain the speed of the recent changes in the GIS that GRACE and ICESat are observing. In other words, the models cannot reproduce the data. Scientists such as Scott Luthcke are seeing things happen in Greenland right now that, technically, the models don’t show as happening for another thirty years.22 Even if some temperature threshold is passed, the IPCC gives a 1,000-year timescale for a total collapse of the GIS. But, given the inability of current models to simulate the rapid disappearance of continental ice right now, let alone at the end of the last ice age, a lower limit of 300 years is conceivable.23 I met up with Steffensen, Severinghaus, and other scientists from the NEEM project in Kangerlussuaq, a former Cold War outpost for the U.S.

Climate Central used a common technique to translate large-scale climate information from the computer models to provide useful information about local and regional conditions. This method involves calculating differences between time series data from current and future global climate model simulations and then adding these changes to time series of observed climate data. Scientists at Climate Central first identified weather observation stations closest to each city, as well as the closest point in the output of computer models, which is known as a grid point. For the station data, Climate Central examined temperature information for the summer months (June, July, and August) during two twenty-year periods to determine how extreme heat events have evolved during the twentieth century.
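
A minimal sketch may make the "delta" steps described above concrete. This is not Climate Central's code; the station and model values are made-up numbers, and a real analysis would work month by month on long records rather than on a handful of points.

```python
# Sketch of the delta method: take the change between a model's future and
# current simulations at the grid point nearest a city, then add that change
# to the observed station record for that city. All numbers are hypothetical.
import numpy as np

observed_station = np.array([88.1, 90.4, 86.7, 91.2, 89.5])  # observed summer temps (deg F)
model_current = np.array([87.0, 89.8, 85.9, 90.1, 88.7])      # model output, recent period
model_future = np.array([91.3, 94.0, 90.2, 94.8, 93.1])       # model output, future period

# Change signal ("delta") from the climate model
delta = model_future.mean() - model_current.mean()

# Projected local series: observed data shifted by the model's change signal
projected_station = observed_station + delta
print(f"delta = {delta:.2f} F; projected local mean = {projected_station.mean():.1f} F")
```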

pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World
by Peter H. Diamandis and Steven Kotler
Published 3 Feb 2015

A great example of this is TopCoder (www.topcoder.com). You’ve probably heard about hackathons—those mysterious tournaments where coders compete to see who can hack together the best piece of software in a weekend. Well, with TopCoder, now you can have over 600,000 developers, designers, and data scientists hacking away to create solutions just for you. In fields like software and algorithm development, where there are many ways to solve a problem, having multiple submissions lets you compare performance metrics and choose the best one. Or take Gigwalk, a crowdsourced information-gathering platform that pays a small denomination to incentivize the crowd (i.e., anyone who has the Gigwalk app) to perform a simple task at a particular place and time.

Unfortunately, not everyone knows how to tease out valuable insights from this deluge. Enter companies like Kaggle (www.kaggle.com) and TopCoder (www.topcoder.com), both of which are crowdsourcing, data-mining competition platforms that allow you to define your goal/desired insight, set a monetary prize, upload your data, and watch as hordes of data scientists (tens of thousands, to be exact) figure out the best way to sort through it. The best algorithm wins. The reward levels vary from kudos or zero dollars to hundreds of thousands of dollars from bigger companies. And for exponential entrepreneurs, not relying on the advantages of data is no longer an option.

Forward: Notes on the Future of Our Democracy
by Andrew Yang
Published 15 Nov 2021

They found that a false story was much more likely to go viral; fake news was six times faster to reach fifteen hundred people than something accurate. This was the case in every subject area—business, foreign affairs, science, and technology. “It seems to be pretty clear that false information outperforms true information,” Soroush Vosoughi, an MIT data scientist who led the study, told a reporter for The Atlantic. The tendency seemed most acute in one subject category: political content. “The key takeaway,” Rebekah Tromble, a professor of political science, told The Atlantic, “is really that content that arouses strong emotions spreads further, faster, more deeply, and more broadly.”

Against this backdrop and facing such a high set of standards, the fact that citizens have won more than $1 billion in civil judgments against police departments across the country per year in recent years is staggering, and evidence that the true scope of police damages against citizens is some multiple billions of dollars per year. In 2018 there were 686,665 police officers in eighteen thousand local departments across the country, from the tiniest police department in rural America to the NYPD. How can one meaningfully reform behaviors nationwide? Samuel Sinyangwe, co-founder of Campaign Zero, is a data scientist who has been researching police violence data and different policy responses for years. He has identified a number of changes that correspond to lower loss of life in encounters with police. The first is direct and obvious: more restrictive rules and laws governing use of force. Police departments have rules and guidelines as to what techniques they can use in different situations.

The Knowledge Machine: How Irrationality Created Modern Science
by Michael Strevens
Published 12 Oct 2020

Over the past few decades, the answers have come in. They are almost entirely negative. There is little evidence, as you will see, for a dispassionate Popperian critical spirit, but also little evidence for universal subservience to a paradigm. Indeed, in their thinking about the connection between theory and data, scientists seem scarcely to follow any rules at all. CHAPTER 2 Human Frailty Scientists are too contentious and too morally and intellectually fragile to follow any method consistently. AS THE MOON’S DISK CREPT across the face of the sun on May 29, 1919, a new science of gravity hung in the balance.

Eddington had to make a choice. Discount the astrographic data? Overlook the 4-inch discrepancy? Declare the experiment to be inconclusive? He did not have enough information to single out an obviously correct answer. So he followed his instincts. Eddington’s situation was not at all unusual. In the interpretation of data, scientists often have great room for maneuver and all too seldom have unambiguous guidance as to which maneuvers are objectively right and wrong. The room for maneuver exists because, as the eclipse experiment shows, theories in themselves do not make predictions about what will be observed. To say anything at all about the experimental outcome—about, say, the position of spots on a photographic plate—theories must be supported and helped along by other posits, other presumptions about the proper functioning of the experimental apparatus, the suitability of the background conditions, and more.

pages: 285 words: 98,832

The Premonition: A Pandemic Story
by Michael Lewis
Published 3 May 2021

Newsom’s economic adviser called Park immediately and asked him if he could help the state figure out what to do about the coronavirus. Park recruited a pair of former Obama administration officials: Bob Kocher, a doctor turned venture capitalist who had advised Obama about health care, and DJ Patil,* who had served as the country’s first chief data scientist. Patil pulled together a team of some of the best programmers in Silicon Valley, and the team instantly began to collect data that would help them to project and predict. In a couple of days, they had everything from the number of beds in intensive care units to data from toll booths and cell phone companies that gave them a feel for how people moved around inside the state.

And as she waited, her governor, in whom she still had faith, did the sort of thing that might have given her even more of it. He called the Red Phone. * I wrote about DJ Patil in The Fifth Risk. Working with a friend at LinkedIn, and needing a description for a new kind of job in the economy, DJ had coined the phrase “data scientist.” † And not just in California. The two other states that moved most quickly to shut down, Ohio and Maryland, had also paid close attention to Carter’s analysis. ‡ Slavitt renamed the plan “Victory over COVID-19” and presented it to Kushner as his own. PART III TEN The Bug in the System The Red Phone had always been a less than perfectly efficient tool for saving lives.

pages: 116 words: 31,356

Platform Capitalism
by Nick Srnicek
Published 22 Dec 2016

There is a convergence of surveillance and profit making in the digital economy, which leads some to speak of ‘surveillance capitalism’.27 Key to revenues, however, is not just the collection of data, but also the analysis of data. Advertisers are interested less in unorganised data and more in data that give them insights or match them to likely consumers. These are data that have been worked on.28 They have had some process applied to them, whether through the skilled labour of a data scientist or the automated labour of a machine-learning algorithm. What is sold to advertisers is therefore not the data themselves (advertisers do not receive personalised data), but rather the promise that Google’s software will adeptly match an advertiser with the correct users when needed. While the data extraction model has been prominent in the online world, it has also migrated into the offline world.

pages: 416 words: 108,370

Hit Makers: The Science of Popularity in an Age of Distraction
by Derek Thompson
Published 7 Feb 2017

It’s more like an orchestra of dozens and dozens of formulas that are conducted by a metaformula. One of the most important instruments in this algorithmic symphony is familiarity. “The most common complaint about Pandora is that there is too much repetition of bands and songs,” said Eric Bieschke, the first chief data scientist at Pandora. “Preferences for familiarity are much more individual than I would have thought. You can play the exact same songs to two people with the same tastes in music. One will consider the station perfectly familiar, and the other will consider it horribly repetitive.” There are two fascinating implications here.

Facebook has an advantage over the Iowa Method and basically every other company in the world when it comes to understanding people. In psychological studies, “reactivity” is the notion that when people are aware that they’re being watched, they change their behavior. On Facebook, however, it’s unlikely that most people are in a constant state of nervous self-monitoring, lest Facebook’s data scientists know that they like videos of red pandas. Facebook can watch readers without readers’ explicit awareness that they are under surveillance. This ought to afford a fairly accurate understanding of what people really want to read. The most obvious thing that Facebook can tell is that reader preferences are a mosaic within countries and around the world.

pages: 382 words: 105,819

Zucked: Waking Up to the Facebook Catastrophe
by Roger McNamee
Published 1 Jan 2019

When I met Tim a few days later, he helped me understand the role of state attorneys general in the legal system and the kind of evidence that would be necessary to make a case. Over the ensuing six months, he organized a series of meetings with staff in Schneiderman’s office, a truly impressive group of people. We did not have to explain internet platforms to them. Not only did the New York AG’s office understand the internet, they had data scientists who could perform forensics. The AG’s office had the skills and experience to handle the most complex cases. In time, we would furnish them with whistle-blowers, as well as insights. By April 2018, thirty-seven state attorneys general had begun investigations of Facebook. 6 Congress Gets Serious Technological progress has merely provided us with more efficient means for going backwards.

New Knowledge also created Hamilton 68, the public dashboard that tracks Russian disinformation on Twitter. Sponsored by the German Marshall Fund and introduced on August 2, 2017, Hamilton 68 enables anyone to track what pro-Kremlin Twitter accounts are discussing and promoting. Renée is also director of policy of Data for Democracy, whose mission is “to be an inclusive community of data scientists and technologists to volunteer and collaborate on projects that make a positive impact on society.” Renée’s own focus is on analysis of efforts by bad actors to subvert democracy around the world. Unlike us, Renée was a pro in the world of election security. She and her colleagues had heard whispers of Russian interference efforts in 2015 but had struggled to get the authorities to take action.

pages: 380 words: 109,724

Don't Be Evil: How Big Tech Betrayed Its Founding Principles--And All of US
by Rana Foroohar
Published 5 Nov 2019

They adhered strictly to the maxim that says it’s better to ask for forgiveness than to beg for permission—though in truth they weren’t really doing either. It’s an attitude of entitlement that still exists today, even after all the events of the past few years. In 2018, while attending a major economics conference, I was stuck in a cab with a Google data scientist, who expressed envy at the amount of surveillance that Chinese companies are allowed to conduct on citizens, and the vast amount of data it produces. She seemed genuinely outraged about the fact that the university where she was conducting AI research had apparently allowed her to put just a handful of data-recording sensors around campus to collect information that could then be used in her research.

As Shoshana Zuboff has written, in the sort of surveillance capitalism practiced by Google and other Big Tech firms, “Contract and the rule of law are supplanted by the rewards and punishments of a new kind of invisible hand,”42 the algorithmic hand of Silicon Valley. Varian and his team were unique, and foreshadowed an era in which most big companies would hire data scientists and data economists in great numbers. The existing laws that governed commerce were, like most laws in the view of Big Tech, made to be broken. Stewards of Trust? To be fair, pioneers like Varian have acknowledged a number of downsides of this new networked business model being pursued by Google and numerous other Silicon Valley giants, even the big one: privacy.

pages: 386 words: 112,064

Rich White Men: What It Takes to Uproot the Old Boys' Club and Transform America
by Garrett Neiman
Published 19 Jun 2023

How was I different from an equally intelligent student in a high-poverty neighborhood whose teachers may have been too overwhelmed to offer such advocacy? How could I account for the fact that—as Johns Hopkins researchers found—white teachers like mine believe white students have more potential?6 Or the analysis of former Google data scientist Seth Stephens-Davidowitz, who found that in their Google searches parents are two and a half times more likely to ask “Is my son gifted?” than “Is my daughter gifted?” which suggests that many parents see their sons as more intelligent or—at the very least—are more invested in their sons’ being intelligent because intelligence typically offers more status and financial rewards for men than it does for women.7 It is true that if the GATE test had included just the math and verbal sections—without the spatial component—I would have passed with flying colors.

While the typical student speaks in every other class session, Paul averaged five comments every class. That was ten times as often as the typical student, and thirty times as often as less vocal students. The desire to hear from everyone is about more than an appetite for novelty. In his 2014 book Social Physics, MIT data scientist Alex Pentland studies teams and groups. In his research, Pentland found that groups where a few people dominate the conversation are less collectively intelligent than groups where more people contribute. “The largest factor in predicting group intelligence,” Pentland writes, “was the equality of conversational turn taking.”2 Why?

pages: 151 words: 39,757

Ten Arguments for Deleting Your Social Media Accounts Right Now
by Jaron Lanier
Published 28 May 2018

Companies like Facebook, Google, and Twitter are finally trying to fix some of the massive problems they created, albeit in a piecemeal way. Is it because they are being pressured or because they feel that it’s the right thing to do? Probably a little of both. The companies are changing policies, hiring humans to monitor what’s going on, and hiring data scientists to come up with algorithms to avoid the worst failings. Facebook’s old mantra was “Move fast and break things,”3 and now they’re coming up with better mantras and picking up a few pieces from a shattered world and gluing them together. This book will argue that the companies on their own can’t do enough to glue the world back together.

pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy
by Sharon Bertsch McGrayne
Published 16 May 2011

17 Elaborating on Jeffreys, Savage answered as follows: as the amount of data increases, subjectivists move into agreement, the way scientists come to a consensus as evidence accumulates about, say, the greenhouse effect or about cigarettes being the leading cause of lung cancer. When they have little data, scientists disagree and are subjectivists; when they have piles of data, they agree and become objectivists. Lindley agreed: “That’s the way science is done.”18 But when Savage trumpeted the mathematical treatment of personal opinion, no one—not even he and Lindley—realized yet that he had written the Bayesian Bible.
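
Savage's point, that analysts starting from very different priors converge once data pile up, can be illustrated with a tiny Beta-Binomial example. The priors and the true rate below are arbitrary choices for the sketch, not anything from the book.

```python
# Two analysts hold opposite priors about an unknown proportion; as the sample
# grows, their posterior means converge toward the same answer.
import numpy as np

rng = np.random.default_rng(1)
true_rate = 0.3
prior_a = (1, 9)   # sceptic: Beta(1, 9), expects a low rate
prior_b = (9, 1)   # enthusiast: Beta(9, 1), expects a high rate

for n in (10, 100, 10_000):
    successes = rng.binomial(n, true_rate)
    # Posterior mean of Beta(a + s, b + n - s) is (a + s) / (a + b + n)
    mean_a = (prior_a[0] + successes) / (prior_a[0] + prior_a[1] + n)
    mean_b = (prior_b[0] + successes) / (prior_b[0] + prior_b[1] + n)
    print(f"n = {n:>6}: posterior means {mean_a:.3f} vs {mean_b:.3f}")
```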

[on a] clear way to treat uncertainty. . . . In certain circumstances, a population might go extinct before a significant decline could be detected.”22 During the administration of Bill Clinton, the Wildlife Protection Act was amended to accept Bayesian analyses alerting conservationists early to the need for more data. Scientists advising the International Whaling Commission were particularly worried about the uncertainty of their measurements. Each year the commission establishes the number of endangered bowhead whales Eskimos can hunt in Arctic seas. To ensure the long-term survival of the bowheads, scientists compute 2 numbers each year: the number of bowheads and their rate of increase.

pages: 521 words: 118,183

The Wires of War: Technology and the Global Struggle for Power
by Jacob Helberg
Published 11 Oct 2021

Google has a number of discrete news products—from the Google News tab on your web browser, to the news feed you see when you swipe down on your Android phone, to the audio news you hear if you ask the Google Assistant, “Okay, Google, read me the news.” To most consumers, these products probably seem like different features. Yet each one is made possible by an unseen team of designers, engineers, data scientists, and marketing professionals. Unlike the organic ten blue links, which are meant to be a reflection of the web, news features are more tightly curated and subject to stricter policies for content that Google labels or designates as “news.” And each of these products, potentially, was a fissure into which Moscow might have inserted its perverse propaganda.

Then, a month later, new revelations turned the tech world upside down. On March 17, the Guardian and the New York Times simultaneously reported an explosive series of stories, based on whistleblower accounts, about a company called Cambridge Analytica and its work with the Trump campaign.68 According to Brittany Kaiser, one of the whistleblowers, the data scientists at Cambridge Analytica had taken advantage of lax privacy laws and Facebook loopholes to “scrape” up to 5,000 data points on every American older than eighteen—approximately 240 million people.69 This included data from public posts and ostensibly private direct messages. Most egregiously, perhaps, users who agreed to the terms and services of a third-party app (like Candy Crush) consented to provide not only their own data but their friends’.

Uncontrolled Spread: Why COVID-19 Crushed Us and How We Can Defeat the Next Pandemic
by Scott Gottlieb
Published 20 Sep 2021

The CDC and HHS had drafted a plan to bring their systems into compliance with the legislation, but “the actions identified in the implementation plan did not address all of the requirements defined by the law,” the GAO concluded, and “as of May 2017, HHS had made limited progress toward establishing the required electronic public health situational awareness network capabilities.”31 During COVID, it became evident that the CDC didn’t have the basic tools of data management for public health decision making. They didn’t have the advanced data analytics needed to do the evaluations that were required, and they didn’t have the right integration for electronic capture of information from health records. They didn’t have a large group of data scientists and modelers. They outsourced most of these analytical tasks to academic partners. And the problems weren’t just on the back end in how the CDC captured and analyzed data, but also on the front end, in how they developed the raw information, particularly in cases where the agency had the primary responsibility for generating a body of evidence.

The intelligence community, by contrast, has a prospective mind-set; it’s always scanning the horizon. Intelligence agencies have to make a call on future threats. They’re willing to be wrong, but they’re compelled by the nature of their work to make predictions. Even if the CDC were in the business of making assessments on future threats, it largely lacks the data science capabilities and the advanced analytics required to do this sort of forecasting of risk. America’s dismal experience with COVID leaves us little choice but to expand the tools we use to inform us of new risks. In bolstering our pandemic preparedness, our purpose shouldn’t be merely to blunt the impact of the next pathogen that emerges, but to make sure that a calamity on the scale of COVID can never happen again, and the US can never be threatened in this way again.

pages: 160 words: 45,516

Tomorrow's Lawyers: An Introduction to Your Future
by Richard Susskind
Published 10 Jan 2013

For example, by aggregating search data, we might be able to find out what legal issues and concerns are troubling particular communities; by analysing databases of decisions by judges and regulators, we may be able to predict outcomes in entirely novel ways; and by collecting huge bodies of commercial contracts and exchanges of emails, we might gain insight into the greatest legal risks that specific sectors face. The disruption here is that crucial legal insights, correlations, and even algorithms might come to play a central role in legal practice and legal risk management and yet they will not be generated through the work of mainstream lawyers (unless they choose to collaborate with big data scientists). AI-Based Problem-Solving If IBM’s Watson (an artificially intelligent computer system designed to compete on the US TV quiz show Jeopardy!) is able publicly to beat the two finest human competitors, then the days of online problem-solving by computer are not very far away. And when we enter that era, and we apply the same techniques and technologies in law, then we will have AI-based legal problem-solving.

pages: 184 words: 46,395

The Choice Factory: 25 Behavioural Biases That Influence What We Buy
by Richard Shotton
Published 12 Feb 2018

Search is the most accessible found data source. Analysing search data provides insights that consumers might be loath to admit in a survey. Consider sexism. Most people would claim that they’re equally interested in their children’s intelligence, regardless of gender. However, Seth Stephens-Davidowitz, the New York Times journalist and data scientist, has analysed US search data and found that parents are two and a half times more likely to Google “is my son gifted?” than “is my daughter gifted?”. Google acts as a modern confessional in which all our darkest thoughts are captured. However, this rich seam of data is too rarely mined by advertisers.

pages: 444 words: 127,259

Super Pumped: The Battle for Uber
by Mike Isaac
Published 2 Sep 2019

One employee coined the term “rides of glory” to describe the Uber trip a customer takes home the morning after a one-night-stand. “In times of yore, you would have woken up in a panic, scrambling in the dark trying to find your fur coat or velvet smoking jacket or whatever it is you cool kids wear,” the post said, authored by Bradley Voytek, one of Uber’s data scientists. “Then that long walk home in the pre-morning dawn.” Voytek, a cognitive neuroscientist by trade, joined Uber because he loved the insight that such an enormous data set gave him into human behavior. Watching trips across cities being carried out in real time was like having his own personal human ant farm.

Fraudsters simply entered fake names and emails. Then they used apps like “Burner” or “TextNow” to create thousands of fake telephone numbers to be matched with stolen credit card numbers. But requiring Chinese users to add other, more precise, forms of identification would add more friction to the process. And, as Kalanick’s data scientists found in their research, adding friction slowed growth. For Kalanick, putting a dent in growth was not an option. Kalanick’s solution was to grow and rely upon the anti-fraud team. But scammers grew more shrewd over time. Eventually, hustlers found that searching forums for riders was inefficient and time-consuming, so they ended up creating “riders” themselves.

pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions
by Brian Christian and Tom Griffiths
Published 4 Apr 2016

We have the expression “Eat, drink, and be merry, for tomorrow we die,” but perhaps we should also have its inverse: “Start learning a new language or an instrument, and make small talk with a stranger, because life is long, and who knows what joy could blossom over many years’ time.” When balancing favorite experiences and new ones, nothing matters as much as the interval over which we plan to enjoy them. “I’m more likely to try a new restaurant when I move to a city than when I’m leaving it,” explains data scientist and blogger Chris Stucchio, a veteran of grappling with the explore/exploit tradeoff in both his work and his life. “I mostly go to restaurants I know and love now, because I know I’m going to be leaving New York fairly soon. Whereas a couple years ago I moved to Pune, India, and I just would eat friggin’ everywhere that didn’t look like it was gonna kill me.
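
The intuition that the remaining interval drives the explore/exploit decision can be checked with a small simulation: trying an unknown restaurant only pays off if enough future visits remain to exploit a good find. The payoff numbers below are arbitrary assumptions, not anything from the book.

```python
# Compare two strategies over different horizons: always visit the known
# favourite, versus try one unknown place first and then stick with whichever
# turned out better (assuming one visit reveals the new place's quality).
import numpy as np

rng = np.random.default_rng(2)
known_quality = 0.7          # the favourite spot, a sure thing each visit
trials = 100_000

for horizon in (2, 10, 50):
    exploit_only = known_quality * horizon

    new_quality = rng.uniform(0, 1, trials)   # unknown restaurant, quality revealed after one try
    explore_first = new_quality + np.maximum(new_quality, known_quality) * (horizon - 1)

    print(f"horizon {horizon:>2}: exploit-only {exploit_only:6.2f}   "
          f"explore-first {explore_first.mean():6.2f}")
```

With only two visits left, exploring loses on average; with fifty visits ahead, it wins, which is the passage's point about moving to a new city versus leaving one.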

Instead of “the” Google search algorithm and “the” Amazon checkout flow, there are now untold and unfathomably subtle permutations. (Google infamously tested forty-one shades of blue for one of its toolbars in 2009.) In some cases, it’s unlikely that any pair of users will have the exact same experience. Data scientist Jeff Hammerbacher, former manager of the Data group at Facebook, once told Bloomberg Businessweek that “the best minds of my generation are thinking about how to make people click ads.” Consider it the millennials’ Howl—what Allen Ginsberg’s immortal “I saw the best minds of my generation destroyed by madness” was to the Beat Generation.

pages: 525 words: 147,008

SuperBetter: The Power of Living Gamefully
by Jane McGonigal
Published 14 Sep 2015

If I sound quite confident that you can transform your life for the better with a gameful mindset and the SuperBetter method, it’s because I am. Since I invented SuperBetter, more than 400,000 people have played an online version of the game. We’ve recorded every power-up they’ve activated, every bad guy they’ve battled, and every quest they’ve completed—so we know what works and what doesn’t. I’ve joined forces with data scientists to analyze all the information we’ve collected from these 400,000 players over the past two years. I wanted answers to some of the same questions you might have: Who can the SuperBetter method work for? (Virtually anyone—young or old, male or female, avid game player or someone who has never played a video game in their life.)

Eventually, the perfect opportunity arose: Roepke found two colleagues at Penn who were interested in helping her conduct a formal study of SuperBetter’s effectiveness for treating depression. To help Penn prepare for this study, I teamed up with two collaborators at SuperBetter Labs, science writer Bez Maxwell and data scientist Rose Broome, to create a special set of depression-related power-ups, bad guys, and quests. The three of us also helped the Penn researchers design the study—how long it would last, how often we would encourage participants to play, and what questions we would ask. But the actual trial, including all recruitment, data collection, and data analysis, was conducted independently by the research team at the University of Pennsylvania.

Agile Project Management With Kanban
by Eric Brechner
Published 25 Feb 2015

A feature team is a group of individuals, often from multiple disciplines, who work on the same set of product features together. A typical feature team might have 1–3 analysts, 1–6 developers, and 1–6 testers (a total of 3–15 people), but some can be larger. Feature teams may also have marketers, product planners, designers, user researchers, architects, technical researchers, data scientists, quality assurance personnel, service engineers, service operations staff, and project managers. Often, feature team members are part of multiple feature teams, although developers and testers tend to be dedicated to a single team. Many people who use traditional Waterfall work on feature teams, all for the same manager or as a virtual team.

pages: 186 words: 50,651

Interactive Data Visualization for the Web
by Scott Murray
Published 15 Mar 2013

Information Dashboard Design: The Effective Visual Communication of Data by Stephen Few. O’Reilly Media, 2006. On the practicalities of working with data: Bad Data Handbook: Mapping the World of Data Problems by Q. Ethan McCallum. O’Reilly Media, 2012. Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists by Philipp K. Janert. O’Reilly Media, 2010. Python for Data Analysis: Agile Tools for Real World Data by Wes McKinney. O’Reilly Media, 2012. Conventions Used in This Book The following typographical conventions are used in this book: Italic Indicates new terms, URLs, email addresses, filenames, and file extensions.

pages: 230 words: 61,702

The Internet of Us: Knowing More and Understanding Less in the Age of Big Data
by Michael P. Lynch
Published 21 Mar 2016

That is, instead of asking people survey questions or contriving small-scale experiments, which was how social science was often done in the past, I could go and look at what actually happens, when, say, 100,000 white men and 100,000 black women interact in private.2 Anderson and Rudder’s comments are not isolated; they bring to the surface sentiments that have been echoed across discussions of analytics over the last few years. While Rudder has been particularly adept at showing how huge data gathered by social sites can provide eye-opening correlations, and data scientists and companies the world over have been harvesting a wealth of surprising information using analytics, Google remains the most visible leader in this field. The most frequently cited, and still one of the most interesting, examples is Google Flu Trends. In a now-famous journal article in Nature, Google scientists compared the 50 million most common search terms used in America with the CDC’s data about the spread of seasonal flu between 2003 and 2008.3 What they learned was that forty-five search terms could be used to predict where the flu was spreading—and do so in real time, as they did with some accuracy in 2009 during the H1N1 outbreak.
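
At its core, the Flu Trends approach described here scores candidate search terms by how well their weekly frequencies track the CDC's flu data and keeps the best correlates as predictors. The sketch below illustrates that idea on synthetic data; the term names and series are invented, not the actual Google or CDC datasets.

```python
# Rank candidate query terms by correlation with a (synthetic) CDC flu series
# and keep the top few as predictors.
import numpy as np

rng = np.random.default_rng(3)
weeks = 260
cdc_flu = np.clip(np.sin(np.arange(weeks) * 2 * np.pi / 52), 0, None) + rng.normal(0, 0.1, weeks)

# Mostly unrelated query series, plus a couple that genuinely track the flu season
terms = {f"term_{i}": rng.normal(0, 1, weeks) for i in range(50)}
terms["flu symptoms"] = cdc_flu + rng.normal(0, 0.2, weeks)
terms["thermometer"] = 0.5 * cdc_flu + rng.normal(0, 0.3, weeks)

scores = {name: np.corrcoef(series, cdc_flu)[0, 1] for name, series in terms.items()}
top_terms = sorted(scores, key=scores.get, reverse=True)[:5]
print("best-correlated query terms:", top_terms)
```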

Natural Language Processing with Python and spaCy
by Yuli Vasiliev
Published 2 Apr 2020

Natural language processing (NLP) is a subfield of artificial intelligence that tries to process and analyze natural language data. It includes teaching machines to interact with humans in a natural language (a language that developed naturally through use). By creating machine learning algorithms designed to work with unknown datasets much larger than those two dozen tablets found on Rapa Nui, data scientists can learn how we use language. They can also do more than simply decipher ancient inscriptions. Today, you can use algorithms to observe languages whose semantics and grammar rules are well known (unlike the rongorongo inscriptions), and then build applications that can programmatically “understand” utterances in that language.
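
As a concrete taste of the kind of processing this book covers, the snippet below loads a pretrained spaCy pipeline and inspects what it "understands" about one sentence. It assumes the small English model has been installed separately (python -m spacy download en_core_web_sm).

```python
import spacy

# Load a pretrained English pipeline (tokenizer, tagger, parser, NER)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Data scientists at NASA analyzed seventeen years of Cassini data.")

# Part of speech and syntactic role for each token
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities the model recognizes, e.g. NASA as an organization
print([(ent.text, ent.label_) for ent in doc.ents])
```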

pages: 202 words: 62,901

The People's Republic of Walmart: How the World's Biggest Corporations Are Laying the Foundation for Socialism
by Leigh Phillips and Michal Rozworski
Published 5 Mar 2019

We get a few super-yachts instead of superabundant housing for all; and we might well say the same when it comes to which consumer items we prioritize for production and distribution. In our irrational system, the ultimate purpose of product recommendations is to drive sales and profits for Amazon. Data scientists have found that rather than high numbers of customer-submitted reviews, which have little impact, it is recommendations that boost Amazon’s sales. Recommendations help sell not only less popular niche items—when it’s hard to dig up information, even just a recommendation can be enough to sway us—and bestsellers that constantly pop up when we’re browsing.

pages: 184 words: 60,229

Re-Educated: Why It’s Never Too Late to Change Your Life
by Lucy Kellaway
Published 30 Jun 2021

The moral of his story is similar to mine. He got away with it, partly because he had middle-class parents who found a way. He could afford to fail, because he had a safety net. I have just shown him what I’ve written. He doesn’t mind being portrayed as a wastrel, because he is one no longer. He has a job as a data scientist in a start-up and is motivated and doing well. But what about all the shouting and nagging? Did it make a difference? Art now says it was counterproductive – as well as being unpleasant – anything resembling coercion automatically makes him inclined to do the reverse. Has he forgiven me for it?

pages: 287 words: 62,824

Just Keep Buying: Proven Ways to Save Money and Build Your Wealth
by Nick Maggiulli
Published 15 May 2022

Maggiulli not only uses evidence to guide his suggestions, but he is also among the best at boiling everything down into ideas that are easy to understand and apply.” —James Clear, #1 New York Times bestselling author, Atomic Habits “The first time I read Nick Maggiulli's writing I knew he had a special talent. There are lots of good data scientists, and lots of good storytellers. But few understand the data and can tell a compelling story about it like Nick. This is a must-read.” —Morgan Housel, bestselling author, The Psychology of Money “Nick Maggiulli clearly delights in flouting the received wisdom about how people should manage their money.

pages: 626 words: 167,836

The Technology Trap: Capital, Labor, and Power in the Age of Automation
by Carl Benedikt Frey
Published 17 Jun 2019

Hiring good people has always been a critical issue for competitive advantage. But since the widespread availability of data is comparatively recent, this problem is particularly acute. Automobile companies can hire people who know how to build automobiles since that is part of their core competency. They may or may not have sufficient internal expertise to hire good data scientists, which is why we can expect to see heterogeneity in productivity as this new skill percolates through the labor markets.84 For these reasons, Amara’s Law will likely apply to AI, too. Myriad necessary ancillary inventions and adjustments are required for automation to happen. Erik Brynjolfsson, who was among those investigating the role of computer technologies in the productivity boom of the late 1990s, thinks that the trajectory of AI adoption is likely to mirror the past in this regard.

Official employment statistics are always behind the curve when it comes to capturing new occupations, which are not included in the data until they have reached a critical mass in terms of the number of people in them. But other sources, like LinkedIn data, allow us at least to nowcast some emerging jobs. Among them are the jobs of machine learning engineers, big data architects, data scientists, digital marketing specialists, and Android developers.14 But we also find jobs like Zumba instructors and Beachbody coaches.15 In a world that is becoming increasingly technologically sophisticated, rising returns on skills are unlikely to disappear and likely to intensify. Like computers, AI seems set to spawn more skilled jobs for labor, in the process creating more demand for in-person service jobs that remain hard to automate.

pages: 265 words: 69,310

What's Yours Is Mine: Against the Sharing Economy
by Tom Slee
Published 18 Nov 2015

The effectiveness of reputation systems and algorithmic ratings systems in providing a solid basis for trust is exaggerated in the Sharing Economy world. Sites that rely on algorithmic ratings have run into problems of fairness and of proper process, for example marketplace lending company Lending Club. By becoming a new way to qualify potential borrowers, the marketplace lending companies are entering into the area of credit scoring. Data scientist Cathy O’Neil argues that one reason Lending Club and others can bring such value to big financial institutions is that they provide a way to bypass credit scoring regulations, such as the Federal Trade Commission’s Equal Credit Opportunities Act (ECOA) that prohibits credit discrimination on the basis of race, color, religion and other factors and the Fair Credit Reporting Act (FCRA).

pages: 238 words: 68,914

Where Does It Hurt?: An Entrepreneur's Guide to Fixing Health Care
by Jonathan Bush and Stephen Baker
Published 14 May 2014

It’s not just a matter of killing time in uncomfortable chairs with soap operas blaring. People often have to take time off from work, or hire a babysitter. Some drive through heavy traffic, in both directions. Their time is money, and it represents an uncounted health care expense—on top of the trillions that we get billed for. Our data scientists can capture the elapsed minutes between the moment a patient signs in and the consultation begins. That’s the waiting time. Then, conceivably, we’ll be able to look for correlations between waiting time and other behaviors. Do offices where patients wait for more than thirty minutes suffer from higher customer churn?
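
The analysis described here is straightforward to sketch: compute the wait from sign-in and consultation timestamps, then compare a churn proxy across offices by typical wait. The column names and the tiny sample below are hypothetical, not the company's actual data.

```python
import pandas as pd

# Hypothetical visit log: sign-in time, consult start time, and whether the
# patient later left the practice (a crude churn proxy).
visits = pd.DataFrame({
    "office_id": [1, 1, 2, 2, 3, 3],
    "signed_in_at": pd.to_datetime(["2024-01-02 09:00", "2024-01-02 09:30",
                                    "2024-01-02 10:00", "2024-01-02 10:05",
                                    "2024-01-03 08:45", "2024-01-03 09:10"]),
    "consult_started_at": pd.to_datetime(["2024-01-02 09:40", "2024-01-02 10:15",
                                          "2024-01-02 10:10", "2024-01-02 10:20",
                                          "2024-01-03 09:40", "2024-01-03 10:05"]),
    "churned": [1, 1, 0, 0, 1, 0],
})

# Waiting time in minutes for each visit
visits["wait_minutes"] = (visits["consult_started_at"]
                          - visits["signed_in_at"]).dt.total_seconds() / 60

# Average wait and churn rate, office by office
by_office = visits.groupby("office_id").agg(avg_wait=("wait_minutes", "mean"),
                                            churn_rate=("churned", "mean"))
by_office["waits_over_30_min"] = by_office["avg_wait"] > 30
print(by_office)
```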

pages: 222 words: 70,132

Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy
by Jonathan Taplin
Published 17 Apr 2017

The privacy issue was reignited in early 2014, when the Wall Street Journal reported that Facebook had conducted a massive social-science experiment on nearly seven hundred thousand of its users. To determine whether it could alter the emotional state of its users and prompt them to post either more positive or negative content, the site’s data scientists enabled an algorithm, for one week, to automatically omit content that contained words associated with either positive or negative emotions from the central news feeds of 689,003 users. As it turned out, the experiment was very “successful” in that it was relatively easy to manipulate users’ emotions, but the backlash from the blogosphere was horrendous.

pages: 250 words: 64,011

Everydata: The Misinformation Hidden in the Little Data You Consume Every Day
by John H. Johnson
Published 27 Apr 2016

SIGNIFICANT OTHERS In the movie Thank You for Smoking, Aaron Eckhart’s character (a spokesperson for the tobacco industry) tells his son, “When you argue correctly, you’re never wrong.”16 It’s a line from a lobbyist in a Hollywood satire—but it’s an interesting quote to keep in mind as we talk about statistical significance, given that many people feel it’s the “correct” way to talk about data. Statistical significance is a concept used by scientists and researchers to set an objective standard that can be used to determine whether or not a particular relationship “statistically” exists in the data. Scientists test for statistical significance to distinguish between whether an observed effect is present in the data (given a high degree of probability), or just due to chance. It is important to note that finding a statistically significant relationship tells us nothing about whether a relationship is a simple correlation or a causal one, and it also can’t tell us anything about whether some omitted factor is driving the result.
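
A quick illustration of the testing idea described above, using a two-sample t-test on made-up data: a small p-value says the observed difference would be unlikely under chance alone, and, as the passage notes, it says nothing about causation or omitted factors.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_a = rng.normal(loc=10.0, scale=2.0, size=200)   # e.g. a control group
group_b = rng.normal(loc=10.5, scale=2.0, size=200)   # e.g. a treated group

# Two-sample t-test: is the difference in means larger than chance would explain?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("statistically significant at the 5% level" if p_value < 0.05
      else "not statistically significant at the 5% level")
```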

pages: 212 words: 69,846

The Nation City: Why Mayors Are Now Running the World
by Rahm Emanuel
Published 25 Feb 2020

When a certain threshold is met in a neighborhood, we flood the area with police. We borrowed this concept from Los Angeles, then refined it to fit Chicago. We put strategic support centers in twelve of twenty-two districts, which run the data every eight hours to stay on top of crime trends. These centers are staffed around the clock by two cops and two data scientists from the University of Chicago. (If you are ever interested in witnessing some great cultural exchanges, sit in for a bit with two Chicago cops and two data nerds from the University of Chicago stuck in a room for eight hours at a time. Talk about cultural diversity.) Ken Griffin, the founder and chief executive of the investment firm Citadel, pitched in $10 million in funding for our new crime measures.

pages: 246 words: 68,392

Gigged: The End of the Job and the Future of Work
by Sarah Kessler
Published 11 Jun 2018

There, we would find answers to common complaints, such as “I feel uncomfortable or unsafe” (remove yourself from the situation and call the police) and instructions for what to do if a customer locked them out of the home by accident, or if they wanted to dispute a fee the company had taken from their pay for missing a scheduled cleaning or other violations to the terms of service. “We’ll call you!” shouted a woman wearing a skirt in the front row. “Don’t call us,” Carol automatically corrected. * * * Uber was extremely shrewd at finding new ways to manage independent contractors through its app. “Employing hundreds of social scientists and data scientists,” wrote the New York Times in 2017, “Uber has experimented with video game techniques, graphics and noncash rewards of little value that can prod drivers into working longer and harder—and sometimes at hours and locations that are less lucrative for them.”6 One strategy was its surge pricing model, which made rates higher during busy times in order to encourage more drivers to work during those periods.

pages: 279 words: 71,542

Digital Minimalism: Choosing a Focused Life in a Noisy World
by Cal Newport
Published 5 Feb 2019

In other words, depending on whom you ask, social media is either making us lonely or bringing us joy. To better understand this general phenomenon of contrasting conclusions, let’s look closer at the specific studies summarized above. One of the main positive articles cited by the Facebook blog post was authored by Moira Burke, a data scientist at the company (who also coauthored the blog post), and Robert Kraut, a human computer interaction specialist at Carnegie Mellon University. It was published in the Journal of Computer-Mediated Communication in July 2016. In this study, Burke and Kraut recruited a group of around 1,900 Facebook users who agreed to quantify their current level of happiness when prompted.

pages: 215 words: 69,370

Still Broke: Walmart's Remarkable Transformation and the Limits of Socially Conscious Capitalism
by Rick Wartzman
Published 15 Nov 2022

“Everywhere I traveled—even for vacation—I would talk to the cashiers and the people in the stores, and I would ask questions,” she said. “I would do that for our stores, and then I would do that for the competition.” After enough of these outings, she began to pick up on something. “If you’d go into a Starbucks,” said Ormanidou, who in 2016 would leave Walmart to become a data scientist at the coffee chain, “for the most part, people would say, ‘Oh, I love it here. They accept me. I’m happy. I love my job.’ But you would go to Walmart and everybody would complain. It’s the same candidates, the same people who apply to Target and Walmart and McDonald’s and Starbucks. How do we end up with the people who are not happy?

pages: 252 words: 73,131

The Inner Lives of Markets: How People Shape Them—And They Shape Us
by Tim Sullivan
Published 6 Jun 2016

The rest are plain-vanilla fixed-price sales, just as one would see listed from a third-party seller on Amazon, which makes buying things on eBay fundamentally not that different from the way people have done their shopping for the past century or so. Given the critical importance of this shift in online commerce for eBay’s bottom line, it’s no surprise that data scientists within the company’s research group have thoroughly studied the change. In a collaboration with Stanford economists, eBay researchers have dug into the reasons behind the decline of the company’s auction business. Their findings matter to eBay executives mapping out business’s future, and also for those of us who are simply trying to make sense of how the internet has changed the nature of markets and, just as important, the ways in which it hasn’t.

pages: 278 words: 74,880

A World of Three Zeros: The New Economics of Zero Poverty, Zero Unemployment, and Zero Carbon Emissions
by Muhammad Yunus
Published 25 Sep 2017

Thanks in large part to the attractiveness of this online platform, in less than three years, the Food Assembly has spread to more than seven hundred locations in France, Belgium, the United Kingdom, Spain, Germany, and Italy—a vivid illustration of what I mean by the multiplying power of digital ICT! MakeSense is continuing to develop and refine its use of technological tools to enhance and spread social business. Beginning in 2016, a data scientist with expertise in developing and applying advanced analytic tools came to work at MakeSense thanks to a grant from his main employer, the media company Bloomberg L.P. The scientist is working on a system to track and measure the performance of social business projects. The goal is to develop new, more accurate ways of determining which methodologies and practices produce the best results for the people whom the social business is designed to benefit.

pages: 281 words: 78,317

But What if We're Wrong? Thinking About the Present as if It Were the Past
by Chuck Klosterman
Published 6 Jun 2016

It’s easy to discover a new planet and then work up the math proving that it’s there; it’s quite another to mathematically insist a massive undiscovered planet should be precisely where it ends up being. This is a different level of correctness. It’s not interpretative, because numbers have no agenda, no sense of history, and no sense of humor. The Pythagorean theorem doesn’t need the existence of Mr. Pythagoras in order to work exactly as it does. I have a friend who’s a data scientist, currently working on the economics of mobile gaming environments. He knows a great deal about probability theory,35 so I asked him if our contemporary understanding of probability is still evolving and if the way people understood probability three hundred years ago has any relationship to how we will gauge probability three hundred years from today.

pages: 333 words: 76,990

The Long Good Buy: Analysing Cycles in Markets
by Peter Oppenheimer
Published 3 May 2020

This component of a strong narrative that drives the interest in investment was observed by renowned Austrian economist Joseph Schumpeter, who argued that speculation often occurs at the start of a new industry. More recently, in a testimony before the US Congress on 26 February 1997, then-chairman of the Federal Reserve Alan Greenspan noted that ‘regrettably, history is strewn with visions of such “new eras” that, in the end, have proven to be a mirage’. A recent study by data scientists found that, in a sample of 51 major innovations introduced between 1825 and 2000, bubbles in equity prices were evident in 73% of the cases. They also found that the magnitude of these bubbles increases with the radicalness of innovations, with their potential to generate indirect network effects and with their public visibility at the time of commercialisation.12 Although it is not obvious that innovation was a trigger in the case of the tulip mania, it could be argued that it was important in the financial bubbles of the South Sea Company in Great Britain and the Mississippi Company in France in 1720.

pages: 305 words: 75,697

Cogs and Monsters: What Economics Is, and What It Should Be
by Diane Coyle
Published 11 Oct 2021

This means that companies that used to invest in servers and other equipment, and hire people to staff large IT departments, no longer need to do so. More and more companies, and pretty much all start-ups, do not make these investments at all now but instead use cloud services such as Amazon Web Services, or Microsoft’s Azure. Executives I have interviewed told me they used to have IT departments with skilled data scientists costing many tens or hundreds of thousands of pounds a year, but now for a few pounds on the company credit card they can simply use services provided by cloud platforms, with the latest software and cutting edge AI. Big firms and government departments and agencies have switched to cloud computing, and new firms start with it.

pages: 232 words: 72,483

Immortality, Inc.
by Chip Walter
Published 7 Jan 2020

Bloom had other examples of how insights into the human genome could kill you or save your life, depending. Take the infamous H1N1 flu epidemic of 2009. H1N1 killed 203,000 people worldwide—one of the worst pandemics in recent history. The first known outbreak was in Veracruz, Mexico. In analyzing the epidemic’s genomic data, HLI’s chief data scientist Amalio Telenti found that for every 40,000 children, one died of the disease. That wasn’t a lot (unless you were the child who died), but the statistics had a “genomic” ring, as if something in the genes made some children more susceptible to the virus than others. When Telenti looked at the records of the children that died, he noticed about 60 percent had preexisting lung illnesses like asthma or cystic fibrosis.

pages: 300 words: 76,638

The War on Normal People: The Truth About America's Disappearing Jobs and Why Universal Basic Income Is Our Future
by Andrew Yang
Published 2 Apr 2018

Every innovation will bring with it new opportunities, and some will be difficult to predict. Self-driving cars and trucks will bring with them a need for improved infrastructure and thus perhaps some construction jobs. The demise of retail could make drone pilots more of a need over time. The proliferation of data is already making data scientists a hot new job category. The problem is that the new jobs are almost certain to be in different places than existing ones and will be less numerous than the ones that disappear. They will generally require higher levels of education than the displaced workers have. And it will be very unlikely for a displaced worker to move, identify the need, gain skills, and fill the new role.

pages: 280 words: 71,268

Measure What Matters: How Google, Bono, and the Gates Foundation Rock the World With OKRs
by John Doerr
Published 23 Apr 2018

Remind is on its way to solving that problem—by focusing on what matters. 6 Commit: The Nuna Story Jini Kim Cofounder and CEO Nuna is the story of the passionate Jini Kim, propelled by family tragedy to deliver better health care to huge numbers of Americans. Of how she bootstrapped Nuna through years of rejection. And of how she recruited engineers and data scientists to commit to a wildly audacious goal: building a new Medicaid data platform, from scratch. Alongside focus, commitment is a core element of our first superpower. In implementing OKRs, leaders must publicly commit to their objectives and stay steadfast. At Nuna, a health care data platform and analytics company, the cofounders overcame a false start with OKRs.

pages: 269 words: 70,543

Tech Titans of China: How China's Tech Sector Is Challenging the World by Innovating Faster, Working Harder, and Going Global
by Rebecca Fannin
Published 2 Sep 2019

Consumers complete the entire lending process over their smartphone and don’t need an established credit history—an issue among young people starting in their careers. Loan decisions for individual borrowers are made online within seconds. One hint: don’t fill out the online form in all capital letters. WeLab has found applicants who write in upper case are not good credit risks. A technology team of more than 210 engineers and data scientists has guided WeLab in reinventing traditional lending and assessing credit risks by three proprietary AI systems: WeDefend detects fraud and suspicious behavior by analyzing more than 2,500 user data points in under one second. WeReach peeks into consumers’ influence and interactions with social connections.

pages: 290 words: 73,000

Algorithms of Oppression: How Search Engines Reinforce Racism
by Safiya Umoja Noble
Published 8 Jan 2018

In attendance was the journalist Julia Angwin, one of the investigators of the breaking story about courtroom sentencing software Northpointe, used for risk assessment by judges to determine the alleged future criminality of defendants.6 She and her colleagues determined that this type of artificial intelligence miserably mispredicted future criminal activity and led to the overincarceration of Black defendants. Conversely, the reporters found it was much more likely to predict that White criminals would not offend again, despite the data showing that this was not at all accurate. Sitting next to me was Cathy O’Neil, a data scientist and the author of the book Weapons of Math Destruction, who has an insider’s view of the way that math and big data are directly implicated in the financial and housing crisis of 2008 (which, incidentally, destroyed more African American wealth than any other event in the United States, save for not compensating African Americans for three hundred years of forced enslavement).

pages: 244 words: 78,238

Cabin Fever: The Harrowing Journey of a Cruise Ship at the Dawn of a Pandemic
by Michael Smith and Jonathan Franklin
Published 14 Jul 2022

Aboard the Zaandam, information about typical COVID-19 symptoms took time to get around and then spread from cabin to cabin, one person to the next. Passengers were even less informed than people onshore. Anne hadn’t heard the news that losing one’s sense of taste or smell was such a clear sign of COVID-19 that data scientists were now tracking Google queries: “Why can’t I taste my food?” and “Why can’t I smell anything?” These searches were now considered frontline markers of the pandemic’s international expansion. But Arthur, on dry land and far from the Zaandam, was informed. “I’d just heard an epidemiologist describe lack of taste as one of the symptoms of COVID,” he said.

pages: 240 words: 78,436

Open for Business Harnessing the Power of Platform Ecosystems
by Lauren Turner Claire , Laure Claire Reillier and Benoit Reillier
Published 14 Oct 2017

This process of collecting human observations about the relevance of search results is also used by Facebook to improve what people should see first in their newsfeed.7 It started as a feed quality panel experiment in 2014, asking people to provide detailed feedback on what they saw in their newsfeeds, which posts they liked and why, and what posts they would have liked to see instead. This qualitative human feedback exposed blind spots that data scientists, who work on improving the newsfeed algorithm, could not identify with machine learning. Facebook now runs feed quality panels across the US and international markets, and combines both quantitative and qualitative approaches to optimize its newsfeed algorithm.

Search innovation
Search is an area that is constantly evolving.

pages: 280 words: 82,355

Extreme Teams: Why Pixar, Netflix, AirBnB, and Other Cutting-Edge Companies Succeed Where Most Fail
by Robert Bruce Shaw , James Foster and Brilliance Audio
Published 14 Oct 2017

They also have the flexibility to balance long- and short-term work, creating business impact while managing technical debt. Does this mean engineers just do whatever they want? No. They work to define and prioritize impactful work with the rest of their team including product managers, designers, data scientists and others.” nerds.airbnb.com/engineering-culture-airbnb/. 10. Owen Thomas, “How Airbnb Manages Not to Manage Engineers.” 11. The importance of experience at Airbnb is suggested by the fact that the head of what most firms call human resources is called the head of employee experience at Airbnb.

pages: 266 words: 86,324

The Drunkard's Walk: How Randomness Rules Our Lives
by Leonard Mlodinow
Published 12 May 2008

But what was truly striking was that when I made a bar graph showing how the number of buyers diminished as the buyers’ age strayed from the mean of seven, I found that the graph took a very familiar shape—that of the error law. It is one thing to suspect that archers and astronomers, chemists and marketers, encounter the same error law; it is another to discover the specific form of that law. Driven by the need to analyze astronomical data, scientists like Daniel Bernoulli and Laplace postulated a series of flawed candidates in the late eighteenth century. As it turned out, the correct mathematical function describing the error law—the bell curve—had been under their noses the whole time. It had been discovered in London in a different context many decades earlier.
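
As an aside that is not part of Mlodinow’s text: the “error law” whose familiar shape the passage describes is the normal (Gaussian) density, which for mean μ and standard deviation σ is written as

```latex
\[
f(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
\]
```

Values pile up around the mean and thin out symmetrically as they stray from it, which is the pattern the bar graph of buyers’ ages traces around the mean age of seven.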

pages: 247 words: 81,135

The Great Fragmentation: And Why the Future of All Business Is Small
by Steve Sammartino
Published 25 Jun 2014

The efficiency low-friction labour markets create opens up an opportunity for further independence of worlds, which is more profitable for those who need things done and those doing it. The type of work to evolve will be that of projecteers. These people — who aren’t really staff members, and don’t really run companies either — are digitally facilitated freelancers with skills that are in demand from the new economic landscape, such as UX consultants, app developers, big data scientists, community managers, cloud services specialists, online course teachers and 3D printing designers, as well as jobs that don’t exist yet. They’re niche roles for an increasingly fragmented world. The greatest fallacy in modern politics is the idea of saving jobs. There aren’t many people who hunt bison for a living in this day and age and saving jobs is a simple misallocation of taxpayers’ dollars.

pages: 361 words: 81,068

The Internet Is Not the Answer
by Andrew Keen
Published 5 Jan 2015

The drop-off that occurred during the last few years coincided with increased awareness of and sensitivity to worrisome behavior in chat rooms.”39 This “pervasive misogyny” has led some former Internet evangelists, such as the British author Charles Leadbeater, to believe that the Internet is failing to realize its potential.40 “It’s outrageous we’ve got an Internet where women are regularly abused simply for appearing on television or appearing on Twitter,” Leadbeater said. “If that were to happen in a public space, it would cause outrage.”41 Hatred is ubiquitous on the Internet. “Big hatred meets big data,” writes the Google data scientist Seth Stephens-Davidowitz about the growth of online Nazi and racist forums that attract up to four hundred thousand Americans per month.42 Then there are the haters of the haters—the digital vigilantes, such as the group OpAntiBully, who track down Internet bullies and bully them.43 Worst of all are the anonymous online bullies themselves.

pages: 291 words: 85,822

The Truth About Lies: The Illusion of Honesty and the Evolution of Deceit
by Aja Raden
Published 10 May 2021

Well, some guys at MIT proved it.35 According to a first-of-its-kind study of Twitter, fake news stories (rumors, hoaxes, propaganda) spread six times faster and significantly farther than truthful ones.36 And before you scream BOTS!, these results accounted for and corrected for their impact. This was all us. Humans love to read inflammatory lies and pass them along—even when they know the stories are untrue. The study was the brainchild of its lead author, data scientist Soroush Vosoughi. Dr. Vosoughi was a Ph.D. student at the time of the 2013 Boston Marathon bombing. He was deeply disturbed by the volume and fervor of conspiracy theories emerging in the days following the bombings—most directed at a missing Brown student, whose tragic disappearance was ultimately totally unrelated.

pages: 252 words: 78,780

Lab Rats: How Silicon Valley Made Work Miserable for the Rest of Us
by Dan Lyons
Published 22 Oct 2018

Human recruiters still had to look at all the videos, and there was only so much they could do. Sure, they could fast-forward through videos and make decisions. But to scale up even more, “We started asking, how can we use technology to take the place of what humans are doing?” Parker says. HireVue assembled a team of data scientists and industrial and organizational psychologists, who took existing science on things like “facial action units” and encoded it into software. Two years ago HireVue began offering this service to its customers. HireVue has more than seven hundred clients, including Nike, Intel, Honeywell, and Delta Airlines.

pages: 321

Finding Alphas: A Quantitative Approach to Building Trading Strategies
by Igor Tulchinsky
Published 30 Sep 2019

Another alternative is FloatBoost, which incorporates the backtracking mechanism of floating search and repeatedly performs a backtracking to remove unfavorable weak classifiers after a new weak classifier is added by AdaBoost; this ensures a lower error rate and reduced feature set at the cost of about five times longer training time.

Deep Learning
Deep learning (DL) is a popular topic today – and a term that is used to discuss a number of rather distinct things. Some data scientists think DL is just a buzz word or a rebranding of neural networks. The name comes from Canadian scientist Geoffrey Hinton, who created an unsupervised method known as the restricted Boltzmann machine (RBM) for pretraining NNs with a large number of neuron layers. That was meant to improve on the backpropagation training method, but there is no strong evidence that it really was an improvement.
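
The backtracking mechanism described at the top of this excerpt can be sketched in a few lines of code. The sketch below is not the book’s code or the original FloatBoost implementation; it assumes binary labels in {-1, +1}, uses scikit-learn decision stumps as weak classifiers, and keeps only the greedy remove-a-learner-if-it-helps loop layered on top of plain AdaBoost.

```python
# Rough sketch of AdaBoost with a FloatBoost-style backtracking step.
# Assumptions: labels y are in {-1, +1}; weak learners are decision stumps.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ensemble_error(learners, alphas, X, y):
    """Training error of the weighted-vote ensemble."""
    agg = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.mean(np.sign(agg) != y)

def floatboost_sketch(X, y, n_rounds=20):
    learners, alphas = [], []
    w = np.full(len(y), 1.0 / len(y))              # AdaBoost sample weights
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.average(pred != y, weights=w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        learners.append(stump)
        alphas.append(alpha)
        w *= np.exp(-alpha * y * pred)             # upweight misclassified samples
        w /= w.sum()
        # Backtracking: drop any earlier weak classifier whose removal lowers
        # the ensemble's training error, repeating until no removal helps.
        improved = True
        while improved and len(learners) > 1:
            improved = False
            base = ensemble_error(learners, alphas, X, y)
            for i in range(len(learners) - 1):     # never drop the newest learner
                trial_l = learners[:i] + learners[i + 1:]
                trial_a = alphas[:i] + alphas[i + 1:]
                if ensemble_error(trial_l, trial_a, X, y) < base:
                    learners, alphas = trial_l, trial_a
                    improved = True
                    break
    return learners, alphas
```

The real FloatBoost couples this removal step with the acceptance criteria of floating search; the sketch only illustrates why pruning unfavorable weak classifiers can shrink both the ensemble and its error at the cost of extra passes over the training data.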

pages: 444 words: 84,486

Radicalized
by Cory Doctorow
Published 19 Mar 2019

They’ve been watching the darknet boards, they know that everyone’s been figuring out how to jailbreak their shit while we’ve been getting restarted, and they figure all those people could be customers, but instead of paying for food we sell them, they’d pay us to use food someone else sold them.” Salima almost laughed. It was a crime if she did it, a product if they sold it to her. Everything could be a product. “It’s weird, I know. But here’s where you come in. They’ve got this research unit, anthropologists and data scientists and marketers, and they want to talk to people like you, find out what you’d pay for different kinds of products. They want to see if you’d sell the package to your neighbors, if you could get a cut of the money from them, like a commission? They’ve got one plan, you could teach those kids you were working with to sell paid unlocking to the people in your building, and they’d get a commission and you’d get a commission because you recruited them.”

pages: 280 words: 82,393

Conflicted: How Productive Disagreements Lead to Better Outcomes
by Ian Leslie
Published 23 Feb 2021

As Graham puts it, ‘Agreeing tends to motivate people less than disagreeing.’ Readers are more likely to comment on an article or post when they disagree with it, and in disagreement they have more to say (there are only so many ways you can say, ‘I agree’). They also tend to get more animated when they disagree, which usually means getting angry. A team of data scientists in 2010 studied user activity on BBC discussion forums, measuring the emotional sentiment of nearly 2.5 million posts from 18,000 users. They found that longer discussion threads were sustained by negative comments, and that the most active users overall were more likely to express negative emotions.

pages: 422 words: 86,414

Hands-On RESTful API Design Patterns and Best Practices
by Harihara Subramanian
Published 31 Jan 2019

ELK, which is open source software, fulfills these differing requirements in a tightly integrated manner. E stands for Elasticsearch, L for Logstash, and K for Kibana. Elasticsearch stores the logs and provides fuzzy, full-text search over them; Logstash is used to collect logs from different sources and transform them; and Kibana is a graphical user interface (GUI) that helps data scientists, testers, developers, and even businesspeople search the logs insightfully as their requirements evolve. Considering the significance of log analytics, there are open source as well as commercial-grade solutions to extract log, operational, performance, scalability, and security insights from microservice interaction log data.
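
As a purely illustrative companion to that description, the snippet below indexes one structured log event into Elasticsearch and runs the kind of ad hoc search a developer or tester would otherwise do through Kibana. The node address, index name, and field names are invented for the example, and the keyword arguments assume the 8.x official Python client.

```python
# Illustrative only: index a microservice log event and search it back.
# Assumes a local Elasticsearch node and the official `elasticsearch` client (8.x).
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# In a full ELK pipeline Logstash would collect, transform, and ship these events;
# here we index a single document directly.
es.index(
    index="service-logs",
    document={
        "service": "orders-api",
        "level": "ERROR",
        "message": "payment gateway timeout after 30s",
        "@timestamp": datetime.now(timezone.utc).isoformat(),
    },
)
es.indices.refresh(index="service-logs")  # make the document searchable immediately

# Full-text match on the message, filtered to error-level events.
resp = es.search(
    index="service-logs",
    query={
        "bool": {
            "must": [{"match": {"message": "timeout"}}],
            "filter": [{"match": {"level": "ERROR"}}],
        }
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["service"], hit["_source"]["message"])
```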

pages: 277 words: 81,718

Vassal State
by Angus Hanton
Published 25 Mar 2024

Jeff Bezos’s vision that his platform should become the ‘everything store’ offering limitless selection and seductive convenience at disruptively low prices is now entrenched as the dominant force in UK retail commerce. Amazon Web Services (AWS) began as an attempt by Amazon to control its own web hosting, creating a system that could handle the growing data storage and traffic to its site. By 2002, Jeff Bezos had dispatched a team of data scientists to South Africa to develop a product for others to use, now known as ‘the cloud’. It took just two years to launch but it grew fast without dramatically increasing overheads. ‘This has to scale to infinity,’ Mr Bezos instructed staff, ‘with no planned downtime. Infinity!’29 Today, AWS is worth at least $500 billion to its parent company.30 Most of the UK government’s websites and data are on AWS, which hosts everything from websites to streamed video.

pages: 408 words: 85,118

Python for Finance
by Yuxing Yan
Published 24 Apr 2014

His work is focused on developing an FX Options application, and he mainly works with C++. However, he has worked with a variety of languages and technologies through the years. He is a Linux and Python enthusiast and spends his free time experimenting and developing applications with them. Mourad MOURAFIQ is a software engineer and data scientist. After successfully completing his studies in Applied Mathematics, he worked at an investment bank as a quantitative modeler in the structured products market, specializing in ABS, CDO, and CDS. Then, he worked as a quantitative analyst for the largest French bank. After a couple of years in the financial world, he discovered a passion for machine learning and computational mathematics and decided to join a start-up that specializes in software mining and artificial intelligence.

pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists
by Gary Marcus and Jeremy Freeman
Published 1 Nov 2014

The Human Brain Project aims to bridge the two.

Where’s the Data?
A prerequisite to creating a whole brain model is the emerging new discipline called neuroinformatics—the endeavor to apply computing technology to help solve the challenges neuroscientists face in organizing, sharing, and gaining insight from their data. Scientists have produced millions of papers and petabytes of data about the brain describing these many levels of detail—and the pace is growing even faster. Since 1990, the number of publications alone has grown from around 30,000 to nearly 100,000 per year in 2013. The number and size of large-scale datasets are also rapidly increasing—a recently produced single human brain scan consumes 1 terabyte (a thousand gigabytes) of storage—enough to fill the storage on a single laptop.

pages: 297 words: 93,882

Winning Now, Winning Later
by David M. Cote
Published 17 Apr 2020

To attract these premier programmers, or “multipliers” as we called them, we began evaluating potential hires on specific skills related to programming, collaboration, and teamwork, observing their actual behavior rather than just relying on their academic record. We took a similar approach to hiring data scientists as well. Our efforts in this area helped us significantly up our game as we developed software as a business and incorporated it into more of our existing products. To expand our capability and improve recruiting, we also brought a number of multifunctional teams into a new software center we had built in Atlanta, Georgia.

pages: 307 words: 88,180

AI Superpowers: China, Silicon Valley, and the New World Order
by Kai-Fu Lee
Published 14 Sep 2018

Instead, it will simply take over the execution of tasks that meet two criteria: they can be optimized using data, and they do not require social interaction. (I will be going into greater detail about exactly which jobs AI can and cannot replace.) Yes, there will be some new jobs created along the way—robot repairing and AI data scientists, for example. But the main thrust of AI’s employment impact is not one of job creation through deskilling but of job replacement through increasingly intelligent machines. Displaced workers can theoretically transition into other industries that are more difficult to automate, but this is itself a highly disruptive process that will take a long time.

pages: 292 words: 92,588

The Water Will Come: Rising Seas, Sinking Cities, and the Remaking of the Civilized World
by Jeff Goodell
Published 23 Oct 2017

These measurements, from which the influence of the tides and the waves is removed, are free from distortion by rising or sinking land. When this data is combined with tide gauge averages, as well as measurements from ocean floats that record changes in the heat content of the ocean, it gives scientists a very good picture of how much the sea level is rising and what the causes are. With better data, scientists are now able to more clearly understand other factors beyond land movement that lead to variations in the rate of sea-level rise. One is the gravitational fingerprinting I mentioned earlier, which pushes water into the Southern Hemisphere from melting ice sheets in Greenland and into the Northern Hemisphere from Antarctica.

pages: 372 words: 94,153

More From Less: The Surprising Story of How We Learned to Prosper Using Fewer Resources – and What Happens Next
by Andrew McAfee
Published 30 Sep 2019

Samasource, founded by Janah in 2008, trains people to do entry-level technology work (such as data entry and image labeling) and connects them with employers. Online education companies such as Udacity, Coursera, and Lambda aim to provide higher-level online training. I like these efforts because they often train people for jobs that can be done wherever there’s an Internet connection. Not every coder or data scientist wants to live in a big city, or to have to move to one to acquire new skills. I’m encouraged to see that promising alternatives are now appearing. I’m also encouraged to see that business leaders are taking seriously the issue of disconnection and working on efforts to bring back economic opportunity to communities in danger of being left behind as globalization and tech progress race ahead.

pages: 263 words: 92,618

Going Infinite: The Rise and Fall of a New Tycoon
by Michael Lewis
Published 2 Oct 2023

The most glorious five-bedroom units in Honeycomb or Cube were yours for the taking, and stocked with mountains of Chinese snacks, dress for any occasion, and enough alcohol to sink a pirate ship. Sam’s parents had flown to the Bahamas and would remain in the Orchid penthouse with their son till the end, as would his psychiatrist. A single FTX technologist named Dan Chapsky had stayed on, but he was an odd case. He’d held the title Chief Data Scientist, but Sam barely knew who he was, or what he did, or why he had stayed—and neither did he. On bankruptcy Friday, he had emerged from his own luxury condo with the haunted look of a man after an air raid and sought out George Lerner. “Why am I here?” he’d asked. George had looked him in the eye for a long moment and said, “You need to leave.”

pages: 420 words: 94,064

The Revolution That Wasn't: GameStop, Reddit, and the Fleecing of Small Investors
by Spencer Jakab
Published 1 Feb 2022

The usual prescription is more and better “investor education,” though there is little evidence that it is effective. The meme-stock squeeze delivered a wake-up call to everyone. Even as it was still going on, furious politicians called hearings, movie studios raced to ink deals for GameStop movies, and hedge funds scrambled to hire data scientists to scour social media so they could get in early on the next mania, or at least stay out of harm’s way. There may be nothing new under the sun when it comes to investing or stock tips, but the hyperconnected, algorithmically enhanced version is an order of magnitude more potent. It was strong enough to enrage Washington, inspire Hollywood, and rattle Wall Street.

pages: 282 words: 93,783

The Future Is Analog: How to Create a More Human World
by David Sax
Published 15 Jan 2022

The plan was to attempt a test bed downtown, using sensors, advanced cameras, public Wi-Fi networks, and digital kiosks to connect all sorts of city services and improve them for the mostly poorer Black and Latino residents of the area. The data would reveal gaps in parking, transportation, and policing, which would lead to quicker and better solutions by city staff. Embedding herself in the project over three years, doing everything from visiting the huge control rooms run by data scientists and statisticians to riding in the backs of police cruisers to waiting at cold bus stops, Baykurt got a front-row seat to what a smart city actually looks like when implemented on the ground. “To be honest, it doesn’t change much,” Baykurt concluded. “The hype mobilizes a lot of people. There seems to be change going on.”

pages: 406 words: 88,977

How to Prevent the Next Pandemic
by Bill Gates
Published 2 May 2022

Most of the team would be based at individual countries’ national public health institutes, though some would sit in the WHO’s regional offices and at its headquarters in Geneva. When there’s a potential pandemic looming, the world needs expert analysis of early data points that can confirm the threat. GERM’s data scientists would build a system for monitoring reports of clusters of suspicious cases. Its epidemiologists would monitor reports from national governments and work with WHO colleagues to identify anything that looks like an outbreak. Its product-development experts would advise governments and companies on the highest-priority drugs and vaccines.

pages: 285 words: 91,144

App Kid: How a Child of Immigrants Grabbed a Piece of the American Dream
by Michael Sayman
Published 20 Sep 2021

At my parents’ place, probably—sitting at the kitchen counter, sweating over the next app that would make or break my world. I never wanted to feel that fragile and precarious again. I made a promise to myself to devote the rest of my internship to networking like a boss. I began by reaching out to market researchers, data scientists, and other people at that level, telling them I’d like to learn about their experiences. Rarely did someone decline to meet with me. Most people, I’ve found, will embrace the opportunity to be a teacher. The secret, I learned, was not to talk about myself, beyond saying enough to reassure people that I was competent and worthy of their time.

pages: 336 words: 91,806

Code Dependent: Living in the Shadow of AI
by Madhumita Murgia
Published 20 Mar 2024

Over the years, he had seen the face of poverty up close while working with families like Norma Gutiarraz’s in Salta’s slums: young people without jobs, adolescent pregnancies, school dropouts, and a cycle that sometimes repeated itself, generation after generation. He knew what the problems were, but he wasn’t a data scientist or an engineer. He needed someone with technical skills to help him build solutions that could be scaled up quickly. Someone he could trust. Like his younger brother, Pablo.

‘Solving’ Teenage Pregnancy
Pablo and Charlie grew up in a middle-class, close-knit family, living in the former colonial city centre of Salta, crammed with classical architecture and verdant squares.

The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do
by Erik J. Larson
Published 5 Apr 2021

In Part Three, The Future of the Myth, I argue that the myth has very bad consequences if taken seriously, because it subverts science. In particular, it erodes a culture of human intelligence and invention, which is necessary for the very breakthroughs we will need to understand our own future. Data science (the application of AI to “big data”) is at best a prosthetic for human ingenuity, which if used correctly can help us deal with our modern “data deluge.” If used as a replacement for individual intelligence, it tends to chew up investment without delivering results. I explain, in particular, how the myth has negatively affected research in neuroscience, among other recent scientific pursuits.

Programs like DENDRAL, which analyzed the structure of chemicals, and MYCIN, which provided sometimes quite good medical diagnoses, made clear that AI methods were relevant to a variety of problems normally requiring high human intelligence. Machine translation, as we’ve seen, was an initial failure, but yielded to different approaches made possible by the availability of large datasets (a precursor to many Big Data and data science successes in the 2000s). All sorts of natural language processing tasks, like generating parses of natural language sentences, and tagging parts of speech or entities (persons, organizations, places, and the like), were chipped away at by AI systems with increasing power and sophistication.8 Yet Turing’s original goal for AI, passing the Turing test, remained elusive.

Amazon was using big data before it was a buzzword, tracking and cataloging online purchases, which now are used as data to feed machine learning algorithms offering product recommendations, enhanced search, and other customer features. Big data is an inevitable consequence of Moore’s law: as computers become more powerful, statistical techniques like machine learning become better, and new business models emerge—all from data and its analysis. What we now refer to as data science (or, increasingly, AI) is really an old field, given new wings by Moore’s law and massive volumes of data, mostly made available by the growth of the web. Governments and nonprofit organizations quickly joined in, using big data to predict everything from traffic flows to recidivism among parole-eligible prisoners.

pages: 1,829 words: 135,521

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
by Wes McKinney
Published 25 Sep 2017

Cross-validated models take longer to train, but can often yield better model performance.

13.5 Continuing Your Education
While I have only skimmed the surface of some Python modeling libraries, there are more and more frameworks for various kinds of statistics and machine learning either implemented in Python or with a Python user interface. This book is focused especially on data wrangling, but there are many others dedicated to modeling and data science tools. Some excellent ones are:

Introduction to Machine Learning with Python by Andreas Mueller and Sarah Guido (O’Reilly)
Python Data Science Handbook by Jake VanderPlas (O’Reilly)
Data Science from Scratch: First Principles with Python by Joel Grus (O’Reilly)
Python Machine Learning by Sebastian Raschka (Packt Publishing)
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron (O’Reilly)

While books can be valuable resources for learning, they can sometimes grow out of date when the underlying open source software changes.
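
To make the opening remark about cross-validation concrete, here is a minimal scikit-learn example (mine, not the book’s): k-fold cross-validation fits the model k times, scoring each fit on a different held-out fold instead of relying on a single train/test split.

```python
# Minimal k-fold cross-validation with scikit-learn (illustrative, not from the book).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on four folds, score on the held-out fold, rotate, repeat.
scores = cross_val_score(model, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```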

The Python community has grown immensely, and the ecosystem of open source software around it has flourished. This new edition of the book would not exist if not for the tireless efforts of the pandas core developers, who have grown the project and its user community into one of the cornerstones of the Python data science ecosystem. These include, but are not limited to, Tom Augspurger, Joris van den Bossche, Chris Bartak, Phillip Cloud, gfyoung, Andy Hayden, Masaaki Horikoshi, Stephan Hoyer, Adam Klein, Wouter Overmeire, Jeff Reback, Chang She, Skipper Seabold, Jeff Tratner, and y-p. On the actual writing of this second edition, I would like to thank the O’Reilly staff who helped me patiently with the writing process.

Among interpreted languages, for various historical and cultural reasons, Python has developed a large and active scientific computing and data analysis community. In the last 10 years, Python has gone from a bleeding-edge or “at your own risk” scientific computing language to one of the most important languages for data science, machine learning, and general software development in academia and industry. For data analysis and interactive computing and data visualization, Python will inevitably draw comparisons with other open source and commercial programming languages and tools in wide use, such as R, MATLAB, SAS, Stata, and others.

pages: 337 words: 103,273

The Great Disruption: Why the Climate Crisis Will Bring on the End of Shopping and the Birth of a New World
by Paul Gilding
Published 28 Mar 2011

These gatherings have become key milestones measuring society’s progress on sustainability, or the lack of it, with a recent example being the Climate Conference in Copenhagen. The 1972 Stockholm Conference also established various global and regional scientific monitoring processes that helped provide the data scientists now use to measure the changing state of the global ecosystem. And in case you thought climate change was a recent issue, it was addressed at this meeting nearly forty years ago! The second key event of 1972 was the publication of The Limits to Growth. While commissioned by the Club of Rome, an international group of intellectuals and industrialists, the report was produced by MIT experts who were focused on system dynamics—taking the behavior of systems, rather than environmental issues, as their starting point.

Beautiful Visualization
by Julie Steele
Published 20 Apr 2010

In addition to working at the Times, Nick helped co-found NYCResistor, a hardware hacker space in Brooklyn, New York. He is also an adjunct professor at NYU in the Interactive Telecommunications program. Michael Driscoll fell in love with data visualization over a decade ago as a software engineer for the Human Genome Project. He is the founder and principal data scientist at Dataspora, an analytics consultancy in San Francisco. Jonathan Feinberg is a computer programmer who lives in Medford, Massachusetts, with his wife and two sons. Please write to him at jdf@pobox.com, especially if you know of any Boston-area Pad Thai that can go up against the Thai Café in Greenpoint, Brooklyn.

pages: 309 words: 96,168

Masters of Scale: Surprising Truths From the World's Most Successful Entrepreneurs
by Reid Hoffman , June Cohen and Deron Triff
Published 14 Oct 2021

* * * — It’s no surprise that a digital platform like Eventbrite relies heavily on data—as well as emotional connection—to inform the fuller picture of their customer. But you might be surprised to learn that Jenn Hyman’s clothing rental biz, Rent the Runway, is also deep into data, and always has been. “Actually, 80 percent of our corporate employees are engineers, data scientists, and product managers,” says Jenn. “We have very few people in merchandising and marketing. The first C-level hire that I made was a chief data officer, and he was in my first ten employees. From the very beginning of the company, we were thinking about data. “We are getting data from our customer over a hundred times a year,” says Jenn.

pages: 372 words: 101,678

Lessons from the Titans: What Companies in the New Economy Can Learn from the Great Industrial Giants to Drive Sustainable Success
by Scott Davis , Carter Copeland and Rob Wertheimer
Published 13 Jul 2020

We’ve heard dozens of companies adopt the lingo of continuous improvement and even some of the techniques. Most fail to make durable progress. Making an actual system work takes years of disciplined implementation. Software as a sector hasn’t come up with differentiated and systematic workflows. For companies like Uber, that is an expensive problem: 3,000 software engineers and data scientists in a company expected to lose $3 billion in operating profits on less than $20 billion in revenue. This failure to build in rigorous continuous improvement is unsurprising and certainly not unique to software or technology. As competition grows, winners will need a bigger moat. When Uber was founded in 2008, there were only a small handful of companies pursuing asset sharing.

pages: 359 words: 96,019

How to Turn Down a Billion Dollars: The Snapchat Story
by Billy Gallagher
Published 13 Feb 2018

Snapchat exec Sriram Krishnan later wrote about this Snapchat core belief after leaving the company, explaining how most companies measure a proxy metric for actual human behavior, since the latter is nearly impossible to measure perfectly. You convert a nebulous human emotion/behavior to a quantifiable metric you can align execution on and stick on a graph and measure teams on. Engineers and data scientists can’t do anything with “this makes people feel warm and fuzzy.” They can do a lot with “this feature improves metric X by 5% week-over-week.” Figuring out the connection between the two is often the art and science of product management. Krishnan explained how these metrics often have unforeseen side effects, as people focus on simply making the metric increase, but not in the way the original system designers intended: For example, in terms of what designers wanted, what they built/measured and what they unintentionally caused: Quality journalism → Measure Clicks → Creation of click-bait content. Marissa Mayer once tested forty-one different shades of blue on Google users to see which one would be most effective.

pages: 463 words: 105,197

Radical Markets: Uprooting Capitalism and Democracy for a Just Society
by Eric Posner and E. Weyl
Published 14 May 2018

Respondents in QV surveys also participate more actively, revising their answers to reflect their preferences much more frequently and often providing feedback that taking the QV survey had helped them learn their own preferences more accurately by forcing them to make difficult, even frustrating tradeoffs. To test whether QV manages to solve the problems with Likert, in 2016 Decide’s chief data scientist and now professor of mathematics education David Quarfoot, along with several co-authors, ran a nationally representative survey with thousands of participants who took versions of the same poll using Likert, QV, or both depending on which group they were assigned to.43 Figure 2.4 pictures a representative set of responses, on the question of repealing Obamacare, with the Likert survey on the left (with its signature W-shape) and the results from QV on the right.
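
A compressed reminder, not taken from the excerpt itself, of the mechanism behind those “difficult, even frustrating tradeoffs”: in quadratic voting, registering v votes on an issue costs v² credits out of a fixed budget B, so expressing intensity of preference gets progressively more expensive.

```latex
\[
\text{cost}(v_k) = v_k^{2}, \qquad \sum_{k} v_k^{2} \le B
\]
```

Here v_k is the number of votes a respondent puts on issue k, and B is the total credit budget allotted to that respondent in the survey.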

pages: 332 words: 100,601

Rebooting India: Realizing a Billion Aspirations
by Nandan Nilekani
Published 4 Feb 2016

Along with his colleagues Jeff Bezanson, Alan Edelman and Stefan Karpinski, Viral is a co-inventor of the Julia programming language. Julia is an open-source, high-performance programming language under development since 2009. It is commonly used by scientists in the physical and social sciences, engineers and data scientists for diverse purposes ranging from exploring the secrets of the universe to teasing out new insights from big data. While Julia is itself an open-source project that received contributions from scores of programmers worldwide, it has benefited from government research funding at the Massachusetts Institute of Technology, from US government agencies such as the Defense Advanced Research Projects Agency, the National Science Foundation and the Department of Energy.

pages: 346 words: 97,330

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass
by Mary L. Gray and Siddharth Suri
Published 6 May 2019

Right now, most ghost work platforms, particularly for micro-tasks working with AI training data, default to assuming that people are individual agents standing by and ready to jump into a task, with the latest software and a stable internet connection. For example, a manager might give a team of data scientists, hired through an online labor market, access to files and data, effectively bringing them into the enterprise for a short period of time. Then, once the project is completed, their access to the files and data would be revoked, returning them to a position outside of the enterprise. Managing the workflows—worker output and interactions with data—presents new challenges for porous enterprises mixing full-time employees and on-demand workers.

pages: 302 words: 100,493

Working Backwards: Insights, Stories, and Secrets From Inside Amazon
by Colin Bryar and Bill Carr
Published 9 Feb 2021

“How can we make a 44-inch TV with an HD display that can retail for $1,999 at a 25 percent gross margin?” or “How will we make a Kindle reader that connects to carrier networks to download books without customers having to sign a contract with a carrier?” or “How many new software engineers and data scientists do we need to hire for this new initiative?” In other words, the FAQ section is where the writer shares the details of the plan from a consumer point of view and addresses the various risks and challenges from internal operations, technical, product, marketing, legal, business development, and financial points of view.

pages: 329 words: 99,504

Easy Money: Cryptocurrency, Casino Capitalism, and the Golden Age of Fraud
by Ben McKenzie and Jacob Silverman
Published 17 Jul 2023

We are committed to being fully licensed and regulated around the world, and we were recently awarded virtual assets service provider licenses in Bahrain and Dubai.” Binance’s best defense may be to claim basic technical incompetence—perhaps network congestion really did lead to malfunctions in the company’s app. What actually happened on May 19 remains a mystery. But people like Carol Alexander and Matt Ranger, a data scientist and former professional poker player, propose that the platform’s problems may go beyond simple technical outages. In blog posts, academic papers, and conversations with journalists, they have argued that Binance has been outplayed in its own casino. According to their analysis, Binance has become the perfect playground for professional trading firms to clean up against unsophisticated retail traders.

pages: 335 words: 101,992

Not the End of the World
by Hannah Ritchie
Published 9 Jan 2024

But, as much as it pains me to admit, plastic also has a good side, one that doesn’t get the recognition it deserves. I started writing this book during the Covid-19 pandemic. It seems odd to say that writing about climate change, air pollution and deforestation has been an escape, but it’s true. While I’m an environmental scientist by training, recently my role has been very different. I’ve become a data scientist in epidemiology – a job I didn’t quite know I was signing up for. Since the early days of the pandemic my team at Our World in Data have been collecting, visualising and sharing the global data on the evolution of the pandemic, updating every day, for every country, for as many metrics as possible.

pages: 130 words: 43,665

Powerful: Teams, Leaders and the Culture of Freedom and Responsibility
by Patty McCord
Published 9 Jan 2018

People Learn to Welcome Criticism
Openly sharing criticism was one of the hardest parts of the Netflix culture for new employees to get used to, but most quickly came to appreciate how valuable the openness was. When I talked about this with one of our great team leaders, Eric Colson, he told me the giving and taking of honest feedback was central to how well his teams worked, and his teams worked beautifully. That’s why Eric rose to the position of VP of data science and engineering in less than three years at the company, having begun as an individual contributor. He’d been managing a small data analytics team at Yahoo! before coming to Netflix, and he recalled that the culture there was to be super supportive of people and not to criticize them. He told me that when he started getting critical feedback from colleagues at Netflix, “It hurt.

There’s a dangerous fallacy that data constitutes the facts you need to know to run your business. Hard data is absolutely vital, of course, but you also need qualitative insight and well-formulated opinions, and you need your team to debate those insights and opinions openly and with gusto.

Data Doesn’t Have an Opinion
I loved when we hired somebody new in data science, especially in the early days. We all had our own beliefs about customer behavior that they’d bust. In the beginning we opined about how the customer behaved based on ourselves as customers. We would argue back and forth, saying, “That’s not the way they watch; no, no, I don’t watch that way.” Then with the transition to streaming, we started to get actual viewing data.

The best employees are always looking for challenging new opportunities, and though they are usually intensely loyal, many of them will eventually seek those opportunities elsewhere. You can never know when they might decide to make a move, and often there is nothing you’ll be able to do to stop them. Earlier I mentioned Eric Colson. In less than three years, he rose from a data analyst position to the role of VP of data science and engineering, reporting directly to Reed and managing four big and very important teams. He had never expected to be given so much responsibility, certainly not so fast. He told me recently that he was, and still is, hugely grateful for the opportunities offered to him. He also loved the work he was doing at Netflix.

pages: 281 words: 71,242

World Without Mind: The Existential Threat of Big Tech
by Franklin Foer
Published 31 Aug 2017

Or if we want to be melodramatic about it, we could say Facebook is constantly tinkering with how its users view the world—always tinkering with the quality of news and opinion that it allows to break through the din, adjusting the quality of political and cultural discourse in order to hold the attention of users for a few more beats. But how do the engineers know which dial to twist and how hard? There’s a whole discipline, data science, to guide the writing and revision of algorithms. Facebook has a team, poached from academia, to conduct experiments on users. It’s a statistician’s sexiest dream—some of the largest data sets in human history, the ability to run trials on mathematically meaningful cohorts. When Cameron Marlow, the former head of Facebook’s data science team, described the opportunity, he began twitching with ecstatic joy. “For the first time,” Marlow said, “we have a microscope that not only lets us examine social behavior at a very fine level that we’ve never been able to see before but allows us to run experiments that millions of users are exposed to.”

For one group, Facebook excised the positive words from the posts in the News Feed; for another group, it removed the negative words. Each group, it concluded, wrote posts that echoed the mood of the posts it had reworded. This study was roundly condemned as invasive, but it is not so unusual. As one member of Facebook’s data science team confessed: “Anyone on that team could run a test. They’re always trying to alter people’s behavior.” There’s no doubting the emotional and psychological power possessed by Facebook—at least Facebook doesn’t doubt it. It has bragged about how it increased voter turnout (and organ donation) by subtly amping up the social pressures that compel virtuous behavior.

.* The formula supposedly illustrates how a piece of editorial content could go viral—how it could travel through the social networks to quickly reach a massive audience, as rapidly as smallpox ripped its way across North America. Peretti’s formula, in fact, came from epidemiology. The nod to science was intentional. With experimentation and careful reading of data, science could suggest which pieces had the best shot at achieving virality—and if not virality, then at least a robust audience. The emerging science of traffic was really a branch of behavioral psychology—people clicked so quickly, they didn’t always fully understand why they gravitated to one piece over another.

pages: 375 words: 88,306

The Sharing Economy: The End of Employment and the Rise of Crowd-Based Capitalism
by Arun Sundararajan
Published 12 May 2016

In a September 2014 panel discussion I participated in at the Techonomy Detroit conference, the moderator, Jennifer Bradley of the Aspen Institute, asked TaskRabbit’s president Stacy Brown-Philpot whether the platform had “flags or protections or things that could alert you to discrimination in the system or bad actors.” “We do. We have a data science team that we run [to] constantly to make sure we’re flagging and alerting human beings to actually go through and look at it,” Brown-Philpot replied, “and we actually track data on what drives somebody to select a tasker, and you can see all their pictures so you know what they look like, and the most important thing is a smile. That’s it.”30 Data science holds tremendous promise as a way to detect systemic forms of discrimination, often difficult to identify on a case-by-case basis during face-to-face interaction, but which may be brought to light and addressed with data analytics.

These include conversations with: Neha Gondal about the sociology of the sharing economy; Ravi Bapna, Verena Butt d’Espous, Juan Cartagena, Chris Dellarocas, Alok Gupta, and Sarah Rice about trust; Paul Daugherty, Peter Evans, Geoffrey Parker, Anand Shah, Marshall Van Alstyne, and Bruce Weinelt about platforms; Brad Burnham, Kanyi Maqubela, Simon Rothman, Craig Shapiro, and Albert Wenger about venture capital; Janelle Orsi, Nathan Schreiber, and Trebor Scholz about cooperatives; Umang Dua, Oisin Hanrahan, Micah Kaufmann, and Juho Makkonen about marketplace models; Gene Homicki about alternative rental models; Primavera De Filipi and Matan Field about the blockchain and decentralized peer-to-peer technologies; Ashwini Chhabra, Molly Cohen, Althea Erickson, David Estrada, Nick Grossman, David Hantman, Alex Howard, Meera Joshi, Veronica Juarez, Chris Lehane, Mike Masserman, Padden Murphy, Joseph Okpaku, Brooks Rainwater, April Rinne, Sofia Ranchordas, Michael Simas, Jessica Singleton, Adam Thierer, and Bradley Tusk about regulation; Elena Grewal, Kevin Novak, and Chris Pouliot about the use of data science in the sharing economy; Nellie Abernathy, Cynthia Estlund, Steve King, Wilma Liebman, Marysol McGee, Brian Miller, Michelle Miller, Caitlin Pearce, Libby Reder, Julie Samuels, Kristin Sharp, Dan Teran, Felicia Wong, and Marco Zappacosta about the future of work. I am also thankful to Congressman Darrell Issa, Congressman Eric Swalwell, and Senator Mark Warner for their leadership and for many conversations about critical sharing economy policy issues.

See also Car sharing platform, 77 trust and, 98, 145 Black Rock, 25 Blecharczyk, Nathan, 7, 8, 131 Blinder, Alan S., 159, 163–164 Blockchain, 18–19, 59–60, 86, 88 blockchain economies, 58–60, 87–95, 100–102 Blockchain: Blueprint for a New Economy (Swan), 93 Blurring of boundaries/lines, 8, 27, 46, 76, 141–142, 148, 171 Botsman, Rachel, 27–30, 35, 70, 81, 82 Bradley, Jennifer, 157 Brand-based trust, 144–146 Brastaviceanu, Tiberius, 199 Bresnahan, Timothy, 75 Brookings Institute, 179, 184 Brown-Philpot, Stacy, 157 Brynjolfsson, Erik, 75, 112, 165–166 Budelis, Katrina, 14 Burnham, Brad, 85–86, 187 Business of Sharing, The (Stephany), 28 Buterin, Vitalik, 95, 101–102 Button, 201 Byers, John W., 121 California Fruit Exchange, 197 California Labor Commissioner, 160 California Public Utilities Commission (CPUC), 153–154 Capalino, James, 136 Capital in the 21st Century (Piketty), 123 Card, David, 166 Car sharing, 1, 3. See also BlaBlaCar; Getaround; Lyft; Turo; Uber data science and, 157 La’Zooz, 94–95 local network effects, 119–120 regulatory challenges, 135 trust and, 98 Cartagena, Juan, 65 Castor, Emily, 9 Center for Global Enterprise, 119 Centers for Disease Control Chain, 86–87, 91 Chandler, Alfred, 4, 69, 71–72 Chase, Robin, 27, 66 Chen, Edward M., 160 Cheng, Denise, 184 Chéron, Guilhem, 16 Chesborough, Henry, 75–76 Chesky, Brian, 1, 7–9, 113, 122, 125, 131 Chhabra, Ashwini, 136 Chhabria, Vince, 160, 177 Choukrun, Marc-David, 16, 17 Clark, Shelby, 190 Clinton, Hillary, 161 Clothing and accessories rentals, 15–16 “Coase’s Penguin, or, Linux and the Nature of the Firm” (Benkler), 210n19 Cohen, Molly, 139, 153 Coleman, James, 60 Collaborative consumption, 28, 82 Collaborative economy, 25–28 Collaborative Economy Honeycomb, 82–84 Collaborative-Peer-Sharing Economy Summit, 105, 114–115 Commercial exchange, history of peer-to-peer, 4–5 Commons-based peer production, 30–32, 210n19 Community accommodation platforms and, 38–43 creation of, 36 funding, 41–43 human connectedness and, 44–46 service platforms and, 43–44 Congressional Sharing Economy Caucus, 137, 182 Consumer Electronic Show, 100 Consumerization of the digital, 54–55 Contractor dependent contractor, 183–184 versus employee, 159–160, 174–175, 178–182 Cooperatives; platform cooperativism, 19, 106, 178, 196–198 Couchsurfing, 38–40, 45, 121.

pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations
by Nicholas Carr
Published 5 Sep 2016

Technology would replace ideology. Today’s neobehavioralism has also been inspired by advances in computer technology, particularly the establishment of vast data banks of information on people’s behavior and the development of automated statistical techniques to parse the information. The MIT data scientist Alex Pentland, in his revealingly titled 2014 book Social Physics, offered something of a manifesto for the new behavioralism, using terms that, consciously or not, echoed what was heard in the early sixties: We need to move beyond merely describing social structure to building a causal theory of social structure.

pages: 385 words: 103,561

Pinpoint: How GPS Is Changing Our World
by Greg Milner
Published 4 May 2016

Whenever the satellite passed through the sky over a station, a sprawling network of ground antennas would measure the signal’s angle. That information would be sent back to Washington, converted into punch cards, and programmed into the computer. Between seven and nine hours after liftoff, the computer would have enough data to compute the satellite’s exact orbit and velocity. Minitrack had another component. Based on the data, scientists at the Smithsonian Astrophysical Observatory in Cambridge, Massachusetts, would calculate where the satellite would be most visible, and when. Scattered around the globe, in Florida, Mexico, Iran, Japan, and eight other locations, observation stations were established, each equipped with a camera that could locate an object up to 500 miles away, linked to a clock accurate to within a millisecond.

pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future
by Kevin Kelly
Published 6 Jun 2016

Over the next 30 years, the great work will be parsing all the information we track and create—all the information of business, education, entertainment, science, sport, and social relations—into their most primeval elements. The scale of this undertaking requires massive cycles of cognition. Data scientists call this stage “machine readable” information, because it is AIs and not humans who will do this work in the zillions. When you hear a term like “big data,” this is what it is about. Out of this new chemistry of information will arise thousands of new compounds and informational building materials.

pages: 385 words: 111,113

Augmented: Life in the Smart Lane
by Brett King
Published 5 May 2016

In the peer-reviewed paper released by Baylor College of Medicine and IBM, at the conclusion of the study, scientists were able to demonstrate a possible new path for generating scientific questions that may be helpful in the long-term development of new, effective treatments for disease. In a matter of weeks, biologists and data scientists, using Watson technology, accurately identified proteins that modify the p53 protein structure16. The study noted that this feat would have taken researchers years to accomplish without Watson’s cognitive capabilities. Watson analysed 70,000 scientific articles on p53 to predict proteins that turn on or off p53’s activity.

pages: 445 words: 105,255

Radical Abundance: How a Revolution in Nanotechnology Will Change Civilization
by K. Eric Drexler
Published 6 May 2013

From ancient photons captured by telescopes, astronomers infer the composition and motion of galaxies; by probing materials with electronic instruments, physicists infer the dynamics of systems of coupled electrons; by using their eyes and telemetry signals, ornithologists study how bar-tailed godwits cross from Alaska to New Zealand, spanning the Pacific in a single non-stop flight. Through data, scientists describe what they seek to explain. On the bridge to the top level of this sketch of science, concrete descriptions drive the evolution of theories, first by suggesting ideas about how the world works, and then by enabling tests of those ideas through an intellectual form of natural selection.

pages: 432 words: 106,612

Trillions: How a Band of Wall Street Renegades Invented the Index Fund and Changed Finance Forever
by Robin Wigglesworth
Published 11 Oct 2021

These days, even PhD economists aren’t guaranteed jobs in asset management, unless they have married their degree with a programming language like Python, which would allow them to parse vast digital datasets that are now commonplace, such as credit card data, satellite imagery, and consumer sentiment gleaned from continuously scraping billions of social media posts. Beating the market is not impossible. But the degree of difficulty in doing so consistently is far greater than it was in the past. Even giant, multibillion-dollar hedge funds staffed with an army of data scientists, programmers, rocket scientists, and the best financial minds in the industry can struggle to consistently outperform their benchmarks after fees. To use Mauboussin’s poker metaphor, not only are the remaining players around the table the best ones, but new ones entering the game are even more cunning, calculating, and inscrutable than in the past.* * * * ♦ THE RESULT IS THAT EVERY facet of the money management industry is being altered by the advent of index funds.

pages: 343 words: 102,846

Trees on Mars: Our Obsession With the Future
by Hal Niedzviecki
Published 15 Mar 2015

The games include Dungeon Scrawl, in which players have to move through quest-themed mazes and puzzles, and Wasabi Waiter, which, as you might imagine, is a game in which players have to match up sushi to an ever-growing number of customers. But here’s the rub. The games are carefully “designed by a team of neuroscientists, psychologists, and data scientists to suss out human potential.”35 According to Guy Halfteck, Knack’s founder, when you play any of the games, you generate massive amounts of data about how quickly you are able to solve problems and make the right decisions while multitasking and learning on the go. “The end result,” Halfteck says, “is a high-resolution portrait of your psyche and intellect, and an assessment of your potential as a leader or an innovator.”36 Hans Haringa, leader of petroleum giant Royal Dutch Shell’s GameChanger unit, asked about 1,400 people who had contributed ideas and proposals to the division to play Dungeon Scrawl and Wasabi Waiter.

pages: 370 words: 107,983

Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All
by Robert Elliott Smith
Published 26 Jun 2019

In the US, a program called Correctional Offender Management Profiling for Alternative Sanctions (Compas) has been used to inform the decisions of judges when assessing the likelihood of defendants reoffending, by comparing the big data of many past defendants to features of the individual facing time behind bars.2 The Los Angeles Police Department worked with data scientists at UCLA and Santa Clara University to develop PredPol, a predictive policing program that maps out ‘crime hotspots’ where police should concentrate their presence, patrols, intelligence-gathering exercises and other efforts to prevent crime from happening, because according to Modesto Police Chief Galen Carroll, ‘burglars and thieves work in a mathematical way, whether they know it or not’.3 In each case, it’s assumed that the evaluation of data about what some people have done in the past can predict the propensities of what other people will do in the future.

pages: 361 words: 107,461

How I Built This: The Unexpected Paths to Success From the World's Most Inspiring Entrepreneurs
by Guy Raz
Published 14 Sep 2020

While it was true that Stitch Fix would be selling clothes, it was the technology that was going to make the business work, and there was only one place in the country in 2011 that had the kind of concentration of technology talent Katrina would need to scale up as quickly as she wanted. That place was Silicon Valley. Almost overnight, the gravity shifted. “The talent is here,” she said. “We have many, many data scientists and many, many engineers, and if we wanted to fulfill the vision of using technology to deliver our service, it would have been very difficult to do it elsewhere.” A couple years later, Dropbox CEO Drew Houston, who like Katrina had come up with his idea and founded his company while in school in Boston and then moved to San Francisco to launch it in earnest, gave the commencement address to the graduating class at his alma mater, MIT.

pages: 387 words: 106,753

Why Startups Fail: A New Roadmap for Entrepreneurial Success
by Tom Eisenmann
Published 29 Mar 2021

So, Nagaraj moved to Palo Alto after graduation to pursue his vision alone. He recalled, “I had blind faith, but I had no validation for my idea, no investors, no product, and no team. I look back and wonder how I ever did it.” Upon arrival, Nagaraj recruited two new co-founders: an engineer and a data scientist. The team finished building the first version of Triangulate’s matching engine in October 2009. This version automated the collection of users’ digital information from sites like Facebook, Twitter, and Netflix using browser plug-ins and application programming interfaces (APIs). To avoid all the technical jargon, Nagaraj referred to these plug-ins and APIs as “life-stream connectors.”

The Smart Wife: Why Siri, Alexa, and Other Smart Home Devices Need a Feminist Reboot
by Yolande Strengers and Jenny Kennedy
Published 14 Apr 2020

Gemma Hartley, Fed Up: Emotional Labor, Women, and the Way Forward (New York: HarperOne, 2018).
72. Schiller and McMahon, “Alexa, Alert Me When the Revolution Comes.”
73. Schiller and McMahon, “Alexa, Alert Me When the Revolution Comes,” 185, citing a 2017 job description for a position as a data scientist in the “Alexa Engine” team.
74. Schiller and McMahon, “Alexa, Alert Me When the Revolution Comes,” 185.
75. Emma, “The Gender Wars of Household Chores: A Feminist Comic,” Guardian, May 26, 2017, https://www.theguardian.com/world/2017/may/26/gender-wars-household-chores-comic.
76. Caroline Criado Perez, Invisible Women: Data Bias in a World Designed for Men (New York: Abrams, 2019).
77.

pages: 385 words: 106,848

Number Go Up: Inside Crypto's Wild Rise and Staggering Fall
by Zeke Faux
Published 11 Sep 2023

CHAPTER SEVEN
“A Thin Crust of Ice”
The investor-protection bureau operated out of an office tower in New York’s Financial District. Under former attorney general Eliot Spitzer, it was feared by Wall Street investment banks, but in the years since, its resources had dwindled. It employed few of the data scientists and economists needed to analyze financial markets, and certainly no cryptocurrency specialists. In 2017, John Castiglione, then thirty-eight, and a colleague named Brian Whitehurst were assigned to investigate the cryptocurrency market. It was a huge assignment for such a small team, especially given that neither of them had specialized knowledge of the industry.

pages: 338 words: 105,112

Life as We Made It: How 50,000 Years of Human Innovation Refined--And Redefined--Nature
by Beth Shapiro
Published 15 Dec 2021

In 2011, the AKC began registering Dalmatians that passed this genetic test, paving the way for genetic solutions for other diseases caused by breeding selection. Genetic tests like that for hyperuricosuria have begun to change how breeding choices are made. Global DNA sequencing collaborations have created reference databases for bears, dogs, cattle, apples, maize, tomatoes, and dozens of other domesticated lineages. With these data, scientists and breeders are discovering which genes map to which traits and using this information to match breeds to environments, to move traits among populations and breeds, and to avoid the propagation of maladaptive genes. The consequences are significant: a 2019 report by Thomas Lewis and Cathryn Mellersh, both associated with The Kennel Club in the United Kingdom, found that several genetic diseases, including exercise-induced collapse in Labrador retrievers, early-onset cataracts in Staffordshire bull terriers, and progressive blindness in cocker spaniels, have been nearly eliminated among Kennel Club–registered dogs since genomic tests have become available.

pages: 321 words: 105,480

Filterworld: How Algorithms Flattened Culture
by Kyle Chayka
Published 15 Jan 2024

Streaming-era songs are often brief, too—Grimes, for the deluxe version of her 2020 album Miss Anthropocene, released a few “Algorithm Mix” iterations of songs, cutting down their run time, making them denser and more immediately compelling, better for algorithmic feeds. (It’s not dissimilar to the “radio mixes” of the past.) On average, hit songs have gotten shorter in the past two decades, decreasing by almost fifty seconds, from 4:30 in 1995 to 3:42 in 2019. Data scientists at UCLA calculated that the average length of a song released in 2020 on Spotify was just 3:17, and that length is trending even shorter. The musicologist Nate Sloan has argued that the collective shortening is caused by the incentives of streaming services—Spotify, for example, counts thirty seconds of listening as a “play” and pays out royalties based on that metric.

pages: 704 words: 182,312

This Is Service Design Doing: Applying Service Design Thinking in the Real World: A Practitioners' Handbook
by Marc Stickdorn , Markus Edgar Hormess , Adam Lawrence and Jakob Schneider
Published 12 Jan 2018

Lessons learned
A valuable lesson was how the insights from the data science informed the ethnography (e.g., revealing how mental and physical health are related), and how the ethnography informed the data science (e.g., highlighting the non-health needs of those on health-related benefits). There is huge power in using these two techniques together, with the data science giving the broad, large-scale “what” and the ethnography providing the deep, rich “why.”
KEY TAKEAWAYS
01 Data science can inform ethnographic insights (and vice versa) through correlation of different events.
02 Combine data science to understand the large-scale context with ethnography to determine the deeper meaning or “why” of your research.
03 When conducting research, speak with people from all ages, levels, and perspectives.

The approach
The UK government’s Policy Lab and the joint Work & Health Unit (a joint unit sponsored by the Department of Health and Department for Work and Pensions) created a multidisciplinary team with the service design agency Uscreates, ethnography agency Keep Your Shoes Dirty, and data science organization Mastodon C, and involved around 70 service providers, users, and stakeholders to solve the problem. After a three-day sprint to properly diagnose the problem, we embarked on a discovery phase of ethnography and data science, and a develop phase where we co-designed and prototyped ideas which we are now taking to scale. We conducted ethnography with 30 users and people that supported them: doctors, employers, Jobcentre staff, and community groups.
The insights
We used data science techniques (Sankey analysis and k-means clustering) to look at patterns of people surveyed through the Understanding Society survey.
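The excerpt above names k-means clustering as one of the techniques applied to the Understanding Society survey. As a rough illustration of that kind of analysis only (not the Policy Lab's actual pipeline), a minimal scikit-learn sketch with invented column names and an assumed k of 4 might look like this:

```python
# Illustrative only: a minimal k-means pass over survey-style data.
# Column names, values, and n_clusters=4 are assumptions for the sketch.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical extract of survey responses (one row per respondent).
survey = pd.DataFrame({
    "self_rated_health":     [2, 4, 5, 1, 3, 4, 2, 5],
    "months_on_benefits":    [18, 2, 0, 30, 6, 1, 24, 0],
    "weekly_social_contact": [1, 5, 6, 0, 3, 4, 1, 7],
})

# Standardise features so no single scale dominates the distance metric.
X = StandardScaler().fit_transform(survey)

# Group respondents into clusters, then inspect cluster means to interpret them.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
survey["cluster"] = kmeans.labels_
print(survey.groupby("cluster").mean())
```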

CAT DREW — SENIOR POLICY DESIGNER, UK POLICY LAB
Cat is a hybrid policy maker and designer with more than 10 years of experience working in government, including at the Cabinet Office and No. 10. She also holds a postgraduate degree in Design. This allows her to seek out innovative new practices (such as speculative design, data visualization, and combining rich user insight and big data science) and experiment with how they could work in government.
CHRIS FERGUSON — CEO, BRIDGEABLE
Chris is a service design leader and CX strategist who works with complex organizations such as Roche, TELUS, Genentech, RBC, and Mount Sinai Hospital to increase the impact of their services. He is the Founder and CEO of Bridgeable, a lecturer at the University of Toronto’s Rotman School of Management, and the Co-Founder of the Canadian chapter of the Service Design Network.

pages: 229 words: 75,606

Two and Twenty: How the Masters of Private Equity Always Win
by Sachin Khajuria
Published 13 Jun 2022

Just by examining deals, as well as doing them, you gain valuable investing experience—including sector expertise and contacts with management teams. And your picture about the macroeconomy sharpens markedly, too. You develop deep knowledge about the state of the economy by analyzing its component parts. You have an edge. Once you have that edge, you do not stop. You do not stand still; you invest in data science and machine learning to harness more of the power of the information you gather. And when you combine that power with your track record for investing and the enormous reserves of cash you have on tap to do deals, you start to outpace your rivals more and more. It is a virtuous circle, with the odd mishap along the way, a deal here or there that fails and can’t be salvaged.

It is clear to the special committee that although the two firms were founded in the same year, the Firm is light-years ahead. It has grown to be more than three times larger than Madison Stone in assets under management across its investment strategies for private capital. The Firm has the best tools and people in data science, information technology, financial reporting, risk management, and environmental impact—part of the core infrastructure of a modern private equity firm. The Firm already has a preliminary view on each asset in Madison Stone’s portfolio, using its library—of the macro picture and the micro trends of the customers, suppliers, and supply and demand.

They have achieved this evolution, remarkably, without diluting the cultures and working practices that are specific to their firms and the traits of success we have been discussing. Lateral hires have been integrated into partnerships and senior management layers, and first-class talent has been brought in to run key parts of a firm’s infrastructure from the CFO’s office to human capital to data science. Where it has made sense to seed or buy a stake in another firm rather than start up a new industry vertical, they have taken the opportunity to do so. Double-digit growth, delivered each year. Whisper it softly…but the truth is that comparing what private equity firms used to be—and where the perception of private equity still sits in many quarters—to what they are now is like comparing a Motorola cellphone from the 1990s to the latest iPhone.

pages: 298 words: 43,745

Understanding Sponsored Search: Core Elements of Keyword Advertising
by Jim Jansen
Published 25 Jul 2011

Therefore, the prediction was that black swans cannot exist. However, black swans do exist, being native to Australia. Basically, in the end, we cannot prove that something will or will not occur just because it occurred or did not occur in the past. However, this does not mean that we cannot do anything with data. Scientists have gotten around this touchy point by using data only to disprove something. That is, empirically, we can show that there is evidence to disprove a hypothesis, but we cannot prove a hypothesis is true. The best we can say is that a hypothesis is supported based on the data. We see this in the warnings in the marketing literature of financial investments – “Past performance is not a guarantee of future success.”

pages: 373 words: 112,822

The Upstarts: How Uber, Airbnb, and the Killer Companies of the New Silicon Valley Are Changing the World
by Brad Stone
Published 30 Jan 2017

Once again seeking to take advantage of the attention that comes with an onstage appearance at an industry conference, Kalanick wanted to launch the company’s first international city during LeWeb, the European technology confab where, three years before, he and Camp had hashed over plans for a hypothetical on-demand car service. By then the startup had finally moved to its own office, on the seventh floor of 800 Market Street. It had a round conference room with broad windows that opened up onto Market, the city’s main commercial artery. There were twenty employees in the new office, mostly engineers and data scientists, and another dozen in the field. The engineers rebelled against the idea of opening overseas so soon. Launching in Paris required accepting foreign credit cards, converting euros to dollars, and translating the app into French, among other tasks. Kalanick simply directed his team to work harder.

pages: 501 words: 114,888

The Future Is Faster Than You Think: How Converging Technologies Are Transforming Business, Industries, and Our Lives
by Peter H. Diamandis and Steven Kotler
Published 28 Jan 2020

They leverage (that is, rent out) the assets (spare bedrooms) of the crowd. These models also lean on staff-on-demand, which provides a company with the agility needed to adapt to a rapidly changing environment. Sure, this once meant call centers in India, but today it’s everything from micro-task laborers behind Amazon’s Mechanical Turk on the low end to Kaggle’s data scientist-on-demand service on the high end. The Free/Data Economy: This is the platform version of the “bait and hook” model, essentially baiting the customer with free access to a cool service (like Facebook) and then making money off the data gathered about that customer (also like Facebook). It also includes all the developments spurred by the big data revolution, which is allowing us to exploit micro-demographics like never before.

pages: 385 words: 112,842

Arriving Today: From Factory to Front Door -- Why Everything Has Changed About How and What We Buy
by Christopher Mims
Published 13 Sep 2021

Upward mobility may be at historic lows in the United States, but education and hard work still propel some out of these jobs. Amazon itself announced in July 2019 that it would spend more than $700 million to “upskill” 100,000 of its employees, propelling them into roles including “data mapping specialist, data scientist, solutions architect and business analyst, as well as logistics coordinator, process improvement manager and transportation specialist.” I asked associates about this program and the opportunities for training at Amazon in general. Associates in the Baltimore fulfillment center said that the most popular course of study for associates there was a commercial driver’s license.

pages: 354 words: 118,970

Transaction Man: The Rise of the Deal and the Decline of the American Dream
by Nicholas Lemann
Published 9 Sep 2019

What radio had been for Roosevelt, a new mass medium that offered unprecedented possibilities for a politician who wanted to connect with the public, the new online networks—which by now had far bigger audiences than newspapers, radio, or television—were for Obama. The White House hired a former LinkedIn executive, DJ Patil, as its first chief data scientist. LinkedIn provided proprietary data about the employment market to the White House, to be used in the annual “Economic Report of the President.” When the website associated with Obama’s health-care reform legislation had an unsuccessful debut, Hoffman was part of a group of Silicon Valley executives that organized a rescue operation.

pages: 521 words: 110,286

Them and Us: How Immigrants and Locals Can Thrive Together
by Philippe Legrain
Published 14 Oct 2020

‘If you want to attract the best talent, you need to be reflective of the talent in that market,’ says Eileen Taylor, Deutsche Bank’s global head of diversity.36 Diverse perspectives can also help generate better solutions to problems. In Chapter 9 on the deftness dividend, we met Rúben, the Portuguese head of design at Century Tech, a London-based edtech company. He manages a team made up of a Lithuanian designer, a Dutch one and three Britons. Century also employs a Chinese-born data scientist who studied in Toronto, San Diego and London, both a Brazilian engineer and an Argentinian one who previously worked in Germany, a developer who is an Eritrean-born Swede, an account manager from Lebanon and a project manager from Ukraine, among others. ‘It’s great to work with a diverse team with people from different backgrounds,’ Rúben remarks.

pages: 463 words: 115,103

Head, Hand, Heart: Why Intelligence Is Over-Rewarded, Manual Workers Matter, and Caregivers Deserve More Respect
by David Goodhart
Published 7 Sep 2020

However, our experience suggests that many of the recipients of professional advice are in fact seeking a reliable solution or outcome rather than a trusted adviser per se.”9 Like Haldane and Baldwin, the Susskinds advise young people either to look for jobs that favor human capabilities over artificial intelligence—above all, creativity and empathy—or to become directly involved in the design and delivery of these increasingly capable systems as a data scientist or knowledge engineer.
Less Room at the Top
There is plenty of plausible skepticism about how swiftly AI is going to advance. Yet even if some of the predictions turn out to be overenthusiastic, there is other evidence for the decline and fall of the knowledge worker all around us: the decline in the graduate pay premium, especially in the United Kingdom; the increasing number of graduates in nongraduate jobs; and even the shrinkage, or at least slower growth, of the top managerial and professional social class.

pages: 381 words: 113,173

The Geek Way: The Radical Mindset That Drives Extraordinary Results
by Andrew McAfee
Published 14 Nov 2023

It turns out that overconfidence and confirmation bias seem to be as common among scientists as they are among people in other professions. In his 2020 book, The Knowledge Machine, philosopher Michael Strevens concludes that scientists are just as biased and mentally sloppy as the rest of us: “In their thinking about the connection between theory and data, scientists seem scarcely to follow any rules at all.” Kahneman and his colleagues on the textbook project, in fact, followed a completely unscientific rule of “ignore data you don’t like.” What caused them to do this? And whatever it is, how can it possibly be a feature instead of a bug—how can it be useful for us to be so chronically biased in our thinking?

pages: 421 words: 120,332

The World in 2050: Four Forces Shaping Civilization's Northern Future
by Laurence C. Smith
Published 22 Sep 2010

From this model and others, we see that by midcentury the Mediterranean, southwestern North America, north and south Africa, the Middle East, central Asia and India, northern China, Australia, Chile, and eastern Brazil will be facing even tougher water-supply challenges than they do today. One model even projects the eventual disappearance of the Jordan River and the Fertile Crescent267—the slow, convulsing death of agriculture in the very cradle of its birth. Computer models like these aren’t built and run in a vacuum. They are built and tuned using whatever real-world data scientists can get their hands on. Take, for example, the western United States. In Kansas, falling water tables from groundwater mining are already drying up the streams that refill four federal reservoirs; another in Oklahoma is now bone-dry. These past observed trends, together with reasonable expectations of climate change, suggest that over half of the region’s surface water supply will be gone by 2050.268 Kevin Mulligan’s projection of the remaining life of the southern Ogallala Aquifer requires no climate models at all—it simply subtracts how much water we are currently pumping from what’s left in the ground, then counts down the remaining years until the water is gone.
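Mulligan's "no climate model needed" countdown is simple enough to write down directly. The numbers below are invented placeholders (the real figures vary by county); the point is only that the projection is a single division:

```python
# Toy version of the aquifer countdown described above.
# Volumes and pumping rate are invented, not Mulligan's data.
remaining_storage_km3 = 150.0       # usable water still in the ground
net_withdrawal_km3_per_year = 5.0   # pumping minus natural recharge

years_until_depleted = remaining_storage_km3 / net_withdrawal_km3_per_year
print(f"Water remaining for roughly {years_until_depleted:.0f} more years")  # ~30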

pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr
by Doug Turnbull and John Berryman
Published 30 Apr 2016

Often simpler relevance gains can be gathered with the straightforward techniques discussed earlier in this book. In our consulting work, we’re often hired to implement an advanced solution when a far simpler adjustment can provide more immediate and less risky gains for an organization. You don’t need data scientists to provide a simple tweak to an analyzer or query strategy that gains you a significant—and with test-driven relevancy—measurable improvement to search’s bottom line. Nevertheless, with the right expertise and data in place, learning to rank can be extremely powerful in helping push beyond the “diminishing returns” of relevance tuning.
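To give a sense of what such a "simple tweak to an analyzer" can look like in Elasticsearch (one of the engines the book covers), the sketch below defines a custom English analyzer with stopword removal and stemming. The index field and analyzer names are illustrative assumptions, and whether a change like this helps should be confirmed by measuring relevance before and after, as the authors advise:

```python
# Illustrative Elasticsearch index settings: a custom analyzer that lowercases,
# removes English stopwords, and stems tokens. Names here are placeholders.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "english_stop":    {"type": "stop", "stopwords": "_english_"},
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "english_tuned": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_stop", "english_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english_tuned"}
        }
    },
}
# Apply these settings when creating the index, then re-run relevance tests
# to check that the tweak actually moves your search metrics.
```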

pages: 401 words: 119,488

Smarter Faster Better: The Secrets of Being Productive in Life and Business
by Charles Duhigg
Published 8 Mar 2016

Since I contacted Gawande four years ago, I’ve sought out neurologists, businesspeople, government leaders, psychologists, and other productivity experts. I’ve spoken to the filmmakers behind Disney’s Frozen, and learned how they made one of the most successful movies in history under crushing time pressures—and narrowly averted disaster—by fostering a certain kind of creative tension within their ranks. I talked to data scientists at Google and writers from the early seasons of Saturday Night Live who said both organizations were successful, in part, because they abided by a similar set of unwritten rules regarding mutual support and risk taking. I interviewed FBI agents who solved a kidnapping through agile management and a culture influenced by an old auto plant in Fremont, California.

pages: 457 words: 126,996

Hacker, Hoaxer, Whistleblower, Spy: The Story of Anonymous
by Gabriella Coleman
Published 4 Nov 2014

Even if Anonymous could never replicate the high levels of its LulzSec/AntiSec days (back when Fox News’ description of them as “hackers on steroids” was apt), Anonymous carried along just fine through 2012 and much of 2013, executing major hacks and attacks across the world. Its formidable reputation is best illustrated by an anecdote about the highest echelons of US officialdom. In 2012, Barack Obama’s reelection campaign team assembled a group of programmers, system administrators, mathematicians, and data scientists to fine-tune voter targeting. Journalists praised Obama’s star-studded and maverick technology team, detailing its members’ hard work, success, and travails, and ultimately heralding the system as a success. These articles, however, failed to report one of the team’s big concerns. Throughout the campaign, the technologists had treated Anonymous as a potentially even bigger nuisance than the foreign state hackers who had infiltrated the McCain and Obama campaigns in 2008.38 In late November 2012, Asher Wolf, a geek crusader who acted as a sometimes-informal-adviser to Anonymous, noticed that Harper Reed, the chief technologist for Obama’s reelection tech team, followed @AnonyOps on Twitter.

pages: 432 words: 124,635

Happy City: Transforming Our Lives Through Urban Design
by Charles Montgomery
Published 12 Nov 2013

They say they want economic development, livability, mobility, housing affordability, taxes, all stuff that relates to happiness.” These are just the concerns that have caused us to delay action on climate change. But Boston insists that by focusing on the relationship between energy, efficiency, and the things that make life better, cities can succeed where scary data, scientists, logic, and conscience have failed. The happy city plan is an energy plan. It is a climate plan. It is a belt-tightening plan for cash-strapped cities. It is also an economic plan, a jobs plan, and a corrective for weak systems. It is a plan for resilience. The Green Surprise Consider the by-product of the happy city project in Bogotá.

pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence
by John Brockman
Published 5 Oct 2015

Next question. An executive might ask, “The algorithm is doing well on loan applications in the United Kingdom. Will it also do well if we deploy it in Brazil?” There’s no satisfying answer here, either; we’re not good at assessing how well a highly optimized rule will transfer to a new domain. A data scientist might say, “We know how well the algorithm does with the data it has. But surely more information about the consumers would help it. What new data should we collect?” Our human domain knowledge suggests lots of possibilities, but with an incomprehensible algorithm we don’t know which of these possibilities will help it.

pages: 756 words: 120,818

The Levelling: What’s Next After Globalization
by Michael O’sullivan
Published 28 May 2019

India is a case in point: from 1991 to 2011 the number of internal migrants more than doubled. In 60 percent of cases, global migration consists of people moving to neighboring countries. As an example, a large proportion of Indian migrants move to regional neighbors, such as the United Arab Emirates, Kuwait, and Saudi Arabia.7 An excellent resource here comes from the data scientist Max Galka, who graphically tracks the flow of immigrants across the world.8 He produces long-term charts showing the waves of immigration into the United States over the past two centuries. His data show that America is founded on a bedrock of German, Irish, Italian, and eastern European immigrants and that lately (since the late 1980s) the biggest flow of immigrants has come from Mexico.

pages: 578 words: 131,346

Humankind: A Hopeful History
by Rutger Bregman
Published 1 Jun 2020

Psychologically, physiologically, neurologically – they must be every kind of screwed up. They must be psychopaths, or maybe they never went to school, or grew up in abject poverty – there must be something to explain why they deviate so far from the average person. Not so, say sociologists. These stoic data scientists have filled miles of Excel sheets with the personality traits of people who have blown themselves up, only to conclude that, empirically, there is no such thing as an ‘average terrorist’. Terrorists span the spectrum from highly to hardly educated, from rich to poor, silly to serious, religious to godless.

pages: 385 words: 123,168

Bullshit Jobs: A Theory
by David Graeber
Published 14 May 2018

Then the total number of “fails” in each department would be turned over to be tabulated by a metrics division, thus allowing everyone involved to spend hours every week in meetings arguing over whether any particular “fail” was real.
Irene: There was an even higher caste of bullshit, propped atop the metrics bullshit, which were the data scientists. Their job was to collect the fail metrics and apply complex software to make pretty pictures out of the data. The bosses would then take these pretty pictures to their bosses, which helped ease the awkwardness inherent in the fact that they had no idea what they were talking about or what any of their teams actually did.

pages: 480 words: 119,407

Invisible Women
by Caroline Criado Perez
Published 12 Mar 2019

A widely quoted 1967 psychological paper had identified a ‘disinterest in people’ and a dislike of ‘activities involving close personal interaction’ as a ‘striking characteristic of programmers’.62 As a result, companies sought these people out, they became the top programmers of their generation, and the psychological profile became a self-fulfilling prophecy. This being the case, it should not surprise us to find this kind of hidden bias enjoying a resurgence today courtesy of the secretive algorithms that have become increasingly involved in the hiring process. Writing for the Guardian, Cathy O’Neil, the American data scientist and author of Weapons of Math Destruction, explains how online tech-hiring platform Gild (which has now been bought and brought in-house by investment firm Citadel63) enables employers to go well beyond a job applicant’s CV, by combing through their ‘social data’.64 That is, the trace they leave behind them online.

pages: 402 words: 126,835

The Job: The Future of Work in the Modern Era
by Ellen Ruppel Shell
Published 22 Oct 2018

the Weather Channel broadcasts 18 million forecasts: John Koetsier, “Data Deluge: What People Do on the Internet, Every Minute of Every Day,” Inc.com, July 25, 2017, https://www.inc.com/john-koetsier/every-minute-on-the-internet-2017-new-numbers-to-b.html.
continuously improving its performance: Many thanks to the very kind and patient data scientists who helped clarify this for me, and also see Christof Koch, “How the Computer Beat the Go Master,” Scientific American, March 19, 2016, https://www.scientificamerican.com/article/how-the-computer-beat-the-go-master/.
the third leading cause of death in America: Martin A. Makary and Michael Daniel, “Medical Error: The Third Leading Cause of Death in the US,” British Medical Journal, May 3, 2016, i2139, http://dx.doi.org/doi:10.1136/bmj.i2139.

The Ethical Algorithm: The Science of Socially Aware Algorithm Design
by Michael Kearns and Aaron Roth
Published 3 Oct 2019

We shall grapple here with difficult and consequential issues, but at the same time we want to communicate the excitement of new science. It is in this spirit of uncertainty and adventure that we begin our investigations. 1 Algorithmic Privacy From Anonymity to Noise “Anonymized Data Isn’t” It has been difficult for medical research to reap the fruits of large-scale data science because the relevant data is often highly sensitive individual patient records, which cannot be freely shared. In the mid-1990s, a government agency in Massachusetts called the Group Insurance Commission (GIC) decided to help academic researchers by releasing data summarizing hospital visits for every state employee.
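Data releases like the GIC's are famously re-identifiable because a few innocuous fields, taken together, single people out. A quick way to gauge that risk before publishing a table is to count how many records share each quasi-identifier combination; the pandas sketch below uses invented records and is only an illustration of that check, not the authors' method:

```python
# Minimal re-identification risk check: how many records are unique on the
# quasi-identifiers (birth date, sex, zip code)? Data below is invented.
import pandas as pd

records = pd.DataFrame({
    "birth_date": ["1945-07-31", "1945-07-31", "1960-01-02", "1972-09-15"],
    "sex":        ["M", "M", "F", "F"],
    "zip":        ["02138", "02139", "02139", "02472"],
    "diagnosis":  ["a", "b", "c", "d"],   # the sensitive attribute
})

quasi_identifiers = ["birth_date", "sex", "zip"]
group_sizes = records.groupby(quasi_identifiers).size()

# k-anonymity: the smallest group size; k == 1 means someone is uniquely exposed.
k = group_sizes.min()
unique_rows = (group_sizes == 1).sum()
print(f"k-anonymity = {k}; {unique_rows} quasi-identifier combinations are unique")
```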

But now that we know this, can the problem of privacy be solved by simply concealing information about birthdate, sex, and zip code in future data releases? It turns out that lots of less obvious things can also identify you—like the movies you watch. In 2006, Netflix launched the Netflix Prize competition, a public data science competition to find the best “collaborative filtering” algorithm to power Netflix’s movie recommendation engine. A key feature of Netflix’s service is its ability to recommend to users movies that they might like, given how they have rated past movies. (This was especially important when Netflix was primarily a mail-order DVD rental service, rather than a streaming service—it was harder to quickly browse or sample movies.)
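The Netflix Prize task mentioned above is collaborative filtering: predicting a user's rating for an unseen movie from the ratings of similar users. A bare-bones user-based version, using cosine similarity over a tiny invented ratings matrix (nothing like the prize-winning systems), looks like this:

```python
# Toy user-based collaborative filtering: predict a missing rating as a
# similarity-weighted average of other users' ratings. Data is invented.
import numpy as np

# Rows = users, columns = movies; 0 means "not rated yet".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 3, 1],
    [1, 1, 0, 5],
    [1, 2, 4, 4],
], dtype=float)

def cosine(u, v):
    mask = (u > 0) & (v > 0)              # compare only co-rated movies
    if not mask.any():
        return 0.0
    return float(np.dot(u[mask], v[mask]) /
                 (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask])))

def predict(user, movie):
    sims, vals = [], []
    for other in range(ratings.shape[0]):
        if other != user and ratings[other, movie] > 0:
            sims.append(cosine(ratings[user], ratings[other]))
            vals.append(ratings[other, movie])
    sims, vals = np.array(sims), np.array(vals)
    return float(np.dot(sims, vals) / sims.sum()) if sims.sum() > 0 else np.nan

print(predict(user=0, movie=2))  # estimate user 0's rating for movie 2
```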

These days virtually every major research university makes grand claims to interdisciplinarity, but Penn is the real deal. For that we give warm thanks to Eduardo Glandt, Amy Gutmann, Vijay Kumar, Vincent Price, and Wendell Pritchett. We are particularly grateful to Fred and Robin Warren, founders and benefactors of Penn’s Warren Center for Network and Data Sciences, for helping to create the remarkable intellectual melting pot that allowed this book to develop. Many thanks to Lily Hoot of the Warren Center for her unflagging professionalism and organizational help. We also are grateful to Raj and Neera Singh, founders and benefactors of Penn’s Networked and Social Systems Engineering (NETS) Program, in which we developed much of the narrative expressed in these pages.

pages: 139 words: 35,022

Roads and Bridges
by Nadia Eghbal

NumFOCUS is an example of a 501(c)(3) foundation that supports open source scientific software through fiscal sponsorship and donations. [190] An external foundation model could help provide the support that scientific software needs within the context of an academic environment. The Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation are also experimenting with ways to connect academic institutions with maintainers of data science software, in order to facilitate an open and sustainable ecosystem. [191] Opportunities Ahead Developing effective support strategies Although there is growing interest in efforts to support digital infrastructure, current initiatives are still new, ad hoc or provide only partial support (such as fiscal sponsorship).

The enormous social contributions of today’s digital infrastructure cannot be ignored or argued away, as has happened with other, equally important debates about data and privacy, net neutrality, or private versus public interests. This makes it easier to shift the conversation to solutions. Secondly, there are already engaged, thriving open source communities to work with. Many developers identify with the programming language they use (such as Python or JavaScript), the function they provide (such as data science or devops), or a prominent project (such as Node.js or Rails). These are strong, vocal, and enthusiastic communities. The builders of our digital infrastructure are connected to each other, aware of their needs, and technically talented. They already built our city; we just need to help keep the lights on so they can continue doing what they do best.

pages: 344 words: 104,077

Superminds: The Surprising Power of People and Computers Thinking Together
by Thomas W. Malone
Published 14 May 2018

For a description of the invention of the process for printing on Pringles, see Larry Huston and Nabil Sakkab, “Connect and Develop: Inside Procter & Gamble’s New Model for Innovation,” Harvard Business Review, March 2006, reprint no. R0603C, https://hbr.org/2006/03/connect-and-develop-inside-procter-gambles-new-model-for-innovation.
6. Vincent Granville, “21 Data Science Systems Used by Amazon to Operate Its Business,” Data Science Central, November 19, 2015, http://www.datasciencecentral.com/profiles/blogs/20-data-science-systems-used-by-amazon-to-operate-its-business.
7. Martin Reeves and Daichi Ueda use the term integrated strategy machine to describe a somewhat similar idea in “Designing the Machines That Will Design Strategy,” Harvard Business Review, April 18, 2016, https://hbr.org/2016/04/welcoming-the-chief-strategy-robot.

For instance, if the people who submit proposed strategies for all the parts of your business include revenue and expense projections, then spreadsheets (or other simple programs) can do a good job of estimating the consolidated earnings for your whole company. Or if you’ve already done enough market research to have good automated models of how different customers respond to price changes, then you could use those models to estimate your revenue at different price points. For instance, Amazon has done vast amounts of data-science work to develop detailed models of many parts of its business: how customers respond to prices, ads, and recommendations; how supply-chain costs vary with inventory policies, delivery methods, and warehouse locations; and how load balancing and server purchases affect software and hardware costs.6 With tools like these, computers can do much of the work by “running the numbers,” and people can then use their general intelligence to do a higher level of analysis.
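The "running the numbers" step described above can be as plain as evaluating a fitted demand model over a grid of candidate prices. The sketch below assumes a linear demand curve with invented coefficients, purely to illustrate the idea; it is not Amazon's actual model:

```python
# Sketch of "running the numbers": project revenue across candidate price points
# using a previously fitted demand model. Coefficients are invented placeholders.
def expected_units_sold(price):
    # Assumed demand curve from market research: units = 10_000 - 40 * price
    return max(10_000 - 40 * price, 0)

candidate_prices = [99, 119, 139, 159, 179, 199]
revenue_by_price = {p: p * expected_units_sold(p) for p in candidate_prices}

best_price = max(revenue_by_price, key=revenue_by_price.get)
for p, r in revenue_by_price.items():
    print(f"price ${p}: projected revenue ${r:,.0f}")
print(f"best of the candidates: ${best_price}")
```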

pages: 302 words: 73,581

Platform Scale: How an Emerging Business Model Helps Startups Build Large Empires With Minimum Investment
by Sangeet Paul Choudary
Published 14 Sep 2015

As the value of the platform increases with greater participation, consumers and producers are organically incentivized to stay engaged on the platform because the platform provides increasing amounts of value to both parties.
DATA SCIENCE IS THE NEW BUSINESS PROCESS OPTIMIZATION
Pipes achieve scale by improving the repeatability and efficiency of value-creation processes. The world of pipes required process engineering and optimization. Process engineers and managers helped improve internal processes and make them more efficient. In a platformed world, value is created in interactions between users, powered by data. Data science improves the platform’s ability to orchestrate interactions in the ecosystem. As value creation moves from organizational processes to ecosystem interactions, the focus of efficiency shifts from the enhancement of controlled processes to the improvement of the platform’s ability to orchestrate interactions in the ecosystem.

Community management is the new human resources management
6. Liquidity management is the new inventory control
7. Curation and reputation are the new quality control
8. User journeys are the new sales funnels
9. Distribution is the new destination
10. Behavior design is the new loyalty program
11. Data science is the new business process optimization
12. Social feedback is the new sales commission
13. Algorithms are the new decision makers
14. Real-time customization is the new market research
15. Plug-and-play is the new business development
16. The invisible hand is the new iron fist
1.3 THE RISE OF THE INTERACTION-FIRST BUSINESS
A Fundamental Redesign Of Business Logic
Platforms compete with each other on the basis of their ability to enable interactions sustainably.

pages: 284 words: 79,265

The Half-Life of Facts: Why Everything We Know Has an Expiration Date
by Samuel Arbesman
Published 31 Aug 2012

For example, researchers have examined the drinking establishment locations and characteristics in different communities, and even whether the elderly are capable of crossing the street in the time a given traffic light provides them. In the past few years there has been a surge in what is being called data science. Of course, all science uses data, but data science is more of a return to the Galtonian approach, where through the analysis of massive amounts of data—how people date on the Internet, make phone calls, shop online, and much more—one can begin to visualize and make sense of the world, and in the process discover new facts about ourselves and our surroundings.

[Excerpt from the book’s index, running from “actuarial escape velocity” through “McIntosh, J.”; the matching entry is: data science, 167–68]

pages: 400 words: 129,841

Capitalism: the unknown ideal
by Ayn Rand
Published 15 Aug 1966

Today’s frantic development in the field of technology has a quality reminiscent of the days preceding the economic crash of 1929: riding on the momentum of the past, on the unacknowledged remnants of an Aristotelian epistemology, it is a hectic, feverish expansion, heedless of the fact that its theoretical account is long since overdrawn—that in the field of scientific theory, unable to integrate or interpret their own data, scientists are abetting the resurgence of a primitive mysticism. In the humanities, however, the crash is past, the depression has set in, and the collapse of science is all but complete. The clearest evidence of it may be seen in such comparatively young sciences as psychology and political economy. In psychology, one may observe the attempt to study human behavior without reference to the fact that man is conscious.

pages: 515 words: 132,295

Makers and Takers: The Rise of Finance and the Fall of American Business
by Rana Foroohar
Published 16 May 2016

Designs will be altered in real time to reflect the knowledge. But while all this technology in Schenectady has reduced the number of machinists needed to make a battery, it has also fueled the creation of a GE global research center in San Ramon, California. The center now employs more than one thousand software engineers, data scientists, and user-experience designers who are well paid to develop the software for that kind of industrial Internet—otherwise known as the Internet of things. GE plans to hire thousands more such employees within the next half-decade. “We are probably the most competitive, on a global basis, that we’ve been in the past 30 years,” in terms of being able to make things again in the United States, says CEO Jeffrey Immelt.

pages: 504 words: 129,087

The Ones We've Been Waiting For: How a New Generation of Leaders Will Transform America
by Charlotte Alter
Published 18 Feb 2020

Billingsley and Clyde Tucker found that, contrary to the conventional wisdom that people simply get more conservative as they age, “each generation seems to display its own political behavior as a result of experiences during early adulthood.” Nearly thirty years later, Columbia political scientist Andrew Gelman and data scientist Yair Ghitza built on this research in their 2014 study of longitudinal data on voter behavior. They found that while variables such as religion, geography, or parental political influence remain important, shared experiences between ages fourteen and twenty-four have a significant impact on lifelong political attitudes.

pages: 515 words: 143,055

The Attention Merchants: The Epic Scramble to Get Inside Our Heads
by Tim Wu
Published 14 May 2016

And while this might sound like unprecedented cynicism vis-à-vis the audience, the idea was to transfer creative intention to them; they alone would “decide if the project reaches 10 people or 10 million people.”1 To help them decide, BuzzFeed pioneered techniques like “headline optimization,” which was meant to make the piece irresistible and clicking on it virtually involuntary. In the hands of the headline doctors, a video like “Zach Wahls Speaks About Family” became “Two Lesbians Raised a Baby and This Is What They Got”—and earned 18 million views. BuzzFeed’s lead data scientist, Ky Harlin, once crisply explained the paradoxical logic of headlining: “You can usually get somebody to click on something just based on their own curiosity or something like that, but it doesn’t mean that they’re actually going to end up liking the content.” BuzzFeed also developed the statistical analysis of sharing, keeping detailed information on various metrics, especially the one they called “viral lift.”

pages: 474 words: 130,575

Surveillance Valley: The Rise of the Military-Digital Complex
by Yasha Levine
Published 6 Feb 2018

It was a kind of stripped-down 1960s version of Palantir, the powerful data mining, surveillance, and prediction software the military and intelligence planners use today. The project also funded various efforts to use these programs in ways that were beneficial to the military, including compiling various intelligence databases. As a bonus, the Cambridge Project served as a training ground for a new cadre of data scientists and military planners who learned to be proficient in data mining on it. The Cambridge Project had another, less menacing side. Financial analysts, psychologists, sociologists, CIA agents—the Cambridge Project was useful to anyone interested in working with large and complex data sets. The technology was universal and dual use.

pages: 432 words: 143,491

Failures of State: The Inside Story of Britain's Battle With Coronavirus
by Jonathan Calvert and George Arbuthnott
Published 18 Mar 2021

There were signs, however, that Javid’s enemy – the prime minister’s chief adviser – was beginning to sense that coronavirus was more of a problem than Downing Street had previously realised. Cummings had sent one of his most trusted lieutenants, Ben Warner, to listen in on the Sage expert committee meetings, which were now taking place regularly in response to the virus. Warner was a data scientist who helped mastermind the computer modelling for Vote Leave’s 2016 referendum campaign and Cummings had drafted him into No. 10 to do similar analysis for the Conservative Party’s 2019 general election campaign. He had joined the committee as an observer for the first time on Thursday 20 February – which was the ninth meeting Sage had held to discuss the UK’s reaction to the virus.

pages: 439 words: 131,081

The Chaos Machine: The Inside Story of How Social Media Rewired Our Minds and Our World
by Max Fisher
Published 5 Sep 2022

Facebook announced it would allow politicians to lie on the platform and grant them special latitude on hate speech, rules that seemed written for Trump and his allies. “I’d been at FB for less than a year when I was pulled into an urgent inquiry—President Trump’s campaign complained about experiencing a decline in views,” Sophie Zhang, a Facebook data scientist, recalled on Twitter, “I never was asked to investigate anything similar for anyone else.” This sort of appeasement of political leaders appeared to be a global strategy. Between 2018 and 2020, Zhang flagged dozens of incidents of foreign leaders promoting lies and hate for gain, but was consistently overruled, she has said.

pages: 642 words: 141,888

Like, Comment, Subscribe: Inside YouTube's Chaotic Rise to World Domination
by Mark Bergen
Published 5 Sep 2022

The internet’s top destination, Facebook—Pepsi to their Coke—captured about 200 million hours every month of its users’ time, according to their math. Very sticky. TV, the big stomach, claimed four to five hours of the average American’s day, depending on who was doing the counting. YouTube was then clocking around 100 million hours of viewed footage every day. So, 100 million. 10x. Mehrotra left the conference room and beelined to a data scientist who worked for him. “What would it mean to hit a billion?” he asked. “When could we reasonably do that?” * * * • • • Mehrotra announced the new OKR the following year at YouTube’s annual leadership summit in Los Angeles: YouTube would work to get one billion hours of watch time every day within four years.

pages: 163 words: 42,402

Machine Learning for Email
by Drew Conway and John Myles White
Published 25 Oct 2011

--The R Project for Statistical Computing, http://www.r-project.org/

The best thing about R is that it was developed by statisticians. The worst thing about R is that...it was developed by statisticians.
--Bo Cowgill, Google, Inc.

R is an extremely powerful language for manipulating and analyzing data. Its meteoric rise in popularity within the data science and machine learning communities has made it the de facto lingua franca for analytics. R’s success in the data analysis community stems from two factors described in the epigraphs above: R provides most of the technical power that statisticians require built into the default language, and R has been supported by a community of statisticians who are also open source devotees.

His academic curiosity is informed by his years as an analyst in the U.S. intelligence and defense communities. John Myles White is a Ph.D. student in the Princeton Psychology Department, where he studies how humans make decisions both theoretically and experimentally. Outside of academia, John has been heavily involved in the data science movement, which has pushed for an open source software approach to data analysis. He is also the lead maintainer for several popular R packages, including ProjectTemplate and log4r.

pages: 2,466 words: 668,761

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

Stochastic simulation algorithms for dynamic probabilistic networks. In UAI-95.
Kang, S. M. and Wildes, R. P. (2016). Review of action recognition and detection methods. arXiv:1610.06906.
Kanter, J. M. and Veeramachaneni, K. (2015). Deep feature synthesis: Towards automating data science endeavors. In Proc. IEEE Int'l Conf. on Data Science and Advanced Analytics.
Kantorovich, L. V. (1939). Mathematical methods of organizing and planning production. Published in translation in Management Science, 6(4), 366–422, 1960.
Kaplan, D. and Montague, R. (1960). A paradox regained. Notre Dame Formal Logic, 1, 79–90.

Some number of trees can cover unique cases that appear only a few times in the data, and their votes can prove decisive, but they can be outvoted when they do not apply. That said, random forests are not totally immune to overfitting. Although the error can't increase in the limit, that does not mean that the error will go to zero. Random forests have been very successful across a wide variety of application problems. In Kaggle data science competitions they were the most popular approach of winning teams from 2011 through 2014, and remain a common approach to this day (although deep learning and gradient boosting have become even more common among recent winners). The randomForest package in R has been a particular favorite. In finance, random forests have been used for credit card default prediction, household income prediction, and option pricing.
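As a concrete illustration of the random-forest recipe the excerpt describes (many randomized trees voting, with a random feature subset considered at each split), here is a minimal scikit-learn version; the text's own examples use R's randomForest package, so this is just an equivalent sketch in Python:

```python
# Minimal random forest example: many randomized trees, majority vote.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=300,      # number of trees voting
    max_features="sqrt",   # random feature subset considered at each split
    random_state=0,
).fit(X_train, y_train)

print(f"held-out accuracy: {forest.score(X_test, y_test):.3f}")
```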

The method is called “stacking” because it can be thought of as a layer of base models with an ensemble model stacked above it, operating on the output of the base models. In fact, it is possible to stack multiple layers, each one operating on the output of the previous layer. Stacking reduces bias, and usually leads to performance that is better than any of the individual base models. Stacking is frequently used by winning teams in data science competitions (such as Kaggle and the KDD Cup), because individuals can work independently, each refining their own base model, and then come together to build the final stacked ensemble model.
19.8.4 Boosting
The most popular ensemble method is called boosting. To understand how it works, we need first to introduce the idea of a weighted training set, in which each example has an associated weight w_j ≥ 0 that describes how much the example should count during training.
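A compact way to see both ideas from this excerpt side by side, stacking (a meta-model trained on the outputs of base models) and boosting (weak learners trained in sequence on reweighted examples), is the scikit-learn sketch below; it illustrates the general techniques, not the book's own code:

```python
# Stacking vs. boosting on the same data, using scikit-learn's built-in ensembles.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stacking: a logistic-regression meta-model combines heterogeneous base models.
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
).fit(X_train, y_train)

# Boosting: shallow trees trained in sequence on reweighted examples (AdaBoost).
boost = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print(f"stacking accuracy: {stack.score(X_test, y_test):.3f}")
print(f"boosting accuracy: {boost.score(X_test, y_test):.3f}")
```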

pages: 199 words: 48,162

Capital Allocators: How the World’s Elite Money Managers Lead and Invest
by Ted Seides
Published 23 Mar 2021

The turmoil in markets, employment, and working protocols caused by Covid-19 in March 2020 presented a recent case study for how CIOs respond to a period of uncertainty.
To learn more
Podcasts
Capital Allocators: Patrick O’Shaughnessy – O’Shaughnessy Asset Management (First Meeting, Ep.1)
Capital Allocators: Jordi Visser – Next Generation of Manager Allocation (Ep.92)
Capital Allocators: Matthew Granade – Inside Data Science at Point72 (First Meeting, Ep.22)
Companies
Novus Partners, www.novus.com
Essentia Analytics, www.essentia-analytics.com
Alpha Theory, www.alphatheory.com
Reading won’t help much in improving investment results through quantitative means. Instead, reach out to Novus, Essentia, and Alpha Theory to learn more about their application of tools for allocators and portfolio managers

Scott, Kim and Andy’s conversations are your picks – the most downloaded shows among the CIO interviews.
Data analysis
Podcasts
Capital Allocators: Patrick O’Shaughnessy – O’Shaughnessy Asset Management (First Meeting, Ep.1)
Capital Allocators: Jordi Visser – Next Generation of Manager Allocators (Ep.92)
Capital Allocators: Matthew Granade – Inside Data Science at Point72 (First Meeting, Ep.22)
Companies
Novus Partners, www.novus.com
Essentia Analytics, www.essentia-analytics.com
Alpha Theory, www.alphatheory.com
Reading won’t help much in improving investment results through quantitative means. Instead, reach out to Novus, Essentia, and Alpha Theory to learn more about their application of tools for allocators and portfolio managers

pages: 319 words: 90,965

The End of College: Creating the Future of Learning and the University of Everywhere
by Kevin Carey
Published 3 Mar 2015

By 2014, edX was offering hundreds of free online courses in subjects including the Poetry of Walt Whitman, the History of Early Christianity, Computational Neuroscience, Flight Vehicle Aerodynamics, Shakespeare, Dante’s Divine Comedy, Bioethics, Contemporary India, Historical Relic Treasures and Cultural China, Linear Algebra, Autonomous Mobile Robots, Electricity and Magnetism, Discrete Time Signals and Systems, Introduction to Global Sociology, Behavioral Economics, Fundamentals of Immunology, Computational Thinking and Data Science, and an astrophysics course titled Greatest Unsolved Mysteries of the Universe. Doing this seemed to contradict five hundred years of higher-education economics in which the wealthiest and most sought-after colleges enforced a rigid scarcity over their products and services. The emerging University of Everywhere threatened institutions that depended on the privilege of being scarce, expensive places.

She quoted Edwin Slosson’s well-known alleged quip that lecture notes are a way of transmitting information from the lecturer to the student without it passing through the minds of either one of them. Third, there will be a transformation of the study of human learning, from a series of anecdotes to real-data science. Suppes had written about this, too. “The power of the computer to assemble and provide data as a basis for [educational] decisions,” he wrote, “will be perhaps the most powerful impetus to the development of education theory yet to appear.” As we finished the interview, Michael Staton mentioned to Koller that Learn Capital was putting together a new pool of investment money.

Over time those courses will be organized into sequences that approximate the scope of learning we associate with college majors. MIT is already moving in this direction, starting with a seven-course sequence in computer programming that begins with introductions to coding, computational thinking, and data science and then moves to software construction, digital circuits, programmable architectures, and computer systems organization. The length of the course sequences will vary depending on the field, profession, or kind of work. Some will involve a few courses; others will be dozens long. Neither the courses nor the sequences will be constrained by the artificial limitations of semester hours or years spent attending school.

pages: 361 words: 100,834

Mapmatics: How We Navigate the World Through Numbers
by Paulina Rowinska
Published 5 Jun 2024

The Ghost Map: The Story of London’s Most Terrifying Epidemic—and How It Changed Science, Cities, and the Modern World. New York: Riverhead Books, 2006. Pastore y Piontti, Ana, Nicola Perra, Luca Rossi, Nicole Samay, and Alessandro Vespignani. Charting the Next Pandemic: Modeling Infectious Disease Spreading in the Data Science Age. Cham, Switzerland: Springer, 2019. Snow, John. On the Mode of Communication of Cholera, 2nd ed. London: John Churchill, 1855. Geographic Profiling Chainey, Spencer, and Lisa Tompson, eds. Crime Mapping Case Studies: Practice and Research. Chichester: John Wiley and Sons, 2008. Rossmo, D.

called Markov chain Monte Carlo (MCMC): Benjamin Fifield et al., ‘Automated Redistricting Simulation Using Markov Chain Monte Carlo’, Journal of Computational and Graphical Statistics 29, no. 4 (2020): 715–28, https://doi.org/10.1080/10618600.2020.1739532. led by mathematician Moon Duchin: Daryl DeFord, Moon Duchin and Justin Solomon, ‘Recombination: A Family of Markov Chains for Redistricting’, Harvard Data Science Review 3, no. 1 (31 March 2021), https://doi.org/10.1162/99608f92.eb30390f. a bipartisan group of voters: Wendy K. Tam Cho and Yan Y. Liu, ‘Toward a Talismanic Redistricting Tool: A Computational Method for Identifying Extreme Redistricting Plans’, Election Law Journal: Rules, Politics, and Policy 15, no. 4 (2016): 351–66, https://doi.org/10.1089/elj.2016.0384.

became popular in the early twentieth century: Fred Brauer, ‘Compartmental Models in Epidemiology’, Mathematical Epidemiology 1945 (2008): 19–79, https://doi.org/10.1007/978-3-540-78911-6_2. removed (sometimes called recovered, R): Ana Pastore y Piontti et al., Charting the Next Pandemic: Modeling Infectious Disease Spreading in the Data Science Age (Cham, Switzerland: Springer, 2019), 35–7. real-world population and mobility data: Pastore y Piontti et al., Charting the Next Pandemic, 29–34. enrol in one in the future: Allan Casey, ‘Rossmo’s Formula’, College of Arts and Science, University of Saskatchewan, 8 May 2018, https://artsandscience.usask.ca/magazine/Spring_2018/rossmos-formula.php.

pages: 271 words: 52,814

Blockchain: Blueprint for a New Economy
by Melanie Swan
Published 22 Jan 2014

Blockchain Layer Could Facilitate Big Data’s Predictive Task Automation As big data allows the predictive modeling of more and more processes of reality, blockchain technology could help turn prediction into action. Blockchain technology could be joined with big data, layered onto the reactive-to-predictive transformation that is slowly under way in big-data science to allow the automated operation of large areas of tasks through smart contracts and economics. Big data’s predictive analysis could dovetail perfectly with the automatic execution of smart contracts. We could accomplish this specifically by adding blockchain technology as the embedded economic payments layer and the tool for the administration of quanta, implemented through automated smart contracts, Dapps, DAOs, and DACs.

“Unreliable Research. Trouble at the Lab.” The Economist, October 17, 2013 (paywall restricted). http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble. 157 Schmidt, M. and H. Lipson. “Distilling Free-Form Natural Laws from Experimental Data.” Science 324, no. 5923 (2009): 81–5. http://creativemachines.cornell.edu/sites/default/files/Science09_Schmidt.pdf; Keim, B. “Computer Program Self-Discovers Laws of Physics.” Wired, April 2, 2009. http://www.wired.com/2009/04/newtonai/. 158 Muggleton, S. “Developing Robust Synthetic Biology Designs Using a Microfluidic Robot Scientist.

pages: 523 words: 148,929

Physics of the Future: How Science Will Shape Human Destiny and Our Daily Lives by the Year 2100
by Michio Kaku
Published 15 Mar 2011

If you go to a nursing home, where people are wasting away, living with constant pain, and waiting to die and ask the same question, you might get an entirely different answer.) As UCLA’s Greg Stock says, “Gradually, our agonizing about playing God and our worries about longer life spans would give way to a new chorus: ‘When can I get a pill?’ ” In 2002, with the best demographic data, scientists estimated that 6 percent of all humans who have ever walked the face of the earth are still alive today. This is because the human population hovered at around 1 million for most of human history. Foraging for meager supplies of food kept the human population down. Even during the height of the Roman Empire, its population was estimated to be only 55 million.

pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection
by Jacob Silverman
Published 17 Mar 2015

Ironically, it’s the very unreliability of Big Data–style analysis that prompts ever more data collection. If you think you’ve detected some false patterns or aren’t finding the kinds of correlations you sought, why not just collect and analyze more? If you can’t process all the data you’ve stored—a problem that the NSA has faced—just build more data centers and hire more mathematicians and data scientists. Whether you’re Facebook or the U.S. government, the money is out there to do just that. Even the apparent presence of a pattern can lead us toward some false choices. A health insurance company may believe that people who buy six key grocery items are 30 percent more likely to develop diabetes, but does that give the insurer the right to raise this group’s premiums or deny them coverage?

pages: 543 words: 153,550

Model Thinker: What You Need to Know to Make Data Work for You
by Scott E. Page
Published 27 Nov 2018

Studies show that people may reside in online bubbles. That is, we may belong to communities of people who get their news from similar sources. If so, that has implications for social cohesion. Prior to the creation of the internet, that may have been true as well, but demonstrating it with data would have been hard. Now data scientists can scrape the web to identify the news sources that people frequent and tell us that, yes, in fact we do live in bubbles to an extent. Models provide the formal definitions of communities. Data tells us the strength of those communities. Using judgment we can make wise inferences based on what the data say.

pages: 569 words: 156,139

Amazon Unbound: Jeff Bezos and the Invention of a Global Empire
by Brad Stone
Published 10 May 2021

A decentralized operating structure limited the company’s dexterity at precisely the time when it needed to evolve quickly to meet changing tastes, as well as to introduce home delivery and new digital payment methods. In an unusual arrangement, Mackey was running the company at the time with a co-CEO, Walter Robb, who managed day-to-day operations. They recognized the looming challenges, and hired teams of data scientists in Austin and contracted with the San Francisco–based grocery delivery startup Instacart. But things were progressing slowly—and then they ran out of time. In 2016, the New York investment firm Neuberger Berman started sending letters to Whole Foods leadership and to other shareholders, complaining about complacent management, the unconventional CEO structure, and highlighting deficiencies like the absence of a rewards program.

pages: 534 words: 157,700

Politics on the Edge: The Instant #1 Sunday Times Bestseller From the Host of Hit Podcast the Rest Is Politics
by Rory Stewart
Published 13 Sep 2023

He said that Messina had just won the second election for Barack Obama, through his use of Facebook and Twitter, and that he had been hired to win the same victory for us. The party, we were told, had raised tens of millions of pounds for our new campaign – buying consumer data and building a new software platform. We were to be the first generation of British politicians to enter the world of Big Data, AI and social media. Messina’s data-scientists would micro-target exactly the right supporters in the key target constituencies, with the most efficient allocation of money and resources, and persuade them to vote through their phones. (The older MPs glanced at their phones as though unsure whether they had turned them off.) In March 2015, Cameron called the election and I returned for six weeks of constituency campaigning.

The Metropolitan Revolution: How Cities and Metros Are Fixing Our Broken Politics and Fragile Economy
by Bruce Katz and Jennifer Bradley
Published 10 Jun 2013

The center will also work with private industry partners, including IBM, Cisco, ConEdison, National Grid, Siemens, Xerox, AECOM, Arup, IDEO, Lutron, and Microsoft, and government labs, including the Livermore, Los Alamos, and Sandia National Laboratories.

[Map: New York City’s Applied Sciences Campuses]

In July 2012 Columbia University’s new Institute for Data Sciences and Engineering, located at its Morningside Heights and Washington Heights campuses in New York City, became the third Applied Sciences campus.37 At Columbia, students and faculty will focus on applications for new media, smart cities, health analytics, cybersecurity, and financial analytics, among other areas.

Center for Urban Science Progress, “Educational Programs,” New York University, 2012 (http://cusp.nyu.edu/ms-in-applied-urban-science-and-informatics/). 37. New York City Economic Development Corporation, “Mayor Bloomberg and Columbia President Bollinger Announce Agreement to Create New Institute for Data Sciences and Engineering,” press release, July 30, 2012. 38. These include state renewable portfolio standards as well as national carbonreduction strategies, such as those promulgated by the U.K. Department of Energy and Climate Change. See Barry Rabe, “Race to the Top: The Expanding Role of U.S. State Renewable Portfolio Standards,” Sustainable Development Law and Policy 7 no. 3 (2007).

Houston Settlement Association, 91 “How America Can Rise Again” (Fallows), 154–55 Howder, Randy, 119 Hsieh, Tony, 119–20 Hughes, Tom, 158 Hull House settlement, 101 IBM, viii, 122, 148 Idea viruses, 10 Immelt, Jeffrey, 19–20, 32, 182 Immigrant populations: and benefits for U.S. economy, 92–93; education level among, 93, 154; and global trade, impact on, 154; in metropolitan areas, 92–93; and patent generation, 93; poverty among, 103; in suburbs, 48, 93–94, 98–99; trends among, 153 Induced travel, 57 Industrial districts, 115–16 Information vs. knowledge, 118 Innovation and innovation districts, vii, 113–43; anchor institutions model for, 114, 121–23, 127; challenges facing, 129–31; and clustering, 22–23; collaborative approach of, 117, 119–20, 139–40; defined, 114; and demographic changes, 120–21; drivers of, 116, 138–39; and economic growth, 4; and exports, 32–33, 34; factors affecting rise of, 114–15, 116; funding for, 130–31, 141–42; government’s role in, 142; implications of, 117, 118–19; international models for, 127–29, 130; and manufacturing, 82–83; metropolitan revolution, role in, 10–11, 141–43, 202–05; remake science park model for, 126–27; replication of, 10–11, 203–05; spatial impacts of, 121–29; transforming underused areas model for, 123–26, 127; trends facilitating necessity for, 113–14, 119; and urbanization, 114–15, 116, 121; and zoning regulations, 129–30 Institute for Data Sciences and Engineering, 30 Intel, 93, 157 Inter-American Development Bank urban initiative, 148 Intermediaries, defined, 75–76 International Trade Administration (U.S.), 32 Internet, replication of ideas through, 10, 203–05 Investments. See Funding Istrate, Emilia, 33 Jacksonville (Florida) metropolitan area, infrastructure development in, 4 Jacobs, Jane, 34, 113, 150 James, Franklin, 45–46 Jaquay, Bob, 70–71, 78–79 Johnson, Steven, 38, 39, 67, 83 Kansas City Federal Reserve Bank study, 53 Kendall Square, 122–23, 129 Kenney, Peter, 49, 54, 55, 56, 60, 61, 62–63 Kent State University, 75 Kharas, Homi, 147 Kim, Charlie, 27 Knowledge vs. information, 118 Koonin, Steven, 28, 29, 37 LaHood, Raymond, 138 Latin America: emerging market economies in, 32, 147, 148; innovation in, 204; and international tourism, 153; Miami, influence on, 161–62, 163, 186 Latinos/Latinas: education level of, 93, 103, 104; in suburbs, 99 Leadership, metropolitan vs. state and federal, 3–4, 5–9 Leal, Roberta, 99, 100, 105, 107 Lehman Brothers collapse (2008), 17–18 Lewis, Michael, 196 LG Corporation, 83–84 Light bulbs, metropolitan influence on invention of, 39–40 Liveris, Andrew, 182 London: East End development, viii; trade links with, 162, 165, 167; traffic congestion and pollution control, 204 Los Angeles (California) metropolitan area: game changers for, 197; transit system in, ix, 4, 185–86; vision established for, 196 Lübeck and Hamburg trade agreement, 166–68 Madison, James, 175–76 MAGNET development organization, 83 “Making Northeast Ohio Great Again: A Call to Arms to the Foundation Community” (Fund for Our Economic Future), 70 Manufacturing: additive, 77; and exports, 152; foreign investment in, 155; and innovation, 82–83 Marchio, Nicholas, 33 Marcuse, Peter, 160 Masdar City, viii Massachusetts Institute of Technology.

The Singularity Is Nearer: When We Merge with AI
by Ray Kurzweil
Published 25 Jun 2024

id=5kSamKhS560C; “Marvin Minsky: The Problem with Perceptrons (121/151),” Web of Stories—Life Stories of Remarkable People, YouTube video, October 17, 2016, https://www.youtube.com/watch?v=QW_srPO-LrI; Heinz Mühlenbein, “Limitations of Multi-Layer Perceptron Networks: Steps Towards Genetic Neural Networks,” Parallel Computing 14, no. 3 (August 1990): 249–60, https://doi.org/10.1016/0167-8191(90)90079-O; Aniruddha Karajgi, “How Neural Networks Solve the XOR Problem,” Towards Data Science, November 4, 2020, https://towardsdatascience.com/how-neural-networks-solve-the-xor-problem-59763136bdd7.
See the appendix for the sources used for all the cost-of-computation calculations in this book.
Tim Fryer, “Da Vinci Drawings Brought to Life,” Engineering & Technology 14, no. 5 (May 21, 2019): 18, https://eandt.theiet.org/content/articles/2019/05/da-vinci-drawings-brought-to-life.

id=QILTDQAAQBAJ; The Brain from Top to Bottom, “The Motor Cortex,” McGill University, accessed November 20, 2021, https://thebrain.mcgill.ca/flash/i/i_06/i_06_cr/i_06_cr_mou/i_06_cr_mou.html.
For more technical lessons on basis functions as relevant to machine learning, see “Lecture 17: Basis Functions,” Open Data Science Initiative, YouTube video, November 28, 2011, https://youtu.be/OOpfU3CvUkM?t=151; Yaser Abu-Mostafa, “Lecture 16: Radial Basis Functions,” Caltech, YouTube video, May 29, 2012, https://www.youtube.com/watch?v=O8CfrnOPtLc.
Mayo Clinic, “Ataxia,” Mayo Clinic, accessed November 20, 2021, https://www.mayoclinic.org/diseases-conditions/ataxia/symptoms-causes/syc-20355652; Helen Thomson, “Woman of 24 Found to Have No Cerebellum in Her Brain,” New Scientist, September 10, 2014, https://institutions.newscientist.com/article/mg22329861-900-woman-of-24-found-to-have-no-cerebellum-in-her-brain; R.

Rachel Syme, “Gmail Smart Replies and the Ever-Growing Pressure to E-Mail Like a Machine,” New Yorker, November 28, 2018, https://www.newyorker.com/tech/annals-of-technology/gmail-smart-replies-and-the-ever-growing-pressure-to-e-mail-like-a-machine.
For a more detailed explainer on how transformers work, and the original technical paper, see Giuliano Giacaglia, “How Transformers Work,” Towards Data Science, March 10, 2019, https://towardsdatascience.com/transformers-141e32e69591; Ashish Vaswani et al., “Attention Is All You Need,” arXiv:1706.03762v5 [cs.CL], December 6, 2017, https://arxiv.org/pdf/1706.03762.pdf.
Irene Solaiman et al., “GPT-2: 1.5B Release,” OpenAI, November 5, 2019, https://openai.com/blog/gpt-2-1-5b-release.

pages: 444 words: 117,770

The Coming Wave: Technology, Power, and the Twenty-First Century's Greatest Dilemma
by Mustafa Suleyman
Published 4 Sep 2023

The breakthrough moment took Alex Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks,” Neural Information Processing Systems, Sept. 30, 2012, proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
In 2012, AlexNet beat Jerry Wei, “AlexNet: The Architecture That Challenged CNNs,” Towards Data Science, July 2, 2019, towardsdatascience.com/alexnet-the-architecture-that-challenged-cnns-e406d5297951.
Thanks to deep learning Chanan Bos, “Tesla’s New HW3 Self-Driving Computer—It’s a Beast,” CleanTechnica, June 15, 2019, cleantechnica.com/2019/06/15/teslas-new-hw3-self-driving-computer-its-a-beast-cleantechnica-deep-dive.

But it uses an efficient training William Fedus et al., “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity,” Journal of Machine Learning Research, June 16, 2022, arxiv.org/abs/2101.03961.
Or look at DeepMind’s Chinchilla Alberto Romero, “A New AI Trend: Chinchilla (70B) Greatly Outperforms GPT-3 (175B) and Gopher (280B),” Towards Data Science, April 11, 2022, towardsdatascience.com/a-new-ai-trend-chinchilla-70b-greatly-outperforms-gpt-3-175b-and-gopher-280b-408b9b4510.
At the other end of the spectrum See github.com/karpathy/nanoGPT for more details.
Meta has open-sourced Susan Zhang et al., “Democratizing Access to Large-Scale Language Models with OPT-175B,” Meta AI, May 3, 2022, ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b.

Under it, India established Trisha Ray and Akhil Deo, “Priorities for a Technology Foreign Policy for India,” Washington International Trade Association, Sept. 25, 2020, www.wita.org/atp-research/tech-foreign-policy-india.
We live in an age Cronin, Power to the People.
For example, GitHub has Neeraj Kashyap, “GitHub’s Path to 128M Public Repositories,” Towards Data Science, March 4, 2020, towardsdatascience.com/githubs-path-to-128m-public-repositories-f6f656ab56b1.
The original such service arXiv, “About ArXiv,” arxiv.org/about.
The great stock of the world’s “The General Index,” Internet Archive, Oct. 7, 2021, archive.org/details/GeneralIndex.

User Friendly: How the Hidden Rules of Design Are Changing the Way We Live, Work & Play
by Cliff Kuang and Robert Fabricant
Published 7 Nov 2019

But is a user-friendly world actually the best world we can create? In the months after the election, as flummoxed Hillary Clinton staffers were wondering how they’d so badly misunderstood the race they were running against Donald Trump, news reports began trickling out about Cambridge Analytica, a mysterious data-science company that had been paid millions to help Trump’s campaign in the run-up to the election.34 Cambridge Analytica itself wasn’t an innovator. It had been inspired by Michal Kosinski, a young psychologist at Cambridge University. Kosinski typically wears the uniform of a venture capitalist: pressed khakis, crisp button-down shirt tucked in.

Just 150 likes would be enough to outdo the person’s parents. At 300 or more likes, you could predict nuances of preference and personality unknown even to a person’s partner.36 On April 9, 2013, when Kosinski published his findings, a recruiter at Facebook called to see if he’d be interested in a role on its data science team. Later, when he checked his snail mail, he saw that Facebook’s lawyers had also sent him a threat of a lawsuit. Facebook quickly responded by allowing likes to be made private. But the genie had escaped its bottle. Kosinski had shown that if you knew a person’s Facebook likes, you knew their personality.

It has been estimated that during the election, the firm was testing 40,000 to 50,000 ads a day to better understand what would motivate voters—or keep voters who didn’t like Trump from voting at all.37 In one instance, Trump’s own digital operatives claimed that they’d targeted black voters in Miami’s Haitian community with stories about the Clinton Foundation’s supposedly corrupt efforts to deliver aid after Haiti’s catastrophic 2010 earthquake.38 Some months later, journalists began to question whether Cambridge Analytica’s data science really could be as advanced as it claimed.39 What no one questioned was that Facebook could easily do what Cambridge Analytica had boasted about. Indeed, months after the election, a leaked Facebook document produced by company executives in Australia suggested that they could target teens precisely at the moment they felt “insecure,” “worthless,” or “needed a confidence boost.”

pages: 239 words: 74,845

The Antisocial Network: The GameStop Short Squeeze and the Ragtag Group of Amateur Traders That Brought Wall Street to Its Knees
by Ben Mezrich
Published 6 Sep 2021

When you finally sell your positions, you still have the same problems you had before. You still have the same life, with the same bills to pay—” “Bills? Who pays bills?” Jeremy laughed. He knew he was privileged—his dad still sent him checks every month to cover his living expenses and tuition. Before Covid, he had worked part time doing data science for a professor to help cover his student loans, but now he was mostly on the parental dole. He knew there were a lot of people on the WSB board who were a lot worse off than he was. The pandemic had hit the community hard, and many were out of work. Which made it more understandable, to Jeremy, that they were willing to try to use whatever little money they had to change things, and not just incrementally—but monumentally.

Moving forward, I don’t think you’re going to see stocks with the kind of short interest levels that we’ve seen prior to this year. I don’t think investors like myself will want to be susceptible to these types of dynamics. I think there will be a lot closer monitoring of message boards…we have a data science team that will be looking at that…You know, whatever regulation that you guys come up with—certainly we’ll abide by.” Even over livestream, the transformation was visible; in less than a minute, Gabe had gone from a bewildered victim to the professional athlete he’d always been. It was time to accept the loss and move on, because there were plenty more wins in his future.

pages: 561 words: 163,916

The History of the Future: Oculus, Facebook, and the Revolution That Swept Virtual Reality
by Blake J. Harris
Published 19 Feb 2019

And as with most cases at Oculus where the opinion was split, each half spent the next few days trying to prove why the other half were idiots. But on June 29, the playful banter was interrupted by a hot new topic that hit even closer to home. “Did you see this?” Dycus asked Luckey, pointing out one of the many articles about Facebook’s “Secret Mood Experiment,” in which data scientists at Facebook manipulated what appeared in the newsfeed of 689,000 users to try and see if they could make people feel “more positive” or “more negative” through a process called “emotional contagion.” “Yeah, I saw it,” Luckey said. “The experiment was successful, by the way,” Dycus said. “What does Brendan think?”

pages: 288 words: 86,995

Rule of the Robots: How Artificial Intelligence Will Transform Everything
by Martin Ford
Published 13 Sep 2021

AI’s tentacles will eventually reach into and transform virtually every existing industry, and any new industries that arise in the future will very likely incorporate the latest AI and robotics innovations from their inception. In other words, it seems very unlikely that some entirely new sector with tens of millions of new jobs will somehow materialize to absorb all the workers displaced by automation in existing industries. Rather, future industries will be built on a foundation of digital technology, data science and artificial intelligence—and as a result, they will simply not generate large numbers of jobs. A second point involves the nature of the activities undertaken by workers. It’s reasonable to estimate that roughly half our workforce is engaged in occupations that are largely routine and predictable in nature.4 By this, I don’t mean “rote-repetitive” but simply that these workers tend to face the same basic set of tasks and challenges again and again.

McCorvey, “This image-authentication startup is combating faux social media accounts, doctored photos, deep fakes, and more,” Fast Company, February 19, 2019, www.fastcompany.com/90299000/truepic-most-innovative-companies-2019. 8. Ian Goodfellow, Nicolas Papernot, Sandy Huang, et al., “Attacking machine learning with adversarial examples,” OpenAI Blog, February 24, 2017, openai.com/blog/adversarial-example-research/. 9. Anant Jain, “Breaking neural networks with adversarial attacks,” Towards Data Science, February 9, 2019, towardsdatascience.com/breaking-neural-networks-with-adversarial-attacks-f4290a9a45aa. 10. Ibid. 11. Slaughterbots, released November 12, 2017, Space Digital, www.youtube.com/watch?reload=9&v=9CO6M2HsoIA. 12. Stuart Russell, “Building a lethal autonomous weapon is easier than building a self-driving car.

pages: 559 words: 155,372

Chaos Monkeys: Obscene Fortune and Random Failure in Silicon Valley
by Antonio Garcia Martinez
Published 27 Jun 2016

As I was soon to find is de rigueur, I was asked to sign a nondisclosure agreement that made it illegal for me to so much as leak the wallpaper design in the kitchens or reveal any knowledge, whether carnal or technical, gleaned while inside Twitter. Then I waited. A couple of overdressed and nervous-looking people waited with me, probably job candidates. There were large-format coffee-table books, artsy tomes about the new world of data science and landscape photography, on the coffee tables. The reception area itself was tastefully paneled with reclaimed wood, and that little Twitter bird logo was on everything, down to the elegant black coffee mugs kept in the reception’s minikitchen. Jess appeared and was exactly like her publicity photos online.

Since Facebook Ads didn’t ship products that people actually asked for, launching always had a certain foie-gras-duck-undergoing-gavage quality to it: open up and pump it in.* However, to my earlier point, the performance gains attributed to Sponsored Stories (to the extent they even existed outside the perfect test conditions of the Facebook data-science team) weren’t significant in the everyday noise of a live ads campaign. And so the partners were struggling to get advertisers to invest in the new, sexy hotness. Facebook’s reaction was to basically tell them to grip the goose tighter and stick the tube farther down its throat. Yet despite all these hints of failure and fundamentally wrong direction, here we were at the big FMC show, with Paul Adams standing onstage with an image of a stylized social network behind him, like George C.

* Pronounced “eff-ait,” and not “fate,” the name F8 came from the eight hours that engineers spent in hackathons, the all-night company-wide coding sessions that produced some of Facebook’s more random (and successful) products. * Titled Social Influence in Social Advertising: Evidence from Field Experiments, the paper would eventually appear at an Association for Computing Machinery e-commerce conference, with Eytan Bakshy, Dean Eckles, Rong Yan, and Itamar Rosenn as its authors. Facebook’s data-science team was absolutely top-notch and boasted both already-prominent academics and young, up-and-coming PhDs who were ecstatic to get their busy hands on Facebook’s vast store of proprietary data. The team’s papers, such as this one, were always carefully executed experiments that often called bullshit on some social media truism—often one that originated with Facebook itself

pages: 517 words: 147,591

Small Wars, Big Data: The Information Revolution in Modern Conflict
by Eli Berman , Joseph H. Felter , Jacob N. Shapiro and Vestal Mcintyre
Published 12 May 2018

We argue that taking a conventional approach, based on a symmetric warfare doctrine, will waste lives and resources, and risk defeat. However, taking a smarter approach can improve strategy and make dramatic gains in efficiency. Two major new tools enable this smart approach: research methods that were unavailable just fifteen years ago and data science, including the analysis of “big data.” Our use of these tools has already yielded an important central finding: in information-centric warfare, small-scale efforts can have large-scale effects. Larger efforts may be neutral at best and counterproductive at worst. If this more nuanced view can guide policy, lives and money could be saved.

Conversely, there is not a causal relationship from smoke to fire: if I want to cause more fires, I shouldn’t rent a smoke machine. Correlations are great for prediction—for example, predicting the flow of populations after a disaster or the level of hospitalization at different times—and this accounts for much of the excitement about big data that we cited in chapter 1. Big data and data science can do well predicting what will happen in the world absent policy changes, but when predicting the effects of those policy changes, correlations are not enough. If you want to know what to do in the world to produce a certain outcome, then you need to establish causality. And when the goal is discovering causality, correlations can mislead: “smoke causes fire” is obviously an erroneous statement, but we see equivalent logic in policy all the time.

Heather, 143–44 credit, attribution of. See attribution of credit/blame crisis aid, 274–78 Crost, Ben, 129, 134 Cruz, Cesi, 282 Daesh. See Islamic State DARPA. See Defense Advanced Research Projects Agency Das, Jishnu, 277 data. See microdata data access and confidentiality, 314–16, 323–24 data science. See big data D-Day, 2–4 decision process, for civilian informants, 16–17, 65–77, 80–81, 188–89 Defense Advanced Research Projects Agency (DARPA), 13–14 Deininger, Klaus, 241 Dell, Melissa, 177, 179–81, 214 Democratic Republic of the Congo, 296 demonstration effects, 276 Department of Defense Rewards Program, 247–48 development assistance, 109–51; in Afghanistan, 132–34, 144–48, 153–56, 223–25, 280; in asymmetric vs. symmetric wars, 141, 150; characteristics of successful, 80, 128, 149, 151, 221, 252, 258, 260, 321–22; civilian attitudes in relation to, 132–34; community-tailored, 124; conditional nature of, 122, 126–27, 131–32; expertise as factor in success of, 125–26; food as the form of, 139–42; humanitarian rationales for, 149–50; in Iraq, 109–13, 123–28, 146–48; large-scale, 123–28, 134–35, 146–48, 298–99, 299, 322–23; level of existing violence as factor in, 122–23, 127–28; as military strategy, 113–14, 298; modestly-scaled, 123–28, 148–49, 167–70, 278–81; in Pakistan, 279–80; in Philippines, 128–32, 134–39; predictions on, from information-centric model, 120–23, 148–49; rationales for, 109, 114–15; security provision in relation to, 127, 147–48, 153–59, 162–77, 169, 181–83; studies of effects of, 29–31, 52, 123–38, 151; theft and corruption involving, 139–48; violence diminished by, 123–34, 148–49, 157–62, 167–77, 323; violence increased by, 115, 134–45, 136, 156, 223–24, 224, 322.

pages: 339 words: 92,785

I, Warbot: The Dawn of Artificially Intelligent Conflict
by Kenneth Payne
Published 16 Jun 2021

For a classic American statement, see Krulak, Charles, C. ‘The Strategic Corporal: Leadership in the three-block war’. Marines Magazine 6, 1999. https://apps.dtic.mil/dtic/tr/fulltext/u2/a399413.pdf. 28. Eady, Yarrow. ‘Tesla’s deep learning at scale: Using billions of miles to train neural networks’, Towards Data Science, 7 May 2019, towardsdatascience.com/teslas-deep-learning-at-scale-7eed85b235d3. 29. Lye, Harry. ‘UK flies 20-drone swarm in major test,’ Airforce Technology, 28 January 2021, https://www.airforce-technology.com/news/uk-flies-20-drone-swarm-in-major-test/. 30. Cooper, Helene, Ralf Blumenthal and Leslie Kean.

‘The accuracy, fairness, and limits of predicting recidivism’, Science Advances 4, no. 1 (2018): eaao5580. Dunbar, Robin I. M. ‘Neocortex size and group size in primates: a test of the hypothesis’, Journal of Human Evolution 28, no. 3 (1995): 287–296. Eady, Yarrow. ‘Tesla’s deep learning at scale: using billions of miles to train neural networks’, Towards Data Science, 7 May 2019, https://towardsdatascience.com/teslas-deep-learning-at-scale-7eed85b235d3. Eagleman, David. ‘Can we create new senses for humans’, TED talks, March 2015, https://www.ted.com/talks/david_eagleman_can_we_create_new_senses_for_humans?utm_campaign=tedspread&utm_medium=referral&utm_source=tedcomshare.

Learn Algorithmic Trading
by Sebastien Donadio
Published 7 Nov 2019

He has been in the IT industry for more than 19 years and has worked in the technical & analytics divisions of Philip Morris, IBM, UBS Investment Bank, and Purdue Pharma. He led the Data Science team at Purdue, where he developed the company's award-winning Big Data and Machine Learning platform. Prior to Purdue, at UBS, he held the role of Associate Director, working with high-frequency & algorithmic trading technologies in the Foreign Exchange Trading group. He has authored Practical Big Data Analytics and co-authored Hands-on Data Science with R. Apart from his role at RxDataScience, he is also currently affiliated with Imperial College, London. Ratanlal Mahanta is currently working as a quantitative analyst at bittQsrv, a global quantitative research company offering quant models for its investors.

pages: 279 words: 87,875

Underwater: How Our American Dream of Homeownership Became a Nightmare
by Ryan Dezember
Published 13 Jul 2020

Once the computer got the picture, it pored over listing photos, written property descriptions, public records, and satellite imagery, looking for sunny kitchens. Kay and his partners decided to focus on helping better-funded investors find houses rather than enlarging their own pool of homes. Their specialty was data science, after all, not collecting rent. Whenever Entera needed money to train its algorithms on a new market or add employees, Kay would sell some of the Texas houses that he’d bought on the cheap. By 2018, Entera’s algorithms had unearthed tens of thousands of houses that wound up in the portfolios of American Homes 4 Rent, Invitation Homes, and others.

condominiums bankruptcies of Beach Club prices of in Bon Secour Village flipping of foreclosures influencing James, T., taking deposits on Lighthouse preconstruction sales of real estate crash and sales contracts for Shallow, B., selling stock prices compared to subdivision development over for Sunset Bay’s auctioning off Connors, Cristie conservation conspiracy theories CoreLogic corporate buyout firm (KKR) corruption charges Countrywide Financial courthouse auctions credit default swaps credit scores credit-rating firms Cypress Village data science Davidson, Jerry debt debt-to-income ratio deed filings Deepwater Horizon oil spill DeLawder, C. Daniel demand growth destruction, from hurricane developers, litigious buyers and development projects Dolphin Club down payments drug charges drug overdose DuBose, Kristi Dudley, William easy money ecology economy emergency management bunker Empire Group endangered species Engels, Friedrich English, Dewey Entera Technology Environmental Protection Agency (EPA) Envision Gulf Shores EPA.

pages: 618 words: 179,407

The Bill Gates Problem: Reckoning With the Myth of the Good Billionaire
by Tim Schwab
Published 13 Nov 2023

In the spring of 2020, U.S. president Donald Trump held a press conference in which his advisers pointed to IHME estimates as evidence that the pandemic would rapidly peak and then wind down in the weeks ahead. “Throughout April, millions of Americans were falsely led to believe that the epidemic would be over by June because of IHME’s projections,” data scientist Youyang Gu told me. “I think that a lot of states reopened [from lockdowns] based on their modeling.” Gu was one of many modelers who ended up competing with, and outperforming, the IHME during the Covid-19 pandemic, independently producing projections that appeared more accurate than Bill Gates’s half-billion-dollar health metrics enterprise.

pages: 400 words: 99,489

The Sirens of Mars: Searching for Life on Another World
by Sarah Stewart Johnson
Published 6 Jul 2020

Head III, et al., “Oceans in the Past History of Mars: Tests for Their Presence Using Mars Orbiter Laser Altimeter (MOLA) Data,” Geophysical Research Letters (Dec. 1998), p. 4,403; J. W. Head III, et al., “Possible Ancient Oceans on Mars: Evidence from Mars Orbiter Laser Altimeter Data,” Science, 286 (1999), pp. 2,134–2,137. THREE AND A HALF J. P. Bibring, et al., “Global Mineralogical and Aqueous Mars History Derived from OMEGA/Mars Express Data,” Science, 312 (April 2006), pp. 400–404. ALMOST ALL OF THE ATMOSPHERE With essentially no greenhouse effect, the surface temperatures of Mars, following the Stefan-Boltzmann law, slowly dropped to an average of minus 60 degrees Celsius, the surface temperature today.

pages: 174 words: 34,672

Nginx Essentials
by Valery Kholodkov
Published 21 Jul 2015

Jesse Estill Lawson is a computer scientist and social science researcher who works in higher education. He has consulted with dozens of colleges across the country to help them design, develop, and deploy computer information systems on everything from Windows and Apache to Nginx and node servers, and he centers his research on the coexistence of data science and sociology. In addition to his technological background, Jesse holds an MA in English and is currently working on his PhD in education. You can learn more about him on his website at http://lawsonry.com. Daniel Parraz is a Linux systems administrator with 15 years of experience in high-volume e-retailer sites, large system storage, and security enterprises.

pages: 451 words: 103,606

Machine Learning for Hackers
by Drew Conway and John Myles White
Published 10 Feb 2012

We’d also like to thank the members of the NYC Data Brunch for originally inspiring us to write this book and for giving us a place to refine our ideas about teaching machine learning. In particular, thanks to Hilary Mason for originally introducing us to several people at O’Reilly. Finally, we’d like to thank the many friends of ours in the data science community who’ve been so supportive and encouraging while we’ve worked on this book. Knowing that people wanted to read our book helped us keep up pace during the long haul that writing a full-length book entails. From Drew Conway I would like to thank Julie Steele, our editor, for appreciating our motivation for this book and giving us the ability to produce it.

—The R Project for Statistical Computing, http://www.r-project.org/

The best thing about R is that it was developed by statisticians. The worst thing about R is that...it was developed by statisticians.
—Bo Cowgill, Google, Inc.

R is an extremely powerful language for manipulating and analyzing data. Its meteoric rise in popularity within the data science and machine learning communities has made it the de facto lingua franca for analytics. R’s success in the data analysis community stems from two factors described in the preceding epigraphs: R provides most of the technical power that statisticians require built into the default language, and R has been supported by a community of statisticians who are also open source devotees.

pages: 446 words: 102,421

Network Security Through Data Analysis: Building Situational Awareness
by Michael S Collins
Published 23 Feb 2014

Bad security policy will result in users increasingly evading detection in order to get their jobs done or just to blow off steam, and that adds additional work for your defenders. The emphasis on actionability and the goal of achieving security is what differentiates this book from a more general text on data science. The section on analysis proper covers statistical and data analysis techniques borrowed from multiple other disciplines, but the overall focus is on understanding the structure of a network and the decisions that can be made to protect it. To that end, I have abridged the theory as much as possible, and have also focused on mechanisms for identifying abusive behavior.

When building visualizations, it’s important to know how long it will take to complete one and to provide the user with some feedback that the visualization is actually being generated.

Further Reading

Greg Conti, Security Data Visualization: Graphical Techniques for Network Analysis (No Starch Press, 2001).
NIST Handbook of Exploratory Data Analysis
Cathy O’Neil and Rachel Schutt, Doing Data Science (O’Reilly, 2013).
Edward Tufte, The Visual Display of Quantitative Information (Graphics Press, 2001).
John Tukey, Exploratory Data Analysis (Pearson, 1997).

* * *

[17] There’s nothing quite like the day you start an investigation based on the attacker being written up in the New York Times

pages: 677 words: 206,548

Future Crimes: Everything Is Connected, Everyone Is Vulnerable and What We Can Do About It
by Marc Goodman
Published 24 Feb 2015

I visited the scions of Silicon Valley and made friends within the highly talented San Francisco Bay Area start-up community. I was invited to join the faculty of Singularity University, an amazing institution housed on the campus of NASA’s Ames Research Center, where I worked with a brilliant array of astronauts, roboticists, data scientists, computer engineers, and synthetic biologists. These pioneering men and women have the ability to see beyond today’s world, unlocking the tremendous potential of technology to confront the grandest challenges facing humanity. But many of these Silicon Valley entrepreneurs hard at work creating our technological future pay precious little attention to the public policy, legal, ethical, and security risks that their creations pose to the rest of society.

pages: 685 words: 203,949

The Organized Mind: Thinking Straight in the Age of Information Overload
by Daniel J. Levitin
Published 18 Aug 2014

NOTES NOTE ON THE ENDNOTES Scientists make their living by evaluating evidence, and come to provisional conclusions based on the weight of that evidence. I say “provisional” because we acknowledge the possibility that new data may come to light that challenge current assumptions and understanding. In evaluating published data, scientists have to consider such things as the quality of the experiment (and the experimenters), the quality of the review process under which the work was assessed, and the explanatory power of the work. Part of the evaluation includes considering alternative explanations and contradictory findings, and forming a (preliminary) conclusion about what all the existing data say.

pages: 691 words: 203,236

Whiteshift: Populism, Immigration and the Future of White Majorities
by Eric Kaufmann
Published 24 Oct 2018

Controlling for all the usual confounders, white British Leavers, UKIP/BNP voters or right-wing voters move to places a few points whiter than white British Remainers and left-wing voters, but the difference is small. In the US, there is no equivalent data, so I turned to geocoded pro- and anti-Trump tweets. This work, with a data scientist, Andrius Mudinas, finds a similar pattern to Britain. Namely, white Americans move to significantly whiter places than minorities, but whites who are pro- and anti-Trump move to equally white areas. This echoes a growing number of US studies using voter registration files which find that the partisan composition of areas is not what attracts white Republicans or Democrats there.

pages: 137 words: 38,925

The Death of Truth: Notes on Falsehood in the Age of Trump
by Michiko Kakutani
Published 17 Jul 2018

Steve Bannon told the journalist Michael Lewis that Trump not only was an angry man but also had a unique ability to tap into the anger of others: “We got elected on Drain the Swamp, Lock Her Up, Build a Wall. This was pure anger. Anger and fear is what gets people to the polls.” At the same time, the Trump campaign made shrewd and Machiavellian use of social media and big-data tools, employing information from Facebook and Cambridge Analytica (a data science firm partially owned by the Trump backer and Breitbart investor Robert Mercer that boasts of its ability to psychologically profile millions of potential voters) to target its advertising and plan Trump’s campaign stops. Facebook revealed that the data of as many as 87 million people may have been shared improperly with Cambridge Analytica, which used the information to help create tools designed to predict and influence voter behavior.

pages: 404 words: 43,442

The Art of R Programming
by Norman Matloff

Thus, I have the points of view of both a “hard-core” computer scientist and of a statistician and statistics researcher. I hope this blend enables this book to fill a gap in the literature and enhances its value for you, the reader. Introduction xxiii 1 GETTING S TAR TED As detailed in the introduction, R is an extremely versatile open source programming language for statistics and data science. It is widely used in every field where there is data— business, industry, government, medicine, academia, and so on. In this chapter, you’ll get a quick introduction to R—how to invoke it, what it can do, and what files it uses. We’ll cover just enough to give you the basics you need to work through the examples in the next few chapters, where the details will be presented.

You can delete rows or columns by reassignment, too:

> m <- matrix(1:6,nrow=3)
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> m <- m[c(1,3),]
> m
     [,1] [,2]
[1,]    1    4
[2,]    3    6

3.4.2 Extended Example: Finding the Closest Pair of Vertices in a Graph

Finding the distances between vertices on a graph is a common example used in computer science courses and is used in statistics/data sciences too. This kind of problem arises in some clustering algorithms, for instance, and in genomics applications. Here, we’ll look at the common example of finding distances between cities, as it is easier to describe than, say, finding distances between DNA strands. Suppose we need a function that inputs a distance matrix, where the element in row i, column j gives the distance between city i and city j and outputs the minimum one-hop distance between cities and the pair of cities that achieves that minimum.
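To make the task concrete, here is a minimal sketch of such a function. This is not the book's own solution, just an illustration under the same assumptions: a symmetric distance matrix with zeros on the diagonal.

# Given a distance matrix d, with d[i, j] the distance between city i and
# city j, return the minimum one-hop distance and the pair of cities
# that achieves it.
closestpair <- function(d) {
  dd <- d
  diag(dd) <- Inf                          # ignore each city's zero distance to itself
  ij <- which(dd == min(dd), arr.ind = TRUE)[1, ]
  list(mindist = min(dd), cities = sort(unname(ij)))
}

# Toy 3-city distance matrix: the closest pair is cities 1 and 3, at distance 5.
m <- rbind(c(0, 12, 5),
           c(12, 0, 8),
           c(5,  8, 0))
closestpair(m)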

pages: 386 words: 113,709

Why We Drive: Toward a Philosophy of the Open Road
by Matthew B. Crawford
Published 8 Jun 2020

Google is building a model city within Toronto, a sort of Bonsai version of what is possible, conceived in the spirit of other demonstration cities that were intended to sway elite opinion, such as Potemkin’s village that so impressed Catherine the Great. Sensors will be embedded throughout the physical plant to capture the resident’s activities, then to be massaged by cutting-edge data science. The hope, clearly, is to build a deep, proprietary social science. Such a science could lead to real improvements in urban management, for example by being able to predict demand for heat and electricity, manage the allocation of street capacity based on demand, and automate waste disposal. But note that hoarding the data collected, and guarding it with military-grade secrecy, is key to the whole concept, as without that there is no business rationale.

If that legitimacy cannot be grounded in our shared rationality, based on reasons that can be articulated, interrogated, and defended, it will surely be claimed on some other basis. What this will be is already coming into view, and it bears a striking resemblance to priestly divination: the inscrutable arcana of data science, by which a new clerisy peers into a hidden layer of reality that is revealed only by a self-taught AI program—the logic of which is beyond human knowing. For the past several years it has been common to hear establishmentarian intellectuals lament “populism” as a rejection of Enlightenment ideals.

Common Stocks and Uncommon Profits and Other Writings
by Philip A. Fisher
Published 13 Apr 2015

It is alarming to read some of the reasons given in brokerage reports recommending purchase of these shares and then to compare the outlook described in such documents with what actually was to happen. A fragmentary list of such companies might include: Memorex high 173⅞, Ampex high 49⅞, Levitz Furniture high 60½, Mohawk Data Sciences high 111, Litton Industries high 101¾, Kalvar high 176½. The list could go on and on and on. However, more examples would serve only to make the same point over and over. Since it should already be quite apparent how important is the habit of evaluating any difference that may exist between a contemporary financial-community appraisal of a company and the fundamental aspects of that company, it should be more productive for us to spend our time examining fur-ther the characteristics of these financial-community appraisals.

Mallory-Sharon Metals Corporation Management approaching of change in concept of depth in deterioration of discipline of integrity of knowing Margin, buying on Market efficiency of possible downturns in, selling and Marketability, of stocks. See Liquidity Marketing Market potential, of products Market price trends, (chart). See also Price entries Market research Markets, exhaustion of Market timing Matsushita Memorex Metal Hydrides Middle companies, in diversification Mistakes Mohawk Data Sciences Monopolies Montgomery Ward Motorola N National Association of Securities Dealers Needs, of investor New-issue supply New products New York Stock Exchange Nielsen, A. C., Co. Noble, Daniel O Opportunity history vs. price vs. Overpriced stocks Over-the-counter stocks P Panic of 1873 Past, clues from Patents Patience People-effectiveness program People factors Performance Per-share earnings, past Personnel relations.

pages: 169 words: 41,887

Literary Theory for Robots: How Computers Learned to Write
by Dennis Yi Tenen
Published 6 Feb 2024

Whatever is meant by “innovation” consists of realizing which features of the inherited design remain necessary and which can be discarded or improved. Without history, the present becomes invariable, sliding into orthodoxy (“it is what it is”). History’s milestones mark a path into a possible future. It’s okay, then, to sometimes struggle with basic file operations alongside some of my best data-­science students. Believe in yourself, Dennis! How can I consider myself literate if I don’t fully understand how writing works? A technical answer isn’t enough, either. “How writing works” cannot be reduced to the present technological moment, because that moment will pass, and rapidly. Instead, writing “has come to mean” this specific arrangement of mind, body, tool, circuit, bit, gate . . . through its historical development.

pages: 592 words: 125,186

The Science of Hate: How Prejudice Becomes Hate and What We Can Do to Stop It
by Matthew Williams
Published 23 Mar 2021

The pattern in the UK is broadly similar, with the internet ahead (77 per cent), and TV (55 per cent) and print (22 per cent) trailing behind.4 For younger age groups (particularly those aged sixteen to twenty-four), online sources are their primary gateway to information about the world, family and friends.5 Algorithms learn from user behaviour, and therefore influence our collective actions. This means our prejudices and biases become embedded in bits of code that go on to influence what we are exposed to online, reflecting back these biases in often amplified ways. The emerging consensus from the field of data science is that algorithms are assisting in the polarisation of information exposure and hence debate and action online. Take YouTube as an example. The website Algotransparency.org, developed by an ex-Google employee, analyses YouTube’s top autoplay suggestions based on any search in order to demonstrate how the site’s recommendation algorithm works.

Filter bubbles and our bias Research on internet ‘filter bubbles’, often used interchangeably with the term ‘echo chambers’,§ has established that partisan information sources are amplified in online networks of like-minded social media users, where they go largely unchallenged due to ranking algorithms filtering out any challenging posts.9 Data science shows these filter bubbles are resilient accelerators of prejudice, reinforcing and amplifying extreme viewpoints on both sides of the spectrum. Looking at over half a million tweets covering the issues of gun control, same-sex marriage and climate change, New York University’s Social Perception and Evaluation Lab found that hateful posts related to these issues increased retweeting within filter bubbles, but not between them.

pages: 475 words: 127,389

Apollo's Arrow: The Profound and Enduring Impact of Coronavirus on the Way We Live
by Nicholas A. Christakis
Published 27 Oct 2020

The “war on cancer” declared in 1971 had a similar impact (although it did not cure cancer, it advanced fundamental medical science). Perhaps the multitrillion-dollar hit to the American economy by the COVID-19 pandemic will make multibillion-dollar investments in science—from virology to medicine to epidemiology to data science—seem well worth it. Plagues can also lead to long-term shifts in how we think about government and leaders. In medieval times, the manifest inability of rulers, priests, doctors, and others in positions of authority to control the course of plague led to a wholesale loss of faith in the corresponding institutions and a strong desire for new sources of authority.

Christakis is a physician and sociologist who explores the ancient origins and modern implications of human nature. He directs the Human Nature Lab at Yale University, where he is the Sterling Professor of Social and Natural Science in the Departments of Sociology, Medicine, Ecology and Evolutionary Biology, Statistics and Data Science, and Biomedical Engineering. He is the codirector of the Yale Institute for Network Science, the coauthor of Connected, and the author of Blueprint. Also by Nicholas A. Christakis Death Foretold Connected (with James H. Fowler) Blueprint

pages: 1,085 words: 219,144

Solr in Action
by Trey Grainger and Timothy Potter
Published 14 Sep 2014

You can also combine the Boost query parser with another query parser through the use of a nested query:

/select?q=_query_:"{!edismax qf=title content}data science"
  AND _query_:"{!boost b=log(popularity)}*:*"
  AND _query_:"{!boost b=recip(ms(NOW,articledate),3.16e-11,1,1)}category:news"

This query will run a search for the keywords data science, boosting all documents by their popularity and by how recently they were posted if they fall within the “news” category. The number of results will be the same as the search for data science; the boost clauses only serve to affect document relevancy.

7.6.6. Prefix query parser

The Prefix query parser can be used in place of a wildcard query.
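
For readers who want to try the request above, here is a minimal sketch (not from the book) that sends the same nested query to a hypothetical local Solr core named "articles" using the third-party requests library; the field names (title, popularity, articledate, category) are simply the ones assumed by the excerpt's example:

import requests  # third-party HTTP client: pip install requests

solr_url = "http://localhost:8983/solr/articles/select"  # assumed local core

# Query string copied from the excerpt above.
q = (
    '_query_:"{!edismax qf=title content}data science" '
    'AND _query_:"{!boost b=log(popularity)}*:*" '
    'AND _query_:"{!boost b=recip(ms(NOW,articledate),3.16e-11,1,1)}category:news"'
)

resp = requests.get(solr_url, params={"q": q, "wt": "json", "rows": 10})
resp.raise_for_status()

# Standard Solr JSON response layout: response -> docs.
for doc in resp.json()["response"]["docs"]:
    print(doc.get("title"), doc.get("popularity"))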

pages: 933 words: 205,691

Hadoop: The Definitive Guide
by Tom White
Published 29 May 2009

You can inspect the generated script and check that the substitutions look sane (because they are dynamically generated, for example) before running it in normal mode. At the time of this writing, Grunt does not support parameter substitution.

Chapter 12. Hive

In “Information Platforms and the Rise of the Data Scientist,”[100] Jeff Hammerbacher describes Information Platforms as “the locus of their organization’s efforts to ingest, process, and generate information,” and how they “serve to accelerate the process of learning from empirical data.” One of the biggest ingredients in the Information Platform built by Jeff’s team at Facebook was Hive, a framework for data warehousing on top of Hadoop.

pages: 420 words: 135,569

Imaginable: How to See the Future Coming and Feel Ready for Anything―Even Things That Seem Impossible Today
by Jane McGonigal
Published 22 Mar 2022

And, studies show, they work incredibly well: programs trained with noise injections learn much faster and perform much better than programs trained only on real-world data sets. Recently, Erik Hoel, a neuroscientist at Tufts University, noticed how similar these machine-learning techniques are to the surreal and hard-to-interpret nature of human dreams.2 When we dream, Hoel suggested in a 2021 paper in the data science journal Patterns, it often feels like a “noise injection” into our brains. Our dreams rarely repeat the exact details of our real-world experiences. Instead they recombine real people, places, experiences, and events in bizarre and seemingly random ways. Human dreams also have the same sparseness, or missing data, as noise injections, a kind of narrative fuzziness.
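
A minimal sketch (not from the paper or the book) of what a "noise injection" into training data can look like in practice: real examples are jittered with random noise and given missing values, and the augmented copies are trained on alongside the originals.

import numpy as np

rng = np.random.default_rng(0)
X_real = rng.uniform(size=(1000, 16))  # stand-in for a real-world training set

def noise_injected(X, noise_scale=0.1, drop_prob=0.2):
    # Add Gaussian jitter, then blank out a random subset of values,
    # mimicking the sparseness / missing data the excerpt mentions.
    X_noisy = X + rng.normal(scale=noise_scale, size=X.shape)
    mask = rng.uniform(size=X.shape) < drop_prob
    return np.where(mask, np.nan, X_noisy)

X_train = np.vstack([X_real, noise_injected(X_real)])  # originals plus noisy copies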

How can we make room there for all of us? Not everyone is on the move in this future. The rest of humanity is learning how to make others feel welcome and at home somewhere new. In fact, the art of welcoming is now ranked by online learners as the most useful and desirable practical skill to master, ahead of computer programming, data science, and even health care. It turns out that a “soft” skill may be the most essential one for humanity’s future. Migration in this future is no longer an individual burden or a dangerous, illegal journey. It’s coordinated, intentional, and strategic—the whole world working together to build vibrant, thriving societies.

pages: 161 words: 52,058

The Art of Corporate Success: The Story of Schlumberger
by Ken Auletta
Published 28 Sep 2015

And while the profits of most oil and oilfield-service companies fell sharply in 1982, Schlumberger’s net income rose by more than 6 percent. Science is the foundation of Schlumberger. Science is the link between the various corporate subsidiaries, for the task of most of them is collecting, measuring, and transmitting data. Science, and particularly geophysics, was at the core of the careers of Conrad and Marcel Schlumberger, the company’s founders. Both were born in the town of Guebwiller, in Alsace—Conrad in 1878 and Marcel six years later. They were two of six children of a Protestant family that owned a prosperous textile machine business.

pages: 175 words: 54,755

Robot, Take the Wheel: The Road to Autonomous Cars and the Lost Art of Driving
by Jason Torchinsky
Published 6 May 2019

v=PgnsapPGaaw. 27 Wikipedia, “Edge Detection,” https://en.wikipedia.org/wiki/Edge_detection. 28 Torchinsky, Jason, “Why Nissan Built Realistic Inflatable Versions of Its Most Popular Cars,” Jalopnik, October 18, 2012, https://jalopnik.com/why-nissan-built-realistic-inflatable-versions-of-its-m-5952415. 29 Condliffe, Jamie, “This Image Is Why Self-Driving Cars Come Loaded with Many Types of Sensors,” MIT Technology Review, July 21, 2017, https://www.technologyreview.com/s/608321/this-image-is-why-self-driving-cars-come-loaded-with-many-types-of-sensors/. 30 Antunes, João, “Performance over Price: Luminar’s Novel Lidar Tech for Autonomous Vehicles,” SPAR 3D, May 5, 2017, https://www.spar3d.com/news/lidar/performance-price-luminars-novel-lidar-tech-autonomous-vehicles/. 31 Dwivedi, Priya, “Tracking a self-driving car with high precision,” Towards Data Science, April 30, 2017, https://towardsdatascience.com/helping-a-self-driving-car-localize-itself-88705f419e4a. 32 Kichun Jo; Yongwoo Jo; Jae Kyu Suhr; Ho Gi Jung; Myoungho Sunwoo, “Precise Localization of an Autonomous Car Based on Probabilistic Noise Models of Road Surface Marker Features Using Multiple Cameras,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 6, December 2015, https://ieeexplore.ieee.org/document/7160754/. 33 Silver, David, “How Self-Driving Cars Work,” Medium, December 14, 2017, https://medium.com/udacity/how-self-driving-cars-work-f77c49dca47e. 34 Website of the Australian Government Department of Infrastructure, Regional Development and Cities, https://infrastructure.gov.au/vehicles/mv_standards_act/files/Sub136_Austroads.pdf.

pages: 590 words: 152,595

Army of None: Autonomous Weapons and the Future of War
by Paul Scharre
Published 23 Apr 2018

Comment at 5:10. 224 Automated hacking back is a theoretical concept: Alexander Velez-Green, “When ‘Killer Robots’ Declare War,” Defense One, April 12, 2015, http://www.defenseone.com/ideas/2015/04/when-killer-robots-declare-war/109882/. 224 automate “spear phishing” attacks: Karen Epper Hoffman, “Machine Learning Can Be Used Offensively to Automate Spear Phishing,” Infosecurity Magazine, August 5, 2016, https://www.infosecurity-magazine.com/news/bhusa-researchers-present-phishing/. 224 automatically develop “humanlike” tweets: John Seymour and Philip Tully, “Weaponizing data science for social engineering: Automated E2E spear phishing on Twitter,” https://www.blackhat.com/docs/us-16/materials/us-16-Seymour-Tully-Weaponizing-Data-Science-For-Social-Engineering-Automated-E2E-Spear-Phishing-On-Twitter-wp.pdf. 224 “in offensive cyberwarfare”: Eric Messinger, “Is It Possible to Ban Autonomous Weapons in Cyberwar?,” Just Security, January 15, 2015, https://www.justsecurity.org/19119/ban-autonomous-weapons-cyberwar/. 225 estimated 8 to 15 million computers worldwide: “Virus Strikes 15 Million PCs,” UPI, January 26, 2009, http://www.upi.com/Top_News/2009/01/26/Virus-strikes-15-million-PCs/19421232924206/. 225 method to counter Conficker: “Clock ticking on worm attack code,” BBC News, January 20, 2009, http://news.bbc.co.uk/2/hi/technology/7832652.stm. 225 brought Conficker to heel: Microsoft Security Intelligence Report: Volume 11 (11), Microsoft, 2011. 226 “prevent and react to countermeasures”: Alessandro Guarino, “Autonomous Intelligent Agents in Cyber Offence,” in K.

pages: 285 words: 58,517

The Network Imperative: How to Survive and Grow in the Age of Digital Business Models
by Barry Libert and Megan Beck
Published 6 Jun 2016

He is internationally known for pioneering research on networked organizations, leadership mental models, and marketing strategy. He consults with major firms around the world, providing expert testimony, and has lectured at over fifty universities worldwide. He has authored more than two dozen books on various topics, including network theory, innovation, and leadership. OPENMATTERS is a data science company. It focuses on analyzing business models and the underlying sources of value. The firm harnesses technology, big data, and analytics to categorize and measure business model performance. OpenMatters uses proprietary research to build indices and ratings for investors, and strategies and rankings for companies, to help both achieve better returns.

pages: 207 words: 59,298

The Gig Economy: A Critical Introduction
by Jamie Woodcock and Mark Graham
Published 17 Jan 2020

Available at: https://www.oecd-ilibrary.org/employment/automation-skills-use-and-training_2e2f4eea-en Noble, S.U. (2018) Algorithms of Oppression: How Search Engines Reinforce Racism. New York: NYU Press. OECD (2019) Measuring platform mediated workers. OECD Digital Economy Papers No. 282. Ojanperä, S., O’Clery, N. and Graham, M. (2018) Data science, artificial intelligence and the futures of work. Alan Turing Institute Report, 24 October. Available at: http://doi.org/10.5281/zenodo.1470609 O’Neil, C. (2017) Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. London: Penguin. Pasquale, F. (2015) The Black Box Society: The Secret Algorithms That Control Money and Information.

pages: 262 words: 60,248

Python Tricks: The Book
by Dan Bader
Published 14 Oct 2017

>>> arr = bytearray((0, 1, 2, 3))
>>> arr[1]
1

# The bytearray repr:
>>> arr
bytearray(b'\x00\x01\x02\x03')

# Bytearrays are mutable:
>>> arr[1] = 23
>>> arr
bytearray(b'\x00\x17\x02\x03')
>>> arr[1]
23

# Bytearrays can grow and shrink in size:
>>> del arr[1]
>>> arr
bytearray(b'\x00\x02\x03')
>>> arr.append(42)
>>> arr
bytearray(b'\x00\x02\x03*')

# Bytearrays can only hold "bytes"
# (integers in the range 0 <= x <= 255)
>>> arr[1] = 'hello'
TypeError: "an integer is required"
>>> arr[1] = 300
ValueError: "byte must be in range(0, 256)"

# Bytearrays can be converted back into bytes objects:
# (This will copy the data)
>>> bytes(arr)
b'\x00\x02\x03*'

Key Takeaways

There are a number of built-in data structures you can choose from when it comes to implementing arrays in Python. In this chapter we’ve focused on core language features and data structures included in the standard library only. If you’re willing to go beyond the Python standard library, third-party packages like NumPy14 offer a wide range of fast array implementations for scientific computing and data science. By restricting ourselves to the array data structures included with Python, here’s what our choices come down to:

You need to store arbitrary objects, potentially with mixed data types? Use a list or a tuple, depending on whether you want an immutable data structure or not.

You have numeric (integer or floating point) data and tight packing and performance is important?
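
A minimal sketch (not from the book) contrasting the choices the excerpt lists: a list or tuple for arbitrary, mixed-type objects, and the standard library's typed array.array when numeric data and tight packing matter.

import array
import sys

mixed = ["data science", 42, 3.14]            # list: mutable, holds mixed types
frozen = ("data science", 42, 3.14)           # tuple: same idea, but immutable

floats = array.array("d", [1.0, 2.0, 3.0])    # typed array of C doubles, tightly packed
floats.append(4.0)                            # mutable like a list
# floats.append("oops")                       # would raise TypeError: only floats allowed

# The typed array stores raw doubles rather than full Python float objects.
print(sys.getsizeof(floats), sys.getsizeof([1.0, 2.0, 3.0]))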

pages: 217 words: 63,287

The Participation Revolution: How to Ride the Waves of Change in a Terrifyingly Turbulent World
by Neil Gibb
Published 15 Feb 2018

There was another pattern in the data – a clear correlation between the hormone levels in each man and the significance he attributed to the practice: the more importance given to meditation, the greater the levels of hormone. What Dr Saron began to conclude was that it wasn’t the meditation per se that was making the difference, but how meaningful it was to those who practised it. This was a great insight. But it was actually only half the picture. The problem with a lot of data science and research is the frame. Dr Saron’s research was focused on the individual. What it missed was that the men were meditating in a group. So much research into addiction, illness, and depression misses this point. Why do people so often get well in rehab but relapse when they leave? Why do people get fit in gym and yoga classes but find it hard to maintain fitness when they are on their own?

pages: 196 words: 61,981

Blockchain Chicken Farm: And Other Stories of Tech in China's Countryside
by Xiaowei Wang
Published 12 Oct 2020

They are the creative director at Logic magazine, and their work encompasses community-based and public art projects, data visualization, technology, ecology, and education. Their projects have been finalists for the Index Award and featured by The New York Times, the BBC, CNN, VICE, and other outlets. They are working toward a Ph.D. at UC Berkeley, where they are a part of the National Science Foundation’s Research Traineeship program in Environment and Society: Data Science for the 21st Century.

She Has Her Mother's Laugh
by Carl Zimmer
Published 29 May 2018

Social media platforms have worked hard to make this replication not merely perfect but easy. You don’t have to dig into the HTML code for your favorite political slogan or your favorite clip of an insane Russian driver. You press SHARE. You retweet. It’s not just easy to spread memes; it’s also easy to track them. Data scientists can track memes with all the numerical precision of a geneticist following an allele for antibiotic resistance in a petri dish. Forty years after the publication of The Selfish Gene, Dawkins wrote an epilogue to an anniversary edition in which he looked back at his idea with satisfaction.

pages: 898 words: 236,779

Digital Empires: The Global Battle to Regulate Technology
by Anu Bradford
Published 25 Sep 2023

For example, Twitter reportedly deactivated only 11 percent of over 3,500 total accounts spreading pro-government propaganda worldwide.161 US platforms are also criticized for their inability to effectively moderate content in foreign languages. Documents leaked by Frances Haugen reveal Facebook’s inability to curtail inflammatory hate speech in Ethiopia, where the platform was deployed to call for killings and mass internment of the country’s ethnic Tigrayans as part of the ongoing civil war.162 Timnit Gebru, a data scientist who used to lead Google’s ethical AI team and who is fluent in the Amharic language used in Facebook posts in Ethiopia, described the content circulating as “the most terrifying I’ve ever seen anywhere,” likening it to the language used in the context of the earlier Rwanda genocide. In the leaked documents, Facebook acknowledged that it lacked the capabilities to moderate content in Amharic.

pages: 237 words: 67,154

Ours to Hack and to Own: The Rise of Platform Cooperativism, a New Vision for the Future of Work and a Fairer Internet
by Trebor Scholz and Nathan Schneider
Published 14 Aug 2017

He is also helping build seed.coop, a platform for co-ops everywhere to grow their membership. Say hi on Twitter @daspitzberg. Arun Sundararajan is Professor and the Robert L. and Dale Atkins Rosen Faculty Fellow at New York University’s Leonard N. Stern School of Business. He is also an affiliated faculty member at NYU’s Center for Urban Science+Progress, and at NYU’s Center for Data Science. Astra Taylor is a documentary filmmaker, writer, and political organizer. She is the director of the films Zizek! and Examined Life, and the author of The People’s Platform: Taking Back Power and Culture in the Digital Age (Picador, 2015), winner of a 2015 American Book Award. She helped launch the Rolling Jubilee debt-abolishing campaign and is a co-founder of the Debt Collective.

pages: 391 words: 71,600

Hit Refresh: The Quest to Rediscover Microsoft's Soul and Imagine a Better Future for Everyone
by Satya Nadella , Greg Shaw and Jill Tracie Nichols
Published 25 Sep 2017

Similarly, an insurer like MetLife can spin up our cloud with ML overnight to run enormous actuarial tables and have answers to its most crucial financial questions in the morning, making it possible for the company to adapt quickly to dramatic shifts in the insurance landscape—an unexpected flu epidemic, a more-violent-than-normal hurricane season. Whether you are in Ethiopia or Evanston, Ohio, or if you hold a doctorate in data science or not, everyone should have that capability to learn from the data. With Azure, Microsoft would democratize machine learning just as it had done with personal computing back in the 1980s. To me, meeting with customers and learning from both their articulated and unarticulated needs is key to any product innovation agenda.

pages: 220 words: 66,518

The Biology of Belief: Unleashing the Power of Consciousness, Matter & Miracles
by Bruce H. Lipton
Published 1 Jan 2005

Hallett, M. (2000). “Transcranial magnetic stimulation and the human brain.” Nature 406: 147-150. Helmuth, L. (2001). “Boosting Brain Activity From The Outside In.” Science 292: 1284-1286. Jansen, R., H. Yu, et al. (2003). “A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data.” Science 302: 449-453. Jin, M., M. Blank, et al. (2000). “ERK1/2 Phosphorylation, Induced by Electromagnetic Fields, Diminishes During Neoplastic Transformation.” Journal of Cell Biology 78: 371-379. Kübler-Ross, Elizabeth (1997) On Death and Dying, New York, Scribner. Li, S., C. M. Armstrong, et al. (2004).

Designing the Mind: The Principles of Psychitecture
by Designing The Mind and Ryan A Bush
Published 10 Jan 2021

You could engage in gratitude to up-regulate your desire for all the great things you have, even if this particular job isn’t one of them. You also might down-regulate the specific desire causing your suffering by reminding yourself of the hour-and-a-half-long commute, or that movie rental is probably not a great industry to build a career in right now, or that you have a master’s in data science. Honestly, I have no idea what you saw in that job in the first place, Sarah. Once you learn and strengthen your ability to use these tactics, you will be able to adjust your desires at will, largely eliminating the tendency to suffer over ungratified longings. The Counteraction of Desire Greed and aversion surface in the form of thoughts, and thus can be eroded by a process of ‘thought substitution,’ by replacing them with the thoughts opposed to them

pages: 681 words: 64,159

Numpy Beginner's Guide - Third Edition
by Ivan Idris
Published 23 Jun 2015

A step-by-step tutorial that will help users solve research-based problems from various areas of science using Scipy.

IPython Interactive Computing and Visualization Cookbook
ISBN: 978-1-78328-481-8
Paperback: 512 pages

Over 100 hands-on recipes to sharpen your skills in high-performance numerical computing and data science with Python

1. Find out how to improve your code to write high-quality, readable, and well-tested programs with IPython.
2. Master all of the new features of the IPython Notebook, including interactive HTML/JavaScript widgets.
3. Analyze data effectively using Bayesian and Frequentist data models with Pandas, PyMC, and R.

pages: 256 words: 67,563

Explaining Humans: What Science Can Teach Us About Life, Love and Relationships
by Camilla Pang
Published 12 Mar 2020

But I also know, deep down, that it’s love that makes us feel alive, even when it’s inconvenient, painful and hard to bear. The mathematician in me is also a romantic. She believes there are ways we can use statistics, probability and machine-learning techniques to improve our search for love and harmony with the people we care about. And if you’re sceptical about the role of data science in your love life, then I would ask if you’ve ever used Tinder, Bumble or any other dating app. Because the truth is that many of us have been sharing a bed with AI for some time. Relationships may be far from a science, but there are many ways in which science can help us to manage them better.

pages: 210 words: 65,833

This Is Not Normal: The Collapse of Liberal Britain
by William Davies
Published 28 Sep 2020

This transformation in our recording equipment is responsible for much of the outrage directed at those formerly tasked with describing the world. The rise of blanket surveillance technologies has paradoxical effects, raising expectations for objective knowledge to unrealistic levels, and then provoking fury when those in the public eye do not meet them. On the one hand, data science appears to make the question of objective truth easier to settle. The slow and imperfect institutions of social science and journalism can be circumvented, and we can get directly to reality itself, unpolluted by human bias. Surely, in this age of mass data capture, the truth will become undeniable.

pages: 231 words: 64,734

Safe Haven: Investing for Financial Storms
by Mark Spitznagel
Published 9 Aug 2021

A reliable indication that this fallacy is at work is when bold forward‐looking statements about a particular risk‐mitigation strategy are uttered by someone who has never actually done it, in real time (as in during the Sunday game, as opposed to on the following Monday from their armchair). It is a kissing cousin to datamining, or overfitting, surely the most well‐trodden pitfall in the data sciences. There will always be flawless, successful strategies to be gleaned from past data, from randomness alone. There is a deceptive narrative basis to such strategies that always sound so reasonable and plausible; they have charm and seductive powers. So it's a pretty easy sale, really. Sad to say, but heuristic storytelling plays a huge role in what risk mitigation has become.

pages: 247 words: 69,593

The Creative Curve: How to Develop the Right Idea, at the Right Time
by Allen Gannett
Published 11 Jun 2018

Unfortunately, he hated anthropology: Kurt Vonnegut, A Man Without a Country (New York: Seven Stories Press, 2005). brought together a team of academic superheroes: The study produced by his team of academic superheroes was Reagan et al., “The Emotional Arcs of Stories Are Dominated by Six Basic Shapes,” EPJ Data Science, November 4, 2016, https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-016-0093-1. Kenya Barris: Details relating to Black-ish and Barris’s involvement and background drawn mostly from my interviews with him. Researcher Gregory Berns: Details relating to dopamine drawn from my interviews with Berns.

pages: 227 words: 63,186

An Elegant Puzzle: Systems of Engineering Management
by Will Larson
Published 19 May 2019

I’ve found that agreeing on the expected skills for a given role can be far harder than anyone anticipates, and it can require spending significant time with your interviewers to agree on what the role requires. (This is often in the context of what extent and kind of programming experience is needed in engineering management, DevOps, and data science roles.) 6.2.3 Finding signal After you’ve broken the role down into a certain set of skills and requirements, the next step is to break your interview loop into a series of interview slots that together cover all of those signals. Typically, each skill is covered by two different interviewers to create some redundancy in signal detection, in case one of the interviews doesn’t go cleanly.
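
A minimal sketch (not from the book) of the kind of coverage check this implies: map each interview slot to the skills it assesses (all names here are hypothetical) and flag any required skill that fewer than two interviewers cover.

from collections import Counter

required_skills = {"coding", "system design", "data science", "management"}  # hypothetical role
interview_loop = {                                                           # hypothetical slots
    "technical screen": {"coding"},
    "architecture review": {"system design", "coding"},
    "analytics deep dive": {"data science"},
    "hiring manager 1:1": {"management", "data science"},
    "bar raiser": {"management", "system design"},
}

# Count how many slots cover each skill, then flag anything below the redundancy target of two.
coverage = Counter(skill for skills in interview_loop.values() for skill in skills)
under_covered = sorted(s for s in required_skills if coverage[s] < 2)
print("Skills without redundant coverage:", under_covered or "none")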

pages: 229 words: 72,431

Shadow Work: The Unpaid, Unseen Jobs That Fill Your Day
by Craig Lambert
Published 30 Apr 2015

In 1997, the College Board introduced an advanced placement examination in statistics. The number of high school students taking it tripled in the decade after 2001, to 149,165 by 2012. American universities conferred close to 3,000 bachelor’s degrees in statistics in the 2010–11 academic year, a 68 percent increase from four years before. A Harvard data science course drew 400 students in 2013, not only undergraduates but those from graduate schools of law, business, government, design, and medicine. At the University of California, Berkeley, the number of statistics majors quintupled from 50 in 2003 to 250 a decade later. Fields of study that strongly attract students like this are an index of what we value, and where society is headed.

Chasing My Cure: A Doctor's Race to Turn Hope Into Action; A Memoir
by David Fajgenbaum
Published 9 Sep 2019

But actually, it’s those patients, who donate samples, data, and funds, that are the only hope the CDCN has for developing a cure. We also partner with tech and pharmaceutical companies on large-scale studies whenever possible. One tech company, Medidata, is contributing machine learning and data science tools to help us generate clinically meaningful insights from the half a million data points in the proteomics study. Though they are often demonized because of a few notable bad actors, pharmaceutical companies have incredible power to do good through contributing funds, data, and samples for research.

The Jobs to Be Done Playbook: Align Your Markets, Organization, and Strategy Around Customer Needs
by Jim Kalbach
Published 6 Apr 2020

Thanks to JTBD, our team was able to focus on solving the biggest opportunities of the customer experience on carmax.com, thus making a meaningful impact to the product. Jake Mitchell is a Principal Product Designer at CarMax, where he strives to reinvent the way customers find and fall in love with their next car. In addition to user experience design and research, Jake is proficient in web development and data science. This case study is a summary of his presentation “Using Jobs to Be Done at CarMax to Guide Product Innovation,” given at UX STRAT 2017 in Boulder, Colorado. Recap JTBD not only helps you understand the customer’s problem, but it also guides solution development. In particular, you can leverage JTBD in several ways to tie the design of products and services back to the individual’s job to be done.

pages: 211 words: 78,547

How Elites Ate the Social Justice Movement
by Fredrik Deboer
Published 4 Sep 2023

turnout for high school graduates: “Voting and Registration in the Election of 2020,” United States Census Bureau, April 2021, https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-585.html. And they’re more likely: See, among many other studies: Jacob R. Brown, Ryan D. Enos, James Felgenbaum, and Soumyajit Mazumder, “Childhood Cross-Ethnic Exposure Predicts Political Behavior Seven Decades Later: Evidence from Linked Administrative Data.” Science Advances 7, no. 24 (June 2021), https://www.science.org/doi/10.1126/sciadv.abe8432. But recent high-quality research: Ralph Scott, “Does University Make You More Liberal? Estimating the Within-Individual Effects of Higher Education on Political Values,” Electoral Studies 77 (June 2022), 102471.

pages: 472 words: 80,835

Life as a Passenger: How Driverless Cars Will Change the World
by David Kerrigan
Published 18 Jun 2017

_r=0

Chapter 5 - All Change
http://www.wsj.com/articles/could-self-driving-cars-spell-the-end-of-ownership-1448986572
http://www.bloomberg.com/news/articles/2016-09-11/self-driving-cars-to-cut-u-s-insurance-premiums-40-aon-says
http://www.wsj.com/articles/will-the-driverless-car-upend-insurance-1425428891
https://www.wsj.com/articles/driverless-cars-threaten-to-crash-insurers-earnings-1469542958
http://www.datakind.org/projects/creating-safer-streets-through-data-science/
https://twitter.com/BenedictEvans/status/721484633351696384?replies_view=true&cursor=ARAUhOE6Awo
https://techcrunch.com/2015/10/30/ride-sharing-will-give-us-back-our-cities/
http://dupress.deloitte.com/dup-us-en/focus/future-of-mobility/roadmap-for-future-of-urban-mobility.html?id=us:2el:3pr:prwhatnext:eng:cons:091516
https://www.planning.org/planning/2015/may/autonomouscars.htm
http://www.uspirg.org/news/usp/new-report-shows-mounting-evidence-millennials%E2%80%99-shift-away-driving
http://www.slate.com/articles/business/the_juice/2014/07/driving_vs_flying_which_is_more_harmful_to_the_environment.html

Chapter 6 - Challenges
http://www.wsj.com/articles/driverless-cars-to-fuel-suburban-sprawl-1466395201
https://www.washingtonpost.com/news/energy-environment/wp/2016/06/23/save-the-driver-or-save-the-crowd-scientists-wonder-how-driverless-cars-will-choose/?

Pearls of Functional Algorithm Design
by Richard Bird
Published 15 Sep 2010

Gries, D. (1979). The Schorr–Waite graph marking algorithm. Acta Informatica 11, 223–32. McCarthy, J. (1960). Recursive functions of symbolic expressions and their computation by machine. Communications of the ACM 3, 184. Mason, I. A. (1988). Verification of programs that destructively manipulate data. Science of Computer Programming 10 (2), 177–210. Möller, B. (1997). Calculating with pointer structures. IFIP TC2/WG2.1 Working Conference on Algorithmic Languages and Calculi. Chapman and Hall, pp. 24–48. Möller, B. (1999). Calculating with acyclic and cyclic lists. Information Sciences 119, 135–54.

pages: 239 words: 80,319

Lurking: How a Person Became a User
by Joanne McNeil
Published 25 Feb 2020

A user searching “stages of pancreatic cancer” might not mean “(I am experiencing) stages of pancreatic cancer” but rather, “(my nephew is experiencing) stages of pancreatic cancer” or “(this fictional character in my screenplay is experiencing) stages of pancreatic cancer” or “(I am researching) stages of pancreatic cancer.” This is why I can only side-eye data science researchers who wish to declare one state is more queer than another or more gullible to conspiracy theories than the other, based on unreliable data like Google Trends. Who can say for certain why other people google what they do? A search engine is no truth serum. It is distilled curiosity, which has no borders and is, by definition, undefined.

The Buddha and the Badass: The Secret Spiritual Art of Succeeding at Work
by Vishen Lakhiani
Published 14 Sep 2020

Don’t underestimate how much you matter or assume your managers won’t have time for your concern or question. And NEVER ever take on the disempowering beliefs of someone else. In fact, when you hear such a thing, correct them. Simply ask a question like: “Have you validated that belief with hard data science and study? Or is that a personal opinion clouded by Fundamental Attribution Error and one’s own childhood insecurities projecting a character trait onto someone else?” You get the idea ;-) The simple rule to live by is this: “If the belief makes me feel disempowered, unless it’s backed by empirical scientific data, and not just on someone’s opinion, I’m going to choose to ignore it and do what will empower me instead.”

pages: 268 words: 81,811

Flash Crash: A Trading Savant, a Global Manhunt, and the Most Mysterious Market Crash in History
by Liam Vaughan
Published 11 May 2020

The first element involved statistically analyzing changes in the order book and elsewhere for information that indicated whether prices would rise or fall. Inputs might include the number and type of resting orders at different levels, how fast prices are moving around, and the types of market participants active at any time. “Think of it as a giant data science project,” explains one HFT owner. For years, Nav had used his superior pattern recognition and recall skills to read the ebbs and flows of the order book until it became second nature, but even the most gifted human scalper is no match for a computer at parsing large amounts of data. When it came to speed, the leading HFT firms invested hundreds of millions of dollars in computers, cable, and telecommunications equipment to ensure they could react first in what was often a winner-takes-all game.

pages: 306 words: 82,765

Skin in the Game: Hidden Asymmetries in Daily Life
by Nassim Nicholas Taleb
Published 20 Feb 2018

Just a little bit of significant data is needed when one is right, particularly when it is disconfirmatory empiricism, or counterexamples: only one data point (a single extreme deviation) is sufficient to show that Black Swans exist. Traders, when they make profits, have short communications; when they lose they drown you in details, theories, and charts. Probability, statistics, and data science are principally logic fed by observations—and absence of observations. For many environments, the relevant data points are those in the extremes; these are rare by definition, and it suffices to focus on those few but big to get an idea of the story. If you want to show that a person has more than, say $10 million, all you need is to show the $50 million in his brokerage account, not, in addition, list every piece of furniture in his house, including the $500 painting in his study and the silver spoons in the pantry.

pages: 286 words: 87,401

Blitzscaling: The Lightning-Fast Path to Building Massively Valuable Companies
by Reid Hoffman and Chris Yeh
Published 14 Apr 2018

They may not have the pedigree, but they are great at learning new things and at charging hard to execute on them. Plus, the early business is in too much flux to effectively leverage the finely tuned capabilities of a true specialist. Even at the Tribe stage, hiring a specialist should be considered a major exception—for example, if you need an engineer with a very specialized area of expertise, such as data science or machine learning. The Village stage is where it becomes prudent to hire specialists, as both executives and key contributors. At the Tribe stage you want employees with skill sets flexible enough to pivot along with the company, but if you have hundreds of employees, you better have some pretty well-developed theories about your business and where it is going!

pages: 302 words: 84,881

The Digital Party: Political Organisation and Online Democracy
by Paolo Gerbaudo
Published 19 Jul 2018

Using the polling and rating mechanisms built into the architecture of social media and online platforms more generally, they engage their members/users in all forms of consultations, constantly charting their shifting opinions and with the ultimate aim of adapting to their evolving tendencies, in ways not too dissimilar from those practiced by digital companies and their data science teams.

Table 3.1 Similarities between platform companies and platform parties

                     Platform Companies    Platform Parties
Operational logic    Data gathering        Political data gathering
Membership           Free sign-up          Free membership
Value extraction     Free labour           Free political labour

Second, platform parties operate with a free registration model in which membership is disconnected from financial contribution.

pages: 297 words: 84,447

The Star Builders: Nuclear Fusion and the Race to Power the Planet
by Arthur Turrell
Published 2 Aug 2021

ARTHUR TURRELL has a PhD in plasma physics from Imperial College London and won the Rutherford Prize for the Public Understanding of Plasma Physics. His research and writing have been featured in the Daily Mail, The Guardian, the International Business Times, Gizmodo, and other publications. He works as a deputy director at the Data Science Campus of the Office for National Statistics in the UK.

pages: 282 words: 85,658

Ask Your Developer: How to Harness the Power of Software Developers and Win in the 21st Century
by Jeff Lawson
Published 12 Jan 2021

The idea was born of necessity—like most nonprofits, Thorn didn’t have deep pockets. Hackathons became part of Thorn’s R&D lab. “We just introduce four or five problems, and say, ‘Okay, go fix it. Solve the problem,’” Kutcher says. Thorn continues to use hackathons to expand upon its work, and has also built out its own dedicated engineering and data science team, 100 percent dedicated to advanced tools to end online child sexual abuse. More interesting, from my perspective, is what this reveals about developers themselves. The people who participate in these hackathons often work for companies that treat them like code monkeys. Thorn invites them to spend a weekend trying to solve an important and difficult tech problem—how to wipe out child sex trafficking—and gives them complete freedom.

Know Thyself
by Stephen M Fleming
Published 27 Apr 2021

A scientist might wonder whether they should spend more time learning new analysis tools or a new theory, and whether the benefits of doing so outweigh the time they could be spending on research. This kind of dilemma is now even more acute thanks to the rise of online courses providing high-quality material on topics ranging from data science to Descartes. One influential theory of the role played by metacognition in choosing what to learn is known as the discrepancy reduction theory. It suggests that people begin studying new material by selecting a target level of learning and keep studying until their assessment of how much they know matches their target.

pages: 366 words: 94,209

Throwing Rocks at the Google Bus: How Growth Became the Enemy of Prosperity
by Douglas Rushkoff
Published 1 Mar 2016

The big rub is that invention of genuinely new products, of game changers, never comes from refining our analysis of existing consumer trends but from stoking the human ingenuity of our innovators. Without an internal source of innovation, a company loses any competitive advantage over its peers. It is only as good as the data science firm it has hired—which may be the very same one that its competitors are using. In any event, everyone’s buying data from the same brokers and using essentially the same analytics techniques. The only long-term winners in this scheme are the big data firms themselves. Paranoia just feeds the system.

pages: 313 words: 92,053

Places of the Heart: The Psychogeography of Everyday Life
by Colin Ellard
Published 14 May 2015

Theoretically, such data could constitute an extremely useful tool for the democratization of city design. Access to this new form of information, critical as it is to understanding how places work, should not only be easily available to everyone, but the basic tools for understanding how it can be used and what it can tell us should be available for all. Data science should be taught in our schools. Discourse in how cities work couched in visualizations built from big data is becoming so important that the basics should be included in the public educational curriculum, just as civics has been now for generations. And, as architectural theorist and historian Sarah Goldhagen has argued, so should architectural history and design.

pages: 293 words: 88,490

The End of Theory: Financial Crises, the Failure of Economics, and the Sweep of Human Interaction
by Richard Bookstaber
Published 1 May 2017

Journal of Financial Economics 59, no. 3: 383–411. doi: 10.1016/S0304-405X(00)00091-X. Helbing, Dirk, Illés Farkas, and Tamás Vicsek. 2000. “Simulating Dynamical Features of Escape Panic.” Nature 407: 487–90. doi: 10.1038/35035023. Helbing, Dirk, and Pratik Mukerji. 2012. “Crowd Disasters as Systemic Failures: Analysis of the Love Parade Disaster.” EPJ Data Science 1: 7. doi: 10.1140/epjds7. Hemelrijk, Charlotte K., and Hanno Hildenbrandt. 2012. “Schools of Fish and Flocks of Birds: Their Shape and Internal Structure by Self-Organization.” Interface Focus 8, no. 21: 726–37. doi: 10.1098/rsfs.2012.0025. Hobsbawm, Eric. 1999. Industry and Empire: The Birth of the Industrial Revolution.

pages: 314 words: 88,524

American Marxism
by Mark R. Levin
Published 12 Jul 2021

If man came into this century trailing clouds of transcendental glory, he was now accounted for in a way that would satisfy the positivists.”21 That is, by those intellectuals who reject eternal truths and experience through the ages for the social engineering by supposed experts and their administrative state—which claim to use data, science, and empiricism to analyze, manage, and control society. Weaver also referenced Charles Darwin and his theory of evolution, writing that “[b]iological necessity, issuing in the survival of the fittest, was offered as the causa causans [the primary cause of action], after the important question of human origin had been decided in favor of scientific materialism.

pages: 285 words: 86,853

What Algorithms Want: Imagination in the Age of Computing
by Ed Finn
Published 10 Mar 2017

Unless, of course, something goes wrong and the computer has been possessed by some malicious enemy (like the nanites in the episode “Evolution”).29 Like other elements of the diegetic background of the show, the Enterprise’s talking computer was meant to be unremarkable and efficient.30 The conversational computer of Star Trek had its limits, comically misunderstanding requests and occasionally inspiring the kind of stilted “keywordese” many of us now use with voice-driven algorithmic systems. At its peak, it served as a kind of natural language interface for data science, seeking patterns in various kinds of information and presenting analysis.31 Most important, it presented a simple ideal of frictionless vocal interaction: what Google appears to mean by the Star Trek computer, and what LCARS does simply and effectively for the show’s plotting, is respond usefully to verbal commands and queries.

pages: 340 words: 94,464

Randomistas: How Radical Researchers Changed Our World
by Andrew Leigh
Published 14 Sep 2018

Randomised testing of email subject headers found that a fundraising appeal titled ‘Do this for Michelle’ raised about $700,000, while ‘I will be outspent’ raised $2.6 million.47 Given that politics is a zero-sum contest, it’s likely that many of the insights on political fundraising aren’t yet public. But there is some sharing of ideas among ideological bedfellows. For example, Dan Wagner, who led Obama’s 2012 data science team, went on to found Civis Analytics, which offers analysis to progressives, including Justin Trudeau’s successful 2015 campaign for the Canadian prime ministership. * On 2 February 2001, a public meeting was held in the West African village of Tissierou by supporters of presidential candidate Sacca Lafia.48 Villagers were informed that Lafia was the first candidate from that region since 1960.

pages: 305 words: 93,091

The Art of Invisibility: The World's Most Famous Hacker Teaches You How to Be Safe in the Age of Big Brother and Big Data
by Kevin Mitnick , Mikko Hypponen and Robert Vamosi
Published 14 Feb 2017

It’s easy to figure out the MAC address of authorized devices by using a penetration-test tool known as Wireshark. 10. https://www.pwnieexpress.com/blog/wps-cracking-with-reaver. 11. http://www.wired.com/2010/10/webcam-spy-settlement/. 12. http://www.telegraph.co.uk/technology/internet-security/11153381/How-hackers-took-over-my-computer.html. 13. https://www.blackhat.com/docs/us-16/materials/us-16-Seymour-Tully-Weaponizing-Data-Science-For-Social-Engineering-Automated-E2E-Spear-Phishing-On-Twitter.pdf. 14. http://www.wired.com/2010/01/operation-aurora/. 15. http://www.nytimes.com/2015/01/04/opinion/sunday/how-my-mom-got-hacked.html. 16. http://arstechnica.com/security/2013/10/youre-infected-if-you-want-to-see-your-data-again-pay-us-300-in-bitcoins/. 17. https://securityledger.com/2015/10/fbis-advice-on-cryptolocker-just-pay-the-ransom/.

pages: 362 words: 87,462

Laziness Does Not Exist
by Devon Price
Published 5 Jan 2021

Thank you to my mom and sister for constantly trying to teach me how to lighten up and enjoy life once in a while. I swear I have internalized at least some of it. Finally, thank you to my partner, chinchilla co-parent, and best buckaroo, Nick, for your patience, weirdness, creativity, and love. About the Author Dr. Devon Price is an Assistant Clinical Professor of Applied Psychology and Data Science at Loyola University Chicago’s School of Continuing & Professional Studies. Their research on intellectual humility and political open-mindedness has been published in The Journal of Experimental Social Psychology, Personality and Social Psychology Bulletin, and The Journal of Positive Psychology.

pages: 340 words: 91,416

Lost in Math: How Beauty Leads Physics Astray
by Sabine Hossenfelder
Published 11 Jun 2018

William Daniel Phillips, who won the 1997 Nobel Prize in Physics together with Claude Cohen-Tannoudji and Steven Chu for laser cooling, a technique to slow down atoms. 19. Sparkes A et al. 2010. “Towards robot scientists for autonomous scientific discovery.” Automated Experimentation 2:1. 20. Schmidt M, Lipson H. 2009. “Distilling free-form natural laws from experimental data.” Science 324:81–85. 21. Krenn M, Malik M, Fickler R, Lapkiewicz R, Zeilinger A. 2016. “Automated search for new quantum experiments.” Phys Rev Lett. 116:090405. 22. Quoted in Ball P. 2016. “Focus: computer chooses quantum experiments.” Physics 9:25. 23. Powell E. 2011. “Discover interview: Anton Zeilinger dangled from windows, teleported photons, and taught the Dalai Lama.”

pages: 357 words: 94,852

No Is Not Enough: Resisting Trump’s Shock Politics and Winning the World We Need
by Naomi Klein
Published 12 Jun 2017

, December 29, 2016, https://www.democracynow.org/2016/12/29/facing_possible_threats_under_trump_internet. “Data rescue” events Lisa Song and Zahra Hirji, “The Scramble to Protect Climate Data under Trump,” Inside Climate News, January 20, 2017, https://insideclimatenews.org/news/19012017/climate-change-data-science-denial-donald-trump. “Hackathon” at UC Berkeley: two hundred data defenders Megan Molteni, “Diehard Coders Just Rescued NASA’s Earth Science Data,” Wired, February 13, 2017, https://www.wired.com/2017/02/diehard-coders-just-saved-nasas-earth-science-data/. Jane Goodall: “a trumpet call” David Smith, “Jane Goodall Calls Trump’s Climate Change Agenda ‘Immensely Depressing,’” Guardian, March 29, 2017, https://www.theguardian.com/environment/2017/mar/28/jane-goodall-trump-climate-change.

pages: 369 words: 98,776

The God Species: Saving the Planet in the Age of Humans
by Mark Lynas
Published 3 Oct 2011

Schuur et al., 2008: “Vulnerability of Permafrost Carbon to Climate Change: Implications for the Global Carbon Cycle,” Bioscience, 58, 8. 40. C. Tarnocai et al., 2009: “Soil Organic Carbon Pools in the Northern Circumpolar Permafrost Region,” Global Biogeochemical Cycles, 23, GB2023. 41. A. Bloom et al., 2010: “Large-Scale Controls of Methanogenesis Inferred from Methane and Gravity Spaceborne Data,” Science, 327, 5963, 322–5. 42. N. Shakhova et al., 2010: “Extensive Methane Venting to the Atmosphere from Sediments of the East Siberian Arctic Shelf,” Science, 327, 5970, 1246–50, and G. Westbrook et al., 2009: “Escape of Methane Gas from the Seabed along the West Spitsbergen Continental Margin,” Geophysical Research Letters, 36, L15608. 43. http://www.realclimate.org/index.php/archives/2010/03/arctic-methane-on-the-move/. 44.

pages: 416 words: 100,130

New Power: How Power Works in Our Hyperconnected World--And How to Make It Work for You
by Jeremy Heimans and Henry Timms
Published 2 Apr 2018

Most people received a “social” message, which was the same as the “billboard” message but with one big difference. It showed the profile pictures of up to six randomly selected Facebook friends who had clicked the “I voted” button. Researchers from the University of California, San Diego, in collaboration with Facebook’s data-science team, then compared online actions with public records to get a sense of whether which message the user received (or did not receive) affected whether the person voted. They published their study in Nature. Their first stunning result was that the billboard group voted at the same rate as the control group.

pages: 398 words: 105,032

Soonish: Ten Emerging Technologies That'll Improve And/or Ruin Everything
by Kelly Weinersmith and Zach Weinersmith
Published 16 Oct 2017

So to speak, if you have a computer model called “How Am I Doing?” a biomarker is anything you might input into that model that would help it find an answer. Just as the coming together of science and medical practice brought about modern medicine, the coming together of medical science with molecular analysis, data science, and machine learning may bring about a new paradigm, which is coming to be called precision medicine. In the future, you may get medical diagnoses that are determined quickly and correctly from thousands of biomarkers, followed by treatments that are tailored to you in particular. This means you will live longer, live healthier, and—if the detection systems get cheap and easy enough—you don’t spend nearly as much time wondering if that bump on your right butt cheek is cancer.

pages: 341 words: 107,933

The Dealmaker: Lessons From a Life in Private Equity
by Guy Hands
Published 4 Nov 2021

Back in my Nomura days I had one other trick in my magic box that other private equity firms at the time didn’t understand at all: technology. One of Nomura’s greatest assets was that they focused more on technical and analytical skills than sales and marketing skills. This meant they were much less likely to be impressed by an arts student from Cambridge than someone with a Ph.D. in data science. With the bank’s financial support I set up what became known as the Cyber Room – a room full of extremely analytical, ludicrously intelligent, quantitative mathematicians, or ‘quants’, most of whom had a Ph.D. in maths or particle physics. One had that rare neurological condition known as synaesthesia, in which senses that aren’t normally connected merge.

pages: 334 words: 104,382

Brotopia: Breaking Up the Boys' Club of Silicon Valley
by Emily Chang
Published 6 Feb 2018

We face a near-term future of autonomous cars, augmented reality, and artificial intelligence, and yet we are at risk of embedding gender bias into all of these new algorithms. “It’s bad for shareholder value,” Megan Smith, who has worked as a Google VP and chief technology officer of the United States, told me. “We want the genetic flourishing of all humanity . . . in on making these products, especially as we move to AI and data sciences.” If robots are going to run the world, or at the very least play a hugely critical role in our future, men shouldn’t be programming them alone. “We have a long way to go and we recognize it,” Microsoft CEO Satya Nadella told me as his company pushes into a future of machine learning and mixed reality.

pages: 407 words: 104,622

The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution
by Gregory Zuckerman
Published 5 Nov 2019

“The inefficiencies are so complex they are, in a sense, hidden in the markets in code,” a staffer says. “RenTec decrypts them. We find them across time, across risk factors, across sectors and industries.” Even more important: Renaissance concluded that there are reliable mathematical relationships between all these forces. Applying data science, the researchers achieved a better sense of when various factors were relevant, how they interrelated, and the frequency with which they influenced shares. They also tested and teased out subtle, nuanced mathematical relationships between various shares—what staffers call multidimensional anomalies—that other investors were oblivious to or didn’t fully understand.

pages: 338 words: 104,815

Nobody's Fool: Why We Get Taken in and What We Can Do About It
by Daniel Simons and Christopher Chabris
Published 10 Jul 2023

Simonsohn, “Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone,” Psychological Science 24 (2013): 1875–1888 [https://doi.org/10.1177/0956797613480366]. The original study was conducted in the Netherlands, so the amounts were not in dollars, but the same principle applies. 28. M. Enserink, “Rotterdam Marketing Psychologist Resigns After University Investigates His Data,” Science, June 25, 2012 [doi.org/10.1126/article.27200]. 29. Simonsohn interview: E. Yong, “The Data Detective,” Nature 487 (2012): 18–19 [https://doi.org/10.1038/487018a]. 30. G. Spier, The Education of a Value Investor (New York: Palgrave Macmillan, 2014). The Farmer Mac story is on pp. 53–57. Having learned the lesson of his wrong initial call on Farmer Mac, Spier later spent over one year researching a company called BYD Auto, a Chinese battery and car maker, before investing his fund’s money (pp. 125–126). 31.

pages: 421 words: 110,406

Platform Revolution: How Networked Markets Are Transforming the Economy--And How to Make Them Work for You
by Sangeet Paul Choudary , Marshall W. van Alstyne and Geoffrey G. Parker
Published 27 Mar 2016

We have also benefited from working with a group of world-class scholars who have dedicated their careers to understanding the digital economy, and who participate in the annual Workshop on Information Systems and Economics (WISE) and the Boston University Platform Strategy Research Symposium, as well as some of the world’s leading thinkers in adjacent fields such as behavior design, data science, systems design theory, and agile methodologies. We have written this book because we believe that digital connectivity and the platform model it makes possible are changing the world forever. The platform-driven economic transformation is producing enormous benefits for society as a whole and for the businesses and other organizations that create wealth, generate growth, and serve the needs of humankind.

pages: 387 words: 120,155

Inside the Nudge Unit: How Small Changes Can Make a Big Difference
by David Halpern
Published 26 Aug 2015

The transparency of the information has subtly changed the market, and might even edge us towards saving the planet. Shaping better nudges, with better data We have seen how behavioural science can shape how, and when, data is presented to create an especially powerful class of nudging – what we might call ‘behaviourally shaped informing’. But the relationship is two-way. Data science is also shaping and enhancing the power of nudges. We will explore more about this in Chapter 10, but let us have a glimpse into this world. Many businesses, and occasionally governments, have dabbled in the art of segmentation. Advertising agencies and political pundits often classify people into different groups, sometimes adding an evocative name to catch the segment, such as ‘soccer mums’ (argued to be a key segment in the Clinton campaign); ‘Generation X’; or the ‘aspirant working class’.

pages: 405 words: 117,219

In Our Own Image: Savior or Destroyer? The History and Future of Artificial Intelligence
by George Zarkadakis
Published 7 Mar 2016

Within the past two years Google, one of the biggest companies in the computer industry,2 acquired a number of companies in Artificial Intelligence and advanced robotics. Facebook also announced that one of the most prominent AI researchers in the world, Professor Yann LeCun of NYU’s Center for Data Science, would be joining the company to direct a massive new AI effort. These global companies move towards smarter machine technologies because they understand the challenges and opportunities entailed in owning big data. They also understand that it is not enough to own the data. The real game changer lies in understanding the data’s true significance.

pages: 492 words: 118,882

The Blockchain Alternative: Rethinking Macroeconomic Policy and Economic Theory
by Kariappa Bheemaiah
Published 26 Feb 2017

However, they tend to be computationally expensive.

Footnotes

1. The Royal Society is a Fellowship of many of the world’s most eminent scientists and is the oldest scientific academy in continuous existence.
2. See ‘Technological novelty profile and invention’s future impact’, Kim et al. (2016), EPJ Data Science.
3. The term ‘combinatorial evolution’ was coined by the scientific theorist W. Brian Arthur, who is also one of the founders of complexity economics. In a streak that is similar to Thomas Kuhn’s ‘The Structure of Scientific Revolutions’, Arthur’s book, ‘The Nature of Technology: What It Is and How It Evolves’, explains that technologies are based on interactions and composed into modular systems of components that can grow.

pages: 396 words: 112,832

Bread, Wine, Chocolate: The Slow Loss of Foods We Love
by Simran Sethi
Published 10 Nov 2015

This Video Will Inspire You,” Sprudge.com, May 15, 2014, http://sprudge.com/erna-knutsen-specialty-coffee-legend-video-will-inspire-56318.html. 31.“Baseline Data for the Conservation of Coffee Species,” Kew Royal Botanical Gardens, accessed July 23, 2015, http://www.kew.org/science-conservation/research-data/science-directory/projects/baseline-data-conservation-coffee. 32.Julian Siddle and Vibeke Venema, “Saving Coffee from Extinction,” BBC News Magazine, May 24, 2015, http://www.bbc.com/news/magazine-32736366. 33.“All You Need to Know About Coffee: Species and Varieties,” CIRAD, accessed March 23, 2014, http://www.cirad.fr/en/publications-resources/science-for-all/the-issues/coffee/what-you-need-to-know/species-and-varieties. 34.Raimond Feil, “Coffea: Genus—Species—Varieties,” trans.

pages: 404 words: 115,108

They Don't Represent Us: Reclaiming Our Democracy
by Lawrence Lessig
Published 5 Nov 2019

CHAPTER 2: THE UNREPRESENTATIVE US 1.See “Topics of the Times: Italy Hails Our Dictator,” New York Times, March 7, 1933; “Italian Fascists Call Roosevelt Rome’s Disciple,” New York Herald Tribune, May 7, 1933; “When Thieves Fall Out,” Daily Worker, March 9, 1935; Harvey Klehr, “American Reds, Soviet Stooges,” New York Times, July 3, 2017, available at link #84; Roger Shaw, “Fascism and the New Deal,” North American Review 238 (1934): 559–64, available at link #85. Roosevelt himself acknowledged the criticism. Franklin Roosevelt, “Fireside Chat 5: On Addressing the Critics,” June 28, 1934, available at link #86. 2.Jill Lepore, “Politics and the New Machine: What the Turn from Polls to Data Science Means for Democracy,” New Yorker, November 16, 2015, available at link #87. Skepticism about polls and the idea of a public will is long-standing. For some representative sources, see Jean M. Converse, Survey Research in the United States: Roots and Emergence, 1890–1960 (Berkeley: University of California Press, 1987). 3.Peverill Squire, “Why the 1936 Literary Digest Poll Failed,” Public Opinion Quarterly 52, 1 (1988): 125–33 4.Daniel Robinson, The Measure of Democracy: Polling, Market Research, and Public Life, 1930–1945 (Toronto: University of Toronto Press, 2019), Kindle edition, loc. 869, n.4.

pages: 362 words: 116,497

Palace Coup: The Billionaire Brawl Over the Bankrupt Caesars Gaming Empire
by Sujeet Indap and Max Frumes
Published 16 Mar 2021

He maintained Boston as his home base, but had a beach home in North Carolina he visited via his own turboprop plane (which had a green stripe on its tail in the same shade as the Boston Celtics logo, the NBA team of which he and David Bonderman were part owners). After spending decades using his knowledge of data science to get Americans to gamble more, Loveman perhaps found a more noble use of his talent. In 2015, he worked for a time at health insurer Aetna, trying to see how data could be used to promote better health outcomes—a topic that had interested him since managing tens of thousands of casino workers in his CEO days.

pages: 414 words: 117,581

Binge Times: Inside Hollywood's Furious Billion-Dollar Battle to Take Down Netflix
by Dade Hayes and Dawn Chmielewski
Published 18 Apr 2022

Cheng spent two years rooting around in data to glean insights to guide how much of the studio’s entertainment resources to devote to particular projects. As is true at Netflix, analytics wouldn’t replace a creative executive’s judgment about which pitches and showrunners had the potential to make a hit show. But data science could make predictions about a show’s success based on historical performance—information that would help frame financial risk. Under Salke, Amazon Studios focused on global development, putting into production series from India, Japan, Britain, Germany, Mexico, and elsewhere to fulfill Bezos’s vision of Amazon Prime Video as a glittery customer acquisition tool for the Prime subscription service.

pages: 419 words: 119,476

Posh Boys: How English Public Schools Ruin Britain
by Robert Verkaik
Published 14 Apr 2018

mhq5j=e3; http://www.mirror.co.uk/news/politics/greedy-george-osborne-facing-furious-10049285 51 https://www.byline.com/column/67/article/2049 11 Boys’ Own Brexit 1 Stuart Jeffries, The Guardian, 26 May 2014. 2 http://www.dulwich.org.uk/college/about/history 3 http://www.telegraph.co.uk/news/politics/ukip/11291050/Nigel-Farage-and-Enoch-Powell-the-full-story-of-Ukips-links-with-the-Rivers-of-Blood-politician.html 4 https://www.channel4.com/news/nigel-farage-ukip-letter-school-concerns-racism-fascism 5 Michael Crick, Channel 4 News, 19 September 2013. 6 http://www.independent.co.uk/news/uk/politics/nigel-farage-open-letter-schoolfriend-brexit-poster-nazi-song-dulwich-college-gas-them-all-a7185336.html 7 http://www.independent.co.uk/news/uk/politics/nigel-farage-fascist-nazi-song-gas-them-all-ukip-brexit-schoolfriend-dulwich-college-a7185236.html 8 Interview with the author at Dulwich College, 12 January 2018. 9 www.facebook.com/myiannopuolos, accessed 24 January 2018. 10 https://www.linkedin.com/in/sam-farage-85b406b2; http://www.telegraph.co.uk/news/politics/nigel-farage/11467039/Nigel-Farage-My-public-school-had-a-real-social-mix-but-now-only-the-mega-rich-can-afford-the-fees.html 11 Simon Kuper, Financial Times, 7 July 2016. 12 http://www.telegraph.co.uk/news/2017/01/05/project-fear-brexit-predictions-flawed-partisan-new-study-says/; http://www.telegraph.co.uk/news/2016/06/25/how-project-fear-failed-to-keep-britain-in-the-eu--and-the-signs/ 13 Odey declined to be interviewed. 14 Sunday Times, 23 April 2017, p. 4; http://www.independent.co.uk/news/uk/politics/brexit-leave-eu-campaign-arron-banks-jeremy-hosking-five-uk-richest-businessmen-peter-hargreaves-a7699046.html 15 https://inews.co.uk/news/technology/cambridge-analytica-facebook-data-protection/ 16 http://www.bbc.co.uk/news/technology-43581892 17 https://inews.co.uk/news/technology/cambridge-analytica-facebook-data-protection/ 18 https://www.reuters.com/article/us-facebook-cambridge-analytica-leave-eu/what-are-the-links-between-cambridge-analytica-and-a-brexit-campaign-group-idUSKBN1GX2IO 19 https://www.theguardian.com/uk-news/2018/mar/24/aggregateiq-data-firm-link-raises-leave-group-questions https://www.businesstimes.com.sg/government-economy/brexit-campaigners-breached-uk-vote-rules-lawyers-say 20 https://dominiccummings.com/2016/10/29/on-the-referendum-20-the-campaign-physics-and-data-science-vote-leaves-voter-intention-collection-system-vics-now-available-for-all/ 21 A Very British Coup, BBC2, 22 September 2016. 22 http://www.standard.co.uk/business/business-focus-the-billionaire-hedge-fund-winners-who-braved-the-brexit-rollercoaster-a3284101.html 23 http://fortune.com/2014/12/03/heineken-charlene-de-carvalho-self-made-heiress/ 24 http://www.cityam.com/262239/david-camerons-ex-adviser-daniel-korski-launches-major 25 Tim Shipman, All Out War: Brexit and the Sinking of Britain’s Political Class (London: William Collins, 2017), p. 610. 12 For the Few, Not the Many 1 http://www.telegraph.co.uk/news/politics/Jeremy_Corbyn/11818744/Jeremy-Corbyn-the-boy-to-the-manor-born.html 2 http://www.castlehouseschool.co.uk/about-the-school/fees/ 3 Rosa Prince, Comrade Corbyn: A Very Unlikely Coup (London: Biteback Publishing, 2017), p. 29.

pages: 487 words: 124,008

Your Face Belongs to Us: A Secretive Startup's Quest to End Privacy as We Know It
by Kashmir Hill
Published 19 Sep 2023

In 1912, Ellwood wrote in the Journal of the American Institute of Criminal Law and Criminology that “Lombroso has demonstrated beyond a doubt that crime has biological roots.” This belief that all things could be measured, even criminality, was a reflection of an embrace of statistics, standardization, and data science in the nineteenth century. This fervor was shared by industry—particularly American railroads, which became enthralled by the possibility of sorting and tracking their customers. Railways employed a rudimentary facial recognition system for conductors: tickets that featured seven cartoon faces—an old woman, a young woman, and five men with varying styles of facial hair.

pages: 752 words: 131,533

Python for Data Analysis
by Wes McKinney
Published 30 Dec 2011

Import Conventions: The Python community has adopted a number of naming conventions for commonly-used modules: import numpy as np; import pandas as pd; import matplotlib.pyplot as plt. This means that when you see np.arange, this is a reference to the arange function in NumPy. This is done as it’s considered bad practice in Python software development to import everything (from numpy import *) from a large package like NumPy. Jargon: I’ll use some terms common both to programming and data science that you may not be familiar with. Thus, here are some brief definitions: Munge/Munging/Wrangling: Describes the overall process of manipulating unstructured and/or messy data into a structured or clean form. The word has snuck its way into the jargon of many modern day data hackers. Munge rhymes with “lunge”.
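A minimal sketch of those conventions in use (assuming NumPy, pandas, and matplotlib are installed; the sample data and variable names are illustrative, not from the book):

    # The community import aliases described above: np, pd, plt.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    values = np.arange(10)                      # np.arange is NumPy's arange
    df = pd.DataFrame({"x": values, "y": values ** 2})
    df.plot(x="x", y="y")                       # pandas plotting delegates to matplotlib
    plt.show()

Keeping the short aliases makes it obvious which library each call comes from, which is the point of avoiding from numpy import *.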

pages: 426 words: 136,925

Fulfillment: Winning and Losing in One-Click America
by Alec MacGillis
Published 16 Mar 2021

“That was a big two and a half years ago.” Up walked John Hanly, from the Center for American Progress, the liberal think tank, and Manish Parikh, a chief technology officer for the defense contractor BAE, and James Armitage, a tax lawyer with Caplin & Drysdale, and Khuloud Odeh, CIO and vice president for technology and data science at the Urban Institute, another center-left think tank. The line was growing at the elevator in the lobby. “Holy moly,” said one woman as the line came into view. Around the corner on F Street, a young, Black woman was sleeping on a sidewalk grate with a towel as a pillow. Someone had left a sandwich for her.

pages: 689 words: 134,457

When McKinsey Comes to Town: The Hidden Influence of the World's Most Powerful Consulting Firm
by Walt Bogdanich and Michael Forsythe
Published 3 Oct 2022

He also consulted for gaming companies, including casinos, sports books, horse racing, and e-sports. While Singer’s name rarely surfaced in news accounts of games, his insights were valued by data analysts who don’t make their living scoring runs or touchdowns. McKinsey deepened its expertise in data science by buying a small, elite consulting company called QuantumBlack, which used data to evaluate athletes in the United States and Europe. One of its specialties was injury prediction—an obvious area of interest to gamblers. Knowing whether certain players were prone to injury might influence betting odds, though there is no evidence this type of information was leaked to gamblers.

pages: 598 words: 134,339

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World
by Bruce Schneier
Published 2 Mar 2015

Evans (18 Apr 2014), “IAAS Series: Cloud storage pricing: How low can they go?” Architecting IT, http://blog.architecting.it/2014/04/18/iaas-series-cloud-storage-pricing-how-low-can-they-go. store every tweet ever sent: K. Young (6 Sep 2012), “How much would it cost to store the entire Twitter Firehose?” Mortar: Data Science at Scale, http://blog.mortardata.com/post/31027073689/how-much-would-it-cost-to-store-the-entire-twitter. every phone call ever made: Brewster Kahle (2013), “Cost to store all US phonecalls made in a year so it could be datamined,” https://docs.google.com/spreadsheet/ccc?key=0AuqlWHQKlooOdGJrSzhBVnh0WGlzWHpCZFNVcURkX0E#gid=0.

pages: 579 words: 160,351

Breaking News: The Remaking of Journalism and Why It Matters Now
by Alan Rusbridger
Published 14 Oct 2018

Joanna Geary,10 whom we’d hired in 2011 to look after social media, posted on Facebook in late 2017: About 10 years ago I thought I might need to learn Ruby on Rails [to build web apps] to understand what’s going on in journalism. Then, about 5 years after that, I thought I might need an MBA. Now, the qualifications I need are probably in: Computer Science, Data Science, Natural Language Processing, Graph Analysis, Advanced Critical Thinking, Anthropology, Behavioural Sciences, Product Management, Business Administration, Social Psychology, Coaching & People Development, Change Management. I think I need to lie down . . . We had recruited two stars of the digital news universe – Wolfgang Blau from Die Zeit in Germany and Aron Pilhofer from the New York Times11 – and relaunched the website in a design that worked much better over desktop, tablet and mobile.

The Man Behind the Microchip: Robert Noyce and the Invention of Silicon Valley
by Leslie Berlin
Published 9 Jun 2005

Noyce also began investing with Arthur Rock after the Callanish Fund (Noyce’s private investment partnership with Paul Hwoschinsky) was amicably dissolved in 1979. Noyce and Rock did not have a formal investment partnership, but as Rock puts it, “We’d rope each other in.”55 The two men together funded several small companies: General Signal, Mohawk Data Sciences, and, at the urging of Mike Markkula, Volant, a manufacturer of a novel steel ski that Noyce, who tried a prototype, was convinced instantly improved his skiing. It was never clear to Volant’s founders, Bucky Kashiwa and his brother Hank, that Noyce particularly cared about making money from the company.

pages: 821 words: 178,631

The Rust Programming Language
by Steve Klabnik and Carol Nichols
Published 14 Jun 2018

If the user wants a high-intensity workout, there’s some additional logic: if the value of the random number generated by the app happens to be 3, the app will recommend a break and hydration. If not, the user will get a number of minutes of running based on the complex algorithm. This code works the way the business wants it to now, but let’s say the data science team decides that we need to make some changes to the way we call the simulated_expensive_calculation function in the future. To simplify the update when those changes happen, we want to refactor this code so it calls the simulated_expensive_calculation function only once. We also want to cut the place where we’re currently unnecessarily calling the function twice without adding any other calls to that function in the process.
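The book develops this refactoring in Rust using closures and a caching struct; purely as a language-agnostic sketch of the "compute the expensive result once and reuse it" idea, something like the following (the function name comes from the excerpt; the thresholds and workout messages are approximations of the book's example, and memoization here stands in for the book's actual solution):

    # Rough sketch only: memoization ensures the slow calculation runs at most
    # once per intensity value, however many times its result is used.
    import time
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def simulated_expensive_calculation(intensity: int) -> int:
        time.sleep(2)  # stand-in for the slow algorithm
        return intensity

    def generate_workout(intensity: int, random_number: int) -> None:
        if intensity < 25:
            print(f"Today, do {simulated_expensive_calculation(intensity)} pushups!")
            print(f"Next, do {simulated_expensive_calculation(intensity)} situps!")  # cached, not recomputed
        elif random_number == 3:
            print("Take a break today! Remember to stay hydrated!")
        else:
            print(f"Today, run for {simulated_expensive_calculation(intensity)} minutes!")

    generate_workout(intensity=10, random_number=7)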

Blueprint: The Evolutionary Origins of a Good Society
by Nicholas A. Christakis
Published 26 Mar 2019

I dedicated this book to my beloved wife, Erika Christakis, but I will take this further opportunity to acknowledge her beautiful heart and mind, which improved this book immeasurably. About the Author Nicholas A. Christakis, MD, PhD, MPH, is the Sterling Professor of Social and Natural Science at Yale University, with appointments in the departments of Sociology, Ecology and Evolutionary Biology, Statistics and Data Science, Biomedical Engineering, and Medicine. Previously, he conducted research and taught for many years at Harvard University and at the University of Chicago. He was on Time magazine’s list of the 100 most influential people in the world in 2009. He worked as a hospice physician in underserved communities in Chicago and Boston until 2011.

Big Data and the Welfare State: How the Information Revolution Threatens Social Solidarity
by Torben Iversen and Philipp Rehm
Published 18 May 2022

The pandemic made many traditional underwriting practices impossible, most importantly in-person medical examinations. Life insurance companies immediately sought to replace in-person medical exams – long the centerpiece of risk classification – with alternative ways of credibly assessing an applicant’s health status and history. Advances in data sciences came to the rescue. The approach taken by John Hancock life insurance is instructive here. In early April 2020, John Hancock rolled out access to the “Human API portal,” which allows applicants to give the company direct access to their health records.9 Human API has built up a large infrastructure that allows John Hancock to guzzle up, standardize, and interpret health information from users that authorize access to their data.

pages: 743 words: 201,651

Free Speech: Ten Principles for a Connected World
by Timothy Garton Ash
Published 23 May 2016

Typically, this takes the form of an A/B test, when two algorithmic alternatives are tried out simultaneously on a split sample group. These experiments are being made on us all the time, usually with our formal legal consent (that ‘I Agree’ button again) but without our being aware of it. An experiment conducted by Facebook’s Core Data Science Team in 2012, but only made public in a scientific paper that appeared in 2014, ‘manipulated the extent to which people (N = 689,003) were exposed to emotional expressions in their News Feed’. One group among those 689,003 users had their News Feeds manipulated to select more positive emotional content coming from their Facebook friends, while another got more negative emotional content.
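The mechanics of such a split are straightforward; a generic sketch of deterministic A/B assignment (not the platform's actual system; the experiment name and bucketing rule here are invented for illustration):

    # Hash each user id into one of two buckets so the same user always sees
    # the same variant for the duration of the experiment.
    import hashlib

    def assign_variant(user_id: str, experiment: str = "feed_ranking_test") -> str:
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        return "A" if int(digest, 16) % 2 == 0 else "B"

    for uid in ["alice", "bob", "carol"]:
        print(uid, assign_variant(uid))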

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
Published 17 Apr 2017

ISBN: 978-0-553-41881-1 [88] Julia Angwin: “Make Algorithms Accountable,” nytimes.com, August 1, 2016. [89] Bryce Goodman and Seth Flaxman: “European Union Regulations on Algorithmic Decision-Making and a ‘Right to Explanation’,” arXiv:1606.08813, August 31, 2016. [90] “A Review of the Data Broker Industry: Collection, Use, and Sale of Consumer Data for Marketing Purposes,” Staff Report, United States Senate Committee on Commerce, Science, and Transportation, commerce.senate.gov, December 2013. [91] Olivia Solon: “Facebook’s Failure: Did Fake News and Polarized Politics Get Trump Elected?” theguardian.com, November 10, 2016. [92] Donella H. Meadows and Diana Wright: Thinking in Systems: A Primer. Chelsea Green Publishing, 2008. ISBN: 978-1-603-58055-7 [93] Daniel J. Bernstein: “Listening to a ‘big data’/‘data science’ talk,” twitter.com, May 12, 2015. [94] Marc Andreessen: “Why Software Is Eating the World,” The Wall Street Journal, 20 August 2011. [95] J. M. Porup: “‘Internet of Things’ Security Is Hilariously Broken and Getting Worse,” arstechnica.com, January 23, 2016. [96] Bruce Schneier: Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World.

pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
Published 16 Mar 2017

[91] Olivia Solon: “Facebook’s Failure: Did Fake News and Polarized Politics Get Trump Elected?” theguardian.com, November 10, 2016. [92] Donella H. Meadows and Diana Wright: Thinking in Systems: A Primer. Chelsea Green Publishing, 2008. ISBN: 978-1-603-58055-7 [93] Daniel J. Bernstein: “Listening to a ‘big data’/‘data science’ talk,” twitter.com, May 12, 2015. [94] Marc Andreessen: “Why Software Is Eating the World,” The Wall Street Journal, 20 August 2011. [95] J. M. Porup: “‘Internet of Things’ Security Is Hilariously Broken and Getting Worse,” arstechnica.com, January 23, 2016. [96] Bruce Schneier: Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World.

pages: 1,072 words: 237,186

How to Survive a Pandemic
by Michael Greger, M.D., FACLM

Community-acquired pneumonia in patients with chronic obstructive pulmonary disease requiring admission to the intensive care unit: risk factors for mortality. J Crit Care. 28(6):975–979. https://doi.org/10.1016/j.jcrc.2013.08.004. 2735. Aitken M, Kleinrock M. 2018 Apr 19. Medicine use and spending in the U.S.: a review of 2017 and outlook to 2022. Parsippany (NJ): IQVIA Institute for Human Data Science; [accessed 2020 Mar 31]. https://www.iqvia.com/insights/the-iqvia-institute/reports/medicine-use-and-spending-in-the-us-review-of-2017-outlook-to-2022. 2736. Caldeira D, Alarcão J, Vaz-Carneiro A, Costa J. 2012. Risk of pneumonia associated with use of angiotensin converting enzyme inhibitors and angiotensin receptor blockers: systematic review and meta-analysis.

pages: 764 words: 261,694

The Elements of Statistical Learning (Springer Series in Statistics)
by Trevor Hastie , Robert Tibshirani and Jerome Friedman
Published 25 Aug 2009

Rumelhart and J. McClelland (eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, The MIT Press, Cambridge, MA., pp. 318–362. Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D. and Nolan, G. (2005). Causal protein-signaling networks derived from multiparameter single-cell data, Science 308: 523–529. Schapire, R. (1990). The strength of weak learnability, Machine Learning 5(2): 197–227. Schapire, R. (2002). The boosting approach to machine learning: an overview, in D. Denison, M. Hansen, C. Holmes, B. Mallick and B. Yu (eds), MSRI workshop on Nonlinear Estimation and Classification, Springer, New York.