sparse data

51 results

Big Data: A Revolution That Will Transform How We Live, Work, and Think

by Viktor Mayer-Schonberger and Kenneth Cukier  · 5 Mar 2013  · 304pp  · 82,395 words

(http://www.wired.com/threatlevel/2009/12/netflix-privacy-lawsuit/). On the Netflix data release—Arvind Narayanan and Vitaly Shmatikov, “Robust De-Anonymization of Large Sparse Datasets,” Proceedings of the 2008 IEEE Symposium on Security and Privacy, p. 111 et seq. (http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf); Arvind Narayanan

Break the Anonymity of the Netflix Prize Dataset.” October 18, 2006, arXiv:cs/0610105 (http://arxiv.org/abs/cs/0610105). ———. “Robust De-Anonymization of Large Sparse Datasets.” Proceedings of the 2008 IEEE Symposium on Security and Privacy, p. 111 (http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf). Nazareth, Rita, and Julia

Programming Collective Intelligence

by Toby Segaran  · 17 Dec 2008  · 519pp  · 102,669 words

the same set of del.icio.us bookmarks—most bookmarks are saved by a small group of people, leading to a sparse dataset. Item-based filtering usually outperforms user-based filtering in sparse datasets, and the two perform about equally in dense datasets. Tip To learn more about the difference in performance between these

The Ethical Algorithm: The Science of Socially Aware Algorithm Design

by Michael Kearns and Aaron Roth  · 3 Oct 2019

” by Paul Ohm, which appeared in the UCLA Law Review 57 (2010). Details on the Netflix attack are described in “Robust De-anonymization of Large Sparse Datasets” by Arvind Narayanan and Vitaly Shmatikov, which was published in the IEEE Symposium on Security and Privacy (IEEE, 2008). Details of the original Genome-Wide

Hello World: Being Human in the Age of Algorithms

by Hannah Fry  · 17 Sep 2018  · 296pp  · 78,631 words

.com/watch?v=1nvYGi7-Lxo. 17. The researchers based this part of their work on Arvind Narayanan and Vitaly Shmatikov, ‘Robust de-anonymization of large sparse datasets’, paper presented to IEEE Symposium on Security and Privacy, 18–22 May 2008. 18. Michal Kosinski, David Stillwell and Thore Graepel. ‘Private traits and attributes

The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy

by Matthew Hindman  · 24 Sep 2018

, interpreting factors can be difficult in practice, as we shall see. SVD had rarely been used with recommender systems because the technique performed poorly on “sparse” datasets, those (like the Netflix data) in which most of the values are missing. But Funk adapted the technique to ignore missing values, and found a
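
The idea in the snippet above, factorizing a ratings matrix while skipping missing entries, can be sketched roughly as follows. This is a minimal illustration and not Funk's actual code: the function names, the learning rate, the regularization constant, and the toy ratings are all assumptions made for the example.

```python
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def factorize(ratings, n_users, n_items, n_factors=2,
              lr=0.01, reg=0.02, epochs=500, seed=0):
    """Fit factors by stochastic gradient descent on observed ratings only.

    `ratings` is a list of (user, item, value) triples. Missing cells are
    simply absent from the list, so they never contribute to the error.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.05, 0.15) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.05, 0.15) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def mse(P, Q, ratings):
    """Mean squared error over the observed ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)

# Hypothetical toy data: 4 users, 3 items, 8 of 12 cells observed.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
P, Q = factorize(ratings, n_users=4, n_items=3)
```

Calling `predict(P, Q, 0, 2)` then gives an estimate for a cell that was never observed, which is the point of fitting only on the known values.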

Dragnet Nation: A Quest for Privacy, Security, and Freedom in a World of Relentless Surveillance

by Julia Angwin  · 25 Feb 2014  · 422pp  · 104,457 words

/technology/09aol.html?_r=0&gwh=2CACC912D19D87BDFD3A39B96C429022. In 2008, researchers at the University of Texas: Arvind Narayanan and Vitaly Shmatikov, “Robust De-anonymization of Large Sparse Datasets,” Security and Privacy (2008): 111–25, http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf. In 2012, my Wall Street Journal team: Jennifer Valentino-Devries

Reset

by Ronald J. Deibert  · 14 Aug 2020

-brother/ How easy it is to unmask real identities contained in large personal data sets: Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. IEEE Symposium on Security and Privacy, 111–125. http://doi.org/10.1109/SP.2008.33 “At least eight surveillance and cyber-intelligence companies attempting

Data Mining: Concepts and Techniques

by Jiawei Han, Micheline Kamber and Jian Pei  · 21 Jun 2011

2.4.5), or by combinations of these attribute types (Section 2.4.6). Section 2.4.7 provides similarity measures for very long and sparse data vectors, such as term-frequency vectors representing documents in information retrieval. Knowing how to compute dissimilarity is useful in studying attributes and will also be
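
For long, sparse term-frequency vectors like those mentioned in the snippet above, cosine similarity is a common choice of measure. The sketch below assumes each document is stored as a dictionary holding only its nonzero term counts; the function name and the sample counts are invented for illustration and are not taken from the book.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: count} dicts."""
    # Iterate over the smaller dict; absent terms count as zero.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(value * b.get(term, 0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an empty vector
    return dot / (norm_a * norm_b)

# Hypothetical term counts; only nonzero entries are stored.
doc_a = {"data": 5, "sparse": 3, "cube": 2, "index": 2}
doc_b = {"data": 3, "sparse": 2, "cube": 1, "matrix": 1, "index": 1, "query": 1}
similarity = cosine_similarity(doc_a, doc_b)
```

Because only shared nonzero terms contribute to the dot product, the cost depends on the number of stored entries rather than on the size of the vocabulary.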

. Y1 and Y2 are the first two principal components for the given data. PCA can be applied to ordered and unordered attributes, and can handle sparse data and skewed data. Multidimensional data of more than two dimensions can be handled by reducing the problem to two dimensions. Principal components may be used

as inputs to multiple regression and cluster analysis. In comparison with wavelet transforms, PCA tends to be better at handling sparse data, whereas wavelet transforms are more suitable for data of high dimensionality. 3.4.4. Attribute Subset Selection Data sets for analysis may contain hundreds of

space are less subject to sampling variations than the estimates in the higher-dimensional space). Regression and log-linear models can both be used on sparse data, although their application may be limited. While both methods can handle skewed data, regression does exceptionally well. Regression can be computationally intensive when applied to

. In such cases, sparse matrix compression techniques should be explored (Chapter 5). Many MOLAP servers adopt a two-level storage representation to handle dense and sparse data sets: Denser subcubes are identified and stored as array structures, whereas sparse subcubes employ compression technology for efficient storage utilization. Hybrid OLAP (HOLAP) servers: The

as a data cube. Unfortunately, this may often generate a huge, yet very sparse, multidimensional matrix. (a) Present an example illustrating such a huge and sparse data cube. (b) Design an implementation method that can elegantly overcome this sparse matrix problem. Note that you need to explain your data structures in detail

aggregation method for data cube computation in MOLAP was proposed in Zhao, Deshpande, and Naughton [ZDN97]. Ross and Srivastava [RS97] developed a method for computing sparse data cubes. Iceberg queries are first described in Fang, Shivakumar, Garcia-Molina, et al. [FSGM+98]. BUC, a scalable method that computes iceberg cubes from the

in high-dimensional spaces. The distance between objects becomes heavily dominated by noise as the dimensionality increases. Therefore, data in high-dimensional spaces are often sparse. ■ Data subspaces: They should model outliers appropriately, for example, adaptive to the subspaces signifying the outliers and capturing the local behavior of data. Using a fixed

592; see also networks; social science/social studies data mining 613; soft clustering 501; soft constraints 534, 539 (example 534; handling 536–537); space-filling curve 58; sparse data 102; sparse data cubes 190; sparsest cuts 539; sparsity coefficient 579; spatial data 14; spatial data mining 595; spatiotemporal data analysis 319; spatiotemporal data mining 595, 623

Artificial Intelligence: A Modern Approach

by Stuart Russell and Peter Norvig  · 14 Jul 2019  · 2,466pp  · 668,761 words

). Continuous speech recognition by statistical methods. Proc. IEEE, 64, 532–556. Jelinek, F. and Mercer, R. L. (1980). Interpolated estimation of Markov source parameters from sparse data. In Proc. Workshop on Pattern Recognition in Practice. Jennings, H. S. (1906). Behavior of the Lower Organisms. Columbia University Press. Jenniskens, P., Betlem, H., Betlem

Mastering Pandas

by Femi Anthony  · 21 Jun 2015  · 589pp  · 69,193 words

: This defines the pandas Series class and its various methods that Series inherits from NDFrame and IndexOpsMixin. sparse.py: This defines import for handling sparse data structures. Sparse data structures are compressed whereby data points matching NaN or missing values are omitted. For more information on this, go to http://pandas.pydata.org/pandas
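
The compression described in the snippet above, omitting NaN or missing values, can be illustrated with a small standard-library sketch. This is not pandas code and does not mirror pandas internals: the `SparseColumn` class and its attribute names are hypothetical, chosen only to show the idea of keeping positions and values for the entries that are present.

```python
import math

class SparseColumn:
    """Keeps only the non-missing entries of a float column."""

    def __init__(self, values):
        self.length = len(values)
        self.positions = []  # indices of entries that are present
        self.data = []       # the corresponding values
        for pos, value in enumerate(values):
            if not math.isnan(value):
                self.positions.append(pos)
                self.data.append(value)

    @property
    def density(self):
        """Fraction of entries actually stored."""
        return len(self.data) / self.length if self.length else 0.0

    def to_dense(self):
        """Rebuild the full list, filling the gaps with NaN."""
        dense = [math.nan] * self.length
        for pos, value in zip(self.positions, self.data):
            dense[pos] = value
        return dense

col = SparseColumn([1.0, math.nan, math.nan, 4.0, math.nan])
dense = col.to_dense()
```

Here two of the five entries are stored, so the memory saved grows with the share of missing values in the column.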

Click Here to Kill Everybody: Security and Survival in a Hyper-Connected World

by Bruce Schneier  · 3 Sep 2018  · 448pp  · 117,325 words

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World

by Bruce Schneier  · 2 Mar 2015  · 598pp  · 134,339 words

SciPy and NumPy

by Eli Bressert  · 14 Oct 2012  · 62pp  · 14,996 words

The Elements of Statistical Learning (Springer Series in Statistics)

by Trevor Hastie, Robert Tibshirani and Jerome Friedman  · 25 Aug 2009  · 764pp  · 261,694 words

The Alignment Problem: Machine Learning and Human Values

by Brian Christian  · 5 Oct 2020  · 625pp  · 167,349 words

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage

by Zdravko Markov and Daniel T. Larose  · 5 Apr 2007

Statistics in a Nutshell

by Sarah Boslaugh  · 10 Nov 2012

Architects of Intelligence

by Martin Ford  · 16 Nov 2018  · 586pp  · 186,548 words

HBase: The Definitive Guide

by Lars George  · 29 Aug 2011

The Technology Trap: Capital, Labor, and Power in the Age of Automation

by Carl Benedikt Frey  · 17 Jun 2019  · 626pp  · 167,836 words

Beautiful Data: The Stories Behind Elegant Data Solutions

by Toby Segaran and Jeff Hammerbacher  · 1 Jul 2009

Natural Language Annotation for Machine Learning

by James Pustejovsky and Amber Stubbs  · 14 Oct 2012  · 502pp  · 107,510 words

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement

by Eric Redmond and Jim R. Wilson  · 7 May 2012  · 713pp  · 93,944 words

Hadoop: The Definitive Guide

by Tom White  · 29 May 2009  · 933pp  · 205,691 words

When the Air Hits Your Brain: Tales From Neurosurgery

by Frank Vertosick  · 1 Jan 1996  · 250pp  · 75,586 words

Natural language processing with Python

by Steven Bird, Ewan Klein and Edward Loper  · 15 Dec 2009  · 504pp  · 89,238 words

Coming Apart: The State of White America, 1960-2010

by Charles Murray  · 1 Jan 2012  · 397pp  · 121,211 words

The Deepest Map

by Laura Trethewey  · 15 May 2023

Gnuplot in Action: Understanding Data With Graphs

by Philipp Janert  · 2 Jan 2010  · 398pp  · 31,161 words

Enlightenment Now: The Case for Reason, Science, Humanism, and Progress

by Steven Pinker  · 13 Feb 2018  · 1,034pp  · 241,773 words

RDF Database Systems: Triples Storage and SPARQL Query Processing

by Olivier Cure and Guillaume Blin  · 10 Dec 2014

The Terraformers

by Annalee Newitz  · 404pp  · 118,036 words

Applied Artificial Intelligence: A Handbook for Business Leaders

by Mariya Yao, Adelyn Zhou and Marlene Jia  · 1 Jun 2018  · 161pp  · 39,526 words

Without Conscience: The Disturbing World of the Psychopaths Among Us

by Robert D. Hare  · 1 Nov 1993  · 260pp  · 78,229 words

Spineless: The Science of Jellyfish and the Art of Growing a Backbone

by Juli Berwald  · 14 May 2017  · 397pp  · 113,304 words

Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past

by David Reich  · 22 Mar 2018  · 372pp  · 110,208 words

Digital Apollo: Human and Machine in Spaceflight

by David A. Mindell  · 3 Apr 2008  · 377pp  · 21,687 words

The Art of Statistics: How to Learn From Data

by David Spiegelhalter  · 2 Sep 2019  · 404pp  · 92,713 words

Pale Blue Dot: A Vision of the Human Future in Space

by Carl Sagan  · 8 Sep 1997  · 356pp  · 102,224 words

House of Suns

by Alastair Reynolds  · 16 Apr 2008  · 635pp  · 186,208 words

Amritsar 1919: An Empire of Fear and the Making of a Massacre

by Kim Wagner  · 26 Mar 2019

Four Battlegrounds

by Paul Scharre  · 18 Jan 2023

The Art of Statistics: Learning From Data

by David Spiegelhalter  · 14 Oct 2019  · 442pp  · 94,734 words

Complete Guide to Fasting: Heal Your Body Through Intermittent, Alternate-Day, and Extended Fasting

by Jimmy Moore and Jason Fung  · 18 Oct 2016  · 275pp  · 74,972 words

Genius: The Life and Science of Richard Feynman

by James Gleick  · 1 Jan 1992  · 795pp  · 215,529 words

Talk on the Wild Side

by Lane Greene  · 15 Dec 2018  · 284pp  · 84,169 words

The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous

by Joseph Henrich  · 7 Sep 2020  · 796pp  · 223,275 words

Automating Inequality

by Virginia Eubanks  · 294pp  · 77,356 words

Mining the Social Web: Finding Needles in the Social Haystack

by Matthew A. Russell  · 15 Jan 2011  · 541pp  · 109,698 words

The Raging 2020s: Companies, Countries, People - and the Fight for Our Future

by Alec Ross  · 13 Sep 2021  · 363pp  · 109,077 words

The Data Detective: Ten Easy Rules to Make Sense of Statistics

by Tim Harford  · 2 Feb 2021  · 428pp  · 103,544 words