by Viktor Mayer-Schonberger and Kenneth Cukier · 5 Mar 2013 · 304pp · 82,395 words
(http://www.wired.com/threatlevel/2009/12/netflix-privacy-lawsuit/). On the Netflix data release—Arvind Narayanan and Vitaly Shmatikov, “Robust De-Anonymization of Large Sparse Datasets,” Proceedings of the 2008 IEEE Symposium on Security and Privacy, p. 111 et seq. (http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf); Arvind Narayanan
…
Break the Anonymity of the Netflix Prize Dataset.” October 18, 2006, arXiv:cs/0610105 (http://arxiv.org/abs/cs/0610105). ———. “Robust De-Anonymization of Large Sparse Datasets.” Proceedings of the 2008 IEEE Symposium on Security and Privacy, p. 111 (http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf). Nazareth, Rita, and Julia
by Toby Segaran · 17 Dec 2008 · 519pp · 102,669 words
the same set of del.icio.us bookmarks—most bookmarks are saved by a small group of people, leading to a sparse dataset. Item-based filtering usually outperforms user-based filtering in sparse datasets, and the two perform about equally in dense datasets. Tip To learn more about the difference in performance between these
by Michael Kearns and Aaron Roth · 3 Oct 2019
” by Paul Ohm, which appeared in the UCLA Law Review 57 (2010). Details on the Netflix attack are described in “Robust De-anonymization of Large Sparse Datasets” by Arvind Narayanan and Vitaly Shmatikov, which was published in the IEEE Symposium on Security and Privacy (IEEE, 2008). Details of the original Genome-Wide
by Hannah Fry · 17 Sep 2018 · 296pp · 78,631 words
.com/watch?v=1nvYGi7-Lxo. 17. The researchers based this part of their work on Arvind Narayanan and Vitaly Shmatikov, ‘Robust de-anonymization of large sparse datasets’, paper presented to IEEE Symposium on Security and Privacy, 18–22 May 2008. 18. Michal Kosinski, David Stillwell and Thore Graepel. ‘Private traits and attributes
by Matthew Hindman · 24 Sep 2018
, interpreting factors can be difficult in practice, as we shall see. SVD had rarely been used with recommender systems because the technique performed poorly on “sparse” datasets, those (like the Netflix data) in which most of the values are missing. But Funk adapted the technique to ignore missing values, and found a
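The adaptation described in the excerpt above, fitting a factorization using only the observed values, can be sketched roughly as follows. This is an illustrative stochastic-gradient version and not Funk's actual code; the function names, the toy ratings, and the learning-rate and regularization values are all assumptions made for the example.

```python
import math
import random


def factorize(ratings, n_users, n_items, n_factors=2,
              lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit user and item factor vectors from observed ratings only.

    ratings is a list of (user, item, value) triples; missing entries
    are simply absent from the list, so they never contribute to the error.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


# Toy data (hypothetical): 4 users x 4 items, only 10 of 16 cells observed.
observed = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0),
    (3, 0, 1.0), (3, 3, 5.0),
]

P0, Q0 = factorize(observed, 4, 4, epochs=0)  # untrained baseline
P, Q = factorize(observed, 4, 4)
rmse_before = rmse(observed, P0, Q0)
rmse_after = rmse(observed, P, Q)
```

Because the training loop iterates over the list of observed triples rather than over a full matrix, the unobserved cells are never treated as zeros; `predict` can still be called for any user and item pair afterwards.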
by Julia Angwin · 25 Feb 2014 · 422pp · 104,457 words
/technology/09aol.html?_r=0&gwh=2CACC912D19D87BDFD3A39B96C429022. In 2008, researchers at the University of Texas: Arvind Narayanan and Vitaly Shmatikov, “Robust De-anonymization of Large Sparse Datasets,” Security and Privacy (2008): 111–25, http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf. In 2012, my Wall Street Journal team: Jennifer Valentino-Devries
by Ronald J. Deibert · 14 Aug 2020
-brother/ How easy it is to unmask real identities contained in large personal data sets: Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. IEEE Symposium on Security and Privacy, 111–125. http://doi.org/10.1109/SP.2008.33 “At least eight surveillance and cyber-intelligence companies attempting
by Jiawei Han, Micheline Kamber and Jian Pei · 21 Jun 2011
2.4.5), or by combinations of these attribute types (Section 2.4.6). Section 2.4.7 provides similarity measures for very long and sparse data vectors, such as term-frequency vectors representing documents in information retrieval. Knowing how to compute dissimilarity is useful in studying attributes and will also be
…
. Y1 and Y2 are the first two principal components for the given data. PCA can be applied to ordered and unordered attributes, and can handle sparse data and skewed data. Multidimensional data of more than two dimensions can be handled by reducing the problem to two dimensions. Principal components may be used
…
as inputs to multiple regression and cluster analysis. In comparison with wavelet transforms, PCA tends to be better at handling sparse data, whereas wavelet transforms are more suitable for data of high dimensionality. 3.4.4. Attribute Subset Selection Data sets for analysis may contain hundreds of
…
space are less subject to sampling variations than the estimates in the higher-dimensional space). Regression and log-linear models can both be used on sparse data, although their application may be limited. While both methods can handle skewed data, regression does exceptionally well. Regression can be computationally intensive when applied to
…
. In such cases, sparse matrix compression techniques should be explored (Chapter 5). Many MOLAP servers adopt a two-level storage representation to handle dense and sparse data sets: Denser subcubes are identified and stored as array structures, whereas sparse subcubes employ compression technology for efficient storage utilization. Hybrid OLAP (HOLAP) servers: The
…
as a data cube. Unfortunately, this may often generate a huge, yet very sparse, multidimensional matrix. (a) Present an example illustrating such a huge and sparse data cube. (b) Design an implementation method that can elegantly overcome this sparse matrix problem. Note that you need to explain your data structures in detail
…
aggregation method for data cube computation in MOLAP was proposed in Zhao, Deshpande, and Naughton [ZDN97]. Ross and Srivastava [RS97] developed a method for computing sparse data cubes. Iceberg queries are first described in Fang, Shivakumar, Garcia-Molina, et al. [FSGM+98]. BUC, a scalable method that computes iceberg cubes from the
…
in high-dimensional spaces. The distance between objects becomes heavily dominated by noise as the dimensionality increases. Therefore, data in high-dimensional spaces are often sparse. ■ Data subspaces: They should model outliers appropriately, for example, adaptive to the subspaces signifying the outliers and capturing the local behavior of data. Using a fixed
…
592 see also networks social science/social studies data mining 613 soft clustering 501 soft constraints 534, 539 example 534 handling 536–537 space-filling curve 58 sparse data 102 sparse data cubes 190 sparsest cuts 539 sparsity coefficient 579 spatial data 14 spatial data mining 595 spatiotemporal data analysis 319 spatiotemporal data mining 595, 623
by Stuart Russell and Peter Norvig · 14 Jul 2019 · 2,466pp · 668,761 words
). Continuous speech recognition by statistical methods. Proc. IEEE, 64, 532–556. Jelinek, F. and Mercer, R. L. (1980). Interpolated estimation of Markov source parameters from sparse data. In Proc. Workshop on Pattern Recognition in Practice. Jennings, H. S. (1906). Behavior of the Lower Organisms. Columbia University Press. Jenniskens, P., Betlem, H., Betlem
by Femi Anthony · 21 Jun 2015 · 589pp · 69,193 words
: This defines the pandas Series class and its various methods that Series inherits from NDFrame and IndexOpsMixin. sparse.py: This defines imports for handling sparse data structures. Sparse data structures are compressed: data points matching NaN or missing values are omitted. For more information on this, go to http://pandas.pydata.org/pandas
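The idea in the excerpt above, storing only the points that are not NaN, can be illustrated with a small pure-Python sketch. This is not pandas' implementation or API; the class name `SparseFloatList` and its attributes are hypothetical and exist only to show the principle.

```python
import math


class SparseFloatList:
    """Hypothetical illustration: keep only non-NaN values and their positions."""

    def __init__(self, values):
        self.length = len(values)
        # Positions of the values that are actually stored.
        self.indices = [i for i, v in enumerate(values) if not math.isnan(v)]
        self.data = [values[i] for i in self.indices]

    @property
    def density(self):
        """Fraction of positions that hold a stored (non-NaN) value."""
        return len(self.data) / self.length if self.length else 0.0

    def to_dense(self):
        """Rebuild the full list, filling the omitted positions with NaN."""
        dense = [math.nan] * self.length
        for i, v in zip(self.indices, self.data):
            dense[i] = v
        return dense


values = [1.0, math.nan, math.nan, 3.0, math.nan]
sparse = SparseFloatList(values)
dense_again = sparse.to_dense()
```

Here only two of the five values are stored, together with their positions; the saving grows with the share of missing entries, and for mostly dense data the extra index list can make this representation larger than the original.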
by Bruce Schneier · 3 Sep 2018 · 448pp · 117,325 words
by Bruce Schneier · 2 Mar 2015 · 598pp · 134,339 words
by Eli Bressert · 14 Oct 2012 · 62pp · 14,996 words
by Trevor Hastie, Robert Tibshirani and Jerome Friedman · 25 Aug 2009 · 764pp · 261,694 words
by Brian Christian · 5 Oct 2020 · 625pp · 167,349 words
by Zdravko Markov and Daniel T. Larose · 5 Apr 2007
by Sarah Boslaugh · 10 Nov 2012
by Martin Ford · 16 Nov 2018 · 586pp · 186,548 words
by Lars George · 29 Aug 2011
by Carl Benedikt Frey · 17 Jun 2019 · 626pp · 167,836 words
by Toby Segaran and Jeff Hammerbacher · 1 Jul 2009
by James Pustejovsky and Amber Stubbs · 14 Oct 2012 · 502pp · 107,510 words
by Eric Redmond and Jim R. Wilson · 7 May 2012 · 713pp · 93,944 words
by Tom White · 29 May 2009 · 933pp · 205,691 words
by Frank Vertosick · 1 Jan 1996 · 250pp · 75,586 words
by Steven Bird, Ewan Klein and Edward Loper · 15 Dec 2009 · 504pp · 89,238 words
by Charles Murray · 1 Jan 2012 · 397pp · 121,211 words
by Laura Trethewey · 15 May 2023
by Philipp Janert · 2 Jan 2010 · 398pp · 31,161 words
by Steven Pinker · 13 Feb 2018 · 1,034pp · 241,773 words
by Olivier Cure and Guillaume Blin · 10 Dec 2014
by Annalee Newitz · 404pp · 118,036 words
by Mariya Yao, Adelyn Zhou and Marlene Jia · 1 Jun 2018 · 161pp · 39,526 words
by Robert D. Hare · 1 Nov 1993 · 260pp · 78,229 words
by Juli Berwald · 14 May 2017 · 397pp · 113,304 words
by David Reich · 22 Mar 2018 · 372pp · 110,208 words
by David A. Mindell · 3 Apr 2008 · 377pp · 21,687 words
by David Spiegelhalter · 2 Sep 2019 · 404pp · 92,713 words
by Carl Sagan · 8 Sep 1997 · 356pp · 102,224 words
by Alastair Reynolds · 16 Apr 2008 · 635pp · 186,208 words
by Kim Wagner · 26 Mar 2019
by Paul Scharre · 18 Jan 2023
by David Spiegelhalter · 14 Oct 2019 · 442pp · 94,734 words
by Jimmy Moore and Jason Fung · 18 Oct 2016 · 275pp · 74,972 words
by James Gleick · 1 Jan 1992 · 795pp · 215,529 words
by Lane Greene · 15 Dec 2018 · 284pp · 84,169 words
by Joseph Henrich · 7 Sep 2020 · 796pp · 223,275 words
by Virginia Eubanks · 294pp · 77,356 words
by Matthew A. Russell · 15 Jan 2011 · 541pp · 109,698 words
by Alec Ross · 13 Sep 2021 · 363pp · 109,077 words
by Tim Harford · 2 Feb 2021 · 428pp · 103,544 words