description: activity of obtaining information resources relevant to an information need from a collection of information resources
181 results
by Liz Pelly · 7 Jan 2025 · 293pp · 104,461 words
taste and receive recommendations back. Collaborative filtering was created around the same time. In music scholarship, there’s a whole field of study called “music information retrieval” (MIR) that goes back several decades and, since 2000, even has its own long-running annual conference hosted by the International Society for Music
…
Information Retrieval, the “world’s leading research forum on processing, analyzing, searching, organizing, and accessing music related data.”2 This was not Spotify’s world, but as
…
profits/corporate greed in, 2–3 (see also payola) “saving” of, 13–14, 20–21, 23 See also record labels, independent; record labels, major music information retrieval (MIR), 93 music labor movement AFM strike, 207–8 on copyright violations, 206–7 during COVID-19 lockdowns, 205–6 “Justice at Spotify” campaign, 205
by Doug Turnbull and John Berryman · 30 Apr 2016 · 593pp · 118,995 words
“relevant” search result? 1.2.2. Search: there’s no silver bullet! 1.3. Gaining insight from relevance research 1.3.1. Information retrieval 1.3.2. Can we use information retrieval to solve relevance? 1.4. How do you solve relevance? 1.5. More than technology: curation, collaboration, and feedback 1.6. Summary
…
approaches, like latent semantic analysis. We dove into Lucene’s guts and explored techniques for building custom search components to solve problems. We began exploring information retrieval research. As we learned more techniques to solve hard problems, we continued to write about them. Still, blogs have their limits. John and I always
…
Solr blog (http://sujitpal.blogspot.com/) The Solr Start newsletter (www.solr-start.com) On the more general topic of search and information retrieval, we recommend this canonical text: Introduction to Information Retrieval by Christopher Manning et al. (Cambridge University Press, 2008), http://nlp.stanford.edu/IR-book/. For questions specific to Solr/Elasticsearch
…
of engineering effort. You engaged with the culmination of an even larger body of academic research that goes back a century in the field of information retrieval. Standing on the shoulders of giants, you sifted through millions of pieces of information—the entire human collection of information on the topic—and found
…
but is instead a bag of tricks that can’t be generally applied. In reality, there is a discipline behind relevance: the academic field of information retrieval. It has generally accepted practices to improve relevance broadly across many domains. But you’ve seen that what’s relevant depends a great deal on
…
your application. Given that, as we introduce information retrieval, think about how its general findings can be used to solve your narrower relevance problem.[2] 2 For an introduction to the field of
…
information retrieval, we highly recommend the classic text Introduction to Information Retrieval by Christopher D. Manning et al. (Cambridge University Press, 2008); see http://nlp.stanford.edu/IR-book/. 1.3.1
…
. Information retrieval Luckily, experts have been studying search for decades. The academic field of information retrieval focuses on the precise recall of information to satisfy a user’s information need. What’s an information need
…
fortunate, you’ll find a result addressing a problem similar to your own. That information will solve your problem, and you’ll move on. In information retrieval, relevance is defined as the practice of returning search results that most satisfy the user’s information needs. Further, classic
…
information retrieval focuses on text ranking. Many findings in information retrieval try to measure how likely a given article is going to be relevant to a user’s text search. You’ll learn about several
…
of these invaluable methods throughout this book—as many of these findings are implemented in open source search engines. To discover better text-searching methods, information retrieval researchers benchmark different strategies by using test collections of articles. These test collections include Amazon reviews, Reuters news articles, Usenet posts, and other similar, article
…
judgment lists, researchers aim to measure whether changes to text relevance calculations improve the overall relevance of the results across every test collection. To classic information retrieval, a solution that improves a dozen text-heavy test collections 1% overall is a success. Rather than focusing on one particular problem in depth
…
on solving search for a broad set of problems. 1.3.2. Can we use information retrieval to solve relevance? You’ve already seen there’s no silver bullet. But information retrieval does seem to systematically create relevance solutions. So ask yourself: Do these insights apply to your application? Does your application care about
…
searching article-length text? Would it be better to solve the specific problems faced by your application, here and now? To be more precise, classic information retrieval begs several questions when brought to bear on applied relevance problems. Let’s reflect on these questions to see where
…
information retrieval research can help and where it might stop being helpful. Do we care only about information needs? For many applications, satisfying users’ information needs isn’
…
overpriced clunker off the lot, relevance engineers must work with these factors to keep their employer in business. What besides text reflects information needs? Classic information retrieval focuses on a generic, one-size-fits-all measure of text relevance. These factors may not matter—at all—to your application. You need to
…
(PageRank). Google uses PageRank to get around pure text-based measures easily gamed in its domain. Even text search doesn’t always neatly fit into information retrieval’s focus on article-length text. Good results for short text snippets such as tweets or titles require different thinking. You, not
…
information retrieval researchers, must decide which factors matter to your application, and implement those. An approach that does poorly against the Reuters test set may be exactly
…
, ready, willing, and able to solve any problem asked of it, including helping a doctor save a life? Considering these questions, you can see that information retrieval builds a foundation for applying generally useful relevance measures to extremely broad classes of problems. Your job is to solve relevance for your application. As
…
context of a particular user experience, while balancing how ranking impacts our business’s needs. 1.4. How do you solve relevance? Informed now by information retrieval, let’s focus on how to solve your relevance problems. Open source search engines recognize that what’s relevant to your application depends on a
…
factors. Many of these are application-specific (how far the user is from a restaurant, for instance). Others are broader, generic, text-ranking components from information retrieval. Given the capabilities of open source search, how do you solve an applied relevance problem? What framework can we define that incorporates both the narrower
…
, domain-specific factors alongside broader information-retrieval techniques? To solve relevance, the relevance engineer: 1. Identifies salient features describing the content, the user, or the search query 2. Finds a way to
…
. Bringing users to relevant search results can turn into a multibillion-dollar business advantage; failing to do so can mean losing out to the competition. Information retrieval is the academic field of bringing users to content that satisfies their information needs, largely as specified in search queries. In practice, relevance is more
…
chapter, we have made several references to the idea of relevance and how various search features can be used to improve it. Recall that in information retrieval, relevance measures how well search results satisfy a user’s information need. In this book, we adopt a broader notion. In light of this, relevant
…
. The scoring begins to look more cryptic, filled with deeper search-engine jargon. At this point, you’re seeing a more fundamental reflection of the information retrieval intelligence built into the search engine. At this level, you begin to see information about match statistics for a field. These matches are the basic
…
convey more strength than others. 3.6.2. The vector-space model, the relevance explain, and you Much of the Lucene scoring formula derives from information retrieval. But the theoretical influence needs to be tempered mightily. Although the theoretical basis gives you context for solving a problem, in practice, relevance scoring uses
…
. Understanding the science will help you ensure that the search engine correctly measures the weight of features latent in the text, represented by terms. To information retrieval, a search for multiple terms in a field (such as our overview:basketbal overview:alien overview:cartoon against Space Jam) attempts to approximate a vector
…
, the higher the dot product. dotprod(fruit1, fruit2) = juiciness(fruit1) × juiciness(fruit2) + size(fruit1) × size(fruit2) What does this have to do with text? To information retrieval, text (queries and documents) can also be represented as vectors. Instead of examining features such as juiciness or size, the dimensions in the text vector
…
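The dot-product analogy in the snippet above can be sketched in a few lines. This is a minimal illustration, assuming made-up feature values for the fruit and raw term counts as text weights; the helper name `dot_product` and all sample data are hypothetical, and the stemming shown in the book's query (`basketbal`) is skipped.

```python
from collections import Counter


def dot_product(vec_a, vec_b):
    """Sum the products of the weights on dimensions both sparse vectors share."""
    return sum(weight * vec_b[dim] for dim, weight in vec_a.items() if dim in vec_b)


# Hypothetical feature vectors: each dimension is a property of the fruit.
fruit1 = {"juiciness": 3, "size": 2}
fruit2 = {"juiciness": 4, "size": 1}
fruit_score = dot_product(fruit1, fruit2)  # 3*4 + 2*1

# For text, each dimension is a term; raw counts stand in for term weights.
query = Counter("basketball alien cartoon".split())
document = Counter(
    "an alien basketball team challenges cartoon heroes to basketball".split()
)
text_score = dot_product(query, document)  # basketball 1*2 + alien 1*1 + cartoon 1*1
```

A document that repeats more of the query's terms gets a larger dot product, which is the intuition the vector-space model builds on.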
, and similarity Although you can see that TF × IDF seems to be an intuitive weighting formula, these raw statistics need additional tweaking to be optimal. Information retrieval research demonstrates that although a search term might occur 10 times more in a piece of text, that doesn’t make it 10 times as
…
(doc=31) Lucene’s next default similarity: BM25 Over the years, an alternate approach to computing a TF × IDF score has become prevalent in the information retrieval community: Okapi BM25. Because of its proven high performance on article-length text, Lucene’s BM25 similarity will be rolling out as the default similarity
…
, even as you read this book. What is BM25? Instead of “fudge factors” as discussed previously, BM25 bases its TF × IDF “fudges” on more-robust information retrieval findings. This includes forcing the impact of TF to reach a saturation point. Instead of the impact of length (fieldNorms) always increasing, its impact is
…
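The saturation behavior mentioned above can be seen in the textbook form of the Okapi BM25 term-frequency component. This is a sketch of the published formula, not Lucene's source code; the function name `bm25_tf` is hypothetical, and `k1=1.2` and `b=0.75` are assumed here as commonly quoted defaults.

```python
def bm25_tf(term_freq, field_len, avg_field_len, k1=1.2, b=0.75):
    """Textbook Okapi BM25 term-frequency component with length normalization."""
    length_norm = 1 - b + b * (field_len / avg_field_len)
    return term_freq * (k1 + 1) / (term_freq + k1 * length_norm)


# For an average-length field the value climbs toward k1 + 1 but never reaches it.
scores = [bm25_tf(tf, field_len=100, avg_field_len=100) for tf in (1, 2, 10, 100)]

# The same term frequency counts for less in a field twice the average length.
long_field_score = bm25_tf(2, field_len=200, avg_field_len=100)
```

Each extra occurrence of a term adds less than the one before it, so a term that appears 100 times does not score 100 times higher than a term that appears once.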
). IDF is computed similarly to classic TF × IDF similarity. Will BM25 help your relevance? It’s not that simple. As we discussed in chapter 1, information retrieval focuses heavily on broad, incremental improvements to article-length pieces of text. BM25 may not matter for your specific definition of relevance. For this reason
…
be as vague as “I know it when I see it,” but ideally it should be more concretely defined. At an extreme, the field of information retrieval uses so-called relevance judgments to define “looks good.” Here, given a fixed set of queries and a fixed set of documents, each query-document
…
gather relevance judgments given sufficient user traffic. Armed with these judgments, a program can automatically adjust query parameters on your behalf. In the field of information retrieval, this highest level of automation is known as learning to rank. Learning to rank turns out to be a tricky problem to solve, but it
…
has become a focus in information retrieval. So be on the lookout for improvements and breakthroughs in the near future. We’ll have more to say on learning to rank and test
…
one day being able to automatically converge the ideal relevance parameters. Finally, you need to combine all the insights from this chapter with cutting-edge information retrieval and machine learning. Often simpler relevance gains can be gathered with the straightforward techniques discussed earlier in this book. In our consulting work, we’re
…
inconsistent scoring index-time analysis, 2nd index-time personalization indexing documents information and requirements gathering business needs required and available information users and information needs information retrieval, creating relevance solutions through inner objects innermost calculation integers, tokenizing inventory-related files inventory_dir configuration inverse document frequency. See IDF. inverted index data structure
…
engineer search relevance collaboration and curation and defined difficulty of class of search and lack of single solution feedback and gaining skills of relevance engineer information retrieval research into systematic approach for improving search-as-you-type searchable data semantic expansion sentiment analysis sentinel tokens, 2nd sharding short-tail application SHOULD clause
by Jiawei Han, Micheline Kamber and Jian Pei · 21 Jun 2011
of knowledge discovery and data mining. As a multidisciplinary field, data mining draws on work from areas including statistics, machine learning, pattern recognition, database technology, information retrieval, network science, knowledge-based systems, artificial intelligence, high-performance computing, and data visualization. We focus on issues relating to the feasibility, usefulness, effectiveness, and scalability
…
University in 2002 under Dr. Jiawei Han's supervision. He has published prolifically in the premier academic forums on data mining, databases, Web searching, and information retrieval and actively served the academic community. His publications have received thousands of citations and several prestigious awards. He is an associate editor of several data
…
great boost to the database and information industry, and it enables a huge number of databases and information repositories to be available for transaction management, information retrieval, and data analysis. Data can now be stored in many different kinds of databases and information repositories. One emerging data repository architecture is the data
…
and play a vital role in the information industry. The effective and efficient analysis of data from such different forms of data by integration of information retrieval, data mining, and information network analysis technologies is a challenging task. In summary, the abundance of data, coupled with the need for powerful data analysis
…
highly application-driven domain, data mining has incorporated many techniques from other domains such as statistics, machine learning, pattern recognition, database and data warehouse systems, information retrieval, visualization, algorithms, high-performance computing, and many application domains (Figure 1.11). The interdisciplinary nature of data mining research and development contributes significantly to the
…
. The data cube model not only facilitates OLAP in multidimensional databases but also promotes multidimensional data mining (see Section 1.3.2). 1.5.4. Information Retrieval Information retrieval (IR) is the science of searching for documents or information in documents. Documents can be text or multimedia, and may reside on the Web. The
…
differences between traditional information retrieval and database systems are twofold: Information retrieval assumes that (1) the data under search are unstructured; and (2) the queries are formed mainly by keywords, which do not have
…
complex structures (unlike SQL queries in database systems). The typical approaches in information retrieval adopt probabilistic models. For example, a text document can be regarded as a bag of words, that is, a multiset of words appearing in the
…
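A bag of words, in the multiset sense described above, maps directly onto a counter. The following is a minimal sketch, assuming a naive lowercase alphanumeric tokenizer; the function name `bag_of_words` and the sample sentence are illustrative only.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a multiset mapping each lowercased word to its number of occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


bag = bag_of_words("To be, or not to be")
```

Word order is discarded and only the counts survive, which is what lets probabilistic models treat a document as a sample of words.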
a topic model. A text document, which may involve one or multiple topics, can be regarded as a mixture of multiple topic models. By integrating information retrieval models and data mining techniques, we can find the major topics in a collection of documents and, for each document in the collection, the major
…
care information systems. Their effective search and analysis have raised many challenging issues in data mining. Therefore, text mining and multimedia data mining, integrated with information retrieval methods, have become increasingly important. 1.6. Which Kinds of Applications Are Targeted? Where there are data, there are data mining applications As a highly
…
new methods from multiple disciplines. For example, to mine data with natural language text, it makes sense to fuse data mining methods with methods of information retrieval and natural language processing. As another example, consider the mining of software bugs in large programs. This form of mining, known as bug mining, benefits
…
mining, as a highly application-driven domain, has incorporated technologies from many other domains. These include statistics, machine learning, database and data warehouse systems, and information retrieval. The interdisciplinary nature of data mining research and development contributes significantly to the success of data mining and its extensive applications. ■ Data mining has many
…
(PAMI), and Cognitive Science. Textbooks and reference books on information retrieval include Introduction to Information Retrieval by Manning, Raghavan, and Schütze [MRS08]; Information Retrieval: Implementing and Evaluating Search Engines by Büttcher, Clarke, and Cormack [BCC10]; Search Engines: Information Retrieval in Practice by Croft, Metzler, and Strohman [CMS09]; Modern Information Retrieval: The Concepts and Technology Behind Search by Baeza-Yates and
…
), the Text Retrieval Conference (TREC), and the ACM/IEEE Joint Conference on Digital Libraries (JCDL). Other sources of publication include major information retrieval, information systems, and Web journals, such as Journal of Information Retrieval, ACM Transactions on Information Systems (TOIS), Information Processing and Management, Knowledge and Information Systems (KAIS), and IEEE Transactions on Knowledge
…
2.4.6). Section 2.4.7 provides similarity measures for very long and sparse data vectors, such as term-frequency vectors representing documents in information retrieval. Knowing how to compute dissimilarity is useful in studying attributes and will also be referenced in later topics on clustering (Chapter 10 and Chapter 11
…
2 0 3 0 Term-frequency vectors are typically very long and sparse (i.e., they have many 0 values). Applications using such structures include information retrieval, text document clustering, biological taxonomy, and gene feature mapping. The traditional distance measures that we have studied in this chapter do not work well for
…
y to the number of attributes possessed by x or y. This function, known as the Tanimoto coefficient or Tanimoto distance, is frequently used in information retrieval and biology taxonomy. 2.5. Summary ■ Data sets are made up of data objects. A data object represents an entity. Data objects are described by
…
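The ratio described above can be sketched as follows. This assumes the usual vector form of the Tanimoto coefficient, x·y / (x·x + y·y − x·y), which for binary vectors reduces to shared attributes over attributes possessed by either object. The function name and sample vectors are hypothetical, and returning 0.0 for two all-zero vectors is a convention chosen here.

```python
def tanimoto(x, y):
    """Tanimoto coefficient of two equal-length numeric vectors."""
    dot_xy = sum(a * b for a, b in zip(x, y))
    dot_xx = sum(a * a for a in x)
    dot_yy = sum(b * b for b in y)
    denominator = dot_xx + dot_yy - dot_xy
    # Two all-zero vectors share nothing; treat their similarity as 0.0.
    return dot_xy / denominator if denominator else 0.0


x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]
similarity = tanimoto(x, y)  # 2 shared attributes / 4 possessed by x or y
```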
can handle databases of high dimensionality and can quickly compute small local cubes online. It explores the inverted index data structure, which is popular in information retrieval and Web-based information systems. The basic idea is as follows. Given a high-dimensional data set, we partition the dimensions into a set of
…
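For readers unfamiliar with the structure, a generic inverted index as used in information retrieval can be sketched as below. This illustrates the data structure only, not the cube-computation method the snippet goes on to describe; the function names, whitespace tokenization, and sample documents are assumptions.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, terms):
    """Return IDs of documents containing every term (a boolean AND)."""
    id_sets = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*id_sets)) if id_sets else []


docs = {
    1: "data mining concepts",
    2: "information retrieval concepts",
    3: "mining query logs",
}
index = build_inverted_index(docs)
hits = search_all(index, ["mining", "concepts"])
```

Looking up a term costs one dictionary access, and a multi-term query becomes an intersection of short ID lists rather than a scan of every document.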
. Commonly used objective measures include support, confidence, correlation, and tf-idf (or term frequency versus inverse document frequency), where the latter is often used in information retrieval. Subjective measures are based on user beliefs in the data. They therefore depend on the users who examine the patterns. A subjective measure is usually
…
top-k patterns can thus be transformed into finding a k-pattern set that maximizes the marginal significance, which is a well-studied problem in information retrieval. In this field, a document has high marginal relevance if it is both relevant to the query and contains minimal marginal similarity to previously selected
…
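One common greedy formulation of this idea in information retrieval is usually called maximal marginal relevance. The sketch below is an assumption about how such a selection could look, not the book's algorithm: the `trade_off` weight, the function names, and the toy relevance and similarity values are all hypothetical.

```python
def mmr_select(relevance, similarity, k, trade_off=0.5):
    """Greedily pick k items, balancing relevance against redundancy."""
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def marginal(item):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance[item] - (1 - trade_off) * redundancy

        best = max(sorted(candidates), key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: "a" and "b" are near-duplicates; "c" is less relevant but different.
relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
pair_similarity = {
    frozenset(("a", "b")): 0.95,
    frozenset(("a", "c")): 0.1,
    frozenset(("b", "c")): 0.1,
}


def toy_similarity(first, second):
    return pair_similarity[frozenset((first, second))]


picked = mmr_select(relevance, toy_similarity, k=2)
```

With these values the second pick is "c" rather than the more relevant "b", because "b" adds little beyond what "a" already covers.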
of the DBLP data set. 3 The DBLP data set contains papers from the proceedings of 12 major conferences in the fields of database systems, information retrieval, and data mining. Each transaction consists of two parts: the authors and the title of the corresponding paper. 3 www.informatik.uni-trier.de/~ley/db
…
extracted because their contexts are similar since they all are database and/or data mining researchers; thus the annotation is meaningful. For the title term “information retrieval,” which is a sequential pattern, its strongest context indicators are usually the authors who tend to use the term in the titles of their papers
…
terms that tend to coappear with it. Its semantically similar patterns usually provide interesting concepts or descriptive terms, which are close in meaning (e.g., “information retrieval → information filter”). In both scenarios, the representative transactions extracted give us the titles of papers that effectively capture the meaning of the given patterns. The
…
involved in estimating classifier accuracy are described in Weiss and Kulikowski [WK91] and Witten and Frank [WF05]. Sensitivity, specificity, and precision are discussed in most information retrieval textbooks. For the F and Fβ measures, see van Rijsbergen [vR90]. The use of stratified 10-fold cross-validation for estimating classifier accuracy is recommended over
…
the results in a concise and easily accessible way. Moreover, clustering techniques have been developed to cluster documents into topics, which are commonly used in information retrieval practice. As a data mining function, cluster analysis can be used as a standalone tool to gain insight into the distribution of data, to observe
…
is the subject of Chapter 12. Data clustering is under vigorous development. Contributing areas of research include data mining, statistics, machine learning, spatial database technology, information retrieval, Web search, biology, marketing, and many other application areas. Owing to the huge amounts of data collected in databases, cluster analysis has recently become a
…
analysis, mining associations, and video and audio data mining (Section 13.2.3). Mining Text Data Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. A substantial portion of information is stored as text such as news articles, technical papers, books, digital
…
based on the opinions of other customers who have similar tastes or preferences as the user. Recommender systems use a broad range of techniques from information retrieval, statistics, machine learning, and data mining to search for similarities among items and customer preferences. Consider Example 13.1. Scenarios of using a recommender system
…
, and Zhu [ZHZ00]). An overview of image mining methods is given by Hsu, Lee, and Zhang [HLZ02]. Text data analysis has been studied extensively in information retrieval, with many textbooks and survey articles such as Croft, Metzler, and Strohman [CMS09]; S. Buttcher, C. Clarke, G. Cormack [BCC10]; Manning, Raghavan, and Schutze [MRS08
…
. Syst 11 (2006) 29–44. [AGAV09] Amigó, E.; Gonzalo, J.; Artiles, J.; Verdejo, F., A comparison of extrinsic clustering evaluation metrics based on formal constraints, Information Retrieval 12 (4) (2009) 461–486. [Agg06] Aggarwal, C.C., Data Streams: Models and Algorithms. (2006) Kluwer Academic . [AGGR98] Agrawal, R.; Gehrke, J.; Gunopulos, D.; Raghavan
…
–16. [BC83] Beckman, R.J.; Cook, R.D., Outlier…s, Technometrics 25 (1983) 119–149. [BCC10] Buettcher, S.; Clarke, C.L.A.; Cormack, G.V., Information Retrieval: Implementing and Evaluating Search Engines. (2010) MIT Press, Cambridge, MA . [BCG01] Burdick, D.; Calimlim, M.; Gehrke., J., MAFIA: A maximal frequent itemset algorithm for transactional
…
] Babu, S.; Widom, J., Continuous queries over data streams, SIGMOD Record 30 (2001) 109–120. [BYRN11] Baeza-Yates, R.A.; Ribeiro-Neto, B.A., Modern Information Retrieval. 2nd ed. (2011) Addison-Wesley, Boston . [Cat91] [CBK09] Chandola, V.; Banerjee, A.; Kumar, V., Anomaly detection: A survey, ACM Computing Surveys 41 (2009) 1–58
…
, In: Proc. 2005 Int. Conf. Data Mining (ICDM’05) Houston, TX. (Nov. 2005), pp. 82–89. [CMS09] Croft, B.; Metzler, D.; Strohman, T., Search Engines: Information Retrieval in Practice. (2009) Addison-Wesley, Boston . [CN89] Clark, P.; Niblett, T., The CN2 induction algorithm, Machine Learning 3 (1989) 261–283. [Coh95] Cohen, W., Fast
…
, C.-Y.; Song, Y.-I.; Sun, Y., Finding question-answer pairs from online forums, In: Proc. 2008 Int. ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR’08) Singapore. (July 2008), pp. 467–474. [CYHH07] Cheng, H.; Yan, X.; Han, J.; Hsu, C.-W., Discriminative frequent pattern analysis for effective classification
…
), pp. 181–203. [Gol89] Goldberg, D., Genetic Algorithms in Search, Optimization, and Machine Learning. (1989) Addison-Wesley, Reading, MA . [GR04] Grossman, D.A.; Frieder, O., Information Retrieval: Algorithms and Heuristics. (2004) Springer, New York . [GR07] Grunwald, P.D.; Rissanen, J., The Minimum Description Length Principle. (2007) MIT Press, Cambridge, MA . [GRG98] Gehrke
…
Discovery and Data Mining (KDD’08) Las Vegas, NV. (Aug. 2008), pp. 444–452. [KT99] Kleinberg, J.M.; Tomkins, A., Application of linear algebra in information retrieval and hypertext analysis, In: Proc. 18th ACM Symp. Principles of Database Systems (PODS’99) Philadelphia, PA. (May 1999), pp. 185–193. [KYB03] Korf, I.; Yandell
…
. Knowledge Discovery and Data Mining (KDD’95) Montreal, Quebec, Canada. (Aug. 1995), pp. 216–221. [MRS08] Manning, C.D.; Raghavan, P.; Schutze, H., Introduction to Information Retrieval. (2008) Cambridge University Press . [MS03a] Markou, M.; Singh, S., Novelty detection: A review—part 1: Statistical approaches, Signal Processing 83 (2003) 2481–2497. [MS03b] Markou
…
Databases (SSTD’01) Redondo Beach, CA. (July 2001), pp. 443–459. [PL07] Pang, B.; Lee, L., Opinion mining and sentiment analysis, Foundations and Trends in Information Retrieval 2 (2007) 1–135. [Pla98] Platt, J.C., Fast training of support vector machines using sequential minimal optimization, In: (Editors: Schölkopf, B.; Burges, C.J
…
’09) Saint Petersburg, Russia. (Mar. 2009), pp. 565–576. [Sil10] Silvestri, F., Mining query logs: Turning search usage data into knowledge, Foundations and Trends in Information Retrieval 4 (2010) 1–174. [SK08] Shieh, J.; Keogh, E., iSAX: Indexing and mining terabyte sized time series, In: Proc. 2008 ACM SIGKDD Int. Conf. Knowledge
…
, M., Lazy associative classification, In: Proc. 2006 Int. Conf. Data Mining (ICDM’06) Hong Kong, China. (2006), pp. 645–654. [vR90] van Rijsbergen, C.J., Information Retrieval. (1990) Butterworth . [VWI98] Vitter, J.S.; Wang, M.; Iyer, B.R., Data cube approximation and histograms via wavelets, In: Proc. 1998 Int. Conf. Information and
…
mining, In: Proc. 2002 SIAM Int. Conf. Data Mining (SDM’02) Arlington, VA. (Apr. 2002), pp. 457–473. [Zha08] Zhai, C., Statistical Language Models for Information Retrieval. (2008) Morgan and Claypool . [ZHL+98] Zaïane, O.R.; Han, J.; Li, Z.N.; Chiang, J.Y.; Chee, S., MultiMedia-Miner: A system prototype for
…
evolution of 594 link prediction in 593–594 mining 623 OLAP in 594 role discovery in 593–594 similarity search in 594 information processing 153 information retrieval (IR) 26–27 challenges 27 language model 26 topic model 26–27 informativeness model 535 initial working relations 168, 169, 177 instance-based learners. see lazy
by Olivier Cure and Guillaume Blin · 10 Dec 2014
it’s also popular for other systems, such as IBM OmniFind Y! Edition, Technorati, Wikipedia, Internet Archive, and LinkedIn. Lucene is a very popular open-source information-retrieval library from the Apache Software Foundation (originally created in Java by Doug Cutting). It provides Java-based full-text indexing
…
details on RDF-3X and its extension X-RDF-3X are provided in Chapter 6. The YARS (Harth and Decker, 2005) system combines methods from information retrieval and databases to allow for better query answering performance over RDF data. It stores RDF data persistently by using six B+tree indexes. It not
by Chris Eagle · 16 Jun 2011 · 1,156pp · 229,431 words
SetBptCnd(next_eip, "Warning(\"Exception return hit\") || 1"); return 0; //don't stop } This function locates the pointer to the process’s saved register context information, retrieves the saved instruction pointer value from offset 0xB8 within the CONTEXT structure, and sets a breakpoint on this address. In order to make it clear
by Gordon Bell and Jim Gemmell · 15 Feb 2009 · 291pp · 77,596 words
, G. Jones, and A. F. Smeaton. “Retrieval of Similar Travel Routes Using GPS Tracklog Place Names.” SIGIR 2006—Conference on Research and Development on Information Retrieval, Workshop on Geographic Information Retrieval, Seattle, Washington, August 6-11, 2006. Gurrin, C., A. F. Smeaton, D. Byrne, N. O’Hare, G. Jones, and N. O’Connor. “An
…
Examination of a Large Visual Lifelog.” AIRS 2008—Asia Information Retrieval Symposium, Harbin, China, January 16-18, 2008. Lavelle, B., D. Byrne, C. Gurrin, A. F. Smeaton, and G. Jones. “Bluetooth Familiarity: Methods of Calculation, Applications
…
E. O’Connor, and Gareth J. F. Jones. “Adaptive Visual Summary of LifeLog Photos for Personal Information Management.” AIR 2006—First International Workshop on Adaptive Information Retrieval, Glasgow, UK, October 14, 2006. O’Conaire, C., N. O’Connor, A. F. Smeaton, and G. Jones. “Organizing a Daily Visual Diary Using Multi-Feature
…
. Hori, Tetsuro, and Kiyoharu Aizawa. “Context-Based Video Retrieval System for the Life-Log Applications.” Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, Berkeley, California, 2003. reQall has a Web site, and there are a number of good articles about reQall, including the one in Forbes, below. reQall
…
.” IEEE Colloquium on Multimedia Communications and Applications, February 7, 1991, pp. 5/1-5/3. Lamming, M. G., and W. M. Newman. 1992. “Activity-based Information Retrieval: Technology in Support of Personal Memory.” Personal Computers and Intelligent Systems: Information Processing ’92. Amsterdam: North-Holland, 68-81. Newman, W. M., M. A. Eldridge
…
are typing. A contextual retrieval application called Margin Notes has also been developed for Web browsing. Rhodes, Bradley. 2003. “Physical Context for Just-in-Time Information Retrieval.” IEEE Transactions on Computers 52, no. 8 (August): 1011-14. ———. 1997. “The Wearable Remembrance Agent: A System for Augmented Memory.” Special Issue on Wearable Computing
…
the International Conference on Intelligent User Interfaces (IUI ’00), New Orleans, Louisiana, January 9-12, 2000. Rhodes, Bradley, and Pattie Maes. 2000. “Just-in-Time Information Retrieval Agents.” Special issue on the MIT Media Laboratory, IBM Systems Journal 39, nos. 3 and 4: 685-704. Rhodes, Bradley, and Thad Starner. “The Remembrance
…
Agent: A Continuously Running Automated Information Retrieval System.” The Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi Agent Technology (PAAM ’96), London, UK, April 1996
…
page. http://flamenco.berkeley.edu Hearst, Marti A. “UIs for Faceted Navigation: Recent Advances and Remaining Open Problems.” Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2008), Redmond, Washington, October 2008. For those of you interested in trying a start-up: Bell, C. Gordon, and John E. McNamara. 1991. High
by Mehmed Kantardzić · 2 Jan 2003 · 721pp · 197,134 words
attempt to present and discuss such issues and principles and then describe representative and popular methods originating from statistics, machine learning, computer graphics, databases, information retrieval, neural networks, fuzzy logic, and evolutionary computation. In this book, we describe how best to prepare environments for performing data mining and discuss approaches that
…
. The reader may have noticed the similarity between the problem of finding nearest neighbors for a test sample and ad hoc retrieval methodologies. In standard information retrieval systems such as digital libraries or web search, we search for the documents (samples) with the highest similarity to the query document represented by a
…
readers interested in practical implementation of some clustering methods, the paper offers useful advice and a large spectrum of references. Miyamoto, S., Fuzzy Sets in Information Retrieval and Cluster Analysis, Kluwer Academic Publishers, Dordrecht, the Netherlands, 1990. This book offers an in-depth presentation and analysis of some clustering algorithms and reviews the
…
possibilities of combining these techniques with fuzzy representation of data. Information retrieval, which, with the development of advanced Web-mining techniques, is becoming more important in the data-mining community, is also explained in the book. 10
…
authoring styles and content variation than that seen in traditional print document collections. This level of complexity makes an “off-the-shelf” database-management and information-retrieval solution very complex and almost impossible to use. New methods and tools are necessary. Web mining may be defined as the use of data-mining
…
the page linked. This assumption underlies PageRank and HITS, which will be explained later in this section. Web-structure mining is mainly used in the information retrieval (IR) process. PageRank may have directly contributed to the early success of Google. Certainly the analysis of the structure of the Internet and the interlinking
…
engine prototype called EnviroDaemon. Garcia, E., SVD and LSI Tutorial 4: Latent Semantic Indexing (LSI) How-to Calculations, Mi Islita, 2006, http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-4-lsi-how-to-calculations.html. This Web tutorial provides students with a greater understanding of latent semantic indexing. It provides
…
structural similarity function such as the edit distance. This clustering approach makes it an ideal technique for applications in areas such as scientific-data exploration, information retrieval, computational biology, Web-log analysis, forensics analysis, and blog analysis. Link analysis is an important field that has received a lot of attention recently when
…
real-world connection between two entities. Probably the most famous example of exploiting link structure in the graph is the use of links to improve information retrieval results. Both the well-known PageRank measure and the hub and authority scores are based on the link structure of the Web. Link analysis techniques are
…
include prediction, classification, clustering, search and retrieval, and pattern discovery. The first four have been investigated extensively in traditional time-series analysis, pattern recognition, and information retrieval. We will concentrate in this text on illustrative examples of algorithms for pattern discovery in large databases, which are of more recent origin and showing
…
both a theoretical and experimental point of view. It covers a wide scope of research areas including data representation, structuring and querying, as well as information retrieval and data mining. It encompasses different forms of databases, including data warehouses, data cubes, tabular or relational data, and many applications, among which are music
…
, 2006, pp. 76–82. Garcia, E., SVD and LSI Tutorial 4: Latent Semantic Indexing (LSI) How-to Calculations, Mi Islita, 2006, http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-4-lsi-how-to-calculations.html. Han, J., M. Kamber, Data Mining: Concepts and Techniques, 2nd edition, San Francisco, Morgan Kaufmann
…
, J., Fuzzy Logic Systems for Engineering: A Tutorial, Proceedings of the IEEE, Vol. 83, No. 3, 1995, pp. 345–377. Miyamoto, S., Fuzzy Sets in Information Retrieval and Cluster Analysis, Kluwer Academic Publishers, Dordrecht, 1990. Munakata, T., Fundamentals of the New Artificial Intelligence: Beyond Traditional Paradigm, Springer, New York, 1998. Özyer, T
…
techniques Histogram Holdout method Hubs Hyperbolic tangent sigmoid Hypertext Icon-based visualization Induction Inductive-learning methods Inductive machine learning Inductive principle Info function Information visualization Information retrieval (IR) Initial population Interesting association rules Internet searching Interval scale Inverse document frequency Itemset Jaccard coefficient Kernel function Knowledge distillation Large data set Large itemset
by Alex Wright · 6 Jun 2014
precursor to the once familiar, now rapidly disappearing, library card catalog.23 Today, we might tend to think of the card catalog as a simplistic information retrieval tool: the dominion of somber librarians in fusty reading rooms. However, to take such a dismissive view of these compact, efficient systems—the direct ancestors
…
of Technology, published his famous essay “As We May Think.” Today, most computer science historians have characterized Bush’s Rapid Selector as the first electronic information-retrieval machine. When Bush tried to patent his invention in 1937 and 1940, however, the U.S. Patent Office turned him down, citing Goldberg’s work
…
access to information might help prevent future wars. Beginning with his 1905 work, A Modern Utopia, Wells had developed a fascination with the problem of information retrieval: the need for better methods for organizing the world’s recorded knowledge. This led him to reject old values and institutional strictures and embrace a
…
—seeming to echo Otlet’s notion of a personalized knowledge system that would let anyone tap into humanity’s entire intellectual output, constructing a personalized information retrieval system from the comfort of one’s armchair. Otlet
…
change, and a belief in the possibility of spiritual liberation. Otlet’s vision for an international knowledge network—always far more expansive than a mere information retrieval tool—points toward a more purposeful vision of what the global network could yet become. And while history may judge Otlet a relic from another
…
Story of Greed, Terror, and Heroism in Colonial Africa. Boston: Houghton Mifflin, 1999. Hunter, E. J. Classification Made Simple: An Introduction to Knowledge Organisation and Information Retrieval. Aldershot: Ashgate, 2009. “Industrial History | Belgium.” European Route of Industrial Heritage. Accessed July 27, 2013. http://www.erih.net/topmenu/about-erih.html. “Internet 2012
by Sam Newman · 25 Dec 2014 · 540pp · 103,101 words
you going when your system has to handle very different volumes of load. As Jeff Dean said in his presentation “Challenges in Building Large-Scale Information Retrieval Systems” (WSDM 2009 conference), you should “design for ~10× growth, but plan to rewrite before ~100×.” At certain points, you need to do something pretty
by Donald Ervin Knuth · 15 Jan 1998
sound as if this book is only for those systems programmers who are concerned with the preparation of general-purpose sorting routines or applications to information retrieval. But in fact the area of sorting and searching provides an ideal framework for discussing a wide variety of important general issues: • How are good
…
complicated questions. Indeed, we might consider an entire library as a database, and a searcher may want to find everything that has been published about information retrieval. An introduction to the techniques for such secondary key (multi-attribute) retrieval problems appears below in Section 6.5. Before entering into a detailed study
…
in Table 1 as a trie structure; this name was suggested by E. Fredkin [CACM 3 (1960), 490-500] because it is a part of information retrieval. A trie, pronounced "try", is essentially an M-ary tree, whose nodes are M-place vectors with components corresponding to digits or characters. Each node
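The trie described in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the layout of Knuth's Table 1: it assumes a lowercase 26-letter alphabet (so M = 26), and the names `TrieNode`, `insert`, `contains`, and the `is_key` flag are hypothetical choices made for the example.

```python
from typing import List, Optional

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet, so M = 26
M = len(ALPHABET)


class TrieNode:
    """One node: an M-place vector with one slot per character."""

    def __init__(self) -> None:
        self.children: List[Optional["TrieNode"]] = [None] * M
        self.is_key = False  # True if a stored key ends at this node


def insert(root: TrieNode, key: str) -> None:
    """Add key to the trie, creating nodes along its path as needed."""
    node = root
    for ch in key:
        i = ALPHABET.index(ch)  # raises ValueError outside the alphabet
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_key = True


def contains(root: TrieNode, key: str) -> bool:
    """Follow one vector component per character; fail on an empty slot."""
    node = root
    for ch in key:
        i = ALPHABET.find(ch)
        if i < 0 or node.children[i] is None:
            return False
        node = node.children[i]
    return node.is_key
```

Each lookup step indexes directly into the node's vector, so the cost of a search depends on the length of the key rather than on how many keys are stored.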
…
tries, have been analyzed by P. Kirschenhofer and H. Prodinger, Random Structures and Algorithms 5 (1994), 123-134. *Balanced filing schemes. Another combinatorial approach to information retrieval, based on balanced incomplete block designs, has been the subject of considerable investigation. Although the subject is quite interesting from a mathematical point of view
…
in detail in Marshall Hall, Jr.'s book Combinatorial Theory (Waltham, Mass.: Blaisdell, 1967). Although such combinatorial configurations are very beautiful, their main application to information retrieval so far has been to decrease the redundancy incurred when compound inverted lists are being used; and David K. Chow [Information and Control 15 (1969
…
. Infinity, 4, 138-139, 142-144, 156, 214, 257-258, 263, 521, 624-625, 646, 663-664, 685, 707. as sentinel, 159, 252, 308, 324. Information retrieval, 392, 395. Information theory, 183, 198, 442-444, 633. lower bounds from, 183, 194, 202, 204, 655. Inner loop: Part of
by Greg Nudelman and Pabini Gabriel-Petit · 8 May 2011
by Geoffrey C. Bowker · 24 Aug 2000
by Benjamin K. Bergen · 12 Sep 2016 · 364pp · 102,926 words
by Ben Goertzel and Pei Wang · 1 Jan 2007 · 303pp · 67,891 words
by Martin Ford · 13 Sep 2021 · 288pp · 86,995 words
by Foster Provost and Tom Fawcett · 30 Jun 2013 · 660pp · 141,595 words
by Gary Price, Chris Sherman and Danny Sullivan · 2 Jan 2003 · 481pp · 121,669 words
by Daniel J. Levitin · 18 Aug 2014 · 685pp · 203,949 words
by Trey Grainger and Timothy Potter · 14 Sep 2014 · 1,085pp · 219,144 words
by Ian Goldin and Chris Kutarna · 23 May 2016 · 437pp · 113,173 words
by Michael W. Berry and Murray Browne · 15 Jan 2005
by W. Richard Stevens, Bill Fenner, Andrew M. Rudoff · 8 Jun 2013
by Mark Masse · 19 Oct 2011 · 153pp · 27,424 words
by Ronald J. Deibert · 14 Aug 2020
by Safiya Umoja Noble · 8 Jan 2018 · 290pp · 73,000 words
by Dipanjan Sarkar · 1 Dec 2016
by Martin Kleppmann · 16 Mar 2017 · 1,237pp · 227,370 words
by Andy Oram · 26 Feb 2001 · 673pp · 164,804 words
by Emmanuel Goldstein · 28 Jul 2008 · 889pp · 433,897 words
by John Brockman · 18 Jan 2011 · 379pp · 109,612 words
by Stuart Russell and Peter Norvig · 14 Jul 2019 · 2,466pp · 668,761 words
by David Sawyer McFarland · 28 Oct 2011 · 924pp · 196,343 words
by Jim Jansen · 25 Jul 2011 · 298pp · 43,745 words
by Tom White · 29 May 2009 · 933pp · 205,691 words
by Toby Segaran and Jeff Hammerbacher · 1 Jul 2009
by Viktor Mayer-Schönberger · 1 Jan 2009 · 263pp · 75,610 words
by Michal Zalewski · 26 Nov 2011 · 570pp · 115,722 words
by Belinda Barnet · 14 Jul 2013 · 193pp · 19,478 words
by Dafydd Stuttard and Marcus Pinto · 30 Sep 2007 · 1,302pp · 289,469 words
by Justin Schuh · 20 Nov 2006 · 2,054pp · 359,149 words
by Clive Thompson · 11 Sep 2013 · 397pp · 110,130 words
by M. Mitchell Waldrop · 14 Apr 2001
by Rafal Kuc and Marek Rogozinski · 14 Aug 2013 · 480pp · 99,288 words
by Martin Kleppmann · 17 Apr 2017
by Nick Bostrom · 3 Jun 2014 · 574pp · 164,509 words
by Eric Enge, Stephan Spencer, Jessie Stricchiola and Rand Fishkin · 7 Mar 2012
by Matthew A. Russell · 15 Jan 2011 · 541pp · 109,698 words
by Steven Bird, Ewan Klein and Edward Loper · 15 Dec 2009 · 504pp · 89,238 words
by Jing Tsu · 18 Jan 2022 · 408pp · 105,715 words
by Geoffrey C. Bowker and Susan Leigh Star · 25 Aug 2000 · 357pp · 125,142 words
by David J. Leinweber · 31 Dec 2008 · 402pp · 110,972 words
by James Vlahos · 1 Mar 2019 · 392pp · 108,745 words
by Julie Steele · 20 Apr 2010
by Harold Abelson, Gerald Jay Sussman and Julie Sussman · 25 Jul 1996 · 893pp · 199,542 words
by Coingecko, Darren Lau, Sze Jin Teh, Kristian Kho, Erina Azmi, Tm Lee and Bobby Ong · 22 Mar 2020 · 135pp · 26,407 words
by Thierry Bardini · 1 Dec 2000
by Robin Sharp · 13 Feb 2008
by Steven Pinker · 1 Jan 1997 · 913pp · 265,787 words
by Thierry Poibeau · 14 Sep 2017 · 174pp · 56,405 words
by Harold Abelson, Gerald Jay Sussman and Julie Sussman · 1 Jan 1984 · 1,387pp · 202,295 words
by David Easley and Jon Kleinberg · 15 Nov 2010 · 1,535pp · 337,071 words
by Alistair Cockburn · 30 Sep 2000
by Steven Levy · 12 Apr 2011 · 666pp · 181,495 words
by Richard Susskind and Daniel Susskind · 24 Aug 2015 · 742pp · 137,937 words
by Lisa Gitelman · 26 Mar 2014
by Unknown · 13 Jan 2012 · 470pp · 109,589 words
by Adam Freeman · 25 Mar 2014 · 671pp · 228,348 words
by Justin Peters · 11 Feb 2013 · 397pp · 102,910 words
by Anthony T. Holdener · 25 Jan 2008 · 982pp · 221,145 words
by Zdravko Markov and Daniel T. Larose · 5 Apr 2007
by Martin Ford · 16 Nov 2018 · 586pp · 186,548 words
by John Markoff · 24 Aug 2015 · 413pp · 119,587 words
by Erik J. Larson · 5 Apr 2021
by John Markoff · 1 Jan 2005 · 394pp · 108,215 words
by Benjamin H. Bratton · 19 Feb 2016 · 903pp · 235,753 words
by Rachel Botsman and Roo Rogers · 2 Jan 2010 · 411pp · 80,925 words
by Leslie Sikos · 10 Jul 2015
by Douglas R. Dechow · 2 Jul 2015 · 223pp · 52,808 words
by Ray Kurzweil · 14 Jul 2005 · 761pp · 231,902 words
by Matthew Hindman · 24 Sep 2018
by Yves Hilpisch · 8 Dec 2020 · 1,082pp · 87,792 words
by John Brockman · 14 Feb 2012 · 416pp · 106,582 words
by Michael Harris · 6 Aug 2014 · 259pp · 73,193 words
by Mark Last, Abraham Kandel and Horst Bunke · 24 Jun 2004 · 205pp · 20,452 words
by John MacCormick and Chris Bishop · 27 Dec 2011 · 250pp · 73,574 words
by Adam Greenfield · 29 May 2017 · 410pp · 119,823 words
by Cherie L. Weible and Karen L. Janke · 15 Apr 2011 · 144pp · 55,142 words
by Toby Segaran · 17 Dec 2008 · 519pp · 102,669 words
by David Talbot · 5 Sep 2016 · 891pp · 253,901 words
by Kim Zetter · 11 Nov 2014 · 492pp · 153,565 words
by Angel Au-Yeung and David Jeans · 25 Apr 2023 · 427pp · 134,098 words
by Ralph Watson McElvenny and Marc Wortman · 14 Oct 2023 · 567pp · 171,072 words
by Manuel Castells · 31 Aug 1996 · 843pp · 223,858 words
by David Smiley and Eric Pugh · 15 Nov 2009 · 648pp · 108,814 words
by Siva Vaidhyanathan · 1 Jan 2010 · 281pp · 95,852 words
by Tom Clancy and Scott Brick · 2 Jan 2002
by Tim Berners-Lee · 8 Sep 2025 · 347pp · 100,038 words
by Larry Harris · 2 Jan 2003 · 1,164pp · 309,327 words
by Stephen Baker · 17 Feb 2011 · 238pp · 77,730 words
by Katie Hafner and Matthew Lyon · 1 Jan 1996 · 352pp · 96,532 words
by Alastair Reynolds · 14 Feb 2006 · 436pp · 124,373 words
by Simon Head · 14 Aug 2003 · 242pp · 245 words
by Shoshana Zuboff · 14 Apr 1988
by Jure Leskovec, Anand Rajaraman and Jeffrey David Ullman · 13 Nov 2014
by Steven Levy · 2 Feb 1994 · 244pp · 66,599 words
by Sonja Thiel and Johannes C. Bernhardt · 31 Dec 2023 · 321pp · 113,564 words
by Markus Krajewski and Peter Krapp · 18 Aug 2011 · 222pp · 74,587 words
by Daniel C. Dennett · 7 Feb 2017 · 573pp · 157,767 words
by Dariusz Jemielniak and Aleksandra Przegalinska · 18 Feb 2020 · 187pp · 50,083 words
by Jill Lepore · 14 Sep 2020 · 467pp · 149,632 words
by Takuro Sato · 17 Nov 2015
by Andrew B. King · 15 Mar 2008 · 597pp · 119,204 words
by James Gleick · 1 Mar 2011 · 855pp · 178,507 words
by Brian Christian and Tom Griffiths · 4 Apr 2016 · 523pp · 143,139 words
by John Seely Brown and Paul Duguid · 2 Feb 2000 · 791pp · 85,159 words
by David N. Blank-Edelman · 16 Sep 2018
by James Pustejovsky and Amber Stubbs · 14 Oct 2012 · 502pp · 107,510 words
by Robert N. Proctor · 28 Feb 2012 · 1,199pp · 332,563 words
by Sherry Turkle · 11 Jan 2011 · 542pp · 161,731 words
by Lawrence Freedman · 31 Oct 2013 · 1,073pp · 314,528 words
by Luke Dormehl · 10 Aug 2016 · 252pp · 74,167 words
by Pete Warden · 20 Sep 2011 · 58pp · 12,386 words
by Jodi Taylor · 8 Jan 2013
by Claire L. Evans · 6 Mar 2018 · 371pp · 93,570 words
by Jacob Silverman · 17 Mar 2015 · 527pp · 147,690 words
by Alec Nevala-Lee · 1 Aug 2022 · 864pp · 222,565 words
by Thomas H. Davenport and Julia Kirby · 23 May 2016 · 347pp · 97,721 words
by Alvin Toffler · 1 Jun 1984 · 286pp · 94,017 words
by Peter F. Hamilton · 18 Aug 2010 · 857pp · 232,302 words
by Aurélien Géron · 13 Mar 2017 · 1,331pp · 163,200 words
by Frank Pasquale · 17 Nov 2014 · 320pp · 87,853 words
by Gayle Laakmann Mcdowell · 25 Jan 2011 · 242pp · 71,938 words
by James Dale Davidson and William Rees-Mogg · 3 Feb 1997 · 582pp · 160,693 words
by Cal Newport · 5 Jan 2016
by Arvid Kahl · 24 Jun 2020 · 461pp · 106,027 words
by Donald Ervin Knuth · 15 Jan 2001
by David B. Agus · 15 Oct 2012 · 433pp · 106,048 words
by Peter Morville · 14 May 2014 · 165pp · 50,798 words
by Joe Karaganis · 3 May 2018 · 334pp · 123,463 words
by Benjamin R. Barber · 5 Nov 2013 · 501pp · 145,943 words
by Federico Biancuzzi and Shane Warden · 21 Mar 2009 · 496pp · 174,084 words
by Cory Doctorow · 15 Sep 2008 · 189pp · 57,632 words
by Nicholas Carr · 28 Jan 2025 · 231pp · 85,135 words
by Nathan L. Ensmenger · 31 Jul 2010 · 429pp · 114,726 words
by Pedro Domingos · 21 Sep 2015 · 396pp · 117,149 words
by Peter Seibel · 22 Jun 2009 · 1,201pp · 233,519 words
by Mariana Mazzucato · 1 Jan 2011 · 382pp · 92,138 words
by Pedro Gairifo Santos · 7 Nov 2011 · 353pp · 104,146 words
by Richard Susskind · 10 Jan 2013 · 160pp · 45,516 words
by Ray Kurzweil · 31 Dec 1998 · 696pp · 143,736 words
by Peter Marshall · 1 Feb 1997
by Anthony Mancuso · 2 Jan 1977
by Philip Augar · 20 Apr 2005 · 290pp · 83,248 words
by Ajay Agrawal, Joshua Gans and Avi Goldfarb · 16 Apr 2018 · 345pp · 75,660 words
by Margaret O'Mara · 8 Jul 2019
by Anson-QA
by Veljko Krunic · 29 Mar 2020
by Paul Adams · 1 Nov 2011 · 123pp · 32,382 words
by Robert F. Barsky · 2 Feb 1997
by Matt Taibbi · 8 Apr 2014 · 455pp · 138,716 words
by Guy Standing · 13 Jul 2016 · 443pp · 98,113 words
by Lynda Gratton and Andrew Scott · 1 Jun 2016 · 344pp · 94,332 words
by Library Of Congress and Carla Hayden · 3 Apr 2017
by Trent Hauck · 3 Nov 2014
by Pistono, Federico · 14 Oct 2012 · 245pp · 64,288 words
by Jon Gertner · 15 Mar 2012 · 550pp · 154,725 words
by Adam Fisher · 9 Jul 2018 · 611pp · 188,732 words
by Wallace W. Kravitz · 30 Apr 1990
by Jamie Bartlett · 4 Apr 2018 · 170pp · 49,193 words
by Cathy O'Neil and Rachel Schutt · 8 Oct 2013 · 523pp · 112,185 words
by John Brockman · 19 Feb 2019 · 339pp · 94,769 words
by Jonathan Taplin · 17 Apr 2017 · 222pp · 70,132 words
by Sales, Leonard John. · 19 Sep 2008
by Dee Maldon · 16 Mar 2010 · 32pp · 7,759 words
by Yarden Katz
by Joanna Walsh · 22 Sep 2025 · 255pp · 80,203 words
by Adam Lashinsky · 31 Mar 2017 · 190pp · 62,941 words
by David S. Abraham · 27 Oct 2015 · 386pp · 91,913 words
by Frank Pasquale · 14 May 2020 · 1,172pp · 114,305 words
by Russell-Jones, Neil. · 21 Mar 2008
by Richard Dawkins · 1 Jan 1976 · 365pp · 117,713 words