description: process of analysing text to extract information from it
44 results
by Nicolas Niarchos · 20 Jan 2026 · 654pp · 170,150 words
investor who, along with his brother Jean-Raymond, would aid the rebellion in search of diamonds and other raw materials. mine entrepreneur of the era: Congo mine entrepreneur, interview with the author, August 2023. his fellow rebel leaders: How Kabila
by Brad Feld · 8 Oct 2012 · 169pp · 56,250 words
two individuals were sharing freely on the Internet. I met this small team sitting in the old fishing factory in the Reykjavik Harbor, working on text mining. They were each younger than 25 years old and called their company CLARA. They wanted to build a software-as-a-service company that helped
by Jeremy Howard, Mike Loukides and Margit Zwemer · 23 Mar 2012 · 23pp · 5,264 words
-made disasters. Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools
by Eric Siegel · 19 Feb 2013 · 502pp · 107,657 words
Origins of the Digital Universe (Pantheon Books, 2012). Natural language processing: Dursun Delen, Andrew Fast, Thomas Hill, Robert Nisbet, John Elder, and Gary Miner, Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications (Academic Press, 2012). James Allen, Natural Language Understanding, 2nd ed. (Addison-Wesley, 1994). Regarding the translation
…
global technology company: Thanks to Dean Abbott, Abbott Analytics (http://abbottanalytics.com/index.php) for information about this case study. “Inductive Business-Rule Discovery in Text Mining.” Text Analytics World San Francisco Conference, March 7, 2012, San Francisco, CA. www.textanalyticsworld.com/sanfrancisco/2012/agenda/full-agenda#day11040–11–2. Leading payments
…
more, see: www.campana.com/axis/products/SSERS0308.pdf. Analytics leaders: G. Miner, D. Delen, J. Elder, A. Fast, T. Hill, and B. Nisbet, Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications (Academic Press, 2012), Part II, Tutorial B, p. 181, by Jennifer Thompson and Thomas Hill. Continental
…
-datafest. For final results, see http://stat.duke.edu/datafest/results. G. Miner, D. Delen, J. Elder, A. Fast, T. Hill, and B. Nisbet, Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications (Academic Press, 2012), Part II, Tutorial K, p. 417, by Richard Foley of SAS. Tap directly
…
.kiva.org. U.S. Social Security Administration: Thanks to John Elder, PhD, Elder Research, Inc. (www.datamininglab.com), for this case study. John Elder, PhD, “Text Mining to Fast-Track Deserving Disability Applicants,” Elder Research, Inc., August 7, 2010. http://videolectures.net/site/normal_dl/tag=73772/kdd2010_elder_tmft_01.pdf
…
. John Elder, PhD, “Text Mining: Lessons Learned,” Text Analytics World San Francisco Conference, March 7, 2012, San Francisco, CA. www.textanalyticsworld.com/sanfrancisco/2012/agenda/full-agenda#day1520–605. Infinity
…
/xpl/freeabs_all.jsp?reload=true&arnumber=5771407. Researchers (lie detection): G. Miner, D. Delen, J. Elder, A. Fast, T. Hill, and B. Nisbet, Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications (Academic Press, 2012), Part II, Tutorial N, p. 509. Christie M. Fuller, David P. Biros, and
…
-away-in-their-tweets_b14805. Analytics leaders and a psychiatry professor: G. Miner, D. Delen, J. Elder, A. Fast, T. Hill, and B. Nisbet, Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications (Academic Press, 2012), Part II, Tutorial I, p. 395. University of California, Berkeley: These researchers write
…
tax refunds TCP/IP Telenor (Norway) Teragram terrorism, predicting Tesco (UK) test data test preparation, predicting Tetlock, Paul text analytics Text Analytics World text data text mining. See text analytics thought and understanding, PA for thoughtcrimes Tian, Lu Titanic (ship) tobacco. See smoking and smokers Tolstoy, Leo traffic, predicting train tracks, predicting
by Giulio Boccaletti · 13 Sep 2021 · 485pp · 133,655 words
–1947. Manchester, UK: Manchester University Press, 1987. Brown, Peter. Augustine of Hippo: A Biography. Berkeley: University of California Press, 2000. Brumfield, Sara. “Imperial Methods: Using Text Mining and Social Network Analysis to Detect Regional Strategies in the Akkadian Empire.” PhD diss., UCLA, 2013. Bruno, Giovanni. “Capitale straniero e industria elettrica nell’Italia
by Foster Provost and Tom Fawcett · 30 Jun 2013 · 660pp · 141,595 words
tried to be consistent with typography, reserving fixed-width typewriter fonts like sepal_width to indicate attributes or keywords in data. For example, in the text-mining chapter, a word like 'discussing' designates a word in a document while discuss might be the resulting token in the data. The following typographical conventions
…
dedicated pre-processing steps and sometimes specific expertise on the part of the data science team. Entire books and conferences (and companies) are devoted to text mining. In this chapter we can only scratch the surface, to give a basic overview of the techniques and issues involved in typical business applications. First
…
steps to transform a body of text into a set of data that can be fed into a data mining algorithm. The general strategy in text mining is to use the simplest (least expensive) technique that works. Nevertheless, these ideas are the key technology underlying much of web search, like Google and
…
or other linguistic analysis. It performs surprisingly well on a variety of tasks, and is usually the first choice of data scientists for a new text mining problem. Still, there are applications for which bag of words representation isn’t good enough and more sophisticated techniques must be brought to bear. Here
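The bag-of-words representation mentioned in this excerpt can be sketched in a few lines. This is a minimal illustration, not the book's own code: the `tokenize` and `bag_of_words` helpers and the two sample sentences are assumptions made up for the example, and the tokenizer simply lowercases and keeps alphabetic runs.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, rows), with one term-count row per document.

    Word order and grammar are discarded; only per-document counts remain.
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical sample documents, chosen only for illustration.
docs = [
    "Text mining finds patterns in text.",
    "Data mining finds patterns in data.",
]
vocab, rows = bag_of_words(docs)
print(vocab)
print(rows)
```

Each row can then be fed to an ordinary data mining algorithm as a feature vector, which is the sense in which the excerpt calls this the usual first choice for a new problem.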
…
will see again in the movie recommendation example in Chapter 12). Example: Mining News Stories to Predict Stock Price Movement To illustrate some issues in text mining, we introduce a new predictive mining task: we’re going to predict stock price fluctuations based on the text of news stories. Roughly speaking, we
…
news recommendation. From this point of view, there is a huge stream of market news coming in—some interesting, most not. We’d like predictive text mining to recommend interesting news stories that we should pay attention to. What’s an interesting story? Here we’ll define it as news that will
…
the problem further to make it more tractable (in fact, this task is a good example of problem formulation as much as it is of text mining). Here are some of the problems and simplifying assumptions: It is difficult to predict the effect of news far in advance. With many stocks, news
…
stories are pre-tagged with stocks, which are mostly accurate (Sidebar: The News Is Messy goes into some details on why this is a difficult text mining problem). Almost all stories have timestamps (those without are discarded) so we can align them with the correct day and trading window. Because we want
…
. A paper by Mao et al. (2011) provides a good analysis and comparison of the effect of these additional sources. Finally, though it’s not text mining per se, let us mention the paper “Legislating Stock Prices” by Cohen, Diether, and Malloy (2012). These researchers examined the relationship of politicians, legislation, and
…
details of algorithms and applications, and think instead about the fundamentals, we might notice that the ideas discussed in the example of problem formulation for text mining (Chapter 10) would apply very well here—even though this example has nothing to do with text. When mining data on documents, we often ignore
…
, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27:415-444. Mittermayer, M., & Knolmayer, G. (2006). Text mining systems for market response to news: A survey. Working Paper No.184, Institute of Information Systems, University of Bern. Muoio, A. (1997). They have a
by Mehmed Kantardzić · 2 Jan 2003 · 721pp · 197,134 words
EFFICIENCY OF THE APRIORI ALGORITHM 10.5 FP GROWTH METHOD 10.6 ASSOCIATIVE-CLASSIFICATION METHOD 10.7 MULTIDIMENSIONAL ASSOCIATION–RULES MINING 11 WEB MINING AND TEXT MINING 11.1 WEB MINING 11.2 WEB CONTENT, STRUCTURE, AND USAGE MINING 11.3 HITS AND LOGSOM ALGORITHMS 11.4 MINING PATH–TRAVERSAL PATTERNS 11
…
.5 PAGERANK ALGORITHM 11.6 TEXT MINING 11.7 LATENT SEMANTIC ANALYSIS (LSA) 12 ADVANCES IN DATA MINING 12.1 GRAPH MINING 12.2 TEMPORAL DATA MINING 12.3 SPATIAL DATA MINING
…
aspects of local modeling in large data sets are addressed in Chapter 10, and common techniques of association-rule mining are presented. Web mining and text mining are becoming central topics for many researchers, and the resulting new algorithms are summarized in Chapter 11. There are a
…
understood pseudo-code, and they are suitable for use in real-world, large-scale data-mining projects, including advanced applications such as Web mining and text mining. Hand, D., H. Mannila, P. Smyth, Principles of Data Mining, MIT Press, Cambridge, MA, 2001. The book consists of three sections. The first, foundations, provides
…
applications of the SVM have been in image processing, in particular handwritten digit recognition and face recognition. Other interesting application areas for SVMs are in text mining and categorization of large collections of documents, and in the analysis of genome sequences in bioinformatics. Furthermore, the SVM has been successfully used in a
…
-code and they are suitable for use in real-world, large-scale data-mining projects including advanced applications such as Web mining and text mining. 11 WEB MINING AND TEXT MINING Chapter Objectives Explain the specifics of Web mining. Introduce a classification of basic Web-mining subtasks. Illustrate the possibilities of Web mining using
…
Hyperlink-Induced Topic Search (HITS), LOGSOM, and Path Traversal algorithms. Describe query-independent ranking of Web pages. Formalize a text-mining framework specifying the refining and distillation phases. Outline latent semantic indexing. 11.1 WEB MINING In a distributed information environment, documents or objects are usually
…
in a simplified form in this text, had numerous modifications to evolve into the current commercial version implemented in the Google search engine. 11.6 TEXT MINING Enormous amounts of knowledge reside today in text documents that are stored either within organizations or are freely available. Text databases are rapidly growing because
…
publications, digital libraries, e-mail, and the World Wide Web. Data stored in most text databases are semi-structured, and special data-mining techniques, called text mining, have been developed for discovering new information from large collections of textual data. In general, there are two key technologies that make online
…
text mining possible. One is Internet search capabilities and the other is the text analysis methodology. Internet search has been around for a few years. With the
…
in the graph. Content-based analysis and partition of documents is a more complicated problem. Some progress has been made along these lines, and new text-mining techniques have been defined, but no standards or common theoretical background has been established in the domain. Generally, you can think of text categorization as
…
the efficiency and effectiveness of a search process to find similar or related information; and 4. to detect duplicate information or documents in an archive. Text mining is an emerging set of functionalities that are primarily built on text-analysis technology. Text is the most common vehicle for the formal exchange of
…
commercial text-retrieval systems are based on inverted text indices composed of statistics such as word occurrence per document, text mining must provide values beyond the retrieval of text indices such as keywords. Text mining is about looking for semantic patterns in text, and it may be defined as the process of analyzing text
…
to extract interesting, nontrivial information that is useful for particular purposes. As the most natural form of storing information is text, text mining is believed to have a commercial potential even higher than that of traditional data mining with structured data. In fact, recent studies indicate that 80
…
% of a company’s information is contained in text documents. Text mining, however, is also a much more complex task than traditional data mining as it involves dealing with unstructured text data that are inherently ambiguous
…
. Text mining is a multidisciplinary field involving IR, text analysis, information extraction, natural language processing, clustering, categorization, visualization, machine learning, and other methodologies already included in the
…
-intelligence gathering, e-mail management, claim analysis, e-procurement, and automated help desk are only a few of the possible applications where text mining can be deployed successfully. The text-mining process, which is graphically represented in Figure 11.6, consists of two phases: text refining, which transforms free-form text documents into a
…
chosen intermediate form (IF), and knowledge distillation, which deduces patterns or knowledge from an IF. Figure 11.6. A text-mining framework. An IF can be semi-structured, such as a conceptual-graph representation, or structured, such as a relational-data representation. IFs with varying degrees
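The two phases described in this excerpt, text refining into an intermediate form and knowledge distillation from it, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the framework of Figure 11.6: the relational IF of `(doc_id, term, count)` rows, the `refine` and `distill` function names, and the "terms shared by several documents" pattern are all hypothetical choices. The sample texts reuse three document titles quoted in a later exercise on this page.

```python
import re
from collections import Counter, defaultdict


def refine(documents):
    """Text refining: turn free-form text into a structured intermediate form.

    The IF here is a relational-style list of (doc_id, term, count) rows,
    which is an assumption made for this sketch.
    """
    rows = []
    for doc_id, text in documents.items():
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        rows.extend((doc_id, term, n) for term, n in sorted(counts.items()))
    return rows


def distill(rows, min_docs=2):
    """Knowledge distillation: derive a simple pattern from the IF.

    The pattern here is the set of terms shared by at least min_docs documents.
    """
    docs_per_term = defaultdict(set)
    for doc_id, term, _count in rows:
        docs_per_term[term].add(doc_id)
    return {
        term: sorted(doc_ids)
        for term, doc_ids in docs_per_term.items()
        if len(doc_ids) >= min_docs
    }


documents = {
    "A": "Web-content mining",
    "B": "Web-structure mining",
    "D": "Text mining",
}
shared = distill(refine(documents))
print(shared)
```

Keeping the two phases as separate functions mirrors the split in the text: a different IF, or a different distillation step such as clustering, can be swapped in without touching the other phase.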
…
concepts. These semantic-analysis methods are computationally expensive, and it is a challenge to make them more efficient and scalable for very large text corpora. Text-mining operations such as predictive modeling and association discovery fall in this category. A document-based IF can be transformed into a concept-based IF by
…
concept-based is a domain-dependent representation. Text-refining and knowledge-distillation functions as well as the IF adopted are the basis for classifying different text-mining tools and their corresponding techniques. One group of techniques, and some recently available commercial products, focuses on document organization, visualization, and navigation. Another group focuses
…
on text-analysis functions, IR, categorization, and summarization. An important and large subclass of these text-mining tools and techniques is based on document visualization. The general approach here is to organize documents based on their similarities and present the groups or
…
clusters of the documents as 2-D or 3-D graphics. IBM’s Intelligent Miner and SAS Enterprise Miner are probably the most comprehensive text-mining products today. They offer a set of text-analysis tools that include tools for feature-extraction, clustering, summarization, and categorization; they also incorporate a text
…
search engine. More examples of text-mining tools are given in Appendix A. Domain knowledge, not used and analyzed by any currently available text-mining tool, could play an important role in the text-mining process. Specifically, domain knowledge can be used as early as in the text
…
a part in knowledge distillation to improve learning efficiency. All these ideas are still in their infancy, and we expect that the next generation of text-mining techniques and tools will improve the quality of information and knowledge discovery from text. 11.7 LATENT SEMANTIC ANALYSIS (LSA) LSA is a method that
…
two dimensions. VT can be thought of as a transformation of the original document matrix and can be used for any of a number of text-mining tasks, such as classification and clustering. Improved results may be obtained using the newly derived dimensions, comparing against the data-mining tasks using the original
…
word counts. In most text-mining cases, all training documents (in our case 4) would have been included in matrix A and transformed together. Document five (d5) was intentionally left out
…
the reduced set of two dimensions. Table 11.5 shows the same comparison using cosine similarity to compare documents. Cosine similarity is often used in text-mining tasks for document comparisons. TABLE 11.4. Use of Euclidean Distance to Find Nearest Neighbor to d5 in Both 2-d and 10-d (the
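Cosine similarity between two term-count vectors is the dot product divided by the product of the vector lengths. The sketch below shows that calculation and a nearest-neighbor lookup; the vectors are made-up values for illustration and are not the entries of Tables 11.4 or 11.5, and the `nearest` helper is a hypothetical name.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


def nearest(query, vectors):
    """Return the name of the stored vector most similar to the query."""
    return max(vectors, key=lambda name: cosine_similarity(query, vectors[name]))


# Hypothetical term-count vectors over a three-term vocabulary.
vectors = {"d1": [2, 1, 0], "d2": [0, 1, 2]}
query = [1, 1, 0]
print(nearest(query, vectors))
```

Because the measure depends only on direction, a long document and a short one with the same term proportions score as identical, which is one reason it is preferred over Euclidean distance for document comparison.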
…
MRS. 4. Given the following text documents and assumed decomposition: Document Text A Web-content mining B Web-structure mining C Web-usage mining D Text mining (a) create matrix A by using term counts from the original documents; (b) obtain rank 1, 2, and 3 approximations to the document representations; (c
…
parts (a) and (b). How quickly would you say that the scores converged? Explain. 6. Why is the text-refining task very important in a text-mining process? What are the results of text refining? 7. Implement the HITS algorithm and discover authorities and hubs if the input is the table of
…
and data are, the better the knowledge that can be extracted from them. In the second part of the book, basic concepts and techniques on text mining, Web mining, and Web crawling are introduced. A case study, in the last part of the book, focuses on a search engine prototype called EnviroDaemon
…
understood pseudo-code, and they are suitable for use in real-world, large-scale data-mining projects, including advanced applications such as Web mining and text mining. Mulvenna, M. D., et al., ed., Personalization on the Net Using Web Mining, CACM, Vol. 43, No. 8, 2000. This is a collection of articles
…
Data, Text and Web Mining Software, Kybernetes, Vol. 39, No. 4, 2010, pp. 625–655. The paper reviews and compares selected software for data mining, text mining, and Web mining that are not available as free open-source software. The software for data mining are SAS® Enterprise Miner™, Megaputer PolyAnalyst® 5.0
…
, NeuralWare Predict®, and BioDiscovery GeneSight®. The software for text mining are CompareSuite, SAS® Text Miner, TextAnalyst, VisualText, Megaputer PolyAnalyst® 5.0, and WordStat. The software for Web mining are Megaputer PolyAnalyst®, SPSS Clementine®, ClickTracks, and
…
QL2. The paper discusses and compares the existing features, characteristics, and algorithms of selected software for data mining, text mining, and Web mining, respectively. 12 ADVANCES IN DATA MINING Chapter Objectives Analyze the characteristics of graph-mining algorithms and introduce some illustrative examples. Identify the
…
their effective methods. Additionally, the description of the tasks provides great insight into original approaches to using data mining with clickstream data. A.4.5 Text Mining Reuters-21578 Text Categorization Collection. http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html This is a collection of news articles that appeared on Reuters
…
workstations. RapidMiner Publisher: Rapid-I (http://rapid-i.com) Rapid-I provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining. The company concentrates on automatic intelligent analyses on a large-scale base, that is, for large amounts of structured data-like database systems and unstructured
…
an OLAP front end. EWA Systems Vendor: EWA Systems Inc. (www.ewasystems.com) EWA Systems provide enterprise analytics solutions: Math and statistics libraries, data mining, text mining, optimization, visualization, and rules engine software are all available from one coordinated source. EWA Systems’ ability to tackle such a broad range of analytical solutions
…
). SPAD Vendor: Coheris (www.coheris.fr) SPAD, provides powerful exploratory analyses and data-mining tools, including PCA, clustering, interactive decision trees, discriminant analyses, neural networks, text mining and more, all via user-friendly GUI. Viscovery Data Mining Suite Vendor: Viscovery (www.viscovery.net) The Viscovery® Data Mining Suite offers a selection of
…
World Wide Web: An Information Search Approach, Kluwer Academic Publishers, Boston, MA, 2001. Fan, F., L. Wallace, S. Rich, Z. Zhang, Tapping the Power of Text Mining, Communications of ACM, Vol. 49, No. 9, 2006, pp. 76–82. Garcia, E., SVD and LSI Tutorial 4: Latent Semantic Indexing (LSI) How-to Calculations
…
Analysis and Data Mining Applications, R. Nisbet, J. Elder, J. F. Elder, G. Miner, eds., Academic Press, Amsterdam, NL, 2009, pp. 151–172. Sirmakessis, S., Text Mining and Its Applications, Springer-Verlag, Berlin, 2003. Zhang, Q., R. S. Segall, Review of Data, Text and Web Mining Software, Kybernetes, Vol. 39, No. 4
…
Support Survey plot Survival data Synapse System identification Tchebyshev distance Temporal data Mining Sequences Time series Test of hypothesis Testing sample Text analysis Text database Text mining Text-refining Time lag (time window) Time series, multivariate Time series, univariate Training sample Transduction Traveling salesman problem (TSP) Trial and error True risk functional
by Jiawei Han, Micheline Kamber and Jian Pei · 21 Jun 2011
Yu, Jeffrey X. Yu, Philip S. Yu, Maria Zemankova, ChengXiang Zhai, Yuanyuan Zhou, and Wei Zou. Deng Cai and ChengXiang Zhai have contributed to the text mining and Web mining sections, Xifeng Yan to the graph mining section, and Xiaoxin Yin to the multirelational data mining section. Hong Cheng, Charios Ermopoulos, Hector
…
such as digital libraries, digital governments, and health care information systems. Their effective search and analysis have raised many challenging issues in data mining. Therefore, text mining and multimedia data mining, integrated with information retrieval methods, have become increasingly important. 1.6. Which Kinds of Applications Are Targeted? Where there are data
…
each group has its own manager. Alternatively, other methods partition data objects hierarchically, where clusters can be formed at different semantic levels. For example, in text mining, we may want to organize a corpus of documents into multiple general topics, such as “politics” and “sports,” each of which may have subtopics. For
…
each object is assigned a probability of belonging to a cluster. Probabilistic model-based clustering is widely used in many data mining applications such as text mining. ■ Clustering high-dimensional data: When the dimensionality is high, conventional distance measures can be dominated by noise. Section 11.2 introduces fundamental methods for cluster
…
. Other topics in multimedia mining include classification and prediction analysis, mining associations, and video and audio data mining (Section 13.2.3). Mining Text Data Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. A substantial portion of information is stored as
…
text such as news articles, technical papers, books, digital libraries, email messages, blogs, and web pages. Hence, research in text mining has been very active. An important goal is to derive high-quality information from text. This is typically done through the discovery of patterns and
…
trends by means such as statistical pattern learning, topic modeling, and statistical language modeling. Text mining usually requires structuring the input text (e.g., parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent
…
). This is followed by deriving patterns within the structured data, and evaluation and interpretation of the output. “High quality” in text mining usually refers to a combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity
…
-relation modeling (i.e., learning relations between named entities). Other examples include multilingual data mining, multidimensional text analysis, contextual text mining, and trust and evolution analysis in
…
text data, as well as text mining applications in security, biomedical literature analysis, online media analysis, and analytical customer relationship management. Various kinds of
…
text mining and analysis software and tools are available in academic institutions, open-source forums, and industry. Text mining often also uses WordNet, Semantic Web, Wikipedia, and other information sources to enhance the understanding and mining of text
…
]; Grossman and Frieder [GR04]; Baeza-Yates and Ribeiro-Neto [BYRN11]; Zhai [Zha08]; Feldman and Sanger [FS06]; Berry [Ber03]; and Weiss, Indurkhya, Zhang, and Damerau [WIZD04]. Text mining is a fast-developing field with numerous papers published in recent years, covering many topics such as topic models (e.g., Blei and Lafferty [BL09
…
]); sentiment analysis (e.g., Pang and Lee [PL07]); and contextual text mining (e.g., Mei and Zhai [MZ06]). Web mining is another focused theme, with books like Chakrabarti [Cha03a], Liu [Liu06] and Berry [Ber03]. Web mining has
…
. ed. (2008) MIT Press, Cambridge, MA . [Ber81] Bertin, J., Graphics and Graphic Information Processing. (1981) Walter de Gruyter, Berlin . [Ber03] Berry, M.W., Survey of Text Mining: Clustering, Classification, and Retrieval. (2003) Springer, New York . [Bez81] Bezdek, J.C., Pattern Recognition with Fuzzy Objective Function Algorithms. (1981) Plenum Press . [BFOS84] Breiman, L
…
Techniques: For Marketing, Sales, and Customer Relationship Management. (2004) John Wiley & Sons . [BL09] Blei, D.; Lafferty, J., Topic models, In: (Editors: Srivastava, A.; Sahami, M.) Text Mining: Theory and Applications Taylor and Francis. (2009). [BLC+03] Barbará, D.; Li, Y.; Couto, J.; Lin, J.-L.; Jajodia, S., Bootstrapping a data mining intrusion
…
generalization of on-line learning and an application to boosting, J. Computer and System Sciences 55 (1997) 119–139. [FS06] Feldman, R.; Sanger, J., The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. (2006) Cambridge University Press . [FSGM+98] Fang, M.; Shivakumar, N.; Garcia-Molina, H.; Motwani, R.; Ullman, J.D
…
-SIGMOD Int. Conf. Management of Data (SIGMOD’97) Tucson, AZ. (May 1997), pp. 452–461. [MZ06] Mei, Q.; Zhai, C., A mixture model for contextual text mining, In: Proc. 2006 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD’06) Philadelphia, PA. (Aug. 2006), pp. 649–655. [NB86] Niblett, T.; Bratko, I
…
, In: Proc. 4th Int. Conf. Information and Knowledge Management Baltimore, MD. (Nov. 1995), pp. 25–30. [WIZD04] Weiss, S.; Indurkhya, N.; Zhang, T.; Damerau, F., Text Mining: Predictive Methods for Analyzing Unstructured Information. (2004) Springer, New York . [WK91] Weiss, S.M.; Kulikowski, C.A., Computer Systems That Learn: Classification and Prediction Methods
…
14 term-frequency vectors 77 cosine similarity between 78 sparse 77 table 77 terminating conditions 404 test sets 330 test tuples 330 text data 14 text mining 596–597, 624 theoretical foundations 600–601, 625 three-layer neural networks 399 threshold-moving approach 385 tilted time windows 598 timeliness, data 85 time
by Sau Sheong Chang · 27 Jun 2012
Shapiro and Jeff Dasovich’s ratios are pretty low. This means that both Richard and Jeff sent Steve lots of messages, but he rarely replied. Text Mining We’ve done quite a bit of email message counting in this chapter so far. It would be nice if we could go a bit
…
and To email addresses, it pulls out only the date and the text body of the message. Example 5-15. Creating a data source for text mining with messages from Gmail require 'csv' require 'mail' require 'nokogiri' def write_row(mail, csv) data = [] data << mail.date text = "" if mail.text_part text
…
pretty big, depending on how many messages we take. Getting the body of the Enron messages for text mining is very similar, as shown in Example 5-16. Example 5-16. Creating a data source for text mining from Enron email files require 'csv' require 'mail' def write_row(mail, csv) data = [] data << mail
…
, while the sent_txt_data_enron.csv is smaller, at around 5 MB. Now that we have the data, we’ll turn to the actual text mining in R. This script is different from the earlier ones. In the previous scripts, we used mainly the core packages and functions that R provides
…
processing work. In this script, we’ll be taking out the big guns and using one of the more popular text mining packages around, aptly named tm (did you guess it stands for “text mining”? If so, you’re right). What we want to do in this analysis is find out, for each month
…
that dataset. The code we have written might be simple, but the insights could be significant. I have ventured a bit into the territory of text mining, but overall we’ve barely scraped the surface of what could be done. The tm library, for example, is extremely powerful for
…
text mining, and various other text mining packages have been built on it as well. A few things you should take note of (especially for text mining) before you wend your way to mining your mailbox: The Enron dataset was cleaned up
…
be wild and unruly, so your mileage will definitely vary. The Enron dataset comprises office email accounts derived from Exchange and Outlook files. For the text mining section, you will definitely want to tweak the write_row method to give you better results. The message format in the Enron dataset follows that
…
Contact Us conventions used in this book, Conventions Used in This Book cor() function, R, The R Console Core library, Ruby, Requiring External Libraries corpus, Text Mining correlation, R, The R Console CRAN (Comprehensive R Archive Network), Packages CSV (comma-separated value) files, Importing data from text files, The First Simulation–The
…
Day of Week–Number of Messages by Hour of the Day, Interactions–Comparative Interactions, Text Mining–Text Mining charts for, Number of Messages by Day of the Month–Number of Messages by Hour of the Day content of messages, analyzing, Text Mining–Text Mining data for, Grab and Parse–Grab and Parse Enron data for, The Emailing Habits
…
restrooms example) stopwatch, Shoes stopwatch–Shoes stopwatch expressions, R, Programming R external libraries, Ruby, Requiring External Libraries–Requiring External Libraries F factor() function, R, Factors, Text Mining factors, R, Factors–Factors FFmpeg library, Extracting Data from Video, Extracting Data from Video field of vision (FOV), Roids fish, schools of, Schooling Fish and
…
, Ruby, Symbols T table() function, R, Interpreting the Data, Number of Messages by Day of the Month term-document matrix, Text Mining ternary conditional expression, Ruby, if and unless text document, Text Mining text files, Importing data from text files, Importing data from text files, The Emailing Habits of Enron Executives (see also CSV
…
files) email message data in, The Emailing Habits of Enron Executives importing data from, R, Importing data from text files text mining, Text Mining–Text Mining The Grammar of Graphics (Springer
…
), Introducing ggplot2 tm library, Text Mining U Ubuntu system, installing Ruby on, Installing Ruby using your platform’s package management tool UI toolkits, Shoes toolkit, Shoes
by Paul R. Daugherty and H. James Wilson · 15 Jan 2018 · 523pp · 61,179 words
brewed by AI. It translates its customer feedback, directed through Facebook Messenger, into recipe changes, which affect the brew composition over time.a Lenovo uses text-mining tools to listen to customers voicing their problems worldwide. Insights from discussions of those problems then feed into product and service improvements.b Las Vegas
by Matthew A. Russell · 15 Jan 2011 · 541pp · 109,698 words
by Cathy O'Neil and Rachel Schutt · 8 Oct 2013 · 523pp · 112,185 words
by Igor Tulchinsky · 30 Sep 2019 · 321pp
by Rob Kitchin,Tracey P. Lauriault,Gavin McArdle · 2 Aug 2017
by Jan Erik Solem · 26 Jun 2012
by Thomas H. Davenport and Jeanne G. Harris · 6 Mar 2007 · 233pp · 67,596 words
by Peter Gutmann
by Lisa Gitelman · 25 Jan 2013
by Stuart Russell and Peter Norvig · 14 Jul 2019 · 2,466pp · 668,761 words
by Thomas H. Davenport and Jinho Kim · 10 Jun 2013 · 204pp · 58,565 words
by Glyn Moody · 26 Sep 2022 · 295pp · 66,912 words
by Q. Ethan McCallum · 14 Nov 2012 · 398pp · 86,855 words
by Thomas H. Davenport · 4 Feb 2014
by Eric Enge, Stephan Spencer, Jessie Stricchiola and Rand Fishkin · 7 Mar 2012
by Douglas B. Laney · 4 Sep 2017 · 374pp · 94,508 words
by Samuel Arbesman · 31 Aug 2012 · 284pp · 79,265 words
by Eric Topol · 1 Jan 2019 · 424pp · 114,905 words
by Matthew G. Kirschenbaum · 1 May 2016 · 519pp · 142,646 words
by Toby Segaran and Jeff Hammerbacher · 1 Jul 2009
by Steven Bird, Ewan Klein and Edward Loper · 15 Dec 2009 · 504pp · 89,238 words
by Jared Sullivan · 15 Oct 2024 · 545pp · 147,673 words
by James Pustejovsky and Amber Stubbs · 14 Oct 2012 · 502pp · 107,510 words
by Don Tapscott and Anthony D. Williams · 28 Sep 2010 · 552pp · 168,518 words
by Drew Conway and John Myles White · 10 Feb 2012 · 451pp · 103,606 words
by Jonathan Gray, Lucy Chambers and Liliana Bounegru · 9 May 2012
by Henry Grabar · 8 May 2023 · 413pp · 115,274 words
by Greg Nudelman and Pabini Gabriel-Petit · 8 May 2011
by Drew Conway and John Myles White · 25 Oct 2011 · 163pp · 42,402 words
by Tom Chivers and David Chivers · 18 Mar 2021 · 172pp · 51,837 words
by Dipanjan Sarkar · 1 Dec 2016
by Joshua B. Smith · 30 Sep 2006
by Ron Jeffries · 14 Aug 2015 · 444pp · 118,393 words
by Leslie Sikos · 10 Jul 2015
by Drew Neil · 2 May 2018 · 241pp · 43,252 words