description: standardized and organized sets of words and phrases for retrieval and disambiguation of information, distinguishing preferred terms from non-preferred terms
19 results
Business Metadata: Capturing Enterprise Knowledge
by
William H. Inmon
,
Bonnie K. O'Neil
and
Lowell Fryman
Published 15 Feb 2008
Some of these concepts, such as taxonomy and glossaries, have been covered in other chapters, especially in Chapter 4. Most of these frameworks are geared toward the machine part of the equation and often leave the humans confused, which means they are ignoring business metadata. 11.4.1 Controlled Vocabulary A controlled vocabulary (CV) provides a means to restrict term usage to those terms specified in the controlled vocabulary. A CV often includes “preferred terms” that should be used instead of the referenced term. Preferred terms keep search engines from having to reference all terms and can therefore add speed. One of my clients, going through a very lengthy and painful migration to a new CRM system, wanted to phase out all the legacy terms used before the
[Figure: "From Search to Knowing", a spectrum of semantic frameworks running from weak semantics (controlled vocabulary, glossary, list) to strong semantics (higher-order logic), with intermediate points including relational model and XML, taxonomy, DB schema and XML Schema, ER model, topic map, thesaurus, RDF/S, UML, OWL, conceptual model, description logic, first-order logic, logical theory, modal logic, and second-order logic; labeled bands mark syntactic, structural, and semantic interoperability. Source: Dr.]
…
(In other words, if you don’t understand what a term means that is used in a definition, you look it up.) 4.3.1 Components of a Definition Use of controlled vocabulary and thesaurus concepts helps software manage glossaries and can also empower enterprise search capabilities. They can also assist us in writing more complete and comprehensive definitions. Acronyms and specialized terms are indicated in italics and parentheses. For more information about controlled vocabularies and thesauri, see Chapter 11, Semantics. Here are the components of a well-written definition. Not all components are required for every definition, but the more you have, the more precise the definition. 1.
…
Dictionaries, indexes, and search engines often have a “SEE ALSO” section that lists related terms. 8. Synonyms: Terms that mean nearly the same thing as the term being defined. Controlled vocabularies often handle synonyms using a synonym ring, defining one term as the Preferred Term (PT). The language “USE FOR” is indicated in a controlled vocabulary when preferred terms are used. Example: Suppose Division and Department mean the same thing in an organization; they are synonyms, but suppose Division was the Preferred Term. When you look up Department, the glossary will state “Division: USE FOR Department.”
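Thesaurus standards such as ANSI/NISO Z39.19 express this relationship with reciprocal references: USE on the non-preferred term (Department USE Division) and USE FOR, abbreviated UF, on the preferred term (Division UF Department). The following is a minimal Python sketch of that lookup, assuming a simple in-memory mapping; the names `USE`, `preferred`, and `use_for` are hypothetical and not taken from any glossary product.

```python
from collections import defaultdict

# Hypothetical in-memory thesaurus: non-preferred term -> preferred term.
# Each entry is a USE reference ("Department USE Division").
USE = {
    "department": "division",
}


def preferred(term: str) -> str:
    """Return the preferred term for `term`; a preferred term maps to itself."""
    key = term.strip().lower()
    return USE.get(key, key)


def use_for(use_map: dict) -> dict:
    """Invert USE references into USE FOR (UF) lists keyed by preferred term."""
    uf = defaultdict(list)
    for non_preferred, pref in use_map.items():
        uf[pref].append(non_preferred)
    return {pref: sorted(terms) for pref, terms in uf.items()}


print(preferred("Department"))  # division
print(use_for(USE))             # {'division': ['department']}
```

Storing only the USE direction and deriving UF keeps the two references from drifting out of sync.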
Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data
by
Leslie Sikos
Published 10 Jul 2015
Knowledge Representation and Reasoning is the field of Artificial Intelligence (AI) used to represent information in a machine-readable form that computer systems can utilize to solve complex tasks. Taxonomies or controlled vocabularies are structured collections of terms that can be used as metadata element values. For example, an events vocabulary can be used to describe concerts, lectures, and festivals in a machine-readable format, while an organization vocabulary is suitable for publishing machine-readable metadata about a school, a corporation, or a club. The controlled vocabularies are parts of conceptual data schemas (data models) that map concepts and their relationships. The most widely adopted knowledge-management standards are the Resource Description Framework (RDF), the Web Ontology Language (OWL), and the Simple Knowledge Organization System (SKOS).
…
Class Disjointness in OWL
<owl:Class rdf:about="#VirtualKeyboard">
  <!-- same members as #Softquerty; no members in common with #Keyboard -->
  <owl:equivalentClass rdf:resource="#Softquerty" />
  <owl:disjointWith rdf:resource="#Keyboard" />
</owl:Class>
Simple Knowledge Organization System (SKOS)
Simple Knowledge Organization System (SKOS) is a W3C recommendation for representing taxonomies, thesauri, classification schemes, subject-heading systems, and structured controlled vocabularies. Being one of the most frequently implemented Semantic Web standards in industrial applications, SKOS is built upon RDF and RDFS to enable easy publication of controlled vocabularies as linked data. RDF provides interoperability, consistency, and integrity and allows knowledge organization systems to be used in distributed, decentralized metadata applications where metadata are retrieved from multiple resources.
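To make the SKOS vocabulary concrete, here is a rough Python sketch that models a tiny concept scheme as plain (subject, predicate, object) tuples rather than using an RDF library. The `ex:` concepts and labels are invented for illustration; only the property names `skos:prefLabel`, `skos:altLabel`, and `skos:broader` come from SKOS. Following `skos:broader` repeatedly approximates what SKOS calls `skos:broaderTransitive`.

```python
# A tiny SKOS-style concept scheme as (subject, predicate, object) tuples.
# The ex: concepts and labels are hypothetical; language tags are omitted.
TRIPLES = [
    ("ex:beverages", "skos:prefLabel", "Beverages"),
    ("ex:softDrinks", "skos:prefLabel", "Soft drinks"),
    ("ex:softDrinks", "skos:altLabel", "Soda"),
    ("ex:softDrinks", "skos:altLabel", "Pop"),
    ("ex:softDrinks", "skos:broader", "ex:beverages"),
    ("ex:cola", "skos:prefLabel", "Cola"),
    ("ex:cola", "skos:broader", "ex:softDrinks"),
]


def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def pref_label(concept):
    """The concept's preferred (human-readable) label, if any."""
    labels = objects(concept, "skos:prefLabel")
    return labels[0] if labels else None


def find_concept(label):
    """Return the concept whose prefLabel or altLabel matches `label`."""
    wanted = label.casefold()
    for s, p, o in TRIPLES:
        if p in ("skos:prefLabel", "skos:altLabel") and o.casefold() == wanted:
            return s
    return None


def broader_chain(concept):
    """Walk skos:broader links upward; assumes one parent and no cycles."""
    chain = []
    parents = objects(concept, "skos:broader")
    while parents:
        concept = parents[0]
        chain.append(concept)
        parents = objects(concept, "skos:broader")
    return chain


concept = find_concept("soda")
print(concept, pref_label(concept))  # ex:softDrinks Soft drinks
print(broader_chain("ex:cola"))      # ['ex:softDrinks', 'ex:beverages']
```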
…
The schemas defining the most common concepts of a field of interest, the relationships between them, and related individuals are collected by semantic knowledge bases. These schemas are the de facto standards used by machine-readable annotations serialized in RDFa, HTML5 Microdata, or JSON-LD, as well as in RDF files of Linked Open Data datasets. Vocabularies and Ontologies Controlled vocabularies of the Semantic Web collect concepts and terms used to describe a field of interest or area of concern. Ontologies are more complex, very formal definitions of terms, individuals and their properties, object groups (classes), and relationships between individuals suitable to describe virtually any statement related to the field of interest in a machine-readable form.
Designing Search: UX Strategies for Ecommerce Success
by
Greg Nudelman
and
Pabini Gabriel-Petit
Published 8 May 2011
1. Spelling correction and substituting a customer’s original keywords with different keywords from a controlled vocabulary
2. Removing some of a customer’s original keywords, or making partial matches
3. Matching only categories or aspects, without the keywords
4. Top searches, featured results, or most popular results
5. Third-party resources and ads
You already looked at a good example of controlled vocabulary keyword substitution: the Google Did you mean… feature shown in Figure 1-1. To provide relevant content, Google draws upon its enormous list of the indexed keywords for which people have previously searched, which forms its controlled vocabulary, to suggest the closest matching available keyword.
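The passage does not say how Google computes its suggestions, so the following is only a stand-in: a Python sketch that substitutes the nearest keyword from a controlled vocabulary using Levenshtein edit distance. The vocabulary, the function names, and the `max_distance=2` cutoff are all assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def did_you_mean(query, vocabulary, max_distance=2):
    """Suggest the closest known keyword, or None if the query is already
    known, the vocabulary is empty, or nothing is within max_distance."""
    q = query.strip().lower()
    if not vocabulary or q in vocabulary:
        return None
    # Ties on distance are broken alphabetically so the result is deterministic.
    best = min(vocabulary, key=lambda term: (edit_distance(q, term), term))
    return best if edit_distance(q, best) <= max_distance else None


# Hypothetical vocabulary of previously searched keywords.
KEYWORDS = {"sneakers", "sandals", "slippers", "boots"}
print(did_you_mean("snekers", KEYWORDS))  # sneakers
```

Returning None when nothing is close lets the caller fall back to one of the other strategies in the list rather than forcing a poor substitution.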
…
Google’s brilliant auto-suggest feature, another industry-defining innovation shown in Figure 1-9, is an excellent example of a successful marriage of two content strategies: making partial keyword matches and using a controlled vocabulary for keyword substitution. Figure 1-9: Google auto-suggest prevents a no search results condition from occurring When automatically suggesting search keywords, Google chooses the top keywords from its controlled vocabulary of the most popular keywords. By matching the beginning of a string the customer begins to type with popular search keywords, Google ensures a successful search, forestalling the no search results condition before it ever occurs.
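Prefix matching against a popularity-ranked vocabulary can be sketched in a few lines. This is a hypothetical illustration, not Google's implementation; the keyword counts are invented, and a production system would typically use a trie or sorted index instead of the linear scan shown here.

```python
# Hypothetical popularity counts for keywords in the controlled vocabulary.
POPULARITY = {
    "running shoes": 950,
    "rain jacket": 700,
    "running shorts": 400,
    "running socks": 400,
    "rugby ball": 120,
}


def suggest(prefix, popularity, limit=3):
    """Return up to `limit` keywords that start with `prefix`,
    most popular first, ties broken alphabetically."""
    p = prefix.casefold()
    matches = [kw for kw in popularity if kw.casefold().startswith(p)]
    matches.sort(key=lambda kw: (-popularity[kw], kw))
    return matches[:limit]


print(suggest("run", POPULARITY))  # ['running shoes', 'running shorts', 'running socks']
```

Because every suggestion is drawn from keywords already known to return results, the user can only pick queries that avoid the no search results condition.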
…
One critical innovation was the Google Did you mean… feature, which gave the process of discovery a safety net and made exploration more fun. This feature was the result of deliberate and original thinking about how to help people correct the misspellings that were a common cause of the appearance of the no search results page. Controlled vocabulary substitution redefined the way Google does search, and today, the Did you mean… feature shown in Figure 1-1 is a virtual necessity for a successful search implementation. Figure 1-1: The Google Did you mean… feature If you want to create a killer search app, begin with the no search results page.
Understanding search engines: mathematical modeling and text retrieval
by
Michael W. Berry
and
Murray Browne
Published 15 Jan 2005
For more general (or varied) vocabularies (e.g., popular magazines and encyclopedias), simple term frequencies (t**) may be sufficient. Binary term frequencies (b**) are useful when the term list (or row dimension of the term-by-document matrix) is relatively short, as is the case with controlled vocabularies. Choosing the global weighting factor ($g_i$) should take into account the state of the document collection, that is, how often the collection is likely to change. Adjusting the global weights in response to new vocabulary will impact all the corresponding rows of the term-by-document matrix.
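A small Python sketch of the weighting described above, assuming whitespace tokenization and the common inverse-document-frequency form $g_i = \log(n / n_i)$ for the global weight, where $n$ is the number of documents and $n_i$ the number containing term $i$ (the book discusses several alternatives; this choice is ours). Each entry is $a_{ij} = l_{ij} g_i$, with the local weight $l_{ij}$ either the raw frequency or its binary indicator; document normalization is omitted.

```python
import math


def term_document_matrix(docs, local="frequency"):
    """Build a term-by-document matrix: one row per term, one column per doc.
    local="frequency" stores raw counts; local="binary" stores 1 if the
    term occurs in the document, else 0."""
    if local not in ("frequency", "binary"):
        raise ValueError(f"unknown local weighting: {local!r}")
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for term in terms:
        counts = [toks.count(term) for toks in tokenized]
        matrix.append(counts if local == "frequency" else [int(c > 0) for c in counts])
    return terms, matrix


def idf_global_weights(matrix):
    """One global weight per term row: g_i = log(n / n_i)."""
    n = len(matrix[0])
    return [math.log(n / sum(1 for f in row if f > 0)) for row in matrix]


def weight(matrix, global_weights):
    """Apply a_ij = l_ij * g_i row by row."""
    return [[l * g for l in row] for row, g in zip(matrix, global_weights)]


docs = ["controlled vocabulary terms", "vocabulary for search", "search engines rank documents"]
terms, binary = term_document_matrix(docs, local="binary")
A = weight(binary, idf_global_weights(binary))
```

Because $n$ appears in every $g_i$, appending even one document (for example `docs + ["controlled search"]`) changes every global weight, which is why adjusting global weights touches all rows of the matrix.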
The Invisible Web: Uncovering Information Sources Search Engines Can't See
by
Gary Price
,
Chris Sherman
and
Danny Sullivan
Published 2 Jan 2003
The standard that seems most likely to achieve something close to universal adoption is RDF (Resource Description Framework), which uses the syntax of XML (Extensible Markup Language). The goal of all metadata standards proposals is to go beyond machine-readable data and create machine-understandable data on the Web. Among other things, they provide the capability to introduce controlled vocabulary (often organized in thesaurus form) into the search equation. A controlled vocabulary can bring different terms, jargon, and concepts together. Though the standard will provide a structure for describing, classifying, and managing Web documents, it has its own set of vulnerabilities, and not everyone is sanguine about its prospects.
…
A form of computing where data resides on many decentralized computers (servers) and is accessed and manipulated by programs called clients residing on users’ computers.
controlled vocabulary (thesaurus). A standardized set of terms used to describe similar items. Web-based information about soft drinks may be indexed under such terms as “soda,” “soda-pop,” “pop,” “cola,” “carbonated beverages,” “soft drinks,” and even brand names like “Coke.” A controlled vocabulary links all these terms so that a keyword search on any one of them provides results for all.
crawler (Web crawler, spider). A software robot used by search engines to autonomously find and retrieve Web pages to be included in a search engine’s index.
database aggregators.
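The soft-drink example in the glossary maps naturally onto a synonym ring: any keyword in the ring is expanded to all of its members before the index is consulted. A minimal Python sketch, assuming a hypothetical in-memory index from terms to document IDs:

```python
# One synonym ring from the glossary example: every member is equivalent.
SYNONYM_RINGS = [
    {"soda", "soda-pop", "pop", "cola", "carbonated beverages", "soft drinks", "coke"},
]

# Hypothetical inverted index: indexed term -> IDs of documents using it.
INDEX = {
    "soda": {1},
    "pop": {2},
    "soft drinks": {3},
    "juice": {4},
}


def expand(keyword):
    """Return every term in the keyword's synonym ring (or just the keyword)."""
    k = keyword.strip().lower()
    for ring in SYNONYM_RINGS:
        if k in ring:
            return set(ring)
    return {k}


def search(keyword):
    """Union the postings of all expanded terms, so any synonym finds all."""
    hits = set()
    for term in expand(keyword):
        hits |= INDEX.get(term, set())
    return sorted(hits)


print(search("Coke"))  # [1, 2, 3]
```

Treating a brand name such as Coke as an exact equivalent follows the glossary's example; a fuller thesaurus might instead record it as a narrower term.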
Sorting Things Out: Classification and Its Consequences
by
Geoffrey C. Bowker
and
Susan Leigh Star
Published 25 Aug 2000
Figure 2.4a: In 1913 it was still possible to die of being worn out. Source: Department of Commerce, Bureau of the Census, Manual of the International List of Causes of Death, Department of Commerce, US Bureau of the Census, Washington, DC: Government Printing Office, 1913: 131. Figure 2.4b: The problem of controlled vocabulary: this list shows terms in common use to be avoided in favor of more technical medical terms. Source: Edward T. Thompson, Textbook and Guide to the Standard Nomenclature of Diseases and Operations (Chicago: Physicians’ Record Co., 1958), pp. 247-249. Medical Terminology an Interesting Study A knowledge of medical terminology will make the tasks of the medical record librarian much easier.
…
The past, we are told, is recreated afresh at each instant in the present; one role of the historian is to honor this openness while telling the best story one can (Serres 1993, C. Becker 1967). Modern medical classification systems, most particularly the ICD rival SNOMED (Systeme de Nomenclature Medicale) strive in precisely the same way to keep the past open. Ideally, they would become topological, but with an ease of management, data entry, and controlled vocabulary preserved. Thus far, this goal has proved elusive. To tell the story as one internal to the history of medicine, consider the problem of tracking AIDS through history. AIDS achieved recognition as a disease in a slow process. Gay and sexual politics, medical profit making, and medical research were embroiled together in both its definition and its control.
…
There are a thousand “controlled medical vocabularies” for a thousand purposes, many of them having embedded within them some version or other of the ICD. As one article put it: “We are often reminded that medical knowledge has grown to the point where we require the assistance of computers to manage it. One response has been the construction of controlled vocabularies to facilitate this process. We are now at the point where the vocabularies themselves have reached unmanageable proportions and must again call on computers for help” (Cimono et al. 1989, 517). The call now is for a unified medical language system (UMLS) that will provide for automatic, flexible communication among all authorized controlled medical vocabularies.
Producing Open Source Software: How to Run a Successful Free Software Project
by
Karl Fogel
Published 13 Oct 2005
Getting Started; Starting From What You Have; Choose a Good Name; Have a Clear Mission Statement; State That the Project is Free; Features and Requirements List; Development Status; Downloads; Version Control and Bug Tracker Access; Communications Channels; Developer Guidelines; Documentation; Demos, Screenshots, Videos, and Example Output; Hosting; Choosing a License and Applying It; The "Do Anything" Licenses; The GPL; How to Apply a License to Your Software; Setting the Tone; Avoid Private Discussions; Nip Rudeness in the Bud; Codes of Conduct; Practice Conspicuous Code Review; Be Open From Day One; Opening a Formerly Closed Project; Announcing
3. Technical Infrastructure; What a Project Needs; Web Site; Canned Hosting; Mailing Lists / Message Forums; Choosing the Right Forum Management Software; Version Control; Version Control Vocabulary; Choosing a Version Control System; Using the Version Control System; Receiving and reviewing contributions; Bug Tracker; Interaction with Email; Pre-Filtering the Bug Tracker; IRC / Real-Time Chat Systems; IRC Bots; Archiving IRC; Wikis; Wikis and Spam; Choosing a Wiki; Q&A Forums; Translation Infrastructure; Social Networking Services
4.
…
In particular, setting up commit notifications is extremely useful. The effect of commit notifications is that every time someone commits a change to the central repository, an email or other subscribable notification goes out showing the log message and diffs (unless the diff is too large; see diff, in the section called “Version Control Vocabulary”). The review itself might take place on a mailing list, or in a review tool such as Gerrit or the GitHub "pull request" interface. See the section called “Commit notifications / commit emails” in Chapter 3, Technical Infrastructure for details. Case study In the Subversion project, we did not at first make a regular practice of code review.
…
This section does not discuss all aspects of using a version control system. It's so all-encompassing that it must be addressed topically throughout the book. Here, we will concentrate on choosing and setting up a version control system in a way that will foster cooperative development down the road. Version Control Vocabulary This book cannot teach you how to use version control if you've never used it before, but it would be impossible to discuss the subject without a few key terms. These terms are useful independently of any particular version control system: they are the basic nouns and verbs of networked collaboration, and will be used generically throughout the rest of this book.
The Card Catalog: Books, Cards, and Literary Treasures
by
Library Of Congress
and
Carla Hayden
Published 3 Apr 2017
While still an undergraduate at Amherst, Dewey was obsessed with bringing order to the school’s library, and he recounted that while daydreaming during a long lecture one day, “without hearing a word, my mind absorbed in the vital problem, the solution flasht over me so that I jumpt in my seat and came very near shouting ‘Eureka!’” Dewey’s revolutionary approach to cataloging was a library classification system based on a controlled vocabulary of subject headings, represented by numerical values that could be subdivided further by decimals. Thus was born the Dewey Decimal Classification, a system that borrowed generously from Bacon, Jewett, and Cutter and attempted to encapsulate all knowledge in ten distinct classes. It immediately caught on and expanded Dewey’s influence within the library community.
Sorting Things Out: Classification and Its Consequences (Inside Technology)
by
Geoffrey C. Bowker
Published 24 Aug 2000
No attempt is made here to list cross references to acceptable terms as these terms should immediately be referred back to the clinician for statement of diagnosis.
Abdominal adipose; Abdominal hernia; Aborted lochia; Abortion emesis; Abortus fever; Acute abdomen; Adrenal crisis; Anterior chest-wall syndrome; Apoplexy; Appendiceal colic; Arteriosclerotic peripheral vascular disease; Athlete's foot; August fever; Barber's itch; Bed sores; Blighted ovum; Blue baby; Brittle nails; Burst belly; Carcinoid; Cardiac asthma; Cardiac cirrhosis; Cardiovascular renal disease; Catarrhal jaundice; Cerebral accident; Cervical occipital syndrome; Chicleros disease; Combat fatigue; Consumption; Coronary infarction; Coughing disease; Cow-horn stomach; Deer-fly fever; Desert rheumatism; Devil's grippe; Dhobie itch; Diver's paralysis; Dust consumption; Dysinsulinism; Dyskeratosis; Engorged breasts; Epicondylitis; Epidemic summer disease; Fetal distress; Fetal erythroblastosis; Field fever; Frozen shoulder; Gastric crisis; Glass-blower's cataract; Grinder's consumption; Gym itch; Hepatic flexure syndrome; Hobnail liver; Housewife's dermatitis; Hydrocephalus, external; Hydrocephalus, internal; Hydrocephalus, primary; Hydrocephalus, secondary; Hydrops fetalis; Hypersplenism; Hypertensive crisis; Hyperventilation; Hypotensive syndrome; Icterus neonatorum; Indigestion; Infantile colic; Intervertebral disc syndrome; Intracranial tumor; Iron-storage disease; Jeep disease; Jitter legs; Jockey itch; Kissing spine; La grippe; Lice infestation; Lipoid nephrosis; Lipping spine; Liver spots; Lockjaw; Louping ill; Low leg syndrome; Low reserve kidney; Lumbar disc syndrome; Lumpy jaw; Mazoplasia; Milk leg; Miner's asthma; Miner's nystagmus; Morbus caeruleus; Mud fever; Myocardial fatigue; Myocardial ischemia; Neurasthenia
Figure 2.4b The problem of controlled vocabulary: this list shows terms in common use to be avoided in favor of more technical medical terms. Source: Edward T.
Designing Social Interfaces
by
Christian Crumlish
and
Erin Malone
Published 30 Sep 2009
Examples of this include Flickr’s well-known free tagging feature, which enables users to tag their own objects and gives users the option of permitting others to tag them as well (see Figure 2-3). Figure 2-3. There’s no way the designer of a social application can anticipate every tag a user might want to apply. What controlled vocabulary, for instance, would ever include a tag called “thehairofchrisheilmann”? Another free-form taxonomy element inherent in Flickr’s design is the unlimited ability to create groups with any conceivable name or purpose. This feature involves a number of patterns we’ll discuss presently, including the concept of a group, ridiculously easy group formation, discussions, joining, invitation, and the ability to add media objects to a group’s “pool.”
…
Figure 7-16. LibraryThing.com indicates that tags need to be separated with a comma rather than a space and gives an example.
• For more robust social engagement, allow connections and/or friends to tag objects in a collection.
• Don’t be afraid to mix a controlled vocabulary (defined by the site architects) and user-generated tags.
Recommendations
Adding tagging to objects as a product feature should offer a benefit to the user. Do the tags help her find and manage her collection? Do the tags tighten the circle of community? Tagging as a user activity is more successful when there is a payoff to the users and their friends.
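One way to act on the advice to mix a controlled vocabulary with user-generated tags is to parse comma-separated input (the convention the LibraryThing figure caption describes) and sort each tag into either a site-defined term or a free tag. A Python sketch under assumed data; `CONTROLLED`, `VARIANTS`, and both functions are hypothetical:

```python
# Hypothetical site-defined vocabulary plus known variant spellings.
CONTROLLED = {"science fiction", "biography", "poetry"}
VARIANTS = {"sci-fi": "science fiction", "scifi": "science fiction"}


def parse_tags(raw):
    """Split on commas (so tags may contain spaces), collapse whitespace,
    lowercase, and drop empty or repeated tags while keeping input order."""
    tags = []
    for part in raw.split(","):
        tag = " ".join(part.split()).lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def classify(tags):
    """Map tags onto the controlled vocabulary where possible; keep the rest
    as free user tags."""
    controlled, free = [], []
    for tag in tags:
        term = VARIANTS.get(tag, tag)
        if term in CONTROLLED:
            if term not in controlled:
                controlled.append(term)
        else:
            free.append(tag)
    return controlled, free


print(classify(parse_tags("Sci-Fi, space opera ,  read in 2009,,")))
# (['science fiction'], ['space opera', 'read in 2009'])
```

Keeping unmatched tags rather than rejecting them preserves the user's own vocabulary while still giving the site consistent terms to build navigation on.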
Content Everywhere: Strategy and Structure for Future-Ready Content
by
Sara Wachter-Boettcher
Published 28 Nov 2012
Well, when you allow authors to tag content with whatever words they choose and use that as a major form of metadata, you’re going to have lots of inconsistencies: tags that were used once, tags that are similar to one another but written differently, tags that are so popular they’re used for everything. Freeform tags—often visualized as a tag cloud, as shown in Figure 5.1—can be useful for classifying large amounts of information for user retrieval or even defining what your controlled vocabulary should be in the first place, but they often won’t serve your needs when creating content systems that are interconnected and rule-based, like we’re doing today. Because in order to build logic around a tag, that tag must be used consistently across the entire system—something this anything-goes approach is notoriously bad at.
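The inconsistencies listed above (one-off tags and differently written versions of the same tag) can be surfaced with a quick audit before deciding what belongs in a controlled vocabulary. A rough Python sketch; the normalization rule (lowercase, drop everything except letters and digits) is an assumption and will not catch true synonyms:

```python
import re
from collections import Counter, defaultdict


def normalize(tag):
    """Crude key for spotting variants: lowercase, letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def audit_tags(tag_uses):
    """Given every tag application, return (variant groups, one-off tags)."""
    counts = Counter(tag_uses)
    groups = defaultdict(set)
    for tag in counts:
        groups[normalize(tag)].add(tag)
    variants = {key: sorted(tags) for key, tags in groups.items() if len(tags) > 1}
    singletons = sorted(tag for tag, n in counts.items() if n == 1)
    return variants, singletons


uses = ["Web Design", "web-design", "webdesign", "css", "css", "Accessibility", "typography"]
variants, singletons = audit_tags(uses)
print(variants)  # {'webdesign': ['Web Design', 'web-design', 'webdesign']}
```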
Learning SPARQL
by
Bob Ducharme
Published 15 Jul 2011
Note The Potrzebie System of Weights and Measures was developed by noted computer scientist Donald Knuth. He published it as a teenager in Mad Magazine in 1957, so it is not considered normative. A single potrzebie is the thickness of Mad magazine issue number 26. The use of non-XSD types in RDF is currently most common in data using the SKOS standard for controlled vocabularies. In SKOS, the skos:notation property names an identifier for a concept that is often a legacy value from a different thesaurus expressed as a cryptic numeric sequence (for example, “920” to represent biographies in the library world’s Dewey Decimal System), unlike the concept’s skos:prefLabel property that provides a more human-readable name.
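A sketch of what such data can look like when serialized as N-Triples, produced by a small Python helper. The concept IRI and the notation datatype IRI are hypothetical placeholders, since the excerpt names no specific datatype; only the SKOS namespace and property names are real. The escaping handles backslashes, double quotes, and newlines only, which is enough for this example but is not a complete N-Triples serializer.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def escape(text):
    """Minimal N-Triples string escaping: backslash first, then quote, newline."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def concept_ntriples(concept_iri, pref_label, lang, notation, notation_datatype):
    """Two N-Triples lines: a language-tagged prefLabel for people and a
    notation literal typed with a custom (non-XSD) datatype."""
    return [
        f'<{concept_iri}> <{SKOS}prefLabel> "{escape(pref_label)}"@{lang} .',
        f'<{concept_iri}> <{SKOS}notation> "{escape(notation)}"^^<{notation_datatype}> .',
    ]


for line in concept_ntriples(
    "http://example.org/concept/biography",  # hypothetical concept IRI
    "Biography",
    "en",
    "920",
    "http://example.org/datatype/ddc",       # hypothetical datatype IRI
):
    print(line)
```

The custom datatype tells consumers which scheme the cryptic "920" belongs to, while the prefLabel carries the human-readable name.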
Cataloging the World: Paul Otlet and the Birth of the Information Age
by
Alex Wright
Published 6 Jun 2014
By helping libraries expand their collections and make their material more widely available, they would serve more patrons and thereby exert a growing influence over society at large. Dewey’s classification serves as a case study in industrial management techniques. It rests on two conceptual foundations: first, a tightly controlled vocabulary of subject headings; second, an artificial notation that relies on numbers, letters, and other symbols to organize books into nine top-level “classes,” each with a corresponding beginning number. Each class is further subdivided into ten “divisions,” which in turn are subdivided into 1,000 distinct headings.
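Because the notation is purely numeric, the class and division of a Dewey number can be read off its leading digits. The Python sketch below assumes the modern three-digit layout, in which 000 through 900 mark the main classes, and ignores everything after the decimal point; the function name is hypothetical.

```python
def ddc_hierarchy(notation):
    """Split a Dewey number into (class, division, section) using the
    three digits before any decimal point."""
    base = notation.split(".")[0]
    if len(base) != 3 or not base.isdigit():
        raise ValueError(f"expected three digits before any decimal point: {notation!r}")
    return base[0] + "00", base[:2] + "0", base


print(ddc_hierarchy("920"))    # ('900', '920', '920')
print(ddc_hierarchy("641.5"))  # ('600', '640', '641')
```

Digits after the decimal point narrow the subject further, so a more complete version would return each successively longer prefix as a deeper level.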
RDF Database Systems: Triples Storage and SPARQL Query Processing
by
Olivier Cure
and
Guillaume Blin
Published 10 Dec 2014
OWL 2 QL supports the following axioms: subclass axioms, class expression equivalence and disjointness, inverse object properties, property inclusion (not involving property chains), property equivalence, property domain and range, disjoint, symmetric, (ir)reflexive, and asymmetric properties, assertions other than individual equality assertions, and negative property assertions. Compared to OWL 2 EL, OWL 2 QL is particularly suited to knowledge bases characterized by a large ABox and a relatively small TBox, with an expressiveness corresponding to that of a UML class diagram or an entity-relationship schema. 3.5.5 SKOS There is a large number of controlled vocabularies, taxonomies, folksonomies, subject heading systems, and thesauri in use within organizations, such as the Library of Congress Subject Headings (http://id.loc.gov/authorities/subjects.html). Although they serve applications efficiently, these knowledge organization systems (KOS) do not provide exchange or linking facilities and are hard to distribute across the Web.
AI in Museums: Reflections, Perspectives and Applications
by
Sonja Thiel
and
Johannes C. Bernhardt
Published 31 Dec 2023
Furthermore, it turned out that a lot of work was required to improve the quality of the datasets as well as the infrastructure, and efforts were therefore made to identify suitable solutions and processes supported by AI technologies, mainly led by the cultural heritage data expert and developer Etienne Posthumus.22 A transfer to Linked Open Data and quality improvements in the application of IIIF, unique IDs, or LongLat codes to the collection thus helped to improve the quality of the datasets in the long-term perspective and facilitate better research possibilities in future. Here we explored the possibilities of training a language model with controlled vocabulary such as ICONCLASS, in addition to vocabulary already used in the collection. Good old-fashioned AI (GOFAI), ‘pragmatic AI’, or newer multimodal approaches were chosen, depending on the purpose. The experimental space was particularly helpful, because it helped the institutions to learn about specific AI-related methodologies and constraints and opened up a space for comparison, where the stakeholders could assess the values of AI solutions in comparison with other methods.23 The gap between research-oriented developments, data-driven heritage experts, user needs, and the professional needs of a museum could also be observed.
Learning SPARQL
by
Bob Ducharme
Published 22 Jul 2011
Beautiful Data: The Stories Behind Elegant Data Solutions
by
Toby Segaran
and
Jeff Hammerbacher
Published 1 Jul 2009
Think for a moment about why this is the case: even assuming that everyone had agreed on a schema and a mechanism for querying, there’s no guarantee that people would use the same nomenclature to describe their experiments. What are the correct fields to use? And how do you search for, say, a “lung cancer” experiment when another researcher might have described it as an “adenocarcinoma”? Many working groups have emerged to try to create a controlled vocabulary and fixed schema to make experiments easier to find, but so far none have completely cracked this problem. Biotech is actually way ahead of the game, having at least identified the problem and made serious industry-wide attempts to solve it. At the other end of the spectrum, we’ve recently had spectacular failures of investment banks all over the world where no one had any idea what positions their traders held, and traders themselves had no way of knowing whether they held an opposing position to someone sitting across the room.
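One common answer to the lung cancer versus adenocarcinoma problem is to annotate experiments with terms from a hierarchical vocabulary and expand each query to its narrower terms. The mini-hierarchy and experiment IDs below are invented for illustration and are not drawn from any real biomedical ontology:

```python
# Hypothetical mini-hierarchy: term -> narrower terms.
NARROWER = {
    "lung cancer": ["non-small cell lung cancer", "small cell lung cancer"],
    "non-small cell lung cancer": ["lung adenocarcinoma"],
}

# Hypothetical experiments, each annotated with one controlled term.
EXPERIMENTS = {
    "exp-1": "lung adenocarcinoma",
    "exp-2": "small cell lung cancer",
    "exp-3": "breast cancer",
}


def descendants(term):
    """The term plus all narrower terms, depth-first (pre-order)."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            # Reverse so children are visited in their listed order.
            stack.extend(reversed(NARROWER.get(current, [])))
    return seen


def find_experiments(term):
    """Experiments annotated with `term` or any of its narrower terms."""
    wanted = set(descendants(term))
    return sorted(exp for exp, annotation in EXPERIMENTS.items() if annotation in wanted)


print(find_experiments("lung cancer"))  # ['exp-1', 'exp-2']
```

The expansion only works if everyone annotates with the same vocabulary, which is exactly the agreement the passage says is still missing.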
Natural language processing with Python
by
Steven Bird
,
Ewan Klein
and
Edward Loper
Published 15 Dec 2009
OLAC: Open Language Archives Community The Open Language Archives Community, or OLAC, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practices for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. OLAC’s home on the Web is at http://www.language-archives.org/. OLAC Metadata is a standard for describing language resources. Uniform description across repositories is ensured by limiting the values of certain metadata elements to the use of terms from controlled vocabularies. OLAC metadata can be used to describe data and tools, in both physical and digital formats. OLAC metadata extends the Dublin Core Metadata Set, a widely accepted standard for describing resources of all types. To this core set, OLAC adds descriptors to cover fundamental properties of language resources, such as subject language and linguistic type.
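Restricting element values to a controlled vocabulary amounts to a membership check at validation time. The Python sketch below uses a simplified dictionary record rather than OLAC's actual XML encoding; the allowed values are modeled on OLAC's linguistic-type vocabulary as we understand it and should be checked against the OLAC documentation.

```python
# Hypothetical validator: element name -> allowed values. Values are
# modeled on OLAC's linguistic-type vocabulary; element names are simplified.
VOCABULARIES = {
    "linguistic_type": {"primary_text", "lexicon", "language_description"},
}


def validate(record):
    """Return an error message for each controlled element whose value is
    not in its vocabulary; elements without a vocabulary are free text."""
    errors = []
    for element, value in record.items():
        allowed = VOCABULARIES.get(element)
        if allowed is not None and value not in allowed:
            errors.append(f"{element}: {value!r} is not in the controlled vocabulary")
    return errors


print(validate({"title": "Swahili wordlist", "linguistic_type": "lexicon"}))  # []
print(validate({"linguistic_type": "dictionary"}))
```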
The Art of SEO
by
Eric Enge
,
Stephan Spencer
,
Jessie Stricchiola
and
Rand Fishkin
Published 7 Mar 2012
Supplemental index: Google’s supplemental index is a secondary database containing supplemental results pages that are deemed to be of lesser importance by Google’s algorithm or are less trusted. These are pages that are less likely to show up in search results.
Tagging, tags: Simple word descriptions used to categorize content.
Target audience: The market to whom advertisers wish to sell their products or services.
Taxonomy: Classification system of controlled vocabulary used to organize topical subjects, usually hierarchical in nature.
Text link: A plain HTML link that does not involve graphic or special code such as Flash or JavaScript.
Theme: The main keyword focus of a web page.
Thin affiliate: An affiliate site that provides little value-added content.