linked data

back to index

description: a method of structuring data so that it can be interlinked with other data and become more useful through semantic queries

58 results

Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data
by Leslie Sikos
Published 10 Jul 2015

For example, tabular data in HTML with RDFa annotation using URIs and semantic properties is five-star data, offering maximum reusability and machine-interpretability. The expression of rights provided by licensing makes free data reuse possible. Linked Data without an explicit open license (e.g., a public-domain license) cannot be reused freely, but the quality of Linked Data is independent of licensing. When the specified criteria are met, all five ratings can be used both for Linked Data (Linked Data without an explicit open license) and Linked Open Data (Linked Data with an explicit open license). As a consequence, the five-star rating system can be depicted in a way that the criteria can be read with or without the open license.

More and more universities provide information about staff members, departments, facilities, courses, grants, and publications as Linked Data and RDF dumps, such as the University of Florida (http://vivo.ufl.edu) and Ghent University (http://data.mmlab.be/mmlab). Libraries such as the Princeton University Library (http://findingaids.princeton.edu) publish bibliographic information as Linked Data. Part of the National Digital Data Archive of Hungary is available as Linked Data at http://lod.sztaki.hu. Even Project Gutenberg is available as Linked Data (http://wifo5-03.informatik.uni-mannheim.de/gutendata/). Museums such as the British Museum publish some of their records as Linked Data (http://collection.britishmuseum.org).

Twitter Card Annotation in the Markup

<meta name="twitter:card" content="summary" />
<meta name="twitter:site" content="@lesliesikos" />
<meta name="twitter:creator" content="@lesliesikos" />
<meta property="og:url" content="http://www.lesliesikos.com/linked-data-platform-1-0-standardized/" />
<meta property="og:title" content="Linked Data Platform 1.0 Standardized" />
<meta property="og:description" content="The Linked Data Platform 1.0 is now a W3C Recommendation, covering a set of rules for HTTP operations on Web resources, including RDF-based Linked Data, to provide an architecture for read-write Linked Data on the Semantic Web." />
<meta property="og:image" content="http://www.lesliesikos.com/img/LOD.svg" />

IBM Watson's DeepQA system is a question-answering system originally designed to compete with contestants of the Jeopardy!

pages: 315 words: 70,044

Learning SPARQL
by Bob Ducharme
Published 15 Jul 2011

For example, simply knowing that “spouse” is a symmetric term made it possible to find out the identity of Cindy’s spouse, even though this fact was not part of the dataset. The idea of Linked Data is newer than that of the semantic web, but sometimes it’s easier to think of the semantic web as building on the ideas behind Linked Data. Linked Data is not a specification, but a set of best practices for providing a data infrastructure that makes it easier to share data across the web. You can then use semantic web technologies such as RDFS, OWL, and SPARQL to build applications around that data. Tim Berners-Lee came up with these four principles of Linked Data in 2006 (I’ve bolded his wording and added my own commentary): Use URIs as names for things.
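
A hedged sketch of the kind of query this enables: using Python's SPARQLWrapper package (my choice; any SPARQL client would do) to ask DBpedia, a public Linked Data endpoint, about a spouse relationship identified purely by URIs. The specific resource and property are illustrative, not taken from the book.

# Minimal sketch, assuming the SPARQLWrapper package and DBpedia's public endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
    SELECT ?spouse WHERE {
        # URIs name things: subject and predicate are both global identifiers.
        <http://dbpedia.org/resource/Tim_Berners-Lee>
            <http://dbpedia.org/ontology/spouse> ?spouse .
    }
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["spouse"]["value"])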


pages: 511 words: 111,423

Learning SPARQL
by Bob Ducharme
Published 22 Jul 2011

We’ll learn more about RDFS and OWL in Chapter 9. The idea of Linked Data is newer than that of the semantic web, but sometimes it’s easier to think of the semantic web as building on the ideas behind Linked Data. Linked Data is not a specification, but a set of best practices for providing a data infrastructure that makes it easier to share data across the Web. You can then use semantic web technologies such as RDFS, OWL, and SPARQL to build applications around that data. Tim Berners-Lee came up with these four principles of Linked Data in 2006 (I’ve bolded his wording and added my own commentary): Use URIs as names for things.


This means that a good understanding of the role of URIs gives you greater control over your queries. Note: the URIs that identify RDF resources are like the unique ID fields of relational database tables, except that they’re universally unique, which lets you link data from different sources around the world instead of just linking data from different tables in the same database.

The Resource Description Framework (RDF)

In Chapter 1, we learned the following about the Resource Description Framework: it’s a data model in which the basic unit of information is known as a triple. A triple consists of a subject, a predicate, and an object.
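
To make the triple model concrete, here is a minimal sketch using Python's rdflib (my choice of library, not the book's, which works with SPARQL processors); the URIs are made up for illustration.

from rdflib import Graph, Literal, Namespace

g = Graph()
ex = Namespace("http://example.org/")

# A triple: subject (a resource), predicate (a property), object (a resource or literal).
g.add((ex.cindy, ex.spouse, ex.richard))      # object is another resource
g.add((ex.cindy, ex.name, Literal("Cindy")))  # object is a literal value

for subject, predicate, obj in g:
    print(subject, predicate, obj)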

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences
by Rob Kitchin
Published 25 Aug 2014

Conclusion

At one level, the case for open and linked data is commonsensical – open data create transparency and accountability; participation, choice and social innovation; efficiency, productivity and enhanced governance; economic innovation and wealth creation. Linked data convert information across the Internet into a semantic web from which data can be machine-read and linked together. Open and linked data thus hold much promise and value as a venture. However, the case for open and linked data is more complex, and their economic underpinnings are not at all straightforward. Open and linked data might seem to have marginal costs, but their production, and the technical and institutional apparatus needed to facilitate and maintain them, have real costs in terms of labour, equipment, and resources.

When documents are published in this way, information on the Internet can be rendered and repackaged as data and can be linked in an infinite number of ways depending on purpose. However, as P. Miller (2010) notes, ‘linked data may be open, and open data may be linked, but it is equally possible for linked data to carry licensing or other restrictions that prevent it being considered open’, or for open data to be made available in ways that do not easily enable linking. In general, any linked documents that are not on an intranet or behind a paywall are also open in nature. For Berners-Lee (2009), open and linked data should ideally be synonymous and he sets out five levels of such data, each with progressively more utility and value (see Table 3.3).

Since the late 2000s the movement has noticeably gained prominence and traction, initially with the Guardian newspaper’s campaign in the UK to ‘Free Our Data’ (www.theguardian.com/technology/free-our-data), the Organization for Economic Cooperation and Development (OECD)’s call for member governments to open up their data in 2008, the launch in 2009 by the US government of data.gov, a website designed to provide access to non-sensitive and historical datasets held by US state and federal agencies, and the development of linked data and the promotion of the ‘Semantic Web’ as a standard element of future Internet technologies, in which open and linked data are often discursively conjoined (Berners-Lee 2009). Since 2010 dozens of countries and international organisations (e.g., the European Union [EU] and the United Nations Development Programme [UNDP]) have followed suit, making thousands of previously restricted datasets open in nature for non-commercial and commercial use (see DataRemixed 2013).

pages: 369 words: 80,355

Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room
by David Weinberger
Published 14 Jul 2011

The rise of Linked Data encapsulates the transformation of knowledge we have explored throughout this book. While the original Semantic Web emphasized building ontologies that are “knowledge representations” of the world, it turns out that if we go straight to unleashing an abundance of linked but imperfect data, making it widely and openly available in standardized form, the Net becomes a dramatically improved infrastructure for knowledge. Linked Data is nevertheless itself only an example of a more expansive practice: Create metadata so your information can be reused. Linked Data is usable because it points beyond itself to information about the information.

For example, when an article in the journal Public Library of Science Medicine examines “the predictors of live birth” in in vitro fertilization by analyzing 144,018 attempts, it links to the UK open government site where the source data—“the world’s oldest and most comprehensive database of fertility treatment in the UK”—is available. The new default is: if you’re going to cite the data, you might as well link to it. Networked facts point to where they came from and, sometimes, where they lead to. Indeed, a new standard called Linked Data is making it easier to make the facts presented in one site useful to other sites in unanticipated ways—enabling an ad hoc worldwide data commons. Key to Linked Data is the ability for a computer program not only to get the fact but to ask the resource for a link to more information about the context of the fact. Facts have become networked because our new information infrastructure happens also to be a hyperlinked publishing system.

We used to need trust because paper-based publishing breaks knowledge off from its source. Now, however, science—which has always had a network of inter-cited publications—occurs within a network of links. We create these links by hand, computers prowl the Web suggesting new links, and the surge of interest in the Linked Data format is making it easier than ever to create clouds of linked data just waiting for new uses. In this hyperlinked environment, we will continue to tell science’s stories, but those stories will be embedded within a system of connections. We will click to see the data. We will click to have our computers compare disparate datasets, surfacing the anomalies and disagreements that will never be entirely driven out from the data of science or from its stories.

RDF Database Systems: Triples Storage and SPARQL Query Processing
by Olivier Cure and Guillaume Blin
Published 10 Dec 2014


The main advantages of JSON are its simplicity, flexibility (it’s schemaless), and native processing support for most Web applications due to a tight integration with the JavaScript programming language. But RDF is not without assets. For example, as a semi-structured data model, RDF data sets can be described with expressive schema languages, such as RDF Schema (RDFS) or Web Ontology Language (OWL), and can be linked to other documents present on the Web, forming the Linked Data movement. With the emergence of Linked Data, a pattern for hyperlinking machine-readable data sets that extensively uses RDF, URIs, and HTTP, we can expect that more and more data will be directly produced in or transformed into RDF. In 2013, the linked open data (LOD) cloud, a set of RDF data produced from open data sources, was considered to contain over 50 billion triples on domains as diverse as medicine, culture, and science, to name a few.
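
A hedged illustration of that linking pattern, in Python with rdflib (my choice; the book is not tied to this library): a dataset joins the LOD cloud simply by reusing URIs that live on other servers, here via an owl:sameAs link into DBpedia. All URIs are illustrative.

from rdflib import Graph, Literal, URIRef, Namespace
from rdflib.namespace import OWL, RDFS

g = Graph()
ex = Namespace("http://example.org/")

g.add((ex.aspirin, RDFS.label, Literal("aspirin")))
# The object URI lives in another dataset entirely; reusing it is what
# links the two data sets together.
g.add((ex.aspirin, OWL.sameAs, URIRef("http://dbpedia.org/resource/Aspirin")))

print(g.serialize(format="turtle"))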

The FedBench Benchmark (http://fedbench.fluidops.net/) uses several data sets (around 10, among which are DBpedia subsets, the New York Times, LinkedMDB, and Drugbank) covering cross-domain and life science topics (news, movies, music, drugs, etc.). The major aim of FedBench is to test the efficiency and effectiveness of federated query processing. Other benchmarks, such as the Linked Data Integration Benchmark (LODIB) or JustBench, are designed to evaluate other properties of related systems, such as handling linked data (i.e., with real-world heterogeneities) or the OWL capabilities of reasoners.

3.8 BUILDING SEMANTIC WEB APPLICATIONS

Jena (http://jena.apache.org/) is an open-source Semantic Web framework for Java and is widely used in the Java community.

pages: 245 words: 68,420

Content Everywhere: Strategy and Structure for Future-Ready Content
by Sara Wachter-Boettcher
Published 28 Nov 2012

But a more semantic Web seems closer than ever with the recent advent of linked data, which is made possible through structured content and markup. Coined by Tim Berners-Lee—yes, the guy who invented the World Wide Web—in 2006, linked data means exactly what it sounds like: bits of information that are linked to other, equivalent sets of data elsewhere on the Internet (often referred to as “in the cloud”), as illustrated in Figure 6.1. The idea is that, as opposed to HTML links, which link one document (e.g., a page) to another, linked data connects the things those pages are about by connecting the actual data behind those two pages instead.

This gives both databases access to the information in the other, and that information then becomes more useful to both people and machines. FIGURE 6.1 Linked data connects content from different places, like between your website and Wikipedia, based on shared content attributes—and it’s getting more and more useful for connecting content across sources. For example, consider The New York Times. Since the 19th century, it’s been maintaining a tremendous index of people, organizations, places, and descriptors in the news. Starting in 1913, it began publishing that data first in a quarterly index, and later an annual one. Now that its collection has been digitized, the Times has opened it up as linked data at http://data.nytimes.com, making this extensive list of topics—well over 10,000 as of this writing, with plans to continually add more—accessible to anyone who wants it.

And all these pages of content are built automatically, using the content’s underlying structure to dictate what’s contextually relevant where. Finally, remember our introduction to linked data in Chapter 6, “Understanding Markup”? Well, the BBC is making use of that, too. Rather than, say, hiring writers to craft overviews of every animal the BBC has video footage about, the organization relies on content from other sources, accessible via linked data. That is, by structuring content along the same lines as sources like Wikipedia, the BBC can automatically pull in the content it doesn’t have—and isn’t invested enough in to create—from an external source.

pages: 223 words: 52,808

Intertwingled: The Work and Influence of Ted Nelson (History of Computing)
by Douglas R. Dechow
Published 2 Jul 2015

So for me this really was a seminal conference with so many truly ground breaking ideas emerging at the same time, apparently orthogonal to each other but actually all the same thing as time has confirmed, since the Google Knowledge Graph is the Semantic Web or ZigZag by another name. It’s all about linking data. This is a much quieter revolution than that initiated by the document Web but it will be much more far reaching. Linked data will become an integral part of the development of data-driven systems architectures that will revolutionize the way we build and maintain information management systems. Linked data architectures will supersede relational databases, make websites easier to build and unify the worlds of hypertext, document management, and databases to create rich interlinked knowledge-based systems as envisaged by the pioneers such as Ted and Doug over 50 years ago.

But the linked data revolution was very slow to take off—largely because it’s hard to explain the key concepts to people and what the benefits are. In 2004, it seemed to have completely stalled. Analyzing why this was the case is a much longer story than I have time to tell here, but as a by-product of doing this analysis at the time, Tim, Nigel Shadbolt, Danny Weitzner, and I started to look back at the factors that made the web of linked documents take off in order to try and understand why the web of linked data wasn’t. We realized that to understand the ecosystem that is the Web we have to take a socio-technical approach.


The Art of SEO
by Eric Enge , Stephan Spencer , Jessie Stricchiola and Rand Fishkin
Published 7 Mar 2012

Figure 10-51 and Figure 10-52 depict some example graphs showing the rate of new external links (and in the last two instances, pages) created over time, with some speculation as to what the trends might indicate.

Figure 10-51. Interpreting new external link data
Figure 10-52. More link data speculation

These assumptions do not necessarily hold true for every site or instance, but the graphs make it easy to see how the engines can use temporal link and content growth information to make guesses about the relevance or worthiness of a particular site. Figure 10-53 shows some guesstimates of a few real sites and how these trends have affected them.

Figure 10-53. Wikipedia link data guesstimates

As you can see in Figure 10-53, Wikipedia has had tremendous growth in both pages and links from 2007 through 2011.

Google and Bing Webmaster Tools

As mentioned earlier, other valuable sources of data include Google Webmaster Tools and Bing Webmaster Tools. We cover these extensively in Using Search Engine–Supplied SEO Tools. From a planning perspective, you will want to get these tools in place as soon as possible. Both tools provide valuable insight into how the search engines see your site. This includes things such as external link data, internal link data, crawl errors, high-volume search terms, and much, much more. Note: some companies will not want to set up these tools because they do not want to share their data with the search engines, but this is a nonissue, as the tools do not provide the search engines with any more data about your website; rather, they let you see some of the data the search engines already have.

This plug-in provides basic link data on the fly with just a couple of mouse clicks. Figure 10-23 shows the menu you’ll see with regard to backlinks. Notice also in the figure that the SearchStatus plug-in offers an option for highlighting NoFollow links, as well as many other capabilities. It is a great tool that allows you to pull numbers such as these much more quickly than would otherwise be possible.

Figure 10-23. Firefox SearchStatus plug-in

Third-party link-measuring tools

Here is a look at some of the better-known advanced third-party tools for gathering link data.

Open Site Explorer

Open Site Explorer was developed based on crawl data obtained by SEOmoz, plus a variety of parties engaged by SEOmoz.

Beautiful Visualization
by Julie Steele
Published 20 Apr 2010

However, choosing an effective presentation is challenging, as not all information visualizations are created equally. Not all information visualizations highlight the patterns, gaps, and outliers important to analysts’ tasks, and furthermore, not all information visualizations “force us to notice what we never expected to see” (Tukey 1977). A growing trend in data analysis is to make sense of linked data as networks. Rather than looking solely at attributes of data, network analysts also focus on the connections between data and the resulting structures. My research focuses on understanding these networks because they are topical, emergent, and inherently challenging for analysts. Networks are difficult to visualize and navigate, and, most problematically, it is difficult to find task-relevant patterns.

If we’re starting from a graph representation of the database, as defined in Figure 14-2, this is a simple task. All we need is a nodeset and an edgeset, which can be easily produced from a relational set of tables; it might even come for free if the database is available in the form of an RDF dump (Freebase 2009) or as Linked Data (Bizer, Heath, and Berners-Lee 2009). From there, we can easily produce a node-link diagram using a graph drawing program such as Cytoscape (Shannon et al. 2003)—an open source application that has its roots in the biological networks scientific community. The resulting diagram, shown in Figure 14-3, depicts the given data model in a similar way as a regular Entity-Relationship (E-R) data structure diagram (Chen 1976), enriched with some quantitative information about the actual data.

The CENSUS data model as a weighted node-link diagram

The heterogeneity of node and link type frequency evidenced in Figure 14-3 is not restricted to our example. It is observable in many datasets, including research databases (Schich and Ebert-Schifferer 2009), large bibliographies (Schich et al. 2009), Freebase, and the Linked Data cloud, regardless of whether the number of types is predefined or expandable by the curators. In all cases that I have seen so far, both the number of nodes per node type and the number of links per link type exhibit right-skewed diminishing distributions, which are widely known as long tails (Anderson 2006, Newman 2005), and lack a shared average as found in a normal Gaussian distribution.

pages: 283 words: 78,705

Principles of Web API Design: Delivering Value with APIs and Microservices
by James Higginbotham
Published 20 Dec 2021

authorId=765" } } } } } * * * Semantic Hypermedia Messaging Semantic hypermedia messaging is the most comprehensive category as it adds semantic profile and linked data support, making APIs part of the Semantic Web. By applying semantics of resource properties through linked data, more meaning is assigned to each property without requiring an explicit name to be used. Linked data usually relies on a shared vocabulary from Schema.org or other resources. With the growth of data analytics and machine learning, linking data to shared vocabularies enable automated systems to easily derive value of the data provided from APIs. Common formats that support semantic hypermedia messaging include Hydra, UBER, Hyper, JSON-LD, and OData.

", "label" : "Book Description", "rel" : ["https://schema.org/description"] }, { "name" : "authors", "rel" : ["collec tion","http://example.org/rels/authors"], "data" : [ { "id" : "author-765", "rel" : ["http://schema.org/Person"], "url" : "http://example.org/authors/765", "data" : [ { "name" : "authorId", "value" : "765", "label" : "Author ID" }, { "name" : "fullName", "value" : "Vaughn Vernon", "label" : "Full Name", "rel" : "https://schema.org/name" } ] } ] }, ] } ] } } * * * Notice how the size of the representations grow compared to the more compact resource serialization formats. With the increased size comes the addition of linked data and more powerful interactions with API clients. These representation formats offer more insight into how to navigate related resources and tap into new operations, including operations that were not available when the client was built. The goal is to enable generic clients to interact with APIs without the need for custom code or user interfaces.

pages: 100 words: 15,500

Getting Started with D3
by Mike Dewar
Published 26 Jun 2012

First, we lay out the circles and edges:

var width = 1500,
    height = 1500;

var svg = d3.select("body")
    .append("svg")
    .attr("width", width)
    .attr("height", height);

var node = svg.selectAll("circle.node")
    .data(data.nodes)
    .enter()
    .append("circle")
    .attr("class", "node")
    .attr("r", 12);

var link = svg.selectAll("line.link")
    .data(data.links)
    .enter()
    .append("line")
    .style("stroke", "black");

This populates the web page with the appropriate elements; we just need to lay them out. The force layout applies a force-directed algorithm to decide the position of each node. Here, each node feels a repulsive force from every other node, but is constrained by the edges that keep nodes connected together.

This can result in an organic layout that looks wonderfully inviting as it unfolds. D3 makes it easy; first we instantiate the algorithm:

var force = d3.layout.force()
    .charge(-120)
    .linkDistance(30)
    .size([width, height])
    .nodes(data.nodes)
    .links(data.links)
    .start();

These methods are all custom methods for the algorithm that detail the various parameters and references the algorithm needs to compute how the positions of the nodes and edges should change. We then use it to modify the appropriate attributes of our lines and circles:

force.on("tick", function() {
  link.attr("x1", function(d) { return d.source.x; })
      .attr("y1", function(d) { return d.source.y; })
      .attr("x2", function(d) { return d.target.x; })
      .attr("y2", function(d) { return d.target.y; });
  node.attr("cx", function(d) { return d.x; })
      .attr("cy", function(d) { return d.y; });
});

The layout algorithm generates a tick event, which corresponds to a single step of the layout algorithm.

pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
by Eric Redmond , Jim Wilson and Jim R. Wilson
Published 7 May 2012

For example, if the text of the article on Star Wars contains the string "[[Yoda|jedi master]]", we want to store that relationship twice—once as an outgoing link from Star Wars and once as an incoming link to Yoda. Storing the relationship twice means that it’s fast to look up both a page’s outgoing links and its incoming links. To store this additional link data, we’ll create a new table. Head over to the shell and enter this:

hbase> create 'links', {
  NAME => 'to', VERSIONS => 1, BLOOMFILTER => 'ROWCOL'
},{
  NAME => 'from', VERSIONS => 1, BLOOMFILTER => 'ROWCOL'
}

In principle, we could have chosen to shove the link data into an existing column family or merely added one or more additional column families to the wiki table, rather than create a new one. Creating a separate table has the advantage that the tables have separate regions.

The real strength of graph databases is traversing through the nodes by following relationships. In Chapter 7, Neo4J, we discuss the most popular graph database today, Neo4J. One operation where other databases often fall flat is crawling through self-referential or otherwise intricately linked data. This is exactly where Neo4J shines. The benefit of using a graph database is the ability to quickly traverse nodes and relationships to find relevant data. Often found in social networking applications, graph databases are gaining traction for their flexibility, with Neo4j as a pinnacle implementation.

$ curl -X PUT http://localhost:8091/riak/cages/2 \
  -H "Content-Type: application/json" \
  -H "Link: </riak/animals/ace>; riaktag=\"contains\", </riak/cages/1>; riaktag=\"next_to\"" \
  -d '{"room" : 101}'

What makes Links special in Riak is link walking (and a more powerful variant, linked mapreduce queries, which we investigate tomorrow). Getting the linked data is achieved by appending a link spec to the URL that is structured like this: /_,_,_. The underscores (_) in the URL represent wildcards for each of the link criteria: bucket, tag, keep. We’ll explain those terms shortly. First let’s retrieve all links from cage 1.

$ curl http://localhost:8091/riak/cages/1/_,_,_
--4PYi9DW8iJK5aCvQQrrP7mh7jZs
Content-Type: multipart/mixed; boundary=Av1fawIA4WjypRlz5gHJtrRqklD

--Av1fawIA4WjypRlz5gHJtrRqklD
X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fvrde/U5gymRMY+VwZw35gRfFgA=
Location: /riak/animals/polly
Content-Type: application/json
Link: </riak/animals>; rel="up"
Etag: VD0ZAfOTsIHsgG5PM3YZW
Last-Modified: Tue, 13 Dec 2011 17:53:59 GMT

{"nickname" : "Sweet Polly Purebred", "breed" : "Purebred"}
--Av1fawIA4WjypRlz5gHJtrRqklD--

--4PYi9DW8iJK5aCvQQrrP7mh7jZs--

It returns a multipart/mixed dump of headers plus bodies of all linked keys/values.

Cataloging the World: Paul Otlet and the Birth of the Information Age
by Alex Wright
Published 6 Jun 2014

One year after writing that essay, he established a company called MetaWeb that created Freebase, which he characterized as an “open, shared database of the world’s knowledge.” In 2010, he sold the company to Google, where its structured snippets now often complement traditional keyword-based search results. In recent years, the Linked Data movement has to some extent subsumed the Semantic Web initiative. Linked Data proposes more of a middle ground, in which ontologies might be derived programmatically from analyzing large data sets, rather than manually created by teams of experts. This middle way approach might incorporate some of Otlet’s ideas: a topical structure further refined by automated discovery, bidirectional linking, and the ability to extract content from static documents, then synthesize and interpolate it in new ways. In a widely circulated 2005 essay, “Ontology Is Overrated,” Clay Shirky argues that projects like the Semantic Web were doomed to failure in the Internet age.


pages: 430 words: 68,225

Blockchain Basics: A Non-Technical Introduction in 25 Steps
by Daniel Drescher
Published 16 Mar 2017

Since broken hash references serve as evidence that data were changed after the reference was created, the whole construct stores data in a change-sensitive manner.

How It Works

There are two classical patterns of using hash references in order to store data in a change-sensitive manner:

• The chain
• The tree

The Chain

A chain of linked data, also called a linked list, is formed when each piece of data also contains a hash reference to another piece of data. Such a structure is useful for storing and linking together data that are not fully available at one given point in time but instead arrive step by step in an ongoing fashion. Figure 11-4 illustrates this idea by using the symbols introduced above. The creation of such a chain starts with the piece of data labeled Data 1 and the creation of the hash reference R1.
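
A minimal sketch of such a hash-linked chain in Python (my illustration; the book itself stays non-technical): each record stores the hash of the previous record, so altering any earlier piece of data breaks every reference after it.

import hashlib
import json

def hash_ref(record):
    # Hash reference: a fingerprint of the serialized record.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

chain = [{"data": "Data 1", "prev": None}]
for payload in ["Data 2", "Data 3"]:
    chain.append({"data": payload, "prev": hash_ref(chain[-1])})

# Verification walks the chain and recomputes each reference.
for earlier, later in zip(chain, chain[1:]):
    assert later["prev"] == hash_ref(earlier), "chain broken: data was changed"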

Consensus Logic

Since all the nodes of the distributed system maintain their history of transaction data independently, their content can differ due to delays or other adversities of passing messages through a network. As a result, the data store that was meant to form a straight line of linked data blocks actually forms a tree-shaped data structure where each branch represents a conflicting version of the transaction history. The consensus logic, as depicted in Figure 21-6, makes all nodes of the system eventually consistent by making them choose the identical version of the transaction history that unites the most collective effort.

pages: 193 words: 19,478

Memory Machines: The Evolution of Hypertext
by Belinda Barnet
Published 14 Jul 2013

They would later have a profound influence over hypertext theory and criticism, and also the Storyspace system. From the outset, the nodes in Storyspace were called ‘writing spaces’, and it worked explicitly with topographic metaphors, incorporating a graphic ‘map view’ of the link data structure from the first version, along with a tree and an outline view (which are also visual representations of the data). ‘The tree’, Bolter tells us in Turing’s Man, ‘is a remarkably useful way of representing logical relations in spatial terms’ (Bolter 1984, 86). Also in line with the topographic metaphor, writing spaces in Storyspace acted (and still act) as containers for other writing spaces; an author literally ‘builds’ the space as she traverses it, zooming in and out to view details of the work, the map making the territory.

‘You’d tab a text and then you’d be able to associate notes with any particular word or phrase in the text […] an automated version of classical texts with notes’ (Bolter 2011). It wasn’t clickable because the IBM PC wasn’t clickable at the time; the user would move the cursor over the word and select it. This link data structure formed the basis for their future experiments ‘only in the sense that it had this quality of one text leading to another’ (Bolter 2011). In his well-researched chapter on afternoon, Matthew Kirschenbaum suggests that Storyspace has ‘significant grounding in a hierarchical data model’ (Kirschenbaum 2008, 173) that has its origins in the tree structures of ‘interactive fictions of the Adventure type’ (Kirschenbaum 2008, 175).

Guard fields are a powerful device, and one that Joyce deploys to full effect in afternoon. According to the Markle Report, Joyce ‘agitated’ for them to be included in the design of Storyspace from the outset, and Bolter quickly obliged in their fledgling program: It was just a matter of putting a field into the link data structure that would contain the guard, and then just checking that field […] against what the user did before they were allowed to follow the link […] It was [that] idea you know and it was Michael’s. (Bolter 2011) Guard fields, along with the topographic ‘spatial’ writing style, have remained integral to the Storyspace program for 30 years hence.
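
A hedged sketch of the guard-field idea in Python (names and structure are mine; Storyspace's actual implementation is not shown in this excerpt): the link record carries an extra field, and the system checks the reader's history against it before the link may be followed.

from dataclasses import dataclass, field

@dataclass
class Link:
    source: str
    target: str
    guard: set = field(default_factory=set)  # words the reader must have entered

def can_follow(link, words_entered):
    # Guard check: an empty guard always passes; otherwise the reader's
    # prior actions must satisfy it before the link is followed.
    return not link.guard or link.guard <= set(words_entered)

link = Link("afternoon-node-1", "afternoon-node-2", guard={"yes"})
print(can_follow(link, ["no"]))         # False: guard not satisfied
print(can_follow(link, ["no", "yes"]))  # True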

pages: 201 words: 63,192

Graph Databases
by Ian Robinson , Jim Webber and Emil Eifrem
Published 13 Jun 2013

Triple stores typically provide SPARQL capabilities to reason about stored RDF data. RDF—the lingua franca of triple stores and the Semantic Web—can be serialized in several ways. The listing “RDF encoding of a simple three-node graph” below shows the RDF/XML format. Here we see how triples come together to form linked data.

RDF encoding of a simple three-node graph.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.example.org/ter
  <rdf:Description rdf:about="http://www.example.org/ginger">
    <name>Ginger Rogers</name>
    <occupation>dancer</occupation>
    <partner rdf:resource="http://www.example.org/fred"/>
  </rdf:Description>

  <rdf:Description rdf:about="http://www.example.org/fred">
    <name>Fred Astaire</name>
    <occupation>dancer</occupation>
    <likes rdf:resource="http://www.example.org/ice-cream"/>
  </rdf:Description>
</rdf:RDF>

W3C support

That they produce logical representations of triples doesn’t mean triple stores necessarily have triple-like internal implementations. Most triple stores, however, are unified by their support for Semantic Web technology such as RDF and SPARQL. While there’s nothing particularly special about RDF as a means of serializing linked data, it is endorsed by the W3C and therefore benefits from being widely understood and well documented. The query language SPARQL benefits from similar W3C patronage. In the graph database space there is a similar abundance of innovation around graph serialization formats (e.g., GEOFF) and inferencing query languages (e.g., the Cypher query language that we use throughout this book). The key difference is that at this point these innovations do not enjoy the patronage of a well-regarded body like the W3C, though they do benefit from strong engagement within their user and vendor communities.
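
For a concrete feel, here is a hedged Python sketch (using rdflib, my choice of library) that builds the same Ginger/Fred graph from triples and runs a SPARQL query over it; the example.org URIs follow the excerpt above.

from rdflib import Graph, Literal, Namespace

ex = Namespace("http://www.example.org/")
g = Graph()
g.add((ex.ginger, ex.name, Literal("Ginger Rogers")))
g.add((ex.ginger, ex.occupation, Literal("dancer")))
g.add((ex.ginger, ex.partner, ex.fred))
g.add((ex.fred, ex.name, Literal("Fred Astaire")))

# SPARQL: find the name of Ginger's partner.
q = """
    PREFIX ex: <http://www.example.org/>
    SELECT ?name WHERE {
        ex:ginger ex:partner ?p .
        ?p ex:name ?name .
    }
"""
for row in g.query(q):
    print(row.name)  # -> Fred Astaire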

pages: 458 words: 116,832

The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism
by Nick Couldry and Ulises A. Mejias
Published 19 Aug 2019

In chapter 1 we noted the social credit system seen by the Chinese government as its route to “the modernization of social governance.” Meanwhile in India, the Aadhaar identity-card system is being made a requirement for access to welfare services, tax dealings, and even the online booking of train tickets. Through the operation of social caching, we are increasingly becoming data subjects whose responsiveness to data signals is expected, even taken as virtuous.

IoT = LAC? (Operationalizing Life’s Annexation to Capital)

The business opportunities from innovative extensions of social caching are multiplying, often in alliance with the state. Consider the cameras with linked data analytics now offered in the United States by Axon AI (formerly Taser) to replace law enforcement officers’ crime-scene reports; as one investor said, “Taser wants to be the Tesla or Apple of law enforcement.” Even in formal democracies, resource-strapped states will take advantage of these apparently risk-free methods for delegating their knowledge of hard-to-reach areas of the social world to algorithms.

To the transparent networks that slowly occlude the flow of all those aspects of nature and character that distinguish humans from elevator buttons and doorbells. . . . Haven’t you felt it? The loss of autonomy. The sense of being virtualized. All the coded impulses you depend on to guide you. All the sensors in the room that are watching you, listening to you, tracking your habits, measuring your capabilities. All the linked data designed to incorporate you into the megadata. Something, in other words, is going wrong with human autonomy. But, you might ask, isn’t the notion of autonomy (the self’s ability to govern its own life, deriving from the Greek words autos for self and nomos for law or rule) itself problematic?

We argued that underlying these was something even more fundamental: the drive to capitalize human life itself in all its aspects and build through this a new social and economic order that installs capitalist management as the privileged mode for governing every aspect of life. Put another way, and updating Marx for the Big Data age, human life becomes a direct factor in capitalist production. This annexation of human capital is what links data colonialism to the further expansion of capitalism. This is the fundamental cost of connection, and it is a cost being paid all over the world, in societies in which connection is increasingly imposed as the basis for participating in everyday life. The resulting order has important similarities whether we are discussing the United States, China, Europe, or Latin America.

pages: 58 words: 12,386

Big Data Glossary
by Pete Warden
Published 20 Sep 2011

It has been designed to make it easy to correct the most common errors you’ll encounter in human-created datasets. For example, it’s easy to spot and correct common problems like typos or inconsistencies in text values and to change cells from one format to another. There’s also rich support for linking data by calling APIs with the data contained in existing rows to augment the spreadsheet with information from external sources. Refine doesn’t let you do anything you can’t with other tools, but its power comes from how well it supports a typical extract and transform workflow. It feels like a good step up in abstraction, packaging processes that would typically take multiple steps in a scripting language or spreadsheet package into single operations with sensible defaults.

Data and the City
by Rob Kitchin,Tracey P. Lauriault,Gavin McArdle
Published 2 Aug 2017

London: Macmillan and Co. Cosgrove, D. (2001) Apollo’s Eye: A Cartographic Genealogy of the Earth in the Western Imagination. Baltimore, MD: Johns Hopkins University Press. Debruyne, C., Clinton, É., McNerney, L., Lavin, P. and O’Sullivan, D. (2017) ‘On the construction of a linked data platform for Ireland’s authoritative geospatial linked data’, available from: www.osi.ie/wp-content/uploads/2017/01/osi-eswc-2017-preprint.pdf [accessed 10 February 2017]. Dodge, M., Kitchin, R. and Perkins, C. (eds) (2009) Rethinking Maps: New Frontiers in Cartographic Theory. London: Routledge. Foucault, M. (2003) The Essential Foucault: Selections from Essential Works of Foucault, 1954–1984.

pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money
by Frank J. Ohlhorst
Published 28 Nov 2012

In both cases, the overarching principle is real-time data integration: reflecting data changes instantly in a data warehouse—whether they originate from a MapReduce job or from a transactional system—and creating downstream analytics that have an accurate, timely view of reality. Others are turning to linked data and semantics, where data sets are created using linking methodologies that focus on the semantics of the data. This fits well into the broader notion of pointing at external sources from within a data set, which has been around for quite a long time. That ability to point to unstructured data (whether residing in the file system or some external source) merely becomes an extension of the given capabilities, in which the ability to store and process XML and XQuery natively within an RDBMS enables the combination of different degrees of structure while searching and analyzing the underlying data.

Virtual Competition
by Ariel Ezrachi and Maurice E. Stucke
Published 30 Nov 2016

One possibility may be to focus on commercially sensitive information that, although publicly available, is of little or no value to customers but helps the competitors arrive at a supracompetitive price.37 Here the focus is on “cheap talk,” that is, data exchanges that facilitate conscious parallelism but are of limited use to customers. One problem, however, is in identifying such information. Part of the value of Big Data is data fusion, whereby computers link data sets, from which new insights emerge.38 Moreover, the data for some applications—such as customers sharing their inventory data with suppliers—can promote efficiency even while raising antitrust concerns.39 Even if the customers seek to limit what information can be shared, the algorithms—by analyzing a variety of data—could fill in the gaps.

cote=DSTI/ICCP(2012)9/FINAL&docLanguage=En, observing that “In some cases, big data is defined by the capacity to analyse a variety of mostly unstructured data sets from sources as diverse as web logs, social media, mobile communications, sensors and financial transactions. This requires the capability to link data sets; this can be essential as information is highly context-dependent and may not be of value out of the right context. It also requires the capability to extract information from unstructured data, i.e. data that lack a predefined (explicit or implicit) model.” 39. Stanford Graduate School of Business Staff, “Sharing Information to Boost the Bottom Line,” Insights by Stanford Business (March 1, 1999), http://www.gsb.stanford.edu/insights/sharing-information-boost-bottom-line.

pages: 262 words: 60,248

Python Tricks: The Book
by Dan Bader
Published 14 Oct 2017

But before we jump in, let’s cover some of the basics first. How do arrays work, and what are they used for? Arrays consist of fixed-size data records that allow each element to be efficiently located based on its index. Because arrays store information in adjoining blocks of memory, they’re considered contiguous data structures (as opposed to linked data structures like linked lists, for example). A real-world analogy for an array data structure is a parking lot: You can look at the parking lot as a whole and treat it as a single object, but inside the lot there are parking spots indexed by a unique number. Parking spots are containers for vehicles—each parking spot can either be empty or have a car, a motorbike, or some other vehicle parked on it.
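To make the book’s contrast concrete, here is a short Python sketch (an illustration, not from the book) comparing constant-time array indexing with walking a hand-rolled linked list; the parking-lot data is invented:

    class Node:
        """One record in a singly linked list: a value plus a link to the next node."""
        def __init__(self, value, next_node=None):
            self.value = value
            self.next = next_node

    # Contiguous "parking lot": any spot is reachable directly by its index.
    lot = ["car", "motorbike", None, "van"]
    print(lot[3])          # O(1): jump straight to spot 3 -> "van"

    # Linked structure: reaching spot 3 means following the chain of links.
    head = Node("car", Node("motorbike", Node(None, Node("van"))))
    node = head
    for _ in range(3):     # O(n): one hop per link
        node = node.next
    print(node.value)      # "van"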

pages: 680 words: 157,865

Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design
by Diomidis Spinellis and Georgios Gousios
Published 30 Dec 2008

There is no central coordination, and we are free to document our wandering by republishing our stories, thoughts, and journeys as we go. We think of the Web as a series of one-way links between documents (see Figure 5-1, “Conventional notion of the Web”). Linked documents are only part of the picture, however. The vision for the Web always included the idea of linked data as well. This content can be consumed through a rendered view or directly referenced and manipulated in preferred forms in different contexts. You can imagine a middle-tier layer asking for information as an XML document while the presentation tier prefers a JSON object via an AJAX call. The same name refers to the same data in different forms.
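The “same name, different forms” idea is ordinary HTTP content negotiation. A minimal Python sketch using only the standard library; the resource URL is hypothetical, standing in for any server that honors the Accept header:

    import urllib.request

    RESOURCE = "https://example.org/reports/2008/q4"   # hypothetical resource name

    def fetch(accept):
        """Request the same resource, asking for a particular representation."""
        req = urllib.request.Request(RESOURCE, headers={"Accept": accept})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    xml_form = fetch("application/xml")    # what a middle tier might ask for
    json_form = fetch("application/json")  # what an AJAX caller might prefer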

For the more difficult aspects of establishing the correctness of a design or implementation, the advantage of the functional approach is not so clear. For example, proving that a recursive definition has specific properties and terminates requires the equivalent of a loop invariant and variant. It is also unlikely that efficient functional programs can afford to renounce programmer-visible linked data structures, with all the resulting problems such as aliasing, which are challenging regardless of the underlying programming model. If functional programming fails to bring a significant simplification to the task of establishing correctness, there remains a major practical argument: referential transparency.

The Data Journalism Handbook
by Jonathan Gray, Lucy Chambers and Liliana Bounegru
Published 9 May 2012

While we are all either a journalist, designer, or developer “first,” we continue to work hard to increase our understanding and proficiency in each other’s areas of expertise. The core products for exploring data are Excel, Google Docs, and Fusion Tables. The team has also, but to a lesser extent, used MySQL, Access databases, and Solr to explore larger datasets; and used RDF and SPARQL to begin looking at ways in which we can model events using Linked Data technologies. Developers will also use their programming language of choice, whether that’s ActionScript, Python, or Perl, to match, parse, or generally pick apart a dataset we might be working on. Perl is used for some of the publishing. We use Google, Bing Maps, and Google Earth, along with Esri’s ArcMAP, for exploring and visualizing geographical data.
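As a small illustration of the RDF-and-SPARQL approach the team mentions (not the Guardian’s actual code), the sketch below uses the third-party rdflib package; the ex: namespace and the event’s properties are invented:

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")
    g = Graph()

    # Model an event as a resource with typed properties.
    g.add((EX.budget_vote, EX.eventType, Literal("parliamentary vote")))
    g.add((EX.budget_vote, EX.date, Literal("2011-03-29")))
    g.add((EX.budget_vote, EX.involves, EX.treasury))

    # SPARQL can then pick events back out by their properties.
    rows = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?event ?date
        WHERE { ?event ex:eventType ?type ; ex:date ?date . }
    """)
    for event, date in rows:
        print(event, date)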

pages: 224 words: 13,238

Electronic and Algorithmic Trading Technology: The Complete Guide
by Kendall Kim
Published 31 May 2007

However, most financial services institutions do not have the ability to reach an optimal infrastructure because resources for most of a brokerage firm’s cost center have fallen victim to applying discretionary funds within the profit center such as the trading area of the business. It is clearly evident that budgets for data infrastructure have been reduced in the past years when the need for enhancing performance and technology has never been greater. Presumably, this will change in the future, though, when linking data to trading profitability becomes more evident. 8.5 Impact on Operations and Technology Real-time transaction processing and electronic trading can result in a great deal of automation for operations. Real-time transactions move more quickly, tend to be more accurate, have fewer problems, and need less attention than manually engaged transactions.

Algorithms in C++ Part 5: Graph Algorithms
by Robert Sedgewick
Published 2 Jan 1992

Indeed, the first algorithms that we considered in detail, the union-find algorithms in Chapter 1, are prime examples of graph algorithms. We also used graphs in Chapter 3 as an illustration of applications of two-dimensional arrays and linked lists, and in Chapter 5 to illustrate the relationship between recursive programs and fundamental data structures. Any linked data structure is a representation of a graph, and some familiar algorithms for processing trees and other linked structures are special cases of graph algorithms. The purpose of this chapter is to provide a context for developing an understanding of graph algorithms ranging from the simple ones in Part 1 to the sophisticated ones in Chapters 18 through 22.

The primary disadvantage is that testing for the existence of specific edges can take time proportional to V, as opposed to constant time in the adjacency matrix. These differences trace, essentially, to the difference between using linked lists and vectors to represent the set of vertices incident on each vertex. Thus, we see again that an understanding of the basic properties of linked data structures and vectors is critical if we are to develop efficient graph ADT implementations. Our interest in these performance differences is that we want to avoid implementations that are inappropriately inefficient under unexpected circumstances when a wide range of operations is to be demanded of the ADT.
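The trade-off reads directly in code. A small Python sketch (an illustration, not Sedgewick’s C++ ADTs) of both representations and their edge-existence tests, for an invented graph with V = 4 vertices:

    V = 4
    edges = [(0, 1), (0, 2), (2, 3)]

    # Adjacency matrix: V*V entries, edge test in constant time.
    matrix = [[False] * V for _ in range(V)]
    for u, v in edges:
        matrix[u][v] = matrix[v][u] = True
    print(matrix[0][2])    # O(1) edge test -> True

    # Adjacency lists: space proportional to V + E, but an edge test
    # may scan a whole list (up to V entries in the worst case).
    adj = [[] for _ in range(V)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    print(3 in adj[2])     # O(degree) edge test -> True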

pages: 318 words: 73,713

The Shame Machine: Who Profits in the New Age of Humiliation
by Cathy O'Neil
Published 15 Mar 2022

Or likewise you might get punished for littering in the subway or denigrating the ruling party online. Your various infractions might also be announced, by name, on Weibo or WeChat, internet giants in China. No matter where we live, some of us fare far better than others in our relations with the expanding network linking data to shame and stigma. The easiest people to exploit tend to be the most desperate, the ones who lack the money, the knowledge, or the leisure time to tend to the digital baggage that trails them, or simply those who have traditionally been treated badly. These are folks who are disproportionately poor or otherwise marginalized and have the least control over their identities.

pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms
by Mehmed Kantardzić
Published 2 Jan 2003

Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments, and contacts, or in bibliographic domains describing publications, authors, and venues. Graph-mining techniques explicitly consider these links when building predictive or descriptive models of the linked data. The requirement of different applications with graph-based data sets is not very uniform. Thus, graph models and mining algorithms that work well in one domain may not work well in another. For example, chemical data is often represented as graphs in which the nodes correspond to atoms, and the links correspond to bonds between the atoms.

Therefore, a labeled graph G consists of three sets of information: G(N,L,V), where the new component V = {v1, v2, … , vt} is a set of values attached to links. An example of a directed graph is given in Figure 12.2b, while the graph in Figure 12.2c is a labeled graph. Different applications use different types of graphs in modeling linked data. In this chapter the primary focus is on undirected and unlabeled graphs, although the reader should be aware that there are numerous graph-mining algorithms for directed and/or labeled graphs. Besides a graphical representation, each graph may be presented in the form of the incidence matrix I(G), where nodes index the rows and links index the columns.
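As an invented illustration of that definition, a few lines of Python build I(G) for a small undirected, unlabeled graph; entry [i][j] is 1 exactly when node i is an endpoint of link j:

    nodes = ["n1", "n2", "n3", "n4"]
    links = [("n1", "n2"), ("n2", "n3"), ("n2", "n4")]

    index = {n: i for i, n in enumerate(nodes)}    # node -> row number
    I = [[0] * len(links) for _ in nodes]          # rows: nodes, columns: links
    for j, (a, b) in enumerate(links):
        I[index[a]][j] = 1
        I[index[b]][j] = 1

    for row in I:
        print(row)
    # [1, 0, 0]   n1 is incident only to link 0
    # [1, 1, 1]   n2 is an endpoint of every link
    # [0, 1, 0]
    # [0, 0, 1]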

pages: 288 words: 85,073

Factfulness: Ten Reasons We're Wrong About the World – and Why Things Are Better Than You Think
by Hans Rosling, Ola Rosling and Anna Rosling Rönnlund
Published 2 Apr 2018

We presented at the ceremony for their new Open Data platform in May 2010, and since then the World Bank has become the main access point for reliable global statistics; see gapm.io/x6. This was all possible thanks to Tim Berners-Lee and other early visionaries of the free internet. Sometime after he had invented the World Wide Web, Tim Berners-Lee contacted us, asking to borrow a slide show that showed how a web of linked data sources could flourish (using an image of pretty flowers). We share all of our content for free, so of course we said yes. Tim used this “flower-powerpoint” in his 2009 TED talk—see gapm.io/x6—to help people see the beauty of “The Next Web,” and he uses Gapminder as an example of what happens when data from multiple sources come together; see Berners-Lee (2009).

pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists
by Gary Marcus and Jeremy Freeman
Published 1 Nov 2014

While efforts to map the brain have begun as public, government-funded projects, this does not mean that private entities will not enter the arena and seek to compete with those projects. Although initial efforts to map the brain may be fueled by public funds, the issue of how “fine-tuned” information (information that can be used to determine risk factors or emerging disease states in individuals’ brains, and that will require linking data to genetic databases, health records, and health databases) will be handled merits discussion now. What rules will govern the sharing of detailed scans or maps about each individual’s brain? Can data be linked from a brain scan to a genome to a database without an individual’s express consent if that person’s identity is not 100 percent secure?

pages: 374 words: 94,508

Infonomics: How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage
by Douglas B. Laney
Published 4 Sep 2017

• Information accessibility
• User request turnaround time
• User satisfaction survey

Agility: The ability to respond to external influences, and the ability to respond to marketplace changes to gain or maintain competitive advantage. SCOR agility metrics include flexibility and adaptability.
• Utility of information for a range of purposes
• Linked data, metadata, and master data measures
• Ease of integrating new types of data or changing dimensions

Costs: The cost of operating the supply chain processes. This includes labor costs, material costs, management, and transportation costs. A typical cost metric is cost of goods sold.
• Data acquisition cost
• Data management costs
• Data delivery costs
(Each includes labor- and technology-related costs.)

Asset Management Efficiency (Assets): The ability to efficiently utilize assets.

Future Files: A Brief History of the Next 50 Years
by Richard Watson
Published 1 Jan 2008

Trends that will transform transport: 5. Embedded intelligence. Cars can already be opened or started using fingerprint and iris recognition, so we’ll see more technologies linking vehicle security to user identification. We will also see mood-sensitive vehicles that adjust their behavior according to the mood of the driver or occupants. Cars will also become mobile technology platforms linking data to other services such as healthcare. For example, if your car regularly detects an abnormal heartbeat or high levels of stress, this information could be sent wirelessly to your doctor. Obviously privacy issues abound, but cars could become useful data-collection and delivery points. Remote monitoring: Electronic data recorders are little black boxes that already sit covertly inside some cars and monitor your speed, acceleration and braking.

pages: 356 words: 102,224

Pale Blue Dot: A Vision of the Human Future in Space
by Carl Sagan
Published 8 Sep 1997

You reach out your arm to pick up something shiny in the soil, and the robot arm does likewise. The sands of Mars trickle through your fingers. The only difficulty with this remote reality technology is that all this must occur in tedious slow motion: The round-trip travel time of the up-link commands from Earth to Mars and the down-link data returned from Mars to Earth might take half an hour or more. But this is something we can learn to do. We can learn to contain our exploratory impatience if that's the price of exploring Mars. The rover can be made smart enough to deal with routine contingencies. Anything more challenging, and it makes a dead stop, puts itself into a safeguard mode, and radios for a very patient human controller to take over.

pages: 313 words: 101,403

My Life as a Quant: Reflections on Physics and Finance
by Emanuel Derman
Published 1 Jan 2004

While I was away on a two-week beach vacation at Fire Island with my family, Ed suddenly threw himself into redesigning and then rewriting the entire system—without giving me advance notice. I returned to a fait accompli, a completely new, enhanced, and almost unrecognizable APL-flavored version of the language. Ed's version now incorporated vastly complex dynamically linked data structures, whose details I knew I would not live long enough to master. Ed had also cleverly modified HEQS so that, once you had used it interactively to develop and solve a financial model, you could then use it to generate a C program that would solve your equations many times faster. Programming came naturally to Ed in a way it never would to me, and his proficiency daunted me.

pages: 348 words: 97,277

The Truth Machine: The Blockchain and the Future of Everything
by Paul Vigna and Michael J. Casey
Published 27 Feb 2018

You could say these “cloud” services are much truer to that name than those of Amazon Web Services, Google, Dropbox, IBM, Oracle, Microsoft, and Apple, the providers with which most people associate that word. But even bigger changes are being considered, including projects to entirely re-architect the Web itself. There’s Solid, which stands for Social Linked Data, a new protocol for data storage that puts data back in the hands of the people to whom it belongs. The core idea is that we will store our data in Pods (Personalized Online Data Stores) and distribute it to applications via permissions we control. Solid is the brainchild of none other than Tim Berners-Lee, the computer scientist who perfected HTTP and gave us the World Wide Web.

pages: 352 words: 98,561

The City
by Tony Norfield

The City’s status as a major dealing centre is solidly based on its connections with the rest of the world and its ability to act as an intermediary for global flows of money-capital and credit. Major flows of finance in the form of deposits, loans, and the purchase and sale of securities between UK-based banks and the rest of the world are intermediated by banks outside the UK, but many of these are UK-linked. Data from the Bank of England enable these links to be examined in some detail, and they highlight a key role of the UK banking system, one that has not been analysed before. These data are shown in Table 8.6.22 The figures are in US dollars, since this is the main currency used in the transactions, and they measure the outstanding valuations of bank assets and liabilities.

pages: 350 words: 109,521

Our 50-State Border Crisis: How the Mexican Border Fuels the Drug Epidemic Across America
by Howard G. Buffett
Published 2 Apr 2018

Anderson’s work directly, but now we support it through a nonprofit called the Colibri Center for Human Rights that works with the medical examiner’s office to identify these remains and provide closure for families regardless of the origins of the deceased. For example, we funded an international geographic information system (GIS) initiative in Pima County to link data from missing person reports to postmortem reports. We agree with Anderson and Colibri that respect for the dead is one measure of a civilized society. Is it civilized to view the “mortal danger” of the desert as a deterrent? Should it give us pause that before Operation Gatekeeper funneled immigrants to the desert, there were only about twelve bodies per year recovered along the border?

pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future
by Kevin Kelly
Published 6 Jun 2016

Slowly but surely Amazon’s cloud and Google’s cloud and Facebook’s cloud and all the other enterprise clouds are intertwining into one massive cloud that acts as a single cloud—The Cloud—to the average user or company. A counterforce resisting this merger is that an intercloud requires commercial clouds to share their data (a cloud is a network of linked data), and right now data tends to be hoarded like gold. Data hoards are seen as a competitive advantage, and sharing data freely is hampered by laws, so it will be many years (decades?) before companies learn how to share their data creatively, productively, and responsibly. There is one final step in the inexorable march toward decentralized access.

pages: 371 words: 107,141

You've Been Played: How Corporations, Governments, and Schools Use Games to Control Us All
by Adrian Hon
Published 14 Sep 2022

No one would mistake the clean lines of my flowcharts for the snarl of links that makes up the Q-Web, a notorious QAnon chart crammed with hundreds of supposedly connected things like #MeToo, Monsanto, and J. Edgar Hoover, but the principles are similar: one discovery leads to the next.10 Of course, these two flowcharts are very different beasts. The Q-Web is an imaginary, retrospective description of spuriously linked data, while my flowcharts were a prescriptive network of events completely orchestrated by my team. Except that’s not quite true. In reality, Perplex City players didn’t always solve our puzzles as quickly as we intended them to, or they became convinced their incorrect solution was correct, or, embarrassingly, our puzzles were broken and had no solution at all.

Remix
by John Courtenay Grimwood
Published 15 Nov 2001

But Lady Clare had insisted, reeling off a list that began with the Antiguan Absolutists and ended with Zebediah Nouveau. Mind you, he didn’t hate standing inside that circle as much as he hated being there at all. But Lady Clare had insisted on that as well. Keeping her good side to the main CySat camera, Lady Clare smiled. It was amazing how much clout you carried when you’d linked data credits to gold reserves to keep the senior officers loyal, welcomed the UN Pax Force with open arms, arranged for Paris to be the first European city overflown with the new ‘dote and put some backbone into the Prince Imperial. This was the General’s payback, and as far as Lady Clare was concerned it was a small price.

pages: 404 words: 43,442

The Art of R Programming
by Norman Matloff

If implemented in C, a tree node would be represented by a C struct, similar to an R list, whose contents are the stored value, a pointer to the left child, and a pointer to the right child. But since R lacks pointer variables, what can we do? Our solution is to go back to the basics. In the old prepointer days in FORTRAN, linked data structures were implemented in long arrays. A pointer, which in C is a memory address, was an array index instead. Specifically, we’ll represent each node by a row in a three-column matrix. The node’s stored value will be in the third element of that row, while the first and second elements will be the left and right links.
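The same trick carries over to any language without pointers. Below is a rough Python rendering of the idea (Matloff’s own code is in R, and this sketch is not it): each node is a row [left, right, value] in a growing table, row indices play the role of links, and 0 means “no child”:

    tree = [[0, 0, None]]      # row 0 is a dummy, so real nodes start at index 1

    def insert(value):
        """Insert a value into the binary search tree stored as a row table."""
        tree.append([0, 0, value])
        new = len(tree) - 1
        if new == 1:
            return                     # first real node becomes the root
        row = 1                        # start the descent at the root
        while True:
            side = 0 if value < tree[row][2] else 1   # column 0: left, 1: right
            if tree[row][side] == 0:
                tree[row][side] = new  # the "pointer" is just a row index
                return
            row = tree[row][side]

    for v in [8, 5, 20, 3]:
        insert(v)
    print(tree)   # [[0, 0, None], [2, 3, 8], [4, 0, 5], [0, 0, 20], [0, 0, 3]]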

pages: 400 words: 121,988

Trading at the Speed of Light: How Ultrafast Algorithms Are Transforming Financial Markets
by Donald MacKenzie
Published 24 May 2021

Interviewee CV gave, as an example of this highly demanding form of trading—“taking in the world’s information and being able to translate that to predict the next tick [price movement]”—an algorithm trading 10-year US Treasury futures in the Chicago Mercantile Exchange’s datacenter. The algorithm will take into account the pattern of bids, offers, and trades in those futures, as well as patterns in the trading of the other Treasury and interest-rate futures also traded in that datacenter. The algorithm will receive, via microwave links, data on the buying and selling of the underlying Treasurys, which are traded in the two datacenters in New Jersey shown in the map in figure 4.1. Via Hibernia Atlantic’s ultrafast transatlantic cable, it will receive data on the trading of futures on UK sovereign bonds (these futures are traded in a datacenter just outside of London) and the equivalent German futures, traded in a datacenter in Frankfurt called FR2.

pages: 424 words: 123,180

Democracy's Data: The Hidden Stories in the U.S. Census and How to Read Them
by Dan Bouk
Published 22 Aug 2022

Even more remarkable is the thought that it will likely continue to exist as long as there is a United States of America, maybe even longer. The census began as a relatively simple tool to tether political clout to each state’s head count. The framers of the Constitution and their Enlightenment-era values linked data to democracy and democracy to data. Each state’s say in governing the country would henceforth be proportional to its official population. By 1940, the census had developed into an extensive stocktaking of the American people, a picture of who they were, where they came from, what they did, and how they lived.

The Art of Computer Programming: Fundamental Algorithms
by Donald E. Knuth
Published 1 Jan 1974

The proper way to design a library is heavily dependent upon the computer used and the applications to be handled. Large modern computers require an entirely different approach to subroutine libraries. But this is a nice exercise anyway, because it involves interesting manipulations on both sequential and linked data.) The problem in this exercise is to design an algorithm for the stated task. Your allocator may transform the tape directory in any way as it prepares its answer, since the tape directory can be read in anew by the subroutine allocator on its next assignment, and the tape directory is not needed by other parts of the loading routine. 27. [25] Write a MIX program for the subroutine allocation algorithm of exercise 26. 28. [40] The following construction shows how to "solve" a fairly general type of two-person game, including chess, nim, and many simpler games: Consider a finite set of nodes, each of which represents a possible position in the game.

The first algorithm we require is one that builds the Data Table in such a form. Note the flexibility in choice of level numbers that is allowed by the COBOL rules; the left structure in D) is completely equivalent to 1 A 2 B 3 C 3 D 2 E 2 F 3 G because level numbers do not have to be sequential.

[Table omitted: a Symbol Table whose LINK fields point into a Data Table with PREV, PARENT, NAME, CHILD, and SIB fields for the entries A through H; empty boxes indicate additional information not relevant here.]

pages: 505 words: 133,661

Who Owns England?: How We Lost Our Green and Pleasant Land, and How to Take It Back
by Guy Shrubsole
Published 1 May 2019

Part of the problem is that the data on what companies own still isn’t good enough to prove whether or not land banking is occurring. Anna has tried to map the land owned by housing developers, but has been thwarted by the lack in the Land Registry’s corporate dataset of the necessary information to link data on who owns a site with digital maps of that area. That makes it very hard to assess, for example, whether a piece of land owned by a housebuilder for decades is a prime site accruing in value or a leftover fragment of ground from a past development. Second, the scope of Letwin’s review was drawn too narrowly to examine the wider problem of land banking by landowners beyond the major housebuilders.

pages: 494 words: 142,285

The Future of Ideas: The Fate of the Commons in a Connected World
by Lawrence Lessig
Published 14 Jul 2001

For a time, one could find an extraordinary range of songs archived throughout the Web. Slowly these services have migrated to commercial sites. This migration means the commercial sites can support the costs of developing and maintaining this information. And in some cases, with some databases, the Internet provided a simple way to collect and link data about music in particular.8 Here the CDDB—or “CD database”—is the most famous example. As MP3 equipment became common, people needed a simple way to get information about CD titles and tracks onto the MP3 device. Of course, one could type in that information, but why should everyone have to type in that information?

pages: 528 words: 146,459

Computer: A History of the Information Machine
by Martin Campbell-Kelly and Nathan Ensmenger
Published 29 Jul 2013

was already well established when two other Stanford University doctoral students, Larry Page and Sergey Brin, began work on the Stanford Digital Library Project (funded in part by the National Science Foundation)—research that would not only forever change the process of finding things on the Internet but also, in time, lead to an unprecedentedly successful web advertising model. Page became interested in a dissertation project on the mathematical properties of the web, and found strong support from his adviser Terry Winograd, a pioneer of artificial intelligence research on natural language processing. Using a “web crawler” to gather back-link data (that is, the websites that linked to a particular site), Page, now teamed up with Brin, created their “PageRank” algorithm based on back-links ranked by importance—the more prominent the linking site, the more influence it would have on the linked site’s page rank. They insightfully reasoned that this would provide the basis for more useful web searches than any existing tools and, moreover, that there would be no need to hire a corps of indexing staff.
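The back-link recurrence can be sketched in a few lines of Python. This is a simplified illustration of the published PageRank idea (with the commonly cited damping factor of 0.85), not Google’s production algorithm, and the four-page link graph is invented:

    links = {                  # page -> pages it links out to
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    d = 0.85

    for _ in range(50):        # power iteration until roughly converged
        new = {}
        for p in pages:
            # Each in-link contributes the linking page's rank, diluted
            # by how many out-links that page has.
            backlink_mass = sum(rank[q] / len(links[q])
                                for q in pages if p in links[q])
            new[p] = (1 - d) / len(pages) + d * backlink_mass
        rank = new

    print(sorted(rank.items(), key=lambda kv: -kv[1]))   # "c" ends up most prominent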

pages: 598 words: 134,339

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World
by Bruce Schneier
Published 2 Mar 2015

., 160 fiduciary responsibility, data collection and, 204–5 50 Cent Party, 114 FileVault, 215 filter bubble, 114–15 FinFisher, 81 First Unitarian Church of Los Angeles, 91 FISA (Foreign Intelligence Surveillance Act; 1978), 273 FISA Amendments Act (2008), 171, 273, 275–76 Section 702 of, 65–66, 173, 174–75, 261 FISA Court, 122, 171 NSA misrepresentations to, 172, 337 secret warrants of, 174, 175–76, 177 transparency needed in, 177 fishing expeditions, 92, 93 Fitbit, 16, 112 Five Eyes, 76 Flame, 72 FlashBlock, 49 flash cookies, 49 Ford Motor Company, GPS data collected by, 29 Foreign Intelligence Surveillance Act (FISA; 1978), 273 see also FISA Amendments Act Forrester Research, 122 Fortinet, 82 Fox-IT, 72 France, government surveillance in, 79 France Télécom, 79 free association, government surveillance and, 2, 39, 96 freedom, see liberty Freeh, Louis, 314 free services: overvaluing of, 50 surveillance exchanged for, 4, 49–51, 58–59, 60–61, 226, 235 free speech: as constitutional right, 189, 344 government surveillance and, 6, 94–95, 96, 97–99 Internet and, 189 frequent flyer miles, 219 Froomkin, Michael, 198 FTC, see Federal Trade Commission, US fusion centers, 69, 104 gag orders, 100, 122 Gamma Group, 81 Gandy, Oscar, 111 Gates, Bill, 128 gay rights, 97 GCHQ, see Government Communications Headquarters Geer, Dan, 205 genetic data, 36 geofencing, 39–40 geopolitical conflicts, and need for surveillance, 219–20 Georgia, Republic of, cyberattacks on, 75 Germany: Internet control and, 188 NSA surveillance of, 76, 77, 122–23, 151, 160–61, 183, 184 surveillance of citizens by, 350 US relations with, 151, 234 Ghafoor, Asim, 103 GhostNet, 72 Gill, Faisal, 103 Gmail, 31, 38, 50, 58, 219 context-sensitive advertising in, 129–30, 142–43 encryption of, 215, 216 government surveillance of, 62, 83, 148 GoldenShores Technologies, 46–47 Goldsmith, Jack, 165, 228 Google, 15, 27, 44, 48, 54, 221, 235, 272 customer loyalty to, 58 data mining by, 38 data storage capacity of, 18 government demands for data from, 208 impermissible search ad policy of, 55 increased encryption by, 208 as information middleman, 57 linked data sets of, 50 NSA hacking of, 85, 208 PageRank algorithm of, 196 paid search results on, 113–14 search data collected by, 22–23, 31, 123, 202 transparency reports of, 207 see also Gmail Google Analytics, 31, 48, 233 Google Calendar, 58 Google Docs, 58 Google Glass, 16, 27, 41 Google Plus, 50 real name policy of, 49 surveillance by, 48 Google stalking, 230 Gore, Al, 53 government: checks and balances in, 100, 175 surveillance by, see mass surveillance, government Government Accountability Office, 30 Government Communications Headquarters (GCHQ): cyberattacks by, 149 encryption programs and, 85 location data used by, 3 mass surveillance by, 69, 79, 175, 182, 234 government databases, hacking of, 73, 117, 313 GPS: automobile companies’ use of, 29–30 FBI use of, 26, 95 police use of, 26 in smart phones, 3, 14 Grayson, Alan, 172 Great Firewall (Golden Shield), 94, 95, 150–51, 187, 237 Greece, wiretapping of government cell phones in, 148 greenhouse gas emissions, 17 Greenwald, Glenn, 20 Grindr, 259 Guardian, Snowden documents published by, 20, 67, 149 habeas corpus, 229 hackers, hacking, 42–43, 71–74, 216, 313 of government databases, 73, 117, 313 by NSA, 85 privately-made technology for, 73, 81 see also cyberwarfare Hacking Team, 73, 81, 149–50 HAPPYFOOT, 3 Harris Corporation, 68 Harris Poll, 96 Hayden, Michael, 23, 147, 162 health: effect of constant surveillance on, 127 mass surveillance and, 16, 
41–42 healthcare data, privacy of, 193 HelloSpy, 3, 245 Hewlett-Packard, 112 Hill, Raquel, 44 hindsight bias, 322 Hobbes, Thomas, 210 Home Depot, 110, 116 homosexuality, 97 Hoover, J.

The Art of Computer Programming: Sorting and Searching
by Donald Ervin Knuth
Published 15 Jan 1998

Example of Wheeler's tree insertion scheme. structure slightly with "two-way insertion" cuts the number of moves down to about ⅛N². Shellsort cuts the number of comparisons and moves to about N^(7/6), for N in a practical range; as N → ∞ this number can be lowered to order N(log N)². Another way to improve on Algorithm S, using a linked data structure, gave us the list insertion method, which does about ¼N² comparisons, 0 moves, and 2N changes of links. Is it possible to marry the best features of these methods, reducing the number of comparisons to order N log N as in binary insertion, yet reducing the number of moves as in list insertion?

An alert, "modern" reader will note, however, that the whole idea of making digit counts for the storage allocation is tied to old-fashioned ideas about sequential data representation. We know that linked allocation is specifically designed to handle a set of tables of variable size, so it is natural to choose a linked data structure for radix sorting. Since we traverse each pile serially, all we need is a single link from each item to its successor.

Table 1. RADIX SORTING
Input area contents: 503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
Counts for units digit distribution: 1 1 2 3 1 2 1 3 1 1
Storage allocations based on these counts: 1 2 4 7 8 10 11 14 15 16
Auxiliary area contents: 170 061 512 612 503 653 703 154 275 765 426 087 897 677 908 509
Counts for tens digit distribution: 4 2 1 0 0 2 2 3 1 1
Storage allocations based on these counts: 4 6 7 7 7 9 11 14 15 16
Input area contents: 503 703 908 509 512 612 426 653 154 061 765 170 275 677 087 897
Counts for hundreds digit distribution: 2 2 1 0 1 3 3 2 1 1
Storage allocations based on these counts: 2 4 5 5 6 9 12 14 15 16
Auxiliary area contents: 061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908
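The count-and-allocate steps in Table 1 translate directly into a short program. Here is a minimal Python sketch of least-significant-digit radix sort, mirroring the table’s logic rather than Knuth’s MIX implementation; the data is the table’s sixteen keys:

    def radix_sort(keys, digits=3):
        for k in range(digits):                 # units, then tens, then hundreds
            div = 10 ** k
            counts = [0] * 10                   # "counts for digit distribution"
            for key in keys:
                counts[(key // div) % 10] += 1
            alloc, total = [], 0                # "storage allocations": running totals
            for c in counts:
                total += c
                alloc.append(total)
            out = [0] * len(keys)
            for key in reversed(keys):          # backwards pass keeps the sort stable
                digit = (key // div) % 10
                alloc[digit] -= 1
                out[alloc[digit]] = key
            keys = out
        return keys

    data = [503, 87, 512, 61, 908, 170, 897, 275,
            653, 426, 154, 509, 612, 677, 765, 703]
    print(radix_sort(data))                     # ascending, as in the last table row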

In the Age of the Smart Machine
by Shoshana Zuboff
Published 14 Apr 1988

Ironically, it means creating a doubly abstract world, where the reference function of the electronic symbols becomes less problematic because of yet another layer of abstractions (mental images) called up to serve as referents. Operators did not appear equally adept at generating an inward image.7 Many seemed unable to link data on the screen to a referential reality. Their interactions with the data were confined to the two-dimensional space of the terminal screen; the electronic symbols were deciphered according to the varying patterns in which they were arrayed. Typically, when asked what the data on the screen meant, these operators would point to distinct data elements and discuss them in terms of their spatial relationships on the screen, as if there were no external referents.

pages: 834 words: 180,700

The Architecture of Open Source Applications
by Amy Brown and Greg Wilson
Published 24 May 2011

If the data changes from one execution to another, a new version is checked in to the repository. Thus, the (uuid, version) tuple is a compound identifier to retrieve the data in any state. In addition, we store the hash of the data as well as the signature of the upstream portion of the workflow that generated it (if it is not an input). This allows one to link data that might be identified differently as well as reuse data when the same computation is run again. The main concern when designing this package was the way users were able to select and retrieve their data. Also, we wished to keep all data in the same repository, regardless of whether it is used as input, output, or intermediate data (an output of one workflow might be used as the input of another).
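A minimal sketch of that identification scheme, with invented class and method names: data is addressed by a (uuid, version) pair, and a content hash lets a rerun that produces identical bytes be linked to the stored copy instead of duplicated:

    import hashlib
    import uuid

    class Repository:
        def __init__(self):
            self.store = {}      # (uuid, version) -> (hash, data)
            self.by_hash = {}    # hash -> (uuid, version), for reuse across runs

        def check_in(self, uid, data):
            """Store data under uid, or return the key of an identical copy."""
            h = hashlib.sha256(data).hexdigest()
            if h in self.by_hash:                 # same bytes seen before: reuse
                return self.by_hash[h]
            version = sum(1 for (u, _) in self.store if u == uid) + 1
            key = (uid, version)
            self.store[key] = (h, data)
            self.by_hash[h] = key
            return key

    repo = Repository()
    uid = uuid.uuid4()
    v1 = repo.check_in(uid, b"run 1 output")
    v2 = repo.check_in(uid, b"run 2 output")      # data changed: new version
    again = repo.check_in(uid, b"run 1 output")
    print(v1, v2, again == v1)                    # identical data is linked, not copied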

pages: 933 words: 205,691

Hadoop: The Definitive Guide
by Tom White
Published 29 May 2009

This information is not readily available when crawling. Also, the indexing process benefits from taking into account the anchor text on inlinks so that this text may semantically enrich the text of the current page. As mentioned earlier, Nutch collects the outlink information and then uses this data to build a LinkDb, which contains this reversed link data in the form of inlinks and anchor text. This section presents a rough outline of the implementation of the LinkDb tool—many details have been omitted (such as URL normalization and filtering) in order to present a clear picture of the process. What’s left gives a classical example of why the MapReduce paradigm fits so well with the key data transformation processes required to run a search engine.
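Stripped of Hadoop, the inversion itself is simple; a rough Python sketch of the map and reduce steps (the URLs and anchors are invented, and details such as normalization and filtering are omitted, as in the outline above):

    from collections import defaultdict

    # What the crawler saw: page -> list of (target, anchor_text) outlinks.
    outlinks = {
        "http://a.example/": [("http://c.example/", "best c site")],
        "http://b.example/": [("http://c.example/", "see also c"),
                              ("http://a.example/", "home of a")],
    }

    # Map phase: emit one (target, (source, anchor)) pair per outlink.
    emitted = []
    for source, links in outlinks.items():
        for target, anchor in links:
            emitted.append((target, (source, anchor)))

    # Shuffle + reduce phase: group by target, yielding each page's inlinks.
    linkdb = defaultdict(list)
    for target, inlink in emitted:
        linkdb[target].append(inlink)

    for target, inlinks in linkdb.items():
        print(target, inlinks)    # reversed link data: inlinks with anchor text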

pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology
by Ray Kurzweil
Published 14 Jul 2005

Resources and Contact Information

Singularity.com
New developments in the diverse fields discussed in this book are accumulating at an accelerating pace. To help you keep pace, I invite you to visit Singularity.com, where you will find:
· Recent news stories
· A compilation of thousands of relevant news stories going back to 2001 from KurzweilAI.net (see below)
· Hundreds of articles on related topics from KurzweilAI.net
· Research links
· Data and citation for all graphs
· Material about this book
· Excerpts from this book
· Online endnotes

KurzweilAI.net
You are also invited to visit our award-winning Web site, KurzweilAI.net, which includes over six hundred articles by over one hundred "big thinkers" (many of whom are cited in this book), thousands of news articles, listings of events, and other features.

pages: 897 words: 242,580

The Temporal Void
by Peter F. Hamilton
Published 1 Jan 2008

The Yenisey couldn’t even get an accurate quantum signature scan to determine what kind of drive it used. ‘Admiral,’ Lucian called urgently. ‘We can’t—’ The unknown ship fired. ‘What the fuck was that!’ Gore yelled as the secure link abruptly vanished. Kazimir took a second to review the TD link data, he was so surprised. His tactical staff had produced a number of scenarios, mostly incorporating the Ocisens utilizing weapons technology they’d procured from a more advanced species. This hadn’t been a remote consideration. ‘I don’t recognize that design at all,’ Ilanthe said. ‘Do we have any spherical ship on the Navy’s intelligence registry?’

pages: 903 words: 235,753

The Stack: On Software and Sovereignty
by Benjamin H. Bratton
Published 19 Feb 2016

Through various combinations of open or proprietary exegetics of data, and perhaps a sequence of application programming interfaces (APIs), a query entered as “book me a ticket to New York” can activate a series of secondary inquiries to calendars, banks, flight schedules, airline databases, bank accounts, and so on and, through this, initiate the cascading programming resulting in that booking. For this, to search is also to program. Such tidy consumer use cases require enormously difficult standardizations of interoperability between competitive services (not to mention beyond-Esperanto level standardization of all Users’ conceptual taxonomies). The goal of linking data into semantically relevant and accessible structures so that “search” would also provide more actionable results, and in turn allowing queries to program those results for specific ends, remains compelling for search engines, if less so for individual down-service-stream providers, such as airlines and banks, which see their business absorbed into a handful of search platforms.20 By comparison, physical search may be based on a similar tissue of interrelation between addressable entities—in this case, a mix of physical things and data of interest—and might be a necessary condition of a really viable Internet of Things or SPIME space.

Data Mining: Concepts and Techniques: Concepts and Techniques
by Jiawei Han, Micheline Kamber and Jian Pei
Published 21 Jun 2011

.; Ronkainen, P.; Toivonen, H.; Verkamo, A.I., Finding interesting rules from large sets of discovered association rules, In: Proc. 3rd Int. Conf. Information and Knowledge Management Gaithersburg, MD. (Nov. 1994), pp. 401–408. [KMS03] Kubica, J.; Moore, A.; Schneider, J., Tractable group detection on large link data sets, In: Proc. 2003 Int. Conf. Data Mining (ICDM’03) Melbourne, FL. (Nov. 2003), pp. 573–576. [KN97] Knorr, E.; Ng, R., A unified notion of outliers: Properties and computation, In: Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD’97) Newport Beach, CA. (Aug. 1997), pp. 219–222. [KNNL04] Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W., Applied Linear Statistical Models with Student CD. (2004) Irwin .

pages: 918 words: 257,605

The Age of Surveillance Capitalism
by Shoshana Zuboff
Published 15 Jan 2019

Conlee, “How Automation and Analytics Are Changing Customer Care,” Conduent Blog, July 18, 2016, https://www.blogs.conduent.com/2016/07/18/how-automation-and-analytics-are-changing-customer-care; Ryan Knutson, “Call Centers May Know a Surprising Amount About You,” Wall Street Journal, January 6, 2017, http://www.wsj.com/articles/that-anonymous-voice-at-the-call-center-they-may-know-a-lot-about-you-1483698608. 74. Nicholas Confessore and Danny Hakim, “Bold Promises Fade to Doubts for a Trump-Linked Data Firm,” New York Times, March 6, 2017, https://www.nytimes.com/2017/03/06/us/politics/cambridge-analytica.html; Mary-Ann Russon, “Political Revolution: How Big Data Won the US Presidency for Donald Trump,” International Business Times UK, January 20, 2017, http://www.ibtimes.co.uk/political-revolution-how-big-data-won-us-presidency-donald-trump-1602269; Grassegger and Krogerus, “The Data That Turned the World Upside Down”; Carole Cadwalladr, “Revealed: How US Billionaire Helped to Back Brexit,” Guardian, February 25, 2017, https://www.theguardian.com/politics/2017/feb/26/us-billionaire-mercer-helped-back-brexit; Paul-Olivier Dehaye, “The (Dis)Information Mercenaries Now Controlling Trump’s Databases,” Medium, January 3, 2017, https://medium.com/personaldata-io/the-dis-information-mercenaries-now-controlling-trumps-databases-4f6a20d4f3e7; Harry Davies, “Ted Cruz Using Firm That Harvested Data on Millions of Unwitting Facebook Users,” Guardian, December 11, 2015, https://www.theguardian.com/us-news/2015/dec/11/senator-ted-cruz-president-campaign-facebook-user-data. 75.