Wikidata

description: a free knowledge base that can be read and edited by humans and machines alike, linked to Wikipedia and other Wikimedia projects

2 results

Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data
by Leslie Sikos
Published 10 Jul 2015

birth < "1901-01-01"^^xsd:date) . } ORDER BY ?name Wikidata Wikidata is one of the largest LOD databases that features both human-readable and machine-readable contents, at http://www.wikidata.org. Wikidata contains structured data from Wikimedia projects, such as Wikimedia Commons, Wikipedia, Wikivoyage, and Wikisource, as well as from the once popular directly editable Freebase dataset, resulting in approximately 13 million data items. In contrast to many other LOD datasets, Wikidata is collaborative—anyone can create new items and modify existing ones. Like Wikipedia, Wikidata is multilingual. The Wikidata repository is a central storage of structured data, whereby data can be accessed not only directly but also through client Wikis.

The Wikidata repository is a central storage of structured data, which can be accessed not only directly but also through client wikis. Data is added to items, each of which has a label (a descriptive alias) and is connected to the pages of client wikis by site links. Each item is characterized by statements, which consist of a property and a property value. Wikidata supports the Scribunto extension, which allows scripting languages such as Lua to be embedded in MediaWiki, so that the structured data stored in Wikidata can be accessed from client wikis. Data can also be retrieved using the Wikidata API.

GeoNames is a geographical database at http://www.geonames.org that provides RDF descriptions for more than 7,500,000 geographical features worldwide, corresponding to more than 10 million geographical names.
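
To make the API route concrete, here is a hedged sketch that fetches one item's label, description, and statements through the wbgetentities module of the MediaWiki action API; the item Q42 and property P569 are illustrative choices, not taken from the book.

    # Sketch: retrieve one item via the Wikidata API (wbgetentities).
    import requests

    API = "https://www.wikidata.org/w/api.php"

    params = {
        "action": "wbgetentities",
        "ids": "Q42",                      # Douglas Adams, a popular demo item
        "props": "labels|descriptions|claims",
        "languages": "en",
        "format": "json",
    }
    entity = requests.get(API, params=params).json()["entities"]["Q42"]

    print(entity["labels"]["en"]["value"])        # the item's English label
    print(entity["descriptions"]["en"]["value"])  # its short description
    # Statements are grouped by property ID, e.g. P569 (date of birth):
    statement = entity["claims"]["P569"][0]
    print(statement["mainsnak"]["datavalue"]["value"]["time"])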

Some datasets provide a SPARQL endpoint, which is an address from which you can directly run SPARQL queries (powered by a back-end database engine and an HTTP/SPARQL server).

Frequently Used Linked Datasets

LOD datasets are published in a variety of fields. Interdisciplinary datasets such as DBpedia (http://dbpedia.org) and Wikidata (http://www.wikidata.org) are general-purpose datasets and are hence among the most frequently used ones. Geographical applications can benefit from datasets such as GeoNames (http://www.geonames.org) and LinkedGeoData (http://linkedgeodata.org). More and more universities provide information about staff members, departments, facilities, courses, grants, and publications as Linked Data and RDF dumps, such as the University of Florida (http://vivo.ufl.edu) and Ghent University (http://data.mmlab.be/mmlab).
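
The HTTP mechanics of a SPARQL endpoint are the same across datasets: the query is sent as a form field and the result format is negotiated via the Accept header. A sketch against DBpedia's public endpoint follows; the dbo:City and dbo:populationTotal terms are assumptions about the current DBpedia ontology, not taken from the excerpt.

    # Sketch: POST a query to a generic SPARQL endpoint over HTTP.
    import requests

    ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public endpoint

    QUERY = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?city ?population WHERE {
      ?city a dbo:City ;
            dbo:populationTotal ?population .
    }
    ORDER BY DESC(?population)
    LIMIT 5
    """

    resp = requests.post(
        ENDPOINT,
        data={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["city"]["value"], row["population"]["value"])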

pages: 321 words: 113,564

AI in Museums: Reflections, Perspectives and Applications
by Sonja Thiel and Johannes C. Bernhardt
Published 31 Dec 2023

The comparative approach to the various countries and regions took into account the relevant geopolitical and economic dispositions as well as the different types of museums and possible ethical implications. In what follows, this paper provides a representative snapshot of the current state of research, thus painting a sample picture of AI roll-out in museums worldwide (see https://www.wikidata.org/wiki/Wikidata:WikiProject_Museum_AI_projects). This review of research from the field was the starting point for the seminar to comprehensively compile data and cases regarding the way AI is being deployed in museums the world over. An outline of the extensive mapping that was then conducted for each continent will add to this presentation of research.

To adapt the original BERT model for historical texts containing OCR errors, unsupervised pre-training was performed on a selection of 2,333,647 German-language pages from the SBB's digitized collections, followed by additional supervised training on openly available gold-standard data for NER (Labusch/Neudecker/Zellhöfer 2019). Furthermore, to disambiguate the recognized entities and link them to authority data (in this case, Wikidata QIDs), knowledge bases were constructed from Wikipedia and Wikidata for German, French, and English. A purpose-trained BERT context disambiguation model was then developed (Labusch/Neudecker 2020) that decides, for a given entity, whether and which QID should be linked, based on the local context and a comparison with the knowledge bases.
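
The authors' purpose-trained BERT disambiguator is not reproduced here; purely as an illustration of the input to such a linking step, the sketch below fetches candidate QIDs for a recognized mention from the Wikidata search API (wbsearchentities), leaving the whether-and-which decision to a downstream model. The example mention is hypothetical.

    # Sketch: candidate-QID lookup only; disambiguation happens downstream.
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def candidate_qids(mention, language="de", limit=5):
        """Return (QID, description) candidates for an entity mention."""
        params = {
            "action": "wbsearchentities",
            "search": mention,
            "language": language,
            "limit": limit,
            "format": "json",
        }
        results = requests.get(API, params=params).json()["search"]
        return [(r["id"], r.get("description", "")) for r in results]

    # e.g. a place name recognized by NER in an OCR'd historical page:
    for qid, description in candidate_qids("Brandenburg"):
        print(qid, description)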

This collection data is of great value to AI development, as generations of curators have worked on the quality of object descriptions and scholarly descriptions of context or related classification systems. Ideally, this information is stored in a machine-readable collection management system and includes quality-controlled metadata and standard data or authority files. The collection data is, moreover, linked to high-level ontologies, vocabularies, or thesauri such as AAT, GND, GeoNames, Wikidata, or ICONCLASS, which ensure the correct use of terms and provide additional context. These knowledge representations are a high-quality resource for machine learning tasks, but so far they seem to be underrated. At the same time, the effort required to convert formats and to facilitate communication between domain experts, data scientists, and developers should not be underestimated.