Two worlds, one goal: A Reliable Linked Data ecosystem for media - Wolters Kluwer & DBpedia

Time: 
Monday, September 11, 2017 - 11:00 to 15:00
Place: 
The Meervaart (Room 4)

Organizers: DBpedia, Wolters Kluwer

 

Description

This half-day workshop explores major topics for publishers and libraries from the perspectives of DBpedia and Wolters Kluwer. Both communities will dive into core areas such as interlinking, metadata and data quality, and address challenges such as the fundamental requirements for publishing data on the web.

 

Program

Session 1 (90min) - Interlinking and Metadata

  • Management of Interlinking and Data Contributions in DBpedia (20 + 5, Presenter: Sebastian Hellmann, Executive Director of the DBpedia Association)
  • DBpedia Open Text Extraction Challenge (10 + 5, Presenter: Sandro Coelho, DBpedia NLP Department)
  • Wolters Kluwer goes RDF on citation recognition (20 + 5, Presenter: Fred Diele, Enterprise Business Architect Wolters Kluwer)
  • ID Management and Fusion (20 + 5, Presenter: Johannes Frey, University of Leipzig)

 

Session 2 (90min) - Metadata and Data Quality

  • RDFUnit - Unit Testing Framework for RDF and Ontologies (20 + 5, Presenter: Gustavo Publio, University of Leipzig)
  • Challenges and Opportunities for Central Thesaurus (Management) at a global Publisher (20 + 5, Presenter: Kris van Damme, Software Architect Wolters Kluwer)
  • Data Quality as an enabler and prerequisite for data business (20 + 5, Presenter: Christian Dirschl, Head of Content Strategy at Wolters Kluwer Deutschland GmbH)

 

About the Program

  • Interlinking and ID Management: Links are the key enabler for retrieving related information on the Web of Data. Currently, DBpedia is one of the central interlinking hubs in the Linked Open Data (LOD) cloud. With over 28 million described and localized things, it is one of the largest open datasets. As the number of linked datasets grows, these links need proper maintenance.

Wolters Kluwer applies semantic web technology by automatically recognizing textual references in documents and transforming them into links in RDF format. The talk presents a generic engine based on micro-services, combined with language- and country-specific rule sets for tokenization, pattern matching, anchor definition and decoration.
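
As a rough illustration of the idea (not Wolters Kluwer's actual engine, which is not described here), the following Python sketch recognizes one made-up reference pattern and emits the matches as RDF links in N-Triples syntax. The rule table, the pattern, the `http://example.org/` URIs and the choice of `dcterms:references` as the linking property are all assumptions made for illustration.

```python
import re

# Hypothetical rule set: one pattern per (language, country) pair.
# The real rule sets are not described in the source; this is illustrative only.
RULES = {
    ("en", "EU"): re.compile(r"Directive (\d{4})/(\d+)/(EC|EU)"),
}

BASE = "http://example.org/"  # placeholder namespace, not a real publisher URI
REFERENCES = "http://purl.org/dc/terms/references"  # Dublin Core "references"


def extract_links(doc_id, text, language, country):
    """Return N-Triples lines linking doc_id to every reference found in text."""
    pattern = RULES[(language, country)]
    triples = []
    for match in pattern.finditer(text):
        year, number, suffix = match.groups()
        target = f"{BASE}directive/{year}/{number}/{suffix}"
        triples.append(f"<{BASE}doc/{doc_id}> <{REFERENCES}> <{target}> .")
    return triples


if __name__ == "__main__":
    sample = "See Directive 2009/24/EC and Directive 2014/65/EU for details."
    for line in extract_links("42", sample, "en", "EU"):
        print(line)
```

Keeping the patterns in a table keyed by language and country mirrors the separation between a generic engine and exchangeable rule sets; a production system would also need tokenization and anchor handling, which this sketch leaves out.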

  • Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. DBpedia provides a dataset for 130 languages in each release, consisting of 50 to 150 sub-datasets, published as dump files and via SPARQL endpoints. In this workshop we will discuss the need for proper dataset metadata and present ideas for improving it.

For years, Wolters Kluwer Belgium has used indexes, registers, and keyword and synonym lists. These are used in the editorial systems to classify (enrich) content and to make selections of content. On the product side, they are used to produce indexes and tables of contents on paper, and to support search and browse in digital products. Centralizing the domain- or product-specific thesauri creates opportunities, but central management also brings many challenges, which we will discuss.
     
  • Data Enrichment & Data Quality - Data Quality as an enabler and prerequisite for data business

Publishers and the information service industry have introduced RDF to deal with their massive amounts of (semi-)unstructured data. Continuous, high-quality triplification of semi-structured data is a common problem in the information industry. Schema changes and enhancements are routine tasks, but ensuring data quality is still very often a purely manual effort. Any automation will therefore support many real-life use cases in different domains.
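
To make the idea of automated quality checks concrete, here is a minimal Python sketch of one unit-test-style rule over RDF triples: every resource of a given class must have a given property. It does not use RDFUnit or reproduce its API; triples are modelled as plain tuples, and the `http://example.org/` class and property names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # placeholder namespace for illustration


def missing_property(triples, cls, prop):
    """Return the sorted subjects of type cls that have no value for prop."""
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    with_prop = {s for s, p, o in triples if p == prop}
    return sorted(typed - with_prop)


if __name__ == "__main__":
    data = [
        (EX + "doc/1", RDF_TYPE, EX + "Document"),
        (EX + "doc/1", EX + "title", "Annual report"),
        (EX + "doc/2", RDF_TYPE, EX + "Document"),  # no title: should be flagged
    ]
    for subject in missing_property(data, EX + "Document", EX + "title"):
        print("missing title:", subject)
```

A check like this can run after every schema change or triplification run, so that regressions are reported automatically instead of being found by manual inspection.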

Information service providers like Wolters Kluwer use content and data as their core assets in order to serve the information needs of their customer base. One of the main drivers for success is trust in the brand and in the products. The most obvious indicator of an extraordinary and reliable data offering is high quality, which becomes key in a digital environment, where data errors and gaps can easily be spotted. Therefore, a lot of effort goes into data quality measures, which enable us to generate business growth through a satisfied customer base.

 

Contact

For any questions, please contact Sandra Praetor, DBpedia Association, via praetor@infai.org