Linking Technologies I

Session 1.2

Time:

Tuesday, September 12, 2017 - 10:30 to 12:00

Place:

Room 5

Talks

Theo van Veen, Juliette Lonij

Koninklijke Bibliotheek, National Library of the Netherlands

Industry

Improving Access to Digital Content by Semantic Enrichment

The collection of digitized historical newspapers of the National Library of the Netherlands contains an abundance of information about events, persons, concepts, etc. As part of our effort to automatically extract this information from the unstructured text, we are developing methods to recognize named entities and link them to external knowledge bases such as DBpedia and Wikidata.

We are continuously working on further increasing the accuracy of the links by exploring new machine learning algorithms and adding new features. Our current focus is on identifying and incorporating features that may play a role in human entity linking, and on applying word and entity embeddings, as we expect these representations to be a valuable addition to our existing, mostly handcrafted features.
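To illustrate how embeddings can complement handcrafted features, here is a minimal sketch of one such feature: the cosine similarity between the averaged word vectors of a mention's context and each candidate entity's vector. It assumes pretrained word and entity vectors are already available as plain dictionaries; the function names, the toy vectors, and the entity identifiers are hypothetical and are not taken from the KB's system.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def context_vector(tokens, word_vectors):
    """Average the vectors of context tokens that have an embedding."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def embedding_features(tokens, candidates, word_vectors, entity_vectors):
    """Hypothetical feature: context/entity similarity per candidate entity."""
    ctx = context_vector(tokens, word_vectors)
    features = {}
    for entity in candidates:
        ev = entity_vectors.get(entity)
        features[entity] = cosine(ctx, ev) if ctx is not None and ev is not None else 0.0
    return features


# Toy 2-dimensional vectors, for illustration only.
word_vectors = {"river": [1.0, 0.0], "bank": [0.5, 0.5]}
entity_vectors = {"Bank_(geography)": [1.0, 0.1], "Bank_(finance)": [0.0, 1.0]}
feats = embedding_features(["river", "bank"], list(entity_vectors), word_vectors, entity_vectors)
```

In this toy case the context "river bank" scores higher against the geographic sense than the financial one; such a score would be passed to the classifier alongside the existing features rather than used as a decision on its own.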

Tomas Knap

Semantic Web Company

Industry

Enrich Your Knowledge Graphs: Linked Data Integration with PoolParty Semantic Integrator

PoolParty Semantic Integrator (https://www.poolparty.biz/), available on the market for more than 10 years, is a world-class semantic technology suite for organizing, enriching, and searching knowledge.

Roman Prokofyev, Djellel Difallah, Michael Luggen and Philippe Cudre-Mauroux

Research & Innovation

High-Precision, Context-Free Entity Linking Exploiting Unambiguous Labels

Webpages are an abundant source of textual information with manually annotated entity links, and are often used as a source of training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with these issues by relying on simple statistics extracted from the data. While such statistics can effectively deal with noisy annotations, they introduce a bias towards head entities and are ineffective for long-tail (i.e., unpopular) entities. In this work, we first analyze statistical properties of manual annotations by studying a large annotated corpus composed of all English Wikipedia webpages, in addition to all pages from the CommonCrawl containing English Wikipedia annotations. We then propose and evaluate a series of entity linking approaches, with the explicit goal of creating highly accurate (precision > 95%) and broad annotated corpora for machine learning tasks. Our results show that our best approach achieves maximal precision at usable recall levels, and outperforms both state-of-the-art entity linking systems and human annotators.
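As a rough illustration of the idea in the title, the sketch below collects (anchor text, target entity) pairs from annotated pages, keeps only labels that almost always point to a single entity, and then links exact matches of those labels without looking at context. This is a simplified, hypothetical sketch, not the authors' method: the thresholds `min_count` and `min_share`, the greedy longest-match strategy, and the toy data are all assumptions.

```python
from collections import Counter, defaultdict


def build_label_index(anchor_links):
    """Count how often each (lower-cased) anchor label points to each entity."""
    index = defaultdict(Counter)
    for label, entity in anchor_links:
        index[label.lower()][entity] += 1
    return index


def unambiguous_labels(index, min_count=2, min_share=1.0):
    """Keep labels seen often enough whose top entity has at least min_share of links."""
    result = {}
    for label, counts in index.items():
        total = sum(counts.values())
        entity, top = counts.most_common(1)[0]
        if total >= min_count and top / total >= min_share:
            result[label] = entity
    return result


def link(tokens, label_map, max_len=3):
    """Context-free, greedy longest-match annotation of token spans."""
    links = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in label_map:
                links.append((i, i + n, label_map[span]))
                i += n
                break
        else:  # no label matched starting at position i
            i += 1
    return links


# Hypothetical anchor statistics harvested from annotated pages.
anchor_links = [
    ("Paris", "Paris"), ("Paris", "Paris"), ("Paris", "Paris_Hilton"),
    ("Eiffel Tower", "Eiffel_Tower"), ("Eiffel Tower", "Eiffel_Tower"),
    ("Seine", "Seine"),
]
label_map = unambiguous_labels(build_label_index(anchor_links))
annotations = link("The Eiffel Tower in Paris".split(), label_map)
```

Here "Paris" is skipped because its links are split between two entities, and "Seine" is skipped because it was seen only once; only "Eiffel Tower" is linked. Raising `min_share` or `min_count` trades recall for precision, which mirrors the precision-first goal described in the abstract.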
