Linking Technologies II

Session 5.2

Time:

Wednesday, September 13, 2017 - 15:00 to 16:00

Place:

Room 8

Talks

Nicky van Oorschot

Netage

Industry

Linking National Core Registries

In the Netherlands, several core registries are available as open data, while others remain closed. The 25 safety regions (fire, police and health organizations) decided to connect all administrative links between these registries in a new dataset.

Despina-Athanasia Pantazi, George Papadakis, Konstantina Bereta, Themis Palpanas and Manolis Koubarakis

Research & Innovation

Multi-core Meta-blocking for Big Linked Data

Discovering matching entities in different Knowledge Bases constitutes a core task in the Linked Data paradigm. Due to its quadratic time complexity, Entity Resolution (ER) typically scales to large datasets through blocking, which restricts comparisons to similar entities. For Big Linked Data, Meta-blocking is also needed to restructure the blocks in a way that boosts precision while maintaining high recall. Based on blocking and Meta-blocking, the JedAI Toolkit implements an end-to-end ER workflow for both relational and RDF data. However, its bottleneck is the time-consuming Meta-blocking procedure, which iterates over all comparisons in each block. To accelerate it, we present a suite of parallelization techniques suitable for multi-core processors. They fall into two categories of parallelization strategies, each comprising four different approaches that are orthogonal to Meta-blocking algorithms. We perform extensive experiments over a real dataset with 3.4 million entities and 13 billion comparisons, demonstrating that our methods can process it within a few minutes while achieving a high speedup.
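The blocking-then-Meta-blocking workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses token blocking, the Common Blocks Scheme (CBS, edge weight = number of shared blocks) and Weighted Edge Pruning (WEP, keep edges at or above the mean weight). It is not JedAI's API (JedAI is a Java toolkit) and not the specific parallelization strategies presented in this talk. The function names and the round-robin partitioning of entities across workers are hypothetical. `ThreadPoolExecutor` keeps the sketch self-contained; in CPython, true multi-core speedup would require processes (e.g. `ProcessPoolExecutor`) because of the GIL.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def token_blocking(profiles):
    """Token blocking: one block per token, holding every entity whose text contains it."""
    blocks = defaultdict(set)
    for entity_id, text in profiles.items():
        for token in set(text.lower().split()):
            blocks[token].add(entity_id)
    # Blocks with a single entity yield no comparisons, so drop them.
    return {t: ids for t, ids in blocks.items() if len(ids) > 1}


def entity_index(blocks):
    """Map each entity to the set of block keys that contain it."""
    index = defaultdict(set)
    for key, ids in blocks.items():
        for e in ids:
            index[e].add(key)
    return index


def cbs_weights_for(entities, blocks, index):
    """CBS weights for edges (i, j) with i in `entities` and i < j.

    Restricting to i < j means each edge is produced by exactly one partition,
    so partial results from different workers never overlap.
    Assumes entity ids are mutually comparable (e.g. all ints).
    """
    weights = {}
    for i in entities:
        neighbours = set()
        for key in index[i]:
            neighbours.update(j for j in blocks[key] if j > i)
        for j in neighbours:
            weights[(i, j)] = len(index[i] & index[j])
    return weights


def parallel_meta_blocking(profiles, workers=4):
    """Hypothetical entity-partitioned Meta-blocking: parallel CBS weighting, then WEP."""
    blocks = token_blocking(profiles)
    index = entity_index(blocks)
    entities = sorted(index)
    # Round-robin partition of entities across workers (an illustrative choice).
    chunks = [entities[w::workers] for w in range(workers)]
    weights = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: cbs_weights_for(chunk, blocks, index), chunks)
        for partial in partials:
            weights.update(partial)
    if not weights:
        return set()
    # Weighted Edge Pruning: discard edges whose weight is below the mean edge weight.
    mean = sum(weights.values()) / len(weights)
    return {edge for edge, w in weights.items() if w >= mean}


if __name__ == "__main__":
    profiles = {
        1: "John Smith Athens",
        2: "J. Smith Athens Greece",
        3: "Maria Papadopoulou Athens",
        4: "Maria Papadopoulou Athina",
    }
    print(sorted(parallel_meta_blocking(profiles, workers=2)))  # [(1, 2), (3, 4)]
```

Because each worker only emits edges (i, j) with i < j for the entities it owns, the partial weight maps can be merged without locks. The pruning step needs the global mean weight, so it runs only after all workers have finished; here the weak edges (1, 3) and (2, 3), which share only the "athens" block, are pruned.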