Analysis I

Session 2.2

Tuesday, September 12, 2017 - 16:00 to 17:30
Room 5



Contextualization and Analytics based on unstructured Data - Big Data for advanced organisations

• Big Data solutions typically focus on structured data
• But the majority of information contains unstructured data

Research & Innovation

Ontology-guided Skill Demand Analysis in the Job Market: A Cross-Sectional Study of Data Science Skill Demand

Rapid changes in the job market and the widespread use of the Web have created a need to analyze online job adverts. This paper presents a quantitative method to infer employers' skill demand using co-word analysis based on skill keywords. These keywords are extracted automatically by an Ontology-based Information Extraction (OBIE) method. An ontology called the Skills and Recruitment Ontology (SARO) has been developed to represent job postings in the context of the skills and competencies needed to fill a job role. During the extraction and annotation of keywords, we focus on job posting attributes and job-specific skills (Tool, Product, Topic). We present our system, in which the cross-sectional study is split into two phases: (1) a customized pipeline for extracting information, whose results are a matrix of co-occurrences and correlations; and (2) content analysis to visualize the keywords' structure and network. This method reveals the technical skills in demand together with their structure, exposing significant linkages among them. The evaluation of the OBIE method indicates promising results for automatic keyword indexing, with an overall strict F-measure of 79%. Using an ontology and reusing semantic categories enables other research groups to reproduce this method and its results.
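
As a rough illustration of the co-occurrence counting that underlies co-word analysis, the sketch below tallies how often two skill keywords appear in the same posting. It is not the paper's pipeline: the function name `cooccurrence_counts` and the example postings are hypothetical, and the keywords are assumed to have been extracted already.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(postings):
    """Count how often each unordered pair of keywords shares a posting.

    `postings` is an iterable of keyword lists, one list per job advert.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for keywords in postings:
        # Deduplicate and sort so each pair is counted once per posting
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Made-up postings, not data from the study.
example_postings = [
    ["python", "sql", "hadoop"],
    ["python", "sql"],
    ["r", "sql"],
]
example_counts = cooccurrence_counts(example_postings)
```

Pairs with high counts are candidates for the linkages that the content-analysis phase would visualize as a network; a correlation measure could be derived from the same counts.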

Research & Innovation

IDOL: Comprehensive & Complete LOD Insights

Over the last decade, we have observed a steadily increasing number of RDF datasets made available on the web of data. The decentralized nature of the web, however, makes it hard to identify all these datasets. Moreover, even when downloadable data distributions are discovered, the available metadata is often insufficient to describe the datasets properly, which limits their usefulness and reuse.

In this paper, we describe an attempt to exhaustively identify the whole linked open data cloud by harvesting metadata from multiple sources, providing insights about duplicated data and the general quality of the available metadata. This was only possible by using a probabilistic data structure called a Bloom filter. Finally, we enrich existing dataset metadata with our approach and republish it through a SPARQL endpoint.
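
For readers unfamiliar with Bloom filters, the following is a minimal sketch of the data structure, assuming string items such as resource URIs. It is not the IDOL implementation: the class name, the bit-array size, the number of hash functions, and the seeded SHA-256 hashing are all illustrative choices.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustrative defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every position for this item is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical usage: remember a resource URI seen in one dataset.
seen = BloomFilter()
seen.add("http://example.org/resource/a")
```

Because the filter stores only bits rather than the items themselves, membership checks across very large collections stay memory-efficient, at the cost of occasional false positives.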