Research & Innovation

Research & Innovation

Matching Natural Language Relations to Knowledge Graph Properties for Question Answering

Research has seen considerable achievements concerning the translation of natural language patterns into formal queries for Question Answering based on Knowledge Graphs (KG). The main challenge exists on how to identify which property within a Knowledge Graph matches the predicate found in a Natural Language (NL) relation. Current approaches for formal query generation attempt to resolve this problem mainly by first retrieving the named entity from the KG together with a list of its predicates, then filtering out one from all the predicates of the entity. We attempt an approach to directly match an NL predicate to KG properties that can be employed within QA pipelines. In this paper, we specify a systematic approach as well as providing a tool that can be employed to solve this task. Our approach models KB relations with their underlying parts of speech, we then enhance this with extra attributes obtained from Wordnet and Dependency parsing characteristics. From a question, we model a similar representation of query relations. We then define distance measurements between the query relation and the properties representations from the KG to identify which property is referred to by the relation within the query. We report substantive recall values and considerable accuracy from our evaluation.

Research & Innovation

Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation

With the omnipresent availability and use of cloud services, software tools, Web portals or services, legal contracts in the form of license agreements or terms and conditions regulating their use are of paramount importance. Often the textual documents describing these regulations comprise many pages and can not be reasonably assumed to be read and understood by humans. In this work, we describe a method for extracting and clustering relevant parts of such documents, including permissions, obligations, and prohibitions. The clustering is based on semantic similarity employing a distributional semantics approach on large word embeddings database. An evaluation shows that it can significantly improve human comprehension and that improved feature-based clustering has a potential to further reduce the time required for EULA digestion. Our implementation is available as a Web service, which can directly be used to process and prepare legal usage contracts.

Research & Innovation

Semantic Annotation of Heterogeneous Data Sources: Towards an Integrated Information Framework for Service Technicians

Service technicians in the domain of industrial maintenance require extensive technical knowledge and experience to complete their tasks. Some of the needed knowledge is made available as document-based technical manuals or service reports from previous deployments. Unfortunately, due to the great amount of data, service technicians spend a considerable amount of working time searching for the correct information. Another challenge is posed by the fact that valuable insights from operation reports are not yet considered due to insufficient textual quality and content-wise ambiguity. In this work we propose a framework to annotate and integrate these heterogeneous data sources to make them available as information units with Linked Data technologies. We use machine learning to modularize and classify information from technical manuals together with ontology-based autocompletion to enrich service reports with clearly defined concepts. By combining these two approaches we can provide an unified and structured interface for both manual and automated querying. We verify our approach by measuring precision and recall of information for typical retrieval tasks for service technicians, and show that our framework can provide substantial improvements for service and maintenance processes.

Research & Innovation

Linked Data Applied: A Field Report from the Netherlands

Linked Data and the Semantic Web have generated interest in the Netherlands from the very beginning.

Research & Innovation

Specification of SemanticTrajectories and Data Transformations for Analytics: The datAcron Ontology

Motivated by real-life emerging needs in critical domains, this paper proposes a coherent and generic ontology for the representation of semantic trajectories, in association with related events and contextual information. The main contribution of the proposed ontology is the representation of semantic trajectories at varying, interlinked levels of spatio-temporal analysis. The paper presents the ontology in detail, also in connection to other well-known ontologies, and demonstrates how exploiting data at varying levels of granularity supports data transformations that can support visual analytics tasks in the air-traffic management domain.

Research & Innovation

Game CharacterOntology (GCO) - A vocabulary for extracting and describing game character information from fansites

Creating video games that are market competent costs in time, effort and resources which often cannot be afforded by small-medium enterprises, especially by independent game development studios. As most of the tasks involved in developing games are labour and creativity intensive, our vision is to reduce software development effort and enhance design creativity by automatically generating novel and semantically-enriched content for games from Web sources. In particular, this paper presents a vocabulary that defines detailed properties used for describing video game characters information extracted from sources such as fansites to create game character models. These character models could then be reused or merged to create new unconventional game characters.

Research & Innovation

Good Applications for Crummy Entity Linkers? The Case of Corpus Selection inDigital Humanities

We investigate the Digital Humanities use case, where scholars spend a considerable amount of time selecting relevant source texts. We developed WideNet; a semantically-enhanced search tool which leverages the strengths of (imperfect) EL without getting in the way of its expert users. We evaluate this tool in two historical case-studies aiming to collect a set of references to historical periods in parliamentary debates from the last two decades; the first targeted the Dutch Golden Age, and the second World War II.
The case-studies conclude with a critical reflection on the utility of WideNet for this kind of research, after which we outline how such a real-world application can help to improve EL technology in general.

Research & Innovation

LOD-a-lot: A Single-File Enabler for Data Science

Many Data Scientists make use of Linked Open Data. However, most scientists restrict their analyses to one or two datasets (often DBpedia). One reason for this lack of variety in dataset use has been the complexity and cost of running large-scale triple stores, graph stores or property graphs. With Header Dictionary Triples (HDT) and Linked Data Fragments (LDF), the cost of Linked Data publishing has been significantly reduced. Still, Data Scientists who wish to run large-scale analyses need to query many LDF endpoints and integrate the results. Using recent innovations in data storage, compression and dissemination, we are able to compress (a large subset of) the LOD Cloud into a single file. We call this file LOD-a-lot. Because it is just one file, LOD-a-lot can be easily downloaded and shared. It can be queried locally or through an LDF endpoint. In this paper we identify several categories of use cases that previously required an expensive and complicated setup, but that can now be run over a cheap and simple LOD-a-lot file. LOD-a-lot does not expose the same functionality as a full-blown database suite, mainly offering Triple Pattern Fragments. Despite these limitations, this paper shows that there is a surprisingly wide collection of Data Science use cases that can be performed over a LOD-a-lot file. For these use cases LOD-a-lot significantly reduces the cost and complexity of doing Data Science. 

Research & Innovation

Supporting virtual integration of Linked Data with just-in-time query recompilation

In virtual data integration, the data reside on their original sources without being copied and transformed on a single platform as in warehousing. Integration must be performed at query execution time and relies on transformations of the original query to many target endpoints.

Pages

Subscribe to RSS - Research & Innovation