Research & Innovation

Counting to k, or How SPARQL 1.1 Could be Efficiently Enhanced with top k Shortest Path Queries

While graph data on the Web represented in RDF is growing, SPARQL, the standard query language for RDF, still remains largely unusable for the most typical graph query task: finding paths between selected nodes through the graph. Property paths, as introduced in SPARQL 1.1, turn out to be unfit for this task, as they can only be used to test path existence and do not even allow counting the number of paths between nodes. While such a feature has been shown to be theoretically highly intractable, particularly in graphs with a high degree of cyclicity, practical use cases still demand a solution. A common restriction, in fact, is to ask not for all paths but only for the $k$ shortest paths between two nodes, in order to obtain at least the most important of potentially infeasibly many possible paths. In this paper, we extend SPARQL 1.1 property paths in a manner that allows computing and returning the $k$ shortest paths matching a property path expression between two nodes.

We present an algorithm and implementation, and demonstrate in our evaluation that a relatively straightforward solution works in practical use cases (in fact, more efficiently than other, tailored solutions in the literature).
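
The paper's algorithm is not reproduced here. As a rough illustration of the query semantics only, the following Python sketch enumerates the $k$ shortest paths between two nodes whose edges all use predicates from a given set, which loosely corresponds to a property path of the form `(p1|p2|...)+`. It assumes triples are plain `(subject, predicate, object)` tuples and restricts results to simple paths (no repeated node) so that the search terminates on cyclic graphs; that restriction and the function name `k_shortest_paths` are assumptions of this sketch.

```python
from collections import deque


def k_shortest_paths(triples, source, target, allowed, k):
    """Return up to k shortest simple paths from source to target.

    Each path is a list of (subject, predicate, object) triples whose
    predicates are all in `allowed` (hypothetical helper, not the paper's API).
    """
    # Adjacency list restricted to the allowed predicates.
    adj = {}
    for s, p, o in triples:
        if p in allowed:
            adj.setdefault(s, []).append((p, o))

    results = []
    # Breadth-first search over partial paths: they leave the queue in
    # non-decreasing length, so the first k completed paths are shortest.
    queue = deque([(source, [], {source})])
    while queue and len(results) < k:
        node, path, visited = queue.popleft()
        for p, o in adj.get(node, []):
            step = path + [(node, p, o)]
            if o == target:
                results.append(step)
                if len(results) == k:
                    break
            elif o not in visited:
                # Extend only to unvisited nodes, keeping paths simple.
                queue.append((o, step, visited | {o}))
    return results
```

Because the queue holds partial paths rather than nodes, it can still grow exponentially on densely connected graphs, which mirrors the intractability noted above; the bound `k` is what keeps typical queries cheap.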

SMJoin: A Multi-way Join Operator for SPARQL Queries

Join operators are particularly important in SPARQL query engines that collect RDF data using Web access interfaces. State-of-the-art SPARQL query engines rely on binary join operators tailored for merging results from SPARQL queries over Web access interfaces. However, in queries with a large number of triple patterns, binary joins constitute a significant burden on query performance.
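
SMJoin's internals are not described here, so the following is not its algorithm. As a generic illustration of why a multi-way operator can help, this Python sketch joins any number of inputs on one shared variable (a star-shaped group of triple patterns) in a single step, instead of a cascade of binary joins that each build an intermediate result. Solution mappings are assumed to be plain dicts from variable names to values, and `multiway_join` is a hypothetical name.

```python
from itertools import product


def multiway_join(relations, var):
    """Join lists of solution mappings (dicts) that all bind the variable `var`."""
    # One hash table per input, keyed on the value bound to the join variable.
    tables = []
    for rel in relations:
        table = {}
        for mapping in rel:
            table.setdefault(mapping[var], []).append(mapping)
        tables.append(table)
    # Only join keys present in every input can produce a result.
    common = set(tables[0]).intersection(*tables[1:])
    results = []
    for key in common:
        # Combine all inputs at once, without materializing pairwise intermediates.
        for combo in product(*(t[key] for t in tables)):
            merged, compatible = {}, True
            for mapping in combo:
                for name, value in mapping.items():
                    # Other shared variables must agree (SPARQL compatibility).
                    if merged.setdefault(name, value) != value:
                        compatible = False
            if compatible:
                results.append(merged)
    return results
```

For example, the patterns `?s foaf:name ?n`, `?s foaf:age ?a`, and `?s ex:city ?c` could be joined on `?s` by probing all three hash tables for each subject at once.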

Linked Data Reactor: Towards Data-aware User Interfaces

Most existing Web user interfaces (UIs) are hard-coded by their developers to address certain predefined types of data, and hence are blind to the semantics of the data they deal with. For unstructured data or data without an explicit semantic representation, our expectations of data-awareness are lower. However, for Linked Data UIs, where both structured data and semantics are available, we do expect more awareness from the UI that renders the data. In this paper we present an architecture for data-aware UIs, called Linked Data Reactor, implemented using Web components and Semantic Web technologies. The proposed UIs can understand users' data and are capable of interacting with users accordingly.
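
The configuration mechanism of Linked Data Reactor is not described here. To make the notion of data-awareness concrete, the Python sketch below picks a UI component for an RDF value by looking first at the property IRI, then at the literal datatype, and finally falling back to a plain default. The component names (`ImageView`, `DateView`, and so on), the rule order, and `choose_renderer` are all hypothetical.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical rules: the most specific match wins
# (property IRI, then literal datatype, then a default component).
PROPERTY_RENDERERS = {
    FOAF + "depiction": "ImageView",
    FOAF + "homepage": "LinkView",
}
DATATYPE_RENDERERS = {
    XSD + "date": "DateView",
    XSD + "decimal": "NumberView",
}


def choose_renderer(prop, datatype=None):
    """Return the name of the UI component to render a value of `prop`."""
    if prop in PROPERTY_RENDERERS:
        return PROPERTY_RENDERERS[prop]
    if datatype in DATATYPE_RENDERERS:
        return DATATYPE_RENDERERS[datatype]
    return "PlainTextView"
```

Keeping the rules as data rather than code means new vocabularies can be supported by adding entries, which is the kind of decoupling from hard-coded data types the abstract argues for.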

Adaptable Interfaces, Interactions, and Processing for Linked Data Platform Components

Currently, we are witnessing the rise of new technology-driven trends such as the Internet of Things, Web of Things, and Factories of the Future that are accompanied by an increasingly heterogeneous landscape of small, embedded, and highly modularized devices and applications, multitudes of manufacturers and developers, and the pervasion of things into all areas of life. At the same time, we can observe the increasing complexity of integrating subsets of heterogeneous components into applications that fulfil certain needs by providing value-added functionality beyond the pure sum of their components. Enabling integration in these multi-stakeholder scenarios requires new architectural approaches for adapting components, while building on existing technologies and thus ensuring broader acceptance. To this end, we present our approach to adaptation, which introduces adaptable interfaces, interactions, and processing for Linked Data Platform components. In addition, we provide an implementation of our approach that enables the adaptation of components via a thin meta-layer defined on top of the components' domain data and functionality. Finally, we evaluate our implementation by using a benchmark environment and adapting interfaces, interactions, and processing of the involved components at runtime.

OntoIdea: Ontology-based Approach for Enhancing Collaborative Ideation

Enhancing creativity has received much attention recently, especially with the emergence of online collaborative ideation. Prior work has shown that, in addition to exposure to diverse and creative examples, visualizing the solution space inspires ideators and thus helps them come up with more creative ideas. However, existing automated approaches that assess the diversity of a set of examples fail on unstructured short text due to their reliance on similarity computation. Furthermore, conceptual divergence cannot easily be captured for such representations. To overcome these issues, in this paper we introduce an ontology-based approach. The proposed solution formalizes user ideas into ontology-based concepts, and an ontology matching system is then used to compute the similarity between users' ideas. Building on this approach, we also aim to create a visualization of the solution space from the similarity matrix obtained by matching all ideas against each other.
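
The ontology matching system used by OntoIdea is not specified here. Purely to show how a similarity matrix over ideas could be assembled once each idea has been formalized into ontology concepts, the Python sketch below substitutes plain Jaccard overlap of concept sets for the matcher. The concept identifiers and the function name `similarity_matrix` are hypothetical.

```python
def similarity_matrix(ideas):
    """Build a pairwise similarity matrix over ideas.

    `ideas` maps an idea id to the set of ontology concepts it was
    formalized into (hypothetical input). Jaccard overlap stands in for
    the ontology matching system.
    """
    ids = sorted(ideas)
    matrix = []
    for a in ids:
        row = []
        for b in ids:
            union = ideas[a] | ideas[b]
            # Two empty concept sets are treated as identical.
            row.append(len(ideas[a] & ideas[b]) / len(union) if union else 1.0)
        matrix.append(row)
    return ids, matrix
```

A symmetric matrix like this could then be turned into distances (for example, one minus similarity) and projected into two dimensions with a standard technique such as multidimensional scaling to visualize the solution space.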

A Snapshot of Ontology Evaluation Criteria and Strategies

Ontologies are key to information retrieval, semantic integration of datasets, and semantic similarity analyses. Evaluating ontologies (especially defining what constitutes a "good" or "better" ontology) is therefore of central importance for the Semantic Web community. Various criteria have been introduced in the literature to evaluate ontologies, and this article classifies them according to their relevance to the design or the implementation phase of ontology development. In addition, the article compiles strategies for ontology evaluation based on ontologies published until 2017 in two outlets: the Semantic Web Journal and the Journal of Web Semantics. Gaps and opportunities for future research on ontology evaluation are discussed toward the end of the paper.

Siamese Network with Soft Attention for Semantic Text Understanding

We propose a task-independent neural network model based on a Siamese twin architecture. Our model specifically benefits from two attention schemes, which we use to extract high-level feature representations of the underlying texts, both at the word level (intra-attention) and at the sentence level (inter-attention). The inter-attention scheme uses one of the texts to create a contextual interlock with the other text, thus paying attention to mutually important parts. We evaluate our system on three tasks, namely Textual Entailment, Paraphrase Detection, and Answer Sentence Selection. We achieve a near state-of-the-art result on the textual entailment task with the SNLI corpus, while obtaining strong performance across the other tasks on which we evaluate our model.
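
The exact formulation of the model's inter-attention is not given here. The following Python sketch shows one common form of soft cross-text attention (dot-product scores followed by a softmax), assuming each text is already encoded as a list of equal-length word vectors. The name `inter_attention` is hypothetical, and a real model would add learned projections and use a tensor library.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def inter_attention(a, b):
    """For each word vector in text `a`, return an attention-weighted summary of text `b`."""
    dim = len(b[0])
    aligned = []
    for ai in a:
        # Dot-product relevance of every word in b to the current word of a.
        scores = [sum(x * y for x, y in zip(ai, bj)) for bj in b]
        weights = softmax(scores)
        # Weighted average of b's word vectors.
        aligned.append([sum(w * bj[d] for w, bj in zip(weights, b)) for d in range(dim)])
    return aligned
```

Calling it in both directions, `inter_attention(a, b)` and `inter_attention(b, a)`, gives each text a view of the parts of the other text it aligns with, which is the intuition behind the "contextual interlock" described above.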

Multi-core Meta-blocking for Big Linked Data

Discovering matching entities in different Knowledge Bases constitutes a core task in the Linked Data paradigm. Due to its quadratic time complexity, Entity Resolution (ER) typically scales to large datasets through blocking, which restricts comparisons to similar entities. For Big Linked Data, Meta-blocking is also needed to restructure the blocks in a way that boosts precision while maintaining high recall. Based on blocking and Meta-blocking, the JedAI Toolkit implements an end-to-end ER workflow for both relational and RDF data. However, its bottleneck is the time-consuming Meta-blocking procedure, which iterates over all comparisons in each block. To accelerate it, we present a suite of parallelization techniques suitable for multi-core processors: two categories of parallelization strategies, each comprising four different approaches that are orthogonal to Meta-blocking algorithms. We perform extensive experiments over a real dataset with 3.4 million entities and 13 billion comparisons, demonstrating that our methods can process it within a few minutes while achieving high speedups.
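
The paper's eight parallelization approaches are not detailed here. The following Python sketch illustrates one plausible entity-based scheme, assuming blocks are given as a dict from block id to entity ids: entities are split round-robin across workers, each worker weights its comparisons with the Common Blocks Scheme (CBS, the number of blocks two entities share), and Weighted Edge Pruning (WEP) then keeps comparisons whose weight is at least the mean. Function names are hypothetical. CPython threads share one interpreter lock, so for actual multi-core speedups the same partitioning would be submitted to a `ProcessPoolExecutor` instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def cbs_weights_for(entities, blocks, entity_blocks):
    """CBS weights for comparisons (i, j) with i in `entities` and j > i."""
    edges = {}
    for i in entities:
        counts = Counter()
        for b in entity_blocks[i]:
            for j in blocks[b]:
                if j > i:  # count each comparison once, from its smaller id
                    counts[j] += 1
        for j, w in counts.items():
            edges[(i, j)] = w
    return edges


def parallel_meta_blocking(blocks, workers=4):
    # Inverted index: entity -> ids of the blocks it appears in.
    entity_blocks = {}
    for b, members in blocks.items():
        for e in members:
            entity_blocks.setdefault(e, []).append(b)
    entities = sorted(entity_blocks)
    # Round-robin partition of the entities, one chunk per worker.
    chunks = [entities[w::workers] for w in range(workers)]
    edges = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(cbs_weights_for, chunks,
                         [blocks] * workers, [entity_blocks] * workers)
        for part in parts:
            edges.update(part)
    if not edges:
        return {}
    # Weighted Edge Pruning: keep comparisons at or above the mean weight.
    mean = sum(edges.values()) / len(edges)
    return {pair: w for pair, w in edges.items() if w >= mean}
```

Because each comparison is assigned to exactly one entity (its smaller id), workers never compute the same edge twice, so their partial results can be merged without coordination.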

High-Precision, Context-Free Entity Linking Exploiting Unambiguous Labels

Webpages are an abundant source of textual information with manually annotated entity links, and are often used as a source of training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with those issues by relying on simple statistics extracted from the data. While such statistics can effectively deal with noisy annotations, they introduce bias towards head entities and are ineffective for long-tail (i.e., unpopular) entities. In this work, we first analyze statistical properties of manual annotations by studying a large annotated corpus composed of all English Wikipedia webpages, in addition to all pages from the CommonCrawl containing English Wikipedia annotations. We then propose and evaluate a series of entity linking approaches, with the explicit goal of creating highly accurate (precision > 95%) and broad annotated corpora for machine learning tasks. Our results show that our best approach achieves maximal precision at usable recall levels, and outperforms both state-of-the-art entity linking systems and human annotators.
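
The approaches evaluated in the paper are not spelled out here. As an illustration of the general idea in the title, this Python sketch builds a dictionary of labels that, across the manual annotations, always point to a single entity and occur at least `min_count` times, then links text by greedy longest match without looking at any context. The input format, the threshold, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def unambiguous_labels(annotations, min_count=2):
    """Map each label that always links to one entity to that entity.

    `annotations` is an iterable of (anchor_text, entity) pairs taken
    from manual links (hypothetical format).
    """
    targets = defaultdict(Counter)
    for label, entity in annotations:
        targets[label.lower()][entity] += 1
    dictionary = {}
    for label, counts in targets.items():
        if len(counts) == 1:  # the label never points anywhere else
            (entity, n), = counts.items()
            if n >= min_count:  # require some evidence before trusting it
                dictionary[label] = entity
    return dictionary


def link(tokens, dictionary, max_len=3):
    """Greedy longest-match linking of token n-grams, ignoring context."""
    links, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in dictionary:
                links.append((i, i + n, dictionary[span]))
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return links
```

Dropping every ambiguous label trades recall for precision, which matches the stated goal of highly accurate corpora; a real system would likely relax the single-entity rule to a dominance threshold.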
