Mikhail Galkin, Kemele M. Endris, Maribel Acosta, Diego Collarana, Maria Esther Vidal and Sören Auer

Research & Innovation

SMJoin: A Multi-way Join Operator for SPARQL Queries

Join operators are particularly important in SPARQL query engines that collect RDF data using Web access interfaces. State-of-the-art SPARQL query engines rely on binary join operators tailored for merging results from SPARQL queries over Web access interfaces.
However, in queries with a large number of triple patterns, binary joins constitute a significant burden on the query performance. Multi-way joins that handle more than two inputs are able to reduce the complexity of pre-processing stages and reduce the execution time.

Whereas in the relational field multi-way joins have already received some attention, the applicability of multi-way joins in SPARQL query processing remains unexplored. We devise SMJoin, a multi-way non-blocking join operator tailored for independently merging results from more than two RDF data sources. SMJoin implements intra-operator adaptivity, i.e., it is able to adjust join execution schedulers to the conditions of Web access interfaces; thus, query answers are produced as soon as they are computed and can be continuously generated even if one of the sources becomes blocked.

We empirically study the behavior of SMJoin in two benchmarks with queries of different selectivity; state-of-the-art SPARQL query engines are included in the study. Experimental results suggest that SMJoin outperforms existing approaches in very selective queries, and produces first answers as fast as state-of-the-art adaptive query engines in non-selective queries.