To efficiently disclose the ever-growing amount of distributed RDF data in Semantic Web environments, RDF query engines must optimize the join order of partial query results. Existing methods include two-phase optimization (2PO), a genetic algorithm (GA), and ant colony optimization (ACO), which have mostly been evaluated on a single source. We adapt these methods to a distributed setting and evaluate the effects of distinct join methods, i.e., nested-loop join, bind join, and AGJoin. When optimizing RDF chain queries combining real-world data from 34 different SPARQL endpoints, the ACO method produces the best results in the least amount of time for most chain queries consisting of up to about ten joins. For larger chain queries, each of the considered algorithms may have its benefits, depending on the join method used. When using the least naive join method, i.e., AGJoin, a GA approach produces solutions of competitive quality in significantly less time than both ACO and 2PO. Copyright is held by the owner/author(s).

30th Annual ACM Symposium on Applied Computing, SAC 2015
Erasmus University Rotterdam

Hogenboom, A., Niewenhuijse, E., Jansen, M., Frasincar, F., & Vandic, D. (2015). RDF chain query optimization in a distributed environment. Presented at the 30th Annual ACM Symposium on Applied Computing, SAC 2015. doi:10.1145/2695664.2695711