Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

We define three cube patterns (snowflake, star, and fully denormalized) and show how they can be used to improve query times over the RDF version of the TPC-H dataset.


In today's data-driven world, analytical querying, typically based on the data cube concept, is the cornerstone of answering important business questions and making data-driven decisions. Traditionally, the underlying analytical data was mostly internal to the organization and stored in relational data warehouses and data cubes. Today, external data sources are essential for analytics and, as the Semantic Web gains popularity, more and more external sources are available in native RDF. With the recent SPARQL 1.1 standard, performing analytical queries over RDF data sources has finally become feasible. However, unlike their relational counterparts, RDF data cube stores lack optimizations that enable fast querying. In this paper, we present an approach to optimizing RDF data cubes that is based on three novel cube patterns, as well as associated algorithms that transform the RDF data cube accordingly. An extensive experimental evaluation shows that the approach allows trading additional storage and/or load times for significantly increased query performance. We further provide guidelines for which patterns to apply for specific scenarios and systems.
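To give a rough feel for the storage-versus-query-time trade-off, the following minimal Python sketch flattens a snowflake-style hierarchy (fact, customer, nation, region, loosely following TPC-H) so that every level is attached directly to the fact. It is an illustration only and not the algorithms from the paper: the `ex:` predicate names, the sample triples, and the functions `denormalize` and `total_price_by_region` are hypothetical, and each predicate is assumed to have at most one value per subject.

```python
# Illustrative sketch only. The "ex:" names and the flattening routine are
# assumptions made for this example, not the paper's algorithms.

# Snowflake-style triples: fact -> customer -> nation -> region.
SNOWFLAKE = [
    ("ex:lineitem1", "ex:price", "100"),
    ("ex:lineitem1", "ex:customer", "ex:cust1"),
    ("ex:cust1", "ex:nation", "ex:denmark"),
    ("ex:denmark", "ex:region", "ex:europe"),
    ("ex:lineitem2", "ex:price", "250"),
    ("ex:lineitem2", "ex:customer", "ex:cust2"),
    ("ex:cust2", "ex:nation", "ex:japan"),
    ("ex:japan", "ex:region", "ex:asia"),
]

# Hypothetical hierarchy path, from the fact up to the coarsest level.
HIERARCHY = ["ex:customer", "ex:nation", "ex:region"]


def denormalize(triples, hierarchy):
    """Return the triples plus one shortcut triple per fact and level.

    Assumes each (subject, predicate) pair has at most one object.
    """
    index = {(s, p): o for s, p, o in triples}
    seen = set(triples)
    result = list(triples)
    # A fact is any subject that carries the first predicate of the path.
    facts = [s for s, p, _ in triples if p == hierarchy[0]]
    for fact in facts:
        node = fact
        for predicate in hierarchy:
            node = index.get((node, predicate))
            if node is None:
                break  # incomplete hierarchy: stop walking this path
            shortcut = (fact, predicate, node)
            if shortcut not in seen:
                seen.add(shortcut)
                result.append(shortcut)
    return result


def total_price_by_region(flat_triples):
    """Aggregate price per region using only triples on the fact itself."""
    index = {(s, p): o for s, p, o in flat_triples}
    totals = {}
    for s, p, o in flat_triples:
        # Only subjects that also carry a price are facts.
        if p == "ex:region" and (s, "ex:price") in index:
            totals[o] = totals.get(o, 0) + int(index[(s, "ex:price")])
    return totals


flat = denormalize(SNOWFLAKE, HIERARCHY)
print(len(SNOWFLAKE), len(flat))
print(total_price_by_region(flat))
```

In this toy setting the flattened data holds more triples than the snowflake form, but the aggregation no longer has to walk from customer to nation to region; this mirrors, in a very simplified way, the trade of storage for query performance described above.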

Authors: Kim A. Jakobsen, Alex B. Andersen, Katja Hose, and Torben Bach Pedersen

The TPC-H relational diagram

Copyright © 2014 - All Rights Reserved - EXTBI