Answering Provenance-Aware Queries on RDF Data Cubes under Memory Budgets

Abstract

The steadily growing popularity of semantic data on the Web and the support for aggregation queries in SPARQL 1.1 have propelled interest in Online Analytical Processing (OLAP) and multidimensional data (cubes) in RDF. One important factor in query answering on Web data is provenance, i.e., metadata that describes the origin and quality of the data. Some applications, e.g., in data analytics and access control, require augmenting the data with provenance metadata and running queries that impose constraints on this provenance. This task is called provenance-aware query answering. In an RDF OLAP setting, queries tend to be complex, i.e., they contain many triple patterns combined with grouping and aggregation. In this paper, we investigate the benefit of caching parts of an RDF cube augmented with provenance information when answering provenance-aware SPARQL queries. We propose provenance-aware caching (PAC), a caching approach that relies on a provenance-aware partitioning of RDF graphs and a benefit model optimized for RDF cubes and SPARQL queries with aggregation. We compare the performance of our caching approach with the standard memory-mapping approach of Jena TDB on a synthetic dataset and show an improvement in query evaluation time across all queries.

Authors: Kim Ahlstrøm, Luis Galárraga, Katja Hose, and Torben Bach Pedersen


SSB query characteristics

Query   Measures  Filters  S-S Joins  S-O Joins  Triple Patterns  Dimensions  GROUP BYs
AQ1.1   3         4        3          1          5                1           0
AQ1.2   3         5        4          1          6                1           0
AQ1.3   3         6        5          1          7                1           0
AQ2.1   1         2        5          3          9                3           2
AQ2.2   1         3        4          3          8                3           2
AQ2.3   1         2        4          3          8                3           2
AQ3.1   1         4        6          3          10               3           3
AQ3.2   1         4        7          3          11               3           3
AQ3.3   1         4        5          3          9                3           3
AQ3.4   1         3        5          3          9                3           3
AQ4.1   2         3        8          4          13               4           2
AQ4.2   2         4        8          4          13               4           3
AQ4.3   2         3        9          4          14               4           3
