Repositories
The implementation and the code for running the experiments can be accessed via the two GitHub projects linked below.
We create the snowflake pattern, star pattern, and fully denormalized pattern, and show how these patterns can be used to improve query times over the RDF version of the TPC-H dataset.
This program generates a series of SPARQL CONSTRUCT queries that create the snowflake pattern and fully denormalized pattern cubes.
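For illustration, the sketch below shows the general shape of such a CONSTRUCT query: dimension attributes are copied onto the fact resources so that later analytical queries avoid the intermediate joins. The prefix and all property names (ex:hasOrder, ex:orderDate, and so on) are placeholders, not the actual TPC-H vocabulary; the real generated queries can be found in the repository.

```bash
# Hypothetical sketch only: the vocabulary below is illustrative, not the
# vocabulary used by the generated queries in the repository.
cat > denormalize-lineitem.rq <<'EOF'
PREFIX ex: <http://example.org/tpch#>

CONSTRUCT {
  # Copy order and customer attributes directly onto each lineitem,
  # so analytical queries no longer need the intermediate joins.
  ?lineitem ex:orderDate      ?date ;
            ex:customerNation ?nation .
}
WHERE {
  ?lineitem ex:hasOrder    ?order .
  ?order    ex:orderDate   ?date ;
            ex:hasCustomer ?customer .
  ?customer ex:hasNation   ?nation .
}
EOF
```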
This Java program uses Apache Maven to manage dependencies.
The SWOD Tools project already contains the generated SPARQL queries, so it is not necessary to run the SWOD program in order to run the experiments.
These tools allow you to generate the TPC-H data as triples (generate.sh), load the data into Virtuoso and Apache Jena (load.sh), run the TPC-H queries on the triple stores (query.sh), and analyse the results by extracting the query times and comparing the query outputs (extractQueryTimes.py, compareResults.py).
All scripts are written in Bash and Python, which may cause problems on Windows systems.
The batch scripts take a series of "sources" as input; these modular configuration files are located in the "source" folder. Be aware that these configuration files need to be set up manually before running any of the programs.
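As a hypothetical example of the invocation pattern (the file names below are placeholders, not files shipped with the repository), a machine-specific source can be combined with an experiment-specific one:

```bash
# Assumed invocation pattern: pass one or more source files to a batch script.
# Create your own files under source/ to match your machine and scale factor.
./load.sh source/machine/myhost.source source/sf1.source
```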
The Python scripts have a help flag (--help) that displays the allowed parameters.
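For example, assuming the scripts are run with a Python 3 interpreter:

```bash
# Print the accepted parameters for the query-time extraction script.
python3 extractQueryTimes.py --help
```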
Download and install the following programs
Create configuration files (source files) that match your system (source/machine/) and desired configuration (scale factor, etc.)
Generate or download the dataset
Generation requires Virtuoso for running the CONSTRUCT queries
Install Virtuoso or Apache Jena
Load the data into Jena TDB or Virtuoso using the appropriate configuration files
Change the querymix configuration (source/) to match the queries you want to execute, then run the querymix.sh program to propagate these settings.
Run the query.sh script with the appropriate configuration files to start the experiments
Use extractQueryTimes.py on the generated log files (logs/) to extract and aggregate the query times.
The experiments can now be compared using the compareResults.py script; a condensed sketch of the whole workflow follows this list.
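Put together, a run could look like the sketch below. This is a minimal illustration under several assumptions: every .source file name is a placeholder, each script is invoked with the same source-file pattern, and the arguments to the Python scripts are guesses; consult the --help output for the real parameters.

```bash
#!/usr/bin/env bash
# Hypothetical end-to-end run; every .source name below is a placeholder.
set -e

./generate.sh source/machine/myhost.source source/sf1.source       # TPC-H data as triples
./load.sh     source/machine/myhost.source source/virtuoso.source  # load into a triple store
./querymix.sh source/machine/myhost.source source/querymix.source  # propagate the query mix
./query.sh    source/machine/myhost.source source/virtuoso.source  # run the experiments

# Aggregate timings from the logs, then compare two runs (arguments are
# illustrative; see --help for the actual parameters).
python3 extractQueryTimes.py logs/
python3 compareResults.py results-virtuoso results-jena
```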
Feel free to post bug reports and ask questions.