###############################################################################
TRIPLE SCORES FOR TRIPLE STORES

Reproducibility material for the 2015 VLDB paper by Hannah Bast,
Bjoern Buchhold and Elmar Haussmann, University of Freiburg
###############################################################################

### REQUIREMENTS

* bash
* python (>= 2.7) with numpy (>= 1.8)
  (and scikit-learn >= 0.14.1 to re-compute some results from scratch,
  see below)
* gnu make

### PRINTING RESULTS

To produce the major result tables from our paper, use one of the following
commands:

* make print-score-result-table
* make print-rank-result-table
* make print-nationality-results-table

### RESULT FILES

For each approach there is a result file of the same name that can be
inspected in detail for the judgments made by that approach. The files are:

* first
* random
* prefixes
* llda
* words_regression
* words_counting
* words_mle
* counting_combined
* mle_combined

The format of these files is simple. Here is an excerpt from llda:

:e:Jesus_Christ  66638  Preacher   1.0  5.0  0.012183092171
:e:Jesus_Christ  66638  Prophet    7.0  6.0  0.574005830593
:e:Jesus_Christ  66638  Carpenter  6.0  2.0  0.412908247109

Columns are TAB-separated and appear in this order:

* the entity name (prefixed with :e:, spaces replaced by _)
* a popularity measure (the number of times the entity is mentioned in
  Wikipedia)
* the profession
* the computed score for this profession (mapped to 0..7)
* the correct score for this profession (as determined by the crowdsourcing
  task)
* the original score/probability (not mapped to 0..7)

A short Python sketch for parsing these files is given in the EXAMPLE
section below.

### RE-COMPUTING RESULTS

To reproduce an approach from scratch, use the corresponding make target to
clean its result file as well as all intermediate files. You can then either
call "make <approach>" (where <approach> is one of first, random, ...) or
call one of the print commands above, which will re-compute previously
cleaned results. E.g., to re-build the "first" results, call
"make clean-first" and then "make first".

Below is a list of the available clean targets and what is required to
rebuild what was cleaned.

IMPORTANT: Set the variable "PYTHON" in the Makefile to an installation of
pypy for a speedup of up to a factor of 10 for several tasks.

clean-all:
  Everything below. CAREFUL! Note the requirements, especially for
  words_regression!

clean-words-counting:
  About 1 GB of available disk space. Takes about 15 min when using pypy.

clean-words-mle:
  About 1 GB of available disk space. Takes about 2 h when using pypy.
  (Careful: with the default python interpreter this may run for a full
  night.) Parts that require the numpy lib always use the default python
  interpreter; you can still set the PYTHON variable to pypy.

clean-combined:
  No special requirements. Takes a few seconds.

clean-prefixes:
  No special requirements. Takes a few seconds.

clean-first:
  No special requirements. Takes a few seconds.

clean-nationality:
  About 1 GB of available disk space. Performs (among others) MLE and
  counting and hence takes as long as those two together.

clean-llda:
  Downloads and compiles the JGibbLDA library
  (https://github.com/myleott/JGibbLabeledLDA). A rebuild requires a lot
  of RAM (around 64 GB) and around 2 GB of disk space, and takes roughly
  an hour.

clean-words-regression:
  IMPORTANT: Edit the Makefile and set the variable at the top to a
  filesystem location with sufficient space. CAREFUL! This approach is
  not optimized: it requires ~620 GB of free disk space and runs for
  roughly 3 hours. In addition, the python framework scikit-learn has to
  be installed.
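
### EXAMPLE: PARSING A RESULT FILE

For convenience, here is a minimal Python sketch (not part of the Makefile
targets above) that reads a result file in the six-column TAB-separated
format described under RESULT FILES. The hard-coded file name "llda" is just
one of the result files listed there, and the mean-difference computation at
the end is only an illustration of how the parsed rows can be used.

  #!/usr/bin/env python
  # Parse a result file with the six TAB-separated columns described above.
  import collections

  Row = collections.namedtuple(
      "Row", ["entity", "popularity", "profession",
              "computed_score", "correct_score", "original_score"])

  def read_result_file(path):
      """Return a list of Row tuples, one per line of the given file."""
      rows = []
      with open(path) as f:
          for line in f:
              entity, popularity, profession, computed, correct, original = \
                  line.rstrip("\n").split("\t")
              rows.append(Row(entity, int(popularity), profession,
                              float(computed), float(correct),
                              float(original)))
      return rows

  if __name__ == "__main__":
      rows = read_result_file("llda")
      # Illustration: mean absolute difference between the computed and the
      # correct score (both on the 0..7 scale).
      diffs = [abs(r.computed_score - r.correct_score) for r in rows]
      print("%d rows, mean |computed - correct| = %.3f"
            % (len(rows), sum(diffs) / len(diffs)))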
### CONTACT

If you have questions, feel free to contact us:
[bast,buchhold,haussmann] at informatik.uni-freiburg.de