Updated Implementation for Deepdive version 0.8

During our project, Deepdive was updated to version 0.8, which substantially changed the way an application is set up. On this page we provide a step-by-step guide to building a custom Deepdive application.

1) Install Deepdive and download the example application

You first need to download and install Deepdive as explained here. Next, download our provided example application files here. Extract each file to the correct directory in your Deepdive app. You also have to change the path of input_sentences.txt inside the script buildSentencesTable.py to a location of your choosing.

2) Build necessary database tables

First run the command deepdive compile to compile the app.ddlog file (you will need to rerun this command whenever you change app.ddlog). Now you can generate the sentences table by running deepdive do sentences. Next, run the command deepdive do entity_mention to create a table containing every entity mention in the provided input.
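The entity_mention step groups the NER tags produced for each sentence into entity mentions: consecutive tokens carrying the same non-"O" tag form one mention. A minimal sketch of that grouping (an illustration, not the implementation shipped with the example app; span conventions are assumptions) could look like this:

```python
def extract_mentions(ner_tags):
    """Group consecutive tokens with the same non-'O' NER tag into
    entity mentions. Returns (begin_index, end_index, type) triples
    with an exclusive end index (an assumption; match this to the
    entity_mention schema of your app)."""
    mentions, start = [], None
    # Append a sentinel "O" so the final open mention is always closed.
    for i, tag in enumerate(ner_tags + ["O"]):
        if start is not None and tag != ner_tags[start]:
            mentions.append((start, i, ner_tags[start]))
            start = None
        if start is None and tag != "O" and i < len(ner_tags):
            start = i
    return mentions
```

For example, the tag sequence PERSON PERSON O LOCATION yields one PERSON mention spanning tokens 0-2 and one LOCATION mention spanning tokens 3-4.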

3) Relation candidates

For each desired relation you will have to include the following function in app.ddlog, changing the implementation Python script as well as the entity types needed for the relation. In this case we used PERSON and LOCATION to find possible has_nationality relations using hasNationalityRelation.py. For your own relation, you can simply copy the provided Python script and change the name of the relation in the output string (from "has_nationality" to your own relation).

function has_nationality_candidate over (
    sentence_id text,
    mention1_id text,
    mention1_text text,
    mention2_id text,
    mention2_text text
 ) returns rows like has_relation_candidate
   implementation "udf/hasNationalityRelation.py" handles tsv lines.

 has_relation_candidate += has_nationality_candidate(sentence_id, mention1_id, mention1_text, mention2_id, mention2_text) :-
   entity_mention(sentence_id, _, _, mention1_text, mention1_id, "LOCATION"),
   entity_mention(sentence_id, _, _, mention2_text, mention2_id, "PERSON").
  

Now run the script using deepdive do has_relation_candidate. If you made any changes in app.ddlog, don't forget to run deepdive compile first.
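A candidate-extraction UDF in the style of hasNationalityRelation.py might look like the sketch below. Everything here is illustrative: the exact output column layout must match your has_relation_candidate schema, and the provided script may differ.

```python
#!/usr/bin/env python
# Sketch of a candidate-extraction UDF (hypothetical; the provided
# hasNationalityRelation.py may differ in its column layout).
import sys

def candidate_row(line, relation="has_nationality"):
    """Turn one input TSV line (sentence_id, mention1_id, mention1_text,
    mention2_id, mention2_text) into one output row tagged with the
    relation name. For your own relation, change the relation string."""
    sentence_id, m1_id, m1_text, m2_id, m2_text = line.rstrip("\n").split("\t")
    return "\t".join([sentence_id, m1_id, m1_text, m2_id, m2_text, relation])

if __name__ == "__main__":
    # DeepDive pipes one TSV line per candidate pair into stdin and
    # reads the emitted rows from stdout ("handles tsv lines").
    for line in sys.stdin:
        print(candidate_row(line))
```

This is the pattern the text above describes: to reuse the script for another relation, only the relation name in the output string needs to change.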

4) Relation features

You will now need to add the following code for each relation to app.ddlog and update the corresponding names and entity types. This will create a table containing generic features such as the words between the two entities.

@extraction
has_nationality_feature(
    @key
    @references(relation="has_nationality", column="p_id", alias="has_nationality")
    p_id text,
    @key
    @references(relation="has_nationality", column="loc_id", alias="has_nationality")
    loc_id text,
    @key
    feature text
).

function extract_has_nationality_features over (
        p_id          text,
        loc_id          text,
        p_begin_index int,
        p_end_index   int,
        loc_begin_index int,
        loc_end_index   int,
        doc_id         text,
        sent_index     text,
        tokens         text,
        lemmas         text,
        pos_tags       text,
        ner_tags       text
    ) returns rows like has_nationality_feature
    implementation "udf/extract_features.py" handles tsv lines.

has_nationality_feature += extract_has_nationality_features(
    p_id, loc_id, p_begin_index, p_end_index, loc_begin_index, loc_end_index,
    doc_id, sent_index, tokens, lemmas, pos_tags, ner_tags
) :-
    entity_mention(sent_index, p_begin_index, p_end_index, _, p_id, "PERSON"),
    entity_mention(sent_index, loc_begin_index, loc_end_index, _, loc_id, "LOCATION"),
    sentences(doc_id, _, tokens, lemmas, pos_tags, _, ner_tags, _, sent_index).
  

Run the script using deepdive do has_nationality_feature (change has_nationality to the name of your own table).
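To illustrate the kind of feature extract_features.py produces, here is a sketch of the "words between the two entities" feature mentioned above. The feature name and span conventions are assumptions for illustration; the provided script generates many more generic features.

```python
def words_between(tokens, span1, span2):
    """Emit a 'words between the two entity spans' feature string.
    Spans are (begin_index, end_index) pairs with an exclusive end
    index (an assumption; match this to your entity_mention columns).
    Works regardless of which entity comes first in the sentence."""
    (b1, e1), (b2, e2) = sorted([span1, span2])
    return "WORDS_BETWEEN[" + " ".join(tokens[e1:b2]) + "]"
```

For the sentence "Alice was born in France" with the PERSON span (0, 1) and the LOCATION span (4, 5), this emits the feature WORDS_BETWEEN[was born in], which the inference step can then learn a weight for.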

5) Distant supervision

Add the following code for each relation to app.ddlog and update the corresponding names and entity types. Then you can simply copy supervise_has_relation.py, rename it, and add your own custom rules to the file.

@extraction
has_nationality_label(
    @key
    @references(relation="has_nationality", column="p_id", alias="has_nationality")
    p_id text,
    @key
    @references(relation="has_nationality", column="loc_id", alias="has_nationality")
    loc_id text,
    @navigable
    label int,
    @navigable
    rule_id text
).

# make sure all pairs in has_nationality_candidate are considered as unsupervised examples
has_nationality_label(p, loc, 0, NULL) :- has_relation_candidate(p, loc, _, _, _, _, "has_nationality", _).

# supervision by heuristic rules in a UDF
function supervise over (
        p_id text, p_begin int, p_end int,
        loc_id text, loc_begin int, loc_end int,
        doc_id         text,
        sentence_index text,
        sentence_text  text,
        tokens         text,
        lemmas         text,
        pos_tags       text,
        ner_tags       text
    ) returns rows like has_nationality_label
    implementation "udf/supervise_has_nationality.py" handles tsv lines.
has_nationality_label += supervise(
    p_id, p_begin, p_end,
    loc_id, loc_begin, loc_end,
    doc_id, sentence_index, sentence_text,
    tokens, lemmas, pos_tags, ner_tags
) :- has_relation_candidate(p_id, loc_id, _, _, _, _, "has_nationality", _),
     entity_mention(sentence_index, p_begin, p_end, _, p_id, _),
     entity_mention(sentence_index, loc_begin, loc_end, _, loc_id, _),
     sentences(
        doc_id, sentence_text,
        tokens, lemmas, pos_tags, _, ner_tags, _, sentence_index ).


# resolve multiple labels by majority vote (summing the labels in {-1,0,1})
has_nationality_label_resolved(p_id, loc_id, SUM(vote)) :- has_nationality_label(p_id, loc_id, vote, rule_id).

# assign the resolved labels for the has_nationality relation
has_nationality(p_id, loc_id) = if l > 0 then TRUE
                      else if l < 0 then FALSE
                      else NULL end :- has_nationality_label_resolved(p_id, loc_id, l).

## Inference Rules ############################################################

# Features
@weight(f)
has_nationality(p_id, loc_id) :-
    has_relation_candidate(p_id, loc_id, _, _, _, _, "has_nationality", _),
    has_nationality_feature(p_id, loc_id, f).

# Inference rule: Symmetry
@weight(3.0)
has_nationality(p_id, loc_id) => has_nationality(loc_id, p_id) :-
    has_relation_candidate(p_id, loc_id, _, _, _, _, "has_nationality", _).

# Inference rule: Only one nationality
@weight(-1.0)
has_nationality(p_id, loc_id) => has_nationality(p_id, loc2_id) :-
    has_relation_candidate(p_id, loc_id, _, _, _, _, "has_nationality", _),
    has_relation_candidate(p_id, loc2_id, _, _, _, _, "has_nationality", _).
##########################
  
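To make the supervision step concrete, here is a sketch of one heuristic rule such as supervise_has_nationality.py might contain, together with the majority-vote resolution that has_nationality_label_resolved performs in app.ddlog. The rule name and the "born in" pattern are examples, not the rules shipped with the file.

```python
def supervise_pair(p_id, loc_id, words_between):
    """One illustrative heuristic rule: 'born in' between the person
    and the location yields a positive example. Returns a row shaped
    like has_nationality_label (p_id, loc_id, label, rule_id), or
    None when no rule fires (hypothetical rule, for illustration)."""
    if "born in" in " ".join(words_between):
        return (p_id, loc_id, 1, "pos:born_in")
    return None

def resolve(labels):
    """Majority vote over labels in {-1, 0, 1}: the sign of the sum
    decides, mirroring has_nationality_label_resolved in app.ddlog."""
    total = sum(labels)
    return True if total > 0 else False if total < 0 else None
```

A pair labeled [1, 1, -1] by three rules resolves to TRUE, [0, 0] resolves to NULL (unsupervised), and [-1] resolves to FALSE, matching the if/else assignment to has_nationality above.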

6) Final predictions

Finally, add the following relation declaration to the beginning of app.ddlog and change the names appropriately:

@extraction
has_nationality?(
    @key
    @references(relation="entity_mention", column="mention_id", alias="p")
    p_id text,
    @key
    @references(relation="entity_mention", column="mention_id", alias="loc")
    loc_id text
).
  

After running deepdive do has_nationality (or the name of your own relation) and deepdive do probabilities, you are done; you can find the results either by accessing the corresponding tables as shown in the tutorial or by using Mindbender and MindTagger.

Downloads

Download example_project