TAC 2009 Knowledge Base Population Track

Overview

Question Answering and Information Extraction have been studied extensively over the past decade; however, evaluation has generally been limited to isolated targets or small scopes (e.g., single documents). At TAC 2009 the Knowledge Base Population (KBP) Track will explore extraction of information about entities with reference to an external knowledge source. Using a basic schema for persons, organizations, and locations, nodes in an ontology must be created and populated using unstructured information found in text. A collection of Wikipedia Infoboxes will serve as a rudimentary initial knowledge representation.

In traditional IE, it might be sufficient to learn that the actor Paul Newman was born in Cleveland, Ohio, and was married to Joanne Woodward; KBP, however, requires filling in the birthplace and spouse slots for the appropriate node in the reference ontology. Furthermore, the goal is to link the spouse slot in the 'Paul Newman' node to another ontology node (one for Joanne Woodward), not merely to provide a textual fragment containing her name. Updating an existing knowledge source in this way requires synthesizing information from multiple documents and grounding entity mentions within the knowledge base. The problem can also be formulated as a QA task: slots can be filled by asking questions such as "Where was Paul Newman born?"
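The distinction between a textual slot fill and a grounded link to another node can be illustrated with a small data structure. The following is a minimal, hypothetical sketch, not the official TAC KB format: the class names, node IDs such as 'node:paul_newman', and the 'PER' type label are all assumptions made for illustration, and the exact-name lookup stands in for a real entity linking component.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class NodeRef:
    """A link to another node in the knowledge base (hypothetical representation)."""
    node_id: str


# A slot value is either a grounded link to another node or a plain text fill.
SlotValue = Union[NodeRef, str]


@dataclass
class KBNode:
    node_id: str
    name: str
    entity_type: str  # assumed labels for illustration, e.g. "PER" for persons
    slots: Dict[str, List[SlotValue]] = field(default_factory=dict)

    def fill(self, slot: str, value: SlotValue) -> None:
        """Add a value to a slot, skipping exact duplicates."""
        values = self.slots.setdefault(slot, [])
        if value not in values:
            values.append(value)


class KnowledgeBase:
    def __init__(self) -> None:
        self.nodes: Dict[str, KBNode] = {}

    def add(self, node: KBNode) -> KBNode:
        self.nodes[node.node_id] = node
        return node

    def link_or_text(self, name: str) -> SlotValue:
        """Ground a mention by exact name match (a gross simplification of
        entity linking); fall back to the raw string if no node matches."""
        for node in self.nodes.values():
            if node.name == name:
                return NodeRef(node.node_id)
        return name

    def resolve(self, value: SlotValue) -> str:
        """Return a human-readable form of a slot value."""
        if isinstance(value, NodeRef):
            return self.nodes[value.node_id].name
        return value


kb = KnowledgeBase()
newman = kb.add(KBNode("node:paul_newman", "Paul Newman", "PER"))
kb.add(KBNode("node:joanne_woodward", "Joanne Woodward", "PER"))

# Slot fills that a system would extract from text, hard-coded here.
newman.fill("birthplace", kb.link_or_text("Cleveland, Ohio"))  # no node: stays text
newman.fill("spouse", kb.link_or_text("Joanne Woodward"))      # grounded link

print(newman.slots["spouse"])      # [NodeRef(node_id='node:joanne_woodward')]
print(newman.slots["birthplace"])  # ['Cleveland, Ohio']
```

The spouse slot holds a reference to the 'Joanne Woodward' node rather than her name, so any information later added to her node is reachable from Paul Newman's node.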

Heng Ji and Ralph Grishman are coordinating the KBP evaluation in 2010. For details about TAC-KBP 2010, see the track website; for general information about TAC, see the NIST TAC 2010 website.

In 2009 the focus will be on slot filling, entity linking, and provenance. Future directions that could be explored include the detection of novelty and contradiction, and the temporal qualification of information.

Information for TAC 2009 Participants

Guidelines for working papers, presentations, and conference details are available at the TAC 2009 site: http://www.nist.gov/tac/2009/

The deadline has passed for submitting results to the KBP track.

Entity Linking Submissions

A Perl script to validate entity linking submissions is now available: check_kbp_entity-linking.pl.

Ground truth judgments are available for the entity linking task. You'll need your TAC 2009 password to download the data.

Slot Filling Submissions

A Perl script to validate slot filling submissions is now available: check_kbp_slot-filling.pl. Details about submitting results online should be posted to the mailing list later today (7/24/09). Note: run ids for slot filling submissions should be of the form 'TAC 2009 Team ID + Run #', for example, 'ACME3' would be the third run from the ACME team.
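The run-ID convention above can be expressed as a pair of helpers. This is a hypothetical sketch, not part of the official check_kbp_slot-filling.pl validator: it assumes run numbers start at 1 and that the team ID itself does not end in a digit, so that the trailing digits can be read back unambiguously as the run number.

```python
import re
from typing import Tuple

# Assumption: the team ID ends in a non-digit, so trailing digits are the run number.
RUN_ID_PATTERN = re.compile(r"^(?P<team>.*\D)(?P<run>[1-9]\d*)$")


def make_run_id(team_id: str, run_number: int) -> str:
    """Build a run ID of the form TeamID + run number, e.g. 'ACME' + 3 -> 'ACME3'."""
    if run_number < 1:
        raise ValueError("run numbers are assumed to start at 1")
    return f"{team_id}{run_number}"


def parse_run_id(run_id: str) -> Tuple[str, int]:
    """Split a run ID back into (team ID, run number)."""
    match = RUN_ID_PATTERN.match(run_id)
    if match is None:
        raise ValueError(f"not of the form TeamID + run number: {run_id!r}")
    return match.group("team"), int(match.group("run"))


print(make_run_id("ACME", 3))  # ACME3
print(parse_run_id("ACME3"))   # ('ACME', 3)
```

The official Perl script remains the authoritative check; a helper like this is only useful for generating consistent IDs across a team's runs.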

Task Description

A revised task description has been posted (6/4/09).

The original guidelines, now outdated, are still available.

XML files containing sample queries and a description of submission format are available for both the entity linking and slot filling tasks.

A scoring script is available for the Entity Linking task.

Data

Data for the task is available from the LDC.

Note: Teams must submit both the NIST Agreement Concerning Dissemination of TAC Results and the LDC Data Use agreement in order to receive the evaluation data.

In addition to the Sample Corpus data provided by the LDC, several participants have contributed development data for the entity linking task.

Revised Schedule

  May 1  Sample data available
  Jun 10 Test data (KB + documents) available
  Jul 2  Entity Linking target list released
  Jul 13 Entity Linking submissions due to NIST (12:00 noon, EDT)
  Jul 14 Slot Filling target list released
  Jul 27 Slot Filling submissions due to NIST (12:00 noon, EDT)  (note 1-week extension)
  Sep 29 Assessments available
  Nov 16-17 TAC 2009 Workshop (NIST)

Mailing List

The mailing list for the KBP Track is tac-kbp@nist.gov. To subscribe, send a message to listproc@email.nist.gov such that the body consists of the line:
    subscribe tac-kbp  
In order for your messages to get posted to the list, you must send them from the email address used when you subscribed to the list. For additional information on how to use mailing lists hosted at NIST, send a message to listproc@email.nist.gov such that the body consists of the line:
    HELP

Advisory Committee

Hoa Dang (NIST)
Radu Florian (IBM)
Andrew Hickl (Language Computing Corporation)
James Mayfield (JHU)
Paul McNamee (JHU)
Satoshi Sekine (NYU)
Maarten de Rijke (University of Amsterdam)
Ralph Weischedel (BBN)
Dan Weld (University of Washington)

Related Events

KBP follows work in information extraction at ACE and in question answering at TREC QA.

The Web People Search (WePS) task explored entity clustering and personal attribute extraction.

IJCAI '09 is holding a workshop on User-Contributed Knowledge and AI (July 2009).

There will be a workshop at ACL-IJCNLP 2009 on Collaboratively Constructed Semantic Resources.

Wikimania 2009 will be held in August 2009 in Buenos Aires.