2012-12-13

Donghui Li, Tanya Z. Berardini, Robert J. Muller and Eva Huala*

Department of Plant Biology, The Arabidopsis Information Resource, Carnegie Institution for Science, Stanford, CA 94305, USA

*Corresponding author: Tel: +1 650 739 4310; Fax: +1 650 462 5968; Email: ehuala{at}carnegiescience.edu

Received July 9, 2012. Revision received September 21, 2012. Accepted October 15, 2012.

Abstract

TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology, are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow.

Database URL: www.arabidopsis.org

Introduction

Published literature continues to be one of the most important repositories of scientific data. The number of articles in PubMed related to Arabidopsis thaliana, a model organism for plant biology research, has increased from 2014 articles in 2002 to 4343 in 2011. Accessing the huge volume of experimental results in the primary research literature, often published in the form of unstructured free text, poses a significant challenge for the research community. To meet this challenge, the biocuration community has taken on the task of collecting and organizing published experimental data into a format suitable for large-scale querying, comparison and computational analysis (1).
This is achieved by converting some of the free text data into controlled vocabulary-based statements through manual curation of the primary literature. Biologists today have become increasingly dependent on such computable datasets provided by biological databases for data access, analysis and discovery. TAIR (The Arabidopsis Information Resource, http://www.arabidopsis.org) is the primary database for A. thaliana (2, 3). TAIR serves as a centralized gateway to Arabidopsis biology, research materials and community members. TAIR is heavily used by Arabidopsis researchers, as well as the broader plant research community, with Google Analytics usage statistics showing 164 000 visits and 53 000 unique visitors per [...]

View Full Article... database.oxfordjournals.org
