Overview | NRSP10

NRSP10 is one of seven National Research Support Projects funded by the State Agricultural Experiment Stations (SAES) from the Hatch Multistate Research Fund (MRF) provided by the National Institute of Food and Agriculture (NIFA). NRSP10 leverages funding support from USDA SCRI, USDA-ARS, and the NSF PGRP and DIBBS programs, as well as the cotton, tree fruit, and legume industries and U.S. Land Grant Universities.

Mission

Establish a robust, dynamic, and widely available online database platform for genomics, genetics, and breeding that serves currently underserved crops of national significance (Citrus, Cool Season Food Legumes, Cotton, Rosaceae, and Vaccinium) and is flexible enough to be readily implemented for other crops and organisms valuable to U.S. agriculture.

Goals

Expand the technical resources available to the existing online databases that house genomic, genetic, and breeding data and the bioinformatics tools used to mine those data.
Establish consensus standards, protocols, and applications for data collection, organization, storage, analysis, and curation so that research communities for other underserved crops and organisms can leverage them efficiently.

Specific Objectives

Expand online community databases currently housing high-quality genomics, genetics, and breeding data for Rosaceae, citrus, cotton, cool season food legume, and Vaccinium crops.
Develop/implement a tablet application to collect phenotypic data from field and laboratory studies.
Develop a Tripal Application Programming Interface for building breeding databases.
Convert GenSAS, the online community genome annotation tool, to Tripal.
Develop/implement Web Services to promote database interoperability.

Rationale

Advances in DNA sequencing, genotyping, and phenotyping technologies have led to a paradigm shift in life science research. Scientists now routinely sequence and genotype genomes from populations, progeny, and individual organisms to create highly saturated genetic maps, identify loci influencing important traits, and conduct large-scale standardized phenotyping. Crop scientists are already generating petabytes of data that require organization, storage, analysis, curation, and visualization so that they can be efficiently converted into useful information and knowledge. These massive datasets also require integration with other genomic, genetic, and breeding data to optimize their availability and utility to the research community and maximize return on investment.
Existing crop genome databases were typically custom-developed in isolation, with interfaces and underlying database schemas targeted to a specific crop. Most are resource-intensive, require sophisticated management, and are not amenable to implementation for other crops or organisms. At the same time, they have been critical for scientific advances in their client research communities. Early versions of the Genome Database for Rosaceae (GDR, www.rosaceae.org), first released in 2003, exemplified these limitations. However, the GDR bioinformatics team undertook R&D to overcome legacy issues, with two significant outcomes: 1) development of a generic database schema, termed “Chado”, featuring a natural diversity module to allow storage of large-scale genotype and phenotype data; and 2) enhancement of “Tripal”, a collection of freely available open-source software modules based on the Drupal open-source content management platform. Tripal is a member of the GMOD family of tools and serves as a web front end for the Chado relational database. It is an efficient, flexible, and modular platform for building online biological databases and managing content.
Tripal marries the data-handling power of Chado with Drupal and is currently used at Washington State University for databases serving the Citrus, Cotton, Pulse Crop, Rosaceae, and Vaccinium communities. The 24 crops now represented in these databases are commercially grown throughout the U.S. and had a combined production value of $23.6 billion in 2012. These databases are now considered the international repositories for genomic data in their crop groups and offer great advantage to U.S. researchers in a range of scientific disciplines. Other groups have developed or are developing Tripal databases, including KnowPulse, the Hardwood Genomics Project, and the Legume Information System. This national collaborative effort leverages collective expertise to lower the cost of establishing and maintaining databases while enabling the application of cutting-edge bioinformatics tools.