DBTREES is the project website for the NSF PGRP project "Standards and CyberInfrastructure that enable Big-Data Driven Discovery for Tree Crop Research".
This project, funded by the NSF Plant Genome Research Program (award # 1444573), will enhance tree crop databases developed using the Tripal platform by providing cross-site communication, adopting common data standards, supporting “big data” integration and analysis, and enabling citizen science.
Project Summary
Trees are fundamental for life, providing essential oxygen, carbon remediation, habitat, lumber, shelter, energy, food and recreation. Contributing over $130 billion per year, tree crops are important to the US economy and are the economic backbone for many rural areas. Like all crops, they face increasing challenges from abiotic and biotic stresses, including rapid climate change and disease. Providing access to high quality genotypic, phenotypic and environmental data and data-mining tools through Tripal, a common, resource-efficient database platform, will enable interrogation of these data for basic and applied research purposes in ways not currently available.
This project will create a model "ecosystem" of tree community databases that can inter-communicate, and provide big data analysis tools utilizing common controlled vocabularies. The significant investment in big data generation, cyberinfrastructure, and comprehensive semantic ontologies by federal agencies will be leveraged by this project to bring richly annotated datasets and enhanced computing capabilities to individual scientists. Adoption of these new capabilities will be promoted through educational online modules for "guided" workflow analysis and ontological curation that train scientists to effectively query existing data, upload new data, assign metadata, and perform custom analyses. It is anticipated that the outcomes of this project will accelerate both basic discovery and improvement of important agronomic and silvic traits in tree crops. A major outreach effort will help raise public awareness of the critical importance of healthy trees to a productive, sustainable planet and promote stewardship of these critical resources.
Specific Objectives
- Establish a common, resource-efficient, sustainable platform across all major tree crop databases by (a) migrating TreeGenes to Tripal, and (b) fully implementing Tripal for the Hardwood Genomics Web.
- Enhance the utility of tree crop research data by (a) incorporating established ontologies for plant structure, traits, phenotypic data quality and the environment into tree databases, (b) enabling user-driven standardized ontology association and metadata collection, and (c) constructing a Tripal extension module to standardize data collection.
- Enable cross-database data mining and analysis by (a) developing a Tripal extension module for cross-site querying that lets a user collate or view data from multiple Tripal sites, and (b) providing access to High Performance Computing at CyVerse and to external data resources and analysis capabilities via CyVerse, Galaxy and Semantic Web Services.
- Enable fully customizable cross-disciplinary data mining through new Tripal extension modules for flexible querying and visualization of complex phenotype, genotype, and environment data.
- Support adoption of Tripal infrastructure and module development.
- Promote public awareness and stewardship of our tree resources through use of TreeTaggr and expansion of the forestry careers website.