Phenotypes are the observable characteristics of an organism, resulting from interactions between its genes and the environment in which it grows. Plant phenotyping is used in several fields, such as breeding programs, biological or agronomical research in multi-location environments, germplasm bank characterisation, and biorefinery. Phenotypes (traits and their associated values) can be shared across several fields of expertise or be specific to one field. Furthermore, phenotypes interact with one another: in biorefineries, for instance, pretreatments affect the glucose yield, and in breeding, grain yield is affected by the number of grains and the grain weight.
This section provides standard formats used by the community to design nurseries and field trials, and the minimum metadata required for documentation in various platforms.
- Data format: use data matrices in CSV or Excel
- Metadata and vocabularies: use complete metadata for at least germplasm and observation variables
- Keep curated data (outliers checked)
1. Data formats
We recommend a minimal format: data matrices accompanied by metadata on, at least, the variables (traits with their methods, units and scales, or environmental variables) and the germplasms.
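The principle of a data matrix plus metadata can be sketched as three small CSV tables and a consistency check between them. This is a minimal illustration only: the column names (`plot_id`, `germplasm_id`, `variable`, `unit_or_scale`), the variable codes and the values are hypothetical and are not prescribed by any of the formats mentioned here.

```python
import csv
import io

# Hypothetical data matrix: one row per plot, one column per observation variable.
DATA_CSV = """plot_id,germplasm_id,PH_cm,GY_tha
P001,G-0001,92.5,6.8
P002,G-0002,88.0,7.1
"""

# Hypothetical variable metadata: trait, method and unit or scale for each column.
VARIABLES_CSV = """variable,trait,method,unit_or_scale
PH_cm,Plant height,Ruler measurement at maturity,cm
GY_tha,Grain yield,Plot harvest,t/ha
"""

# Hypothetical germplasm metadata.
GERMPLASM_CSV = """germplasm_id,name
G-0001,Accession A
G-0002,Accession B
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def check_matrix(data_text, variables_text, germplasm_text,
                 id_columns=("plot_id", "germplasm_id")):
    """Return a list of problems: undocumented variables or unknown germplasms."""
    data = read_rows(data_text)
    described = {row["variable"] for row in read_rows(variables_text)}
    known = {row["germplasm_id"] for row in read_rows(germplasm_text)}
    problems = []
    columns = list(data[0].keys()) if data else []
    for column in columns:
        if column not in id_columns and column not in described:
            problems.append(f"undocumented variable: {column}")
    for row in data:
        if row["germplasm_id"] not in known:
            problems.append(f"unknown germplasm: {row['germplasm_id']}")
    return problems


if __name__ == "__main__":
    print(check_matrix(DATA_CSV, VARIABLES_CSV, GERMPLASM_CSV))
```

Keeping the metadata in separate tables, rather than in column headers, lets the same variable and germplasm descriptions be reused across several data matrices.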
ISA-Tab is an implementation of this principle. It consists of one zip archive containing data files and metadata files, the latter being used for data discovery and interoperability. More information can be found on this dedicated page or in this presentation from Plant and Animal Genome 2015. It is currently well suited to generation by software. The ISA-Tab format, its phenotype-specific configuration and its tools are being improved.
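As a rough illustration of the archive layout, the sketch below sorts the members of a zip archive into metadata and data files. It assumes the usual ISA-Tab naming convention, in which investigation, study and assay files are tab-delimited text files prefixed `i_`, `s_` and `a_`; the file names in the demo archive are invented and the files are empty.

```python
import io
import zipfile


def classify_isatab_members(archive_bytes):
    """Group archive members by ISA-Tab role, based on file name prefixes."""
    groups = {"investigation": [], "study": [], "assay": [], "data": []}
    prefixes = {"i_": "investigation", "s_": "study", "a_": "assay"}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            base = name.rsplit("/", 1)[-1]
            if not base:
                continue  # skip directory entries
            role = "data"
            if base.endswith(".txt"):
                role = prefixes.get(base[:2], "data")
            groups[role].append(name)
    return groups


def build_demo_archive():
    """Build an in-memory archive with hypothetical, empty files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("i_investigation.txt", "s_trial.txt",
                     "a_phenotyping.txt", "heights.csv"):
            archive.writestr(name, "")
    return buffer.getvalue()


if __name__ == "__main__":
    for role, names in classify_isatab_members(build_demo_archive()).items():
        print(role, names)
```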
See the germplasm recommendations for data format regarding germplasm information.
2. Metadata and vocabularies
Observation Variables include trait and environment variables.
We recommend using existing variables, listed in the vocabularies and ontologies below.
To create new observation variables, we recommend using the Crop Trait Dictionary Upload Template available on the Crop Ontology website. All mandatory fields (trait name, description, abbreviation, synonyms, methods, and scales) must be filled in so that the observation variable can be created and shared. The most important field in this template is the Trait ID, which must remain stable and never be modified. Furthermore, it must never be deleted, only deprecated if needed. This way, it can be used in trials and remain valid in the future.
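The two rules above (all mandatory fields present, Trait ID stable and deprecated rather than deleted) can be sketched as a small in-memory registry. The field keys, the `VariableRegistry` class and the example identifier are hypothetical; they do not reproduce the actual column names of the Crop Trait Dictionary Upload Template.

```python
# Hypothetical keys standing in for the mandatory fields listed in the text.
MANDATORY_FIELDS = ("trait_name", "description", "abbreviation",
                    "synonyms", "methods", "scales")


def missing_fields(entry):
    """Return the mandatory fields that are absent or empty in an entry."""
    return [name for name in MANDATORY_FIELDS if not entry.get(name)]


class VariableRegistry:
    """Toy registry: IDs are never reassigned or deleted, only deprecated."""

    def __init__(self):
        self._entries = {}

    def add(self, trait_id, entry):
        if trait_id in self._entries:
            raise ValueError(f"Trait ID already in use: {trait_id}")
        missing = missing_fields(entry)
        if missing:
            raise ValueError(f"Missing mandatory fields: {missing}")
        self._entries[trait_id] = {**entry, "deprecated": False}

    def deprecate(self, trait_id):
        # The entry stays in the registry so that old trials remain valid.
        self._entries[trait_id]["deprecated"] = True

    def get(self, trait_id):
        return self._entries[trait_id]


if __name__ == "__main__":
    registry = VariableRegistry()
    registry.add("EX:0000001", {
        "trait_name": "Plant height",
        "description": "Height of the plant from soil to top of the spike",
        "abbreviation": "PH",
        "synonyms": ["Height"],
        "methods": ["Ruler measurement"],
        "scales": ["cm"],
    })
    registry.deprecate("EX:0000001")
    print(registry.get("EX:0000001")["deprecated"])
```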
For nursery and trial metadata and description, we recommend using the Crop Research Ontology, which describes the terms related to nurseries and trials, field management, field environments, study design, etc. These metadata are actively being adapted for wider use.
For biorefinery, we recommend using the Biorefinery ontology which describes the concepts and terms associated with biomass composition and characterization (crystallinity, surface area, particle size, porosity, etc.), physico-chemical pretreatments, enzymatic hydrolysis, and experimental processes descriptions.
Recommended Variable ontologies and vocabularies
- Wheat crop ontology
- INRA Wheat Ontology (soon publicly available)
- Wheat Phenotype
- Biorefinery ontology
- XEO, XEML Environment Ontology
For the difference between metadata, ontologies and vocabularies, see the dedicated page.
3. Raw data
We recommend sharing at least clean, documented raw data, such as plant height or leaf area.
The phenotype data lifecycle begins with acquisition, followed by cleaning, elaboration and analysis. Elaboration combines several variables, such as phenological stages and traits, to produce elaborated (computed) variables used as input for analysis software. For instance, plant height and phenology can be combined to get height at flowering. Different elaborated data are produced for different purposes, so it is important to be able to easily generate new ones from raw data.
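The elaboration step can be sketched with the height at flowering example. The code below assumes that raw heights are dated measurements per plot and that phenology provides a flowering date per plot, and it takes the measurement closest in time to flowering. The plot identifiers, dates, values and the nearest-date rule are illustrative assumptions, not a recommended method.

```python
from datetime import date

# Hypothetical raw data: dated plant height measurements (cm) per plot.
raw_heights = {
    "P001": [(date(2014, 5, 1), 45.0), (date(2014, 5, 20), 78.0),
             (date(2014, 6, 10), 92.5)],
    "P002": [(date(2014, 5, 1), 40.0), (date(2014, 5, 20), 70.0),
             (date(2014, 6, 10), 88.0)],
}

# Hypothetical phenology data: observed flowering date per plot.
flowering_dates = {"P001": date(2014, 5, 22), "P002": date(2014, 6, 8)}


def height_at_flowering(heights, flowering):
    """Pick, for each plot, the height measured closest to its flowering date."""
    result = {}
    for plot, series in heights.items():
        if plot not in flowering or not series:
            continue  # no elaborated value without both inputs
        target = flowering[plot]
        _, value = min(series, key=lambda m: abs((m[0] - target).days))
        result[plot] = value
    return result


if __name__ == "__main__":
    print(height_at_flowering(raw_heights, flowering_dates))
```

Because the raw series are kept, a different elaboration rule (for example interpolation between the two surrounding dates) can be applied later without new measurements.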
Some popular Tools
1. Repositories, information systems and data integration tools
The Breeding Management System (BMS) generates a standard format for collecting nursery and trial data in fields. It uses the Crop Research Ontology to document experiment-related metadata and the trait ontologies of the Crop Ontology to document variables. The format makes it possible to analyze data directly with statistical tools such as Breeding View or Meta-R.
GnpIS is an INRA information system designed for plant and pest genomics. It enables scientists to mine genomic, phenomic and genetic data. For phenomic and phenotype data, it supports data discovery through a keyword-based, Google-like search engine, as well as data mining. The latter allows datasets to be built for genetic or phenomic analysis. Data integration in GnpIS relies on strict identification of germplasms and variables through ontologies such as those of the Crop Ontology.
The Breeding API specifies a standard interface for plant phenotype/genotype databases to serve their data to crop breeding applications. It is a shared, open API, to be used by all data providers and data consumers who wish to participate.
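From the consumer side, using such an interface amounts to building a request URL and reading a paginated JSON response. The sketch below makes no network call. It assumes a BrAPI v2-style path (`/brapi/v2/observationvariables`), `page` and `pageSize` query parameters, and a response with `metadata.pagination` and `result.data`; the server URL and the sample payload are invented, and the exact paths and fields should be checked against the version of the specification that a given server implements.

```python
import json
from urllib.parse import urlencode


def build_variables_url(base_url, page=0, page_size=100):
    """Build the URL for one page of observation variables (assumed v2 path)."""
    query = urlencode({"page": page, "pageSize": page_size})
    return f"{base_url.rstrip('/')}/brapi/v2/observationvariables?{query}"


def parse_page(payload_text):
    """Return the records of one page and whether more pages remain."""
    payload = json.loads(payload_text)
    pagination = payload.get("metadata", {}).get("pagination", {})
    records = payload.get("result", {}).get("data", [])
    current = pagination.get("currentPage", 0)
    total = pagination.get("totalPages", 1)
    return records, current + 1 < total


# Invented sample payload, shaped like the assumed response envelope.
SAMPLE_RESPONSE = json.dumps({
    "metadata": {"pagination": {"currentPage": 0, "pageSize": 100,
                                "totalCount": 1, "totalPages": 1}},
    "result": {"data": [{"observationVariableDbId": "EX:0000001",
                         "observationVariableName": "PH_cm"}]},
})

if __name__ == "__main__":
    print(build_variables_url("https://example.org/"))
    print(parse_page(SAMPLE_RESPONSE))
```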
For biorefinery applications, the pretreatment-biomass combination achieving the best glucose yields can be found through the @Web platform. The Documents tab on @Web structures information by kind of pretreatment (topics Bioref-XX). Available data include glucose yields, pretreatments used, biomass types and characterization, etc. In the future, it will also be possible to find the best pretreatment-phenotype match.
The iPlant Collaborative offers many services for the analysis of genomic, environmental and phenotypic data.
2. Data acquisition
Field Book is a simple app for taking phenotypic notes on field research plots. Collecting data in the field has traditionally been a laborious process, requiring notes written by hand followed by transcription. Field Book was created to replace paper field books, increasing collection speed and improving data integrity.
Things to follow in the future
Published on: 02 October 2014
Updated on: 27 April 2015