The objective is to obtain a multi-trial phenotype dataset for a given set of plant material (genotypes), optionally filtered by trait and environment. This dataset can be used for a phenotype meta-analysis or, once the relevant genotyping information is added, for a genetic study.
Challenges:
- Plant material (or germplasm) can be searched for and identified by taxon, accession, panel, or collection. We need an unambiguous ID for every germplasm.
  - Synonym handling: cimmyt/2341 == inra/986 (Open Linked Data)
  - Germplasm alignment: all synonymous germplasm entries must receive the same ID, with their synonym lists merged
  - URIs are very good candidates here
- Observed Variables / Traits in search results (data matrix)
  - Case 1: All trials use the same ontologies.
    - No alignment is needed; integration is straightforward.
  - Case 2: Trials use different ontologies that are mapped to each other.
    - Integration is possible if the protocols are compatible. This information must be encoded in the ontologies.
  - Case 3: Trials use different ontologies that are not mapped to each other.
    - No integration is possible. The trait must be presented as separate columns, with sufficient metadata and traceability data to allow curation.
- Observed Variables / Traits as search parameters
  - Find all possible correspondences by traversing the ontologies and propose the nearest match to the user: “grain yield”/protocol:cimmyt is equivalent to “yield”/protocol:inra. Do you agree?
- Markers: We also need unambiguous identification here. This is likely to be very problematic.
  - URIs?
  - Synonyms
  - Mapping between different sources / platforms
    - Computed by comparing genomic positions
    - Stored as synonyms (Open Linked Data)
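
The alignment idea running through the points above (germplasm synonyms, mapped traits, marker synonyms) can be sketched as follows. This is only a minimal illustration, not a proposed implementation: the function names `merge_synonyms` and `column_for` are hypothetical, identifiers are treated as plain strings, and picking the lexicographically smallest member as the shared ID is an arbitrary convention chosen for the example. It also assumes that any trait mapping passed in has already been checked for protocol compatibility.

```python
from collections import defaultdict


def merge_synonyms(synonym_pairs):
    """Map every identifier to one shared ID per group of synonyms.

    Synonymy is treated as transitive (a small union-find). The shared ID
    is the lexicographically smallest member of the group, which is an
    arbitrary convention used only for this sketch.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the two groups of every declared synonym pair.
    for a, b in synonym_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each group.
    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)

    # Give every member of a group the same shared ID.
    shared_id = {}
    for members in groups.values():
        chosen = min(members)
        for identifier in members:
            shared_id[identifier] = chosen
    return shared_id


def column_for(variable, trait_ids):
    """Return the data-matrix column for an observed variable.

    Mapped variables share one column (cases 1 and 2); a variable with no
    known mapping keeps its own column (case 3).
    """
    return trait_ids.get(variable, variable)


# Germplasm alignment, using the synonym pair given in the text.
germplasm_ids = merge_synonyms([("cimmyt/2341", "inra/986")])

# Trait alignment, using the "grain yield" / "yield" example from the text.
trait_ids = merge_synonyms([("cimmyt:grain yield", "inra:yield")])
```

With these inputs, both germplasm identifiers resolve to the same ID, the two yield variables land in the same column, and a variable that has no mapping (for instance a hypothetical `other:plant height`) stays in a column of its own, where it remains available for later curation.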
Written by: Cyril Pommier
Published on: 02 October 2014