The objective is to obtain a multi-trial phenotype dataset for a given set of plant material (genotypes), possibly filtered by trait and environment. This dataset can be used for a phenotype meta-analysis or, once relevant genotyping information is added, for a genetic study.

Challenges:

  • Plant material (or germplasm) can be searched for or identified by taxon, accession, panel, or collection. We need an unambiguous ID for every germplasm
    • Synonym handling: cimmyt/2341 == inra/986 (Open Linked Data)
    • Germplasm alignment: all records of the same germplasm must receive the same ID, with their synonym lists merged
    • URIs are very good candidates here
  • Observed Variables / Traits in search results (data matrix)
    • case 1: All trials use the same ontologies
      • No alignment needed, integration is straightforward
    • case 2: Trials use different ontologies. They are mapped to each other.
      • Integration is possible if the protocols are compatible. This compatibility information must be encoded in the ontologies
    • case 3: Trials use different ontologies. They are not mapped to each other.
      • No integration is possible. The traits must be presented as separate columns, with sufficient metadata and traceability data to allow curation.
  • Observed Variables / Traits as search parameters
    • Find all possible correspondences by traversing the ontologies and propose the nearest match to the user: “grain yield”/protocol:cimmyt is equivalent to “yield”/protocol:inra. Do you agree?
  • Markers: We also need unambiguous identification. This is likely to be very problematic.
    • URIs?
    • Synonyms
    • Mapping between different sources / platforms
    • Computed by genomic positioning comparison
    • Stored as synonyms (Open Linked Data)
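
The germplasm alignment and the three trait cases above can be sketched with a small union-find over identifiers. This is only an illustration under stated assumptions, not a proposed implementation: the `SynonymIndex` class, the `build_matrix` helper, the trait identifiers and the observation values are all hypothetical, and a trait mapping is assumed to be recorded only when the protocols have already been judged compatible.

```python
from collections import defaultdict


class SynonymIndex:
    """Union-find over identifiers: synonyms end up sharing one canonical ID."""

    def __init__(self):
        self._parent = {}

    def find(self, ident):
        # An identifier that was never declared a synonym is its own canonical ID.
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[ident] != root:
            self._parent[ident], ident = root, self._parent[ident]
        return root

    def add_synonym(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Arbitrary but deterministic choice: keep the smallest ID as canonical.
            keep, drop = sorted((ra, rb))
            self._parent[drop] = keep

    def groups(self):
        """Return {canonical ID: merged set of all its synonyms}."""
        merged = defaultdict(set)
        for ident in list(self._parent):
            merged[self.find(ident)].add(ident)
        return dict(merged)


def build_matrix(observations, germplasm, traits):
    """Pivot (germplasm_id, trait_id, value) rows into {germplasm: {trait: [values]}}."""
    matrix = defaultdict(lambda: defaultdict(list))
    for germplasm_id, trait_id, value in observations:
        matrix[germplasm.find(germplasm_id)][traits.find(trait_id)].append(value)
    return {g: dict(columns) for g, columns in matrix.items()}


germplasm = SynonymIndex()
germplasm.add_synonym("cimmyt/2341", "inra/986")  # synonym pair from the text

traits = SynonymIndex()
# Case 2: two ontologies mapped to each other (hypothetical trait IDs).
traits.add_synonym("cimmyt:grain_yield", "inra:yield")

# Illustrative values only, not real measurements.
observations = [
    ("cimmyt/2341", "cimmyt:grain_yield", 6.1),
    ("inra/986", "inra:yield", 5.8),
    ("inra/986", "other:plant_height", 92.0),  # case 3: unmapped, stays its own column
]

matrix = build_matrix(observations, germplasm, traits)
```

Here both yield observations land in a single column of a single germplasm row, while the unmapped trait stays in a separate column, as case 3 requires. The same index could hold marker synonyms, although deciding which markers are equivalent (for example by comparing genomic positions) is outside this sketch.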

Written by: Cyril Pommier
Published on: 02 October 2014