Gene expression analysis measures the abundance of mRNA molecules and gives us insight into the regulation of the genes of interest. Machinery in the cell reads the sequence of a gene in groups of three bases. Each group of three bases (a codon) corresponds to one of the 20 different amino acids used to build a protein.
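
The reading of a sequence in groups of three bases can be sketched in a few lines of Python. This is only an illustration: the function name split_codons and the example sequence are made up for this sketch, and trailing bases that do not fill a complete codon are simply dropped.

```python
def split_codons(sequence: str) -> list[str]:
    """Split a nucleotide sequence into consecutive three-base codons.

    Hypothetical helper for illustration; incomplete trailing bases are dropped.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # length covered by complete codons
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Made-up example sequence: 11 bases, so the last two are ignored.
codons = split_codons("augGCCaaaUG")
print(codons)
```

Dropping the incomplete tail keeps the sketch short; a real pipeline would more likely report it as a possible frame or truncation problem.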

Note that standards for RNA Seq gene expression are still under development.

Gene Expression Platforms:

  • microarray analysis
  • RNA Seq analysis: RNA Seq data contains information about both nucleotide sequence and gene expression.

Recommendations

Summary

  1. We recommend the existing format standards laid out by repositories such as NCBI (GEO) and EBI (ArrayExpress and ENA).
  2. We recommend using ontologies and controlled vocabularies to annotate the required metadata.

Data formats

For microarray analysis:

  • We recommend the existing format standards laid out by repositories such as NCBI (GEO) and EBI (ArrayExpress and ENA).

For RNA Seq analysis:

The NCBI SRA database (http://www.ncbi.nlm.nih.gov/Traces/sra/) is the official repository for the actual sequence data, produced in the form of FASTQ and/or BAM files.
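
As a rough illustration of the FASTQ layout, each record in the common form spans four lines: an identifier starting with "@", the sequence, a separator starting with "+", and a quality string of the same length as the sequence. The sketch below reads such records from text. The function name read_fastq and the example record are invented for this illustration, and multi-line FASTQ variants are not handled.

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (a blank line also stops reading)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


# Invented single-record example, held in memory instead of a file.
example = io.StringIO("@read1\nACGTN\n+\nIIII#\n")
records = list(read_fastq(example))
print(records)
```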

For more information on formats:

Convert data format
You can convert different formats to FASTQ or BAM using the Bioconvert tool.

Figure: Bioconvert conversion diagram (https://bioconvert.readthedocs.io/en/master/_images/conversion.png)

Metadata

Metadata is important for all gene expression studies, whether they use microarray or RNA Seq data.

For BAM files, the following additional information is needed:

  • mapping software
  • mismatch settings
  • reference sequences used, such as IWGSC survey sequences, MIPS gene models, or a transcriptome assembly
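
One way to keep these three items together with a BAM file is a small structured record that can be checked for gaps before submission. The sketch below is an assumption-laden illustration, not a repository format: the class name BamMetadata, its field names, and the aligner name in the example are all hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class BamMetadata:
    """Hypothetical record of the extra details to report with a BAM file."""
    mapping_software: str      # name and version of the aligner used
    mismatch_settings: str     # mismatch parameters passed to the aligner
    reference_sequences: str   # reference the reads were mapped against

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty or whitespace-only."""
        return [name for name, value in asdict(self).items() if not value.strip()]


record = BamMetadata(
    mapping_software="example-aligner 1.0",  # placeholder, not a real tool
    mismatch_settings="",                    # deliberately left empty
    reference_sequences="IWGSC survey sequences",
)
print(record.missing_fields())
```

In this example the check reports the mismatch settings as missing, which is the kind of omission that makes an alignment hard to reproduce.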

Please refer to the following RNA-Seq standards document for more information:

https://www.betacell.org/documents/administered/about/guidelines/ENCODE_BCBC_RNA-Seq_Standards_V01_20110503.pdf

Vocabularies

We recommend using ontologies and controlled vocabularies to annotate the required metadata. Please see the recommendations on the detailed page: Ontologies and Vocabularies

  • Plant Ontology terms to describe plant tissues and developmental stages
  • Plant Environment Ontology to describe the experimental conditions
  • Plant Stress Ontology (proposed) to describe treatments with pathogens and stress conditions
  • Gene Ontology, the standard for functional analysis
  • Microarray Ontology (MO, the MGED ontology) terms mapped to OBI/OBO Foundry ontology terms (http://bioportal.bioontology.org/ontologies/MO?p=classes)
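
When metadata is annotated with ontology terms, the terms are normally recorded as identifiers rather than free text. Gene Ontology and Plant Ontology identifiers consist of a prefix, a colon, and seven digits, so a basic shape check can catch free-text labels and truncated identifiers early. The sketch below assumes only that shape: the function name is_well_formed is hypothetical, the example values serve purely as format examples, and the check does not confirm that a term actually exists in the ontology.

```python
import re

# Identifier shape assumed for this sketch: prefix, colon, seven digits.
TERM_PATTERN = re.compile(r"(GO|PO):\d{7}")


def is_well_formed(term_id: str) -> bool:
    """Check only the shape of a GO or PO identifier, not whether the term exists."""
    return TERM_PATTERN.fullmatch(term_id) is not None


# Format examples only: two well-formed identifiers, one truncated, one free text.
annotations = ["GO:0008150", "PO:0025034", "GO:8150", "leaf"]
malformed = [term for term in annotations if not is_well_formed(term)]
print(malformed)
```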

Written by: WDI working group
Published on: 02 October 2014
Updated on: 27 April 2015