Gene expression analysis measures the abundance of mRNA molecules and gives us insight into the regulation of the genes of interest. Machinery in the cell reads the sequence of the gene in groups of three bases. Each group of three bases (a codon) corresponds to one of 20 different amino acids used to build a protein.
Note that standards for RNA-Seq gene expression are still under development.
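As a minimal illustration of reading a sequence in groups of three bases, the sketch below splits a string into consecutive codons. The function name and the example sequence are hypothetical and are not taken from any real gene or tool.

```python
def split_into_codons(sequence):
    """Split a nucleotide sequence into consecutive groups of three bases.

    Trailing bases that do not fill a complete codon are dropped.
    """
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical example sequence (11 bases, so the last two are dropped).
codons = split_into_codons("AUGGCUUAAGC")
print(codons)
```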
Gene Expression Platforms:
- microarray analysis
- RNA-Seq analysis: RNA-Seq data contains information about both nucleotide sequence and gene expression.
Recommendations
Summary
- We recommend the existing format standards laid out by repositories such as NCBI (GEO) and EBI (ArrayExpress and ENA).
- We recommend using ontologies and controlled vocabularies to annotate the required metadata.
Data formats
For microarray analysis:
- We recommend the existing format standards laid out by repositories such as NCBI (GEO) and EBI (ArrayExpress and ENA).
For RNA-Seq analysis:
The NCBI SRA database (http://www.ncbi.nlm.nih.gov/Traces/sra/) is the official repository for the actual sequence data, produced in the form of FASTQ and/or BAM files.
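For readers unfamiliar with FASTQ, each read is normally stored as four lines: an identifier starting with "@", the sequence, a separator starting with "+", and a quality string of the same length as the sequence. The following is a minimal sketch of reading such records, assuming the simple four-line layout without wrapped sequences. The function name and the two example reads are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence and quality lengths differ for {header!r}")
        yield header[1:], sequence, quality


# Hypothetical two-read example held in memory instead of a file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nFFFF\n")
records = list(read_fastq(example))
print(records)
```

A check like this can catch truncated or malformed files before submission. It is not a substitute for the validation performed by the repositories themselves.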
For more information on formats:
Convert data format
You can convert different formats to FASTQ or BAM using the Bioconvert tool.
Metadata
Metadata is important for all gene expression studies, whether microarray or RNA-Seq.
- Microarray data: NCBI GEO, MIAME compliant: http://www.ncbi.nlm.nih.gov/geo/info/MIAME.htm
- RNA-Seq data: follow the MIAME guidelines as far as applicable (FASTQ)
For BAM files, additional information is needed:
- mapping software
- mismatch settings
- reference sequences used, such as IWGSC survey sequences, MIPS gene models, or a transcriptome assembly
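One way to keep track of the additional BAM information listed above is a small record that flags empty fields before submission. This is only a sketch: the class, the field names and the example values (including the aligner name) are hypothetical and do not correspond to any repository's submission schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BamMetadata:
    """The three items of additional information listed for BAM files."""
    mapping_software: str = ""
    mismatch_settings: str = ""
    reference_sequences: list = field(default_factory=list)


def missing_fields(record):
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical records; "example-aligner" is a placeholder, not a real tool.
incomplete = BamMetadata(
    mapping_software="example-aligner 1.0",
    reference_sequences=["IWGSC survey sequences"],
)
complete = BamMetadata(
    mapping_software="example-aligner 1.0",
    mismatch_settings="maximum 2 mismatches per read",
    reference_sequences=["IWGSC survey sequences"],
)
print(missing_fields(incomplete))
print(missing_fields(complete))
```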
Please refer to this paper for more information:
Vocabularies
We recommend using ontologies and controlled vocabularies to annotate the required metadata. Please see the recommendations on the detailed page: Ontologies and Vocabularies
- Plant Ontology terms to describe the plant tissues and developmental stage
- Plant Environment Ontology to describe the experimental conditions
- Plant Stress Ontology to describe treatments with pathogens and stress conditions (proposed)
- Gene Ontology is the standard for the functional analysis
- Microarray ontology (MO) terms mapped to the OBI/OBO foundry ontology terms – MGED ontology (http://bioportal.bioontology.org/ontologies/MO?p=classes)
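As a rough sketch of what annotating metadata with ontology terms rather than free text can look like, the check below flags values that are not written as a term identifier. It assumes identifiers of the form PREFIX:0000000 (a prefix followed by seven digits), as used by Gene Ontology identifiers such as GO:0008150. Other ontologies may use different identifier schemes, so the pattern is an assumption. The function name, the metadata keys and the PO identifier are hypothetical placeholders.

```python
import re

# Assumed identifier shape: letters, a colon, then exactly seven digits.
TERM_ID = re.compile(r"^[A-Za-z]+:\d{7}$")


def invalid_annotations(annotations):
    """Return the metadata keys whose value does not look like PREFIX:0000000."""
    return sorted(key for key, value in annotations.items() if not TERM_ID.match(value))


sample = {
    "tissue": "PO:0000000",    # placeholder, not a real Plant Ontology term
    "function": "GO:0008150",  # Gene Ontology: biological_process
    "treatment": "drought",    # free text, so it is flagged
}
print(invalid_annotations(sample))
```

A format check of this kind only confirms that a value looks like a term identifier. Whether the term exists and is appropriate still has to be verified against the ontology itself.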
Published on: 02 October 2014
Updated on: 27 April 2015
It seems a bit inconsistent to be mentioning SRA as “the official repository” for RNASeq reads when earlier there is almost equal emphasis on ArrayExpress/ENA and GEO/SRA. I think this entry could do with being clearer that there are two international repositories in the USA and Europe and be more specific about the landing page for data submissions…
Also the reference to https://www.betacell.org/documents/administered/about/guidelines/ENCODE_BCBC_RNA-Seq_Standards_V01_20110503.pdf as the basis for more information on standards does not take you to anything useful, but to a site on beta-cell biology that is no longer active.