Gene expression platforms provide the metadata for high-dimensional expression data. Each platform describes a microarray or other assay technique used to generate mRNA expression data.
There are many examples in the GEO database, where platform identifiers consist of GPL followed by a number. In GEO the platforms have extensive annotation. For tranSMART this is reduced to five columns: three that vary from row to row, plus two values that are repeated on every row:
- GPL_ID: The name of the platform, repeated on each row.
- PROBE_ID: The identifier used in GEO for the data value. This will also appear on a row in the expression data file.
- GENE_SYMBOL: The gene locus name, which should be the name recommended by Entrez. If it is missing, tranSMART loading will try to look up the value, so it is more efficient to ensure gene symbols are in the platform annotation file.
- GENE_ID: The Entrez unique gene ID, which is the preferred link to the gene. If it is missing, tranSMART will try to look up the ID from the gene symbol, so it is more efficient to ensure both values are present or both absent.
- ORGANISM: The organism name, repeated on each row.
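As a rough illustration of the five columns listed above, the following Python sketch writes a small platform annotation table. The tab delimiter, the header row, the function name `write_platform`, and all identifiers in the sample rows (`GPL_EXAMPLE`, `PROBE_A`, `GENE_A`, `1001` and so on) are assumptions made up for this example. They are not taken from GEO or from the tranSMART loader, so check the loader's documentation for the exact file layout it expects.

```python
import csv
import io

# Column names as described in the list above.
COLUMNS = ["GPL_ID", "PROBE_ID", "GENE_SYMBOL", "GENE_ID", "ORGANISM"]


def write_platform(rows, gpl_id, organism, out):
    """Write (probe_id, gene_symbol, gene_id) tuples as a five-column table.

    Assumption: tab-separated values with a header row.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for probe_id, gene_symbol, gene_id in rows:
        # GPL_ID and ORGANISM are the two values repeated on every row.
        writer.writerow([gpl_id, probe_id, gene_symbol, gene_id, organism])


# Hypothetical sample rows; none of these identifiers are real.
sample_rows = [
    ("PROBE_A", "GENE_A", "1001"),
    ("PROBE_B", "GENE_B", "1002"),
]

buf = io.StringIO()
write_platform(sample_rows, gpl_id="GPL_EXAMPLE", organism="Homo sapiens", out=buf)
print(buf.getvalue(), end="")
```

Passing the platform name and organism once, rather than per row, mirrors the fact that those two columns hold the same value throughout the file.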
A single probe can represent more than one gene.
A gene can be represented by more than one probe.
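The many-to-many relationship between probes and genes can be inspected with a short Python sketch like the one below. It assumes that a probe mapping to several genes is written as one row per probe and gene pair, which is an assumption of this example rather than a documented tranSMART rule. The function name `index_platform` and the sample identifiers are also hypothetical.

```python
from collections import defaultdict


def index_platform(pairs):
    """Build probe -> genes and gene -> probes lookups from (probe_id, gene_id) pairs."""
    probe_to_genes = defaultdict(set)
    gene_to_probes = defaultdict(set)
    for probe_id, gene_id in pairs:
        probe_to_genes[probe_id].add(gene_id)
        gene_to_probes[gene_id].add(probe_id)
    return probe_to_genes, gene_to_probes


# Hypothetical pairs: PROBE_A covers two genes, and gene 1001 has two probes.
pairs = [("PROBE_A", "1001"), ("PROBE_A", "1002"), ("PROBE_B", "1001")]

probe_to_genes, gene_to_probes = index_platform(pairs)

# Probes that represent more than one gene.
multi_gene_probes = sorted(p for p, genes in probe_to_genes.items() if len(genes) > 1)
# Genes that are represented by more than one probe.
multi_probe_genes = sorted(g for g, probes in gene_to_probes.items() if len(probes) > 1)

print(multi_gene_probes)  # ['PROBE_A']
print(multi_probe_genes)  # ['1001']
```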