Files included in export:
- Heatmap image file
- CMS.txt (Comparative Marker Selection, only for Marker selection)
The job information submitted to tranSMART, which gives an overview of the input data for the analysis. Under parameters, jobType indicates which analysis and image will be produced, and independentVariable and variablesConceptPaths list the selected variables by their full concept path names.
If applicable, the genes or proteins that were selected for the analysis are listed in the parameter divIndependentVariablePathwayName.
txtMaxDrawNumber shows the maximum number of rows to display and doGroupBySubject whether the box for ‘Group by subject (instead of node) for multiple nodes’ has been checked. calculateZscore and divIndependentVariableprobesAggregation show whether or not ‘Calculate z-score on the fly’ or probe aggregation were used, respectively.
The data with four columns describing the points plotted:
- PATIENT_NUM; The subject identifier with a prefix. The prefix is the subset identifier, indicated as either S1 or S2, followed by an underscore, the subject identifier, another underscore and the concept node name, e.g. ‘Genes’. Naming-conventions: <SUBSET_ID>_<PATIENT_ID>_<CONCEPT_NODE_NAME> or <SUBSET_ID>_<CONCEPT_NODE_NAME>_<PATIENT_ID>
- VALUE; The z-score for a patient for a probe
- GROUP; The probe name to which the value corresponds
- GENE_SYMBOL; The gene symbol the probe belongs to
The Marker Selection analysis has two columns in place of ‘GROUP’:
- PROBE.ID; The probe name of the platform
- SUBSET; Subset from which the patient sample originates.
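The PATIENT_NUM naming conventions above can be split back into their parts. The following is a minimal sketch in Python, assuming the subset prefix is always S1 or S2 and that the caller already knows the concept node name; the function name parse_patient_num is hypothetical and not part of tranSMART.

```python
def parse_patient_num(patient_num, node_name=None):
    """Split a PATIENT_NUM value into subset and, if possible, patient id.

    Hypothetical helper: handles both documented orders,
    <SUBSET_ID>_<PATIENT_ID>_<CONCEPT_NODE_NAME> and
    <SUBSET_ID>_<CONCEPT_NODE_NAME>_<PATIENT_ID>.
    """
    subset, _, rest = patient_num.partition("_")
    if subset not in ("S1", "S2"):
        raise ValueError("unexpected subset prefix: %r" % subset)

    patient_id = None
    if node_name:
        # The node name may sit after or before the patient identifier.
        if rest.endswith("_" + node_name):
            patient_id = rest[: -len(node_name) - 1]
        elif rest.startswith(node_name + "_"):
            patient_id = rest[len(node_name) + 1:]

    return {"subset": subset, "patient_id": patient_id, "rest": rest}
```

Without a node name only the subset can be recovered reliably, because the two conventions cannot be told apart from the value alone.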
JSON representation of the jobInfo.txt.
The heatmap image, with the rows depicting the gene or protein expression as z-scores. If there are multiple probes per gene/protein, the expression per probe is shown, using the naming convention <PROBENAME>_<GENENAME>. When probe aggregation was used, the gene (or protein) expression is represented by the probe with the highest mean value for that gene/protein.
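The probe aggregation rule (one probe per gene, chosen by highest mean value) can be illustrated with a short Python sketch. It only shows the selection rule as described here and is not the tranSMART implementation; the function name pick_probe_per_gene and the (probe, gene, value) input layout are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def pick_probe_per_gene(rows):
    """Return {gene: probe} keeping the probe with the highest mean value.

    rows: iterable of (probe, gene, value) tuples (assumed layout).
    """
    values_by_probe = defaultdict(list)
    gene_of_probe = {}
    for probe, gene, value in rows:
        values_by_probe[probe].append(float(value))
        gene_of_probe[probe] = gene

    best = {}  # gene -> (probe, mean of that probe)
    for probe, values in values_by_probe.items():
        gene = gene_of_probe[probe]
        probe_mean = mean(values)
        if gene not in best or probe_mean > best[gene][1]:
            best[gene] = (probe, probe_mean)

    return {gene: probe for gene, (probe, _) in best.items()}
```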
By default the heatmap is sorted by row, with the row with the highest mean value shown first, and only the first 50 probes are displayed. Each column shows the expression profile for a patient. The colours depict the z-score intensity calculated during upload, unless the ‘Calculate z-score on the fly’ option was used. When two subsets were used, a yellow and an orange bar indicate the two subsets. The same is true for Hierarchical Clustering and Marker Selection.
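The default row ordering can be reproduced on exported data with a few lines of Python. This is a sketch under the assumption that the values have already been arranged as a mapping from row name to a list of z-scores; the function name rows_to_draw is hypothetical, and max_rows stands in for the txtMaxDrawNumber parameter.

```python
from statistics import mean


def rows_to_draw(matrix, max_rows=50):
    """Return row names ordered by descending mean, cut off at max_rows.

    matrix: dict mapping a row name to a non-empty list of values
    (assumed layout, not the export format itself).
    """
    ranked = sorted(matrix, key=lambda name: mean(matrix[name]), reverse=True)
    return ranked[:max_rows]
```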
The jobInfo.txt can be used to determine which analysis was run; the output image changes depending on the type of analysis.
The RHClust job refers to hierarchical clustering, and its image will be sorted differently from the regular heatmap (RHeatmap job), unless no clustering options were selected in the clustering analysis. Additionally, a dendrogram is drawn along both the columns and the rows.
The RKClust job refers to the K-means clustering method. This analysis ignores subsets selected and aims to create subsets based on the z-scores found in the data. The image has grey and brown bars, indicating the different clusters.
All information on parameter selection can be found in the jobInfo.txt.
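As an illustration of how the job type could be read programmatically, the sketch below parses a JSON job description in Python. It assumes the JSON is a flat object whose keys match the parameter names mentioned in this section (jobType, txtMaxDrawNumber); the real file layout may differ, and the function name describe_job is hypothetical.

```python
import json

# Job types named in this section and the analysis each refers to.
ANALYSIS_BY_JOB_TYPE = {
    "RHeatmap": "heatmap",
    "RHClust": "hierarchical clustering",
    "RKClust": "K-means clustering",
    "MarkerSelection": "marker selection",
}


def describe_job(job_json):
    """Summarise a job description given as a JSON string (assumed flat)."""
    info = json.loads(job_json)
    job_type = info.get("jobType")
    return {
        "jobType": job_type,
        "analysis": ANALYSIS_BY_JOB_TYPE.get(job_type, "unknown"),
        "maxRows": info.get("txtMaxDrawNumber"),
        # Only the MarkerSelection job produces a CMS.txt file.
        "hasCmsFile": job_type == "MarkerSelection",
    }
```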
Column names naming-convention: <SUBSET_ID>_<CONCEPT_NODE_NAME>_<PATIENT_ID> or <SUBSET_ID>_<PATIENT_ID>_<CONCEPT_NODE_NAME>. Row names naming-convention: <PROBE_ID>_<GENE_SYMBOL>
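Row names can be split back into probe and gene with a small Python helper. The sketch below splits on the last underscore, on the assumption that probe identifiers may themselves contain underscores while gene symbols do not; split_row_name is a hypothetical name used only for illustration.

```python
def split_row_name(row_name):
    """Split '<PROBE_ID>_<GENE_SYMBOL>' into (probe_id, gene_symbol).

    Splits on the LAST underscore, so underscores inside the probe id
    are preserved (assumes the gene symbol contains none).
    """
    probe_id, sep, gene_symbol = row_name.rpartition("_")
    if not sep:
        raise ValueError("row name has no underscore: %r" % row_name)
    return probe_id, gene_symbol
```

For a row name such as 1007_s_at_DDR1 this returns the probe 1007_s_at and the gene DDR1, whereas splitting on the first underscore would cut the probe identifier in two.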
Specific to the MarkerSelection job, a CMS.txt (Comparative Marker Selection) will be generated.
The CMS.txt, or Comparative Marker Selection file, contains a seven-column table listing, by default, the top 50 markers with differential probe/gene/protein expression. Its columns include, from left to right:
- GENE_SYMBOL; HGNC gene symbol
- PROBE.ID; probe identifier, can be a gene or protein name but could also be a reference to a gene or protein.
- logFC; log2 fold change
- adj.P.val; adjusted p-value
Note: The adjusted p-value may not always behave as expected and can be identical for all genes.
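A quick way to check whether the note above applies to a given export is to load the table and compare the adjusted p-values. The Python sketch below assumes CMS.txt is tab-delimited with a header row containing the column names listed above; the delimiter and both function names are assumptions.

```python
import csv
import io


def read_markers(text, delimiter="\t"):
    """Parse the table text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def adjusted_p_values_all_equal(markers):
    """Return True when every row carries the same adj.P.val."""
    distinct = {float(row["adj.P.val"]) for row in markers}
    return len(distinct) == 1
```

If the check returns True, ranking the markers by logFC may be more informative than ranking them by adjusted p-value.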
For more information on these fields or how they are calculated, please see the Marker Selection analysis documentation.