Heatmaps
In Analyze, a heatmap is a matrix of data points for a particular set of biomarkers, such as genes, at a particular point in time and/or for a particular tissue sample in the study, as measured for each subject in the study.
In an Analyze heatmap:
 The values in the heatmap are based on the Z-score calculation.
 The color red indicates higher-than-normal expression.
 The color green indicates lower-than-normal expression.
 Biomarkers appear on the y-axis, and subjects appear on the x-axis.
Note
A heatmap can display data points for up to 1000 samples.
Max rows to display
The order of the data points to be displayed is determined by the standard deviation at the probe level. First, probes that do not have a standard deviation are removed. Then the standard deviation is calculated independently of the groups, so the whole mRNA data set is used. Finally, the standard deviation values are sorted from highest to lowest, and only the top rows are displayed.
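The row selection described above can be sketched in a few lines of Python. This is only an illustration, not the code Analyze runs: the function name `top_rows_by_sd`, the probe ids, and the data are hypothetical, and it assumes that "no standard deviation" means fewer than two measured values and that the sample standard deviation is used.

```python
import statistics

def top_rows_by_sd(expression, max_rows):
    """Return the probe ids that would be kept, highest standard deviation first.

    expression: dict mapping a probe id to its list of values across all
    samples (None marks a missing value). Groups are ignored on purpose:
    the standard deviation is taken over the whole data set.
    """
    sds = {}
    for probe, values in expression.items():
        present = [v for v in values if v is not None]
        # Assumption: a probe with fewer than two values has no standard
        # deviation and is removed before ranking.
        if len(present) < 2:
            continue
        sds[probe] = statistics.stdev(present)
    ranked = sorted(sds, key=lambda p: sds[p], reverse=True)
    return ranked[:max_rows]

# Hypothetical data: four probes measured on four samples.
data = {
    "probe_a": [1.0, 1.1, 0.9, 1.0],
    "probe_b": [0.0, 4.0, -4.0, 2.0],
    "probe_c": [None, None, 3.0, None],
    "probe_d": [2.0, 3.0, 1.0, 2.5],
}
selected = top_rows_by_sd(data, max_rows=2)
print(selected)
```

With Max rows to display set to 2, only the two most variable probes remain; the nearly constant probe and the probe with a single value are not shown.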
Selecting biomarker subsets
When you use the High Dimensional Data button to select only the biomarkers of interest, some of the selected biomarkers might not be displayed in the heatmap image. This happens when the dataset of interest has no data for the selected biomarker.
Note
The autocomplete in the High Dimensional Data popup uses biomarker dictionaries to make its suggestions. These dictionaries do not take into account which biomarkers are available for the selected dataset.
Analyze uses the R software environment for statistical computing and to generate analyses and visualizations. For more information, visit http://www.r-project.org.
You can generate the following types of heatmaps:
Standard Heatmap
A standard heatmap is a visualization of biomarker data points with no indication of patterns, groupings, or differentiation among the data points.
To begin the analysis, see Running the Analyses, then perform the following steps.
To perform a standard heatmap analysis:

Click the Advanced Workflow tab, then open the Analysis menu.

Select Heatmap.
The Variable Selection section appears.

Drag a high-dimensional data node, or several high-dimensional nodes in the case of serial data, into the Variable Selection box.

Click the High Dimensional Data button.
The Compare Subsets-Pathway Selection dialog box appears.

Specify the platform and other filters for the analysis.
For information, see High Dimensional Data.

Click Apply Selections.

In Max rows to display, type the maximum number of rows to display in the heatmap.

Optionally, select either or both of the following:

Click Run.

Your analysis appears below:
Note
With serial data, the heatmap will display the various conditions ordered by increasing associated value, such as in chronological order for a time series.
Hierarchical Clustering
Hierarchical clustering is a visualization of patterns of related data points in gene expression data.
To begin the analysis, see Running the Analyses, then perform the following steps.
To perform a hierarchical clustering heatmap analysis:

Click the Advanced Workflow tab, then open the Analysis menu.

Select Hierarchical Clustering.
The Variable Selection section appears.

Drag a high-dimensional data node into the Variable Selection box.

Click the High Dimensional Data button.
The Compare Subsets-Pathway Selection dialog box appears.

Specify the platform and other filters for the analysis.
For information, see High Dimensional Data.

Click Apply Selections.

In Max rows to display, type the maximum number of rows to display in the heatmap.

Optionally, select one or more of the following:

Click Run.

Your analysis appears below:
Note
To read more about Hierarchical Clustering, visit: http://www.ics.uci.edu/~eppstein/280/cluster.html
K-Means Clustering
K-Means clustering is a visualization of groupings of the most closely related data points, based on the number of groupings you specify.
Note
The K-Means analysis clusters columns only. Rows are not clustered.
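To make "columns only" concrete, here is a minimal, pure-Python sketch of k-means applied to the columns (subjects) of a small matrix while the rows stay in their input order. It is a generic illustration of the technique, not the R code Analyze runs; the function name `kmeans_columns`, the fixed iteration count, and the choice to seed the centroids with the first k columns are all assumptions made for the example.

```python
def kmeans_columns(matrix, k, iterations=20):
    """Cluster the columns of a rows-by-columns matrix with k-means.

    Returns one cluster index per column. Rows are never reordered or
    clustered; each column is treated as one point.
    """
    # Transpose: every column (subject) becomes a list of its row values.
    columns = [list(col) for col in zip(*matrix)]
    # Assumption: seed with the first k columns so the result is deterministic.
    centroids = [col[:] for col in columns[:k]]
    labels = [0] * len(columns)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(col, centroids[c])),
            )
            for col in columns
        ]
        # Update step: move each centroid to the mean of its member columns.
        for c in range(k):
            members = [col for col, lab in zip(columns, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels

# Hypothetical z-scores: 3 biomarkers (rows) by 4 subjects (columns).
matrix = [
    [1.0, 1.2, -1.0, -1.1],
    [0.9, 1.1, -0.8, -1.2],
    [1.1, 0.8, -1.2, -0.9],
]
labels = kmeans_columns(matrix, k=2)
print(labels)
```

The first two subjects end up in one cluster and the last two in the other, which corresponds to the colored bars drawn above the columns of the heatmap.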
To begin the analysis, see Running the Analyses, then perform the following steps.
To perform a k-means clustering heatmap analysis:

Click the Advanced Workflow tab, then open the Analysis menu.

Select K-Means Clustering.
The Variable Selection section appears.

Drag a high-dimensional data node into the Variable Selection box.

Click the High Dimensional Data button.
The Compare Subsets-Pathway Selection dialog box appears.

Specify the platform and other filters for the analysis.
For information, see High Dimensional Data.

Click Apply Selections.

In Number of clusters, type the number of clusters to include in the heatmap.

In Max rows to display, type the maximum number of rows to display in the heatmap.

Optionally, select Calculate z-score on the fly.

Click Run.

Your analysis appears below. Clusters are represented by the colored bars at the top of the heatmap:
Note
To read more about K-Means Clustering, visit: http://www.ics.uci.edu/~eppstein/280/cluster.html
Marker Selection
A marker selection heatmap is a visualization of differentially expressed genes in distinct phenotypes. Specifically, the algorithm determines the set of genes that are most differentially expressed between the two subsets. This list of differentially expressed genes is then presented in a table, along with a variety of accompanying statistics.
Optionally, you can run a MetaCore Enrichment Analysis from a generated Marker Selection heatmap.
To begin the analysis, see Running the Analyses, then perform the following steps.
Note
Two subsets must be specified when using a Marker Selection heatmap.
To perform a marker selection heatmap analysis:

Click the Advanced Workflow tab, then open the Analysis menu.

Select Marker Selection.
The Variable Selection section appears.

Drag a high-dimensional data node into the Variable Selection box.

Click the High Dimensional Data button.
The Compare Subsets-Pathway Selection dialog box appears.

Specify the platform and other filters for the analysis.
For information, see High Dimensional Data.

Click Apply Selections.

In the Number of Markers field, type a numeric value. This will determine the number of differentially expressed genes that are returned.

Optionally, select either or both of the following:

Click Run.

Your analysis appears below. The subsets are represented by the colored bars at the top of the heatmap:
A table of the top markers appears below the heatmap. You can sort the table by clicking any of the column headings. Optionally, you can view MetaCore settings and run a MetaCore Enrichment Analysis by clicking the buttons above the table.
For more information about MetaCore Enrichment Analysis, see MetaCore Enrichment Analysis.
The following table represents a portion of the data from the Marker Selection heatmap illustrated above:
Note
For more information on the analyses used in Marker Selection, visit: http://mathworld.wolfram.com/BonferroniCorrection.html
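As a brief illustration of the Bonferroni correction referenced above, the sketch below multiplies each raw p-value by the number of tests and caps the result at 1. The function name `bonferroni_adjust`, the gene names, and the p-values are hypothetical, and the sketch does not claim to reproduce how Analyze computes or reports the statistics in the marker table.

```python
def bonferroni_adjust(p_values):
    """Bonferroni correction: multiply each raw p-value by the number of
    tests performed, capping the result at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for three genes compared between two subsets.
raw = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.4}
adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))

for gene, p in adjusted.items():
    print(gene, round(p, 4))
```

Because the correction scales with the number of tests, a gene needs a much smaller raw p-value to remain significant when many genes are tested at once.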
Z-score calculation
The z-scores used by default in advanced analyses such as the heatmaps are calculated during data loading and depend on the ETL tool used to load the data. Check the documentation of your ETL tool for more information, for example the documentation for transmart-batch.
Some of the advanced analyses that use the z-score have a Calculate z-score on the fly check box. When it is selected, the log-transformed representation of the data is used to recalculate the z-score based on the subset of data that was selected. The z-score is calculated using the following formula:
If the standard deviation of the probe is 0, the z-score is equal to 0. The final z-score is cut off at a minimum value of -2.5 and a maximum value of 2.5.
The median used in the calculation is taken from the subset of the data you selected, and the z-score calculation takes into account the subset a patient is in. In practice, this means that when you use two subsets in your analysis, the median value for a probe will differ between the two groups.
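The rules in this section can be sketched in Python as follows. This is a hedged illustration rather than the actual implementation: it assumes the formula has the usual form (value - median) / standard deviation, that the standard deviation is the sample standard deviation of the probe within the same subset, and that the input values are already log-transformed. The function name `zscores_for_probe` and the data are hypothetical.

```python
import statistics

Z_LIMIT = 2.5  # cut-off described in the text: z-scores are kept in [-2.5, 2.5]

def zscores_for_probe(log_values):
    """Z-scores for one probe within one subset.

    log_values: log-transformed values of the probe for the patients in
    a single subset. Median and standard deviation are taken from this
    subset only (the standard deviation part is an assumption).
    """
    median = statistics.median(log_values)
    sd = statistics.stdev(log_values) if len(log_values) > 1 else 0.0
    # A probe with zero standard deviation gets a z-score of 0.
    if sd == 0:
        return [0.0 for _ in log_values]
    # Clamp every z-score to the cut-off range.
    return [max(-Z_LIMIT, min(Z_LIMIT, (v - median) / sd)) for v in log_values]

# Hypothetical log-transformed values of one probe in two subsets.
subset_1 = [5.0, 5.0, 5.0]
subset_2 = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 10.0]

z1 = zscores_for_probe(subset_1)  # constant probe: all z-scores are 0
z2 = zscores_for_probe(subset_2)  # the outlier is cut off at the upper limit
print(z1)
print(z2)
```

Each subset is processed separately, so the same probe is centered on a different median in each group, as described above.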