Files included in export:
- ANOVA PAIRWISE file(s)
- ANOVA RESULTS file(s)
- Boxplot image file(s)
jobInfo.txt holds the job information submitted to tranSMART and gives an overview of the input data for the analysis. Under Parameters, jobType indicates the type of analysis and which image will be produced. The selected variables are listed by their full concept path names.
For high-dimensional data, this file also records which genes were selected for the independent and dependent variables.
Variables selected: The items dependentVariable and independentVariable show the full path names of the items selected in tranSMART as input; they correspond to the two input boxes in the user interface. Taken together, the items described below indicate the type of each variable and whether it was used for binning. variablesConceptPaths gives a summary of all concept paths used.
divIndependentVariableType and divDependentVariableType indicate the type of variables that were used. Values include CLINICAL for categorical and numeric low-dimensional data types, or the type of high-dimensional data that was used, for example “mrna” in the case of an mRNA expression dataset.
dependentVariableCategorical and independentVariableCategorical indicate whether the selected variables are CATEGORICAL or not. Note that the X column in outputfile.txt is either the CATEGORICAL variable or the binned variable (binning effectively turns it into a categorical variable).
divIndependentPathwayName and divDependentPathwayName give the names of the selected genes in case of a high-dimensional variable.
flipImage is True when the dependent variable is used to group the observations, and False when the independent variable is used for grouping.
Binning: There are three different options for binning:
- EDP; Evenly Distribute Population
- ESB; Evenly Spaced Bins
- Manual binning
In case of EDP or ESB for numerical or high-dimensional variables, the file will contain the following parameters to provide information on which variable was used to bin:
- binning; Either True or False
- binDistribution; EDP or ESB
- numberOfBins; Number of bins defined by the algorithm
- binVariable; either IND or DEP which stands for Independent or Dependent, respectively. Note that you can see which variable was used as independent or dependent using the independentVariable and dependentVariable items.
Information on the actual bin boundaries is available from the outputfile.txt in the column named X.
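The difference between the two automatic options can be illustrated with a small sketch. It only shows what the names ESB and EDP suggest (equal-width bins versus bins holding roughly equal numbers of subjects); the function names and the sample values are assumptions made for illustration, and the algorithm tranSMART actually uses may choose boundaries differently.

```python
def evenly_spaced_bins(values, number_of_bins):
    """ESB-style edges: split the value range into bins of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / number_of_bins
    return [low + i * width for i in range(number_of_bins + 1)]


def evenly_distributed_population(values, number_of_bins):
    """EDP-style edges: pick boundaries so each bin holds about the same
    number of observations (a simple rank-based split, assumed here)."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[0]]
    for i in range(1, number_of_bins):
        # The value at rank i/number_of_bins becomes the next boundary.
        edges.append(ordered[(i * n) // number_of_bins])
    edges.append(ordered[-1])
    return edges


# Made-up measurements with one outlier, to show how the options differ.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
esb_edges = evenly_spaced_bins(values, 3)
edp_edges = evenly_distributed_population(values, 3)
print("ESB edges:", esb_edges)
print("EDP edges:", edp_edges)
```

With skewed data such as this sample, equal-width bins leave most subjects in the first bin, while the population-based split spreads them evenly across the bins.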
With the Manual binning option, the above items are also included in jobInfo.txt, but the item manualBinning is set to True instead of False, indicating that the bins were defined manually. Additionally, an item named binRanges lists the manually defined bins. Note that numberOfBins and binVariable are still relevant in this case, while binDistribution is not.
outputfile.txt contains the data, with three to five columns describing the points plotted:
- PATIENT_NUM; Subject identifier
- X; The variable that is used to group the observations; categorical or binned. Can be either the independent or the dependent variable, depending on the data input.
- Y; The numerical variable used to plot the box. Can be either the independent or the dependent variable, depending on the data input.
Optional columns when using multiple independent and/or dependent variables and when using high-dimensional data:
- GROUP; Assigned automatically, name of either the independent or dependent group to be displayed in the boxplot
- GROUP.1; Assigned automatically, name of either the independent or dependent group to be displayed in the boxplot. Only used when GROUP is already present.
NOTE: Using two numerical or high-dimensional concepts requires binning of either the independent or the dependent variable.
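As an illustration of how the columns described above relate to the boxes in the plot, the following sketch groups the Y values by GROUP (when present) and X and reports the number of observations and the mean per box. It assumes the file is tab-delimited with a header row, which is not stated in this document; the function name and the sample rows are made up for the example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarize_boxes(handle, delimiter="\t"):
    """Return {(GROUP, X): (count, mean of Y)} for each box.

    The tab delimiter is an assumption; adjust it to match the real file.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    boxes = defaultdict(list)
    for row in reader:
        # GROUP is optional, so fall back to an empty label when absent.
        key = (row.get("GROUP", ""), row["X"])
        boxes[key].append(float(row["Y"]))
    return {key: (len(ys), mean(ys)) for key, ys in boxes.items()}


# Made-up sample content standing in for a real export file.
sample = "PATIENT_NUM\tX\tY\n1\tMale\t2.0\n2\tMale\t4.0\n3\tFemale\t5.0\n"
summary = summarize_boxes(io.StringIO(sample))
for (group, x), (count, y_mean) in summary.items():
    print(group or "(no group)", x, count, y_mean)
```

For a real export, an opened file object can be passed in place of the in-memory sample.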
When using high-dimensional data, it is possible to select multiple independent or dependent variables by selecting multiple genes, or by selecting a gene with multiple probes, in the high-dimensional pop-up. When a high-dimensional data point is binned, it is treated as a category and produces boxes determined by the bins. When plotting probe-level data for a selected gene that has more than one probe, the result is one plot per probe.
Using two independent variables with two dependent variables produces a plot with two sets of boxes; each set has two boxes, one for each dependent variable selected. For high-dimensional data, each probe counts as a dependent variable and produces its own box.
When two high-dimensional concepts are used, the GROUP and GROUP.1 names represent the probe names as stored in tranSMART. To find the gene names corresponding to the probes used, the two GROUP columns need to be combined with jobInfo.txt to see what was used as input for the X and Y columns. As mentioned before, X always represents the categorical or binned variable. To see whether the independent or the dependent variable is used as X, look at the “independentVariableCategorical” item in jobInfo.txt: if it states False, X is the dependent variable; if it states True, X is the independent variable. The independent gene name is displayed next to the “divIndependentPathwayName” item and the dependent gene name is displayed next to the “divDependentPathwayName” item.
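The lookup described above can be sketched as follows, reading the flag as: True means X is the independent variable, False means X is the dependent variable. The sketch assumes the jobInfo.txt items have already been read into a dictionary; how the file is parsed, the function name, and the gene names in the sample are all hypothetical.

```python
def resolve_x_and_y(job_info):
    """Map the X and Y columns to (role, gene name) using jobInfo items.

    job_info is a plain dict of item name -> value (assumed to be
    pre-parsed); the flag may be a bool or the text 'True'/'False'.
    """
    flag = str(job_info["independentVariableCategorical"]).strip().lower()
    if flag == "true":
        x_role, y_role = "Independent", "Dependent"
    else:
        x_role, y_role = "Dependent", "Independent"
    return {
        "X": (x_role.lower(), job_info.get(f"div{x_role}PathwayName")),
        "Y": (y_role.lower(), job_info.get(f"div{y_role}PathwayName")),
    }


# Hypothetical values, for illustration only.
job_info = {
    "independentVariableCategorical": "False",
    "divIndependentPathwayName": "TP53",
    "divDependentPathwayName": "BRCA1",
}
axes = resolve_x_and_y(job_info)
print(axes)
```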
JSON representation of jobInfo.txt
ANOVA PAIRWISE file(s)
A contingency table view of the paired t-test p-values for each group. The file starts with the group name and then shows the contingency table for the variable selected to provide the numeric input.
If two high-dimensional concepts were used as input, the output will consist of multiple files where the naming convention follows: ANOVA_PAIRWISE_<Gene/ProbeName>.txt. Note that the binned variable is used in the naming convention.
ANOVA RESULTS file(s)
A file with the ANOVA results for each group. It starts with the group name, followed by an overview for that group showing both the p-value and the F-statistic. Next comes a summary with one row per category, showing the mean value and the number of instances in that category.
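To make the reported quantities concrete, here is a minimal sketch of a textbook one-way ANOVA that returns the F-statistic together with the per-category mean and number of instances. It is not the script tranSMART runs; the function name and the sample numbers are made up, and the p-value is left out because it requires the F distribution, which the Python standard library does not provide.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F-statistic, {category: (mean, count)}).

    groups maps each category (a value of X) to its list of Y values.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)      # number of categories
    n = len(all_values)  # total number of observations

    # Variation of the category means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of the observations around their own category mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    f_statistic = (ss_between / (k - 1)) / (ss_within / (n - k))
    summary = {name: (mean(values), len(values)) for name, values in groups.items()}
    return f_statistic, summary


# Made-up sample data with three categories.
groups = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0], "C": [5.0, 6.0, 7.0]}
f_statistic, summary = one_way_anova(groups)
print(round(f_statistic, 6), summary)
```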
If two high-dimensional concepts were used as input, the output will consist of multiple files where the naming convention follows: ANOVA_RESULTS_<Gene/ProbeName>.txt. Note that the binned variable is used in the naming convention.
Boxplot image file(s)
The boxplot, stored as a PNG image file. When using variables such as genes with multiple probes taken from high-dimensional data, more than one image will be produced. The naming convention in that case is: BoxPlot_<ProbeName>.png
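When an export contains several ANOVA and boxplot files, the naming conventions given in this document can be used to sort them by kind and recover the gene or probe name. The sketch below does this with regular expressions; the function name, the kind labels, and the probe identifier in the sample file names are illustrative assumptions.

```python
import re

# Patterns derived from the naming conventions described in this document.
PATTERNS = {
    "anova_pairwise": re.compile(r"^ANOVA_PAIRWISE_(.+)\.txt$"),
    "anova_results": re.compile(r"^ANOVA_RESULTS_(.+)\.txt$"),
    "boxplot": re.compile(r"^BoxPlot_(.+)\.png$"),
}


def classify_export_file(filename):
    """Return (kind, gene_or_probe_name), or None if no pattern matches."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(filename)
        if match:
            return kind, match.group(1)
    return None


# Sample file names; the probe identifier is only an example value.
for name in ["ANOVA_RESULTS_1007_s_at.txt", "BoxPlot_1007_s_at.png", "jobInfo.txt"]:
    print(name, "->", classify_export_file(name))
```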
On the X-axis, the group names are displayed if the independent variable is chosen to be categorical. In case of multiple sets of boxes, the boxes will be coloured and a legend will be provided to indicate which boxes correspond to which group. On the Y-axis, the numerical values are displayed when the dependent variable is chosen to be numeric.
Note that switching around the independent and dependent variable will produce either a horizontal or vertical boxplot image.