Files included in export:
- CoxRegression_result.txt file(s)
- SurvivalCurve_FitSummary.txt file(s)
- SurvivalCurve_Table.txt file(s)
- SurvivalCurve.png file(s)
The job information submitted to tranSMART. It gives an overview of the input data for the analysis. Under parameters, jobType indicates the type of analysis and which image will be produced. Each selected variable is displayed using its full concept path name.
In case of high-dimensional data, this file also records which genes were selected for the independent and dependent variables.
Variables selected: timeVariable, categoryVariable and censoringVariable give an overview of the variables selected for the Time, Category and Censoring input boxes in the user interface. The item variablesConceptPaths lists all concept paths used.
divDependentVariableType indicates the type of variables that were used for the Category. Values include CLINICAL for categorical and numeric low-dimensional data types, or the type of high-dimensional data that was used, for example “mrna” in case of an mRNA expression dataset.
Binning: There are three different options for binning:
- EDP; Evenly Distribute Population
- ESB; Evenly Spaced Bins
- Manual binning
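The two automatic options can be illustrated with a small Python sketch. It assumes that EDP means equal-count bins (each bin receives roughly the same number of subjects) and ESB means equal-width bins over the observed range; the function names are hypothetical, and tranSMART's own R implementation may treat ties and bin edges differently.

```python
from typing import List


def evenly_spaced_bins(values: List[float], n_bins: int) -> List[int]:
    """ESB (assumed): split [min, max] into n_bins equal-width intervals; return a 1-based bin per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum lands exactly on the upper edge, so clamp it into the last bin.
        bins.append(min(idx, n_bins - 1) + 1)
    return bins


def evenly_distributed_bins(values: List[float], n_bins: int) -> List[int]:
    """EDP (assumed): sort the values and give each bin (roughly) the same number of subjects."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        # Map ranks 0..n-1 onto bins 1..n_bins in runs of equal length.
        # Note: tied values may end up in different bins in this simple version.
        bins[i] = rank * n_bins // len(values) + 1
    return bins
```

For the values 1, 2, 3, 4, 100 and 200 with two bins, `evenly_spaced_bins` returns `[1, 1, 1, 1, 1, 2]` because the outlier stretches the range, whereas `evenly_distributed_bins` returns `[1, 1, 1, 2, 2, 2]`.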
In case of EDP or ESB for numerical or high-dimensional variables, the file contains the following parameters, describing which variable was binned and how:
- binning; Either True or False
- binVariable; The variable that was binned
- binDistribution; EDP or ESB
- numberOfBins; Number of bins defined by the algorithm
Information on the actual bin boundaries is available in the outputfile.txt in the column named CATEGORY.
In case of the Manual binning option, the above-mentioned items are also included in the jobInfo, but the item manualBinning is set to True instead of False, indicating that manual bins were defined. Additionally, an item named binRanges lists the manually defined bins. Note that numberOfBins and binVariable are still relevant in this case, while binDistribution is not.
The result_instance_id1 and result_instance_id2 fields contain the internal identifiers tranSMART assigned to the patient subsets. NOTE: The survival analysis merges the subsets into one group and produces a single survival plot based on that combined group.
The data, in four columns describing the input (a fifth column, GROUP, is added for high-dimensional data):
- PATIENT_NUM; Subject identifier
- TIME; Survival time per subject; the unit (e.g. days or weeks) depends on the values loaded into tranSMART.
- CENSOR; Integer, 0 or 1. 0 means the row is not censored; 1 means the row is treated as censored during the analysis.
- CATEGORY; The category/group used to plot survival for the patients. In case of manual binning, the bin number is given here; the bin contents can then be found in jobInfo.txt.
- GROUP; Only when using high-dimensional data to group variables. Indicates to which gene or probe the row of data belongs.
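A minimal sketch for loading these columns in Python, assuming outputfile.txt is tab-separated with a header row holding the column names above (the delimiter is an assumption, and `read_survival_rows` is a hypothetical helper). Rows are grouped per CATEGORY, prefixed with the GROUP value when that column is present.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def read_survival_rows(handle: TextIO) -> Dict[str, List[dict]]:
    """Group the rows of the survival data file by CATEGORY (and GROUP if present)."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = row["CATEGORY"]
        # GROUP only exists for high-dimensional data; prefix it so that
        # identical categories for different genes/probes stay separate.
        if row.get("GROUP"):
            key = f'{row["GROUP"]} / {key}'
        groups[key].append({
            "patient": row["PATIENT_NUM"],
            "time": float(row["TIME"]),
            # Per the column description: CENSOR = 1 marks a censored row.
            "censored": int(row["CENSOR"]) == 1,
        })
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical two-category example in the assumed tab-separated layout.
    sample = "PATIENT_NUM\tTIME\tCENSOR\tCATEGORY\n1\t10\t0\tA\n2\t25\t1\tA\n3\t7\t0\tB\n"
    for category, rows in read_survival_rows(io.StringIO(sample)).items():
        print(category, len(rows))
```

For a real export, pass an open file, e.g. `read_survival_rows(open("outputfile.txt", newline=""))`.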
JSON representation of the jobInfo.txt.
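Because this file is JSON, the binning settings can be read programmatically. The sketch below assumes that the items described for jobInfo.txt (binning, manualBinning, binDistribution, numberOfBins, binVariable, binRanges) appear as top-level keys and that flags may be stored either as JSON booleans or as strings such as "true"; both the layout and the `describe_binning` helper are assumptions.

```python
import json
from typing import Any, Dict


def _as_bool(value: Any) -> bool:
    """Accept JSON booleans as well as string flags such as "true" or "TRUE"."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def describe_binning(job: Dict[str, Any]) -> str:
    """Summarise the binning items of a jobInfo object (assumed flat layout)."""
    variable = job.get("binVariable")
    n_bins = job.get("numberOfBins")
    if _as_bool(job.get("manualBinning", False)):
        # binDistribution is not relevant for manual bins; binRanges holds them.
        return f"manual binning of {variable} into {n_bins} bins: {job.get('binRanges')}"
    if _as_bool(job.get("binning", False)):
        return f"{job.get('binDistribution')} binning of {variable} into {n_bins} bins"
    return "no binning"


if __name__ == "__main__":
    # Hypothetical JSON content; real files may nest or name these keys differently.
    text = '{"binning": "true", "binDistribution": "EDP", "numberOfBins": "3", "binVariable": "age"}'
    print(describe_binning(json.loads(text)))
```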
A text file with the results of the Cox regression analysis. It first explains how tied events, i.e. events with exactly the same survival time, are handled. For a full description of how ties are handled, please look here under the sections coxph and Surv.
Under ‘Call:’ the actual R command used to run the Cox regression is shown. Next, two tables display the output coefficients:
- coef; Estimated regression coefficient β of the linear predictor, i.e. the log hazard ratio.
- exp(coef); Hazard ratio
- se(coef); Standard error
- z; z-score
- Pr(>|z|); Two-sided p-value: the probability of obtaining a z-score at least this extreme if the true β were 0.
- exp(-coef); 1/exp(coef), inverse hazard ratio
- lower .95; lower bound of the 95% confidence interval for exp(coef)
- upper .95; upper bound of the 95% confidence interval for exp(coef)
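All derived columns follow from coef and se(coef) alone. A minimal Python sketch, assuming the standard Wald construction used by R's summary of a coxph fit (z = coef / se, 95% limits at coef ± 1.96·se on the log scale, then exponentiated); `cox_row` is a hypothetical helper, not part of any tranSMART or R API.

```python
import math
from typing import Dict


def cox_row(coef: float, se: float, z_crit: float = 1.959963984540054) -> Dict[str, float]:
    """Recompute the derived columns of one coefficient row from coef and se(coef)."""
    z = coef / se
    return {
        "coef": coef,
        "exp(coef)": math.exp(coef),    # hazard ratio
        "se(coef)": se,
        "z": z,
        # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
        "Pr(>|z|)": math.erfc(abs(z) / math.sqrt(2)),
        "exp(-coef)": math.exp(-coef),  # inverse hazard ratio
        # 95% confidence interval for the hazard ratio, built on the log scale.
        "lower .95": math.exp(coef - z_crit * se),
        "upper .95": math.exp(coef + z_crit * se),
    }
```

Because the interval is symmetric on the log scale, it is asymmetric around exp(coef), and its limits multiply to exp(2·coef).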
The last table in the file displays the Rsquare and the results of the likelihood ratio test, the Wald test and the Score (logrank) test.
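For a model with a single covariate, the likelihood ratio line and the Rsquare can be reproduced from the log partial likelihoods of the null and the fitted model. A minimal sketch, assuming the definition Rsquare = 1 - exp(-LR/n) used by the versions of R's survival package that print this value; the helper name is hypothetical, the p-value formula holds only for one degree of freedom, and the Wald and Score tests (which need the model's variance estimates) are not covered.

```python
import math
from typing import Dict


def likelihood_ratio_summary(loglik_null: float, loglik_model: float, n: int) -> Dict[str, float]:
    """LR statistic, its p-value for df = 1, and the Rsquare derived from it."""
    lr = 2.0 * (loglik_model - loglik_null)
    return {
        "LR": lr,
        # Valid for one degree of freedom only: P(chi2_1 > x) = erfc(sqrt(x / 2)).
        "p_value": math.erfc(math.sqrt(lr / 2.0)),
        # Rsquare = 1 - exp(-LR / n), with n the number of subjects.
        "Rsquare": 1.0 - math.exp(-lr / n),
    }
```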
When using a high-dimensional concept to indicate the groups, this will produce one output file per gene or probe name. Naming convention: CoxRegression_result_<GENE_NAME>.txt or CoxRegression_result_<PROBE_NAME>.txt
For more information on the Cox regression analysis go here.
A text file that shows the summary for the survfit analysis in R.
Under ‘Call:’ the actual R command used to run survfit is shown. The table below it displays, for each plotted category:
- n; number of subjects
- events; number of events
- median; median survival time
- 0.95 LCL; lower bound of the 95% confidence interval for the median survival time
- 0.95 UCL; upper bound of the 95% confidence interval for the median survival time
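The median is read off the Kaplan-Meier curve: it is the earliest time at which the estimated survival falls to 0.5 or below, and R prints NA when the curve never gets that low. A minimal Python sketch, assuming (time, survival) pairs sorted by time, such as those in the survfit table output; `median_survival` is a hypothetical helper, and survfit's special handling of a curve lying exactly at 0.5 (where it may average neighbouring times) is not reproduced.

```python
from typing import List, Optional, Tuple


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Return the first time at which survival drops to 0.5 or below, else None (NA in R)."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    # The curve never reaches 0.5, so the median is undefined.
    return None
```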
When using a high-dimensional concept to indicate the groups, this will produce one output file per gene or probe name. Naming convention: SurvivalCurve_<GENE_NAME>_FitSummary.txt or SurvivalCurve_<PROBE_NAME>_FitSummary.txt
For more information on the survfit function go here.
A text file that shows the table for the survfit analysis in R.
When using a high-dimensional concept to indicate the groups, this will produce one output file per gene or probe name. Naming convention: SurvivalCurve_<GENE_NAME>_Table.txt or SurvivalCurve_<PROBE_NAME>_Table.txt
Under ‘Call:’ the actual R command used to run the survfit is shown. The table in this file has the following columns per category:
- time; time of event
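- n.risk; number of subjects still at risk just before that time, i.e. for whom the event has not (yet) occurred and who were not censored earlier
- n.event; number of subjects for whom the event occurred at that specific time
- survival; estimated fraction (between 0 and 1) of subjects not (yet) affected by the event
- std.err; standard error of the survival estimate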
- lower 95% CI; lower 95% confidence interval
- upper 95% CI; upper 95% confidence interval
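These columns follow the Kaplan-Meier estimator. The Python sketch below rebuilds the table for one category from survival times and event flags (1 = event observed, 0 = censored, which is R's Surv convention; map the CENSOR column accordingly). It assumes survfit's default log-scale confidence intervals (conf.type = "log") with Greenwood's variance, and, like the default summary output, reports only times at which at least one event occurred. `kaplan_meier` is a hypothetical name, and R's handling of edge cases (for example a survival estimate of 0) may differ.

```python
import math
from typing import Dict, List


def kaplan_meier(times: List[float], events: List[int],
                 z: float = 1.959963984540054) -> List[Dict[str, float]]:
    """Kaplan-Meier table with one row per distinct event time.

    events[i] is 1 when the event was observed for subject i and 0 when the
    subject was censored at times[i].
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood = 0.0  # running sum of d / (n * (n - d)) over event times
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        # Count events and censorings tied at time t; subjects censored at t
        # are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            if n_at_risk > d:
                greenwood += d / (n_at_risk * (n_at_risk - d))
                se_log = math.sqrt(greenwood)  # Greenwood standard error of log(S)
                std_err = surv * se_log        # standard error on the survival scale
                lower = surv * math.exp(-z * se_log)
                upper = min(1.0, surv * math.exp(z * se_log))
            else:
                # Survival has dropped to 0; the log-scale interval is left undefined here.
                std_err = lower = upper = float("nan")
            rows.append({"time": t, "n.risk": n_at_risk, "n.event": d,
                         "survival": surv, "std.err": std_err,
                         "lower 95% CI": lower, "upper 95% CI": upper})
        n_at_risk -= d + c
    return rows
```

For times 1, 2, 2, 3, 4 with events 1, 1, 0, 1, 0, the sketch yields rows at times 1, 2 and 3 with n.risk 5, 4, 2 and survival 0.8, 0.6 and 0.3.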
For more information on the survfit function go here.
The Kaplan-Meier plot displays the results of the survival analysis. The X-axis shows time and the Y-axis the fraction of patients who have not (yet) experienced the event. The unit of time depends on the unit loaded into tranSMART. The legend indicates which colour corresponds to which group.
When using a high-dimensional concept to indicate the groups, this will produce one output file per gene or probe name. Naming convention: SurvivalCurve_<GENE_NAME>.png or SurvivalCurve_<PROBE_NAME>.png
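The naming conventions for the result files can be captured in a small helper, for example to check which files to expect per gene or probe. A sketch under the assumption that names are built by plain string substitution; `export_file_names` and its dictionary keys are hypothetical, and gene or probe names containing characters that are not valid in file names are not handled.

```python
from typing import Dict, Optional


def export_file_names(feature: Optional[str] = None) -> Dict[str, str]:
    """Expected result file names, optionally for one gene or probe name."""
    if feature is None:
        # Low-dimensional grouping: one file of each kind.
        return {
            "cox": "CoxRegression_result.txt",
            "fit_summary": "SurvivalCurve_FitSummary.txt",
            "table": "SurvivalCurve_Table.txt",
            "plot": "SurvivalCurve.png",
        }
    # High-dimensional grouping: one set of files per gene or probe name.
    return {
        "cox": f"CoxRegression_result_{feature}.txt",
        "fit_summary": f"SurvivalCurve_{feature}_FitSummary.txt",
        "table": f"SurvivalCurve_{feature}_Table.txt",
        "plot": f"SurvivalCurve_{feature}.png",
    }
```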