This article assumes you have installed your database using transmart-data and have set up transmart-data to load public datasets. See the separate articles in this section for more details on these steps.
Loading clinical data
The first step is to load the clinical data for the subjects in a study.
Public datasets are indexed by transmart-data, which reads index files from one or more public servers.
Clinical data is loaded by running a make target. The examples assume a PostgreSQL database; if you are using Oracle, substitute 'oracle' for 'postgres' in each example.
make -C samples/postgres load_clinical_STUDYNAME
If your shell supports completion, you can expand the target name with the Tab key. The study name may simply be the study ID from GEO, but more commonly it is prefixed with the source of the data curation. This allows several curated versions of the same study to be loaded, each with a distinct STUDY_ID.
The study is loaded into the path in the tree that is defined in the parameter file downloaded from the public server. If you prefer to place the study elsewhere in your local tree, you can unpack the study, edit the path, repack it and then load it manually.
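A minimal sketch of wrapping the clinical target in a shell function, so the same command works for either database. The function name `load_clinical_study` and the `DB` environment variable are hypothetical and not part of transmart-data; the function only runs the `make -C samples/postgres load_clinical_STUDYNAME` command described above, from the root of your transmart-data checkout.

```shell
# Hypothetical helper (not part of transmart-data): load the clinical data
# for one study. Run it from the root of your transmart-data checkout.
# DB selects the samples directory and defaults to postgres; set DB=oracle
# for an Oracle installation.
load_clinical_study() {
  local study="$1"
  local db="${DB:-postgres}"
  if [ -z "$study" ]; then
    echo "usage: load_clinical_study STUDYNAME" >&2
    return 2
  fi
  make -C "samples/${db}" "load_clinical_${study}"
}
```

Usage: `load_clinical_study STUDYNAME`, or `DB=oracle load_clinical_study STUDYNAME` on Oracle.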
Loading platform annotation
Before loading any high-dimensional data (mRNA expression, etc.), you must load the platform annotation that maps the rows of the study data (probe IDs, etc.) to genes, proteins, metabolites, etc. in the stored ontologies.
When you load an annotation target, the datatype is detected automatically from the downloaded parameter file, so a single 'annotation' target covers every datatype.
For each study, a target is provided that defines the name of the platform the study requires:
make -C samples/postgres load_ref_annotation_STUDYNAME
This downloads a file containing the platform name, checks whether that platform is already installed and, if needed, downloads and installs its annotation.
Be careful if you decide to remove and reload any platform annotation, as all studies that reference that platform will need their high-dimensional data reloaded.
You can also load platform annotation directly, for example as part of a local data load.
make -C samples/postgres load_annotation_PLATFORMNAME
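If a local load needs several platforms, a loop over the direct target can install them in turn. This is a sketch only: the helper name `load_platforms` is hypothetical, and the platform names you pass must match the `load_annotation_` targets your transmart-data installation actually offers (tab completion will list them).

```shell
# Hypothetical helper: install annotation for each named platform in turn.
# It stops at the first failure so the error is not buried under the
# output of later loads. DB defaults to postgres.
load_platforms() {
  local db="${DB:-postgres}"
  local platform
  for platform in "$@"; do
    make -C "samples/${db}" "load_annotation_${platform}" || return 1
  done
}
```

Usage: `load_platforms PLATFORMNAME1 PLATFORMNAME2`, where both names are placeholders for real platform targets.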
Note: Some studies have more than one high-dimensional dataset and may use more than one platform. In these cases there are additional targets to load, usually named with a suffix (a, b, etc.) on STUDYNAME.
Loading high-dimensional data
Once the platform annotation is loaded, you can load expression data with a single target:
make -C samples/postgres load_expression_STUDYNAME
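Because the platform annotation must be in place before any high-dimensional data, the two steps can be chained so the expression load only runs if the annotation step succeeded. A minimal sketch, assuming a study with a single expression dataset; the helper name `load_expression_study` is hypothetical and it uses only the two targets described in this article.

```shell
# Hypothetical helper: install the study's platform annotation first, then
# load its expression data. The expression target is skipped if the
# annotation step fails. DB defaults to postgres.
load_expression_study() {
  local study="$1"
  local db="${DB:-postgres}"
  make -C "samples/${db}" "load_ref_annotation_${study}" || return 1
  make -C "samples/${db}" "load_expression_${study}"
}
```

Usage: `load_expression_study STUDYNAME`.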
Other high-dimensional datatypes have similar targets:
||Datatype||Description||
|acgh|Array CGH data|
|expression|mRNA expression data|
|mirnaqpcr|MicroRNA qPCR data|
|mirnaseq|MicroRNA RNAseq data|
|msproteomics|Mass-spec proteomics data|
|rbm|Rules-Based Medicine proteomics data|
|rnaseq|Count data from RNAseq for expression|
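Assuming the other datatypes follow the same `load_<datatype>_STUDYNAME` naming as the expression target (check with tab completion, since not every study provides every datatype), a small hypothetical helper can check the datatype against the table before calling make.

```shell
# Hypothetical helper: load one high-dimensional datatype for a study.
# ASSUMPTION: targets are named load_<datatype>_<STUDYNAME>, as for
# expression. Unknown datatypes are rejected with status 2.
load_highdim() {
  local datatype="$1" study="$2"
  local db="${DB:-postgres}"
  case "$datatype" in
    acgh|expression|mirnaqpcr|mirnaseq|msproteomics|rbm|rnaseq) ;;
    *)
      echo "unknown datatype: $datatype" >&2
      return 2
      ;;
  esac
  make -C "samples/${db}" "load_${datatype}_${study}"
}
```

Usage: `load_highdim rnaseq STUDYNAME`. Remember to load the platform annotation first.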
When loading fails
Data loading fails from time to time. If the data loading script has started to run, the database log tables are updated with the progress of your ETL job.
You can see the log output of the latest job with:
make -C samples/postgres showdblog
The output should indicate where the job failed and provide a clue to an error in the data or an issue in your local instance.
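To see the log as soon as something goes wrong, a load target can be wrapped so that `showdblog` runs automatically on failure. A minimal sketch; the helper name `load_or_show_log` is hypothetical and it uses only the `showdblog` target shown above.

```shell
# Hypothetical helper: run a load target and, if it fails, print the log of
# the latest ETL job straight away. Returns 1 on failure. DB defaults to
# postgres.
load_or_show_log() {
  local target="$1"
  local db="${DB:-postgres}"
  if ! make -C "samples/${db}" "$target"; then
    echo "make target $target failed; latest ETL job log:" >&2
    make -C "samples/${db}" showdblog
    return 1
  fi
}
```

Usage: `load_or_show_log load_clinical_STUDYNAME`. If the failure happened before the loading script started (for example, during a download), the log will describe an earlier job rather than this one.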