General Data Considerations
First, ensure that the data files you wish to load are structured and organized according to the following sections of this guide. The top-level study folder, labeled with the study name, must reside in the "data" directory. On Clarivate installations, this directory is typically /u01/transmart/data, and will typically contain subdirectories to separate "Public studies" and "Private Studies". All new studies nested under this directory tree will be queued for loading by the ETL tool. The study folder must contain appropriately named sub-directories: one for each data type associated with the study.
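As a sketch, the nesting described above might look like the following. The study name 'GSE0000' and the 'clinical' sub-directory are illustrative assumptions, not names required by the tool; the layout is built under /tmp purely for demonstration:

```shell
# Build a sample layout under /tmp to illustrate the expected nesting.
# 'GSE0000' (study name) and 'clinical' (data-type folder) are hypothetical.
base=/tmp/etl_layout_demo
mkdir -p "$base/data/Public studies/GSE0000/clinical"
mkdir -p "$base/data/Private Studies"
# List the resulting directory tree.
find "$base/data" -type d | sort
```

On a real installation the same nesting would sit under /u01/transmart/data rather than /tmp.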
Invocation is achieved through a command-line interface. Log into the server via SSH as a user with the appropriate credentials (on Clarivate installations, this is typically the user 'etl'). You will need either a password or a security certificate (contact the server administrator for details). Once logged in, you can verify the data file locations. For example,
cd /u01/transmart/data/[program-folder]/
ls -al
The ETL tool can be invoked from anywhere, but you will need either the full or relative path to the tm_etl.jar file, which is usually found in the 'transmart' folder.
Call the tool with the --help (or '-h') flag to receive a concise summary of usage with flag descriptions. Briefly, if the tool were invoked from the 'data' folder, the following command would be issued:
java -jar /path_to_etl_tool/tm_etl.jar --secure-study
This will attempt to load all new studies as secure studies that can be locked down on a user or group basis after they are loaded. While the ETL is running, the terminal is blocked and you will not get your prompt back. You can copy the standard output to a log file as well as the screen (append '| tee [logfile]'; requires 'tee' be installed). To keep the command line available while the load runs, run the command in the background with 'nohup' as follows (requires 'nohup' be installed):
nohup java -jar /path_to_etl_tool/tm_etl.jar --secure-study | tee [logfile_name] &
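The '| tee' portion of the command can be sanity-checked with a stand-in command. Here 'echo' stands in for the java invocation and '/tmp/demo_etl.log' is an arbitrary file name, both assumptions for illustration:

```shell
# 'tee' writes its input both to the screen and to the named file.
echo "loading study..." | tee /tmp/demo_etl.log
# Confirm the same line was captured in the log file.
grep -c "loading study" /tmp/demo_etl.log
# → 1
```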
If the load fails for a particular study, examine the log file (focus on the top of the Java traceback) for clues to potential problems. In addition, two other files are generated that might aid in tracking difficult issues: 1) if duplicate values exist (more than one value mapped to a single data point, a common problem with data structure that is otherwise hard to pin down), a 'duplicate values' file is written to the offending study folder; 2) a 'SummaryStatistic.txt' file is written to the data subfolders in each study.
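Locating the top of the traceback in a long log can be done with grep. The log contents below are fabricated solely for illustration; substitute your actual log file name:

```shell
# Create a simulated ETL log (contents are illustrative only).
printf 'INFO  loading study\njava.lang.NullPointerException\n\tat Loader.run(Loader.java:42)\n' > /tmp/etl_demo.log
# Report the line number of the first exception, i.e. the top of the traceback.
grep -n -m 1 'Exception' /tmp/etl_demo.log
# → 2:java.lang.NullPointerException
```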