SNP data steps
Select raw data file:
This step allows selecting the raw data file containing the genotyping data for this study.
This file has to contain a header line beginning with "ID_REF", followed by the sample identifiers. Each subsequent line has to begin with the SNP identifier, followed by the genotype for each sample.
Each genotype has to be composed of two letters with no separator, for instance "AB". Unknown values can be written as "00" or "NC".
The header line can be preceded by any number of comment lines beginning with the character "!".
The format is checked when the file is added.
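The rules above can be sketched as a small standalone check, which may help when preparing a file before adding it. This is only an illustration, not the check the tool itself performs: the function name `check_raw_snp_file` is hypothetical, and the tab separator is an assumption, since the column separator is not stated here.

```python
import re

# Values accepted for an unknown genotype, as described above.
UNKNOWN = {"00", "NC"}
# A genotype is two letters with no separator, e.g. "AB".
GENOTYPE = re.compile(r"^[A-Za-z]{2}$")


def check_raw_snp_file(lines, sep="\t"):
    """Return a list of error messages; an empty list means no problem was found.

    `sep` is an assumption: the separator is not specified in this guide.
    """
    errors = []
    header = None
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if header is None:
            if line.startswith("!"):
                continue  # comment lines are allowed before the header
            header = line.split(sep)
            if header[0] != "ID_REF":
                errors.append(f"line {number}: header must begin with ID_REF")
            continue
        if not line:
            continue  # ignore blank lines (assumption)
        fields = line.split(sep)
        if len(fields) != len(header):
            errors.append(f"line {number}: expected {len(header)} columns")
            continue
        for sample, genotype in zip(header[1:], fields[1:]):
            if genotype not in UNKNOWN and not GENOTYPE.match(genotype):
                errors.append(
                    f"line {number}: invalid genotype {genotype!r} for {sample}"
                )
    if header is None:
        errors.append("no header line found")
    return errors
```

For a file on disk, the function can be called as `check_raw_snp_file(open(path))`, where `path` is the location of the raw data file.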
Select annotation file:
This step allows selecting a platform annotation file for this study. Such a file can be found, for instance, on the GEO website.
This file has to contain a header line. The header line can be preceded by any number of comment lines beginning with the character "#".
The file has to contain four or five columns, in this order: the rs identifier, the SNP identifier, the chromosome number, the position of the SNP on this chromosome, and optionally the gene corresponding to this SNP.
The format is checked when the file is added.
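As an illustration of the layout described above, the following sketch checks only the comment lines, the presence of a header, and the column count. It is not the tool's own check: the function name `check_annotation_file` is hypothetical, and the tab separator is an assumption.

```python
def check_annotation_file(lines, sep="\t"):
    """Return a list of error messages; an empty list means no problem was found.

    `sep` is an assumption: the separator is not specified in this guide.
    """
    errors = []
    header_seen = False
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not header_seen:
            if line.startswith("#"):
                continue  # comment lines are allowed before the header
            header_seen = True
            if len(line.split(sep)) not in (4, 5):
                errors.append(f"line {number}: header must have 4 or 5 columns")
            continue
        if not line:
            continue  # ignore blank lines (assumption)
        # Columns: rs identifier, SNP identifier, chromosome, position, [gene]
        if len(line.split(sep)) not in (4, 5):
            errors.append(f"line {number}: expected 4 or 5 columns")
    if not header_seen:
        errors.append("no header line found")
    return errors
```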
Set subject identifiers
This step allows setting the subject identifiers corresponding to the samples, and starts the creation of the subject to sample mapping file.
For each sample identifier in the first column, indicate the corresponding subject identifier.
Set platform
This step allows defining the platform for each sample.
The button 'Apply' sets all selected fields to the value entered in the field named 'Value'. All fields can be selected or deselected at once with the corresponding buttons. The button 'OK' updates the subject to sample mapping file.
Set tissue type
This step allows defining the tissue type for each sample.
The button 'Apply' sets all selected fields to the value entered in the field named 'Value'. All fields can be selected or deselected at once with the corresponding buttons. The button 'OK' updates the subject to sample mapping file.
Load platform annotation
This step allows selecting a platform annotation file for this study. Such a file can be found, for instance, on the GEO website.
This file has to contain a header line. The header line can be preceded by any number of comment lines beginning with the character "#".
The file has to contain four or five columns, in this order: the rs identifier, the SNP identifier, the chromosome number, the position of the SNP on this chromosome, and optionally the gene corresponding to this SNP.
The format is checked when the file is added.
Check platform annotation loading
This step allows checking the loading of the platform annotation.
The expected number of lines is read from the raw files, the number of inserted lines is read from the database, and both values are displayed. The step also indicates whether the two values match. A database connection is needed for this step.
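The comparison can be pictured with the sketch below. It rests on several assumptions: the function names are hypothetical, the expected count is taken to be the number of data lines after the comments and the header, and the inserted count is passed in as a plain number because the database query used by the tool is not described here.

```python
def count_data_lines(lines, comment_prefix="#"):
    """Count non-blank lines that follow the comment lines and the header line."""
    header_seen = False
    count = 0
    for line in lines:
        if not header_seen:
            if line.startswith(comment_prefix):
                continue  # skip comment lines before the header
            header_seen = True  # this line is the header, not data
            continue
        if line.strip():
            count += 1
    return count


def compare_counts(expected, inserted):
    """Summarise both counts and whether they are the same."""
    return {"expected": expected, "inserted": inserted, "match": expected == inserted}
```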
Load meta tables
This step allows loading the metadata tables for a study.
The data tree has to be defined. It represents the path to the node containing the SNP data in tranSMART. During loading, the SNP node is replaced by the platform name and the tissue type.
Nodes can be added as free text, by entering a value in the field and clicking on the button "Add node". The SNP node is then added with the button "Add SNP node". The steps of the loading can be selected individually. This is useful when part of the loading has succeeded and another part has failed, so that only the failed part is run again.
Finally, you can choose to use an ETL server to perform the loading. A database connection is needed for this step.
Check meta tables loading
This step allows checking that the metadata tables have been filled correctly. Each table that should have been filled is listed, with its expected number of rows and the number of rows actually inserted.
Convert files for loading
This step allows converting the files before the data loading.
The conversion requires the program plink, whose path has to be provided in the field "Plink executable path".
The steps of the loading can be selected individually. This is useful when part of the loading has succeeded and another part has failed, so that only the failed part is run again.
Finally, you can choose to use an ETL server to perform the conversion. A database connection is needed for this step.
Load data
This step allows converting the files before the data loading.
The conversion requires the program plink, whose path has to be provided in the field "Plink executable path".
The steps of the loading can be selected individually. This is useful when part of the loading has succeeded and another part has failed, so that only the failed part is run again.
Finally, you can choose to use an ETL server to perform the conversion. A database connection is needed for this step.
Check data loading
This step allows checking that the SNP data tables have been filled correctly. Each table that should have been filled is listed, with its expected number of rows and the number of rows actually inserted.
References
- ICE User Guide v1.4 https://drive.google.com/file/d/0B8lizkKDeaKhMWZBWnlnODVEQW8/view