During an ETL process, a lot can go wrong. The cause can sometimes be difficult to determine. This section provides some ways to help you gure out where the problem might be.
Your first step should always be to increase the log level of Kitchen. This was described in Section 4. We recommend the debug log level. The rowlevel log level will output lines for each individual row inserted to the database and therefore, this level of verbosity can hinder your search for the cause of the error.
If you can not find a cause in the output of Kitchen, it is time to look at the audit logs. During ETL, most actions are logged to tables in the database, these logs can provide clues as to what is going wrong. First, connect to the database as the tm_cz user as described in Section 9. We need to nd out the job ID of the job that is causing the error. This is probably the most recent job that was carried out. Execute the following SQL statement:
SELECT * FROM cz_job_master ORDER BY job_id DESC LIMIT 10;
The most recent job is now at the top. The job_status column should show the value FAIL. You can also check the start date to make sure this is the job you want to investigate. Now we can go to the audit log to view the steps taken in this job. Suppose the job ID was XXXX, then execute the following query:
SELECT * FROM cz_job_audit WHERE job_id=XXXX;
You will now get a list of steps taken, this shows you which steps are completed and which steps fail. To get to the specific error message, execute the following query:
SELECT * FROM cz_job_error WHERE job_id=XXXX;
You will then be able to see the error message. One possible error message is invalid input syntax for type numeric: "". This indicates that an empty string is being inserted in the place where a numeric value should go. You probably have an empty cell somewhere in your data that got overlooked during the curation process.
As an alternative to using the command-line based psql program, you can use any graphical Postgres administration tool. Here we will describe how to set up pgAdmin, available from http://www.pgadmin.org/download/
First, we need to set up our database connection. Under the File menu, choose Add server.
Give the connection a meaningful name, and ll in the rest of the details. Remember to connect as user tm cz, with password tm cz. If you are using a tunnel, choose use your tunnel entrance point as the host and port. The add server dialog should look like shown in Figure 13. Now click OK.
Figure 13: Add a server in pgAdmin.
On the left side of the main pgAdmin window is the Object browser. You should see your newly created server connection here. Double-clicking the server name makes pgAdmin connect to the server. A tree structure is shown representing the contents of the database server. Open the Databases node, then open the transmart node and the schemas node. This shows all schemas currently in the transmart database (see Figure 14).
Open the Tables node under the tm cz schema node and nd the cz job master, cz job audit and cz job error tables. You can right-click on any of these tables and choose View data/View Last 100 rows
to see the most recent information, as shown in Figure 15. Alternatively you can open the Query tool from the Tools menu, and enter the SQL queries described above.
Figure 14: The Object browser in pgAdmin.
Figure 15: Retrieve the most recent information from a table in pgAdmin.