Data quality is critical to the success of every data warehouse project, so ETL architects and data architects spend a lot of time defining the error handling approach. Informatica PowerCenter comes with a set of options to take care of error handling in your ETL jobs. In this article, let's see how to leverage these PowerCenter options to handle exceptions.
You have to deal with different types of errors in an ETL job. When you run a session, the PowerCenter Integration Service can encounter fatal or non-fatal errors. Typical error handling covers:
User Defined Exceptions : Data issues critical to data quality, which might get loaded into the database unless explicitly checked. For example, a credit card transaction with a future transaction date will get loaded into the database unless the transaction date of every record is checked.
Non-Fatal Exceptions : Errors that Informatica PowerCenter ignores, causing records to drop out of the target table unless handled in the ETL logic. For example, a data conversion transformation can error out and prevent the record from loading into the target table.
Fatal Exceptions : Errors such as database connection failures, which force Informatica PowerCenter to stop running the workflow.
I. User Defined Exceptions
Business users define the user defined exceptions that are critical to data quality. We can set up user defined error handling using;
Error Handling Functions.
User Defined Error Tables.
1. Error Handling Functions
We can use two functions provided by Informatica PowerCenter to define our user defined error capture logic.
ERROR() : This function causes the PowerCenter Integration Service to skip a row and issue an error message, which you define. The error message is displayed in the session log or written to the error log tables, depending on the error logging type configured in the session.
You can use ERROR in Expression transformations to validate data. Generally, you use ERROR within an IIF or DECODE function to set rules for skipping rows.
Eg : IIF(TRANS_DATA > SYSDATE,ERROR('Invalid Transaction Date'))
The above expression raises an error and drops any record whose transaction date is greater than the current date from the ETL process and the target table.
ABORT() : Stops the session and issues a specified error message to the session log file or the error log tables, depending on the error logging type configured in the session. When the PowerCenter Integration Service encounters an ABORT function, it stops transforming data at that row. It processes any rows read before the session aborts.
You can use ABORT in Expression transformations to validate data.
Eg : IIF(ISNULL(LTRIM(RTRIM(CREDIT_CARD_NB))),ABORT('Empty Credit Card Number'))
The above expression aborts the session if any transaction record arrives without a credit card number.
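Outside of PowerCenter, the row-skip versus session-stop semantics of ERROR() and ABORT() can be sketched in plain Python. This is only an illustration of the behavior, not Informatica's implementation; the field names follow the examples above, and `AbortSession` is a hypothetical exception type:

```python
from datetime import date

class AbortSession(Exception):
    """Simulates ABORT(): stops processing at the current row."""

def load_rows(rows, today):
    """Validate each row: skip future-dated rows (like ERROR) and
    stop the whole load on a missing card number (like ABORT)."""
    loaded, errors = [], []
    for row in rows:
        # ABORT()-style check: an empty credit card number stops the load,
        # but rows already processed are kept, as in PowerCenter.
        if not (row.get("CREDIT_CARD_NB") or "").strip():
            raise AbortSession("Empty Credit Card Number")
        # ERROR()-style check: a future transaction date skips just this row
        # and records the error message, mimicking the row error log.
        if row["TRANS_DATA"] > today:
            errors.append((row, "Invalid Transaction Date"))
            continue
        loaded.append(row)
    return loaded, errors
```

The key difference shows up in control flow: ERROR() is a `continue` (drop one row, keep going), while ABORT() is a raised exception (stop the session where it stands).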
Error Handling Function Use Case
Shown below is the configuration required in the Expression transformation using the ABORT() and ERROR() functions. This transformation uses the expressions shown in the examples above.
Note :- You need to use these two functions in a mapping along with a session configured for row error logging to capture the error data from the source system. Depending on the session configuration, error data is collected into the Informatica predefined PMERR error tables or files.
Please refer to the article "User Defined Error Handling in Informatica PowerCenter" for more detailed implementation information on user defined error handling.
2. User Defined Error Tables
Error handling functions are easy to implement with little coding effort, but they come with disadvantages such as poor readability of the error records in the PMERR tables and a performance impact. To avoid these disadvantages, you can create your own error log tables and capture the error records in them.
The typical approach is to create an error table similar in structure to the source table, with additional columns to tag records as "error fixed" or "processed". Below is a sample error table. It includes all the columns from the source table plus additional columns to identify the status of each error record.
Below is the high level design.
A typical ETL design reads error data from the error table along with the source data. During data transformation, data quality is checked, and any record violating a quality check is moved to the error table. Record flags identify which records have been reprocessed and which have been fixed and are ready for reprocessing.
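The reprocessing flow described above can be sketched in Python. The flag columns (`ERROR_FIXED`, `PROCESSED`) and the `ERROR_DESC` column are illustrative names for this sketch, not a PowerCenter standard:

```python
def run_load(source_rows, error_table, validate):
    """validate(row) returns None on success or an error message string.
    Reads new source rows plus error rows flagged as fixed, re-validates
    them, and routes any failures into the error table."""
    # Pick up error rows a user has flagged as fixed but not yet reprocessed.
    to_process = source_rows + [
        r for r in error_table
        if r.get("ERROR_FIXED") == "Y" and r.get("PROCESSED") != "Y"
    ]
    target = []
    for row in to_process:
        msg = validate(row)
        if msg is None:
            target.append(row)
            row["PROCESSED"] = "Y"          # tag reprocessed error rows as done
        else:
            # Failed quality check: capture the row in the error table
            # with its error description, awaiting a manual fix.
            error_table.append({**row, "ERROR_DESC": msg,
                                "ERROR_FIXED": "N", "PROCESSED": "N"})
    return target
```

In a real implementation the source, target, and error table would be database tables read and written by the mapping; plain lists stand in for them here.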
II. Non-Fatal Exceptions
Non-fatal exceptions cause records to be dropped out of the ETL process, which is critical to data quality. You can handle non-fatal exceptions using;
Default Port Value Setting.
Row Error Logging.
Error Handling Settings.
1. Default Port Value Setting
Using the default value property is a good way to handle exceptions due to NULL values and unexpected transformation errors. The Designer assigns default values to handle null values and output transformation errors, and PowerCenter Designer lets you override the default value on input, output and input/output ports.
The default value property behaves differently for different port types;
Input ports : Use default values if you do not want the Integration Service to treat null values as NULL.
Output ports : Use default values if you do not want to skip the row due to a transformation error, or if you want to write a specific message with the skipped row to the session log.
Input/output ports : Use default values if you do not want the Integration Service to treat null values as NULL. However, you cannot set user-defined default values for output transformation errors on an input/output port.
Default Value Use Case
Use Case 1
Shown below is the setting required to handle NULL values. This setting converts any NULL value returned by the dimension lookup to the default value -1. This technique can be used to handle late arriving dimensions.
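The same late-arriving-dimension default can be sketched outside PowerCenter. This hypothetical Python helper substitutes -1 when the lookup finds no match, assuming a placeholder dimension row with surrogate key -1 exists:

```python
def lookup_dim_key(nat_key, dim_index, default=-1):
    """Mimics a lookup port with a default value: an unmatched natural
    key returns the default surrogate key (-1) so the fact row still
    loads instead of being dropped or left NULL."""
    key = dim_index.get(nat_key)
    return default if key is None else key
```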
Use Case 2
The setting below uses the default expression to convert the date if the incoming value is not in a valid date format.
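A comparable fallback, sketched in Python: if the incoming value does not parse as a date, substitute a default instead of failing the row. The format string and fallback date here are assumptions for the sketch:

```python
from datetime import datetime, date

def to_date(value, fallback=date(1900, 1, 1)):
    """Mimics a default expression on an output port: an invalid or
    missing date string is replaced with a fallback value rather than
    causing a transformation error that drops the row."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (ValueError, TypeError):
        return fallback
```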
2. Row Error Logging
Row error logging helps capture any exception that was not considered during design and coded in the mapping. It is the perfect way of capturing unexpected errors.
The session error handling setting shown below captures any unhandled error into the PMERR tables.
Please refer to the article "Error Handling Made Easy Using Informatica Row Error Logging" for more details.
3. Error Handling Settings
Error handling properties at the session level provide options such as Stop On Errors, Stored Procedure Error, Pre-Session Command Task Error and Pre-Post SQL Error. You can use these properties to ignore such errors or to set the session to fail when one occurs.
Stop On Errors : Indicates how many non-fatal errors the Integration Service can encounter before it stops the session.
On Stored Procedure Error : If you select Stop Session, the Integration Service stops the session on errors executing a pre-session or post-session stored procedure.
On Pre-Session Command Task Error : If you select Stop Session, the Integration Service stops the session on errors executing pre-session shell commands.
Pre-Post SQL Error : If you select Stop Session, the Integration Service stops the session on errors executing pre-session or post-session SQL.
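As a rough illustration of the Stop On Errors behavior, the following Python sketch counts non-fatal row errors and fails the run once a threshold is reached. The names and the choice of `ValueError` as the "non-fatal" error type are assumptions for the sketch, not PowerCenter internals:

```python
class SessionFailed(Exception):
    """Raised when the non-fatal error threshold is exceeded."""

def run_session(rows, transform, stop_on_errors=3):
    """Mimics the 'Stop On Errors' session property: tolerate non-fatal
    row errors until the threshold is hit, then fail the session.
    As in PowerCenter, a setting of 0 means never stop on row errors."""
    out, error_count = [], 0
    for row in rows:
        try:
            out.append(transform(row))
        except ValueError:           # treat ValueError as a non-fatal row error
            error_count += 1
            if stop_on_errors and error_count >= stop_on_errors:
                raise SessionFailed(
                    f"Stopped after {error_count} non-fatal errors")
    return out
```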
III. Fatal Exceptions
A fatal error occurs when the Integration Service cannot access the source, target, or repository. When the session encounters a fatal error, the PowerCenter Integration Service terminates the session. To handle fatal errors, you can either use a restartable ETL design for your workflow or use the workflow recovery features of Informatica PowerCenter.
1. Restartable ETL Design
Restartability is the ability to restart an ETL job if a processing step fails to execute properly. It avoids the need for any manual cleanup before a failed job can restart. You want the ability to restart processing at the step where it failed, as well as the ability to restart the entire ETL session.
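One simple way to get step-level restartability, sketched in Python under the assumption of a sequential job with named steps; the checkpoint file name and structure are illustrative:

```python
import json
import os

def run_steps(steps, checkpoint_file="etl_checkpoint.json"):
    """Hypothetical restartable job: record each completed step in a
    checkpoint file so a rerun resumes after the last successful step
    instead of repeating finished work."""
    done = []
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            done = json.load(f)              # resume from a prior failed run
    for name, step in steps:
        if name in done:
            continue                         # finished in a previous run
        step()                               # on failure, the checkpoint survives
        done.append(name)
        with open(checkpoint_file, "w") as f:
            json.dump(done, f)               # persist progress after each step
    os.remove(checkpoint_file)               # successful run: clear the checkpoint
    return done
```

On a clean run the checkpoint is removed at the end; on a failure it is left behind, so the next invocation skips every step that already completed.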
Please refer to the article "Restartability Design Pattern for Different Type ETL Loads" for more details on restartable ETL design.
2. Workflow Recovery
Workflow recovery allows you to continue processing the workflow and workflow tasks from the point of interruption. During workflow recovery, the Integration Service accesses the workflow state, which is stored in memory or on disk based on the recovery configuration. The workflow state of operation includes the status of tasks in the workflow and workflow variable values.
Please refer to the article "Informatica Workflow Recovery with High Availability for Auto Restartable Jobs" for more details on workflow recovery.
Hope you find this article useful. Please feel free to share your comments and any questions you may have.