Pushdown Optimization

Pushdown Optimization, a newer feature of Informatica PowerCenter, allows developers to balance the data transformation load among servers.

Pushdown optimization is a way of load-balancing among servers in order to achieve optimal performance. Veteran ETL developers often face the question of where a given piece of ETL logic should be performed. Suppose some ETL logic needs to filter out data based on a condition: one can either do it in the database, using a WHERE clause in the SQL query, or inside Informatica, using a Filter transformation. Sometimes we can even "push" transformation logic to the target database instead of performing it on the source side (especially in the case of ELT rather than ETL). Such optimization is crucial for overall ETL performance.

How does Pushdown Optimization work?

One can push transformation logic to the source or target database using pushdown optimization. The Integration Service translates the transformation logic into SQL queries and sends them to the source or target database, which executes the SQL queries to process the transformations. The amount of transformation logic one can push to the database depends on the database, the transformation logic, and the mapping and session configuration. The Integration Service analyzes the transformation logic it can push to the database, executes the generated SQL statement against the source or target tables, and itself processes any transformation logic that it cannot push to the database.

Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.

Let us take an example (Image: Pushdown Optimization Example 1).
The filter condition used in this mapping is: DEPTNO > 40

Suppose a mapping contains a Filter transformation that filters out all employees except those with a DEPTNO greater than 40. The Integration Service can push the transformation logic to the database. It generates the following SQL statement to process the transformation logic:

INSERT INTO EMP_TGT (EMPNO, ENAME, SAL, COMM, DEPTNO)
SELECT EMP_SRC.EMPNO,
       EMP_SRC.ENAME,
       EMP_SRC.SAL,
       EMP_SRC.COMM,
       EMP_SRC.DEPTNO
FROM EMP_SRC
WHERE (EMP_SRC.DEPTNO > 40)

The Integration Service generates an INSERT SELECT statement and it filters the data using a WHERE clause. The Integration Service does not extract data from the database at this time.

We can configure pushdown optimization in the following ways:

Using source-side pushdown optimization:

The Integration Service pushes as much transformation logic as possible to the source database. It analyzes the mapping from the source to the target, or until it reaches a downstream transformation it cannot push to the source database, and executes the corresponding SELECT statement.
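For instance, with the filter mapping above, source-side pushdown would generate only the SELECT below (a sketch based on the earlier example); the Integration Service then reads the result set and loads EMP_TGT itself:

SELECT EMP_SRC.EMPNO,
       EMP_SRC.ENAME,
       EMP_SRC.SAL,
       EMP_SRC.COMM,
       EMP_SRC.DEPTNO
FROM EMP_SRC
WHERE (EMP_SRC.DEPTNO > 40)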

Using target-side pushdown optimization:

The Integration Service pushes as much transformation logic as possible to the target database. The Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the database and executes the DML.

Using full pushdown optimization:

The Integration Service pushes as much transformation logic as possible to both the source and target databases; note that for this to work, the source and target must be on the same database. If you configure a session for full pushdown optimization and the Integration Service cannot push all the transformation logic to the database, it performs source-side or target-side pushdown optimization instead. The Integration Service analyzes the mapping starting with the source and analyzes each transformation in the pipeline until it analyzes the target. When it can push all transformation logic to the database, it generates an INSERT SELECT statement to run on the database, incorporating transformation logic from all the transformations in the mapping. If the Integration Service can push only part of the transformation logic to the database, it does not fail the session; it pushes as much transformation logic to the source and target databases as possible and then processes the remaining transformation logic itself.
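For illustration, suppose the filter mapping above also had an Expression transformation computing TOTAL = SAL + COMM, and EMP_TGT had a TOTAL column (an assumption for this sketch). Full pushdown could then fold the filter and the expression into a single statement:

INSERT INTO EMP_TGT (EMPNO, ENAME, SAL, COMM, TOTAL)
SELECT EMP_SRC.EMPNO,
       EMP_SRC.ENAME,
       EMP_SRC.SAL,
       EMP_SRC.COMM,
       EMP_SRC.SAL + EMP_SRC.COMM
FROM EMP_SRC
WHERE (EMP_SRC.DEPTNO > 40)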

For example, a mapping contains the following transformations:
SourceDefn -> SourceQualifier -> Aggregator -> Rank -> Expression -> TargetDefn

Aggregator: SUM(SAL), SUM(COMM) group by DEPTNO
Rank: RANK port on SAL
Expression: TOTAL = SAL + COMM

Image: Pushdown Optimization Example 2

The Rank transformation cannot be pushed to the database. If the session is configured for full pushdown optimization, the Integration Service pushes the Source Qualifier transformation and the Aggregator transformation to the source, processes the Rank transformation, and pushes the Expression transformation and target to the target database.
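A sketch of the SELECT that might be generated for the pushed Source Qualifier and Aggregator in this example (table and column names assumed, since the actual source appears only in the image):

SELECT DEPTNO,
       SUM(SAL) AS SUM_SAL,
       SUM(COMM) AS SUM_COMM
FROM EMP_SRC
GROUP BY DEPTNO

The Integration Service reads this aggregated result set, computes the Rank transformation itself, and lets the target database handle the Expression logic and the final load.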

When we use pushdown optimization, the Integration Service converts the expressions in the transformations and workflow links by finding equivalent operators, variables, and functions in the database. If there is no equivalent operator, variable, or function, the Integration Service processes the transformation logic itself. The Integration Service logs a message in the workflow log and in the Pushdown Optimization Viewer when it cannot push an expression to the database; use the message to determine why the expression could not be pushed.

How does the Integration Service handle Pushdown Optimization?

To push transformation logic to a database, the Integration Service might create temporary objects in the database. It creates a temporary sequence object in the database to push Sequence Generator transformation logic, and it creates temporary views in the database when it pushes a Source Qualifier transformation or a Lookup transformation with a SQL override, an unconnected relational lookup, or a filtered lookup.

1. To push Sequence Generator transformation logic to a database, we must configure the session for pushdown optimization with Sequence.
2. To enable the Integration Service to create the view objects in the database, we must configure the session for pushdown optimization with View.
3. After the database transaction completes, the Integration Service drops the sequence and view objects created for pushdown optimization.
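For example, for a filtered lookup the housekeeping SQL issued against the database might look roughly like this (the view name and query are hypothetical; the Integration Service generates its own object names):

CREATE VIEW PM_V_LKP_DEPT AS
SELECT DEPTNO, DNAME
FROM DEPT
WHERE ACTIVE_FLAG = 'Y'

-- ... pushed-down queries then reference PM_V_LKP_DEPT ...

DROP VIEW PM_V_LKP_DEPT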

Configuring Parameters for Pushdown Optimization

Depending on the database workload, we might want to use source-side, target-side, or full pushdown optimization at different times, and for that we can use the $$PushdownConfig mapping parameter. The settings in the $$PushdownConfig parameter override the pushdown optimization settings in the session properties. Create the $$PushdownConfig parameter in the Mapping Designer, select $$PushdownConfig for the Pushdown Optimization attribute in the session properties, and define the parameter in the parameter file.

The possible values are:
1. None (the Integration Service itself processes all the transformations)
2. Source [Seq View]
3. Target [Seq View]
4. Full [Seq View]
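A sketch of the corresponding parameter file entry (folder, workflow, and session names are hypothetical):

[MyFolder.WF:wf_load_emp.ST:s_m_load_emp]
$$PushdownConfig=Source

To switch the session to full pushdown at a later time, only the parameter file needs to change, e.g. $$PushdownConfig=Full.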

Pushdown Optimization Viewer

Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to the database. Select a pushdown option or pushdown group in the Pushdown Optimization Viewer to view the corresponding SQL statement that is generated for the specified selections. When we select a pushdown option or pushdown group, we do not change the pushdown configuration. To change the configuration, we must update the pushdown option in the session properties.

Databases that support Informatica Pushdown Optimization

We can configure sessions for pushdown optimization against any of the following databases: Oracle, IBM DB2, Teradata, Microsoft SQL Server, Sybase ASE, and databases that use ODBC drivers.

When we use native drivers, the Integration Service generates SQL statements using native database SQL. When we use ODBC drivers, it generates SQL statements using ANSI SQL. The Integration Service can translate more functions when it generates SQL statements in the native database language rather than in ANSI SQL.

Handling Errors when Pushdown Optimization is Enabled

When the Integration Service pushes transformation logic to the database, it cannot track errors that occur in the database.

When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.

If we configure a session for full pushdown optimization and the session fails, the Integration Service cannot perform incremental recovery because the database processes the transformations. Instead, the database rolls back the transactions. If the database server fails, it rolls back transactions when it restarts. If the Integration Service fails, the database server rolls back the transaction.

Informatica Tuning

A step-by-step approach:

1. Source Query / General Query Tuning

1.1 Calculate the original query cost.
1.2 Can the query be rewritten to reduce cost? (See the sketch after this list.)
- Can an IN clause be replaced with EXISTS?
- Can a UNION be replaced with UNION ALL if we are not using any DISTINCT clause in the query?
- Is there a redundant table join that can be avoided?
- Can we include an additional WHERE clause to further limit the data volume?
- Is there a redundant column used in GROUP BY that can be removed?
- Is there a redundant column selected in the query but not used anywhere in the mapping?
1.3 Check that all the major joining columns are indexed.
1.4 Check that all the major filter conditions (WHERE clause) are on indexed columns.
- Can a function-based index improve performance further?
1.5 Check whether any exclusive query hint reduces the query cost.
- Check whether a parallel hint improves performance and reduces cost.
1.6 Recalculate the query cost.
- If the query cost is reduced, use the changed query.
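A sketch of the IN-to-EXISTS rewrite (1.2) and a function-based index (1.4), using hypothetical EMP/DEPT tables; Oracle syntax is assumed for the index, and the cost should be compared before and after as in 1.1 and 1.6, since many optimizers already treat IN and EXISTS the same:

-- Original: IN with a non-correlated subquery
SELECT E.EMPNO, E.ENAME
FROM EMP E
WHERE E.DEPTNO IN (SELECT D.DEPTNO FROM DEPT D WHERE D.LOC = 'DALLAS');

-- Rewritten: correlated EXISTS
SELECT E.EMPNO, E.ENAME
FROM EMP E
WHERE EXISTS (SELECT 1
              FROM DEPT D
              WHERE D.DEPTNO = E.DEPTNO
                AND D.LOC = 'DALLAS');

-- Function-based index so a filter such as UPPER(ENAME) = 'SMITH' can use an index
CREATE INDEX IDX_EMP_UPPER_ENAME ON EMP (UPPER(ENAME));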

2. Tuning Informatica Lookup

2.1 Redundant Lookup transformations
- Is there a lookup that is no longer used in the mapping?
- If there are consecutive lookups, can they be combined into a single lookup override? (See the sketch after this list.)
2.2 Lookup conditions
- Are all the lookup conditions indexed in the database? (Uncached lookups only)
- An unequal condition should always be placed after an equal condition.
2.3 Lookup override query
- Should follow all the guidelines from section 1 (Source Query) above.
2.4 Make sure no unnecessary column is selected in the lookup (to reduce cache size).
2.5 Cached/Uncached
- Carefully consider whether the lookup should be cached or uncached.
- General guidelines:
- Generally don't use a cached lookup if the lookup table size is > 300 MB.
- Generally don't use a cached lookup if the lookup table row count is > 2,000,000.
- Generally don't use a cached lookup if the driving table (source table) row count is < 1,000.
2.6 Persistent cache
- If the same lookup is cached and used in different mappings, consider a persistent cache.
2.7 Lookup cache building
- Consider "Additional Concurrent Pipelines for Lookup Cache Creation" in the session properties to build caches concurrently.
- Enable "Prebuild Lookup Cache" only if the lookup is sure to be called in the mapping.

3. Tuning Informatica Joiner

3.1 Unless unavoidable, join database tables in the database itself (homogeneous join) and don't use a Joiner; see the sketch below.
3.2 If an Informatica Joiner is used, always use sorted input, and try to sort the data in the SQ query itself using ORDER BY (if a Sorter transformation is used, make sure the Sorter has enough cache to perform a 1-pass sort).
3.3 The smaller of the two joining tables should be the master.
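A sketch of 3.1 and 3.2 together, assuming hypothetical EMP and DEPT tables: the homogeneous join is done in the database, and ORDER BY pre-sorts the output for any downstream sorted-input transformation:

SELECT E.EMPNO, E.ENAME, E.SAL, D.DNAME, E.DEPTNO
FROM EMP E
JOIN DEPT D
  ON D.DEPTNO = E.DEPTNO
ORDER BY E.DEPTNO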

4. Tuning Informatica Aggregator

4.1 When possible, sort the input for the Aggregator at the database end (ORDER BY clause); see the sketch below.
4.2 If the input is not already sorted, use a Sorter; if possible, use the SQ query to sort the records.
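A sketch of 4.1, reusing the EMP_SRC source and the DEPTNO-grouped Aggregator from the pushdown example: the ORDER BY must match the Aggregator's group-by ports so that its Sorted Input option can be enabled:

SELECT DEPTNO, SAL, COMM
FROM EMP_SRC
ORDER BY DEPTNO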

5. Tuning Informatica Filter

5.1 Unless unavoidable, filter in the source query in the Source Qualifier.
5.2 Place the Filter as close to the source as possible.

6. Tuning Informatica Sequence Generator

6.1 Cache the Sequence Generator.

7. Setting Correct Informatica Session-Level Properties

7.1 Disable "High Precision" if it is not required (high precision supports the Decimal datatype up to 28 digits of precision).
7.2 Use "Terse" mode for the tracing level.
7.3 Enable pipeline partitioning (thumb rule: maximum number of partitions = number of CPUs / 1.2).
(Also remember that increasing the number of partitions will multiply the cache memory requirement accordingly.)

8. Tuning Informatica Expression

8.1 Use variable ports to avoid redundant calculations; see the sketch below.
8.2 Remove the default value ERROR('transformation error') for output ports.
8.3 Try to reduce code complexity, e.g. nested IIFs.
8.4 Try to reduce unnecessary type conversions in calculations.
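A sketch of 8.1 and 8.3 in Informatica expression syntax (port names are hypothetical): a variable port computes SAL + COMM once per row for reuse, and DECODE flattens a nested IIF:

-- Variable port, evaluated once per row
v_TOTAL = SAL + COMM
-- Output ports reuse the variable instead of recomputing SAL + COMM
o_TAX = v_TOTAL * 0.10
o_NET = v_TOTAL - o_TAX

-- Nested IIF
IIF(DEPTNO = 10, 'ACCOUNTING', IIF(DEPTNO = 20, 'RESEARCH', 'OTHER'))
-- Flatter equivalent using DECODE
DECODE(DEPTNO, 10, 'ACCOUNTING', 20, 'RESEARCH', 'OTHER')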

Sequence Generator Cache Size

Here is what you need to know about the Informatica Sequence Generator cache:
Every time the Sequence Generator transformation is called, it returns a new value. If your sequence generator starts with 1 and increments by 1, it gives you 2 in the next call, and 3, 4, 5, and so on in the subsequent calls.
Why is a Cache Required?
It is a good idea to cache a few values in advance in the sequence generator. This gives a performance benefit, as the sequence generator does not have to go back to fetch the next value every time it is called. For example, if you are going to load value 1 to value 50,000 into the target using a sequence generator and your target commit frequency is 10,000, then you might set the cache value to 10,000, so that the sequence generator prepares 10,000 values well in advance.
Maximum Value that can be cached
But remember, if you set the sequence generator to cycle, you cannot cache more values than what will fit in a given cycle of sequence numbers.
Therefore, the maximum value allowed for CACHE must be less than the value determined by the following formula:
Max value for CACHE = CEIL(MAXVALUE - MINVALUE) / ABS(INCREMENT)
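For example, with MINVALUE = 1, MAXVALUE = 100,000, and INCREMENT = 1, the cache value must stay below CEIL(100,000 - 1) / 1 = 99,999.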
One disadvantage of caching is that if any failure occurs before the commit happens, the potential number of lost values is equal to the value of the cache parameter.