White paper
On
INFORMATICA PERFORMANCE TUNING
PERFORMANCE TUNING IN INFORMATICA
Performance tuning in Informatica-
The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server. You can increase session performance in the following ways:
1) Informatica Server performance depends on network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid them where possible.
2) Flat files: If your flat files are stored on a machine other than the Informatica Server, move them to the machine that hosts the Informatica Server.
3) Relational data sources: Minimize the connections between sources, targets, and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.
4) Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance.
5) You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.
6) Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII mode stores a character value in one byte, whereas Unicode mode takes 2 bytes to store a character.
7) If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Single-table SELECT statements with an ORDER BY or GROUP BY clause may also benefit from optimization such as adding indexes.
8) You can improve session performance by configuring the network packet size, which determines how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure, then Database Connections.
9) If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
10) Running sessions in parallel by using concurrent batches also reduces the data load time, so concurrent batches may also increase session performance.
11) Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
12) In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
13) Avoid transformation errors to improve session performance.
14) If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
15) If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier.
16) Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.
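The last item in the list, on grouping transformations and sorted input, can be illustrated outside Informatica. The following is a minimal Python sketch, not Informatica code, and the function names and sample rows are hypothetical. It shows why sorted input helps: when rows arrive sorted on the group key, each group can be emitted as soon as the key changes, instead of holding every group until the last row has been read.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def aggregate_unsorted(rows):
    """Sum amounts per key when the input order is unknown.

    Every group must be held in memory until the last row is read.
    """
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def aggregate_sorted(rows):
    """Sum amounts per key when rows are already sorted by key.

    A group is complete as soon as the key changes, so only one
    group is held at a time and results are produced as a stream.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


# Hypothetical sample rows: (customer_id, amount), sorted by customer_id.
sorted_rows = [(1, 10), (1, 5), (2, 7), (3, 2), (3, 8)]
streamed = list(aggregate_sorted(sorted_rows))
```

Both functions return the same totals. The difference is only in how much data has to be cached before the first result can leave the transformation.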
Improving Mapping Performance in Informatica-
Mapping optimization:
The best time in the development cycle to optimize mappings is after system testing. Focus on mapping-level optimization only after optimizing the target and source databases.
Use the session log to identify whether the source, the target, or the transformations are the performance bottleneck.
Identifying Target Bottlenecks:
The most common performance bottleneck occurs when the Informatica Server writes to a target database. You can identify target bottlenecks by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
Tasks to be performed to increase performance:
* Drop indexes and key constraints.
* Increase checkpoint intervals.
* Use bulk loading.
* Use external loading.
* Increase database network packet size.
* Optimize target databases.
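The first task in the list above, dropping indexes before the load and rebuilding them afterwards, might look like the following. This is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the target database, and the table name, index name, and helper function are hypothetical.

```python
import sqlite3


def load_with_index_rebuild(conn, rows):
    """Drop the index, load all rows in one transaction, then rebuild the index."""
    conn.execute("DROP INDEX IF EXISTS idx_sales_customer")
    # One transaction for the whole batch avoids per-row commits.
    with conn:
        conn.executemany(
            "INSERT INTO sales (customer_id, amount) VALUES (?, ?)", rows
        )
    # Rebuild the index once, after all rows are in place.
    conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")

load_with_index_rebuild(conn, [(i % 100, float(i)) for i in range(10_000)])

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
index_names = [row[1] for row in conn.execute("PRAGMA index_list('sales')")]
```

The index is maintained once at the end rather than once per inserted row, which is the effect the task aims for on a real target database.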
Identifying Source Bottlenecks:
If the session reads from a relational source, you can use a Filter transformation, a read test session, or a database query to identify source bottlenecks:
* Filter Transformation - measure the time taken to process a given amount of data, then add an always-false Filter transformation to the mapping after each source qualifier so that no data is processed past the filter. You have a source bottleneck if the new session runs in about the same time.
* Read Test Session - make a copy of the mapping, remove all transformations after the source qualifiers, and connect the source qualifiers to file targets. Compare the time this read test session takes to process a given set of data with the time taken by the original session. You have a source bottleneck if the new session runs in about the same time.
* Database Query - extract the query from the session log and run it in a query tool. Measure the time taken to return the first row and the time taken to return all rows. If there is a significant difference between the two, you can use an optimizer hint to eliminate the source bottleneck.
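The database query test above compares two timings. The following sketch shows one way to take them, assuming an in-memory SQLite database in place of the real source and a hypothetical `orders` table and query. With a real source you would run the query extracted from the session log instead.

```python
import sqlite3
import time


def time_query(conn, sql):
    """Return (seconds_to_first_row, seconds_to_all_rows, row_count)."""
    start = time.perf_counter()
    cur = conn.execute(sql)
    first = cur.fetchone()
    first_row_seconds = time.perf_counter() - start

    # Drain the remaining rows to measure the full read time.
    count = 0 if first is None else 1
    for _ in cur:
        count += 1
    all_rows_seconds = time.perf_counter() - start
    return first_row_seconds, all_rows_seconds, count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(50_000)]
)

first_s, all_s, n = time_query(
    conn, "SELECT id, amount FROM orders ORDER BY amount DESC"
)
```

What counts as a significant difference between `first_s` and `all_s` depends on the source system, so the sketch reports both values rather than applying a threshold.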
Tasks to be performed to increase performance:
* Optimize the query.
* Use conditional filters.
* Increase database network packet size.
* Connect to Oracle databases using IPC protocol.
Identifying Mapping Bottlenecks
If you determine that you have neither a target nor a source bottleneck, you may have a mapping bottleneck.
How to Increase Informatica Server Performance:
Many factors can affect session performance. Here are some points-
Before doing tuning that is specific to Informatica:
1. Check hard disks on related machines. (Slow disk access on source and target databases, source and target file systems, as well as the Informatica Server and repository machines can slow session performance.)
2. Improve network speed. (Slow network connections can slow session performance.)
3. Check CPUs on related machines. (Make sure the Informatica Server and related machines run on high-performance CPUs.)
4. Configure physical memory for the Informatica Server to minimize disk I/O. (Configure the physical memory for the Informatica Server machine to minimize paging to disk.)
5. Optimize database configuration.
6. Staging areas. If you use a staging area, you force the Informatica Server to perform multiple passes on your data. Where possible, remove staging areas to improve performance.
7. You can run multiple Informatica Servers on separate systems against the same repository. Distributing the session load to separate Informatica Server systems increases performance.
Informatica specific:
- Transformation tuning
- Using Caches
- Avoiding Lookups by using DECODE for smaller and frequently used tables
- Applying Filter at the earliest point in the data flow etc.
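The idea behind replacing a lookup on a small, frequently used table with DECODE is that the value pairs are written into the expression itself, so no table has to be queried or cached. The Python sketch below imitates the general shape of such an expression (a value, then search and result pairs, then a default). It is an illustration only, not Informatica expression syntax, and the status codes are hypothetical.

```python
def decode(value, *search_result_pairs, default=None):
    """Return the result paired with the first search value equal to value."""
    if len(search_result_pairs) % 2 != 0:
        raise ValueError("search values and results must come in pairs")
    for i in range(0, len(search_result_pairs), 2):
        if value == search_result_pairs[i]:
            return search_result_pairs[i + 1]
    return default


def status_name(code):
    """Translate a status code without consulting a lookup table."""
    # Hypothetical codes that would otherwise live in a small lookup table.
    return decode(
        code,
        "A", "Active",
        "I", "Inactive",
        "P", "Pending",
        default="Unknown",
    )


names = [status_name(code) for code in ["A", "P", "X"]]
```

This only pays off when the set of values is small and stable, since every change to the pairs means editing the expression.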
Informatica PowerCenter Partitioning Option
Delivering High Performance for Processing Massive Data Volumes
The PowerCenter® Partitioning Option increases the performance of PowerCenter through parallel data processing, and it has been instrumental in establishing PowerCenter’s industry performance leadership. This option provides a thread-based architecture and automatic data partitioning that optimizes parallel processing on multiprocessor and grid-based hardware environments.
Partitioning Option
Key Features
Data Smart Parallelism:
• Automatically aligns PowerCenter partitions with database table partitions to improve performance.
• Automatically guarantees data integrity by leveraging the parallel engine of PowerCenter, which dynamically realigns data partitions for set-oriented transformations.
Session Design Tools:
• Create user-defined partitioning schemes quickly and easily
• Provide a graphical partitioning map for determining the best partitioning points
• Gather statistics on configurable session options, such as error handling, recovery strategy, memory allocation, and logging, to maximize performance.
Integrated Monitoring Console:
• Gathers session statistics, such as throughput, rows/second, error details, and performance optimizations, to identify potential bottlenecks and recognize trends
• Shows all session execution and dependency details.
Multiple Partition Schemes:
• Support parallelization through multiple mechanisms, including key range, hash algorithm-based, round robin, or file partitions
• Maximize data throughput via concurrent processing of specified partitions along the data transformation pipeline.
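Three of the partitioning mechanisms named above (round robin, hash, and key range) differ only in how a row is assigned to a partition. The sketch below is a plain Python illustration of those assignment rules, not a description of how PowerCenter implements them. The function names, sample rows, and range boundaries are hypothetical, and file partitioning is not shown.

```python
from bisect import bisect_right
from zlib import crc32


def round_robin_partition(rows, n):
    """Assign rows to n partitions in turn, giving evenly sized partitions."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Assign rows by a stable hash of the key, so equal keys share a partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[crc32(str(key(row)).encode()) % n].append(row)
    return parts


def key_range_partition(rows, boundaries, key):
    """Assign rows by key range; boundaries are sorted, upper-exclusive limits."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, key(row))].append(row)
    return parts


# Hypothetical rows: (order_id, label).
rows = [(50, "a"), (100, "b"), (150, "c"), (250, "d"), (50, "e")]
rr = round_robin_partition(rows, 2)
hp = hash_partition(rows, 3, key=lambda r: r[0])
kr = key_range_partition(rows, [100, 200], key=lambda r: r[0])
```

Round robin balances partition sizes but scatters equal keys, hash partitioning keeps equal keys together, and key range partitioning keeps each range of key values together.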
Partitioning Option Benefits:
Scale Cost-Effectively to Handle Large Data Volumes:
With the Partitioning Option, you can execute optimal parallel sessions by dividing data processing into subsets that are run in parallel and spread among available CPUs in a multiprocessor system. When different processors share the computational load, large data volumes can be processed faster. When sourcing and targeting relational databases, the Partitioning Option enables PowerCenter to automatically align its partitions with database table partitions to improve performance. Unlike approaches that require manual data partitioning, data integrity is automatically guaranteed because the parallel engine of PowerCenter dynamically realigns data partitions for set-oriented transformations (e.g., aggregators or sorters).
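To see why partitions have to be realigned before a set-oriented transformation, consider an aggregation over rows that were partitioned without regard to the group key. The sketch below is a conceptual Python illustration only. It makes no claim about PowerCenter internals, and all names and sample data are hypothetical. Rows are first moved so that each key lives in exactly one partition, and the partitions are then aggregated concurrently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def repartition_by_key(partitions, n):
    """Move every row to the partition chosen by a stable hash of its key."""
    realigned = [[] for _ in range(n)]
    for part in partitions:
        for key, amount in part:
            realigned[crc32(str(key).encode()) % n].append((key, amount))
    return realigned


def sum_by_key(partition):
    """Aggregate one partition independently of the others."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def parallel_aggregate(partitions, n):
    """Realign partitions on the group key, then aggregate them concurrently."""
    realigned = repartition_by_key(partitions, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(sum_by_key, realigned))
    merged = {}
    for partial in partials:
        merged.update(partial)  # keys are disjoint after realignment
    return merged


# Hypothetical input: the key "east" is split across two upstream partitions.
upstream = [[("east", 10), ("west", 4)], [("east", 5), ("north", 1)]]
totals = parallel_aggregate(upstream, 2)
```

Without the realignment step, each partition would produce its own partial total for "east" and the results would have to be reconciled afterwards. The threads here only show the structure of the work; in CPython they do not speed up CPU-bound code.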
Enhance Developer Productivity:
The Partitioning Option provides intuitive, GUI-based session design tools that reduce the time spent on initial and ongoing configuration and performance tuning tasks. You can easily create user-defined partitioning schemes. A graphical partitioning map helps you determine the best points of partitioning. Configurable session options, such as error handling, recovery strategy, memory allocation, and logging, make it easier to gather statistics used to maximize performance.
Optimize System Performance in Response to Changing Business Requirements:
The Partitioning Option lets you easily gather in-depth session statistics such as throughput, rows/second, error details, and performance optimizations. These statistics help you identify potential bottlenecks and recognize trends. An integrated monitoring console lets you view all session execution and dependency details. With the metadata-driven architecture of PowerCenter, data transformation logic is abstracted from the physical execution plan. This feature enables rapid performance tuning without compromising the logic and design of the original data mappings. You can continually and easily optimize system performance in the face of increasing data loads and changing business requirements.
Conclusion
The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server.
Informatica is a leading provider of enterprise data integration software and services. With Informatica, organizations can gain greater business value by integrating all their information assets from across the enterprise. Thousands of companies worldwide rely on Informatica to reduce the cost and expedite the time to address data integration needs of any complexity and scale.