Sybase Adaptive Server IQ
Version 12.0
An Introduction to
Query Engine Internals,
Diagnostics and Performance
David Walrath
Sybase IQ engineering
December 20, 2001
Draft Copy
Please note many sections are incomplete or in need of further editing, or still refer to IQ 11.x
Author’s notes
This paper is intended to provide a basic introduction to Sybase IQ’s query engine and query optimizer, with some notes and comments about bugs and on how to improve performance. It is not intended to provide detailed design information about the IQ optimizer or query engine internals. A fair amount of knowledge of relational databases, SQL queries and of Sybase IQ is assumed.
This document is based on a set of three earlier papers on joins, aggregation, and predicates, and a set of slides on Sybase IQ query diagnostics. This is an unpublished, living document that it is occasionally updated with additions, clarifications, and corrections. Please send comments, corrections or suggestions to the Sybase IQ query group.
Acknowledgments
This document grew out of many discussions within the Sybase IQ query group, internal memos and guides, and suggestions from various members of the IQ SWAT, PSE and tech support groups. In particular, Steve Kirk of the Sybase IQ query group is the source of much of this paper, and certain parts are directly plagiarized from his memos.
Distributing this document
(Sybase Internal Notice) This document contains some proprietary information about Sybase products. Please notify the author or Guy Washburn in Sybase IQ Product Management before giving this document or proprietary information in this document to interested parties outside of Sybase.
(Sybase External Notice) This document is an unpublished internal document copyright ã 1998 by Sybase Inc. It contains Sybase IQ Internals information and should not be copied or distributed to other parties without the permission of Sybase. Contact Sybase IQ tech support or Sybase IQ product management for copies of this document.
Table of Contents
Introduction 6
Part 1 The IQ Dataflow Query Engine 8
Understanding queries : printing and drawing a query tree 9
Additional Query plan information 11
Query statistics 12
Interactively Showing Command Statistics 12
Dumping Command Statistics to a file 13
Long Running Queries 14
Part 2 Vertical Cursors, Filters, and local predicates 15
Vertical Cursor 15
More complex Vertical Cursors 15
Filters and FP scans 16
Reading predicates in a query plan 18
Indexes and predicates 19
IQ SET parameters which effect local predicates 20
Tuning hints and optimizations 21
Views and predicates 21
Inferred Predicates 21
IN lists vs. OR 21
Range predicates 22
Factoring out common predicates 22
More complex factoring 23
Precedence of AND versus OR 24
Part 3 Aggregation 25
What is aggregation? 25
Aggregation and query plans 25
Group By Single node 25
Group By node 25
Distinct node 26
Vertical Cursor aggregation node 26
Ordering of results with aggregation 26
IQ SET parameters which effect aggregation 27
Tuning Hints 28
Count(distinct) 28
Part 4 Joins 29
Query blocks and joins 29
The different join algorithms in IQ 30
Nested Loop 30
Hash 30
Sort/Merge 31
Nested Loop Push Down 31
Hash Push Down 32
Cartesian Nested Loop 33
Prejoin indexsets and joins 33
Join algorithm conditions and limitations 35
Good and bad join conditions 36
Primary equality join conditions: 36
Primary relational (non-equality) join conditions: 36
Secondary (or postfilter) join conditions: 37
Tuning Join Conditions 38
Join diagnostics 39
Indexes and Joins 42
Inferred join conditions 42
Documented parameters which effect Sybase IQ joins 43
Undocumented parameters which effect Sybase IQ joins 44
Some IQ join bugs fixed in 11.2.1 46
Some IQ join bugs fixed in 11.5.0 46
Known IQ bugs 46
Other Known Join Problems and Issues 46
Indexes and data skew 46
Sorts and Sort-Merge join 47
Star schemas, star joins, and “interesting cartesians” 47
Part 5 Views and Nested Tables 48
Examples 48
Tuning hints 49
Part 6 Functions 50
Part 7 Subqueries 51
Uncorrelated Subqueries 51
Correlated Subqueries 51
Tuning 52
Subqueries, query plans, and cmdstats 53
Part 8 Unions 54
Part 9 Parallelism 55
Dataflow Paralellism 55
Parallelism and joins 55
Part 10 Notes on optimizer design 56
Part 11 Indexes, Hashes, and Sorts 57
Indexes 57
Hashes 57
Sorts 57
Part 12 The IQ buffer manager 58
What happens in which cache? 58
SHMEMMB, page size, and the number of cache pages 59
Appendix A Query plan node types 60
Appendix B Aggregation examples 61
Appendix C Sybase IQ 11.5.1 65
Appendix D Sybase IQ 12.0 66
Introduction
Sybase IQ is a Relational Database Management System (RDBMS), like Sybase ASE, Microsoft SQL Server, Oracle, and Informix. Sybase IQ grew out of a query accelerator product based on bitmapped indexes created by a company called Expressway Technologies. Sybase bought the company with the intention of integrating the technology into Sybase SQL Server, but because some of the most useful features of the Expressway product could not be integrated easily into a traditional RDBMS, Sybase IQ was born as an independent Sybase RDBMS focused on decision support.
Sybase IQ 11.0 through 11.5.x was a separate database engine and data store from Sybase SQL Server, and shipped with a custom version of Open Server (called open_iq). Sybase IQ 11.x is closely attached to SQL Server – for example it uses a SQL Server engine as a database catalog repository, which is why SQL Server is required even for a standalone IQ server. (The database catalog is where the database stores metadata, or data about the database, such as table and column names, the storage map, user information, et al.) Another feature is if IQ cannot run a query, it can pass the query to SQL Server, which will attempt to process the query.
In 1997, a plan was begun to separate Sybase IQ from Sybase SQL Server/ASE. Instead of creating the functionality of a catalog server, Sybase IQ would be built as a DLL attached to a version of Sybase’s SQL Anywhere (now Adaptive Server Anywhere). In this version called SAINT internally (SQL Anywhere INTegration), IQ stripped out its own ODBC/DBlib/CTlib interfaces, its parser, and the interface to SQL Server, and uses SQL Anywhere for these functions. SQL Anywhere acts as a sort of front-end to Sybase IQ, parsing DDL, DML, and query statements and passing parsed commands down to Sybase IQ, as well as acting as a catalog server. Many features which IQ does not handle natively (stored procedures, embedded SQL, Java, etc.) are transparently handled by SQL Anywhere as if IQ itself supported such abilities. As part of this process, much of the query engine and optimizer was rewritten, both to support the new parse structure passed from SQL Anywhere, and to add a number of new optimizations. As part of Saint, several other features were added which effect the query engine:
- TSQL functions not supported in 11.x
- most SQL Anywhere functions
- ANSI outer joins (left, right and full outer joins)
- Hash Push Down joins, an ad-hoc join especially useful for “star” joins
- local and global temp tables
- cursors
IQ 12.0 also added a number of other enhancements and features, such as incremental backup, auto-synchronizing prejoin indexes, a single temp database space for all users, and a single database server model, which are not discussed in this paper.
Sybase IQ has several features and characteristics that give it advantages in many types of queries over traditional relational database engines, particularly for ad-hoc (unplanned) queries. Three of IQ’s most notable features are its bitmapped indexes, how IQ stores all data in vertical indexes with no traditional base table, and how IQ compresses data better than other databases. While none of these are part of the query engine, they do have an impact on how the query engine works.
The first feature, IQ’s bitmapped indexes, are called bitwise indexes in much of the Sybase external literature; this document (and most of IQ’s development organization) use the two terms interchangeably. Other databases have or are in the process of implementing a bitmapped index type; to the best of our knowledge, all of these other databases’ bitmapped indexes correspond approximately to Sybase IQ’s LowFast (LF) index, a bitmapped index useful for low cardinality (a small number of distinct values) data columns. Sybase IQ also has two bitmapped indexes - HighNonGroup (HNG) and HighGroup (HG) indexes - that are useful for high cardinality data; these indexes include unique bitmap index technology that has been patented by Sybase. A fourth index type, the Fast Projection (FP) index (called the “default” index in the 12.0 documentation), is not a bitmapped index. A fifth index type, LowDisk, is a bitmapped index designed for medium cardinality columns; due to poor performance in many areas it is only recommended for very limited uses.
Another feature that has major implications in the database engine is how the actual data rows are stored. Most traditional databases create a base table of data, stored as sequential rows of data, possibly clustered on a key index, with other auxiliary indexes layered on top. The exact equivalent of this base table in a traditional database does not exist in IQ; a base table of rows only exists as a virtual entity in the catalog. A useful way to think of IQ is a table with the data sliced vertically, rather than horizontally. The truth is somewhat more subtle; IQ requires an index on every column, stores the data in these indexes, and can use any index type either as a traditional index for looking up values or as a data source (like a base table column) for reading the base data. A FP index is closest to the idea of a vertical slice of the base table, but inside the IQ query engine even an FP index is treated just like any other index type - simply faster at some index operations and slower at others. And even with the FP index type, many columns are not actually stored as vertical slices of raw data and may be stored in an optimized and indexed representation for performance and compression.
Sybase IQ also advertises its ability to compress data very well. A general rule of thumb is that an IQ database will usually take the same or less space than the original raw ASCII data. In practice, most databases use anywhere from 60% to 120% of the raw ASCII data. This compression can provide some performance benefits. In some cases, the compression of data will allow IQ to store more data on a page than with more traditional databases, while in other cases, only a small disk I/O is needed to read the compressed page in.
On the other hand, IQ shares many characteristics with other databases. Once processing that can be done within the IQ indexes is complete, IQ will generate rows of data from the tables’ indexes. These rows are used for joins between tables, grouping, ordering, string and math functions not handled by the indexes, and so on. When data is processed by rows the algorithms and tools IQ uses are similar to many other databases in the industry.
Within Sybase IQ, when a piece of a query is processed within the indexes, it is called vertical processing. When a piece of a query is processed by rows it is called horizontal processing. A query that can be done entirely vertically is called a vertical query, while a query that can only be done with some amount of row processing is usually called a horizontal query.
Sybase IQ V12.0 Query Engine Internals - © 1998 Sybase Inc. Page XXX
Part 1 The IQ Dataflow Query Engine
Unless intercepted and processed by CIS/OMNI (not discussed in this document), IQ will accept a query from ISQL or a front-end tool, parse the SQL statement, and pass this parsed query to the IQ optimizer. As IQ is optimizing a query, it builds a “tree” of objects (joins, group by clauses, subqueries, etc.). The conceptual model of this tree in IQ is somewhat upside-down compared to a real tree. Tables are “leaves” at the bottom of the tree, and rows of data flow up the tree from the leaves to a “root” node at the top, where the data is passed out of IQ to the user. This model with a tree of processing nodes is usually called a dataflow model.
An example of a dataflow tree might look like:
IQ’s query engine is based on the Volcano dataflow engine model developed by professors McKenna & Graefe at the University of Wisconsin. Like Sybase IQ, Informix and Oracle also have dataflow engines based on the Volcano model. Sybase SQL Anywhere, ASE, and OMNI all currently use different, non-dataflow query engine models.
In Sybase IQ, this tree of dataflow objects starts executing at the root node at the top of the tree. This node starts by requesting a first row from the next object below. The node below then "wakes up", and unless the node contains a base table, it begins asking for rows from the next node below it. This continues down the tree until the execution reaches the database tables, which read the data in from the disks.
A database table in IQ is accessed through what IQ calls a vertical cursor node. A vertical cursor represents two functions in a Sybase IQ query. It first contains those local table predicates - that is, the parts of the WHERE clause which access only one table - which can be processed “vertically” within the IQ indexes. After this is done, the second function is to project rows of data from the tables. Once data rows have been projected, IQ processes the data much like other dataflow query engines.
Understanding queries : printing and drawing a query tree
For understanding how IQ processes a query, the most useful tool is the ability to print out a query plan. The method of displaying a query plan in IQ is to turn IQ’s Query_Plan parameter on ( set [temporary] option query_plan = ‘on’ ). When a query is run, IQ will write the query plan to the iq message file – “<database name>.iqmsg”.