Sybase Adaptive Server IQ
Version 12.0

An Introduction to

Query Engine Internals,

Diagnostics and Performance

David Walrath

Sybase IQ engineering

December 20, 2001

Draft Copy

Please note many sections are incomplete or in need of further editing, or still refer to IQ 11.x
Author’s notes

This paper is intended to provide a basic introduction to Sybase IQ’s query engine and query optimizer, with some notes and comments about bugs and on how to improve performance. It is not intended to provide detailed design information about the IQ optimizer or query engine internals. A fair amount of knowledge of relational databases, SQL queries and of Sybase IQ is assumed.

This document is based on a set of three earlier papers on joins, aggregation, and predicates, and a set of slides on Sybase IQ query diagnostics. This is an unpublished, living document that it is occasionally updated with additions, clarifications, and corrections. Please send comments, corrections or suggestions to the Sybase IQ query group.

Acknowledgments

This document grew out of many discussions within the Sybase IQ query group, internal memos and guides, and suggestions from various members of the IQ SWAT, PSE and tech support groups. In particular, Steve Kirk of the Sybase IQ query group is the source of much of this paper, and certain parts are directly plagiarized from his memos.

Distributing this document

(Sybase Internal Notice) This document contains some proprietary information about Sybase products. Please notify the author or Guy Washburn in Sybase IQ Product Management before giving this document or proprietary information in this document to interested parties outside of Sybase.

(Sybase External Notice) This document is an unpublished internal document copyright ã 1998 by Sybase Inc. It contains Sybase IQ Internals information and should not be copied or distributed to other parties without the permission of Sybase. Contact Sybase IQ tech support or Sybase IQ product management for copies of this document.


Table of Contents

Introduction 6

Part 1 The IQ Dataflow Query Engine 8

Understanding queries : printing and drawing a query tree 9

Additional Query plan information 11

Query statistics 12

Interactively Showing Command Statistics 12

Dumping Command Statistics to a file 13

Long Running Queries 14

Part 2 Vertical Cursors, Filters, and local predicates 15

Vertical Cursor 15

More complex Vertical Cursors 15

Filters and FP scans 16

Reading predicates in a query plan 18

Indexes and predicates 19

IQ SET parameters which effect local predicates 20

Tuning hints and optimizations 21

Views and predicates 21

Inferred Predicates 21

IN lists vs. OR 21

Range predicates 22

Factoring out common predicates 22

More complex factoring 23

Precedence of AND versus OR 24

Part 3 Aggregation 25

What is aggregation? 25

Aggregation and query plans 25

Group By Single node 25

Group By node 25

Distinct node 26

Vertical Cursor aggregation node 26

Ordering of results with aggregation 26

IQ SET parameters which effect aggregation 27

Tuning Hints 28

Count(distinct) 28

Part 4 Joins 29

Query blocks and joins 29

The different join algorithms in IQ 30

Nested Loop 30

Hash 30

Sort/Merge 31

Nested Loop Push Down 31

Hash Push Down 32

Cartesian Nested Loop 33

Prejoin indexsets and joins 33

Join algorithm conditions and limitations 35

Good and bad join conditions 36

Primary equality join conditions: 36

Primary relational (non-equality) join conditions: 36

Secondary (or postfilter) join conditions: 37

Tuning Join Conditions 38

Join diagnostics 39

Indexes and Joins 42

Inferred join conditions 42

Documented parameters which effect Sybase IQ joins 43

Undocumented parameters which effect Sybase IQ joins 44

Some IQ join bugs fixed in 11.2.1 46

Some IQ join bugs fixed in 11.5.0 46

Known IQ bugs 46

Other Known Join Problems and Issues 46

Indexes and data skew 46

Sorts and Sort-Merge join 47

Star schemas, star joins, and “interesting cartesians” 47

Part 5 Views and Nested Tables 48

Examples 48

Tuning hints 49

Part 6 Functions 50

Part 7 Subqueries 51

Uncorrelated Subqueries 51

Correlated Subqueries 51

Tuning 52

Subqueries, query plans, and cmdstats 53

Part 8 Unions 54

Part 9 Parallelism 55

Dataflow Paralellism 55

Parallelism and joins 55

Part 10 Notes on optimizer design 56

Part 11 Indexes, Hashes, and Sorts 57

Indexes 57

Hashes 57

Sorts 57

Part 12 The IQ buffer manager 58

What happens in which cache? 58

SHMEMMB, page size, and the number of cache pages 59

Appendix A Query plan node types 60

Appendix B Aggregation examples 61

Appendix C Sybase IQ 11.5.1 65

Appendix D Sybase IQ 12.0 66

Introduction

Sybase IQ is a Relational Database Management System (RDBMS), like Sybase ASE, Microsoft SQL Server, Oracle, and Informix. Sybase IQ grew out of a query accelerator product based on bitmapped indexes created by a company called Expressway Technologies. Sybase bought the company with the intention of integrating the technology into Sybase SQL Server, but because some of the most useful features of the Expressway product could not be integrated easily into a traditional RDBMS, Sybase IQ was born as an independent Sybase RDBMS focused on decision support.

Sybase IQ 11.0 through 11.5.x was a separate database engine and data store from Sybase SQL Server, and shipped with a custom version of Open Server (called open_iq). Sybase IQ 11.x is closely attached to SQL Server – for example it uses a SQL Server engine as a database catalog repository, which is why SQL Server is required even for a standalone IQ server. (The database catalog is where the database stores metadata, or data about the database, such as table and column names, the storage map, user information, et al.) Another feature is if IQ cannot run a query, it can pass the query to SQL Server, which will attempt to process the query.

In 1997, a plan was begun to separate Sybase IQ from Sybase SQL Server/ASE. Instead of creating the functionality of a catalog server, Sybase IQ would be built as a DLL attached to a version of Sybase’s SQL Anywhere (now Adaptive Server Anywhere). In this version called SAINT internally (SQL Anywhere INTegration), IQ stripped out its own ODBC/DBlib/CTlib interfaces, its parser, and the interface to SQL Server, and uses SQL Anywhere for these functions. SQL Anywhere acts as a sort of front-end to Sybase IQ, parsing DDL, DML, and query statements and passing parsed commands down to Sybase IQ, as well as acting as a catalog server. Many features which IQ does not handle natively (stored procedures, embedded SQL, Java, etc.) are transparently handled by SQL Anywhere as if IQ itself supported such abilities. As part of this process, much of the query engine and optimizer was rewritten, both to support the new parse structure passed from SQL Anywhere, and to add a number of new optimizations. As part of Saint, several other features were added which effect the query engine:

-  TSQL functions not supported in 11.x

-  most SQL Anywhere functions

-  ANSI outer joins (left, right and full outer joins)

-  Hash Push Down joins, an ad-hoc join especially useful for “star” joins

-  local and global temp tables

-  cursors

IQ 12.0 also added a number of other enhancements and features, such as incremental backup, auto-synchronizing prejoin indexes, a single temp database space for all users, and a single database server model, which are not discussed in this paper.

Sybase IQ has several features and characteristics that give it advantages in many types of queries over traditional relational database engines, particularly for ad-hoc (unplanned) queries. Three of IQ’s most notable features are its bitmapped indexes, how IQ stores all data in vertical indexes with no traditional base table, and how IQ compresses data better than other databases. While none of these are part of the query engine, they do have an impact on how the query engine works.

The first feature, IQ’s bitmapped indexes, are called bitwise indexes in much of the Sybase external literature; this document (and most of IQ’s development organization) use the two terms interchangeably. Other databases have or are in the process of implementing a bitmapped index type; to the best of our knowledge, all of these other databases’ bitmapped indexes correspond approximately to Sybase IQ’s LowFast (LF) index, a bitmapped index useful for low cardinality (a small number of distinct values) data columns. Sybase IQ also has two bitmapped indexes - HighNonGroup (HNG) and HighGroup (HG) indexes - that are useful for high cardinality data; these indexes include unique bitmap index technology that has been patented by Sybase. A fourth index type, the Fast Projection (FP) index (called the “default” index in the 12.0 documentation), is not a bitmapped index. A fifth index type, LowDisk, is a bitmapped index designed for medium cardinality columns; due to poor performance in many areas it is only recommended for very limited uses.

Another feature that has major implications in the database engine is how the actual data rows are stored. Most traditional databases create a base table of data, stored as sequential rows of data, possibly clustered on a key index, with other auxiliary indexes layered on top. The exact equivalent of this base table in a traditional database does not exist in IQ; a base table of rows only exists as a virtual entity in the catalog. A useful way to think of IQ is a table with the data sliced vertically, rather than horizontally. The truth is somewhat more subtle; IQ requires an index on every column, stores the data in these indexes, and can use any index type either as a traditional index for looking up values or as a data source (like a base table column) for reading the base data. A FP index is closest to the idea of a vertical slice of the base table, but inside the IQ query engine even an FP index is treated just like any other index type - simply faster at some index operations and slower at others. And even with the FP index type, many columns are not actually stored as vertical slices of raw data and may be stored in an optimized and indexed representation for performance and compression.

Sybase IQ also advertises its ability to compress data very well. A general rule of thumb is that an IQ database will usually take the same or less space than the original raw ASCII data. In practice, most databases use anywhere from 60% to 120% of the raw ASCII data. This compression can provide some performance benefits. In some cases, the compression of data will allow IQ to store more data on a page than with more traditional databases, while in other cases, only a small disk I/O is needed to read the compressed page in.

On the other hand, IQ shares many characteristics with other databases. Once processing that can be done within the IQ indexes is complete, IQ will generate rows of data from the tables’ indexes. These rows are used for joins between tables, grouping, ordering, string and math functions not handled by the indexes, and so on. When data is processed by rows the algorithms and tools IQ uses are similar to many other databases in the industry.

Within Sybase IQ, when a piece of a query is processed within the indexes, it is called vertical processing. When a piece of a query is processed by rows it is called horizontal processing. A query that can be done entirely vertically is called a vertical query, while a query that can only be done with some amount of row processing is usually called a horizontal query.

Sybase IQ V12.0 Query Engine Internals - © 1998 Sybase Inc. Page XXX

Part 1   The IQ Dataflow Query Engine


Unless intercepted and processed by CIS/OMNI (not discussed in this document), IQ will accept a query from ISQL or a front-end tool, parse the SQL statement, and pass this parsed query to the IQ optimizer. As IQ is optimizing a query, it builds a “tree” of objects (joins, group by clauses, subqueries, etc.). The conceptual model of this tree in IQ is somewhat upside-down compared to a real tree. Tables are “leaves” at the bottom of the tree, and rows of data flow up the tree from the leaves to a “root” node at the top, where the data is passed out of IQ to the user. This model with a tree of processing nodes is usually called a dataflow model.

An example of a dataflow tree might look like:

IQ’s query engine is based on the Volcano dataflow engine model developed by professors McKenna & Graefe at the University of Wisconsin. Like Sybase IQ, Informix and Oracle also have dataflow engines based on the Volcano model. Sybase SQL Anywhere, ASE, and OMNI all currently use different, non-dataflow query engine models.

In Sybase IQ, this tree of dataflow objects starts executing at the root node at the top of the tree. This node starts by requesting a first row from the next object below. The node below then "wakes up", and unless the node contains a base table, it begins asking for rows from the next node below it. This continues down the tree until the execution reaches the database tables, which read the data in from the disks.

A database table in IQ is accessed through what IQ calls a vertical cursor node. A vertical cursor represents two functions in a Sybase IQ query. It first contains those local table predicates - that is, the parts of the WHERE clause which access only one table - which can be processed “vertically” within the IQ indexes. After this is done, the second function is to project rows of data from the tables. Once data rows have been projected, IQ processes the data much like other dataflow query engines.

Understanding queries : printing and drawing a query tree

For understanding how IQ processes a query, the most useful tool is the ability to print out a query plan. The method of displaying a query plan in IQ is to turn IQ’s Query_Plan parameter on ( set [temporary] option query_plan = ‘on’ ). When a query is run, IQ will write the query plan to the iq message file – “<database name>.iqmsg”.