Chapter 7Migrating Oracle Databases to SQL Server 2000

Chapter 7

Migrating Oracle Databases toSQLServer2000

This chapter is for developers of Oracle applications who want to convert their applications to Microsoft® SQLServer™ 2000. The tools, processes, and techniques required for a successful conversion are described. Also highlighted are the essential design points that allow you to create high-performance, high-concurrency SQLServer–based applications.

Target Audience

The target audience can be new to SQLServer and its operation, but should have a solid foundation in the Oracle RDBMS and general database concepts. The target audience should have:

  • A strong background in Oracle RDBMS fundamentals.
  • General database management knowledge.
  • Familiarity with the Oracle SQL and PL/SQL languages.
  • Membership in the sysadmin fixed server role.

For clarity and ease of presentation, the reference development and application platform is assumed to be the Microsoft Windows® 2000operating system and SQLServer2000. The Visigenic Software ODBC driver is used with Oracle, and the SQL Server ODBC driver is used with SQLServer2000. SQLServer2000 includes an OLEDB driver for Oracle, but that driver is not discussed extensively in this chapter.

Overview

The application migration process can appear complicated. There are many architectural differences between each RDBMS. The words and terminology used to describe Oracle architecture often have completely different meanings in Microsoft SQLServer. Additionally, both Oracle and SQLServer have made many proprietary extensions to the SQL-92 standard.

From an application developer’s perspective, Oracle and SQLServer manage data in similar ways. The internal differences between Oracle and SQLServer are significant, but if managed properly, have minimal impact on a migrated application.

SQL Language Extensions

The most significant migration issue that confronts the developer is the implementation of the SQL-92 SQL language standard and the extensions that each RDBMS has to offer. Some developers use only standard SQL language statements, preferring to keep their program code as generic as possible. Generally, this means restricting program code to the entry-level SQL-92 standard, which is implemented consistently across many database products, including Oracle and SQLServer.

This approach can produce unneeded complexity in the program code and can substantially affect program performance. For example, Oracle’s DECODE function is a nonstandard SQL extension specific to Oracle. The CASE expression in SQLServer is a SQL-92 extension beyond entry level and is not implemented in all database products.

Both the Oracle DECODE and the SQLServer CASE expressions can perform sophisticated conditional evaluation from within a query. The alternative to not using these functions is to perform the function programmatically, which might require that substantially more data be retrieved from the RDBMS.

Also, procedural extensions to the SQL language can cause difficulties. The Oracle PL/SQL and SQLServer TransactSQL languages are similar in function, but different in syntax. There is no exact symmetry between each RDBMS and its procedural extensions. Consequently, you might decide not to use stored programs such as procedures and triggers. This is unfortunate because they can offer substantial performance and security benefits that cannot be duplicated in any other way.

The use of proprietary development interfaces introduces additional issues. The conversion of a program using the Oracle OCI (Oracle Call Interface) often requires a significant investment in resources. When developing an application that may use multiple RDBMSs, consider using the Open Database Connectivity (ODBC) interface.

ODBC

ODBC is designed to work with numerous database management systems. ODBC provides a consistent application programming interface (API) that works with different databases through the services of a database-specific driver.

A consistent API means that the functions a program calls to make a connection, execute a command, and retrieve results are identical whether the program is talking to Oracle or SQLServer.

ODBC also defines a standardized call-level interface and uses standard escape sequences to specify SQL functions that perform common tasks but have different syntax in different databases. The ODBC drivers can automatically convert this ODBC syntax to either Oracle-native or SQLServer–native SQL syntax without requiring the revision of any program code. In some situations, the best approach is to write one program and allow ODBC to perform the conversion process at run time.

ODBC by itself does not provide complete database independence, full functionality, and high performance from all databases. Different databases and third-party vendors offer varying levels of ODBC support. Some drivers implement only core API functions mapped on top of other interface libraries. Other drivers, such as the SQLServer ODBC driver, offer full Level 2 support in a native, high-performance driver.

If a program uses only the core ODBC API, it will likely forego features and performance capabilities with some databases. Furthermore, not all native SQL extensions can be represented in ODBC escape sequences (such as Oracle DECODE and SQLServer CASE expressions).

Additionally, it is common practice to write SQL statements to take advantage of the database’s optimizer. The techniques and methods that enhance performance within Oracle are not necessarily optimal within SQLServer2000. The ODBC interface cannot translate techniques from one RDBMS to another.

ODBC does not prevent an application from using database-specific features and tuning for performance, but the application needs some database-specific sections of code. ODBC makes it easy to keep the program structure and the majority of the program code consistent across multiple databases.

OLEDB

SQLServer2000 takes advantage of OLEDB within the components of SQLServer itself. Additionally, application developers should consider OLEDB for new development with SQLServer2000. Microsoft includes an OLEDB provider for Oracle 8 with SQLServer2000.

OLEDB is an open specification designed to build on the features of ODBC. ODBC was created to access relational databases, and OLEDB is designed to access relational and nonrelational information sources, such as mainframe ISAM/VSAM and hierarchical databases, email and file system stores, text, graphical and geographical data, and custom business objects.

OLEDB defines a collection of COM interfaces that encapsulate various database management system services and allows the creation of software components that implement such services. OLEDB components consist of data providers (that contain and expose data), data consumers (that use data), and service components (that process and transport data, for example, query processors and cursor engines).

OLEDB interfaces are designed to help components integrate smoothly so that OLEDB component vendors can bring high quality OLEDB components to the market quickly. In addition, OLEDB includes a bridge to ODBC to allow continued support for the broad range of ODBC relational database drivers available today.

Organization of This Chapter

To assist you in implementing a systematic migration from Oracle to SQLServer, each section includes an overview of the relevant differences between Oracle databases and Microsoft SQLServer2000. The chapter also includes conversion considerations, SQLServer2000 advantages, and multiple examples.

Where appropriate, the chapter provides references to external sources that describe the topic in more detail.

Architecture and Terminology

To start a successful migration, you should understand the basic architecture and terminology associated with SQLServer2000.

Definition of Database

In Oracle, a database refers to the entire Oracle RDBMS environment and includes these components:

  • Oracle database processes and buffers (instance).
  • SYSTEM tablespace containing one centralized system catalog, which is made up of one or more datafiles.
  • Other tablespaces as defined by the DBA (optional), each made up of one or more datafiles.
  • Two or more online Redo Logs.
  • Archived Redo Logs (optional).
  • Miscellaneous other files (control file, Init.ora, config.ora, etc.).

A SQLServer database provides a logical separation of data, applications, and security mechanisms. A SQLServer installation (an instance) can support multiple databases. Applications built using SQLServer can use databases to logically divide business functionality. There can be multiple instances of SQLServer on a single computer. Each instance of SQLServer can have multiple databases.

Each SQLServer database can support filegroups, which provide the ability to distribute the placement of the data physically. A SQLServer filegroup categorizes the operating-system files containing data from a single SQLServer database to simplify database administration tasks, such as backing up. A filegroup is a property of a SQLServer database and cannot contain the operating-system files of more than one database, although a single database can contain more than one filegroup. After a database is created, filegroups can be added to the database.

{BMC om06.bmp}

SQLServer also installs the following databases by default:

  • The model database is a template for all newly created user databases.
  • The tempdb database is similar to an Oracle temporary tablespace in that it is used for temporary working storage and sort operations. Unlike the Oracle temporary tablespace, SQLServer users can create temporary tables that are automatically dropped when the user logs off.
  • The msdb database supports the SQLServer Agent and its scheduled jobs, alerts, and replication information.
  • The pubs and Northwind databases are provided as sample databases for training.

For more information about the default databases, see SQLServer Books Online.

Database System Catalogs

Each Oracle database runs on one centralized system catalog, or data dictionary, which resides in the SYSTEM tablespace. Each SQLServer2000 database maintains its own system catalog, which contains information about:

  • Database objects (tables, indexes, stored procedures, views, triggers, and so on).
  • Constraints.
  • Users and permissions.
  • User-defined data types.
  • Replication definitions.
  • Files used by the database.

SQLServer also contains a centralized system catalog in the master database, which contains system catalogs as well as some information about the individual databases:

  • Database names and the primary file location for each database.
  • SQLServer login accounts.
  • System messages.
  • Database configuration values.
  • Remote and/or linked servers.
  • Current activity information.
  • System stored procedures.

Similar to the SYSTEM tablespace in Oracle, the SQLServer master database must be available to access any other database. It is important to protect against failures by backing up the master database after any significant changes are made in the database. Database administrators can also mirror the files that make up the master database.

For information about a list of the system tables contained in the master and all other databases, see “System Tables” in SQLServer Books Online.

Physical and Logical Storage Structures

The Oracle RDBMS is comprised of tablespaces, which in turn are comprised of datafiles. Tablespace datafiles are formatted into internal units termed blocks. The block size is set by the DBA when the Oracle database is first created. When an object is created in an Oracle tablespace, the user can specify its space in units called extents (initial extent, next extent, min extents, and max extents). If an extent size is not defined explicitly, a default extent is created. An Oracle extent varies in size and must contain a chain of at least five contiguous blocks.

SQLServer uses filegroups at the database level to control the physical placement of tables and indexes. Filegroups are logical containers of one or more files, and data contained within a filegroup is proportionally filled across all files belonging to the filegroup.

If filegroups are not defined and used, database objects are placed in a default filegroup that is implicitly defined during the creation of a database. Filegroups allow you to:

  • Distribute large tables across multiple files to improve I/O throughput.
  • Store indexes on different files than their respective tables, again to improve I/O throughput and disk concurrency.
  • Store text, ntext, and image columns (large objects) on separate files from the table.
  • Place database objects on specific disk spindles.
  • Back up and restore individual tables or sets of tables within a filegroup.

SQLServer formats files into internal units called pages. The page size is fixed at 8192 bytes (8 KB). Pages are organized into extents that are fixed in size at 8 contiguous pages (64 KB). When a table or index is created in a SQLServer database, it is automatically allocated one page within an extent. As the table or index grows, it is automatically allocated space by SQLServer. This allows for more efficient storage of smaller tables and indexes when compared to allocating an entire extent as in Oracle. For more information, see “Physical Database Architecture” in SQLServer Books Online.

Striping Data

Oracle-type segments are not needed for most SQLServer installations. Instead, SQLServer can distribute, or stripe, data more efficiently with hardware-based RAID or with software–based RAID available through the WindowsDisk Management utility or from third parties. With RAID, you can set up striped volumes (stripe sets in Windows NT 4.0) consisting of multiple disk drives that appear as one logical drive. If database files are created on this striped volume, the disk subsystem assumes responsibility for distributing I/O load across multiple disks. It is recommended that administrators spread out the data over multiple physical disks using RAID.

The recommended RAID configuration for SQLServer is RAID 1 (mirroring) or RAID 5 (stripe sets with an extra parity drive, for redundancy). RAID 10 (mirroring of striped sets with parity) is also recommended, but is much more expensive than the first two options. Stripe sets work very well to spread out the usually random I/O done on database files.

If RAID is not an option, filegroups are an attractive alternative and provide some of the same benefits available with RAID. Additionally, for very large databases that might span multiple physical RAID arrays, filegroups may be an attractive way to further distribute your I/O across multiple RAID arrays in a controlled fashion.

Transaction log files must be optimized for sequential I/O and must be secured against a single point of failure. Accordingly, RAID 1 (mirroring) is recommended for transaction logs. When migrating, the size of this drive should be at least as large as the sum of the size of the Oracle online redo logs and the Oracle rollback segment tablespace(s). Create one or more log files that take up all the space defined on the logical drive. Unlike data stored in filegroups, transaction log entries are always written sequentially and are not proportionally filled.

For more information about RAID, see SQLServer Books Online, your Microsoft Windows 2000 documentation, and the Microsoft Windows2000 Resource Kit.

Transaction Logs and Automatic Recovery

The Oracle RDBMS performs automatic recovery each time it is started. It verifies that the contents of the tablespace files are coordinated with the contents of the online redo log files. If they are not, Oracle applies the contents of the online redo log files to the tablespace files (roll forward), and then removes any uncommitted transactions that are found in the rollback segments (roll back). If Oracle cannot obtain the information it requires from the online redo log files, it consults the archived redo log files.

SQLServer 2000 also performs automatic data recovery by checking each database in the system each time it is started. It first checks the master database and then launches threads to recover all of the other databases in the system. For each SQLServer database, the automatic recovery mechanism checks the transaction log. If the transaction log contains any uncommitted transactions, the transactions are rolled back. The recovery mechanism then checks the transaction log for committed transactions that have not yet been written out to the database. If it finds any, it performs those transactions again, rolling forward.

Each SQLServer transaction log has the combined functionality of an Oracle rollback segment and an Oracle online redo log. Each database has its own transaction log that records all changes to the database and is shared by all users of that database. When a transaction begins and a data modification occurs, a BEGIN TRANSACTION event (as well as the modification event) is recorded in the log. This event is used during automatic recovery to determine the starting point of a transaction. As each data modification statement is received, the changes are written to the transaction log prior to being written to the database itself. For more information, see “Transactions, Locking, and Concurrency” in this chapter.

SQLServer has an automatic checkpoint mechanism that ensures completed transactions are regularly written from the SQLServer disk cache to the transaction log file. A checkpoint writes any cached page that has been modified since the last checkpoint to the database. Checkpointing these cached pages, known as dirty pages, onto the database, ensures that all completed transactions are written out to disk. This process shortens the time that it takes to recover from a system failure, such as a power outage. This setting can be changed by modifying the recovery interval setting by using SQLServer Enterprise Manager or with TransactSQL (sp_configure system stored procedure).