Knowledge Mining: Business Rule Extraction and Reuse

by William Ulrich, president, Tactical Strategy Group

I find that concepts like knowledge mining of legacy business rules can seem abstract or threatening to people, but the concept is important nonetheless. Any organization with a large installed base of legacy systems should consider using business rule extraction, and the reuse of those rules, as a key component of future IT initiatives. I hope to put the concept of business rule extraction and reuse into a context that organizations can readily begin to deploy.

What Is Business Rule Extraction?

In order to define legacy business rule extraction, we must first define a legacy system. A legacy system is any technology that has evolved over the past 50 years and is currently managing information in a production environment. This includes systems built using Assembler, Fortran, Cobol, C, C++, Java, screen-scraping technology, middleware or other coding discipline.

This raises another question: What is a business rule? The following definitions are adopted from the Object Management Group (OMG).

  1. Rules are declarations of policy or conditions that must be satisfied.
  2. Business rules are rules that govern the way a business operates.

For the purpose of legacy business rule extraction, we narrowed the OMG definitions to the following.

A business rule is a combination of conditional and imperative logic that changes the state of an object or data element.

An example might help clarify this definition. In the English language, a rule might appear as: "Pay a supplier invoice only if it has been approved."
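The narrowed definition can be illustrated with a minimal sketch, assuming a rule is modeled as a condition paired with an action that changes the state of a data element. The names Invoice, BusinessRule, mark_paid and pay_if_approved are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Invoice:
    approved: bool
    paid: bool = False


@dataclass
class BusinessRule:
    name: str
    condition: Callable[[Invoice], bool]   # the conditional part of the rule
    action: Callable[[Invoice], None]      # the imperative part of the rule

    def apply(self, invoice: Invoice) -> bool:
        """Run the action only when the condition holds; report whether it fired."""
        if self.condition(invoice):
            self.action(invoice)
            return True
        return False


def mark_paid(invoice: Invoice) -> None:
    # The state change: the invoice moves from unpaid to paid.
    invoice.paid = True


# Hypothetical encoding of the example rule quoted above.
pay_if_approved = BusinessRule(
    name="Pay a supplier invoice only if it has been approved",
    condition=lambda inv: inv.approved,
    action=mark_paid,
)
```

Applying pay_if_approved to an approved invoice sets its paid flag; applying it to an unapproved invoice leaves the invoice unchanged.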

Business rule extraction identifies and captures any combination of conditional and imperative statements and isolates these statements to facilitate the understanding and reuse of knowledge contained within legacy information systems. Rules can invoke other rules and be represented within a rule hierarchy. Business rule extraction documents which business functions are being performed by a given set of programs and systems. Consolidating and reusing captured business rules extends this process by making these rules available to design and development teams.

Why Is Business Rule Extraction and Reuse Important?

Rule extraction and reuse are a means to an end, and that end depends on the tactical and strategic requirements of a given enterprise. Big-picture requirements drive the need to perform business rule extraction and reuse. Executives should only sanction rule extraction when there is a valid business reason.

External business factors are constantly changing, and this dictates the consideration of business rule extraction. Globalization, the decay of historic management structures and technological advancements force change. As we move further into the new millennium, we can expect to see changes within organizational infrastructures that will make the business process reengineering craze of the 1990s pale by comparison.

Global dynamics are impacting internal and external infrastructures across a number of industries, including healthcare, energy, transportation, manufacturing, finance, retail and government. At the heart of these changes is the management of critical information. Managing this information requires identifying and changing the business rules found within our information systems.

Now consider how these global dynamics drive change within an enterprise. Management may wish to implement an enterprise resource planning (ERP) package while redeploying mainframe functionality to the Internet. Or analysts may need to apply upgrades to in-house financial applications but do not know where certain functions are defined. Analysts may alternatively wish to integrate related systems containing overlapping functionality. The list of projects goes on, but these initiatives have one thing in common: each type of project requires access to, and an understanding of, the functionality contained within your legacy systems.

Why Consider Business Rule Extraction Now?

The loss of system subject matter experts has slowed package deployment, system migration, legacy system deactivation and simple upgrade efforts. In 1995, Standish Group International announced a study stating that more than 80 percent of all IT projects are late or never delivered at all. And a Cap Gemini study found that 80 to 90 percent of all Y2K projects were behind schedule. It appears that the ability to deliver IT projects involving legacy systems has not improved over the past few years.

With the availability of new and powerful Web-enabling technology and object-based solutions on the horizon, businesses cannot afford to continue this pattern of failure. Therefore, any plan to integrate and/or migrate existing systems under new technologies and functional architectures should incorporate the concept of business rule extraction and reuse.

One thing that has not changed is our reticence to reuse and recycle. We live in a throwaway society. We fight recycling just as we fight the reuse of legacy business rules. Many people believe that the reuse of the knowledge embedded in legacy systems means that the new system will not truly be a "new" system. This thinking has permeated the actions of many executives over the years. Unlike buying a new car or house, however, legacy systems contain strategic business rules that govern an enterprise and are very hard to recreate. If this were not true, we would not have collectively spent billions of dollars making these systems Y2K-compliant.

In addition to large-scale replacement initiatives, organizations are attempting to attach front ends or Internet links onto mainframe systems. While this provides a near-term productivity boost, the architectures underlying these interfaces cannot keep up with the dynamic changes occurring in the outside world. At some point, executives will need to redesign and integrate these architectures to reflect widespread changes within their industry. Extracting business rules as input to this redesign and integration process brings an impossible challenge into a manageable realm.

IT has spent the last 15 years chasing silver-bullet solutions to the systems redevelopment challenge. Many of these disciplines have value in their own right, but none of them has the right to ignore the vast base of installed systems running our businesses. It has always been more compelling to pursue the development of a new system while ignoring existing systems. Time is running out on that strategy. Executives must instead incorporate business rule extraction and reuse into systems management and development paradigms.

A Business Case for Rule Extraction

Executives never seem to blink when handing over $10, $20 or even $50 million for a systems replacement project. During the 1980s and 1990s, the U.S. Internal Revenue Service (IRS) tossed billions of dollars at a systems renewal project that yielded little return on investment. The project was canceled, and the IRS will enter the year 2000 with the bulk of its vintage 1960s Assembler and Cobol systems running its operational environment.

With a history of replacement project failures and the task of catching up with a multiyear backlog that was ignored while Y2K projects were completed, IT organizations have a major challenge ahead. Many subject matter experts have retired, legacy systems expertise is at a premium, business users want instant access through the Internet, and IT executives are stuck in the middle of this dilemma.

A large percentage (70 percent or more) of the rules in a replacement system (package or custom developed) exist within the current system. Can analysts readily respecify replacement rules without examining how the old rules work? History says no. A study done years ago found that the first two years in the life of a replacement system are spent putting in "lost" business rules that analysts missed during the replacement design process.

When creating or installing a new system, programmers go back to old systems all the time for a reality check. We are suggesting that management formalize this process in a way that makes business rules available to all levels of the requirements, design and development team. It doesn't matter if the new system is on the Internet, written in Java or inside a package. The same thinking should apply.

Much of the time spent specifying these requirements would be better spent creating a "straw man" specification based on the business rules captured from your legacy systems. This would ensure that any replacement or integration effort meets the core business requirements of the systems it plans to replace. Even middleware projects need to understand where legacy business rules are defined and where they overlap.

The business case for applying business rule extraction and reuse is the same whether replacing, renovating, integrating or putting front ends onto legacy systems. All of these initiatives need to understand core business requirements. Rule extraction and reuse facilitates this understanding and reduces specification, coding and, ultimately, testing time because you got it right in the first place.

Finally, analysts should consider that Y2K projects serve as a solid foundation for a rule extraction project. Many organizations had to inventory their systems and document which functions those systems support. This documentation effort established the basis for rule extraction and reuse. Your Y2K investment is a cost-effective launching point for a business rule extraction project.

The Rule Extraction Setup Process

A parachutist would never jump from an airplane without the proper training, equipment, crew, weather analysis or other setup procedures. Rule extraction requires setup work as well. Prior to undertaking the business rule extraction process, project teams should consider several preliminary steps.

Business rule extraction requires a high-level assessment of the existing environment so that analysts can segment systems in preparation for rule extraction. Analysts should segment a subset inventory of system components to be fed into the extraction process. Architecturally, analysts should determine how systems share data to refine the population of systems targeted for rule extraction. Any system updating a shared database or master file should be included in the project. Analysts should then refine the set of candidate components based on the functions that they perform. In other words, segment the population of systems and programs across an enterprise into a manageable, interrelated subset.
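The segmentation step can be sketched as a grouping problem, assuming the inventory records which data stores each system updates. The function segment_systems and the sample inventory below are hypothetical; a real assessment would also weigh the business functions each component performs.

```python
from collections import defaultdict


def segment_systems(updates: dict[str, set[str]]) -> list[set[str]]:
    """Group systems that are connected through data stores they update."""
    # Index: data store -> systems that update it.
    by_store = defaultdict(set)
    for system, stores in updates.items():
        for store in stores:
            by_store[store].add(system)

    segments = []
    seen = set()
    for start in sorted(updates):
        if start in seen:
            continue
        # Walk outward from this system through every shared store.
        segment, frontier = set(), [start]
        while frontier:
            system = frontier.pop()
            if system in segment:
                continue
            segment.add(system)
            for store in updates[system]:
                frontier.extend(by_store[store] - segment)
        seen |= segment
        segments.append(segment)
    return segments


# Hypothetical inventory: system name -> data stores it updates.
inventory = {
    "billing": {"CUSTOMER_MASTER", "INVOICE_DB"},
    "accounts_payable": {"INVOICE_DB"},
    "payroll": {"EMPLOYEE_MASTER"},
    "hr": {"EMPLOYEE_MASTER"},
    "reporting": set(),
}

segments = segment_systems(inventory)
```

With this sample inventory, billing and accounts_payable fall into one segment because both update INVOICE_DB, payroll and hr fall into another, and reporting stands alone.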

This preliminary analysis may identify weaknesses in a system that can be corrected prior to rule extraction. Removing code flaws such as recursion, restructuring convoluted logic and splitting cumbersome modules all simplify rule extraction. Data name rationalization, which involves consolidating and renaming redundant data definitions, helps analysts determine which data elements a rule impacts. While code improvement techniques simplify the rule extraction process, timing, budgets and common sense dictate the degree of effort applied to this step.

Analysts must also verify that they have the means to capture and track extracted business rules. This is essential because most environments have thousands of rules at multiple levels. A rule extraction tool, such as HotRod from Netron, can capture and isolate rules as well as assist with rule consolidation. While automated tools can greatly simplify the extraction process, knowledgeable analysts may simply apply their own expertise, along with an intelligent editing tool, to accomplish this task.

Because legacy environments contain countless business rules, many of them redundant, that follow no comprehensible structure, analysts can get lost in the details. This is why I suggest applying a tracking facility to identify which rules were found and where. This facility may be a relational database or repository that can define each physical program and related system component in the subset of systems targeted for extraction. As business rules are found, analysts should describe them within the tracking repository and link them to the system component that physically defines that rule. Redundant rules can also be highlighted using this technique.
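A tracking facility of this kind might look like the following minimal sketch, which assumes a small relational schema held in SQLite. The table names, column names and helper functions are hypothetical, and matching rules by identical description text is a simplification of what analysts would do when consolidating rules.

```python
import sqlite3

SCHEMA = """
CREATE TABLE component (
    id      INTEGER PRIMARY KEY,
    system  TEXT NOT NULL,
    program TEXT NOT NULL
);
CREATE TABLE rule (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL UNIQUE
);
-- One row per physical place where a rule is coded.
CREATE TABLE rule_location (
    rule_id      INTEGER NOT NULL REFERENCES rule(id),
    component_id INTEGER NOT NULL REFERENCES component(id),
    PRIMARY KEY (rule_id, component_id)
);
"""


def open_repository() -> sqlite3.Connection:
    """Create an in-memory repository; a real project would use a file or server."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_rule(conn, description, system, program):
    """Describe a rule and link it to the component that physically defines it."""
    comp = conn.execute(
        "SELECT id FROM component WHERE system = ? AND program = ?",
        (system, program),
    ).fetchone()
    if comp is None:
        comp_id = conn.execute(
            "INSERT INTO component (system, program) VALUES (?, ?)",
            (system, program),
        ).lastrowid
    else:
        comp_id = comp[0]
    # The same description found twice maps to a single rule definition.
    conn.execute("INSERT OR IGNORE INTO rule (description) VALUES (?)", (description,))
    rule_id = conn.execute(
        "SELECT id FROM rule WHERE description = ?", (description,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO rule_location VALUES (?, ?)", (rule_id, comp_id)
    )


def redundant_rules(conn):
    """Rules that are physically defined in more than one component."""
    return conn.execute(
        """
        SELECT r.description, COUNT(*) AS places
        FROM rule r JOIN rule_location l ON l.rule_id = r.id
        GROUP BY r.id HAVING COUNT(*) > 1
        ORDER BY r.description
        """
    ).fetchall()
```

Recording the same rule description against two different programs produces one rule row with two locations, and redundant_rules then reports that rule along with the number of places it is coded.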

Rule Extraction and Reuse

Business rule extraction and reuse involves analysis, elimination of irrelevant rules, tagging, consolidation and importation into an alternative target format. Only about 20 to 30 percent of the source code within a given system can be linked to actual business rules. The remaining code tends to deal with the physical constraints of the environment. This is the main reason that wholesale migration of entire systems to Java-based or other environments is ill-advised. Implementation-dependent logic that does not directly address an actual business rule falls into several categories, as depicted in Table 1.

Implementation-Dependent Logic / Extraneous Logic Identification Process
Syntactically Dead Logic / Identify logic never executed regardless of data values.
Semantically Dead Code / Identify logic not executed based on setting of data values. May be due to business changes.
Initialization Logic / Find logic that sets element or record area values to null or zero after entry points.
Input/Output (I/O) Logic / Find I/O code to physical data structures or user media. Includes call, read, write, related commands and logic invoking I/O statements.
Output Area Build Logic / Find logic moving data to screen and report work areas by tracing elements in I/O areas back from output statements.
I/O Status Checking / Find conditionals executed directly after I/O commands, checking communication codes for TP monitor, database or other I/O types.
Error-Handling Logic / Find imperative logic that invokes exception reporting or module termination based on status code results.
Data Structure Manipulation / Find database manipulation logic that can be traced to work areas not containing business data.
Special Environmental Logic / Find logic managing homegrown compiler, TP monitor, database, date handling or similar routines.
Extraneous and Superfluous Logic / Find redundant conditionals in same logic path, mutually exclusive tests or similar routines.

Table 1: Identifying and Discarding Nonbusiness Logic
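As a rough illustration of how a first-pass filter might apply two of the Table 1 categories, the sketch below tags a Cobol-style statement by its leading verb. The verb list is an assumption and is deliberately incomplete, and the function name triage is hypothetical. Most Table 1 categories, such as dead logic or output area build logic, require parsing and data-flow analysis that a keyword check cannot provide.

```python
# Assumed, deliberately incomplete mapping of leading verbs to Table 1 categories.
VERB_CATEGORIES = {
    "OPEN": "Input/Output (I/O) Logic",
    "CLOSE": "Input/Output (I/O) Logic",
    "READ": "Input/Output (I/O) Logic",
    "WRITE": "Input/Output (I/O) Logic",
    "CALL": "Input/Output (I/O) Logic",
    "INITIALIZE": "Initialization Logic",
}


def triage(statement: str) -> str:
    """Tag a statement by its leading verb; anything unknown stays a candidate."""
    words = statement.strip().upper().split()
    if not words:
        return "blank"
    return VERB_CATEGORIES.get(words[0], "candidate business logic")
```

A statement beginning with READ is tagged as I/O logic and set aside, while a statement beginning with IF is kept as candidate business logic for an analyst to review.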

Table 1 helps analysts determine what type of coding logic to ignore while searching out business rules. The next step is to identify the kinds of logic that you want to identify, log and possibly reuse. Analysts perform rule isolation and extraction by tracing logic paths based on various selection criteria. This includes logic leading to the creation of a given output variable, logic linked to some type of conditional and logic associated with a given input transaction type.
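Tracing the logic that leads to a given output variable can be sketched as a backward walk over assignments. The example assumes each statement has already been reduced to the data element it sets and the elements it reads. The Stmt type, the trace_output function and the sample program are hypothetical, and control flow is ignored for simplicity.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    line: int
    target: str          # data element assigned by this statement
    sources: frozenset   # data elements read by this statement


def trace_output(statements: list[Stmt], output: str) -> list[int]:
    """Return line numbers of statements that feed the given output element.

    Simplification: control flow is ignored, so every earlier assignment to a
    needed element is kept.
    """
    needed = {output}
    kept = []
    # Walk backward so each kept statement can add its own inputs to the search.
    for stmt in reversed(statements):
        if stmt.target in needed:
            kept.append(stmt.line)
            needed |= stmt.sources
    return sorted(kept)


# Hypothetical straight-line program fragment.
program = [
    Stmt(10, "GROSS", frozenset({"HOURS", "RATE"})),
    Stmt(20, "PAGE-NO", frozenset({"PAGE-NO"})),
    Stmt(30, "TAX", frozenset({"GROSS", "TAX-RATE"})),
    Stmt(40, "NET-PAY", frozenset({"GROSS", "TAX"})),
    Stmt(50, "LINE-COUNT", frozenset()),
]

net_pay_lines = trace_output(program, "NET-PAY")
```

For NET-PAY, the trace keeps lines 10, 30 and 40 and skips the page and line counters, which belong to report handling rather than to the business rule.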

Each conditional and related imperative statement can be considered a rule. One challenge is to identify and link a conditional that is physically remote from its imperative statements. Because business rules do not limit themselves to the confines of a source program, extraction tools and techniques should be able to analyze logic across program boundaries. In addition to this, a rule extraction tool should bypass or highlight implementation-dependent logic, store an extracted rule, signify when a rule invokes another rule, display a rule, link to the tracking repository and, optionally, transform an extracted rule into a reusable format.

The tracking repository should be used to highlight rule redundancy by linking a single rule definition within the repository to the physical location of each redundant legacy rule. The repository can also be used to map legacy rules to various target rules. This process helps analysts determine if a target design model is complete or if a package has the requisite functionality required within the existing system.
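The mapping of legacy rules to target rules lends itself to a simple completeness check. The sketch below assumes each legacy rule carries an identifier and that analysts have recorded which target rules, if any, cover it. The function name coverage_gaps and the identifiers are hypothetical.

```python
def coverage_gaps(legacy_rules: set[str], mapping: dict[str, list[str]]) -> list[str]:
    """Legacy rules with no counterpart in the target design or package."""
    # A rule is a gap if it is missing from the mapping or mapped to nothing.
    return sorted(rule for rule in legacy_rules if not mapping.get(rule))


# Hypothetical identifiers for extracted legacy rules and target rules.
legacy = {"LR-001", "LR-002", "LR-003"}
target_map = {
    "LR-001": ["TR-10"],   # covered by one target rule
    "LR-002": [],          # reviewed, but nothing in the target covers it
}

gaps = coverage_gaps(legacy, target_map)
```

Here LR-002 and LR-003 are reported as gaps, which tells analysts that the target model or package still lacks functionality present in the existing system.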

Because people cannot digest the sheer volume of captured rules, these rules must be presented using a predetermined rule classification scheme. Analysts can use the tracking repository to link each rule to the data entity that it modifies. They can also link rules to reports or screens for a given system. Once rules have been captured in a tracking repository, system maintenance teams should keep this information up to date.

Organizations may stop here and limit reference of these rules to ongoing maintenance, middleware or system integration projects, or they may decide to reuse these rules in a target system design. The specification models used by a given development team dictate the format that rules will ultimately take. The good news here is that management can change its development technology as much as it likes, and the rule definitions in the tracking repository remain current because they are implementation independent.

Conclusion

Legacy business rule extraction and reuse is as integral to systems upgrade, integration, migration and interface projects as are the people and computers used to implement those projects. Launching a rule extraction effort requires a commitment from management and a willingness to deal with legacy systems as opposed to sidestepping them--a failed strategy of the past.

Executives who recognize and address the business rule extraction issue will leverage valuable legacy system assets along with emerging technologies. Those who continue chasing silver-bullet solutions while ignoring the installed base of legacy systems will be doomed to repeat the mistakes of the past.