RILEY / DIEBERT

MICHAEL LEROY RILEY II

IAN DIEBERT

DR. SAM SIEWERT

CS-332 ORGANIZATION OF PROGRAMMING LANGUAGES

20APR17

COBOL: Lethal Injection or Fountain of Youth?

Towards the end of the 1950s, there were growing concerns among computer users and manufacturers regarding the rising cost of programming. At this time, people did not process data on their laptops as we do today, and servers were not yet a conceptually familiar entity in the way that they are now. (Wikipedia) Data processing installations were dedicated businesses, such as the Computer Services Bureau, which offered services such as payroll but at a significant cost. A 1959 survey found that in a typical data processing installation, programming cost approximately $800,000.00 on average, and translating programs to run on new hardware would cost approximately $600,000.00. (Wikipedia) While those figures are not appetizing for any business, it is worth remembering the context in which they were derived. Those dollar amounts are from 1959, so $800,000.00 would be approximately equal to $6,702,432.99 in 2017 after inflation, an increase of 737.8%. (US Inflation Calculator) Beyond these costs, there were many other reasons to create a domain-specific programming language. Many businesses at the time were creating their own programming languages and/or software packages to perform the tasks relevant to their organization. From the narrow vantage point of a single organization this makes sense; however, viewed from a broader perspective, most of these businesses were of a similar nature, particularly in the sense that they were businesses first. Although nearly every business produced different physical products, their data for business transactions differed only in the format of the records and the length of fields.
From a software engineering perspective, and coupling these facts with the exorbitantly high cost of development, these businesses were similar enough to warrant exploration into the creation of a domain-specific 'Common Business Oriented Language', from which the name COBOL derives.

There were only two major professionally developed programming languages present at the time COBOL arose. Figure 1, below, shows a minimalistic view of the status of programming languages at that time.

Figure 1: Origination of COBOL with Key

Looking at Figure 1 above, it may be difficult to grasp the significance of the number of parallel software development projects occurring in companies between Fortran and COBOL. Figure 2, below, shows a tree of programming language development between the years 1955 and 1960 for just the Electronic Numerical Integrator and Computer (ENIAC). Although the ENIAC was a computer rather than a programming language, software for it was under development at the same time. It is reasonable to conclude that as the hardware and capability of the ENIAC progressed, and as different customers with different business needs acquired it, the rate of programming language development would fervently increase.

Figure 2: Software Development Efforts between 1955 and 1960 (Gordon)

In 1959, a meeting was held at the University of Pennsylvania with the goal of formulating ideas for common business languages. This group soon propositioned the United States Department of Defense (DoD) to sponsor an effort to create such a language. The DoD was heavily invested in the crisis that was enveloping the business and software engineering worlds at this time. The DoD itself operated 225 computers, had another 175 on back order, and had already spent over $200 million on software development efforts for these machines. Needless to say, the DoD was quite willing to assist in the effort, and by early 1960 compiler construction for COBOL-60 was underway. (Wikipedia)

If COBOL was developed to satisfy a niche market need, then it is questionable why many software/computer engineers and scientists would speak of the need for its replacement. There are many arguments both for the continuation of COBOL via specification refinement and for the complete retirement of the language in favor of a more modern language such as C++. Arguments for the replacement of COBOL center on the cost of the programming language in today's market. Due to the poor design and documentation of legacy systems, it is difficult to check downstream and upstream dependencies for COBOL modules, making it expensive and risky to make any changes in the code. (Moffitt, 2017) COBOL programmers, with and without vast experience, are paid handsomely for code maintenance and even more so for code generation. A COBOL programmer with less than one year of experience can expect to earn approximately $45k, while a 'COBOLER' with over five years of experience can expect to earn $70k. These figures may not seem very impressive at first, but it must be considered that they represent the mean, not the maximum. A C++ programmer's salary is comparable; however, the market is overloaded with C++ developers relative to COBOLers. The US government is also concerned with the vulnerabilities and expenses of an aging IT infrastructure. On September 22, 2016, the US House of Representatives passed the Modernizing Government Technology Act (Bailey, 2016).

Another area of concern that COBOL faces in today's market is the constant threat of cyber-attacks, an area which may overwhelm the COBOL programming language and force it into retirement. The United States government has experienced a number of attacks against COBOL systems in recent years. The Office of Personnel Management (OPM) suffered a breach of security in 2015, as did the Veterans Administration (VA). At the heart of their information technology environments was COBOL. According to Nextgov.com, three of the ten oldest federal IT systems still in use run on COBOL. Not surprisingly, the VA and OPM are listed among the agencies still running COBOL. One question that persists in debates on this topic is how cyber-attacks affect COBOL systems, both directly and indirectly. According to John Walker, owner of Secure-Bastion Ltd, "At the end of the day it is a case of 'security through obscurity' versus 'devoid security through confusion'. On one hand, we have the inferred security represented by say, non-routable protocols, or mainframe speciation partitions such as LPAR. On the other side of the security coin we have unpatched outdated systems such as NT4.0 residing inside virtualization. No matter, the outcomes are the same: confusion, and an enormous potential for unknown unknowns of insecurity to reside within the operational environment." (Naked Security) Although this argument seems logical, there is an easy workaround to its crucial point. If the main defense against injection attacks, or any other form of cyber-attack, is a lack of knowledge regarding a subsystem and its associated protocols or layers, then the only thing required of a malicious actor is to learn about these facets of the system. Furthermore, if the only assurance against the malicious actor gaining the requisite knowledge is the fact that those systems are not well documented, then that really is not a great deal of assurance at all.
If a malicious actor had physical access to a COBOL system, then learning about it would be trivial. According to Dr. Jon Haass and Dr. Paul Hriljac, professors in the College of Security and Intelligence at Embry-Riddle Aeronautical University in Prescott, AZ, "What is more likely the case is that a gateway or filter is used to provide connection to the legacy system and modern systems. The challenge for COBOL is its lack of strong authentication so if you can get past the gateway, you are in." This seems to be a serious risk that financial organizations are taking, and one that will eventually result in catastrophic failure. Dr. Haass and Dr. Hriljac go on to say, "A person does not have to have physical access to interfere, instead you hack the gateway or bridge system and then you can practically access everything in the legacy code. Very little work is being done with Cobol except to link up to new front ends whether they are web based or some other device." When cast in this light, there is some very real evidence that COBOL may need to be ported to another programming language.

C++ exists as a potential competitor to COBOL and is one of the many languages that has been analyzed and discussed for its possible benefits, should a portion of the industry decide to transition to it. C++ contains many well-known and highly understood benefits: it is more powerful (in terms of overall code length and supported features), it is better documented (with perhaps many thousands of manuals and books dedicated to the language), and it is known by a much wider set of programmers. These are easily verifiable points on a personal level; one can search Google regarding function templates and see that they are not supported in COBOL, look up COBOL references/manuals to see that the number of results is far smaller than for C++, or simply compare the search trends for COBOL and C++ to find that C++ is a more active topic.

The primary concerns regarding COBOL are that it is obsolete and lacks the dedicated programmer bases maintained by languages such as Java, Python, and of course C++. Many argue that a programming language which is generally less useful and performs worse should not be maintained as the standard for the majority of the business and transactional sectors of programming. Despite this argument, which is, from a high-level perspective, very sound, businesses have always supported and continue to run COBOL on even the most critical of data systems. This leads to the hypothesis that, for the usage of COBOL to continue, there must be reasons and a market to substantiate it.

C++ was originally designed to be a general-purpose language that extended C with object-oriented features while staying low-level and close to the hardware. As a near-superset of C, C++ would also retain the speed, portability, and all functionality present within the C language; this made it an easy transition for all those programmers who were implementing C. COBOL by now was established as a language, but had already been witnessing a decline in overall popularity; Edsger Dijkstra, as early as 1975 (over a decade before the official release of C++), stated, regarding the overall lack of structure present in the COBOL language, that "the use of COBOL cripples the mind."

C++ excels in the areas in which COBOL most lacks: structure, compatibility and efficiency of additions to the language, the verbosity of syntax, support of the greater computer science community, and the methods of design undertaken by the original creators and the subsequent standards committee.

C++ is incredibly well structured, primarily due to its derivation from C and the standards upon which it has been built. Well-written C++ code can be understood and read (albeit with some significant work) even as part of a million-line program, meaning that it can be modified, updated, or ported to another language without the fear of tearing apart an entire digital ecosystem simply through the alteration of a single piece of data. On the other end of the spectrum, COBOL suffers from exactly that fear; much of the legacy COBOL code can no longer be modified, as the older versions exist as archaic, unstructured, and illegible constructs that can no longer be understood or, more importantly, modified without the danger of tearing down a critical system.

C++ has also demonstrated many times that it is very capable of being modified without sacrificing existing legacy support, or support for the entire modern C language. COBOL, as an old and monolithic language, could not be upgraded without leaving behind much of the code that had already been implemented. When COBOL-85 was released, it was not compatible with older versions and was therefore highly criticized, with users citing the heavy reprogramming costs of implementing the updated standard. It is important to note that the board of standards for COBOL was unable to convince the then-users of the language to switch from it to a more modern version of itself. This has led to over 300 different dialects of COBOL and a user base that cannot and will not switch even to a more modern version of COBOL, leaving aside the notion of a separate modern language.

This ultimately leads to COBOL's final primary detriment: the lack of continued support by engineers and scientists at the top of the field. COBOL was already an underused language during the mid-80s, a situation strongly exacerbated by the alienation of the computer science community from its development by members of commerce and government. Elements such as the exclusion of Backus-Naur form for COBOL's syntax in favor of its own metalanguage prevented most of the computer engineering/science community from paying much notice at all to COBOL.

The COBOL and C++ programming languages were evaluated against each other with regard to binary executable size on disk, lines of code to perform the same operations, and program execution time. Each program was written to the following requirements to ensure the comparison would be as representative of a true analysis as possible.

  • Read in a .CSV file
  • Store the data from the file into memory
  • Randomly generate values
  • Update the entire contents of the file by adding the randomly generated number to a specific field per row of the entire data set
  • Write the data back to a .CSV file
  • Be run on the same computer (laptop) during the evaluation and analysis events

Figure 3: Visual Studio Performance Analysis of COBOL Execution

Figure 3, above, shows a performance analysis of the COBOL program run by Microsoft Visual Studio 2015 (VS15). Although an execution time of about 2.02 seconds is depicted, the Visual Studio profiler removed the time it took for the program to gain admittance to the processor and for garbage collection, in order to report actual time on the CPU. According to our research, 1.2 seconds is very long for a COBOL file I/O and sort operation. This is most likely attributable to running COBOL code in a non-mainframe setting, without Job Control Language (JCL) files, COBOL Copy (CPY) files, or any of the many other resources that COBOLers typically make use of.

Figure 4: Breakdown of COBOL Function Calls During Analysis

VS15 also created a separate window showing the functions that took the most time to complete. In Figure 4, above, the first two and last functions called by the program were native to the MicroFocus programming environment. These functions consumed 63.37% of the total execution time for the program. The functions written for this program, not built into COBOL or native to MicroFocus, consumed 36.63% of the total program execution time, which yields an execution time of 0.43956 seconds. This number reflects what most COBOLers refer to as a 'good enough' execution time for small COBOL operations. Further analysis of the execution timing reveals that the user-defined sorting function took up 26.59% of the total execution time. COBOL has a built-in SORT operation, which was also tested earlier in the development phase. While the SORT operation does work on tables, with some modification, it was determined that due to time constraints it would be wiser to write a bubble sort method. It is noted that not using the built-in SORT operation is not a fair comparison from COBOL's perspective.
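The kind of bubble sort written for the evaluation can be sketched as follows (shown in C++ for readability; the key column and string comparison are assumptions for illustration, since the original was written in COBOL):

```cpp
#include <string>
#include <utility>
#include <vector>

using Rows = std::vector<std::vector<std::string>>;

// Bubble sort the table rows by one key column. The algorithm is
// O(n^2) in the number of rows, which helps explain why the
// user-defined sort consumed 26.59% of total execution time.
void bubbleSortRows(Rows& rows, std::size_t key) {
    for (std::size_t pass = 0; pass + 1 < rows.size(); ++pass)
        for (std::size_t i = 0; i + 1 < rows.size() - pass; ++i)
            if (rows[i].size() > key && rows[i + 1].size() > key &&
                rows[i][key] > rows[i + 1][key])
                std::swap(rows[i], rows[i + 1]);
}
```

Each pass bubbles the largest remaining key to the end of the table, so the sort finishes after at most n - 1 passes.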

Figure 5: Visual Studio Performance Analysis of C++ Execution with Optimization Settings O2 – Maximize Speed

Figure 5, above, shows the performance analysis of the C++ program completing the same task as the COBOL program. Perhaps not surprisingly, C++ had an average runtime of 542 milliseconds, performing just over twice as fast as the COBOL program. One of the more challenging aspects of this research was finding examples of data and industry-standard COBOL. We would have liked to implement programs and data sets similar to those used in the business world; however, these implementations were more elusive than we expected, perhaps because of their secure nature. Due to this, we cannot say with one hundred percent certainty that the data shows C++ to be faster; however, for basic tasks using unoptimized code, this was shown to be the case.

Figure 6: Breakdown of C++ Function Calls During Analysis (Optimized O2)

Figure 6 is the separate window containing the C++ program's functions requiring the largest amount of completion time. The oddity here (not represented in the figure above) was that almost every time the program was run, a different function would take up the largest percentage of individual work. For example, in the case above it happened to be the '<' operator in some fashion, while other times the 'getline' function would take that place. Whenever the program was run, it consistently ended with the result that one function took roughly 60% of the work. There was no clear answer as to why this was the case. Overall, the 'FileIn' portion of the code was the most time-consuming for the program, as can be seen in the figure above.

Figure 7: C++ Execution with Optimizations Disabled

MicroFocus does not provide settings for optimizing the COBOL compilation within Microsoft Visual Studio, yet C++ optimization settings are certainly available. After seeing the C++ and COBOL execution times, and factoring out the MicroFocus overhead, we were curious about the performance of C++ when optimizations were disabled. Figure 7, above, shows the execution profile of the C++ program, without any other changes, with optimizations disabled. The execution profile without optimizations presented a striking divergence from the profile with optimizations. As shown above, the un-optimized C++ program took fourteen seconds to complete, which is roughly twenty-five times slower and would be considered unacceptable for database file I/O.