Course : Data Structure and File processing (901230)

Instructor : Dr. Jehad Q. Odeh Alkhalidi

Faculty of Information Technology

Al al-Bayt University

1. INTRODUCTION

  • Throughout this course examples will be given in C++. Most of the time its non-object-oriented features will be used and only in the few problems that will have to be implemented as classes, its object-oriented features will be used.
  • The language is only being used as a means of expressing the algorithms. It should be realized that implementation could be in any modern high level language as they all tend to have equivalent features.
  • It should also be pointed out that in the examples concentration is on the algorithms and not the specific details required for execution. Thus the pre-processor directives and the variable declaratives will not be included in the examples except where necessary.
  • As a C/C++ programmer , you are already familiar with data types like int, char, float etc. These are basic building blocks of a programming language. For real world programming problems, it would not be possible to use basic data types alone.
  • For example consider the following C++ program which reads 5 integer numbers using cin and then print them out using cout.

#include <iostream.h>
void main()
{
int a,b,c,d,e;
cin>a>b>c>d>e;
cout<a<endl<b<endl<c<endl<d<endl<e<endl;
}
  • Try extending this program to read 1000 integer numbers!! Obviously using separate names for 1000 variables and reading all of them using a cin similar to the above will not come to the mind of any programmer. This problem is easily handled using arrays.

#include <iostream.h>
void main()
{
int a[1000],i;
for (i=0;i<10000;i++)
{
cin>a[i];
}
for (i=0;i<10000;i++)
{
cout<a[i]<endl;
} }

Program=Algorithms+data structures

2. C-DATA TYPES AND DATA STRUCTURE CLASSIFICATION

Basic data types
Internal level: bits, bytes (8 bits) and Words (typically 32 bits)
Program level: char (signed/unsigned)
float (signed/unsigned)
double(signed/unsigned)
int (signed/unsigned)
short int (signed/unsigned)
long int (signed/unsigned)
pointers (short/far)
Data structures
Non-user defined (supported directly by C) arrays, enum, struct, union, files
User defined: stacks, queues, trees, linked lists, graphs, heaps, ...

3 ARRAYS

  • A one dimensional array Allows a set of data of a specified type to be held in a structure where the individual elements can be accessed by referral to a common variable name and their element number.

e.g.
int month[12]; is a declarative specifying an array that

can hold 12 integers.

  • In C++ the element numbers are from 0 - 11.
  • An individual integer can be referred to by its position: so that month[7] refers to the 8th element in the array. REMEMBER that the positions start at 0.
  • Most compilers will initialize the elements to zero automatically (but

always play on the safe side and do it yourself if necessary!).

  • Values can be assigned to the elements either through the use of assignment statements or with input statements.

e.g.val = 36;
for(j=0;j<12;j++)
{ month[j] = val;
val = val - 2;
}
or for(j = 0; j < 12; j++)
cin>month[j];
  • Summing the elements could be achieved with:

sum = 0;
for(j = 0; j < 12; j++)
sum = sum + month[j];

  • A double dimension array can be declared as :

int sales[5][12]

  • The above declaration defines an array with 5 rows and 12 columns.
  • Input could be:

for(i = 0; i < 5; i++)
{ cout<"Input the monthly sales for salesperson "<i+1<endl;
for(j = 0; j < 12; j++)
{ cout<"Month "<j+1;
cin>sales[i][j];
}
}

Adding all the columns could be:

int sumcol[12];
for(j=0;j<12;j++)
{
sumcol[j] = 0;
for (i=0;i<5;i++)
sumcol[j] = sumcol[j] + sales[i][j];
}
  • Note that this routine puts the column sums into the array sumcol.

Searching for a target:

int num[10], flag=0, target, i;
cin>target;
for (i=0;i<10;i++)
{
if (nums[i] == target)
{flag = 1;break;}
}
if (flag == 1) cout<"Target found";
else cout<"Target not found";

Shifting elements of an array to the right and putting the last element to the first position:

#include<iostream.h>
void main()
{
int num[5],i,val;
for (i=0;i<5;i++) cin>num[i];
val=num[4];
for (i=3;i>=0;i--)
num[i+1] = num[i];
num[0] =val;
for (i=0;i<5;i++) cout<num[i]<endl;
}

POINTERS AND POINTER VARIABLES

  1. POINTERS
  • Remember that a pointer variable contains an address. A pointer variable can be declared as follows:

int *ptr;
  • This declares ptr to be a (pointer) variable that can contain the address of an integer variable.
  • In a statement block we might find:

a = 23; /* Assigns the integer variable a, the value 23 */
ptr = &b; /* Assigns ptr the address of b *\
*ptr = a; /* The contents of the variable pointed by ptr
becomes the value of a*\
  • NOTE: This is a complicated way of saying b = a; !!!!

c = *ptr; /* Assigns the value of the variable pointed by ptr (i.e. contents of b) to the variable c, (i.e. c becomes 23). */
cout<c<*ptr; /* prints c and contents of address ptr*/
/*prints 23,23 */
  • Another example:

int i,j,*p1,*p2;
i = 5;
p1 = &i;
j = p1/2 + 10;
p2 = p1;
cout<i<j<*p1<*p2; /* prints 5,12,5,5 */
  • In C++, pointer arithmetic is scaled. Consider the following program:

int a,*ptr;
ptr=&a;
if &a is equal to 276314 then ptr+1 will contain 276314+sizeof(int)

2. POINTERS AND ARRAYS

Consider the following program:

#include<iostream.h>
void main()
{
int a[10];
cout&a[0]<endl;
cout<a;
}

The two cout will have the same output because the name of an array is actually representing the address of element 0 of the array. What C++ does when we refer to the element a[i] is to get the data from a+i.

Therefore

&a[i] is the same as a+i
a[i] is the same as *(a+i )
  • The following program reads and prints using the above mentioned pointer equivalence.

#include <stdio.h>
void main()
{
int a[5],i;
for (i=0;i<5;i++) scanf("%d",a+i);
for (i=0;i<5;i++) printf("%d\n",*(a+i));
}
  • The case of two dimensional array is interesting. Every two dimensional array is stored in the memory row by row. From the row and column index (i and j), we can calculate exactly where an element a[i][j] is stored in the memory.
  • In a 2-D array, the name of the array represent the address of the first row. Therefore, we must cast this to a pointer to the array type, to handle it easily. The we will get for a char array a (char*) a=&a[0]
  • If we know the address of the first element, then given the index i and j of any element, we can find its address as i*maximum number of columns +j. This can be implemented as per the following program

#include<iostream.h>
void main()
{
char a[4][3]={{'m','a','t'},{'s','a','t'},{'h','a','t'},{'c','a','t'}};
int i,j;
char c;
for (i=0;i<4;i++)
{
for (j=0;j<3;j++)
{
cout<*((char*) a+i*3+j));
}
cout<endl;
}
}

3. DYNAMIC MEMORY ALLOCATION

  • Memory is allocated for variables in two ways:
  • Static (or at compile-time)
  • Dynamic (or at run-time)
  • In a program which declares an integer variable x, at compile time itself, memory locations of size enough for an integer ( 2 locations since size of an integer is 2 bytes) is reserved for x.
  • Arrays are also known as static data structures since they also get their required memory allocated during compile time. The problem with static memory allocation is that the memory usage may not be efficient.
  • Consider the case of array marks to store the marks of a class of a maximum size 100. It is likely that in each semester the number of students may vary. In a given semester even if 25 students are there, 100 locations will still be reserved and 75 of them wasted.
  • We can re-write the program each time with the array declaration exactly matching the number of students, but then the program is no longer a general one.
  • Dynamic memory allocation is in contrast with this. Memory gets allocated at the time of running the program and hence we can use memory to exactly meet our needs.
  • Allocated memory can be many types:
  • Contiguous memory allocation: Allocation of adjacent memory locations
  • Non-Contiguous memory allocation: Allocation of non adjacent memory locations
  • Heap: Collection of all free memory locations available for allocation
  • De-allocation: Releasing of allocated memory after use, to be added back to the heap.
  • The new and delete operators do dynamic allocation and deallocation in much the same manner that the malloc() and free() functions do in C .
  • During the design of C++, it was felt that since dynamic allocation and deallocation are such a heavily used part of the C programming language and would also be heavily used in C++, it should be a part of the language, rather than a library add-on.
  • The new operator requires one modifier which must be a type.
  • The delete operator can only be used to delete data allocated by a new operator. If the delete is used with any other kind of data, the operation is undefined and anything can happen.
  • Here is a small program which dynamically allocates memory for an integer.

#include<iostream.h>
void main()
{
int *p;
p=new int;
*p=56;
cout<“Memory allocated at ”<p<endl;
cout<“Integer in memory="<*p<endl;
}
  • Below is another example using structures

#include<iostream.h>
void main()
{
struct pair{
int x;
int y;
};
struct pair *p;
p= new pair;
(*p).x=56;
(*p).y=47;
cout<p->x<endl<p->y;
}
  • The brackets around *p is required to control the precedence between * and . operators. However this situation is so common that C++ has a short-hand notation for this.

(*p).x is equivalent to p->x
(*p).y is equivalent to p->y
  • Proper programming practice requires that all memory that is allocated to be freed after use (In the previous programs, when the program finishes running the operating system frees all memory allocated). See the following program.

#include <iostream.h>
#include <string.h>
int main()
{
struct date
{
int month;
int day;
int year;
};
int index, *pt1,*pt2;
pt1 = &index;
*pt1 = 77;
pt2 = new int;
*pt2 = 173;
cout<"The values are "<index<" " <*pt1<" "<*pt2<endl;
pt1 = new int;
pt2 = pt1;
*pt1 = 999;
cout<"The values are "<index<" "<*pt1<" "<*pt2<endl;
delete pt1;
date *date_pt;
date_pt = new date;
date_pt->month = 10;
date_pt->day = 18;
date_pt->year = 1938;
cout<date_pt->day<"/"<date_pt->month<"/"<date_pt->year<endl;
delete date_pt;
char *c_pt;
c_pt = new char[37];
strcpy(c_pt,"John");
cout<c_pt<endl;
delete c_pt;
return 0;
}

<!-- saved from url=(0022) --> STRUCTURES AND ABSTRACT DATA TYPES

1. STRUCTURES:

  • Recall the way of declaring a structured type is:

struct date {
int day;
int month;
int year; }
  • This declares a data type that is a structure containing 3 integer fields.

struct date today, *pointer;
  • This declares a variable today that can contain the 3 fields and a (pointer) variable that can contain the address of a structure of type date.
  • We can assign values in different ways:

today.day = 27;
today.month = 1;
today.year = 1999;
or
pointer = &today;
pointer->year = 1999;

Structures containing pointers:

struct node { int number, int *ptr; }
  • Defines a structure type that contains an integer and an address.

struct node record1, record2;
int a,b,c:
a = 10;
b = 15;
record1.number = a;
record1.ptr = &b;

2. ABSTRACT DATA TYPES

  • So far we have only considered carrying out simple operations on primitive data types and structures. So for instance if we have defined a variable as being of type integer then we are able to carry out operations such as assigning it an integer value, using it in expressions, incrementing it and so on.
  • We will find that with certain more complicated data types or structures we need and are able to carry out other operations and that in fact to define the type fully we will need to specify what operations can be carried out. The resulting definitions create what are called ABSTRACT DATA TYPES(ADT).
  • So structures such as lists, stacks, queues and trees together with the set of operations that can be performed on them are called abstract data types.In C++ the concept of a class is ideally suited to define an ADT .

3.CLASSES

  • Recall that as an example we might have a class as follows:

#include <iostream.h>
class Patient
{public:
Patient();
void SetDetails (int,char);
void DisplayDetails();
private:
int IdNumber;
char Name;
};
  • This specifies a class called Patient, that its structure consists of two data members IdNumber and Name and that the only operations that can be performed on such a structure (data type) are to be able to construct a member (object), to SetDetails and DisplayDetails. Thus we have an ADT called Patient.

We need to define the operations with:

Patient::Patient()
{ cout < "Allocating memory" <endl;}
void Patient::SetDetails (int IdNumberin, char Namein)
{IdNumber = IdNumberin;
Name = Namein;
}
void Patient::DisplayDetails()
{cout < IdNumber<" "<Name<endl;}
  • An example of its use is:

void main()
{ Patient p1,p2;
p1.Setdetails(1111,'x');
p2.Setdetails (2222,'y');
p1.Displaydetails();
p2.Displaydetails();
}

<!-- saved from url=(0022) --> RECURSION, BACKTRACKING AND LOOK AHEAD.

1. RECURSION

  1. A recursive function is a function that calls itself. Remember the way of finding the factorial of a number:

Method 1 - non-recursive
cin>num;
factnum = 1;
for (i=2;i<=num;i++)
factnum = factnum*i;
Method 2 - recursive
long factorial (long number)
{
if (number == 1) return 1;
else
return(number*factorial(number - 1));
}
main()
{
int factnum;
cin>num;
factnum = factorial(num);
cout<factnum;
return 0;
}
  • Every time the function is called new memory is allocated to the “local” variable number. Also the code for the function is loaded again. This means that if for instance we input num as 6 then at the “worst” stage there would be 6 copies of the function and 6 versions of the integer “number”.

2. BACKTRACKING

  • Refers to a certain class of algorithm which attempts to complete a solution by constructing partial solutions and then to extend the partial solution toward completion. When an inconsistency occurs the algorithm 'backs up' by removing the most recent construction and trying another possibility (backtracking).

Example problem: In some languages it is not possible to determine the meaning of a statement until almost all of it has been read. Consider the FORTRAN statements:

DO 17 K = 1,6
DO 17 K = 1.5
  • On compilation such a statement needs to be parsed i.e. broken down into its constituent components.
  • Thus we might commence to parse such a statement from left to write on the initial assumption that it is a loop only to find the decimal point (.) at which point we have to backtrack and try another possibility, namely that it is an assignment statement.

Example problem: Consider the 8 queens problem.

  • A general description of the algorithm might be:

void AddQueen(void)
{
for (every unguarded position p on the board)
{
Place a queen in position p;
n ++;
if (n == 8) Print the configuration;
else
Addqueen();
Remove the queen from position p;
n--;
}
}
  • For a full implementation of the above algorithm see textbook page 106.

3. LOOK AHEAD

  • In some games the advantage will always be with the player who can think several moves ahead.
  • Thus this class of recursive algorithm involves a look ahead procedure which find-out possible future moves, evaluates them and then selects the best before carrying out the move.

A general description might be:

Void LookAhead ()
{
Obtain a stack of possible moves
if recursion terminates (the base case) return one move and the value
else
{
for each move on the stack do
make the move and Lookahead
select the best value
return the move and value
}
  • Try to apply this to the game of 8. The game consists of a 3x3 board containing numbers from 1 to 8 put randomly. Therefore there is one space left for numbers to move. The aim of the game is to move the numbers to rearrange them in the following sequence:
    1 2 3
    4 5 6
    7 8
  • A number is moveable if it is adjacent to the empty square.

<!-- saved from url=(0022) --> STACKS AND QUEUES (ARRAY IMPLEMENTATION)

  1. STACKS
  • A stack is a sequence of items, which can be added and removed from one end only. Stack is a concept known to every one. We make stacks of books, plates and many other things.
  • What goes into a stack first, comes out last and what goes in last comes out first. Hence a stack is often known as LAST-IN-FIRST-OUT (LIFO)
  • A stack is very useful and popular data structure. It is used by the operating system and other system programs extensively. Whenever nested function calls are made, the programs keep track of the location to return to, by storing it in stacks. Complex arithmetic expressions can also be checked and calculated easily.

e.g. Consider the following situation:

- - - 31
23 - 18 18
45 45 45 45
16 16 16 16
37 37 37 37
Remove (23)Add(18) Add(31)

  • The processes for adding to and removing from a stack are called PUSHING and POPPING respectively.

The allowable operations that define a structure as being a stack are:

1.Create an empty stack
2. Determine whether the stack is empty or not
3. Determine whether the stack is full or not
4. Find the size of the stack (how many items are in it)
5. PUSH a new entry onto the top of the stack providing it is not full
6. Copy the top entry from the stack providing it is not empty
7.POP the entry off the top of the stack providing it is not empty
8. Clear the stack to make it empty
9. Traverse the stack performing a given operation on each entry

2. STACK IMPLEMENTATION

  • A stack can be implemented easily using an arrays and a integer that holds the position of the top element.
  • Consider the following:Element Value
    5 -
    4 -
    Here you need to imagine 3 23
    the memory elements are 2 45
    numbered from 0 to 5 upwards. 1 16
    0 37
    Suppose the array is called “stack”. In addition we need something to point to the top of the stack (not a pointer variable). Suppose the pointer was called “topstack” then at the present moment topstack would have the value 3 as this is the element number that contains the value at the top of the stack.
  • “topstack” should have the value –1 initially
  • POPping from the stack means assigning the value at the top of the stack to another variable (?) and decrementing the stack pointer i.e.

? = stack[topstack]
topstack--
  • PUSHing would require incrementing the stack pointer and putting an element onto the top of the stack i.e.

topstack++
stack[topstack] = ?

3. DECLARING A STACK ADT

  • It is better to incorporate the requisites for a stack into a single class:

#include <iostream.h>
class ADTstack
{ int stack[10];
int topstack;
public: ADTstack(){topstack = -1;};
int empty(){if (topstack == -1) return 1; else return 0;};
int full(){if (topstack == 9) return 1; else return 0;};
void push(int num){if (!full()){ topstack++;
stack[topstack] = num;
}
else cout<" Stack is Full"<endl;
};
int pop(){int num; if (!empty()) {num = stack[topstack];
topstack--; return num;
}
else {cout<"Stack is Empty"<endl; return 0;}
};
};
void main()
{ADTstack st;
st.push(23);
st.push(46);
st.push(37);
cout<st.pop()<endl;
cout<st.pop()<endl;
cout<st.pop()<endl;
cout<st.pop()<endl;
}

4. QUEUES