PROG
Merge the files u413a.txt and u413b.txt
Both are assumed to be in sorted order on input
The output file, u413.out, will also be sorted
PED
To learn how to merge files
To prepare us for the Master-File update problem
DO LOOP INVARIANT
number1 contains the current number from u413a -- first file
number2 contains the current number from u413b -- second file
all numbers before number1 from first file are already put in
output file
all numbers before number2 from second file are already put in
output file
output file is sorted
all the numbers in the output file are less than the current number
from first
file
all the numbers in the output file are less than the current number
from
second file
This problem introduces us to an important algorithm in business data processing, merging two files. We merge two files when both input files are sorted and we want an output file that is also sorted. Imagine, that you have two companies, each with an alphabetic list of customers. Both companies merge. We would want a single list of customers--also in alphabeticalorder. We say we merge the two lists of customers. The same thing happens in computers all the time.
A similar problem is the master file update problem which will be
discussed in yuour homework. There is a big file containing all
the customers and their current balance. This file is kept
sorted. Each month, we type in a file containing all the new
transactions, payments and new orders. This file is sorted with a
sort utility. The system outputs a new file containing all
the
new balances for each customer. It is also sorted. We can use
this new file after the next month's transactions. This is a very
common way to handle batch updates in commercial business data
processing systems such as banks, utilities, etc. (The other
option would be a situation where one can enter the customers and update the information on-line as would be done with an Automatic
Teller Machine. This requires arrays or an indexed random-access
file. Arrays will be covered in the next section. Random-access
files are beyond the scope of this course.)
The fundamental idea of this program is that each time through the
loop we add a single number to the output file each time. The
number we add should obey the do loop invariant. We also have the
next number to be added from file1 in number1. And, likewise, we
have the next number to be
added from file2 in number2. If the number to be added is from
file1, we read in a new value for number1. If we decide to add the
next number from file2, then we must read a new value for number2
from file2.
Thus, our first refinement is:
read in first number from file1 read in first number from file2 while(more numbers to process) do begin add the appropriate number from file1 or file2 to the output file writeln; end;
We expand this a little bit:
read in first number from file1
read in first number from file2
while(more numbers to process) do begin
if number for file1 is smaller then number for file2
add the number from file1 to output file
read in next number from file1
end
else
if number for file2 is smaller than number for file1
add the number from file2 to output file
read in next number file2
end;
writeln;
end;
There's only one special case--when we finish either file1 or file2. If we have read in all the numbers from file1 and put them in the output, then we have to copy the remainder of file2 to file1. Likewise, if we read in all the numbers from file2 and put them in the output, then we have to copy of the remaidnerof file1 to file2.
Let's look at our do loop invariant.
number1 contains the current number from u413a -- first file
number2 contains the current number from u413b -- second file
all numbers before number1 from first file are already put in output file
all numbers before number2 from second file are already put in
output file
output file is sorted
all the numbers in the output file are less than the current number
from first
file
all the numbers in the output file are less than the current number
from
second file
Check 1: that the do loop invariant and the condition being true implies that executing the statements leaves the do loop invariant true
There are two major possibilities to consider. The first is that number1 < number2. From the do loop invariant, we know that all the numbers that came from file1 that were less than this number are also in the file. Since, number1 is less than number2, we know that all the numbers in the output file are also less than number2. So if we add number1 to the output file, then all the numbers in the output file will still be less than number2. When we read the next number from file1, the fact that file1 is sorted implies that the next number read in will be greater than the old value. Thus, it will now be true that all numbers in the output file are both less than number1 and less than number2. Since the file was sorted prior to adding number1 and that number was larger than all the other numbers in the output file--the file will remain sorted. If we executed the other branch of the if, then the same sort of argument would apply.
Check 2: that the do loop invariant and the condition not being true implies the POSTCONDITION for the loop
If the condition is not true, that means that both number1 and number2 were -1 the sentinel value. The loop invariant says that all the numbers from file1 before the -1 sentinel are in the output file. Likewise, all the numbers from file2 are before the -1 sentinel in number2. Also, the file is sorted--so the output file is a sorted merge of the two outputs.
Check 3: that the do loop invariant is initially true. We did that by reading in NUMBER but not including that value in the SUM.
There is nothing in the output file and the first number from file1 is in number1 adn the first number from file2 is in number2.
Check 4: that executing the statements once makes progress towards the goal. Since each read statement removes a number from the file and advances us toward the end of what must be a finite set, then the statement makes progress.
We move one number from either file in each time through the loop.