Topic 10 Modes of Processing and Backing up
Problems in data capture
We have seen how data is taken into the computer (data capture). The data going into a computer must be correct, otherwise the processed information will be rubbish. At best this can be irritating, at worst disastrous. A computer’s output can only be as good as what has been put into it (GIGO).
Question 1 What is meant by GIGO? Write one way in which this can happen. ANSWER
Most data input is done by keyboard. However it can be error-prone. Mistakes are made by:
Some organisations use methods of data capture that avoid the use of keyboards:
Question 2 Describe one advantage and one disadvantage for one of the above methods of data capture. ANSWER
Accuracy of Information
Sometimes it is possible for the data that we collect to be accurate and error free, yet still give us inaccurate information. For example, a survey of people conducted at eleven o’clock in the morning would not be typical of the population in general. The data would be skewed towards the unemployed, pensioners, and home-makers. There have to be ways to ensure that surveys cover the entire population, including going into people’s homes in the evening when they have got back from work.
Opinion polls in the run-up to elections are a typical example. Suppose the data from the poll suggests the vote for the Conservatives is 34 %, Labour 46 %, Liberal Democrats 18 %, and others 2 %. The data that was collected from the 1500 voters asked will be totally accurate. However this trend, identified in the poll, might well not be reflected in the actual vote on the day. Some constituencies have a high proportion of, say, Conservative voters; others would be Labour strongholds. It is very difficult to extrapolate the findings of a limited poll to what would be the vote of the entire electorate. There are statistical methods to correct for any bias in such a poll, but even so there is room for quite a lot of error.
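To give a feel for the size of the error, the short Python sketch below estimates the sampling error for each party's share in the poll above. It assumes a simple random sample and the usual 95 % confidence level (z = 1.96); both are assumptions made for illustration, and the function name is hypothetical. A real poll is rarely a simple random sample, so the true uncertainty is likely to be larger.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for one share, assuming a simple random sample."""
    return z * math.sqrt(share * (1 - share) / sample_size)

# Figures from the poll described in the text: 1500 voters asked.
poll = {"Conservative": 0.34, "Labour": 0.46, "Liberal Democrats": 0.18, "Others": 0.02}

for party, share in poll.items():
    error = margin_of_error(share, 1500)
    print(f"{party}: {share:.0%}, give or take about {error:.1%}")
```

Even under these generous assumptions each of the larger figures is uncertain by about two to two and a half percentage points, before any bias in who was asked is taken into account.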
Question 3 Some data can be accurate, but give misleading information. How can this happen? ANSWER
Batch Processing
Batch processing allows off-line job scheduling. It is appropriate for production environments, high job-load environments, and situations where program results are not required immediately. It is sometimes called off-line processing. It allows a lot of processing to be done at times when the computer system would otherwise be idle, e.g. outside office hours.
When an Array system is used for batch scheduling, users submit jobs to batch queues, which contain ordered sets of waiting jobs. When sufficient compute resources become available, and subject to scheduling constraints, jobs are extracted from the batch queues and scheduled on the computer. Job results and termination status are recorded in files or are electronically mailed to the user. The idea is shown in this picture. In older systems the job was done on a mainframe.
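The queueing idea can be sketched in a few lines of Python. This is a simplified illustration only: the job names and the run_batch function are hypothetical, jobs are run strictly in the order they were submitted, and resource availability and scheduling constraints are ignored.

```python
from collections import deque

def run_batch(queue, log):
    """Run every waiting job in submission order and record its outcome."""
    while queue:
        name, job = queue.popleft()        # take the oldest waiting job
        try:
            result = job()
            log.append((name, "completed", result))
        except Exception as error:         # one failed job must not stop the batch
            log.append((name, "failed", str(error)))
    return log

# Users submit jobs to the queue during the day (hypothetical jobs).
batch_queue = deque()
batch_queue.append(("payroll", lambda: sum([1200, 950, 1430])))
batch_queue.append(("bad_job", lambda: 1 / 0))
batch_queue.append(("invoice_count", lambda: len(["inv1", "inv2"])))

# Later, when the computer is idle, the whole queue is processed.
job_log = run_batch(batch_queue, [])
for entry in job_log:
    print(entry)
```

The log plays the part of the results file: each entry records the job, its termination status, and its result or error message.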

Question 4 What is meant by off-line processing? Write down two advantages of off-line processing to a company. ANSWER
We have seen how important it is that data is accurate to ensure the efficient running of systems in business. We have also seen how inaccurate data can arise due to errors. It is important that businesses have procedures to ensure that data is validated before it is processed, so that errors are picked up.
In batch processing similar transactions are processed in batches of 50. A data control clerk supervises the process:
Counting the documents.
Making a visual check on each one for any essential details that might be missing.
Working out a control total for a vital field like Total Payable for the entire batch.
Working out hash totals for other fields. A hash total is used only to check that the batch has been entered correctly.
Filling in a batch header document.
Logging the batch in a log book.
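The control total and hash total in the list above can be illustrated with a small Python sketch. The field names, the three sample documents, and the batch_header function are invented for the example. The clerk's figures are compared with figures recalculated from the keyed-in data, and a keying error in a Customer ID shows up as a hash total mismatch.

```python
from decimal import Decimal

# Hypothetical batch of three documents (a real batch would hold 50).
batch = [
    {"customer_id": 1042, "total_payable": Decimal("19.99")},
    {"customer_id": 2317, "total_payable": Decimal("250.00")},
    {"customer_id": 1188, "total_payable": Decimal("5.01")},
]

def batch_header(documents):
    """Work out the figures that go on the batch header document."""
    return {
        "document_count": len(documents),
        "control_total": sum(d["total_payable"] for d in documents),  # meaningful total
        "hash_total": sum(d["customer_id"] for d in documents),       # meaningless total
    }

clerk_header = batch_header(batch)       # worked out before the data is keyed in

# Simulate a keying error: two digits of a Customer ID are transposed.
keyed = [dict(d) for d in batch]
keyed[1]["customer_id"] = 2371

computer_header = batch_header(keyed)    # recalculated from the keyed-in data
mismatches = [k for k in clerk_header if clerk_header[k] != computer_header[k]]
print(mismatches)
```

An empty list would mean the batch agrees with its header; here only the hash total differs, which points the clerk to a mistake in a Customer ID.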
Question 5 Describe two jobs that the data control clerk does, explain how each reduces the likelihood of erroneous data getting into the computer system. ANSWER
When the batch is keyed in, the computer performs the same hash total calculation, and if the two results do not match, there is a mistake somewhere. A hash total is similar to a control total, but the figure itself has no meaning, whereas a control total does. The stages in batch processing are:
1. Collection of documents into batches of 50.
2. The data is keyed in to a computer that is separate from the main computer. The data is validated by a program to eliminate errors such as 31st February.
3. The data is verified by being entered a second time by another operator, which may show up discrepancies, which can be followed up.
4. The data is stored on a transaction file, which is then transferred to the main computer. This might be the simple carrying of a floppy disc to the main computer room, or remote downloading of data over a long distance line.
5. Processing begins at a certain time, usually when the computer is not busy with other business use.
6. The transaction file is sorted into the same order as the master file, on which the permanent records are kept. This speeds up the processing stage.
7. The master file is updated.
8. The required reports are produced.
Batch processing is carried out by many businesses, including banks, mail order companies, and councils.
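Stages 6 and 7 above can be sketched as follows. This is a minimal illustration with made-up records held in Python lists rather than files; the field names and the update_master function are assumptions. Because both files are in the same order, the master file only needs to be read through once.

```python
# Master file: records held in Customer ID order (hypothetical data).
master = [
    {"customer_id": 101, "balance": 50},
    {"customer_id": 205, "balance": 0},
    {"customer_id": 309, "balance": 120},
]

# Transaction file: in the order the documents happened to be keyed in.
transactions = [
    {"customer_id": 309, "amount": -20},
    {"customer_id": 101, "amount": 30},
    {"customer_id": 309, "amount": 5},
]

def update_master(master, transactions):
    """Sort the transactions, then update the master file in a single pass."""
    txs = sorted(transactions, key=lambda t: t["customer_id"])   # stage 6
    updated, unmatched = [], []
    i = 0
    for record in master:                                        # stage 7
        # Transactions for IDs that are not on the master file are set aside.
        while i < len(txs) and txs[i]["customer_id"] < record["customer_id"]:
            unmatched.append(txs[i])
            i += 1
        balance = record["balance"]
        while i < len(txs) and txs[i]["customer_id"] == record["customer_id"]:
            balance += txs[i]["amount"]
            i += 1
        updated.append({"customer_id": record["customer_id"], "balance": balance})
    unmatched.extend(txs[i:])
    return updated, unmatched

new_master, unmatched = update_master(master, transactions)
print(new_master)
print(unmatched)
```

Any transaction left in the unmatched list refers to a customer who is not on the master file and would be reported for follow-up.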
Validation
Computers have validation programs to check the data inputted by the operators:
Certain fields have to be filled in, a presence check. This will have been done visually by the operators, but this utility can also assign a Customer ID to a new customer. If you fill in an on-line form on an internet site, you have certain fields that are mandatory.
Format checks ensure that quantities and prices are numeric, and that the product code is two letters and four numbers.
Range checks eliminate absurdities like 31st February.
File look-up checks retrieve the details of an existing customer from the Customer ID, so that the data entry clerk can compare them with their records.
Check digits are used to validate codes.
Batch header checks ensure that calculated hash totals and control totals square up with what’s on the computer.
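A few of the checks listed above can be sketched in Python. The field names, the date layout (day/month/year), and the validate_order function are assumptions made for the example; the product code rule (two letters and four numbers) is the one given in the text.

```python
import re
from datetime import date

PRODUCT_CODE = re.compile(r"[A-Za-z]{2}[0-9]{4}")   # two letters, then four digits

def validate_order(record):
    """Return a list of validation errors for one keyed-in order (empty if none)."""
    errors = []

    # Presence check: these fields must be filled in.
    for field in ("customer_id", "product_code", "quantity", "order_date"):
        if not str(record.get(field, "")).strip():
            errors.append(f"{field}: missing")

    # Format checks: product code pattern, numeric quantity.
    if not PRODUCT_CODE.fullmatch(str(record.get("product_code", ""))):
        errors.append("product_code: wrong format")
    if not str(record.get("quantity", "")).isdecimal():
        errors.append("quantity: not numeric")

    # Range check: the date must really exist (this rejects 31st February).
    try:
        day, month, year = (int(part) for part in str(record.get("order_date", "")).split("/"))
        date(year, month, day)
    except ValueError:
        errors.append("order_date: not a real date")

    return errors

good = {"customer_id": "C1001", "product_code": "AB1234", "quantity": "3", "order_date": "28/02/2023"}
bad = {"customer_id": "C1001", "product_code": "A12345", "quantity": "three", "order_date": "31/02/2023"}
print(validate_order(good))
print(validate_order(bad))
```

Note that validation of this kind can only show that data is reasonable, not that it is correct: a quantity of 3 passes even if the customer ordered 5.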
In summary, we need to be aware of the difference between verification and validation. Validation is done by computer programs; verification is often done by a second person. When you type in a new password for a second time, that is a verification procedure. The screen shot shows a validation procedure.

A widely used way of checking the validity of a code number is the check digit, which is often calculated using the modulus 11 system. The procedure is as follows:
1. Each digit is assigned a weight. The right hand digit is given 2, the next 3, and so on.
2. Each digit is multiplied by its weight and the products added together.
3. The sum is divided by 11 to find the remainder.
4. The remainder is subtracted from 11 to give the check digit. If the remainder is 0, the check digit is 0; if the remainder is 1, the check digit is X.
Question 6 A product has the code 1642814680. What is the check digit? ANSWER
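The modulus 11 procedure described above can be sketched in Python. The function names are hypothetical, and the worked example uses a book number (ISBN 0-306-40615-2), which follows the same weighting scheme, rather than the code in the question.

```python
def mod11_check_digit(digits):
    """Return the modulus 11 check digit for a string of digits."""
    total = 0
    # The right-hand digit has weight 2, the next 3, and so on.
    for weight, character in enumerate(reversed(digits), start=2):
        total += weight * int(character)
    remainder = total % 11
    if remainder == 0:
        return "0"
    if remainder == 1:
        return "X"          # 11 - 1 = 10, which is written as X
    return str(11 - remainder)

def is_valid(code):
    """Check a full code whose last character is its check digit."""
    return mod11_check_digit(code[:-1]) == code[-1]

print(mod11_check_digit("030640615"))   # 2
print(is_valid("0306406152"))           # True
print(is_valid("0306406125"))           # False: two digits transposed
```

For 030640615 the weighted sum is 130, the remainder on division by 11 is 9, and 11 - 9 gives the check digit 2.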
In complex systems transmission errors can occur between two hardware components. This could cause serious corruption of data. So an extra bit, called a parity bit, is added. In an even parity machine, the total number of 1s, including the parity bit, must be an even number. In an odd parity machine, it must be an odd number. Each component checks the parity, and if there is a discrepancy, an error message is produced.
This is a short section of a program that does a parity bit calculation.
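As an illustration of the idea, here is a minimal sketch in Python. It is not taken from any particular system; the function names and the 7-bit example are assumptions. Bits are held as a string of 0s and 1s to keep the example readable.

```python
def parity_bit(data_bits, even=True):
    """Return the extra bit that gives the word even (or odd) parity."""
    ones = data_bits.count("1")
    if even:
        return "0" if ones % 2 == 0 else "1"
    return "1" if ones % 2 == 0 else "0"

def parity_ok(word, even=True):
    """Check a received word, parity bit included."""
    ones = word.count("1")
    return ones % 2 == 0 if even else ones % 2 == 1

data = "1011001"                    # seven data bits containing four 1s
sent = data + parity_bit(data)      # even parity: the bit added is 0
print(sent, parity_ok(sent))

corrupted = "0" + sent[1:]          # the first bit is flipped in transmission
print(corrupted, parity_ok(corrupted))
```

A single parity bit detects any odd number of flipped bits, but if two bits are flipped the parity still appears correct and the error goes unnoticed.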
Data is transmitted in blocks, and a checksum is calculated by adding up all the numeric values of the data.
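A simple additive checksum can be sketched as follows. Keeping only the remainder after dividing by 256, so that the checksum fits in one byte, is an assumption made for this example, as are the function names and the sample block.

```python
def checksum(block, modulus=256):
    """Add up the numeric value of every byte in the block."""
    return sum(block) % modulus

def block_ok(block, expected):
    """Recalculate the checksum on receipt and compare it with the one sent."""
    return checksum(block) == expected

block = b"BATCH 0042"
sent_checksum = checksum(block)            # transmitted along with the block

print(block_ok(b"BATCH 0042", sent_checksum))   # block arrived intact
print(block_ok(b"BATCH 0043", sent_checksum))   # one character corrupted
```

A plain sum like this cannot detect two values arriving in the wrong order, because the total is unchanged.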

Online Processing
There are occasions when it is not appropriate for processing to be done in batches. An example of this is booking airline seats. Once a seat is taken, then that’s it for the flight; it cannot be sold to someone else. Therefore the airline’s database has to be updated there and then.

The diagram below shows the instant feedback available from an on-line booking system. In this example there is one seat left on the flight. It is booked instantly by a customer. Shortly after (it could be a fraction of a second later), a request for the seat comes from another customer. The computer has already filled the seat, so the customer is told that the flight is fully booked. The computer can then offer the second customer a seat on the next flight. There is a slight possibility that the two requests come in at exactly the same time. This is unlikely, but there are mechanisms to prevent double-booking.
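One such mechanism is locking: while one request is being dealt with, any other request for the same flight has to wait. The Python sketch below illustrates this with a lock held in memory. The Flight class and customer names are hypothetical, and a real booking system would more likely rely on the locking or transaction facilities of its database.

```python
import threading

class Flight:
    def __init__(self, seats):
        self.seats_left = seats
        self._lock = threading.Lock()

    def book(self, customer):
        """Try to book one seat; only one request at a time may check and update."""
        with self._lock:
            if self.seats_left > 0:
                self.seats_left -= 1
                return True
            return False            # flight is fully booked

flight = Flight(seats=1)            # one seat left, as in the example above
results = {}

def request(customer):
    results[customer] = flight.book(customer)

# Two customers ask for the last seat at almost the same moment.
threads = [threading.Thread(target=request, args=(name,)) for name in ("first", "second")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results, "seats left:", flight.seats_left)
```

Whichever request obtains the lock first gets the seat; the other sees that no seats are left and is refused, so the seat is never sold twice.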
On-line processing is important where data needs updating immediately, where the situation is constantly changing. An example may be an air-traffic control system, where aeroplanes are moving constantly, or a chemical plant where the behaviour of the processes has to be constantly supervised, and responded to.
Question 7 Compare the two systems of on-line and off-line data processing. For each, state what kind of organisation would use it and the reasons why they use that particular method. ANSWER
Now go on to Security of Data and Back up Systems