Organisation
of Data for Effective Retrieval
A database is any system for holding records of information.
It can be manual, for example, a card index, or a folder.
Usually we use it in the context of a computerised system.
If we were to retrieve information from a large manual database, it could take a long time, and there is the possibility of mistakes. It will take about a second from a computer. We can also use data to build up profiles, which is much harder, if not impossible, with a manual system. For example, the Police hold records on a database of every person who has any criminal conviction. Most criminals show a pattern in the way that they commit crimes. If a crime has a particular pattern to it, then the database can show very quickly a list of suspects. Any fingerprint patterns can also be compared quickly and DNA patterns can lead the Police to a particular suspect.
Question
1 Explain why criminal
investigations generally took more time thirty years ago, before the
widespread introduction of computer technology. ANSWER
Computerised databases use software called database management systems to access, update, and manipulate the data. Microsoft Access is a DBMS.
The simplest kind of database is called a flat file, in which all the data records are held on a single file. A data record is a collection of related data, for example, name, address, and telephone number. They are simple to use, and even a spreadsheet can be used as one, and all word-processor programs allow for tables to be made. They can be rather like a manual record, but on a computer. However they have drawbacks:
Redundant data can make the file large;
Errors can arise through typing the data in different record;
It is harder to update and modify the data.
Flat files can be used throughout an organisation, which each department keeping their own records. This can be referred to as a flat file information and retrieval system. It has problems such as:
Redundancy of data
Adding new fields to a file is more difficult as all programs have to be modified;
The various different files are not able to link together.
Question 2 Write down two advantages and two disadvantages of a flat file database. ANSWER
Here
is a typical flat file database table:
|
Student |
First
Name |
Course
1 |
Course
2 |
Course
3 |
|
Arnfield |
Jason |
Physics |
Chemistry |
Maths |
|
Byland |
Claire |
Biology |
Maths |
Chemistry |
|
Caulfield |
Jason |
Physics |
Computing |
Maths |
|
Denham |
Ian |
Computing |
Technology |
Electronics |
|
Egan |
Jonathan |
Computing |
Law |
English |
You can see how simple it is. However if we had a whole college of 1500 students, the table would cover many pages. A flat file database management system can do a simple search. Each department has its own copy.
Question 3 Suppose Byland decided she didn’t like Chemistry and decided to do English instead. What problems do you think this might cause? ANSWER
Now let’s rearrange the table
by subject:
|
Subject |
Student |
|
Physics |
Arnfield |
|
|
Caulfield |
|
Chemistry |
Arnfield |
|
|
Byland |
|
Maths |
Arnfield |
|
|
Byland |
|
|
Caulfield |
|
Computing |
Caulfield |
|
|
Denham |
|
|
Egan |
|
And so on… |
|
Notice
that some of the cells in the subject column are blank.
The computer may not like that and return an error message “Cannot
have a null value”. Null
means nothing in the field. So we
would have to modify our table to:
|
Subject |
Student |
|
Physics |
Arnfield |
|
Physics |
Caulfield |
|
Chemistry |
Arnfield |
|
Chemistry |
Byland |
|
Maths |
Arnfield |
|
Mths |
Byland |
|
Maths |
Cauliflower |
|
Computing |
Caulfield |
|
Computing |
Denham |
|
Computing |
Egan |
|
And so on… |
|
There is redundancy here, which means that data is repeated. This is inefficient use of computer space, and with a large file, the wasted space would be horrendous.
Question 4 Use the table above to explain why redundancy is a problem. ANSWER
All these problems can be solved by a relational database system. The advantages of this over flat files are many. The principal advantages are:All data is in a common pool, accessible by all applications.
The system is much more flexible.
Much less data is needed to be held.
Easier to maintain high quality information.
However there are still a couple of major problems:
Unproductive maintenance, where programs were still dependent on the structure of data. If a department needed to add, say, a new field, all the programs would have to be altered for all departments.
Lack of security, so that even the most sensitive information could be accessed
Question
5 Write down two
advantages and two disadvantages of a relational database. ANSWER
This was addressed by software called the database management system (DBMS). There are two essential features:
1. Storage structure of the data is hidden from the user.
2. Restricted user access to the data.
Databases have their own particular jargon with which we need to be familiar. Before that makes sense, we need to understand the hierarchy of data
Level
|
Term
|
Meaning
|
Lowest |
Bit |
Binary digit, “on” or “off”, the 0’s and 1’s. |
|
|
Byte |
8 bits form a byte. Bytes can represent characters, or other data, such as parts of a picture or a sound |
|
|
Field |
Characters* grouped together, e.g. the letters or numbers that form a car registration number. |
|
|
Record |
All the information about somebody or something |
|
|
File |
A collection of related records |
Highest |
Database |
Lots of files linked so that information can be got from several files at the same time. |
* Characters are represented by the ASCII codes (American Standard Code for Information Exchange). With 8 bits, 28 (256) combinations can be represented. In all its functions, arithmetical, or word-processing, or whatever, the computer is adding these combinations. The vast majority of modern computers use the ASCII system. There is another system called EBCDIC, used by a few old-fashioned mainframes.
Before a database is set up, we need to decide how we are going to lay out the records:
1. The order in which fields are stored
2. What each field is called
3. What kinds of field, such as text, numeric, etc.
4. The field length. Differences between lengths of a particular type of field can cause problems when the computer links the fields.
5. The range of the data entered into the field.
6. What validation rules there are.
The screenshot shows some of the features in the list above

This allows the automatic validation of data, e.g. rejection of a date like 31-2-84. If there are to be any kinds of calculations, the field must be number. In any database, there needs to be a primary key, a unique identifier for any item. In this database Cust_ID is the primary key as each customer has their own unique identity.
Notice also that the underscore character (_) is used where there is a space. This is because some database management systems do not allow white space characters. The name of the table, tblCustomer, is consistent with the Leszynski–Reddick Convention, which is the industry standard.
Question 6. Explain with an example what is meant by validation of data. ANSWER
How a Relational Database is made up
The
key idea for a relational database is that it is made up of different tables
which have links between them. They
are linked using the primary keys.
The other fields in a table are dependent on the primary key. So for the Customer Identity, the fields Address_1, Address_2, etc are dependent on the Customer_ID, which is logical. Those fields would not depend on the Primary key Goods_ID, which would be in another table.
Go on to Database Design