Topic 3 Databases
In the exam you are expected
to:
|
What is a Database?
At its simplest a database is a collection of related records. A record is set of related data, for example the name and address of a customer. Each record has a set of attributes, for example, name, address 1, address 2, etc.
In Module 2 we saw that databases can be simple flat files, for example, a record collection. We also saw that large flat files contained duplicated data. There was data redundancy. We also saw that to reduce data redundancy we used a relational database, in which related tables were linked through primary keys, which were unique identifiers for each entity (item of interest in the database).
You can make a flat-file data base with:
a word-processor (a simple table);
a spreadsheet;
a flat-file database management system.
The problems with flat files are numerous:
Data redundancy - duplication of data in each file and across files.
Data inconsistency - data have to be changed in all files when the data in one file are changed. Often this updating is not done, so there are several versions of the data.
Data integrity - if there are lots of files with lots of duplicated data, an incorrect data entry can be duplicated across the system, making corrections tedious.
Program-data dependence - every program in every department has to specify which data fields they are going to process. If a field is changed, every program accessing that field must be altered to recognise the new format.
Data are not sharable - data held by one department have to be copied so that they can be used by other departments.
Question 1 Give three advantages of a relational database. ANSWER
Database Management System (DBMS)
From now on we will use the definition of a database as:
a collection of non-redundant data shareable between different applications.
In the early databases, all the data in an organisation were pooled in a central location, accessible by all applications, programs that could process the data in some way. However there remained two big problems:
unproductive maintenance - if a department needed to add a field to a file, all other programs accessing that file needed to be updated;
security issues - all data, however sensitive, could be accessed by all applications.
These issues were dealt with by a layer of software that links the data with the applications, called a database management system (DBMS) of which the general scheme is like this:

The DBMS acts as an interface between the data and the applications. The DBMS has two key features:
Program-data independence - the storage structure of the data is hidden from each application. In effect the data is converted into the form required by an application by the DBMS.
There is restricted access to the data, each user having access only to those parts they need. For example the marketing department does not need employees' payroll details.
When a database is designed, a conceptual data model is produced, which considers all the data that are to be held, and what is to be done with them. We need to remind ourselves of some key words:
Entity - any thing of interest to the organisation holding the database, e.g. Customer ID, Supplier, Credit Limit;
Attributes - any characteristic of an entity. A college course might have the following attributes: A2 or AS, Length, Name, Tutor, Department.
Relationship - link between two entities, for example student to course. Each student does several course, while each course has many students.
There are three degrees (kinds) of relationships:
One-to-one - for example each school has one headmaster and each headmaster has one school.
One-to-many - for example a train has many passengers. The relationship can be reversed; many passengers are on one train.
Many-to-many - for example many students do many courses.
The relationships are shown in entity-relationship diagrams. The degrees of relationship are shown by the crow's feet.

Database management systems cannot deal with many-to-many relationships. They get round this by sing a link table, for example:

Many doctors see many patients. The many-to-many relationship is broken by the Appointments table which is the link table.
Each patient can have many appointments, although each appointment is with one patient. Each doctor has many appointments, although each appointment is with one doctor. This is the simplest ER diagram for a relational database; you may well have done something like this for your project.
This is probably the most difficult area of database design. Normalisation ensures that the relational database has the best possible design. Tables should be designed so that:
No data are unnecessarily duplicated - the same data are held on one table;
Data are consistent across tables (if the data are not duplicated, this should not be a problem)
The table structure should be flexible enough to allow the designer at add as many or as few items as they want.
The structure enables the user to make all sorts of complex queries to relate data from different tables.
Here are some un-normalised data.
|
Cust _ID |
Name |
Address |
Telep No |
Invoice Id |
Invoice_date |
PaymentDueBy |
Total |
Goods |
Quantity |
Price |
Description |
Supp ID |
Supp Name |
|
001 |
Smith, John |
18 Highcliffe Rd, Stockton,TS18 7KL |
(01642) 654163 |
1001 |
01/11/02 |
01/12/02 |
250.00 |
SL1 |
10 |
25.00 |
Binders |
S001 |
Ford, J |
|
|
1002 |
02/11/02 |
02/12/02 |
500.00 |
SL2 |
5 |
100.00 |
Cabinets |
S002 |
Singh, H |
|||
|
1003 |
03/11/02 |
03/12/02 |
20.00 |
SL3 |
1 |
|
|
S003 |
West, K |
||||
|
002 |
Jones, Sam |
79 High Street, M’Bro TS19 7HU |
(01642) 627462 |
1004 |
05/11/02 |
05/12/02 |
50.00 |
SL4 |
2 |
25.00 |
Disks |
S004 |
Gibson, M |
|
|
1005 |
06/11/02 |
06/11/02 |
70.00 |
SL5 |
1 |
70.00 |
Desk |
S005 |
King, S |
|||
|
1006 |
07/11/02 |
07/12/03 |
100.00 |
SL6 |
2 |
50.00 |
Chairs |
S006 |
Winters, A |
||||
|
003 |
Parker Helen |
20 Winston Grove, Stockton Ts16 7YH |
(01642) 746524 |
1007 |
01/10/02 |
01/11/02 |
70.00 |
SL7
|
2
|
10.00
|
Wallets |
S007 |
Hunt, L |
|
|
SL1 |
1 |
25.00 |
Binders |
S001 |
Ford, J |
|||||||
|
SL4 |
1 |
25.00 |
Disks |
S004 |
Gibson, M |
||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This table has repeated data. Can you see where? There is data redundancy, which on a big table will make for a big file.
A table is in First Normal Form if there are no repeating attributes or groups of attributes. One solution might be to rewrite the table:
|
Cust _ID |
Name |
Address |
Telep No |
Invoice Id |
Invoice_date |
PaymentDueBy |
Total |
Goods |
Quantity |
Price |
Description |
Supp ID |
Supp Name |
|
001 |
Smith, John |
18 Highcliffe Rd, Stockton,TS18 7KL |
(01642) 654163 |
1001 1002 1003 |
01/11/02 02/11/02 03/11/02 |
01/12/02 02/12/02 03/12/02 |
250.00 500.00 20.00 |
SL1 SL2 SL3 |
10 5 1 |
25.00 100.00 20 |
Binders Cabinets Shredder |
S001 S002 S003 |
Ford, J Singh, H West, K |
|
002 |
Jones, Sam |
79 High Street, M’Bro TS19 7HU |
(01642) 627462 |
1004 1005 1006 |
05/11/02 06/11/02 07/11/02 |
05/12/02 06/11/02 07/12/03 |
50.00 70.00 100.00 |
SL4 SL5 SL6 |
2 1 2 |
25.00 70.00 50.00 |
Disks Desk Chairs |
S004 S005 S006 |
Gibson, M King, S Winters, A |
|
003 |
Parker Helen |
20 Winston Grove, Stockton Ts16 7YH |
(01642) 746524 |
1007 |
01/10/02 |
01/11/02 |
70.00 |
SL7 SL1 SL4 |
2 1 1 |
10.00 25.00 25.00 |
Wallets Binders Disks |
S007 S001 S004 |
Hunt, L Ford, J Gibson, M |

This is NOT in first normal form, because we still have repeating attributes. This is a very common bear trap. We need to break the table up:
|
Cust _ID |
Name |
Address |
Telep No |
Invoice Id |
|
001 |
Smith, John |
18 Highcliffe Rd, Stockton,TS18 7KL |
(01642) 654163 |
1001 |
|
001 |
Smith, John | 18 Highcliffe Rd, Stockton,TS18 7KL |
1002 |
|
| 001 | Smith, John | 18 Highcliffe Rd, Stockton,TS18 7KL |
1003 |
|
|
002 |
Jones, Sam |
79 High Street, M’Bro TS19 7HU |
(01642) 627462 |
1004 |
|
002 |
Jones, Sam | 79 High Street, M’Bro TS19 7HU |
1005 |
|
| 002 | Jones, Sam | 79 High Street, M’Bro TS19 7HU |
1006 |
|
|
003 |
Parker Helen |
20 Winston Grove, Stockton Ts16 7YH |
(01642) 746524 |
1007 |
|
Invoice Id |
Invoice_date |
PaymentDueBy |
Total |
Goods |
Quantity |
Price |
Description |
Supp ID |
Supp Name |
|
1001 |
01/11/02 |
01/12/02 |
250.00 |
SL1 |
10 |
25.00 |
Binders |
S001 |
Ford, J |
|
1002 |
02/11/02 |
02/12/02 |
500.00 |
SL2 |
5 |
100.00 |
Cabinets |
S002 |
Singh, H |
|
1003 |
03/11/02 |
03/12/02 |
20.00 |
SL3 |
1 |
20.00 |
Shredder |
S003 |
West, K |
|
1004 |
05/11/02 |
05/12/02 |
50.00 |
SL4 |
2 |
25.00 |
Disks |
S004 |
Gibson, M |
|
1005 |
06/11/02 |
06/11/02 |
70.00 |
SL5 |
1 |
70.00 |
Desk |
S005 |
King, S |
|
1006 |
07/11/02 |
07/12/03 |
100.00 |
SL6 |
2 |
50.00 |
Chairs |
S006 |
Winters, A |
|
1007 |
01/10/02 |
01/11/02 |
70.00 |
SL7 |
2 |
10.00 |
Wallets |
S007 |
Hunt, L |
| 1007 | SL1 | 1 | 25.00 | Binders |
S001 |
Ford, J |
|||
| 1007 | SL4 | 1 | 25.00 | Disks |
S004 |
Gibson, M |
This is a bit better, but it's not perfect. There is still data redundancy. Look in the first table and you will find that the customer is being identified by the invoice number. The customer really needs to be identified solely by the customer ID number. To make retrieval more efficient we need to put the data into second normal form:
A table is in second normal form if it is in first normal form and no column that is not part of a primary key is dependent on only a portion of the primary key (i.e. there are no partial dependencies).
That's a bit of a mouthful but it means that all the attributes in the customer table should only be dependent on the customer ID. So things like name, address, and telephone number are looked up using the Customer ID unique identifier, not the invoice unique identifier. So we can break the tables up further to give an ER diagram like this:

The Order table will have two columns, Goods ID and Invoice ID. This breaks the many to many relationship.
In third normal form, there are no non-key dependencies. If we look in the invoice table above, we need to say that the goods are dependent only on the Goods ID, not the Invoice ID. The supplier must be dependent only on the supplier ID and not the Goods ID. So our normalised tables become, in third normal form:
CUSTOMER
|
Cust_ID |
Name |
Address |
Telep_No |
|
001 |
Smith, John |
18, Highcliffe Road Stockton TS18 7KL |
01642 65 4163 |
|
002 |
Jones, Sam |
79 High Street Middlesbrough TS1 7HU |
01642 62 7462 |
|
003 |
Parker, Helen |
20 Winston Grove Stockton TS16 7YH |
01643 74 6254 |
INVOICE
|
Invoice_ID |
Customer_ID |
Invoice_date |
Payment_Dueby |
Total (£) |
|
1001 |
001 |
01/11/02 |
01/12/02 |
250.00 |
|
1002 |
001 |
02/11/02 |
02/12/02 |
500.00 |
|
1003 |
001 |
03/11/02 |
03/12/02 |
20.00 |
|
1004 |
002 |
05/11/02 |
05/12/02 |
50.00 |
|
1005 |
002 |
06/11/02 |
06/11/02 |
70.00 |
|
1006 |
002 |
07/11/02 |
07/12/02 |
100.00 |
|
1007 |
003 |
10/11/02 |
10/12/02 |
45.00 |
ORDER
|
Invoice_ID |
Goods_ID |
Quantity |
|
1001 |
SL1 |
10 |
|
1002 |
SL2 |
5 |
|
1003 |
SL3 |
1 |
|
1004 |
SL4 |
2 |
|
1005 |
SL5 |
1 |
|
1006 |
SL6 |
2 |
|
1007 |
SL7 |
2 |
GOODS
Goods_ID |
Supplier_ID |
Price |
Description |
|
SL1 |
S001 |
25.00 |
Binders |
|
SL2 |
S002 |
100.00 |
Cabinets |
|
SL3 |
S003 |
20.00 |
Shredder |
|
SL4 |
S004 |
25.00 |
Disks |
|
SL5 |
S005 |
70.00 |
Desk |
|
SL6 |
S006 |
50.00 |
Chairs |
|
SL7 |
S007 |
10.00 |
Wallets |
SUPPLIER
Supplier_ID |
Supplier_Name |
|
S001 |
Ford, J |
|
S002 |
Singh, H |
|
S003 |
West, K |
|
S004 |
Gibson, M |
|
S005 |
King, S |
|
S006 |
Winters, A |
|
S007 |
Hunt, L |
The ER Diagram becomes:

The tables can be written in standard notation like this:
CUSTOMER (Customer_ID, Name, Address, Telep_No)
INVOICE (Invoice_ID, Customer_ID, Invoice_date, Payment_Dueby, Total (£))
ORDER (Invoice_ID, Goods_ID, Quantity)
GOODS (Goods_ID, Supplier_ID, Price, Description)
SUPPLIER (Supplier_ID, Supplier_Name)
Notice the following:
The names of the tables are in UPPER CASE;
The key fields are underlined;
Foreign keys, i.e. those from other tables are shown underlined and in italics;
In some old database management systems, white space characters (such as space) are not allowed, so words are separated by the underscore character, _;
Many databases use the Leszynski-Reddick naming convention, e.g. tblCustomer. This is not on the syllabus, so we won't go any further here, although you might find your teacher insisting on its use in the project.
Difficult? Yes, it is. Databases are hard. Large industrial databases need highly experienced analysts to design them if they are to be effective, and such analysts are among the highest paid of the ICT professionals. Even then they do not always get it right, and the failures of some large government ICT database projects have lead to scandals of national proportions.
Question 2 The manager of a video hire shop uses a relational database management system to operate the business. Separate database files hold details of customers, video tapes, and loans. Customers can hire as many films as they wish.
(a) For each of the files mentioned above, identify the key fields and list other appropriate fields that will be required to enable the system to be maintained with minimum redundancy.
(b) Draw an ER diagram for the database.
(c) Describe three advantages of using a relational database rather than a flat file information storage system. (NEAB Past Question) ANSWER
Presentation: Normalisation of Data
Now go on to Database Management