Topic 3  Databases

In the exam you are expected to:
  • Explain the purpose of a database management system (DBMS).

  • Explain the role of the database administrator.

  • Explain what is meant by data consistency, data integrity, data redundancy and data independence.

  • Describe what is meant by entity relationships and data normalisation.

 

What is a Database?

At its simplest a database is a collection of related records.  A record is set of related data, for example the name and address of a customer.  Each record has a set of attributes, for example, name, address 1, address 2, etc.

In Module 2 we saw that databases can be simple flat files, for example, a record collection.  We also saw that large flat files contained duplicated data.  There was data redundancy.  We also saw that to reduce data redundancy we used a relational database, in which related tables were linked through primary keys, which were unique identifiers for each entity (item of interest in the database).

You can make a flat-file data base with:

The problems with flat files are numerous:

Question 1  Give three advantages of a relational database.  ANSWER

 

Database Management System (DBMS)

From now on we will use the definition of a database as:

a collection of non-redundant data shareable between different applications.

In the early databases, all the data in an organisation were pooled in a central location, accessible by all applications, programs that could process the data in some way.  However there remained two big problems:

These issues were dealt with by a layer of software that links the data with the applications, called a database management system (DBMS) of which the general scheme is like this:

 

 

 

 

 

 

 

 

 

 

The DBMS acts as an interface between the data and the applications.  The DBMS has two key features:

Entity relationships

When a database is designed, a conceptual data model is produced, which considers all the data that are to be held, and what is to be done with them.  We need to remind ourselves of some key words:

There are three degrees (kinds) of relationships:

The relationships are shown in entity-relationship diagrams.  The degrees of relationship are shown by the crow's feet.

 

 

 

 

 

 

 

 

 

 

Database management systems cannot deal with many-to-many relationships.  They get round this by sing a link table, for example:

 

 

 

 

 

 

Many doctors see many patients.  The many-to-many relationship is broken by the Appointments table which is the link table.

Each patient can have many appointments, although each appointment is with one patient.  Each doctor has many appointments, although each appointment is with one doctor.  This is the simplest ER diagram for a relational database; you may well have done something like this for your project.

 

Normalisation

This is probably the most difficult area of database design.  Normalisation ensures that the relational database has the best possible design.  Tables should be designed so that:

Here are some un-normalised data.

Cust

_ID

Name

Address

Telep No

Invoice Id

Invoice_date

PaymentDueBy

Total

Goods

Quantity

Price

Description

Supp ID

Supp Name

001

Smith, John

18 Highcliffe Rd, Stockton,TS18 7KL

(01642) 654163

1001

01/11/02

01/12/02

250.00

SL1

10

25.00

Binders

S001

Ford, J

 

1002

02/11/02

02/12/02

500.00

SL2

5

100.00

Cabinets

S002

Singh, H

1003

03/11/02

03/12/02

20.00

SL3

1

 

 

S003

West, K

002

Jones, Sam

79 High Street, M’Bro TS19 7HU

(01642) 627462

1004

05/11/02

05/12/02

50.00

SL4

2

25.00

Disks

S004

Gibson, M

 

1005

06/11/02

06/11/02

70.00

SL5

1

70.00

Desk

S005

King, S

1006

07/11/02

07/12/03

100.00

SL6

2

50.00

Chairs

S006

Winters, A

003

Parker Helen

20 Winston Grove, Stockton Ts16 7YH

(01642)

746524

1007

01/10/02

01/11/02

70.00

SL7

 

2

 

 

10.00

 

Wallets

S007

Hunt, L

 

SL1

1

25.00

Binders

S001

Ford, J

SL4

1

25.00

Disks

S004

Gibson, M

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

This table has repeated data.  Can you see where?  There is data redundancy, which on a big table will make for a big file.

A table is in First Normal Form if there are no repeating attributes or groups of attributes.  One solution might be to rewrite the table:

Cust

_ID

Name

Address

Telep No

Invoice Id

Invoice_date

PaymentDueBy

Total

Goods

Quantity

Price

Description

Supp ID

Supp Name

001

Smith, John

18 Highcliffe Rd, Stockton,TS18 7KL

(01642) 654163

1001

1002

1003

01/11/02

02/11/02

03/11/02

01/12/02

02/12/02

03/12/02

250.00

500.00

20.00

SL1

SL2

SL3

10

5

1

25.00

100.00

20

Binders

Cabinets

Shredder

S001

S002

S003

Ford, J

Singh, H

West, K

002

Jones, Sam

79 High Street, M’Bro TS19 7HU

(01642) 627462

1004

1005

1006

05/11/02

06/11/02

07/11/02

05/12/02

06/11/02

07/12/03

50.00

70.00

100.00

SL4

SL5

SL6

2

1

2

25.00

70.00

50.00

Disks

Desk

Chairs

S004

S005

S006

Gibson, M

King, S

Winters, A

003

Parker Helen

20 Winston Grove, Stockton Ts16 7YH

(01642)

746524

1007

01/10/02

01/11/02

70.00

SL7

 SL1

SL4

2

 1

 1

10.00

 25.00

25.00

Wallets

Binders

Disks

S007

S001

S004

Hunt, L

Ford, J

Gibson, M

 

 

 

 

 

 

 

 

This is NOT in first normal form, because we still have repeating attributes.  This is a very common bear trap.  We need to break the table up:

Cust

_ID

Name

Address

Telep No

Invoice Id

001

Smith, John

18 Highcliffe Rd, Stockton,TS18 7KL

(01642) 654163

1001

 001

Smith, John 18 Highcliffe Rd, Stockton,TS18 7KL  

1002

001 Smith, John 18 Highcliffe Rd, Stockton,TS18 7KL  

1003

002

Jones, Sam

79 High Street, M’Bro TS19 7HU

(01642) 627462

1004

 002

Jones, Sam 79 High Street, M’Bro TS19 7HU  

1005

002 Jones, Sam 79 High Street, M’Bro TS19 7HU  

1006

003

Parker Helen

20 Winston Grove, Stockton Ts16 7YH

(01642) 746524

1007

 

 

 

 

 

 

Invoice Id

Invoice_date

PaymentDueBy

Total

Goods

Quantity

Price

Description

Supp ID

Supp Name

1001

01/11/02

01/12/02

250.00

SL1

10

25.00

Binders

S001

Ford, J

1002

02/11/02

02/12/02

500.00

SL2

5

100.00

Cabinets

S002

Singh, H

1003

03/11/02

03/12/02

20.00

SL3

1

 20.00

 Shredder

S003

West, K

1004

05/11/02

05/12/02

50.00

SL4

2

25.00

Disks

S004

Gibson, M

1005

06/11/02

06/11/02

70.00

SL5

1

70.00

Desk

S005

King, S

1006

07/11/02

07/12/03

100.00

SL6

2

50.00

Chairs

S006

Winters, A

1007

01/10/02

01/11/02

70.00

SL7

2

10.00 

Wallets

S007

Hunt, L

1007       SL1 1 25.00 Binders

S001

Ford, J

1007       SL4 1 25.00 Disks

S004

Gibson, M

 

 

 

 

 

 

This is a bit better, but it's not perfect.  There is still  data redundancy.  Look in the first table and you will find that the customer is being identified by the invoice number.  The customer really needs to be identified solely by the customer ID number.  To make retrieval more efficient we need to put the data into second normal form:

A table is in second normal form if it is in first normal form and no column that is not part of a primary key is dependent on only a portion of the primary key (i.e. there are no partial dependencies).

That's a bit of a mouthful but it means that all the attributes in the customer table should only be dependent on the customer ID.  So things like name, address, and telephone number are looked up using the Customer ID unique identifier, not the invoice unique identifier.  So we can break the tables up further to give an ER diagram like this:

 

 

 

 

 

 

 

 

The Order table will have two columns, Goods ID and Invoice ID.  This breaks the many to many relationship.

In third normal form, there are no non-key dependencies.  If we look in the invoice table above, we need to say that the goods are dependent only on the Goods ID, not the Invoice ID.  The supplier must be dependent only on the supplier ID and not the Goods ID.  So our normalised tables become, in third normal form:

CUSTOMER

Cust_ID

Name

Address

Telep_No

001

Smith, John

18, Highcliffe Road

Stockton TS18 7KL

01642 65 4163

002

Jones, Sam

79 High Street

Middlesbrough

TS1 7HU

01642 62 7462

003

Parker, Helen

20 Winston Grove

Stockton

TS16 7YH

01643 74 6254

 INVOICE

Invoice_ID

Customer_ID

Invoice_date

Payment_Dueby

Total (£)

1001

001

01/11/02

01/12/02

250.00

1002

001

02/11/02

02/12/02

500.00

1003

001

03/11/02

03/12/02

20.00

1004

002

05/11/02

05/12/02

50.00

1005

002

06/11/02

06/11/02

70.00

1006

002

07/11/02

07/12/02

100.00

1007

003

10/11/02

10/12/02

45.00

 ORDER

Invoice_ID

Goods_ID

Quantity

1001

SL1

10

1002

SL2

5

1003

SL3

1

1004

SL4

2

1005

SL5

1

1006

SL6

2

1007

SL7

2

 GOODS

Goods_ID

Supplier_ID

Price

Description

SL1

S001

25.00

Binders

SL2

S002

100.00

Cabinets

SL3

S003

20.00

Shredder

SL4

S004

25.00

Disks

SL5

S005

70.00

Desk

SL6

S006

50.00

Chairs

SL7

S007

10.00

Wallets

 SUPPLIER

Supplier_ID

Supplier_Name

S001

Ford, J

S002

Singh, H

S003

West, K

S004

Gibson, M

S005

King, S

S006

Winters, A

S007

Hunt, L

The ER Diagram becomes:

 

 

 

 

 

The tables can be written in standard notation like this:

CUSTOMER (Customer_ID, Name, Address, Telep_No)

INVOICE  (Invoice_ID, Customer_ID, Invoice_date, Payment_Dueby, Total (£))

ORDER (Invoice_ID, Goods_ID, Quantity)

GOODS (Goods_ID, Supplier_ID, Price, Description)

SUPPLIER (Supplier_ID, Supplier_Name)

Notice the following:

Difficult?  Yes, it is.  Databases are hard.  Large industrial databases need highly experienced analysts to design them if they are to be effective, and such analysts are among the highest paid of the ICT professionals.  Even then they do not always get it right, and the failures of some large government ICT database projects have lead to scandals of national proportions.

Question 2 The manager of a video hire shop uses a relational database management system to operate the business.  Separate database files hold details of customers, video tapes, and loans.  Customers can hire as many films as they wish.

(a) For each of the files mentioned above, identify the key fields and list other appropriate fields that will be required to enable the system to be maintained with minimum redundancy.

(b) Draw an ER diagram for the database.

(c) Describe three advantages of using a relational database rather than a flat file information storage system. (NEAB Past Question)  ANSWER  

Presentation: Normalisation of Data

Now go on to Database Management