Topic 7  Databases

 Organisation of Data for Effective Retrieval

A database is any system for holding records of information.  It can be manual, for example, a card index, or a folder.  Usually we use it in the context of a computerised system.

If we were to retrieve information from a large manual database, it could take a long time, and there is the possibility of mistakes.  It will take about a second from a computer.  We can also use data to build up profiles, which is much harder, if not impossible, with a manual system.  For example, the Police hold records on a database of every person who has any criminal conviction.  Most criminals show a pattern in the way that they commit crimes.  If a crime has a particular pattern to it, then the database can show very quickly a list of suspects.  Any fingerprint patterns can also be compared quickly and DNA patterns can lead the Police to a particular suspect.

Question 1  Explain why criminal investigations generally took more time thirty years ago, before the widespread introduction of computer technology.  ANSWER

Computerised databases use software called database management systems to access, update, and manipulate the data.  Microsoft Access is a DBMS.

The simplest kind of database is called a flat file, in which all the data records are held on a single file.  A data record is a collection of related data, for example, name, address, and telephone number.  They are simple to use, and even a spreadsheet can be used as one, and all word-processor programs allow for tables to be made.  They can be rather like a manual record, but on a computer.  However they have drawbacks:

Flat files can be used throughout an organisation, which each department keeping their own records.  This can be referred to as a flat file information and retrieval system.  It has problems such as:

Question 2  Write down two advantages and two disadvantages of a flat file database.  ANSWER

Here is a typical flat file database table:

Student

First Name

Course 1

Course 2

Course 3

Arnfield

Jason

Physics

Chemistry

Maths

Byland

Claire

Biology

Maths

Chemistry

Caulfield

Jason

Physics

Computing

Maths

Denham

Ian

Computing

Technology

Electronics

Egan

Jonathan

Computing

Law

English

You can see how simple it is.  However if we had a whole college of 1500 students, the table would cover many pages.  A flat file database management system can do a simple search.  Each department has its own copy.

Question 3  Suppose Byland decided she didn’t like Chemistry and decided to do English instead.  What problems do you think this might cause? ANSWER

Now let’s rearrange the table by subject:

Subject

Student

Physics

Arnfield

 

Caulfield

Chemistry

Arnfield

 

Byland

Maths

Arnfield

 

Byland

 

Caulfield

Computing

Caulfield

 

Denham

 

Egan

And so on…

 

 Notice that some of the cells in the subject column are blank.  The computer may not like that and return an error message “Cannot have a null value”.  Null means nothing in the field.  So we would have to modify our table to:

Subject

Student

Physics

Arnfield

Physics

Caulfield

Chemistry

Arnfield

Chemistry

Byland

Maths

Arnfield

Mths

Byland

Maths

Cauliflower

Computing

Caulfield

Computing

Denham

Computing

Egan

And so on…

 

There is redundancy here, which means that data is repeated.  This is inefficient use of computer space, and with a large file, the wasted space would be horrendous.

Question 4 Use the table above to explain why redundancy is a problem.  ANSWER

All these problems can be solved by a relational database system.  The advantages of this over flat files are many.  The principal advantages are:

However there are still a couple of major problems:

Question 5  Write down two advantages and two disadvantages of a relational database.  ANSWER

This was addressed by software called the database management system (DBMS).  There are two essential features:

1.      Storage structure of the data is hidden from the user.

2.      Restricted user access to the data.

Databases have their own particular jargon with which we need to be familiar.  Before that makes sense, we need to understand the hierarchy of data

 

Level

Term

Meaning

Lowest

Bit

Binary digit, “on” or “off”, the 0’s and 1’s.

Byte

8 bits form a byte.  Bytes can represent characters, or other data, such as parts of a picture or a sound

 

Field

Characters* grouped together, e.g. the letters or numbers that form a car registration number.

 

Record

All the information about somebody or something

 

File

A collection of related records

Highest

Database

Lots of files linked so that information can be got from several files at the same time.

 

* Characters are represented by the ASCII codes (American Standard Code for Information Exchange).  With 8 bits, 28 (256) combinations can be represented.  In all its functions, arithmetical, or word-processing, or whatever, the computer is adding these combinations.  The vast majority of modern computers use the ASCII system.  There is another system called EBCDIC, used by a few old-fashioned mainframes.

Before a database is set up, we need to decide how we are going to lay out the records:

1.      The order in which fields are stored

2.      What each field is called

3.      What kinds of field, such as text, numeric, etc.

4.      The field length. Differences between lengths of a particular type of field can cause problems when the computer links the fields.

5.      The range of the data entered into the field.

6.      What validation rules there are.

The screenshot shows some of the features in the list above

 

This allows the automatic validation of data, e.g. rejection of a date like 31-2-84.  If there are to be any kinds of calculations, the field must be number.  In any database, there needs to be a primary key, a unique identifier for any item.  In this database Cust_ID is the primary key as each customer has their own unique identity.

Notice also that the underscore character (_) is used where there is a space.  This is because some database management systems do not allow white space characters.  The name of the table, tblCustomer, is consistent with the Leszynski–Reddick Convention, which is the industry standard.

 Question 6.  Explain with an example what is meant by validation of data.  ANSWER

 
How a Relational Database is made up

The key idea for a relational database is that it is made up of different tables which have links between them.  They are linked using the primary keys. 

The other fields in a table are dependent on the primary key.  So for the Customer Identity, the fields Address_1, Address_2, etc are dependent on the Customer_ID, which is logical.  Those fields would not depend on the Primary key Goods_ID, which would be in another table.

Go on to Database Design