Databases (1.3.1)

20p13280
Feb 3
20 min read

Updated: Mar 11

This post covers the following topics:

Database
Table
Record
Field
Entities and relationships
Use of keys: primary key, foreign key, secondary key, composite key
Database tools: query, form, report
Methods of importing external data into a database
Flatfile vs relational database
Data redundancy
Types of validations
Validation vs verification
DBMS (database management system)
ERD (Entity Relationship Modelling - degrees of relationships, issues with M:M relationships and how to resolve it)
Normalisation (1NF, 2NF, 3NF, the need for/advantages of normalisation)
Methods of data capture
Referential Integrity
SQL
Transaction processing - including record locking
ACID (Atomicity, Consistency, Isolation, and Durability)

Database, Table, Record and Field

A database is a persistent organised store of data that has support for storing data, retrieving data and managing it, a database will store one or more tables. A table is a structured collection of data that's organised by rows (records) and columns (fields). For example, a bank will have a database that stores many tables with customer information such as Customer, BankAccount, FraudAlert, CreditCard.

A record is just an entry of data, it contains values for all the fields defined within a table. A bank would create a new record for every card they issue and will store it within the CreditCard table (See example below of a complete record for a card within the CreditCard table).

A field is an attribute within a table, this is similar to OOP where an attribute (field) is a variable that is stored on a class (record). Each record will have a value for each field defined on a table. For example within a CreditCard table on a bank it may have the following fields:

CardID (Int)
AccountID (Int)
CardNumber (String)
IssuedDate (Date/String)
ExpiryDate (Date/String)
CVV (String)
CardType (String)

The CreditCard table for a bank could look like this:

CardID	AccountID	CardNumber	IssuedDate	ExpiryDate	CVV	CardType
1	5	2424 2424 2424 2424	12/25	06/28	234	Visa
2	9	1234 5678 9012 345	06/24	01/29	379	Visa
3	2	5486 3486 1537 9865	11/23	11/26	643	Mastercard

Entities and relationships

An entity is a (physical) construct/item that exists that is within the real-world and a database will hold data on. Entities are usually represented in a database by creating a table for them. Essentially an entity is a "thing". Within a bank, a credit card would be an entity and there is a CreditCard table to represent that.

The CreditCard table has a relationship with a Bank Account using a Foreign Key. This relationship connects a CreditCard to a BankAccount, it's a One-To-Many as a bank account can have multiple credit cards.

There are 3 types of relationships:

One-To-Many ( A bank account can have multiple credit cards and each credit card can only belong to one bank account )
Many-To-Many ( A single customer can have multiple bank accounts and a single bank account can have multiple customers owning/managing it)
One-To-One ( A single customer can only have a single CustomerLoginDetails)

Typically you want to avoid using Many-To-Many and you can do this by using a Link Table. A Link Table is a table that has a minimum of 2 keys, these keys are 2 foreign keys that each link to the primary key from one table. Then together they make up a composite key for the primary key of the record within the link table. So for example within the bank a single customer can have multiple bank accounts and a single bank account can have multiple customers owning/managing it. A link table to link the Customer and BankAccount tables would be named CustomerBankAccount and would have 2 fields, CustomerID and BankAccountID. Within the table these 2 foreign keys are used together (composite key) as a single primary key. This link table now creates two One-To-Many relationships.

Typically you can show relationships using Entity Relationship Diagrams, here is an example of the CustomerBankAccount, Customer and BankAccount Entity Relationship Diagram:

An entity relationship diagram is a diagram used to show the degree of relationship between multiple entities. Each entity is represented using a box or a rectangle with the name of the table. Then each table is linked to another table with a line. A line with no crows feet indicates a One-To-One relationship. With one set of crows feet it means there is a One-To-Many relationship, with 2 sets it means that there is a Many-To-Many relationship. These diagrams are incredibly usefull whilst designing a database as it allows you to visually see the entities and how they are linked, allowing for databases to be more well-designed.

To create a relationship between 2 tables, a foreign key is required. A foreign key is a type of field that will store a primary key of another tables. This then creates a relationship and depending on how many tables have foreign keys that reference eachother you can make one of 3 types of relationships mentioned above. Find above an example of 2 foreign keys being used to make a link table for a Customer and a BankAccount that has a Many-To-Many relationship via 2 One-To-Many relationships.

There is an issue with using Many-To-Many relationships and becuase of this we use a Link table as seen above (CustomerBankAccount). The issue with Many-To-Many is it creates data redundancy as there is multiple versions of the same data stored to create these Many-To-Many relationships and its difficult to write queries with a Many-To-Many relationship. So that's why they are resolved using a link-table. A link table is a table thats made up of minimum 1 composite key which is made up of 2 foreign keys, one for each table. Allowing for a many-to-many like relationship without using a Many-To-Many key, only 2 one-to-many keys.

Use of keys: primary key, foreign key, secondary key, composite key

Primary Key

A primary key is a type of field which is a unique identifier that is given to each record within a table to allow it to be looked up, referenced and differentiated between other records within the same table. Primary keys can be used to order, sort and search the database table. When writing in exams or designing tables, you should underline the primary key to make it clear. Within the bank, the Customer can have a primary key that is their Customer Number which can just be an incrementing integer that increments for each new customer in the banks Customer table.

Foreign Key

A foreign key is a field within a table that references another records primary key in a separate table within the database. Foreign Keys allow for databases to be relational and for records to have relations with others. When writing in exams or designing tables, you should try to write a foreign key in italics (although it might be a bit difficult on paper). Within the bank, the CreditCard table has a field called AccountID which links together the CreditCard table and the AccountID using a One-To-Many relationship. The foreign key on the CreditCard will have a value of the primary key of the BankAccount that it's linked to, this creates the relationship.

Secondary Key

A secondary key (secondary index) is a field that is used as an index which allows a database to have faster data retrieval and makes searching a database easier as you c can search using other fields not just the primary key. A secondary key is not a primary key. Secondary keys can also be used to order, sort and search the database just like primary keys except they don't have to be unique and are not required within a database. Within the bank, the Customer table will have many secondary keys such as: FirstName, LastName, PhoneNumber. Within the real-world, a usecase for this is for support departments/agents. Instead of the bank support agent asking the customer for their Customer ID they can instead ask for the Customers FirstName, LastName and PhoneNumber and find the user in the banks Customer table.

Composite Key

A composite key is a field (typically a primary key) on a record that is made up of 2 or more fields, it's mainly used where there's no way to ensure that a single field is going to be unique so 2 or more can be combined that are guaranteed to be unique compare to each record on the table. For example within the banks database to link the Customer and BankAccount table with a Many-To-Many relationship a link table is needed. And to save storage and not have to put an incrementing integer primary key on the table a composite key can be used as the primary key instead. The composite key is made up of the 2 foreign keys, CustomerID and BankAccountID and these 2 together will be unique.

Database tools: query, form, report

Query

A query is a database tool that is used to search, filter and sort data from a database. Typically a query is written within SQL (Structured Query Language), this is used by almost all modern relational database engines and the syntax is identicall across all of them (some have some slight minor differences). An example of an SQL query a bank may use to get the first and last name from a customer with the ID of 5 looks like this:

SELECT FirstName,LastName FROM Customer WHERE CustomerID=5

An SQL query is broken down typically into 3 main parts:

SELECT (Selects which fields to return from the table, for example in the query above it's returning the FirstName and LastName of the customer. If you wanted to get all the fields on the customer you can use the wildcard (*) instead of field names.)
FROM (Defines what table within the database should be queried to find the customer with the CustomerID of 5.)
WHERE (Defines the criteria that each record needs to meet for it to be returned, essentially filtering. For the bank here it's checking if the CustomerID=5, which should only return 1 customer as this is using the primary key however you don't have to just use the primary key for filtering, you can use any field.)

Within an SQL statement it is also posisble to do JOINS. A join is where a relationship between 2 tables is established. For example within the bank the bank account table and credit card table have a link, so within the query they would do one of 2 join methods (both are INNER JOINS):

JOIN bankAccounts ON bankAccounts.accountNumbner = creditCard.acountNumber

WHERE bankAccounts.accountNumber = creditCard.accountNumber

To store a data within an SQL query it needs to be placed between hashtags, for example 01/01/2000 is #01/01/2000#. There are also many functions such as COUNT() or CONCAT().

Form

A database form is a database tool that creates a user-friendly interface for adding records, deleting records, editing records and viewing records. Database forms are powerful as they can have validation on them and checks to ensure that data is entered in the correct form and wont cause issues when a record is added or modified, for example a bank might add validation to ensure that a UK Customers mobile phone number is 10-11 digits long and begins with 07, they could then add in a check for the form so that you can't update a record to change a phone number when another customer already has that phone number associated with it.

Report

A database report is a structured and organised output from a database that can contain data from multiple tables, typically the data is collected for a report once a query has been run that will filter the data somehow or may just display all. Forms are extremely useful as they are a graphical way to represent data instead of just looking at a table, for example within the bank every month they will send out bank statements to Customers for each of their BankAccounts, each bank statement is a database report that contians the accounts starting balance at the beginning of the month, balance at the end, any interest, all the transactions, deposits, withdrawls, etc. that happened on a customers account.

Methods of importing external data into a database

Files (e.g. CSV File)

A CSV file is a Comma Seperated Value file, it's like a simple spreadsheet or database table with records. It starts with a heading row which defines the fields/columns that are within the file. The each row below that is essentially a record. As the name of the file says each value is seperated using a comma. So an example of a cvs file thats being imported into the banks CreditCard table might look like this:

CardID,AccountID,CardNumber,IssuedDate,ExpiryDate,CVV,CardType
4,96,7593 7594 0275 0673,06/21,05/27,758,Mastercard
5,91,8494 6283 9383 2097,02/26,02/30,556,Visa

When this is imported to the database the top row is treated as the field names and then each row below that is treated as a record and so these rows can easily be imported into the CreditCard table as record.

Form/Manual Entry

Another way of importing external data into a database is using a Database Form, which is Manual Entry. When a customer comes into a bank branch and sets up a new BankAccount the bank worker will ask the customer many questions like their name and phone number to query the database to get their CustomerID, the type of account and if they want a card. This information can be inputted manually into a database form, this then is submitted and a new BankAccount can be created for the customer and a new card can be created too using another form. Another example of a form is a physical paper form, the bank may send out a multiple-choice questionnaire for feedback, this can be filled in and send back to the bank where OMR can be used to get the multiple choice results and then this data can be imported into the banks CustomerFeedback database.

Flatfile vs Relational Database

A flatfile database is made up of a single table with no relationships as there is are no other tables. Whereas a relational database is made up of multiple tables that have relationships between eachother and are linked using foreign keys. As a result this means that relational databases can be more structured and organised as the data can be spread out into multiple tables instead of being put into 1. Also using multiple tables it cuts down on data redundancy (below), which means the integrity of the data will be strong as there is no repeated data so data is only ever stored once. Within the bank if the bank stored a customers, cards and bank accounts all within 1 single table there will be large amounts of repeated data as every time a new bank account is added the customers data and card data has to be duplicated. However within a relational database this can be split into multiple tables (Customer, BankAccount, CreditCard) and so it only has to be defined once per table.

Data redundancy

Data Redundancy is where the same data is stored within 2 or more fields within either a single record or across multiple tables. This can then lead to inconsistencies. For example, if the bank stores a customers name in both the CreditCard and their Customer table, if a customers name is changed and is only updated within the Customer table the CreditCard table has incorrect data in it and the data is then inconsistent. This could lead to issues like customers not being able to use their credit cards or being accused of stealing someone else's card as it doesn't have their new name attached to it.

On the device/server that the database server is hosted on, this can cause extra storage space to be used up due to the repeated data and this is unneccecary. Within the example I mentioned this shouldn't be too detrimental, however if for example YouTube stores a videos content every time it's reported then if a user reports a video multiple times then multiple coppies of that video can be stored, and if the video is long and has a large file size very quickly the device the database server is being hosted on could run out of secondary storage and then not be able to function and respond to queries and create new records.

Types of validations

See the table below:

Validation Type	How it works	Example of usage
Format check	Checking that data follows a pattern.	Ensuring that a date meets the DD/MM/YYYY format.
Length check	Ensures that data is either a certain length or within a range (min and max).	Checking if a password is longer than 8 characters.
Spell check	Checks that the data is spelt correctly.	Spell check within Google Docs that ensures everything is spelt correctly.
Range check	Checks that data falls within a minimum value and a maximum value.	Checking that a password is between 8 characters an 128 characters.
Presence check	Checks that data has been entered and a field isn't empty./	Checking the user inputted a password.
Check digit	A single digit that is calculated based on previous data to ensure that data hasn't been modified and detect errors and prevent tampering	Books have ISBNs that have a last digit that ensures that the ISBN is correct and valid.
Lookup table	Contains predefined values with keys and values like a dictionary. Can be used to check values are the predefined inputs.	Converting a number like 0-15 to a hexadecimal character 0-F.

Validation vs Verification

Data validation (above) is ensuring that data is formatted or entered in a certain way, for example in the bank this could be that a card number is 4 sets of 4 digits with a space in-between each set. Data verification is ensuring that data is accurate and truthful, for example a bank may verify a customers name by checking the customers ID.

DBMS (Database Management System)

A database management system is a piece of software that allows for the creation, management, retrieval and updating of data within a database. Examples of a DBMS include Postgres, MongoDB and MariaDB. They typically handle everything from storing the data to responding to queries and the security of the data. Most Relational DBMS use SQL for queries. For a bank they would typically use a Relational DBMS like PostgreSQL so that they can query it, store their data and control the security of their customers data and avoid breaking any financial protection laws.

Normalisation (1NF, 2NF, 3NF, the need for/advantages of normalisation)

Database normalisation is following a set of formal rules to make a database well-designed, easy to use and easy to maintain. It's also to ensure that datbases aren't flat-file and therefore there is no data redundancy or grouped data, instead all of the data is logically grouped together. Ideally a database should be in 3NF.

First Normal Form (1NF)

To be in first normal form a database has to meet these requirements:

No repeating attributes
No grouped attributes

This means that all data has to be Atomic (in it's simplest form, e.g. instead of AccountHolderName it would be FirstName, MiddleName, LastName).

Second Normal Form (2NF)

To be in second normal form a database first has to meet all the requirements of first normal form and also must have no partial dependencies. This means that all of a tables non-key attributes are dependent on or related to the primary key. For the bank this menas not storing the card issuer on the bank account and instead storing it on the credit card table.

Third Normal Form (3NF)

To be in third normal form a database first has to meet all the requirements of the second normal form, then it must contain no non-key dependencies. This means that there are no attributes that depend on anything more than the primary key. For example storing a customers name on a bank account when the customers account number should be stored there instead as the customer name isn't dependent on the bank account table. This is refered to as "transient" dependencies and these "transient relationships" aren't allowed in 3NF. In 3NF there all attributes are non-transiently dependent on the primary key.

Normalisation is done as it brings many advantages to the database, and in this case the Project Volt Bank:

It's easier to maintain the database, create new fields, new tables, etc.
There is no data redundancy which means that the datbase takes up less storage and also means that there won't be data inconsistencies within the database itself. Also due to the smaller tables with fewer fields it means that querying is faster.
Data integrity is maintained (covered below). This means that all updates and deletes will cascade through the tables when updated.

Methods of Data Capture

Below is a table of a few examples of methods of data capture.

(Scroll along horezontally to fully read it.

Name	Description	Examples	Advantages	Disadvantages
Keyboard	A keyboard is made up of buttons called keys, they are arranged into sections such as alphabet, numbers, function keys, etc.	Standard desktop keyboard, Gaming keyboard, Wireless keyboard, Ergonomic keyboard, Mechanical keyboards, On-screen keyboard	SImple method to get data input, Covers all letters and numbers that a user may need to enter. Very intuitive and easy to understand.	If you use a keyboard for too long, it can lead to health problems such as repetitive strain injury. Not suitable and accessible for users with visual impairments or limited motor control.
Keypad	A keypad is a type of keyboard that only accepts a numerical input as it only has keys for numbers and sometimes a few special characters and an enter button.	On-screen keypad, Physical phone dialling pad, Pinpad	Very simple to get a numerical input from a user. Very intuitive and easy to understand.	Limited input range as it only supports numbers 0-9 and a few special characters such as * and #. Not suitable for complex data. Not suitable and accessible for users with visual impairments or limited motor control.
NFC	NFC stands for Near Field Communication. It's based on RFID. It's a type of wireless connection that sends data, NFC works over a short distance (around 10cm). Typically can store about 48 bytes to 1-4 KB.	Oyster Card, ID Cards, Apple Pay/Google Pay, NFC Tags (do actions like connecting to a network or going to a website url)	It has a much smaller field than RFID so that's more secure than RFID. (Bad actors have to get within a 10cm range) It allows for wireless transmission of data, making it easier to send many types such as payments and files. Doesn't require a direct line of sight like with a barcode. Stores more than RFID.	It has a very short range so you have to be close to the reader. Although the distance is about 10cm it can still be read by a bad actor. Not all mobile devices support NFC so it can't be used solely for things such as connecting to a network or a website link as older devices won't be able to read the NFC. Stores a small amount of information from 48 bytes to 1-4KB.
Paper form	A paper form is a way to physically collect information from people by providing them with questions printed on paper for them to fill in and hand back.	Exam paper front cover, Tax documents, Job application, Order form, Survey	Extremely simple to use as it's just pen/pencil on paper to respond. Has no reliance on a network connection or power.	More expensive than a digital form as you have to pay for the paper and ink that it's using. People may write outside of boxes or not fill them in properly according to how they will be read in (e.g. using OCR or OMR).
Barcode Scanner	A barcode scanner scans typically 1D (one directional) barcodes, these are made up of characters that are represented using different width black bars. They work by sending out visible light and then it's reflected back into the scanner and that data is then sent as an input to a computer system.	Shop checkouts, Inventory systems, Postal services,	They are fast and accurate for reading the data on a barcode and remove human error from entering something like a product code using a keypad. They are a cheap way to store and read data.	They only work if the barcode is fully readable, if the barcode is lightly damaged then it will be unable to fully read it and will cause errors. Can't store complex or long data using a barcode. Requires a direct line of sight to the barcode. Has to be within close proximity to the barcode.
RFID	RFID stands for Radio Frequency Identification, it uses electromagnetic induction to transmit information. Typically they can only store a small bit of information (e.g. 64 bits to 512 bits)	Key fobs, Shop tags (asset protection - alarms),	RFID Range can be anywhere from 10cm to 1m, and that large range can be useful for things such as keyless entry for cars. Doesn't require a direct line of sight like with a barcode.	RFID Range can be anywhere from 10cm to 1m which allows for security breaches from further away. More expensive than something like a barcode or magnetic strip. Only store small amounts of information (roughly 64 bits to 512 bits), so typically requires an additional system to lookup the data (which may be an ID).
OMR	OMR stands for Optical Mark Recognition and is used to collect input on multiple choice questions, it works by shining a light onto a special form in specific locations and looking to see if there is a reflection from a pencil mark to indicate that a certain option has been selected.	National Lottery ticket selection form, Multiple choice test, Surveys,	It's extremely fast for reading inputs. It's more accurate than data being entered in manually by a person and way faster than a person. Also then reduces manual errors. Very simple to get input from a user and very simple to read. Cheaper than OCR.	People may mark slightly outside of boxes and that input won't be detected. People may attempt to erase previous inputs and it may still register causing either multiple inputs to be recorded or the wrong input to be recorded.
OCR	OCR stands for Optical Character Recognition and is used to turn something written physically into a string by picking out each individual character. It's typically not used standalone and is used with something else like a camera for ANPR or a scanner for exam papers.	ANPR Automatically reading forms, Exam papers,	Cheaper than having someone to manually enter the text visible, Quicker and more accurate than having a human enter text.	Not 100% accurate and can read wrong. Can't be used on documents that might have poor handwriting or a different colour ink like exam boards don't want papers written in pink ink as they can't be scanned to good quality and OCR won't be able to pick text up.
Card reader	Reads data stored on a card (typically the chip on the card - called an EMV chip) and then uses that as an input within a computer system.	Payment terminals Self-checkouts Building access checker	Reduces human error as data can be read automatically rather than manually typed out. Very secure compared to NFC and RFID as you have to put the card into a reader for it to read so it prevents bad actors from getting close to you to read it. Can be used in many different applications like payments, building access, etc.	If the card is slightly worn or damaged the reader may not be able to read the data on the card or read incorrectly. It depends on electricity and typically for payments and authentication a network connection too. The card has to be entered in a certain way for it to be read.

Referential Integrity

Referential integirty is a process that is applied in relational databases where it ensures the consitency and validity of data within a table. This is using foreign keys where the primary key referneced in a primary key has to exist in order for the 2 tables to be linked. This allows for these relationships. The integrity of the data is maintained when adding, updating or deleting a record within a table that links to another table with the same data. This means cascading updates can happen. For example if a customers ID changes from a number to a UUID this will be updated everywhere a users ID is present.

Referential integrity is hard to do with data redundancy (duplicated data) and Many-To-Many relationships so there will be inconsistency of data of which referential integrity aims to solve. This is why the bank won't use any Many-To-Many relationships without doing it through a link table with two one-to-many relationships.

Transaction Processing

EDI (Electronic Data Interchange) is the computer-to-computer exchange of documents (e.g. bank statements from the bank to a loan compnay) between two systems. It's a replacement for other unconvential methods such as post or fax. EDI creates a standard that all documents follow so that it can follow this protocol, special EDI software is needed to input and send out EDI.

Within a database a single operation is known as a transaction (e.g. deleting a customers bank account), a transaction can contain multiple processes and for example when deleting a customers bank account this will be doing verification, informing all account holders, getting consent from other parties (if it's held by more than one person as a bank account is a One-To-Many with a link table to Customer). Transaction processing is used here to ensure that the transaction ONLY goes through when everything has been comlpeted. So if the banks systems crash then the transaction won't go through. Transactions are processed as one.

On top of this if two bank employees are trying to update a customers name and the other is trying to update the customers address at the same time with the record open it will cause one of the employees changes to be discarded and overriden. This is why record locking happens. Once an employee accesses and changes the customer, the record is locked meaning that it can't be modified or accessed which will prevent the issue of changes being lost. Once everythings is processed then it can be unlocked and viewed and manipulated again.

This can lead to Deadlock, where 2 bankers can lock one users record and try to access another, of which both end up locked so they will wait indefinately as both records are locked.

The DBMS can prevent the overriding issue using Serialisation. There are two main types of serialisation: Timestamp Ordering

Each object in the database has a read and write timestamp. This gets updated when the data is read from or written to. So if a user tries to copy and edit ar ecord whilst it's being changed the change can be denied and it prevents deadlock and overriding.

Commitment Ordering

This is where transactions are ordered in terms of their dependencies on one another as well as the time they were initiated.

ACID (Atomicity, Consistency, Isolation and Durability)

ACID is a set of properties that ensures that the integrity of the database is maintained at all times. It links in with Transaction Processing and eusres that all transacitons are processed reliably.

Atomicity

A transaction is required to either be processed in it's entirety or not at all, so if the power cuts of the DBMS crashes it's not possible for only a part of the transaction to be completed. Within a bank this is important as if a bank transfer is taking place and money is deducted from one account then the power is lost that money is lost completely as the recipient never recieved the money in the transaction.

Consistency

This is a rule where no transaction can violate any validations rules on the database which includes referential integrity. So if a bank account has their customers ID updated it will also be updated within the bank account.

Isolation

This ensures that each transaction will give the same result regardless of when it is processed. This is crucial within a multi-user database link a bank where many bankers will be updating customers, bank accounts and cards all at once.

Durability

This is where a transaction has been committed it will remain committed, in the event of power loss, crashes, etc. Whilst the transaction happens, after each part it's stored on the disk so that once the entire transaction is complete then the changes to the tables in the database can occur.

ocr2.org.uk