Datacamp course notes on relational db in SQL.
Basic Concepts
Advantages of relational database:
- real-life entities become tables. Each table will only contain information/attributes that are directed related to that entity.
- reduced redundancy. No need to repeat the same person’s name multiple times if he shows up in multiple entities.
- data integrity by relationships. one - one, one - many, many - many
Dealing with Redundancy
Sometimes we may detect redundancy in one table, since it contains multiple entities and repetitive information. We can seperate those information into multiple tables by creating new tables, and then migrate the data into them.
Create Tables
1
2
3
4
5
6
7
8
9
10
11
12CREATE TABLE table_name (
column_a data_type,
column_b data_type,
column_c data_type
);
/*EXAMPLE*/
CREATE TABLE weather (
clouds text,
temperature numeric,
weather_station char(5)
)Add Columns
We can add columns into table.1
2
3
4
5
6ALTER TABLE table_name
ADD COLUMN column_name data_type;
/*EXAMPLE*/
ALTER TABLE professors
ADD COLUMN university_shortname text;Change Column Name
1
2ALTER TABLE table_name
RENAME COLUMN old_name TO new_name;Drop a Column
Sometimes we would find that the rest of the columns can already uniquely identify a row, so the column becomes redundant. We can drop that by the following operation.1
2ALTER TABLE table_name
DROP COLUMN column_name;Migrate Data
We can migrate data from one table to another.1
2
3
4
5
6
7
8
9
10
11
12
13/*university_professors --> organizations*/
INSERT INTO organizations
SELECT DISTINCT organization, organization_sector
/*The above DISTINCT applies to all listed columns*/
FROM university_professors;
/*normal case below*/
INSERT INTO table_name (column_a, column_b)
VALUES ("value_a", "value_b");
/*EXAMPLE*/
INSERT INTO transactions (transaction_date, amount, fee)
VALUES ('2018-24-09', 5454, '30');Drop a Table
After migrating the data, we can delete the redundant table by the following operation1
DROP TABLE table_name;
Constraints
With the specified constraints, we can achieve better data quality.
Types
- Integrity constraints, e.g. data types on columns
Implementing such a type on a column would disallow anything that doesn’t fit the structure specified. - Key constraints, e.g. primary keys
- Referential integrity constraints, enforced through foreign keys
- Integrity constraints, e.g. data types on columns
Why constraints?
- Constraints give the data structure
- Constraints help with consistency, and thus data quality
- Data quality is a business advantage/ data science prerequisite
- Enforcing is difficult, but PostgreSQL helps
Dealing with Data Types
Basics
- Enforced on columns (i.e. attributes)
- Define the so-called “domain” of a column
- Define what operations are possible
- Enfore consistent storage of values, e.g. the number of characters
Most common types
- text: character strings of any length
- varchar [(x)]: a maximum of n characters
- char [(x)]: a fixed-length string of n characters
- boolean: can only take three states, e.g. TRUE, FALSE and NULL (unknown)
Alter types after table creation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16ALTER TABLE students
ALTER COLUMN name
TYPE varchar(128);
ALTER TABLE students
ALTER COLUMN average_grade
TYPE integer
-- Round to the nearest integer,
-- so that it turns 5.54 into 6, not 5, before altering the type
USING ROUND(average_grade);
-- Truncate the string before converting the varchar to shorter length
ALTER TABLE table_name
ALTER COLUMN column_name
TYPE varchar(x)
USING SUBSTRING(column_name FROM 1 FOR x)Using
CAST()
to convert data type on-the-fly1
2SELECT temperature * CAST(wind_speed AS integer) AS wind_chill
FROM weather;
Not NULL Constraint
- Basics:
- Disallow NULL values in a certain column
- Must hold true for the current state
- Must hold true for any future state
Two null values must not have the same meaning.
When creating a table:1
2
3
4
5
6CREATE TABLE students (
ssn integer not null,
lastname varchar(64) not null,
home_phone integer,
office_phone integer
);
When changing the constraint after the table has been created:1
2
3
4
5
6
7
8
9-- add constraint
ALTER TABLE students
ALTER COLUMN home_phone
SET NOT NULL;
-- remove constraint
ALTER TABLE students
ALTER COLUMN ssn
DROP NOT NULL;
Unique Constraint
- Basics
- Disallow duplicate values in a column
- Must hold true for the current state
- Must hold true for any future state
When creating a table:1
2
3CREATE TABLE table_name (
column_name UNIQUE
);
When adding the constraint after the table has been created:1
2
3ALTER TABLE table_name
ADD CONSTRAINT some_name UNIQUE(column_name);
-- some_name is the name of the constraint, which is arbitrary by you, but has to be set
Keys
What is a Key?
- Attribute(s) that identify a record uniquely
- As long as attributes can be removed: superkey (the combination of attributes can still uniquely identify each row after removing one or more attributes)
- If no more attributes can be removed: minimal superkey or key (the combination of attributes can no longer uniquely identify each row after removing any attributes)
- There can be multiple keys that satisfy the above minimal superkey requirement, and thus are all called candidate keys.
Primary Key
What is a Primary Key?
- One primary key per database table, chosen from candidate keys
- Uniquely identifies records, e.g. for referencing in other tables
- Unique and not-null constraints both apply
- Primary keys are time-invariant: choose columns wisely! They must hold for both the current data and future data.
Specifying Primary Key
When creating a table:1
2
3
4
5
6
7
8
9
10
11
12
13
14-- set one attribute as primary key
CREATE TABLE products (
product_no integer PRIMARY KEY,
name text,
price numeric
);
-- set more than one attributes as primary key
CREATE TABLE example (
a integer,
b integer,
c integer,
PRIMARY KEY (a, c)
);When specifying primary key after creating the table
1
2
3ALTER TABLE table_name
ADD CONSTRAINT some_name PRIMARY KEY (column_name)
-- have to give the constraint "some_name"
What is a Surrogate Key?
It is an artificial primary key, and we create it for the following reasons:
- Primary keys should be built from as few columns as possible
- Primary keys should never change over time
Ways to create a surrogate key
Adding a surrogate key with serial data type
Add a column called ‘id’, which is a series on incrementing numbers so that it can uniquely identify each row. (like row number)1
2
3
4
5
6
7
8
9
10
11
12ALTER TABLE cars
ADD COLUMN id serial PRIMARY KEY;
-- Whenever you add a new record to the table,
-- it will automatically get a number that does not exist yet
INSERT INTO cars
VALUES ('Volkswagen', 'Blitz', 'black'); -- insert another row and generates a new id with it
-- If you try to specify an ID that already exists,
-- the primary key constraint will prevent you from doing so
INSERT INTO cars
VALUES ('Opel', 'Astra', 'green', 1); -- throws error since 1 already existsCombining several columns into a new one
1
2
3
4
5
6
7
8
9
10
11-- create a new column
ALTER TABLE table_name
ADD COLUMN column_c varchar(256);
-- concat two existing columns together and set the result into the new column
UPDATE table_name
SET column_c = CONCAT(column_a, column_b);
-- set the new column as surrogate primary key
ALTER TABLE table_name
ADD CONSTRAINT pk PRIMARY KEY (column_c);
Foreign Key
What is a Foregin Key?
- A foreign key (FK) points to the primary key (PK) of another table
- Domain of FK must be equal to domain of PK (same data type)
- Each value of FK must exist in PK of the other table (FK constraint or “referential integrity“)
- FKs are not actual keys (can have duplicates and NULL)
When creating table with foreign key:1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21-- Create manufacturers table
CREATE TABLE manufacturers (
name varchar(255) PRIMARY KEY
);
INSERT INTO manufacturers
VALUES ('Ford'), ('VW'), ('GM');
-- create car table, and reference the columne manufacturer_name
-- to manufacturers table's name column,
-- which is manufacturer table's primary key
CREATE TABLE cars (
model varchar(255) PRIMARY KEY,
manufacturer_name integer REFERENCES manufacturers (name)
);
-- From now on, only cars with valid and existing manufacturers
-- may be entered into the car table,
-- due to the foreign key constraint.
INSERT INTO cars
VALUES ('Ranger', 'Ford'), ('Beetle', 'VW')
When specifying foreign keys to existing tables:1
2ALTER TABLE a
ADD CONSTRAINT a_fkey FOREIGN KEY (b_id) REFERENCES b (id);
N:M relationship
How to implement this kind of relationship?
- Create a table
- Add foreign keys for every connected table
- Add additional attributes
- No primary key! Can possibly combine several attributes together to have some form of unique constraint in that table.
1 | CREATE TABLE affiliations ( |
Referential Integrity
What is Referential Integrity?
- A record referencing another table must refer to an existing record in that table.
- Specified between two tables
- Enforced through foreign keys
Referential Integrity Violations
Referential integrity from table A to table B is violated… (A has a foreign key that references from B)
- …if a record in table B that is referenced from a record in table A is deleted.
- …if a record in table A referencing a non-existing record from table B is inserted.
- Foreign keys prevent violations!
Dealing with violations
ON DELETE…
…NO ACTION: Throw an error
1
2
3
4
5
6
7CREATE TABLE a (
id integer PRIMARY KEY,
column_a varchar(64),
...,
b_id integer REFERENCES b (id) ON DELETE NO ACTION
);
-- will throw an error if violation happens…CASCADE: Delete all referencing records
1
2
3
4
5
6
7CREATE TABLE a (
id integer PRIMARY KEY,
column_a varchar(64),
...,
b_id integer REFERENCES b (id) ON DELETE CASCADE
);
-- cascade the deletion…RESTRICT: Throw an error
- …SET NULL: Set the referencing column to NULL
- …SET DEFAULT: Set the referencing column to its default value
To change the default behavior ON DELETE NO ACTION
, we have to delete the key constraint and then add a new one with a different ON DELETE behavior.
For deleting constraints, though, we need to know their name. This information is also stored in information_schema
.1
2
3
4
5
6
7
8
9
10
11
12-- Identify the correct constraint name
SELECT constraint_name, table_name, constraint_type
FROM information_schema.table_constraints
WHERE constraint_type = 'FOREIGN KEY';
-- Drop the right foreign key constraint
ALTER TABLE affiliations
DROP CONSTRAINT affiliations_organization_id_fkey;
-- Add a new foreign key constraint from affiliations to organizations which cascades deletion
ALTER TABLE affiliations
ADD CONSTRAINT affiliations_organization_id_fkey FOREIGN KEY (organization_id) REFERENCES organizations (id) ON DELETE CASCADE;