Datacamp course notes on relational db in SQL.

Basic Concepts

Advantages of relational database:

real-life entities become tables. Each table will only contain information/attributes that are directed related to that entity.
reduced redundancy. No need to repeat the same person’s name multiple times if he shows up in multiple entities.
data integrity by relationships. one - one, one - many, many - many

Dealing with Redundancy

Sometimes we may detect redundancy in one table, since it contains multiple entities and repetitive information. We can seperate those information into multiple tables by creating new tables, and then migrate the data into them.

Create Tables

CREATE TABLE table_name (
column_a data_type,
column_b data_type,
column_c data_type
);

/*EXAMPLE*/
CREATE TABLE weather (
    clouds text,
    temperature numeric,
    weather_station char(5)
)

Add Columns
We can add columns into table.

ALTER TABLE table_name
ADD COLUMN column_name data_type;

/*EXAMPLE*/
ALTER TABLE professors
ADD COLUMN university_shortname text;

Change Column Name

1 2	ALTER TABLE table_name RENAME COLUMN old_name TO new_name;

Drop a Column
Sometimes we would find that the rest of the columns can already uniquely identify a row, so the column becomes redundant. We can drop that by the following operation.
1
2
ALTER TABLE table_name
DROP COLUMN column_name;

Migrate Data
We can migrate data from one table to another.

/*university_professors --> organizations*/
INSERT INTO organizations 
SELECT DISTINCT organization, organization_sector
/*The above DISTINCT applies to all listed columns*/ 
FROM university_professors;

/*normal case below*/
INSERT INTO table_name (column_a, column_b)
VALUES ("value_a", "value_b");

/*EXAMPLE*/
INSERT INTO transactions (transaction_date, amount, fee) 
VALUES ('2018-24-09', 5454, '30');

Drop a Table
After migrating the data, we can delete the redundant table by the following operation
1
DROP TABLE table_name;

Constraints

With the specified constraints, we can achieve better data quality.

Types
1. Integrity constraints, e.g. data types on columns
  Implementing such a type on a column would disallow anything that doesn’t fit the structure specified.
2. Key constraints, e.g. primary keys
3. Referential integrity constraints, enforced through foreign keys
Why constraints?
- Constraints give the data structure
- Constraints help with consistency, and thus data quality
- Data quality is a business advantage/ data science prerequisite
- Enforcing is difficult, but PostgreSQL helps

Dealing with Data Types

Basics
1. Enforced on columns (i.e. attributes)
2. Define the so-called “domain” of a column
3. Define what operations are possible
4. Enfore consistent storage of values, e.g. the number of characters
Most common types
- text: character strings of any length
- varchar [(x)]: a maximum of n characters
- char [(x)]: a fixed-length string of n characters
- boolean: can only take three states, e.g. TRUE, FALSE and NULL (unknown)

Alter types after table creation

ALTER TABLE students
ALTER COLUMN name 
TYPE varchar(128);

ALTER TABLE students
ALTER COLUMN average_grade
TYPE integer
-- Round to the nearest integer, 
-- so that it turns 5.54 into 6, not 5, before altering the type
USING ROUND(average_grade); 

-- Truncate the string before converting the varchar to shorter length
ALTER TABLE table_name
ALTER COLUMN column_name
TYPE varchar(x)
USING SUBSTRING(column_name FROM 1 FOR x)

Using CAST() to convert data type on-the-fly

1 2	SELECT temperature * CAST(wind_speed AS integer) AS wind_chill FROM weather;

Not NULL Constraint

Basics:
- Disallow NULL values in a certain column
- Must hold true for the current state
- Must hold true for any future state

Two null values must not have the same meaning.

When creating a table:

CREATE TABLE students (
 ssn integer not null,
 lastname varchar(64) not null,
 home_phone integer,
 office_phone integer
);

When changing the constraint after the table has been created:

-- add constraint
ALTER TABLE students 
ALTER COLUMN home_phone 
SET NOT NULL;

-- remove constraint
ALTER TABLE students 
ALTER COLUMN ssn 
DROP NOT NULL;

Unique Constraint

Basics
- Disallow duplicate values in a column
- Must hold true for the current state
- Must hold true for any future state

When creating a table:

1
2
3

CREATE TABLE table_name (
 column_name UNIQUE
);

When adding the constraint after the table has been created:

1
2
3

ALTER TABLE table_name
ADD CONSTRAINT some_name UNIQUE(column_name);
-- some_name is the name of the constraint, which is arbitrary by you, but has to be set

Keys

What is a Key?

Attribute(s) that identify a record uniquely
As long as attributes can be removed: superkey (the combination of attributes can still uniquely identify each row after removing one or more attributes)
If no more attributes can be removed: minimal superkey or key (the combination of attributes can no longer uniquely identify each row after removing any attributes)
There can be multiple keys that satisfy the above minimal superkey requirement, and thus are all called candidate keys.

Primary Key

What is a Primary Key?

One primary key per database table, chosen from candidate keys
Uniquely identifies records, e.g. for referencing in other tables
Unique and not-null constraints both apply
Primary keys are time-invariant: choose columns wisely! They must hold for both the current data and future data.

Specifying Primary Key
When creating a table:

-- set one attribute as primary key
CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

-- set more than one attributes as primary key
CREATE TABLE example (
    a integer,
    b integer,
    c integer,
    PRIMARY KEY (a, c)
);

When specifying primary key after creating the table

1
2
3

ALTER TABLE table_name
ADD CONSTRAINT some_name PRIMARY KEY (column_name)
-- have to give the constraint "some_name"

What is a Surrogate Key?
It is an artificial primary key, and we create it for the following reasons:

Primary keys should be built from as few columns as possible
Primary keys should never change over time

Ways to create a surrogate key

Adding a surrogate key with serial data type
Add a column called ‘id’, which is a series on incrementing numbers so that it can uniquely identify each row. (like row number)

ALTER TABLE cars
ADD COLUMN id serial PRIMARY KEY;

-- Whenever you add a new record to the table, 
-- it will automatically get a number that does not exist yet
INSERT INTO cars
VALUES ('Volkswagen', 'Blitz', 'black'); -- insert another row and generates a new id with it

-- If you try to specify an ID that already exists,
-- the primary key constraint will prevent you from doing so
INSERT INTO cars
VALUES ('Opel', 'Astra', 'green', 1); -- throws error since 1 already exists

Combining several columns into a new one

-- create a new column
ALTER TABLE table_name
ADD COLUMN column_c varchar(256);

-- concat two existing columns together and set the result into the new column 
UPDATE table_name
SET column_c = CONCAT(column_a, column_b);

-- set the new column as surrogate primary key
ALTER TABLE table_name
ADD CONSTRAINT pk PRIMARY KEY (column_c);

Foreign Key

What is a Foregin Key?

A foreign key (FK) points to the primary key (PK) of another table
Domain of FK must be equal to domain of PK (same data type)
Each value of FK must exist in PK of the other table (FK constraint or “referential integrity“)
FKs are not actual keys (can have duplicates and NULL)

When creating table with foreign key:

-- Create manufacturers table
CREATE TABLE manufacturers (
 name varchar(255) PRIMARY KEY
);

INSERT INTO manufacturers 
VALUES ('Ford'), ('VW'), ('GM');

-- create car table, and reference the columne manufacturer_name 
-- to manufacturers table's name column, 
-- which is manufacturer table's primary key
CREATE TABLE cars (
 model varchar(255) PRIMARY KEY,
 manufacturer_name integer REFERENCES manufacturers (name)
);

-- From now on, only cars with valid and existing manufacturers 
-- may be entered into the car table, 
-- due to the foreign key constraint.
INSERT INTO cars 
VALUES ('Ranger', 'Ford'), ('Beetle', 'VW')

When specifying foreign keys to existing tables:

1 2	ALTER TABLE a ADD CONSTRAINT a_fkey FOREIGN KEY (b_id) REFERENCES b (id);

N:M relationship

How to implement this kind of relationship?

Create a table
Add foreign keys for every connected table
Add additional attributes
No primary key! Can possibly combine several attributes together to have some form of unique constraint in that table.

CREATE TABLE affiliations (
    professor_id integer REFERENCES professors (id),
    organization_id varchar(256) REFERENCES organization (id),
    function varchar(256)
);

Referential Integrity

What is Referential Integrity?

A record referencing another table must refer to an existing record in that table.
Specified between two tables
Enforced through foreign keys

Referential Integrity Violations
Referential integrity from table A to table B is violated… (A has a foreign key that references from B)

…if a record in table B that is referenced from a record in table A is deleted.
…if a record in table A referencing a non-existing record from table B is inserted.
Foreign keys prevent violations!

Dealing with violations
ON DELETE…

…NO ACTION: Throw an error

CREATE TABLE a (
id integer PRIMARY KEY,
column_a varchar(64), 
...,
b_id integer REFERENCES b (id) ON DELETE NO ACTION
);
-- will throw an error if violation happens

…CASCADE: Delete all referencing records

CREATE TABLE a (
id integer PRIMARY KEY,
column_a varchar(64), 
...,
b_id integer REFERENCES b (id) ON DELETE CASCADE
);
-- cascade the deletion

…RESTRICT: Throw an error
…SET NULL: Set the referencing column to NULL
…SET DEFAULT: Set the referencing column to its default value

To change the default behavior ON DELETE NO ACTION, we have to delete the key constraint and then add a new one with a different ON DELETE behavior.

For deleting constraints, though, we need to know their name. This information is also stored in information_schema.

-- Identify the correct constraint name
SELECT constraint_name, table_name, constraint_type
FROM information_schema.table_constraints
WHERE constraint_type = 'FOREIGN KEY';

-- Drop the right foreign key constraint
ALTER TABLE affiliations
DROP CONSTRAINT affiliations_organization_id_fkey;

-- Add a new foreign key constraint from affiliations to organizations which cascades deletion
ALTER TABLE affiliations
ADD CONSTRAINT affiliations_organization_id_fkey FOREIGN KEY (organization_id) REFERENCES organizations (id) ON DELETE CASCADE;