In this course we use data from dvdrental.sql

Content

Pre-requirements for Windows

Download and Install Mysql

Download Mysql from https://dev.mysql.com/downloads/installer/

Download Mysql
do not select the web version

follow the instructions (default, default, default...)

Optional* Download and Install DBeaver

Download DBeaver from https://dbeaver.io/. It's like an IDE for databases which greatly eaiser the editing of sql and database.

Open or stop Mysql service

The are two ways:

  1. Through GUI (recommended for noobs)

    open Services
    find Services

    Services
    set as Manual and start service (stop service to reduce performance cost)

  2. Through command line (recommended for pros)

    cmd
    Run cmd as administrator

    # To enable mysql service, type net start mysql80 # if it doesn't work, type net start mysql # To stop mysql service, type net start mysql80 # if it doesn't work, type net stop mysql

Try Mysql in command line

Env
add Mysql to Path

# To enter mysql mode, type mysql # If it doesn't work, type mysqlsh # To exit mysql mode, type \quit

Query

show databases; use dvdrental; show tables;

SELECT & FROM & WHERE & LIMIT & AS

select * from staff where first_name = "Mike";
select film_id, title, length, length * 60 as length_in_secs from film limit 5;

Special Operators

SQL allows the use of special operators in conjunction with the WHERE clause. These special operators include

-- Find out customers whose last name start with J. select * from customer where last_name like "J%";

GROUP BY & HAVING

select customer_id, staff_id, COUNT(*), SUM(amount) from payment group by customer_id, staff_id;
-- What is the minimal length (>46) for films with different rating ? select rating, MIN(length) from film group by rating having MIN(length) >46;

ORDER BY

ORDER BY orders the selected rows based on one or more attributes. Can be followed by ASC or DESC

select * from actor order by first_name, last_name, actor_id;

Relational Set Operations

Join Tables

SQL joins1

SQL joins2

In MySQL, we use UNION instead of UNION, INTERSECT, and EXCEPT. UNION is set strict and no duplicates; if you want to keep duplicates, you can use UNION ALL.

-- find out actor whose first name is ‘Joe’ and customer whoes first name is ‘Lisa’. select first_name, last_name from actor where first_name = "Joe" union select first_name, last_name from customer where first_name = "Lisa";
-- inner join table city and country with country_id. select * from city inner join country on city.country_id = country.country_id;

Difference between IN and ON: IN is followed by a table while ON is followed by a conditional statement.

Datatypes

TIMESTAMP and DATETIME can keep track of when a row was inserted or last updated

Data type conversion

cast(expression as cast_ type)

Built-in Functions

Strings

select concat('a', space(10), 'b'), -- 'a b' length('very good'), -- 9 upper('very good'), -- 'VERY GOOD' lower('VERY GOOD'), -- 'very good' reverse('very good'), -- 'doog yrev'

Numeric

select round(-3.14), -- -3 ceiling(-3.14), -- -3 floor(-3.14), -- -4 abs(-3.14), -- 3.14 sign(-3.14), -- -1 rand(), -- a random float number, [0.0, 1.0) rand(), -- another random float number rand(123), -- rand() with seed power(3.14, 3), -- 30.959144

Date and Time

set @t = "2021-11-28 20:23:51"; -- setting a variable select current_date(), current_time(), current_timestamp(), utc_date(), utc_time(), year(@t), month(@t), dayofmonth(@t), dayofweek(@t); select extract(year from @t), extract(day_second from @t); select date_add(@t, interval 1 month), date_sub(@t, interval 1 day); select datediff("2021-11-21", "2021-11-1"), -- returns the number of days to_days("2021-11-21"), -- returns the number of days since the year 0. Not reliable for dates <1582 time_to_sec("0:10"); -- returns the number of seconds since midnight 00:00

if()

select title, rating, if(rating != "R", "good film", "x") as good_movie from film

Create and Delete Databases

create database coursedb; create table Stu( stu_id int not null, name varchar(30) default "Not available", primary key (stu_id)); drop table Stu; drop database coursedb;

Column constraints

create table table_name( column_name_1 type column_constraints, column_name_2 type column_constraints, …, [constraint name] primary key (column_name_1), [constraint name] foreign key (column_name_2) references table_name_2 (column_name_1), check(column_name_2>2))

Referential Integrity ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation.

Functions and Procedures

Function

return values and can be run like built-in functions

delimiter // -- change default delimiter from ";" to "//" create function my_add(x integer, y integer) returns integer deterministic -- state same result on same input, not essential begin return x + y; end // delimiter ; -- new delimiter "//" select my_add(1,2); -- test

Procedure

do not return values (IN/OUT/INOUT parameters) and can be run using CALL keyword

delimiter // create procedure get_film() deterministic begin select film_id, title from film; end// delimiter ; call get_film();

Warning: Use "Execute script" instead of "Execute SQL statement" in DBeaver to avoid error.

IF THEN ELSE

delimiter // create procedure my_compare(a int, b int) deterministic begin if a > b then select "a is larger than b"; elseif a = b then select "a equals b"; else select "a is smaller than b"; end if; end// delimiter ; call my_compare(10, 20);

Simple LOOP

delimiter // create procedure cumsum(N int) deterministic begin declare s int default 0; declare i int default 1; my_loop: loop set s = s + i; select i as added, s as result; set i = i + 1; if i>N then leave my_loop; end if; end loop; end // delimiter ; call cumsum(100);

Database Design

Database requirements are statements that define the details and constraints of the data and metadata, which can be represented in a conceptual database model, such as Entity-Relationship (ER) model. The result of ER modeling is an ER diagram (ERD).

Entity-Relationship Diagram (ERD)

We use a modified version of Peter Chen’s Notation. View full info

Database Normalization

Normalization is a process to improve the design of relational databases, mainly to reduce data redundancy while preserving information. Normal form (NF) is a set of particular conditions upon table structures.

0NF -> 1NF

  1. Remove duplicated rows.
  2. Eliminate multivalued columns.

A table is in 1NF if each row is unique and not duplicated. Within each row, each value in each column must be single valued.

1NF -> 2NF

Create an additional relation for each set of partial dependencies.

  1. The portion of the primary key in partial dependency => primary key of the new table (becomes a foreign key in original table).
  2. The columns determined in partial dependency => columns of the new table (removed from original table).

The original table remains after the process of normalizing to 2NF, but no longer contains the partially dependent columns.

A table is in 2NF if it is in 1NF and does not contain partial dependencies (a column of a relation is functionally dependent on a portion of a composite primary key). Table has a single-column primary key <=> Table is in 2NF.

2NF -> 3NF

Create additional relations for each set of transitive dependencies in a relation.

  1. The transitively determinant nonkey column in the original table => the primary key of a new table.
  2. Move the determined nonkey columns to the new table.

The original table remains after normalizing to 3NF, but it no longer contains the transitively dependent columns

A table is in 2NF if it is in 2NF and does not contain transitive functional dependencies (nonkey columns functionally determine other nonkey columns).

Denormalization

Denormalization is reversing the effect of normalization by joining normalized relations into a relation that is not normalized in order to improve query performance.

Database Indexing

Indexing is a mechanism for increasing the speed of data search and data retrieval on relations with a large number of records.

create index index_name [using {btree | hash}] on table_name (column_name [asc | desc], ...);

Database Transactions

Database operations have two types:

Database transactions have two types:

begin; update accounts set balance = balance - 100.00 where name = 'Bob'; update accounts set balance = balance + 100.00 where name = 'Alice'; commit;

Consistent Database State all data integrity constraints satisfied

A transaction must begin with a consistent database state, and end with another consistent state. But the intermediate state during a transaction could be inconsistent.

Transaction Requirements – ACID

The above is about a single transaction, called CAD requirements. In reality, multiple transactions can occur at the same time and access the same data items. Thus, we have ACID requirements for multiple transactions.

Transaction Atomicity and Durability

Transaction States

Transaction States

Concurrent Transaction Common Problems

Transaction Isolation Levels

Isolation Level Dirty Read Lost Update Non-repeatable Read Phantom Read
Serializable N N N N
Repeatable Read N N N Y
Read Committed N Y Y Y
Read Uncommitted Y Y Y Y

Serializable Schedule

If two operations on the same data has at least one "write" in them, then they can not swap, otherwise will cause a conflict (loss of equivalence) before and after swapping.

  1. I = read(Q), J = read(Q);
  2. I = read(Q), J = write(Q);
  3. I = write(Q), J = read(Q);
  4. I = write(Q), J = write(Q);

Case 1 to case 4 have different results if we swap the order, so I and J operations are conflict in above case 2~4.

Further explaintion in Chinese

Transaction management

Practice