CS134-lecture-20210702

Normalization cont. #

image_2021-07-02-09-49-00

We can also use an alternate notation to indicate dependencies:

image_2021-07-02-09-51-03 image_2021-07-02-09-49-40

If we remove the ssn then the dependency is no longer true.

Second normal form #

image_2021-07-02-09-57-25

An attribute that is part of any candidate key is a prime attribute. If we have a relation \[\begin{aligned} R(a, b, c, d, e) \end{aligned}\] and our candidate keys are \( \{a,b\} \) and \( c \) , that means our prime attributes are \( a,b,c \) .

Any attribute that isn’t a prime attribute is called an non-prime attribute, that is it is not part of any candidate key. So our non-prime attributes from the last example are \( d,e \) .

A relational schema \( R \) is in second normal form if every non-prime attribute \( A \) in \( R \) is fully functionally dependent on every key of \( R \) .

image_2021-07-02-10-09-26

FD1. if we know the PROPERTY_ID then we also know the attributes COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE
FD2. if we know the COUNTY_NAME and LOT#, then we also know the attributes PROPERTY_ID AREA PRICE TAX_RATE
FD3. if we know the COUNTY_NAME, then we also know the TAX_RATE

So is this in second normal form?

Our non-prime attributes are AREA PRICE and TAX_RATE, we need to go through and make sure each of these non-prime attributes are fully functional on the keys.
- Is AREA fully functionally dependent on PROPERTY_ID? \[ \begin{aligned} \text{property id} \to \text{area} \end{aligned} \] Yes, because we cannot remove anything from the left hand side and have the dependency hold true. (Any attribute that depends on a single key will automatically be fully functionally dependent).
- The rest of the non-prime attributes are fully functionally dependent on PROPERTY_ID for the same reason as above
- Next we need to check if the non-primes are fully functionally dependent on the key {COUNTY_NAME, LOT#} \[ \begin{aligned} \{ \text{county name, lot num} \} \to \text{area} \end{aligned} \] If we remove LOT# will this dependency still hold true? Yes, this is fully functionally dependent. (If there was another dependency COUNTY_NAME \( \to \) AREA then it would still hold true). \[ \begin{aligned} \{ \text{county name, lot num} \} \to \text{price} \end{aligned} \] This is also fully functionally dependent for the same reason. \[ \begin{aligned} \{ \text{county name, lot num} \} \to \text{tax rate} \end{aligned} \] This is not fully functionally dependent because if we remove LOT#, there still exists a dependency COUNTY_NAME \( \to \) TAX_RATE (FD3).
Therefore, this is not in second normal form because non-prime attribute TAX_RATE is not fully functionally dependent on key {COUNTY_NAME, LOT#}.

Note: As soon as you find a single non-prime attribute that is not fully functionally dependent on a key you can immediately see it is not in second normal form.

In class exercise #

Include this in weekly homework.

Consider the relation

\[\begin{aligned} R(A, B, C, D, E) \end{aligned}\]

where the keys are \( \{A,B\}, \{C,D\} \) , and the dependencies are

\[ \{A,B\} \to \{C,D,E\}\\ \{C,D\} \to \{A, B, E\}\]

Is \( R \) in second normal form?

The non-prime attribute is \( E \) . So is \( E \) fully functional on all keys?
- Is \( E \) fully functionally dependent on \( \{A,B\} \) ? Yes.
- Is \( E \) fully functionally dependent on \( \{C,D\} \) ? Yes.
Therefore, \( R \) is in second normal form.

Include this in weekly homework.

Consider the relation \[\begin{aligned} R(A, B, C, D) \end{aligned}\] with key \( \{A,B\} \)

image_2021-07-02-11-35-25

Is \( R \) in second normal form?

No, because we can remove \( B \) from the dependency \[\begin{aligned} \{A,B\} \to D \end{aligned}\] and still fulfill the depdendency via \[\begin{aligned} A \to B \end{aligned}\]

So, \( R \) is not in second normal form.

Third normal form #

Third normal form is easier to check than second normal form.

image_2021-07-02-11-38-23

A relational schema \( R \) is in third normal form if, whenever a non-trivial functional dependency \( X \to A \) holds in \( R \) , then either

\( X \) is a superkey of \( R \) , or

\( A \) is a prime attribute of \( R \)

Recall, any key is a superkey:

image_2021-07-02-11-43-41

So is this schema in third normal form?

First we should check to see if FD1 has any violation
- Check left hand set {PROPERTY_ID}: Is this a superkey? Yes.
Check if FD2 has any violation
- Check left hand set {COUNTY_NAME, LOT#}: Is this a superkey? Yes.
Check if FD4 has any violation
- Check left hand set {AREA}: Is this a superkey? No.
- Check right hand set {PRICE}: Is this a prime attribute? No.
Therefore, since we had a violation, this schema is not in third normal form.

In class exercise #

Include in weekly homework.

image_2021-07-02-11-55-59

Is this schema in third normal form?

Check FD1
- Left hand set is a superkey, good.
Check FD2
- Left hand set is a superkey, good.
Check FD5
- Left hand set is not a superkey…
- Right hand set is a prime attribute, good.
Therefore, this schema is in third normal form.

Boyce-Codd normal form (BCNF) #

image_2021-07-02-11-54-43

This is the highest normal form we will study.

A relational schema \( R \) is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial functional dependency \( X \to A \) holds in \( R \) , then \( X \) is a superkey of \( R \) .

This is a stricter version of the third normal form, it only checks the first condition.

image_2021-07-02-12-16-02

Is this schema in BCNF?

Check FD1
- Left hand set is a superkey, good.
Check FD2
- Left hand set is a superkey, good.
Check FD5
- Left hand set is not a superkey.
Therefore, this schema is not in BCNF.

Normal form examples #

If we have the relation \[\begin{aligned} R(A, B, C, D, E) \end{aligned}\] with key \( \{A,B\} \) , and we have these dependencies:

image_2021-07-02-12-23-46

Which normal form is \( R \) in? Why?

Is \( R \) in BCNF?
- FD1: left hand set is a superkey? Yes
- FD2: left hand set is a superkey? No
- Not in BCNF
Is \( R \) in third normal form?
- FD1: left hand set is a superkey? Yes
- FD2: left hand set is a superkey? No. Right hand set prime attribute? Yes
- Passed third normal form checks

Therefore, \( R \) is in third normal form.

Note: We can check normal forms in increasing or decreasing order.

Storage #

File: Storage slides

Now that we’ve finished studying implementation, we will look at physical storage. We will look closer at how a DBMS accesses files on disk.

image_2021-07-02-12-41-41

Specific file format meant for read/write by the DBMS.

Physical storage #

image_2021-07-02-12-42-53 image_2021-07-02-12-44-43 image_2021-07-02-12-46-06

Records #

image_2021-07-02-12-48-04 image_2021-07-02-12-52-29 image_2021-07-02-12-52-37 image_2021-07-02-12-52-43

Record spanning #

image_2021-07-02-12-54-20 image_2021-07-02-13-02-59 image_2021-07-02-12-54-33

File headers #

image_2021-07-02-13-05-34

File operations #

image_2021-07-02-13-06-03 image_2021-07-02-13-06-13 image_2021-07-02-13-06-19 image_2021-07-02-13-06-24 image_2021-07-02-13-06-29 image_2021-07-02-13-06-32

Record ordering #

image_2021-07-02-13-06-48 image_2021-07-02-13-07-54

Moving items when deleting from the middle, ie

image_2021-07-02-13-10-41

can be really expensive. So instead we put a marker for deletion in place

image_2021-07-02-13-11-17

Once there are a lot of markers in a block, space will have to be reorganized.

image_2021-07-02-13-08-02 image_2021-07-02-13-12-54

This file spans multiple blocks
It is sorted by name
If we search based by name, we can use binary search, resulting in \( O(\lg n) \) time complexity.
If we search by salary, best we can do is a linear search, resulting in \( O(n) \) time complexity.
- If we order by salary, then we could use a binary search.

image_2021-07-02-13-13-02