CS134-lecture-20210702

Normalization cont. #

image_2021-07-02-09-49-00

We can also use an alternate notation to indicate dependencies:

image_2021-07-02-09-51-03 image_2021-07-02-09-49-40

If we remove the ssn then the dependency is no longer true.

Second normal form #

image_2021-07-02-09-57-25

An attribute that is part of any candidate key is a prime attribute. If we have a relation \[\begin{aligned} R(a, b, c, d, e) \end{aligned}\] and our candidate keys are \( \{a,b\} \) and \( c \) , that means our prime attributes are \( a,b,c \) .

Any attribute that isn’t a prime attribute is called an non-prime attribute, that is it is not part of any candidate key. So our non-prime attributes from the last example are \( d,e \) .

A relational schema \( R \) is in second normal form if every non-prime attribute \( A \) in \( R \) is fully functionally dependent on every key of \( R \) .

image_2021-07-02-10-09-26

  • FD1. if we know the PROPERTY_ID then we also know the attributes COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE
  • FD2. if we know the COUNTY_NAME and LOT#, then we also know the attributes PROPERTY_ID AREA PRICE TAX_RATE
  • FD3. if we know the COUNTY_NAME, then we also know the TAX_RATE

So is this in second normal form?

  • Our non-prime attributes are AREA PRICE and TAX_RATE, we need to go through and make sure each of these non-prime attributes are fully functional on the keys.
    • Is AREA fully functionally dependent on PROPERTY_ID? \[ \begin{aligned} \text{property id} \to \text{area} \end{aligned} \] Yes, because we cannot remove anything from the left hand side and have the dependency hold true. (Any attribute that depends on a single key will automatically be fully functionally dependent).
    • The rest of the non-prime attributes are fully functionally dependent on PROPERTY_ID for the same reason as above
    • Next we need to check if the non-primes are fully functionally dependent on the key {COUNTY_NAME, LOT#} \[ \begin{aligned} \{ \text{county name, lot num} \} \to \text{area} \end{aligned} \] If we remove LOT# will this dependency still hold true? Yes, this is fully functionally dependent. (If there was another dependency COUNTY_NAME \( \to \) AREA then it would still hold true). \[ \begin{aligned} \{ \text{county name, lot num} \} \to \text{price} \end{aligned} \] This is also fully functionally dependent for the same reason. \[ \begin{aligned} \{ \text{county name, lot num} \} \to \text{tax rate} \end{aligned} \] This is not fully functionally dependent because if we remove LOT#, there still exists a dependency COUNTY_NAME \( \to \) TAX_RATE (FD3).
  • Therefore, this is not in second normal form because non-prime attribute TAX_RATE is not fully functionally dependent on key {COUNTY_NAME, LOT#}.
Note: As soon as you find a single non-prime attribute that is not fully functionally dependent on a key you can immediately see it is not in second normal form.

In class exercise #

Include this in weekly homework.

Consider the relation

\[\begin{aligned} R(A, B, C, D, E) \end{aligned}\]

where the keys are \( \{A,B\}, \{C,D\} \) , and the dependencies are

\[ \{A,B\} \to \{C,D,E\}\\ \{C,D\} \to \{A, B, E\}\]

Is \( R \) in second normal form?

  • The non-prime attribute is \( E \) . So is \( E \) fully functional on all keys?
    • Is \( E \) fully functionally dependent on \( \{A,B\} \) ? Yes.
    • Is \( E \) fully functionally dependent on \( \{C,D\} \) ? Yes.
  • Therefore, \( R \) is in second normal form.
Include this in weekly homework.

Consider the relation \[\begin{aligned} R(A, B, C, D) \end{aligned}\] with key \( \{A,B\} \)

image_2021-07-02-11-35-25

Is \( R \) in second normal form?

No, because we can remove \( B \) from the dependency \[\begin{aligned} \{A,B\} \to D \end{aligned}\] and still fulfill the depdendency via \[\begin{aligned} A \to B \end{aligned}\]

So, \( R \) is not in second normal form.

Third normal form #

Third normal form is easier to check than second normal form.

image_2021-07-02-11-38-23

A relational schema \( R \) is in third normal form if, whenever a non-trivial functional dependency \( X \to A \) holds in \( R \) , then either

  • \( X \) is a superkey of \( R \) , or
  • \( A \) is a prime attribute of \( R \)

Recall, any key is a superkey:

image_2021-07-02-11-41-25

image_2021-07-02-11-43-41

So is this schema in third normal form?

  • First we should check to see if FD1 has any violation
    • Check left hand set {PROPERTY_ID}: Is this a superkey? Yes.
  • Check if FD2 has any violation
    • Check left hand set {COUNTY_NAME, LOT#}: Is this a superkey? Yes.
  • Check if FD4 has any violation
    • Check left hand set {AREA}: Is this a superkey? No.
    • Check right hand set {PRICE}: Is this a prime attribute? No.
  • Therefore, since we had a violation, this schema is not in third normal form.

In class exercise #

Include in weekly homework.

image_2021-07-02-11-55-59

Is this schema in third normal form?

  • Check FD1
    • Left hand set is a superkey, good.
  • Check FD2
    • Left hand set is a superkey, good.
  • Check FD5
    • Left hand set is not a superkey…
    • Right hand set is a prime attribute, good.
  • Therefore, this schema is in third normal form.

Boyce-Codd normal form (BCNF) #

image_2021-07-02-11-54-43

This is the highest normal form we will study.

A relational schema \( R \) is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial functional dependency \( X \to A \) holds in \( R \) , then \( X \) is a superkey of \( R \) .

This is a stricter version of the third normal form, it only checks the first condition.

image_2021-07-02-12-16-02

Is this schema in BCNF?

  • Check FD1
    • Left hand set is a superkey, good.
  • Check FD2
    • Left hand set is a superkey, good.
  • Check FD5
    • Left hand set is not a superkey.
  • Therefore, this schema is not in BCNF.

Normal form examples #

If we have the relation \[\begin{aligned} R(A, B, C, D, E) \end{aligned}\] with key \( \{A,B\} \) , and we have these dependencies:

image_2021-07-02-12-23-46

Which normal form is \( R \) in? Why?

  1. Is \( R \) in BCNF?
    • FD1: left hand set is a superkey? Yes
    • FD2: left hand set is a superkey? No
    • Not in BCNF
  2. Is \( R \) in third normal form?
    • FD1: left hand set is a superkey? Yes
    • FD2: left hand set is a superkey? No. Right hand set prime attribute? Yes
    • Passed third normal form checks

Therefore, \( R \) is in third normal form.

Note: We can check normal forms in increasing or decreasing order.

Storage #

File: Storage slides

Now that we’ve finished studying implementation, we will look at physical storage. We will look closer at how a DBMS accesses files on disk.

image_2021-07-02-12-41-41

Specific file format meant for read/write by the DBMS.

Physical storage #

image_2021-07-02-12-42-53 image_2021-07-02-12-44-43 image_2021-07-02-12-46-06

Records #

image_2021-07-02-12-48-04 image_2021-07-02-12-52-29 image_2021-07-02-12-52-37 image_2021-07-02-12-52-43

Record spanning #

image_2021-07-02-12-54-20 image_2021-07-02-13-02-59 image_2021-07-02-12-54-33

File headers #

image_2021-07-02-13-05-34

File operations #

image_2021-07-02-13-06-03 image_2021-07-02-13-06-13 image_2021-07-02-13-06-19 image_2021-07-02-13-06-24 image_2021-07-02-13-06-29 image_2021-07-02-13-06-32

Record ordering #

image_2021-07-02-13-06-48 image_2021-07-02-13-07-54

Moving items when deleting from the middle, ie

image_2021-07-02-13-10-41

can be really expensive. So instead we put a marker for deletion in place

image_2021-07-02-13-11-17

Once there are a lot of markers in a block, space will have to be reorganized.

image_2021-07-02-13-08-02 image_2021-07-02-13-12-54

  • This file spans multiple blocks
  • It is sorted by name
  • If we search based by name, we can use binary search, resulting in \( O(\lg n) \) time complexity.
  • If we search by salary, best we can do is a linear search, resulting in \( O(n) \) time complexity.
    • If we order by salary, then we could use a binary search.

image_2021-07-02-13-13-02