CS134-lecture-20210629

SQL cont. #

image_2021-06-29-17-47-40

Subqueries can be used inside insert statements:

image_2021-06-29-17-47-46

This insert statement will update the numbers dynamically based on the response from the subquery. The next time an employee is inserted, we can run this insert statement to update DEPTS_INFO using a trigger.

Or, we can use views…

Virtual table view #

CREATE VIEW dept_info_view AS
    SELECT dname,
           COUNT(*) AS no_of_emps,
           SUM(salary) AS total_sal
    FROM department, employee
    WHERE dnumber=dno
    GROUP BY dname;

This results in a self updating table, called a view:

image_2021-06-29-18-01-59

We can query the view the same way we’d query a table.

SELECT * FROM dept_info_view;

The views are maintained by the DBMS. The views will be automatically updated, to main consistency between the view and the table being viewed. This is a way we can have derived attributes.

Note: Views are covered more in depth in CS174.

Functional dependencies and normalization #

File: Normalization slides

image_2021-06-29-18-28-58

Formal measures are also called normal forms. To understand normal forms, we will start with functional dependency.

Functional dependencies #

image_2021-06-29-18-32-14 image_2021-06-29-18-33-26

\[\begin{aligned} X \to Y \end{aligned}\]
  • \( X \) and \( Y \) are sets of attributes

The functional dependency means that if 2 tuples share the same \( X \) value for an attribute, it means they also share the same \( Y \) value for an attribute.

image_2021-06-29-18-35-24

When \( X \) and \( Y \) are sets of a single item, they may be commonly notated like

\[\begin{aligned} \text{ssn} \to \text{ename} \end{aligned}\]

So if 2 tuples share the same \( \text{ssn} \) , they will share the same \( \text{ename} \) .

image_2021-06-29-18-40-44 image_2021-06-29-18-42-38

Armstrong’s inferences rules #

image_2021-06-29-18-48-05 image_2021-06-29-18-49-05

  • For reflexivity, if \[\begin{aligned} \{\text{ssn}\} \subseteq \{\text{ssn,ename}\} \end{aligned}\] then \[\begin{aligned} \{\text{ssn,ename}\} \to \text{ssn} \end{aligned}\]

  • For augmentation, if \[\begin{aligned} \text{ssn} \to \text{ename} \end{aligned}\] we can add something to both sides, and it still holds \[\begin{aligned} \{\text{ssn, address}\} \to \{\text{ename, address}\} \end{aligned}\]

  • For transitivity, image_2021-06-29-18-55-31

image_2021-06-29-18-49-18

  • For decomposition, if \[\begin{aligned} \text{ssn} \to \{\text{ename, bdate, address}\} \end{aligned}\] then we also know \[\begin{aligned} \text{ssn} &\to \text{ename} \\ \text{ssn} &\to \text{bdate} \\ \text{ssn} &\to \text{address} \\ \text{ssn} &\to \{\text{ename, bdate}\} \\ \text{ssn} &\to \{\text{bdate, address}\} \\ \text{ssn} &\to \{\text{ename, address}\} \\ \end{aligned}\]
  • Union is the opposite of the decomposition above
  • Pseudotransitivity is the transitivity between decomposed sets above

Closure #

image_2021-06-29-19-01-31 image_2021-06-29-19-02-47

Each element of the set \( F \) is a functional dependency. The set \( F^+ \) is all the dependencies in \( F \) , plus all the dependencies that can be inferred from \( F \) .

The set \( X^+ \) is the set of attributes that are functionally deteremined by \( X \) based on \( F \) .

image_2021-06-29-19-06-45 image_2021-06-29-19-06-52

Equivalence #

image_2021-06-29-19-08-53 image_2021-06-29-19-11-32

Minimal sets #

image_2021-06-29-19-11-08 image_2021-06-29-19-15-34 image_2021-06-29-19-12-22 image_2021-06-29-19-12-31

Normalization #

image_2021-06-29-19-17-18

There are 4 normal forms we will study: 1NF, 2NF, 3NF, BCNF. The higher the number, the more strict the form.

image_2021-06-29-19-18-02

Keys and superkeys #

image_2021-06-29-19-23-28

  • any key is a superkey

image_2021-06-29-19-25-56

First normal form #

image_2021-06-29-19-27-02 image_2021-06-29-19-29-30

Since the Research department has a set of locations, it is not in first normal form. We can put it in first normal form by using decomposition.

image_2021-06-29-19-32-17

  • The disadvantage to solution 2 is that the department name and manager’s ssn is repeated for each location.
  • The disadvantage to solution 3 is that any tuple that doesn’t have exactly 3 locations will have a lot of null values in the table. Also the schema would needed to be modified if a department needed 4 locations.