CS134-lecture-20210707

Storage cont. #

Sorted files cont. #

Since inserting into a ordered structure is expensive, we keep track of new inserts in a separate file.

Master files are ordered, overflow file are unordered.

image_2021-07-07-19-14-58

New inserts go to the overflow file.

When we search:

  • Start by searching the ordered master file, \( O(\lg n) \)
    • If we don’t find it, it is possible that it is in the overflow file
  • Search the unordered overflow file, \( O(n) \) however it is a small file

If at any point during the search process we find a match, the search returns. Periodically, the overflow file is merged into the master file.

Hashing functions #

image_2021-07-07-19-11-53 image_2021-07-07-19-34-39 image_2021-07-07-19-39-41

  • Applies hashing techniques to DBMS
  • Hashing for files on disk is called external hashing

image_2021-07-07-19-46-32 image_2021-07-07-19-48-01 image_2021-07-07-19-48-20

In summary #

There are 3 ways to order files

  • unordered
  • sorted
  • organized by hash

We can order the same table in multiple ways by using the tuple’s indexes. That way we can do a fast search on any attribute.

image_2021-07-07-19-52-21

Indexing structures #

File: Indexing slides

image_2021-07-07-19-54-45 image_2021-07-07-19-54-50 image_2021-07-07-19-55-09 image_2021-07-07-19-55-17

DBMS can take care of building indexes for specific attributes for you.

image_2021-07-07-19-59-10

B-Tree #

image_2021-07-07-19-59-29 image_2021-07-07-19-59-36 image_2021-07-07-19-59-39

A lot of DBMS use B+ Tree for indexing files.

image_2021-07-07-20-00-15