Documente online.
Zona de administrare documente. Fisierele tale
Am uitat parola x Creaza cont nou
 HomeExploreaza
upload
Upload




Physical Data Structures

computers


Physical Data Structures

6.1. - Generalities



6.1.1. - Primary file organization

6.1.2. - Access structures - indexes

6.1.2.1. - Data structures transformation

6.1.2.2. - Indexes

dense & non-dense indexes

ordered & unordered indexes

search trees

B-tree &B+ tree

6.1. - Generalities

In this section we turn our attention to the techniques used for physical storage of data on the computer system. Databases are typically organized into files of records, which are stored on computer disks.

We will focus our presentation on the main physical structures of data used in the relational model of databases:

- The primary files (data base files) - the structures that assures the store of the data values in the external storage

- The secondary structures (the index files) - the structures that speed up searching for and retrieving records

6.1.1. - Primary file organization

From the three primary methods for organizing records of the files:

- unordered records

- ordered records

- hashed records

we will present the most appropriate structure for the STORED FILES associated with a BASE TABLE (RELATION), namely the Files of Unordered Records.

In the simplest and most basic type of organization, records are placed in the file in the order in with they are inserted, and new records are inserted at the end of file. Such an organization is called a HEAP file, PILE file or SEQUENTIAL file. This organization is often used with additional access paths, such as the secondary indexes discussed in the next chapter.

Inserting a new record is very efficient:

the last disk block of the file is copied into a buffer

the new record is added in the block

the block is rewritten back to disk

the address of the last file block is kept in the file header

However, searching for a record using any search condition involves a linear search through the file block by block - an expensive procedure. If only one record satisfies the search condition, then, on the average, a program will read into memory and search half the file block before it finds the record. If no records or several records satisfy the search condition, the program must read and search all blocks in the file.

To delete a record, a program can use one of the following techniques:

- First find the record, copy the block into a buffer, then delete the record from the buffer, and finally rewrite the block back to the disk. This leaves extra unused space in the disk block. Deleting a large number of records in this way results in wasted storage space

- Have an extra byte or bit, called deletion marker, stored with each record. A record is deleted by setting the deletion marker to a certain value. A different value of the marker indicates a valid (not deleted) record. Search programs consider only valid records in a block when conducting their search.

Both of these techniques require periodic reorganization of the file to reclaim the unused space of deleted records. During reorganization, the file blocks are accessed consecutively, and records are packed by removing deleted records. After such reorganization, the blocks are filled to capacity once more.

Another possibility is to use the space of deleted records when inserting new records, although this requires extra bookkeeping to keep track of empty locations.

To read all records in order of the values of some field, we create a sorted copy of the file. Sorting is an expensive operation for a large disk file, and special external sorting techniques are used.

For a file of unordered, fixed-length records it is straightforward to access any records by its position in the file, the so named RECORD NUMBER. Such a file is often called relative file because records can easily be accessed by their relative position.

Accessing records by their position does not help locate a record based on a search condition; however, it facilitates the construction of access path on the file, such as the indexes discussed in the next section.

6.1.2. - Access structures - indexes

We describe access structures called INDEXES, which are used to speed up the retrieval of records in response to certain search conditions. Some types of indexes termed secondary access paths, do not affect the physical placement of records on the main file. From this reason the indexes does not help for relational data structures, they rather provide alternative search paths for locating the records efficiently based on the indexing fields.

Before to present the different classifications of the access structure let us to present shortly the born of these data structures.

6.1.2.1. - Data structures transformation

Let us consider a data collection that describes an Entity Class of STUDENTS:

Number

Name

Surname

Sex

Birthday

Group

Nr1

N1

S1

M

B1

G1

Nr2

N2

S1

M

B2

G2

Nr3

N3

S2

F

B3

G1

Nr4

N3

S3

M

B4

G2

Nr5

N4

S4

F

B1

G1

This collection is described by an unordered set of records.

Data collection INVERSION

The user need to order this collection of entities by different criteria:

the Number

the Name

the Surname

the Sex

the Birthday

the Group

For this scope we can use different methods:

change the physical order of the records in the file (ORDER INVERSION)

sort the file of the desired order criteria

rewrite the file

add a new auxiliary structure (INDEX) related to the first (ACCES INVERSION)

direct access - from ENTITY to ATTRIBUTE

inverse access - from ATTRIBUTE to ENTTITY

NUMBER index:

Attribute value

Record address (internal number)

Nr1

Rn1

Nr2

Rn2

Nr3

Rn3

Nr4

Rn4

Nr5

Rnr5

NAME index:

Attribute value

Record address (internal number)

N1

Rn1

N2

Rn2

N3

Rn3

Rn4

N4

Rn5

SURNAME index:

Attribute value

Record address (internal number)

S1

Rn1

Rn2

S2

Rn3

S3

Rn4

S4

Rnr5

NAME+SURNAME index:

Attribute value

Record address (internal number)

N1+S1

Rn1

N2+S1

Rn2

N3+S2

Rn3

N3+S3

Rn4

N5+S4

Rnr5

SEX index:

Attribute value

Record address (internal number)

M

Rn1

Rn2

Rn4

F

Rn3

Rnr5

The different values in the ATTRIBUTE value column of the INDEX collection are named INDEX ENTRIES. We can define two index categories:

DENSE index - the index has an entries for each record in the main collection (base table)

Ex. NUMBER

Each IDENTIFIERS guide to DENSE indexes

NONDENSE index - the index can have an entries for more that one record in the main collection (base table)

Ex. NAME, SURNAME

Each DESCIPTORS guide to NONDENSE indexes

The access in he index structure can be improved by removing the variable length of an entrance in the index collection. This can be achieved by adding internal references in the index structure.

SEX index:

Attribute value

Record address

(internal number)

Index address

(internal number)

M

Rn1

In2

Rn2

In4

F

Rn3

In5

Rn4

Rnr5

The Index address refers the next record in the index collection with the same Attribute value. These references assure the possibility to add in the index collection an initial unknown number of instances with the same attribute value.

An index collection play three function in a Database structure:

identification

access

order

From this reason the index collections can be viewed also as:

unordered collection

ordered collection

The recently used index structures in the commercial relational data bases are oriented on the Search Tree and B-TREES.


B-Trees and B+-Trees are special cases of the well-known tree data structure. We introduce very briefly the terminology used in discussing tree data structures. A tree is formed of nodes. Each node in the tree, except for a special node called root, has one parent node and several - zero or more - child nodes. The root node has no parent. A node that does not have any child is called a leaf node ; a non-leaf node is called an internal node. The level of a node is always one more than the level of its parent, with the level of the root being zero. A sub-tree of a node consists of that node and all its descendent nodes. A precise recursive definition of a sub-tree is that it consists of a node n and the sub-trees of all the child nodes of n.

A search tree is a special type of tree that is used to guide the search for a record, given the value of one of its fields. The index field values in each node guide us to the next node, until we reach the data file block that contains the required records. By following a pointer, we restrict our search at each level to a sub-tree of the search tree and ignore all nodes not in this sub-tree.


Fig. 8.  - Search tree

A B-tree is a search tree with some additional constraints, which ensure that the tree is always balanced and that the space wasted by deletion, if any, never becomes excessive.

In a B+-tree, data pointers are stored only at the leaf nodes of the tree. The leaf nodes of the B+-tree are usually linked together to provide ordered access on the search field to the records.


Fig. 9.  - Search B+-tree

The structure of a B+-tree has the following constraints:

- The internal nodes

1.- each internal node is of the form: < P1, K1, P2, K2, .,Pq-1, Kq >

where q<=p and each Pi is a tree pointer

2.- within each internal node K1 < K2 < . < Kq-1

3.- for all search field values X in the subtree pointed at by Pi , we have :

X <= Ki for i = 1

Ki-1 < X <Ki for 1 < i < q

X > Ki-1 for i = q

4.- each internal node has at most p tree pointers

5.- each internal node, except the root node, has at least p/2 tree pointers ; the root has at least two tree pointers if it is an internal node

6.- an internal node with q pointers, q<=p , has q-1 search field values

- The leaf nodes

1.- each leaf node is of the form : < < K1, Pr1 >, < K2, Pr2 >, .

< Kq-1, Prq-1 > , P next >

where q<=p and each Pri is a data pointer, and P next points to the leaf node of the tree

2.- within each leaf node K1 < K2 < . < Kq-1 , q<=p

3.- each Pri , is a data pointer that point to the record whose search field value is Ki or to qa file block containing the record ( or to a block of record pointers that point to records whose search field value is Ki if the search field is not a key )

4.- each leaf node has at least p/2 values

5.- all leaf nodes are at the same level


Document Info


Accesari: 1198
Apreciat: hand-up

Comenteaza documentul:

Nu esti inregistrat
Trebuie sa fii utilizator inregistrat pentru a putea comenta


Creaza cont nou

A fost util?

Daca documentul a fost util si crezi ca merita
sa adaugi un link catre el la tine in site


in pagina web a site-ului tau.




eCoduri.com - coduri postale, contabile, CAEN sau bancare

Politica de confidentialitate | Termenii si conditii de utilizare




Copyright © Contact (SCRIGROUP Int. 2024 )