7. Database Technology

7.1 Distributed Database System

7.1.1 Relational database systems

• Information is stored in 2D-tables

• Rows in a table are records

• Columns in a record are fields

• A key, consisting of one or more fields, that uniquely identifies a record

• A dictionary is a table that describes all the tables

• Fully-partitioned: if each table is stored at exactly one physical site

• Fully-replicated: if each table is stored at all physical sites

• Natural distribution: data are kept at the local site

• Query operations:

Select: picking records

Project: picking fields

Join: merging of tables

• Predicate: a condition between fields used to manipulate the queries

7.1.2 Issues for distributed database systems

• Distribution of tables to sites

• Natural distribution of data at various sites

• Fully partition of systems

• Fully replicated systems

• Important factors

• Replicates the dictionary at every site

• Frequency of request to a table from a site

• Storage capacity at each site

• Communication costs between sites - Query processing

• Query response time (for interactive applications)

• Total bandwidth consumed (for batch applications)

• Approach

Optimizing specific query based upon specific statistical conditions

The query site will either “estimate” or request the related sites to report the related time and cost of moving the data before deciding on an actual query sequence - Concurrency control

• Maximize the amount of parallel activity while maintaining the semantic integrity of the data

• Approach

A transaction: a set of reads, followed by some processing, and then a set of writes

A log is the time ordered sequence of reads and writes performed on the database

A log is serial if each read is immediately followed by an appropriate write

There is no known algorithm that allows serial logs, all serializable logs, and all other logs that leave the database consistent

Most algorithms achieve serializable logs by allowing transaction to lock part of the database

The lock could be applied on the full database, or some tables, records, fields, and physical sectors

Deadlock occurs when two queries want to lock certain resources that have already been locked by each other