1.1 DISTRIBUTED DATABASE SYSTEMS

 

1.1.1 Relational database systems

 

•Information is stored in two-dimensional tables

 

•Rows in a table are records

 

•Columns in a record are fields

 

•A key consists of one or more fields and uniquely identifies a record

 

•A dictionary is a table that describes all the tables

 

•Fully partitioned: each table is stored at exactly one physical site

 

•Fully replicated: each table is stored at all physical sites

 

•Natural distribution: data are kept at the local site where they are generated and used

 

•Query operations:

- Select - picking records

- Project - picking fields

- Join - merging of tables

 

•Predicate: a condition on field values that determines which records a query operation applies to
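
The three query operations and the role of a predicate can be sketched in a few lines of Python. This is a minimal illustration, not how a real database engine works: tables are modeled as lists of records, each record is a dict of fields, and the table contents and function names (select, project, join) are invented for the example.

```python
# Assumption: a table is a list of records; a record is a dict of field -> value.
employees = [
    {"emp_id": 1, "name": "Ada", "dept_id": 10},
    {"emp_id": 2, "name": "Lin", "dept_id": 20},
    {"emp_id": 3, "name": "Sam", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Research"},
    {"dept_id": 20, "dept_name": "Sales"},
]


def select(table, predicate):
    """Select: keep the records for which the predicate holds."""
    return [record for record in table if predicate(record)]


def project(table, fields):
    """Project: keep only the named fields of every record."""
    return [{f: record[f] for f in fields} for record in table]


def join(left, right, field):
    """Join: merge records of two tables that agree on a shared field."""
    return [{**l, **r} for l in left for r in right if l[field] == r[field]]


# Names of everyone in the Research department.
joined = join(employees, departments, "dept_id")
research = select(joined, lambda rec: rec["dept_name"] == "Research")
names = project(research, ["name"])
print(names)  # [{'name': 'Ada'}, {'name': 'Sam'}]
```

Here the lambda passed to select is the predicate, and dept_id acts as the key that links the two tables.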

 

1.1.2 Issues for distributed database systems

 

• Distribution of tables to sites:

- Natural distribution of data at various sites

- Fully partitioned systems

- Fully replicated systems

 

• Important factors:

- Replication of the dictionary at every site

- Frequency of requests to a table from each site

- Storage capacity at each site

- Communication costs between sites
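
As a rough illustration of how the factors listed here interact, the sketch below checks whether a placement is fully partitioned or fully replicated and picks a site for a single copy of a table. The cost model (request frequency times per-request communication cost, subject to storage capacity) is an assumption made for the example, as are the site names, numbers, and function names.

```python
def is_fully_partitioned(placement):
    """True if every table is stored at exactly one site."""
    return all(len(sites) == 1 for sites in placement.values())


def is_fully_replicated(placement, all_sites):
    """True if every table is stored at all sites."""
    return all(set(sites) == set(all_sites) for sites in placement.values())


def cheapest_site(table_size, requests, comm_cost, free_storage):
    """Pick the site for one copy of a table (assumed cost model).

    requests[s]      -- how often site s asks for the table
    comm_cost[s][t]  -- cost of one request from site s served at site t
    free_storage[t]  -- remaining storage capacity at site t
    """
    candidates = [t for t in free_storage if free_storage[t] >= table_size]
    if not candidates:
        raise ValueError("no site has room for the table")

    def total_cost(t):
        return sum(requests[s] * comm_cost[s][t] for s in requests)

    return min(candidates, key=total_cost)


# Hypothetical three-site system.
requests = {"A": 50, "B": 5, "C": 10}
comm_cost = {
    "A": {"A": 0, "B": 4, "C": 6},
    "B": {"A": 4, "B": 0, "C": 2},
    "C": {"A": 6, "B": 2, "C": 0},
}
free_storage = {"A": 100, "B": 500, "C": 500}

small = cheapest_site(50, requests, comm_cost, free_storage)   # fits at A
large = cheapest_site(200, requests, comm_cost, free_storage)  # A is too full
print(small, large)  # A B
```

The small table goes to site A, which issues most of the requests. The large table does not fit at A, so it goes to B, the next cheapest site under this model.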

 

1.1.2.1 - Query processing

 

• Minimize query response time (for interactive applications)

 

• Minimize total bandwidth consumed (for batch applications)

 

• Approach

- Optimize a specific query based on the specific statistical conditions that apply to it

- The query site either estimates the time and cost of moving the data or asks the sites involved to report them, and only then decides on an actual query sequence
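
The difference between the two criteria can be seen in a small cost estimate. The sketch below assumes a join of two tables, R and S, stored at two different remote sites, with the result needed at the query site. It also assumes every link has the same bandwidth and ignores local processing time. The plan descriptions, sizes, and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    total_bytes: int      # bandwidth consumed (batch criterion)
    response_time: float  # seconds until the result reaches the query site


def candidate_plans(size_r, size_s, size_result, bandwidth):
    """Estimate three ways to join R and S (sizes in bytes, bandwidth in bytes/s)."""

    def move(nbytes):
        return nbytes / bandwidth

    return [
        # Transfers in the first two plans happen one after the other.
        Plan("ship R to S's site, then ship the result",
             size_r + size_result, move(size_r) + move(size_result)),
        Plan("ship S to R's site, then ship the result",
             size_s + size_result, move(size_s) + move(size_result)),
        # Both tables travel at the same time, so the slower transfer dominates.
        Plan("ship R and S to the query site in parallel",
             size_r + size_s, max(move(size_r), move(size_s))),
    ]


plans = candidate_plans(size_r=1_000_000, size_s=800_000,
                        size_result=500_000, bandwidth=100_000)
batch_choice = min(plans, key=lambda p: p.total_bytes)
interactive_choice = min(plans, key=lambda p: p.response_time)
print(batch_choice.name)        # ship S to R's site, then ship the result
print(interactive_choice.name)  # ship R and S to the query site in parallel
```

With these numbers the two criteria pick different plans: shipping S first moves the fewest bytes, while shipping both tables in parallel returns the answer soonest.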

 

1.1.2.2 - Concurrency control

 

• Maximize the amount of parallel activity while maintaining the semantic integrity of the data

 

• Approach

- A transaction: a set of reads, followed by some processing, and then a set of writes

- A log is the time-ordered sequence of reads and writes performed on the database

- A log is serial if each transaction's reads are immediately followed by that transaction's writes, so that transactions do not interleave

- No known algorithm admits all serial logs, all serializable logs, and all other logs that leave the database consistent

- Most algorithms achieve serializable logs by allowing a transaction to lock part of the database

- A lock can be applied to the full database or to individual tables, records, fields, or physical sectors

- Deadlock occurs when two transactions each want to lock a resource that the other has already locked
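
Two of the ideas above, serial logs and deadlock, can be made concrete with a short sketch. It assumes a log is a list of (transaction, operation, item) tuples and that each transaction waits for at most one lock at a time. The function names and sample data are hypothetical, and the deadlock check simply follows the "waits for" chain until it returns to its starting transaction.

```python
def is_serial(log):
    """True if transactions do not interleave in the log."""
    finished = set()
    current = None
    for txn, _op, _item in log:
        if txn != current:
            if txn in finished:
                return False  # txn resumes after another transaction ran
            if current is not None:
                finished.add(current)
            current = txn
    return True


def find_deadlock(held, wanted):
    """Return a list of transactions that wait on each other in a cycle, or None.

    held[resource] -- transaction currently holding the lock on the resource
    wanted[txn]    -- resource the transaction is waiting to lock
    """
    for start in wanted:
        path = [start]
        txn = start
        while txn in wanted and wanted[txn] in held:
            txn = held[wanted[txn]]  # the transaction that blocks txn
            if txn == start:
                return path
            if txn in path:
                break  # a cycle that excludes start; found from another start
            path.append(txn)
    return None


serial_log = [("T1", "r", "x"), ("T1", "w", "x"),
              ("T2", "r", "x"), ("T2", "w", "x")]
interleaved_log = [("T1", "r", "x"), ("T2", "r", "x"),
                   ("T1", "w", "x"), ("T2", "w", "x")]
print(is_serial(serial_log), is_serial(interleaved_log))  # True False

# T1 holds "accounts" and wants "orders"; T2 holds "orders" and wants "accounts".
held = {"accounts": "T1", "orders": "T2"}
wanted = {"T1": "orders", "T2": "accounts"}
cycle = find_deadlock(held, wanted)
print(cycle)  # ['T1', 'T2']
```

The interleaved log is not serial, although a non-serial log may still be serializable. That check is a separate and harder test, which this sketch does not attempt.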