Home > ============================================================================

============================================================================

============================================================================
CS 346 Fall '98 - Prof. Widom - Handout #9
LECTURE NOTES - Query processing lecture #1
============================================================================


Steps in database query processing
----------------------------------

Query string --> Parser

Query tree --> Checker

Valid query tree --> View expander

Valid tree w/o views --> Logical query plan generator

Logical query plan --> Query rewriter (heuristic)

Better logical plan --> Physical query plan generator (cost-based)

Selected physical plan --> Code generator

Executable code --> Execution engine


Running example
---------------

Student(ID, Name, Major) // ID is key
Course(Num, Dept) // Num is key
Taking(ID, Num) // (ID,Num) is key

Query to find all EE students taking at least one CS course:

SELECT DISTINCT Name
FROM Student, Course, Taking
WHERE Taking.ID = Student.ID
AND Taking.Num = Course.Num
AND Major = 'EE'
AND Dept = 'CS'


Parser
------

Produces tree representation of the query string:















Checker
-------

Verifies query tree against database schema:

- All tables in FROM clause exist

- All columns of tables exist

- No ambiguities in table references or unqualified attribute
references (table names usually added at this point)

- All comparisons, aggregations, etc. are type-compatible


View expander
-------------

Suppose Student is actually a view:

StudName(ID, Name)
StudMajor(ID, Major)

CREATE VIEW Student AS
SELECT StudName.ID, StudName.Name, StudMajor.Major
FROM StudName, StudMajor
WHERE StudName.ID = StudMajor.ID

Original query becomes:












Or "flattened":












Logical and physical query plans
--------------------------------

- Both are trees representing query evaluation
- Leaves of the tree represent data
- Internal nodes of the tree are "operators" over the data










- Logical plan is higher-level and algebraic
- Physical plan is lower-level and operational

- Logical plan operators correspond to query language constructs
- Physical plan operator correspond to implemented "access methods"


Logical plan operators
----------------------

- Extended relational algebra

- Leaves of logical plans are table names

- Basic operators: SELECT, PROJECT, CROSS-PRODUCT, UNION, DIFFERENCE


- Abbreviations: NATURAL-JOIN, THETA-JOIN, INTERSECT


- Extensions: RENAME, AGGREGATE/GROUP-BY, DISTINCT (+ others)


Logical query plan for running example query (3-way cross-product):















Using 2-way joins instead of 3-way cross-product:















Logical plan generation
-----------------------

Usually there's a straightforward mapping from a valid parse tree to a
"naive" logical query plan. The logical plan may then be rewritten to
a "better" one.

Push selections down in previous example:















Physical plan operators
-----------------------

- Leaves of physical plans are usually table names, sometimes indexes
or other information.

- Access methods for single tables:

* TABLE SCAN
* INDEX SCAN W/O CONDITION
* CONDITION-BASED INDEX SCAN

- Join methods:

* NESTED-LOOP JOIN (various algorithms/improvements)
* SORT-MERGE JOIN
* HASH JOIN (various algorithms)

- Other important operators:

* SORT
* GROUP
* AGGREGATE (may be combined with GROUP)
* UNION, EXCEPT, INTERSECT

- Operators often merged with other ones:

* SELECTION (APPLY-PRED)
* PROJECTION
* DISTINCT

- In a distributed system:

* SHIP

- In a parallel system:

* EXCHANGE


Possible physical plan for running example query:




















Another possible physical plan:




















Physical plan generation
------------------------

There are lots and lots of possible physical query plans for a given
logical plan. The physical plan generator (sometimes called "physical
plan enumerator") tries to select the one that will be "optimal",
usually with respect to response time or in some cases throughput.

=> This is the meatiest part of query processing!!!


Code generator
--------------

Translates physical query plan tree into executable code. Very
system-specific -- some systems may instead use a query plan
interpreter.


Some terminology
----------------

"Query processing" = entire process

"Query optimization" = rewriter + physical plan generator

"Query compilation" = entire process through code generation

"Query execution" = execution engine


Coming up
---------

- Query rewriting
- Passing information between physical operators
- Cost metrics for physical query plans
- Physical query plan enumeration
- Reducing the search space
- Further optimizations


Exercise
--------

Consider the following modification of our running example:

Student(ID, Name, Major) // ID is key
Course(Num, Dept) // Num is key
Took(ID, Num, Quarter) // (ID,Num) is key

Query to find all EE students who took at least one CS course in the
fall quarter 1998:

SELECT *
FROM Student, Course, Took
WHERE Took.ID = Student.ID
AND Took.Num = Course.Num
AND Student.Major = 'EE'
AND Course.Dept = 'CS'
AND Took.Quarter = 'Fall98'

Suppose:

- The DBMS supports NESTED-LOOP JOIN, SORT-MERGE JOIN, and HASH JOIN

- Each table can be accessed either with a TABLE SCAN or a
CONDITION-BASED INDEX SCAN (assume all relevant attributes are
indexed)

- For simplicity assume initially that the system will not perform a
NESTED-LOOP JOIN using an index on the inner relation's join attribute

Note: join associativity matters


Q: How many possible physical query plans are there for this query?

A:














Q: What if the system also supports NLJ with inner index?

Case (a): System can exploit 2 indexes on a table










Case (b): System can't exploit 2 indexes on a table

Set Home | Add to Favorites

All Rights Reserved Powered by Free Document Search and Download

Copyright © 2011
This site does not host pdf,doc,ppt,xls,rtf,txt files all document are the property of their respective owners. complaint#downhi.com
TOP