Relational algebra
please note:
- the text and code below is from The Pseudopedia
- it has been imported raw for GetWiki
Not to be confused with relation algebras, which are motivated by binary relations and relevant to mathematical logic.
Relational algebra, an offshoot of first-order logic (and of the algebra of sets), deals with a set of finitary relations (see also relation (database)) that is closed under certain operators. These operators operate on one or more relations to yield a relation. Relational algebra is a part of computer science.
Introduction
Relational algebras received little attention until the publication of E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages. (See "Implementations" below.)

Relational algebra is essentially equivalent in expressive power to relational calculus (and thus first-order logic); this result is known as Codd's theorem. Some care, however, has to be taken to avoid a mismatch that may arise between the two languages, since negation, applied to a formula of the calculus, constructs a formula that may be true on an infinite set of possible tuples, while the difference operator of relational algebra always returns a finite result. To overcome these difficulties, Codd restricted the operands of relational algebra to finite relations only and also proposed restricted support for negation (NOT) and disjunction (OR). Analogous restrictions are found in many other logic-based computer languages. Codd defined the term relational completeness to refer to a language that is complete with respect to first-order predicate calculus apart from the restrictions he proposed. In practice the restrictions have no adverse effect on the applicability of his relational algebra for database purposes.
Primitive operations
As in any algebra, some operators are primitive and the others, being definable in terms of the primitive ones, are derived. It is useful if the choice of primitive operators parallels the usual choice of primitive logical operators. Although it is well known that the usual choice in logic of AND, OR and NOT is somewhat arbitrary, Codd made a similar arbitrary choice for his algebra.

The six primitive operators of Codd's algebra are the selection, the projection, the Cartesian product (also called the cross product or cross join), the set union, the set difference, and the rename. (Actually, Codd omitted the rename, but the compelling case for its inclusion was shown by the inventors of ISBL.) These six operators are fundamental in the sense that none of them can be omitted without losing expressive power. Many other operators have been defined in terms of these six. Among the most important are set intersection, division, and the natural join. In fact ISBL made a compelling case for replacing the Cartesian product by the natural join, of which the Cartesian product is a degenerate case.

Altogether, the operators of relational algebra have identical expressive power to that of domain relational calculus or tuple relational calculus. However, for the reasons given in the Introduction above, relational algebra has strictly less expressive power than that of first-order predicate calculus without function symbols. Relational algebra actually corresponds to a subset of first-order logic that is Horn clauses without recursion and negation.
Set operators
Although three of the six basic operators are taken from set theory, their relational algebra counterparts carry additional constraints. For set union and set difference, the two relations involved must be union-compatible, that is, the two relations must have the same set of attributes. As set intersection can be defined in terms of set difference, the two relations involved in set intersection must also be union-compatible.

The Cartesian product is defined differently from the one in set theory in the sense that tuples are considered to be 'shallow' for the purposes of the operation. That is, unlike in set theory, where the Cartesian product of a set of n-tuples with a set of m-tuples is a set of 2-tuples, the Cartesian product in relational algebra has each 2-tuple "flattened" into an (n+m)-tuple. More formally, R × S is defined as follows:
:<math>R \times S = \{ r \cup s \mid r \in R,\ s \in S \}</math>
In addition, for the Cartesian product to be defined, the two relations involved must have disjoint headers, that is, they must not have a common attribute name.
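The two constraints above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: a relation is modelled as a set of tuples, each tuple as a `frozenset` of (attribute, value) pairs, and the helper names `row`, `names`, `union` and `product` as well as the sample data are hypothetical.

```python
def row(**attrs):
    """Build a tuple as an immutable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def names(t):
    """Attribute names (the header) of a single tuple."""
    return {a for a, _ in t}

def union(r, s):
    """R union S, defined only for union-compatible relations."""
    if any(names(t) != names(u) for t in r for u in s):
        raise ValueError("relations are not union-compatible")
    return r | s

def product(r, s):
    """R x S: every result tuple is the flattened union of one tuple from each side."""
    if any(names(t) & names(u) for t in r for u in s):
        raise ValueError("headers must be disjoint")
    return {t | u for t in r for u in s}

car = {row(CarModel="CarA", CarPrice=20000), row(CarModel="CarB", CarPrice=30000)}
boat = {row(BoatModel="Boat1", BoatPrice=10000), row(BoatModel="Boat2", BoatPrice=40000)}

# 2 * 2 = 4 result tuples, each with 2 + 2 = 4 attributes (flattened, not nested pairs)
pairs = product(car, boat)
```

Calling `union(car, boat)` raises `ValueError` in this sketch, because the two headers differ.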
Projection (π)
A projection is a unary operation written as <math>\pi_{a_1, \ldots, a_n}(R)</math> where <math>a_1, \ldots, a_n</math> is a set of attribute names. The result of such a projection is defined as the set that is obtained when all tuples in R are restricted to the set <math>\{a_1, \ldots, a_n\}</math>.
Selection (σ)
A generalized selection is a unary operation written as <math>\sigma_\varphi(R)</math> where <math>\varphi</math> is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators <math>\land</math> (and), <math>\lor</math> (or) and <math>\lnot</math> (negation). This selection selects all those tuples in R for which <math>\varphi</math> holds.
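Projection and selection can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: relations are modelled as sets of `frozenset` tuples of (attribute, value) pairs, the helper names are hypothetical, and the selection formula is passed as an ordinary Python predicate.

```python
def row(**attrs):
    """Build a tuple as an immutable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def project(relation, attr_names):
    """pi: restrict every tuple to the given attribute names (duplicates collapse)."""
    keep = set(attr_names)
    return {frozenset((a, v) for a, v in t if a in keep) for t in relation}

def select(relation, predicate):
    """sigma: keep the tuples for which the predicate holds."""
    return {t for t in relation if predicate(dict(t))}

employee = {
    row(Name="Harry", EmpId=3415, DeptName="Finance"),
    row(Name="Sally", EmpId=2241, DeptName="Sales"),
    row(Name="George", EmpId=3401, DeptName="Finance"),
    row(Name="Harriet", EmpId=2202, DeptName="Sales"),
}

# Four tuples collapse to two, because the result is a set.
departments = project(employee, ["DeptName"])

# A generalized selection: atoms combined with "and".
finance = select(employee, lambda t: t["DeptName"] == "Finance" and t["EmpId"] > 3000)
```

Because the result of a projection is a set, restricting the four employees to `DeptName` leaves only the two distinct department names.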
Rename (ρ)
A rename is a unary operation written as <math>\rho_{a / b}(R)</math> where the result is identical to R except that the b field in all tuples is renamed to an a field. This is simply used to rename the attribute of a relation or the relation itself.
Joins and join-like operators
Natural join
Natural join (⋈) is a binary operator that is written as (R ⋈ S) where R and S are relations.(1) The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For an example consider the tables Employee and Dept and their natural join:
{| class="wikitable"
|+ Employee
|-
! Name !! EmpId !! DeptName
|-
| Harry || 3415 || Finance
|-
| Sally || 2241 || Sales
|-
| George || 3401 || Finance
|-
| Harriet || 2202 || Sales
|}
{| class="wikitable"
|+ Dept
|-
! DeptName !! Manager
|-
| Finance || George
|-
| Sales || Harriet
|-
| Production || Charles
|}
{| class="wikitable"
|+ Employee ⋈ Dept
|-
! Name !! EmpId !! DeptName !! Manager
|-
| Harry || 3415 || Finance || George
|-
| Sally || 2241 || Sales || Harriet
|-
| George || 3401 || Finance || George
|-
| Harriet || 2202 || Sales || Harriet
|}
This can also be used to define composition of relations. In category theory, the join is precisely the fiber product.

The natural join is arguably one of the most important operators since it is the relational counterpart of logical AND. Note carefully that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value. In particular, natural join allows the combination of relations that are associated by a foreign key. For example, in the above example a foreign key probably holds from Employee.DeptName to Dept.DeptName and then the natural join of Employee and Dept combines all employees with their departments. Note that this works because the foreign key holds between attributes with the same name. If this is not the case, such as in the foreign key from Dept.manager to Emp.emp-number, then we have to rename these columns before we take the natural join. Such a join is sometimes also referred to as an equijoin (see θ-join).

More formally the semantics of the natural join is defined as follows:
:<math>R \bowtie S = \{ t \cup s \mid t \in R \land s \in S \land \mathit{fun}(t \cup s) \}</math>
where fun is a predicate that is true for a binary relation r iff r is a functional binary relation. It is usually required that R and S must have at least one common attribute, but if this constraint is omitted and R and S have no common attributes, then the natural join becomes exactly the Cartesian product.

The natural join can be simulated with Codd's primitives as follows. Assume that b1,...,bm are the attribute names common to R and S, a1,...,an are the attribute names unique to R and c1,...,ck are the attribute names unique to S. Furthermore assume that the attribute names d1,...,dm are neither in R nor in S. In a first step we can now rename the common attribute names in S:
:<math>T = \rho_{d_1/b_1, \ldots, d_m/b_m}(S) = \rho_{d_1/b_1}(\rho_{d_2/b_2}(\ldots \rho_{d_m/b_m}(S) \ldots))</math>
Then we take the Cartesian product and select the tuples that are to be joined:
:<math>P = \sigma_{b_1=d_1, \ldots, b_m=d_m}(R \times T) = \sigma_{b_1=d_1}(\sigma_{b_2=d_2}(\ldots \sigma_{b_m=d_m}(R \times T) \ldots))</math>
Finally we take a projection to get rid of the renamed attributes:
:<math>U = \pi_{a_1, \ldots, a_n, b_1, \ldots, b_m, c_1, \ldots, c_k}(P)</math>
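The three steps (rename, product plus selection, projection) can be traced in a short Python sketch. It is an illustration under assumptions: relations are sets of `frozenset` tuples of (attribute, value) pairs, all helper names are hypothetical, and the fresh names d1,...,dm are formed by appending `_d`, which is assumed not to clash with any existing attribute.

```python
def row(**attrs):
    return frozenset(attrs.items())

def rename(relation, new, old):
    """rho_{new/old}: rename attribute old to new in every tuple."""
    return {frozenset((new if a == old else a, v) for a, v in t) for t in relation}

def product(r, s):
    """Flattened Cartesian product; headers are assumed to be disjoint."""
    return {t | u for t in r for u in s}

def select(relation, predicate):
    return {t for t in relation if predicate(dict(t))}

def project(relation, attr_names):
    keep = set(attr_names)
    return {frozenset((a, v) for a, v in t if a in keep) for t in relation}

def natural_join(r, s, common):
    d = {b: b + "_d" for b in common}   # fresh names d1..dm (assumed unused)
    t = s
    for b in common:                    # T = rho_{d1/b1,...,dm/bm}(S)
        t = rename(t, d[b], b)
    p = select(product(r, t),           # P = sigma_{b1=d1,...,bm=dm}(R x T)
               lambda x: all(x[b] == x[d[b]] for b in common))
    keep = {a for tup in p for a, _ in tup} - set(d.values())
    return project(p, keep)             # U = projection that drops d1..dm

employee = {
    row(Name="Harry", EmpId=3415, DeptName="Finance"),
    row(Name="Sally", EmpId=2241, DeptName="Sales"),
    row(Name="George", EmpId=3401, DeptName="Finance"),
    row(Name="Harriet", EmpId=2202, DeptName="Sales"),
}
dept = {
    row(DeptName="Finance", Manager="George"),
    row(DeptName="Sales", Manager="Harriet"),
    row(DeptName="Production", Manager="Charles"),
}

joined = natural_join(employee, dept, ["DeptName"])
```

With the Employee and Dept tables from the example, `joined` contains one tuple per employee, extended by the manager of the employee's department; the Production department matches no employee and does not appear.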
θ-join and equijoin
Consider tables Car and Boat which list models of cars and boats and their respective prices. Suppose a customer wants to buy a car and a boat, but she doesn't want to spend more money for the boat than for the car. The θ-join on the relation CarPrice ≥ BoatPrice produces a table with all the possible options.
{| class="wikitable"
|+ Car
|-
! CarModel !! CarPrice
|-
| CarA || 20'000
|-
| CarB || 30'000
|-
| CarC || 50'000
|}
{| class="wikitable"
|+ Boat
|-
! BoatModel !! BoatPrice
|-
| Boat1 || 10'000
|-
| Boat2 || 40'000
|-
| Boat3 || 60'000
|}
{| class="wikitable"
|+ <math>Car \bowtie_{CarPrice \ge BoatPrice} Boat</math>
|-
! CarModel !! CarPrice !! BoatModel !! BoatPrice
|-
| CarA || 20'000 || Boat1 || 10'000
|-
| CarB || 30'000 || Boat1 || 10'000
|-
| CarC || 50'000 || Boat1 || 10'000
|-
| CarC || 50'000 || Boat2 || 40'000
|}
If we want to combine tuples from two relations where the combination condition is not simply the equality of shared attributes then it is convenient to have a more general form of join operator, which is the θ-join (or theta-join). The θ-join is a binary operator that is written as <math>R \bowtie_{a\ \theta\ b} S</math> or <math>R \bowtie_{a\ \theta\ v} S</math> where a and b are attribute names, θ is a binary relation in the set {<, ≤, =, >, ≥}, v is a value constant, and R and S are relations. The result of this operation consists of all combinations of tuples in R and S that satisfy the relation θ. The result of the θ-join is defined only if the headers of S and R are disjoint, that is, do not contain a common attribute.

The simulation of this operation in the fundamental operations is therefore as follows:
:<math>R \bowtie_\theta S = \sigma_\theta(R \times S)</math>
In case the operator θ is the equality operator (=) then this join is also called an equijoin.

Note, however, that a computer language that supports the natural join and rename operators does not need θ-join as well, as this can be achieved by selection from the result of a natural join (which degenerates to Cartesian product when there are no shared attributes).
Semijoin
The semijoin is a join similar to the natural join and written as R ⋉ S where R and S are relations. The result of the semijoin is only the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names. For an example consider the tables Employee and Dept and their semijoin:
{| class="wikitable"
|+ Employee
|-
! Name !! EmpId !! DeptName
|-
| Harry || 3415 || Finance
|-
| Sally || 2241 || Sales
|-
| George || 3401 || Finance
|-
| Harriet || 2202 || Production
|}
{| class="wikitable"
|+ Dept
|-
! DeptName !! Manager
|-
| Sales || Harriet
|-
| Production || Charles
|}
{| class="wikitable"
|+ Employee ⋉ Dept
|-
! Name !! EmpId !! DeptName
|-
| Sally || 2241 || Sales
|-
| Harriet || 2202 || Production
|}
More formally the semantics of the semijoin is defined as follows:
:<math>R \ltimes S = \{ t : t \in R,\ s \in S,\ \mathit{fun}(t \cup s) \}</math>
where fun(r) is as in the definition of natural join.

The semijoin can be simulated using the natural join as follows. If a1, ..., an are the attribute names of R, then
:<math>R \ltimes S = \pi_{a_1, \ldots, a_n}(R \bowtie S)</math>
Since we can simulate the natural join with the basic operators it follows that this also holds for the semijoin.
Antijoin
The antijoin, written as R ▷ S where R and S are relations, is similar to the natural join, but the result of an antijoin is only those tuples in R for which there is NOT a tuple in S that is equal on their common attribute names. For an example consider the tables Employee and Dept and their antijoin:
{| class="wikitable"
|+ Employee
|-
! Name !! EmpId !! DeptName
|-
| Harry || 3415 || Finance
|-
| Sally || 2241 || Sales
|-
| George || 3401 || Finance
|-
| Harriet || 2202 || Production
|}
{| class="wikitable"
|+ Dept
|-
! DeptName !! Manager
|-
| Sales || Harriet
|-
| Production || Charles
|}
{| class="wikitable"
|+ Employee ▷ Dept
|-
! Name !! EmpId !! DeptName
|-
| Harry || 3415 || Finance
|-
| George || 3401 || Finance
|}
The antijoin is formally defined as follows:
:<math>R \triangleright S = \{ t : t \in R \land \lnot\exists s \in S : \mathit{fun}(t \cup s) \}</math>
or
:<math>R \triangleright S = \{ t : t \in R,</math> there is no tuple <math>s</math> of <math>S</math> that satisfies <math>\mathit{fun}(t \cup s) \}</math>
where fun(r) is as in the definition of natural join.

The antijoin can also be defined as the complement of the semijoin, as follows:
:<math>R \triangleright S = R - (R \ltimes S)</math>
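The semijoin and the antijoin can be sketched directly from their set-builder definitions. The Python below is a minimal illustration with hypothetical helper names; relations are sets of `frozenset` tuples of (attribute, value) pairs, and `matches` plays the role of fun(t ∪ s) by checking that the merged tuple assigns at most one value to each attribute name.

```python
def row(**attrs):
    return frozenset(attrs.items())

def matches(t, s):
    """fun(t union s): true when no attribute name is mapped to two different values."""
    merged = t | s
    return len(dict(merged)) == len(merged)

def semijoin(r, s):
    """Tuples of r that agree with at least one tuple of s on the common attributes."""
    return {t for t in r if any(matches(t, u) for u in s)}

def antijoin(r, s):
    """Complement of the semijoin within r."""
    return r - semijoin(r, s)

employee = {
    row(Name="Harry", EmpId=3415, DeptName="Finance"),
    row(Name="Sally", EmpId=2241, DeptName="Sales"),
    row(Name="George", EmpId=3401, DeptName="Finance"),
    row(Name="Harriet", EmpId=2202, DeptName="Production"),
}
dept = {
    row(DeptName="Sales", Manager="Harriet"),
    row(DeptName="Production", Manager="Charles"),
}

with_dept = semijoin(employee, dept)      # Sally and Harriet
without_dept = antijoin(employee, dept)   # Harry and George
```

Every employee lands in exactly one of the two results, which is the complement property stated above.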
Because the antijoin is the complement of the semijoin within R, it is sometimes called the anti-semijoin, and the antijoin operator is sometimes written as the semijoin symbol with a bar above it, instead of ▷.
Division
The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. For an example see the tables Completed, DBProject and their division:
{| class="wikitable"
|+ Completed
|-
! Student !! Task
|-
| Fred || Database1
|-
| Fred || Database2
|-
| Fred || Compiler1
|-
| Eugene || Database1
|-
| Eugene || Compiler1
|-
| Sara || Database1
|-
| Sara || Database2
|}
{| class="wikitable"
|+ DBProject
|-
! Task
|-
| Database1
|-
| Database2
|}
{| class="wikitable"
|+ Completed ÷ DBProject
|-
! Student
|-
| Fred
|-
| Sara
|}
If DBProject contains all the tasks of the Database project then the result of the division above contains exactly all the students that have completed the Database project.

More formally the semantics of the division is defined as follows:
:<math>R \div S = \{ t[a_1, \ldots, a_n] : t \in R \land \forall s \in S\ ( (t[a_1, \ldots, a_n] \cup s) \in R ) \}</math>
where {a1,...,an} is the set of attribute names unique to R and t[a1,...,an] is the restriction of t to this set. It is usually required that the attribute names in the header of S are a subset of those of R because otherwise the result of the operation will always be empty.

The simulation of the division with the basic operations is as follows. We assume that a1,...,an are the attribute names unique to R and b1,...,bm are the attribute names of S. In the first step we project R on its unique attribute names and construct all combinations with tuples in S:
:<math>T := \pi_{a_1, \ldots, a_n}(R) \times S</math>
In the prior example, T would represent a table such that every Student (because Student is the only attribute unique to the Completed table) is combined with every given Task. So Eugene, for instance, would have two rows in T, Eugene -> Database1 and Eugene -> Database2.

In the next step we subtract R from this relation:
:<math>U := T - R</math>
Note that in U we have the possible combinations that "could have" been in R, but weren't. So if we now take the projection on the attribute names unique to R then we have the restrictions of the tuples in R for which not all combinations with tuples in S were present in R:
:<math>V := \pi_{a_1, \ldots, a_n}(U)</math>
So what remains to be done is to take the projection of R on its unique attribute names and subtract those in V:
:<math>W := \pi_{a_1, \ldots, a_n}(R) - V</math>
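The four steps T, U, V and W translate almost literally into code. The following Python sketch is illustrative only; relations are modelled as sets of `frozenset` tuples of (attribute, value) pairs, the helper names are hypothetical, and the caller supplies the attribute names unique to R.

```python
def row(**attrs):
    return frozenset(attrs.items())

def project(relation, attr_names):
    keep = set(attr_names)
    return {frozenset((a, v) for a, v in t if a in keep) for t in relation}

def product(r, s):
    """Flattened Cartesian product; headers are assumed to be disjoint."""
    return {t | u for t in r for u in s}

def divide(r, s, unique_to_r):
    t = product(project(r, unique_to_r), s)   # T: every candidate with every tuple of S
    u = t - r                                 # U: combinations missing from R
    v = project(u, unique_to_r)               # V: candidates that miss at least one
    return project(r, unique_to_r) - v        # W: candidates that miss none

completed = {
    row(Student="Fred", Task="Database1"),
    row(Student="Fred", Task="Database2"),
    row(Student="Fred", Task="Compiler1"),
    row(Student="Eugene", Task="Database1"),
    row(Student="Eugene", Task="Compiler1"),
    row(Student="Sara", Task="Database1"),
    row(Student="Sara", Task="Database2"),
}
db_project = {row(Task="Database1"), row(Task="Database2")}

finished = divide(completed, db_project, ["Student"])
```

For the Completed and DBProject tables, U contains only the pair Eugene and Database2, so V holds Eugene and the final result holds Fred and Sara.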
Outer joins
Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the operands by "fill" values for each of the attributes of the other operand.

The operators defined in this section assume the existence of a null value, ω, which we do not define, to be used for the fill values. It should not be assumed that this is the NULL defined for the database language SQL, nor should it be assumed that ω is a mark rather than a value, nor should it be assumed that the controversial three-valued logic is introduced by it.

Three outer join operators are defined: left outer join, right outer join, and full outer join. (The word "outer" is sometimes omitted.)
Left outer join
The left outer join is written as R =X S where R and S are relations. The result of the left outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition (loosely speaking) to tuples in R that have no matching tuples in S.

For an example consider the tables Employee and Dept and their left outer join:
{| class="wikitable"
|+ Employee
|-
! Name !! EmpId !! DeptName
|-
| Harry || 3415 || Finance
|-
| Sally || 2241 || Sales
|-
| George || 3401 || Finance
|-
| Harriet || 2202 || Sales
|-
| Tim || 1123 || Executive
|}
{| class="wikitable"
|+ Dept
|-
! DeptName !! Manager
|-
| Sales || Harriet
|-
| Production || Charles
|}
{| class="wikitable"
|+ Employee =X Dept
|-
! Name !! EmpId !! DeptName !! Manager
|-
| Harry || 3415 || Finance || ω
|-
| Sally || 2241 || Sales || Harriet
|-
| George || 3401 || Finance || ω
|-
| Harriet || 2202 || Sales || Harriet
|-
| Tim || 1123 || Executive || ω
|}
In the resulting relation, tuples of R that have no matching tuple in S on the common attribute names are extended with the null value, ω, for the attributes unique to S. Since there are no tuples in Dept with a DeptName of Finance or Executive, ω occurs in the Manager attribute of the resulting tuples whose DeptName is Finance or Executive.

Let r1, r2, ..., rn be the attributes of the relation R and let {(ω, ..., ω)} be the singleton relation on the attributes that are unique to the relation S (those that are not attributes of R). Then the left outer join can be described in terms of the natural join (and hence using basic operators) as follows:
:<math>(R \bowtie S) \cup ((R - \pi_{r_1, r_2, \ldots, r_n}(R \bowtie S)) \times \{(\omega, \ldots, \omega)\})</math>
Right outer join
The right outer join behaves almost identically to the left outer join, but the roles of the tables are switched.

The right outer join of relations R and S is written as R X= S. The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R.

For an example consider the tables Employee and Dept and their right outer join:
{| class="wikitable"
|+ Employee
|-
! Name !! EmpId !! DeptName
|-
| Harry || 3415 || Finance
|-
| Sally || 2241 || Sales
|-
| George || 3401 || Finance
|-
| Harriet || 2202 || Sales
|-
| Tim || 1123 || Executive
|}
{| class="wikitable"
|+ Dept
|-
! DeptName !! Manager
|-
| Sales || Harriet
|-
| Production || Charles
|}
{| class="wikitable"
|+ Employee X= Dept
|-
! Name !! EmpId !! DeptName !! Manager
|-
| Sally || 2241 || Sales || Harriet
|-
| Harriet || 2202 || Sales || Harriet
|-
| ω || ω || Production || Charles
|}
In the resulting relation, tuples of S that have no matching tuple in R on the common attribute names are extended with the null value, ω, for the attributes unique to R. Since there are no tuples in Employee with a DeptName of Production, ω occurs in the Name and EmpId attributes of the resulting tuple whose DeptName is Production.

Let s1, s2, ..., sn be the attributes of the relation S and let {(ω, ..., ω)} be the singleton relation on the attributes that are unique to the relation R (those that are not attributes of S). Then, as with the left outer join, the right outer join can be simulated using the natural join as follows:
:<math>(R \bowtie S) \cup ((S - \pi_{s_1, s_2, \ldots, s_n}(R \bowtie S)) \times \{(\omega, \ldots, \omega)\})</math>
Outer join
The outer join or full outer join in effect combines the results of the left and right outer joins.

The full outer join is written as R =X= S where R and S are relations. The result of the full outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R and tuples in R that have no matching tuples in S in their common attribute names.

For an example consider the tables Employee and Dept and their full outer join:
{| class="wikitable"
|+ Employee
|-
! Name !! EmpId !! DeptName
|-
| Harry || 3415 || Finance
|-
| Sally || 2241 || Sales
|-
| George || 3401 || Finance
|-
| Harriet || 2202 || Sales
|-
| Tim || 1123 || Executive
|}
{| class="wikitable"
|+ Dept
|-
! DeptName !! Manager
|-
| Sales || Harriet
|-
| Production || Charles
|}
{| class="wikitable"
|+ Employee =X= Dept
|-
! Name !! EmpId !! DeptName !! Manager
|-
| Harry || 3415 || Finance || ω
|-
| Sally || 2241 || Sales || Harriet
|-
| George || 3401 || Finance || ω
|-
| Harriet || 2202 || Sales || Harriet
|-
| Tim || 1123 || Executive || ω
|-
| ω || ω || Production || Charles
|}
In the resulting relation, tuples of R that have no matching tuple in S on the common attribute names are extended with the null value, ω, for the attributes unique to S. Likewise, tuples of S that have no matching tuple in R are extended with ω for the attributes unique to R.

The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as follows:
:R =X= S = (R =X S) ∪ (R X= S)
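The simulation of the three outer joins by natural join, difference, product and union can be sketched as follows. This is a hedged illustration, not a definitive implementation: relations are sets of `frozenset` tuples of (attribute, value) pairs, the string `"ω"` stands in for the undefined null value, and all function names and parameters are hypothetical. The headers are passed explicitly because this representation cannot recover the header of an empty relation.

```python
OMEGA = "ω"  # stand-in for the fill value; its nature is deliberately left undefined

def row(**attrs):
    return frozenset(attrs.items())

def project(relation, attr_names):
    keep = set(attr_names)
    return {frozenset((a, v) for a, v in t if a in keep) for t in relation}

def natural_join(r, s):
    """Merge tuples that agree on all common attribute names."""
    return {t | u for t in r for u in s if len(dict(t | u)) == len(t | u)}

def left_outer_join(r, s, r_attrs, s_only):
    joined = natural_join(r, s)
    unmatched = r - project(joined, r_attrs)         # R - pi_{r1..rn}(R join S)
    fill = frozenset((a, OMEGA) for a in s_only)     # the single tuple (ω, ..., ω)
    return joined | {t | fill for t in unmatched}    # union with unmatched x {(ω, ..., ω)}

def right_outer_join(r, s, s_attrs, r_only):
    # Same construction with the roles of the operands switched.
    return left_outer_join(s, r, s_attrs, r_only)

def full_outer_join(r, s, r_attrs, s_attrs):
    r_only = set(r_attrs) - set(s_attrs)
    s_only = set(s_attrs) - set(r_attrs)
    return (left_outer_join(r, s, r_attrs, s_only)
            | right_outer_join(r, s, s_attrs, r_only))

employee = {
    row(Name="Harry", EmpId=3415, DeptName="Finance"),
    row(Name="Sally", EmpId=2241, DeptName="Sales"),
    row(Name="George", EmpId=3401, DeptName="Finance"),
    row(Name="Harriet", EmpId=2202, DeptName="Sales"),
    row(Name="Tim", EmpId=1123, DeptName="Executive"),
}
dept = {
    row(DeptName="Sales", Manager="Harriet"),
    row(DeptName="Production", Manager="Charles"),
}
emp_attrs = ["Name", "EmpId", "DeptName"]
dept_attrs = ["DeptName", "Manager"]

left = left_outer_join(employee, dept, emp_attrs, ["Manager"])
right = right_outer_join(employee, dept, dept_attrs, ["Name", "EmpId"])
full = full_outer_join(employee, dept, emp_attrs, dept_attrs)
```

On the example tables, `left`, `right` and `full` contain five, three and six tuples respectively, matching the three result tables above.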
Operations for domain computations
The aggregation operation
There are five aggregate functions that are included with most databases. These operations are Sum, Count, Average, Maximum and Minimum. In relational algebra the aggregation operation is written as follows:
:<math>G_1, G_2, G_3 \ g_{f_1(A_1),\, f_2(A_2),\, \ldots}\ (r)</math>
The attributes preceding the g are grouping attributes, which function like a "group by" clause in SQL. Then there are an arbitrary number of aggregation functions applied to individual attributes. The operation is applied to an arbitrary relation r. The grouping attributes are optional, and if they are not supplied, the aggregation functions are applied across the entire relation to which the operation is applied.

Let's assume that we have a table named Account with three columns, namely Account_Number, Branch_Name and Balance. We wish to find the maximum balance of each branch. This is accomplished by <math>\mathrm{Branch\_Name}\ g_{\mathrm{Max}(\mathrm{Balance})}(\mathrm{Account})</math>. To find the highest balance of all accounts regardless of branch, we could simply write <math>g_{\mathrm{Max}(\mathrm{Balance})}(\mathrm{Account})</math>.
The extend operation
The extend operation is used for computed attributes. For example, if you have an attribute that contains the price of some item and another attribute that contains the quantity of that item, you can form a computed attribute: EXTEND (price * quantity) AS Total_Price.
Limitation of relational algebra
Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on relations which cannot be expressed by relational algebra. The transitive closure of a binary relation is one of them.

Given a domain D, let binary relation R be a subset of D × D. The transitive closure <math>R^+</math> of R is the smallest subset of D × D containing R which satisfies the following condition:
:<math>\forall x\, \forall y\, \forall z\, \bigl( (x,y) \in R^+ \land (y,z) \in R^+ \Rightarrow (x,z) \in R^+ \bigr)</math>
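The condition can be read as a recipe: keep adding the pairs it demands until nothing new appears. The Python sketch below illustrates this with hypothetical function names, modelling a binary relation as a set of pairs. Each pass of the loop uses only a composition step of the kind a join and a projection could express; it is the unbounded repetition until a fixpoint that no single relational algebra expression provides.

```python
def compose(r, s):
    """All (x, z) such that (x, y) is in r and (y, z) is in s for some y."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def transitive_closure(r):
    """Smallest superset of r that is closed under the transitivity condition."""
    closure = set(r)
    while True:
        new_pairs = compose(closure, closure) - closure
        if not new_pairs:          # fixpoint reached: the condition now holds
            return closure
        closure |= new_pairs

chain = {(1, 2), (2, 3), (3, 4)}
closed = transitive_closure(chain)
```

For the three-edge chain, the closure adds the pairs (1, 3), (2, 4) and (1, 4); a longer chain needs more passes, which is why a fixed expression cannot cover every input.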
It can be proven that there is no relational algebra expression E(R) taking R as a variable argument which produces <math>R^+</math>. The proof is based on the fact that, given a relational expression E for which it is claimed that <math>E(R) = R^+</math>, where R is a variable, we can always find an instance r of R (and a corresponding domain d) such that <math>E(r) \ne r^+</math>.(2)
Use of algebraic properties for query optimization
Queries can be represented as a tree, where
- the internal nodes are operators,
- leaves are relations,
- subtrees are subexpressions.
Our primary goal is to transform expression trees into equivalent expression trees, where the average size of the relations yielded by subexpressions in the tree is smaller than it was before the optimization. Our secondary goal is to try to form common subexpressions within a single query, or, if more than one query is being evaluated at the same time, across all of those queries. The rationale behind that second goal is that it is enough to compute common subexpressions once, and the results can be used in all queries that contain that subexpression.

Here we present a set of rules that can be used in such transformations.
Selection
Rules about selection operators play the most important role in query optimization. Selection is an operator that very effectively decreases the number of rows in its operand, so if we manage to move the selections in an expression tree towards the leaves, the internal relations (yielded by subexpressions) will likely shrink.
Basic selection properties
Selection is idempotent (multiple applications of the same selection have no additional effect beyond the first one), and commutative (the order selections are applied in has no effect on the eventual result).
:<math>\sigma_A(R) = \sigma_A(\sigma_A(R))</math>
:<math>\sigma_A(\sigma_B(R)) = \sigma_B(\sigma_A(R))</math>
Breaking up selections with complex conditions
A selection whose condition is a conjunction of simpler conditions is equivalent to a sequence of selections with those same individual conditions, and selection whose condition is a disjunction is equivalent to a union of selections. These identities can be used to merge selections so that fewer selections need to be evaluated, or to split them so that the component selections may be moved or optimized separately.
:<math>\sigma_{A \land B}(R) = \sigma_A(\sigma_B(R)) = \sigma_B(\sigma_A(R))</math>
:<math>\sigma_{A \lor B}(R) = \sigma_A(R) \cup \sigma_B(R)</math>
Selection and cross product
Cross product is the costliest operator to evaluate. If the input relations have N and M rows, the result will contain NM rows. Therefore it is very important to do our best to decrease the size of both operands before applying the cross product operator.

This can be effectively done if the cross product is followed by a selection operator, e.g. <math>\sigma_A(R \times P)</math>. Considering the definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try to push down a selection from higher levels of the expression tree using the other selection rules.

In the above case we break up condition A into conditions B, C and D using the split rules about complex selection conditions, so that <math>A = B \land C \land D</math> and B only contains attributes from R, C contains attributes only from P and D contains the part of A that contains attributes from both R and P. Note that B, C or D are possibly empty. Then the following holds:
:<math>\sigma_A(R \times P) = \sigma_{B \land C \land D}(R \times P) = \sigma_D(\sigma_B(R) \times \sigma_C(P))</math>
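The effect of this rule on intermediate result sizes can be checked on a small example. The Python sketch below is illustrative: relations are sets of `frozenset` tuples of (attribute, value) pairs, and the helper names, the sample data and the three conditions B, C and D are assumptions chosen for the demonstration.

```python
def row(**attrs):
    return frozenset(attrs.items())

def select(relation, predicate):
    return {t for t in relation if predicate(dict(t))}

def product(r, s):
    """Flattened Cartesian product; headers are assumed to be disjoint."""
    return {t | u for t in r for u in s}

car = {row(CarModel="CarA", CarPrice=20000),
       row(CarModel="CarB", CarPrice=30000),
       row(CarModel="CarC", CarPrice=50000)}
boat = {row(BoatModel="Boat1", BoatPrice=10000),
        row(BoatModel="Boat2", BoatPrice=40000),
        row(BoatModel="Boat3", BoatPrice=60000)}

def cond_b(t):  # B: mentions only attributes of Car
    return t["CarPrice"] >= 30000

def cond_c(t):  # C: mentions only attributes of Boat
    return t["BoatPrice"] <= 40000

def cond_d(t):  # D: mentions attributes of both operands
    return t["CarPrice"] >= t["BoatPrice"]

# sigma_{B and C and D}(Car x Boat): the selection sees the full product.
full_product = product(car, boat)
naive = select(full_product, lambda t: cond_b(t) and cond_c(t) and cond_d(t))

# sigma_D(sigma_B(Car) x sigma_C(Boat)): B and C are pushed below the product.
small_product = product(select(car, cond_b), select(boat, cond_c))
pushed = select(small_product, cond_d)
```

Both plans return the same three tuples, but the pushed-down plan builds a product of four tuples instead of nine.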
Selection and set operators
Selection is distributive over the set difference, intersection, and union operators. The following three rules are used to push selection below set operations in the expression tree. Note that for the set difference and the intersection operators it is possible to apply the selection operator to only one of the operands after the transformation. This can make sense in cases where one of the operands is small, and the overhead of evaluating the selection operator outweighs the benefits of using a smaller relation as an operand.
:<math>\sigma_A(R \setminus P) = \sigma_A(R) \setminus \sigma_A(P) = \sigma_A(R) \setminus P</math>
:<math>\sigma_A(R \cup P) = \sigma_A(R) \cup \sigma_A(P)</math>
:<math>\sigma_A(R \cap P) = \sigma_A(R) \cap \sigma_A(P) = \sigma_A(R) \cap P = R \cap \sigma_A(P)</math>
Selection and projection
Selection commutes with projection if and only if the fields referenced in the selection condition are a subset of the fields in the projection. Performing selection before projection may be useful if the operand is a cross product or join. In other cases, if the selection condition is relatively expensive to compute, moving selection outside the projection may reduce the number of tuples which must be tested (since projection may produce fewer tuples due to the elimination of duplicates resulting from elided fields).
:<math>\pi_{a_1, \ldots, a_n}(\sigma_A(R)) = \sigma_A(\pi_{a_1, \ldots, a_n}(R)) \quad \text{where fields in } A \subseteq \{a_1, \ldots, a_n\}</math>
Projection
Basic projection properties
Projection is idempotent, so that a series of (valid) projections is equivalent to the outermost projection.
:<math>\pi_{a_1, \ldots, a_n}(\pi_{b_1, \ldots, b_m}(R)) = \pi_{a_1, \ldots, a_n}(R) \quad \text{where } \{a_1, \ldots, a_n\} \subseteq \{b_1, \ldots, b_m\}</math>
Projection and set operators
Projection is distributive over set union:
:<math>\pi_{a_1, \ldots, a_n}(R \cup P) = \pi_{a_1, \ldots, a_n}(R) \cup \pi_{a_1, \ldots, a_n}(P)</math>
Projection does not, in general, distribute over set difference or intersection, because two distinct tuples may agree on the projected attributes. Only the following inclusions hold:
:<math>\pi_{a_1, \ldots, a_n}(R) \setminus \pi_{a_1, \ldots, a_n}(P) \subseteq \pi_{a_1, \ldots, a_n}(R \setminus P)</math>
:<math>\pi_{a_1, \ldots, a_n}(R \cap P) \subseteq \pi_{a_1, \ldots, a_n}(R) \cap \pi_{a_1, \ldots, a_n}(P)</math>
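Whether projection can be moved through a set operator is easy to test on a two-tuple example. The Python sketch below (hypothetical helper names, relations as sets of `frozenset` tuples of (attribute, value) pairs) shows that under set semantics the rule is safe for union, while for difference and intersection the two sides can differ when distinct tuples agree on the projected attributes.

```python
def row(**attrs):
    return frozenset(attrs.items())

def project(relation, attr_names):
    keep = set(attr_names)
    return {frozenset((a, v) for a, v in t if a in keep) for t in relation}

# Two union-compatible relations whose tuples agree on a but differ on b.
r = {row(a=1, b=1)}
p = {row(a=1, b=2)}

union_lhs = project(r | p, ["a"])
union_rhs = project(r, ["a"]) | project(p, ["a"])

diff_lhs = project(r - p, ["a"])                       # contains the tuple a=1
diff_rhs = project(r, ["a"]) - project(p, ["a"])       # empty

inter_lhs = project(r & p, ["a"])                      # empty
inter_rhs = project(r, ["a"]) & project(p, ["a"])      # contains the tuple a=1
```

An optimizer that pushed a projection below a difference or an intersection without further conditions would therefore risk changing the query result.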
Rename
Basic rename properties
Successive renames of a variable can be collapsed into a single rename. Rename operations which have no variables in common can be arbitrarily reordered with respect to one another, which can be exploited to make successive renames adjacent so that they can be collapsed.
:<math>\rho_{a / b}(\rho_{b / c}(R)) = \rho_{a / c}(R)</math>
:<math>\rho_{a / b}(\rho_{c / d}(R)) = \rho_{c / d}(\rho_{a / b}(R))</math>
Rename and set operators
Rename is distributive over set difference, union, and intersection.
:<math>\rho_{a / b}(R \setminus P) = \rho_{a / b}(R) \setminus \rho_{a / b}(P)</math>
:<math>\rho_{a / b}(R \cup P) = \rho_{a / b}(R) \cup \rho_{a / b}(P)</math>
:<math>\rho_{a / b}(R \cap P) = \rho_{a / b}(R) \cap \rho_{a / b}(P)</math>
Implementations
The first query language to be based on Codd's algebra was ISBL, and this pioneering work has been acclaimed by many authorities as having shown the way to make Codd's idea into a useful language. Business System 12 was a short-lived industry-strength relational DBMS that followed the ISBL example. In 1998 Chris Date and Hugh Darwen proposed a language called Tutorial D intended for use in teaching relational database theory, and its query language also draws on ISBL's ideas. Rel is an implementation of Tutorial D. Even the query language of SQL is loosely based on a relational algebra, though the operands in SQL (tables) are not exactly relations and several useful theorems about the relational algebra do not hold in the SQL counterpart (arguably to the detriment of optimisers and/or users).
References
- (1) In Unicode, the bowtie symbol is ⋈ (U+22C8).
- (2) Alfred V. Aho, Jeffrey D. Ullman (1979). "Universality of data retrieval languages". Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pp. 110–119.
- time: 9:24pm EDT - Thu, Mar 18 2010