Showing posts with label Databse Management Systems. Show all posts
Showing posts with label Databse Management Systems. Show all posts

Monday, 7 December 2020

Normalization: Pros and Cons and De-Normalization

 

Advantages of Normalization:

 

The database design refined through the process of normalization renders many advantages as compared to that without normalization. They are as follows.

 

  1. Simple, understandable grammar of attributes:

 

The attributes of the relations are grouped in such a way that their meaning and their relation with each other is a very clear, simple and unambiguous.

 

  1. Reduced  repetition of information:

 

The data redundancy is reduced which leads to many advantages of reducing the database updation anomalies namely insertion, updation and deletion anomalies.

 

  1. Dependency preservation:

 

As all the attributes which are fully functionally dependent on the primary key are grouped together in a relation, the system overhead for checking each database update is reduced.

 

  1.  Avoiding spurious tuples:

 

If the database is not properly normalized, the join operation on the relation may result in some unambiguous tuples or even loss of some information. This is strictly avoided here by creating primary key- foreign key relationship every time the relation is decomposed.

 

 

 Disadvantages of Normalization:

 

  1. The extreme decomposition of relation may result in violation of dependency preservation.
  2. Decomposition of the relation after certain stage increases overhead on the DBMS. The DBMS has to keep track of all the relations in the database. The primary key, foreign key, relations between them, calculations of the join operation, indices, etc.
  3. There is no consideration of null values in the whole process of normalization. The null values in relations create dangling tuple, which may or may not appear in a join.

 

 

De - Normalization:

 

·         Once a set of database structure or tables has been modified in analysis, a process of de-normalization occurs in design.

·         De-normalization is the process of structuring the set of tables so that its performance is most efficient in the production environment.

·         A normalized set of tables is in the most efficient configuration logically, while a de-normalized set of tables is in most efficient form physically.

·         There are several performance considerations that affect database design.

·         These include database table access or update frequency, table size, index requirement, and security requirements.

 

 

 

 

 

Normalization: BCNF, 4NF and 5NF

 

   Boyce - Codd Normal Form (BCNF):

 

A relation is in Boyce Codd normal form if every attribute (for every set of attributes) on which some other attributes is fully functionally dependent is also a candidate key for a primary key of the relation (i.e. Candidate key).

 

Consider the following relation:

BCNF example



The candidate for primary key are:

 

·         Member + Sport

·         Member + Coach

 

 because either of these uniquely identifies the row.

 

The attribute Coach is not a candidate for a primary key even though the attribute Sport is fully functional dependent on it, since a Coach is associated with only one Sport (assumption a coach is an expert in one sport, though he can coach other sports also).

 

Since the relation has an attribute (Coach) on which some other attribute is fully functionally dependent, but it is not a candidate for the primary key, the above relation is not fully normalized. The above relation will result in the following problems:

 

If Mr Rajesh Iyer withdraws from the membership, information about coach Rane is also lost.

 

If coach Kurian’s name is mis-spelt and has to be corrected to Kurien, the update needs to be done in all the rows.

 

A better implementation of the situation can be made through the following relations.

BCNF implementation

By this process we have reduced the 3NF to BCNF by ensuring that they are in 3NF for every candidate key.


Ex 2:

BCNF ex2

The relations can be further decomposed to 4NF and 5NF if still there exists multivalued dependencies. However, this decomposition may lead to increased overhead on the system like maintaining the complex relationships between the tables, indexing mechanism, etc.


 Fourth Normal Form (4NF):

 

A relation is in the 4NF, if it is in the BCNF and does not have any independently multi-valued components of the primary key.

 

Assume that the Club keeps records of its members. For each member it keeps a list of sports that the member is most interested in and a list of most striking characteristics of each member which are inherent in the member and are in no way dependent on the sport.

 

Ex 1:


4NF example


According to the above structure, if another interest for example Cycling  is added for Mr Rao then, 2 more additional rows will be added - one for High Stamina  and another for Poor Speed. Thus, the above relation is normalized further to get 2 relations.

4NF implementation

Thus, we have reduced BCNF entity relation into 4NF by splitting into two relations the independently multivalued component of the primary key.


 Ex 2:

4NF ex2



Fifth Normal Form (5NF):

 

It is also called as Project Join Normal Form (PJNF).

If any relation has functional, multivalued and join dependencies then it can be decomposed into 5NF. For ex: consider the Loan relation with the following attributes.

 

(loan_no, branch_no, customer_no, amount)

 

This relation can be defined as a set of all tables on the loan relation such that:

 

·         The loan represented by loan_no   is made by the branch represented by the branch_no.

 

·         The loan represented by the loan_no is made to the customer represented by the customer_no.

 

·         The loan represented by loan_no is in the amount represented by amount.

 

Thus, the definition of the above relation is in the conjunction of 3 predicates with join dependencies:

 

·         on loan_no and branch_no.

·         on loan_no and customer_no.

·         on  loan_no and amount.

 

To put the Loan relation into PJNF, we must decompose it into 3 relations specified by the join dependency:

 

·         Loan_Branch (loan_no, branch_no)

·         Loan_Customer (loan_no, customer_no)

·         Loan_Amount (loan_no, amount)



Normalization in DBMS: 1NF, 2NF and 3NF

 

Normalization:

 Normalization is a process of decomposing the relational schemas and breaking them into smaller relational schemas that possesses desirable properties.


1.    First Normal Form (1NF):

 

The definition of 1NF says that all the attributes in a relation must be atomic i.e., simple and indivisible values from their respective domains. Multivalued attributes, composite attributes, nested relations are split such that they can be made atomic. Nested relations are the relations within relations.


1NF example


The above relation contains a relation ‘Project’ which must be split into two relations as:

 

Emp (EmpNo, EmpName, EmpCity)

EmpProject (EmpNo, ProjectNo, ProjectName, TimeAllotted)

 

where, primary key EmpNo  of Emp  is combined in the primary key of EmpProject as {EmpNo, ProjectNo}.

 

Multivalued attributes:

 


Multivalued attributes example


Here, Qty_Per_Pack is a multivalued attribute. It should be split as;

 

Drug (DrugNo, Name, Rate, Tablet_Syrup, Unit, Packing)

Drug_Pack (DrugNo, Qty_Per_Pack)

 

2.   Second Normal Form (2NF):

 

Relation is said to be in 2NF if it is in 1NF and every non key attribute of the relation is fully functionally dependent on the key attribute. By fully functional dependency, we mean that if any attribute from the key attributes is removed, the dependency is not preserved.

 

Consider 1NF of ex:

 

Emp (EmpNo, EmpName, EmpCity)

EmpProject (EmpNo, ProjectNo, ProjectName, TimeAllotted)

 

Observe that EmpNo + ProjectNo is a key in EmpProject.

 

It means that (EmpNo, ProjectNo) → ProjectName

 

                         (EmpNo, ProjectNo) → Hours

 

should hold as fully functional dependency.

 

In case of (EmpNo, ProjectNo)→ TimeAllotted

 

if we remove any attributes from the key,

 

EmpNo → TimeAllotted or

ProjectNo → TimeAllotted

 

the functional dependency holds. But, in case of

(EmpNo, ProjectNo) → ProjectName,

observe that ProjectNo→ ProjectName  holds

but EmpNo → ProjectName  does not hold.

 

So, ProjectName is partially dependent on the key. In transformation of 2NF, such dependencies are split to form a new relation. The 2NF of above will be;

 

Project (ProjectNo, ProjectName)

Emp_Project (EmpNo, ProjectNo, TimeAllotted)

 

In this way, the partial dependencies, if any, are removed in the transformation from 1NF to 2NF.

 

 

3.   Third Normal Form (3NF):

 

Third normal form is applied when the relations are in 2NF and there is any non-key attribute that depends transitively on the primary key.

 

The attributes are said to be transitively dependent if PK → A and A → B.

 

An attribute A is dependent on the attribute PK (primary key) and B is dependent on A, implies that B is transitively dependent on PK => PK → B.


3NF example

Here, EmpNo → (EmpName, AreaMarket, Vehicle, Vehicle_Allowance)

But, EmpNo → Vehicle

And Vehicle → Vehicle_Allowance

So, EmpNo → Vehicle_Allowance

 

That means Vehicle_Allowance  is transitively dependent on Emp_No. To transform the relations to 3NF, the transitive dependencies if any, are removed by decomposing the relations and referring them by a primary key attribute of new relation. The 3NF will be:

 

Emp (EmpNo, EmpName, AreaMarket, Vehicle_No)

Vehicle (Vehicle_No, Vehicle_Name, Allowance)


DBMS Inference Rules

 

Inference rules in DBMS:

 

  1. Reflexivity rule: If 𝛂 is a set of attributes and 𝛃⊆𝛂, then 𝛂𝛃 holds. This rule states that a set of attributes always determines itself or any of its subsets. The functional dependency 𝛂𝛃 proven by the reflexivity rule is called a trivial functional dependency.
  2. Augmentation rule: If 𝛂𝛃 holds and 𝝲 is a set of attributes then 𝝲𝛂𝝲𝛃 holds. This rule says that adding the same set of attributes to both the left and right hand sides of a dependency results in another valid dependency.
  3. Transitivity rule: If 𝛂𝛃 holds and 𝛃𝝲 then 𝛂𝝲 holds. This rule states that functional dependencies are transitive.
  4. Union rule: If 𝛂 𝛃 holds and 𝛂𝝲 holds then 𝛂𝛃𝝲 holds. This rule allows us to combine a set of functional dependency {X → A1, X → A2… X → An }  into a single functional dependency X → {A1, A2, …, An}.
  5. Pseudo transitivity rule: If 𝛂𝛃 and 𝝲𝛃𝝳 holds, then 𝛂𝛃𝝳 holds.
  6. Decomposition rule: If 𝛂𝛃𝝲 holds then 𝛂𝛃 holds and 𝛂𝝲 holds.

 

Functional dependency is a constraint between two sets of attributes A and B belonging to some relation R, such that for any tuples t1 and t2, if t1[A]=t2[A],then we must also have t1[B]=t2[B] i.e. for a given value of A, there must be only one value of B.

 

The functional dependency is denoted as A → B i.e.  B is functionally dependent on A.

 

For ex:

 

Let       A= ItemCd

            B= { IName, Op_stock, Cur_stock, Rate }

 

For a given value of ItemCd, there is only one value of all the attributes in B or even a single attribute of B.

So we can say that

ItemCd→ { IName, Op_stock, Cur_stock, Rate }

 

and even

 

ItemCd→ IName or

ItemCd→ Op_stock and so on.

Functional Independence and Functional Dependencies in DBMS

 

Relational database schema consists of number of relations with number of attributes of various entities in the organization. However, the method of grouping these attributes in relations plays a very important role in whole database life. The quality of relational database schema depends on many properties given below.

 

1.    The attributes in a relation must be related to each other in a simple and understandable way.

2.    The duplicate values i.e. the redundant values in tuples must be reduced.

3.    The null values in tuples should be avoided.

4.    There should not be any possibility of any kind of unambiguous tuples being created by any type of operation on the database.

 

To make the database fulfil above conditions, the process of normalization is performed on the attributes. Normalization is a process of decomposing the relational schemas and breaking them into smaller relational schemas that possesses desirable properties.

 

 

Functional Independence:

 

Two fields are said to be functional independent if they are not fully dependent on the complete primary key.

 

For example: po_no, rate, item

 

Here rate is not dependent on the primary key. So, the rate is functionally independent.

 

 

 Functional Dependencies:

 

Functional dependencies are constraints on the set of legal relations. The notion of functional dependency is the generalization of the notion of a key. It is the property of the semantics or meaning of the attributes.

 

Let R be a relational schema and let (A1, A2… An) be the attributes of R. The functional dependency, denoted by X → Y, between two set of attributes X and Y (X R and Y R) that are subsets of R specifies a constraint on the possible tuples that can form a relation state ‘r’ of R.  The constraint is that, for any two tuples t1 and t2 in ‘r’, that have t1[X] =t2[X], we must also have t1[Y] =t2[Y].

 

This means that the values of the Y component of a tuple in ‘r’ depends on (or determined by) the values of the X component. Alternatively, the values of the X component of a tuple uniquely (or functionally) determine the values of the Y component.

 

So, X functionally determines Y in a relation schema R if and only if, whenever two tuples of r(R) of agree on their X value, they must necessarily agree on their Y value.

 

Note:

  1. If X is a candidate key of R, then X → Y for any subset of attributes of Y of R.
  2. If X → Y in R, this does not say whether or not Y → X in R.

 

Dependencies in DBMS

 

Dependencies in DBMS:

A dependency in DBMS is a relation between two or more attributes. It has the following types in DBMS:

  1. Functional Dependency
  2. Fully-Functional Dependency
  3. Transitive Dependency
  4. Multivalued Dependency
  5. Partial Dependency

 

1.   Functional Dependency:

If the information stored in a table can uniquely determine another information in the same table, then it is called Functional Dependency. Consider it as an association between two attributes of the same relation.

If P functionally determines Q, then P -> Q.

Ex:

EmpID

EmpName

EmpAge

E101

Ajit

30

E102

Rohit

32

 

In the above table, as you can see that, EmpName is functionally dependent on EmpID because EmpName can take only one value for the given value of EmpID. So,

EmpID ->EmpName

Thus, Employee Name (EmpName) is functionally dependent on Employee ID (EmpID).

 

2.   Fully-functionally Dependency:

An attribute is fully functional dependent on another attribute, if it is Functionally Dependent on that attribute and not on any of its proper subset.

For ex:, an attribute Q is fully functional dependent on another attribute P, if it is Functionally Dependent on P and not on any of the proper subset of P.

Ex: Consider the following ProjectCost and EmployeeProject relations.

ProjectID

ProjectCost

001

1000

002

5000

 

 

EmpID

ProjectID

Days (spent on the project)

E101

P01

220

E102

P02

150


The above relations states:

EmpID, ProjectID, ProjectCost -> Days

However, it is not fully functional dependent.

Whereas the subset {EmpID, ProjectID} can easily determine the {Days} spent on the project by the employee.

This summarizes and gives our fully functional dependency −

{EmpID, ProjectID}  -> (Days)



3.   Transitive Dependency

When an indirect relationship causes functional dependency it is called Transitive Dependency.

If P -> Q and Q -> R is true, then P-> R is a transitive dependency.

 

4.   Multivalued Dependency

When existence of one or more rows in a table implies one or more other rows in the same table, then the Multi-valued dependencies occur.

If a table has attributes P, Q and R, then Q and R are multi-valued facts of P.

It is represented by double arrow (->->).


For our example:

P->->Q
Q->->R

In the above case, Multivalued Dependency exists only if Q and R are independent attributes.

 

5.   Partial Dependency

Partial Dependency occurs when a nonprime attribute is functionally dependent on part of a candidate key.

The 2nd Normal Form (2NF) eliminates the Partial Dependency. For ex. consider the below StudentProject relation:

 

StudentID

ProjectNo

StudentName

ProjectName

S01

199

Katie

Geo Location

S02

120

Ollie

Cluster Exploration

In the above table, we have partial dependency; let us see how −

·         The prime key attributes are StudentID and ProjectNo.

·         As stated earlier, the non-prime attributes i.e. StudentName and ProjectName should be functionally dependent on part of a candidate key, to be Partial Dependent.

·         The StudentName can be determined by StudentID that makes the relation Partial Dependent.

·         The ProjectName can be determined by ProjectID, which that the relation Partial Dependent.