IS 341
Business Systems Analysis
Chapter 9 - Notes
Designing Databases
Introduction
– Why use Databases? – Three reasons/benefits/goals of databases:
1.
Control Redundancy - Redundancy is having the same information stored in
more than one location; we allow and sometimes increase redundancy as a means
for more efficiency in Relational Databases, but we want to Control Redundancy,
not eliminate it
2.
Eliminate Inconsistency – Inconsistency is having the same information
stored in more than one location, and the values of that information are
different (one place says the LastName is “Jones”, another says “Smith”); we
want to eliminate these inconsistencies so that the LastName is always the same
no matter where it is found/stored (Smith or Jones, whichever is correct)
3.
Promote Program/Data Independence by Uncoupling the data from the
Programs – in earlier days of programming, the Processes/Programs were
considered more important than the data and the two were coupled such that changes
to a program many times required changes to data structures that also affected
other programs which then had to be modified (etc.). Today the Data is considered of primary
importance and should be structured apart from the processes so that all the
programs go to the same database and use the same structures of the same
data. This separation of programs and
data is termed Uncoupling, and allows the data and the programs to exist
independently.
I. Database Design – text says two steps (three
steps in reality):
1.
Develop (i.e., Choose) a Logical Database Model which describes the data
(most popular today is Relational model);
2.
Choose a DBMS that uses that model;
3.
Choose the technology (physical model) to support that DBMS.
***NOTE: Logical organizational needs and requirements
should drive the choice of DB model, and that choice should then drive
technical considerations, NOT the other way around. In reality (too many times!), the physical
choices (or constraints) drive the logical considerations. This is a serious mistake! But it happens all the time (just as in
politics….).
Logical database design is
driven by previously developed ERDs and by report and form layouts
A.
The Process of Database Design – four key steps in LOGICAL
database modeling and design:
1. Four key steps in LOGICAL database
modeling and design:
a.
Develop a logical data model for each user interface using
Normalization;
b. Combine normalized data requirements into one
logical database model (View Integration);
c. Translate ERD (conceptual data models)
developed without consideration of specific user interfaces into normalized
data requirements;
d. Compare the consolidated logical database
design with the translated ERD model and produce one final logical database model
through View Integration.
2. PHYSICAL database design – use the
results of the four steps above; also must consider definitions of each
attribute, descriptions of when/where of data entry/retrieval/update/deletion,
expected response time, data integrity, and both file and database technologies
to be used. All this allows choices for
physical database design, including:
a. Choosing storage format (data type) for each
attribute – format is chosen to minimize storage space and maximize data quality. Data type involves record length, coding
scheme, decimal places, minimum/maximum values, and other attributes;
b. Grouping attributes from the logical database
model into physical records;
c. Arrange related records in secondary memory
to facilitate efficient/rapid storage/retrieval/update (called File
Organizations);
d. Select media and structures to make data
storage more efficient (key indexes, etc.)
B.
Deliverables and Outcomes – the set of Relations (Tables with ALL the
attributes from the ERDs) that make up the database
II. Relational Database Model – represents data
as a set of Named, two-dimensional tables (Relations), consisting of Named
Columns (Attributes) and Un-Named Rows (Entity Instances, represented by the
specific data values for the attributes)
Shorthand Notation: TableName( AttributeName1, AttributeName2,….)
with the primary key underlined.
Student(Student_ID_Nbr,
Student_Name, Student_Address)
***The
order of the Rows and the Order of the Columns as Immaterial (does not matter)
*****Well-Structured Relations
– what constitutes a well-structured relation?
Minimal redundancy and the ability to insert, modify, or delete rows in
the table without errors or inconsistencies
Anomaly – an error in an RDB;
most errors in an RDB can be solved by Building A New Table.
III. Normalization –
1.
the process of
converting complex data structures into simple, stable data structures;
2.
the process of
organizing data attributes into stable, flexible, and adaptive entities
(relations)
A.
Rules of Normalization
1. Non-(Un-)Normalized – tables with repeating
data groups; tables with multi-valued
data attributes
2. First Normal Form (1NF) – no repeating data groups; no multi-valued data attributes
3. Second
a. must be in
1NF;
b. no partial
dependencies; all non-key attributes
must depend on the full key, not part of the key
4. Third
a. must be in
2NF;
b. no
transitive dependencies; all non-key
attributes must depend on the entire key, not other non-key attributes;
c. no derived
attributes (attribute values that can be calculated or inferred from other
attribute values)
B.
Functional Dependence and Primary Keys -
Functional
Dependency is a relationship
between two attributes such that for a given relation, attribute B is
functionally dependent on attribute A if, for every valid value of A, that
value of A uniquely determines the value of B; represented by: AàB
An
attribute may be dependent on two (or more) attributes:
e.g., (StudentID, CourseNbr, Grade)
***The
instances in a relation do NOT prove a functional dependency exists; only a
thorough working knowledge of the problem domain, obtained through requirements
analysis, is a reliable method for identifying functional dependencies, i.e.,
KNOW YOUR DATA AND YOUR BUSINESS RULES!
C.
Second Normal Form (2NF) – see above
D.
Third Normal Form (3NF) – see above
Referential
Integrity – An integrity
constraint specifying that the value (or existence) of an attribute in one
relation (i.e., a Foreign Key) depends on the value (or existence) of the same
attribute in another relation (as the Primary Key).
IV. Transforming E-R Diagrams into Relations – 4
steps: (see also A, B, & C below)
1. Represent Entities – each entity type (class)
in the ERD becomes a relation with the identifier as the primary key
2. Represent relationships – each relationship
in the ERD must be represented in the relational database design; how this is
represented depends on the relationship’s nature. A relationship may be expressed through a
foreign key, or through a separate table (concatenated key)
3. Normalize the relations – there may be
unnecessary redundancy or other anomalies and we want well-structured
relations.
4. Merge the relations – there may be redundant
relations (two or more relations that describe the same entity type/class) that
must be merged toremove redundancy
A.
Represent Entities – the primary key should satisfy two properties:
1. The value of the key must uniquely identity
every row in the relation
2. The key should be non-redundant; no attribute
in the key can be deleted without destroying its unique identification
B.
Represent Relationships
1. Binary 1:N and 1:1 Relationships
a. 1:N – place the primary key on the “1” side
of the relationships as a Foreign Key on the “N” side of the relationship.
Customer(CustID,
CustName, CustAddr)—Places—Order(OrderNbr, OrderDate, CustID)
b. 1:1 – Add the primary key of A as a foreign
key of B, or primary key of B as foreign key of A, or both
2. Binary and Higher Degree M:N Relationships –
create a new relation with a concatenated key made of both primary keys –
Grade(StudentID, CourseNbr, Grade)
3. Unary Relationships – for a recursive
relationship like Employee manages Employee, use the primary key from one
entity instance as a foreign key in the other entity instance - Employee(EmpNbr, EmpName, EmpAddr, ManagerID)
C.
Summary of Transforming E-R Diagrams into Relations
V. Merging Relationships – if two relations both
use the same primary key, then both relations are actually the same relation
and should be merged
A.
An Example of Merging Relationships
Employee(EmpNbr,
EmpName, EmpAddr) and Worker(EmpNbr, JobClass, PayScale) both have
EmpNbr as a primary key and should be merged into: Employee(EmpNbr,
EmpName, EmpAddr, JobClass, PayScale)
B.
View Integration Problems – problems that may arise
Synonyms – two different names used for the same attribute
Homonyms – a single attribute name that refers to more than
one characteristic – “Account” may mean both Checking Account and Savings
Account
Dependencies
between non-keys – merging two
relations can cause 3NF anomalies; if Student1(StudentID, Major) is
merged with Student2(StudentID, Advisor), but each Major has only one
Advisor, then Student(StudentID, Major, Advisor) is a Transitive
Dependency because Advisor is dependent on Major
VI. Logical database Design for Hoosier Burger
VII. Physical File and Database Design
Schema
– the Physical view of a database (the Computer’s view
Sub-Schema
– the Logical view of a database (the User’s view)
VIII. Designing Fields
Field – the smallest unit of named application data
recognized by system software
A.
Choosing Data Types -
Data
Type – a coding scheme
recognized by system software for representing organizational data; 4
objectives in choosing a data type:
a. Minimize storage space
b. Represent all possible values of the field
c. Improve data integrity for the field
d. Support all data manipulations desired on the
field
1. Calculated Fields (Derived Attributes) – a
field that can be calculated or inferred from other database fields
2. Coding and Compression Techniques – some
attributes have few values and other have a large range of possible values; a
data type should be chosen to represent all the possible values (Domain) but
also keep storage space to a minimum (compression techniques)
B.
Controlling Data Integrity – creating data types helps control data
integrity by limiting possible values for a field; five (5) popular control
methods:
1. Default Value – such as an Area Code on phone
numbers
2. Input Mask/Template – restricts widths and
possible values for a field
3. Range Control – a limited set of alphabetic
or numeric values – a month might ONLY be 3 valid characters: Jan, Feb, Mar, etc., or a department might
only allow certain combinations: ACCT,
MIS, SALES, etc.
4. Referential Integrity – cross referencing
between relations; if EmpNbr 222 is not a valid number in the Employee table,
222 cannot be used as a foreign key where EmpNbr is a foreign key
5. Null Value Control – Null is a non-0,
non-blank character meaning NOTHING, an EMPTY field; nulls may or may not be
allowed at certain points in data entry
IX. Designing Physical Tables – physical tables
have to do with how data will physically be stored on secondary media (concern
is with efficient storage and processing speed), NOT with the logical relations
of normalized tables
Denormalization
– splitting normalized relations into physical tables based on usage of rows
and fields (storing pages efficiently); this can Increase the chance of errors
and inconsistencies
A.
Arranging Table Rows – a computer system stores data in a Physical File;
the WAY the operating system arranges the table rows in the file is called File
Organization; file organization should provide (some of these may conflict with
each other):
a. Fast Data Retrieval
b. High throughput for processing transactions
c. Efficient use of storage space
d. Protection from failures or data loss
e. Minimal need for reorganization
f. Accommodation of growth
g. Security from unauthorized use
Pointer
– a data field that can be used to locate a related field or row of data
1. Sequential File Organizations – rows in the
file are stored in sequence according to a primary key value (NOT always
true! May be in order of transaction,
etc.)
2. Indexed File Organizations – rows may
actually be stored in any sequence, but may be accessed by means of an Index
structure that records their physical location on the media
3. Hashed File Organizations – Address of each
row is calculated by an algorithm
4. Summary of File Organizations
B.
Designing Controls for Files
X. Physical Database Design for Hoosier Burger
XI. Electronic Commerce Application: Designing Databases
Designing Databases for Pine Valley
Furniture’s WebStore