Simsion & Associates - Graeme Simsion

Data Modeling Essentials Third Edition

Contents

CHAPTER 1 WHAT IS DATA MODELING?

1.1. INTRODUCTION
1.2. A DATA-CENTERED PERSPECTIVE
1.3. A SIMPLE EXAMPLE
1.4. DESIGN, CHOICE, AND CREATIVITY
1.5. WHY IS THE DATA MODEL SO IMPORTANT?
1.5.1. Leverage
1.5.2. Conciseness
1.5.3. Data Quality
1.5.4 Summary
1.6. WHAT MAKES A GOOD DATA MODEL?
1.6.1. Completeness
1.6.2. NonRedundancy
1.6.3. Enforcement of Business Rules
1.6.4. Data Reusability
1.6.5. Stability and Flexibility
1.6.6. Elegance
1.6.7. Communication
1.6.8. Integration
1.6.9. Conflicting Objectives
1.7. PERFORMANCE
1.8. DATABASE DESIGN STAGES AND DELIVERABLES
1.8.1. Conceptual, Logical, and Physical Data Models
1.8.2. The Three-Schema Architecture and Technology
1.9. WHERE DO DATA MODELS FIT IN?
1.9.1. Process-Driven Approaches
1.9.2. Data-Driven Approaches
1.9.3. Parallel (Blended) Approaches
1.9.4. Object-Oriented Approaches
1.9.5. Prototyping Approaches
1.9.6. Agile Methods
1.10. WHO SHOULD BE INVOLVED IN DATA MODELING?
1.11. IS DATA MODELING STILL RELEVANT
1.11.1. Costs and Benefits of Data Modeling
1.11.2. Data Modeling and Packaged Software
1.11.3. Data Integration
1.11.4. Data Warehouses
1.11.5. Personal Computing and User-Developed Systems
1.11.6. Data Modeling and XML
1.11.7. Summary
1.12. ALTERNATIVE APPROACHES TO DATA MODELING
1.13. TERMINOLOGY
1.14. WHERE TO FROM HERE?-AN OVERVIEW OF PART I
1.15. SUMMARY

CHAPTER 2 BASIC OF SOUND STRUCTURE

2.1. INTRODUCTION
2.2. AN INFORMAL EXAMPLE OF NORMALIZATION
2.3. RELATIONAL NOTATION
2.4. A MORE COMPLEX EXAMPLE
2.5. DETERMINING COLUMNS
2.5.1. One Fact per Column
2.5.2. Hidden Data
2.5.3. Derivable Data
2.5.4. Determining the Primary Key
2.6. REPEATING GROUPS AND FIRST NORMAL FORM
2.6.1. Limit on Maximum Number of Occurrences
2.6.2. Data Reusability and Program Complexity
2.6.3. Recognizing Repeating Groups
2.6.4. Removing Repeating Groups
2.6.5. Determining the Primary Key of the New Table
2.6.6. First Normal Form
2.7. SECOND AND THIRD NORMAL FORMS
2.7.1. Problems with Tables in First Normal Form
2.7.2. Eliminating Redundancy
2.7.3. Determinants
2.7.4. Third Normal Form
2.8. DEFINITIONS AND A FEW REFINEMENTS
2.8.1. Determinants and Functional Dependency
2.8.2. Primary Keys
2.8.3. Candidate Keys
2.8.4. A More Formal Definition of Third Normal Form
2.8.5. Foreign Keys
2.8.6. Referential Integrity
2.8.7. Update Anomalies
2.8.8. Denormalization and Unnormalization
2.8.9. Column and Table Names
2.9. CHOICE, CREATIVITY, AND NORMALIZATION
2.10. TERMINOLOGY
2.11. SUMMARY

CHAPTER 3 THE ENTITY-RELATIONSHIP APPROACH

3.1. INTRODUCTION
3.2. A DIAGRAMMATIC REPRESENTATION
3.2.1. The Basic Symbols: Boxes and Arrows
3.2.2. Diagrammatic Representation of Foreign Keys
3.2.3. Interpreting the Diagram
3.2.4. Optionality
3.2.5. Verifying the Model
3.2.6. Redundant Arrows
3.3. THE TOP-DOWN APPROACH: ENTITY-RELATIONSHIP MODELING
3.3.1. Developing the Diagram Top Down
3.3.2. Terminology
3.4. ENTITY CLASSES
3.4.1. Entity Diagramming Convention
3.4.2. Entity Naming
3.4.3. Entity Definitions
3.5. RELATIONSHIPS
3.5.1. Relationship Diagramming Conventions
3.5.3. Many-to-Many Relationships
3.5.4. One-to-One Relationships
3.5.5. Self-Referencing Relationships
3.5.6. Relationships Involving Three or More Entity Classes
3.5.7. Transferability
3.5.8. Dependent and Independent Entity Classes
3.5.9. Relationship Names
3.6. ATTRIBUTES
3.6.1. Attribute Identification and Definition
3.6.2. Primary Keys and the Cardinality and Optionality
3.7. MYTHS AND FOLKLORE
3.7.1. Entity Classes without Relationships
3.7.2. Allowed Combinations of Cardinality and Optionality
3.8. CREATIVITY AND E-R MODELING
3.9. SUMMARY

CHAPTER 4 SUBTYPES AND SUPERTYPES

4.1. INTRODUCTION
4.2. DIFFERENT LEVELS OF GENERALIZATION
4.3. RULES VERSUS STABILITY
4.4. USING SUBTYPES AND SUPERTYPES
4.5. SUBTYPES AND SUPERTYPES AS ENTITY CLASSES
4.5.1. Naming Subtypes
4.6. DIAGRAMMING CONVENTIONS
4.6.1. Boxes in Boxes
4.6.2. UML Conventions
4.6.3. Using Tools That Do Not Support Subtyping
4.7. DEFINITIONS
4.8. ATTRIBUTES OF SUPERTYPES AND SUBTYPES
4.9. NONOVERLAPPING AND EXHAUSTIVE
4.10. OVERLAPPING SUBTYPES AND ROLES
4.10.1. Ignoring Real World Overlaps
4.10.2. Modeling Only the Supertype
4.10.3. Modeling the Roles as Participation in Relationships
4.10.4. Using Role Entity Classes and One-to-One Relationships
4.10.5. Multiple Partitions
4.11. HIERARCHY OF SUBTYPES
4.12. BENEFITS OF USING SUBTYPES AND SUPERTYPES
4.12.1. Creativity
4.12.2. Presentation: Level of Detail
4.12.3. Communication
4.12.4. Input to the Design of Views
4.12.5. Classifying Common Patterns
4.12.6. Divide and Conquer
4.13. WHEN DO WE STOP SUPERTYPING AND SUBTYPING?
4.13.1. Differences in Identifiers
4.13.2. Different Attribute Groups
4.13.3. Different Relationships
4.13.4. Different Processes
4.13.5. Migration from One Subtype to Another
4.13.6. Communication
4.13.7. Capturing Meaning and Rules
4.13.8. Summary
4.14. GENERALIZATION OF RELATIONSHIPS
4.14.1. Generalizing Several One-to-Many Relationships to a Single Many-to-Many Relationship
4.14.2. Generalizing Several One-to-Many Relationships to a Single One-to-Many Relationship
4.14.3. Generalizing One-to-Many and Many-to-Many Relationships
4.15. THEORETICAL BACKGROUND
4.16. SUMMARY

CHAPTER 5 ATTRIBUTES AND COLUMNS

5.1. INTRODUCTION
5.2. ATTRIBUTE DEFINITION
5.3. ATTRIBUTE DISAGGREGATION: ONE FACT PER ATTRIBUTE
5.3.1. Simple Aggregation
5.3.2. Conflated Codes
5.3.3. Meaningful Ranges
5.3.4. Inappropriate Generalization
5.4. TYPES OF ATTRIBUTES
5.4.1. DBMS Datatypes
5.5.2. The Attribute Taxonomy in Detail
5.4.3. Attribute Domains
5.4.4. Column Datatyping and Length Requirements
5.4.5. Conversion Between External and Internal Representations
5.5. ATTRIBUTE NAMES
5.5.1. Objectives of Standardizing Attribute Names
5.5.2. Some Guidelines for Attribute Naming
5.6. ATTRIBUTE GENERALIZATION
5.6.1. Options and Trade-Offs
5.6.2. Attribute Generalization Resulting from Entity Generalization
5.6.3. Attribute Generalization within Entity Classes
5.6.4. "First Among Equals"
5.6.5. Limits to Attribute Generalization
5.7. SUMMARY

CHAPTER 6 PRIMARY KEYS AND IDENTITY

6.1. BASIC REQUIREMENTS AND TRADE-OFFS
6.2. BASIC CRITERIA
6.2.1. Applicability
6.2.2. Uniqueness
6.2.3. Minimality
6.2.4. Stability
6.3. SURROGATE KEYS
6.3.1. Performance and Programming Issues
6.3.2. Matching Real-World Identifier
6.3.3. Should Surrogate Keys Be Visible?
6.3.4. Subtypes and Surrogate Keys
6.4. STRUCTURED KEYS
6.4.1. When to Use Structured Keys
6.4.2. Programming and Structured Keys
6.4.3. Performance Issues with Structured Keys
6.4.4. Running Out of Numbers
6.5. MULTIPLE CANDIDATE KEYS
6.5.1. Choosing a Primary Key
6.5.2. Normalization Issues
6.6. GUIDELINES FOR CHOOSING KEYS
6.6.1. Tables Implementing Independent Entity Classes
6.6.2. Tables Implementing Dependent Entity Classes and Many-to-Many Relationships
6.7. PARTIALLY-NULL KEYS
6.8. SUMMARY

CHAPTER 7 EXTENSIONS AND ALTERNATIVES

7.1. INTRODUCTION
7.2. EXTENSIONS TO THE BASIC E-R APPROACH
7.2.1. Introduction
7.2.2. Advanced Attribute Concepts
7.3. THE CHEN E-R APPROACH
7.3.1. The Basic Conventions
7.3.2. Relationships with Attributes
7.3.3. Relationships Involving Three or More Entity Classes
7.3.4. Roles
7.3.5. The Weak Entity Concept
7.3.6. Chen Conventions in Practice
7.4. USING UML OBJECT CLASS DIAGRAMS
7.4.1. A Conceptual Data Model in UML
7.4.2. Advantages of UML
7.5. OBJECT ROLE MODELING
7.6. SUMMARY

CHAPTER 8 ORGANIZING THE DATA MODELING TASK

8.1. DATA MODELING IN THE REAL WORLD
8.2. KEY ISSUES IN PROJECT ORGANIZATION
8.2.1. Recognition of Data Modeling
8.2.2. Clear Use of the Data Model
8.2.3. Access to Users and Other Business Stakeholders
8.2.4. Conceptual, Logical, and Physical Models
8.2.5. Cross-Checking with the Process Model
8.2.6. Appropriate Tools
8.3. ROLES AND RESPONSIBILITIES
8.4. PARTITIONING LARGE PROJECTS
8.5. MAINTAINING THE MODEL
8.5.1. Examples of Complex Changes
6.5.2. Managing Change in the Modeling Process
8.6. PACKAGING IT UP
8.7. SUMMARY

CHAPTER 9 THE BUSINESS REQUIREMENTS

9.1. PURPOSE OF THE REQUIREMENTS PHASE
9.2. THE BUSINESS CASE
9.3. INTERVIEWS AND WORKSHOPS
9.3.1. Should You Model in Interviews and Workshops?
9.3.2. Interviews with Senior Managers
9.3.3. Interviews with Subject Matter Experts
9.3.4. Facilitated Workshops
9.4. RIDING THE TRUCKS
9.5. EXISTING SYSTEMS AND REVERSE ENGINEERING
9.6. PROCESS MODELS
9.7. OBJECT CLASS HIERARCHIES
9.7.1. Classifying Object Classes
9.7.2. A Typical Set of Top-Level Object Classes
9.7.3. Developing an Object Class Hierarchy
9.7.4. Potential Issues
9.7.5. Advantages of the Object Class Hierarchy Technique
9.8. SUMMARY

CHAPTER 10 CONCEPTUAL DATA MODELING

10.1. DESIGNING REAL MODELS
10.2. LEARNING FROM DESIGNERS IN OTHER DISCIPLINES
10.3. STARTING THE MODELING
10.4. PATTERNS AND GENERIC MODELS
10.4.1. Using Patterns
10.4.2. Using a Generic Model
10.4.3. Adapting Generic Models from Other Applications
10.4.4. Developing a Generic Model
10.4.5. When There Is Not a Generic Model
10.5. BOTTOM-UP MODELING
10.6. TOP-DOWN MODELING
10.7. WHEN THE PROBLEM IS TOO COMPLEX
10.8. HIERARCHIES, NETWORKS, AND CHAINS
10.8.1. Hierarchies
10.8.2. Networks (Many-to-Many Relationships)
10.8.3. Chains (One-to-One Relationships)
10.9. ONE-TO-ONE RELATIONSHIPS
10.9.1. Distinct Real-World Concepts
10.9.2. Separating Attribute Groups
10.9.3. Transferable One-to-One Relationships
10.9.4. Self-Referencing One-to-One Relationships
10.9.5. Support for Creativity
10.10. DEVELOPING ENTITY CLASS DEFINITIONS
10.11. HANDLING EXCEPTIONS
10.12. THE RIGHT ATTITUDE
10.12.1. Being Aware
10.12.2. Being Creative
10.12.3. Analyzing or Designing
10.12.4. Being Brave
10.12.5. Being Understanding and Understood
10.13. EVALUATING THE MODEL
10.14. DIRECT REVIEW OF DATA MODEL DIAGRAMS
10.15. COMPARISON WITH THE PROCESS MODEL
10.16. TESTING THE MODEL WITH SAMPLE DATA
10.17. PROTOTYPES
10.18. THE ASSERTIONS APPROACH
10.18.1. Naming Conventions
10.18.2. Rules for Generating Assertions
10.19. SUMMARY

CHAPTER 11 LOGICAL DATABASE DESIGN

11.1. INTRODUCTION
11.2. OVERVIEW OF THE TRANSFORMATIONS REQUIRED
11.3. TABLE SPECIFICATION
11.3.1. The Standard Transformation
11.3.2. Exclusion of Entity Classes from the Database
11.3.3. Classification Entity Classes
11.3.4. Many-to-Many Relationship Implementation
11.3.5. Relationships Involving More Than Two Entity Classes
11.3.6. Supertype/Subtype Implementation
11.4. BASIC COLUMN DEFINITION
11.4.1. Attribute Implementation: The Standard Transformation
11.4.2. Category Attribute Implementation
11.4.3. Derivable Attributes
11.4.4. Attributes of Relationships
11.4.5. Complex Attributes
11.4.6. Multivalued Attribute Implementation
11.4.7. Additional Columns
11.4.8. Column Datatypes
11.4.9. Column Nullability
11.5. PRIMARY KEY SPECIFICATION
11.6. FOREIGN KEY SPECIFICATION
11.6.1. One-to-Many Relationship Implementation
11.6.2. One-to-One Relationship Implementation
11.6.3. Derivable Relationships
11.6.4. Optional Relationships
11.6.5. Overlapping Foreign Keys
11.6.6. Split Foreign Keys
11.7. TABLE AND COLUMN NAMES
11.8. LOGICAL DATA MODEL NOTATIONS
11.9. SUMMARY

CHAPTER 12 PHYSICAL DATABASE DESIGN

12.1. INTRODUCTION
12.2. INPUTS TO DATABASE DESIGN
12.3. OPTIONS AVAILABLE TO THE DATABASE DESIGNER
12.4. DESIGN DECISIONS WHICH DO NOT AFFECT PROGRAM LOGIC
12.4.1. Indexes
12.4.2. Data Storage
12.4.3. Memory Usage
12.5. CRAFTING QUERIES TO RUN FASTER
12.5.1. Locking
12.6. LOGICAL SCHEMA DECISIONS
12.6.1. Alternative Implementation of Relationships
12.6.2. Table Splitting
12.6.3. Table Merging
12.6.4. Duplication
12.6.5. Denormalization
12.6.6. Ranges
12.6.7. Hierarchies
12.6.8. Integer Storage of Dates and Times
12.6.9. Additional Tables
12.7. VIEWS
12.7.1. Views of Supertypes and Subtypes
12.7.2. Inclusion of Derived Attributes in Views
12.7.3. Denormalization and Views
12.7.4. Views of Split and Merged Tables
12.8. SUMMARY

CHAPTER 13 ADVANCED NORMALIZATION

13.1. INTRODUCTION
13.2. INTRODUCTION TO THE HIGHER NORMAL FORMS
13.2.1. Common Misconceptions
13.3. BOYCE-CODD NORMAL FORM
13.3.1. Example of Structure in 3NF but NOT in BCNF
13.3.2. Definition of BCNF
13.3.3. Enforcement of Rules versus BCNF
13.3.4. A Note on Domain Key Normal Form
13.4. FOURTH NORMAL FORM (4NF) AND FIFTH NORMAL FORM (5NF)
13.4.1. Data in BCNF but Not in 4NF
13.4.2. Fifth Normal Form (5NF)
13.4.3. Recognizing 4NF and 5NF Situations
13.4.4. Checking for 4NF and 5NF with the Business Specialist
13.5. BEYOND 5NF: SPLITTING TABLES BASED ON CANDIDATE KEYS
13.6. OTHER NORMALIZATION IN PERSPECTIVE
13.6.1. Normalization and Redundancy
13.6.2. Reference Tables Produced by Normalization
13.6.3. Selecting the Primary Key after Removing Repeating Groups
13.6.4. Sequence of Normalization and Cross-Table Anomalies
13.7. ADVANCED NORMALIZATION IN PERSPECTIVE
13.8. SUMMARY

CHAPTER 14 MODELING BUSINESS RULES

14.1. INTRODUCTION
14.2. TYPES OF BUSINESS RULES
14.2.1. Data Rules
14.2.2. Process Rules
14.2.3. What Rules Are Relevant to the Data Modeler?
14.3. DISCOVERY AND VERIFICATION OF BUSINESS RULES
14.3.1. Cardinality Rules
14.3.2. Other Data Validation Rules
14.3.3. Data Derivation Rules
14.4. DOCUMENTATION OF BUSINESS RULES
14.4.1. Documentation in an E-R Diagram
14.4.2. Documenting Other Rules
14.4.3. Use of Subtypes to Document Rules
14.5. IMPLEMENTING BUSINESS RULES
14.5.1. Where to Implement Particular Rules
14.5.2. Implementation Options: A Detailed Example
14.5.3. Implementing Mandatory Relationships
14.5.4. Referential Integrity
14.5.5. Restricting an Attribute to a Discrete Set of Values
14.5.6. Rules Involving Multiple Attributes
14.5.7. Recording Data That Supports Rules
14.5.8. Rules That May Be Broken
14.5.9. Enforcement of Rules Through Primary Key Selection
14.6. RULES ON RECURSIVE RELATIONSHIPS
14.6.1. Types of Rules on Recursive Relationships
14.6.2. Documenting Rules on Recursive Relationships
14.6.3. Implementing Constraints on Recursive Relationships
14.6.4. Analogous Rules in Many-to-Many Relationships
14.7. SUMMARY

CHAPTER 15 TIME-DEPENDENT DATA

15.1. THE PROBLEM
15.2. WHEN DO WE ADD THE TIME DIMENSION?
15.3. AUDIT TRAILS AND SNAPSHOTS
15.3.1. The Basic Audit Trail Approach
15.3.2. Handling Nonnumeric Data
15.3.3. The Basic Snapshot Approach
15.4. SEQUENCES AND VERSIONS
15.5. HANDLING DELETIONS
15.6. ARCHIVING
15.7. MODELING TIME-DEPENDENT RELATIONSHIPS
15.7.1. One-to-Many Relationships
15.7.2. Many-to-Many Relationships
15.7.3. Self-Referencing Relationships
15.8. DATE TABLES
15.9. TEMPORAL BUSINESS RULES
15.10. CHANGES TO THE DATA STRUCTURE
15.11. PUTTING IT INTO PRACTICE
15.12. SUMMARY

CHAPTER 16 MODELING FOR DATA WAREHOUSES AND DATA MARTS

16.1. INTRODUCTION
16.2. CHARACTERISTICS OF DATA WAREHOUSES AND DATA MARTS
16.2.1. Data Integration: Working with Existing Databases
16.2.2. Loads Rather Than Updates
16.2.3. Less Predictable Database "Hits"
16.2.4. Complex Queries - Simple Interface
16.2.5. History
16.2.6. Summarization
16.3. QUALITY CRITERIA FOR WAREHOUSE AND MART MODELS
16.3.1. Completeness
16.3.2. Nonredundancy
16.3.3. Enforcement of Business Rules
16.3.4. Data Reusability
16.3.5. Stability and Flexibility
16.3.6. Simplicity and Elegance
16.3.7. Communication Effectiveness
16.3.8. Performance
16.4. THE BASIC DESIGN PRINCIPLE
16.5. MODELING FOR THE DATA WAREHOUSE
16.5.1. An Initial Model
16.5.2. Understanding Existing Data
16.5.3. Determining Requirements
16.5.4. Determining Sources and Dealing with Differences
16.5.5. Shaping Data for Data Marts
16.6. MODELING FOR THE DATA MART
16.6.1. The Basic Challenge
16.6.2. Multidimensional Databases, Stars and Snowflakes
16.6.3. Modeling Time-Dependent Data
16.7. SUMMARY

CHAPTER 17 ENTERPRISE DATA MODELS AND DATA MANAGEMENT

17.1. INTRODUCTION
17.2. DATA MANAGEMENT
17.2.1. Problems of Data Mismanagement
17.2.2. Managing Data as a Shared Resource
17.2.3. The Evolution of Data Management
17.3. CLASSIFICATION OF EXISTING DATA
17.4. A TARGET FOR PLANNING
17.5. A CONTEXT FOR SPECIFYING NEW DATABASES
17.5.1. Determining Scope and Interfaces
17.5.2. Incorporating the Enterprise Data Model in the Development Life Cycle
17.6. GUIDANCE FOR DATABASE DESIGN
17.7. INPUT TO BUSINESS PLANNING
17.8. SPECIFICATION OF AN ENTERPRISE DATABASE
17.9. CHARACTERISTICS OF ENTERPRISE DATA MODELS
17.10. DEVELOPING AN ENTERPRISE DATA MODEL
17.10.1. The Development Cycle
17.10.2. Partitioning the Task
17.10.3. Inputs to the Task
17.10.4. Expertise Requirements
17.10.5. External Standards
17.11. CHOICE, CREATIVITY, AND ENTERPRISE DATA MODELS
17.12. SUMMARY

FURTHER READING