Oracle Data Warehousing Unleashed
(Publisher: Macmillan Computer Publishing)
Author(s): Bonnie O'neill et al
ISBN: 0672310775
Publication Date: 09/26/97
Page 3
You have decided to embark on a data warehouse journey. You are probably a little scared and overwhelmed by the immensity of the project in front of you. You are probably also wondering where to start.
This book is designed to be a definitive reference guide for all project staff having anything to do with the data warehouse. Here is an overview of the types of job classes that would benefit from this book:
The library is full of data warehouse books, all concentrating on a different subtopic within the vast reservoir of data warehousing. Here is a brief overview of some of the data warehouse book inventory:
Page 26
The authors felt that there was no comprehensive book about all the areas of data warehousing, all in one volume. We wanted to write the definitive reference guide to developing data warehouses using Oracle. Such a book would contain all the topics mentioned above, and in addition would cover topics such as:
It is therefore our goal in writing this book that you will need only one book on your desk for the definitive reference guide for Oracle data warehousingOracle Data Warehousing Unleashed.
Part I, "Initialization and Project Planning," discusses the rationale behind data warehousing, architecture, and methodology.
Part II, "Preparation: Integration and Cleansing," sets the stage and deals with one of the most important parts of data warehousing: integration challenges and scrubbing. In addition, the
Page 27
section describes important data administration tasks such as defining your data, and the central (and often neglected) role of metadata.
Part III, "Logical and Physical Database Design," shows how designing a warehouse is uniquely different from designing an operational system. It covers logical and physical design and storage/capacity planning. In addition, it also covers dimensions, special dimensions such as time and space, and star schema design. Lastly, no section covering physical design and performance issues is complete without a thorough discussion of indexes.
Part IV, "Management of a Data Warehouse," covers DBA-specific management issues such as security, backup/recovery, and loads.
Part V, "Data Access," presents a whole range of exciting new technologies to assist users in accessing and viewing their data, such as intranets, data mining, and ad-hoc query tools. In addition, this section introduces data access constructs that are related to the data warehouse: the ODS and data marts. Query performance tuning is also discussed.
A number of helpful appendixes serve as further reference when using Oracle for data warehousing. Appendix A supplies a checklist of tasks for the project manager. Appendix B presents a list of the tool vendors by product class. Appendixes C and D reference conferences, magazines, journals, and Web sites specializing in data warehousing. Appendix E provides an overview of Oracle8's data warehouse_specific new features. Last, Appendix F is a list of references which we found helpful.
Read the first few chapters to get an overview of what data warehousing is all about. This book does not guide you through installing Oracle; for this task, you can refer to our companion volume, Oracle Unleashed, or the Oracle DBA Survival Guide. Then you can focus on the areas of the book of special interest to you (as mentioned above).
Commands and code are presented in a special monospaced, computer typeface. In the text itself, we have tried to present all SQL keywords in the text in uppercase. Code output examples are also presented in computer typeface.
| WARSTORY |
We have flagged war stories with a special box so the reader can get an idea how other companies have experienced the things we talk about in the book. We have also peppered the book with many examples to illustrate each point. Other sidebars include notes, cautions, and tips. |
Page 28
We have also flagged Oracle8 new features with an icon, so those users who are going to be using Oracle8 can see at a glance the new features that pertain specifically to them, and those who are migrating to the new release can be aware of them to take special advantage of them whenever possible. Other users can ignore the text flagged Oracle8, because it doesn't pertain to them. In other circumstances when discussing specific features, we have tried to indicate in which Oracle release it was first introduced. We know from personal experience that there is a wide variety of releases being currently used at any one time, and we have attempted to keep this in mind when discussing specific features.
Data warehousing is an adventure: It is very exciting and rewarding, with new challenges always ready to meet you. We hope you will have fun learning about data warehousing and producing an infrastructure that will make a very important contribution to your company's future. And we hope that this book will be a useful reference guide, serving as a roadmap to facilitate your data warehouse adventure.
Introductionxxv
Part I Initialization and Project Planning
What Is a Data Warehouse? 5
What a Data Warehouse Is Not 5
User Access to Data: A Historical Perspective 6
Summary of Historical Data Access Problems 7
The Data Warehouse Context: Facilitate Business 8
Why Oracle? 10
Business Drivers for the Data Warehouse 11
Technical Drivers 14
Solving the Integration Problem 14
Be Prepared 19
Scope and Complexity 20
Political Battles 20
Inadequate Funding 20
Data Integration and Data Semantics 21
Different Methodology 21
Different Skill Set 21
Data Access 22
Summary 22
Six Steps to Develop the Architecture 24
The Data Warehouse Infrastructure 25
Data Warehouse System Infrastructure 26
Metadata 28
Data Discovery 29
Data Acquisition 29
Data Distribution 29
User Access 30
Sybase IQ 31
Data Management Architecture: Data Layers 31
Ongoing Maintenance: Warehouse Infrastructure 34
Summary 35
What Is a Methodology? 38
Why the Data Warehouse Methodology Is Different 39
Page 6
Project Roles 40
Phase One: Define Objectives and Scope 45
Phase Two: Define Architecture 47
Phase Three: Begin to Build Infrastructure 48
Phase Four: Begin First Iteration 49
Phase Five: Set Up Ongoing Maintenance Processes,
First Iteration 52Phase Six: Set Up User Analysis Environment,
First Iteration 53Phase Seven: Release First Iteration, Begin Second Iteration 54
Changing Terms 56
Using Individual Interviews Instead of JAD Sessions 57
Changes You Should Never Make 58
The Heart of a Champion (with the Strength of a Gorilla) 61
Data Discovery 62
Initial Requirements 63
Sizing and Long Range Sizing Plan 63
Metadata and Data Dictionaries 63
Logical Data Model 64
Extraction and/or Transformation Software 64
Populating the Data Warehouse 65
End-User Involvement 65
Neglect Executive Sponsorship 71
Avoid Sharing the Project with an End-User Project
Manager 71Limit End-User Involvement 73
Use a Classic "Waterfall" Project Methodology 74
Use a Technical Project Manager without Data
Warehouse Experience 75Freeze the Specification as Early as Possible 75
Work Multiple Projects in Parallel 76
Page 7
Simultaneously Implement Operational Systems and
Related Warehouses 76Postpone or Avoid Metadata 77
Focus on Technology First 77
Create Your Own Data Warehouse Software 79
Rely on Unproved Technology 80
Rely on "Silver Bullet" Technology 80
Run OLAP on Top of the Operational Database 82
Run the Data Warehouse Database on the Server for
the Operational System 83Use Normalized Data Structures 83
Define a Complex Warehouse Architecture 84
Skimp on Disk Storage 85
Ignore Data Movement Metrics 86
Defer Disaster Planning 87
The Executive Suite 89
The End-User Department(s) 90
Project Managers and Project Leaders 90
IT/IS Management (Also for Consulting Managers) 91
Technical Team Members (Whether in IT/IS or in a
Consulting Organization) 94
Part II Preparation: Integration and Cleansing
Data Marts: Propagation of Data Chaos 105
Mainframe Extracts 108
PCs and Client/Server Development 108
The Ramifications of Not Performing Semantic Integration 110
Same Fact? How Can You Be Sure? 111
Word Meaning 113
Unit of Measure 113
Level of Accuracy 113
Format 113
Timeliness 114
Page 8
Data Element Definitions 122
Business Term Definitions 122
Bringing the Two Definitions Together 123
Stepping Toward the Future 125
Changing Relationships 128
Changing Unique Identifiers 128
Historical Definitions and Names 129
Tracking Historical Fields 129
Issues with Addresses 133
Attribute Naming 134
Abbreviations as Attributes 135
Issues with Names as Attributes 135
Format 135
Database Naming Conventions 135
Oracle Objects 136
Physical DDL Storage/Maintenance 144
Problems with Data Stewardship 146
The Magilla Gorilla Factor 147
Control Metadata 153
Change Control in the Data Warehouse 153
Business Metadata: Ideal World Scenario 161
What Buying a Repository "Buys You" 163
What to Look for in Repositories 164
Page 9
The Metadata Coalition's Metamodel 167
The Electronics Industries Association's CASE Data
Interchange Format (CDIF) 172
Creative Customization 172
Single Vendor Solutions 172
Vendor Partnerships 173
Poor Business Decisions 178
Wasting Money 178
Lost Information 178
Merging Data Sources 183
Integration of Information: Cross Referencing 184
Mailing List Merging 185
Application Merging 185
Acquisition of Another Company 186
Step 1: Data Discovery and Sleuthing 187
Step 2: Categorize/Classify 188
Step 3: Decide Action/Document 188
Step 4a: Create Transforms/Generate Scrub Code 190
Step 4b: Change Business Processes or Legacy Systems 190
Step 5: Scrub Error Handling 191
Step 6: Check for Metadata Drift Over Time 192
Two Different Kinds of Problems: Field versus Value 192
Embedded Values: One Column, Different Values 193
Subtypes and Supertypes 194
Determining the Source of Record 194
One Column, Different Sources 194
Synonyms 195
Homonyms 195
Page 10
Sleuthing/Discovery Tools 196
Extract/Transform/Code: Scrub Tools 197
Tools That Address Specific Problems 199
Which Is Best? 199
Part III Logical and Physical Database Design
Star Schema Design 208
Normal Forms 209
Other Normalization Factors 216
Benefits of Normalization 219
Drawbacks of Normalization 219
Normalization as the Baseline for Warehouse Design 220
Advantages of Denormalization 220
Disadvantages of Normalization 221
Guidelines 221
Basic Denormalization Techniques 222
Changing Column Definition 225
Redefining Tables 228
Data Partitioning/Fragmentation 228
Range of a Scale 236
Nominal Scales 237
Absolute Scales 237
Ordinal Scales 237
Rank Scales 238
Quantitative Scales 238
Page 11
Interval Scales 238
Ratio Scales 238
Use Existing Standards 242
Avoid NULLs 243
Translate Codes for the User 243
Keep the Codes in the Database 243
Drill Downs Across Dimensions 254
Drilling to Another Fact Table 262
Tuning Tips for Regular Star Optimization
(Release 7.2 and Higher) 264Star Transformation Optimization Tips 265
HHCODE: A Slick Way to Formulate N-Dimension
Spatial Intersections 273How Does HHCODE Work? 274
Other Key Features of the Spatial Data Option 275
The Analytic Approach 278
The Empirical Approach 278
The Incremental Approach 278
Page 12
How to Slam-Dunk an Initial Estimate 279
Re-Estimate Regularly! 282
Planning for the Long Haul 283
Extract Storage Requirements 284
Data Cleaning Storage Requirements 285
Data Transformation Storage Requirements 286
Sort Workspace 287
Loads Use Storage Too! 288
The Operational Data Store (ODS) 289
The Main Warehouse Database 289
Data Marts 291
Query Workspace and Front-End Tools 294
Query Results and Report Distribution 294
Ancillary Data 294
Archival Data 295
Backup Data 295
Operating System Paging and Swap Space 297
Datatypes 298
Calculating Theoretical Size of Database Objects 299
Storage Types 301
Physically Locating Your Data 309
Old Style Physical Tuning 309
RAID 5 Makes It EasierBut Not Free 309
Denormalize, Normalize, Overnormalize 318
Special Tables 321
Star Schema 322
Partition Views and Parallelism 324
Partitioned Tables 326
Partitioning on Dimensions Other Than Time 328
Consider Using Parallel Technology 328
Memory Management 329
Page 13
Striping and Object Placement 330
Read-Only Tablespaces 331
Rollback Segments 332
Freelists 332
Indexes 332
Reclaim Space by Utilizing ALTER TABLE DEALLOCATE 332
Explore the Hidden Parameters 334
Define Temporary Tablespaces as TEMPORARY 335
SORT_AREA_SIZE 336
Pinging: The P Word 340
How Many Freelists Do You Need? 342
Skewing 343
The Optimizer 343
Distribution of Data to Maximize Parallelization 348
B*-Tree Indexes 352
B-Tree Index Advantages 354
Disadvantages 355
CREATE INDEX Syntax 356
What Is a Bitmap Index? 358
When to Use Bitmap Indexes 358
CREATE BITMAP INDEX Syntax 360
Restrictions (Features!) 360
Bitmap Query 361
Bitmap INIT.ORA Parameters 361
CREATE CLUSTER Syntax 362
Hash Cluster Parameters 363
Pre-Allocation of Space for a Hash Cluster 366
Hash Cluster Advantages 366
Hashing Disadvantages 366
Page 14
Part IV Management of a Data Warehouse
System Privileges 372
Object Privileges 373
Listing of User Privileges 375
Listing of System Privileges 376
Listing of Object Privileges 376
Column Privilege Listing 377
Creating Roles 379
Granting Privileges to Roles 379
Granting Roles to Users or Roles 380
Enabling Roles 380
Enabling Roles by Setting Default Roles 380
Predefined Roles 381
Revoking Privileges from Roles 382
Revoking Roles from Users or Roles 382
Dropping Roles 382
Data Dictionary Role Views 383
Backup Strategies in NOARCHIVELOG Mode 389
Backup Strategies in ARCHIVELOG Mode 390
Listing the Files for Backup 391
Performing Full Offline Backups 393
Performing Online Partial Backups 394
Enabling ARCHIVELOG Mode 396
Steps to Enabling ARCHIVELOG Mode 397
Checking Archive Status 397
Redo Log Information 399
Recovery Types 400
Page 15
Media Recovery 402
Control File Recovery 402
Instance Recovery 402
NOARCHIVELOG Recovery 403
Recovery Tablespaces/Data Files 403
Load Consistency 410
Referential Integrity 411
Freezing Time Variant Data 412
Disk Storage Capacity 424
Disk and Tape Throughput Capacity 425
Network Capacity 426
Mainframe Channel, Front-End Processor, and
Communication Processor Capacity 426
Trade-Offs 433
Is the Initial Load Representative of Ominous
Things to Come? 434Extracts and Data Movement 434
Dealing with Incremental Loads 435
SQL*Loader Options 435
Loading in Parallel 436
Network Protocol Options 437
Network Strategy for an ODS 437
Page 16
Do What You Can on the Source System 438
Dealing with Computed Fields 439
Aggregate Processing Creativity 440
What Do You Do About Referential Integrity? 440
The Shared Pool 447
Data Block Buffer 452
Other Memory Considerations 456
Size of Database Buffer 456
Tuning for Sorts 457
UTLBSTAT and UTLESTAT 458
Rule-Based versus Cost-Based Optimizer 459
PCTFREE 461
Server Manager 462
Part V Data Access
Multi-Dimensional 483
OLAP 484
Page 17
Slice and Dice 485
Drill 485
End-User Tools Versus Front-End Development Tools 487
Ad Hoc Query 487
Report Writers 487
EIS Tools 488
Data Mining Tools 488
Web Browser-Based Tools 489
Oracle Discoverer 489
Oracle Express 491
Oracle Developer 2000 491
BusinessObjects 492
DataMiner 492
WebIntelligence 492
Impromptu and Power Play 493
Scenario 493
The Correct Performance Tuning Method 495
Tuning the Back-End 496
Sort Performance Problems 496
Query Optimization 497
Data Movement 497
Memory Allocation 498
Ease of Use: A Different Kind of Performance Tuning 498
Selection Approach 500
Selection Criteria 507
A Word on the RFx 511
Information Sources 513
Decision Trees 525
Simple Descriptive Statistics 526
Page 18
Linear (and Other) Regression Analysis 528
Cluster Analysis 529
Fuzzy Logic 530
Neural Networks 531
Rule-Based Optimization 545
Cost-Based Optimization 545
Collecting Statistics 546
Analyzing Tables 546
Analyzing Indexes 547
Cost Optimization Goal 548
Hints Overview 548
Index Usage 551
Index Usage Examples 551
Index Suppression 551
Index Suppression Examples 552
Forcing Index Suppression 552
Forcing Index Usage 553
GROUP BY 553
Query Restructuring 554
Query OptimizationOR / UNION 555
Query OptimizationNOT EXISTS 555
Query OptimizationCombine 556
The PLAN_TABLE 557
Setting Up a Plan 558
EXPLAIN_PLAN Operations 560
How to Run TKPROF 576
TKPROF Syntax 579
Interpreting the TKPROF Results 581
Tuning Tips for Regular Star Optimization
(Releases 7.2 and Higher) 584Star Transformation Optimization Tips 585
Page 19
Satellite Data Marts 590
Feeder Data Marts 593
Partition Data Marts 595
The Mini-Warehouse Data Mart 596
Independent Data Marts 600
Mixed Data Mart Architectures 606
Data Mart Implementation Tools 610
The Oracle Data Mart Suite 611
Oracle Express and Data Marts 611
Part VI Appendixes
A Data Warehouse Checklists for Project Managers 639
B Survey of Tool Vendors 645
Page 20
C Survey of Conferences and Seminars 649
D Survey of Publications and Journals 653
E Oracle8 Features for Data Warehousing 657
F References 661
Page 3
Introductionxxv
Part I Initialization and Project Planning
Part II Preparation: Integration and Cleansing
Part III Logical and Physical Database Design
Part IV Management of a Data Warehouse
Part V Data Access
Page 4
Part VI Appendixes
A Data Warehouse Checklists for Project Managers639
B Survey of Tool Vendors645
C Survey of Conferences and Seminars649
D Survey of Publications and Journals653
E Oracle8 Features for Data Warehousing 657
F References661
Page 5