|
The Superbly Fascinating World of Business
Intelligence
By Kalman Toth, M.Phil., M.Phil., MCDBA, MCITP
July 16, 2005
When you are in OLTP looking at data columns they appear to be
number columns or containing text. The same was true in pre-RDBMS
age when rows were called records and columns masqueraded as fields.
You have to enter the world of Business Intelligence to realize
that column and column is very different.
Let's take a FirstNationalTrust Bank's AccountMaster table:
AccountNumber
FirstName
LastName
Address1
Address2
City
State
Zipcode
Gender
Age
Occupation
Email
Phone
AccountType
Balance
DepositCurrentMonth
WithdrawalCurrentMonth
AccountCreatedOn
LastTransactionDate
Analyzing this table from the viewpoint of BI we can start with
classifying into four types:
Keys
Detail attributes
Dimensions
Measures
AccountNumber is a key. Address1 and Address2 are detail attributes.
Now the interesting BI part starts. Gender is a dimension. Balance
is a measure. Usually used with the SUM and AVG functions when
BI analysis is performed. Well, gender measures something, so
why are we calling it a dimension? We call a column a dimension,
if we use it to categorize data. Typically columns we use in the
GROUP BY clause are dimensions.
Measures are typically the numeric columns, but not necessarily
so. If we use Age in a GROUP BY, it is a dimension. If we place
avg(Age) into the SELECT list, it functions as a calculated measure.
In OLTP there is no concept of hierarchy of columns. In the fascinating
world of Business Intelligence, State and City would be set up
as a 2-level dimension. When you browse this dimension, first
it would expand to states, then if you click to expand any state,
you would get the cities in that state.
Email appears to be detail attribute. Yet we can mine some BI
out of it. If we take the domain part of the email, we can categorize
our data according to email domain.
Date is the ultimate dimension column. It is naturally multi
level: year, quarter, month, week and day.
In summary, Business Intelligence is a vast, well organized,
pre-calculated collection of
SELECT Dimension1, Dimension2
.sum(Measure1), avg(Measure2)
FROM
.WHERE
.GROUP BY Dimension1, Dimension2
..
These pre-calculated GROUP BYs are cleverly connected to single
or multi-level dimensions.
The result is an OLAP cube, the workhorse of Business Intelligence.
|