Main Page | Report this Page
Computers Forum Index  »  Computer - Databases  »  about star schema and surrogate keys...
Page 1 of 1    

about star schema and surrogate keys...

Author Message
Marco Mariani...
Posted: Mon May 04, 2009 1:39 pm
Guest
I've done my homemade small data warehouse with Postgres, and a simple
reporting tool.

What do I gain by having a surrogate integer time_key in my table
dim_time (one row per day), as opposed to a DATE column?

The same question holds for other dimensions that could already have a
sound natural key, usually a short alphanumeric code.

Is there really a performance gain in modern DBs?
Or should I do that for conformance with query tools that I could use
someday?
The biggest dimension has 9000 rows, and about 2M rows for facts.

Thanks!
 
Walter Mitty...
Posted: Mon May 04, 2009 2:13 pm
Guest
"Marco Mariani" <marco at (no spam) sferacarta.com> wrote in message
news:wJyLl.2913$Ux.702 at (no spam) tornado.fastwebnet.it...
Quote:
I've done my homemade small data warehouse with Postgres, and a simple
reporting tool.

What do I gain by having a surrogate integer time_key in my table dim_time
(one row per day), as opposed to a DATE column?

The same question holds for other dimensions that could already have a
sound natural key, usually a short alphanumeric code.

Is there really a performance gain in modern DBs?
Or should I do that for conformance with query tools that I could use
someday?
The biggest dimension has 9000 rows, and about 2M rows for facts.

Thanks!

I can think of one advantage for some dimensions. It allows the dimension
to be time varying.

For example, think of a star where the fact is a product sale. One of the
dimensions is product, and one of the attributes of product is price.
But the latest price for the product is not as relevant as the price that
was true on the date the sale took place. That means that, when the price
changes, I don't want to update the price in the product row. Instead, I
want to insert a new row with the same product and the new price. But if
I'm using ProductID as the key to the product dimension, I can't do that.
If I 'm using a surrogate key, I can. Of course, ProductID will be one of
the attributes in a dimension table.

This advantage does not occur in a date or time dimension.
 
Marco Mariani...
Posted: Mon May 04, 2009 3:55 pm
Guest
Walter Mitty wrote:


Quote:
But the latest price for the product is not as relevant as the price that
was true on the date the sale took place. That means that, when the price
changes, I don't want to update the price in the product row. Instead, I
want to insert a new row with the same product and the new price. But if
I'm using ProductID as the key to the product dimension, I can't do that.
If I 'm using a surrogate key, I can. Of course, ProductID will be one of
the attributes in a dimension table.

I already use a surrogate in cases like that. So it seems like there is
nothing I overlooked.

thank you!
 
--CELKO--...
Posted: Tue May 05, 2009 6:27 pm
Guest
Quote:
What do I gain by having a surrogate integer time_key in my table dim_time (one row per day), as opposed to a DATE column? The same question holds for other dimensions that could already have a sound natural key, usually a short alphanumeric code.

These fake pointers are harmful. How do you verify and validate this
magical number? At best it is redundant. We live with 64 bit, multi-
core hardware, not 16 bit single processor 1970's minicomputers.
 
Marco Mariani...
Posted: Wed May 06, 2009 2:47 pm
Guest
--CELKO-- wrote:

Quote:
What do I gain by having a surrogate integer time_key in my table dim_time (one row per day), as opposed to a DATE column? The same question holds for other dimensions that could already have a sound natural key, usually a short alphanumeric code.

These fake pointers are harmful. How do you verify and validate this
magical number?

I am not using the magical number right now in the context I described.
You don't need to convince me about using the "real stuff", you and
others already did a long time ago, and I thank you for that.

But I still need to understand which of the practices I use for my
day-to-day invoice stuff and contract management.... which practices
still make sense in the denormalized, zero-gravity platypus-shaped world
of my toy datawarehouse, and which ones don't.

I don't need to worry about verifying and validating anything, beyond
the ETL procedure, because the DWH is small enough that I drop and
recreate everything every night. It's never updated by anyone.

My main concern here is to have fast queries, and to avoid clumsiness in
the building process since I am not an expert and debugging that stuff
is painful at times.
I'm sure the next one will be cleaner Smile
 
david...
Posted: Sun Jun 07, 2009 2:32 pm
Guest
As a code programmer with some database experience,
I think that surrogate keys are a natural for programmers.
Surrogate keys are related to one of the central lessons
of computer programming, related to the reason why we
call our product 'soft' ware.

As such I expect to see surrogate keys overused
anywhere that code programmers are allowed anywhere
near to database schemas. Since people with code
programming experience are frequently involved in schema
design, I think that you can always expect that there
will be discussion about the use of surrogate keys, even
when you are discussing data warehousing, but for
data warehousing you just have to learn to ignore it.

(david)


"Marco Mariani" <marco at (no spam) sferacarta.com> wrote in message
news:wJyLl.2913$Ux.702 at (no spam) tornado.fastwebnet.it...
Quote:
I've done my homemade small data warehouse with Postgres, and a simple
reporting tool.

What do I gain by having a surrogate integer time_key in my table dim_time
(one row per day), as opposed to a DATE column?

The same question holds for other dimensions that could already have a
sound natural key, usually a short alphanumeric code.

Is there really a performance gain in modern DBs?
Or should I do that for conformance with query tools that I could use
someday?
The biggest dimension has 9000 rows, and about 2M rows for facts.

Thanks!
 
 
Page 1 of 1    
All times are GMT
The time now is Wed Dec 09, 2009 12:59 am