Main Page | Report this Page
Computers Forum Index  »  Computer - Databases - MS SQL Server  »  SQL query on normalized table...
Page 1 of 1    

SQL query on normalized table...

Author Message
...
Posted: Sun Oct 11, 2009 5:10 am
Guest
In the process of learning SQL I came across a normalized test table
(SQL Server 2005) containing height and weight values like so:

datevalue testid parameterid parametervalue
--------- ------ ----------- --------------
2009-01-01 1 1 163
2009-01-01 1 2 70
2009-01-01 2 1 158
2009-01-01 2 2 77
2009-01-02 1 1 164
2009-01-02 1 2 78
2009-01-02 2 2 55

Parameterid 1 is Height and parmaterid 2 is Weight.
Datevalue/testid/paramterid are all primary keys. I wanted to
display the result as follows:

datevalue testid height weight
--------- ------ ------ ------
2009-01-01 1 163 70
2009-01-01 2 158 77
2009-01-02 1 164 78
2009-01-02 2 NULL 55

The table has over 4 million rows. I tried to write the query below and
it took over 15 seconds:

select distinct(datevalue), testid,
(select parametervalue from testing where testid=a.testid and
datevalue=a.datevalue and parameterid=1) as height,
(select parametervalue from testing where testid=a.testid and
datevalue=a.datevalue and paramterid=2) as weight
from testing a order by datevalue, testid

Could you please share a more efficient query to write on the normalized
table than the one above?

Later, of curiosity, I denormalized the table and had fields
datevalue, testid, height, weight with datevalue/testid both primary.
Then, I fired the query below and got results in a second:

select datevalue, testid, height, weight from testing1 order by
datevalue, testid

I will greatly appreciate any guidance on writing an efficient query on
the normalized table. Thank you.

--
zf
 
Erland Sommarskog...
Posted: Sun Oct 11, 2009 11:14 am
Guest
(gempak at (no spam) SAE206-06C1DR.jacks.local) writes:
Quote:
In the process of learning SQL I came across a normalized test table
(SQL Server 2005) containing height and weight values like so:

datevalue testid parameterid parametervalue
--------- ------ ----------- --------------
2009-01-01 1 1 163
2009-01-01 1 2 70
2009-01-01 2 1 158
2009-01-01 2 2 77
2009-01-02 1 1 164
2009-01-02 1 2 78
2009-01-02 2 2 55

Parameterid 1 is Height and parmaterid 2 is Weight.
Datevalue/testid/paramterid are all primary keys. I wanted to
display the result as follows:

It can be disputed whether this table is normalized, or at least whether
this is a proper design. Height and weight certainly sounds like
different attributes, and thus should be different columns.

This is a design known as EAV, Entity-Attribute-Value, and many people
have only bad words to say about it. Personally, I'm not equally
condemnning. The model definitely has its problems, and your question
is a typical token of this. But EAV gives a higher rate of flexibility
that a rigid data model has difficulties to supply. So there are
situations where EAV has its place, but usually as the least of all
evils. Using it as a matter of routine is definitely wrong.

Quote:
datevalue testid height weight
--------- ------ ------ ------
2009-01-01 1 163 70
2009-01-01 2 158 77
2009-01-02 1 164 78
2009-01-02 2 NULL 55

The table has over 4 million rows. I tried to write the query below and
it took over 15 seconds:

select distinct(datevalue), testid,
(select parametervalue from testing where testid=a.testid and
datevalue=a.datevalue and parameterid=1) as height,
(select parametervalue from testing where testid=a.testid and
datevalue=a.datevalue and paramterid=2) as weight
from testing a order by datevalue, testid

Could you please share a more efficient query to write on the normalized
table than the one above?

The best way to pivot like you want to do is usually:

SELECT datevalue, testid,
height = MAX(CASE parameterid WHEN 1 THEN parametervalue END),
weight = MAX(CASE parameterid WHEN 2 THEN parametervalue END)
FROM testing
GROUP BY datevalue, testid

The MAX here is only used for aggregating, and does not really have anything
to do with MAX at all. You equally well use MIN or even SUM to get the
same result. If you remove the MAX and GROUP BY, you will get one row
for height and another for weigth. The MAX and GROUP BY helps to get
them on the same row.

--
Erland Sommarskog, SQL Server MVP, esquel at (no spam) sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 
zf...
Posted: Sun Oct 11, 2009 3:57 pm
Guest
On 2009-10-11, Erland Sommarskog <esquel at (no spam) sommarskog.se> wrote:
Quote:
(gempak at (no spam) SAE206-06C1DR.jacks.local) writes:
In the process of learning SQL I came across a normalized test table
(SQL Server 2005) containing height and weight values like so:

datevalue testid parameterid parametervalue
--------- ------ ----------- --------------
2009-01-01 1 1 163
2009-01-01 1 2 70
2009-01-01 2 1 158
2009-01-01 2 2 77
2009-01-02 1 1 164
2009-01-02 1 2 78
2009-01-02 2 2 55

Parameterid 1 is Height and parmaterid 2 is Weight.
Datevalue/testid/paramterid are all primary keys. I wanted to
display the result as follows:

It can be disputed whether this table is normalized, or at least whether
this is a proper design. Height and weight certainly sounds like
different attributes, and thus should be different columns.

I see. I had thought that the reason the test table had parameterid was
because it would allow adding more attributes at a later time, such as
length-of-arms, waist, blood-pressure and so on.

In case if height and weight were columns and if other attributes were
to be added at a later time, there could be gaps in dataset. What would
be a good way of designing a table where there are possibilities of
adding more attributes? Would the gaps in dataset be okay if attributes
are added as columns? I believe this is a subjective question but your
expertise and thoughts will greatly help me.

Quote:

So there are
situations where EAV has its place, but usually as the least of all
evils. Using it as a matter of routine is definitely wrong.

Yes, your explanation does make sense.


Quote:

datevalue testid height weight
--------- ------ ------ ------
2009-01-01 1 163 70
2009-01-01 2 158 77
2009-01-02 1 164 78
2009-01-02 2 NULL 55


The best way to pivot like you want to do is usually:

SELECT datevalue, testid,
height = MAX(CASE parameterid WHEN 1 THEN parametervalue END),
weight = MAX(CASE parameterid WHEN 2 THEN parametervalue END)
FROM testing
GROUP BY datevalue, testid

Great! This helped. The query runs in 4 seconds now.

Quote:

The MAX here is only used for aggregating, and does not really have anything
to do with MAX at all.

Ah, got it. Would you say that table containing:
datevalue, testid, height, weight
(datevalue/testid = primary key)
is more appropriate for the type of results I am hoping to seek? The
disadvantage I see is that when I add another column at a later time for
blood-pressure I will end up having gaps in data that is already stored unless
I explicitly update the blood-pressure for all/most of the 4 million rows.

Thanks for your valuable inputs and assistance.

--
zf
 
Erland Sommarskog...
Posted: Sun Oct 11, 2009 7:27 pm
Guest
zf (zf at (no spam) jak.local) writes:
Quote:
I see. I had thought that the reason the test table had parameterid was
because it would allow adding more attributes at a later time, such as
length-of-arms, waist, blood-pressure and so on.

In case if height and weight were columns and if other attributes were
to be added at a later time, there could be gaps in dataset. What would
be a good way of designing a table where there are possibilities of
adding more attributes? Would the gaps in dataset be okay if attributes
are added as columns? I believe this is a subjective question but your
expertise and thoughts will greatly help me.

It all depends on the business case. Normally, you should find out when
you design your database what attributes that are needed. Sure, new
needs can arise during the lifetime of the database, but that's a slow-
changing process, and you would typically add new columns with new
releases. As for existing data, one case to determine what to do on
case-by-case basis. In many cases, the new columns would just be NULL
for existing data.

But then there are situations when users want to be able add new fields
on the fly, and this is a difficult situation where no solution is
really pleasant. I can think of three:

o Lots of user_defined_1, user_defined_2 etc columns.
o Provide a functions for superusers to add columns to the table in
a controlled way.
o The EAV design that you have encountered.

Whichever way you go, you need a way get the application to generate
the queries.

--
Erland Sommarskog, SQL Server MVP, esquel at (no spam) sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 
zf...
Posted: Sun Oct 11, 2009 7:45 pm
Guest
On 2009-10-11, Erland Sommarskog <esquel at (no spam) sommarskog.se> wrote:
Quote:
zf (zf at (no spam) jak.local) writes:
[snipped useful message only for brevity]

But then there are situations when users want to be able add new fields
on the fly, and this is a difficult situation where no solution is
really pleasant. I can think of three:

o Lots of user_defined_1, user_defined_2 etc columns.
o Provide a functions for superusers to add columns to the table in
a controlled way.
o The EAV design that you have encountered.

Whichever way you go, you need a way get the application to generate
the queries.

Thank you for your inputs and providing a good learning experience
here. Your guidance is greatly appreciated.

--
zf
 
 
Page 1 of 1    
All times are GMT
The time now is Thu Nov 26, 2009 7:44 pm