 |
|
| Computers Forum Index » Computer - Databases - MS SQL Server » SQL query on normalized table... |
|
Page 1 of 1 |
|
| Author |
Message |
| ... |
Posted: Sun Oct 11, 2009 5:10 am |
|
|
|
Guest
|
In the process of learning SQL I came across a normalized test table
(SQL Server 2005) containing height and weight values like so:
datevalue testid parameterid parametervalue
--------- ------ ----------- --------------
2009-01-01 1 1 163
2009-01-01 1 2 70
2009-01-01 2 1 158
2009-01-01 2 2 77
2009-01-02 1 1 164
2009-01-02 1 2 78
2009-01-02 2 2 55
Parameterid 1 is Height and parmaterid 2 is Weight.
Datevalue/testid/paramterid are all primary keys. I wanted to
display the result as follows:
datevalue testid height weight
--------- ------ ------ ------
2009-01-01 1 163 70
2009-01-01 2 158 77
2009-01-02 1 164 78
2009-01-02 2 NULL 55
The table has over 4 million rows. I tried to write the query below and
it took over 15 seconds:
select distinct(datevalue), testid,
(select parametervalue from testing where testid=a.testid and
datevalue=a.datevalue and parameterid=1) as height,
(select parametervalue from testing where testid=a.testid and
datevalue=a.datevalue and paramterid=2) as weight
from testing a order by datevalue, testid
Could you please share a more efficient query to write on the normalized
table than the one above?
Later, of curiosity, I denormalized the table and had fields
datevalue, testid, height, weight with datevalue/testid both primary.
Then, I fired the query below and got results in a second:
select datevalue, testid, height, weight from testing1 order by
datevalue, testid
I will greatly appreciate any guidance on writing an efficient query on
the normalized table. Thank you.
--
zf |
|
|
| Back to top |
|
|
|
| Erland Sommarskog... |
Posted: Sun Oct 11, 2009 11:14 am |
|
|
|
Guest
|
(gempak at (no spam) SAE206-06C1DR.jacks.local) writes:
Quote: In the process of learning SQL I came across a normalized test table
(SQL Server 2005) containing height and weight values like so:
datevalue testid parameterid parametervalue
--------- ------ ----------- --------------
2009-01-01 1 1 163
2009-01-01 1 2 70
2009-01-01 2 1 158
2009-01-01 2 2 77
2009-01-02 1 1 164
2009-01-02 1 2 78
2009-01-02 2 2 55
Parameterid 1 is Height and parmaterid 2 is Weight.
Datevalue/testid/paramterid are all primary keys. I wanted to
display the result as follows:
It can be disputed whether this table is normalized, or at least whether
this is a proper design. Height and weight certainly sounds like
different attributes, and thus should be different columns.
This is a design known as EAV, Entity-Attribute-Value, and many people
have only bad words to say about it. Personally, I'm not equally
condemnning. The model definitely has its problems, and your question
is a typical token of this. But EAV gives a higher rate of flexibility
that a rigid data model has difficulties to supply. So there are
situations where EAV has its place, but usually as the least of all
evils. Using it as a matter of routine is definitely wrong.
Quote: datevalue testid height weight
--------- ------ ------ ------
2009-01-01 1 163 70
2009-01-01 2 158 77
2009-01-02 1 164 78
2009-01-02 2 NULL 55
The table has over 4 million rows. I tried to write the query below and
it took over 15 seconds:
select distinct(datevalue), testid,
(select parametervalue from testing where testid=a.testid and
datevalue=a.datevalue and parameterid=1) as height,
(select parametervalue from testing where testid=a.testid and
datevalue=a.datevalue and paramterid=2) as weight
from testing a order by datevalue, testid
Could you please share a more efficient query to write on the normalized
table than the one above?
The best way to pivot like you want to do is usually:
SELECT datevalue, testid,
height = MAX(CASE parameterid WHEN 1 THEN parametervalue END),
weight = MAX(CASE parameterid WHEN 2 THEN parametervalue END)
FROM testing
GROUP BY datevalue, testid
The MAX here is only used for aggregating, and does not really have anything
to do with MAX at all. You equally well use MIN or even SUM to get the
same result. If you remove the MAX and GROUP BY, you will get one row
for height and another for weigth. The MAX and GROUP BY helps to get
them on the same row.
--
Erland Sommarskog, SQL Server MVP, esquel at (no spam) sommarskog.se
Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx |
|
|
| Back to top |
|
|
|
| zf... |
Posted: Sun Oct 11, 2009 3:57 pm |
|
|
|
Guest
|
On 2009-10-11, Erland Sommarskog <esquel at (no spam) sommarskog.se> wrote:
Quote: (gempak at (no spam) SAE206-06C1DR.jacks.local) writes:
In the process of learning SQL I came across a normalized test table
(SQL Server 2005) containing height and weight values like so:
datevalue testid parameterid parametervalue
--------- ------ ----------- --------------
2009-01-01 1 1 163
2009-01-01 1 2 70
2009-01-01 2 1 158
2009-01-01 2 2 77
2009-01-02 1 1 164
2009-01-02 1 2 78
2009-01-02 2 2 55
Parameterid 1 is Height and parmaterid 2 is Weight.
Datevalue/testid/paramterid are all primary keys. I wanted to
display the result as follows:
It can be disputed whether this table is normalized, or at least whether
this is a proper design. Height and weight certainly sounds like
different attributes, and thus should be different columns.
I see. I had thought that the reason the test table had parameterid was
because it would allow adding more attributes at a later time, such as
length-of-arms, waist, blood-pressure and so on.
In case if height and weight were columns and if other attributes were
to be added at a later time, there could be gaps in dataset. What would
be a good way of designing a table where there are possibilities of
adding more attributes? Would the gaps in dataset be okay if attributes
are added as columns? I believe this is a subjective question but your
expertise and thoughts will greatly help me.
Quote:
So there are
situations where EAV has its place, but usually as the least of all
evils. Using it as a matter of routine is definitely wrong.
Yes, your explanation does make sense.
Quote:
datevalue testid height weight
--------- ------ ------ ------
2009-01-01 1 163 70
2009-01-01 2 158 77
2009-01-02 1 164 78
2009-01-02 2 NULL 55
The best way to pivot like you want to do is usually:
SELECT datevalue, testid,
height = MAX(CASE parameterid WHEN 1 THEN parametervalue END),
weight = MAX(CASE parameterid WHEN 2 THEN parametervalue END)
FROM testing
GROUP BY datevalue, testid
Great! This helped. The query runs in 4 seconds now.
Quote:
The MAX here is only used for aggregating, and does not really have anything
to do with MAX at all.
Ah, got it. Would you say that table containing:
datevalue, testid, height, weight
(datevalue/testid = primary key)
is more appropriate for the type of results I am hoping to seek? The
disadvantage I see is that when I add another column at a later time for
blood-pressure I will end up having gaps in data that is already stored unless
I explicitly update the blood-pressure for all/most of the 4 million rows.
Thanks for your valuable inputs and assistance.
--
zf |
|
|
| Back to top |
|
|
|
| Erland Sommarskog... |
Posted: Sun Oct 11, 2009 7:27 pm |
|
|
|
Guest
|
zf (zf at (no spam) jak.local) writes:
Quote: I see. I had thought that the reason the test table had parameterid was
because it would allow adding more attributes at a later time, such as
length-of-arms, waist, blood-pressure and so on.
In case if height and weight were columns and if other attributes were
to be added at a later time, there could be gaps in dataset. What would
be a good way of designing a table where there are possibilities of
adding more attributes? Would the gaps in dataset be okay if attributes
are added as columns? I believe this is a subjective question but your
expertise and thoughts will greatly help me.
It all depends on the business case. Normally, you should find out when
you design your database what attributes that are needed. Sure, new
needs can arise during the lifetime of the database, but that's a slow-
changing process, and you would typically add new columns with new
releases. As for existing data, one case to determine what to do on
case-by-case basis. In many cases, the new columns would just be NULL
for existing data.
But then there are situations when users want to be able add new fields
on the fly, and this is a difficult situation where no solution is
really pleasant. I can think of three:
o Lots of user_defined_1, user_defined_2 etc columns.
o Provide a functions for superusers to add columns to the table in
a controlled way.
o The EAV design that you have encountered.
Whichever way you go, you need a way get the application to generate
the queries.
--
Erland Sommarskog, SQL Server MVP, esquel at (no spam) sommarskog.se
Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx |
|
|
| Back to top |
|
|
|
| zf... |
Posted: Sun Oct 11, 2009 7:45 pm |
|
|
|
Guest
|
On 2009-10-11, Erland Sommarskog <esquel at (no spam) sommarskog.se> wrote:
Quote: zf (zf at (no spam) jak.local) writes:
[snipped useful message only for brevity]
But then there are situations when users want to be able add new fields
on the fly, and this is a difficult situation where no solution is
really pleasant. I can think of three:
o Lots of user_defined_1, user_defined_2 etc columns.
o Provide a functions for superusers to add columns to the table in
a controlled way.
o The EAV design that you have encountered.
Whichever way you go, you need a way get the application to generate
the queries.
Thank you for your inputs and providing a good learning experience
here. Your guidance is greatly appreciated.
--
zf |
|
|
| Back to top |
|
|
|
|
|
All times are GMT
The time now is Sun Dec 06, 2009 5:58 am
|
|