Main Page | Report this Page
Computers Forum Index  »  Computer - Databases - Informix  »  Can I optimize the engine here or do I have to get the...
Page 1 of 2    Goto page 1, 2  Next

Can I optimize the engine here or do I have to get the...

Author Message
Malc...
Posted: Fri Nov 06, 2009 12:51 pm
Guest
IDS 9.30HC5 running on HP-UX 11i. Engine upgrade is not an option,
don't suggest it, see my previous posts going back some years!

Here's the scenario: a table "cm" has 3.2million rows and we query it
using something like (reduced for posting):
SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"

The only bits that matter are the column "name_type", which is char(1)
and unindexed, and "<other_column>", which an is indexed char(60) -
the upper60() call is just a functional index we set up that helps on
that column.

A query using just "upper60(<other_column>) MATCHES "LA*"" is instant,
as expected.

If we include "name_type = "N"", it bogs and takes 1min45sec, at
least.
Here's sqexplain for each case:

CASE 1:
=======
SELECT<column list> FROM cm
WHERE
upper60(<other_column>) MATCHES "LA*"
Estimated Cost:
655265
Estimated # of Rows Returned:
634086

1) dba.cm: INDEX
PATH

(1) Index Keys: dba.upper60(npname_name) (Serial, fragments:
ALL)
Lower Index Filter: dba.upper60(dba.cm.npname_name )MATCHES
'LA*'


CASE 2:
=======
SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"
Estimated Cost:
655265
Estimated # of Rows Returned:
2

1) dba.cm: INDEX
PATH

Filters: dba.cm.name_type =
'N'

(1) Index Keys: dba.upper60(npname_name) (Serial, fragments:
ALL)
Lower Index Filter: dba.upper60(dba.cm.npname_name )MATCHES
'LA*'

....so the cost is the same and the only difference is the addition of
the filter "cm.name_type= 'N'"

Here's the histogram output for column "name_type":

-- DISTRIBUTION
---

( N )
1: ( 8569, 1071,
P )

--- OVERFLOW
---
1: ( 3161859, P )

and the counts are:
NULL - 4 rows
" " - 8 rows
"N" - 8193 rows
"P" - 3,162,224 rows

So a big skew.

Update stats is up-to-date on the table (MEDIUM DISTRIBUTIONS ONLY
followed by HIGH(column) on all index heads).
The developers would like me to speed it up rather than have to change
a bunch of SQLs that all do similar things. I've tried dropping the
distributions on column "nake_type" and also the whole table, I've
tried optimizing hints FIRST_ROWS, ALL_ROWS, FULL, AVOID_FULL; tried
PDQ on and off and intermediate values for PDQPRIORITY, all sorts of
things, nothing helps. I tried UPDATE STATISTICS HIGH on column
(name_type), no help. I could include name_type in the index but with
cardinality that skewed I don't see it helping. I think I can't
improve it by engine manipulation - anyone else got a clue?
Many thanks
Malc
 
Superboer...
Posted: Fri Nov 06, 2009 1:20 pm
Guest
Hello Malc,

You probably know all this however......
i would at least try to create the index on

create index ixie on cm(upper60(<other_col>),name_type) and see if
that helps

Without it the engine has to goto the datapage and see if it needs the
record yes or no.

Question i assume you fetch all the data needed in both cases???
eq

dbaccess name_ofdb <<!
select current from systables where tabname ='systables';
unload to /dev/null
SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"
select current from systables where tabname ='systables';
!


dbaccess name_ofdb <<!
select current from systables where tabname ='systables';
unload to /dev/null
SELECT<column list> FROM cm
WHERE
upper60(<other_column>) MATCHES "LA*"
select current from systables where tabname ='systables';
!


if both take the same amount of time, it then only takes longer to get
a fist full of data...

Superboer

On 6 nov, 13:51, Malc <i... at (no spam) perrior.net> wrote:
Quote:
IDS 9.30HC5 running on HP-UX 11i. Engine upgrade is not an option,
don't suggest it, see my previous posts going back some years!

Here's the scenario: a table "cm" has 3.2million rows and we query it
using something like (reduced for posting):
SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"

The only bits that matter are the column "name_type", which is char(1)
and unindexed, and "<other_column>", which an is indexed char(60) -
the upper60() call is just a functional index we set up that helps on
that column.

A query using just "upper60(<other_column>) MATCHES "LA*"" is instant,
as expected.

If we include "name_type = "N"", it bogs and takes 1min45sec, at
least.
Here's sqexplain for each case:

CASE 1:
======> SELECT<column list> FROM cm
WHERE
upper60(<other_column>) MATCHES "LA*"
Estimated Cost:
655265
Estimated # of Rows Returned:
634086

  1) dba.cm: INDEX
PATH

    (1) Index Keys: dba.upper60(npname_name)   (Serial, fragments:
ALL)
        Lower Index Filter: dba.upper60(dba.cm.npname_name )MATCHES
'LA*'

CASE 2:
======> SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"
Estimated Cost:
655265
Estimated # of Rows Returned:
2

  1) dba.cm: INDEX
PATH

        Filters: dba.cm.name_type > 'N'

    (1) Index Keys: dba.upper60(npname_name)   (Serial, fragments:
ALL)
        Lower Index Filter: dba.upper60(dba.cm.npname_name )MATCHES
'LA*'

...so the cost is the same and the only difference is the addition of
the filter "cm.name_type= 'N'"

Here's the histogram output for column "name_type":

-- DISTRIBUTION
---

(                    N )
 1: (    8569,     1071,
P )

--- OVERFLOW
---
 1: ( 3161859,           P )

and the counts are:
NULL - 4 rows
" " - 8 rows
"N" - 8193 rows
"P" - 3,162,224 rows

So a big skew.

Update stats is up-to-date on the table (MEDIUM DISTRIBUTIONS ONLY
followed by HIGH(column) on all index heads).
The developers would like me to speed it up rather than have to change
a bunch of SQLs that all do similar things. I've tried dropping the
distributions on column "nake_type" and also the whole table, I've
tried optimizing hints FIRST_ROWS, ALL_ROWS, FULL, AVOID_FULL; tried
PDQ on and off and intermediate values for PDQPRIORITY, all sorts of
things, nothing helps. I tried UPDATE STATISTICS HIGH on column
(name_type), no help.  I could include name_type in the index but with
cardinality that skewed I don't see it helping. I think I can't
improve it by engine manipulation - anyone else got a clue?
Many thanks
Malc
 
Malc...
Posted: Fri Nov 06, 2009 1:22 pm
Guest
Yup that does help - I was leaving that as I thought that the huge
skew on that field's values would make it next to ineffective as an
index key (especially when using the value "P" which is in the
overflow bin).
Obviously something I'm missing in my understanding of the optimizer,
but hey...
 
Malc...
Posted: Fri Nov 06, 2009 1:36 pm
Guest
Superboer said:

"Without it the engine has to goto the datapage and see if it needs
the
record yes or no. "

<fx: slaps forehead>

.....Malc goes off to reread Samar Desai's excellent paper
http://www3.software.ibm.com/ibmdl/pub/software/dw/dm/informix/0211desai/0211desai.pdf
 
Art Kagel...
Posted: Fri Nov 06, 2009 6:13 pm
Guest
Append name_type to the functional index. So it would become:

create index cm_func_new on cm( upper60(<othercolumn>), name_type );

That should do the trick.

Art

Art S. Kagel
Oninit (www.oninit.com)
IIUG Board of Directors (art at (no spam) iiug.org)

Disclaimer: Please keep in mind that my own opinions are my own opinions and
do not reflect on my employer, Oninit, the IIUG, nor any other organization
with which I am associated either explicitly or implicitly. Neither do
those opinions reflect those of other individuals affiliated with any entity
with which I am affiliated nor those of the entities themselves.



On Fri, Nov 6, 2009 at 7:51 AM, Malc <iiug at (no spam) perrior.net> wrote:

Quote:
IDS 9.30HC5 running on HP-UX 11i. Engine upgrade is not an option,
don't suggest it, see my previous posts going back some years!

Here's the scenario: a table "cm" has 3.2million rows and we query it
using something like (reduced for posting):
SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"

The only bits that matter are the column "name_type", which is char(1)
and unindexed, and "<other_column>", which an is indexed char(60) -
the upper60() call is just a functional index we set up that helps on
that column.

A query using just "upper60(<other_column>) MATCHES "LA*"" is instant,
as expected.

If we include "name_type = "N"", it bogs and takes 1min45sec, at
least.
Here's sqexplain for each case:

CASE 1:
=======
SELECT<column list> FROM cm
WHERE
upper60(<other_column>) MATCHES "LA*"
Estimated Cost:
655265
Estimated # of Rows Returned:
634086

1) dba.cm: INDEX
PATH

(1) Index Keys: dba.upper60(npname_name) (Serial, fragments:
ALL)
Lower Index Filter: dba.upper60(dba.cm.npname_name )MATCHES
'LA*'


CASE 2:
=======
SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"
Estimated Cost:
655265
Estimated # of Rows Returned:
2

1) dba.cm: INDEX
PATH

Filters: dba.cm.name_type =
'N'

(1) Index Keys: dba.upper60(npname_name) (Serial, fragments:
ALL)
Lower Index Filter: dba.upper60(dba.cm.npname_name )MATCHES
'LA*'

...so the cost is the same and the only difference is the addition of
the filter "cm.name_type= 'N'"

Here's the histogram output for column "name_type":

-- DISTRIBUTION
---

( N )
1: ( 8569, 1071,
P )

--- OVERFLOW
---
1: ( 3161859, P )

and the counts are:
NULL - 4 rows
" " - 8 rows
"N" - 8193 rows
"P" - 3,162,224 rows

So a big skew.

Update stats is up-to-date on the table (MEDIUM DISTRIBUTIONS ONLY
followed by HIGH(column) on all index heads).
The developers would like me to speed it up rather than have to change
a bunch of SQLs that all do similar things. I've tried dropping the
distributions on column "nake_type" and also the whole table, I've
tried optimizing hints FIRST_ROWS, ALL_ROWS, FULL, AVOID_FULL; tried
PDQ on and off and intermediate values for PDQPRIORITY, all sorts of
things, nothing helps. I tried UPDATE STATISTICS HIGH on column
(name_type), no help. I could include name_type in the index but with
cardinality that skewed I don't see it helping. I think I can't
improve it by engine manipulation - anyone else got a clue?
Many thanks
Malc
_______________________________________________
Informix-list mailing list
Informix-list at (no spam) iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
 
Christian Knappke...
Posted: Fri Nov 06, 2009 6:38 pm
Guest
Art Kagel wrote:
Quote:
Append name_type to the functional index.

Yes, that's what I'd suggest -- almost.

Quote:
So it would become:

create index cm_func_new on cm( upper60(<othercolumn>), name_type );

But I'd put name_type in the first place. Otherwise it would not help
a lot. The query you mentioned will be handled differently:

1. (upper60(<othercolumn>), name_type)

LA__________________________________________________________N

2. (name_type, upper60(<othercolumn>))

NLA__________________________________________________________

In the first case all rows where othercolumn starts with 'LA' need to
be searches and then filtered for name_type = 'N', whereas in the
second case only rows with name_type = 'N' need to be searched for
othercolumn starting with 'LA'.

Disclaimer: This optimization is only for the query

SELECT <column list> FROM cm WHERE name_type = 'N'
AND upper60(<other_column>) MATCHES 'LA*'

If you leave out the name_type = 'N' part, you'll end with a table
scan. So check all queries on the table and see if you need more than
one index.

HTH and best regards
Christian
 
Ian Michael Gumby...
Posted: Fri Nov 06, 2009 7:53 pm
Guest
Quote:
From: chknews at (no spam) gmx.net
Subject: Re: Can I optimize the engine here or do I have to get the developers to change the SQL?
Date: Fri, 6 Nov 2009 14:38:46 +0100
To: informix-list at (no spam) iiug.org

Art Kagel wrote:

So it would become:

create index cm_func_new on cm( upper60(<othercolumn>), name_type );

But I'd put name_type in the first place. Otherwise it would not help
a lot. The query you mentioned will be handled differently:

1. (upper60(<othercolumn>), name_type)

LA__________________________________________________________N

2. (name_type, upper60(<othercolumn>))

No.


That would be a mistake.

How unique is name_type? 52 possible single characters (not all will be used) over 20+ million rows? Errrr not good. Its worse when you realize that you may only use upper case (26 possible) and then you don't really use all 26 characters...

Art is right in making it the second column to help with uniqueness in a single index.

The key is to make sure that the first column of the index will give you the most unique set.

Now if IDS allowed you to have multiple indexes on the same table and be used in a query... this wouldn't be a problem. (Is this out yet? or coming out in the next release?)

It sounds like Malcolm has an issue where his index on the 'othercolumn' is returning more than 20K rows with LA as the first two characters.

But hey! What do I know? I'm not a physical DBA. I just play one on TV.

-G Razz


_________________________________________________________________
Find the right PC with Windows 7 and Windows Live.
http://www.microsoft.com/Windows/pc-scout/laptop-set-criteria.aspx?cbid=wl&filt=200,2400,10,19,1,3,1,7,50,650,2,12,0,1000&cat=1,2,3,4,5,6&brands=5,6,7,8,9,10,11,12,13,14,15,16&addf=4,5,9&ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen2:112009
 
Fernando Nunes...
Posted: Sat Nov 07, 2009 3:26 am
Guest
Superboer wrote:
Quote:
Hello Malc,

You probably know all this however......
i would at least try to create the index on

create index ixie on cm(upper60(<other_col>),name_type) and see if
that helps

Without it the engine has to goto the datapage and see if it needs the
record yes or no.

Question i assume you fetch all the data needed in both cases???
eq

dbaccess name_ofdb <<!
select current from systables where tabname ='systables';
unload to /dev/null
SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"
select current from systables where tabname ='systables';
!


dbaccess name_ofdb <<!
select current from systables where tabname ='systables';
unload to /dev/null
SELECT<column list> FROM cm
WHERE
upper60(<other_column>) MATCHES "LA*"
select current from systables where tabname ='systables';
!


if both take the same amount of time, it then only takes longer to get
a fist full of data...

Superboer

On 6 nov, 13:51, Malc <i... at (no spam) perrior.net> wrote:
IDS 9.30HC5 running on HP-UX 11i. Engine upgrade is not an option,
don't suggest it, see my previous posts going back some years!

Here's the scenario: a table "cm" has 3.2million rows and we query it
using something like (reduced for posting):
SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"

The only bits that matter are the column "name_type", which is char(1)
and unindexed, and "<other_column>", which an is indexed char(60) -
the upper60() call is just a functional index we set up that helps on
that column.

A query using just "upper60(<other_column>) MATCHES "LA*"" is instant,
as expected.

If we include "name_type = "N"", it bogs and takes 1min45sec, at
least.
Here's sqexplain for each case:

CASE 1:
=======
SELECT<column list> FROM cm
WHERE
upper60(<other_column>) MATCHES "LA*"
Estimated Cost:
655265
Estimated # of Rows Returned:
634086

1) dba.cm: INDEX
PATH

(1) Index Keys: dba.upper60(npname_name) (Serial, fragments:
ALL)
Lower Index Filter: dba.upper60(dba.cm.npname_name )MATCHES
'LA*'

CASE 2:
=======
SELECT<column list> FROM cm
WHERE
name_type = "N"
AND upper60(<other_column>) MATCHES "LA*"
Estimated Cost:
655265
Estimated # of Rows Returned:
2

1) dba.cm: INDEX
PATH

Filters: dba.cm.name_type =
'N'

(1) Index Keys: dba.upper60(npname_name) (Serial, fragments:
ALL)
Lower Index Filter: dba.upper60(dba.cm.npname_name )MATCHES
'LA*'

...so the cost is the same and the only difference is the addition of
the filter "cm.name_type= 'N'"

Here's the histogram output for column "name_type":

-- DISTRIBUTION
---

( N )
1: ( 8569, 1071,
P )

--- OVERFLOW
---
1: ( 3161859, P )

and the counts are:
NULL - 4 rows
" " - 8 rows
"N" - 8193 rows
"P" - 3,162,224 rows

So a big skew.

Update stats is up-to-date on the table (MEDIUM DISTRIBUTIONS ONLY
followed by HIGH(column) on all index heads).
The developers would like me to speed it up rather than have to change
a bunch of SQLs that all do similar things. I've tried dropping the
distributions on column "nake_type" and also the whole table, I've
tried optimizing hints FIRST_ROWS, ALL_ROWS, FULL, AVOID_FULL; tried
PDQ on and off and intermediate values for PDQPRIORITY, all sorts of
things, nothing helps. I tried UPDATE STATISTICS HIGH on column
(name_type), no help. I could include name_type in the index but with
cardinality that skewed I don't see it helping. I think I can't
improve it by engine manipulation - anyone else got a clue?
Many thanks
Malc


I was going to ask the same...
It doesn't make too much sense that the query with name_type condition
takes too much more time... The engine has to do the comparisons, but
that should not be causing too much difference.

Of course the first record can take longer...

Regards.
 
Fernando Nunes...
Posted: Sat Nov 07, 2009 3:29 am
Guest
Ian Michael Gumby wrote:

Quote:
Now if IDS allowed you to have multiple indexes on the same table and be
used in a query... this wouldn't be a problem. (Is this out yet? or
coming out in the next release?)

No. It's not out. It's one of the features of XPS that would make sense
to port... we have to wait.
 
Ian Michael Gumby...
Posted: Sun Nov 08, 2009 3:12 am
Guest
Well I know its coming.
Its something I bothered Jerry K about just over a year. Just don't know where its in the pipeline.

The only other 'must have' feature would be to create a column oriented data type (re: HBase). This shouldn't be too hard to build on. Something similar to the time series datablade.

Quote:
From: domusonline at (no spam) gmail.com
Subject: Re: Can I optimize the engine here or do I have to get the developers to change the SQL?
Date: Fri, 6 Nov 2009 22:29:58 +0000
To: informix-list at (no spam) iiug.org

Ian Michael Gumby wrote:

Now if IDS allowed you to have multiple indexes on the same table and be
used in a query... this wouldn't be a problem. (Is this out yet? or
coming out in the next release?)

No. It's not out. It's one of the features of XPS that would make sense
to port... we have to wait.
_______________________________________________
Informix-list mailing list
Informix-list at (no spam) iiug.org
http://www.iiug.org/mailman/listinfo/informix-list

_________________________________________________________________
Find the right PC with Windows 7 and Windows Live.
http://www.microsoft.com/Windows/pc-scout/laptop-set-criteria.aspx?cbid=wl&filt=200,2400,10,19,1,3,1,7,50,650,2,12,0,1000&cat=1,2,3,4,5,6&brands=5,6,7,8,9,10,11,12,13,14,15,16&addf=4,5,9&ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen2:112009
 
Obnoxio The Clown...
Posted: Sun Nov 08, 2009 3:20 am
Guest
Ian Michael Gumby wrote:
Quote:

The only other 'must have' feature would be to create a column oriented
data type (re: HBase). This shouldn't be too hard to build on. Something
similar to the time series datablade.

Fuck. Off.

Firstly, it's nothing like the fucking time series DataBlade and
secondly, it's an affront against nature.

Stick your HBase-alike into DB2, where it can sit alongside the equally
immensely valuable and productive native XML support.

--
Cheers,
Obnoxio The Clown

http://obotheclown.blogspot.com
I will now proceed to pleasure myself with this fish.

--
This message has been scanned for viruses and
dangerous content by OpenProtect(http://www.openprotect.com), and is
believed to be clean.
 
Ian Michael Gumby...
Posted: Sun Nov 08, 2009 6:15 am
Guest
Quote:
Date: Sat, 7 Nov 2009 22:20:27 +0000
From: obnoxio at (no spam) serendipita.com
CC: informix-list at (no spam) iiug.org
Subject: Re: Can I optimize the engine here or do I have to get the developers to change the SQL?

Ian Michael Gumby wrote:

The only other 'must have' feature would be to create a column oriented
data type (re: HBase). This shouldn't be too hard to build on. Something
similar to the time series datablade.

Fuck. Off.

Firstly, it's nothing like the fucking time series DataBlade and
secondly, it's an affront against nature.

Stick your HBase-alike into DB2, where it can sit alongside the equally
immensely valuable and productive native XML support.

Ah, you don't really know how the time series datablade works, do you?

So if you think about it, a column oriented datablade would use a lot of the same basic concepts in terms of memory and page storage.

Yeah there's more to it and the distributed query management would be different, but IDS would be the easiest database to implement these concepts.

While you may see this as an abomination, others have been very successful in implementing this.

-G


_________________________________________________________________
Hotmail: Trusted email with Microsoft's powerful SPAM protection.
http://clk.atdmt.com/GBL/go/177141664/direct/01/
http://clk.atdmt.com/GBL/go/177141664/direct/01/
 
Obnoxio The Clown...
Posted: Sun Nov 08, 2009 4:20 pm
Guest
Ian Michael Gumby wrote:
Quote:


Date: Sat, 7 Nov 2009 22:20:27 +0000
From: obnoxio at (no spam) serendipita.com
CC: informix-list at (no spam) iiug.org
Subject: Re: Can I optimize the engine here or do I have to get the
developers to change the SQL?

Ian Michael Gumby wrote:

The only other 'must have' feature would be to create a column
oriented
data type (re: HBase). This shouldn't be too hard to build on.
Something
similar to the time series datablade.

Fuck. Off.

Firstly, it's nothing like the fucking time series DataBlade and
secondly, it's an affront against nature.

Stick your HBase-alike into DB2, where it can sit alongside the equally
immensely valuable and productive native XML support.

Ah, you don't really know how the time series datablade works, do you?
So if you think about it, a column oriented datablade would use a lot of
the same basic concepts in terms of memory and page storage.

Really. Gosh, well, who would have thunk that a column oriented
DataBlade would also use memory and pages? Certainly not me.

Quote:
Yeah there's more to it and the distributed query management would be
different, but IDS would be the easiest database to implement these
concepts.

Yeah, apart from those minor issues, huh? And one other minor issue: who
wants this shit apart from you?

Quote:
While you may see this as an abomination, others have been very
successful in implementing this.

Yeah, it's taking over the RDBMS market. Oracle for one are absolutely
shitting themselves.

--
Cheers,
Obnoxio The Clown

http://obotheclown.blogspot.com
I will now proceed to pleasure myself with this fish.

--
This message has been scanned for viruses and
dangerous content by OpenProtect(http://www.openprotect.com), and is
believed to be clean.
 
Ian Michael Gumby...
Posted: Mon Nov 09, 2009 4:35 am
Guest
Quote:
Date: Sun, 8 Nov 2009 11:20:02 +0000
From: obnoxio at (no spam) serendipita.com
CC: informix-list at (no spam) iiug.org
Subject: Re: {Spam?} RE: Can I optimize the engine here or do I have to get the developers to change the SQL?

Ian Michael Gumby wrote:
[SNIP]
Ah, you don't really know how the time series datablade works, do you?
So if you think about it, a column oriented datablade would use a lot of
the same basic concepts in terms of memory and page storage.

Really. Gosh, well, who would have thunk that a column oriented
DataBlade would also use memory and pages? Certainly not me.

Well the point is that you're not starting from scratch and that the lessons learned from implementing time series could be used to create a column oriented data type.


Quote:
Yeah there's more to it and the distributed query management would be
different, but IDS would be the easiest database to implement these
concepts.

Yeah, apart from those minor issues, huh? And one other minor issue: who
wants this shit apart from you?

Hmmm.

That's a good question.
I mean if you ask who's looking at HBase, there's a lot of interest.
If you're asking who's going to be interested in a database that can handle column oriented data and work in parallel to HBase?

It will depend on cost, scalability and performance.

Because you have a high cost per node when considering a database, you have to show that there is performance.

The compelling reason why you want this is because when you're done doing a map reduction, then ending set of data could be used within the relational database. Even if you keep it within the HBase column oriented database, can you outperform HBase on its own.

This isn't something you can simulate and extrapolate its performance, you have to build it, al least in terms of a POC.


Quote:
While you may see this as an abomination, others have been very
successful in implementing this.

Yeah, it's taking over the RDBMS market. Oracle for one are absolutely
shitting themselves.


Yeah, someone said Oracle was announcing something along the lines of a column oriented data type. Not sure and if only our favorite Oracle wonk posted here... ;-)

But still Oracle will develop something, but that doesn't mean that it will work, or work well enough.
I mean... there's this thing called Oracle RAC... :-)

-G



_________________________________________________________________
Windows 7: Unclutter your desktop.
http://go.microsoft.com/?linkid=9690331&ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009
 
Malc...
Posted: Mon Nov 09, 2009 2:28 pm
Guest
Thanks Christian - I was just composing the below ehen I read your
response, much appreciated for the work you're putting in!

Well... sorry this thread has gone offtopic (and I see it now carries
the "SPAM?" tag, nice one) but I've got a far better result by putting
the name_type field in the indexes as the first element, with the
functional index field as the second element; the other way round
(name_type as the second element) was still much slower.
Of course, this only applies if the query includes a WHERE on a
name_type value of "N" (8000 values out of 3M) but using "P" (most of
the rest of the values) it's back to old speed so that's
understandable.
I *think* the issue is that the way I'd imagine it to work would be
that (in the original case where the index DOESN'T include name_type
but the query DOES) it finds the rows that match the "MATCHES LA*"
criterion and then post-filters the result set on name_type; but this
doesn't seem to be the way it works, it is in fact hellish slow
whereas if I DON'T include name_type in the WHERE clause it's
superspeed.
So, I have a partial solution in that a query using name_type = N is
much quicker but name_type = P is not.
The obvious solution is to get the deveolpers to rewreite the query
such that they retrieve all the rows and then postfilter the first 200
that match the name_type in code. (Oops didn't mention they have a
"SELECT FIRST 200" at the top of the query). Trouble is, it's a web
front end that allows users to specify their own value in the query
form (the LA* is just an example, itcould be "MAC*" or "OBNOXIO*" or
even just "*"). SO if some jerk just enters the "*" it could try to
return 3 million rows, which is why there's a FIRST 200 in there. So
post-filtering in code might timeout or might just not get enough of
the required data back.
Hmm
 
 
Page 1 of 2    Goto page 1, 2  Next
All times are GMT
The time now is Thu Dec 03, 2009 2:07 pm