Row has too many columns


Row has too many columns

Nicolae Mihalache
Hello,

I'm trying to load a CSV file as a foreign table with 343 columns, and
when running any query I get the error "Row has too many columns".
It seems that LucidDB doesn't like that the first line of my CSV file
(the header) has more than 4000 characters, because in Trace.log I see
"Could not process input ...." followed by the first 4000 characters of
that line.
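
For reference, the table is set up roughly like this -- a simplified
sketch with paths and names changed, and with option names as I
understand them from the LucidDB flat-file wrapper docs, so they may
differ slightly:

-- server over a directory of CSV files (paths are placeholders)
create server csv_server
foreign data wrapper sys_file_wrapper
options (
    directory '/path/to/data/',
    file_extension 'csv',
    with_header 'yes'
);

-- table and file names below are placeholders
create foreign table s.wide_table
server csv_server
options ( filename 'wide_file' );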

Also, I have a more general remark: LucidDB feels very slow. Even a
simple query "select FOREIGN_TABLE_NAME from
sys_root.dba_foreign_tables;", which returns 2 rows, takes 1.6 seconds
to execute.

Thanks for your help,
nicolae

Re: Row has too many columns

Nicolae Mihalache
Sorry, it was my mistake: there was an extra tab at the end of the line,
and that was causing the error. It turns out only the log message size
is limited, which is why only the first 4000 characters were printed.

nicolae

Nicolae Mihalache wrote:

> I'm trying to load a CSV file as a foreign table with 343 columns, and
> when running any query I get the error "Row has too many columns".
> [...]

Re: Row has too many columns

John Sichi
Administrator
In reply to this post by Nicolae Mihalache
Nicolae Mihalache wrote:
> Also, I have a more general remark: LucidDB feels very slow. Even a
> simple query "select FOREIGN_TABLE_NAME from
> sys_root.dba_foreign_tables;", which returns 2 rows, takes 1.6 seconds
> to execute.

A few points to note here:

* LucidDB is optimized for executing complex analytical queries against
large volumes of data, so when evaluating its suitability, I would
recommend starting by comparing timings for a representative sample of
such queries against your other options.

* The very first query after the server is brought up hits some one-time
overhead loading catalog caches, so discard the timing for it (I think
you did this already).

* For short queries such as your catalog view example, subsequent
executions of the same query will typically run quite a bit faster
because query preparation is expensive and LucidDB caches the query
plan.  For example, when I ran your query above (after preloading the
catalog), the first run took 1.5 seconds, while the second run took 0.2
seconds.  I'm guessing this is what you're encountering.  In particular,
this caching effect is important for JDBC metadata API queries, since
client tools tend to execute a lot of these.

* Queries against catalog views have gotten a bit slower going from
LucidDB 0.8 to 0.9 because we switched the repository implementation to
Hibernate instead of MDR; for the example above, the second run of the
query takes only 0.1 seconds in v0.8 compared to the 0.2 seconds in
v0.9.  If any existing users experience a case where the slowdown is a
noticeable problem, please let me know.

JVS


Re: Row has too many columns

Sam Bishop
On Mon, May 4, 2009 at 4:31 PM, John V. Sichi <[hidden email]> wrote:
> * For short queries such as your catalog view example, subsequent
> executions of the same query will typically run quite a bit faster
> because query preparation is expensive and LucidDB caches the query
> plan.  [...]

Is there any way to pre-cache query plans with LucidDB?  As I
understand it, this is one of the purposes of stored procedures in
most RDBMSs.  But I haven't seen any references to stored procedures
in the wiki.

Thanks,
Sam


Re: Row has too many columns

John Sichi
Administrator
Sam Bishop wrote:
> Is there any way to pre-cache query plans with LucidDB?  As I
> understand it, this is one of the purposes of stored procedures in
> most RDBMSs.  But I haven't seen any references to stored procedures
> in the wiki.

There is currently no way to cache them persistently, although that
could be a useful feature, particularly for complex ETL INSERT and MERGE
statements.

To pre-cache them within a running server's memory, you can prepare the
statement using JDBC.  You don't need to actually keep the statement
prepared; you can close it after prepare and the cached plan will hang
around in the server until it eventually ages out of the cache.  Cached
plans are global (not associated with a particular session).  The cache
key is an internal canonical form of the parsed SQL (not the text), so
the later execution does not need to match exactly, but to minimize the
chances of a cache miss, use the same text.
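
For example, a warm-up along these lines works (a minimal sketch in
plain JDBC; the connection URL and credentials are illustrative and may
differ for your installation):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PlanWarmer {
    public static void main(String[] args) throws Exception {
        // Assumes the LucidDB JDBC driver is on the classpath;
        // URL and credentials below are placeholders.
        Connection conn = DriverManager.getConnection(
            "jdbc:luciddb:rmi://localhost", "sa", "");
        // Preparing is enough to populate the server-side plan cache;
        // the statement never has to be executed or kept open.
        PreparedStatement ps = conn.prepareStatement(
            "select FOREIGN_TABLE_NAME from sys_root.dba_foreign_tables");
        ps.close();
        conn.close();
    }
}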

Detailed info is here:

http://pub.eigenbase.org/wiki/FarragoCodeCache

If code caching turns out to be important for your application's
performance, it may be necessary to increase the code cache size ceiling
since complex statements require more memory.

Note that preparing a statement omits one important step in statement
construction: the JIT compiler, which only kicks in when the statement
(and its associated Java code) is actually executed.  Some JVMs may
provide a way to force JIT precompilation; LucidDB currently provides no
way to access that.

JVS


Re: Row has too many columns

Nicolae Mihalache
In reply to this post by John Sichi
The 1.6 seconds are for subsequent executions; the first one takes
6-7 seconds.

The computer is not particularly slow: an Intel Core 2 at 2.4 GHz,
running openSUSE 10.3 (x86-64) with 64-bit Java 1.6.0_13-b03.

nicolae

John V. Sichi wrote:

> * For short queries such as your catalog view example, subsequent
> executions of the same query will typically run quite a bit faster
> because query preparation is expensive and LucidDB caches the query
> plan.  [...]


Re: Row has too many columns

John Sichi
Administrator
My 0.2s timings (from earlier in this thread) were measured on a 2.6 GHz
Thinkpad T61 laptop running 32-bit, using the release version of LucidDB
0.9.  I was using Java 1.5, but I just retested with 1.6 (1.6.0_07-b06),
and if anything it was a bit faster.

What LucidDB version are you testing?

Did you turn on any tracing (this can slow things down a lot)?

JVS

Nicolae Mihalache wrote:

> The 1.6 seconds are for subsequent executions; the first one takes
> 6-7 seconds.
> [...]

Re: Row has too many columns

Marcus Herou
In reply to this post by John Sichi
Hi, regarding Hibernate:

Why did you choose Hibernate? My experience is that Hibernate gives
extremely good design-time performance (fast to develop) but poor
runtime performance (and is harder to optimize).

In fact, 95% of all our queries go through Hibernate, but for the 5%
that matter most performance-wise we use raw JDBC.

Isn't Hibernate something you would rather use on the client side of it
all? But perhaps I am missing something.

Anyway, good job with the product (even though my comment was negative).

Cheers

//Marcus


On Tue, May 5, 2009 at 12:31 AM, John V. Sichi <[hidden email]> wrote:
> * Queries against catalog views have gotten a bit slower going from
> LucidDB 0.8 to 0.9 because we switched the repository implementation to
> Hibernate instead of MDR; [...]



--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[hidden email]
http://www.tailsweep.com/
http://blogg.tailsweep.com/


Re: Row has too many columns

John Sichi
Administrator
Marcus Herou wrote:
> Why did you choose Hibernate? My experience is that Hibernate gives
> extremely good design-time performance (fast to develop) but poor
> runtime performance (and is harder to optimize).
> [...]

It's kind of a long story.  For the catalog repository, we already had a
metamodel-driven UML-based O/R mapping system in place implemented in
terms of Netbeans MDR.  MDR had some scalability issues for
large/complex catalogs, and was also abandoned by Sun, so we needed a
replacement.  Since everything was already metamodel-driven, we decided
to replace the implementation under the hood so we could leave all of
the catalog manipulation code alone.  We also did not want to write our
own O/R mapper, so we chose Hibernate as the most natural off-the-shelf
implementation.  It's true that a lot of performance tuning was required
for various access patterns, but again, since everything is
metamodel-driven, it was easy to apply most optimizations generically.

The result is used by other Eigenbase-derived projects, not just
LucidDB; it's called Enki:

http://pub.eigenbase.org/wiki/Enki

Actually, the catalog views such as SYS_ROOT.DBA_FOREIGN_SERVERS bypass
Hibernate entirely.  Instead, they access the underlying catalog store
(hsqldb by default) using a system-defined instance of the SQL/MED JDBC
foreign data wrapper.  For large catalogs, they should actually perform
better than the old MDR implementation, but for small catalogs, there's
probably some overhead from the way the foreign data wrapper works.

JVS


Re: Row has too many columns

Marcus Herou
Great. I agree that if an O/R mapper needs to be used, Hibernate is the
way to go, even though I tend more and more to use JDBC plus a simple
DTO instantiator. It is uglier in a sense, and the model grows faster
(many almost-redundant DTOs), but it makes tuning a hell of a lot
easier. And it is nice to be able to use OGNL :)

Check out the latest version of Ehcache as well; its author claims that
pure gets are an order of magnitude faster. (I guess you cache some more
or less static data.) Even though Ehcache is a competitor of a project
of my own, Mammatus (http://dev.tailsweep.com/projects/mammatus/), I
would recommend Ehcache over my own at this point (damn).

Keep up the good work

//Marcus


On Tue, May 5, 2009 at 11:04 AM, John V. Sichi <[hidden email]> wrote:
> It's kind of a long story.  For the catalog repository, we already had
> a metamodel-driven UML-based O/R mapping system in place implemented in
> terms of Netbeans MDR.  [...]



--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[hidden email]
http://www.tailsweep.com/
http://blogg.tailsweep.com/


Re: Slow select from system catalogs (was: Row has too many columns)

Nicolae Mihalache
In reply to this post by John Sichi
Are you running on Linux or Windows? Which version?

I'm also using the 0.9.0 binary. I tried both the 32-bit and 64-bit
versions, on openSUSE 11.1 and 10.3 respectively, and the results are on
the order of 1.5-1.8 seconds.
However, with the 32-bit version on Kubuntu 8.10, I get about 0.5
seconds for the same query on a 2.4 GHz Thinkpad.
I will try to investigate the cause of these different results.

nicolae


On 05/05/2009 04:40 AM, John V. Sichi wrote:

> What LucidDB version are you testing?
>
> Did you turn on any tracing (this can slow things down a lot)?

Re: Slow select from system catalogs

Nicolae Mihalache
OK, so it turns out that the difference between the two systems is
caused by the fsyncs of FarragoCatalog.log.
It seems that while running the select from the system catalogs, about
40 lines of "SET AUTOCOMMIT TRUE"/"SET AUTOCOMMIT FALSE" are written,
each of them followed by an fsync.

On openSUSE, the ext3 partition is mounted by default with barrier=1,
which makes each fsync quite slow (around 40 ms). So in total (roughly
40 fsyncs x 40 ms each) it spends about 1.5 seconds syncing unnecessary
statements to that log file.

Isn't there a way to get rid of all those statements written into the log?

nicolae


Nicolae Mihalache wrote:

> Are you running on Linux or Windows? Which version?
> [...]

Re: Slow select from system catalogs

John Sichi
Administrator
Thanks a lot for digging into this; I'll take a look at what is going on
with the Hibernate read-only transactions and see if the corresponding
JDBC autocommit start/end calls can be skipped.

JVS

Nicolae Mihalache wrote:

> OK, so it turns out that the difference between the two systems is
> caused by the fsyncs of FarragoCatalog.log.
> [...]
> Isn't there a way to get rid of all those statements written into the log?

Re: Slow select from system catalogs

John Sichi
Administrator
In reply to this post by Nicolae Mihalache
Nicolae Mihalache wrote:
> However, with the 32-bit version on Kubuntu 8.10, I get about 0.5
> seconds for the same query on a 2.4 GHz Thinkpad.
> I will try to investigate the cause of these different results.

For completeness: my laptop also runs Ubuntu (Gutsy), which, unlike
openSUSE, does not enable ext3 write barriers by default, confirming
your results:

http://en.wikipedia.org/wiki/Ext3#No_checksumming_in_journal

I'm creating a JIRA entry to track this issue:

http://issues.eigenbase.org/browse/FRG-376

JVS


Re: Slow select from system catalogs

John Sichi
Administrator
In reply to this post by John Sichi
Nicolae, I haven't figured out how to tell Hibernate to stop issuing the
setAutocommit calls (looks like it is doing it automatically for
read-only txns), but I'd like to see if a workaround helps until we come
up with a real fix in the next release.

Could you try editing your luciddb/catalog/FarragoCatalog.script file?
Search for the line

SET WRITE_DELAY 0 MILLIS

and change it to

SET WRITE_DELAY 20

Do this with the server shut down and then restart it; then see if the
timing improves.  However, note that this opens up the possibility of
catalog corruption after a crash (one of the reasons we changed it to 0
in 0.9).

JVS

John V. Sichi wrote:

> Thanks a lot for digging into this; I'll take a look at what is going on
> with the Hibernate read-only transactions and see if the corresponding
> JDBC autocommit start/end calls can be skipped.



Re: Slow select from system catalogs

Nicolae Mihalache
Thanks, this indeed helps a lot.

The catalog corruption can happen only if there is a crash while the
catalog is being modified, right? That is, when I create or modify
tables, schemas, etc.?

nicolae

On 05/07/2009 06:36 AM, John V. Sichi wrote:

> Could you try editing your luciddb/catalog/FarragoCatalog.script file?
> [...]
> However, note that this opens up the possibility of catalog corruption
> after a crash (one of the reasons we changed it to 0 in 0.9).



Re: Slow select from system catalogs

John Sichi
Administrator
Nicolae Mihalache wrote:
> Thanks, this indeed helps a lot.
>
> The catalog corruption can happen only if there is a crash while the
> catalog is being modified, right? That is, when I create or modify
> tables, schemas, etc.?

Right, except note that modification includes DML, ANALYZE, and
TRUNCATE, since these update rowcounts/stats.

The new catalog architecture also supports configuring catalog
persistence in an external DBMS such as MySQL, where the ext3 barrier
mode shouldn't be an issue; I'm working on writing up the instructions
for that.

JVS


Re: Slow select from system catalogs

Jeremy Lemaire
Below are some observations I have made while deploying a multi-terabyte
LucidDB-based data warehouse.  I realize that much of this is out of
context, so not much can be done in terms of recommendations, but any
comments you could make based on my observations would be greatly
appreciated.

64-bit vs. 32-bit
I am seeing performance issues after moving to a 64-bit system.  Other
than modifying the SET WRITE_DELAY parameter (and opening up the
possibility of corruption), is there another way to get 32-bit
performance out of my 64-bit server? <g>

Large Result Sets
I have noticed that large result sets seem to cause a linear increase in
query time, even if the query is simpler (i.e. fewer joins, etc.).  Some
of this can be attributed to network latency, but after running the
queries locally on the server and dumping to CSV, I am still seeing
unusually slow queries (>30 seconds).  Is there a way to prevent large
result sets from slowing things down?  Is this a JDBC/IO limitation?
Other databases do not seem to be as sensitive to result set size.

Effects of Joins, Group By, and Horizontal Partitioning
I had a nicely indexed star schema set up with hash joins kicking in.
Query performance was horrible, and as the number of joins increased
beyond one or two, things got even worse (unusable, in fact).
Horizontal partitioning seemed to hurt performance, I think because the
indexes were no longer used after the partitioned view was created.
From there I decided to try distributing my partitions across servers in
an attempt to get some benefit from parallel processing; this seemed to
decrease performance further.  As a last-ditch effort I flattened out
the star schema.  This really limited the flexibility of the data
warehouse from a query point of view, but it did speed things up
significantly (from 30+ seconds to less than 5 seconds).  That is good,
but I thought LucidDB was optimized for star schemas?

Another observation with my existing flattened schema: as I add more
columns to the SELECT, performance suffers, and as I add more filters to
the WHERE, performance improves.  I understand that more logic will slow
things down, but I have composite indexes on these columns; shouldn't
that help alleviate these variations?  Costly queries that I could not
run on PostgreSQL in two days run in 30 seconds, but queries that used
to run on PostgreSQL in 0.3 seconds also take about 30 seconds with
LucidDB.  At this point it is a great data repository, but it is not
working particularly well for transactional queries coming from a web
service.

Example: Flattened Version of the Fact/Dimension Tables and a Sample Query

-------------------------------------------------------
-- Flattened version of two dimensions (destination, npa) found in my
-- original star schema.
-- This is fast but still quirky, and the same PostgreSQL query
-- outperforms LucidDB by 10x.
-------------------------------------------------------

CREATE SCHEMA test_ia_tables_schema;


CREATE TABLE test_ia_tables_schema.inventory_by_destination
(
   inventory_by_destination_key int generated always as identity not null primary key,
   DATETIME timestamp,
   SOURCE_ID integer,
   REASON boolean,
   NPA varchar(3),
   DESTINATION varchar(32),
   "COUNT" integer,
   UNIQUE ( datetime, source_id, reason, npa, destination )
);

create view test_ia_tables_schema.inventory_by_destination_view as
select
    datetime,
    source_id,
    CAST( reason=0 as boolean ) as reason,
    CASE
      WHEN n."VALUE" IS NULL THEN 'unknown'
    ELSE n."VALUE"
    END as npa,
    CASE
      WHEN d."VALUE" IS NULL THEN 'unknown'
    ELSE d."VALUE"
    END as destination,
    SUM("COUNT") as "COUNT"
        from ad_inventory_warehouse.ad_inventory_summary_daily_fact_2010_q2 f
                left join ad_inventory_warehouse.npa_dimension n using ( parameter_set_id )
                left join ad_inventory_warehouse.destination_dimension d using ( parameter_set_id )
     group by datetime, source_id, CAST( reason=0 as boolean ), n."VALUE", d."VALUE";


insert into test_ia_tables_schema.inventory_by_destination (
   DATETIME,
   SOURCE_ID,
   REASON,
   NPA,
   DESTINATION,
   "COUNT"
) select * FROM test_ia_tables_schema.inventory_by_destination_view;

create index inventory_by_destination_all_idx
        on test_ia_tables_schema.inventory_by_destination ( datetime, source_id, reason, npa, destination );

analyze table test_ia_tables_schema.inventory_by_destination estimate statistics for all columns;



-----------------------------------------------------------------------------------------
-- Removing npa from the following query
-- causes query time to go from 30 seconds to less than 5 seconds.
-- There are only 0.5 million rows in this partition.
-----------------------------------------------------------------------------------------

select sum("COUNT") as "COUNT", destination, npa
        from test_ia_tables_schema.inventory_by_destination f
                left join ad_inventory_warehouse.publisher_dimension p using ( source_id )
                where p.publisher_id in ( 25 ) and
                        reason = true and
                        datetime >= timestamp '2010-04-01 00:00:00' and
                        datetime < timestamp '2010-05-01 00:00:00'
        group by destination, npa;


   

Re: Slow select from system catalogs

John Sichi
Administrator
haawker wrote:
> 64 bit vs. 32 bit
> I am seeing performance issues after moving to a 64 bit system.  Other than
> modifying the SET WRITE_DELAY parameter, and opening up the possibility of
> corruption, is there another way to get 32 bit performance out of my 64 bit
> server? <g>.

There are so many factors here (CPU speed, RAM, cache size, disk, ...)
that it's meaningless to talk about it without giving the specs for each
system.  All else being equal, for the same amount of RAM, 64-bit does
worse since pointers are twice as big, but the main reason people use
64-bit is so that they can use more RAM (which may or may not speed
things up depending on the workload).

> Large Result Sets
> I have noticed that large result sets seem to cause a linear increase in
> query time even if the query is more simple (i.e. less joins etc).  Some of
> this can be attributed to network latency but after running the queries
> locally on the server and dumping to csv I am still seeing unusually slow
> queries (>30 seconds).  Is there a way to prevent large result sets from
> slowing things down?  Is this a JDBC/IO limitation?  Other databases do not
> seem to be as sensitive to result set size.

What program are you using for fetching and rendering the results?  If
it is sqlline, be sure to !set incremental on, otherwise it buffers up
the whole result set for formatting purposes before ever producing the
first row, and this is slow.  sqlline's rendering code probably isn't
the fastest possible, either.
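
For example, in sqlline (the table name here is just a placeholder):

!set incremental on
select * from some_big_table;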

> Effects of Joins, Group By, and Horizontal Partitioning
> I had a nicely indexed star schema setup with hash joins kicking in.  Query
> performance was horrible.  As the number of joins increased beyond one or
> two things got even worse, unusable in fact.  Horizontal partitioning seemed
> to hurt performance, I think because the indexes were no longer used after
> the partitioned view was created.  From here I decided to try distributing
> my partitions across servers in an attempt to get some benefit from parallel
> processing.  This seemed to further decrease performance.

Yeah, I don't really recommend trying this until Firewater is a little
further along.

> effort I flattened out the star schema.  This really limited the use of the
> data warehouse from a flexible query point of view but did speed things up
> significantly (30+seconds to less than 5 seconds).  This is good, but I
> thought Lucid was optimized for star schema?  Another observation with my
> existing flattened schema, it seems that as I add more columns in the SELECT
> performance suffers and as I add more filters in the WHERE performance gets
> better.  I understand that more logic will slow things down but I have
> composite indexes on these columns.  Shouldn't this help alleviate  these
> types of variations?  It seems that costly queries that I could not run on
> Postgres in two days run in 30 seconds and queries that used to run on
> Postgresql in .3 seconds also take about 30 seconds with LucidDB.  At this
> point it is a great data repository but is not working particularly well for
> transactional queries coming from a web service.  

1) Don't use composite indexes with LucidDB for the star schema.
Instead, create indexes on individual columns.  (There are good reasons
to use composite indexes in some ETL cases, but not here.)

2) Make sure you have indexes on your foreign key columns in the fact
table (such as SOURCE_ID in your example).  These are needed in
order for LucidDB's star join optimization to kick in.

If you do this, you should not need any flattening.
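
With your example tables, that would look something like this (an
untested sketch; the index names are arbitrary):

-- replace the composite index with single-column indexes
drop index inventory_by_destination_all_idx;

-- one index per filter column, plus the foreign key column
-- (source_id) so the star join optimization can kick in
create index inventory_by_destination_source_id_idx
    on test_ia_tables_schema.inventory_by_destination ( source_id );
create index inventory_by_destination_datetime_idx
    on test_ia_tables_schema.inventory_by_destination ( datetime );
create index inventory_by_destination_reason_idx
    on test_ia_tables_schema.inventory_by_destination ( reason );

analyze table test_ia_tables_schema.inventory_by_destination
    estimate statistics for all columns;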

JVS

> [...]



Re: Slow select from system catalogs

ngoodman
JVS has covered most of it, I think.

On Apr 22, 2010, at 1:06 AM, John V. Sichi wrote:

>> existing flattened schema, it seems that as I add more columns in the SELECT
>> performance suffers and as I add more filters in the WHERE performance gets
>> better.  I understand that more logic will slow things down but I have


If your query workload is I/O-bound (and at the size you mentioned there
is a good chance it is), then this makes perfect sense.

SELECT:
The more columns you add to the SELECT, the more I/O we have to do to
read from the column storage engine.  If we don't need a column, we
don't incur the I/O and processing costs of retrieving and using it.

WHERE:
The more filters you add in the WHERE, the more selective the initial
filtering will be, which reduces the I/O workload against your main
table.  The smaller the dataset being aggregated/processed, the faster
the query.


So, LucidDB is responding well to both of these conditions, as the
sketch below illustrates:
* Reduce the number of columns in the SELECT clause and query time
improves (a good thing).
* Reduce the number of rows to be aggregated/processed via the WHERE
clause and query time improves (a good thing).
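
To make that concrete with the flattened table from earlier in the
thread (a sketch; the contrast is illustrative, not measured):

-- column-store friendly: one column read, selective filter
select sum("COUNT")
    from test_ia_tables_schema.inventory_by_destination
    where datetime >= timestamp '2010-04-01 00:00:00'
      and datetime <  timestamp '2010-05-01 00:00:00';

-- much more I/O: every column read, no filter at all
select *
    from test_ia_tables_schema.inventory_by_destination;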

Nick