
Connection limit or something else?


Connection limit or something else?

Francisco Reyes
I have one connection creating an index and a second connection dropping a
schema. I tried to open a third connection and it is just waiting.

The schema I am dropping is my PostgreSQL import schema, so it is not the
schema where the data lives.

Looking at top, iostat, vmstat, and ps, I can see that LucidDB still
seems to be working. Is there a log or something I can
check to see why the third connection is not getting in?



Re: Connection limit or something else?

John Sichi
Administrator
Francisco Reyes wrote:
> I have one connection creating an index and a second connection dropping a
> schema. I tried to open a third connection and it is just waiting.

Creating an index on a table with existing data is one of our few
remaining-to-be-fixed concurrency problems:  it keeps the catalog
exclusive-locked for the duration of the execution instead of just for
the metadata part.  So that's probably what you're seeing, since the
login for the new connection attempts to look up the user in the catalog.
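
For example (a hedged sketch; the table, schema, and index names
below are hypothetical), creating the index while the table is still
empty keeps the catalog lock to just the quick metadata step, and the
implicit index maintenance during a later load does not hit this
problem:

-- Hypothetical sketch: create the index before loading data, so
-- CREATE INDEX only does metadata work under the catalog lock.
create table sales_fact (
    sale_id int not null primary key,
    region varchar(32)
);
create index sales_fact_region_idx on sales_fact(region);

-- Load afterwards; index updates during the insert do not keep
-- the catalog exclusive-locked.
insert into sales_fact select * from staging_schema.sales_fact_raw;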

JVS

------------------------------------------------------------------------------
Are you an open source citizen? Join us for the Open Source Bridge conference!
Portland, OR, June 17-19. Two days of sessions, one day of unconference: $250.
Need another reason to go? 24-hour hacker lounge. Register today!
http://ad.doubleclick.net/clk;215844324;13503038;v?http://opensourcebridge.org
_______________________________________________
luciddb-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/luciddb-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Connection limit or something else?

Francisco Reyes
John V. Sichi writes:

> Creating an index on a table with existing data is one of our few
> remaining-to-be-fixed concurrency problems:

That was it. After the index finished, the login went through.

Updated the "create index" section of the docs to reflect this issue.

Is anything else affected?


Re: Connection limit or something else?

John Sichi
Administrator
Francisco Reyes wrote:
> Updated the "create index" section of the docs to reflect this issue.
>
> Is anything else affected?

Thanks for this and the other wiki updates; I've added to the note to
indicate that other catalog-access activities such as query preparation
are also affected.

JVS



Re: Connection limit or something else?

Jeremy Lemaire
I am seeing a similar issue with v0.9.2. I have a complex set of transformations and inserts.  In a daily batch process, about 15 million rows are inserted into two raw tables, and then this data is broken out into various dimensions and facts.  About 1 out of 9 times, the system locks up after the inserts take place, while the analyze table commands are being run.  During this time, when I try to connect via sqllineClient or a JDBC connection, it just hangs.  Using lsof I can see the socket is ESTABLISHED, but sqllineClient never returns a prompt and the JDBC connection never returns a result.

This seems to occur only when I am doing concurrent selects from one table into 5 or so other tables, each followed by an analyze.  Here are the processes (note that processes 1-5 run concurrently; a SQL sketch of one follows the list):

process 1
select from table A insert into table B
analyze table B

process 2
select from table A insert into table C
analyze table C

process 3
select from table A insert into table D
analyze table D

process 4
select from table A insert into table E
analyze table E

process 5
select from table A insert into table F
analyze table F
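
In SQL terms, each process is roughly the following (a minimal sketch; the table names are placeholders for my real ones):

-- Sketch of process 1 (placeholder names); processes 2 through 5
-- have the same shape, targeting tables C through F.
insert into table_b select * from table_a;
analyze table table_b compute statistics for all columns;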

There are two locations in my script that sporadically hang; both are concurrent processes and both hang on the analyze.  All processes show up in top, but there is little to no CPU utilization on the server, 0-1% on one of the 8 cores.  The only way to fix this is to kill all client processes and then restart the server.  When I try to restart the server the first time after killing it, I get thousands of lines of errors in the terminal and then a segmentation fault:

/lib/libpthread.so.0 [0x7fce5b85ca80]
/home/lucid/luciddb-0.9.2/lib/fennel/libfarrago.so(fennel::JavaTraceTarget::notifyTrace(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> >, fennel::TraceLevel, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> >)+0x58) [0x229e38]
/home/lucid/luciddb/lib/fennel/libfennel_common.so(fennel::AutoBacktrace::signal_handler(int)+0x1bd) [0x2ae8d]
/lib/libpthread.so.0 [0x7fce5b85ca80]
/lib/libc.so.6(gsignal+0x35) [0x7fce5b328ed5]
/lib/libc.so.6(abort+0x183) [0x7fce5b32a3f3]
/usr/opt/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so [0x7fce5ae55679]
/usr/opt/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so [0x7fce5af8ec4f]
/usr/opt/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so [0x7fce5af8f211]
/lib/libpthread.so.0 [0x7fce5b85ca80]
/home/lucid/luciddb-0.9.2/lib/fennel/libfarrago.so(fennel::JavaTraceTarget::notifyTrace(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> >, fennel::TraceLevel, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> >)+0x58) [0x229e38]
/home/lucid/luciddb/lib/fennel/libfennel_common.so(fennel::AutoBacktrace::signal_handler(int)+0x1bd) [0x2ae8d]
/lib/libpthread.so.0 [0x7fce5b85ca80]
/lib/libc.so.6(gsignal+0x35) [0x7fce5b328ed5]
/lib/libc.so.6(abort+0x183) [0x7fce5b32a3f3]
/usr/opt/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so [0x7fce5ae55679]
[Too many errors, abort]
./lucidDbServer: line 9:  1729 Segmentation fault      ${JAVA_EXEC} ${JAVA_ARGS} com.lucidera.farrago.LucidDbServer

The second restart seems to be alright, but it takes several days of this lockup-restart cycle before things straighten out.  This seems like a concurrency issue to me, so I am going to start running the 5 processes serially to see if it helps.  I fear this will cause my import to finish too late; however, too late is better than not finishing at all.  I'll report back with the results from this test.

Incidentally, there are several indexes on the tables being inserted into.  Here is an example of one of them:

create table caller_inventory_by_carrier_2009_q2(
    caller_inventory_by_carrier_2009_q2_key int generated always as identity not null primary key,
    "COUNT" int,
    caller_id varchar(32),
    source_id int,
    datetime timestamp not null,
    filled boolean,
    npa varchar(32),
    carrier varchar(32),
    unique ( caller_id, source_id, datetime, filled, npa, carrier )
);
create index caller_inventory_by_carrier_2009_q2_source_id_idx on caller_inventory_by_carrier_2009_q2(source_id);
create index caller_inventory_by_carrier_2009_q2_caller_id_idx on caller_inventory_by_carrier_2009_q2(caller_id);
create index caller_inventory_by_carrier_2009_q2_datetime_idx on caller_inventory_by_carrier_2009_q2(datetime);
create index caller_inventory_by_carrier_2009_q2_filled_idx on caller_inventory_by_carrier_2009_q2(filled);
create index caller_inventory_by_carrier_2009_q2_npa_idx on caller_inventory_by_carrier_2009_q2(npa);
create index caller_inventory_by_carrier_2009_q2_carrier_idx on caller_inventory_by_carrier_2009_q2(carrier);

Any workarounds or comments would be greatly appreciated.


Re: Connection limit or something else?

John Sichi
Administrator
Hi Jeremy,

This looks like it may be a different problem from the one originally
described in this thread, since that issue had to do with running
CREATE INDEX (as opposed to updating an existing index implicitly as
part of a load, which should not have any concurrency problems).

The next time the lockup happens, could you run jstack on the server
process (make sure you get the server and not the clients) and create
a JIRA issue containing the stack dump?  We may be able to debug it
from that.

Also, since you are running more than 4 concurrent statements, did you
increase the system parameter "expectedConcurrentStatements" from the
default of 4?
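
For reference, a sketch of checking and raising it (the value 16
below is just an example; pick one suited to your workload):

-- Check the current value via the standard parameters view.
select * from sys_root.dba_system_parameters
where param_name = 'expectedConcurrentStatements';

-- Raise it if needed.
alter system set "expectedConcurrentStatements" = 16;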

JVS


Re: Connection limit or something else?

Jeremy Lemaire
The expectedConcurrentStatements parameter is set to 32, but this server is also getting hit by a web service, so there is the potential for lots of lingering sessions/statements if things get backed up. However, I am not seeing the "out of scratch space" error, so I assumed this was not the case.  Is this a valid assumption?

I will run jstack and create a JIRA issue the next time this occurs.  

Re: Connection limit or something else?

John Sichi
Administrator
Your assumption is correct (you should get an "out of scratch space"
error if you exhaust the buffer pool).

An environment like the one you describe may require a large Java
heap.  Also, could you run

java -version

and provide the output?  There was one JVM hang bug which got fixed
somewhere between 1.6.0_07 and 1.6.0_18.

We also fixed one LucidDB leak in 0.9.3, so you should consider
upgrading to that if you're still running 0.9.2:

http://issues.eigenbase.org/browse/FNL-89

JVS


Re: Connection limit or something else?

Jeremy Lemaire
Java Version:

lucid@adsdw02:~/luciddb$ java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

Java Heap (16GB Physical RAM):

JAVA_ARGS="-Xms512m -Xmx4096m -cp `cat $MAIN_DIR/bin/classpath.gen` \
  -Dnet.sf.farrago.home=$MAIN_DIR \
  -Dorg.eigenbase.util.AWT_WORKAROUND=off \
  -Djava.util.logging.config.file=$MAIN_DIR/trace/Trace.properties"


I'll upgrade to 0.9.3 and report back.





Re: Connection limit or something else?

Jeremy Lemaire
I have not had a chance to upgrade to 0.9.3 yet, but here is the data you requested from 0.9.2:

Jstack output: LucidDbServer_Stack.txt (attached)

Some observations:
  1. Lowering the Java heap seems to make the server run out of Java heap space and crash.
  2. Raising the Java heap seems to exhaust physical memory and hang the system.
  3. Running the analyze operations serially instead of in parallel uses less memory and fixes the problem,
      but it takes too long, causing daily imports to overlap.


Re: Connection limit or something else?

John Sichi
Administrator
Thanks for the stack.  After studying it, I have not been able to
identify an explicit deadlock (and if there were one, I think jstack
would have reported it, since these are standard ReadWriteLocks).  So I
think it is most likely that a thread died while holding the repository
lock, causing subsequent lock requests on that lock to hang.
Normally, this shouldn't be possible since the exception handling is
careful about these cases, but I suspect that if the thread died due to
running out of memory, then some of the exception handlers can fail
too, causing the unlock to be skipped.

4G for the max Java heap size would normally be enough, but if it's
the leak which was fixed in 0.9.3, then the heap could have been
exhausted.  Since you said you have a lot of concurrent queries from
the web service, it's hard to say.  It's generally a good idea to set
the min and max Java heap to the same size (4G in this case) to make
sure that you have all the requested memory dedicated to the JVM up
front.

Also, can you send the output of the following statement so we can see
the buffer pool size, etc.?

select * from sys_root.dba_system_parameters;

BTW, for the ANALYZE, make sure you are using ESTIMATE (not COMPUTE)
to keep the runtime as short as possible.
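
For example (a sketch; the table name is a placeholder):

-- ESTIMATE samples the data instead of scanning every row, so it
-- runs much faster than COMPUTE on large tables.
analyze table table_b estimate statistics for all columns;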

JVS


Re: Connection limit or something else?

Jeremy Lemaire
System parameters: system_parameters.txt (attached)

After seeing the jstack I am also leaning towards a problem with exhausted free memory, as opposed to my original concern that it was deadlock.  Because of this I have requested more RAM for this machine and have held off on submitting anything to JIRA.  If you disagree, let me know and I will submit the details we have discussed.

As for the buffer pool, early on I tried several different settings; 4G for the Java heap and 6G for the buffer pool seemed to work best at the time.  My reasoning for not making the min and max heap both 4G was that I would not be able to run more than one instance of sqllineClient.  Given that it is only a 16G system, and that lucidDbServer and sqllineClient share Java heap settings as defined in the defineFarragoRuntime.sh script, it seemed better to allow clients that do not require 4G to start as small as 512M and grow dynamically.  However, running as many as 5 (memory-hungry) instances of sqllineClient simultaneously, each capable of consuming up to 4G of RAM, I can see how memory could quickly become an issue on a 16G system.  My understanding of the Java heap, however, is that the app will just chew up swap once free memory runs out, which could be why it appears to hang.  Maybe it is not hanging at all but instead swapping heavily and running extremely slowly.  Seemingly this would explain the analyze statements not completing, but could it run slowly enough not to service the socket connections properly?  I don't recall excessive swap, but I will be sure to check if this happens again.

For now I have made a change to do all inserts in parallel and all analyzes with ESTIMATE (not COMPUTE) serially, and this appears to have worked around the problem.  Going forward I will try to get this running on a 32G machine with version 0.9.3.  Also, within the next couple of months I should have a Hadoop cluster in place to offload some of the computation and storage that LucidDB is needlessly doing now, allowing it to focus on OLAP jobs.  I think these changes will make my LucidDB setup much happier.

Let me know if there is any other information you would like, and whether you think a JIRA entry is still warranted.






Re: Connection limit or something else?

John Sichi
Administrator
I had forgotten about the client memory setting issue...I've logged a
bug since we should really fix this (and in general make it easier to
configure the memory settings without having to edit scripts
directly).

http://issues.eigenbase.org/browse/LDB-234

JVS


Re: Connection limit or something else?

Jeremy Lemaire
Last night, while LucidDB was importing data, all new client connections hung again.  This time it was much different from any of the previous scenarios.  Where in the past sqllineClient would just hang indefinitely shortly after connecting, this time a java.io.IOException was thrown:

java.sql.SQLException: java.io.IOException: Premature EOF
       at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
       at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
       at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
       at java.io.FilterInputStream.read(FilterInputStream.java:116)
       at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2446)
       at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2441)
       at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2430)
       at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
       at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
       at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
       at de.simplicit.vjdbc.servlet.ServletCommandSinkJdkHttpClient.connect(ServletCommandSinkJdkHttpClient.java:55)
       at de.simplicit.vjdbc.VirtualDriver.connect(VirtualDriver.java:127)
       at net.sf.farrago.jdbc.client.FarragoUnregisteredVjdbcHttpClientDriver.connect(FarragoUnregisteredVjdbcHttpClientDriver.java:99)
       at org.tranql.connector.jdbc.JDBCDriverMCF.getPhysicalConnection(JDBCDriverMCF.java:96)
       at org.tranql.connector.jdbc.JDBCDriverMCF.createManagedConnection(JDBCDriverMCF.java:73)
       at org.apache.geronimo.connector.outbound.MCFConnectionInterceptor.getConnection(MCFConnectionInterceptor.java:49)
       at org.apache.geronimo.connector.outbound.LocalXAResourceInsertionInterceptor.getConnection(LocalXAResourceInsertionInterceptor.java:41)
       at org.apache.geronimo.connector.outbound.SinglePoolConnectionInterceptor.internalGetConnection(SinglePoolConnectionInterceptor.java:71)
       at org.apache.geronimo.connector.outbound.AbstractSinglePoolConnectionInterceptor.getConnection(AbstractSinglePoolConnectionInterceptor.java:80)
       at org.apache.geronimo.connector.outbound.TransactionEnlistingInterceptor.getConnection(TransactionEnlistingInterceptor.java:46)
       at org.apache.geronimo.connector.outbound.TransactionCachingInterceptor.getConnection(TransactionCachingInterceptor.java:96)
       at org.apache.geronimo.connector.outbound.ConnectionHandleInterceptor.getConnection(ConnectionHandleInterceptor.java:43)
       at org.apache.geronimo.connector.outbound.TCCLInterceptor.getConnection(TCCLInterceptor.java:39)
       at org.apache.geronimo.connector.outbound.ConnectionTrackingInterceptor.getConnection(ConnectionTrackingInterceptor.java:66)
       at org.apache.geronimo.connector.outbound.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:87)
       at org.tranql.connector.jdbc.DataSource.getConnection(DataSource.java:56)

I have not been able to successfully upgrade from 0.9.2 to 0.9.3 yet, and consequently I am not sure whether this is a new problem, but it did require me to kill -9 the lucidDbServer process to continue (!quit and !kill did not work).  This happened when the connection originated from a Geronimo database pool and when using sqllineClient.  My guess is that it is another memory-related problem that will go away once I upgrade my RAM from 16GB to 32GB and move to v0.9.3, but I thought I'd throw it out there in case anyone has seen this before.

Re: Connection limit or something else?

Jeremy Lemaire
A couple of changes have been made to the system recently, along with some general observations I should mention:

./bin/lucidDbServer now uses params -Xms2048m -Xmx4096m
./bin/sqllineClient now uses params -Xms512m -Xmx5120m -XX:-UseGCOverheadLimit

I am also continuing to run only one or two instances of ./bin/sqllineClient concurrently while doing an import, to conserve memory.

With these changes, failures seem to be less frequent but more severe (i.e., exceptions rather than hangs).

There are concurrent queries originating from the Geronimo database pool, but expectedConcurrentStatements is still set at 32 and we are never anywhere near this limit, so this does not appear to be an issue.

We are in the last month of Q1.  The larger tables in the system are partitioned by quarter.  The system seems to run much faster, and with fewer failures, at the beginning of each quarter than at the end.  Monthly partitions might help, but I am afraid they would hinder query performance for queries that cross partitions.


Re: Connection limit or something else?

John Sichi
Administrator
Do you perform deletions/updates, or only inserts?

If anything but inserts, then the presence of deleted rows could
account for the slowdown, in which case ALTER TABLE REBUILD is the
recommended solution.
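
For example (a sketch reusing one of the table names from earlier
in this thread; substitute your own), followed by reclaiming the
freed pages:

-- Rewrite the table contents minus the deleted-row entries, then
-- return the reclaimed pages to the database.
alter table caller_inventory_by_carrier_2009_q2 rebuild;
alter system deallocate old;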

JVS


Re: Connection limit or something else?

Jeremy Lemaire
Both UPSERTs and INSERTs are done daily.  DELETEs are only done if something goes wrong and I need to rebuild the data for a particular day.  In all cases I have ALTER TABLE REBUILD statements at the end of each script, followed by an ALTER SYSTEM DEALLOCATE OLD.  Without this, as you have stated, performance degrades significantly.

Unfortunately, that is not the problem in this case.