
Out of memory when merging large data

Out of memory when merging large data

Derek Wu
Hi All,

I'm trying to merge lots of data from different servers into LucidDB
0.9.2 on Windows XP (32-bit).
Below are the two issues I've met so far:
1) sqllineClient.bat seems to always return success, even when a query fails
I'm using Ant to invoke sqllineClient.bat to execute my ETL script,
but I always get a success message even when the ETL fails for some reason.
2) I get an out-of-memory error when merging large data from the source
server, and LucidDB shuts down.
The error message looks like "Error: java.lang.OutOfMemoryError: Java
heap space (state=,code=0)".
The source table contains 9M rows (approximately 400+ MB in storage), and
I'm using default LucidDB settings with a 1200 MB JVM heap size.
+---------------------------------+-------------------------------------+
|           PARAM_NAME            |             PARAM_VALUE             |
+---------------------------------+-------------------------------------+
| cachePageSize                   | 32768                               |
| cachePagesInit                  | 5000                                |
| cachePagesMax                   | 5000                                |
| cacheReservePercentage          | 5                                   |
| calcVirtualMachine              | CALCVM_JAVA                         |
| checkpointInterval              | 0                                   |
| codeCacheMaxBytes               | 2000000                             |
| connectionTimeoutMillis         | 86400000                            |
| databaseIncrementSize           | 1000                                |
| databaseInitSize                | 2000                                |
| databaseMaxSize                 | 0                                   |
| databaseShadowLogIncrementSize  | 1000                                |
| databaseShadowLogInitSize       | 2000                                |
| databaseTxnLogIncrementSize     | 1000                                |
| databaseTxnLogInitSize          | 2000                                |
| deviceSchedulerType             | default                             |
| expectedConcurrentStatements    | 4                                   |
| fennelDisabled                  | false                               |
| freshmenPageQueuePercentage     | 25                                  |
| groupCommitInterval             | 0                                   |
| javaCompilerClassName           | org.eigenbase.javac.JaninoCompiler  |
| jniHandleTraceFile              |                                     |
| pageHistoryQueuePercentage      | 100                                 |
| prefetchPagesMax                | 12                                  |
| prefetchThrottleRate            | 10                                  |
| processorCacheBytes             | -1                                  |
| resourceDir                     | ${FARRAGO_HOME}/catalog/fennel      |
| serverHttpPort                  | -1                                  |
| serverRmiRegistryPort           | -1                                  |
| serverSingleListenerPort        | -1                                  |
| tempIncrementSize               | 1000                                |
| tempInitSize                    | 3000                                |
| tempMaxSize                     | 0                                   |
| userCatalogEnabled              | false                               |
+---------------------------------+-------------------------------------+

Any best practices for issue #2? FYI, that table grows by an additional
0.1M - 0.5M rows every work day on each server.

Thanks,
Derek Wu

_______________________________________________
luciddb-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/luciddb-users
Re: Out of memory when merging large data

Nicholas Goodman


On Thu, Mar 25, 2010 at 7:00 PM, Derek Wu <[hidden email]> wrote:
1) sqllineClient.bat seems to always return success, even when a query fails
I'm using Ant to invoke sqllineClient.bat to execute my ETL script,
but I always get a success message even when the ETL fails for some reason.
That may be expected.  sqllineClient executes the script and properly displays the error message, but I'm not sure there's a way to get sqllineClient to exit with a nonzero status if any JDBC statements failed.
 
2) I get an out-of-memory error when merging large data from the source
server, and LucidDB shuts down.
The error message looks like "Error: java.lang.OutOfMemoryError: Java
heap space (state=,code=0)".
The source table contains 9M rows (approximately 400+ MB in storage), and
I'm using default LucidDB settings with a 1200 MB JVM heap size.
Can you add some more detail here?  Source table, source server (I'm betting it's MySQL... the MySQL driver is notorious for causing OOM exceptions), and the DML you are executing?

Re: Out of memory when merging large data

John Sichi
Administrator
Nicholas Goodman wrote:

>     2) I get an out-of-memory error when merging large data from the
>     source server, and LucidDB shuts down.
>     The error message looks like "Error: java.lang.OutOfMemoryError: Java
>     heap space (state=,code=0)".
>     The source table contains 9M rows (approximately 400+ MB in storage),
>     and I'm using default LucidDB settings with a 1200 MB JVM heap size.
>
> Can you add some more detail here?  Source table, source server (I'm
> betting it's MySQL... MySQL driver is notorious for causing OOM
> exceptions), and the DML you are executing?

Right, if it's MySQL, be sure to check the settings here for your version:

http://pub.eigenbase.org/wiki/LucidDbCreateForeignServer#MySQL

JVS

Re: Out of memory when merging large data

Derek Wu
Many thanks for your reply, and sorry for not mentioning more information in my previous email.
Yes, the source server is MySQL (with Connector/J v5.1.8), and the foreign server definition is similar to:
CREATE SERVER xxx
FOREIGN DATA WRAPPER sys_jdbc
OPTIONS (
   driver_class 'com.mysql.jdbc.Driver',
   url 'jdbc:mysql://192.168.1.3:3306/aaa?useUnicode=true&characterEncoding=UTF-8&failOverReadOnly=false',
   user_name 'readonly',
   password '123',
   login_timeout '10',
   fetch_size '1000',
   validation_query 'select 1',
   table_types 'TABLE',
   schema_name 'aaa');
It's a little different from the settings at the link JVS provided. I'll add "useCursorFetch=true" and try again later tonight, and will give you guys an update on that.
In case you're interested, the source table on MySQL is as follows:
CREATE TABLE `abc` (
  `row_id` int(11) NOT NULL auto_increment,
  `user_id` int(11) NOT NULL,
  `id1` int(11) NOT NULL,
  `id2` int(11) default NULL,
  `id3` int(11) NOT NULL,
  `value` varchar(2000) default NULL,
  `role` char(1) NOT NULL,
  `id4` int(11) default NULL,
  `create_datetime` datetime NOT NULL,
  PRIMARY KEY  (`row_id`),
  KEY `abcI1` (`id1`,`id3`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
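For the record, here's roughly what the revised server definition should look like once "useCursorFetch=true" is appended to the URL (same placeholder server name, host, credentials, and schema as above; the other options are unchanged):

```sql
CREATE SERVER xxx
FOREIGN DATA WRAPPER sys_jdbc
OPTIONS (
   driver_class 'com.mysql.jdbc.Driver',
   -- useCursorFetch=true makes Connector/J stream rows via a server-side
   -- cursor in batches of fetch_size, instead of buffering the entire
   -- result set in the JVM heap
   url 'jdbc:mysql://192.168.1.3:3306/aaa?useUnicode=true&characterEncoding=UTF-8&failOverReadOnly=false&useCursorFetch=true',
   user_name 'readonly',
   password '123',
   login_timeout '10',
   fetch_size '1000',
   validation_query 'select 1',
   table_types 'TABLE',
   schema_name 'aaa');
```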

Thank you again Nicholas and JVS for your help.

Thanks,
Derek Wu

On Fri, Mar 26, 2010 at 11:45 AM, John V. Sichi <[hidden email]> wrote:
> Nicholas Goodman wrote:
>>     2) I got out of memory error when merging large data from source
>>     server and luciddb shuted down
>>     The error message looks like "Error: java.lang.OutOfMemoryError: Java
>>     heap space (state=,code=0)".
>>     The source table contains 9M rows(approximate 400+MB in storage), and
>>     I'm using default luciddb settings with 1200M JVM heap size.
>>
>> Can you add some more detail here?  Source table, source server (I'm
>> betting it's MySQL... MySQL driver is notorious for causing OOM
>> exceptions), and the DML you are executing?
>
> Right, if it's MySQL, be sure to check the settings here for your version:
>
> http://pub.eigenbase.org/wiki/LucidDbCreateForeignServer#MySQL
>
> JVS
>
> ------------------------------------------------------------------------------
> Download Intel&#174; Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> luciddb-users mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/luciddb-users
>


------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
luciddb-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/luciddb-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Out of memory when merging large data

Derek Wu
Problem solved. I upgraded the MySQL JDBC driver to the latest version (5.1.12) and, most importantly, set "useCursorFetch=true" in the foreign server definition. No OOM even when the maximum JVM heap size for LucidDB is set to 512 MB.

Thanks for all your help. The attached file is the test result with the fetch size set to 5000.
Table1:
CREATE TABLE `table1` (
  `id1` int(11) NOT NULL default '0',
  `type` int(11) NOT NULL default '0',
  `id2` int(11) NOT NULL default '0',
  `name` varchar(255) default NULL,
  `v1` int(11) NOT NULL default '0',
  `v2` int(11) NOT NULL default '0',
  `v3` int(11) NOT NULL default '0',
  `v4` int(11) NOT NULL default '0',
  `s1` char(1) default 'N',
  `user_id` int(11) default NULL,
  `s2` char(3) default NULL,
  `create_datetime` datetime default NULL,
  `id3` int(11) default NULL,
  `id4` int(11) default '-1',
  `s3` int(11) default NULL,
  PRIMARY KEY  (`id1`,`type`,`id2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Table2 is the same as the table abc I mentioned earlier.

I'll try different approaches to figure out how to make it a bit faster.

Thanks,
Derek Wu





Attachment: test_result.png (26K)