Horizontal Partitioning Across Distributed Servers

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Horizontal Partitioning Across Distributed Servers

Jeremy Lemaire

Summary

As part of a scale-out effort I have used distributed horizontal partitioning and ran into what appears to be a bug when using a Java PreparedStatement with parameters.   Specifically if a Java PreparedStatement is used with parameters then the dispatched query sent to the remote server contains all table attributes and no filters.  Filtering is not done until the very large result set is returned back to the coordinator.  On the other hand if the same query is sent via a Java PreparedStatement with the same parameter values hardcoded then the filters are retained when the query is dispatched to the remote server and the proper table columns are also sent.  This results in a query time of 1985ms instead of 72475ms.  As would be expected the same query in sqllineClient or Squirrel takes about 2 seconds.  

Test Setup and Results

HorizontalPartitioningAcrossDistributedServers.txt

Questions

1. Is there a workaround for this other than not using Java PreparedStatement with parameters?
2. I am currently on LucidDb v0.9.2, will a software upgrade fix the problem?
3. Assuming this is a problem with the LucidDbClient.jar, can I safely use a 0.9.3 LucidDbClient.jar with a 0.9.2 server installation?  




Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Horizontal Partitioning Across Distributed Servers

ngoodman
On Mar 30, 2011, at 1:34 PM, Jeremy Lemaire wrote:

Questions

1. Is there a workaround for this other than not using Java
PreparedStatement with parameters?
It might be related to the UNION ALL portions and the PreparedStatement.  Not JUST the PS and remote server by itself.

What does the Explain Plan for: 
SELECT SUM("COUNT") as "COUNT", f.npa, f.filled, f."LANGUAGE"
	FROM "INVENTORY_ANALYZER_SCHEMA"."INVENTORY_BY_LANGUAGE_2011_Q1" f 
		LEFT JOIN ad_inventory_warehouse.publisher_dimension pub USING (source_id) 
			WHERE f.source_id IN (?) AND 
				pub.publisher_id IN (?) AND 
				f.DATETIME >= timestamp '2011-03-29 00:00:00' AND 
				f.DATETIME <= timestamp '2011-03-29 23:59:59' 
GROUP BY f.npa, f.filled, f."LANGUAGE"
When run as a Prepared Statement?
Also, if you can submit the Explain plans (which will have the information on the remote SQL being executed (with implementation) for both the good/slow ones, that'd be great. 
2. I am currently on LucidDb v0.9.2, will a software upgrade fix the
problem?
Maybe... JVS did a bunch of commits as part of his work with Firewater that might help with some of this.  Actually, for the type of partitioning you're doing, you almost certain want to use that technology.

We are currently preparing it as an "add on" extension to deploy on top of an existing LucidDB installation.  We intend to have a Firewater "mod" available with the 0.9.4 release.
Not quite ready yet, but getting there.

3. Assuming this is a problem with the LucidDbClient.jar, can I safely use a
0.9.3 LucidDbClient.jar with a 0.9.2 server installation?   
We have two customers that have attempted this (and are also subscribed to this email list) which can attest to the issues they faced.  vJDBC was ugpraded during these release to fix issues with Explain plan, etc.  We were able to ultimately get a 0.9.3 installation to talk to a 0.9.2 LucidDB but reintroduced the bugs to do so.  This is, not recommended.  :)

Follow on questions:
 - Can you (or will you) ultimately use Firewater?  It's designed for PRECISELY the use case you're articulating.
 - What is your degree of parallelism set to on your coordinator node?

------------------------------------------------------------------------------
Create and publish websites with WebMatrix
Use the most popular FREE web apps or write code yourself;
WebMatrix provides all the features you need to develop and
publish your website. http://p.sf.net/sfu/ms-webmatrix-sf

_______________________________________________
luciddb-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/luciddb-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Horizontal Partitioning Across Distributed Servers

Jeremy Lemaire
ngoodman wrote
Follow on questions:
 - Can you (or will you) ultimately use Firewater?  It's designed for PRECISELY the use case you're articulating.
 - What is your degree of parallelism set to on your coordinator node?
Using Firewater sounds great however it was not available when I initially developed our system so it would take a lot of effort to get it integrated.  Given that I have already written several scripts to do precisely what firewater does I am not sure what the benefit would be of using it at this point so I am inclined to say that I will not use it.  Assuming I was given the time to integrate Firewater the first hurdle would be upgrading my existing server from 0.9.2->0.9.3->0.9.4. So far I have not had success with the 0.9.2->0.9.3 upgrade procedure even with John's suggested work around.  Going forward this is going to be a huge roadblock for me with or without Firewater.

My degreeOfParallelism session setting is at the default of 1.    

The EXPLAIN PLAN results are attached.  Although, I am not sure how useful it will be given that the parameterized prepared statement caused an exception before the plan was returned.
ExplainPlan.txt

Question

Would it make sense, and is it safe, to upgrade my coordinator node to 0.9.3 or 0.9.4 and run it with a distributed node that is still running 0.9.2?

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Horizontal Partitioning Across Distributed Servers

Jeremy Lemaire
Here is an additional piece of information, the following exception is being thrown on the coordinator node:
INFO: Connecting to datasource FarragoDBMS
Apr 14, 2011 10:02:49 AM org.eigenbase.util.EigenbaseException <init>
SEVERE: org.eigenbase.util.EigenbaseException: Invalid URL:  jdbc:luciddb:http://remotenode.mydomain.com:8034
Apr 14, 2011 10:02:49 AM net.sf.farrago.jdbc.FarragoJdbcUtil newSqlException
SEVERE: Invalid URL:  jdbc:luciddb:http://remotenode.mydomain.com:8034
Apr 14, 2011 10:02:49 AM de.simplicit.vjdbc.VirtualDriver connect
INFO: VJdbc-URL: servlet:http://remotenode.mydomain.com:8034/vjdbc,FarragoDBMS
Apr 14, 2011 10:02:49 AM de.simplicit.vjdbc.VirtualDriver connect
INFO: VJdbc in Servlet-Mode, using URL http://remotenode.mydomain.com:8034/vjdbc
Apr 14, 2011 10:02:49 AM de.simplicit.vjdbc.VirtualDriver connect
Is my URL truly invalid and if so what is the proper format?

Loading...