Archive for February, 2011

Attaching a DAS track to GBrowse

My colleague Frank Schwach and I recently set up a data source for the Plasmodium berghei GBrowse hosted at PlasmoDB.org.

There were some complications because the das.sanger.ac.uk/das/ data sources are served through a proxy that sits in front of multiple DAS servers of different types, i.e. ProServer and Dazzle.

Frank gives a quick summary of our experience below, in the hope that it will help anyone else who runs into difficulties, or prompt them to get in touch with us for help.

Frank:

“Turns out that the Bio::Graphics module at some point asks the DAS server for its capabilities before retrieving any data. It does this to find out which types of features it can get. So, even if a direct request for features would be successful on its own (which it was in our case), it doesn’t work in GBrowse unless the DAS server states its capabilities correctly. The complication on our side is that the DAS server goes through a proxy, which is where the POST requests failed previously, and it now turned out that the proxy didn’t send the correct headers. Jonathan had to modify the headers, and now it does return the capabilities of the source correctly. Unfortunately, GBrowse doesn’t report the actual problem with the DAS source even if you switch on all debugging options in the code. The critical query happens in Bio::Das::Segment, when the Bio::Graphics::Browser asks the Bio::Das::Segment object for the types.”

Me: In summary, I didn’t realise that GBrowse would use POST for requests instead of GET, and that the DAS spec states a POST should be used for long requests! Ideally, every data source that provides a valid features response would also provide a valid types response; however, many do not. Ensembl doesn’t require one, but many other clients, such as GBrowse, probably do. This sort of thing is where the lack of conformity to the DAS spec by data providers (including us at the Sanger, with many old data sources) is detrimental to the DAS system. I’m pleased to see that GBrowse uses the sources.xml response to discover the capabilities of sources, and it’s important for these to be stated correctly where used.
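For reference, a DAS server declares what it supports in two places: the X-DAS-Capabilities HTTP header it sends with its responses, and per-source CAPABILITY elements in the sources.xml document. A minimal sketch of what a well-behaved source might return (the source name and URIs here are illustrative, not our actual configuration):

X-DAS-Capabilities: dsn/1.0; types/1.0; features/1.0; stylesheet/1.0

<CAPABILITY type="das1:features" query_uri="http://das.sanger.ac.uk/das/example_source/features" />
<CAPABILITY type="das1:types" query_uri="http://das.sanger.ac.uk/das/example_source/types" />

If the types capability is missing from either place, a client like GBrowse can give up before ever asking for features.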


JConsole – best kept secret?

While recently having memory issues with our new hardware running Tomcat, we found we could watch the resources Tomcat was using with the jconsole program, which lives in JAVA_HOME/bin/. If you run top to find the process id (PID) of Tomcat, you can just launch jconsole with that PID, like this: jconsole 12203. You will get fantastic graphing options for memory, threads and classes.
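In practice it’s just two steps (the PID will of course be whatever top reports on your machine):

top                            # note the PID of the Tomcat java process, e.g. 12203
$JAVA_HOME/bin/jconsole 12203  # attach JConsole to that process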

A very useful resource indeed! This tool told me that our PermGen memory was running high relative to the amount allocated, so I just upped the allocation using -XX:MaxPermSize=128m (the default is 64m) in the Tomcat startup script. Tomcat on our production servers has been reliable ever since.
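For anyone wanting to do the same, the flag goes into the options variable read by the startup script; a sketch, assuming a standard catalina.sh-based install:

JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=128m"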

Connection pooling problems

Since moving our production server from Resin to Tomcat, and using Spring templates for database connections, the connection pool started to suffer from stale connections, resulting in errors like this:

java.sql.SQLException: Communication link failure: java.io.EOFException, underlying cause: null

** BEGIN NESTED EXCEPTION **

java.io.EOFException

STACKTRACE:

java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1395)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:1539)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:1930)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1168)

After hunting around for a solution, I found that adding the following parameters to the JNDI database configuration in context.xml solved the issue. Left alone, it would eventually cause a PermGen out-of-memory error, completely freezing Tomcat and the VM it was running on:

validationQuery="select 1" testOnBorrow="true"
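For context, these two attributes sit on the JNDI Resource element in context.xml alongside the usual connection settings; a minimal sketch, with the resource name, URL and credentials as placeholders rather than our real configuration:

<Resource name="jdbc/mydb" auth="Container" type="javax.sql.DataSource"
          driverClassName="com.mysql.jdbc.Driver"
          url="jdbc:mysql://localhost:3306/mydb"
          username="dbuser" password="dbpass"
          maxActive="20" maxIdle="10"
          validationQuery="select 1" testOnBorrow="true" />

With testOnBorrow enabled, the pool runs the validationQuery each time a connection is handed out, and quietly replaces any connection that fails it.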

Copied below is the answer I found most useful in solving this, from http://stackoverflow.com/questions/1448974/problem-with-connection-pooling-with-java-and-mysql-in-tomcat-web-application:

There are a few pointers on avoiding this situation, obtained from other sources, especially from the connection pool implementations of other drivers and from other application servers. Some of the information is already available in the Tomcat documentation on JNDI Data Sources.

  1. Establish a cleanup/reaper schedule that will close connections in the pool, if they are inactive beyond a certain period. It is not good practice to leave a connection to the database open for 8 hours (the MySQL default). On most application servers, the inactive connection timeout value is configurable and is usually less than 15 minutes (i.e. connections cannot be left in the pool for more than 15 minutes unless they are being reused time and again). In Tomcat, when using a JNDI DataSource, use the removeAbandoned and removeAbandonedTimeout settings to do the same.
  2. When a new connection is returned from the pool to the application, ensure that it is tested first. For instance, most application servers that I know can be configured so that connections to an Oracle database are tested with an execute of “SELECT 1 FROM dual”. In Tomcat, use the validationQuery property to set the appropriate query for MySQL – I believe this is “SELECT 1” (without quotes). The reason why setting the value of the validationQuery property helps is that if the query fails to execute, the connection is dropped from the pool, and a new one is created in its place.

As far as the behavior of your application is concerned, the user is probably seeing the result of the pool returning a stale connection to the application for the first time. The second time around, the pool probably returns a different connection that can service the application’s queries.

Tomcat JNDI Data Sources are based on Commons DBCP, so the configuration properties applicable to DBCP will apply to Tomcat as well.
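We only needed the validation settings in the end, but for completeness, the cleanup schedule from point 1 can be set on the same Resource element using DBCP’s abandoned-connection attributes; a sketch with illustrative values:

removeAbandoned="true" removeAbandonedTimeout="60" logAbandoned="true"

This reclaims any connection an application has held without using for more than 60 seconds, and logAbandoned records a stack trace of the code that borrowed it, which is handy for tracking down leaks.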