Re: [condor-users] Use of Condor for biological applications

On Thu, May 27, 2004 at 07:42:18AM +1200, Miskell, Craig wrote:
> If we do go with Condor, we'll be doing bioinformatics stuff with it -
> I'd be interested in any details you can share on what you've done,
> especially related to blast.

we are running a bunch of searches on a weekly basis.  we have 5 different sets
of query sequences that we blast against 5 different databases, resulting in
about 8 billion sequence comparisons.

> Specifically, what are you doing regarding splitting the input sequences into
> decent size chunks that take a reasonable time to run,

i have a framework (written in perl) which does all the work:
  1) pulls in a fresh copy of all the databases to be searched using http or
	ftp and builds the indexes
  2) pulls in all the sequences via http or ftp, and splits them into files of
	"decent sized chunks".  i try to make each job run for about an hour.
	then, if there is a problem (blast hangs on occasion) it can be killed
	and restarted in a timely manner.  the watching, killing, and restarting
	of jobs is also automatic.
  3) when everything is done, it creates a tarball of the results and can ftp
	it to some location.
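
the chunking in step 2 could look roughly like the sketch below.  this is not
the perl framework itself, just a minimal python illustration.  the function
names (read_fasta, chunk_records) and the max_residues limit are made up for
the example, and it assumes that total sequence length is a usable stand-in
for run time.  in practice the limit would have to be tuned by hand until a
chunk takes about an hour.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records: Iterable[Record],
                  max_residues: int) -> Iterator[List[Record]]:
    """Group records into chunks of at most max_residues total length.

    max_residues is a hypothetical tuning knob standing in for
    "about an hour of blast".  A single record longer than the limit
    still gets a chunk of its own.
    """
    chunk: List[Record] = []
    size = 0
    for header, seq in records:
        if chunk and size + len(seq) > max_residues:
            yield chunk
            chunk, size = [], 0
        chunk.append((header, seq))
        size += len(seq)
    if chunk:
        yield chunk


if __name__ == "__main__":
    example = [">a", "ACGT", ">b", "AC", "GT", ">c", "ACGTACGT", ">d", "A"]
    for i, chunk in enumerate(chunk_records(read_fasta(example), 8)):
        print(i, [header for header, _ in chunk])
```

each chunk would then be written to its own file and become one job.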

> and managing the databases (locally stored?  NFS or some other network file
> system?)

currently it relies on NFS (or some other network file system) but i have just
finished a version of my framework which does not require a shared filesystem,
making it much easier to deploy a large blast search on any condor cluster or
grid resource.  i'll be releasing this soon, hopefully within a week's time,
and i will announce it to this list when i'm done.
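
one way to run without a shared filesystem is to let condor's own file
transfer ship the query chunk and the database files with each job.  the
sketch below is only an assumption about how that could be done, not a
description of the new framework.  it renders a submit description using the
standard should_transfer_files, when_to_transfer_output and
transfer_input_files commands.  render_submit, the paths, and the file names
are all hypothetical.

```python
from typing import Sequence


def render_submit(executable: str, arguments: str,
                  input_files: Sequence[str], job_name: str) -> str:
    """Return a Condor submit description for one job.

    The job uses Condor's file transfer mechanism, so it does not
    need a shared filesystem.  All names passed in are chosen by the
    caller; nothing here is taken from the framework in the post.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # Ship inputs to the execute machine and bring results back.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"transfer_input_files = {', '.join(input_files)}",
        f"output = {job_name}.out",
        f"error = {job_name}.err",
        f"log = {job_name}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical chunk and database file names, for illustration only.
    print(render_submit(
        executable="/usr/local/bin/blastall",
        arguments="-p blastp -d mydb -i chunk_0001.fa -o chunk_0001.blast",
        input_files=["chunk_0001.fa", "mydb.phr", "mydb.pin", "mydb.psq"],
        job_name="chunk_0001",
    ))
```

shipping a large database with every job costs bandwidth, so this trades
network traffic for not depending on NFS.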

please feel free to ask me more questions if you have any.
