
Re: [Condor-users] BLAST



> On Monday 21 December 2009 15:15:23 Thomas, Dallas wrote:
>> Greetings,
>>
>>   I have a condor pool configured such that I have the CM on one Linux
>> box, another Linux Box is my submit machine and the execute nodes are
>> all windows boxes (not my preferred option - but work with what I got).
>> Anyways I have a ntfs share that I have mapped on all boxes that
>> contains the Global Config, Local Dirs for the Linux boxes and the BLAST
>> databases.
>>
>>   Do I need to install Blast on all of the execute nodes for this to
>> work properly?  Does anyone know?
>
> We ended up scp'ing the databases to all execute nodes (for i in
> `condor_status -f '%s\n' machine | sort | uniq`) in a script that fires
> condor_submit: network i/o on DB files was killing us.
>
> On linux NR is mmapped to ram in 2GB chunks; if your slot doesn't have that
> much and the machine starts thrashing, it'll take forever to finish. By the
> time I figured that out we'd decommissioned the last of the low-ram boxes
> here, so I never wrote a requirements expression to exclude low-ram nodes.
> Shouldn't be too hard.
>
> We do have the executable on an nfs share; there doesn't seem to be much i/o
> contention there. There's a bit more on the input and output directories,
> but not enough to move them off nfs.
>
> That's on linux/nfs, no idea how it works on windows w/ ntfs shares.

Hello,

I took the same approach, using a wrapper script that checks with rsync
that the _local_ copies of the blast db are in sync, and cuts the big
fasta file into a set of smaller ones, each submitted to condor as its
own job. It takes rsync less than a second to decide that everything is
current with the nas, so the check is very cheap.

It worked well for whole genomes (bovine against human): one fasta cut
into roughly 20000 jobs.
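
For anyone wanting to try this, here is a rough Python sketch of the two
steps of such a wrapper: syncing a local db copy with rsync, then cutting a
multi-fasta file into smaller files. It is not the actual wrapper script;
the function names, the chunk file naming and the records-per-chunk value
are assumptions made up for illustration, and the paths are whatever your
site uses.

```python
import subprocess
from pathlib import Path


def rsync_command(nas_db_dir, local_db_dir):
    """Build the rsync call that brings the local db copy in line with the NAS.

    Trailing slashes make rsync copy the directory contents, not the
    directory itself. Both paths are site-specific (hypothetical here).
    """
    return ["rsync", "-a", "--delete",
            nas_db_dir.rstrip("/") + "/", local_db_dir.rstrip("/") + "/"]


def sync_blast_db(nas_db_dir, local_db_dir):
    """Run rsync; it returns quickly when nothing has changed."""
    subprocess.run(rsync_command(nas_db_dir, local_db_dir), check=True)


def split_fasta(fasta_path, out_dir, seqs_per_chunk=10):
    """Cut a multi-fasta file into files of at most seqs_per_chunk records.

    Returns the list of chunk paths, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None
    n_in_chunk = 0
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                # A header line starts a new record; open a new chunk file
                # when there is none yet or the current one is full.
                if out is None or n_in_chunk == seqs_per_chunk:
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunk_paths):05d}.fasta"
                    out = open(path, "w")
                    chunk_paths.append(path)
                    n_in_chunk = 0
                n_in_chunk += 1
            # Anything before the first header is skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return chunk_paths
```

Each chunk file then becomes the input of one condor job. The number of
records per chunk is a trade-off between per-job overhead and how evenly
the work spreads over the slots.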

In my experience with ram, 1 GB per core is the minimum requirement.
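
A memory floor like this can be put into the submit description as a
requirements expression on the slot's Memory attribute (in MB). The Python
sketch below only generates such a description as text; the executable name
blast_wrapper, the log file name and the 1024 MB default are assumptions
for illustration, not what is actually used here.

```python
def blast_submit_description(chunk_files, min_memory_mb=1024,
                             executable="blast_wrapper"):
    """Return submit-description text with one queued job per chunk file.

    `executable` is a hypothetical wrapper name; `min_memory_mb` is compared
    against the slot's Memory attribute, which condor reports in MB.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # Only match slots with at least this much memory.
        f"requirements = (Memory >= {min_memory_mb})",
        "log = blast.log",
    ]
    for chunk in chunk_files:
        lines.append(f"arguments = {chunk}")
        lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(blast_submit_description(["chunk_00000.fasta", "chunk_00001.fasta"]))
```

The resulting text can be written to a file and handed to condor_submit.
Raising the threshold to 2048 would match the 2GB mmap chunk size mentioned
in the quoted message.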

Regards,

Alain
>
> It's a function of database size and the number of execute slots: with a
> small database and a few slots you should be ok with shared directories and
> low ram. If it crawls to a near-halt, watch i/o load on the fileserver and
> swap usage on the execute nodes.
>
> (The fastest way is to BLAST everything against everything in one run on a
> 4x4 (or 4x6) machine with plenty of ram.)
>
> Dima
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


-- 


Dr Alain EMPAIN, Bioinformatics, Bryology
  National Botanic Garden of Belgium            alain.empain@xxxxxxxxxx
  University of Liège, GIGA +1, Alma-in-silico  alain.empain@xxxxxxxxx
  Rue des Martyrs, 11 B-4550 Nandrin
Mobile: +32 497 701764  HOME:+32 85 512341   ULG: +32 4 3664157