Re: [Condor-users] condor, sqlite, python

On 4/1/10 12:21 PM, Geary, Brian W. wrote:



We are beginning a research project which will be joining 2 million rows against 250,000 rows three times.

We can either do the 3 joins in one query or do three separate queries with 1 join.


We are using sqlite and python to execute the sql queries and there is a bit of logic in the python as well.

We can use Windows or *nix for this project or both.


Can someone explain whether Condor would be appropriate for this research project? Would we just submit the Python script as a job, or is there some other scenario?

I'm not a DB expert, but this does not sound like an obvious candidate for Condor.  Instead, you should let your DB infrastructure (software + hardware + network) look after this for you.

Conditions under which Condor and a cluster may make sense:

1. The DB is static, so you only need read access to it, and it can be replicated to all cluster nodes.
2. The queries are "slow", taking on the order of minutes (or more) to complete.
3. You will be executing a lot of these in parallel a lot of the time.
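
If those three conditions hold, each Condor job could run the same Python script against its own local read-only copy of the SQLite file and handle one slice of the large table. The following is only a minimal sketch of that idea: the table names `big` and `small`, their columns, and the helper names `run_chunk` and `build_demo_db` are hypothetical, and slicing by `id % n_chunks` is just one possible way to split the work.

```python
import sqlite3
from pathlib import Path


def run_chunk(db_path, chunk, n_chunks):
    """Join one slice of the (hypothetical) big table against the small one.

    The database is opened read-only, which matches the assumption that
    every cluster node holds a static replica of the same file.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT b.id, s.label "
            "FROM big AS b JOIN small AS s ON s.key = b.key "
            "WHERE b.id % ? = ? "
            "ORDER BY b.id",
            (n_chunks, chunk),
        ).fetchall()
    finally:
        conn.close()
    return rows


def build_demo_db(db_path):
    """Create a tiny stand-in database so the sketch can be tried locally."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE small (key INTEGER PRIMARY KEY, label TEXT)")
    conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, key INTEGER)")
    conn.executemany(
        "INSERT INTO small VALUES (?, ?)",
        [(k, f"label{k}") for k in range(5)],
    )
    conn.executemany(
        "INSERT INTO big VALUES (?, ?)",
        [(i, i % 5) for i in range(100)],
    )
    conn.commit()
    conn.close()
```

In a Condor submit description, the job index is available as `$(Process)`, so a line such as `arguments = $(Process) 10` together with `queue 10` would hand each of ten jobs its own chunk number to pass to `run_chunk`. Whether this beats a single query on one machine depends entirely on how slow the individual joins really are.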

Depending on what you're trying to achieve with your "joins", and whether you have a "real DB" or simply "data organized into a DB", you may make good progress using part of the Hadoop project: its map/reduce implementation, its file system, and the associated tools.
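
To give a feel for how a join can be phrased as map/reduce, here is a small plain-Python illustration of a reduce-side join. It does not use any Hadoop API; the function names and the record layout (dicts with `id`, `key` and `label` fields) are assumptions made up for the example, and a real Hadoop job would distribute the map, shuffle and reduce steps across machines rather than run them in one process.

```python
from collections import defaultdict
from itertools import product


def map_phase(big_rows, small_rows):
    """Emit (join_key, (source_tag, row)) pairs, as a mapper would."""
    for row in big_rows:
        yield row["key"], ("big", row)
    for row in small_rows:
        yield row["key"], ("small", row)


def shuffle(pairs):
    """Group the mapped pairs by join key, keeping the two sources apart."""
    groups = defaultdict(lambda: {"big": [], "small": []})
    for key, (tag, row) in pairs:
        groups[key][tag].append(row)
    return groups


def reduce_phase(groups):
    """For each key, pair every big row with every matching small row."""
    for key in sorted(groups):
        group = groups[key]
        for big_row, small_row in product(group["big"], group["small"]):
            yield {"id": big_row["id"], "key": key, "label": small_row["label"]}


def mapreduce_join(big_rows, small_rows):
    """Inner join of the two record lists on the 'key' field."""
    return list(reduce_phase(shuffle(map_phase(big_rows, small_rows))))
```

Keys that appear in only one of the two inputs produce no output, so this behaves like an SQL inner join.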


Ian Stokes-Rees, PhD                       W: http://hkl.hms.harvard.edu
ijstokes@xxxxxxxxxxxxxxxxxxx               T: +1 617 432-5608 x75
NEBioGrid, Harvard Medical School          C: +1 617 331-5993