
[Condor-users] run a condor pool out of SGE nodes



Hi Condor-users,

I'm using Pegasus to submit grid jobs to our SGE cluster through Globus GT5.0.4, but our Globus setup is buggy: it spawns tens of thousands of job managers that never die (for the first few hours they do exit properly), and eventually I can't submit any more jobs. Asking our sysadmins to debug this is not an option, as it is a low priority for them (I'm the only one using Pegasus at UCLA right now).

So I have a couple of ideas for working around this problem. Before I go and try them, I want to see whether anyone has already done this (searching the Condor mailing lists turned up nothing, or I didn't choose my keywords well enough).

  1. Construct a Condor pool out of SGE nodes by submitting SGE jobs that run the condor_master/condor_startd daemons in user space on custom ports. They would all be managed by one of the nodes, which acts as the Condor pool's submission host. As I don't have root privileges at all, I have to run everything as a normal user.
  2. Run Condor-G and Globus in user space on the SGE cluster's submission host. I would have to configure Globus's GRAM component to talk to SGE, and then submit jobs from the cluster's submission host. Again, everything has to run in user space.
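
To make option 1 concrete, here is a rough sketch of what I imagine the worker side would look like. It is untested and only illustrative: the helper names, host name, port numbers and paths are made up, the settings are meant to be appended to a copy of the stock condor_config from the release, and security settings (ALLOW_WRITE etc.) are left out. I'm also assuming SGE sets TMPDIR to a per-job scratch directory and that STARTD_NOCLAIM_SHUTDOWN is available in the installed Condor version.

```python
def worker_config(central_host, collector_port, low_port, high_port,
                  idle_timeout_s=1200):
    """Render override settings for an unprivileged master/startd worker.

    All argument values are placeholders chosen by the caller.
    """
    lines = [
        f"CONDOR_HOST = {central_host}",
        # Non-default collector port, since everything runs in user space.
        f"COLLECTOR_HOST = $(CONDOR_HOST):{collector_port}",
        # Workers only need the master and the startd.
        "DAEMON_LIST = MASTER, STARTD",
        # Per-job scratch directory (assumes SGE exports TMPDIR).
        "LOCAL_DIR = $ENV(TMPDIR)/condor",
        "LOG = $(LOCAL_DIR)/log",
        "EXECUTE = $(LOCAL_DIR)/execute",
        # Restrict the daemons to a custom port range.
        f"LOWPORT = {low_port}",
        f"HIGHPORT = {high_port}",
        # The node is already dedicated by SGE, so always run jobs.
        "START = TRUE",
        "SUSPEND = FALSE",
        "PREEMPT = FALSE",
        "KILL = FALSE",
        # Shut down if unclaimed for this long, freeing the SGE slot.
        f"STARTD_NOCLAIM_SHUTDOWN = {idle_timeout_s}",
    ]
    return "\n".join(lines) + "\n"


def sge_job_script(release_dir, config_path, walltime="24:00:00"):
    """Render an SGE job script that starts condor_master as a normal user."""
    lines = [
        "#!/bin/bash",
        "#$ -N condor_worker",
        "#$ -cwd",
        "#$ -j y",
        f"#$ -l h_rt={walltime}",
        f"export CONDOR_CONFIG={config_path}",
        'mkdir -p "$TMPDIR/condor/log" "$TMPDIR/condor/execute" "$TMPDIR/condor/spool"',
        # -f keeps condor_master in the foreground so SGE tracks the process.
        f"exec {release_dir}/sbin/condor_master -f",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host, ports and paths, for illustration only.
    print(worker_config("head.example.org", 9700, 9701, 9800))
    print(sge_job_script("/u/home/me/condor", "/u/home/me/condor/etc/condor_config"))
```

The idea would be to save the generated script and qsub it once per worker I want; the node acting as the pool's submission host would run its own user-space collector, negotiator and schedd on the same collector port.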

My hunch is that option 1 is likely to be doable, based on the "Condor Daemons That Do Not Run as root" section of the Condor manual.

Thank you guys for reading this.
yu


--
http://www-scf.usc.edu/~yuhuang/