
Re: [Condor-users] submit to a remote pool, from a linux machine, to an osx manager



You can set Arch and OpSys in the submit file.

Section: COMMANDS FOR MATCHMAKING
http://www.cs.wisc.edu/condor/manual/v7.5/condor_submit.html#SECTION0010434000000000000000

"Arch and OpSys are set equal to the Arch and OpSys of the submit machine. In other words: unless you request otherwise, Condor will give your job machines with the same architecture and operating system version as the machine running condor_submit."

I don't know of a way to run a Linux program on a BSD system (Mac OS X is BSD-based) unless the port for Linux binaries is installed, and I have never tested that. I always run BSD binaries on BSD and Linux binaries on Linux.
I think you should do the same: get two binaries, one for Mac and one for Linux, and submit each with Arch (and OpSys) set to the matching platform.
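
As a rough sketch of the two-binaries idea, a submit file along these lines might work. I have not tested it on a mixed pool. The executable name my_program and the output file names are made up for illustration; the Arch and OpSys values are the ones mentioned in this thread (LINUX/X86_64 and OSX/INTEL). It assumes both binaries sit next to the submit file, named my_program.LINUX.X86_64 and my_program.OSX.INTEL.

```
# Sketch only: my_program and the file names below are hypothetical.
universe     = vanilla

# $$(OpSys) and $$(Arch) are filled in from the machine the job matches,
# so the binary built for that platform is the one that gets sent.
executable   = my_program.$$(OpSys).$$(Arch)

# Override the default (submit machine's Arch/OpSys) and accept either platform.
requirements = ((Arch == "X86_64") && (OpSys == "LINUX")) || \
               ((Arch == "INTEL") && (OpSys == "OSX"))

output       = job.out
error        = job.err
log          = job.log
queue
```

If you only ever want the Mac nodes, you can drop the LINUX clause and point executable directly at the OS X binary.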

On Thu, Sep 16, 2010 at 3:43 PM, Peter Doherty <doherty@xxxxxxxxxxxxxxxxxxx> wrote:


On Sep 16, 2010, at 15:07 , Peter Doherty wrote:

Hi,

Okay, so I've got an all-Mac (Intel Xeons, running OS X 10.5) cluster.  I set up the headnode as a central manager running the schedd/collector, etc.
It works pretty well.  But I've got a Linux box that I want to use to submit jobs from.  So I installed Condor on the Linux box and set the condor_config variables to point to the remote pool.
No daemons run on the Linux box; I just changed CONDOR_HOST to reference the Mac central manager.

I can use condor_submit and run jobs, and everything works great.
But when I want to submit a DAG, I run into problems.
1.) The DAGMan scheduler universe job has requirements (OpSys, Arch) that look for a LINUX X86_64 machine, but the CM is INTEL, OSX.
2.) The CMD it wants to run is the Linux binary of condor_dagman (because it pulled the config from the Linux box, I presume).
So the job just gets stuck eternally in the queue.
........

Thanks.
Peter

In true mailing list form, I made some headway shortly after sending this email.
I changed ARCH and OPSYS in the condor_config on the local box, so the job now gets matched and starts. (I'm sure there is a more proper way to do this, however.)
But then the job fails to run, because it's a Linux binary on OS X.

So I found the -dagman flag for condor_submit_dag and passed it the full path to the OS X binary. But then I get an error in the job.dag.dagman.out file showing that DAGMan started, ran, and then exited because -Dagman is an unknown option.  So it looks like condor_submit_dag passed its option on to condor_dagman, which doesn't support the flag?  Still, the fact that it produced the error tells me that condor_dagman at least started and ran on the CM.

At least I'm getting closer.



Peter



--
----
Edier Alberto Zapata Hernández
Est. Ingeniería de Sistemas
Universidad de Valle