
Re: [HTCondor-users] Exact semantics of Universe = Local? How to have all work done on submit machine?



The Local universe doesn't use any of the slots on the local machine.
Also, Local universe jobs are not considered by the matchmaker at all; the
schedd processes them directly. Check your schedd log to see why they aren't starting.

The only control you have over the start rate is a knob that limits how many local universe jobs can run simultaneously. I believe it is called START_LOCAL_UNIVERSE.
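
A rough sketch of how that knob might be set, assuming it is indeed START_LOCAL_UNIVERSE and that it is an expression the schedd evaluates before starting each local universe job. The limit of 4 is only an illustrative value, not a recommendation:

```
# condor_config.local on the submit machine (illustrative sketch)
# Allow a new local universe job to start only while fewer than
# 4 local universe jobs are already running under this schedd.
START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 4
```

After editing the config, running condor_reconfig on the submit machine should make the schedd pick up the change.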

I have never tried the combination of a DAG with Local universe jobs, so I'm not sure what would happen in that case.

Steve



On Thu, 3 Apr 2014, Rowe, Thomas wrote:

I have a dozen or so Windows machines in a Condor cluster. I am trying to take one machine in the cluster and have submitted jobs run only locally. I figured I could simply change the universe from "vanilla" to "local" in the submit files and then the locally submitted jobs would queue up and run locally.

Firstly, is my understanding that the Local Universe implies the normal queuing and robustness of the Vanilla Universe correct? Or does it do something more basic like simply launch a lot of processes regardless of slots?

Secondly, and more importantly, nothing happens at all with my Local jobs (as part of a DAG). The first job in the DAG simply sits in the queue and "has not been considered by the match maker." It has runAsOwner = TRUE, if that matters. And the Condor version is a bit old at 7.6.3.

Next I tried going back to the Vanilla universe but adding a requirement that Machine == "the_name_of_the_local_machine". This also resulted in the job just sitting there unmatched. condor_q -analyze reports that the local slots have rejected the job for their own reasons, which makes no sense because the machine has been working these jobs as part of the cluster just fine. How do I go about getting a better explanation of why the slots are rejecting the job? Nothing jumped out from any of the log files as an explanation.

I'm about to simply reinstall condor with a local master on the machine and remove it from the cluster, but I feel like this should be unnecessary. Either of the two approaches above should work, right?


------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing