
Re: [HTCondor-users] Jobs only running on submit machine



Jaime,

my submit file is:

        Executable = PQL
        Universe = vanilla
        Output = pql.out
        Log = pql.log
        Error = pql.err
        Arguments = -p params.in -t temps.in
        notification = Error
        notify_user = codytrey@xxxxxxxx
        should_transfer_files = YES
Queue 20


I had it queue 20 jobs to see if that would force jobs onto the other machines once the submit node had all of its processors in use, but it just ran 4 at a time on the submit node until the queue was complete.
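
For what it's worth, my understanding is that when_to_transfer_output defaults to ON_EXIT when should_transfer_files = YES, and that input files have to be listed explicitly. So I'm guessing a more complete version of that file would look like this (assuming params.in and temps.in sit in the submit directory):

        Executable = PQL
        Universe = vanilla
        Output = pql.out
        Log = pql.log
        Error = pql.err
        Arguments = -p params.in -t temps.in
        should_transfer_files = YES
        # explicit, though ON_EXIT is reportedly the default with YES
        when_to_transfer_output = ON_EXIT
        # the input files named in Arguments
        transfer_input_files = params.in, temps.in
Queue 20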

Same results with:

Executable = test.py
Universe = vanilla
Output = /Volumes/Scratch/test/test.out.$(Process)
Log = /Volumes/Scratch/test/test.log
Error = /Volumes/Scratch/test/test.err
should_transfer_files = ALWAYS
Queue 10
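
One thing I'm not sure of now: the condor_submit manual appears to list only YES, NO, and IF_NEEDED as values for should_transfer_files, so the ALWAYS above may not be valid. Presumably those lines should instead be:

should_transfer_files = YES
when_to_transfer_output = ON_EXIT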


-Cody


On 2013-02-26 10:29, Jaime Frey wrote:

What does your submit file look like?
A common problem is that the machines don't have a shared filesystem, and HTCondor's file transfer option isn't being requested in the submit file. In this case, HTCondor will only run the jobs on the submit machine.
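
You can see how HTCondor groups machines by filesystem via the FileSystemDomain attribute of each slot; something along these lines should print it (the format strings are just one way to do it):

condor_status -format "%s\t" Name -format "%s\n" FileSystemDomain

If the domains differ and file transfer isn't requested, jobs will only match slots on the submit machine.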
 -- Jaime

On Feb 26, 2013, at 9:09 AM, Cody Belcher <codytrey@xxxxxxxxxxxxxxxx> wrote:

I do see all of the machines in condor_status

"codytrey@metis:~$ condor_config_val DAEMON_LIST
MASTER, SCHEDD, STARTD"

This is on the submit machine; it is the same on an execute node I just tried.
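
If it helps, I can also check where each machine thinks the central manager is; I believe this is the knob to query:

condor_config_val COLLECTOR_HOST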

-Cody

On 2013-02-26 08:47, Cotton, Benjamin J wrote:

Cody,

The first question is: are you sure they're all in the same pool? To
check this, do they all show up in the output of condor_status?

My suspicion is that your submit/execute machine might be running its
own condor_collector and condor_negotiator processes. You can check this
with 

condor_config_val DAEMON_LIST

If that's the case, then your execute-only nodes might be running their own as well.
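
For reference, the daemon lists on a typical pool would look something like
this (just a sketch; exact lists vary by setup):

# dedicated central manager
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
# submit machine
DAEMON_LIST = MASTER, SCHEDD
# execute-only node
DAEMON_LIST = MASTER, STARTD

If every machine lists COLLECTOR and NEGOTIATOR, each one is effectively its
own one-machine pool.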

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project