
RE: [Condor-users] job requirements



Hello All,

I am having a similar problem. I submitted a bunch of jobs yesterday and
they started running on different machines. I came in this morning and only
my machine was running jobs even though the other machines were Idle. My
machine is running 6.6.6. I decided to update the other machines from 6.4.5
to 6.6.6. The idle jobs found processors and started running. About an hour
later I came back and checked the status. Once again only my machine was
running processes. The other processors were again sitting around unclaimed.
Has this been seen before? Could it possibly have something to do with the
power options set in the control panel (by the way I am running on Windows
2K machines)? Can someone help?

Thanks for your time,
David

David L. Oakley
Aerospace Engineer
US Army RDECOM
AMSRD-AMR-SS-MD
Redstone Arsenal, AL 35898-5252
Email : david.l.oakley@xxxxxxxxxxx
Phone : 256-876-0539 (DSN: 746-0539)
Secure: 256-876-0649 (DSN: 746-0649)
Fax : 256-842-0808 (DSN: 788-0808)


-----Original Message-----
From: Alain Roy [mailto:roy@xxxxxxxxxxx] 
Sent: Monday, August 09, 2004 7:52 PM
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] job requirements


Fernando Rannou wrote:
>Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Disk >=
>DiskUsage) && ((Memory * 1024) >= ImageSize) && (TARGET.FileSystemDomain 
>== MY.FileSystemDomain)
>-----
>
>but the condor submit file has no Requirements!!!
>This seems to be happening for one specific user.

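condor_submit adds reasonable default requirements to every job, even when
the submit file doesn't specify any. That is where the expression above
comes from.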

Pick one job that has this problem. Pretend it's job 10.0.

Pick one computer that doesn't match. Pretend it's node.example.com.

Run these two commands, substituting the correct identifiers:

   condor_q -l 10.0
   condor_status -l node.example.com
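
If the two listings are long, it may be easier to load them into a script than to compare them by eye. Here is a minimal Python sketch, assuming the -l output is a series of "Name = value" lines. The function name parse_long_ad and the sample text are made up for illustration; they are not part of Condor.

```python
def parse_long_ad(text):
    """Parse 'Name = value' lines into a dict, keeping values as raw strings."""
    ad = {}
    for line in text.splitlines():
        # Split on the first " = " only, so "==" inside Requirements is untouched.
        name, sep, value = line.partition(" = ")
        if sep:  # skip blank lines and anything that is not an assignment
            ad[name.strip()] = value.strip()
    return ad


# Hypothetical excerpt of what `condor_q -l 10.0` might print.
sample_job = '''MyType = "Job"
DiskUsage = 1000000
ImageSize = 5000
FileSystemDomain = "submit.example.com"
Requirements = (Arch == "INTEL") && (OpSys == "LINUX")
'''

job = parse_long_ad(sample_job)
print(job["DiskUsage"])
print(job["Requirements"])
```

Save the real output of each command to a file and pass its contents to the same function to get one dict for the job and one for the machine.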

Walk through the requirements and see what can't be true. For instance, if 
the job has DiskUsage of 1000000 and the computer has Disk of 1000, then 
Requirements will be false.
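
The walk-through can also be sketched in Python. This is only an illustration of the five clauses in the Requirements expression quoted above, not how Condor evaluates ClassAds. The function name failed_clauses and all attribute values are assumptions, except the DiskUsage and Disk numbers, which are the ones from the example.

```python
def failed_clauses(job, machine):
    """Return the names of the Requirements clauses that are false."""
    def same(a, b):
        # Assumption: ClassAd "==" on strings ignores case, so compare lowercased.
        return a.lower() == b.lower()

    checks = {
        "Arch": same(machine["Arch"], "INTEL"),
        "OpSys": same(machine["OpSys"], "LINUX"),
        "Disk": machine["Disk"] >= job["DiskUsage"],
        "Memory": machine["Memory"] * 1024 >= job["ImageSize"],
        "FileSystemDomain": same(machine["FileSystemDomain"],
                                 job["FileSystemDomain"]),
    }
    return [name for name, ok in checks.items() if not ok]


# Made-up attribute values for job 10.0 and node.example.com.
job = {"DiskUsage": 1000000, "ImageSize": 5000,
       "FileSystemDomain": "example.com"}
machine = {"Arch": "INTEL", "OpSys": "LINUX", "Disk": 1000, "Memory": 512,
           "FileSystemDomain": "example.com"}

print(failed_clauses(job, machine))  # prints ['Disk']
```

An empty list would mean these five clauses all hold, and the mismatch lies somewhere else (for instance in the machine's own requirements).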

My bet is that it will be the FileSystemDomain that is causing your 
problem. If so, there are two possibilities:

   * If you are submitting the job from a shared disk (like NFS) then the
     computers should indicate that they have the same FileSystemDomain.
     Here at the UW, we set FileSystemDomain to be cs.wisc.edu instead of
     the $(FULL_HOSTNAME). For more information, see:

http://www.cs.wisc.edu/condor/manual/v6.6/2_5Submitting_Job.html#SECTION00354000000000000000

   * If you are submitting the job from a computer that doesn't
     have shared disks, then you'll need to have Condor transfer the
     job's files. With file transfer turned on, this requirement
     doesn't show up. For more information, see:

http://www.cs.wisc.edu/condor/manual/v6.6/2_5Submitting_Job.html#SECTION00354000000000000000

I hope this helps.

-alain


_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users