
Re: [Condor-users] Condor job does not execute on a different machine in the pool.



Hi Siva,

Can you check your security settings? This sounds like an authorization issue, either for a user or for a machine.

http://www.cs.wisc.edu/condor/manual/v7.6/3_6Security.html
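
It may also be worth ruling out the Requirements expression itself. As far as I know, ClassAd string literals take double quotes, and the machine attribute behind the "Mem" column of condor_status is Memory (an integer number of megabytes), so the comparisons in your job description file may be evaluating to UNDEFINED rather than TRUE. A minimal sketch of the same requirement, assuming the intent is simply to exclude the 492 MB machine (the 1000 threshold is my assumption, taken from your condor_status output):

    # Assumption: match any INTEL/LINUX machine with at least 1000 MB of memory
    Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Memory >= 1000)

If the job still stays idle, running "condor_q -better-analyze <job-id>" on the submit machine should show which clause, if any, rejects the machines.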

William


Hello,
 
I am new to Condor. I have configured a Condor pool with 3 Fedora Linux PCs.
 
[user@fedora71 job1]$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
fedora59.xxx.      LINUX      INTEL  Unclaimed Idle     0.000  1000  0+03:25:04
fedora66.xxx.      LINUX      INTEL  Unclaimed Idle     2.050  1001  0+03:25:04
fedora71.xxx.      LINUX      INTEL  Unclaimed Idle     0.020   492  0+02:50:05
                     Total Owner Claimed Unclaimed Matched Preempting Backfill
         INTEL/LINUX     3     0       0         3       0          0        0
               Total     3     0       0         3       0          0        0
[sabic@fedora71 job1]$
 
fedora71 is the central manager. fedora66 can both submit and execute jobs; fedora59 can only execute jobs.
 
I have a test job on fedora71. When I submit it with "condor_submit <job-desc-file-name>", it is accepted but runs only on fedora71. I removed the "startd" daemon from the local config file on fedora71 so that the job would be sent to another machine in the pool, but it isn't; the job just sits idle.
 
Also, when I remove the "startd" daemon, fedora71 no longer appears in the output of "condor_status". Is this normal behaviour? I would guess so: if a machine is not meant to execute jobs, it has no need to advertise its resources.
 
I wanted fedora71 to show up in the output of "condor_status", so I added "STARTD" back. Now I am back where I started: the test job runs successfully, but only on fedora71.
 
I then added the lines below to my job description file (on fedora71):
 
Requirements = ((Arch == 'INTEL' && OpSys == 'LINUX' && Mem == '1000') || (Arch == 'INTEL' && OpSys == 'LINUX' && Mem == '1001'))
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
The reason I added the Requirements line is that fedora71 has a "Mem" of 492, so according to the requirements the job should be sent to fedora66 or fedora59 and run there.
 
But when I submit the job with these lines, it just stays idle.
 
I checked the "NegotiatorLog" file on fedora71 and found this:
 
 
[root@fedora71 condor]# tail -f NegotiatorLog
7/22 12:06:08 Public ads include 1 submitter, 3 startd
7/22 12:06:08 Phase 2:  Performing accounting ...
7/22 12:06:08 Phase 3:  Sorting submitter ads by priority ...
7/22 12:06:08 Phase 4.1:  Negotiating with schedds ...
7/22 12:06:08   Negotiating with user@xxxxxxxxxxxxxxxx at <x.xx.xxx.xx:40518>
7/22 12:06:08 0 seconds so far
7/22 12:06:08     Request 00045.00000:
7/22 12:06:08       Rejected 45.0 user@xxxxxxxxxxxxxxxx <x.xx.xxx.xx:40518>: no match found
7/22 12:06:08     Got NO_MORE_JOBS;  done negotiating
7/22 12:06:08 ---------- Finished Negotiation Cycle ----------
 
It's not finding any match for the job's requirements, and I am not sure why. Can anyone please look at the data above and let me know what I should check?
 
Thanks in advance,
Siva
 
 
 
 

 