
Re: [Condor-users] Attribute $$(OpSys) not found in the machine ClassAd



Thanks Todd. The workaround did it. Both jobs were executed successfully.

I am working in a dev environment and don't plan to submit many jobs there, but I will make sure we run the latest stable version on our local Condor pool.

Poornima.

On 11/10/11 5:12 PM, Todd Tannenbaum wrote:
Poornima Pochana wrote:
   Hi,

I am seeing the same problem that was discussed <https://www-auth.cs.wisc.edu/lists/condor-users/2011-June/msg00069.shtml> in the condor-users forum some time ago, but my case seems to be a bit different (curious), even though the error message is the same.


Hi Poornima -

Not 100% certain, but a quick read of your email makes me think you stepped on the bug outlined here:
  https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2552

This bug was introduced in the Condor v7.7.1 development release and is fixed for the upcoming v7.7.3, i.e. this bug never made it into any stable series.

I think a workaround for Condor v7.7.1 / v7.7.2 could be to add the following into the condor_config of your submit machine(s):
  SHADOW_WORKLIFE = 0
The downside of this setting is it can negatively impact throughput of that submit machine if it is very busy, i.e. simultaneously running thousands of very short jobs.
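
A minimal sketch of applying this workaround from a shell, assuming you know which config file your submit machine reads. The function name add_shadow_worklife_workaround is hypothetical, and the file path is supplied by the caller rather than guessed here. It appends the setting only if no SHADOW_WORKLIFE line is already present:

```shell
#!/bin/bash
# Sketch only: append the SHADOW_WORKLIFE workaround to a Condor config file.
# Assumption: the caller passes the config file their submit machine actually
# reads; this helper does not try to locate it.
add_shadow_worklife_workaround() {
    local config_file=$1

    # Leave the file alone if an uncommented SHADOW_WORKLIFE line already exists.
    if grep -Eq '^[[:space:]]*SHADOW_WORKLIFE[[:space:]]*=' "$config_file" 2>/dev/null; then
        echo "SHADOW_WORKLIFE already set in $config_file" >&2
        return 0
    fi

    # Append the setting with a comment noting why it is there.
    printf '%s\n' \
        '# Workaround for Condor v7.7.1 / v7.7.2 (ticket 2552)' \
        'SHADOW_WORKLIFE = 0' >> "$config_file"
}
```

After the file is changed, running condor_reconfig on the submit machine should make the running daemons re-read their configuration, and condor_config_val SHADOW_WORKLIFE can be used to confirm the value that is in effect.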

regards,
Todd


I am trying to execute a simple DAG workflow (bash script below) which submits 2 Condor jobs. My setup is a Personal Condor (7.7) installed from the development Condor yum repo. The 'executable' command in the submit file is set to the macro "simple.$$(OpSys).bat", and I have the binary 'simple.LINUX.bat' in the same working tree.

The first job, 1.condor, finishes successfully, but the second job, 2.condor, is held with the error message "Cannot expand $$(OpSys)." The curious thing is that the OpSys attribute seems to be set to "LINUX" (see the condor_status output below), and that the first job executes but the second does not.
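
To make the expected behavior concrete, here is a small bash sketch of what the $$(OpSys) substitution should produce. It is an illustration only, not Condor's implementation: the function name expand_from_machine_ad is hypothetical, and it assumes the machine ClassAd has been saved to a file as 'Attr = value' lines, in the form printed by condor_status -long.

```shell
#!/bin/bash
# Illustration only: replace each $$(Attr) in a string with the value of Attr
# taken from a saved machine ad ("Attr = value" lines). Hypothetical helper.
expand_from_machine_ad() {
    local template=$1 ad_file=$2
    local re='\$\$\(([A-Za-z_][A-Za-z0-9_]*)\)'
    local attr value ref

    while [[ $template =~ $re ]]; do
        attr=${BASH_REMATCH[1]}
        ref='$$('"$attr"')'
        # Look up the attribute by exact name and strip surrounding quotes.
        value=$(awk -F' = ' -v a="$attr" \
            '$1 == a { gsub(/"/, "", $2); print $2; exit }' "$ad_file")
        if [[ -z $value ]]; then
            # Mirrors the hold reason: the attribute is not in the machine ad.
            echo "Cannot expand $ref." >&2
            return 1
        fi
        template=${template//"$ref"/$value}
    done
    printf '%s\n' "$template"
}
```

With a machine ad containing OpSys = "LINUX", calling expand_from_machine_ad 'simple.$$(OpSys).bat' on that file yields simple.LINUX.bat, which is the binary name the submit file is meant to resolve to.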

The log messages and my bash script are below. Any pointers as to what my setup is missing would be a great help.

Thanks,
Poornima.

Here's what I poked into:
1. The Condor system mail msg for the second job: "Attribute $$(OpSys) cannot be expanded because this attribute was not found in the machine ClassAd."

2. $ condor_status -long |grep -i machine
Machine = "xxx.xxx.xxx.xxx"
Unhibernate = MY.MachineLastMatchTime =!= undefined
MyType = "Machine"
$ condor_status -long |grep -i opsys
OpSysAndVer = "LINUX"
OpSysVer = 206
OpSys = "LINUX"

3. Messages from SchedLog, where it puts job 81 (2.condor) on hold but schedules job 80 (1.condor):
11/10/11 14:38:39 (pid:17416) Starting add_shadow_birthdate(80.0)
11/10/11 14:38:39 (pid:17416) Started shadow for job 80.0 on centos6.lab.ac.uab.edu <10.0.0.26:40521> for ppreddy, (shadow pid = 16301)
11/10/11 14:38:39 (pid:17416) Finished negotiating for ppreddy in local pool: 1 matched, 1 rejected
11/10/11 14:38:43 (pid:17416) Shadow pid 16301 for job 80.0 reports job exit reason 100.
11/10/11 14:38:43 (pid:17416) match (centos6.lab.ac.uab.edu <10.0.0.26:40521> for ppreddy) switching to job 81.0
11/10/11 14:38:43 (pid:17416) Shadow pid 16301 switching to job 81.0.
11/10/11 14:38:43 (pid:17416) Starting add_shadow_birthdate(81.0)
11/10/11 14:38:43 (pid:17416) Putting job 81.0 on hold - cannot expand $$(OpSys)
11/10/11 14:38:43 (pid:17416) Job 81.0 put on hold: Cannot expand $$(OpSys).
11/10/11 14:38:43 (pid:17416) Failed to expand job ad when switching shadow 16301 to new job 81.0

4. NegotiatorLog and MatchLog show a rejection because no match was found:
11/10/11 14:38:39       Rejected 81.0 <x.x.x.x:51994>: no match found

#!/bin/bash

runs=2
arg1=4
arg2=10
JOB_LIST=""
for job in $(seq "$runs")
do
    # Write one submit file per job. The \$\$ keeps the literal $$(OpSys)
    # macro in the file instead of letting the shell expand $$ to its PID.
    cat > "$job.condor" << EOF
Universe   = vanilla
Executable = simple.\$\$(OpSys).bat
Arguments  = $arg1 $arg2
Log        = $job.log
Output     = $job.out
Error      = $job.error
Queue
EOF

    arg1=$((arg1 + 1))
    arg2=$((arg2 + 10))
    JOB_LIST="$JOB_LIST $job"
done

# generate condor dagman to manage jobs
# start from an empty DAG file so that reruns do not append duplicate entries
: > master.dag
for JOB_ID in $JOB_LIST; do
    echo "JOB    job_$JOB_ID $JOB_ID.condor" >>master.dag
    echo "SCRIPT PRE   job_$JOB_ID pre-job " >>master.dag
    echo "SCRIPT POST  job_$JOB_ID post-job " >>master.dag
    echo "RETRY  job_$JOB_ID 5" >>master.dag
done
condor_submit_dag -notification Never master.dag >condor_submit_dag.out




