I am seeing the same problem that was discussed on the condor-users list some time ago, but, curiously, my case seems a bit different even though the error message is the same.
I am trying to run a simple DAG workflow, generated by the bash script below, that submits two Condor jobs. My setup is a Personal Condor (7.7) installed from the development Condor yum repo. The Executable command in the submit file uses the macro "simple.$$(OpSys).bat", and the binary 'simple.LINUX.bat' is in the same working directory.
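For reference, the intent of the macro: when the job is matched, Condor replaces $$(OpSys) with the OpSys attribute from the matched machine's ClassAd, so on this machine the Executable should become simple.LINUX.bat. The sketch below only mimics that string substitution in bash, plus a check that the resulting file exists; `expand_opsys` and `check_executable` are hypothetical helpers, not part of Condor, and they do not reproduce Condor's matching logic.

```shell
#!/bin/bash
# Hypothetical helper: mimic replacing $$(OpSys) in a submit-file value
# with the OpSys value of the matched machine. Illustration only;
# this is not how Condor itself implements the substitution.
expand_opsys() {
    local template=$1 opsys=$2
    local pat='$$(OpSys)'
    # Quoting "$pat" makes bash match the literal text $$(OpSys)
    printf '%s\n' "${template//"$pat"/$opsys}"
}

# Hypothetical helper: succeed only if the expanded executable
# exists in the current directory.
check_executable() {
    local exe
    exe=$(expand_opsys "$1" "$2")
    if [[ -f $exe ]]; then
        echo "found $exe"
    else
        echo "missing $exe" >&2
        return 1
    fi
}
```

For example, `expand_opsys 'simple.$$(OpSys).bat' LINUX` prints simple.LINUX.bat, which is the file that has to be present for the job to start.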
The first job, 1.condor, finishes successfully, but the second job, 2.condor, is put on hold with the error message "Cannot expand $$(OpSys)." The curious thing is that the OpSys attribute appears to be set to "LINUX" (see the condor_status output below), and that the first job executes while the second does not.
The log messages and my bash script are below. Any pointers to what my setup is missing would be a great help.
Here is what I have looked into so far:
1. The Condor system mail message for the second job: "Attribute $$(OpSys) cannot be expanded because this attribute was not found in the machine ClassAd."
2. $ condor_status -long |grep -i machine
Machine = "xxx.xxx.xxx.xxx"
Unhibernate = MY.MachineLastMatchTime =!= undefined
MyType = "Machine"
$ condor_status -long |grep -i opsys
OpSysAndVer = "LINUX"
OpSysVer = 206
OpSys = "LINUX"
3. Messages from the SchedLog, where job 81 (the held 2.condor) is put on hold while job 80 (1.condor) is scheduled:
11/10/11 14:38:39 (pid:17416) Starting add_shadow_birthdate(80.0)
11/10/11 14:38:39 (pid:17416) Started shadow for job 80.0 on centos6.lab.ac.uab.edu <10.0.0.26:40521> for ppreddy, (shadow pid = 16301)
11/10/11 14:38:39 (pid:17416) Finished negotiating for ppreddy in local pool: 1 matched, 1 rejected
11/10/11 14:38:43 (pid:17416) Shadow pid 16301 for job 80.0 reports job exit reason 100.
11/10/11 14:38:43 (pid:17416) match (centos6.lab.ac.uab.edu <10.0.0.26:40521> for ppreddy) switching to job 81.0
11/10/11 14:38:43 (pid:17416) Shadow pid 16301 switching to job 81.0.
11/10/11 14:38:43 (pid:17416) Starting add_shadow_birthdate(81.0)
11/10/11 14:38:43 (pid:17416) Putting job 81.0 on hold - cannot expand $$(OpSys)
11/10/11 14:38:43 (pid:17416) Job 81.0 put on hold: Cannot expand $$(OpSys).
11/10/11 14:38:43 (pid:17416) Failed to expand job ad when switching shadow 16301 to new job 81.0
4. The NegotiatorLog and MatchLog show job 81 being rejected because no match was found:
11/10/11 14:38:39 Rejected 81.0 <x.x.x.x:51994>: no match found
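When comparing the machine ad with the SchedLog, it can help to extract exact values instead of grepping case-insensitively, which also matches OpSysAndVer and OpSysVer. The sketch below defines two filters that read text on stdin: `ad_attr` prints one attribute from `condor_status -long` output, and `held_jobs` prints job IDs and hold reasons from SchedLog lines. Both names are hypothetical, and the SchedLog pattern assumes the exact "Job N.M put on hold: reason" wording seen in the log above, which may differ in other Condor versions.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one attribute (exact name match,
# double quotes stripped) from `condor_status -long` output on stdin.
ad_attr() {
    awk -v name="$1" -F ' = ' '$1 == name { v = $2; gsub(/"/, "", v); print v }'
}

# Hypothetical helper: print "JOBID reason" for every
# "Job N.M put on hold: reason" line of SchedLog text on stdin.
held_jobs() {
    sed -n 's/.*Job \([0-9][0-9]*\.[0-9][0-9]*\) put on hold: \(.*\)$/\1 \2/p'
}

# Possible usage against a live pool (not run here):
#   condor_status -long | ad_attr OpSys
#   held_jobs < "$(condor_config_val LOG)/SchedLog"
```

For jobs still in the queue, `condor_q -hold` reports the hold reason directly; the filters above are mainly useful for digging through older log files.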
My bash script (excerpt; runs, arg1, arg2 and JOB_LIST are defined elsewhere in the script, not shown):
# generate one submit file per job
for job in $(seq "$runs"); do
    # unquoted here-document: $job, $arg1, $arg2 expand; \$\$ stays literal
    cat > "$job.condor" << EOF
Universe = vanilla
Executable = simple.\$\$(OpSys).bat
Arguments = $arg1 $arg2
Log = $job.log
Output = $job.out
Error = $job.error
Queue
EOF
done
# generate the DAGMan input file that manages the jobs
for JOB_ID in $JOB_LIST; do
    echo "JOB job_$JOB_ID $JOB_ID.condor" >>master.dag
    echo "SCRIPT PRE job_$JOB_ID pre-job " >>master.dag
    echo "SCRIPT POST job_$JOB_ID post-job " >>master.dag
    echo "RETRY job_$JOB_ID 5" >>master.dag
done
condor_submit_dag -notification Never master.dag >condor_submit_dag.out
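Because the submit files are written through an unquoted here-document, an unescaped $$ would be replaced by the shell's process ID before Condor ever saw it; the backslashes in the script prevent that. One way to rule this out is to grep the generated file for the literal macro. The sketch below writes a submit file the same way into the current directory and checks it; `write_submit` and `has_opsys_macro` are hypothetical helpers, and their arguments stand in for the script's job number and arg1/arg2.

```shell
#!/bin/bash
# Hypothetical sketch: write one submit file the same way as the script,
# using the here-document escaping \$\$ so the macro reaches the file intact.
write_submit() {
    local job=$1 arg1=$2 arg2=$3
    cat > "$job.condor" << EOF
Universe = vanilla
Executable = simple.\$\$(OpSys).bat
Arguments = $arg1 $arg2
Log = $job.log
Output = $job.out
Error = $job.error
Queue
EOF
}

# Succeeds only if the Executable line contains the unexpanded $$(OpSys) macro.
has_opsys_macro() {
    grep -qF 'Executable = simple.$$(OpSys).bat' "$1"
}
```

Running `has_opsys_macro 2.condor` on the real generated file should succeed; if it fails, the Executable line was already altered by the shell before condor_submit read it.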