
Re: [Condor-users] Job was held. Cannot expand $$(OpSys).



Thank you Matt and Todd!

Unfortunately, I never did figure out the source of the bug. It seems to have gone away with an upgrade to Condor. I think that one of the nodes may have been configured oddly, but I was unable to determine which node was the problem (using LastRemoteHost) before the upgrade occurred.

Thanks again for your help. Everything seems to be running well now.

- Dirk 


On Tue, Jun 14, 2011 at 8:56 AM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
Matthew Farrellee wrote:
Check the LastRemoteHost attribute on the held jobs. Or look in the job's log for where it was about to execute before being held.

Best,


matt

Good advice!

Dirk, if it simply continues to remain a mystery --- since it seems like whenever this happens it should have expanded to WINNT51, perhaps a workaround could be:
 executable = avida.$$(OpSys:WINNT51).exe
The above macro says to use the value of OpSys unless it is not defined, in which case default to WINNT51. But it would obviously still be better to figure out why Condor thinks it is undefined in the first place.
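
As a rough illustration of that default-value behaviour, here is a toy Python model. It is only a sketch and not Condor's actual implementation: the function name `expand` and the plain dictionary standing in for the matched machine ad are made up for the example.

```python
import re

# Toy model of $$(Attr) and $$(Attr:default) substitution.
# Assumption: machine_ad is a plain dict standing in for the matched
# machine's ClassAd; attribute names are compared case-insensitively.
_PATTERN = re.compile(r"\$\$\((\w+)(?::([^)]*))?\)")

def expand(template, machine_ad):
    ad = {key.lower(): value for key, value in machine_ad.items()}

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name.lower() in ad:
            return str(ad[name.lower()])
        if default is not None:
            return default
        # No value and no default: mimic the hold reason seen in the job log.
        raise LookupError(f"Cannot expand $$({name}).")

    return _PATTERN.sub(substitute, template)

print(expand("avida.$$(OpSys:WINNT51).exe", {"OpSys": "LINUX"}))  # avida.LINUX.exe
print(expand("avida.$$(OpSys:WINNT51).exe", {}))                  # avida.WINNT51.exe
```

Without the `:WINNT51` part, the second call would raise instead of falling back, which corresponds to the job being held.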

regards,
Todd



On 06/13/2011 09:45 AM, Dirk Colbry wrote:
Hey Todd,

Thanks for responding to my email.  Unfortunately, I do not think
"undefined" is the problem.  I ran the command you suggested,
condor_status -constraint "OpSys=?=UNDEFINED", and didn't get anything
back.  I also checked every node in our small pool (using
condor_status -long | grep OpSys) and see that OpSys is set to be
either WINNT51 or LINUX as reported by condor_q long.

I have tried both of the following requirements settings and they both
produce my HOLD problem:

requirements = (OpSys == "LINUX" && Arch == "X86_64") || (OpSys == "WINNT51" && Arch == "INTEL")
requirements = (OpSys == "WINNT51" && Arch == "INTEL")

Since some of my jobs are running, my best guess is one of the
Windows XP nodes is not configured properly.  I am going to
systematically take the nodes out of the pool to see if I get the
problem to go away.  However, I am open to suggestions if anyone has
an alternative approach.

Thanks again,

- Dirk


On Fri, Jun 10, 2011 at 5:13 PM, Todd Tannenbaum <tannenba@xxxxxxxxedu> wrote:
Dirk Colbry wrote:

I have an executable compiled and running on both Windows and Linux
and would like to make a Condor job that can run on either.  However,
it seems to work only some of the time.  The rest of the time I
get the following output in my log file:


000 (074.1005.000) 06/06 15:06:33 Job submitted from host: <XX.X.12.45:9636>
...
012 (074.1005.000) 06/06 16:57:47 Job was held.
       Cannot expand $$(OpSys).
       Code 0 Subcode 0

I have named my two executables avida.LINUX.exe and avida.WINNT51.exe
and included the following executable line in my condor submission
script:

executable = avida.$$(OpSys).exe

If I set up the requirements to only run on LINUX, X86_64 then the job
runs fine.  However, when I set it up to run on Windows I sometimes
get the above error (but not all the time).  I can make the error go
away if I hard-code the executable to avida.WINNT51.exe, but that
defeats the purpose.  I looked through the logs but nothing jumped out,
and I am not sure which log I should be focusing on.

My best guess is that there is something configured wrong on one of my
Windows nodes which is causing the problem.  I tried googling this in
many different ways but did not find anything similar.  Any hints on
where I should be looking?


Strange.  The above implies that you have some machine ads out there that do
not have an OpSys attribute?! Does the following command return anything?
 condor_status -constraint "OpSys=?=UNDEFINED"

What does your Requirements line in your submit file look like?  Maybe
setting it to explicitly require that OpSys is defined and is either LINUX
or WINNT51 would help, like so:
 requirements = opsys =?= "LINUX" || opsys =?= "WINNT51"
Note the use of meta-equals to be certain a value for opsys of UNDEFINED is
not acceptable.
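
To make the difference between the two operators concrete, here is a small Python sketch of how they treat a missing attribute. It is a toy model only: the `UNDEFINED` sentinel and the function names `eq` and `meta_eq` are hypothetical, and differences in string case handling are ignored.

```python
# Toy model of the ClassAd operators == and =?=.
# Assumption: UNDEFINED is a sentinel invented for this sketch.
UNDEFINED = object()

def eq(a, b):
    """Model of ==: the result is UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b

def meta_eq(a, b):
    """Model of =?=: the result is always True or False."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return a == b

machine_ad = {}  # a machine ad with no OpSys attribute
opsys = machine_ad.get("OpSys", UNDEFINED)

print(eq(opsys, "LINUX") is UNDEFINED)  # True: == cannot decide
print(meta_eq(opsys, "LINUX"))          # False: =?= gives a definite answer
print(meta_eq(opsys, UNDEFINED))        # True: the test used with condor_status
```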

regards,
Todd

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxedu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/



--
Todd Tannenbaum                       University of Wisconsin-Madison
Center for High Throughput Computing  Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                 Madison, WI 53706-1685

