
Re: [Condor-users] determining if a job has failed to get a license



As the default behaviour is for Condor to run each job under a different
local account, does the FLEXlm log contain both the machine and the username
for the requested (denied) license? If so, that would solve the SMP problem.
Another problem we have with FLEXlm is that, with the 'wrapper' approach, a
dialog box is presented, which hangs the job while it waits for someone to
click OK. Any ideas how to fix this, or how to 'automatically' click dialogs
that only have one choice?

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: 08 December 2005 00:05
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] determining if a job has failed to get a license

Hello -

We've been working on a couple of different things that will help in only
running a job if there is a license available for it to run with.

Regardless of what the final solution looks like, Condor will probably still
sometimes make a mistake and run a job that it won't have a license for, so
the first thing we want in place is a way to clean up any mess we might
create. Once we're sure we can clean up, we can move on to making the mess.

Figuring out if a job exited because it was denied a license can be tricky.
Depending on the job, the exit code may not mean anything when there was a
license failure. You can't always use a post script to read the output and
look for "license denied" errors, and even if you can, you have to change
that script for every different type of job.

The logs of FLEXlm itself are another source of information about denied
licenses. We've written a simple server that allows you to ask whether a
machine was denied a license in a given time interval. The simple usage
scenario we see for this is:

note the job start time
run the job
note the end time
connect to the FLEXlm monitor and ask "did you deny a license to
   myusername@mycurrenthost between starttime and endtime?"
if yes, exit with a well-known status and have Condor requeue this job
if no, exit with a regular status and have Condor remove this job from
   the queue as normal

(For simplicity, we're assuming mostly synchronized clocks. NTP is pretty
universal now, but we could do some sort of NTP-like thing if we needed to.)
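The wrapper steps above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the Perl scripts linked below. The monitor's wire protocol isn't described here, so the denial query is passed in as a function (`was_denied`, hypothetical). The requeue status 99 and the 30-second default slop are also illustrative choices, not values from the original scripts.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical "well-known" status meaning "denied a license, please requeue".
# It must not collide with any exit code the real job can return.
LICENSE_DENIED_STATUS = 99

# Signature of the denial query: (user, host, start, end) -> denied?
# Hypothetical stand-in for a call to the FLEXlm monitor.
DenialCheck = Callable[[str, str, float, float], bool]


def run_with_license_check(
    cmd: Sequence[str],
    was_denied: DenialCheck,
    user: str,
    host: str,
    slop_seconds: float = 30.0,
) -> int:
    """Run cmd, then ask whether user@host was denied a license meanwhile.

    Returns the exit status the wrapper itself should exit with.
    """
    start = time.time()                  # note the job start time
    result = subprocess.run(list(cmd))   # run the job and wait for it
    end = time.time()                    # note the end time

    # FLEXlm only flushes its log every so often, so give it time to
    # record any denial before asking the monitor.
    time.sleep(slop_seconds)

    if was_denied(user, host, start, end):
        return LICENSE_DENIED_STATUS     # Condor should requeue the job

    # Normal exit: pass the job's own status through. On POSIX a negative
    # returncode means "killed by signal N"; map it to the shell's 128 + N.
    code = result.returncode
    return code if code >= 0 else 128 - code
```

The wrapper's entry point would then call `sys.exit(run_with_license_check(...))`. On the Condor side, the submit file could keep the job in the queue only for that status, for example with `on_exit_remove = (ExitBySignal == True) || (ExitCode != 99)`. When this expression is False, Condor puts the job back to idle instead of removing it. Any fixed status has one catch: a job that legitimately exits with the same code would be requeued too.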

It's not perfect, but most of its problems come from being too conservative:
on an SMP machine, the job that failed might have been on the other
processor, and there's no way to correlate our job with anything FLEXlm is
tracking. FLEXlm also writes to its log files only periodically (it seems to
write every 15 or 20 seconds), so you have to wait some "slop time" before
connecting to the FLEXlm monitor and asking it a question.
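On the monitor side, the question reduces to an interval match over denial records. The sketch below assumes the DENIED entries have already been parsed out of the FLEXlm log into `Denial` records. That type is hypothetical, and the log format and parser are not shown. Passing only `host` gives the conservative, machine-wide answer described above. Passing `user` as well is what would fix the SMP case, provided the log really records the per-job local account, which is what the reply at the top asks. The `skew` parameter widens the window a little to tolerate the clock differences mentioned in the NTP note.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Denial:
    """Hypothetical record for one DENIED event already parsed from the log."""
    timestamp: float   # seconds since the epoch
    user: str
    host: str
    feature: str


def denied_between(
    denials: Iterable[Denial],
    host: str,
    start: float,
    end: float,
    user: Optional[str] = None,
    skew: float = 0.0,
) -> bool:
    """True if any denial for host (and user, if given) falls in
    [start - skew, end + skew].

    With user=None the match is host-only: on an SMP machine, a denial
    for another job on the same host also counts (the conservative case).
    """
    lo, hi = start - skew, end + skew
    return any(
        d.host == host
        and (user is None or d.user == user)
        and lo <= d.timestamp <= hi
        for d in denials
    )
```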

The scripts are available here:
http://www.cs.wisc.edu/~epaulson/license/

Feel free to use or modify them for whatever; if you add anything fun,
please consider sending me the changes. (I also don't think much of my Perl,
so I apologize in advance.)

Thanks,

-Erik

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users


Visit our website at http://www.halcrow.com

------------------------------------------------------------------------
The contents of this email do not give rise
to any binding legal obligation upon
Halcrow Group Limited unless subsequently
confirmed on headed business notepaper sent
by fax, letter or as an e-mail attachment. Please 
note that emails supplied are as found and 
there's no guarantee that the messages 
contained within the body of the email have 
not been edited after receipt. If you receive this 
email in error, please contact the sender and 
delete the message.
Thank you.
-------------------------------------------------------------------------