
Re: [Condor-users] condor_q lists jobs, exits w/ 1?



This is interesting: the Cycle Computing people visited us just yesterday
and said that some of the jobs in our cluster were unreadable to them.
That didn't make sense, because our cluster is readable from anywhere.
This could be the same problem.

Steve Timm


On Wed, 25 May 2011, Sam Gerstein wrote:

On Wed, May 25, 2011 at 8:44 PM, Shahaan Ayyub <shahaan@xxxxxxxxx> wrote:

Sam,
  It seems that "-name my_schedd_hostname" was not resolved properly, and
the output you saw came from the default condor_q behavior, which talks to
the local schedd; in your case that is the same schedd. Try condor_q against
another schedd in your pool and check the return value.
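One way to run the check suggested above is to query each schedd by name and record only the exit code. This is a minimal Python sketch, not a Condor tool: the function name `exit_codes_per_schedd` is hypothetical, and the command is passed in as a parameter so it can be tried without a pool.

```python
import subprocess
from typing import Dict, List


def exit_codes_per_schedd(schedds: List[str], base_cmd: List[str]) -> Dict[str, int]:
    """Run `base_cmd -name <schedd>` once per schedd and record each exit code.

    In practice base_cmd would be ["condor_q"]; it is a parameter (an
    assumption of this sketch) so the function can be exercised anywhere.
    """
    results: Dict[str, int] = {}
    for schedd in schedds:
        # Output is discarded: only the return value is of interest here.
        proc = subprocess.run(
            base_cmd + ["-name", schedd],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        results[schedd] = proc.returncode
    return results
```

For example, `exit_codes_per_schedd(["central-manager.example.org", "other-schedd.example.org"], ["condor_q"])` (hypothetical hostnames) would show whether a nonzero exit code follows one particular schedd name.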

regards,

Shahaan


That's a good idea, and I had started to look in that direction before;
unfortunately, that doesn't seem to be it.
If I specify a wrong value for the schedd name, it fails without querying
the local schedd. If I specify a different (unused) schedd in the pool, it
succeeds with exit code 0. If I go to that second machine and run my
original -name with the correct name of the central manager, I get the same
behavior as in my original test: valid output, exit code 1.

You may suggest that I shouldn't care about this problem: just run condor_q
without -name, since there's only one active schedd. Indeed, I wouldn't let
it bother me, but I'm running CycleServer on top of my cluster, and that's
how it generates its condor_q commands. It took me a little while to figure
out why it wasn't getting any job attributes.
Sam



On Thu, May 26, 2011 at 9:25 AM, Sam Gerstein <sgerstein@xxxxxxxxxxxxxxx> wrote:

I've searched around for an explanation of what might cause condor_q to
give an exit code of 1 despite printing what appears to be normal, complete
output, but I am at a loss. Has anyone seen this before?

What I've found is that running "condor_q" and "condor_q -name
my_schedd_hostname" produces the same output, but the latter returns exit
code 1.
I'm running 7.4.4 with Quill on Ubuntu. When I tested again just now I
had about 200 jobs in my queue; I don't know whether the behavior differs
depending on queue depth.

Thanks for any assistance you can provide,
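The symptom described above (same output, different exit status) can be checked mechanically. This is a hedged Python sketch; `run` and `same_output_different_status` are hypothetical helpers, and the comparison is strict, so if the two invocations differ only in a header line you may need to strip it before comparing.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class RunResult:
    returncode: int
    stdout: str


def run(cmd: List[str]) -> RunResult:
    """Run a command and capture its exit code and standard output."""
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return RunResult(proc.returncode, proc.stdout)


def same_output_different_status(cmd_a: List[str], cmd_b: List[str]) -> bool:
    """True when both commands print identical stdout but exit differently."""
    a, b = run(cmd_a), run(cmd_b)
    return a.stdout == b.stdout and a.returncode != b.returncode
```

Applied to this report, the call would be `same_output_different_status(["condor_q"], ["condor_q", "-name", "my_schedd_hostname"])`, where `my_schedd_hostname` is the placeholder used in the message.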
Sam

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/






--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.