
Re: [Condor-users] Problems with processes finishing up??



Todd Tannenbaum wrote:
Rob Ballantyne <ballanty@xxxxxxxxxxxxx> wrote:
__________

Hi,

 This has just started happening and I
thought I might ask if anyone else has had
the problem.



if you submit jobs with notification = never
in your job submit file, do those jobs finish up ok?
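For reference, a minimal sketch of a submit description file with notification turned off. Only the "notification = never" line comes from this thread; the executable, file names, and queue count are hypothetical placeholders.

```
# Hypothetical submit description file, for illustration only.
# Only the "notification" line is taken from this thread.
universe     = vanilla
executable   = my_job.sh
output       = my_job.$(Cluster).$(Process).out
error        = my_job.$(Cluster).$(Process).err
log          = my_job.log

# Suppress the completion e-mail, so no /usr/bin/mail process is started
notification = never

queue 10
```

If jobs submitted this way finish cleanly while jobs with notification enabled leave zombies behind, that would point at the mail step rather than at the jobs themselves.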


Can you run /usr/bin/mail -s 'subject' OK from the command line? With the -s option as well?


I've manually stopped all of Condor and restarted it, and everything is working OK again. I suspect the OS might not be stable yet when Condor starts up after a reboot.

  Running mail from the command line works perfectly well
and with the -s option.

  If I get a system with zombies and hung mail processes
again, would you still like me to run the
'notification = never' test?

  Thanks for answering!

Rob
-Todd



 Submitted jobs run just fine and return
results but they don't finish up.  On the
submitting host there is a bunch of zombie
csh processes and a corresponding number of
/usr/bin/mail processes (I'm guessing these are returning
the email results).  For example:


condor     406   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)
condor     408   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)
condor     411   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)
condor     413   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)
condor     416   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)
condor     418   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)
condor     420   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)
condor     422   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)
condor     424   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)
condor     403   0.0 -0.0    18136    332  ??  SN    8:21PM   0:00.00 /usr/bin/mail -s [Condor] Condor Job 7.4 ballanty@xxxxxxxxxxxxx
condor     405   0.0 -0.0    18136    332  ??  SN    8:22PM   0:00.01 /usr/bin/mail -s [Condor] Condor Job 7.0 ballanty@xxxxxxxxxxxxx
condor     407   0.0 -0.0    18136    332  ??  SN    8:22PM   0:00.01 /usr/bin/mail -s [Condor] Condor Job 7.8 ballanty@xxxxxxxxxxxxx
condor     410   0.0 -0.0    18136    332  ??  SN    8:22PM   0:00.01 /usr/bin/mail -s [Condor] Condor Job 7.5 ballanty@xxxxxxxxxxxxx
condor     412   0.0 -0.0    18136    332  ??  SN    8:22PM   0:00.00 /usr/bin/mail -s [Condor] Condor Job 7.1 ballanty@xxxxxxxxxxxxx
condor     415   0.0 -0.0    18136    332  ??  SN    8:22PM   0:00.00 /usr/bin/mail -s [Condor] Condor Job 7.2 ballanty@xxxxxxxxxxxxx
condor     417   0.0 -0.0    18136    332  ??  SN    8:22PM   0:00.01 /usr/bin/mail -s [Condor] Condor Job 7.3 ballanty@xxxxxxxxxxxxx
condor     419   0.0 -0.0    18136    332  ??  SN    8:22PM   0:00.01 /usr/bin/mail -s [Condor] Condor Job 7.6 ballanty@xxxxxxxxxxxxx
condor     421   0.0 -0.0    18136    332  ??  SN    8:22PM   0:00.01 /usr/bin/mail -s [Condor] Condor Job 7.7 ballanty@xxxxxxxxxxxxx
condor     423   0.0 -0.0    18136    332  ??  SN    8:22PM   0:00.01 /usr/bin/mail -s [Condor] Condor Job 7.9 ballanty@xxxxxxxxxxxxx
condor     404   0.0 -0.0        0      0  ??  ZN   31Dec69   0:00.00 (csh)


Any idea what's happening?  The start time of the process
is 31Dec69?  How's that possible?
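
On the 31Dec69 question: a likely explanation (an assumption, not confirmed anywhere in this thread) is that ps has no start time to report for a zombie and falls back to Unix time 0. Time 0 is 00:00 UTC on 1 January 1970, which any timezone west of UTC displays as 31 December 1969. A small Python sketch of that arithmetic follows; the UTC-6 offset and the function name are illustrative assumptions, since the poster's timezone is not stated.

```python
from datetime import datetime, timedelta, timezone

# Assumption: UTC-6 is used purely for illustration; the thread does not
# say which timezone the submitting host is in.
ILLUSTRATIVE_OFFSET = timezone(timedelta(hours=-6))

# Unix time 0, expressed as an aware UTC datetime.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def describe_start_time(epoch_seconds, tz):
    """Format a Unix timestamp as day, month abbreviation, 2-digit year."""
    moment = (UNIX_EPOCH + timedelta(seconds=epoch_seconds)).astimezone(tz)
    return moment.strftime("%d%b%y")


# A start time of 0 seen from a timezone west of UTC
print(describe_start_time(0, ILLUSTRATIVE_OFFSET))
# The same instant seen from UTC itself
print(describe_start_time(0, timezone.utc))
```

If this is what is happening, the odd date says nothing about when the csh processes really started; it only means no start time was available for them.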

Any help much appreciated!

Rob
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

--- message truncated ---


