
Re: [Condor-users] hawkeye on dual-processor nodes



Have you checked the log files?
They should give you useful information.

Junjun Mao wrote:
Hi all,

A serious problem just hit my cluster and shut Condor down entirely: the ownership of the schedd was changed to a regular user! How could this happen?

Here are the Condor-related processes left on the master node, which is the submit machine.
[root@master1 y-61.1]# ps -ef | grep condor
pwang    26763     1  0 Nov18 ?        00:00:00 condor_shadow -f 886.0 <10.10.20.1:34661> -
pwang    26766     1  0 Nov18 ?        00:00:00 condor_shadow -f 886.2 <10.10.20.1:34661> -
pwang    26772     1  0 Nov18 ?        00:00:00 condor_shadow -f 886.1 <10.10.20.1:34661> -
pwang    29394     1  0 Nov18 ?        00:00:00 condor_shadow -f 886.4 <10.10.20.1:34661> -
condor   19319     1  0 Nov21 ?        00:34:54 /home2/condor/sbin/condor_master
condor   19320 19319  0 Nov21 ?        01:43:02 condor_collector -f
pwang    19393 19319  0 Dec09 ?        00:00:06 condor_schedd -f
condor   19401 19319  0 Dec09 ?        00:02:31 condor_negotiator -f
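
The ownership check in the listing above can also be scripted. The following is only a minimal sketch and not part of the original report: the function name find_misowned_daemons, the expected owner "condor", and the set of daemon names are assumptions for illustration, taken from what this particular listing shows. It parses `ps -ef` style text and reports listed daemons that run under a different user.

```python
# Hypothetical helper for illustration; not part of Condor itself.
EXPECTED_OWNER = "condor"  # assumption: the account the daemons normally run as
DAEMONS = (                # assumption: daemons that should belong to that account
    "condor_master",
    "condor_collector",
    "condor_schedd",
    "condor_negotiator",
)


def find_misowned_daemons(ps_output, expected_owner=EXPECTED_OWNER):
    """Return (owner, pid, daemon) tuples for daemons not owned by expected_owner."""
    suspects = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        owner, pid, command = fields[0], fields[1], fields[7]
        # Keep only the program name, dropping any directory and arguments.
        program = command.split()[0].rsplit("/", 1)[-1]
        if program in DAEMONS and owner != expected_owner:
            suspects.append((owner, int(pid), program))
    return suspects


sample = """\
condor   19319     1  0 Nov21 ?        00:34:54 /home2/condor/sbin/condor_master
condor   19320 19319  0 Nov21 ?        01:43:02 condor_collector -f
pwang    19393 19319  0 Dec09 ?        00:00:06 condor_schedd -f
condor   19401 19319  0 Dec09 ?        00:02:31 condor_negotiator -f
"""

print(find_misowned_daemons(sample))
```

The condor_shadow processes are deliberately left out of the daemon set, since in this listing they belong to the submitting user.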


Restarting the Condor daemons still yields the wrong owner for the schedd. I have to move job_queue.log to another location to get Condor to start correctly.

Can someone tell me where to look for the cause of the problem?

Junjun