
[Condor-users] DAGman failed to detect a node's status, seems because it could not read its log.



Dear condor users,

Below is an excerpt from the DAGMan log file. DAGMan failed to detect a node's status, apparently because it could not read the node's job log. I searched the user mailing list and found that this may be caused by NFS, so I set NFS=YES in the global configuration. However, this directory is not exported over NFS, and the problem persists. Any hints?
Thanks.


12/20 22:01:15 1202 seconds since last log event
12/20 22:01:15 Pending DAG nodes:
12/20 22:01:15   Node A6, Condor ID 391, status STATUS_SUBMITTED
12/20 22:10:55 Currently monitoring 1 Condor log file(s)
12/20 22:11:01 Currently monitoring 1 Condor log file(s)
12/20 22:11:02 ReadMultipleUserLogs: read error on log /media/DawnBook2/072809_s36d5fab_burned/msa_dawnsong/runabc6-tight.sh.log
12/20 22:11:02 ERROR: failure to read job log
  A log event may be corrupt.  DAGMan will skip the event and try to
  continue, but information may have been lost.  If DAGMan exits
  unfinished, but reports no failed jobs, re-submit the rescue file
  to complete the DAG.
12/20 22:21:03 602 seconds since last log event
12/20 22:21:03 Pending DAG nodes:
12/20 22:21:03   Node A6, Condor ID 391, status STATUS_SUBMITTED
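
To make the two symptoms above easier to spot in a long log, here is a minimal Python sketch that scans DAGMan output lines for read errors and for pending nodes. It is only an illustration: the function name scan_dagman_out and both regular expressions are my own assumptions, written against the line shapes in the excerpt above, and other DAGMan versions may format these lines differently.

```python
import re

# Assumed patterns, derived only from the log excerpt above.
READ_ERROR = re.compile(r"ReadMultipleUserLogs: read error on log (?P<path>\S+)")
PENDING = re.compile(
    r"Node (?P<node>[^,\s]+), Condor ID (?P<id>\d+), status (?P<status>\S+)"
)


def scan_dagman_out(lines):
    """Return (job logs with read errors, last reported status per pending node)."""
    bad_logs = []   # unique log paths, in order of first appearance
    pending = {}    # node name -> (Condor ID, status), last report wins
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            path = match.group("path")
            if path not in bad_logs:
                bad_logs.append(path)
            continue
        match = PENDING.search(line)
        if match:
            pending[match.group("node")] = (
                int(match.group("id")),
                match.group("status"),
            )
    return bad_logs, pending


# Hypothetical sample with shortened paths, shaped like the lines in the excerpt.
SAMPLE_LINES = [
    "12/20 22:01:15   Node A6, Condor ID 391, status STATUS_SUBMITTED",
    "12/20 22:11:02 ReadMultipleUserLogs: read error on log /tmp/example.sh.log",
    "12/20 22:11:02 ERROR: failure to read job log",
    "12/20 22:21:03   Node A6, Condor ID 391, status STATUS_SUBMITTED",
]

bad_logs, pending = scan_dagman_out(SAMPLE_LINES)
```

To run it against a real file, pass the open file object instead of the sample list, for example scan_dagman_out(open(path)), where path is wherever your DAGMan output file lives.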


--
Xiao-Wei Song
Ping Zhu's Lab, Center for Structural and Molecular Biology
Institute of Biophysics, Chinese Academy of Sciences
15 Datun Road, Chaoyang District, Beijing, China 100101
Tel:  +86-10-64888353, E-mail: dawnsong@xxxxxxxxxxxxxx