
Re: [Condor-users] DAGman failed to detect a node's status, seems because it could not read its log.



Hi, all.

It is finally fixed: I set all the nodes to share the same log file.

According to the Condor 7.3 manual, DAGMan supports a separate log file for each node, but it seems that having all nodes share a single log file lets DAGMan run without complaining about "ERROR: failure to read job log".
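
For anyone hitting the same error, here is a minimal sketch of the shared-log setup. Node A6 and runabc6-tight.sh come from the log quoted below; node A5, runabc5-tight.sh, the file names A5.sub, A6.sub and my.dag, and the log name shared-nodes.log are hypothetical placeholders, not my actual files. The only point is that every node's submit file has the same "log =" line.

```
# --- A5.sub (hypothetical submit file) ---
universe   = vanilla
executable = runabc5-tight.sh
# Same log file as every other node in the DAG
log        = shared-nodes.log
queue

# --- A6.sub (hypothetical submit file) ---
universe   = vanilla
executable = runabc6-tight.sh
# Same log file as every other node in the DAG
log        = shared-nodes.log
queue

# --- my.dag (hypothetical DAG input file) ---
JOB A5 A5.sub
JOB A6 A6.sub
```

The three parts are separate files shown together for brevity. I have not tracked down why separate per-node logs failed for me, so treat this as a workaround rather than a root-cause fix.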

This confuses me, since I have already upgraded to 7.4.

2009/12/21 dawnsong <dawnsong.tsinghua@xxxxxxxxx>
I also upgraded from 7.2.3 to 7.4.0, but it still failed to read the log.

2009/12/21 dawnsong <dawnsong.tsinghua@xxxxxxxxx>

Dear condor users,

The following is the DAGMan log file. DAGMan failed to detect a node's status, seemingly because it could not read the node's log. I searched the users mailing list and found that this may be caused by NFS, so I set NFS=YES in the global configuration. Besides, this directory is not exported by NFS. It still fails, though. Any hints?
Thanks.


12/20 22:01:15 1202 seconds since last log event
12/20 22:01:15 Pending DAG nodes:
12/20 22:01:15   Node A6, Condor ID 391, status STATUS_SUBMITTED
12/20 22:10:55 Currently monitoring 1 Condor log file(s)
12/20 22:11:01 Currently monitoring 1 Condor log file(s)
12/20 22:11:02 ReadMultipleUserLogs: read error on log /media/DawnBook2/072809_s36d5fab_burned/msa_dawnsong/runabc6-tight.sh.log
12/20 22:11:02 ERROR: failure to read job log
  A log event may be corrupt.  DAGMan will skip the event and try to
  continue, but information may have been lost.  If DAGMan exits
  unfinished, but reports no failed jobs, re-submit the rescue file
  to complete the DAG.
12/20 22:21:03 602 seconds since last log event
12/20 22:21:03 Pending DAG nodes:
12/20 22:21:03   Node A6, Condor ID 391, status STATUS_SUBMITTED


--
Xiao-Wei Song
Ping Zhu's Lab, Center for Structural and Molecular Biology
Institute of Biophysics, Chinese Academy of Sciences
15 Datun Road, Chaoyang District, Beijing, China 100101
Tel:  +86-10-64888353, E-mail: dawnsong@xxxxxxxxxxxxxx


