
[HTCondor-users] Schedd and collector out of sync



Dear all,
    In our cluster, some jobs occasionally disappear from the schedd (they no longer show up in condor_q),
    yet they are still occupying slots at the same time (those slots do show up in condor_status).
    On the schedd side, the shadows for these jobs are gone; on the execute machines running these jobs, the starters are still running normally.
    When the job program finishes, the condor_starter is not released, and condor_status still reports the slot as Busy.
    So we have to track down these machines and restart them manually.
    Is there some way to recover the shadow when it has disappeared but the starter is still running correctly?
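    For illustration, here is a minimal sketch of how such orphaned slots might be spotted instead of searching for them by hand. It assumes the slot list comes from something like `condor_status -af Name GlobalJobId` and the job list from `condor_q -af GlobalJobId`, and that a claimed slot's ad carries the running job's GlobalJobId. The function names and sample hostnames are hypothetical. The sketch only detects the slots; it does not recover the shadow.

```python
from typing import Iterable, List, Optional, Tuple


def parse_slot_lines(lines: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Parse 'Name GlobalJobId' lines into (slot_name, job_id) pairs.

    Assumed input format: one slot per line, as printed by something like
    `condor_status -af Name GlobalJobId`. A slot with no job prints
    'undefined' (assumption), which is mapped to None here.
    """
    slots = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        job_id = parts[1] if len(parts) > 1 and parts[1] != "undefined" else None
        slots.append((name, job_id))
    return slots


def find_orphaned_slots(slots, schedd_job_ids) -> List[str]:
    """Return slots running a job that the schedd no longer lists."""
    known = set(schedd_job_ids)
    return sorted(
        name for name, job_id in slots
        if job_id is not None and job_id not in known
    )


if __name__ == "__main__":
    # Hypothetical sample output; real hostnames and job ids will differ.
    status_output = [
        "slot1@node01.example sched.example#100.0#1500000000",
        "slot2@node01.example sched.example#101.0#1500000001",
        "slot1@node02.example undefined",
    ]
    queue_output = ["sched.example#100.0#1500000000"]
    print(find_orphaned_slots(parse_slot_lines(status_output), queue_output))
    # ['slot2@node01.example']
```

    Comparing on GlobalJobId rather than the short cluster.proc id avoids false matches when several schedds report to the same collector.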
    Any replies would be appreciated.
      
Cheers,
Jiang Xiaowei

NAME:Jiang Xiaowei
MAIL:jiangxw@xxxxxxxxxxxxxxx
TEL:010 8823 6024
DEPARTMENT:Computing Center of IHEP