
Re: [HTCondor-users] Failed to write .job.ad file by permission denied



jiangxw@xxxxxxxxxx wrote on 12-08-2020 2:12:
Dear all,

Dear Condor people,

We deployed HTCondor version 8.9.7 on CentOS 7 this month.

On the startd side, we get the following in StarterLog.slot* whenever a
job exits:

08/10/20 18:45:40 (pid:116247) Process exited, pid=116303, status=0
08/10/20 18:45:40 (pid:116247) Failed to write ToE tag to .job.ad file (13): Permission denied
08/10/20 18:45:40 (pid:116247) All jobs have exited... starter exiting
08/10/20 18:45:40 (pid:116247) **** condor_starter (condor_STARTER) pid 116247 EXITING WITH STATUS 0

It does not seem to affect the job. We would just like to confirm whether
this message is harmless or hides a real problem.

I can confirm this on 8.9.8 (Fedora 32) too. It is probably a race
condition of some kind, because the error toggles between:

---
08/11/20 14:27:54 Failed to write ToE tag to .job.ad file (13): Permission denied
08/12/20 11:32:26 Failed to write ToE tag to .job.ad file (2): No such file or directory
08/12/20 11:49:32 Failed to write ToE tag to .job.ad file (13): Permission denied
---
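To see how often each variant (EACCES vs. ENOENT) shows up over a longer
period, one could tally the errno values from the log. This is a minimal
Python sketch, not part of HTCondor; it assumes the message wording shown in
the log lines above, and the function name tally_toe_failures is
hypothetical:

```python
import errno
import re
from collections import Counter

# Assumed message format, copied from the log lines quoted in this thread, e.g.
# "Failed to write ToE tag to .job.ad file (13): Permission denied".
# Other HTCondor versions may word this differently.
TOE_FAILURE = re.compile(r"Failed to write ToE tag to \.job\.ad file \((\d+)\)")


def tally_toe_failures(lines):
    """Count ToE-tag write failures per symbolic errno name (e.g. EACCES, ENOENT)."""
    counts = Counter()
    for line in lines:
        match = TOE_FAILURE.search(line)
        if match:
            code = int(match.group(1))
            # errno.errorcode maps numeric codes to names on the running platform;
            # fall back to the raw number if the code is unknown.
            counts[errno.errorcode.get(code, str(code))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "08/11/20 14:27:54 Failed to write ToE tag to .job.ad file (13): Permission denied",
        "08/12/20 11:32:26 Failed to write ToE tag to .job.ad file (2): No such file or directory",
        "08/12/20 11:49:32 Failed to write ToE tag to .job.ad file (13): Permission denied",
    ]
    print(tally_toe_failures(sample))
```

Since iterating over an open file yields its lines, the same function can be
given an opened StarterLog.slot* (or StartLog) file object directly.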

Increasing the debug level (the output below includes D_DAEMONCORE,
D_COMMAND, D_NETWORK and D_PRIV messages in addition to D_ALWAYS) does not
really help much:

---
08/12/20 12:39:10 (fd:16) (pid:6619) (D_ALWAYS) PERMISSION GRANTED to submit-side@matchsession from host 10.87.24.2 for command 404 (DEACTIVATE_CLAIM_FORCIBLY), access level DAEMON: reason: DAEMON authorization has been made automatic for submit-side@matchsession
08/12/20 12:39:10 (fd:16) (pid:6619) (D_DAEMONCORE) DAEMONCORE: SendResponse()
08/12/20 12:39:10 (fd:16) (pid:6619) (D_DAEMONCORE) DAEMONCORE: SendResponse() : NOT m_new_session
08/12/20 12:39:10 (fd:16) (pid:6619) (D_DAEMONCORE) DAEMONCORE: ExecCommand(m_req == 404, m_real_cmd == 404, m_auth_cmd == 404)
08/12/20 12:39:10 (fd:16) (pid:6619) (D_COMMAND) Calling HandleReq <command_handler> (0) for command 404 (DEACTIVATE_CLAIM_FORCIBLY) from submit-side@matchsession <10.87.24.2:42119>
08/12/20 12:39:10 (fd:16) (pid:6619) (D_NETWORK) condor_read(fd=14 <10.87.24.2:42119>,,size=21,timeout=20,flags=0,non_blocking=0)
08/12/20 12:39:10 (fd:16) (pid:6619) (D_NETWORK) condor_read(fd=14 <10.87.24.2:42119>,,size=227,timeout=20,flags=0,non_blocking=0)
08/12/20 12:39:10 (fd:16) (pid:6619) (D_NETWORK) condor_write(fd=14 <10.87.24.2:42119>,,size=68,timeout=20,flags=0,non_blocking=0)
08/12/20 12:39:10 (fd:16) (pid:6619) (D_ALWAYS) Failed to write ToE tag to .job.ad file (2): No such file or directory
08/12/20 12:39:10 (fd:16) (pid:6619) (D_ALWAYS) slot1_1: Called deactivate_claim_forcibly()
08/12/20 12:39:10 (fd:16) (pid:6619) (D_PRIV) PRIV_CONDOR --> PRIV_ROOT at /builddir/build/BUILD/condor-8.9.8/src/condor_startd.V6/Starter.cpp:476
08/12/20 12:39:10 (fd:16) (pid:6619) (D_PRIV) PRIV_ROOT --> PRIV_CONDOR at /builddir/build/BUILD/condor-8.9.8/src/condor_startd.V6/Starter.cpp:539
---

The permissions of the file look normal.

It does not seem to affect the normal operation of HTCondor.

Greetings, B.

Thanks for any hint!

Best Regards,
Xiaowei