
Re: [HTCondor-users] File xfer error, setting mismatch



Try running:

 

condor_config_val -v -dump SEC_PASSWORD

 

I think you will see that SEC_PASSWORD_DIRECTORY is set to /etc/condor/passwords.d

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of pn@xxxxxxxxxxx
Sent: Thursday, June 11, 2020 4:23 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] File xfer error, setting mismatch

 

Hi,

I have set up HTCondor on a Linux cluster, installed from the yum repo on CentOS 7.8. The CM is dual-NIC and all exec nodes are on a private LAN. I plan to use file transfer rather than a shared filesystem. When I submit jobs, slots on the exec node are allotted, but the jobs fail because of a file transfer failure. Below is a clipping from the job log:

007 (024.009.000) 06/12 01:40:08 Shadow exception!
        Error from slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxx: Failed to transfer files
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...

Secondly, I notice an anomaly with SEC_PASSWORD_FILE. The security config file contains this line:

SEC_PASSWORD_FILE = /etc/condor/password.d/POOL

However, in the StarterLog of the particular slot on the exec node, the directory is "passwords.d". I am unable to figure out where the directory is set to "passwords.d" instead of "password.d". I grepped through the config files but failed to find it.

Below are more lines from the StarterLog of the slot (on the exec node):

06/12/20 02:43:29 (pid:39209) Can't open directory "/etc/condor/passwords.d" as PRIV_ROOT, errno: 2 (No such file or directory)
06/12/20 02:43:29 (pid:39209) setting the orig job name in starter
06/12/20 02:43:29 (pid:39209) setting the orig job iwd in starter
06/12/20 02:43:29 (pid:39209) Chirp config summary: IO false, Updates false, Delayed updates true.
06/12/20 02:43:29 (pid:39209) Initialized IO Proxy.
06/12/20 02:43:29 (pid:39209) Done setting resource limits
06/12/20 02:43:29 (pid:39209) Set filetransfer runtime ads to /var/lib/condor/execute/dir_39209/.job.ad and /var/lib/condor/execute/dir_39209/.machine.ad.
06/12/20 02:43:29 (pid:39209) FILETRANSFER: "/usr/libexec/condor/box_plugin.py -classad" did not produce any output, ignoring
06/12/20 02:43:29 (pid:39209) FILETRANSFER: "/usr/libexec/condor/gdrive_plugin.py -classad" did not produce any output, ignoring
06/12/20 02:43:30 (pid:39209) FILETRANSFER: "/usr/libexec/condor/onedrive_plugin.py -classad" did not produce any output, ignoring
06/12/20 02:43:30 (pid:39334) condor_read(): Socket closed abnormally when trying to read 5 bytes from daemon at <158.144.55.71:9618>, errno=104 Connection reset by peer
06/12/20 02:43:30 (pid:39209) File transfer failed (status=0).
06/12/20 02:43:30 (pid:39209) ERROR "Failed to transfer files" at line 2533 in file /var/lib/condor/execute/slot3/dir_3977/userdir/.tmpEsbepJ/BUILD/condor-8.9.7/src/condor_starter.V6.1/jic_shadow.cpp
06/12/20 02:43:30 (pid:39209) ShutdownFast all jobs.
06/12/20 02:43:30 (pid:39209) condor_write(): Socket closed when trying to write 222 bytes to <192.168.55.71:4652>, fd is 8
06/12/20 02:43:30 (pid:39209) Buf::write(): condor_write() failed
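
For picking the trouble spots out of a StarterLog excerpt like the one above, a small filter along these lines might help. This is only a sketch: it assumes the "date time (pid:N) message" line shape seen in the excerpt, and the names `suspicious_lines` and `MARKERS` as well as the marker list itself are illustrative choices, not anything HTCondor defines.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   MM/DD/YY HH:MM:SS (pid:N) message
LINE_RE = re.compile(r"^(\S+ \S+) \(pid:(\d+)\) (.*)$")

# Substrings treated as signs of trouble (hypothetical list, adjust as needed).
MARKERS = ("errno", "ERROR", "failed", "Can't open")


def suspicious_lines(lines):
    """Return (timestamp, pid, message) for each line whose message contains a marker."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group(3) for marker in MARKERS):
            hits.append((match.group(1), int(match.group(2)), match.group(3)))
    return hits


if __name__ == "__main__":
    sample = [
        '06/12/20 02:43:29 (pid:39209) Can\'t open directory "/etc/condor/passwords.d" as PRIV_ROOT, errno: 2 (No such file or directory)',
        "06/12/20 02:43:29 (pid:39209) Initialized IO Proxy.",
        "06/12/20 02:43:30 (pid:39209) File transfer failed (status=0).",
    ]
    for timestamp, pid, message in suspicious_lines(sample):
        print(timestamp, pid, message)
```

Lines that do not match the assumed shape are skipped silently, so multi-line or differently formatted entries would need their own handling.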

Where could it be picking up a different setting from the one in the file in config.d? Or is there some other error?

Thanks for helping out!

Nagaraj