
Re: [HTCondor-users] EncryptExecuteDirectory issues on Windows execute nodes without run_as_owner



One kludge around is to use the "cipher" command to decrypt the file before uploading it, e.g.

You could also potentially use HTCondor's file-transfer mechanism, although it will end up being a little less efficient in this case: if the submit node can mount \\fileserver, your jobs could terminate after creating outputfile.dat but specify

transfer_output_files = outputfile.dat
transfer_output_remaps = outputfile.dat=\\fileserver\user\output

HTCondor will read outputfile.dat as the condor-slot user and transfer it to a daemon running on the submit node as the owner of the job, which (should) allow that daemon to write to \\fileserver\user\output.
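
For context, a fuller submit description along these lines might look like the sketch below. It is only an illustration and untested: the executable name myjob.bat is hypothetical, and how the remap string treats the backslashes in a UNC path is an assumption worth checking on your pool. The manual writes the remaps value as a quoted "name = newname" string, so that form is used here.

# Hypothetical job script; substitute your own executable.
universe = vanilla
executable = myjob.bat
# Have HTCondor move files instead of relying on a shared filesystem.
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# Only this file is sent back when the job exits.
transfer_output_files = outputfile.dat
# Assumption: the daemon on the submit node, running as the job owner, can write to this path.
transfer_output_remaps = "outputfile.dat = \\fileserver\user\output"
queue

The point of this arrangement is that the job itself never touches \\fileserver; only the submit-side daemon does.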

So that's the FYI bit, and once users can run_as_owner I don't think this shouldn't be a problem?

	Indeed.

These must be related to the encrypt_execute_directory stuff because we can re-run the jobs with NO execute directory encryption enabled and do not get these errors.

Do you re-run all 5,000 jobs and get no failures, or just the failed 150?

So I guess the question is does anyone have any ideas as to why these errors are occurring? And only when encryptexecutedirectory is set to true?

I'm a little more worried by the failure to read from the standard error log after the job has finished than by the two failures to create the log files. Failing to write to the log after creating it is also very strange. It makes me wonder if there's a clean-up process going astray somewhere, possibly because of a race condition made worse by encrypting the execute directory.

- ToddM