
Re: [HTCondor-users] Retaining executable for debugging



On Sat, Dec 30, 2017 at 2:52 PM, Larry Martell <larry.martell@xxxxxxxxx> wrote:
> On Sat, Dec 30, 2017 at 2:47 PM, Dimitri Maziuk <dmaziuk@xxxxxxxxxxxxx> wrote:
>> On 2017-12-29 19:42, Larry Martell wrote:
>>> ...Any other suggestions on
>>> how to debug this?
>>
>>
>> Make your script pprint sys.path to stderr and make sure you're saving
>> condor's error file?
>
> Thanks for the pointers. I did find in the StarterLog.slot1 log the
> full command line, and also this 'Running job as user nobody' -
> perhaps that is causing a permission issue? I googled that and found
> this thread: https://www-auth.cs.wisc.edu/lists/htcondor-users/2014-March/msg00013.shtml
> - I want to try that, but we are having an NFS issue now and our sys
> admin is not available to fix, so I am stuck for a while.

I added a print to stderr, but it does not appear in the err file.
That makes me think the script condor is running is not the version
I think it is.
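
A minimal sketch of that kind of stderr check, assuming the job script
is Python running on a Unix execute host. The names file_sha256 and
report_runtime are hypothetical helpers written for illustration, not
part of HTCondor or of the actual script. Comparing the reported
checksum against the copy on the NFS mount would show whether the two
files differ.

```python
import hashlib
import os
import pprint
import sys


def file_sha256(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def report_runtime(script_path, stream=sys.stderr):
    """Write details that identify the running script to `stream`."""
    real_path = os.path.realpath(script_path)
    stream.write("script: %s\n" % real_path)
    stream.write("sha256: %s\n" % file_sha256(real_path))
    stream.write("uid: %d\n" % os.getuid())  # shows which user the job runs as
    stream.write("cwd: %s\n" % os.getcwd())
    stream.write("sys.path:\n%s\n" % pprint.pformat(sys.path))
    stream.flush()


# Intended use: call this first thing in the job script, e.g.
#     report_runtime(__file__)
```

If none of these lines show up in the err file, that would support the
suspicion that a different copy of the script is being executed.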

I am submitting the job from machine A using the python API to the
condor host, which in turn runs the script on an execute host. The
python script being run is referenced from an NFS-mounted dir that is
the same on all 3 hosts (I checked and it is).

In the error file I see it's running this (where the number after dir_
is different each time):

/var/lib/condor/execute/dir_169123/condor_exec.exe

Is there a log that shows what is copied from where to that file? Is
there a way to keep that dir around after the job terminates?
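
Failing a condor-side setting, one workaround would be for the job to
copy its own scratch directory somewhere persistent before it exits.
The following is only a sketch under assumptions: snapshot_scratch_dir
is a hypothetical helper, the destination must be outside the scratch
directory and writable by the user the job runs as, and it relies on
the _CONDOR_SCRATCH_DIR environment variable that condor sets for the
job, falling back to the current directory.

```python
import os
import shutil
import sys
import time


def snapshot_scratch_dir(dest_root, scratch_dir=None):
    """Copy the job's scratch directory under dest_root; return the copy's path."""
    if scratch_dir is None:
        # Fall back to the current directory if the variable is not set.
        scratch_dir = os.environ.get("_CONDOR_SCRATCH_DIR", os.getcwd())
    base = os.path.basename(os.path.normpath(scratch_dir))
    # Timestamp suffix keeps snapshots from different runs apart.
    target = os.path.join(dest_root, "%s_%d" % (base, int(time.time())))
    shutil.copytree(scratch_dir, target)
    sys.stderr.write("scratch dir %s copied to %s\n" % (scratch_dir, target))
    return target
```

Calling this at the end of the job script (or from an exit handler)
would leave a copy of condor_exec.exe and the other transferred files
to inspect after the dir_ directory is cleaned up.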