
Re: [HTCondor-users] Job not starting correctly



Hi Peter,

just a random guess (out of paranoia), but can you check the security
contexts (and maybe other attributes) of the binaries etc.?

  ls -lZ /path/to/foo
  lsattr /path/to/foo
  getfattr -d /path/to/foo

Just in case something odd happens between executions as the user vs.
user switching.
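
To compare the old and new installs side by side, something like the sketch below might help. The function name show_attrs is a placeholder, not anything from your setup. lsattr and getfattr are skipped if they are not installed, and ls falls back to plain -l where -Z is not supported.

```shell
#!/bin/sh
# Print permissions, security context and extended attributes for each path.
show_attrs() {
    for f in "$@"; do
        echo "== $f"
        # -Z adds the SELinux context; fall back to plain -l if unsupported.
        ls -lZ "$f" 2>/dev/null || ls -l "$f"
        # These tools are optional packages on many systems.
        if command -v lsattr >/dev/null 2>&1; then
            lsattr "$f" 2>&1 || echo "lsattr: not supported on this file system"
        fi
        if command -v getfattr >/dev/null 2>&1; then
            getfattr -d "$f" 2>&1 || true
        fi
    done
}
```

Calling it as `show_attrs /path/to/old/foo /path/to/new/foo`, once over ssh and once from inside a regular job, and then diffing the two outputs should show whether anything differs.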

Cheers,
  Thomas

On 27/10/2021 14.06, Jason Patton wrote:
> Peter,
> 
> Is HTCondor able to create the output and error files specified in your
> job, and are you able to modify the runscript on the (or a targeted)
> execute host to print some information to stdout or stderr? It could be
> useful to have the runscript print out the environment at the line
> before the solver runs and compare for both interactive and batch modes.
> Also, consider having the runscript print out each command to see if the
> script exits before it starts running the solver.
> 
> Jason
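
A minimal sketch of what that could look like in the runscript. The helper names dump_env and compare_env and the /tmp file name are made up for illustration and would need adapting to the real script.

```shell
#!/bin/bash
# Debugging sketch for a runscript; all file names here are placeholders.

# Write the current environment, sorted, to the file given as $1.
dump_env() {
    env | sort > "$1"
}

# Print lines that differ between two environment dumps.
# diff exits non-zero when the files differ, so mask that with "|| true".
compare_env() {
    diff "$1" "$2" || true
}

# In the runscript, just before the solver starts:
dump_env "/tmp/runscript-env.$$.txt"   # $$ is the PID, keeps dumps from different runs apart
set -x                                 # trace: print each command to stderr before running it
# ... source the license/setup files and launch the solver here ...
```

With one dump from a normal job and one from a condor_submit -interactive session, compare_env on the two files lists the variables that differ. The trace from set -x ends up in the job's error file and shows the last command that ran before the exit.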
> 
> On 10/27/21 3:25 AM, Peter Ellevseth wrote:
>> Christoph
>>
>> The runscript uses only absolute paths.
>>
>> We just got a new version of this code, and I get this problem with
>> the new version but not with the old one. I checked ldd for the
>> binaries of both versions and get the same result.
>>
>> I have discussed this with the supplier of the CFD code, and they didn't
>> have any good suggestions yet.
>>
>> P
>>
>> *From:* HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> *On Behalf
>> Of *Beyer, Christoph
>> *Sent:* Wednesday, 27 October 2021 08:24
>> *To:* htcondor-users <htcondor-users@xxxxxxxxxxx>
>> *Subject:* Re: [HTCondor-users] Job not starting correctly
>>
>> Hi,
>>
>> make sure all the paths you need are set in the bash script or use
>> absolute paths if in doubt. The interactive login uses ssh mechanisms
>> and therefore sources your environment which is not necessarily the
>> case in a regular condor job.
>>
>> Try ldd <binary> to check if the libraries the binary uses are hidden
>> somewhere and put all these paths in your bash script (LD_LIBRARY_PATH
>> etc) ...
>>
>> best
>>
>> christoph
>>
>>
>> --
>> Christoph Beyer
>> DESY Hamburg
>> IT-Department
>>
>> Notkestr. 85
>> Building 02b, Room 009
>> 22607 Hamburg
>>
>> phone:+49-(0)40-8998-2317
>> mail: christoph.beyer@xxxxxxx <mailto:christoph.beyer@xxxxxxx>
>>
>> ------------------------------------------------------------------------
>>
>>
>> *From: *"Peter Ellevseth" <Peter.Ellevseth@xxxxxxxxxx
>> <mailto:Peter.Ellevseth@xxxxxxxxxx>>
>> *To: *"htcondor-users" <htcondor-users@xxxxxxxxxxx
>> <mailto:htcondor-users@xxxxxxxxxxx>>
>> *Sent: *Tuesday, 26 October 2021 20:26:08
>> *Subject: *Re: [HTCondor-users] Job not starting correctly
>>
>> Jason
>>
>> We have a shared file system between all nodes. When I run
>> condor_submit -interactive I get a shell in the same folder as I was
>> previously, but from the "view" of the execute node. I can then
>> execute simply by "./runscript".
>>
>> Yes, I get the normal log/out/error files.
>>
>> I have checked the env and there is nothing there that tells me why
>> the job won't start.
>>
>> I can also ssh to one of my startd machines and start the job manually
>> with the runscript.
>>
>> At a loss for ideas here now.
>>
>> P
>>
>> *From:*HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx
>> <mailto:htcondor-users-bounces@xxxxxxxxxxx>> *On Behalf Of *Jason Patton
>> *Sent:* Tuesday, 4 May 2021 14:43
>> *To:* HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx
>> <mailto:htcondor-users@xxxxxxxxxxx>>
>> *Subject:* Re: [HTCondor-users] Job not starting correctly
>>
>> Hi Peter,
>>
>> You say that when you submit an interactive job, you run the script by
>> doing "./runscript". Do your jobs ever use condor file transfer or is
>> your pool set up to assume a shared file system?
>>
>> When you submit the job normally, do you still get back the output
>> (stdout) and error (stderr) files? It might be useful to print out the
>> environment at the very beginning of the script and compare between a
>> normal job and an interactive job.
>>
>> Jason Patton
>>
>> On Mon, May 3, 2021 at 5:04 PM Peter Ellevseth
>> <Peter.Ellevseth@xxxxxxxxxx <mailto:Peter.Ellevseth@xxxxxxxxxx>> wrote:
>>
>>     Gents
>>
>>     We are running a commercial CFD-code via htcondor. Been doing it
>> for years without any issues. I installed a new version of that
>> software and want to run it via htcondor as per usual. I do this by
>> telling condor to run a locally installed bash-script on the execute
>> node, which in turn starts the CFD-solver. I have to do it this way to
>> source some files needed by the solver to start (license etc).
>>
>>     However, the new version is refusing to start. From the
>> StarterLog.slotX I see the job immediately stops with
>>
>>     05/03/21 23:56:33 (pid:4135578) Create_Process succeeded, pid=4135579
>>
>>     05/03/21 23:56:33 (pid:4135578) Process exited, pid=4135579, status=139
>>
>>     05/03/21 23:56:33 (pid:4135578) Got SIGQUIT. Performing fast shutdown.
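
A note on status=139: if that value is a shell-style exit code, the usual convention is that anything above 128 means the process was killed by signal (status - 128), which here would be signal 11, SIGSEGV. That is an assumption about how the status is reported, not something the log itself states. A small sketch to decode such values (decode_status is a made-up helper name):

```shell
#!/bin/sh
# Decode a shell-style exit status: values above 128 conventionally mean
# "terminated by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix.
        echo "signal $sig ($(kill -l "$sig"))"
    else
        echo "exit code $status"
    fi
}

decode_status 139   # on Linux prints: signal 11 (SEGV)
```

If that reading is right, the solver (or something the runscript starts) is crashing with a segmentation fault rather than exiting on its own.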
>>
>>     If I ssh in to one of the execute nodes I can start it just fine and it
>> runs as normal.
>>
>>     If I do condor_submit -interactive my_submit_file, I am able to
>> run the script with ./runscript just fine.
>>
>>     Then why won't it start when I submit the file normally??
>>
>>     Peter
>>
>>     _______________________________________________
>>     HTCondor-users mailing list
>>     To unsubscribe, send a message to
>> htcondor-users-request@xxxxxxxxxxx
>> <mailto:htcondor-users-request@xxxxxxxxxxx> with a
>>     subject: Unsubscribe
>>     You can also unsubscribe by visiting
>>     https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>> <https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users>
>>
>>     The archives can be found at:
>>     https://lists.cs.wisc.edu/archive/htcondor-users/
>> <https://lists.cs.wisc.edu/archive/htcondor-users/>
>>
>>
>>
> 
