
Re: [HTCondor-users] Job not starting correctly



Hi,

Make sure all the paths you need are set in the bash script, or use absolute paths if in doubt. The interactive login uses ssh mechanisms and therefore sources your environment, which is not necessarily the case in a regular condor job.

Try ldd <binary> to check whether the libraries the binary uses live in non-standard locations, and put all of these paths in your bash script (LD_LIBRARY_PATH, etc.).
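A minimal sketch of that ldd check in Python, assuming a Linux node with glibc's ldd, whose "libname => not found" output format the parser relies on. The idea is to run it once from inside a normal batch job (for example, called from the wrapper script) and once from an interactive shell, then compare the results. The helper names missing_libraries and check_binary are hypothetical.

```python
from __future__ import annotations

import subprocess


def missing_libraries(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # glibc's ldd prints unresolved entries like "\tlibfoo.so.1 => not found";
        # lines without " => " (e.g. linux-vdso.so.1) are skipped.
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check_binary(path: str) -> list[str]:
    """Run ldd on `path` and return any libraries it could not resolve."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)
```

For example, check_binary("/path/to/solver") (a placeholder path) returns an empty list when every library resolves in the environment the check runs in; a non-empty list inside the batch job but not interactively points at a missing LD_LIBRARY_PATH entry.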

best
christoph


--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Peter Ellevseth" <Peter.Ellevseth@xxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, 26 October 2021 20:26:08
Subject: Re: [HTCondor-users] Job not starting correctly

Jason

 

We have a shared file system between all nodes. When I run condor_submit -interactive, I get a shell in the same folder I was in previously, but from the "view" of the execute node. I can then run it simply with "./runscript".

 

Yes, I get the normal log/out/error files.

 

I have checked the env and there is nothing there that tells me why the job won't start.

 

I can also ssh to one of my startd machines and start the job manually with the runscript.

 

I'm at a loss for ideas now.

 

P

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Jason Patton
Sent: Tuesday, 4 May 2021 14:43
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Job not starting correctly

 

Hi Peter,

 

You say that when you submit an interactive job, you run the script by doing "./runscript". Do your jobs ever use condor file transfer, or is your pool set up to assume a shared file system?

 

When you submit the job normally, do you still get back the output (stdout) and error (stderr) files? It might be useful to print out the environment at the very beginning of the script and compare between a normal job and an interactive job.
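One way to do that comparison, sketched under the assumption that the first line of the wrapper script writes the output of env to a file (e.g. env > env_batch.txt in the normal job and env > env_interactive.txt in the interactive one; both file names are hypothetical). The helpers below are illustrative, not part of HTCondor.

```python
from __future__ import annotations


def parse_env_dump(text: str) -> dict[str, str]:
    """Parse `env` output, one KEY=VALUE pair per line.

    Values that span several lines (e.g. exported bash functions) are not
    handled correctly by this simple line-based parser.
    """
    env = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep and key:
            env[key] = value
    return env


def diff_env(batch: dict[str, str], interactive: dict[str, str]) -> dict[str, tuple]:
    """Map each differing variable to (batch_value, interactive_value).

    None means the variable is not set in that environment.
    """
    diffs = {}
    for key in sorted(batch.keys() | interactive.keys()):
        if batch.get(key) != interactive.get(key):
            diffs[key] = (batch.get(key), interactive.get(key))
    return diffs


def compare_files(batch_path: str, interactive_path: str) -> None:
    """Print the differences between two saved `env` dumps."""
    with open(batch_path) as f:
        batch = parse_env_dump(f.read())
    with open(interactive_path) as f:
        interactive = parse_env_dump(f.read())
    for key, (b, i) in diff_env(batch, interactive).items():
        print(f"{key}:\n  batch:       {b}\n  interactive: {i}")
```

Variables such as PATH, LD_LIBRARY_PATH, HOME and any license-related settings are the usual suspects when a program runs interactively but not under the batch system.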

 

Jason Patton

 

On Mon, May 3, 2021 at 5:04 PM Peter Ellevseth <Peter.Ellevseth@xxxxxxxxxx> wrote:

Gents

 

We are running a commercial CFD code via htcondor. We have been doing it for years without any issues. I installed a new version of that software and want to run it via htcondor as usual. I do this by telling condor to run a locally installed bash script on the execute node, which in turn starts the CFD solver. I have to do it this way to source some files needed by the solver to start (license etc.).
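To see exactly which variables such a wrapper's sourcing step defines, the step can be reproduced from Python. This is a minimal sketch, assuming the setup files are bash scripts and bash is installed on the node; env_after_sourcing is a hypothetical helper, and the path passed to it should be whatever file the wrapper actually sources.

```python
from __future__ import annotations

import json
import subprocess
import sys

# Executed by bash: source the file named by $1 (discarding anything it prints),
# then let a Python interpreter ($2) dump the resulting environment as JSON.
_DUMP = (
    'source "$1" >/dev/null && '
    'exec "$2" -c "import json, os; print(json.dumps(dict(os.environ)))"'
)


def env_after_sourcing(setup_script: str) -> dict[str, str]:
    """Return the environment bash has after sourcing `setup_script`."""
    result = subprocess.run(
        # With `bash -c CMD ARG0 ARG1 ARG2`, $0=ARG0, $1=ARG1, $2=ARG2.
        ["bash", "-c", _DUMP, "bash", setup_script, sys.executable],
        check=True,  # raise if sourcing fails
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Running this once inside a normal job and once in an interactive shell, and diffing the two dictionaries, shows whether the setup files behave differently in the two contexts.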

 

However, the new version refuses to start. From the StarterLog.slotX I see that the job stops immediately with:

 

05/03/21 23:56:33 (pid:4135578) Create_Process succeeded, pid=4135579

05/03/21 23:56:33 (pid:4135578) Process exited, pid=4135579, status=139

05/03/21 23:56:33 (pid:4135578) Got SIGQUIT.  Performing fast shutdown.
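The status value in the log can be decoded. If it is a raw waitpid() status word, 139 (0x8B) means "terminated by signal 11 with a core dump"; if it is instead a shell-style exit code, 139 = 128 + 11. Either way it points to signal 11 (SIGSEGV, a segmentation fault). A small sketch of both decodings, assuming a POSIX system (the os.W* helpers are Unix-only); the function names are hypothetical.

```python
import os
import signal


def _signal_name(sig: int) -> str:
    """Name of signal `sig`, or a fallback for numbers Python does not know."""
    try:
        return signal.Signals(sig).name
    except ValueError:
        return "unknown signal"


def describe_wait_status(status: int) -> str:
    """Decode a raw waitpid()-style status word."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        core = " (core dumped)" if os.WCOREDUMP(status) else ""
        return f"killed by signal {sig} ({_signal_name(sig)}){core}"
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognised status {status}"


def describe_shell_exit_code(code: int) -> str:
    """Decode a shell-style exit code, where 128 + N means 'killed by signal N'."""
    if code > 128:
        return f"killed by signal {code - 128} ({_signal_name(code - 128)})"
    return f"exited with code {code}"


print(describe_wait_status(139))      # killed by signal 11 (SIGSEGV) (core dumped)
print(describe_shell_exit_code(139))  # killed by signal 11 (SIGSEGV)
```

A segfault at startup that only happens under the batch system fits the missing-library or missing-environment explanations discussed in this thread.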

 

If I ssh into one of the execute nodes, I can start it just fine and it runs as normal.

 

If I do condor_submit -interactive my_submit_file, I am able to run the script with ./runscript just fine.

 

Then why won't it start when I submit the file normally??

 

Peter

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

