
Re: [HTCondor-users] PROBLEM Submitting CONDOR JOB WITH BASH SCRIPT



Hello, thank you for the support. Unfortunately I still have no output.

The log file repeatedly says:

022 (868088.000.000) 01/09 11:55:57 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot1_1@xxxxxxxxxxxxxxxxxx <IP?addrs=IP+[--1]-9618&noUDP&sock=8624_1fad_3>
...
024 (868088.000.000) 01/09 11:55:57 Job reconnection failed
    Job not found at execution machine
    Can not reconnect to slot1_1@xxxxxxxxxxxxxxxxxx, rescheduling job

The error file is empty.
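
If it helps to see how often the job cycles through these events, the event headers in the job log can be tallied. This is only a rough sketch: the function name summarize_events is made up for illustration, and it assumes every event header starts with a three-digit event number followed by the "(cluster.proc.subproc)" job ID, as in the excerpt above.

```shell
#!/bin/bash
# Tally HTCondor job-log events by their three-digit event number.
# Assumption: event header lines look like "022 (868088.000.000) 01/09 ...".
# summarize_events is a hypothetical helper, not an HTCondor command.
summarize_events() {
    local logfile="$1"
    # Keep only event header lines, then count occurrences of the first field.
    grep -E '^[0-9]{3} \(' "$logfile" \
        | awk '{ count[$1]++ } END { for (code in count) print code, count[code] }' \
        | sort
}
```

For example, `summarize_events log/shower_test.sh.log` would print one line per event number with its count (022 and 024 being the disconnect and reconnection-failure events shown above).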

Following your suggestion, the .sub file is now:

# begin Condor submit file
#Transfer_executable     = False
#Transfer_files          = ALWAYS
When_to_transfer_output  = on_exit

# actual job is performed by the shell script

script_to_run            = shower_test.sh
#Transfer_input_files    = $(script_to_run)
#Arguments               = $(script_to_run)
Executable               = $(script_to_run)
Output                   = output/$(script_to_run).$(cluster).$(process).out
Error                    = error/$(script_to_run).$(cluster).$(process).err
Log                      = log/$(script_to_run).log
Queue
# end Condor submit file
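
One thing that may be worth ruling out with a submit file like this one: as far as I know, HTCondor does not create the output/, error/ and log/ directories named in the Output, Error and Log lines, so they need to exist in the submit directory beforehand. A minimal sketch of a pre-submit step follows; the function name prepare_submit_dir is hypothetical, and the directory names are taken from the submit file above.

```shell
#!/bin/bash
# Sketch: prepare the submit directory before running condor_submit.
# prepare_submit_dir is a hypothetical helper, not part of HTCondor.
prepare_submit_dir() {
    local script="$1"
    # The Output/Error/Log paths in the submit file point into these
    # directories; create them up front in case they are missing.
    mkdir -p output error log
    # Make sure the script can be executed directly via its "#!/bin/bash" line.
    if [ -f "$script" ]; then
        chmod +x "$script"
    else
        echo "warning: $script not found in $(pwd)" >&2
        return 1
    fi
}
```

It could be used as `prepare_submit_dir shower_test.sh && condor_submit submit.sub`, where submit.sub is the submit file name mentioned earlier in this thread.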

Thank you for any help,
MG

2018-01-09 16:11 GMT+01:00 Jason Patton <jpatton@xxxxxxxxxxx>:
What's happening with your latest attempt is that because
"should_transfer_files = yes", condor is sending your /bin/bash on
your local machine to the execute machine, and that machine's
environment can't run your bash executable. Two solutions:

1. Remove "should_transfer_files = yes". Having "transfer_input_files
= $(script_to_run)" should be sufficient for getting your script sent
to the execute machine, or

2. You already have "#!/bin/bash" in your script, so it can be
directly executed. You can rewrite your submit file so that your
script is the executable:

script_to_run = shower_test.sh

executable = $(script_to_run)
# no need for arguments, shower_test.sh will be executed directly
# no need for transfer_input_files, shower_test.sh will be sent
# automatically since it is the executable

Output         = output/$(script_to_run).$(cluster).$(process).out
Error         = error/$(script_to_run).$(cluster).$(process).err
Log         = log/$(script_to_run).log

queue
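
To confirm that the transferred /bin/bash is what triggers the GLIBC errors, the glibc versions a binary asks for can be compared with what the execute machine provides. The sketch below is an illustration under assumptions, not part of HTCondor: required_glibc_versions is a made-up helper that simply searches the binary for GLIBC_x.y version strings, and it relies on GNU sort for the -V (version sort) option.

```shell
#!/bin/bash
# Sketch: list the GLIBC symbol versions referenced inside a binary.
# required_glibc_versions is a hypothetical helper name.
required_glibc_versions() {
    local binary="$1"
    # -a treats the binary as text, -o prints only the matching strings.
    # Requiring a digit after "GLIBC_" skips entries such as GLIBC_PRIVATE.
    grep -a -o 'GLIBC_[0-9][0-9.]*' "$binary" | sort -u -V
}
```

Running `required_glibc_versions /bin/bash | tail -n 1` on the submit machine gives the newest version that bash asks for; `getconf GNU_LIBC_VERSION` on the execute machine (for instance from a small test job) reports the version installed there.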


Jason

On Tue, Jan 9, 2018 at 5:12 AM, MICHELE GROSSI
<michele.grossi01@universitadipavia.it> wrote:
> Hello,
> I added the parameters as suggested; the job ran, then went on hold, then
> ran again.
> I used the same bash script and the following submit.sub:
>
> # begin Condor submit file
> Should_transfer_files     = YES
> When_to_transfer_output   = on_exit
>
> script_to_run       = shower_test.sh
> Transfer_input_files     = $(script_to_run)
> Arguments       = $(script_to_run)
> Executable       = /bin/bash
> Output         = output/$(script_to_run).$(cluster).$(process).out
> Error         = error/$(script_to_run).$(cluster).$(process).err
> Log         = log/$(script_to_run).log
> Queue
> # end Condor submit file
>
>
> From the log folder I got (real IP replaced with "IP"):
> 022 (868088.000.000) 01/09 11:55:57 Job disconnected, attempting to reconnect
>     Socket between submit and execute hosts closed unexpectedly
>     Trying to reconnect to slot1_1@xxxxxxxxxxxxxxxxxx <IP?addrs=IP+[--1]-9618&noUDP&sock=8624_1fad_3>
> ...
> 024 (868088.000.000) 01/09 11:55:57 Job reconnection failed
>     Job not found at execution machine
>     Can not reconnect to slot1_1@xxxxxxxxxxxxxxxxxx, rescheduling job
> ...
> 001 (868088.000.000) 01/09 12:01:46 Job executing on host: <IP?addrs=IP+[--1]-9618&noUDP&sock=8256_7334_3>
>
> From the error folder I got:
> condor_exec.exe: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by condor_exec.exe)
> condor_exec.exe: /lib64/libc.so.6: version `GLIBC_2.15' not found (required by condor_exec.exe)
>
> I cannot understand this error because, as I said, running the script
> without condor in the same grid environment works fine.
>
> 2018-01-08 18:41 GMT+01:00 John M Knoeller <johnkn@xxxxxxxxxxx>:
>>
>> did the job go on hold?
>>
>> was there an error message in the .log file?
>>
>> was there anything in the .log file?
>>
>>
>>
>> There is no statement in your submit file telling it to transfer the
>> shower_test.sh file. This is ok if you have a shared file system
>> between the submit machine and execute machine, but not if you don't.
>> Missing statements about file transfer may also be why you aren't
>> getting any output files: you haven't told condor to transfer them back.
>>
>>
>>
>> try adding
>>
>>
>>
>> should_transfer_files = yes
>>
>> transfer_input_files = $(script_to_run)
>>
>> when_to_transfer_output = ON_EXIT
>>
>>
>>
>> to your submit file.
>>
>> -tj
>>
>> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
>> Of MICHELE GROSSI
>> Sent: Monday, January 8, 2018 11:13 AM
>> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
>> Subject: Re: [HTCondor-users] PROBLEM Submitting CONDOR JOB WITH BASH
>> SCRIPT
>>
>>
>>
>> Sorry for being misleading.
>>
>> I mean that the submission works, but the job runs only briefly and I get
>> no output. It seems that the bash script inside the job did not start.
>>
>> MG
>>
>> Regards,
>> Michele
>>
>>
>>
>> On 08 Jan 2018 6:04 PM, "John M Knoeller" <johnkn@xxxxxxxxxxx> wrote:
>>
>> We don't understand what you mean when you say "SUBMISSION WORKS BUT IT
>> DOES NOT WORK"
>>
>>
>>
>> you need to be more specific.
>>
>>
>>
>> Does condor_submit succeed or fail?
>>
>> If it fails, what is the error message?
>>
>> If it succeeds, what is it that is failing?
>>
>>
>>
>> -tj
>>
>>
>>
>> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
>> Of MICHELE GROSSI
>> Sent: Friday, January 5, 2018 4:02 PM
>> To: htcondor-users@xxxxxxxxxxx
>> Subject: [HTCondor-users] PROBLEM Submitting CONDOR JOB WITH BASH SCRIPT
>>
>>
>>
>> Hello, I would like your help in understanding how to submit a job like
>> the one described below to condor.
>>
>> FIRST PART
>>
>> I have a bash script (for the sake of simplicity I replaced my internal
>> folder paths with made-up ones; the shell script and the program work
>> fine locally) that uses shell commands to loop through a folder and pick
>> out some files by parsing their names. Once I have the name I need, I
>> use it to run another program.
>> That program (SECOND PART) can be run in a terminal by typing the line
>> that starts with ./
>>
>>
>>
>> #!/bin/bash
>>
>> cp -r /myfolder/lastfolder .
>> cd lastfolder
>> source config.sh
>>
>> for i in $( ls /myfolder/lastfolder/gen*$1/data.dat ); do
>> gen=$(echo $i)
>> percorso=`echo $gen | cut -d'/' -f9- `
>> singlegen_old=`echo $gen | cut -d'/' -f9-| sed -e 's/\/data.dat//g' `
>> singlegen=`echo $percorso | sed -e 's/\/data.dat//g' `
>> number=`echo $singlegen | sed -e 's/gen//g' `
>>
>> ./main.exe firstargument.cmnd $i thirdargument_$number.dat
>>
>> done
>>
>> The condor submit file is the following:
>>
>> # begin Condor submit file
>> # actual job is performed by the shell script
>>
>> script_to_run = shower_test.sh
>> Arguments = $(script_to_run)
>> Executable = /bin/bash
>> Output = $(script_to_run).$(cluster).$(process).out
>> Error = $(script_to_run).$(cluster).$(process).err
>> Log = $(script_to_run).log
>> Queue 1
>> # end Condor submit file
>>
>> The CONDOR SUBMISSION WORKS BUT IT DOES NOT WORK.
>>
>> THANK YOU
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with
>> a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/