
Re: [HTCondor-users] need help in settings for condo job submission using python bindings



On 9/25/2017 3:44 PM, John M Knoeller wrote:
We have classified this as a bug and will fix it in the next releases of 8.6 and 8.7, so that sub.queue() does the same version check that condor_submit does.

-tj


Future releases of HTCondor will not suffer from this confusion. With HTCondor v8.6.7+, Xin's example job below will end up being submitted exactly the same regardless of using condor_submit or using the htcondor.Submit().queue() Python API. For details, the ticket associated with this bug is at:

  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6420

Thank you for reporting this issue Xin, and apologies for the grief it caused you.

regards,
Todd





*From:* HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] *On Behalf Of *Xin Wang
*Sent:* Monday, September 25, 2017 12:52 PM
*To:* HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; 'htcondor-admin@xxxxxxxxxxx' <htcondor-admin@xxxxxxxxxxx>
*Subject:* Re: [HTCondor-users] need help in settings for condo job submission using python bindings

Hi, John,

Thank you for the message. This is helpful. With condor_q -long, I can see the job ClassAd that HTCondor actually uses.

As expected, the two ads are slightly different; some of the relevant differences are highlighted below.

For the job submitted using sub.queue(), here are some of the settings (the job runs, but the output and error files are not written correctly):

*Env*= "PYTHONHOME=/my/path/to/anaconda3 "
Out = "_condor_stdout"
Err = "_condor_stderr"
UserLog = "/tmp/test1.log"
ShouldTransferFiles = "*IF_NEEDED*"
*TransferOutputRemaps = "_condor_stdout=/tmp/test1.out;_condor_stderr=/tmp/test1.err"*

For the job that is submitted using schedd.submit(job_ad), here are some of the settings (which works correctly but requires setting LD_LIBRARY_PATH explicitly):

*Environment*= "PYTHONHOME=/my/path/to/anaconda3 LD_LIBRARY_PATH=/my/path/to/anaconda3/lib"
Out = "/tmp/test2.out"
Err = "/tmp/test2.err"
UserLog = "/tmp/test2.log"
ShouldTransferFiles = "*YES*"

So the job submitted with sub.queue() uses the default _condor_stdout and _condor_stderr names and tries to remap them back to the requested files, which evidently did not accomplish what it was meant to.
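The TransferOutputRemaps value in the sub.queue() ad can be split into its name=destination pairs to see exactly where each stream was supposed to end up. This is a minimal sketch that assumes the simple semicolon-separated form shown above; parse_output_remaps is a hypothetical helper, not part of the htcondor bindings.

```python
def parse_output_remaps(value):
    """Split a TransferOutputRemaps string like 'a=/x;b=/y' into a dict.

    Assumes plain 'name=destination' pairs separated by ';', as in the job ad
    above. Escaped ';' or '=' characters are not handled.
    """
    remaps = {}
    # Drop surrounding whitespace and the ClassAd string quotes.
    for pair in value.strip().strip('"').split(";"):
        if not pair.strip():
            continue  # tolerate a trailing ';'
        name, _, dest = pair.partition("=")
        remaps[name.strip()] = dest.strip()
    return remaps


ad_value = '"_condor_stdout=/tmp/test1.out;_condor_stderr=/tmp/test1.err"'
print(parse_output_remaps(ad_value))
# {'_condor_stdout': '/tmp/test1.out', '_condor_stderr': '/tmp/test1.err'}
```

If, as assumed here, a remap is applied only when HTCondor actually transfers the output files back, then with ShouldTransferFiles = "IF_NEEDED" on a shared filesystem the remap may never take effect, which would be consistent with the empty files; the thread does not confirm this.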

Other observations:

 1. I tried setting Env instead of Environment in job_ad before calling
    schedd.submit(job_ad), but I still get the same error that
    “libpython3.6m.so.1.0” cannot be loaded. I'm not sure why the job
    submitted with sub.queue() succeeds;
 2. I tried setting ShouldTransferFiles to “YES” before calling
    sub.queue(), but condor_q -long shows that the setting is still “IF_NEEDED”.
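One thing worth checking for observation 2 (a guess, not confirmed in the thread) is the key name: htcondor.Submit() takes submit-description keywords, which are spelled differently from the job-ad attributes that condor_q -long prints; for example, the submit keyword behind ShouldTransferFiles is should_transfer_files. The table below is a partial, hand-written sketch; AD_TO_SUBMIT and to_submit_keys are hypothetical names.

```python
# Partial, illustrative mapping from job ClassAd attribute names (as printed by
# condor_q -long) to the submit-description keywords that set them.
# Not exhaustive; the condor_submit manual is the authoritative list.
AD_TO_SUBMIT = {
    "Cmd": "executable",
    "Arguments": "arguments",
    "Environment": "environment",
    "Env": "environment",  # attribute produced by the old-style syntax
    "Out": "output",
    "Err": "error",
    "UserLog": "log",
    "ShouldTransferFiles": "should_transfer_files",
}


def to_submit_keys(ad_settings):
    """Translate ClassAd-style names to submit keywords (hypothetical helper).

    Raises KeyError for names this partial table does not know about.
    """
    return {AD_TO_SUBMIT[name]: value for name, value in ad_settings.items()}


sub_dict = to_submit_keys({"ShouldTransferFiles": "YES", "Out": "/tmp/test1.out"})
print(sub_dict)
# {'should_transfer_files': 'YES', 'output': '/tmp/test1.out'}
```

The resulting dict could then be passed to htcondor.Submit(sub_dict) in place of the attribute-style keys.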

Any thoughts? In particular, is there a fix that makes sub.queue() handle the output and error file settings properly?

Thank you.

Xin

*From:* HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] *On Behalf Of *John M Knoeller
*Sent:* Monday, September 25, 2017 12:31 PM
*To:* HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; 'htcondor-admin@xxxxxxxxxxx' <htcondor-admin@xxxxxxxxxxx>
*Subject:* Re: [HTCondor-users] need help in settings for condo job submission using python bindings

Yes, condor_submit and sub.queue() do a great many things that schedd.submit() does not. This is why the schedd.submit() method (and SOAP) was deprecated: it requires you to do yourself all of the things that sub.queue() does internally.

I don’t have any guesses why your output and error files are empty. I would suggest comparing the job ad you see from condor_q -long for a job that returned the correct output with the job ad for one that did not. If the failure to return output is somehow a bug in HTCondor, it will almost certainly be triggered by some difference in those job ads.
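The comparison suggested above can be scripted once the two ads are saved as text, for example with condor_q -long <job_id> for each job. This is a minimal sketch: it assumes one "Attr = value" pair per line, keeps values as raw strings, and lowercases names because ClassAd attribute names are case-insensitive. parse_long_ad and diff_ads are hypothetical helpers, and the sample ads are trimmed versions of the ones quoted earlier in the thread.

```python
def parse_long_ad(text):
    """Parse `condor_q -long` style text ('Attr = value' per line) into a dict.

    Values are kept as raw strings; names are lowercased because ClassAd
    attribute names are case-insensitive.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip blank or malformed lines
            ad[name.strip().lower()] = value.strip()
    return ad


def diff_ads(first, second):
    """Return {attr: (first_value, second_value)} for attributes that differ."""
    a, b = parse_long_ad(first), parse_long_ad(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


ad_with_output = 'Out = "/tmp/test2.out"\nShouldTransferFiles = "YES"\nJobUniverse = 5'
ad_without_output = 'Out = "_condor_stdout"\nShouldTransferFiles = "IF_NEEDED"\nJobUniverse = 5'
for name, (good, bad) in diff_ads(ad_with_output, ad_without_output).items():
    print(f"{name}: {good} -> {bad}")
# out: "/tmp/test2.out" -> "_condor_stdout"
# shouldtransferfiles: "YES" -> "IF_NEEDED"
```

A plain diff like this reports Env and Environment as two separate differences even though they describe the same setting, so the output still needs a human read.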

-tj

*From:* HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] *On Behalf Of *Xin Wang
*Sent:* Monday, September 25, 2017 10:25 AM
*To:* HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; 'htcondor-admin@xxxxxxxxxxx' <htcondor-admin@xxxxxxxxxxx>
*Subject:* Re: [HTCondor-users] need help in settings for condo job submission using python bindings

Hi, John,

I tried your approach and used condor_submit -dump <dumpfile> to see the job ClassAd for my submit file. It has ~80 lines, most of which do not make sense to me. I tried adding those extra settings to my script, but it did not help.

The error when running schedd.submit(job_ad) in my original script is:

condor_exec.exe: error while loading shared libraries: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory

which clearly indicates that something is wrong with the environment and that HTCondor cannot find the Python 3.6 shared libraries.

The strange thing is that I did set PYTHONHOME in the environment, which is sufficient for condor_submit <submitfile> and for the job submitted using sub.queue(), but not for schedd.submit(job_ad).

To confirm this, I updated the environment to sub['environment'] = "PYTHONHOME=/my/path/to/anaconda3 LD_LIBRARY_PATH=/my/path/to/anaconda3/lib", and then my script works with schedd.submit(job_ad).

Now the question is: do condor_submit and sub.queue() do anything extra that schedd.submit() does not do?

For the job submitted using sub.queue(), I’m 100% sure that the job ran without issues, as I can see all the results generated by my script. The only problem is that the output and error files specified in the submit description are never written for that job.
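Because the fix above packs two variables into one environment value, it can help to build that value programmatically rather than by hand. This is a minimal sketch assuming HTCondor's documented new-style environment syntax as used in the submit file in this thread: the whole value is wrapped in double quotes, entries are separated by spaces, a value containing whitespace or a single quote is wrapped in single quotes, and literal quote characters are doubled. format_environment is a hypothetical helper.

```python
def format_environment(env):
    """Render a dict as an HTCondor new-style environment value (sketch).

    Assumed syntax: double quotes around the whole value, space-separated
    NAME=value entries, single quotes around values that contain whitespace
    or a single quote, and literal quote characters escaped by doubling.
    """
    entries = []
    for name, value in env.items():
        needs_quotes = "'" in value or any(ch.isspace() for ch in value)
        value = value.replace('"', '""')  # literal " is written as ""
        if needs_quotes:
            # literal ' is only allowed inside a single-quoted section, as ''
            value = "'" + value.replace("'", "''") + "'"
        entries.append(f"{name}={value}")
    return '"' + " ".join(entries) + '"'


anaconda = "/my/path/to/anaconda3"
print(format_environment({
    "PYTHONHOME": anaconda,
    "LD_LIBRARY_PATH": anaconda + "/lib",
}))
# "PYTHONHOME=/my/path/to/anaconda3 LD_LIBRARY_PATH=/my/path/to/anaconda3/lib"
```

Note that the returned string includes the outer double quotes, matching the submit file rather than the unquoted Python strings used elsewhere in the thread. That the Env attribute in the sub.queue() ad came from the unquoted (old-style) form is an inference, not something the thread confirms.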

Thank you.

Xin

*From:* HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] *On Behalf Of *John M Knoeller
*Sent:* Friday, September 22, 2017 5:09 PM
*To:* HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; 'htcondor-admin@xxxxxxxxxxx' <htcondor-admin@xxxxxxxxxxx>
*Subject:* Re: [HTCondor-users] need help in settings for condo job submission using python bindings


First of all, the job submitted using schedd.submit(job_ad) doesn’t run because the job ad is incomplete. When you use that method, you must fully specify the job ClassAd. To see what a fully specified job ClassAd looks like, run condor_submit -dump <submit_file>.

For the job submitted using sub.queue(), are you sure that the job ran and produced output? When the job is submitted, your output and error files are created as zero-size files before the job ever runs.
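Since the output and error files are created empty at submit time, an empty file on its own does not show whether the job ran; checking for missing versus empty versus populated files, alongside the job log, narrows it down. This is a minimal sketch using only the standard library; output_status is a hypothetical helper, and the paths are the ones from the original submit file.

```python
import os


def output_status(paths):
    """Classify each path as 'missing', 'empty' or 'has data' (sketch)."""
    status = {}
    for path in paths:
        if not os.path.exists(path):
            status[path] = "missing"
        elif os.path.getsize(path) == 0:
            # HTCondor may have created the file at submit time; the job
            # itself may or may not have run.
            status[path] = "empty"
        else:
            status[path] = "has data"
    return status


print(output_status(["/tmp/job.log", "/tmp/test.log", "/tmp/test.err"]))
```

A job log with execute and terminate events next to empty output and error files would match the symptoms reported for the sub.queue() job.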

-tj

*From:* HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] *On Behalf Of *Xin Wang
*Sent:* Friday, September 22, 2017 2:44 PM
*To:* 'htcondor-users@xxxxxxxxxxx' <htcondor-users@xxxxxxxxxxx>; 'htcondor-admin@xxxxxxxxxxx' <htcondor-admin@xxxxxxxxxxx>
*Subject:* [HTCondor-users] need help in settings for condo job submission using python bindings

I’m trying to submit jobs to HTCondor to run some Python scripts. If I generate a submit file and submit it with condor_submit, everything works fine.

Here is the submit file:

universe = vanilla
environment = "PYTHONHOME=/my/path/to/anaconda3"
executable = /my/path/to/anaconda3/bin/python
arguments = /my/path/to/scripts/myrun.py
log = /tmp/job.log
output = /tmp/test.log
error = /tmp/test.err
queue
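One way to keep the condor_submit path and the Python path from drifting apart is to hold the description in a single dict and render the submit file from it; the same dict could also be handed to htcondor.Submit(). This is a minimal sketch; SUBMIT_DESCRIPTION and render_submit_file are hypothetical names, and the environment value deliberately keeps its surrounding double quotes, exactly as in the submit file above.

```python
# The submit description above as a plain dict (keys are submit keywords).
SUBMIT_DESCRIPTION = {
    "universe": "vanilla",
    "environment": '"PYTHONHOME=/my/path/to/anaconda3"',
    "executable": "/my/path/to/anaconda3/bin/python",
    "arguments": "/my/path/to/scripts/myrun.py",
    "log": "/tmp/job.log",
    "output": "/tmp/test.log",
    "error": "/tmp/test.err",
}


def render_submit_file(description, count=1):
    """Render 'key = value' lines followed by a queue statement."""
    lines = [f"{key} = {value}" for key, value in description.items()]
    lines.append("queue" if count == 1 else f"queue {count}")
    return "\n".join(lines) + "\n"


print(render_submit_file(SUBMIT_DESCRIPTION), end="")
```

Running condor_submit -dump on the rendered file, as suggested later in the thread, and comparing the result with the ad from sub.queue() is one way to check that both paths agree.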

I then tried to submit the same job through the Python bindings using two different methods, but had no luck with either.

First, I tried htcondor.Submit() with the following code:

import htcondor

schedd = htcondor.Schedd()
sub = htcondor.Submit()
sub['universe'] = 'vanilla'
sub['environment'] = "PYTHONHOME=/my/path/to/anaconda3"
sub['executable'] = '/my/path/to/anaconda3/bin/python'
sub['arguments'] = '/my/path/to/scripts/myrun.py'
sub['log'] = '/tmp/job.log'
sub['output'] = '/tmp/test.log'
sub['error'] = '/tmp/test.err'

with schedd.transaction() as txn:
    sub.queue(txn)

The job was submitted and ran successfully, and the log file /tmp/job.log was written. However, output and error do not work: /tmp/test.log and /tmp/test.err are created but are empty (size 0).

Second, I tried schedd.submit() with the following code:

import htcondor

schedd = htcondor.Schedd()

# Hand-built job ClassAd; schedd.submit() does not add the defaults
# that condor_submit and sub.queue() fill in.
job_ad = {
    "cmd": "/my/path/to/anaconda3/bin/python",
    "arguments": "/my/path/to/scripts/myrun.py",
    "env": "PYTHONHOME=/my/path/to/anaconda3",
    "log": "/tmp/job.log",
    "out": "/tmp/test.log",
    "err": "/tmp/test.err",
}

clusterId = schedd.submit(job_ad)

The job could not run. However, /tmp/test.err was created and contains the following error message:

condor_exec.exe: error while loading shared libraries: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory

I suspect the error occurs because the environment is not set properly, but I had no luck when I also tried setting “environment” instead of “env”.

How should I fix the settings so that I can submit HTCondor jobs through the Python bindings properly? Thanks.

Xin



            Jefferies archives and monitors outgoing and incoming
            e-mail. The contents of this email, including any
            attachments, are confidential to the ordinary user of the
            email address to which it was addressed. If you are not the
            addressee of this email you may not copy, forward, disclose
            or otherwise use it or any part of it in any form
            whatsoever. This email may be produced at the request of
            regulators or in connection with civil litigation. Jefferies
            accepts no liability for any errors or omissions arising as
            a result of transmission. Use by other than intended
            recipients is prohibited. In the United Kingdom, Jefferies
            operates as Jefferies International Limited; registered in
            England: no. 1978621; registered office: Vintners Place, 68
            Upper Thames Street, London EC4V 3BJ. Jefferies
            International Limited is authorized and regulated by the
            Financial Conduct Authority.



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685