
Re: [HTCondor-users] Jupyter notebook python bindings - Files not returned to Jupyter

Hi All,
The solution I've gone with is to create shared NFS mounts between HTCondor and Jupyter and set the initialdir parameter to the relevant directory within the mount.
If there is a better way around this I'm definitely interested, but this seems to be working.
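
A minimal sketch of this approach, assuming the share is mounted at the same path inside the Jupyter container and on the HTCondor submit server. The mount path /mnt/condor_share, the per-user subdirectory, and both helper names are hypothetical and only illustrate the idea:

```python
from pathlib import PurePosixPath


def with_shared_initialdir(description, shared_root, user):
    """Return a copy of the submit parameters with initialdir on the shared mount.

    Assumption: <shared_root>/<user> exists and resolves to the same directory
    on the submit server and in the Jupyter container.
    """
    updated = dict(description)  # leave the caller's dict untouched
    updated["initialdir"] = str(PurePosixPath(shared_root) / user)
    return updated


def submit(description):
    """Submit the job and return its cluster id.

    Assumes the notebook's HTCondor configuration already points at the schedd.
    """
    import htcondor  # imported lazily so the helper above works without the bindings

    schedd = htcondor.Schedd()
    result = schedd.submit(htcondor.Submit(description))
    return result.cluster()
```

For example, with_shared_initialdir(params, "/mnt/condor_share", "weeksy") replaces an initialdir of /tmp/ with /mnt/condor_share/weeksy, so the output, error and log files are written somewhere the notebook can read directly.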

On Tue, 17 May 2022 at 11:56, Ryan Weekes <1weeksy1@xxxxxxxxx> wrote:

Hi Jason,

Thanks for responding.

Jupyter is running within a fully custom container based on CentOS 7. There are multiple layers of images adding more functionality, and it's running on a separate k8s stack.
From this environment I've been testing connections to HTCondor servers on separate machines, also running CentOS 7.
Communication to the HTCondor stack seems to be working as expected, but I'm missing some black magic for it to interact back with the Jupyter environment.

Not sure if it helps, but here are some of the submit parameter values I'm using from the notebook:

import htcondor

job = htcondor.Submit({
  "universe": "docker",
  "docker_image": "containerstack/cpustress",
  "executable": "/usr/bin/docker",           # the program to run on the execute node
  "arguments": "--cpu 40 --timeout 30s --metrics-brief",
  "should_transfer_files": "YES",
  "when_to_transfer_output": "ON_EXIT",
  "output": "out.$(ClusterId).$(ProcId)",    # anything the job prints to standard output will end up in this file
  "error": "err.$(ClusterId).$(ProcId)",     # anything the job prints to standard error will end up in this file
  "log": "log.$(ClusterId).$(ProcId)",       # this file will contain a record of what happened to the job
  "request_cpus": "Cpus",                    # how many CPU cores we want
  "request_memory": "128MB",                 # how much memory we want
  "request_disk": "128MB",                   # how much disk space we want
  "initialdir": "/tmp/",                     # directory on the submit server used for the job's files
})


Hi Weeksy,

The path in the hold reason suggests to me that you're running Jupyter inside a container on the submit machine. Are you able to point to which image you're using (or, if you've built a custom image, what it's based on)?


Jason Patton

On Mon, 16 May 2022 at 13:03, Ryan Weekes <1weeksy1@xxxxxxxxx> wrote:
Hi All

I have 2 problems that appear to be related. Wondering if anyone out there knows the fix.

I'm submitting jobs from a Jupyter notebook using the Python bindings. However, jobs only submit successfully if I set initialdir to /tmp/ or some other common location. Otherwise the job goes on hold:

141.000:  Job is held.

Hold reason: Cannot access initial working directory /home/jovyan/shares/Users/weeksy: No such file or directory

If I set initialdir to /tmp/ the job runs, but the files associated with the job don't get returned from the submit server to Jupyter. I can see them all sitting in /tmp/ on the submit server. I don't see any errors in the logs for this second issue.

Hope someone out there can help.