
Re: [HTCondor-users] How do I have my interactive job and my submission job in condor match 100%?



Hi Brando,

getenv can be dangerous, because the environment copied from your submit machine might not work on the execute node.

Are you preparing the environment in your batch job the same way as you set it up when you run interactively (for example, do you source all the same environment scripts)? You could print your batch job's environment into its log file by running `env` and compare it with the interactive environment.
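A minimal sketch of that comparison in Python, in case a plain `env` listing is hard to diff by eye. It assumes you can run the same Python interpreter in both the interactive session and the batch job; the function names and file names are hypothetical, not part of HTCondor. It dumps the environment as JSON (so values containing newlines survive) and reports which variables differ.

```python
import json
import os


def dump_env(path):
    """Write the current process environment to `path` as sorted JSON."""
    with open(path, "w") as fh:
        json.dump(dict(os.environ), fh, indent=2, sort_keys=True)


def load_env(path):
    """Read an environment dump previously written by dump_env()."""
    with open(path) as fh:
        return json.load(fh)


def diff_env(interactive, batch):
    """Compare two environment dicts.

    Returns a tuple (only_interactive, only_batch, changed), where `changed`
    maps each shared variable name to (interactive_value, batch_value).
    """
    only_interactive = {k: v for k, v in interactive.items() if k not in batch}
    only_batch = {k: v for k, v in batch.items() if k not in interactive}
    changed = {
        k: (interactive[k], batch[k])
        for k in interactive.keys() & batch.keys()
        if interactive[k] != batch[k]
    }
    return only_interactive, only_batch, changed


# Hypothetical usage:
#   in the interactive session:    dump_env("env_interactive.json")
#   at the start of the batch job: dump_env("env_batch.json")
#   afterwards, on any machine that can read both files:
#     diff_env(load_env("env_interactive.json"), load_env("env_batch.json"))
```

Variables such as PATH, LD_LIBRARY_PATH and CUDA_VISIBLE_DEVICES are the ones I would look at first in the result.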

Cheers,
  Thomas

On 25/03/2021 16.55, brando.science@xxxxxxxxx wrote:
Hi,

I am a user of an HTCondor HPC cluster. I noticed that my PyTorch jobs that use CUDA work just fine in interactive mode (seemingly with any version of PyTorch or CUDA, even if nvidia-smi reports one CUDA version and PyTorch reports another), but when I submit them with condor_submit without the interactive option, they don't run. They get into a deadlock because I am trying to do parallel training (note that this does not happen in interactive mode, even with 4 GPUs).
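On the version mismatch: nvidia-smi shows the highest CUDA version the installed driver supports, while PyTorch reports the CUDA toolkit it was built against, so the two numbers can legitimately differ. A small sketch that could be run in both the interactive session and the batch job to record what each one actually sees is given here; the function name `runtime_report` is hypothetical, and it assumes PyTorch may or may not be importable on the node.

```python
import os


def runtime_report():
    """Collect the GPU-related settings visible to this process."""
    report = {
        # Environment variables that commonly differ between sessions.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # toolkit PyTorch was built with
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    return report


for name, value in runtime_report().items():
    print(f"{name}={value}")
```

If `gpu_count` or `CUDA_VISIBLE_DEVICES` differs between the two runs, that difference is a plausible place to start looking, though it does not by itself explain a deadlock.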

My question seems simple. How do I force the environment of my condor_submit job to be identical to the one I get when I run from an interactive session?

I've tried the famous getenv flag and that didn't work for some reason. I assume it is because it copies my environment from the login node instead of from the interactive session (but I cannot submit a job from within an interactive session, so I can't do it that way). Is there a way to have a submitted job run with exactly the same settings as an interactive job? I am not a sysadmin; I am only a user, if that helps.

I've also read these two pages:

- https://htcondor.readthedocs.io/en/latest/users-manual/services-for-jobs.html?highlight=environment#environment-variables
- https://htcondor.readthedocs.io/en/latest/man-pages/condor_submit.html

and posted this question on SO: https://stackoverflow.com/questions/66790905/how-do-i-have-my-interactive-job-and-my-submission-job-in-condor-match-100


Thanks for your time, HTCondor users list.


Sincerely, Brando


