
[HTCondor-users] How do I have my interactive job and my submission job in condor match 100%?



Hi,

I am a user of an HTCondor HPC cluster. I noticed that my PyTorch jobs that use CUDA work fine in interactive mode (seemingly with any version of PyTorch or CUDA, even when nvidia-smi reports one CUDA version and PyTorch reports another), but when I submit the same jobs non-interactively with condor_submit, they do not run. The job gets into a deadlock because I am trying to do parallel training (note that this does not happen in interactive mode, even with 4 GPUs).
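
To make the version mismatch concrete, a small report like the one below could be run at the start of both the interactive session and the submitted job. This is only a sketch: the function name gpu_report is made up for illustration, and it assumes that PyTorch may or may not be importable in the job's Python.

```python
import os
import sys


def gpu_report():
    """Collect a few facts about the Python/CUDA setup of this process."""
    report = {
        "python": sys.executable,
        # These variables commonly influence which GPUs and libraries are used.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not visible to this interpreter at all.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        # flush so the lines reach the job's output file even if the job hangs later
        print(f"{key}: {value}", flush=True)
```

Comparing the two printouts would show whether the batch job sees a different interpreter, a different PyTorch build, or a different set of GPUs than the interactive session.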

My question seems simple: how do I force my condor_submit job to run in an environment identical to the one I get in an interactive session?

I've tried the well-known getenv flag, and that didn't work for some reason. I assume this is because it copies my environment from the login node rather than from the interactive session (but I cannot submit a job from within an interactive session, so I can't do it that way). Is there a way to have the submitted job run with exactly the same settings as an interactive job? I am not a sysadmin; I am only a user, if that helps.
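
One way to test the assumption about where the environment comes from would be to dump the environment in both settings and compare the two dumps. The following is a minimal sketch; the helper names (dump_env, load_env, diff_env) and the file names in the usage comment are hypothetical.

```python
import json
import os


def dump_env(path):
    """Write the current process environment to a JSON file."""
    with open(path, "w") as f:
        json.dump(dict(os.environ), f, indent=2, sort_keys=True)


def load_env(path):
    """Read an environment dump written by dump_env."""
    with open(path) as f:
        return json.load(f)


def diff_env(a, b):
    """Return (only in a, only in b, changed) for two environment dicts."""
    only_a = {k: a[k] for k in a.keys() - b.keys()}
    only_b = {k: b[k] for k in b.keys() - a.keys()}
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_a, only_b, changed


# Hypothetical usage:
#   in the interactive session:  dump_env("env_interactive.json")
#   at the top of the batch job: dump_env("env_batch.json")
#   afterwards, on any node:
#       diff_env(load_env("env_interactive.json"), load_env("env_batch.json"))
```

Variables that show up only in the interactive dump, or with different values (for example PATH, LD_LIBRARY_PATH, or CUDA-related settings), would be the first candidates to set explicitly for the submitted job.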

I've also read these two pages:

https://htcondor.readthedocs.io/en/latest/users-manual/services-for-jobs.html?highlight=environment#environment-variables
https://htcondor.readthedocs.io/en/latest/man-pages/condor_submit.html
and posted this question on SO: https://stackoverflow.com/questions/66790905/how-do-i-have-my-interactive-job-and-my-submission-job-in-condor-match-100


Thanks for your time, HTCondor users list.


Sincerely, Brando