
[HTCondor-users] condor-ce: troubleshooting and jobRouter



Hello,

I'm practicing with HTCondor-CE and need some help, as I'm not very fluent at troubleshooting or configuration.

Test pilot jobs submitted by a CMS factory are failing a validation shell script when running on the execute node.
Apparently the reason is that no environment variables are passed to the job; the job ad shows:

Environment = ""

I verified that the shell script succeeds if I submit it from the HTCondor-CE host itself with environment = "PATH=/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin" added to the submit file.

However, if I submit the same job from an external machine, again no environment is passed to the job on the execute node. That seems to suggest that some submit parameters are being stripped along the way. I think the JobRouter is where such submission parameters might be altered, but I'm not sure at all, and some simpler misconfiguration could explain the problem.

A couple of questions:

1) For jobs I submit myself there are log files such as /var/log/condor-ce/GridmanagerLog.dteam039
containing a line such as:

09/17/18 15:08:10 (D_ALWAYS:2) [4098033] GAHP[4098037] <- 'CONDOR_JOB_SUBMIT [SNIP] Environment\ =\ "PATH=/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"; [SNIP]

where I can see the submit file content.
However, there is no similar file for the CMS user (/var/log/condor-ce/GridmanagerLog.pilcms017). Is there a way to compare the job parameters "before" and "after" the routing?
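
To make the "before" and "after" comparison concrete, here is a rough Python sketch. It assumes the long-form ClassAd of the job as seen by the CE (e.g. the output of condor_ce_q -l for that job) and the ClassAd of the corresponding routed job in the local batch system (e.g. the output of condor_q -l) have each been saved to a text file. The function names and the sample attribute values are made up for illustration; the script only diffs "Attribute = value" lines and knows nothing about HTCondor itself.

```python
def parse_classad_text(text):
    """Parse 'Attr = value' lines of a long-form ClassAd dump into a dict.

    Lines that do not contain ' = ' (blank lines, banners) are skipped.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep and name.strip():
            ad[name.strip()] = value.strip()
    return ad


def diff_ads(before, after):
    """Return {attr: (old, new)} for attributes that differ between two ads.

    A value of None means the attribute is absent on that side.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


def diff_files(path_before, path_after):
    """Print the attributes that differ between two saved ClassAd dumps."""
    with open(path_before) as f:
        before = parse_classad_text(f.read())
    with open(path_after) as f:
        after = parse_classad_text(f.read())
    for name, (old, new) in diff_ads(before, after).items():
        print(f"{name}:\n  before: {old}\n  after:  {new}")
```

Calling diff_files("ce_job.txt", "routed_job.txt") (hypothetical file names) would then list every attribute, Environment included, that was changed, added or dropped between the two ads.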

2) Does anyone have a few examples of job routing configuration for a WLCG-like HTCondor-CE? Currently I'm looking at https://opensciencegrid.org/docs/compute-element/job-router-recipes/ . If the examples there are mostly adequate for a non-OSG CE, I can go on and refer to those.

Thanks for any help, bye

Stefano