
Re: [HTCondor-users] condor-ce: troubleshooting and jobRouter



Hello Geonmo, thank you,

On 19/09/2018 02:02, "Geonmo Ryu" wrote:

Hello, Stefano.


Could you use osg-local-job-environment.conf?

I'm not sure where the best place to put it is:

I no longer have the /var/lib/osg/ path. In my first attempt with condor-ce I followed the instructions at
https://opensciencegrid.org/docs/compute-element/install-htcondor-ce/ which led me to install
htcondor-ce-*.osg*el7*.rpm

However, now I have:
[root@ce01-htc ~]# rpm -qa | grep condor-ce
htcondor-ce-client-3.1.0-1.el7.noarch
htcondor-ce-condor-3.1.0-1.el7.noarch
htcondor-ce-view-3.1.0-1.el7.noarch
htcondor-ce-3.1.0-1.el7.noarch

and none of those packages provides /var/lib/osg/.

As an attempt to provide external jobs with a PATH, I defined

[root@ce01-htc ~]# condor_config_val -dump APPEND_REQUIREMENTS
APPEND_REQUIREMENTS = Environment = "PATH=/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"

but I'm not sure this is the best approach.

Cheers,
Stefano


I modified the file to add the environment.


## /var/lib/osg/osg-local-job-environment.conf
#!/bin/sh
VO_CMS_SW_DIR=/cvmfs/cms.cern.ch
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export VO_CMS_SW_DIR
export PATH
####################################


and here is my jobrouter configuration


## /etc/condor-ce/config.d/61-job-routes.conf
## 61-job-routes.conf
#####################################################
# Example Job Route
#
# This is an extraordinarily simple job route.
# All it does is route local condor and set a
# simple Accounting Group and default RequestMemory.
#####################################################
# No custom functions for job router entries; these are causing crashes in 8.3.5.
# Can remove the eval_set_environment attribute below starting in 8.3.8.
JOB_ROUTER_ENTRIES = [ \
    name = "condor_pool_dteam"; \
    TargetUniverse = 5; \
    Requirements = target.x509UserProxyVOName =?= "dteam"; \
    set_requirements = (Arch == "X86_64") && (TARGET.OpSys == "LINUX"); \
    MaxJobs = 100; \
    MaxIdleJobs = 100; \
] \
[ \
    name = "condor_pool_ops"; \
    TargetUniverse = 5; \
    Requirements = target.x509UserProxyVOName =?= "ops"; \
    set_requirements = (Arch == "X86_64") && (TARGET.OpSys == "LINUX"); \
    MaxJobs = 100; \
    MaxIdleJobs = 100; \
] \
[ \
    name = "condor_pool_cms"; \
    TargetUniverse = 5; \
    Requirements = target.x509UserProxyVOName =?= "cms"; \
    set_requirements = (Arch == "X86_64") && (TARGET.OpSys == "LINUX"); \
    MaxJobs = 1280; \
    MaxIdleJobs = 1280; \
] \
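
As a rough illustration of how the three routes above partition incoming jobs, here is a small Python sketch that models only the Requirements test on x509UserProxyVOName. It is a toy model under stated assumptions, not the JobRouter's own matching code: ROUTES and pick_route are hypothetical names introduced for this example, and the sketch ignores set_requirements, MaxJobs and MaxIdleJobs. Because the three VO tests are mutually exclusive, at most one route can accept a given job.

```python
# Toy model of the route Requirements above; not the JobRouter's own logic.
# ROUTES and pick_route are hypothetical names used only for illustration.
ROUTES = [
    {"name": "condor_pool_dteam", "vo": "dteam"},
    {"name": "condor_pool_ops", "vo": "ops"},
    {"name": "condor_pool_cms", "vo": "cms"},
]


def pick_route(job_ad):
    """Return the name of the route whose VO test accepts job_ad, or None."""
    # A missing key stands in for an undefined attribute: `=?=` then
    # evaluates to false rather than undefined, so no route matches.
    vo = job_ad.get("x509UserProxyVOName")
    for route in ROUTES:
        # `=?=` compares strings case-sensitively, like Python's == on str.
        if vo == route["vo"]:
            return route["name"]
    return None


if __name__ == "__main__":
    print(pick_route({"x509UserProxyVOName": "cms"}))
    print(pick_route({"x509UserProxyVOName": "atlas"}))
    print(pick_route({}))
```

A job whose proxy carries no VO name, or a VO other than dteam, ops or cms, is accepted by none of these routes in this model.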


----------------------- Original Message -----------------------
From: "Stefano Dal Pra" <stefano.dalpra@xxxxxxxxxxxx>
To: htcondor-users <htcondor-users@xxxxxxxxxxx>
Date: 2018-09-18 22:12:29 GMT +0900 (ROK)
Subject: [HTCondor-users] condor-ce: troubleshooting and jobRouter

Hello,

I'm practicing with HTCondor-CE and need some help, as I'm not very
fluent at troubleshooting and configuration.

Test pilot jobs submitted by a CMS factory are failing a validation
shell script when running on the execute node.
Apparently, the reason is that no environment variable is passed to the job:

Environment = ""

I verified that the shell script succeeds if I submit it from the
condor-ce itself by adding
environment = "PATH=/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"
in the submit file.

However, if I submit the same job from an external machine, again no
environment is passed to the job on the exec node.
That seems to suggest that a few parameters are trimmed away. I think
the JobRouter is where such submission parameters might be altered,
but I'm not sure at all, and some simpler misconfiguration could
explain this problem.

A couple of questions:

1) For jobs I submit there are log files such as
/var/log/condor-ce/GridmanagerLog.dteam039
containing a line such as:

09/17/18 15:08:10 (D_ALWAYS:2) [4098033] GAHP[4098037] <-
'CONDOR_JOB_SUBMIT [SNIP] Environment\ =\
"PATH=/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"; [SNIP]

where I can see the submit file content.
However, there is no similar file for the cms user:
/var/log/condor-ce/GridmanagerLog.pilcms017

Is there a way to compare the job parameters "before" and "after" the
routing?
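
One possible way to make such a comparison is sketched below in Python. It assumes the job ad has been dumped in long form on both sides (for example with condor_ce_q -l on the CE and condor_q -l on the batch schedd) and saved as text. The helper names parse_long_ad and diff_ads are hypothetical, the two sample ads are made up for illustration, and values are compared as raw strings rather than evaluated as ClassAd expressions.

```python
def parse_long_ad(text):
    """Parse 'Attr = value' lines into a dict keyed by lowercased name."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep and name.strip():
            # ClassAd attribute names are case-insensitive, so normalize.
            ad[name.strip().lower()] = value.strip()
    return ad


def diff_ads(before, after):
    """Return {attr: (old, new)} for attributes that differ or are missing."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Made-up sample ads, standing in for saved long-form dumps.
CE_AD = '''Owner = "pilcms017"
Environment = "PATH=/bin:/usr/bin"
'''
ROUTED_AD = '''Owner = "pilcms017"
Environment = ""
'''

if __name__ == "__main__":
    changes = diff_ads(parse_long_ad(CE_AD), parse_long_ad(ROUTED_AD))
    for name, (old, new) in changes.items():
        print(f"{name}: {old} -> {new}")
```

An attribute that exists on only one side shows up with None on the other, which makes dropped or added attributes easy to spot.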

2) Does someone have a few examples of job routing configuration for a
WLCG-like HTCondor-CE?
Currently I'm looking at
https://opensciencegrid.org/docs/compute-element/job-router-recipes/ .
If the examples there are mostly adequate for a non-OSG CE, I can go on
and refer to those.

Thanks for any help, bye

Stefano

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


