
Re: [HTCondor-users] worker node job directory names?



Hi Jaime,
that "trick" appears to have fixed the problem indeed, cheers!

________________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Jaime Frey [jfrey@xxxxxxxxxxx]
Sent: 07 July 2020 23:01
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] worker node job directory names?

> On Jul 7, 2020, at 2:37 PM, Maarten Litmaath <Maarten.Litmaath@xxxxxxx> wrote:
>
> Dear HTCondor users,
> there is a new site trying to get HTCondor CE + batch system to work
> for the ALICE LHC experiment.  So far they seem to be the only site
> where the job directory names have a structure like in this example:
>
>    /users/condor/spool/715/0/cluster715.proc0.subproc0/\
>    home_*_${CE}_9619_${CE}#716.0#1594078077/
>
> The presence of those '#' characters is problematic for legacy SW
> that cannot handle such paths, which the site got by default.
>
> How may the admins configure HTCondor to avoid such characters
> being used in job directories?  I looked at all occurrences of the
> words "directory" or "scratch" in the admin guide, to no avail…


Those directory names are part of an attempt to submit from HTCondor to other batch systems in non-CE environments.
In particular, to handle a user submitting many jobs with the same working directory, we create a temporary subdirectory in which to run each job. To ensure each job gets a different subdirectory that HTCondor can clean up after any errors, we use the GlobalJobId attribute from HTCondor as part of the name. That’s where the ‘#’ characters are coming from.
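
As a rough illustration only, here is a minimal Python sketch of how a GlobalJobId-style value could be made path-safe. It is not HTCondor's actual code: the function name sanitize_job_id, the allowed character set, and the host name ce.example.org are assumptions chosen for the example. The '#'-separated layout follows the path quoted above.

```python
import re


def sanitize_job_id(global_job_id: str) -> str:
    """Hypothetical helper: replace every character outside
    letters, digits, '.', '_' and '-' with an underscore."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", global_job_id)


# Example value shaped like the one in the quoted path:
# <schedd name>#<cluster>.<proc>#<timestamp>
example = "ce.example.org#716.0#1594078077"
print(sanitize_job_id(example))  # prints ce.example.org_716.0_1594078077
```

A replacement like this keeps the cluster, proc, and timestamp fields readable, so the directory name stays traceable to its job while no longer containing '#'.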

We should sanitize those values, and I will ensure future releases do so.
As an immediate work-around, you can disable the logic that creates the unique, GlobalJobId-based subdirectory by adding the following line to the Job Router rules:
  Set_Remote_JobDirectory=Undefined

For example, in the standard CE configuration files, you would set JOB_ROUTER_ENTRIES like so for Slurm:

JOB_ROUTER_ENTRIES @=jre
[
  GridResource = "batch slurm";
  TargetUniverse = 9;
  name = "Local_Slurm";
  Set_Remote_JobDirectory=Undefined
]
@jre

 - Jaime

