Re: [HTCondor-users] DAGMan Errors



Hello Takis,

I have a couple of things to say here. First, yes, you can pass variables from the .dag file down to the job submit files by using a VARS line. Here is a link to the documentation for that.
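
As a minimal sketch of what that could look like for the EX1 node in the quoted message below: the JOB line keeps only the node name and submit file, and the values move to a VARS line. The values are the ones from the example JOB line; this assumes EX1.sub keeps referencing them as $(N), $(x), and so on.

```
# example.dag
JOB EX1 EX1.sub
VARS EX1 N="20" x="1.0" mg="0.05" D="10" l_0="0.5" lambda="100.0" acc="1e-08" ms="100" mem="1"
```

Each VARS value is a quoted string, and the macro names on the VARS line need to match the $(...) names used in the submit file.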

Regarding DAGMAN_USE_CONDOR_SUBMIT, it looks like you added an underscore at the beginning of the knob setting, which could be the reason it didn't work for submission:

    _CONDOR_DAGMAN_USE_CONDOR_SUBMIT=False condor_submit_dag -f Schwinger_Wilson_MPS.dag

Maybe give that one more try as:

    CONDOR_DAGMAN_USE_CONDOR_SUBMIT=False condor_submit_dag -f Schwinger_Wilson_MPS.dag

Finally, the authentication error does seem like the bug Marco linked earlier. It turns out that bug is actually older than 9.7 but is more prominent due to the introduction of DAGMan direct submit. We are currently working on patching that issue.

Hope this helps,
Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Takis Angelides <takis.angelides@xxxxxxxxx>
Sent: Monday, April 25, 2022 8:51 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] DAGMan Errors
 
Hey Marco,

The version I am using is:

$CondorVersion: 9.0.11 Mar 12 2022 BuildID: 578027 PackageID: 9.0.11-1 $
$CondorPlatform: x86_64_CentOS7 $

To avoid using inline submit, would it be possible to give numerical arguments to my jobs in the .dag file?

For example:

example.dag:

JOB EX1 EX1.sub N=20 x=1.0 mg=0.05 D=10 l_0=0.5 lambda=100.0 acc=1e-08 ms=100 mem=1

and 

EX1.sub:

environment = "JULIA_DEPOT_PATH='/lustre/fs23/group/nic/tangelides/.julia:$JULIA_DEPOT_PATH'"
executable = /lustre/fs23/group/nic/tangelides/julia-1.7.2/bin/julia
arguments = /afs/ifh.de/user/a/angeltak/Schwinger-Wilson-MPS/run_Schwinger_Wilson_dag.jl $(N) $(x) $(mg) $(D) $(l_0) $(lambda) $(acc) $(ms)
transfer_input_files = /afs/ifh.de/user/a/angeltak/Schwinger-Wilson-MPS/run_Schwinger_Wilson_dag.jl
request_memory = $(mem)G
max_retries = 5
should_transfer_files = IF_NEEDED
output = /afs/ifh.de/user/a/angeltak/logs/N_$(N)_x_$(x)_D_$(D)_mg_$(mg)_l0_$(l_0)_$(Cluster)_$(Process).out
error = /afs/ifh.de/user/a/angeltak/logs/N_$(N)_x_$(x)_D_$(D)_mg_$(mg)_l0_$(l_0)_$(Cluster)_$(Process).error
log = /afs/ifh.de/user/a/angeltak/logs/N_$(N)_x_$(x)_D_$(D)_mg_$(mg)_l0_$(l_0)_$(Cluster)_$(Process).log
queue

with

run_Schwinger_Wilson_dag.jl:

using LinearAlgebra
using Arpack
using BenchmarkTools
using Plots
using LaTeXStrings
using Test
using HDF5
include("MPO.jl")
include("variational_first_excited_state_MPS_algorithm.jl")

# ----------------------------------------------------------------------------------------------------------------------------------

# Generate data for entanglement entropy vs mass plot

N = parse(Int, ARGS[1])
x = parse(Float64, ARGS[2])
mg = parse(Float64, ARGS[3])
D = parse(Int64, ARGS[4])
l_0 = parse(Float64, ARGS[5])
lambda = parse(Float64, ARGS[6])
accuracy = parse(Float64, ARGS[7])
max_sweep_number = parse(Int64, ARGS[8])

generate_entropy_data(mg, x, N, D, accuracy, lambda, l_0, max_sweep_number)

# ----------------------------------------------------------------------------------------------------------------------------------

Kind regards,

Takis

On Mon, Apr 25, 2022 at 2:01 PM Marco van Zwetselaar <zwets@xxxxxxxxxx> wrote:
Hi Takis,

On 25/04/2022 14:07, Takis Angelides wrote:
> I did not get a syntax error. I tried what you suggested and got an
> error message: _CONDOR_DAGMAN_USE_CONDOR_SUBMIT=False: Command not
> found. from the command line. after
> submitting: _CONDOR_DAGMAN_USE_CONDOR_SUBMIT=False condor_submit_dag
> -f Schwinger_Wilson_MPS.dag in the command line.

I'm afraid I can't help you there, that command looks perfectly fine (at
least on Linux, I have no clue about Windows). However, see below.
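
One possible explanation, offered as an assumption because the thread does not say which shell was used: csh and tcsh do not accept the VAR=value prefix form and try to run the assignment as a command, which would produce a "Command not found." message like the one quoted above. The env utility sets a variable for a single command from any shell. The sketch below shows the mechanism without needing HTCondor installed; the commented line applies the same form to the command from this thread and relies on HTCondor's _CONDOR_ prefix convention for environment overrides of configuration knobs.

```shell
# env(1) places the variable in the environment of the one command it launches.
# This works the same way from sh, bash, csh and tcsh.
env _CONDOR_DAGMAN_USE_CONDOR_SUBMIT=False \
    sh -c 'echo "knob seen by child: $_CONDOR_DAGMAN_USE_CONDOR_SUBMIT"'

# Same form applied to the command from this thread (requires HTCondor):
# env _CONDOR_DAGMAN_USE_CONDOR_SUBMIT=False condor_submit_dag -f Schwinger_Wilson_MPS.dag
```

Running echo $SHELL or echo $0 shows which shell is in use, which would confirm or rule out this explanation.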

> I also tried adding to .dag the line CONFIG dagman.config and in this
> dagman.config file I have the line DAGMAN_USE_CONDOR_SUBMIT = False.
> This also failed to switch this configuration to False.

What version of HTCondor are you using? The DAGMAN_USE_CONDOR_SUBMIT
knob was at some point renamed DAGMAN_USE_DIRECT_SUBMIT (with inverse
meaning, and defaulting to True).

Given that the knob defaults to True, I suppose you shouldn't need to
set anything (as that would be equivalent to setting the other one to
False). However ...

> Do you also have an idea about the second error I am seeing about
> preauthentication failure?

Yes, this looks like it's caused by the same bug discussed in this
thread:
https://lists.cs.wisc.edu/archive/htcondor-users/2022-April/msg00031.shtml,
which occurs precisely when using direct submit (which became default in
9.7.0).

What's happening is that HTCondor incorrectly tries to obtain the system
credentials, while it is running as a user (it's doing a "direct
submission") and should be getting the user's credentials.

You could try working around the issue with
DAGMAN_USE_DIRECT_SUBMIT=False, but I suppose that will not work with
your inline job definitions.

Cheers
Marco
