
Re: [HTCondor-users] Submitting to one of several independent pools



Michael, thank you for your reply.
 
Following your pointer to the "-name" option, I tried both the -name and -remote options of condor_submit, and both work fine. Unfortunately, I use DAG jobs, and I cannot get them to run correctly with the "-r" option of condor_submit_dag (AFAIK it is equivalent to "condor_submit -remote", but there is no equivalent of "condor_submit -name" for DAGs, right?).
 
I am submitting a simple a.dag file, but the DAG job gets stuck and never runs. I find the following in the a.dag.dagman.out file in the spool directory:
 
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) DAGMAN_LOG_ON_NFS_IS_ERROR setting: False
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Default node log file is: <C:\condor\spool\88\0\cluster88.proc0.subproc0\.\a.dag.nodes.log>
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) DAG Lockfile will be written to a.dag.lock
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) DAG Input file is a.dag
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Parsing 1 dagfiles
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Parsing a.dag ...
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) ERROR: Could not open file a.dag for input (cwd) (errno 2, No such file or directory)
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Removing any/all submitted HTCondor jobs...
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Running: C:\condor\bin\condor_rm.exe -const DAGManJobId' '=?=' '88
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) Warning: failure: C:\condor\bin\condor_rm.exe -const DAGManJobId' '=?=' '88
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS)  (my_pclose() returned 1 (errno 2, No such file or directory))
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) ERROR: Warning is fatal error because of DAGMAN_USE_STRICT setting
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) Aborting DAG...
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) Writing Rescue DAG to a.dag.rescue002...
 
The a.dag file certainly has not been copied into that directory.
In the a.dag.dagman.log I am also getting this:
 
        (0) Abnormal termination (signal -1073741819)
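 
As a side note on reading that number: Windows exit statuses are unsigned 32-bit values, and a negative "signal" like this one is usually the same value printed as a signed integer. Below is a minimal Python sketch of the conversion. The function name is made up for illustration, and treating the value as a Windows status code is an assumption about how it was logged.

```python
def to_unsigned32_hex(code: int) -> str:
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement unsigned 32-bit representation.
    return f"0x{code & 0xFFFFFFFF:08X}"


print(to_unsigned32_hex(-1073741819))  # prints 0xC0000005
```

0xC0000005 is the Windows STATUS_ACCESS_VIOLATION code, which would suggest the process crashed rather than exited cleanly.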
 
Any idea on how to fix this?
 
Thanks
Oscar
 
 
-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On behalf of Michael Pelletier
Sent: Tuesday, September 25, 2018 16:43
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Submitting to one of several independent pools
 
Oscar,
 
The "-name" option to condor_submit is what you're looking for:
 
       -name schedd_name
 
          Submit to the specified condor_schedd. Use this option to submit to
          a condor_schedd other than the default local one. schedd_name is
          the value of the Name ClassAd attribute on the machine where the
          condor_schedd daemon runs.
 
You would set up the workstation with a default scheduler, probably the production one, and then to submit for test you'd add the "-name" option to the submission to specify the hostname of the test pool's schedd.
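 
If the submission is wrapped in a script, the switch between production and test could look roughly like the Python sketch below. It only builds the command line and does not run it. The helper name, the submit file name job.sub, and the host name test-schedd.example.com are placeholders invented for illustration; only the "-name" option itself comes from the manual excerpt above.

```python
import shlex
from typing import List, Optional


def build_submit_command(submit_file: str,
                         schedd_name: Optional[str] = None) -> List[str]:
    """Build a condor_submit command line, optionally targeting a named schedd."""
    cmd = ["condor_submit"]
    if schedd_name is not None:
        # -name selects a condor_schedd other than the default local one.
        cmd += ["-name", schedd_name]
    cmd.append(submit_file)
    return cmd


# Production: rely on the workstation's default scheduler.
print(shlex.join(build_submit_command("job.sub")))
# Test: name the test pool's schedd explicitly (hypothetical host name).
print(shlex.join(build_submit_command("job.sub", "test-schedd.example.com")))
```

The resulting list can be handed to subprocess.run() as-is once the names are replaced with real ones.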
 
If you want to avoid the command-line option while testing, so that you don't have to change options when going from test to production, you can set the _CONDOR_SCHEDD_NAME environment variable, which overrides the default scheduler set in the workstation's configuration file.
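 
As a rough sketch of the environment-variable approach, a Python wrapper could prepare a modified environment for the submit tools like this. The helper name and the host name test-schedd.example.com are hypothetical; the only piece taken from the advice above is the _CONDOR_SCHEDD_NAME variable.

```python
import os
from typing import Dict, Mapping, Optional


def env_for_schedd(schedd_name: Optional[str],
                   base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with _CONDOR_SCHEDD_NAME set or cleared."""
    env = dict(os.environ if base is None else base)
    if schedd_name is None:
        # No override: fall back to the configuration file's default scheduler.
        env.pop("_CONDOR_SCHEDD_NAME", None)
    else:
        env["_CONDOR_SCHEDD_NAME"] = schedd_name
    return env


# Hypothetical test-pool schedd; the base mapping keeps the example self-contained.
test_env = env_for_schedd("test-schedd.example.com", base={"PATH": "/usr/bin"})
print(test_env["_CONDOR_SCHEDD_NAME"])  # prints test-schedd.example.com
```

The returned dictionary can be passed as the env argument of subprocess.run(), so the same command works unchanged for both test and production.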
 
Michael V. Pelletier
Information Technology
Digital Transformation & Innovation
Integrated Defense Systems
Raytheon Company
 