
Re: [HTCondor-users] Submitting to one of several independent pools



Hi Oscar,

The condor_submit_dag -r option assumes that all of the necessary files will be visible on the remote machine.

Here's an excerpt from the manual page for condor_submit_dag on the '-r' flag that suggests something to try:

<...> Note that this option does not currently specify input files for condor_dagman, nor the individual nodes to be taken along! It is assumed that any necessary files will be present on the remote computer, possibly via a shared file system between the local computer and the remote computer. <...> If other options are desired, including transfer of other input files, consider using the -no_submit option, modifying the resulting submit file for specific needs, and then using condor_submit on that.

We use a shared file system so I haven't run into this personally, but hopefully that helps.
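
The "modify the resulting submit file" step from the manual excerpt could be scripted roughly as follows. This is only a sketch under assumptions: it presumes the submit description written by "condor_submit_dag -no_submit a.dag" is plain text with a "queue" statement, the helper name add_input_transfer and the file node_a.sub are made up for illustration, and which files really need to travel depends on the DAG. Check whether the generated file already sets any of these commands before adding them.

```python
def add_input_transfer(submit_text, input_files):
    """Return submit_text with file-transfer commands placed before the first queue statement.

    Hypothetical helper: the three submit commands are standard HTCondor
    file-transfer commands, but whether they suffice for a remotely
    submitted DAG is an assumption that needs testing.
    """
    transfer_lines = [
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "transfer_input_files = " + ", ".join(input_files),
    ]
    out, inserted = [], False
    for line in submit_text.splitlines():
        words = line.strip().lower().split()
        # Insert just before the first line whose first word is "queue".
        if not inserted and words and words[0] == "queue":
            out.extend(transfer_lines)
            inserted = True
        out.append(line)
    if not inserted:
        # No queue statement found: append the commands at the end.
        out.extend(transfer_lines)
    return "\n".join(out) + "\n"


# Stand-in text, not the real contents of a generated DAGMan submit file.
example = "universe = scheduler\nqueue\n"
print(add_input_transfer(example, ["a.dag", "node_a.sub"]))
```

The edited text would then be written back to the generated submit file and passed to condor_submit with the -remote option, as the manual excerpt suggests.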

Good luck,
Collin

On Tue, Oct 2, 2018 at 3:47 AM Laborda Sanchez, Oscar (Volkswagen Group Services) <extern.Oscar.Laborda@xxxxxxx> wrote:
Michael, thank you for your reply.

Following your pointer to the "-name" option, I have been trying both the -name and -remote options in condor_submit and they are working just fine. Unfortunately I use DAG jobs and I cannot get them to run correctly with option "-r" (AFAIK, it is equivalent to "condor_submit -remote", but there is no equivalent to "condor_submit -name" for DAG, right?)

I am submitting a simple a.dag file, but the DAG job just gets stuck and never runs, and I find the following in the a.dag.dagman.out file in the spool dir:

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) DAGMAN_LOG_ON_NFS_IS_ERROR setting: False
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Default node log file is: <C:\condor\spool\88\0\cluster88.proc0.subproc0\.\a.dag.nodes.log>
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) DAG Lockfile will be written to a.dag.lock
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) DAG Input file is a.dag
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Parsing 1 dagfiles
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Parsing a.dag ...
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) ERROR: Could not open file a.dag for input (cwd) (errno 2, No such file or directory)
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Removing any/all submitted HTCondor jobs...
10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Running: C:\condor\bin\condor_rm.exe -const DAGManJobId' '=?=' '88
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) Warning: failure: C:\condor\bin\condor_rm.exe -const DAGManJobId' '=?=' '88
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS)  (my_pclose() returned 1 (errno 2, No such file or directory))
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) ERROR: Warning is fatal error because of DAGMAN_USE_STRICT setting
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) Aborting DAG...
10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) Writing Rescue DAG to a.dag.rescue002...

The a.dag file certainly has not been copied into that directory.
In the a.dag.dagman.log I am also getting this:

        (0) Abnormal termination (signal -1073741819)

Any idea on how to fix this?

Thanks
Oscar

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On behalf of Michael Pelletier
Sent: Tuesday, September 25, 2018 16:43
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Submitting to one of several independent pools

Oscar,

The "-name" option to condor_submit is what you're looking for:

       -name schedd_name

          Submit to the specified condor_schedd. Use this option to submit to
          a condor_schedd other than the default local one. schedd_name is
          the value of the Name ClassAd attribute on the machine where the
          condor_schedd daemon runs.

You would set up the workstation with a default scheduler, probably the production one, and then to submit for test you'd add the "-name" option to the submission to specify the hostname of the test pool's schedd.

If you want to avoid the need for the command line option while testing, so you don't have to change options going from test to production, you can set the _CONDOR_SCHEDD_NAME environment variable to override the default scheduler set in the workstation's configuration file.
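
The two approaches above could be wrapped in a small script along these lines. This is a hedged sketch, not a tested recipe: the helper names, job.sub and test-schedd.example.com are hypothetical, and the environment variable name is taken from the message above rather than verified independently.

```python
import os


def submit_command(submit_file, schedd_name=None):
    """Build a condor_submit argument list, adding -name only when a schedd is given."""
    argv = ["condor_submit"]
    if schedd_name:
        argv += ["-name", schedd_name]
    argv.append(submit_file)
    return argv


def env_for_schedd(schedd_name, base_env=None):
    """Return a copy of the environment with _CONDOR_SCHEDD_NAME set.

    The variable name follows the message above; treat it as an assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_CONDOR_SCHEDD_NAME"] = schedd_name
    return env


# Hypothetical names, for illustration only.
print(submit_command("job.sub", "test-schedd.example.com"))
print(submit_command("job.sub"))
```

Either result can be handed to subprocess.run, for example subprocess.run(submit_command("job.sub"), env=env_for_schedd("test-schedd.example.com"), check=True), so the same script targets production by default and the test pool when asked.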

Michael V. Pelletier
Information Technology
Digital Transformation & Innovation
Integrated Defense Systems
Raytheon Company
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


--
Collin Mehring | PE-JoSE - Software Engineer