
[HTCondor-users] Job Submit fails !



Hi all,

Occasionally, job submission fails or takes too long (it only succeeds after many retries).
I looked at the submission log, and the only problem seems to involve the job log files.

Output and error are written to the same file. Does condor_submit fail if it cannot open, read, or write the output file?
Or is the problem elsewhere? How can I track it down?

See the example below (the log file is full of lines like these):

01/24/14 14:54:07 Submitting Condor Node COMPO-2 job(s)...
01/24/14 14:54:07 submitting: condor_submit -a dag_node_name' '=' 'COMPO-2 -a +DAGManJobId' '=' '416727 -a DAGManJobId' '=' '416727 -a submit_event_notes' '=' 'DAG' 'Node:' 'COMPO-2 -a log' '=' '/net/raid81/COSMOS/SHOTS/EPISODE_105/S068/BP_105_068_060/SCRIPTS/EXEC_20140124_145237-remyn-work79.sta/exec.dag.nodes.log -a exe' '=' 'COMPO.csh -a node_name' '=' 'COMPO -a parent_env' '=' ' -a idx_name' '=' 'i -a idx' '=' '2 -a iter' '=' '2 -a next_job_start_delay' '=' '2 -a priority' '=' '-2 -a DAG_STATUS' '=' '2 -a FAILED_COUNT' '=' '1 -a +DAGParentNodeNames' '=' '"" /net/raid81/XXXXXX/SHOTS/EPISODE_105/S068/BP_105_068_060/SCRIPTS/EXEC_20140124_145237-XXXX-work79.sta/job.submit
01/24/14 14:54:07 From submit: Submitting job(s)
01/24/14 14:54:07 From submit: ERROR: Can't open "/net/raid81/XXXXX/SHOTS/EPISODE_105/S068/BP_105_068_060/LOG/remyn-COMPO.csh.2.out"  with flags 01101 (No such file or directory)
01/24/14 14:54:07 failed while reading from pipe.
01/24/14 14:54:07 Read so far: Submitting job(s)ERROR: Can't open "/net/raid81/XXXXX/SHOTS/EPISODE_105/S068/BP_105_068_060/LOG/remyn-COMPO.csh.2.out"  with flags 01101 (No such file or directory)
01/24/14 14:54:07 ERROR: submit attempt failed

Some of these lines come from condor_submit, I guess, but what about "failed while reading from pipe" and the line after it?
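
One way to read the "flags 01101 (No such file or directory)" part of the error is sketched below. It assumes the number is the raw Linux open(2) flags value printed in octal, in which case it decodes to O_WRONLY, O_CREAT and O_TRUNC. With O_CREAT set, ENOENT normally means that a directory component of the path (here it would be the LOG directory) was not visible to the submitting process at that moment. The helper names decode_open_flags and open_error are hypothetical and are not part of HTCondor.

```python
import errno
import os
import tempfile

# Linux open(2) flag values in octal. Assumption: the host that ran
# condor_submit uses these values and the log prints the raw flags.
LINUX_OPEN_FLAGS = [
    (0o100, "O_CREAT"),
    (0o200, "O_EXCL"),
    (0o1000, "O_TRUNC"),
    (0o2000, "O_APPEND"),
]
ACCESS_MODES = {0: "O_RDONLY", 1: "O_WRONLY", 2: "O_RDWR"}


def decode_open_flags(flags):
    """Return the symbolic names for a raw Linux open(2) flags value."""
    names = [ACCESS_MODES.get(flags & 0o3, "unknown access mode")]
    names += [name for bit, name in LINUX_OPEN_FLAGS if flags & bit]
    return names


def open_error(path):
    """Try to create/truncate path for writing.

    Returns the errno name on failure, or None if the open succeeded.
    Note that a successful call leaves an empty file behind.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


if __name__ == "__main__":
    print(decode_open_flags(0o1101))
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "LOG", "job.out")
        # Parent directory is missing, so the open is expected to fail.
        print(open_error(target))
        os.mkdir(os.path.join(tmp, "LOG"))
        # Same path after creating the directory.
        print(open_error(target))
```

Running the same kind of check against the real LOG directory from the submit host, at the time a submission fails, might show whether the directory is intermittently missing or not yet visible over the network mount.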


Regards
Renaud