JOB_ROUTER_REMOTE = $(JOB_ROUTER)
JOB_ROUTER_REMOTE_ARGS = -local-name JOB_ROUTER_REMOTE
JOB_ROUTER_REMOTE_LOG = $(LOG)/RemoteRouterLog
JOB_ROUTER_REMOTE_ENVIRONMENT = "_CONDOR_JOB_ROUTER_LOG=$(LOG)/RemoteRouterLog _CONDOR_JOB_ROUTER_LOCK=$(LOCK)/RemoteRouterLock _CONDOR_ROUTER_NAME=RemoteRouter"
DAEMON_LIST = $(DAEMON_LIST), JOB_ROUTER_REMOTE
JOB_ROUTER_POLLING_PERIOD = 10
JOB_ROUTER_REMOTE.JOB_ROUTER_ENTRIES = \
name = "RemoteRouteVanilla"; \
requirements = ( target.INPUT_FILES is undefined && target.JobUniverse is 5 && target.JobWasRouted isnt True && target.WantDocker is undefined && target.RouteMeToCentral is True ); \
GridResource = "condor sg03 sg03"; \
set_remote_jobuniverse = 5; \
delete_RouteMeToCentral = True; \
The job reaches pool2 (sg03), appears in condor_q for a few seconds, and starts running. Immediately after starting, it stops and disappears from condor_q. With condor_history I see, among others, the following ClassAd attributes:
SubmitterGlobalJobId = "sg02#428.0#1532950161"
Iwd = "/var/lib/condor/spool/9025/0/cluster259025.proc0.subproc0"
RouteName = "RemoteRouteVanilla"
GlobalJobId = "sg03#259025.0#1532950179"
LastRemoteHost = "slot1@sg04"
StartdPrincipal = "execute-side@matchsession/xxx.xxx.xxx.148"
ReleaseReason = "Data files spooled"
RemoveReason = "JobRouter orphan (by user condor)"
LastPublicClaimId = "<xxx.xxx.xxx.148:9618>#1532078126#3103#..."
RoutedFromJobId = "427.0"
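In case it helps anyone reproduce this, here is a minimal sketch of how attributes like the ones above could be filtered out of the text printed by `condor_history -long`. The helper name `parse_classad_text` and the `SAMPLE` text are illustrative assumptions, not part of HTCondor; the parser only handles simple one-line `Attr = value` pairs and does not evaluate ClassAd expressions.

```python
import re

# Hypothetical helper (not an HTCondor API): matches one "Attr = value" line
# as it appears in long-format ClassAd output.
ATTR_LINE = re.compile(r'^\s*([A-Za-z_]\w*)\s*=\s*(.*?)\s*$')


def parse_classad_text(text):
    """Return a dict of attribute name -> raw value string.

    Surrounding double quotes are stripped from string values; everything
    else (numbers, expressions) is kept as the literal text.
    """
    attrs = {}
    for line in text.splitlines():
        match = ATTR_LINE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not "Attr = value"
        name, value = match.groups()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        attrs[name] = value
    return attrs


# Assumed sample input, copied from the attributes shown above.
SAMPLE = '''\
RouteName = "RemoteRouteVanilla"
RemoveReason = "JobRouter orphan (by user condor)"
RoutedFromJobId = "427.0"
'''

if __name__ == "__main__":
    ad = parse_classad_text(SAMPLE)
    for key in ("RouteName", "RoutedFromJobId", "RemoveReason"):
        print(f"{key}: {ad.get(key, '<missing>')}")
```

In practice the input would come from saving the `condor_history -long` output for the routed job on sg03 to a file and passing its contents to the function.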
I know the job started running because my executable touches a file in a specific location under /tmp, and that file appeared on the execution machine. However, as mentioned above, the job stops running after a few seconds. On my submit machine, the only log entry I get is that the job was submitted.
Do you have any idea what is going wrong? The HTCondor version is 8.6.5 on all machines.