
Re: [HTCondor-users] prepare job hook anyone?



On 02/19/2015 03:05 PM, Todd Tannenbaum wrote:

> So if your job requires some capability from the execute machine, e.g.
> it requires that some job hook be configured, then your job should
> explicitly state that in the job ad requirements expression.

Yes, and the manual says to do just that ("add (<keyword>_HOOK_PREPARE_JOB
=!= UNDEFINED)"), but Greg says that doesn't actually work.

> ...  Then in
> your setup, for instance, your jobs would only be matched to "your"
> startds and would not be matched to startds at remote sites where you
> are flocking.

Flocking was the first thing that came to mind. What about a
heterogeneous pool? Are you saying I need to explicitly define an attr
for each hook on each machine and then explicitly list each attr in the
job requirements? How is that better than just listing the machine names
in the requirements?
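
For concreteness, a sketch of what that per-machine approach would
presumably look like. HAS_BLASTUPDATE_HOOK is a made-up attribute name
used only for illustration (it is not from the manual), and this assumes
the attribute is advertised through STARTD_ATTRS in each execute node's
local config:

```
# Local config on each execute machine that has the hook configured.
# HAS_BLASTUPDATE_HOOK is a hypothetical name, chosen for this example.
HAS_BLASTUPDATE_HOOK = True
STARTD_ATTRS = $(STARTD_ATTRS) HAS_BLASTUPDATE_HOOK

# Submit file: match only machines that advertise the attribute.
# =?= evaluates to False (not UNDEFINED) where the attribute is missing.
+HookKeyword = "BLASTUPDATE"
requirements = (HAS_BLASTUPDATE_HOOK =?= True)
```

That is one extra attribute per hook per machine, plus one extra clause
per hook in every submit file.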

That aside, defining a hook results in

...
000 (82164.000.000) 02/19 16:05:36 Job submitted from host: <144.92.167.235:55862>
    DAG Node: update_1000
...
000 (82165.000.000) 02/19 16:05:37 Job submitted from host: <144.92.167.235:55862>
    DAG Node: update_10002
...
022 (82164.000.000) 02/19 16:05:37 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot1@xxxxxxxxxxxxxxxxxxx <144.92.167.241:43149>
...
024 (82164.000.000) 02/19 16:05:37 Job reconnection failed
    Job not found at execution machine
    Can not reconnect to slot1@xxxxxxxxxxxxxxxxxxx, rescheduling job
...
022 (82165.000.000) 02/19 16:05:37 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot2@xxxxxxxxxxxxxxxxxxx <144.92.167.241:43149>
...
024 (82165.000.000) 02/19 16:05:37 Job reconnection failed
    Job not found at execution machine
    Can not reconnect to slot2@xxxxxxxxxxxxxxxxxxx, rescheduling job
...
022 (82164.000.000) 02/19 16:05:44 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot1@xxxxxxxxxxxxxxxxxxx <144.92.167.241:43149>
...


and so on. That's with a do-nothing hook shell script:
#!/bin/sh
cat > /dev/null
/bin/true

and submit file with
+HookKeyword = "BLASTUPDATE"

So there seems to be more to it and I can't tell if it's my setup or what.

Thanks,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
