On 02/19/2015 03:05 PM, Todd Tannenbaum wrote:

> So if your job requires some capability from the execute machine, e.g.
> it requires that some job hook be configured, then your job should
> explicitly state that in the job ad requirements expression.

Yes, and the manual sez to do that "add (<keyword>_HOOK_PREPARE_JOB =!= UNDEFINED)" and Greg says that doesn't actually work.

> ... Then in
> your setup, for instance, your jobs would only be matched to "your"
> startds and would not be matched to startds are remote sites where you
> are flocking.

Flocking was the first thing that came to mind. What about a heterogeneous pool? Are you saying I need to explicitly define an attr for each hook on each machine and then explicitly list each attr in the job requirements? How is that better than just listing the machine names in the requirements?

That aside, defining a hook results in ...

000 (82164.000.000) 02/19 16:05:36 Job submitted from host: <144.92.167.235:55862>
    DAG Node: update_1000
...
000 (82165.000.000) 02/19 16:05:37 Job submitted from host: <144.92.167.235:55862>
    DAG Node: update_10002
...
022 (82164.000.000) 02/19 16:05:37 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot1@xxxxxxxxxxxxxxxxxxx <144.92.167.241:43149>
...
024 (82164.000.000) 02/19 16:05:37 Job reconnection failed
    Job not found at execution machine
    Can not reconnect to slot1@xxxxxxxxxxxxxxxxxxx, rescheduling job
...
022 (82165.000.000) 02/19 16:05:37 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot2@xxxxxxxxxxxxxxxxxxx <144.92.167.241:43149>
...
024 (82165.000.000) 02/19 16:05:37 Job reconnection failed
    Job not found at execution machine
    Can not reconnect to slot2@xxxxxxxxxxxxxxxxxxx, rescheduling job
...
022 (82164.000.000) 02/19 16:05:44 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot1@xxxxxxxxxxxxxxxxxxx <144.92.167.241:43149>
...

and so on. That's with a do-nothing hook shell script:

    #!/bin/sh
    cat > /dev/null
    /bin/true

and submit file with

    +HookKeyword = "BLASTUPDATE"

So there seems to be more to it and I can't tell if it's my setup or what.

Thanks,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu