I also had to write a universal (Windows & Linux) wrapper script because (as far as I understood) it is impossible to use different executables (the "executable" directive) in a single parallel job.
If I recall correctly, the canonical approach here is to do something like the following:
executable = my_executable_for_$$(ARCH)
so that the executable has a different name on Windows than it does on Linux. I haven't tried this, though. :)
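To spell the idea out, a minimal submit-file sketch might look like the following. This is untested, and the file names are made up; the point is just that $$(OpSys) is substituted at match time with the OpSys attribute of the machine the job actually lands on, rather than at submit time:

```
# Hypothetical submit file (untested sketch).
# $$(OpSys) is replaced at match time with the execute machine's
# OpSys attribute, e.g. "LINUX" or "WINNT61", so each node runs
# the binary built for its own platform.
universe            = parallel
executable          = my_executable.$$(OpSys)
transfer_executable = true
machine_count       = 2
queue
```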
Indeed, it looks straightforward! I've tried this trick with the "$(OPSYS)" substitution; unfortunately, it does not seem to work. It simply expands to the OS of the host where the job is submitted. So I see that the job is running on Windows, but "$(OPSYS)" in the "executable" field expands to "LINUX".
Well, here comes the tricky part. I need to submit a job with dozens of processes, as in item 1 above, but one of these processes must run on the special node from item 2. I tried to tell this special node that it is also a "dedicated" one, but that does not seem to work. So I am stuck here. I suppose my question is the following: is it possible to submit a parallel job in such a way that one of its parallel processes flocks to a different pool?
Not as far as I know. You may, in this case, want to consider startd flocking instead -- have the special node report to each of the pools that need to be able to run jobs on it. (That is, add their collectors to its COLLECTOR_HOST list.) This will probably result in the special node being matched simultaneously in multiple pools, which can have confusing results. (It should work -- the first schedd to contact the startd will 'win' -- but it may lead to starvation, if one of the schedds is consistently faster/slower than the others.) However, since the special node will be in the pools, it will probably be accessible to parallel universe jobs.
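Concretely, the configuration change would be something like the fragment below, on the special node only. This is a sketch, not something I have deployed; the hostnames are placeholders for your two pools' central managers:

```
# In the special node's local HTCondor configuration
# (hostnames are hypothetical placeholders).
# The startd advertises itself to every collector in this list,
# so schedds in both pools can match jobs against this machine.
COLLECTOR_HOST = collector.pool-a.example.com, collector.pool-b.example.com
```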
Interesting idea, I shall try this. Where can I read about "startd flocking"? Is there some recipe? Probably I simply have not read the documentation carefully enough, but I cannot find a word about it.
To solve the previous item I tried condor_tail, and it does not seem to work at all. It simply hangs until the job finishes, then exits, reporting that there is no such job. No output is provided. I could not make it work and I do not know how to debug it. Any ideas?
Try it with a vanilla universe job first? I don't know if condor_tail is expected to work with parallel universe jobs.
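As a first pass at debugging, something along these lines might show whether condor_tail can reach the job at all. The flag names are from memory (check condor_tail -help), and the job id 123.0 is a placeholder:

```
# Why/where is the job (not) running?
condor_q -better-analyze 123.0

# Follow the job's stdout while it runs, keeping the connection open:
condor_tail -follow 123.0

# Re-run the tool with client-side debug output to see where it gets stuck:
condor_tail -debug 123.0
```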
I've tried -- no luck. Here is the simple submit file I used for this:
executable = wrapper.sh
arguments = ping -c444 127.0.0.1
universe = vanilla
requirements = OpSys == "LINUX"
How can I debug this?
All the best,
Alexander A. Prokhorov