
[HTCondor-users] checkpoints cause re-wrapping with condor_pid_ns_init



Hi Todd,

I am fairly sure this one is a bug. I have USE_PID_NAMESPACES = True set, and every time the starter restarts a user process after a checkpoint, it prepends another invocation of condor_pid_ns_init to the command line:

First time:

04/03/19 23:01:53 (pid:10486) Using wrapper /etc/condor/sugwg-job-wrapper.sh to exec /usr/libexec/condor/condor_pid_ns_init /home/daniel.finstad/opt/pisn/bin/pycbc_inference --processing-scheme cpu --nprocesses 24 --fake-strain-from-file H1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt L1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt V1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/AdVirgo.txt --asd-file H1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt L1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt V1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/AdVirgo.txt --fake-strain-seed 0 --pad-data 8 --strain-high-pass 5 --sample-rate 2048 --low-frequency-cutoff 10 --verbose --force --inj-seed 2345 --instruments H1 L1 V1 --gps-start-time 1126259454 --gps-end-time 1126259470 --channel-name V1:V1:LOSC-STRAIN H1:H1:LOSC-STRAIN L1:L1:LOSC-STRAIN --config-file pisn_inference_large_eoc.ini --fake-strain-seed V1:158 H1:156 L1:157 --injection-file H1L1V1-CREATE_INJECTIONS_2397-1126259454-16.hdf --seed 52 --output-file H1L1V1-INFERENCE_2397-1126259454-16.hdf

Next time with two of them:

04/04/19 03:00:03 (pid:10486) Using wrapper /etc/condor/sugwg-job-wrapper.sh to exec /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /home/daniel.finstad/opt/pisn/bin/pycbc_inference --processing-scheme cpu --nprocesses 24 --fake-strain-from-file H1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt L1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt V1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/AdVirgo.txt --asd-file H1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt L1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt V1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/AdVirgo.txt --fake-strain-seed 0 --pad-data 8 --strain-high-pass 5 --sample-rate 2048 --low-frequency-cutoff 10 --verbose --force --inj-seed 2345 --instruments H1 L1 V1 --gps-start-time 1126259454 --gps-end-time 1126259470 --channel-name V1:V1:LOSC-STRAIN H1:H1:LOSC-STRAIN L1:L1:LOSC-STRAIN --config-file pisn_inference_large_eoc.ini --fake-strain-seed V1:158 H1:156 L1:157 --injection-file H1L1V1-CREATE_INJECTIONS_2397-1126259454-16.hdf --seed 52 --output-file H1L1V1-INFERENCE_2397-1126259454-16.hdf

Eventually...

04/10/19 07:42:55 (pid:10486) Using wrapper /etc/condor/sugwg-job-wrapper.sh to exec /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /usr/libexec/condor/condor_pid_ns_init /home/daniel.finstad/opt/pisn/bin/pycbc_inference --processing-scheme cpu --nprocesses 24 --fake-strain-from-file H1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt L1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt 
V1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/AdVirgo.txt --asd-file H1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt L1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/aLIGO_design.txt V1:/home/daniel.finstad/projects/pycbc-pisn-paper/data/asd_june_2016/AdVirgo.txt --fake-strain-seed 0 --pad-data 8 --strain-high-pass 5 --sample-rate 2048 --low-frequency-cutoff 10 --verbose --force --inj-seed 2345 --instruments H1 L1 V1 --gps-start-time 1126259454 --gps-end-time 1126259470 --channel-name V1:V1:LOSC-STRAIN H1:H1:LOSC-STRAIN L1:L1:LOSC-STRAIN --config-file pisn_inference_large_eoc.ini --fake-strain-seed V1:158 H1:156 L1:157 --injection-file H1L1V1-CREATE_INJECTIONS_2397-1126259454-16.hdf --seed 52 --output-file H1L1V1-INFERENCE_2397-1126259454-16.hdf
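
In case it helps anyone check whether their own jobs are affected, here is a minimal sketch that counts how many times condor_pid_ns_init is repeated at the front of each "Using wrapper ... to exec ..." line. It assumes the log lines look like the excerpts above (timestamp first, wrapper path as installed here); the function names count_wrappers and report are just illustrative, not part of HTCondor.

```python
import re

# Wrapper path as it appears in the log excerpts above; adjust if yours differs.
WRAPPER = "/usr/libexec/condor/condor_pid_ns_init"

# Matches the starter message quoted above and captures the exec'd command line.
EXEC_RE = re.compile(r"Using wrapper \S+ to exec (.*)")


def count_wrappers(line, wrapper=WRAPPER):
    """Count consecutive copies of `wrapper` at the start of the exec'd command.

    Returns None if the line is not a "Using wrapper ... to exec ..." entry.
    """
    match = EXEC_RE.search(line)
    if match is None:
        return None
    count = 0
    for token in match.group(1).split():
        if token != wrapper:
            break
        count += 1
    return count


def report(lines):
    """Return (timestamp, wrapper_count) pairs for every matching log line.

    Assumes each line starts with a 17-character "MM/DD/YY HH:MM:SS" timestamp,
    as in the excerpts above.
    """
    results = []
    for line in lines:
        n = count_wrappers(line)
        if n is not None:
            results.append((line[:17], n))
    return results
```

Calling report(open(path)) on a starter log (the path depends on your configuration) should give a count of 1 for a healthy job; a count that grows by one after each checkpoint restart is the behaviour described above.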

Cheers,
Duncan.

-- 

Duncan Brown                              Room 263-1, Physics Department
Charles Brightman Professor of Physics     Syracuse University, NY 13244
http://dabrown.expressions.syr.edu                   Phone: 315 443 5993