[HTCondor-users] shim_dmtcp problems with command arguments.



We have a small Condor pool in Electrical Engineering here at UW Madison. The users want to be able to use checkpointing, and I had high hopes for DMTCP, but command arguments aren't working.

For testing, I have a submit file that just runs uname under DMTCP. When the command is /sbin/uname with no arguments, the job returns .err and .out files with the expected content.

When the command is /sbin/uname -a, no such files are created. The uname -a version of the submit file is below.

We are running Condor 7.4.4 and shim_dmtcp version 0.4.

Thanks
Russ Poyner

universe = vanilla
executable = /mnt/condor/bin/shim_dmtcp

###############################################################################
# Argument    Meaning
#----------------------
# --log       log file name for actions in shim_dmtcp script, if n/a use /dev/null
# --stdin     stdin file, if n/a use /dev/null
# --stdout    stdout file, if n/a use /dev/null
# --stderr    stderr file, if n/a use /dev/null
# --ckptint   checkpointing interval in seconds
# 1           the executable name you should have transferred in
# 2+          arguments to the executable
###############################################################################
arguments = --log shim_dmtcp.$(CLUSTER).$(PROCESS).log --stdout \
uname_job.$(CLUSTER).$(PROCESS).out --stderr uname_job.$(CLUSTER).$(PROCESS).err \
--ckptint 1800 /bin/uname -a
requirements = (Machine == "<ahost>.ece.wisc.edu")

###############################################################################
# Enable file transfer. Here is where you "mixin" the user's input and
# output files along with what is needed for DMTCP. Don't forget to transfer
# the actual executable along.
###############################################################################
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_input_files = /usr/bin/dmtcp_checkpoint, \
/usr/bin/dmtcp_command, \
/usr/bin/dmtcp_coordinator, \
/usr/bin/dmtcp_restart, \
/usr/lib/dmtcp/dmtcphijack.so, \
/mnt/condor_pharm/dmtcp-1.2.6/mtcp/libmtcp.so, \
/usr/lib/libmtcp.so.1, \
/mnt/condor_pharm/dmtcp-1.2.6/mtcp/mtcp_restart

###############################################################################
# Set up various environment variables. If you need to specify more, mix them
# in here.
###############################################################################
environment=DMTCP_TMPDIR=./;JALIB_STDERR_PATH=/dev/null; \
PATH=/bin:/usr/bin:/mnt/condor/bin:$(PATH); \
DMTCP_PREFIX_ID=$(CLUSTER)_$(PROCESS); \
DMTCP_BIN=/usr/bin/; \
DMTCP_LIB=/mnt/condor_pharm/dmtcp-1.2.6/mtcp/;

###############################################################################
# SIGINT is our soft checkpointing signal
###############################################################################
kill_sig = 2

###############################################################################
# Output and log files for the shim process which performs the work.
###############################################################################
output = shim.$(CLUSTER).$(PROCESS).out
error = shim.$(CLUSTER).$(PROCESS).err
log = shim.$(CLUSTER).$(PROCESS).log
Notification = Never
queue 1
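
For reference, the argument layout described in the submit file comments (shim options first, then the executable, then its arguments) can be sketched as below. This is a hypothetical Python illustration, not shim_dmtcp's actual code: the function name split_shim_args and the example file names are assumptions made for the sketch. It only shows how the arguments line above would be expected to split if the shim follows the documented layout.

```python
# Hypothetical illustration only; this is NOT shim_dmtcp's actual code.
# Option names are taken from the "Argument / Meaning" comment in the submit file.
SHIM_OPTIONS = {"--log", "--stdin", "--stdout", "--stderr", "--ckptint"}


def split_shim_args(argv):
    """Split argv into (shim options, executable, executable arguments)."""
    options = {}
    i = 0
    # Consume the leading "--option value" pairs that belong to the shim.
    while i < len(argv) and argv[i] in SHIM_OPTIONS:
        if i + 1 >= len(argv):
            raise ValueError("option %s is missing its value" % argv[i])
        options[argv[i]] = argv[i + 1]
        i += 2
    if i >= len(argv):
        raise ValueError("no executable given")
    # The first word that is not a shim option is the executable;
    # everything after it is passed through to that executable untouched.
    return options, argv[i], argv[i + 1:]


if __name__ == "__main__":
    # Example values stand in for $(CLUSTER) and $(PROCESS).
    line = ("--log shim_dmtcp.1.0.log --stdout uname_job.1.0.out "
            "--stderr uname_job.1.0.err --ckptint 1800 /bin/uname -a")
    opts, exe, exe_args = split_shim_args(line.split())
    print(exe, exe_args)
```

Under this layout, "-a" should reach uname as its only argument. If the shim instead treats "-a" as one of its own options, that would be one possible explanation for the missing .out and .err files, and the shim's --log file is the place to check.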