Hi, I have an application compiled with condor_compile. I am trying to run it in standalone mode using: ./executable input -_condor_D_ALL
Then, from another shell, I send the checkpoint signal: kill -USR2 pid
But this is what I get:
..............................................
Got SIGUSR2
Saved signal state.
About to save file state
CondorFileTable::checkpoint
OPEN FILE TABLE:
  fd 0 logical name: default stdin offset: 0 dups: 1 open flags: 0x0 not currently bound to a url.
  fd 1 logical name: default stdout offset: 315 dups: 1 open flags: 0x1 url: fd:1 size: 315 opens: 1
  fd 2 logical name: default stderr offset: 0 dups: 1 open flags: 0x1 not currently bound to a url.
working dir = /home/yara/sbagchi/tislam/condorExperiments/spec_429.mcf
Done saving file state
About to update MyImage
Adding a DATA segment: start[0x659000], end [0x694cd000]
Image::AddSegment: name=[DATA], start=[659000], end=[694cd000], length=[0x68e74000], prot=[0xffffffff00000000]
Adding a STACK segment: start[0x7fffbfa5d000], end [0x7fffbfa66fff]
Image::AddSegment: name=[STACK], start=[7fffbfa5d000], end=[7fffbfa66fff], length=[0x9fff], prot=[0x0]
Pos: 1759986720
Pos: 1760027679
Size of ckpt image = 1760027679 bytes
About to write checkpoint
Image::Write(): fd -1 file_name ./mcf.ckpt
Checkpoint name is "./mcf.ckpt"
Tmp name is "./mcf.ckpt.tmp"
Wrote headers OK
Wrote all SegMaps OK
write(fd=3,core_loc=0x659000,len=0x68e74000)
I wrote 745472 bytes with write...
I wrote -1 bytes with write...
in SegMap::Write(): fd = 3, write_size=1759240192 errno=14, core_loc=70f000
Write() Segment[0] of type DATA -> FAILED errno = 14, nbytes = -1
Periodic Ckpt complete, doing a virtual restart...
About to restore file state
CondorFileTable::resume
working dir = /home/mcf
OPEN FILE TABLE:
  fd 0 logical name: default stdin offset: 0 dups: 1 open flags: 0x0 not currently bound to a url.
  fd 1 logical name: default stdout offset: 315 dups: 1 open flags: 0x1 not currently bound to a url.
  fd 2 logical name: default stderr offset: 0 dups: 1 open flags: 0x1 not currently bound to a url.
Done restoring file state
About to restore signal state
About to return to user code
..............................................
The debug output clearly shows that an error occurred, and only mcf.ckpt.tmp is generated. Any idea what errno=14 means? Could the size of the checkpoint be the reason?
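
For reference, here is a small Python sketch for decoding the errno value and cross-checking the numbers in the debug output. It is not part of Condor; the constant names and the describe_errno helper are made up for illustration, and the values are copied by hand from the log. On Linux, errno 14 is EFAULT ("Bad address"), which write() reports when the buffer it was given lies outside the process's accessible address space. The arithmetic also lines up with the log: the segment start plus the 745472 bytes already written gives the core_loc=70f000 where the second write failed, and the bytes left over equal write_size=1759240192. That seems to point at an unreadable region inside the range recorded as the DATA segment rather than at the checkpoint size as such, but that reading is only a guess.

```python
import errno
import os

# Values copied by hand from the debug output above.
SEG_START = 0x659000      # start of the DATA segment
SEG_END = 0x694CD000      # end of the DATA segment
FIRST_WRITE = 745472      # bytes written before write() returned -1
FAILED_ERRNO = 14         # errno reported by SegMap::Write()


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


seg_len = SEG_END - SEG_START        # total bytes the checkpoint tries to write
fail_addr = SEG_START + FIRST_WRITE  # address where the second write() began
remaining = seg_len - FIRST_WRITE    # bytes still unwritten at the failure

name, message = describe_errno(FAILED_ERRNO)
print(f"errno {FAILED_ERRNO}: {name} ({message})")
print(f"segment length : {seg_len:#x}")
print(f"failing address: {fail_addr:#x}")
print(f"bytes remaining: {remaining}")
```

Comparing the failing address against /proc/<pid>/maps of the running process (on Linux) would show whether that address is actually mapped and readable.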