Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Restart from checkpoint failing for HTCondor 8.4.1

Date: Thu, 05 Nov 2015 14:07:44 -0600
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Restart from checkpoint failing for HTCondor 8.4.1

On 11/4/2015 1:15 PM, Feldt, Andrew N. wrote:


11/02/15 11:18:33 (8.0) (2889688):Read: Opened "/var/lib/condor/spool/8/0/cluster8.proc0.subproc0" via file stream
11/02/15 11:18:33 (8.0) (2889688):Read: Read headers OK
11/02/15 11:18:33 (8.0) (2889688):Read: Read SegMap[0](DATA) OK
11/02/15 11:18:33 (8.0) (2889688):Read: Read SegMap[1](STACK) OK
11/02/15 11:18:33 (8.0) (2889688):Read: Read all SegMaps OK
11/02/15 11:18:33 (8.0) (2889688):Read: Found a DATA block, increasing heap from 0x887000 to 0x986000
11/02/15 11:18:33 (8.0) (2889688):Read: About to overwrite 1789952 bytes starting at 0x7d1000(DATA)
11/02/15 11:18:33 (8.0) (2889688):Reaped child status - pid 2889690 exited with status 0
11/02/15 11:18:33 (8.0) (2889688):Read: *** longjmp causes uninitialized stack frame ***: condor_exec.8.0 terminated

I think "longjmp causes uninitialized stack frame" is coming from GCC's fortify source compiler options.

So you are running universe=standard jobs on HTCondor v8.4.1 on RHEL6.7. Some questions -

- Is this failure on restart happening at your site for ALL standard universe jobs? Or just consistently for certain jobs? Or only occasionally? If the latter, ~how many jobs get stuck on restart - 5%, 50%, 90%, or?

- Where did you get your HTCondor binaries from? Options include RPMs downloaded from htcondor.org, RPMs from EPEL, self-compiled from source, or?


- Could you send along the output from condor_version?
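
A minimal sketch for gathering the details asked about above in one go, assuming a RHEL-style host where condor_version and rpm may or may not be on the PATH. The function name report_condor_install is hypothetical, used only for illustration:

```shell
# Hypothetical helper (not part of HTCondor): print the version and
# package details requested in the questions above.
report_condor_install() {
    # condor_version ships with HTCondor; guard in case it is not on PATH.
    if command -v condor_version >/dev/null 2>&1; then
        condor_version
    else
        echo "condor_version not found in PATH"
    fi

    # List installed RPMs with "condor" in the name, if rpm is available.
    if command -v rpm >/dev/null 2>&1; then
        rpm -qa | grep -i condor || echo "no condor RPMs installed"
    else
        echo "rpm not available on this host"
    fi
}

report_condor_install
```

The RPM listing only hints at where the binaries came from (it shows package names and versions); a self-compiled install would show no condor packages at all.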

thanks
Todd


