
Re: [HTCondor-users] Restart from checkpoint failing for HTCondor 8.4.1



> On Nov 11, 2015, at 2:55 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
> 
> On 11/10/2015 11:03 AM, Feldt, Andrew N. wrote:
> 
>> 
>> Todd,
>> 
>> We have now reverted to condor-8.2.10-345812 for our production
>> HTCondor pool.  This is allowing our jobs to properly vacate as
>> needed.  (This is from the htcondor-previous repo.)  I will be
>> interested in future updates to the 8.4 series which may address the
>> checkpoint-restart problem.
>> 
>> Andy
>> 
> 
> Hi Andy,
> 
> We think we now know what is happening and how to fix it.
> 
> I am guessing that your v8.4 attempt was using binaries from the RPM package?
> 
> Our thinking is that the v8.4 binaries contained in the tarball would work, but the v8.4 binaries in the RPM packages would fail (with respect to standard universe restart).  This is because our tarball binaries are built with cmake, and our RPM packages are built via rpmbuild calling out to cmake.  The issue is rpmbuild sneaks in a bunch of additional and undesired compiler flags.  We are working to fix this issue for the upcoming HTCondor v8.4.2 release. Follow progress and see details at:
>  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5382
> 
> Thank you for bringing this to our attention!!
> 
> Also, I am always curious how folks are using standard universe... could you share a brief description of the sort of jobs (i.e. what application, what scientific domain, etc) that are using standard universe at Univ of Oklahoma?
> 
> best regards
> Todd

Todd,

Yes, our 8.4 attempt used the RPM packages obtained from the “stable” repo.

Here, our primary current HTCondor users are astronomers (well, one cosmologist dominates).  The cosmology programs she runs go for 3-4 months at a time.  We are a Physics and Astronomy department and have had folks from all four of our primary research areas (Astrophysics, Atomic-Molecular-Optical, Condensed-Matter and High-Energy) submitting jobs over the years.  There is a separate HTCondor pool for the High Energy folks (which I don’t manage, so I can’t speak to what is happening there), so we don’t see that on our main dept. pool anymore.

Andy