
RE: [Condor-users] I wasted some CPU cycles ;-)



Hi,

If you use the standard condor_config from the Windows installer, you should
check MaxSuspendTime in that file. I'm not sure, but as far as I know this
time is set to 30 minutes or so. If the owner of the computer uses it for
longer than that, the suspended job is killed, so maybe you should increase
MaxSuspendTime.
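
A minimal sketch of that change, assuming the node still uses the default
policy expressions from the stock condor_config (where the MINUTE and HOUR
macros are defined and PREEMPT compares the time spent suspended against
MaxSuspendTime). The 48-hour value is only an illustration, picked to cover
a two-day job:

# In condor_config (or the local config file) on each execute machine.
# Assumption: the default PREEMPT expression still references MaxSuspendTime.
# Let a suspended job wait up to 48 hours before it is preempted.
MaxSuspendTime = 48 * $(HOUR)

After editing the file, running condor_reconfig on that machine should make
the daemons pick up the new value.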

Thomas 

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Miguel Dilaj
> Sent: Thursday, 7 July 2005 14:11
> To: 'Condor-Users Mail List'
> Subject: RE: [Condor-users] I wasted some CPU cycles ;-)
> 
> Hi Matthias,
> 
> Thanks a lot for your answer.
> Should I understand that there is no way to suspend when 
> there's user activity and continue later on Windows?
> I expected exactly that from the answers I entered in the 
> GUI, no migration, but suspend and resume later.
> Regards,
> 
> Miguel
> 
> 
> 
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of 
> matthias.m.roehm@xxxxxxxxxxxxxxxxxxx
> Sent: 07 July 2005 12:49
> To: condor-users@xxxxxxxxxxx
> Subject: Re: [Condor-users] I wasted some CPU cycles ;-)
> 
> 
> Hi Miguel,
> 
> Condor can't checkpoint on Windows systems, therefore the job 
> is killed and restarted (from the beginning) on another 
> machine. If you want your jobs to run all the time (even if 
> the machine is used by a user), use the following in your config file:
> 
> WANT_SUSPEND = FALSE
> WANT_VACATE = FALSE
> START = TRUE
> SUSPEND = FALSE
> PREEMPT = FALSE
> 
> With kind regards,
> 
> Matthias Röhm
> 
> =======================================================
> Matthias Röhm, DaimlerChrysler AG, Research Center Ulm, 
> Department for Data Mining Solutions, RMI/DM 89013 Ulm,  Germany
> 
> Phone:               +49 731 505 4864
> Email:               mailto:Matthias.M.Roehm@xxxxxxxxxxxxxxxxxxx
> =======================================================
> 
> condor-users-bounces@xxxxxxxxxxx wrote on 07.07.2005 13:00:44:
> 
> > Disclaimer: idiot here ;-)
> 
> > I've got a serious problem.
> 
> > I was running my jobs for the last few days, until I accumulated 2 days
> > of run time (the "normal" time for such a task to finish) and today I
> > decided to check the size of the file being generated.
> > This morning, after running overnight, the file was 44 MB... After 2
> > days of running it should have been close to the final size of 610 MB,
> > so that was my first shock.
> > Just checked again (the machine is currently in use by the Owner, so
> > Condor is not active) and the file is not there anymore.
> 
> > I suspected that this morning when I checked the file size... Instead
> > of being suspended to resume later, my jobs are being killed for some
> > reason. Being a new starter with Condor probably I missed something.
> 
> > A bit of background: the machines are all Windows (2K and XP), with
> > the central server on 2K. After a little struggling I got the jobs
> > running using this .sub:
> 
> > #
> > # Submit 4 jobs of rtgen.exe to Condor
> > Universe = vanilla
> > Executable = rtgen.exe
> > Arguments = ntlm alpha 1 7 $(Process) 9000 40000000 ncc
> > Initialdir = E:/
> > Transfer_input_files = libeay32.dll, charset.txt
> > Should_transfer_files = YES
> > When_to_transfer_output = ON_EXIT
> > Nice_user = True
> > Notification = Never
> > Getenv = False
> > Requirements = ( (OpSys == "WINNT50") || (OpSys == "WINNT51") )
> > # later I've to try
> > #Requirements = ( (OpSys == "WINNT50") || (OpSys == "WINNT51") ) && (VirtualMachineID == 1)
> > # and
> > #hold = True
> > Queue 4
> >
> > I'm pretty sure that my problem is not there, but in the condor_config
> > file on each node, most likely under Part 3, that I left exactly as
> > installed by the Windows GUI installer (I only modified bits in Parts 1
> > and 2, to make it work).
> 
> > During installation using the GUI, I chose to suspend and continue
> > later, no migration.
> > What do I have to modify in condor_config (in the clients only? Or
> > also the central server?) to ensure that a job that has to run for 2
> > days of CPU time, generating a file of 610 MB, is not killed when the
> > owner is using the machine?
> 
> > TIA!
> > Regards,
> 
> > Miguel
> 
> >
> >
> > **********************************************************************
> > DISCLAIMER:
> > This e-mail contains proprietary information, some or all of which may
> > be legally privileged. It is for the intended recipient only. If an
> > addressing or transmission error has misdirected this e-mail, please
> > notify the author by replying to this e-mail. If you are not the
> > intended recipient you may not use, disclose, distribute, copy, print
> > or rely on this e-mail.
> > **********************************************************************
> 
> 
> >
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 