Re: [Condor-users] I wasted some CPU cycles ;-)



Hi all,

Well... I've set up:

	WANT_SUSPEND	= TRUE
	WANT_VACATE		= FALSE
	PREEMPT		= FALSE
	KILL			= FALSE

(and also CPU=1, which works) on all nodes, including the central server. All
nodes are Windows machines running Condor 6.6.10.

After that I ran condor_reconfig -full -all, followed by condor_restart -all.
But my processes still go to Idle (instead of Suspended) when the
user returns and starts typing.
To be fair: the processes briefly go to the Suspended state, but later they
are Idle again.

I want my processes to be suspended when the user returns, and to continue
later when the machine is free. No migration, no restarting.

Again... Any ideas what I am doing wrong, or where to look to investigate
further?
TIA!!
Regards,

Miguel



-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Miguel Dilaj
Sent: 07 July 2005 14:13
To: 'Condor-Users Mail List'
Subject: RE: [Condor-users] I wasted some CPU cycles ;-)


OK... I think that I'm starting to understand this ;-)

What I have (the default) is:

	WANT_SUSPEND = TRUE
	WANT_VACATE = FALSE
	START			= $(UWCS_START)
	SUSPEND			= $(UWCS_SUSPEND)
	CONTINUE		= $(UWCS_CONTINUE)
	PREEMPT			= $(UWCS_PREEMPT)
	KILL			= $(UWCS_KILL)
	PERIODIC_CHECKPOINT	= $(UWCS_PERIODIC_CHECKPOINT)
	PREEMPTION_REQUIREMENTS	= $(UWCS_PREEMPTION_REQUIREMENTS)
	PREEMPTION_RANK		= $(UWCS_PREEMPTION_RANK)
	NEGOTIATOR_PRE_JOB_RANK = $(UWCS_NEGOTIATOR_PRE_JOB_RANK)
	NEGOTIATOR_POST_JOB_RANK = $(UWCS_NEGOTIATOR_POST_JOB_RANK)

And

	UWCS_SUSPEND = ( $(KeyboardBusy) || \
	                 ( (CpuBusyTime > 2 * $(MINUTE)) \
	                   && $(ActivationTimer) > 90 ) )

And

	UWCS_KILL = $(ActivityTimer) > $(MaxVacateTime) 
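
For context on why a suspended job can still end up evicted under the defaults: in the stock policy, the suspension itself starts a timer. A minimal sketch of the relevant macros follows. The definitions and values are assumptions (what the default condor_config of this Condor generation typically contains, not something quoted in this thread), so compare them against Part 3 of the local condor_config.

```
# Assumed stock definitions (not quoted from this thread); verify locally.
MINUTE         = 60
ActivityTimer  = (CurrentTime - EnteredCurrentActivity)
MaxSuspendTime = 10 * $(MINUTE)
MaxVacateTime  = 10 * $(MINUTE)

# Preempt once the job has sat in Suspended longer than MaxSuspendTime,
# or when SUSPEND is true but suspension is not wanted.
UWCS_PREEMPT = ( ((Activity == "Suspended") && \
                  ($(ActivityTimer) > $(MaxSuspendTime))) \
                 || (SUSPEND && (WANT_SUSPEND == False)) )
```

Under these assumed values, a job that stays suspended for more than ten minutes is preempted, and with WANT_VACATE = FALSE it is killed directly rather than vacated first.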


So I *GUESS* that if I use

	PREEMPT		= FALSE
	KILL			= FALSE

I can be sure that the jobs will stay alive.
Probably not the best/most elegant solution...

Cheers,

Miguel


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of
matthias.m.roehm@xxxxxxxxxxxxxxxxxxx
Sent: 07 July 2005 13:42
To: condor-users@xxxxxxxxxxx
Subject: RE: [Condor-users] I wasted some CPU cycles ;-)


Hi,

The settings I mentioned are for testing whether your job runs fine when it
is not suspended by Condor. If you want your job to be suspended and resumed
later, look at the following expressions in your config files:

WANT_SUSPEND, WANT_VACATE, START, SUSPEND, CONTINUE, PREEMPT

Set PREEMPT=FALSE so that your job is not killed after a few minutes of
being suspended. I would expect the settings to look like this:

WANT_SUSPEND = TRUE
WANT_VACATE = FALSE
PREEMPT = FALSE
START = TRUE
SUSPEND = $(TEST_SUSPEND)
CONTINUE = $(UWCS_CONTINUE)

# Suspend jobs if the keyboard has been touched
TEST_SUSPEND = $(KeyboardBusy)

# Continue jobs if:
# 1) the cpu is idle, AND
# 2) we've been suspended more than 10 seconds, AND
# 3) the keyboard hasn't been touched in a while
UWCS_CONTINUE = ( $(CPUIdle) && ($(ActivityTimer) > 10) \
                  && (KeyboardIdle > $(ContinueIdleTime)) )
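
The expressions above lean on helper macros that the stock condor_config normally defines earlier in the file. A sketch of those definitions follows, in case a trimmed config no longer contains them. The values are assumptions (typical defaults, not taken from this thread).

```
# Assumed helper macros; check the existing definitions before adding these.
MINUTE           = 60
KeyboardBusy     = (KeyboardIdle < $(MINUTE))
ActivityTimer    = (CurrentTime - EnteredCurrentActivity)
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
BackgroundLoad   = 0.3
CPUIdle          = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
ContinueIdleTime = 5 * $(MINUTE)
```

With these assumed values, the job would be suspended when the startd sees a keypress less than a minute old. It would be resumed once the keyboard has been idle for more than five minutes, the job has been suspended for more than 10 seconds, and the non-Condor load is at or below 0.3. Changes take effect after a condor_reconfig.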


With kind regards,

Matthias Röhm

=======================================================
Matthias Röhm, DaimlerChrysler AG, Research Center Ulm, Department for Data
Mining Solutions, RMI/DM 89013 Ulm,  Germany

Phone:               +49 731 505 4864
Email:               mailto:Matthias.M.Roehm@xxxxxxxxxxxxxxxxxxx
=======================================================

condor-users-bounces@xxxxxxxxxxx wrote on 07.07.2005 14:11:28:

> Hi Matthias,
>
> Thanks a lot for your answer.
> Should I understand that there is no way to suspend when there's user
> activity and continue later on Windows? I expected exactly that from
> the answers I entered in the GUI: no migration, but suspend and resume
> later.
> Regards,
>
> Miguel

>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of 
> matthias.m.roehm@xxxxxxxxxxxxxxxxxxx
> Sent: 07 July 2005 12:49
> To: condor-users@xxxxxxxxxxx
> Subject: Re: [Condor-users] I wasted some CPU cycles ;-)

>
> Hi Miguel,
>
> Condor can't checkpoint on Windows systems, therefore the job is killed
> and restarted (from the beginning) on another machine. If you want your
> jobs to run all the time (even if the machine is used by a user), use the
> following in your config file:
>
> WANT_SUSPEND = FALSE
> WANT_VACATE = FALSE
> START = TRUE
> SUSPEND = FALSE
> PREEMPT = FALSE

> With kind regards,
>
> Matthias Röhm
>
> =======================================================
> Matthias Röhm, DaimlerChrysler AG, Research Center Ulm, Department for
> Data Mining Solutions, RMI/DM 89013 Ulm,  Germany
>
> Phone:               +49 731 505 4864
> Email:               mailto:Matthias.M.Roehm@xxxxxxxxxxxxxxxxxxx
> =======================================================
>
> condor-users-bounces@xxxxxxxxxxx wrote on 07.07.2005 13:00:44:

> > Disclaimer: idiot here ;-)
> >
> > I've got a serious problem.
> >
> > I was running my jobs for the last few days, until I accumulated 2 days
> > of run time (the "normal" time for such a task to finish), and today I
> > decided to check the size of the file being generated.
> > This morning, after running overnight, the file was 44 MB... After 2
> > days of running it should have been close to the final size of 610 MB,
> > so that was my first shock.
> > I just checked again (the machine is currently in use by the Owner, so
> > Condor is not active) and the file is not there anymore.
> >
> > I suspected that this morning when I checked the file size...
> > Instead of being suspended to resume later, my jobs are being killed
> > for some reason. Being a new starter with Condor, I probably missed
> > something.
> >
> > A bit of background: the machines are all Windows (2K and XP), with
> > the central server on 2K. After a little struggling I got the jobs
> > running using this .sub:

> > #
> > # Submit 4 jobs of rtgen.exe to Condor
> > Universe = vanilla
> > Executable = rtgen.exe
> > Arguments = ntlm alpha 1 7 $(Process) 9000 40000000 ncc
> > Initialdir = E:/
> > Transfer_input_files = libeay32.dll, charset.txt
> > Should_transfer_files = YES
> > When_to_transfer_output = ON_EXIT
> > Nice_user = True
> > Notification = Never
> > Getenv = False
> > Requirements = ( (OpSys == "WINNT50") || (OpSys == "WINNT51") )
> > # later I've to try
> > #Requirements = ( (OpSys == "WINNT50") || (OpSys == "WINNT51") ) && (VirtualMachineID == 1)
> > # and
> > #hold = True
> > Queue 4
> >
> > I'm pretty sure that my problem is not there, but in the condor_config
> > file on each node, most likely under Part 3, which I left exactly as
> > installed by the Windows GUI installer (I only modified bits in Parts 1
> > and 2, to make it work).
> >
> > During installation using the GUI, I chose to suspend and continue
> > later, no migration.
> > What do I have to modify in condor_config (in the clients only? Or also
> > the central server?) to ensure that a job that has to run for 2 days of
> > CPU time, generating a file of 610 MB, is not killed when the owner is
> > using the machine?
> >
> > TIA!
> > Regards,
> >
> > Miguel

> >
> > ***********************************************************************************************************
> > DISCLAIMER:
> > This e-mail contains proprietary information, some or all of which
> > may be legally privileged. It is for the intended recipient only. If
> > an addressing or transmission error has misdirected this e-mail,
> > please notify the author by replying to this e-mail. If you are not
> > the intended recipient you may not use, disclose, distribute, copy,
> > print or rely on this e-mail.
> > ***********************************************************************************************************
> >
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
