
Re: [Condor-users] disable check pointing



Hi,

Initially I was not setting RequestMemory at all; then I tried 0 (thinking it would skip the check). The weird part is that both "solutions" work without the ramdisk...
I do not recall the exact ImageSize, but I think it was around 300k. I am pretty sure the machine is not running out of memory. I will try to run this again today and send you the ImageSize.

Someone rightly corrected me that I was using the wrong terminology. What I meant to ask is whether there is a way to stop Condor from refreshing the ImageSize, not how to disable checkpointing. Sorry for that.

On Mon, May 7, 2012 at 6:36 PM, John (TJ) Knoeller <johnkn@xxxxxxxxxxx> wrote:
What is RequestMemory for your job? How big does ImageSize become after the first update?


On 5/7/2012 2:50 PM, Tiago Macarios wrote:
Hi,

I have been struggling with a problem the whole day. It is probably something stupid, but I would really appreciate some light.
I have a 32-core computer that serves as a dedicated pool; we use it to process simulations. Today someone submitted a simulation that needs to read and write loads of tiny files, and the disk bottleneck left the computer almost idle. The computer has 64 GB of RAM, so I figured I would set aside 20 GB as a ramdisk and things would work as they should. The problem is that after the jobs update their ImageSize for the first time, they go back to the IDLE state and I get:

013.029:  Run analysis summary.  Of 64 machines,
     64 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
        Last successful match: Mon May  7 19:31:42 2012

WARNING:  Be advised:
   No resources matched request's constraints

The Requirements expression for your job is:

( ( target.OpSys == "LINUX" ) && ( TARGET.Disk >= 0 ) ) &&
( TARGET.Arch == "X86_64" ) && ( ( TARGET.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) && ( TARGET.HasFileTransfer )

Job ClassAd Requirements expression evaluates to false

I figure it has something to do with ( TARGET.Memory * 1024 ) >= ImageSize. How can I change it? I don't really care about checkpointing; I just need the end result, and if something fails I will restart it from the beginning. Can I disable checkpointing somehow in the vanilla universe? FYI: the jobs do not use much memory.
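
To narrow down which memory clause is failing, the two ImageSize comparisons in the Requirements expression above can be checked by hand. Here is a minimal Python sketch of that arithmetic. It is not Condor code: the function name and the sample numbers are hypothetical, it assumes Memory and RequestMemory are in MB and ImageSize is in KB (which is why the expression multiplies by 1024), and it ignores the ClassAd handling of undefined attributes.

```python
def failing_memory_clauses(target_memory_mb, request_memory_mb, image_size_kb):
    """Return the memory-related Requirements clauses that evaluate to false.

    Hypothetical helper that mirrors only the two ImageSize comparisons from
    the Requirements expression; the other clauses are left out.
    """
    checks = {
        "( TARGET.Memory * 1024 ) >= ImageSize":
            target_memory_mb * 1024 >= image_size_kb,
        "( RequestMemory * 1024 ) >= ImageSize":
            request_memory_mb * 1024 >= image_size_kb,
    }
    return [clause for clause, ok in checks.items() if not ok]


# Sample values only (assumptions, not measurements from this pool):
# a slot advertising 2048 MB and a job whose ImageSize is 300000 KB.
print(failing_memory_clauses(2048, 0, 300000))     # RequestMemory = 0
print(failing_memory_clauses(2048, 1024, 300000))  # RequestMemory = 1024
```

With these sample values, RequestMemory = 0 makes the second clause false for any positive ImageSize, while the TARGET.Memory clause only fails once ImageSize grows past the slot's memory. Plugging in the real values from condor_q -long and condor_status -long should show which clause is rejecting the job.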

Thanks,
Mac.


_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
