Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] DAG File descriptor panic when quota is exceeded

Date: Thu, 24 Dec 2009 00:20:08 -0500
From: Ian Stokes-Rees <ijstokes@xxxxxxxxxxxxxxxxxxx>
Subject: [Condor-users] DAG File descriptor panic when quota is exceeded

I did a condor_rm earlier today on a 100k node DAG and Condor became intermittent, then stopped responding for 45+ minutes. condor_restart and other attempts to revive it did not work (we only attempted these after about 30 minutes). Is this a possible side effect of the rescue DAG being created for a large DAG?


Thanks,

Ian

some more details:

30k nodes had completed, about 2k were queued, and only a handful were actively executing (fewer than 100). The job was submitted around 9pm, and at 9am today we could see that overnight nothing finished between about midnight and 4am (I don't have the log files available to me at the moment). We discovered there was some bad data in one of the key input files, hence the decision to cancel the DAG with condor_rm.


--
Ian Stokes-Rees, Research Associate
SBGrid, Harvard Medical School
http://sbgrid.org


Follow-Ups:
- Re: [Condor-users] DAG File descriptor panic when quota is exceeded
  - From: R. Kent Wenger

