
Re: [Condor-users] NFS errors with log file



Steve,

We are successfully running O(100k) node DAGs in LIGO using the existing
7.0.1 schedd and dagman scalability enhancements with just a single schedd.
I am curious what limitation you are running into with your large dags
on a single schedd?  Are you using an older 6.8.x version?

Thanks.

We have not applied any of the dagman scalability enhancements as far
as I know.  The multiple-schedd configuration dates from the Condor 6.7
days.  There is one schedd to deal with running the DAGs, six to submit
glideins to remote grid sites, and four to match jobs to slots in
the glidein pool.  These four sub-schedds use a large fraction of the
CPU, and as I said, right now they are all on the same node.
Having a dual quad-core node will help a lot and probably get
us around the problem for now.

Steve Timm



--
Stuart Anderson  anderson@xxxxxxxxxxxxxxxx
http://www.ligo.caltech.edu/~anderson