
Re: [Condor-users] confusion around new spool in 7.5.5




On Feb 18, 2011, at 11:26 , Peter Doherty wrote:

I upgraded to v7.5.5 and there's one thing I'm scratching my head over.

I used to have a SPOOL directory filled with directories with names like:
cluster15093481.proc0.subproc0.tmp/

According to the changelog I should now have dirs in the format of:
$(SPOOL)/<#>/<#>/cluster<#>.proc<#>.subproc<#>


But the thing is, I don't have anything.
my SPOOL just has:
job_queue.log
local_univ_execute
spool_version

I've got a few thousand jobs in the queue right now.
Where are the spool files? I'm sure I'm looking in the correct directory; I've tried to find them, but I can't. I do see a lot of lock files in $(TMP_DIR).

I believe the constant I/O on all the spool files was one of the bottlenecks of our schedd, so if that has really been improved, I'm eager to see the effect. But from reading the changelog, the only difference should have been subdirectories in the spool to keep from hitting ext3 limits.


Hmm, okay. Jobs seem to be running okay, but I see a lot of these errors in the Shadow Log:

02/18/11 12:09:25 (pid:649) (15101845.0) (649): Directory::setOwnerPriv() -- failed to find owner of /raid0/gwms_schedd/spool/1845/0/cluster15101845.proc0.subproc0.tmp
02/18/11 12:09:25 (pid:649) (15101845.0) (649): Directory::Rewind(): failed to find owner of "/raid0/gwms_schedd/spool/1845/0/cluster15101845.proc0.subproc0.tmp"

I guess that's part of the problem. I checked the permissions on the spool directory, then set it to 777 and verified that regular users can write to it, but that didn't stop the errors or cause any files to be created there.
So I'm not really clear on what's going on.
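For what it's worth, here is a minimal sketch of how the new hashed layout seems to map a job to its spool directory. The mod-10000 bucketing is an assumption inferred from the ShadowLog path above (cluster 15101845 lands in subdirectory 1845, proc 0 in subdirectory 0); check the HTCondor source or manual for the actual hashing rule.

```python
import os

def spool_path(spool_dir, cluster, proc, subproc=0, tmp=False):
    """Build the per-job spool path for the 7.5.5+ hashed layout.

    Assumes two hash levels of (id mod 10000), inferred from the
    example path in the ShadowLog -- not an authoritative rule.
    """
    name = "cluster%d.proc%d.subproc%d" % (cluster, proc, subproc)
    if tmp:
        name += ".tmp"
    return os.path.join(spool_dir,
                        str(cluster % 10000),  # first hash level
                        str(proc % 10000),     # second hash level
                        name)

print(spool_path("/raid0/gwms_schedd/spool", 15101845, 0, tmp=True))
# -> /raid0/gwms_schedd/spool/1845/0/cluster15101845.proc0.subproc0.tmp
```

If that mapping is right, the directory the shadow is complaining about is exactly where the job's spool files should live, so the "failed to find owner" errors would mean the per-job directory was never created rather than that it's misplaced.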

--Peter
