
Re: [Condor-users] confusion around new spool in 7.5.5




On Feb 18, 2011, at 12:18 , Dan Bradley wrote:

Peter,

Prior to 7.5.5, Condor created job spool directories if they didn't already exist whenever it launched a job. Now, it only creates job spool directories when needed. This means that jobs which do not spool input files and do not spool output files will not have a spool directory.

This sounds very wise. It should reduce I/O considerably on a large pool.


Upon upgrade, I would expect any jobs that already have spool directories (i.e. any running job) to still have spool directories (but moved into the new location). Do you think that is not the case in your situation?

No, I think your guess is correct.



Hmm, okay. Jobs seem to be running okay, but I see a lot of these errors in the Shadow Log:

02/18/11 12:09:25 (pid:649) (15101845.0) (649): Directory::setOwnerPriv() -- failed to find owner of /raid0/gwms_schedd/spool/1845/0/cluster15101845.proc0.subproc0.tmp
02/18/11 12:09:25 (pid:649) (15101845.0) (649): Directory::Rewind(): failed to find owner of "/raid0/gwms_schedd/spool/1845/0/cluster15101845.proc0.subproc0.tmp"

I guess that's part of the problem. I checked the permissions on the spool directory, then set it to 777 and verified that regular users can write to it, but that neither stopped the errors nor caused any files to be created there.
So I'm not really clear on what's going on.

It appears that these messages are expected for jobs that do not have spool directories (see my other message). Therefore, we should fix the shadow so that it does not generate noise in this case.
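
For anyone who wants to confirm that on their own schedd, here is a rough sketch in Python that picks the "failed to find owner" messages out of the shadow log and reports the ones whose path does not exist on disk. The regular expression is inferred only from the two log lines quoted above, not from the Condor source, and the function name missing_spool_paths is made up for illustration. If every reported path is missing, that is at least consistent with the "no spool directory" explanation.

```python
import os
import re

# Pattern inferred from the two shadow messages quoted in this thread;
# other Condor versions may word these lines differently (assumption).
OWNER_ERR = re.compile(
    r'\((?P<cluster>\d+)\.(?P<proc>\d+)\).*?'
    r'Directory::(?:setOwnerPriv\(\) -- |Rewind\(\): )'
    r'failed to find owner of "?(?P<path>[^"\s]+)"?'
)


def missing_spool_paths(log_lines, exists=os.path.exists):
    """Return {"cluster.proc": path} for owner errors whose path is absent.

    `exists` is injectable so the check can be tried without a real spool.
    """
    missing = {}
    for line in log_lines:
        m = OWNER_ERR.search(line)
        if m and not exists(m.group("path")):
            job_id = "{}.{}".format(m.group("cluster"), m.group("proc"))
            missing[job_id] = m.group("path")
    return missing
```

To run it against a real log, pass an open file, e.g. `missing_spool_paths(open(path_to_shadow_log))`, where the log location depends on your configuration.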


Ah, okay, so it's just a harmless error message. I take that risk when running development code. :-) If there's an intermediate release that can silence the Shadow, I'll take it, otherwise I'll just ignore the errors.

--Peter
