
Re: [Condor-users] Directory creation problem with 7.4.1 on Fedora 12?



On Fri, Mar 26, 2010 at 3:25 PM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:
> I do not see anything out of the ordinary in your condor_config file,
> other than that you should no longer need the HOSTALLOW params in your
> config file; they have been replaced by ALLOW.
>
> Have you made any changes to your condor_config.local file?  If so, posting
> it would also be useful; if not, you may want to set ALL_DEBUG =
> D_FULLDEBUG to get more verbose logging, which may help to correctly
> identify the issue.
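As Tim notes, the HOSTALLOW_* and HOSTDENY_* macros are the older spellings of ALLOW_* and DENY_*. The Python sketch below is a hypothetical helper (`find_legacy_macros` is not part of Condor) that flags legacy names in a config file and suggests the newer equivalent. It assumes simple single-line `NAME = value` entries, matches names case-insensitively, and ignores include files and macro expansion.

```python
import re
import sys

# Matches e.g. "HOSTALLOW_READ = ..." or "  hostdeny_write=..." at the start of a line.
# Commented-out lines ("# HOSTALLOW_READ = ...") never match because of the ^\s* anchor.
LEGACY = re.compile(r"^\s*HOST(ALLOW|DENY)_(\w+)\s*=", re.IGNORECASE)


def find_legacy_macros(lines):
    """Return (line_number, legacy_name, suggested_name) for each legacy macro."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = LEGACY.match(line)
        if match:
            kind, permission = match.group(1).upper(), match.group(2).upper()
            hits.append((number, f"HOST{kind}_{permission}", f"{kind}_{permission}"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_legacy.py /etc/condor/condor_config
    path = sys.argv[1]
    with open(path) as handle:
        for number, old, new in find_legacy_macros(handle):
            print(f"{path}:{number}: {old} -> consider {new}")
```

Running it against both condor_config and condor_config.local lists each line that could be renamed.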
>

Finally had time to get back to this.  I'm seeing exactly the same
behaviour on a brand new Fedora 13 box, with Condor 7.4.2 installed
from the Fedora repository.  The only changes I made to
condor_config.local were to remove the unnecessary daemons (it's just
an execute node).

I have logs with that debug setting, so let me know which you'd like to see.

Adam

> Cheers,
> Tim
>
>
> On Wed, 2010-03-24 at 17:02 +0000, Adam Huffman wrote:
>> On Wed, Mar 24, 2010 at 3:47 PM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:
>> > Couple of questions:
>> >
>> >        1.) How was condor started? (as root or as a user?)
>>
>> It's set to start automatically and when testing I run
>>
>> service condor restart
>>
>> etc.  It appears to run condor_master as the 'condor' user, which has
>> its home directory as /var/lib/condor
>>
>> >        2.) Could you post your condor_config?
>> >
>>
>> Attached.  I've removed a couple of the local host ranges from the
>> ALLOW directives.  I added the old-style HOSTALLOW in case that made a
>> difference, which it didn't.
>>
>> Cheers,
>> Adam
>>
>> > Cheers,
>> > Tim
>> >
>> >
>> > On Wed, 2010-03-24 at 11:11 +0000, Adam Huffman wrote:
>> >> I've installed 7.4.1 from the Fedora repositories on a brand new
>> >> Fedora 12 machine.
>> >>
>> >> Whenever I submit jobs to that machine, the jobs move to the Hold
>> >> state.  Here's an example error message on the central manager:
>> >>
>> >> 112.016:  Request is held.
>> >>
>> >> Hold reason: Error from starter on slot21@...man.ac.uk: Failed to
>> >> execute '/var/lib/condor/execute/dir_2772/condor_exec.exe': No such
>> >> file or directory
>> >>
>> >> and on the execute host itself:
>> >>
>> >>
>> >> 03/24 10:58:39 Using config source: /etc/condor/condor_config
>> >> 03/24 10:58:39 Using local config sources:
>> >> 03/24 10:58:39    /var/lib/condor/condor_config.local
>> >> 03/24 10:58:39 DaemonCore: Command Socket at <...:44951>
>> >> 03/24 10:58:39 Done setting resource limits
>> >> 03/24 10:58:39 Communicating with shadow <...:36474>
>> >> 03/24 10:58:39 Submitting machine is "...man.ac.uk"
>> >> 03/24 10:58:39 setting the orig job name in starter
>> >> 03/24 10:58:39 setting the orig job iwd in starter
>> >> 03/24 10:58:39 File transfer completed successfully.
>> >> 03/24 10:58:40 Job 112.13 set to execute immediately
>> >> 03/24 10:58:40 Starting a VANILLA universe job with ID: 112.13
>> >> 03/24 10:58:40 IWD: /var/lib/condor/execute/dir_2752
>> >> 03/24 10:58:40 Input file: /var/lib/condor/execute/dir_2752/dammin-mer_cb_clair.13.inp
>> >> 03/24 10:58:40 Output file: /var/lib/condor/execute/dir_2752/dammin-mer_cb_clair.13.out
>> >> 03/24 10:58:40 Error file: /var/lib/condor/execute/dir_2752/dammin-mer_cb_clair.13.err
>> >> 03/24 10:58:40 About to exec /var/lib/condor/execute/dir_2752/condor_exec.exe
>> >> 03/24 10:58:40 Create_Process(/var/lib/condor/execute/dir_2752/condor_exec.exe): child failed with errno 2 (No such file or directory) before exec()
>> >> 03/24 10:58:40 ERROR "Create_Process(/var/lib/condor/execute/dir_2752/condor_exec.exe,, ...) failed: No such file or directory" at line 530 in file os_proc.cpp
>> >> 03/24 10:58:40 ShutdownFast all jobs.
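Errno 2 in the starter's Create_Process line is ENOENT. The sketch below is a hypothetical helper, not anything shipped with Condor, that pulls the path and errno out of such a log line and decodes them with Python's standard `errno` module and `os.strerror`. It assumes the line format shown in the log above. ENOENT only says that some path the child needed was missing before exec(); the log does not say which one.

```python
import errno
import os
import re

# Matches the starter's failure line, e.g.
# "Create_Process(/path/condor_exec.exe): child failed with errno 2 (...) before exec()"
CREATE_PROCESS_ERRNO = re.compile(r"Create_Process\(([^),]+)\).*?errno (\d+)")


def decode_create_process_error(line):
    """Return the path and decoded errno from a Create_Process line, or None."""
    match = CREATE_PROCESS_ERRNO.search(line)
    if match is None:
        return None
    code = int(match.group(2))
    return {
        "path": match.group(1),
        "errno": code,
        "name": errno.errorcode.get(code, "UNKNOWN"),  # symbolic name, e.g. ENOENT
        "message": os.strerror(code),  # the platform's text for this code
    }
```

Fed the Create_Process line from the log above, it reports errno 2 with the symbolic name ENOENT.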
>> >>
>> >> In fact there are no subdirectories in /var/lib/condor/execute, so I
>> >> wonder whether it's having trouble creating them.  I added
>> >> transfer_executable = true to the submit file, even though it wasn't
>> >> needed before.  It didn't make any difference.
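A first check on the execute node is whether the execute directory exists, who owns it and what its permission bits are. The default path below is the one from the starter log; if EXECUTE is set differently in the config, substitute that value. Note that the starter normally removes a job's dir_NNNN scratch directory when the job ends, so an empty execute directory after a hold does not by itself show that creation failed. `describe_dir` is a hypothetical helper; it does not look at SELinux labels, which would need a separate check such as `ls -Z`.

```python
import os
import pwd
import stat
import sys


def describe_dir(path):
    """Summarise existence, type, owner and permission bits of path."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return {"path": path, "exists": False}
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)  # uid without a passwd entry
    return {
        "path": path,
        "exists": True,
        "is_dir": stat.S_ISDIR(info.st_mode),
        "owner": owner,
        "mode": oct(stat.S_IMODE(info.st_mode)),
        # Whether the user running this script (not necessarily 'condor') may write here.
        "writable_by_me": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/condor/execute"
    for key, value in describe_dir(target).items():
        print(f"{key}: {value}")
```

Running it once as root and once as the condor user (for example via `sudo -u condor`) shows whether write access differs between the two.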
>> >>
>> >> The same version (7.4.1) is working on an older Fedora 12 machine, the
>> >> difference being that it may have an older version of condor_config,
>> >> as it's been upgraded several times.
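Since the working machine may carry an older condor_config, comparing the two files setting by setting can narrow things down. The sketch below is a hypothetical helper that reads simple `NAME = value` lines, treats names case-insensitively and reports every name whose value differs. It does not follow include files, continuation lines or `$(...)` macro expansion, so `condor_config_val` on each machine remains the authoritative source for effective values.

```python
import sys


def parse_config(text):
    """Parse simple 'NAME = value' lines into a dict; later definitions win."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and lines without an assignment
        name, value = line.split("=", 1)
        settings[name.strip().upper()] = value.strip()
    return settings


def diff_configs(old_text, new_text):
    """Map each differing name to (old_value, new_value); None means undefined."""
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python diff_config.py working/condor_config failing/condor_config
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        for name, (old, new) in diff_configs(a.read(), b.read()).items():
            print(f"{name}: {old!r} -> {new!r}")
```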
>> >>
>> >>
>> >> Adam
>> >> _______________________________________________
>> >> Condor-users mailing list
>> >> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> >> subject: Unsubscribe
>> >> You can also unsubscribe by visiting
>> >> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>> >>
>> >> The archives can be found at:
>> >> https://lists.cs.wisc.edu/archive/condor-users/
>> >