
Re: [Condor-users] Condor killing jobs when other are completed?



It is a mounting issue.

There seems to be a limit (perhaps imposed by the Windows system) that allows a given folder to be mounted only 12-14 times simultaneously.

I am using the command:

mount -t cifs //windows7-computer/folder-to-mount /home/sonia/mnt-folder -o user=windows_user,pass=windows_password,uid=linux_user,gid=linux_group
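
To see which nodes actually hold the share at a given moment, a check along these lines could be run on each execute node after the mount command. This is only a sketch: the function name is_cifs_mounted is made up for illustration, and it assumes the Linux /proc/mounts layout (device, mount point and filesystem type in the first three fields) and a mount point without spaces in its path.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor or the cifs tools): succeed if the
# given mount point is listed as a cifs mount in a /proc/mounts-style file.
# Usage: is_cifs_mounted MOUNT_POINT [MOUNTS_FILE]
is_cifs_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # Fields per line: device, mount point, filesystem type, options, ...
    awk -v mp="$mount_point" '
        $2 == mp && $3 == "cifs" { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$mounts_file"
}
```

For example: is_cifs_mounted /home/sonia/mnt-folder && echo "share is mounted"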

I have also tried smbfs instead of cifs, but the problem persists.

Since this is not a Condor-related issue, I have posted this question to a linux-cifs mailing list.

I haven't tried using the automounter. Do you think I should try it?

Cheers,
Sónia

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On behalf of Mag Gam
Sent: 27 December 2010 18:30
To: Condor-Users Mail List
Subject: Re: [Condor-users] Condor killing jobs when other are completed?

Could be a mounting issue.

Check whether you can access this filesystem on ALL of your execute
boxes. Are you using the automounter (amd) by any chance?
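
One way to run that check on each execute box is sketched below. The function name check_dir_access is hypothetical, and the sketch only tests that the directory exists and can be listed by the current user, which is not necessarily everything a job needs.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory exists and can be listed
# by the current user. Prints "OK <host> <dir>" or "FAIL <host> <dir>" and
# returns non-zero on failure.
check_dir_access() {
    dir=$1
    host=$(uname -n)   # node name, so output from many boxes can be told apart
    if [ -d "$dir" ] && ls "$dir" >/dev/null 2>&1; then
        echo "OK $host $dir"
    else
        echo "FAIL $host $dir"
        return 1
    fi
}
```

Run it on every execute box as the same user the jobs run as, for example: check_dir_access /home/sonia/mnt-folder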



On Mon, Dec 27, 2010 at 12:10 PM, Sónia Liléo <sonia.lileo@xxxxx> wrote:
> Thanks Erik and Mag for your hints!
>
> The problem now is that many of my requests get held.
> condor_q -better-analyze says:
>
> 2218.000:  Request is held.
>
> Hold reason: Cannot access initial working directory \\sonia\condor\jobs: Invalid argument
>
> I have permanently mounted the Windows directory \\sonia\condor\jobs on each of my Linux machines.
> (The submit machine runs Windows, while the execute nodes are Linux machines.)
> I need to do this because the jobs run by Condor need different input data and scripts stored in \\sonia\condor\jobs.
>
> So it now seems to me that this is not a Condor-related problem, but rather a problem with accessing a cifs file system.
>
> Even though this might not be a Condor-related problem, have you experienced a similar issue?
> I am using a fully-qualified DNS name when I mount the cifs file system.
> Weird... 12 to 14 jobs always run perfectly, while the rest (about 300 jobs) get held with the hold reason given above.
> (My condor pool has more than 14 nodes).
>
> What is your feeling? Could this problem somehow be related to Condor?
>
> Cheers,
> Sónia
>
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On behalf of Mag Gam
> Sent: 27 December 2010 15:22
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Condor killing jobs when other are completed?
>
> A log file from the Starter, startd and schedd would be very helpful.
>
> Hint,
> grep jobid logfilename
>
>
>
> On Sun, Dec 26, 2010 at 7:53 PM, Erik Aronesty <erik@xxxxxxx> wrote:
>> My experience was that on a new installation, vanilla jobs will get killed if
>>
>> 1) they are vacated because the server is too busy (usually because
>> they over-fork)
>>
>> or
>>
>> 2) there's a problem in communicating "ALIVE" messages (this happened
>> to me because DNS is handled a bit counterintuitively in Condor)
>>
>> Double check that
>>
>> 1) the IP address that the submitting machine knows itself as is the
>> same IP address at which the executing machine can ping the submitter
>>
>> 2) vanilla jobs are never allowed to be vacated or preempted ... since
>> it just kills them and that's probably not what you want
>>
>>
>> On Thu, Dec 16, 2010 at 6:17 AM, Sónia Liléo <sonia.lileo@xxxxx> wrote:
>>> Hi!
>>>
>>>
>>>
>>> I am submitting several jobs to my condor pool and I have noticed that most
>>> of the submitted jobs are killed before being executed.
>>>
>>>
>>>
>>> The same account is being used to execute all the jobs.
>>>
>>> If this account is configured as dedicated, will all jobs belonging
>>> to that user be terminated when one job completes?
>>>
>>>
>>>
>>> The execute nodes are Linux machines while the submitter is a Windows
>>> machine.
>>>
>>> Neither EXECUTE_LOGIN_IS_DEDICATED nor
>>> DEDICATED_EXECUTE_ACCOUNT_REGEXP is defined on any of my machines.
>>>
>>>
>>>
>>> How should I solve this problem?
>>>
>>>
>>>
>>> Best regards,
>>>
>>> Sónia
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Sónia Liléo
>>> O2 Strandvägen 5B 114 51 Stockholm
>>> Tel: +46 8 559 310 37 Mobile: +46 73 752 95 74
>>>
>>> www.o2.se
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Condor-users mailing list
>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/condor-users/
>>>
>>>