
Re: [Condor-users] Condor questions.



inline below

Cheers,
Tim

----- Original Message -----
> From: "Ian Davies" <IDavies@xxxxxxxxx>
> To: "Condor-Users Mail List (condor-users@xxxxxxxxxxx)" <condor-users@xxxxxxxxxxx>
> Sent: Thursday, March 29, 2012 4:33:14 AM
> Subject: [Condor-users] Condor questions.
> 
> Oops - re-sent due to formatting errors  - please ignore previous
> email...
> 
> 
> Hi there, I'm new to Condor and am struggling to set up/administer a
> small pool of Windows 7 machines.
> Could someone give me a few clues, please?

Here are some resources:
1. http://research.cs.wisc.edu/condor/manual/v7.7/6_2Microsoft_Windows.html
2. http://research.cs.wisc.edu/condor/manual/v7.7/7_4Condor_on.html
3. http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool


> 
> So far I have 4 machines with condor installed, and have had some
> success submitting jobs from a client, and they execute OK on three
> PCs.
> 
> However I have two problems:
> 
> 1 A general point: all the different executables assume they are run
> from a particular directory in a hierarchy, and the requirements differ
> between the different types of executable. So I have tried submitting
> a Windows batch file which uses pushd to a shared UNC path on the
> submitting machine, like this:
> 
> 
>  pushd \\mycomputername\MyDirectory
> 
> \\mycomputername\MyDirectory\myExecutable args..
> 
> This seems to work for three machines, but my question is: is this
> going to scale when I add more machines to the pool?

It looks like you're trying to run from a remote directory. Depending on your requirements and the size of your data, you may want to look into Condor's file transfer mechanism instead.
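A minimal sketch of a vanilla-universe submit description file that uses file transfer instead of a shared UNC path. The names run.bat, myExecutable.exe and input.dat are hypothetical placeholders, not taken from your setup. With file transfer, each job runs in a private scratch directory on the execute machine and the listed files are copied into it, so the executables (or a wrapper batch file) would need to find their inputs relative to that directory.

```
# Sketch only: run.bat, myExecutable.exe and input.dat are hypothetical names.
universe                = vanilla
executable              = run.bat
# Copy the job's inputs into its scratch directory on the execute machine
# instead of reading them from \\mycomputername\MyDirectory at run time.
transfer_input_files    = myExecutable.exe, input.dat
should_transfer_files   = YES
# Files created or modified in the scratch directory come back when the job exits.
when_to_transfer_output = ON_EXIT
output                  = job.$(Cluster).$(Process).out
error                   = job.$(Cluster).$(Process).err
log                     = job.$(Cluster).log
queue
```

Because the inputs are then shipped over Condor's own connection from the submit machine, the execute machines no longer open SMB sessions to the share, which also sidesteps the connection-limit error below.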

>  - I have sometimes seen error output from the executables like:
> "No more connections can be made to this remote computer at this time
> because there are already as many connections as the computer can
> accept."
> Would this happen more often in a bigger pool, given that all the
> remote executables would have to read from and write to the same
> share?

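If that is the case, you would want the share hosted on a beefy server-class machine (e.g. your Active Directory server): client editions of Windows 7 accept at most 20 concurrent inbound connections to shared folders, and that error is what Windows reports once the limit is reached.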

> And would upgrading one of the machines to Windows Server help, or
> is there another way around this?
> 
> 2 Second question, which is very specific:
> One of the machines shows the correct LocalCredd value (I've made the
> Credd permissions very lax), but it will not run jobs. I've looked at
> its StarterLog: it seems to lose communication with the credd on the
> master (EDI-IAR1), then shuts down.
> 
> StarterLog.slot1:
> 03/29/12 09:21:38 setting the orig job iwd in starter
> 03/29/12 09:21:38 condor_read() failed: recv() returned -1, errno = 10053 , reading 5 bytes from credd EDI-IAR1:9620.
> 03/29/12 09:21:38 IO: Failed to read packet header
> 03/29/12 09:21:38 ERROR: Could not locate valid credential for user 'iada@TMVSE'
> ...

Is your pool validating its credentials against a single source?
http://research.cs.wisc.edu/condor/manual/v7.7/6_2Microsoft_Windows.html#SECTION00724000000000000000
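A sketch of the credd-related settings described in that manual section, assuming EDI-IAR1 is the single credd host for the whole pool and that it listens on port 9620, as the StarterLog shows. The idea is that every execute machine, including the failing one, points at the same CREDD_HOST; the security/authorization settings the credd host also needs are omitted here and should be taken from the manual.

```
# condor_config.local on every machine in the pool (sketch; verify against the manual)
# All machines use the same credd; host:port as seen in the StarterLog.
CREDD_HOST = EDI-IAR1:9620
# Allow the starter to run jobs as the submitting user
STARTER_ALLOW_RUNAS_OWNER = True
# After the first successful fetch from the credd, keep the password
# in a local secure store so later jobs need not contact the credd
CREDD_CACHE_LOCALLY = True

# In the job's submit description file:
run_as_owner = True
```

Each submitting user would also store their password with the credd once, using condor_store_cred add. Comparing the output of condor_config_val CREDD_HOST on the failing machine with that of a working one may help narrow things down, since the 10053 error suggests the connection to the credd is being dropped before the credential can be read.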



> 
> StartLog:
> 03/29/12 09:20:35 slot1: Changing state: Unclaimed -> Claimed
> 03/29/12 09:20:35 slot1: match_info called
> 03/29/12 09:20:35 slot1: Got activate_claim request from shadow (<10.105.11.190:64724>)
> 03/29/12 09:20:35 slot1: Remote job ID is 678.0
> 03/29/12 09:20:36 slot1: Got universe "VANILLA" (5) from request classad
> 03/29/12 09:20:36 slot1: State change: claim-activation protocol successful
> 03/29/12 09:20:36 slot1: Changing activity: Idle -> Busy
> 03/29/12 09:20:36 condor_read() failed: recv() returned -1, errno = 10054 , reading 5 bytes from <127.0.0.1:49509>.
> 03/29/12 09:20:36 IO: Failed to read packet header
> 03/29/12 09:20:36 Starter pid 1124 exited with status 1
> 03/29/12 09:20:36 slot1: State change: starter exited
> 03/29/12 09:20:36 slot1: Changing activity: Busy -> Idle
> 
> 
> Thanks for any help!
> 
> 
> Ian Davies
> Contract Software Engineer
> Toshiba Medical Visualization Systems Europe, Ltd
> Bonnington Bond, 2 Anderson Place, Edinburgh EH6 5NP, UK
> Tel +44 (0)131 472 4792
> Fax +44 (0)131 472 4799
> www.tmvse.com
> IDavies@xxxxxxxxx
> 