Re: [Condor-users] Rendering
- Date: Thu, 29 Mar 2012 09:25:29 +0000
- From: "Davies, Ian" <IDavies@xxxxxxxxx>
- Subject: Re: [Condor-users] Rendering
I'm new to Condor, and am struggling to set up and administer a small pool of Windows 7 machines.
Could someone give me a few clues please?
So far I have four machines with Condor installed, and I have had some success submitting jobs from a client; they execute OK on three of the PCs.
However I have two problems:
1 A general point - The executables to be run each assume they are started from a particular directory in a hierarchy, and the requirements differ for the different types of executable.
So I have tried submitting a Windows batch file, which uses pushd to change to a shared UNC path on the submitting machine, like this:
This seems to work for three machines, but will it scale when I add more machines to the pool?
- I have sometimes seen messages like this in the error output of the programs:
"No more connections can be made to this remote computer at this time because there are already as many connections as the computer can accept."
My question: would this happen more often with a bigger pool, given that all the remote executables would have to read from and write to the same share?
And would upgrading one of the machines to Windows Server help?
or is there another way around this?
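One possible stopgap, offered only as a sketch and not as a confirmed fix: desktop editions of Windows cap the number of concurrent inbound connections to a share, so a job wrapper could retry the share access with a growing delay instead of failing on the first refusal. The names run_with_retry and action below are hypothetical, and the sketch assumes the failed share access surfaces as an OSError.

```python
import random
import time


def run_with_retry(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action(); on OSError, back off exponentially and try again.

    `action` is a hypothetical zero-argument callable that touches the
    shared UNC path (for example, opening a file on the share).
    """
    for attempt in range(attempts):
        try:
            return action()
        except OSError:
            # Give up after the final attempt and let the caller see the error.
            if attempt == attempts - 1:
                raise
            # Random jitter spreads out jobs that were all refused at once.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

This only smooths over short bursts of contention; it does not raise the connection limit itself, so it would not help if the pool permanently needs more simultaneous connections than the share host allows.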
2 Second question is very specific:
One of the machines shows the correct LocalCredd value (I've made the Credd permissions very lax),
but it will not run jobs. I've looked at its starter log; it seems to lose communication with the master (EDI-IAR1), then shut down.
03/29/12 09:21:38 setting the orig job iwd in starter
03/29/12 09:21:38 condor_read() failed: recv() returned -1, errno = 10053 , reading 5 bytes from credd EDI-IAR1:9620.
03/29/12 09:21:38 IO: Failed to read packet header
03/29/12 09:21:38 ERROR: Could not locate valid credential for user 'iada@TMVSE'
03/29/12 09:20:35 slot1: Changing state: Unclaimed -> Claimed
03/29/12 09:20:35 slot1: match_info called
03/29/12 09:20:35 slot1: Got activate_claim request from shadow (<10.105.11.190:64724>)
03/29/12 09:20:35 slot1: Remote job ID is 678.0
03/29/12 09:20:36 slot1: Got universe "VANILLA" (5) from request classad
03/29/12 09:20:36 slot1: State change: claim-activation protocol successful
03/29/12 09:20:36 slot1: Changing activity: Idle -> Busy
03/29/12 09:20:36 condor_read() failed: recv() returned -1, errno = 10054 , reading 5 bytes from <127.0.0.1:49509>.
03/29/12 09:20:36 IO: Failed to read packet header
03/29/12 09:20:36 Starter pid 1124 exited with status 1
03/29/12 09:20:36 slot1: State change: starter exited
03/29/12 09:20:36 slot1: Changing activity: Busy -> Idle
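For reading the two condor_read() failures above, the errno values are standard Winsock error codes: 10053 is WSAECONNABORTED and 10054 is WSAECONNRESET. The small sketch below pulls those codes out of log lines in the format shown; the function name explain_read_failures and the regular expression are illustrative assumptions, not part of Condor.

```python
import re

# Standard Winsock error codes that appear in the log excerpt.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED (connection aborted by software on the local host)",
    10054: "WSAECONNRESET (connection reset by peer)",
}

# Matches lines such as:
#   condor_read() failed: recv() returned -1, errno = 10053 , reading 5 bytes from <peer>.
READ_FAILURE = re.compile(
    r"condor_read\(\) failed: recv\(\) returned -1, errno = (\d+)\s*, "
    r"reading \d+ bytes from (.+?)\.$"
)


def explain_read_failures(log_text):
    """Return (peer, errno, meaning) for each condor_read() failure line."""
    results = []
    for line in log_text.splitlines():
        match = READ_FAILURE.search(line.strip())
        if match:
            code = int(match.group(1))
            peer = match.group(2)
            results.append((peer, code, WINSOCK_ERRORS.get(code, "unknown")))
    return results
```

Run over the excerpt, this reports an aborted connection to the credd on EDI-IAR1:9620 and a reset connection from the local starter at 127.0.0.1:49509.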
Thanks for any help!
Contract Software Engineer
Toshiba Medical Visualization Systems Europe, Ltd
Bonnington Bond, 2 Anderson Place, Edinburgh EH6 5NP, UK
Tel + 44 (0)131 472 4792 Fax + 44 (0) 131 472 4799