
Re: [Condor-users] Windows Server 2003 condor_submit problem



Hi,

Thanks for the help.  

It turns out what really made things work was to reset the permissions on
'cmd.exe' on the Windows 2003 machine to be executable by any local user.

Thanks again,
Diane
 

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ben Burnett
Sent: Monday, June 23, 2008 11:18 AM
To: 'Condor-Users Mail List'
Subject: Re: [Condor-users] Windows Server 2003 condor_submit problem

Hi Diane:

Windows 2003 and XP are two distinct versions of Windows, so your jobs will
need to account for this:

Requirements = ( OpSys == "WINNT51" ) || ( OpSys == "WINNT52" )

That should solve the idle problem you experience when running jobs from the
CM (if you add it to your jobs).
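
For context, a minimal sketch of a complete submit description file carrying
that requirement is shown below. The executable and file names are
hypothetical placeholders, and the sketch assumes the machine ClassAd
attribute OpSys is referenced directly (WINNT51 for Windows XP, WINNT52 for
Windows Server 2003). When a job's requirements do not mention OpSys,
condor_submit normally restricts the job to the submit machine's own
operating system, which would explain why jobs only match the XP machine.

```
# Hypothetical job: file names below are placeholders, not from the thread.
universe     = vanilla
executable   = myjob.bat
output       = myjob.out
error        = myjob.err
log          = myjob.log
# Match either Windows XP (WINNT51) or Windows Server 2003 (WINNT52).
requirements = (OpSys == "WINNT51") || (OpSys == "WINNT52")
queue
```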

As for the other errors: Try disabling the firewall on the Windows 2003
machine, and see if your "condor_q" commands work on the worker nodes.  If
this works, then you may need to play around with opening some ports in the
firewall.  I see that you are also using ADD_WINDOWS_FIREWALL_EXCEPTION; are
the firewalls disabled on the worker nodes?  Also, is there any particular
reason your CM's COLLECTOR_NAME differs from that of the worker nodes?
(They should probably all be the same, if they are all part of the same
pool.)
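
As a rough sketch of what matching settings could look like in condor_config
on each machine of the pool (the pool name is the CM's value from the
original post, the CONDOR_HOST value is left as a placeholder, and whether
TRUE is the right firewall setting for your site is an assumption to verify):

```
# Same central manager and pool name on every machine in the pool.
CONDOR_HOST = <ip address of master>
COLLECTOR_NAME = fbp-test-pool
# Assumption: let Condor add its daemons to the Windows firewall exception
# list rather than turning the firewall off entirely.
ADD_WINDOWS_FIREWALL_EXCEPTION = TRUE
```

After changing these, restart Condor on the machine (or run condor_reconfig)
so the daemons pick up the new values.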

Regards,
-B


From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of diane
Sent: Monday, June 23, 2008 2:47 PM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] Windows Server 2003 condor_submit problem

Hello,

I have a condor pool consisting of a Windows XP machine (master) and a
Windows 2003 machine (slave), and am unable to get jobs to run on the
Windows 2003 machine (they sit in the queue with an "Idle" status until
"master" slots become available).
Does anyone know how to fix this problem?

Running condor_status from the master indicates all machines are in the
pool.

In the condor_config files, I have:

On the Slave:
DAEMON_LIST = MASTER START
On the Master:
DAEMON_LIST = MASTER COLLECTOR NEGOTIATOR SCHEDD START

On the Slave:
COLLECTOR_NAME = My Pool
On the Master:
COLLECTOR_NAME = fbp-test-pool

On the Slave:
CONDOR_HOST = <ip address of master>
On the master:
CONDOR_HOST = $(FULL_HOSTNAME)

Also on the slave:
        ADD_WINDOWS_FIREWALL_EXCEPTION = FALSE

        WINDOWS_FIREWALL_FAILURE_RETRY = 10

When I run condor_q on the slave machine, I get error:

Error: Can't find address for schedd <my windows 2003 machine>

Extra Info: You probably saw this error because the condor_schedd is not
running on the machine you are trying to query. If the condor_schedd is not
running, the Condor system will not be able to find an address and port to
connect to and satisfy this request. Please make sure the Condor daemons are
running and try again.

Extra Info: If the condor_schedd is running on the machine you are trying to
query and you still see the error, the most likely cause is that you have
setup a personal Condor, you have not defined SCHEDD_NAME in your
condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE
setting. You must define either or both of those settings in your config
file, or you must use the -name option to condor_q. Please see the Condor
manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE.

I guess this makes sense since schedd is NOT running on the slave.


Also, when I reverse the roles (make the Windows XP the slave and the
Windows 2003 the master) I get the same results (jobs run on the Windows XP
machine but not the 2003 machine).

Any help would be appreciated.
Thanks,
Diane

