
RE: [Condor-users] Where to define "submitter name" ??



> Hi probably rtfm ... but a simple nod of what to change where to sort this
> niggle out might help... please ...
> 
> On my submit machine (linux wbel) I get:
> 
> [condor@WEREWOLF transit]$ condor_q
> 
> -- Submitter: localhost.localdomain : <192.168.0.3:36869> : localhost.localdomain
>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
>   30.0   condor          5/25 14:19   0+00:14:43 R  0   2.3  condor_dagman -f -
>   31.0   condor          5/25 14:21   0+00:06:08 R  0   0.0  spssdag.bat
> 
> and ... (activity field chopped for readability)
> 
> [root@WEREWOLF transit]# condor_status
> 
> Name          OpSys       Arch   State      Activity   LoadAv Mem
> 
> localhost.loc LINUX       INTEL  Unclaimed  Idle       0.000   373
> xpnode0       WINNT51     INTEL  Claimed    Busy       0.020   384
> xpnode1       WINNT51     INTEL  Unclaimed  Idle       0.010   384
> xpnode2       WINNT51     INTEL  Unclaimed  Idle       0.010   384
> xpnode3       WINNT51     INTEL  Unclaimed  Idle       0.010   384
> 
> 
> --- Question:
> 
> what do I need to define so that Submitter is "werewolf" and not
> localhost.localdomain
> 
> and the status listing also uses a sensible machine name.
>
> Note: I am NOT using DNS services for the private network, i.e. the inward
> interface on linux and 4 xpnodes (192.168.0.*). The outward interface on
> the linux box does indeed use the campuswide dns, and thus a lookup
> on its IP will return werewolf.york.ac.uk.

Are you sure your system has recovered from the recent full moon?  We
just had one last Monday.  (Sorry, couldn't resist.)  While I can't
promise a silver bullet (ok, I'll stop for real this time :-) )...
Condor is probably getting confused by your internal/external
interfaces.  Do you have NETWORK_INTERFACE set in your config?  If so,
where does it point?  It ought to be your internal IP.  (Might the
daemons be querying DNS with the 192.* address?)  You can find out which
daemon is using which IP address with 'condor_status -any -l'.  Be
prepared for a lot of output.
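
To cut that output down to the lines that matter here, something along
these lines might help.  It is only a sketch: summarize_ads is a name
made up for this example, and it assumes the ads carry the usual MyType,
Name, Machine and MyAddress attributes.

```shell
# Keep only the identifying attributes from long-format ClassAds on stdin.
# summarize_ads is a hypothetical helper name, not a Condor command.
summarize_ads() {
    grep -E '^(MyType|Name|Machine|MyAddress)[[:space:]]*='
}

# Intended use on the submit machine:
#   condor_config_val NETWORK_INTERFACE     # what the config currently says
#   condor_status -any -l | summarize_ads   # which daemon reports which address
```

Any daemon whose MyAddress or Name does not match the internal interface
is a good place to start looking.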

To debug this issue further, please turn on D_HOSTNAME for the master
and schedd and look closely at the log files.  
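
As a rough sketch, the relevant entries in the local config file might
look like the following.  MASTER_DEBUG and SCHEDD_DEBUG are the
per-daemon debug settings; 192.168.0.3 is the internal address taken from
the condor_q output above, and whether it is the right value for your
setup is an assumption on my part.

```
# Bind Condor to the private interface (assumed to be the internal IP)
NETWORK_INTERFACE = 192.168.0.3

# Log hostname resolution in the master and schedd
MASTER_DEBUG = D_HOSTNAME
SCHEDD_DEBUG = D_HOSTNAME
```

Run condor_reconfig afterwards so the daemons pick up the debug settings;
a change to NETWORK_INTERFACE most likely needs a full restart of the
daemons.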

> [condor@WEREWOLF transit]$ condor_q
> 
> -- Failed to fetch ads from: <192.168.0.3:36869> : localhost.localdomain
> [condor@WEREWOLF transit]$ condor_q
> 
> -- Failed to fetch ads from: <192.168.0.3:36869> : localhost.localdomain
> [condor@WEREWOLF transit]$ condor_q
> 
> -- Submitter: localhost.localdomain : <192.168.0.3:36869> : localhost.localdomain
>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
>   30.0   condor          5/25 14:19   0+00:23:21 R  0   2.3  condor_dagman -f -
>   31.0   condor          5/25 14:21   0+00:14:46 R  0   0.0  spssdag.bat
> 
> 2 jobs; 0 idle, 2 running, 0 held
> 
> The system is as near as dammit quiescent ... why should I get these
> failures?

This might (?) be a symptom of the same problem.  The schedd is
single-threaded, and if it's going out to lunch trying to figure out its
hostname, it may not get around to answering condor_q in time.  Just a
hunch.  Run 'tail -f' on the schedd log while this happens and you'll be
able to see what's going on.  (I highly recommend turning on D_FULLDEBUG
and D_HOSTNAME for the schedd...)

Mike Yoder
Principal Member of Technical Staff
Direct : +1.408.321.9000
Fax    : +1.408.904.5992
Mobile : +1.408.497.7597
yoderm@xxxxxxxxxx

Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
http://www.optena.com