
[Condor-users] Does Condor work on a Eucalyptus cloud?



Hi

I have been using Condor for about a month. I installed release 7.4.0 on
an x86_64 CentOS 5.4 machine and am happily using it in a small cluster of
physical and virtual (VMware) machines here.

We also have a small Eucalyptus cloud, and I have been trying to get Condor
to work there, without success.

Our Eucalyptus cloud is configured in "MANAGED" mode (not by me), and no
domain name is defined in its configuration.
I spawned two private instances, owned by me, on this private cloud and
installed Condor on both. Only one of the two is the Condor master. I
configured it with:

UID_DOMAIN      = $(FULL_HOSTNAME)
FILESYSTEM_DOMAIN   = $(FULL_HOSTNAME)
CONDOR_IDS=myId.myGroupId

ALLOW_READ = *
ALLOW_WRITE = XXX.XXX.XXX.* 192.168.X.Y1 192.168.X.Y2 127.0.0.1 *.localdomain

XXX.XXX.XXX.* - the IP addresses Eucalyptus assigns to the instances for
external access
192.168.X.Y* - the internal addresses inside the Eucalyptus cloud

From each instance I can ping the other, but only on the 192.168.*.*
network.

I can get all the Condor daemons to run, but when I run condor_status I
get an error:
$condor_status
Error:  Could not fetch ads --- can't find collector

condor_q works, but it seems unaware of the instance's Ethernet address
(it reports localhost.localdomain and 127.0.0.1):

$condor_q

-- Submitter: localhost.localdomain : <127.0.0.1:54676> : localhost.localdomain
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held

The daemons are running:

nadiap@localhost :~
$ps -ef | grep condor
nadiap   17652     1  0 20:02 ?        00:00:03 condor_master
nadiap   17653 17652  0 20:02 ?        00:00:00 condor_collector -f
nadiap   17654 17652  0 20:02 ?        00:00:00 condor_schedd -f
nadiap   17656 17654  0 20:02 ?        00:00:02 condor_procd -A /tmp/condor-lock.localhost0.17840921886182/procd_pipe.SCHEDD -S 60 -C 6358
nadiap   17749 17652  0 20:04 ?        00:00:04 condor_startd -f

The logs show the following:

==> /usr/local/condor/local/log/MasterLog <==
12/15 22:11:42 DaemonCore: Command Socket at <127.0.0.1:57888>
12/15 22:11:42 Warning: Collector information was not found in the configuration file. ClassAds will not be sent to the collector and this daemon will not join a larger Condor pool.
12/15 22:11:42 passwd_cache::cache_uid(): getpwnam("condor") failed: user not found
12/15 22:11:42 passwd_cache::cache_uid(): getpwnam("condor") failed: user not found
12/15 22:11:42 Collector port not defined, will use default: 9618
12/15 22:11:43 Started DaemonCore process "/usr/local/condor/sbin/condor_collector", pid and pgroup = 18237
12/15 22:11:46 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 18240
12/15 22:11:47 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 18241


Is there anything special I need to set in the config file to get the
collector to work properly?

My question to the group is: has anyone gotten this to work on a
Eucalyptus cloud? It seems that it ought to, but what is the right
incantation?

Thanks,
Nadia