
[Condor-users] matchmaking and failing to contact starter



The condor_status output is:

Name           OpSys  Arch   State      Activity  LoadAv   Mem  ActvtyTime

vm1@advaitha.  LINUX  IA64   Owner      Idle      0.010    997  0+00:25:14
vm2@advaitha.  LINUX  IA64   Unclaimed  Idle      0.000    997  0+00:25:05
cauvery.ad.in  LINUX  INTEL  Unclaimed  Idle      0.000    501  0+01:05:18
vm1@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000    501  0+23:39:32
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000    501  0+23:39:33
spiti.ad.info  LINUX  INTEL  Unclaimed  Idle      0.000   2027  0+00:12:47
vm1@xxxxxxxxx  LINUX  INTEL  Owner      Idle      0.000    501  0+00:18:37
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000    501  [?????]
vm3@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000    501  0+00:03:34
vm4@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000    501  0+00:03:35

----------------------------------------------------------------------------------------------------------
The shell script is:
#!/bin/bash
# Concatenate the two input files ($1 and $2) into the output file ($3).
echo "merging files"
cat "$1" > "$3"
cat "$2" >> "$3"
-----------------------------------------------
The submit file is:
[digz 07:29 PM @advaitha digz]$ cat sub
Executable = condor.sh
Universe = vanilla
Requirements = OpSys == "LINUX" && ARCH == "INTEL"
Rank = ((machine == "spiti.ad.infosys.com")*3) || ((machine == "cauvery.ad.infosys.com")*2) || (machine == "vindhya.ad.infosys.com") || (machine == "sahya.ad.infosys.com")

Error = logs/err.$(cluster)
Output = logs/out.$(cluster)
Log = logs/log.$(cluster)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = files/in1, files/in2

Arguments = in1 in2 out1
Queue


The bold entries in my status output should match the job ClassAd, and based on the Rank expression the job should be forced to run on spiti (shown in italics in the status display).
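
One thing worth checking in the Rank expression: its terms are combined with `||`. As far as I understand ClassAd evaluation, a logical OR collapses the weights 3, 2 and 1 into a plain true/false value, so all four named machines would end up with the same rank. The sketch below only illustrates that arithmetic. It uses bash integer arithmetic as a stand-in for the ClassAd evaluator (an assumption, not Condor itself), and `is_host`, `rank_or` and `rank_sum` are hypothetical helper names.

```shell
#!/bin/bash
# Illustration only: bash arithmetic stands in for ClassAd evaluation.

# Prints 1 if the two names are equal, else 0 (stand-in for machine == "...").
is_host() { [ "$1" = "$2" ] && echo 1 || echo 0; }

# Mirrors the Rank from the submit file: terms joined with ||.
rank_or() {
  local m=$1
  echo $(( $(is_host "$m" spiti.ad.infosys.com)*3 \
        || $(is_host "$m" cauvery.ad.infosys.com)*2 \
        || $(is_host "$m" vindhya.ad.infosys.com) \
        || $(is_host "$m" sahya.ad.infosys.com) ))
}

# Hypothetical rewrite: terms joined with + so the weights survive.
rank_sum() {
  local m=$1
  echo $(( $(is_host "$m" spiti.ad.infosys.com)*3 \
         + $(is_host "$m" cauvery.ad.infosys.com)*2 \
         + $(is_host "$m" vindhya.ad.infosys.com) \
         + $(is_host "$m" sahya.ad.infosys.com) ))
}

# Compare both forms for each named machine and one unnamed machine.
for m in spiti cauvery vindhya sahya other; do
  host="$m.ad.infosys.com"
  printf '%-26s or=%s sum=%s\n' "$host" "$(rank_or "$host")" "$(rank_sum "$host")"
done
```

With `||` every named machine gets 1, while with `+` spiti gets 3, cauvery 2, and vindhya and sahya 1 each. If the same holds for the real Rank, writing it as `(machine == "spiti.ad.infosys.com")*3 + (machine == "cauvery.ad.infosys.com")*2 + ...` might express the intended preference. This would not by itself explain the starter failure on vindhya.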
==================================================

However, it is running only on vindhya and failing with the error messages below.
The log file says:



000 (131.000.000) 12/13 19:26:26 Job submitted from host: <172.25.243.135:57464>
...
007 (131.000.000) 12/13 19:26:30 Shadow exception!
        Can no longer talk to condor_starter <172.25.243.154:43223>    (this is vindhya's IP address)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...
   
