
Re: [Condor-users] job submission using condor-G to gt4



Vinodh,

Comments are inline below:

Vinodh wrote:

hi,

	I am trying to submit a job using Condor-G.

The command I gave was condor_submit hi, where the file hi contains:

executable   = /bin/ls
transfer_executable=false
arguments    = -l
universe     = grid
grid_type    = gt4
globusscheduler = advaitha:8443
jobmanager_type = Fork
output       = inspiral.out
error        = inspiral.err
log          = inspiral.log
notification =  error
queue 1


This works fine, and the log is:

000 (185.000.000) 12/16 12:21:59 Job submitted from
host: <172.25.243.135:57464>
017 (185.000.000) 12/16 12:22:15 Job submitted to
Globus
   RM-Contact: advaitha:8443
   JM-Contact:
https://172.25.243.135:8443/wsrf/services/ManagedExecutableJobService?77c697d0-6e00-11da-9d7a-da23fb7f3afa
   Can-Restart-JM: 0
...
001 (185.000.000) 12/16 12:22:23 Job executing on
host: gt4 advaitha:8443 Fork
...
005 (185.000.000) 12/16 12:22:31 Job terminated.
       (1) Normal termination (return value 0)
               Usr 0 00:00:00, Sys 0 00:00:00  -  Run
Remote Usage
               Usr 0 00:00:00, Sys 0 00:00:00  -  Run
Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
       0  -  Run Bytes Sent By Job
       0  -  Run Bytes Received By Job
       0  -  Total Bytes Sent By Job
       0  -  Total Bytes Received By Job

Then, in the file hi, I changed jobmanager_type to
Condor. Now it is not working.

After my submission, condor_q -ana gave this output:

186.000:  Run analysis summary.  Of 13 machines,
     0 are rejected by your job's requirements
     3 reject your job because of their own
requirements
     0 match but are serving users with a better
priority in the pool
    10 match but reject the job for unknown reasons
     0 match but will not currently preempt their
existing job
     0 are available to run your job

WARNING: Analysis is only meaningful for Globus
universe jobs using matchmaking.


You are not using matchmaking, since you are submitting to a specific globusscheduler; therefore, the above analysis is not meaningful.

Then the command condor_q -ana gives:

186.000:  Run analysis summary.  Of 13 machines,
     0 are rejected by your job's requirements
     3 reject your job because of their own
requirements
     0 match but are serving users with a better
priority in the pool
    10 match but reject the job for unknown reasons
     0 match but will not currently preempt their
existing job
     0 are available to run your job

WARNING: Analysis is only meaningful for Globus
universe jobs using matchmaking.
---
187.000:  Run analysis summary.  Of 13 machines,
    13 are rejected by your job's requirements
     0 reject your job because of their own
requirements
     0 match but are serving users with a better
priority in the pool
     0 match but reject the job for unknown reasons
     0 match but will not currently preempt their
existing job
     0 are available to run your job
       No successful match recorded.
       Last failed match: Fri Dec 16 12:26:34 2005
       Reason for last match failure: no match found

WARNING:  Be advised:
  No resources matched request's constraints
  Check the Requirements expression below:

Requirements = (OpSys == "LINUX" && Arch == "INTEL")
&& (Disk >= DiskUsage) && ((Memory * 1024) >=
ImageSize) && (TARGET.FileSystemDomain ==
MY.FileSystemDomain)


Ok, this last bit _is_ meaningful, because the second job (187) is the vanilla universe job that the Condor jobmanager for Globus submitted when it received the job that Condor-G sent through the Globus protocols.

The problem is that the requirements expression for this new job is not matching any machines in your Condor pool. My guess is that FileSystemDomain is responsible. Check the FileSystemDomain in the job (with 'condor_q -l') and in the machines in your pool.
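For example, you could compare the two values like this (a sketch using the standard condor_q and condor_status tools; the job id 186.0 is taken from the log above):

```shell
# Show the FileSystemDomain attribute in the stuck job's ClassAd
condor_q -l 186.0 | grep -i FileSystemDomain

# Show the same attribute for every machine in the pool
condor_status -l | grep -i FileSystemDomain

# If the job reports one domain string and the machines report another,
# the clause (TARGET.FileSystemDomain == MY.FileSystemDomain) in the
# job's Requirements expression can never match, which would explain
# "13 are rejected by your job's requirements".
```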

If they are different, then this explains the problem. To solve that, you would need to understand which filesystems the Globus job needs to access (usually at least the filesystem containing the GASS cache where the stdin/stdout files are). If all of these required filesystems are accessible from the machines in your pool, then you should configure FILESYSTEM_DOMAIN to be the same in the Condor configuration on the gatekeeper and the machines. If the filesystems are _not_ accessible from the machines in your pool, then there are ways of modifying the Condor jobmanager to enable file-transfer mode, which will enable some types of jobs to run.
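For the first case, a minimal sketch of the relevant setting in the Condor configuration (condor_config, or a local config file) on the gatekeeper and on the execute machines; the domain string below is just a placeholder, not a value from your pool:

```
## Hypothetical example: pick one domain string and set it identically
## on the gatekeeper and on all machines that share the required
## filesystems, then restart/reconfig the Condor daemons.
FILESYSTEM_DOMAIN = your.shared.domain
```

With matching values, the (TARGET.FileSystemDomain == MY.FileSystemDomain) clause in the generated Requirements expression will evaluate to true on those machines.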



Also, the machine itself submits another job. Both of
these jobs stay idle forever.

The log file is:

000 (186.000.000) 12/16 12:26:13 Job submitted from
host: <172.25.243.135:57464>
...
017 (186.000.000) 12/16 12:26:26 Job submitted to
Globus
   RM-Contact: advaitha:8443
   JM-Contact:
https://172.25.243.135:8443/wsrf/services/ManagedExecutableJobService?0deb5480-6e01-11da-9d7a-da23fb7f3afa
   Can-Restart-JM: 0


The output of condor_q -globus is:

ID OWNER STATUS MANAGER HOST EXECUTABLE

186.0 vinodh PENDING Condor advaitha /bin/ls

Regards,
Vinodh Kumar. G

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam
protection around http://mail.yahoo.com
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users