
[Condor-users] Getting files created in remote machine



Hi,

I am submitting a Python script (qap_compute.py) that reads data from a file
assignment.$(Process), does some computations, and writes the result to a file
assignment.$(Process).result. After carefully reading the examples in the
manual, I wrote the following submit description file (the Arguments line is
actually one line):

# file : submit.condor
# This is a submit description file.

Executable	=	c-get.bat
Universe	=	vanilla
Input		=	qap_compute.py
Error		=	errors\err.$(Process)
#Output		=	outputs\out.$(Process)
Log		=	logs\log

transfer_input_files		=	assignment.0, assignment.1, assignment.2
should_transfer_files		=	YES

Arguments	=	assignment.0 assignment.1 assignment.2 assignment.0.result assignment.1.result assignment.2.result
when_to_transfer_output		=	ON_EXIT

Queue 3
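Condor replaces $(Process) with each job's process number (0, 1 and 2 for "Queue 3"). The Arguments line above contains no $(Process), so every one of the three jobs receives the same six file names. The Python sketch below mimics only that one substitution to make the difference visible; expand_process and the per-job Arguments string are hypothetical illustrations, not part of Condor.

```python
# Hypothetical helper: mimics how Condor substitutes $(Process) in a submit
# file value for each job created by "Queue N". It handles only the
# $(Process) macro, not the full submit-file macro language.


def expand_process(value: str, num_jobs: int) -> list[str]:
    """Return the value as each of the num_jobs queued jobs would see it."""
    return [value.replace("$(Process)", str(proc)) for proc in range(num_jobs)]


# The Arguments line from the submit file: no $(Process), so all jobs
# receive all six names.
submitted = ("assignment.0 assignment.1 assignment.2 "
             "assignment.0.result assignment.1.result assignment.2.result")

# A per-job alternative (an assumption, not taken from the post): each job
# receives only its own input and result name.
per_job = "assignment.$(Process) assignment.$(Process).result"

if __name__ == "__main__":
    for proc, args in enumerate(expand_process(submitted, 3)):
        print(f"submitted, job {proc}: {args}")
    for proc, args in enumerate(expand_process(per_job, 3)):
        print(f"per-job,   job {proc}: {args}")
```

The same per-process substitution is what makes "transfer_input_files = assignment.$(Process)" send each job only its own input file.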

The log shows that the jobs are submitted but never executed, and
condor_q -analyze says they are rejected for unknown reasons. Here is the log:
...
000 (027.000.000) 06/08 17:47:25 Job submitted from host: <10.2.28.69:1042>
...
000 (027.001.000) 06/08 17:47:25 Job submitted from host: <10.2.28.69:1042>
...
000 (027.002.000) 06/08 17:47:25 Job submitted from host: <10.2.28.69:1042>
...
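For reference, here is a minimal sketch of the read-compute-write pattern described at the top. It assumes, hypothetically, that the script receives each input file name on its command line (in the submit file above it is actually fed through Input to c-get.bat), and since the real QAP computation is not shown, compute() is only a placeholder. The result is written under a relative name in the current working directory; as far as I understand, with should_transfer_files = YES that is the job's scratch directory on the execute machine, and files created there are the ones Condor considers for transfer back on exit.

```python
# Hypothetical stand-in for qap_compute.py. The real computation is not
# shown in the post, so compute() is a placeholder that sums integers.
import sys
from pathlib import Path


def compute(text: str) -> str:
    # Placeholder computation (assumption): sum of whitespace-separated ints.
    return str(sum(int(tok) for tok in text.split()))


def run(input_name: str) -> Path:
    """Read input_name and write input_name + '.result' next to it.

    A relative name resolves against the current working directory, which
    for a vanilla-universe job with file transfer is its scratch directory.
    """
    src = Path(input_name)
    dst = Path(input_name + ".result")
    dst.write_text(compute(src.read_text()) + "\n")
    return dst


if __name__ == "__main__":
    # Process every input file named on the command line, e.g.
    #   python qap_compute.py assignment.0
    for name in sys.argv[1:]:
        print(run(name))
```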


What is wrong with my submission? If I drop the Arguments line and use
"transfer_input_files = assignment.$(Process)" instead, the jobs do execute, but
the .result files are not sent back:

000 (022.000.000) 06/08 17:04:09 Job submitted from host: <10.2.28.69:1042>
...
000 (022.001.000) 06/08 17:04:09 Job submitted from host: <10.2.28.69:1042>
...
000 (022.002.000) 06/08 17:04:09 Job submitted from host: <10.2.28.69:1042>
...
001 (022.000.000) 06/08 17:04:20 Job executing on host: <10.2.28.229:1054>
...
001 (022.001.000) 06/08 17:04:23 Job executing on host: <10.2.28.50:1042>
...
001 (022.002.000) 06/08 17:04:24 Job executing on host: <10.2.28.69:1041>
...
006 (022.000.000) 06/08 17:04:28 Image size of job updated: 7340
...
005 (022.002.000) 06/08 17:04:30 Job terminated.
	(1) Normal termination (return value 1)
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
	246  -  Run Bytes Sent By Job
	17585  -  Run Bytes Received By Job
	246  -  Total Bytes Sent By Job
	17585  -  Total Bytes Received By Job
...
006 (022.001.000) 06/08 17:04:31 Image size of job updated: 7204
...
005 (022.001.000) 06/08 17:04:36 Job terminated.
	(1) Normal termination (return value 1)
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
	246  -  Run Bytes Sent By Job
	17585  -  Run Bytes Received By Job
	246  -  Total Bytes Sent By Job
	17585  -  Total Bytes Received By Job
...
005 (022.000.000) 06/08 17:04:36 Job terminated.
	(1) Normal termination (return value 1)
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
	246  -  Run Bytes Sent By Job
	17585  -  Run Bytes Received By Job
	246  -  Total Bytes Sent By Job
	17585  -  Total Bytes Received By Job
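All three jobs report "(1) Normal termination (return value 1)", i.e. the executable (c-get.bat) exited with status 1 each time. For longer logs, a small Python sketch like the one below can pull out each job's return value. It assumes the event format shown above: a "005 (cluster.proc.subproc)" header for "Job terminated." followed by the "(1) Normal termination (return value N)" line. Abnormal terminations are not handled, and exit_codes is a hypothetical name, not a Condor tool.

```python
# Sketch: scan a Condor user log (format as in the excerpt above) and map
# each terminated job to its return value. Only "Normal termination" lines
# that follow a 005 (Job terminated) event are recorded.
import re

EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) ")
RETURN_RE = re.compile(r"\(1\) Normal termination \(return value (-?\d+)\)")


def exit_codes(log_text: str) -> dict[str, int]:
    """Map 'cluster.proc' to the return value of each terminated job."""
    codes: dict[str, int] = {}
    current = None  # job id of the most recent 005 event, if any
    for line in log_text.splitlines():
        event = EVENT_RE.match(line)
        if event:
            code, cluster, proc = event.groups()
            current = f"{int(cluster)}.{int(proc)}" if code == "005" else None
            continue
        ret = RETURN_RE.search(line)
        if ret and current is not None:
            codes[current] = int(ret.group(1))
            current = None
    return codes


SAMPLE = """\
005 (022.002.000) 06/08 17:04:30 Job terminated.
\t(1) Normal termination (return value 1)
...
005 (022.001.000) 06/08 17:04:36 Job terminated.
\t(1) Normal termination (return value 1)
"""

if __name__ == "__main__":
    print(exit_codes(SAMPLE))  # prints {'22.2': 1, '22.1': 1}
```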


Please help me. 
[ I am using Condor 6.6.11, Windows 2000 ]

Regards,
Simon