
RE: [condor-users] Job won't run



Hi all,

 

I found the problem: my cluster does not have a shared file system (NFS etc.).

 

Although I had put ‘should_transfer_file=yes’ in the job submit script, I had omitted ‘TransferFiles=ALWAYS’.

 

The inclusion of this line solved the problem.
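
For anyone hitting the same symptom, here is a minimal sketch of a submit description file that turns on Condor's file transfer mechanism for a pool with no shared file system. It is not the submit script from this thread. The command names (should_transfer_files, when_to_transfer_output, transfer_input_files) are the ones given in the Condor manual, while the executable and file names are hypothetical placeholders. Which commands a given Condor version accepts should be checked against that version's manual.

```
# Sketch only: my_job, input.dat and the output file names are hypothetical.
universe                = vanilla
executable              = my_job

# Ask Condor to copy files to and from the execute machine
# instead of relying on a shared file system.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat

output                  = my_job.out
error                   = my_job.err
log                     = my_job.log

queue
```

With file transfer enabled, the job no longer depends on the submit and execute machines sharing a file system, which is the situation described above.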

 

Thanks to those who provided suggestions.

 

Cheers,

 

Sandy

 

 

Computer Officer, RA Certification Manager

Department of Computer Science - UWA

Llandinam Building

Penglais Campus

Aberystwyth

Ceredigion

Wales - UK

SY23 3DB

Tel: (01970)-622433

Fax: (01970)-628536

 

-----Original Message-----
From: owner-condor-users@xxxxxxxxxxx [mailto:owner-condor-users@xxxxxxxxxxx] On Behalf Of Raymond Wong
Sent: 15 March 2004 10:49
To: condor-users@xxxxxxxxxxx
Subject: RE: [condor-users] Job won't run

 

Hi,

I am pretty new to Condor too, so I am not sure I have this right. Anyway, if the job is not running, shouldn’t we be looking at the Schedd and Shadow logs on the submitting machine and the Start and Starter logs on the remote host?

 

Unless your job is having a problem getting a match, I do not think the collector has much part to play if your remote host is already rejecting the job.

 

Raymond Wong

System Engineer

DID: 7358

Pager: 98028590

 

-----Original Message-----
From: Sandy Spence [mailto:axs@xxxxxxxxxx]
Sent: Thursday, March 11, 2004 7:51 PM
To: condor-users@xxxxxxxxxxx
Subject: RE: [condor-users] Job won't run

 

Hi Again,

 

Sorry, I should have said: I am running Condor version 6.6.1.

 

I have also looked in the Collector log, and there is a message:

 

condor_write(): Socket closed when trying to write buffer

Buf::write(): condor_write() failed

 

Cheers,

 

Sandy

 


 

-----Original Message-----
From: owner-condor-users@xxxxxxxxxxx [mailto:owner-condor-users@xxxxxxxxxxx] On Behalf Of Sandy Spence
Sent: 11 March 2004 11:44
To: condor-users@xxxxxxxxxxx
Subject: [condor-users] Job won't run

 

Hi,

 

I have set up a test cluster of two nodes, one master and one slave. I can submit a test job locally on each machine and both run to completion. If I set the macro START to False on the master and resubmit the job, it is rejected by the slave with the message ‘rejected by your job’s requirements’.

 

Has anyone got any suggestions on where I might begin to track down this problem?

 

Cheers,

 

Sandy

 

 

 
