
Re: [Condor-users] Evictions

Thanks. I was looking for a log folder last night and did not find it; I will run that command and see what I find.
Yes, I did compile with condor_compile. Last night I wrote a simple hello world program and ran it in the standard universe, and it was evicted from the nodes. The program does not write to any external files; it only uses cout, which, as I understand it, Condor redirects into the output file named in the submit file.
I will get that log and see what it says. I am just about ready to delete everything and rebuild the grid from scratch with the new version that was released yesterday. It is good practice, and I now have it down to where I can install and set up the 9-computer grid in about an hour :)
Thanks again.



001 (022.000.000) 02/01 18:40:17 Job executing on host: <>
004 (022.000.000) 02/01 18:40:17 Job was evicted.
    (0) Job was not checkpointed.
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
    224 - Run Bytes Sent By Job
    13488982 - Run Bytes Received By Job
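
For anyone scanning a job log like the one above for evictions, here is a minimal Python sketch. It assumes event header lines have the shape shown in this excerpt (a three-digit event code, the job id in parentheses, a date, a time, and a message) and that code 004 marks an eviction, as it does here. The names `EVENT_HEADER`, `parse_events`, and `evicted_jobs` are made up for illustration and are not part of Condor.

```python
import re

# Matches header lines such as:
#   004 (022.000.000) 02/01 18:40:17 Job was evicted.
# Assumption: the layout seen in the excerpt above holds for every event.
EVENT_HEADER = re.compile(
    r"^(?P<code>\d{3}) "
    r"\((?P<cluster>\d+)\.(?P<proc>\d+)\.(?P<subproc>\d+)\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<message>.*)$"
)


def parse_events(text):
    """Return one dict per event header line; detail lines are skipped."""
    events = []
    for line in text.splitlines():
        match = EVENT_HEADER.match(line.strip())
        if match is None:
            continue  # indented detail line, blank line, or separator
        events.append({
            "code": match["code"],
            "job": f"{int(match['cluster'])}.{int(match['proc'])}",
            "when": f"{match['date']} {match['time']}",
            "message": match["message"],
        })
    return events


def evicted_jobs(events):
    """Return the sorted job ids that have at least one 004 (evicted) event."""
    return sorted({e["job"] for e in events if e["code"] == "004"})
```

Reading prime_new.log into a string and passing it through `parse_events` and then `evicted_jobs` would list the jobs that were evicted at least once.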



----- Original Message -----
From: Jaime Frey
Sent: Thursday, February 02, 2006 10:06 AM
Subject: Re: [Condor-users] Evictions

Ah, you're using the standard universe. That changes things. You don't have to worry about shared file systems or file transfer in that case. You used condor_compile to link your program, yes?

The logs are in the Condor log directory. Run 'condor_config_val log' to determine where that is.
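
That lookup can also be scripted. The following is a minimal Python sketch, assuming condor_config_val is on the PATH and the daemon logs keep their default file names, ShadowLog and StarterLog (a site can rename them in its configuration). The helper names `condor_log_dir` and `daemon_logs` are hypothetical.

```python
import subprocess
from pathlib import Path


def condor_log_dir(run=subprocess.run):
    """Ask condor_config_val for the LOG directory and return it as a Path.

    `run` is injectable so the function can be exercised without Condor.
    """
    result = run(
        ["condor_config_val", "LOG"],
        capture_output=True,
        text=True,
        check=True,
    )
    return Path(result.stdout.strip())


def daemon_logs(log_dir):
    """Map daemon log names to their expected paths under log_dir.

    Assumption: default file names; adjust if the site overrides them.
    """
    return {name: Path(log_dir) / name for name in ("ShadowLog", "StarterLog")}
```

On the submit machine, `daemon_logs(condor_log_dir())["ShadowLog"]` would give the shadow log path; running the same thing on an execute node and taking `"StarterLog"` would give the starter log.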

 -- Jaime

On Jan 31, 2006, at 5:09 PM, Stephen Broughton wrote:

I don't currently have access to the grid; I will check the shadow log this evening. Just in case, where is this log file?
I have a Condor NFS share with all the binaries that all the nodes connect to, and each node has a local scratch folder with its local config file. The program is located in a subfolder of the Condor shared folder, with permissions set (I believe) to read/write/execute. I have not explicitly enabled file transfer. The program writes to a text file in the program source folder. I was able to run this same test in the previous installation.
I will check the file permissions; that seems like a likely problem.
My main purpose in this program is just to write a few time stamps into a single output file. Is there a way to do this that Condor supports more directly and that would work better than just writing to a text file?
## Prime Number Condor command file
universe   = standard
executable      = prime_new
log    = prime_new.log
#output   = prime_new.$(Process).out
output   = prime_new.out
error   = prime_new.err
# 1
arguments  = 10000000 10000500 1
queue
# 2
arguments  = 10000501 10001000 2
queue
The program runs from /home/condor/condor/prime, which exists on all nodes through an NFS share.
----- Original Message -----
From: Jaime Frey
Sent: Tuesday, January 31, 2006 3:50 PM
Subject: Re: [Condor-users] Evictions

On Jan 31, 2006, at 11:26 AM, Stephen Broughton wrote:

I just noticed from the log that all the evictions are from the nodes; the job only completes on the master, which is also the submitting machine and the NFS server for the Condor installation binaries. This test program worked when I had a Condor 6.7.12 install with all the same configuration settings.

Does the shadow log on your submit machine or the starter log on the evicting execute machines contain any interesting error messages? If you don't have file transfer enabled, is the executable, input, output, or error on a local disk?

|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
