
Re: [Condor-users] Condor_q analyze question



Brandon Leeds wrote:
> Hi All,

Hi Brandon, some hopefully helpful comments below....

> We are trying to understand why a job appears to be running and accumulating cpu time in the condor_q output,

Note that if you just do "condor_q", the time you are seeing is RUN_TIME, i.e. wall-clock time. To see CPU time you need to pass the "-cputime" flag to condor_q. CPU time is then displayed instead of wall-clock time; note that Condor only updates CPU time periodically, so you will not see it incrementing every second with condor_q.
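For example, using the job id from later in this message (the cluster id and pool/schedd names are just whatever applies at your site):

```shell
# Default output: the time column is RUN_TIME (wall-clock time)
condor_q 527579.0

# Same query, but show accumulated CPU time instead; this value
# is only refreshed periodically by the starter, so it will lag
condor_q -cputime 527579.0
```

A job that shows large RUN_TIME but near-zero CPU time with -cputime is a good hint that it is blocked, suspended, or staging files rather than computing.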

> but are told by the end user that his job is no longer accessing the files it should be along the computation's typical pathway. In hopes of understanding whether the priority is so low that it is starving,

If his job is marked as running by condor_q, then there is no low-priority starvation issue.

Some thoughts:

a) the job will still be displayed as "running" in condor_q even if it is currently suspended at the execute node because the SUSPEND expression in the config file evaluated to true. You can do a condor_status to see if the node running the job is in the Suspended state. Or, if the user specified a job event log (log=<some-file>) in the submit description file, that log will also state whether the job was suspended.
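A quick way to check both of those, sketched below (the hostname and log filename are hypothetical placeholders for whatever your job actually uses):

```shell
# See the activity of the execute node running the job; a
# "Suspended" activity means the job is paused, not computing
condor_status exec-node.example.com

# If the submit file contained a line such as:
#   log = myjob.log
# then suspend/unsuspend events are recorded there:
grep -i suspend myjob.log
```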

b) the job will still be displayed as "running" in condor_q when in fact files are being staged (copied) onto or off of the execute node.

c) if the user is expecting to see output files "grow" as the job runs, note there are many circumstances where that may not happen. For instance, if the job is vanilla universe and file transfer is being used (i.e. no shared file system), the job's files will only get updated when the job completes or, optionally, when it is preempted. If the job is standard universe, files may only get updated when the program does a sync to disk - i.e. file I/O may be cached in RAM for long periods of time.

> he looked at using the analyze
> flag to condor_q. Unfortunately we get this result:


> $ condor_q -pool condor -name blaze -analyze 527579.0
> Error: Collector has no record of schedd/submitter



This error is saying your pool does not have a schedd (submission point) named "blaze". If "blaze" is a hostname, perhaps you need the fully-qualified name? Also, you can do "condor_status -schedd" to see a list of all the values that are valid with the "-name" option.
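Concretely, something like the following (the fully-qualified name shown is an assumption about your site's domain):

```shell
# List every schedd the collector knows about; the Name column
# holds exactly the strings condor_q -name will accept
condor_status -schedd

# Then retry the analysis with the name as the collector reports it
condor_q -pool condor -name blaze.example.com -analyze 527579.0
```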

Or is "blaze" perhaps the login of the submitting user? Then you meant to use the "-submitter" option to condor_q instead of the "-name" option.
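In that case the query would look like this:

```shell
# Query by submitting user rather than by schedd name
condor_q -pool condor -submitter blaze -analyze 527579.0
```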

-Todd

--
Todd Tannenbaum                       University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257