
Re: [Condor-users] i have a problem



My .sub file is the following:
 
universe = vanilla
executable = sim_rebounding_DT.exe
requirements = Memory >= 128
rank = kflops
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = meas.txt
error = FIR_SRA.err
log = FIR_SRA.log
output = FIR_SRA_cmeans_432.txt
arguments = 0 0
queue
output = FIR_SRA_cmeans_422.txt
arguments = 0 1
queue

When I submit this file to the Condor pool, the condor_status command shows the following:
 
C:\thesis\simulation>condor_status
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
vm1@dtc-mvill WINNT51     INTEL  Unclaimed  Idle       0.000   251  0+02:08:28
vm2@dtc-mvill WINNT51     INTEL  Unclaimed  Idle       0.000   251  0+02:08:29
dtc-snaranjo. WINNT51     INTEL  Unclaimed  Idle       0.030   478  0+02:03:27
dtc-vhinojosa WINNT51     INTEL  Claimed    Busy       0.000  1015  0+00:02:21
id-vhinojosa. WINNT51     INTEL  Unclaimed  Idle       0.840   254  0+02:05:22
                     Machines Owner Claimed Unclaimed Matched Preempting
       INTEL/WINNT51        5     0       1         4       0          0
               Total        5     0       1         4       0          0
 
dtc-vhinojosa is running one of the jobs. According to my constraints the next machine should be dtc-snaranjo, but I don't know why it does not run the other job.
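
As a quick sanity check of the requirements line against the table above, here is a minimal sketch. The tuples are copied by hand from the condor_status output, the Mem column is assumed to be in megabytes, and the helper name idle_candidates is made up for illustration:

```python
# (name, state, memory in MB) copied by hand from the condor_status table
MACHINES = [
    ("vm1@dtc-mvill", "Unclaimed", 251),
    ("vm2@dtc-mvill", "Unclaimed", 251),
    ("dtc-snaranjo.", "Unclaimed", 478),
    ("dtc-vhinojosa", "Claimed", 1015),
    ("id-vhinojosa.", "Unclaimed", 254),
]

# From the submit file: requirements = Memory >= 128
MIN_MEMORY = 128


def idle_candidates(machines, min_memory):
    """Return names of unclaimed machines whose memory meets the requirement."""
    return [
        name
        for name, state, memory in machines
        if state == "Unclaimed" and memory >= min_memory
    ]


if __name__ == "__main__":
    for name in idle_candidates(MACHINES, MIN_MEMORY):
        print(name)
```

All four unclaimed machines pass this check, so the Memory requirement alone does not explain why the second job stays idle.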
 
I ran condor_q with the -analyze option and the result is the following:
 
C:\thesis\simulation>condor_q -analyze

-- Submitter: dtc-vhinojosa : <10.0.1.171:4685> : dtc-vhinojosa
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
045.000:  Request is being serviced
---
045.001:  Run analysis summary.  Of 5 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      1 match, but are serving users with a better priority in the pool
      4 match, match, but reject the job for unknown reasons
      0 match, but will not currently preempt their existing job
      0 are available to run your job
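
To read the summary above mechanically, a small sketch like the following may help. It assumes the summary text has been pasted into a string; parse_summary is a hypothetical helper, not part of Condor:

```python
import re

# Summary text pasted from the condor_q -analyze output above
SUMMARY = """\
045.001:  Run analysis summary.  Of 5 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      1 match, but are serving users with a better priority in the pool
      4 match, match, but reject the job for unknown reasons
      0 match, but will not currently preempt their existing job
      0 are available to run your job
"""


def parse_summary(text):
    """Return (total machines, [(count, reason), ...]) from an analysis summary."""
    total = int(re.search(r"Of (\d+) machines", text).group(1))
    counts = [
        (int(match.group(1)), match.group(2).strip())
        for match in re.finditer(r"^[ \t]+(\d+)[ \t]+(.+)$", text, re.MULTILINE)
    ]
    return total, counts


if __name__ == "__main__":
    total, counts = parse_summary(SUMMARY)
    for count, reason in counts:
        if count:
            print(f"{count} of {total}: {reason}")
```

The six counts add up to the 5 machines in the pool, and only two categories are non-zero: one machine is serving another user, and four match but reject the job.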

Can you help me understand why Condor shows the message "reject the job for unknown reasons", or tell me where I can look for the mistake?
 
Thanks for your help.
 
regards,
 
 
victor
________________________________

From: condor-users-bounces@xxxxxxxxxxx on behalf of David A. Kotz
Sent: Thu 01/06/2006 02:40 p.m.
To: Condor-Users Mail List
Subject: Re: [Condor-users] i have a problem



Victor,

The first step is to use the -analyze switch to condor_q.  Try using
this command on the submit node:

   condor_q -analyze 26.0

and also this one (if it works in Windows):

   condor_q -better-analyze 26.0

Those commands should give you some indication of why job 26.0 is not
starting.

If you get nothing useful from those commands, compare long listings of
the jobs and the machines:

   condor_q -l 26.0
   condor_status -l dtc-mvill

to see if you can spot incompatibilities between the job's requirements
and the machine's requirements.
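
One way to make that comparison less tedious is sketched below. It assumes the long listings are plain "Attr = value" lines and that attribute names are case-insensitive; the function names and the abbreviated sample ads are made up for illustration and are not real output from this pool:

```python
import re


def parse_long_listing(text):
    """Parse 'Attr = value' lines from a long listing into a dict."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep and name.strip():
            ad[name.strip()] = value.strip()
    return ad


def referenced_attributes(requirements, machine_ad):
    """Return the machine attributes that a requirements expression mentions."""
    by_lower = {key.lower(): key for key in machine_ad}
    # Drop string literals so that "INTEL" is not mistaken for an attribute name.
    bare = re.sub(r'"[^"]*"', "", requirements)
    found = {}
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", bare):
        key = by_lower.get(token.lower())
        if key is not None:
            found[key] = machine_ad[key]
    return found


# Hypothetical, abbreviated ads; real listings contain many more attributes.
JOB_AD = 'Requirements = (Memory >= 128) && (Arch == "INTEL")\nRank = kflops\n'
MACHINE_AD = 'Name = "vm1@dtc-mvill"\nOpSys = "WINNT51"\nArch = "INTEL"\nMemory = 251\n'

if __name__ == "__main__":
    job = parse_long_listing(JOB_AD)
    machine = parse_long_listing(MACHINE_AD)
    print("Job requirements:", job["Requirements"])
    for name, value in referenced_attributes(job["Requirements"], machine).items():
        print(f"  machine {name} = {value}")
```

In practice the two strings would come from saving the output of the condor_q -l and condor_status -l commands above to files. The same function can be called the other way round, with the machine's Requirements expression and the job ad, to check the machine's side of the match.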

- dave


Víctor Hinojosa wrote:
> I have a Condor pool. The summary is the following:
> 
> Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
> vm1@dtc-mvill WINNT51     INTEL  Unclaimed  Idle       0.000   508  0+02:08:29
> vm2@dtc-mvill WINNT51     INTEL  Unclaimed  Idle       0.330   508  0+02:08:30
> dtc-vhinojosa WINNT51     INTEL  Unclaimed  Idle       0.000  1015  0+00:08:06
> id-vhinojosa. WINNT51     INTEL  Unclaimed  Idle       0.010   254  0+02:08:09
>                      Machines Owner Claimed Unclaimed Matched Preempting
>        INTEL/WINNT51        4     0       0         4       0          0
>                Total        4     0       0         4       0          0
> 
> I submitted a task with condor_submit and checked the status of my job with the condor_q command.
> 
> -- Submitter: dtc-vhinojosa : <10.0.1.171:2934> : dtc-vhinojosa
>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
>   26.0   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
>   26.1   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
>   26.2   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
>   26.3   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
>   26.4   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
>   26.5   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
>   26.6   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
>   26.7   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
>   26.8   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
>   26.9   Victor          6/1  13:02   0+00:00:00 I  0   0.3  sim_rebounding_DT
> 10 jobs; 10 idle, 0 running, 0 held
> 
> When I installed the Condor pool, I set up all machines with the option "always run Condor jobs", so I don't know what is happening. Can somebody help me, or tell me where I can look for the mistake?
> 
> regards,
> 
> 
> victor hinojosa
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
