
Re: [HTCondor-users] submitted jobs are not running



Sorry, I know I sent it without the output and then immediately followed up with the condor_q output. To be sure, here it is one more time.

labounek@emperor:~$ condor_q


-- Schedd: emperor.fnol.loc : <172.19.37.11:34081?...
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0000.sh
   2.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0001.sh
   3.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0002.sh
   4.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0003.sh
   5.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0004.sh
   6.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0005.sh
   7.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0006.sh
   8.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0007.sh
   9.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0008.sh
  10.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0009.sh
  11.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0010.sh
  12.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0011.sh
  13.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0012.sh
  14.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0013.sh
  15.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0014.sh
  16.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0015.sh
  17.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0016.sh
  18.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0017.sh
  19.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0018.sh
  20.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0019.sh
  21.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0020.sh
  22.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0021.sh
  23.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0022.sh
  24.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0023.sh
  25.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0024.sh
  26.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0025.sh
  27.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0026.sh
  28.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0027.sh
  29.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0028.sh
  30.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0029.sh
  31.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0030.sh
  32.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0031.sh
  33.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0032.sh
  34.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0033.sh
  35.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0034.sh
  36.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0035.sh
  37.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0036.sh
  38.0   labounek        3/10 19:24   0+00:00:00 H  0   0.0  slice_0037.sh
  39.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0038.sh
  40.0   labounek        3/10 19:24   0+00:00:01 H  0   0.0  slice_0039.sh

40 jobs; 0 completed, 0 removed, 0 idle, 0 running, 40 held, 0 suspended
labounek@emperor:~$
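For what it's worth, the hold reason itself can be read straight out of the queue. Assuming a condor_q new enough to support -hold and autoformat, either of these should print why each job was held:

labounek@emperor:~$ condor_q -hold
labounek@emperor:~$ condor_q -af:j HoldReason HoldReasonCode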


The execute bit set for my id? Is that what you mean? I think I have it on both files.

labounek@emperor:~/test/dti.bedpostX/condor_logs$ ls -l slice_0007.sh
-rwxrwx--- 1 labounek mri 142 Mar 10 19:24 slice_0007.sh
labounek@emperor:~/test/dti.bedpostX/condor_logs$

labounek@emperor:~$ ls -l /usr/share/fsl/5.0/bin/bedpostx_single_slice.sh
-rwxr-xr-x 1 root root 3806 Oct 2 17:52 /usr/share/fsl/5.0/bin/bedpostx_single_slice.sh
labounek@emperor:~$

Regards,
Rene



Quoting Bob Ball <ball@xxxxxxxxx>:

No condor_q output was included.

Make sure that both slice_0007.sh and bedpostx_single_slice.sh have their execute bit set for your id. If this is the case then do "condor_q -analyze" on the stuck job.
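For example, with 1.0 standing in for whichever job is stuck:

ls -l slice_0007.sh /usr/share/fsl/5.0/bin/bedpostx_single_slice.sh   # check for the x bits
chmod u+x slice_0007.sh     # add the execute bit if it is missing
condor_q -analyze 1.0       # ask the schedd why the job is not running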

bob

On 3/10/2016 1:45 PM, Labounek René wrote:
Dear condor users,
I have submitted jobs, but they are held and not running. The condor_status output looks OK:



labounek@emperor:~$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem  ActvtyTime

slot10@xxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:23
slot11@xxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:24
slot12@xxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     6.320  2682 0+00:00:25
slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:04
slot2@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:23
slot3@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:24
slot4@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:25
slot5@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:26
slot6@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:27
slot7@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:28
slot8@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:21
slot9@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     1.000  2682 0+00:00:22

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    12     0       0        12       0          0        0

               Total    12     0       0        12       0          0        0
labounek@emperor:~$



The condor_submit command looked like this:

condor_submit slice_0007.condor

The file contains this text:

Executable = /home/labounek/test/dti.bedpostX/condor_logs/slice_0007.sh
Universe = vanilla
output = /home/labounek/test/dti.bedpostX/condor_logs/slice_0007.out
error = /home/labounek/test/dti.bedpostX/condor_logs/slice_0007.error
Log   = /home/labounek/test/dti.bedpostX/condor_logs/slice_0007.log
Queue
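
This assumes the execute slots can read /home/labounek directly. If they cannot, the submit description would also need the usual file-transfer settings, sketched here only for completeness:

should_transfer_files   = YES
when_to_transfer_output = ON_EXIT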

The file slice_0007.sh contains one command:

/usr/share/fsl/5.0/bin/bedpostx_single_slice.sh /home/labounek/test/dti 7 --nf=3 --fudge=1 --bi=1000 --nj=1250 --se=25 --model=2 --cnonlinear
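
So the full script should be nothing more than a shebang line plus that call:

#!/bin/sh
# slice_0007.sh -- run one bedpostx slice (slice 7) of /home/labounek/test/dti
/usr/share/fsl/5.0/bin/bedpostx_single_slice.sh /home/labounek/test/dti 7 --nf=3 --fudge=1 --bi=1000 --nj=1250 --se=25 --model=2 --cnonlinear

(A missing #!/bin/sh first line is a classic reason for a vanilla-universe job to be put on hold with an exec format error.)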

I think everything should be OK, but the jobs are stuck. Here is the condor_q output:

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

