
[Condor-users] More BLAST



I have an NTFS share mapped to K:\ on the execute machines and mounted at /mnt/condor on the central manager and submit machines.  The Windows BLAST executables are located in the /blast/bin directory on the share.  I ran condor_submit with the following submit script:

# Submit script
Universe = Vanilla
Requirements = OpSys == "WINNT51" && Arch == "INTEL"
Executable = /mnt/condor/blast/bin/blastall.exe
Arguments = -p blastn -d K:\blast\db\nt -i K:\blast\test\sample.fsa -o blastout.ncbi
Should_Transfer_Files = Yes
When_To_Transfer_Output = ON_EXIT
Log = BLAST.log
Queue


According to condor_q, my job ran, but not in the way I had hoped.  It appears that BLAST did not actually run: the entire job took only a few seconds and there was no output.  I checked the log files, and this is what they show:

--
BLAST.log
--
...
000 (025.000.000) 12/23 13:46:24 Job submitted from host: <10.115.0.59:40535>
...
001 (025.000.000) 12/23 13:46:27 Job executing on host: <10.115.2.13:1056>
...
005 (025.000.000) 12/23 13:46:28 Job terminated.
    (1) Normal termination (return value 1)
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    0  -  Run Bytes Sent By Job
    2621440  -  Run Bytes Received By Job
    0  -  Total Bytes Sent By Job
    2621440  -  Total Bytes Received By Job
...

--
ShadowLog
--
12/23 13:46:24 ******************************************************
12/23 13:46:24 ** condor_shadow (CONDOR_SHADOW) STARTING UP
12/23 13:46:24 ** /data/condor/condor-7.4.0/sbin/condor_shadow
12/23 13:46:24 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
12/23 13:46:24 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
12/23 13:46:24 ** $CondorVersion: 7.4.0 Oct 31 2009 BuildID: 193173 $
12/23 13:46:24 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
12/23 13:46:24 ** PID = 17457
12/23 13:46:24 ** Log last touched 12/23 13:29:40
12/23 13:46:24 ******************************************************
12/23 13:46:24 Using config source: /mnt/condor/etc/condor_config
12/23 13:46:24 Using local config sources:
12/23 13:46:24    /mnt/condor/local/local.ablethr21/condor_config.local
12/23 13:46:24 DaemonCore: Command Socket at <10.115.0.59:42175>
12/23 13:46:24 Initializing a VANILLA shadow for job 25.0
12/23 13:46:24 (25.0) (17457): Request to run on ablethr518096.AGR.GC.CA <10.115.2.13:1056> was ACCEPTED
12/23 13:46:28 (25.0) (17457): Job 25.0 terminated: exited with status 1
12/23 13:46:28 (25.0) (17457): **** condor_shadow (condor_SHADOW) pid 17457 EXITING WITH STATUS 100

--
SchedLog
--
12/23 13:46:24 (pid:22969) Sent ad to central manager for condor@xxxxxxxxx
12/23 13:46:24 (pid:22969) Sent ad to 1 collectors for condor@xxxxxxxxx
12/23 13:46:24 (pid:22969) Activity on stashed negotiator socket
12/23 13:46:24 (pid:22969) Negotiating for owner: condor@xxxxxxxxx
12/23 13:46:24 (pid:22969) Checking consistency running and runnable jobs
12/23 13:46:24 (pid:22969) Tables are consistent
12/23 13:46:24 (pid:22969) Rebuilt prioritized runnable job list in 0.007s.
12/23 13:46:24 (pid:22969) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
12/23 13:46:24 (pid:22969) Completed REQUEST_CLAIM to startd ablethr518096.AGR.GC.CA <10.115.2.13:1056> for condor@xxxxxxxxx
12/23 13:46:24 (pid:22969) Starting add_shadow_birthdate(25.0)
12/23 13:46:24 (pid:22969) Started shadow for job 25.0 on ablethr518096.AGR.GC.CA <10.115.2.13:1056> for condor@xxxxxxxxx, (shadow pid = 17457)
12/23 13:46:28 (pid:22969) Shadow pid 17457 for job 25.0 exited with status 100
12/23 13:46:28 (pid:22969) Checking consistency running and runnable jobs
12/23 13:46:28 (pid:22969) Tables are consistent
12/23 13:46:28 (pid:22969) Rebuilt prioritized runnable job list in 0.008s.  (Expedited rebuild because no match was found)
12/23 13:46:28 (pid:22969) match (ablethr518096.AGR.GC.CA <10.115.2.13:1056> for condor@xxxxxxxxx) out of jobs; relinquishing
12/23 13:46:28 (pid:22969) Completed RELEASE_CLAIM to startd at <10.115.2.13:1056>
12/23 13:46:28 (pid:22969) Match record (ablethr518096.AGR.GC.CA <10.115.2.13:1056> for condor@xxxxxxxxx, 25.-1) deleted
12/23 13:46:28 (pid:22969) Attempting to chown '/mnt/condor/local/local.ablethr21/spool/cluster25.proc0.subproc0' from 502 to 502.502, but the path was unexpectedly owned by 0
12/23 13:46:29 (pid:22969) Sent owner (0 jobs) ad to 1 collectors

--
StarterLog
--
12/23 11:46:25 Locale: English_United States.1252
12/23 11:46:25 ******************************************************
12/23 11:46:25 ** condor_starter (CONDOR_STARTER) STARTING UP
12/23 11:46:25 ** C:\condor\bin\condor_starter.exe
12/23 11:46:25 ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
12/23 11:46:25 ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
12/23 11:46:25 ** $CondorVersion: 7.4.0 Oct 31 2009 BuildID: 193173 $
12/23 11:46:25 ** $CondorPlatform: INTEL-WINNT50 $
12/23 11:46:25 ** PID = 3564
12/23 11:46:25 ** Log last touched 12/23 11:29:39
12/23 11:46:25 ******************************************************
12/23 11:46:25 Using config source: C:\condor\condor_config
12/23 11:46:25 Using local config sources:
12/23 11:46:25    C:\condor/condor_config.local
12/23 11:46:25 DaemonCore: Command Socket at <10.115.2.13:3768>
12/23 11:46:25 GLEXEC_JOB not supported on this platform; ignoring
12/23 11:46:25 Setting resource limits not implemented!
12/23 11:46:25 Communicating with shadow <10.115.0.59:42175>
12/23 11:46:25 Submitting machine is "ablethr21.agr.gc.ca"
12/23 11:46:25 setting the orig job name in starter
12/23 11:46:25 setting the orig job iwd in starter
12/23 11:46:26 File transfer completed successfully.
12/23 11:46:27 Job 25.0 set to execute immediately
12/23 11:46:27 Starting a VANILLA universe job with ID: 25.0
12/23 11:46:27 Tracking process family by login "condor-reuse-slot1"
12/23 11:46:27 IWD: C:\condor\execute\dir_3564
12/23 11:46:27 Output file: C:\condor\execute\dir_3564\blastout.ncbi
12/23 11:46:27 Renice expr "10" evaluated to 10
12/23 11:46:27 About to exec C:\condor\execute\dir_3564\condor_exec.exe -p blastn -d K:\blast\db\nt -i K:\blast\test\sample.fsa -o blastout.ncbi
12/23 11:46:27 Create_Process succeeded, pid=1132
12/23 11:46:27 Process exited, pid=1132, status=1
12/23 11:46:27 Got SIGQUIT.  Performing fast shutdown.
12/23 11:46:27 ShutdownFast all jobs.
12/23 11:46:28 **** condor_starter (condor_STARTER) pid 3564 EXITING WITH STATUS 0
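
Note that the submit script above does not set Output or Error, so whatever blastall.exe printed before exiting with status 1 is not captured anywhere in these logs. A minimal sketch of a diagnostic variant follows; the file names blast.out and blast.err are assumptions chosen for illustration, and everything else is unchanged from the original script.

```
# Submit script (diagnostic variant)
Universe = Vanilla
Requirements = OpSys == "WINNT51" && Arch == "INTEL"
Executable = /mnt/condor/blast/bin/blastall.exe
Arguments = -p blastn -d K:\blast\db\nt -i K:\blast\test\sample.fsa -o blastout.ncbi
Should_Transfer_Files = Yes
When_To_Transfer_Output = ON_EXIT
# Assumed file names: capture anything blastall.exe writes to stdout and stderr
Output = blast.out
Error = blast.err
Log = BLAST.log
Queue
```

With these two lines, any error message from blastall.exe should come back to the submit directory when the job exits. One possibility worth checking, which these logs do not confirm, is whether the K: mapping is visible to the account the job runs under (the StarterLog shows the login "condor-reuse-slot1").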

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Does anyone have any idea what might be wrong here?  Any help would be greatly appreciated.

D