
Re: [HTCondor-users] Python API submission of DAGs



Hello, I am developing a library based on suds, and I will probably add support for DAGs. You have to add the attributes that don't appear. You can find an example here:
http://spinningmatt.wordpress.com/2011/09/16/submitting-a-dag-via-aviary-using-python/

On 16/12/2013 12:39, Brian Candler wrote:
I want to submit DAGs using the Python API, specifically so I can easily grab the cluster ID and monitor job status / completion.

1. Does the Python API have any specific support for DAGs? If it does, I haven't been able to find it.

2. I have read
http://research.cs.wisc.edu/htcondor/HTCondorWeek2013/presentations/Bockelman_Python.pdf
and got the basic job submission example working.

Is my best approach to take the *.condor.sub file which condor_submit_dag generates, translate it from submission format to ClassAd format, and then just submit this as a normal job via the Python API?

3. If I resubmit this using "condor_submit -verbose *.condor.sub" then I get about 75 attributes shown. Hopefully I don't have to create all these by hand?! Is it just the ones which correspond to the submit file attributes?

4. Some submit file settings don't appear to map to ClassAd attributes, e.g. "copy_to_spool = false". Does this only affect the behaviour of condor_submit, so that it can be ignored?

Thanks,

Brian.

---- submit file generated by condor_submit_dag ----
# Filename: test-10.dag.condor.sub
# Generated by condor_submit_dag test-10.dag
universe    = scheduler
executable    = /usr/bin/condor_dagman
getenv        = True
output        = test-10.dag.lib.out
error        = test-10.dag.lib.err
log        = test-10.dag.dagman.log
remove_kill_sig    = SIGUSR1
+OtherJobRemoveRequirements    = "DAGManJobId == $(cluster)"
# Note: default on_exit_remove expression:
# ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
# attempts to ensure that DAGMan is automatically
# requeued by the schedd if it exits abnormally or
# is killed (e.g., during a reboot).
on_exit_remove = ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
copy_to_spool    = False
arguments = "-f -l . -Lockfile test-10.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag test-10.dag -Suppress_notification -CsdVersion $CondorVersion:' '8.0.4' 'Oct' '19' '2013' 'BuildID:' '189770' '$ -Force -Dagman /usr/bin/condor_dagman"
environment = _CONDOR_DAGMAN_LOG=test-10.dag.dagman.out;_CONDOR_SCHEDD_ADDRESS_FILE=/var/lib/condor/spool/.schedd_address;_CONDOR_SCHEDD_DAEMON_AD_FILE=/var/lib/condor/spool/.schedd_classad;_CONDOR_MAX_DAGMAN_LOG=0
queue
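
The translation asked about in question 2 could be sketched roughly as follows. This is only an illustration: the helper name submit_to_attrs and the SUBMIT_TO_CLASSAD table are hypothetical and not part of the HTCondor Python API. The mapping is read off the "condor_submit -verbose" output in this message (for example, "executable" shows up there as Cmd) and covers only the commands used in this file. Values are kept as raw strings, so something like "scheduler" would still need converting (the verbose output shows JobUniverse = 7).

```python
# Hypothetical mapping from submit-file commands to ClassAd attribute
# names, read off the -verbose output in this message. Not exhaustive.
SUBMIT_TO_CLASSAD = {
    "universe": "JobUniverse",
    "executable": "Cmd",
    "output": "Out",
    "error": "Err",
    "log": "UserLog",
    "remove_kill_sig": "RemoveKillSig",
    "on_exit_remove": "OnExitRemove",
    "arguments": "Arguments",
    "environment": "Env",
}


def submit_to_attrs(text):
    """Split submit-file lines into (mapped, unmapped) dicts of raw strings."""
    mapped, unmapped = {}, {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and the final queue statement.
        if not line or line.startswith("#") or line.lower() == "queue":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key.startswith("+"):
            # "+Name = value" sets the attribute Name directly.
            mapped[key[1:]] = value
        elif key.lower() in SUBMIT_TO_CLASSAD:
            mapped[SUBMIT_TO_CLASSAD[key.lower()]] = value
        else:
            # No attribute of this name in the table above.
            unmapped[key] = value
    return mapped, unmapped


EXAMPLE = """\
universe    = scheduler
executable    = /usr/bin/condor_dagman
getenv        = True
+OtherJobRemoveRequirements    = "DAGManJobId == $(cluster)"
copy_to_spool    = False
queue
"""

mapped, unmapped = submit_to_attrs(EXAMPLE)
print(mapped)
print(unmapped)
```

With this table, getenv and copy_to_spool land in the unmapped dict. That matches the observation in question 4, although getenv does leave a trace: the Env attribute in the verbose output contains the submitting shell's environment.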

---- condor_submit -verbose *.condor.sub ----
$ condor_submit -verbose test-10.dag.condor.sub

** Proc 528414.0:
OtherJobRemoveRequirements = "DAGManJobId == 528414"
LeaveJobInQueue = false
OnExitRemove = ( ExitSignal =?= 11 || ( ExitCode =!= undefined && ExitCode >= 0 && ExitCode <= 2 ) )
OnExitHold = false
PeriodicRemove = false
PeriodicRelease = false
FileSystemDomain = "proliant.example.com"
RequestDisk = DiskUsage
ExecutableSize = 256
ImageSize = 256
ShouldTransferFiles = "IF_NEEDED"
BufferBlockSize = 32768
BufferSize = 524288
StreamErr = false
Err = "test-10.dag.lib.err"
LocalUserCpu = 0.0
TotalSuspensions = 0
RemoteSysCpu = 0.0
CommittedTime = 0
StreamOut = false
NumCkpts = 0
ExitStatus = 0
LocalSysCpu = 0.0
TransferIn = false
RemoteUserCpu = 0.0
RemoteWallClockTime = 0.0
NumRestarts = 0
CompletionDate = 0
JobNotification = 0
Out = "test-10.dag.lib.out"
CumulativeSlotTime = 0
UserLog = "/home/brian/coderepo2/test-test/test-10.dag.dagman.log"
QDate = 1387193476
CommittedSlotTime = 0
TargetType = "Machine"
Cmd = "/usr/bin/condor_dagman"
NumSystemHolds = 0
EnvDelim = ";"
RemoveKillSig = "SIGUSR1"
Owner = "brian"
CurrentTime = time()
WantCheckpoint = false
MyType = "Job"
MinHosts = 1
PeriodicHold = false
CommittedSuspensionTime = 0
ExitBySignal = false
CondorVersion = "$CondorVersion: 8.0.4 Oct 19 2013 BuildID: 189770 $"
MaxHosts = 1
CurrentHosts = 0
RootDir = "/"
RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
CondorPlatform = "$CondorPlatform: x86_64_Debian7 $"
JobStatus = 1
Iwd = "/home/brian/coderepo2/test-test"
Requirements = ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory )
Rank = 0.0
NiceUser = false
JobUniverse = 7
DiskUsage = 256
NumJobStarts = 0
WantRemoteSyscalls = false
Arguments = "-f -l . -Lockfile test-10.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag test-10.dag -Suppress_notification -CsdVersion $CondorVersion:' '8.0.4' 'Oct' '19' '2013' 'BuildID:' '189770' '$ -Force -Dagman /usr/bin/condor_dagman"
TransferInputSizeMB = 0
WantRemoteIO = true
In = "/dev/null"
EnteredCurrentStatus = 1387193476
RequestCpus = 1
JobPrio = 0
LastSuspensionTime = 0
Env = "_=/usr/bin/condor_submit;PWD=/home/brian/coderepo2/test-test;SHLVL=1;LANG=en_GB.UTF-8;TERM=xterm;LANGUAGE=en_GB:en;MAIL=/var/mail/brian;OLDPWD=/home/brian/coderepo2;LESSOPEN=| /usr/bin/lesspipe %s;_CONDOR_SCHEDD_DAEMON_AD_FILE=/var/lib/condor/spool/.schedd_classad;SSH_AUTH_SOCK=/tmp/ssh-EkZVvo9585/agent.9585;_CONDOR_SCHEDD_ADDRESS_FILE=/var/lib/condor/spool/.schedd_address;SSH_TTY=/dev/pts/2;_CONDOR_DAGMAN_LOG=test-10.dag.dagman.out;SHELL=/bin/bash;_CONDOR_MAX_DAGMAN_LOG=0;USER=brian;PATH=/home/brian/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games;LESSCLOSE=/usr/bin/lesspipe %s %s;SSH_CLIENT=xxx.xxx.xxx.155 54819 22;LOGNAME=brian;SSH_CONNECTION=xxx.xxx.xxx.155 yyyyy xxx.xxx.xxx.110 22;HOME=/home/brian"
WhenToTransferOutput = "ON_EXIT"
CumulativeSuspensionTime = 0
KillSig = "SIGTERM"
CoreSize = 0
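
For question 3, one way to see which of these attributes trace back to the submit file is to parse the verbose output and separate the two groups. The sketch below assumes the "Name = value" layout shown above; parse_verbose and the FROM_SUBMIT_FILE set are hypothetical names chosen for illustration, and the set lists only the attributes that correspond to lines in test-10.dag.condor.sub.

```python
def parse_verbose(text):
    """Parse 'Name = value' lines of condor_submit -verbose output into a dict."""
    ad = {}
    for line in text.splitlines():
        # Skip the "** Proc ..." header, the shell prompt line and blanks.
        if line.startswith("**") or line.startswith("$") or " = " not in line:
            continue
        name, _, value = line.partition(" = ")
        ad[name.strip()] = value.strip()
    return ad


# Attributes in this message that correspond to a submit-file line
# (an assumption read off the two listings, not an official list).
FROM_SUBMIT_FILE = {
    "JobUniverse", "Cmd", "Out", "Err", "UserLog", "RemoveKillSig",
    "OnExitRemove", "Arguments", "Env", "OtherJobRemoveRequirements",
}

SAMPLE = """\
** Proc 528414.0:
OtherJobRemoveRequirements = "DAGManJobId == 528414"
LeaveJobInQueue = false
Cmd = "/usr/bin/condor_dagman"
JobUniverse = 7
QDate = 1387193476
"""

ad = parse_verbose(SAMPLE)
explicit = {k: v for k, v in ad.items() if k in FROM_SUBMIT_FILE}
filled_in = sorted(set(ad) - FROM_SUBMIT_FILE)
print(explicit)
print(filled_in)
```

This only classifies the attributes. Whether the ones in filled_in can be left out when submitting through the Python API is not something the sketch establishes.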
