
Re: [Condor-users] BOINC running, all machine Owner






Hi,

Thanks a lot for that work.

I've tried 6.7.17 with the new BOINC backfill feature, but I didn't succeed.
First I had problems with the owner option, even when I was root.
I tested the daemon user too, but that did not work either.
Finally I created a boinc user and got the following result:
3/28 18:02:56 ERROR "Create_Process(/usr/local/BOINC/run_client,condor_exec.exe, ...) failed" at line 373 in file os_proc.C
3/28 18:02:56 ShutdownFast all jobs.
When I run /usr/local/BOINC/run_client as the boinc user, it works well.
Below you can find the config file, the log files and the status.

Manu



condor_config.local

# Turn on backfill functionality, and use BOINC
ENABLE_BACKFILL = TRUE
BACKFILL_SYSTEM = BOINC

# Spawn a backfill job if we've been Unclaimed for more than 5
# minutes
START_BACKFILL = $(StateTimer) > (5 * $(MINUTE))

# Evict a backfill job if the machine is busy (based on keyboard
# activity or cpu load)
EVICT_BACKFILL = $(MachineBusy)

# Define a shared macro that can be used to define other settings.
# This directory must be manually created before attempting to run
# any backfill jobs.
#BOINC_HOME = $(LOCAL_DIR)/boinc
BOINC_HOME = /usr/local/BOINC

# Path to the boinc_client to use, and required universe setting
BOINC_Executable = $(BOINC_HOME)/run_client
BOINC_Universe = vanilla

# What initial working directory should BOINC use?

BOINC_InitialDir = $(BOINC_HOME)

# Save STDOUT and STDERR
BOINC_Output = $(BOINC_HOME)/boinc.out
BOINC_Error = $(BOINC_HOME)/boinc.err

# Specify the user that the boinc_client should run as:
BOINC_Owner = boinc
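
The comments in the config above name three prerequisites: the BOINC_HOME directory must already exist, BOINC_Executable must point at something runnable, and BOINC_Owner must be a real account. A minimal sketch of a pre-flight check for those three points follows. The helper name check_backfill_setup is hypothetical and is not part of Condor or BOINC; the paths and user name in the example call are simply the values from this config.

```python
import os
import pwd


def check_backfill_setup(boinc_home, executable, owner):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper, not a Condor tool: it only checks what the
    config comments above say must be true before backfill can run.
    """
    problems = []
    # BOINC_HOME must be created manually before any backfill job runs.
    if not os.path.isdir(boinc_home):
        problems.append("BOINC_HOME %s is not a directory" % boinc_home)
    # BOINC_Executable must be a regular file the current user may execute.
    if not (os.path.isfile(executable) and os.access(executable, os.X_OK)):
        problems.append("%s is not an executable file" % executable)
    # BOINC_Owner must be a known account on this machine.
    try:
        pwd.getpwnam(owner)
    except KeyError:
        problems.append("user %s does not exist" % owner)
    return problems


# Values taken from the condor_config.local shown above.
for problem in check_backfill_setup(
    "/usr/local/BOINC", "/usr/local/BOINC/run_client", "boinc"
):
    print(problem)
```

Note that os.access tests the permissions of whoever runs the script, so the check is most meaningful when run as the BOINC_Owner account. It also cannot tell whether the kernel will accept the file format of the executable.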

StartLog

3/28 14:27:20 State change: EVICT_BACKFILL is TRUE
3/28 14:27:25 State change: EVICT_BACKFILL is TRUE
3/28 14:27:43 ******************************************************
3/28 14:27:43 ** condor_startd (CONDOR_STARTD) STARTING UP
3/28 14:27:43 ** /opt/condor-6.7.17/sbin/condor_startd
3/28 14:27:43 ** $CondorVersion: 6.7.17 Feb 18 2006 $
3/28 14:27:43 ** $CondorPlatform: I386-LINUX_RH9 $
3/28 14:27:43 ** PID = 3743
3/28 14:27:43 ******************************************************
3/28 14:27:43 Using config file: /home/prof/condor/condor_config
3/28 14:27:43 Using local config files: /home/prof/condor/hosts/strauss/condor_config.local
3/28 14:27:43 DaemonCore: Command Socket at <192.168.45.110:35272>
3/28 14:27:45 New machine resource allocated
3/28 14:27:45 Instantiating a BOINC_BackfillMgr
3/28 14:27:45 Created a BOINC Backfill Manager
3/28 14:27:45 About to run initial benchmarks.
3/28 14:27:49 Completed initial benchmarks.
3/28 15:27:43 DaemonCore: Command received via TCP from host <192.168.45.110:36157>
3/28 15:27:43 DaemonCore: received command 448 (GIVE_STATE), calling handler (command_give_state)
3/28 17:52:50 State change: IS_OWNER is false
3/28 17:52:50 Changing state: Owner -> Unclaimed
3/28 18:02:50 State change: START_BACKFILL is TRUE
3/28 18:02:50 Changing state: Unclaimed -> Backfill
3/28 18:02:55 State change: BOINC client running for vm1
3/28 18:02:55 Changing activity: Idle -> Busy
3/28 18:02:56 BOINC client (pid 7858) exited with status 4
3/28 18:02:56 State change: Backfill starter exited
3/28 18:02:56 Changing activity: Busy -> Idle
3/28 18:03:00 State change: BOINC client running for vm1
3/28 18:03:00 Changing activity: Idle -> Busy
3/28 18:03:00 BOINC client (pid 7862) exited with status 4

(and so on, every 5 seconds)
3/28 18:31:00 Changing activity: Busy -> Idle
3/28 18:31:05 State change: BOINC client running for vm1
3/28 18:31:05 Changing activity: Idle -> Busy
3/28 18:31:06 BOINC client (pid 9227) exited with status 4
3/28 18:31:06 State change: Backfill starter exited
3/28 18:31:06 Changing activity: Busy -> Idle
3/28 18:31:10 State change: BOINC client running for vm1
3/28 18:31:10 Changing activity: Idle -> Busy
3/28 18:31:10 BOINC client (pid 9231) exited with status 4
3/28 18:31:10 State change: Backfill starter exited
3/28 18:31:10 Changing activity: Busy -> Idle
3/28 18:31:15 State change: EVICT_BACKFILL is TRUE
3/28 18:31:15 Changing activity: Idle -> Killing
3/28 18:31:20 State change: EVICT_BACKFILL is TRUE
3/28 18:31:25 State change: EVICT_BACKFILL is TRUE
(and so on, every 5 seconds)
3/28 19:05:55 State change: EVICT_BACKFILL is TRUE

StarterLog.boinc

3/28 18:02:56 ******************************************************
3/28 18:02:56 ** condor_starter (CONDOR_STARTER) STARTING UP
3/28 18:02:56 ** /opt/condor-6.7.17/sbin/condor_starter
3/28 18:02:56 ** $CondorVersion: 6.7.17 Feb 18 2006 $
3/28 18:02:56 ** $CondorPlatform: I386-LINUX_RH9 $
3/28 18:02:56 ** PID = 7858
3/28 18:02:56 ******************************************************
3/28 18:02:56 Using config file: /home/prof/condor/condor_config
3/28 18:02:56 Using local config files: /home/prof/condor/hosts/strauss/condor_config.local
3/28 18:02:56 DaemonCore: Command Socket at <192.168.45.110:38524>
3/28 18:02:56 Done setting resource limits
3/28 18:02:56 Starter running a local job with no shadow
3/28 18:02:56 Getting job ClassAd from config file with keyword: "boinc"
3/28 18:02:56 "boinc_proc" not found in config file
3/28 18:02:56 Starting a VANILLA universe job with ID: 1.0
3/28 18:02:56 IWD: /usr/local/BOINC
3/28 18:02:56 Output file: /usr/local/BOINC/boinc.out
3/28 18:02:56 Error file: /usr/local/BOINC/boinc.err
3/28 18:02:56 About to exec /usr/local/BOINC/run_client condor_exec.exe
3/28 18:02:56 Create_Process: child failed with errno 8 (Exec format error) before exec()
3/28 18:02:56 ERROR "Create_Process(/usr/local/BOINC/run_client,condor_exec.exe, ...) failed" at line 373 in file os_proc.C
3/28 18:02:56 ShutdownFast all jobs.
3/28 18:03:00 ******************************************************
.
.
.
3/28 18:25:20 ******************************************************
3/28 18:25:20 ** condor_starter (CONDOR_STARTER) STARTING UP
3/28 18:25:20 ** /opt/condor-6.7.17/sbin/condor_starter
3/28 18:25:20 ** $CondorVersion: 6.7.17 Feb 18 2006 $
3/28 18:25:20 ** $CondorPlatform: I386-LINUX_RH9 $
3/28 18:25:20 ** PID = 8946
3/28 18:25:20 ******************************************************
3/28 18:25:20 Using config file: /home/prof/condor/condor_config
3/28 18:25:20 Using local config files: /home/prof/condor/hosts/strauss/condor_config.local
3/28 18:25:20 DaemonCore: Command Socket at <192.168.45.110:38808>
3/28 18:25:20 Done setting resource limits
3/28 18:25:20 Starter running a local job with no shadow
3/28 18:25:20 Getting job ClassAd from config file with keyword: "boinc"
3/28 18:25:20 "boinc_proc" not found in config file
3/28 18:25:20 Starting a VANILLA universe job with ID: 1.0
3/28 18:25:20 IWD: /usr/local/BOINC
3/28 18:25:20 Output file: /usr/local/BOINC/boinc.out
3/28 18:25:20 Error file: /usr/local/BOINC/boinc.err
3/28 18:25:20 About to exec /usr/local/BOINC/run_client condor_exec.exe
3/28 18:25:20 Create_Process: child failed with errno 8 (Exec format error) before exec()
3/28 18:25:20 ERROR "Create_Process(/usr/local/BOINC/run_client,condor_exec.exe, ...) failed" at line 373 in file os_proc.C
3/28 18:25:20 ShutdownFast all jobs.
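
The errno 8 (Exec format error) in the log above is ENOEXEC, which the kernel returns when it does not recognise the format of the file being executed. One possible cause, which is only an assumption here because the contents of run_client are not shown, is a wrapper script without a "#!" interpreter line. Such a file runs fine from an interactive shell, because the shell falls back to interpreting it itself, but fails when a program executes it directly. The sketch below reproduces that difference on Linux with a throwaway script; the file name run_client_demo and the helper try_exec are made up for the demonstration.

```python
import errno
import os
import stat
import subprocess
import tempfile


def try_exec(script_text):
    """Write script_text to an executable temp file and exec it directly.

    No shell is involved, so this is close to what a plain exec() does.
    Returns 0 if the script ran, or the errno of the failed exec.
    """
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "run_client_demo")  # hypothetical name
    with open(path, "w") as f:
        f.write(script_text)
    os.chmod(path, stat.S_IRWXU)  # rwx for the owner only
    try:
        subprocess.run([path], check=True, stdout=subprocess.DEVNULL)
        return 0
    except OSError as e:
        return e.errno
    finally:
        os.remove(path)
        os.rmdir(workdir)


# Same script body, without and with an interpreter line.
no_shebang = try_exec("echo hello\n")
with_shebang = try_exec("#!/bin/sh\necho hello\n")

print("without #! line:", errno.errorcode.get(no_shebang, no_shebang))
print("with #! line:   ", with_shebang)
```

If run_client does turn out to be a script, checking that its first line is an interpreter line such as #!/bin/sh would be a cheap first test. If it is a binary, the same error would instead point at a mismatch between the binary's architecture or format and this machine.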


condor_status

strauss.labom LINUX INTEL Backfill Killing 0.990 250 0+00:31:39

chromalgema.labomath.univ-orleans.fr>condor_status -l strauss
MyType = "Machine"
TargetType = "Job"
Name = "strauss.labomath.univ-orleans.fr"
Machine = "strauss.labomath.univ-orleans.fr"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
COLLECTOR_HOST_STRING = "chromalgema.labomath.univ-orleans.fr"
CondorVersion = "$CondorVersion: 6.7.17 Feb 18 2006 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_RH9 $"
VirtualMachineID = 1
VirtualMemory = 628640
Disk = 219563936
CondorLoadAvg = 0.000000
LoadAvg = 0.990000
KeyboardIdle = 1
ConsoleIdle = 6601
Memory = 250
Cpus = 1
StartdIpAddr = "<192.168.45.110:35272>"
Arch = "INTEL"
OpSys = "LINUX"
UidDomain = "labomath.univ-orleans.fr"
FileSystemDomain = "labomath.univ-orleans.fr"
Subnet = "192.168.45"
HasIOProxy = TRUE
CheckpointPlatform = "LINUX INTEL 2.4.x normal"
TotalVirtualMemory = 628640
TotalDisk = 219563936
TotalCpus = 1
TotalMemory = 250
KFlops = 516291
Mips = 1674
LastBenchmark = 1143548869
TotalLoadAvg = 0.990000
TotalCondorLoadAvg = 0.000000
ClockMin = 1162
ClockDay = 2
TotalVirtualMachines = 1
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
HasPVM = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasPVM,HasRemoteSyscalls,HasCheckpointing"
CpuBusyTime = 2874
CpuIsBusy = TRUE
TimeToLive = 2147483647
State = "Backfill"
EnteredCurrentState = 1143561770
Activity = "Killing"
EnteredCurrentActivity = 1143563475
Start = ((KeyboardIdle > 15 * 60) && (((LoadAvg - CondorLoadAvg) <= 0.300000) || (State != "Unclaimed" && State != "Owner")))
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) || ((MY.CheckpointPlatform =!= UNDEFINED) && ((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) || (TARGET.NumCkpts == 0))))
MaxJobRetirementTime = 0
CurrentRank = 0.000000
MonitorSelfTime = 1143566389
MonitorSelfCPUUsage = 0.000000
MonitorSelfImageSize = 7588.000000
MonitorSelfResidentSetSize = 3528
MonitorSelfAge = 0
DaemonStartTime = 1143548864
UpdateSequenceNumber = 393
MyAddress = "<192.168.45.110:35272>"
LastHeardFrom = 1143566574
UpdatesTotal = 978
UpdatesSequenced = 977
UpdatesLost = 0
UpdatesHistory = "0x00000000000000000000000000000000"





On Fri, 24 Feb 2006, Derek Wright wrote:


On Wed, 21 Dec 2005 12:55:32 -0600  Derek Wright wrote:

as luck would have it, i'm currently working on adding code to the
condor_startd to be able to automatically spawn a BOINC client
whenever a condor VM is in Unclaimed/Idle.

this code is now available in Condor version 6.7.17.  any and all
Condor and BOINC users out there are urged to upgrade.  please send
feedback to condor-admin ASAP, since the 6.8.0 code freeze is only a
few weeks away, and i'd like to get any BOINC-related improvements
made and committed before then.  thanks!

-derek

p.s. there's a new section in the manual all about Condor + backfill
jobs using BOINC:

http://www.cs.wisc.edu/condor/manual/v6.7/3_13Setting_Up.html#SECTION004138000000000000000

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users


--
Emmanuel Le Guirriec
Ingenieur de Recherche Calcul Scientifique CNRS
UMR6628-MAPMO
Federation Denis Poisson
Universite d'Orleans
BP 6759
45067 Orleans Cedex 2
tel	02.38.49.46.69 / 48.50