
[Condor-users] problem with task_port_pid() on Mac OS X



Hello,
 
I've recently set up a cluster of almost 50 computers running Linux and Mac OS X 10.4. Everything runs more or less smoothly, except on the Macs: jobs do execute, but the StartLog and StarterLog files fill up with the following errors:
 
5/12 10:33:38 ProcAPI: task_port_pid() on pid 17907 failed with 5((os/kern) failure)
5/12 10:33:38 ProcAPI: task_port_pid() on pid 17906 failed with 5((os/kern) failure)
5/12 10:33:38 ProcAPI: task_port_pid() on pid 17905 failed with 5((os/kern) failure)
5/12 10:33:38 ProcAPI: task_port_pid() on pid 17901 failed with 5((os/kern) failure)
5/12 10:33:38 ProcAPI: task_port_pid() on pid 17899 failed with 5((os/kern) failure)
5/12 10:33:38 ProcAPI: task_port_pid() on pid 14900 failed with 5((os/kern) failure)
5/12 10:33:38 ProcAPI: task_port_pid() on pid 11147 failed with 5((os/kern) failure)
5/12 10:33:38 ProcAPI: task_port_pid() on pid 23969 failed with 5((os/kern) failure)
5/12 10:33:38 ProcAPI: task_port_pid() on pid 23968 failed with 5((os/kern) failure)

and it goes on for hundreds and hundreds of lines.
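
To get an overview of that many lines, the failures can be tallied per pid and per error code. This is only a rough sketch in Python: the regular expression is inferred from the sample lines above and may not match other Condor versions, and the name summarize is my own, not part of Condor.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines above (an assumption,
# not a documented Condor log format).
LINE_RE = re.compile(
    r"ProcAPI: task_port_pid\(\) on pid (\d+) failed with (\d+)\("
)

def summarize(lines):
    """Return (failure count per pid, failure count per error code)."""
    by_pid = Counter()
    by_code = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            by_pid[int(m.group(1))] += 1
            by_code[int(m.group(2))] += 1
    return by_pid, by_code

# Three of the lines quoted above, used as sample input.
sample = [
    "5/12 10:33:38 ProcAPI: task_port_pid() on pid 17907 failed with 5((os/kern) failure)",
    "5/12 10:33:38 ProcAPI: task_port_pid() on pid 17906 failed with 5((os/kern) failure)",
    "5/12 10:33:38 ProcAPI: task_port_pid() on pid 14900 failed with 5((os/kern) failure)",
]

by_pid, by_code = summarize(sample)
print(len(by_pid), "distinct pids; counts per error code:", dict(by_code))
```

To run it on a real log, pass an open file object to summarize() instead of the sample list; lines that do not match the pattern are simply skipped.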
 
In addition, Condor seems to keep itself from running on multiple CPUs on the same machine. For example, on recent Macs with 4 VMs (2 dual-core CPUs), only 2 VMs actually execute jobs. The other two are "Claimed" but spend most of their time "Suspended" because of a load average near 1.0, even though nothing except Condor is running on the machine.
This could be related to the task_port_pid() problem, but I'm not sure. I'm using Condor 6.7.18 (unstable series).
 
I really don't know what to look for to solve these problems. Do you have any ideas?
Thank you in advance,
 
Fabrice.