
[Condor-users] passwd_cache: setgroups( ) failed. (may be due to job hooks )



Hi,

On one of the machines in our pool, jobs are not able to start because of the errors below. The log follows:

3/12 21:51:42 passwd_cache: setgroups( idealgrid ) failed.
3/12 21:51:42 set_user_egid - ERROR: initgroups(idealgrid, 49527) failed, errno: Operation not permitted
3/12 21:51:42 Input file: /dev/null
3/12 21:51:42 Failed to open '/execute/pool2/dir_9088/controljob.out' as standard output: Permission denied (errno 13)
3/12 21:51:42 Failed to open '/execute/pool2/dir_9088/controljob.err' as standard error: Permission denied (errno 13)
3/12 21:51:42 Failed to open some/all of the std files...
3/12 21:51:42 Aborting OsProc::StartJob.
3/12 21:51:42 Failed to start job, exiting
3/12 21:51:42 ShutdownFast all jobs.

After some googling, I found a ticket describing a prepare hook leaving the execute directory in this state:
http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=248
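
To see what state an execute directory is actually in, its owner and permission bits can be inspected directly. The following is only a minimal sketch, not a Condor tool: the function names describe_dir and owner_can_write are made up for illustration, and it checks only the owner bits (it ignores group/other permissions, ACLs, and the failed initgroups call shown in the log).

```python
import os
import stat
import tempfile


def describe_dir(path):
    """Return owner uid, owner gid and permission bits of `path`."""
    st = os.stat(path)
    return {
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "is_dir": stat.S_ISDIR(st.st_mode),
    }


def owner_can_write(path, uid):
    """True if `uid` owns `path` and the owner write and search bits are set.

    Hypothetical helper: creating a file such as controljob.out inside a
    directory needs both write (w) and search (x) permission on it.
    """
    st = os.stat(path)
    needed = stat.S_IWUSR | stat.S_IXUSR
    return st.st_uid == uid and (st.st_mode & needed) == needed


if __name__ == "__main__":
    # Demo on a throwaway directory. On the execute node one would pass the
    # real sandbox path (e.g. /execute/pool2/dir_9088) and the job user's
    # uid (49527 in the log above) instead.
    with tempfile.TemporaryDirectory() as d:
        print(describe_dir(d))
        print(owner_can_write(d, os.getuid()))
```

If the directory turns out to be owned by a different uid than the job user, or lacks the owner write bit, that would be consistent with the "Permission denied (errno 13)" lines above.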

So I disabled the prepare hook. The job ran fine this time, but with the following errors in the log:

3/12 21:55:04 HOOK_JOB_EXIT (/opt/condor-7.2.3/sbin/ccp_hook_job_exit.pl) invoked with reason: "exit"
3/12 21:55:04 DC stdout pipe closed for pid 9169
3/12 21:55:04 DC stderr pipe closed for pid 9169
3/12 21:55:04 DaemonCore: No more children processes to reap.
3/12 21:55:04 ProcAPI::buildFamily failed: parent 9169 not found on system.
3/12 21:55:04 HookClient /opt/condor-7.2.3/sbin/ccp_hook_job_exit.pl (pid 9169) exited with status 0
3/12 21:55:04 passwd_cache: setgroups( idealgrid ) failed.
3/12 21:55:04 set_user_egid - ERROR: initgroups(idealgrid, 49527) failed, errno: Operation not permitted
..
3/12 21:55:04 Attempting to remove /execute/pool2/dir_9160 as SuperUser (root)
3/12 21:55:04 Removing " /execute/pool2/dir_9160" as SuperUser (root) failed: /bin/rm exited with status 1
3/12 21:55:04 Removing /execute/pool2/dir_9160 as PRIV_CONDOR failed, trying again as file owner
3/12 21:55:04 passwd_cache: setgroups( idealgrid ) failed.
3/12 21:55:04 set_owner_egid - ERROR: initgroups(idealgrid, 49527) failed, errno: Operation not permitted
3/12 21:55:04 Attempting to remove /execute/pool2/dir_9160 as file owner 'idealgrid' (49527.49527)
3/12 21:55:04 Removing " /execute/pool2/dir_9160" as file owner 'idealgrid' (49527.49527) failed: /bin/rm exited with status 1
3/12 21:55:04 WARNING: /execute/pool2/dir_9160 still exists after trying to remove it as the owner
3/12 21:55:04 Attempting to chmod(0700) /execute/pool2/dir_9160 and all subdirs
3/12 21:55:04 passwd_cache: setgroups( idealgrid ) failed.
3/12 21:55:04 set_owner_egid - ERROR: initgroups(idealgrid, 49527) failed, errno: Operation not permitted
3/12 21:55:04 Attempting to chmod /execute/pool2/dir_9160 as file owner 'idealgrid' (49527.49527)
3/12 21:55:04 chmod( /execute/pool2/dir_9160) failed: Operation not permitted (errno 1)
3/12 21:55:04 Failed to chmod(0700)  /execute/pool2/dir_9160 and all subdirs
3/12 21:55:04 Can't remove " /execute/pool2/dir_9160" as directory owner, giving up!
3/12 21:55:04 **** condor_starter (condor_STARTER) pid 9160 EXITING WITH STATUS 0

I am using condor-7.2.3.

by
Johnson

