
[Condor-users] condor not able to create temp file inside dir_#### directory - passwd_cache: setgroups() failed



Hi,

The jobs go into the hold state with a VMGAHP_ERR_INTERNAL or "Permission denied" error. The execute machine is a newly provisioned machine running CentOS. This looks like a user permission issue, but I don't know where to look. Any help would be appreciated.

I am using the idealgrid user to submit and start jobs.
$ id idealgrid
uid=49527(idealgrid) gid=49527(idealgrid) groups=49527(idealgrid),2(daemon)

At creation time, the permissions of execute/dir_22264 are:
drwxr-xr-x 1 idealgrid idealgrid 80 Apr  8  2010 dir_20935

The input files do get copied into the dir_#### directory.

For VM Job
In VMGAHPLog:

4/7 20:16:57 VMGAHP[22273]: condor_mkstemp(/opt/condor-7.2.0/local.ch2bl/execute/dir_22264/vmcCKuhT) returned -1, 'Permission denied' (errno 13) in VMType::createTempFile()
4/7 20:16:57 VMGAHP[22273]: Inside VMwareType::Shutdown

In StarterLog


4/7 20:17:00 VMGAHP[22273] -> '2' '1' 'VMGAHP_ERR_INTERNAL'
4/7 20:17:00 Failed to execute command('CONDOR_VM_START'), vmgahp error string('VMGAHP_ERR_INTERNAL')
4/7 20:17:00 VMGAHP_ERR_INTERNAL
4/7 20:17:01 Inside VM_GAHP_SERVER::cleanup()
4/7 20:17:01 VMGAHP[22273] <- 'QUIT'
4/7 20:17:02 VMGAHP[22273] -> 'S'
4/7 20:17:02 End of VM_GAHP_SERVER::cleanup
4/7 20:17:03 ProcAPI::buildFamily() Found daddypid on the system: 22273
4/7 20:17:03 Inside VMProc::cleanup()
4/7 20:17:03 Failed to start job, exiting
4/7 20:17:03 ShutdownFast all jobs.
4/7 20:17:03 Got ShutdownFast when no jobs running.
4/7 20:17:03 Create_Process: using fast clone() to create child process.
4/7 20:17:03 HOOK_JOB_EXIT (/opt/condor-7.2.0/sbin/ccp_hook_job_exit.pl) invoked with reason: "evict"
4/7 20:17:03 Removing /opt/condor-7.2.0/local.ch2bl1/execute/dir_22264
4/7 20:17:03 Attempting to remove /opt/condor-7.2.0/local.ch2bl1/execute/dir_22264 as SuperUser (root)
4/7 20:17:03 Removing "/opt/condor-7.2.0/local.ch2bl1/execute/dir_22264" as SuperUser (root) failed: /bin/rm exited with status 1
4/7 20:17:03 Removing /opt/condor-7.2.0/local.ch2bl1/execute/dir_22264 as PRIV_CONDOR failed, trying again as file owner
4/7 20:17:03 passwd_cache: setgroups( idealgrid ) failed.
4/7 20:17:03 set_owner_egid - ERROR: initgroups(idealgrid, 49527) failed, errno: Operation not permitted
4/7 20:17:03 Attempting to remove /opt/condor-7.2.0/local.ch2bl1/execute/dir_22264 as file owner 'idealgrid' (49527.49527)
4/7 20:17:03 Removing "/opt/condor-7.2.0/local.ch2bl1/execute/dir_22264" as file owner 'idealgrid' (49527.49527) failed: /bin/rm exited with status 1
4/7 20:17:03 WARNING: /opt/condor-7.2.0/local.ch2bl1/execute/dir_22264 still exists after trying to remove it as the owner
4/7 20:17:03 Attempting to chmod(0700) /opt/condor-7.2.0/local.ch2bl1/execute/dir_22264 and all subdirs
4/7 20:17:03 passwd_cache: setgroups( idealgrid ) failed.
4/7 20:17:03 set_owner_egid - ERROR: initgroups(idealgrid, 49527) failed, errno: Operation not permitted
4/7 20:17:03 Attempting to chmod /opt/condor-7.2.0/local.ch2bl1/execute/dir_22264 as file owner 'idealgrid' (49527.49527)
4/7 20:17:03 chmod(/opt/condor-7.2.0/local.ch2bl1/execute/dir_22264) failed: Operation not permitted (errno 1)
4/7 20:17:03 Failed to chmod(0700) /opt/condor-7.2.0/local.ch2bl1/execute/dir_22264 and all subdirs
4/7 20:17:03 Can't remove "/opt/condor-7.2.0/local.ch2bl1/execute/dir_22264" as directory owner, giving up!

For Vanilla Job In StarterLog

4/7 20:25:21 Initialized user_priv as "idealgrid"
4/7 20:25:21 Done moving to directory "/opt/condor-7.2.0/local.ch2bl1/execute/dir_23950"
4/7 20:25:22 in OsProc::StartJob()
4/7 20:25:22 IWD: /opt/condor-7.2.0/local.ch2bl1/execute/dir_23950
4/7 20:25:22 passwd_cache: setgroups( idealgrid ) failed.
4/7 20:25:22 set_user_egid - ERROR: initgroups(idealgrid, 49527) failed, errno: Operation not permitted

4/7 20:25:22 About to exec /opt/condor-7.2.0/local.ch2bl1/execute/dir_23950/condor_exec.exe
4/7 20:25:22 Env = _CONDOR_SLOT=1 _CONDOR_LOWPORT=9600 _CONDOR_SCRATCH_DIR=/opt/condor-7.2.0/local.ch2bl1/execute/dir_23950 _CONDOR_HIGHPORT=10295
4/7 20:25:22 passwd_cache: setgroups( idealgrid ) failed.
4/7 20:25:22 set_user_egid - ERROR: initgroups(idealgrid, 49527) failed, errno: Operation not permitted
4/7 20:25:22 Create_Process: Cannot access specified executable "/opt/condor-7.2.0/local.ch2bl1/execute/dir_23950/condor_exec.exe": errno = 13 (Permission denied)
4/7 20:25:22 ERROR "Create_Process(/opt/condor-7.2.0/local.ch2bl1/execute/dir_23950/condor_exec.exe,, ...) failed" at line 516 in file os_proc.cpp
4/7 20:25:22 ShutdownFast all jobs.
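
Both logs show setgroups()/initgroups() failing with "Operation not permitted". On Linux that errno generally means the calling process lacks the privilege to change its supplementary groups (root, or the CAP_SETGID capability). The following is only a rough sketch, not part of Condor, for checking whether a given account is allowed to make that call on the execute machine. The function name can_set_groups is made up for illustration, and the script must be run under the same account that runs the Condor daemons for the result to mean anything.

```python
import os

def can_set_groups() -> bool:
    """Return True if this process may call setgroups(), False on EPERM.

    Hypothetical helper for illustration; not a Condor API.
    """
    try:
        # Re-apply the groups we already have: harmless when permitted,
        # raises PermissionError (EPERM) when the process is unprivileged.
        os.setgroups(os.getgroups())
        return True
    except PermissionError:
        return False

if __name__ == "__main__":
    print("effective uid:", os.geteuid())
    print("setgroups permitted:", can_set_groups())
```

If this reports False for the account running the daemons, the setgroups() failures in the StarterLog would be expected; whether that is the actual cause here is an assumption that still needs to be confirmed.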


Thanks
Johnson

