
Re: [Condor-users] Condor-G - job submission problem - updated



Jaime,

Thanks for your email.

/tmp is good for testing, but can be dangerous for production.

I agree. My next question is:

If I end up moving to a production system and need to change a directory's permissions so that any user can write to it, which directory, in your opinion, is best to use:

1. $GLOBUS_LOCATION/tmp
2. $GLOBUS_LOCATION/etc/gram-service-* (where jndi-config.xml is located)
3. other?
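For reference, the usual Unix pattern for a directory that every user must be able to write to is the one /tmp itself uses: mode 1777 (world-writable plus the sticky bit, so users cannot delete each other's files). Below is a minimal Python sketch of setting that mode, assuming a hypothetical path; whether GRAM should be pointed at such a directory at all is exactly the open question above.

```python
import os
import stat
import tempfile

def make_shared_dir(path: str) -> int:
    """Create `path` (if needed) as a world-writable directory with the
    sticky bit set (mode 1777, the same mode /tmp normally has).
    Returns the resulting permission bits."""
    os.makedirs(path, exist_ok=True)
    # os.makedirs() is filtered by the process umask, so set the mode explicitly.
    os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO | stat.S_ISVTX)
    return stat.S_IMODE(os.stat(path).st_mode)

if __name__ == "__main__":
    # Hypothetical location for illustration only; a production setup would
    # use whichever directory the GRAM service is actually configured with.
    shared = os.path.join(tempfile.mkdtemp(), "shared_scratch")
    print(oct(make_shared_dir(shared)))  # 0o1777
```

The sticky bit is what keeps a world-writable directory from letting one user remove another user's job files.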

The .globus/scratch directory is not created under the /tmp directory.
It won't be. The directory that should be created will be named something like job_3d2f6ea0-167e-11da-92fd-cb402071b328.

I didn't realize that job submission takes quite some time, because the job was initially in the HOLD state for a while (I don't know how long or why). I waited about 5 minutes after submitting the job before writing my last email to you, and no new directory had appeared by then. But now, after 10-15 minutes, it exists under /tmp.


I cannot find any directory starting with 'job_xxxx' under /tmp, but I now see a directory called condor_g_scratch.0x86211b0.30821 that comes and goes. I don't know whether this is the directory you meant. It is hard to catch, because it appears under /tmp and disappears a second later. However, I was able to get its permissions:

---------------------------------------------------------------------------------------------------------------------------
drwx------ 3 myuser myuser 4096 Aug 29 12:42 condor_g_scratch.0x86211b0.30821
---------------------------------------------------------------------------------------------------------------------------
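Since the directory only exists for about a second, polling is one way to catch it. Below is a minimal Python sketch, assuming the two name patterns mentioned in this thread (job_<UUID> and condor_g_scratch.0x<hex>.<pid>); `snapshot` and `watch` are hypothetical helper names, not part of Condor or Globus.

```python
import os
import re
import stat
import time

# Name patterns taken from this thread: GRAM job directories (job_<UUID>)
# and Condor-G's condor_g_scratch.0x<hex>.<pid> directories.
PATTERN = re.compile(
    r"^(job_[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"|condor_g_scratch\.0x[0-9a-f]+\.\d+)$"
)

def snapshot(root: str) -> dict:
    """Return {name: (ls-style mode string, owner uid)} for matching entries."""
    found = {}
    try:
        names = os.listdir(root)
    except OSError:  # root missing or not readable by us
        return found
    for name in names:
        if not PATTERN.match(name):
            continue
        try:
            st = os.lstat(os.path.join(root, name))
        except OSError:
            continue  # removed between listdir() and lstat()
        found[name] = (stat.filemode(st.st_mode), st.st_uid)
    return found

def watch(roots, seconds=60.0, interval=0.1):
    """Poll `roots`, printing each matching entry the first time it is seen."""
    seen = set()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        for root in roots:
            for name, (mode, uid) in snapshot(root).items():
                if (root, name) not in seen:
                    seen.add((root, name))
                    print(time.strftime("%H:%M:%S"), mode, uid, os.path.join(root, name))
        time.sleep(interval)
```

For example, `watch(["/tmp", "/tmp/.globus/scratch"], seconds=900)` started just before condor_submit would log each directory, its mode and its owner's uid as it appears.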


4. Globus-container site did not display anything (no error, nothing).

Again, sorry for sending my last email prematurely. The container has now started to print error messages (after about 10-15 minutes):


===========================================================
Unable to open file /tmp/.globus/scratch/job_e61b6440-18ab-11da-a0ee-803cf7057248///.ignoreme
Unable to open file /tmp/.globus/scratch/job_e61b6440-18ab-11da-a0ee-803cf7057248/hostname
===========================================================


I don't know whether this is the reason the container displayed these error messages. The container is owned by the 'globus' user, and it is this user that tries to open the files. However, the permissions only allow the user who submitted the job (e.g. myuser) to read and write in the directory condor_g_scratch.0x86211b0.30821.
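To make the permission point concrete, here is a small Python sketch that decodes an `ls -l` mode string such as drwx------ into owner/group/other rights. The helper name `who_can` is hypothetical. It assumes the 'globus' account is neither the owner nor in the myuser group, so it falls in the "other" class (root would bypass these checks entirely).

```python
def who_can(mode_str: str) -> dict:
    """Decode an `ls -l` permission string such as 'drwx------' into the
    read/write/execute rights of the owner, group and other classes."""
    if len(mode_str) != 10:
        raise ValueError("expected 10 characters, e.g. 'drwx------'")
    rights = {}
    for i, cls in enumerate(("owner", "group", "other")):
        triad = mode_str[1 + 3 * i : 4 + 3 * i]
        rights[cls] = {
            "read": triad[0] == "r",
            "write": triad[1] == "w",
            # lowercase 'x', 's' and 't' imply execute; 'S' and 'T' do not
            "execute": triad[2] in "xst",
        }
    return rights

if __name__ == "__main__":
    print(who_can("drwx------")["other"])
    # {'read': False, 'write': False, 'execute': False}
```

For a directory, another user needs execute (search) permission to open anything inside it, and write plus execute to create files there; drwx------ grants neither to "other".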

How can this problem be resolved?

========================
GridmanagerLog file --> still get SIGTERM
========================

8/29 12:42:23 ******************************************************
8/29 12:42:23 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
8/29 12:42:23 ** /usr/local/condor/sbin/condor_gridmanager
8/29 12:42:23 ** $CondorVersion: 6.7.8 Jun 9 2005 $
8/29 12:42:23 ** $CondorPlatform: I386-LINUX_RH9 $
8/29 12:42:23 ** PID = 6493
8/29 12:42:23 ******************************************************
8/29 12:42:23 Using config file: /usr/local/condor/etc/condor_config
8/29 12:42:23 Using local config files: /usr/local/condor/local.ucf-6/condor_config.local
8/29 12:42:23 DaemonCore: Command Socket at <148.100.51.26:44178>
8/29 12:42:26 [6493] DaemonCore: Command received via UDP from host <148.100.51.26:36799>
8/29 12:42:26 [6493] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
8/29 12:42:26 [6493] Found job 89.0 --- inserting
8/29 12:42:26 [6493] (89.0) doEvaluateState called: gmState GM_INIT, globusState 32
8/29 12:42:26 [6493] GAHP server pid = 6495
8/29 12:42:29 [6493] resource https://UCF-6.linuxclass.marist.edu:8443 is now up
8/29 12:42:29 [6493] (89.0) doEvaluateState called: gmState GM_DELEGATE_PROXY, globusState 32
8/29 12:42:33 [6493] (89.0) doEvaluateState called: gmState GM_DELEGATE_PROXY, globusState 32
8/29 12:42:33 [6493] (89.0) doEvaluateState called: gmState GM_GENERATE_ID, globusState 32
8/29 12:42:33 [6493] (89.0) doEvaluateState called: gmState GM_SUBMIT_ID_SAVE, globusState 32
8/29 12:42:38 [6493] (89.0) doEvaluateState called: gmState GM_SUBMIT, globusState 32
8/29 12:42:38 [6493] (89.0) doEvaluateState called: gmState GM_SUBMIT_SAVE, globusState 32
8/29 12:42:38 [6493] (89.0) doEvaluateState called: gmState GM_SUBMIT_COMMIT, globusState 32
8/29 12:42:40 [6493] (89.0) gram callback: state StageIn, fault (null), exit code 0
8/29 12:42:40 [6493] (89.0) doEvaluateState called: gmState GM_SUBMITTED, globusState 32
8/29 12:42:43 [6493] (89.0) doEvaluateState called: gmState GM_SUBMITTED, globusState 64
8/29 12:44:18 [6493] (89.0) gram callback: state Failed, fault Staging error for RSL element fileStageIn., exit code 0
8/29 12:44:18 [6493] (89.0) doEvaluateState called: gmState GM_SUBMITTED, globusState 64
8/29 12:44:18 [6493] (89.0) doEvaluateState called: gmState GM_FAILED, globusState 4
8/29 12:44:18 [6493] (89.0) doEvaluateState called: gmState GM_FAILED, globusState 4
8/29 12:44:18 [6493] (89.0) doEvaluateState called: gmState GM_FAILED, globusState 4
8/29 12:44:23 [6493] No jobs left, shutting down
8/29 12:44:23 [6493] Got SIGTERM. Performing graceful shutdown.
8/29 12:44:23 [6493] **** condor_gridmanager (condor_GRIDMANAGER) EXITING WITH STATUS 0
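In the log above, the SIGTERM appears right after "No jobs left, shutting down", which in turn follows job 89.0 failing with a fileStageIn staging fault. A minimal Python sketch for pulling just the GRAM callbacks (state and fault) out of a GridmanagerLog, assuming the line format shown above; `gram_callbacks` is a hypothetical helper.

```python
import re

# Matches lines of the form shown in this log, e.g.
# 8/29 12:44:18 [6493] (89.0) gram callback: state Failed, fault ..., exit code 0
CALLBACK = re.compile(
    r"^(?P<time>\d+/\d+ \d\d:\d\d:\d\d) \[\d+\] \((?P<job>\d+\.\d+)\) "
    r"gram callback: state (?P<state>\w+), fault (?P<fault>.*), exit code (?P<code>-?\d+)$"
)

def gram_callbacks(lines):
    """Return (time, job id, GRAM state, fault or None) for each callback line."""
    events = []
    for line in lines:
        m = CALLBACK.match(line.rstrip("\n"))
        if m:
            fault = None if m["fault"] == "(null)" else m["fault"]
            events.append((m["time"], m["job"], m["state"], fault))
    return events
```

Run over the log above, this would yield a StageIn callback with no fault followed by a Failed callback carrying "Staging error for RSL element fileStageIn.".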


DW
