
Re: [HTCondor-users] Don't understand HoldReason msg



I ran condor_store_cred -c on the condor master machine (delta-mod, here) and made up a password; it didn't complain. User condor_pool is not an actual Windows user, right? Even so, the test command gives me this output:

condor_status -f "%s\t" Name -f "%s\n" ifThenElse(isUndefined(LocalCredd),\"UNDEF\",LocalCredd)

slot1@xxxxxxxxxxxxxxxxxxxxxxxxx delta-mod.water.ca.gov
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx delta-mod.water.ca.gov
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx delta-mod.water.ca.gov
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx UNDEF
slot1@xxxxxxxxxxxxxxxxxxxxxxxxx delta-mod.water.ca.gov

And sure enough, my job seems to run only on my machine (-002), -017, -018, and DELTA-MOD. Am I supposed to run condor_store_cred -c on every machine in the pool? That seems unlikely; what if someone had a pool of hundreds of machines? Is the condor_pool user thing fairly recent? I don't recall having to do this before. I'll study the manual on it.
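
For a larger pool, the slots reporting UNDEF can be picked out of that condor_status output with a few lines of Python. This is only a sketch: it assumes the output has been captured as text with two whitespace-separated columns (slot name, then the LocalCredd value or UNDEF), and the function name and sample host names are made up for illustration.

```python
def slots_without_credd(status_output):
    """Return the slot names whose second column is the literal UNDEF."""
    missing = []
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines or anything not in two-column form
        name, credd = fields
        if credd == "UNDEF":
            missing.append(name)
    return missing


# Hypothetical sample in the same two-column shape as the output above.
sample = (
    "slot1@host-a.example\tdelta-mod.water.ca.gov\n"
    "slot1@host-b.example\tUNDEF\n"
    "slot1@host-c.example\tUNDEF\n"
)
for slot in slots_without_credd(sample):
    print(slot)
```

The list it returns is the set of machines to check for a missing LocalCredd setting.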


On Fri, Jan 11, 2013 at 6:52 PM, O'Donnell, Michael <odonnellm@xxxxxxxx> wrote:
Yes, the shadow log only shows up on the submit machine. I would need to look through Condor's manual to refresh my memory on how permissions are switched when jobs are submitted and run. Is it possible that the condor pool password is not stored on the execute machine? Because your job does run when you comment out the log file, the problem might be related to permissions of the pool user, condor_pool@xxxxxxxxxxxx.

Run this command:
condor_status -f "%s\t" Name -f "%s\n" ifThenElse(isUndefined(LocalCredd),\"UNDEF\",LocalCredd)

If you see undefined for any of your machines, or more to the point for the machine causing the problem, then you need to add the credentials for the condor_pool user.

I tend to write all my logs to a shared file server and therefore I use UNC paths. This might be an easy test for you. Without looking at the UW Condor documentation on how the daemons change permissions during the job's execution, I can only speculate.

Do you have a D:\ drive on the local machine or are you trying to write the log to your submit machine? 

I can take a closer look on Monday if these suggestions do not help.
Mike

On Fri, Jan 11, 2013 at 4:40 PM, Ralph Finch <ralphmariafinch@xxxxxxxxx> wrote:
@Michael O'Donnell: No mapped drives, but I don't think that's the issue. I made a much simpler test case and it still didn't work. When I comment out the line in the submit file that creates the log file, things work. Curiously, the similar lines that create the output and error files don't cause trouble.

@Nathan Panike: The ShadowLog is only on the submitting machine, right? I couldn't find the line you described, but I did notice credential problems:

01/11/13 15:29:11 (35.5) (4612): getStoredCredential(): Could not locate credential for user 'condor_pool@xxxxxxxxxxxx'
01/11/13 15:29:11 (35.5) (4612): getStoredCredential(): Could not locate credential for user 'condor_pool@xxxxxxxxxxxx'
01/11/13 15:29:11 (35.5) (4612): SECMAN: required authentication with credd delta-mod.water.ca.gov failed, so aborting command command 81099.
01/11/13 15:29:11 (35.5) (4612): ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
01/11/13 15:29:11 (35.5) (4612): ERROR: Could not locate valid credential for user 'rfinch@WATER'
01/11/13 15:29:11 (35.5) (4612): init_user_ids() failed as user rfinch

I had run condor_credd and thought everything was good, but will revisit that issue.
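
To see at a glance which accounts the shadow could not find credentials for, the messages above can be filtered with a short Python sketch. It assumes only the two message wordings that appear in this excerpt; the function name is made up, and the lines are passed in as a list rather than read from any particular log path.

```python
import re

# Matches both wordings seen in the excerpt:
#   "Could not locate credential for user '...'"
#   "Could not locate valid credential for user '...'"
CRED_ERROR = re.compile(r"Could not locate (?:valid )?credential for user '([^']+)'")


def users_missing_credentials(log_lines):
    """Return each user named in a credential error, in first-seen order."""
    users = []
    for line in log_lines:
        match = CRED_ERROR.search(line)
        if match and match.group(1) not in users:
            users.append(match.group(1))
    return users


# Lines copied from the ShadowLog excerpt above.
excerpt = [
    "01/11/13 15:29:11 (35.5) (4612): getStoredCredential(): Could not locate credential for user 'condor_pool@xxxxxxxxxxxx'",
    "01/11/13 15:29:11 (35.5) (4612): ERROR: Could not locate valid credential for user 'rfinch@WATER'",
    "01/11/13 15:29:11 (35.5) (4612): init_user_ids() failed as user rfinch",
]
print(users_missing_credentials(excerpt))
```

Here both the pool account and my own account turn up, which suggests the pool password lookup fails first and the lookup for my user fails as a consequence.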

Ralph Finch
Calif. Dept. of Water Resources


On Fri, Jan 11, 2013 at 12:16 PM, Nathan Panike <nwp@xxxxxxxxxxx> wrote:
Hi Ralph:

You want to look in the ShadowLog for this information.  There should be
a line in the ShadowLog that says

"WriteUserLog::initialize: failed to open file <...>"

Can you find that? BTW, was this part of a DAGMan workflow?

Nathan Panike


On Fri, Jan 11, 2013 at 11:26:22AM -0800, Ralph Finch wrote:
> $CondorVersion: 7.9.1 Oct 15 2012 BuildID: 70216 $
> $CondorPlatform: x86_64_winnt_6.1 $
>
> I'm submitting jobs to the pool and they're all being Held, the reason
> given is:
>
> HoldReason = "Failed to initialize user log to
> d:\delta\models\Historical_v81_Beta_Release\201X-Calibration\PEST\MTZ_Boundary_EC\condor\dsm2-15-8.log
> or "
>
> I don't understand this because the log files are in fact created in the
> above directory, with size 0, which I think is normal when nothing is
> written.
>
> Ralph Finch
> Calif. Dept. of Water Resources
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


--
- - - - - - - - - - - - - - - - - - - - - - - - - -
Michael O'Donnell

GIS/IT Analyst
Phone: 970.226.9407
Fax: 970.226.9230
Email: odonnellm@xxxxxxxx

United States Geological Survey/BRD
Fort Collins Science Center
2150 Centre Ave., Bldg C
Fort Collins, CO 80526
