
Re: [Condor-users] "Failed to initialize GAHP"



Hi,

$grid-proxy-info
subject  : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=guragain/CN=659508/CN=Samir Guragain/CN=proxy
issuer   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=guragain/CN=659508/CN=Samir Guragain
identity : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=guragain/CN=659508/CN=Samir Guragain
type     : full legacy globus proxy
strength : 1024 bits
path     : /tmp/x509up_u507
timeleft : 73:28:34  (3.0 days)
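
For anyone who wants to check the remaining proxy lifetime from a script, here is a minimal sketch that reads the `timeleft` line from output shaped like the above. The function name `timeleft_seconds` and the `sample` string are only illustrative, and the parsing assumes the `HH:MM:SS` layout shown here.

```python
import re


def timeleft_seconds(proxy_info: str) -> int:
    """Return the proxy lifetime in seconds from grid-proxy-info style output."""
    # Assumes a line of the form "timeleft : HH:MM:SS", as in the output above.
    match = re.search(r"^timeleft\s*:\s*(\d+):(\d{2}):(\d{2})", proxy_info, re.MULTILINE)
    if match is None:
        raise ValueError("no timeleft line found")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Two lines copied from the output above, used as sample input.
sample = "path     : /tmp/x509up_u507\ntimeleft : 73:28:34  (3.0 days)\n"
remaining = timeleft_seconds(sample)
print(remaining, "seconds left on the proxy")
```

A result of zero would mean the proxy has no lifetime left and needs to be renewed before submitting.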


$condor_q -l 151060.0

JobGridType = "gt2"

JobUniverse = 9

GridResource = "gt2 uscms1.fltech-grid3.fit.edu:2119/jobmanager-condor"
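
The three attributes above can also be pulled out of `condor_q -l` output mechanically. This is only a rough sketch: `parse_classad` is a made-up helper, not part of Condor, and it treats each `Name = value` line as a plain string without evaluating ClassAd expressions.

```python
def parse_classad(text: str) -> dict:
    """Collect 'Name = value' lines from condor_q -l style output into a dict."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip blank lines and anything without " = "
            attrs[name.strip()] = value.strip().strip('"')
    return attrs


# The excerpt quoted above, used as sample input.
excerpt = '''JobGridType = "gt2"

JobUniverse = 9

GridResource = "gt2 uscms1.fltech-grid3.fit.edu:2119/jobmanager-condor"
'''

job = parse_classad(excerpt)
# JobUniverse 9 is the grid universe, the one that uses a GAHP.
is_grid_job = job.get("JobUniverse") == "9"
print(is_grid_job, job.get("JobGridType"), job.get("GridResource"))
```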


I suppose you're right, and the user is submitting the jobs remotely but still getting mapped to his local account?

Here is the full output:


ClusterId = 151060

QDate = 1288039973

CompletionDate = 0

Owner = "sguragai"

RemoteWallClockTime = 0.000000

LocalUserCpu = 0.000000

LocalSysCpu = 0.000000

RemoteUserCpu = 0.000000

RemoteSysCpu = 0.000000

ExitStatus = 0

NumCkpts_RAW = 0

NumCkpts = 0

NumJobStarts = 0

NumRestarts = 0

CommittedTime = 0

TotalSuspensions = 0

LastSuspensionTime = 0

CumulativeSuspensionTime = 0

ExitBySignal = FALSE

CondorVersion = "$CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $"

CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"

RootDir = "/"

Iwd = "/home/sguragai/physics/After_Thesis/CMSSW_3_8_5/src/SUSYBSMAnalysis/Zprime2muAnalysis/test/FIT_2010B_Oct22/share/.condor_temp"

JobUniverse = 9

Cmd = "/home/sguragai/physics/After_Thesis/CMSSW_3_8_5/src/SUSYBSMAnalysis/Zprime2muAnalysis/test/FIT_2010B_Oct22/job/CMSSW.sh"

MinHosts = 1

MaxHosts = 1

CurrentHosts = 0

WantRemoteSyscalls = FALSE

WantCheckpoint = FALSE

RequestCpus = 1

JobPrio = 0

User = "sguragai@xxxxxxxxxxxxxxxxxxxxxxxxxxx"

NiceUser = FALSE

Environment = ""

JobNotification = 0

WantRemoteIO = TRUE

UserLog = "/home/sguragai/physics/After_Thesis/CMSSW_3_8_5/src/SUSYBSMAnalysis/Zprime2muAnalysis/test/FIT_2010B_Oct22/share/.condor_temp/CMSSW_1.log"

CoreSize = 0

KillSig = "SIGTERM"

Rank = 0.000000

In = "/dev/null"

TransferIn = FALSE

Out = "CMSSW_1.stdout"

StreamOut = FALSE

Err = "CMSSW_1.s

StreamErr = FALSE

BufferSize = 524288

BufferBlockSize = 32768

ShouldTransferFiles = "YES"

WhenToTransferOutput = "ON_EXIT"

TransferFiles = "ONEXIT"

TransferInput = "/home/sguragai/physics/After_Thesis/CMSSW_3_8_5/src/SUSYBSMAnalysis/Zprime2muAnalysis/test/FIT_2010B_Oct22/share/default.tgz,/home/sguragai/physics/After_Thesis/CMSSW_3_8_5/src/SUSYBSMAnalysis/Zprime2muAnalysis/test/FIT_2010B_Oct22/share/arguments.xml,/home/sguragai/physics/After_Thesis/CMSSW_3_8_5/src/SUSYBSMAnalysis/Zprime2muAnalysis/test/FIT_2010B_Oct22/job/CMSSW.sh"

TransferOutput = "out_files_1.tgz,crab_fjr_1.xml"

ImageSize_RAW = 20

ImageSize = 20

ExecutableSize_RAW = 20

ExecutableSize = 20

DiskUsage_RAW = 3737

DiskUsage = 3750

RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.000000))

RequestDisk = DiskUsage

Requirements = TRUE

PeriodicHold = FALSE

PeriodicRelease = FALSE

PeriodicRemove = FALSE

LeaveJobInQueue = FALSE

Args = "1 1"

JobGridType = "gt2"

GridResource = "gt2 uscms1.fltech-grid3.fit.edu:2119/jobmanager-condor"

GlobusResubmit = FALSE

WantClaiming = FALSE

GlobusRSL = "(maxWalltime=120)"

x509userproxysubject = "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=guragain/CN=659508/CN=Samir Guragain"

x509UserProxyVOName = "cms"

x509UserProxyFirstFQAN = "/cms/Role=NULL/Capability=NULL"

x509UserProxyFQAN = "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=guragain/CN=659508/CN=Samir Guragain,/cms/Role=NULL/Capability=NULL,/cms/dbs/Role=NULL/Capability=NULL,/cms/uscms/Role=NULL/Capability=NULL"

x509userproxy = "/tmp/x509up_u507"

BLTaskID = "guragain_FIT_2010B_Oct22_c2d67b"

GlobalJobId = "uscms1.fltech-grid3.fit.edu#151060.0#1288039973"

ProcId = 0

AutoClusterId = 23

AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,Requirements,NiceUser,ConcurrencyLimits"

DelegatedProxyExpiration = 1288361681

GridJobId = "gt2 uscms1.fltech-grid3.fit.edu:2119/jobmanager-condor https://uscms1.fltech-grid3.fit.edu:40935/21720/1288040029/"

NumGlobusSubmits = 1

GridJobStatus = "PENDING"

LastRemoteStatusUpdate = 1288361617

GlobusStatus = 0

RemoveReason = "via condor_rm (by user sguragai)"

LastHoldReason = "Globus error 131: the user proxy expired (job is still running)"

LastHoldReasonCode = 2

LastHoldReasonSubCode = 131

JobStatusOnRelease = 3

LastJobStatus = 3

JobStatus = 5

EnteredCurrentStatus = 1288368141

HoldReason = "Failed to initialize GAHP"

HoldReasonCode = 0

HoldReasonSubCode = 0

ReleaseReason = UNDEFINED

NumSystemHolds = 2

Managed = "Schedd"

CurrentStatusUnknown = TRUE

ServerTime = 1288795302
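
The time-valued attributes in this ClassAd are Unix epoch seconds, which makes them hard to compare by eye. Below is a small sketch that converts four of them to UTC and checks their order. The values are copied from the output above; the helper name `to_utc` and the choice of which attributes to compare are mine. It only shows that the proxy delegated to the job expired before the job entered its current state, which is consistent with the LastHoldReason above; it does not by itself explain the GAHP failure.

```python
from datetime import datetime, timezone

# Epoch values copied from the condor_q -l output above.
stamps = {
    "QDate": 1288039973,
    "DelegatedProxyExpiration": 1288361681,
    "EnteredCurrentStatus": 1288368141,
    "ServerTime": 1288795302,
}


def to_utc(epoch: int) -> str:
    """Format epoch seconds as a readable UTC timestamp."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


# Print the attributes in chronological order.
for name, epoch in sorted(stamps.items(), key=lambda item: item[1]):
    print(f"{name:26s} {to_utc(epoch)}")

# True when the delegated proxy ran out before the job entered its current status.
proxy_expired_before_hold = (
    stamps["DelegatedProxyExpiration"] < stamps["EnteredCurrentStatus"]
)
print("delegated proxy expired before current hold:", proxy_expired_before_hold)
```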



Thank you as always for all your help,

Xenia


On Mon, Nov 1, 2010 at 9:58 PM, Steven Timm <timm@xxxxxxxx> wrote:
You say that the user is running the job locally. Something
doesn't make sense since the GAHP is only used for condor grid universe
gt2 type jobs.  What's the output of condor_q -l for
the job in question (particularly the universe, the type,
and the GridResource)?  Also, is there anything in GridmanagerLog.<username>? If there were errors launching the GAHP,
they would be in there.

Steve Timm



On Mon, 1 Nov 2010, Xenia Fave wrote:

Hi Steve,

I checked and the user has appropriate and up to date grid certificates.
Also, our CRL certificates are all valid. He is running the jobs locally,
not submitting them to a remote site. Is there another way his certificates
could be causing the problem?
Thank you again,
Xenia



==================================
Xenia Fave
System Administrator - FLTECH/T3_US_FIT
Experimental High Energy Physics
Florida Institute of Technology
==================================




On Mon, Nov 1, 2010 at 3:58 PM, Steven Timm <timm@xxxxxxxx> wrote:

That error happens when a user is trying to submit to a
remote grid site and does not have a grid (x509) credential.
Are you running a globus (gt2) gatekeeper or submitting to a remote site?
Steve



On Mon, 1 Nov 2010, Xenia Fave wrote:

 Hi all,

One of our users is having trouble getting his jobs to run. Every time he
submits them they enter held mode with the error "Failed to initialize
GAHP".  I would appreciate any advice. Is this a problem with his
configuration files or with our setup?
Thank you in advance,

Xenia



--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group
Leader.
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


