
[Condor-users] Error in starter:Stream::get(int) failed to read padding



Hi,
 
I am trying to launch a VMware VM job on ESX Server 3.5.
 
However, the Starter is unable to download the VMX file. The VMDK files are kept on VMFS.
 
I am getting the error "Stream::get(int) failed to read padding" in the Shadow log.
 
I see a similar problem reported on the users forum (subject: "Condor 7.1.0 and condor_config_val").
 
What could be the problem?
Attached are the Starter logs, job ad, and Shadow logs.
 
Regards,
Kamakshi
 
  
 
 


Starter logs.

12/24 08:24:09 condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
12/24 08:24:09 IO: Failed to read packet header
12/24 08:24:09 Stream::get(int) failed to read padding
12/24 08:24:09 DoDownload: exiting at 1522
12/24 08:24:09 DaemonCore: No more children processes to reap.
12/24 08:24:09 File transfer failed (status=0).
12/24 08:24:09 Calling client FileTransfer handler function.
12/24 08:24:09 ERROR "Failed to transfer files" at line 1810 in file jic_shadow.C
12/24 08:24:09 ShutdownFast all jobs.
12/24 08:24:09 Got ShutdownFast when no jobs running.


Shadow logs.

12/24 08:23:21 (4.0) (9704): in RemoteResource::initStartdInfo()
12/24 08:23:21 (4.0) (9704): Adding to resolved authorization table: */10.207.100.103: WRITE
12/24 08:23:21 (4.0) (9704): Adding to resolved authorization table: */10.207.100.103: WRITE,DAEMON
12/24 08:23:21 (4.0) (9704): Entering DCStartd::activateClaim()
12/24 08:23:21 (4.0) (9704): DCStartd::activateClaim: successfully sent command, reply is: 1
12/24 08:23:21 (4.0) (9704): Request to run on <10.207.100.103:9601> was ACCEPTED
12/24 08:23:21 (4.0) (9704): Resource slot7mgmt.pesgrid.org.com changing state from PRE to STARTUP
12/24 08:23:21 (4.0) (9704): Getting monitoring info for pid 9704
12/24 08:23:21 (4.0) (9704): DaemonCore: in SendAliveToParent()
12/24 08:23:21 (4.0) (9704): DaemonCore: Leaving SendAliveToParent() - success
12/24 08:23:21 (4.0) (9704): entering FileTransfer::Init
12/24 08:23:21 (4.0) (9704): entering FileTransfer::SimpleInit
12/24 08:23:21 (4.0) (9704): condor_write(): Socket closed when trying to write 298 bytes to unknown source, fd is 9, errno=104
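
Both logs above report errno 104. On Linux that value is ECONNRESET (connection reset by peer), so each side appears to see the other end drop the connection during file transfer. A minimal Python sketch for pulling errno values out of log lines like these is shown here. The helper name errno_names and the lookup table are illustrative only, not part of Condor, and the table assumes Linux errno numbering (the job ad's CondorPlatform suggests a Linux host).

```python
import re

# Assumption: Linux errno numbering. Only the value seen in these logs is listed.
LINUX_ERRNO_NAMES = {104: "ECONNRESET (connection reset by peer)"}

# Matches both "errno = 104" and "errno=104".
ERRNO_PATTERN = re.compile(r"errno\s*=\s*(\d+)")


def errno_names(log_lines):
    """Return (errno, name) pairs for every 'errno = N' found in log_lines."""
    found = []
    for line in log_lines:
        for match in ERRNO_PATTERN.finditer(line):
            code = int(match.group(1))
            found.append((code, LINUX_ERRNO_NAMES.get(code, "unknown")))
    return found


sample = [
    "12/24 08:24:09 condor_read(): recv() returned -1, errno = 104, "
    "assuming failure reading 5 bytes from unknown source.",
    "12/24 08:23:21 (4.0) (9704): condor_write(): Socket closed when trying "
    "to write 298 bytes to unknown source, fd is 9, errno=104",
]

for code, name in errno_names(sample):
    print(code, name)
```

A connection reset only says that the peer closed the socket; it does not by itself say why, so the other daemon's log around the same timestamp is the next place to look.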

VM job ClassAd.

MyType = "Job"
TargetType = "Machine"
GlobalJobId = "gridprime.pesgrid.wipro.com#1230015652#4.0"
TransferInput = "/mail/orgdec-24/jt/01//custx-centos-email.vmx"
TransferExecutable = FALSE
ExecutableSize_RAW = 0
ExecutableSize = 0
VMPARAM_VMware_SnapshotDisk = TRUE
VMPARAM_VMware_Transfer = FALSE
VMPARAM_VMware_VMX_File = "custx-centos-email.vmx"
VMPARAM_VMware_VMDK_Files = "custx-centos-email.vmdk"
UserLog = "/mail/condor/log/VM4.log"
VMPARAM_VMware_vlanid = 24
JobName = "orgdec-24_mysub_testasset"
JobVMCheckpoint = TRUE
JobVMNetworkingType = "custom"
JobVMNetworking = TRUE
JobVMMemory = 64
JobVMType = "vmware"
Owner = "idealgrid"
JobUniverse = 13
Cmd = "custx-centos-email.vmx"
QDate = 1230015652
CompletionDate = 0
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteUserCpu = 0.000000
RemoteSysCpu = 0.000000
CoreSize = -1
ExitStatus = 0
ExitBySignal = FALSE
NumCkpts_RAW = 0
NumCkpts = 0
NumJobStarts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
CumulativeSuspensionTime = 0
RootDir = "/"
MinHosts = 1
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
WantRemoteIO = TRUE
JobPrio = 0
User = "idealgrid@xxxxxxxxxxxxxxx"
NiceUser = FALSE
Env = ""
JobNotification = 0
KillSig = "SIGTERM"
ImageSize_RAW = 0
ImageSize = 0
In = "/dev/null"
Out = "/dev/null"
Err = "/dev/null"
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "YES"
TransferFiles = "ALWAYS"
WhenToTransferOutput = "ON_EXIT_OR_EVICT"
PeriodicHold = FALSE
PeriodicRemove = FALSE
PeriodicRelease = FALSE
OnExitHold = FALSE
OnExitRemove = TRUE
CondorVersion = "$CondorVersion: 7.0.3 Jun 20 2008 BuildID: 91405 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_RHEL5 $"
ClusterId = 4
ProcId = 0
Requirements = (Arch == "INTEL" && HasVM && VM_Type == "vmware")
StageInStart = 1
StageInFinish = 1
FilesRetrieved = FALSE
LeaveJobInQueue = FilesRetrieved =?= FALSE
Arguments = ""
Iwd = "/mail/condorvm/spool/cluster4.proc0.subproc0"
JobStartDate = 1230015723
VMPARAM_VMware_Dir = "/vmfs/volumes/storage1/orgdec-24/jt/01/"
ScheddBday = 1230087324
AutoClusterId = 0
AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,Requirements,NiceUser"
NumShadowExceptions = 878
RemoteWallClockTime = 48792.000000
LastRemoteHost = "slot7@xxxxxxxxxxxxxxxxxxxxxxxxxxx"
LastPublicClaimId = "<10.207.100.103:9601>#1230015559#8#..."
LastPublicClaimIds = ""
MaxHosts = 1
WantMatchDiagnostics = TRUE
LastMatchTime = 1230088133
NumJobMatches = 3
OrigMaxHosts = 1
JobStatus = 2
EnteredCurrentStatus = 1230088135
LastSuspensionTime = 0
CurrentHosts = 1
PublicClaimId = "<10.207.100.103:9601>#1230015559#359#..."
LastJobLeaseRenewal = 1230088135
RemoteHost = "slot7@xxxxxxxxxxxxxxxxxxxxxxxxx"
RemoteSlotID = 8
ShadowBday = 1230088135
JobLastStartDate = 1230087993
JobCurrentStartDate = 1230088135
NumShadowStarts = 880
JobRunCount = 880
MATCH_EXP_NegotiatorMatchExprNEGOTIATORNAME = "PRIMARY_NEGO"
ServerTime = 1230088142
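
The job ad above shows NumShadowStarts = 880 and NumShadowExceptions = 878 while NumJobStarts is still 0, which suggests the same failure repeats on every attempt. A minimal Python sketch for reading those counters out of a flat "Name = value" dump like this one follows. The function name parse_ad is hypothetical, and the parser only splits on the first " = " in each line; it is not a real ClassAd parser.

```python
def parse_ad(text):
    """Parse 'Name = value' lines into a dict of raw string values."""
    ad = {}
    for line in text.splitlines():
        # Split on the first " = " only, so expressions such as
        # 'FilesRetrieved =?= FALSE' stay intact in the value.
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip()] = value.strip()
    return ad


# A few attributes copied from the job ad in this message.
sample_ad = """\
NumJobStarts = 0
NumShadowExceptions = 878
NumShadowStarts = 880
JobRunCount = 880
LeaveJobInQueue = FilesRetrieved =?= FALSE
"""

ad = parse_ad(sample_ad)
for key in ("NumJobStarts", "NumShadowStarts", "NumShadowExceptions"):
    print(key, int(ad[key]))
```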