
Re: [Condor-users] condor mail error notification



Hello to all,

I don’t mean to be a nuisance, but I really don’t have a clue how to approach this issue. Any ideas are welcome.

Thanks again,

Alex

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]
Sent: Wednesday, August 19, 2009 12:31 PM
To: Condor-Users Mail List
Subject: [Condor-users] condor mail error notification

 

Hello to all,

I launched some jobs through my Condor pool, which is a mixed farm of Windows 2003 and Windows XP boxes. The XP boxes are virtual machines running on Linux hosts. The jobs I submitted last night are still running, but I am receiving several e-mail notifications from all of the Windows XP machines. I launched the jobs from a computer that belonged to another pool, using “condor_submit -pool negotiation -name scheduler condor_submission_filename.sub”. The error message is the following:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This is an automated email from the Condor system on machine "vm4-condor-xp.earthdata.com".  Do not reply.

 

"C:\Condor/bin/condor_startd.exe" on "vm4-condor-xp.earthdata.com" exited with status 4.

Condor will automatically restart this process in 10 seconds.

 

*** Last 20 line(s) of file C:\Condor/log/StartLog:

8/18 18:07:08 slot1: State change: No preempting claim, returning to owner

8/18 18:07:08 slot1: Changing state and activity: Preempting/Vacating -> Owner/Idle

8/18 18:07:08 slot1: State change: IS_OWNER is false

8/18 18:07:08 slot1: Changing state: Owner -> Unclaimed

8/18 18:11:59 slot1: match_info called

8/18 18:11:59 slot1: Received match <10.2.168.99:1578>#1250626520#7#...

8/18 18:11:59 slot1: State change: match notification protocol successful

8/18 18:11:59 slot1: Changing state: Unclaimed -> Matched

8/18 18:11:59 slot1: Request accepted.

8/18 18:11:59 slot1: Remote owner is aalas@xxxxxxxxxxxxx

8/18 18:11:59 slot1: State change: claiming protocol successful

8/18 18:11:59 slot1: Changing state: Matched -> Claimed

8/18 18:11:59 ERROR "Can't find WANT_SUSPEND in internal ClassAd" at line 1226 in file ..\src\condor_startd.V6\Resource.cpp

8/18 18:11:59 slot1: Changing state and activity: Claimed/Idle -> Preempting/Killing

8/18 18:11:59 slot1: State change: No preempting claim, returning to owner

8/18 18:11:59 slot1: Changing state and activity: Preempting/Killing -> Owner/Idle

8/18 18:11:59 slot1: State change: IS_OWNER is false

8/18 18:11:59 slot1: Changing state: Owner -> Unclaimed

8/18 18:11:59 slot2: Changing state and activity: Claimed/Busy -> Preempting/Killing

8/18 18:11:59 startd exiting because of fatal exception.

*** End of file StartLog

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

I am not an expert on Condor, so I don’t know how to interpret this error message. Any ideas?

Thanks in advance for your help,

 

Respectfully,

Alex Alas
Fugro EarthData Inc.