
Re: [Condor-users] condor mail error notification



Alex,

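The ERROR line means that the startd hit a fatal internal exception and exited; the last line of your log ("startd exiting because of fatal exception") confirms it. I've seen this particular error sporadically, but never often enough to debug it properly. How often does it happen for you?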

You might try setting ABORT_ON_EXCEPTION = TRUE and CREATE_CORE_FILES = TRUE in the configuration of an affected machine to get more information about where this is happening and what the startd's state is at the time.
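
For concreteness, here is a sketch of what those two settings could look like in the local configuration file of one of the affected XP machines. I'm writing the second knob as CREATE_CORE_FILES (plural), which is the spelling I believe the manual uses; please check it against the manual for your version. Where the local config file lives depends on your install, so treat the placement as an assumption.

```
# Sketch only: add to the machine's local Condor config file.
# Make a fatal exception abort the daemon instead of exiting cleanly,
# so that the failure leaves a core file behind.
ABORT_ON_EXCEPTION = TRUE

# Allow the daemons to write a core file when they crash.
CREATE_CORE_FILES = TRUE
```

The startd has to re-read its configuration before this takes effect, so restart Condor on that machine after the change.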

Best,


matt

Alas, Alex [FEDI] wrote:
> Hello to all!,
> 
> I'm not trying to be annoying, but I really don't have a clue how to
> attack this issue. Any ideas are welcome.
> 
> Thanks again,
> 
> Alex 
> 
>  
> 
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]
> Sent: Wednesday, August 19, 2009 12:31 PM
> To: Condor-Users Mail List
> Subject: [Condor-users] condor mail error notification
> 
>  
> 
> Hello to all,
> 
> I launched some jobs through my Condor pool. I have a mixed farm of
> Windows 2003 and Windows XP boxes. The latter are virtual machines
> running on Linux hosts. The jobs I ran last night are still running, but
> I am receiving several e-mail notifications from all the Windows XP
> machines. I launched the jobs from a computer that belonged to another
> pool using "condor_submit -pool negotiation -name scheduler
> condor_submission_filename.sub". The error message is the following:
> 
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXX
> 
> This is an automated email from the Condor system on machine
> "vm4-condor-xp.earthdata.com".  Do not reply.
> 
>  
> 
> "C:\Condor/bin/condor_startd.exe" on "vm4-condor-xp.earthdata.com"
> exited with status 4.
> 
> Condor will automatically restart this process in 10 seconds.
> 
>  
> 
> *** Last 20 line(s) of file C:\Condor/log/StartLog:
> 
> 8/18 18:07:08 slot1: State change: No preempting claim, returning to owner
> 8/18 18:07:08 slot1: Changing state and activity: Preempting/Vacating -> Owner/Idle
> 8/18 18:07:08 slot1: State change: IS_OWNER is false
> 8/18 18:07:08 slot1: Changing state: Owner -> Unclaimed
> 8/18 18:11:59 slot1: match_info called
> 8/18 18:11:59 slot1: Received match <10.2.168.99:1578>#1250626520#7#...
> 8/18 18:11:59 slot1: State change: match notification protocol successful
> 8/18 18:11:59 slot1: Changing state: Unclaimed -> Matched
> 8/18 18:11:59 slot1: Request accepted.
> 8/18 18:11:59 slot1: Remote owner is aalas@xxxxxxxxxxxxx
> 8/18 18:11:59 slot1: State change: claiming protocol successful
> 8/18 18:11:59 slot1: Changing state: Matched -> Claimed
> 8/18 18:11:59 ERROR "Can't find WANT_SUSPEND in internal ClassAd" at line 1226 in file ..\src\condor_startd.V6\Resource.cpp
> 8/18 18:11:59 slot1: Changing state and activity: Claimed/Idle -> Preempting/Killing
> 8/18 18:11:59 slot1: State change: No preempting claim, returning to owner
> 8/18 18:11:59 slot1: Changing state and activity: Preempting/Killing -> Owner/Idle
> 8/18 18:11:59 slot1: State change: IS_OWNER is false
> 8/18 18:11:59 slot1: Changing state: Owner -> Unclaimed
> 8/18 18:11:59 slot2: Changing state and activity: Claimed/Busy -> Preempting/Killing
> 8/18 18:11:59 startd exiting because of fatal exception.
> 
> *** End of file StartLog
> 
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXX
> 
> I am not an expert on Condor, so I don't know how to interpret this error
> message. Any ideas?
> 
> Thanks in advance for your help,
> 
>  
> 
> Respectfully,
> 
> Alex Alas 
> Fugro EarthData Inc.
> 
>  
> 
> 
> 
> 