
[Condor-users] Condor jobs terminated by SIGINT



I've received the following complaint from some of my Condor users:


Over the past couple of weeks, a number of Condor jobs have been
terminated by a SIGINT (the interrupt signal, as sent by a keyboard ^C).
The interruptions are logged in the program output as follows:

parser_rids_trips_December_23/PROG_OUT.txt                    
    Simulator interrupted with PC = 0x182480 <put_match_list$2>

swim_rids_trips_December_23/PROG_OUT.txt                    
    Simulator interrupted with PC = 0x283500 <raise$2>        

parser_rids_trips_December_29/PROG_OUT.txt                    
    Reading the dictionary files: **Simulator interrupted with PC = 0x4dc900 <rabridged_lookup$4>

gzip_rids_trips_January_03/PROG_OUT.txt                    
    Simulator interrupted with PC = 0x172580 <compress_block$57>        

mcf_rids_trips_January_03/PROG_OUT.txt                    
    Simulator interrupted with PC = 0x115380 <memset$6>        

parser_rids_trips_January_03/PROG_OUT.txt                    
    Reading the dictionary files: ***Simulator interrupted with PC = 0x571a00 <fgetc_unlocked$3>
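
To tally these messages across many jobs, something like the following
rough sketch might help. It assumes every job directory contains a
PROG_OUT.txt in the format shown above; the pattern is inferred only from
these six samples, and the names find_interrupts and INTERRUPT_RE are
hypothetical, not part of Condor or the simulator.

```python
import re
import sys

# Pattern inferred from the sample messages above (an assumption):
#   Simulator interrupted with PC = 0x<hex> <symbol$N>
# \s* and [^>]+ both tolerate line breaks, so wrapped messages still match.
INTERRUPT_RE = re.compile(
    r"Simulator interrupted with PC\s*=\s*(0x[0-9a-fA-F]+)\s*<([^>]+)>"
)


def find_interrupts(text):
    """Return a list of (pc, symbol) pairs, one per interrupt message."""
    results = []
    for match in INTERRUPT_RE.finditer(text):
        # Remove any whitespace that line wrapping left inside the symbol.
        symbol = re.sub(r"\s+", "", match.group(2))
        results.append((match.group(1), symbol))
    return results


if __name__ == "__main__":
    # Each command-line argument is the path to one PROG_OUT.txt file.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for pc, symbol in find_interrupts(handle.read()):
                print(f"{path}: PC={pc} symbol={symbol}")
```

Saved under a name of your choosing (say, find_interrupts.py), it could
be run as `python find_interrupts.py */PROG_OUT.txt`. Comparing the
modification times of the matching files may then show whether the
interrupts cluster at particular times or on particular nodes.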

I can't explain this behavior because:
- The Condor jobs were all compiled with the Condor libraries.
- Most or all of them were terminated while I was out of the office.
- I can't reproduce this behavior by running condor_rm on jobs, so an
  outside agent seems to be sending the signal.


Does anyone have a suggestion as to why these processes would have been
interrupted?  They were running on dedicated compute nodes in a cluster,
and the users submitting the jobs have the highest priority available in
my RANKing scheme.

-- 
David A. Kotz <dkotz@xxxxxxxxxxxxx>