I just saw something strange and am curious whether anybody knows why it would have happened. I have a job that appears to have been run twice by Condor, even though the first run seems to have completed. It’s launched by wrapper scripts, and I thought maybe one of them was seeing bad output and resubmitting the job, but both runs have the same cluster/job ID and there’s no indication in the log that the job was resubmitted. I can’t reproduce it or see a pattern…
It’s a vanilla universe job submitted from a Linux machine, flocked to another pool, and run on a Linux machine there.
Here’s the log…
000 (24305.000.000) 10/18 15:29:23 Job submitted from host: <x.y.z.14:50865>
...
001 (24305.000.000) 10/18 15:34:22 Job executing on host: <a.b.c.138:32775>
...
006 (24305.000.000) 10/18 15:34:30 Image size of job updated: 28808
...
005 (24305.000.000) 10/18 15:42:40 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:05:13, Sys 0 00:00:00 - Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
        Usr 0 00:05:13, Sys 0 00:00:00 - Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
    1989 - Run Bytes Sent By Job
    8242 - Run Bytes Received By Job
    1989 - Total Bytes Sent By Job
    8242 - Total Bytes Received By Job
...
001 (24305.000.000) 10/18 16:25:00 Job executing on host: <a.b.c.112:32778>
...
006 (24305.000.000) 10/18 16:25:08 Image size of job updated: 27176
...
005 (24305.000.000) 10/18 16:33:22 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:05:18, Sys 0 00:00:00 - Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
        Usr 0 00:05:18, Sys 0 00:00:00 - Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
    1989 - Run Bytes Sent By Job
    8242 - Run Bytes Received By Job
    1989 - Total Bytes Sent By Job
    8242 - Total Bytes Received By Job
...
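In case anyone wants to check their own user logs for the same thing, here is a rough Python sketch that flags any job with more than one "Job executing" (001) event. It assumes the log file on disk has one event header per line, in the same shape as the entries above; the function name `find_repeated_runs` and the regular expression are my own and not part of Condor.

```python
import re
from collections import defaultdict

# Header line of a user-log event, in the shape seen in the log above, e.g.
# "001 (24305.000.000) 10/18 16:25:00 Job executing on host: <a.b.c.112:32778>"
# (Assumption: one event header per line in the log file.)
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d+/\d+ \d+:\d+:\d+) (.*)$"
)

def find_repeated_runs(log_text):
    """Return {"cluster.proc": [(timestamp, description), ...]} for every
    job that has more than one 001 (executing) event in log_text."""
    runs = defaultdict(list)
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m and m.group(1) == "001":
            job_id = f"{int(m.group(2))}.{int(m.group(3))}"
            runs[job_id].append((m.group(5), m.group(6)))
    # Keep only jobs that started executing more than once.
    return {job: events for job, events in runs.items() if len(events) > 1}
```

Calling it on the contents of the log file (for example `find_repeated_runs(open(path).read())`, where `path` is wherever your submit file points the log) returns an empty dict if no job was executed more than once. It only reports that a job started twice; it says nothing about why.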
Michael.