
[Condor-users] Job keeps idle after reaching ~ 1GB and then re-executed but never finishes



Hi all,

Can anyone explain what is happening to my jobs? My batch jobs went idle
for some time once one of them reached approximately 1 GB of result/file,
and then the jobs were re-executed. This repeated twice, and then I decided
to remove the jobs because they never finish. I expect each job to return
approximately 5 GB. Is this some restriction in the Condor environment? If
so, how can I fix it?

The following is the log file of my job:
############################################
000 (552.000.000) 07/28 21:58:24 Job submitted from host: <10.0.40.139:32771>
...
000 (552.001.000) 07/28 21:58:24 Job submitted from host: <10.0.40.139:32771>
...
000 (552.002.000) 07/28 21:58:24 Job submitted from host: <10.0.40.139:32771>
...
001 (552.000.000) 07/28 21:58:27 Job executing on host: <10.0.40.148:32771>
...
001 (552.001.000) 07/28 21:58:29 Job executing on host: <10.0.40.139:32772>
...
001 (552.002.000) 07/28 21:58:32 Job executing on host: <10.0.40.112:32771>
...
006 (552.000.000) 07/28 21:58:36 Image size of job updated: 232508
...
006 (552.001.000) 07/28 21:58:37 Image size of job updated: 222952
...
006 (552.002.000) 07/28 21:58:40 Image size of job updated: 240012
...
006 (552.000.000) 07/28 22:18:36 Image size of job updated: 299492
...
006 (552.001.000) 07/28 22:18:37 Image size of job updated: 300036
...
006 (552.002.000) 07/28 22:18:40 Image size of job updated: 300564
...
006 (552.000.000) 07/28 22:38:35 Image size of job updated: 299992
...
006 (552.001.000) 07/28 22:38:37 Image size of job updated: 300588
...
006 (552.002.000) 07/28 22:38:40 Image size of job updated: 301064
...
006 (552.002.000) 07/28 22:58:40 Image size of job updated: 301196
...
006 (552.000.000) 07/28 23:18:35 Image size of job updated: 303588
...
006 (552.001.000) 07/28 23:18:37 Image size of job updated: 304188
...
006 (552.002.000) 07/28 23:18:40 Image size of job updated: 304660
...
006 (552.000.000) 07/28 23:38:35 Image size of job updated: 304708
...
006 (552.001.000) 07/28 23:38:37 Image size of job updated: 305304
...
006 (552.002.000) 07/28 23:38:40 Image size of job updated: 305776
...
006 (552.000.000) 07/29 00:38:36 Image size of job updated: 325480
...
006 (552.001.000) 07/29 00:38:37 Image size of job updated: 324032
...
006 (552.002.000) 07/29 00:38:39 Image size of job updated: 324516
...
006 (552.000.000) 07/29 00:58:36 Image size of job updated: 327124
...
006 (552.001.000) 07/29 00:58:37 Image size of job updated: 327980
...
006 (552.002.000) 07/29 01:18:39 Image size of job updated: 327168
...
006 (552.000.000) 07/29 01:38:36 Image size of job updated: 327640
...
006 (552.002.000) 07/29 02:18:40 Image size of job updated: 328708
...
006 (552.000.000) 07/29 03:38:35 Image size of job updated: 338992
...
006 (552.000.000) 07/29 04:38:36 Image size of job updated: 588500
...
006 (552.001.000) 07/29 04:38:37 Image size of job updated: 333832
...
006 (552.002.000) 07/29 04:38:40 Image size of job updated: 339928
...
006 (552.001.000) 07/29 05:38:37 Image size of job updated: 599632
...
006 (552.002.000) 07/29 05:38:40 Image size of job updated: 594704
...
006 (552.002.000) 07/29 05:58:40 Image size of job updated: 601292
...
001 (552.000.000) 07/29 06:23:53 Job executing on host: <10.0.40.139:32772>
...
001 (552.001.000) 07/29 06:34:15 Job executing on host: <10.0.40.148:32771>
...
001 (552.002.000) 07/29 06:34:17 Job executing on host: <10.0.40.112:32771>
...
006 (552.000.000) 07/29 06:44:01 Image size of job updated: 300600
...
006 (552.001.000) 07/29 06:54:23 Image size of job updated: 300148
...
006 (552.002.000) 07/29 06:54:25 Image size of job updated: 299148
...
006 (552.000.000) 07/29 07:04:01 Image size of job updated: 301100
...
006 (552.001.000) 07/29 07:14:23 Image size of job updated: 300616
...
006 (552.002.000) 07/29 07:14:25 Image size of job updated: 299456
...
006 (552.000.000) 07/29 07:24:01 Image size of job updated: 304700
...
006 (552.001.000) 07/29 07:34:23 Image size of job updated: 304212
...
006 (552.002.000) 07/29 07:34:25 Image size of job updated: 303052
...
006 (552.000.000) 07/29 07:44:01 Image size of job updated: 305816
...
006 (552.001.000) 07/29 07:54:23 Image size of job updated: 305332
...
006 (552.002.000) 07/29 07:54:26 Image size of job updated: 304172
...
006 (552.000.000) 07/29 08:44:01 Image size of job updated: 324556
...
006 (552.001.000) 07/29 08:54:23 Image size of job updated: 325088
...
006 (552.002.000) 07/29 08:54:25 Image size of job updated: 323752
...
006 (552.000.000) 07/29 09:04:01 Image size of job updated: 328840
...
006 (552.001.000) 07/29 09:14:23 Image size of job updated: 326720
...
006 (552.002.000) 07/29 09:14:25 Image size of job updated: 326972
...
006 (552.001.000) 07/29 09:54:23 Image size of job updated: 327492
...
006 (552.002.000) 07/29 09:54:25 Image size of job updated: 327228
...
006 (552.000.000) 07/29 11:44:01 Image size of job updated: 334344
...
006 (552.001.000) 07/29 11:54:23 Image size of job updated: 333860
...
006 (552.002.000) 07/29 11:54:25 Image size of job updated: 332700
...
006 (552.002.000) 07/29 12:34:26 Image size of job updated: 439668
...
006 (552.000.000) 07/29 12:44:01 Image size of job updated: 596972
...
006 (552.001.000) 07/29 12:54:23 Image size of job updated: 595696
...
006 (552.002.000) 07/29 12:54:25 Image size of job updated: 598368
...
006 (552.000.000) 07/29 20:04:01 Image size of job updated: 924564
...
006 (552.001.000) 07/29 20:14:23 Image size of job updated: 846144
...
006 (552.000.000) 07/29 20:24:01 Image size of job updated: 1033628
...
001 (552.000.000) 07/29 22:17:22 Job executing on host: <10.0.40.148:32771>
...
001 (552.001.000) 07/29 22:17:24 Job executing on host: <10.0.40.139:32772>
...
001 (552.002.000) 07/29 22:22:24 Job executing on host: <10.0.40.112:32771>
...
006 (552.000.000) 07/29 22:37:30 Image size of job updated: 299556
...
006 (552.001.000) 07/29 22:37:32 Image size of job updated: 298980
...
006 (552.002.000) 07/29 22:42:31 Image size of job updated: 301080
...
006 (552.000.000) 07/29 22:57:31 Image size of job updated: 300056
...
006 (552.001.000) 07/29 22:57:32 Image size of job updated: 299476
...
006 (552.002.000) 07/29 23:02:31 Image size of job updated: 301108
...
006 (552.000.000) 07/29 23:17:30 Image size of job updated: 303652
...
006 (552.002.000) 07/29 23:22:31 Image size of job updated: 304704
...
006 (552.000.000) 07/29 23:37:30 Image size of job updated: 304772
...
006 (552.001.000) 07/29 23:37:32 Image size of job updated: 303076
...
006 (552.002.000) 07/29 23:42:32 Image size of job updated: 305820
...
006 (552.001.000) 07/29 23:57:32 Image size of job updated: 304192
...
006 (552.000.000) 07/30 00:37:30 Image size of job updated: 323696
...
006 (552.002.000) 07/30 00:42:32 Image size of job updated: 326856
...
006 (552.000.000) 07/30 00:57:30 Image size of job updated: 327188
...
006 (552.001.000) 07/30 00:57:32 Image size of job updated: 322928
...
006 (552.002.000) 07/30 01:02:31 Image size of job updated: 328236
...
006 (552.001.000) 07/30 01:17:32 Image size of job updated: 326608
...
006 (552.002.000) 07/30 01:42:32 Image size of job updated: 328752
...
006 (552.001.000) 07/30 02:17:33 Image size of job updated: 327124
...
006 (552.000.000) 07/30 03:37:30 Image size of job updated: 333300
...
006 (552.002.000) 07/30 03:42:31 Image size of job updated: 334352
...
006 (552.001.000) 07/30 03:57:33 Image size of job updated: 372392
...
006 (552.000.000) 07/30 04:17:30 Image size of job updated: 376872
...
006 (552.002.000) 07/30 04:22:31 Image size of job updated: 443972
...
006 (552.000.000) 07/30 04:37:30 Image size of job updated: 597524
...
006 (552.002.000) 07/30 04:42:32 Image size of job updated: 588952
...
006 (552.001.000) 07/30 04:57:33 Image size of job updated: 591804
...
006 (552.002.000) 07/30 06:22:32 Image size of job updated: 596632
...
006 (552.001.000) 07/30 06:37:33 Image size of job updated: 594992
...
006 (552.002.000) 07/30 11:42:32 Image size of job updated: 851356
...
006 (552.000.000) 07/30 11:57:30 Image size of job updated: 977040
...
006 (552.001.000) 07/30 11:57:33 Image size of job updated: 679004
...
006 (552.002.000) 07/30 12:02:33 Image size of job updated: 1034548
...
006 (552.000.000) 07/30 12:17:31 Image size of job updated: 1033368
...
006 (552.001.000) 07/30 12:17:33 Image size of job updated: 1031860
...

...then the job became idle again :(
######################################################################
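
To make the pattern in the log above easier to see, here is a minimal sketch
of a script that groups the events per job and reports the peak image size
reached during each execution attempt. It is not a Condor tool: the function
name summarize and the regular expression are made up for illustration, and
it only assumes what the log itself shows, namely that event 001 marks "Job
executing" and event 006 marks "Image size of job updated". It also assumes
the sizes are in kilobytes, so 1033628 would be roughly 1 GB.

```python
import re
from collections import defaultdict

# Matches event lines such as:
#   006 (552.000.000) 07/29 20:24:01 Image size of job updated: 1033628
# Separator lines ("...") and anything else are ignored.
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) (\d\d/\d\d \d\d:\d\d:\d\d) (.*)$"
)


def summarize(log_text):
    """Return {"cluster.proc": [peak image size per execution attempt]}.

    A new attempt starts at every 001 (executing) event; each 006 event
    updates the peak of the attempt that is currently running.
    """
    runs = defaultdict(list)
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue
        code, cluster, proc, _when, message = match.groups()
        job = f"{cluster}.{int(proc)}"
        if code == "001":
            # Job (re)started: open a new attempt with no size seen yet.
            runs[job].append(0)
        elif code == "006" and runs[job]:
            # Size is the number after the last colon in the message.
            size = int(message.rsplit(":", 1)[1])
            runs[job][-1] = max(runs[job][-1], size)
    return dict(runs)
```

Passing the contents of the log file to summarize() gives one list per job,
with one entry for each time the job started executing, so a job that was
restarted twice has three entries.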


Thanks,

Leo