
Re: [Condor-users] Job keeps re-executing after reaching ~ 1GB size but never finishes...



Leo,

To diagnose this, we need to know which version of Condor you are running, which operating system, and some details about the machines in your pool. How much memory and virtual memory do the execute nodes have? Condor jobs can certainly use more than 1GB of memory.

You can also search the log files for both the submit node and the execute node for one of those job numbers to see if you can find more information about what is happening to your jobs.
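As a starting point for that search, here is a minimal sketch that summarizes a user log in the format quoted in the original message below. It groups events by job, starts a new "run" at every 001 (Job executing) event, and records the largest 006 (Image size) value seen in that run. The function name `summarize_runs` and the sample data are illustrative only, and the conversion assumes the reported image size is in KiB.

```python
import re

# Matches user-log event lines such as:
#   006 (552.000.000) 07/28 21:58:36 Image size of job updated: 232508
EVENT_RE = re.compile(
    r"^(?P<code>\d{3}) \((?P<job>\d+\.\d+)\.\d+\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<text>.*)$"
)


def summarize_runs(log_lines):
    """Return {job_id: [run, ...]} with one run per 'Job executing' event.

    Each run holds the start timestamp, the execute host, and the largest
    image size (assumed to be KiB) reported during that run.
    """
    runs = {}
    for line in log_lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # "..." separators and anything that is not an event
        job, text = m["job"], m["text"]
        if m["code"] == "001":
            # A repeated 001 event for the same job means it started again.
            host = text.split("host:")[-1].strip()
            start = f'{m["date"]} {m["time"]}'
            runs.setdefault(job, []).append(
                {"start": start, "host": host, "peak_kib": 0}
            )
        elif m["code"] == "006" and runs.get(job):
            size = int(text.rsplit(":", 1)[-1])
            current = runs[job][-1]
            current["peak_kib"] = max(current["peak_kib"], size)
    return runs


# A few lines copied from the log in the original message.
sample = """\
001 (552.000.000) 07/29 06:23:53 Job executing on host: <10.0.40.139:32772>
...
006 (552.000.000) 07/29 20:24:01 Image size of job updated: 1033628
...
001 (552.000.000) 07/29 22:17:22 Job executing on host: <10.0.40.148:32771>
...
006 (552.000.000) 07/30 12:17:31 Image size of job updated: 1033368
""".splitlines()

summary = summarize_runs(sample)
for job, job_runs in summary.items():
    for n, run in enumerate(job_runs, 1):
        gib = run["peak_kib"] / (1024 * 1024)  # KiB -> GiB, under the KiB assumption
        print(f"{job} run {n} on {run['host']}: "
              f"peak {run['peak_kib']} KiB (~{gib:.2f} GiB)")
```

Running this over the full log file (for example `summarize_runs(open("job.log"))`, where `job.log` is a placeholder name) shows how many times each job restarted and how large it was just before each restart. Those timestamps and hosts are what to look for in the daemon logs on the submit and execute nodes.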

- dave


Leo Cristobal C. Ambolode II wrote:
Hi all,

Can anyone explain what happened to my jobs? My batch jobs went idle for
some time once one of them reached approximately 1 GB of result/file
size, and then the jobs were re-executed. This happened twice, so I
decided to remove the jobs because they never finish. I expect each job
to return approximately 5 GB. Is this some restriction in the Condor
environment? If so, how can I fix it?

The following is the log file of my job:
############################################
000 (552.000.000) 07/28 21:58:24 Job submitted from host: <10.0.40.139:32771>
...
000 (552.001.000) 07/28 21:58:24 Job submitted from host: <10.0.40.139:32771>
...
000 (552.002.000) 07/28 21:58:24 Job submitted from host: <10.0.40.139:32771>
...
001 (552.000.000) 07/28 21:58:27 Job executing on host: <10.0.40.148:32771>
...
001 (552.001.000) 07/28 21:58:29 Job executing on host: <10.0.40.139:32772>
...
001 (552.002.000) 07/28 21:58:32 Job executing on host: <10.0.40.112:32771>
...
006 (552.000.000) 07/28 21:58:36 Image size of job updated: 232508
...
006 (552.001.000) 07/28 21:58:37 Image size of job updated: 222952
...
006 (552.002.000) 07/28 21:58:40 Image size of job updated: 240012
...
006 (552.000.000) 07/28 22:18:36 Image size of job updated: 299492
...
006 (552.001.000) 07/28 22:18:37 Image size of job updated: 300036
...
006 (552.002.000) 07/28 22:18:40 Image size of job updated: 300564
...
006 (552.000.000) 07/28 22:38:35 Image size of job updated: 299992
...
006 (552.001.000) 07/28 22:38:37 Image size of job updated: 300588
...
006 (552.002.000) 07/28 22:38:40 Image size of job updated: 301064
...
006 (552.002.000) 07/28 22:58:40 Image size of job updated: 301196
...
006 (552.000.000) 07/28 23:18:35 Image size of job updated: 303588
...
006 (552.001.000) 07/28 23:18:37 Image size of job updated: 304188
...
006 (552.002.000) 07/28 23:18:40 Image size of job updated: 304660
...
006 (552.000.000) 07/28 23:38:35 Image size of job updated: 304708
...
006 (552.001.000) 07/28 23:38:37 Image size of job updated: 305304
...
006 (552.002.000) 07/28 23:38:40 Image size of job updated: 305776
...
006 (552.000.000) 07/29 00:38:36 Image size of job updated: 325480
...
006 (552.001.000) 07/29 00:38:37 Image size of job updated: 324032
...
006 (552.002.000) 07/29 00:38:39 Image size of job updated: 324516
...
006 (552.000.000) 07/29 00:58:36 Image size of job updated: 327124
...
006 (552.001.000) 07/29 00:58:37 Image size of job updated: 327980
...
006 (552.002.000) 07/29 01:18:39 Image size of job updated: 327168
...
006 (552.000.000) 07/29 01:38:36 Image size of job updated: 327640
...
006 (552.002.000) 07/29 02:18:40 Image size of job updated: 328708
...
006 (552.000.000) 07/29 03:38:35 Image size of job updated: 338992
...
006 (552.000.000) 07/29 04:38:36 Image size of job updated: 588500
...
006 (552.001.000) 07/29 04:38:37 Image size of job updated: 333832
...
006 (552.002.000) 07/29 04:38:40 Image size of job updated: 339928
...
006 (552.001.000) 07/29 05:38:37 Image size of job updated: 599632
...
006 (552.002.000) 07/29 05:38:40 Image size of job updated: 594704
...
006 (552.002.000) 07/29 05:58:40 Image size of job updated: 601292
...
001 (552.000.000) 07/29 06:23:53 Job executing on host: <10.0.40.139:32772>
...
001 (552.001.000) 07/29 06:34:15 Job executing on host: <10.0.40.148:32771>
...
001 (552.002.000) 07/29 06:34:17 Job executing on host: <10.0.40.112:32771>
...
006 (552.000.000) 07/29 06:44:01 Image size of job updated: 300600
...
006 (552.001.000) 07/29 06:54:23 Image size of job updated: 300148
...
006 (552.002.000) 07/29 06:54:25 Image size of job updated: 299148
...
006 (552.000.000) 07/29 07:04:01 Image size of job updated: 301100
...
006 (552.001.000) 07/29 07:14:23 Image size of job updated: 300616
...
006 (552.002.000) 07/29 07:14:25 Image size of job updated: 299456
...
006 (552.000.000) 07/29 07:24:01 Image size of job updated: 304700
...
006 (552.001.000) 07/29 07:34:23 Image size of job updated: 304212
...
006 (552.002.000) 07/29 07:34:25 Image size of job updated: 303052
...
006 (552.000.000) 07/29 07:44:01 Image size of job updated: 305816
...
006 (552.001.000) 07/29 07:54:23 Image size of job updated: 305332
...
006 (552.002.000) 07/29 07:54:26 Image size of job updated: 304172
...
006 (552.000.000) 07/29 08:44:01 Image size of job updated: 324556
...
006 (552.001.000) 07/29 08:54:23 Image size of job updated: 325088
...
006 (552.002.000) 07/29 08:54:25 Image size of job updated: 323752
...
006 (552.000.000) 07/29 09:04:01 Image size of job updated: 328840
...
006 (552.001.000) 07/29 09:14:23 Image size of job updated: 326720
...
006 (552.002.000) 07/29 09:14:25 Image size of job updated: 326972
...
006 (552.001.000) 07/29 09:54:23 Image size of job updated: 327492
...
006 (552.002.000) 07/29 09:54:25 Image size of job updated: 327228
...
006 (552.000.000) 07/29 11:44:01 Image size of job updated: 334344
...
006 (552.001.000) 07/29 11:54:23 Image size of job updated: 333860
...
006 (552.002.000) 07/29 11:54:25 Image size of job updated: 332700
...
006 (552.002.000) 07/29 12:34:26 Image size of job updated: 439668
...
006 (552.000.000) 07/29 12:44:01 Image size of job updated: 596972
...
006 (552.001.000) 07/29 12:54:23 Image size of job updated: 595696
...
006 (552.002.000) 07/29 12:54:25 Image size of job updated: 598368
...
006 (552.000.000) 07/29 20:04:01 Image size of job updated: 924564
...
006 (552.001.000) 07/29 20:14:23 Image size of job updated: 846144
...
006 (552.000.000) 07/29 20:24:01 Image size of job updated: 1033628
...
001 (552.000.000) 07/29 22:17:22 Job executing on host: <10.0.40.148:32771>
...
001 (552.001.000) 07/29 22:17:24 Job executing on host: <10.0.40.139:32772>
...
001 (552.002.000) 07/29 22:22:24 Job executing on host: <10.0.40.112:32771>
...
006 (552.000.000) 07/29 22:37:30 Image size of job updated: 299556
...
006 (552.001.000) 07/29 22:37:32 Image size of job updated: 298980
...
006 (552.002.000) 07/29 22:42:31 Image size of job updated: 301080
...
006 (552.000.000) 07/29 22:57:31 Image size of job updated: 300056
...
006 (552.001.000) 07/29 22:57:32 Image size of job updated: 299476
...
006 (552.002.000) 07/29 23:02:31 Image size of job updated: 301108
...
006 (552.000.000) 07/29 23:17:30 Image size of job updated: 303652
...
006 (552.002.000) 07/29 23:22:31 Image size of job updated: 304704
...
006 (552.000.000) 07/29 23:37:30 Image size of job updated: 304772
...
006 (552.001.000) 07/29 23:37:32 Image size of job updated: 303076
...
006 (552.002.000) 07/29 23:42:32 Image size of job updated: 305820
...
006 (552.001.000) 07/29 23:57:32 Image size of job updated: 304192
...
006 (552.000.000) 07/30 00:37:30 Image size of job updated: 323696
...
006 (552.002.000) 07/30 00:42:32 Image size of job updated: 326856
...
006 (552.000.000) 07/30 00:57:30 Image size of job updated: 327188
...
006 (552.001.000) 07/30 00:57:32 Image size of job updated: 322928
...
006 (552.002.000) 07/30 01:02:31 Image size of job updated: 328236
...
006 (552.001.000) 07/30 01:17:32 Image size of job updated: 326608
...
006 (552.002.000) 07/30 01:42:32 Image size of job updated: 328752
...
006 (552.001.000) 07/30 02:17:33 Image size of job updated: 327124
...
006 (552.000.000) 07/30 03:37:30 Image size of job updated: 333300
...
006 (552.002.000) 07/30 03:42:31 Image size of job updated: 334352
...
006 (552.001.000) 07/30 03:57:33 Image size of job updated: 372392
...
006 (552.000.000) 07/30 04:17:30 Image size of job updated: 376872
...
006 (552.002.000) 07/30 04:22:31 Image size of job updated: 443972
...
006 (552.000.000) 07/30 04:37:30 Image size of job updated: 597524
...
006 (552.002.000) 07/30 04:42:32 Image size of job updated: 588952
...
006 (552.001.000) 07/30 04:57:33 Image size of job updated: 591804
...
006 (552.002.000) 07/30 06:22:32 Image size of job updated: 596632
...
006 (552.001.000) 07/30 06:37:33 Image size of job updated: 594992
...
006 (552.002.000) 07/30 11:42:32 Image size of job updated: 851356
...
006 (552.000.000) 07/30 11:57:30 Image size of job updated: 977040
...
006 (552.001.000) 07/30 11:57:33 Image size of job updated: 679004
...
006 (552.002.000) 07/30 12:02:33 Image size of job updated: 1034548
...
006 (552.000.000) 07/30 12:17:31 Image size of job updated: 1033368
...
006 (552.001.000) 07/30 12:17:33 Image size of job updated: 1031860
...

...then the job became idle again :(
######################################################################


Thanks,

Leo

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

