[Condor-users] Standard Universe blues...



Hi all,

I've been struggling with this problem for a few days now... I wonder if anyone
would be able to suggest a possible solution.

We have a Fortran code that we compiled with the Condor libraries. It runs
fine in the standard universe, but very slowly, because it is very I/O
intensive (it reads and writes around 70GB): run locally it takes around 25
hours, and under Condor around 80-90 hours.

I thought a possible solution was to use the fetch_files option:

       fetch_files = file1, file2, ...

          If your job attempts to access a file mentioned in this list, Condor
          will  automatically  copy  the  whole file to the executing machine,
          where it can be accessed quickly. When your job closes the file,  it
          will  be  copied  back  to its original location. This list uses the
          same syntax as compress_files, shown above.

          This option only applies to standard-universe jobs.
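
For concreteness, a submit description using this option might look roughly
like the sketch below. The executable, file, and log names are placeholders
chosen for illustration, not the real ones from our setup:

       # Hypothetical submit description (all names are placeholders)
       universe    = standard
       # Binary relinked with condor_compile
       executable  = mycode.condor
       # Files to be copied whole to the execute machine when first accessed
       fetch_files = file1, file2
       log         = mycode.log
       queue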


but this is no good, because the files are copied back every time they are
closed, which happens a great many times. Ideally I would like something
similar that only copies the files back when a checkpoint is made, so that in
the 10 hours or so between evictions the job can make a lot of progress on
local files rather than going through the network.

I've tried to fool the system by submitting it as a vanilla job (though still
condor_compiled), wrapped in a script that manually checkpoints the job every
60 minutes or so, but I'm having trouble with this as well. (The remaining
problem right now is that I launch the process as a background job, so Condor
treats it as non-Condor load and the job gets suspended and then evicted
continuously.) Does anybody have an example of a script that regularly
checkpoints a program, moves some files around, and then continues until the
job is evicted?

Thanks a lot,
Angel de Vicente
-- 
----------------------------------
http://www.iac.es/galeria/angelv/

PostDoc Software Support
Instituto de Astrofisica de Canarias