
Re: [Condor-users] architectural question/issue



On 8/28/06, bruce <bedouglas@xxxxxxxxxxxxx> wrote:
hi...

i'm grappling with an issue that i can't seem to get my head around...

i'm creating a test app where my app spawns off a number of child
processes. each app might create hundreds to thousands of child
processes. each child process performs an operation, and then needs to
write the output of the operation to a db.

here's the issue... if i use a db like mysql, i can only handle a
limited number of simultaneous connections based on mem/system
resources. for a reasonable machine, this might be 1000 simultaneous
connections, which means my child apps are going to sit in a wait
state as each one waits to get an open connection...

i could also have each child app simply write a raw output/text file,
and have some sort of external process be responsible for reading the
files and writing the data to the db... but here again, this would be
slow, as it would be sequential in nature unless i created a
multithreaded kind of app that used some sort of connection pooling
approach.

it doesn't appear that the file approach would be faster than waiting for
the open connection...

my question... is there potentially a faster/better way to slam data
at a high rate into a db? i could even set up some sort of distributed
db farm/apps if that makes sense...

There is - but it is
1) database vendor dependent
2) a pain to get right and automate
3) normally requires a total table lock while it is running

On Oracle it is SQL*Loader (I have experience with this, but it was 3
years ago). It can be blazingly fast, especially if you have indexes
on the table in question, since it rebuilds the indexes at the end.
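
For flavour, a rough Python sketch of driving it (from memory and
untested, so treat it as a shape rather than a recipe - the table,
columns, paths and credentials are all invented): you write a control
file describing the flat file, then shell out to the sqlldr
command-line tool.

import subprocess

# Control file describing the flat file the jobs produced.
# Table name, columns and paths are invented for illustration.
ctl = """LOAD DATA
INFILE '/tmp/results.dat'
APPEND
INTO TABLE results
FIELDS TERMINATED BY ','
(job_id, result_value)
"""
with open("/tmp/load.ctl", "w") as f:
    f.write(ctl)

# direct=true asks for a direct-path load, which is where the big
# speed-up (and the table locking mentioned above) comes from.
subprocess.check_call(
    ["sqlldr", "userid=loader/secret",
     "control=/tmp/load.ctl", "log=/tmp/load.log",
     "direct=true"])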

SQL Server has BCP (and SQL Server 2005 has some nice features which
let you do something similar over a normal connection, even from
ADO.NET).
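
The bcp side is just a command-line call; again a sketch with invented
names (-c is character mode, -t sets the field terminator):

import subprocess

# Bulk-copy a comma-delimited file into an invented testdb.dbo.results
# table; the server name and credentials are invented too.
subprocess.check_call(
    ["bcp", "testdb.dbo.results", "in", "/tmp/results.dat",
     "-c", "-t,", "-S", "dbserver", "-U", "loader", "-P", "secret"])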

MySQL has LOAD DATA INFILE:
http://dev.mysql.com/doc/refman/5.0/en/load-data.html
I know nothing about it beyond the fact that it exists.
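
If it behaves like the others, the shape would be something like this
(a sketch using the MySQLdb Python driver - untested, table and file
names invented; note the INFILE path is opened by the server, which
fits the suggestion below of running the load on the database box):

import MySQLdb

# Connection details and table layout are invented for illustration.
conn = MySQLdb.connect(host="dbhost", user="loader",
                       passwd="secret", db="testdb")
cur = conn.cursor()
# The server opens '/tmp/results.dat' itself, so the file must live
# on (or be visible from) the database machine.
cur.execute(
    "LOAD DATA INFILE '/tmp/results.dat' INTO TABLE results "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
    "(job_id, result_value)")
conn.commit()
conn.close()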

Previous experience with these things suggests that, if you have more
than 10 megabytes of data, the speed-up from using the vendor-specific
tool can more than outweigh any amount of file creation and copying
required.

Do not expect it to work out of the box without some messing around.
I strongly suggest the first test be based on your Condor jobs
generating the various parts of the file in the required format, and
then having one job (probably not a Condor job) run, if possible, on
the same box as the database, with the file local to it, to dump it
into the database.
This is far and away the simplest option and stands a chance of being
the fastest (albeit with no 'partial' data being present as you go
along).
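
Concretely, the non-Condor loader job can be as dumb as this (a
sketch - the part-file naming is invented): glue the fragments
together and hand the result to whichever bulk tool applies.

import glob

# Each Condor job wrote one fragment, e.g. /data/parts/part.0042.csv,
# already in the exact field/line format the bulk loader expects.
with open("/tmp/results.dat", "w") as out:
    for part in sorted(glob.glob("/data/parts/part.*.csv")):
        with open(part) as f:
            out.write(f.read())
# /tmp/results.dat then goes to sqlldr, bcp or LOAD DATA INFILE in one
# bulk operation, as in the sketches above.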

Someone else in MySQL land may have *much* better advice about using
LOAD DATA INFILE.

Best of luck.

Matt