
Re: [Condor-users] PROBLEMS WITH HYPER THREADING PART 2



Hi again. Each job takes 23 minutes per machine if I restrict it to VM1
only; if I allow VM1 or VM2, or set no constraint at all, it takes 40
minutes!
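Per-job time alone can be misleading here. On a Hyper-Threading Pentium 4, VM1 and VM2 are two logical CPUs sharing one physical core, so each job runs slower when both are busy, but two jobs finish at once. The sketch below compares throughput per machine. It assumes the 23- and 40-minute figures hold for every job and that every VM in use stays busy; the helper name `jobs_per_hour` is hypothetical.

```python
def jobs_per_hour(minutes_per_job: float, concurrent_jobs: int) -> float:
    """Jobs completed per hour on one machine, assuming every slot in use stays busy."""
    return concurrent_jobs * 60.0 / minutes_per_job

# Assumed figures from the message: 23 min with only VM1 used,
# 40 min per job when both VMs are running jobs at the same time.
vm1_only = jobs_per_hour(23, 1)  # one job at a time on the physical CPU
both_vms = jobs_per_hour(40, 2)  # two jobs share the CPU via Hyper-Threading

print(f"VM1 only: {vm1_only:.2f} jobs/hour")  # VM1 only: 2.61 jobs/hour
print(f"VM1+VM2:  {both_vms:.2f} jobs/hour")  # VM1+VM2:  3.00 jobs/hour
```

Under these assumptions, using both VMs finishes about 15% more jobs per hour, even though each individual job takes longer.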




>> Hi Ian, thank you so much! Yes, it makes sense, but I wasn't clear.
>> I have a cluster of 12 Pentium 4 HT 3 GHz machines. I submit my job
>> with "queue 40", and each job must perform 1,800 runs (it's a
>> genetic algorithm). It is not parallelized.
>
> Ahh! That clears things up, thanks.
>
>> When I submit my job without constraints it takes 2 hours; then I
>> restrict it to "just VM1" and it takes the same time.
>> On the machines I've set (is this config OK?):
>>
>> NUM_CPUS = 2
>> NUM_VIRTUAL_MACHINES_TYPE_1 = 2
>
> Are you defining custom virtual machine types? If not, you can
> comment out the line above (NUM_VIRTUAL_MACHINES_TYPE_1).
>
>> NUM_VIRTUAL_MACHINES = 2
>> VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE = 2
>> VIRTUAL_MACHINES_CONNECTED_TO_KEYBOARD = 2
>> COUNT_HYPERTHREAD_CPUS = TRUE
>
> It won't matter what you set COUNT_HYPERTHREAD_CPUS to, because you're
> overriding Condor's "take a guess at how many CPUs there are" mechanism
> when you explicitly set NUM_CPUS and NUM_VIRTUAL_MACHINES.
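The override Ian describes can be illustrated with a toy model. This is a simplified, hypothetical sketch of the precedence between the settings, not Condor's actual code: it assumes an explicit NUM_CPUS replaces auto-detection, COUNT_HYPERTHREAD_CPUS only affects auto-detection, and NUM_VIRTUAL_MACHINES defaults to NUM_CPUS. Custom VM types are ignored, and the function name `effective_slots` is made up.

```python
from typing import Optional

def effective_slots(physical_cpus: int,
                    hyperthreading: bool,
                    count_hyperthread_cpus: bool,
                    num_cpus: Optional[int] = None,
                    num_virtual_machines: Optional[int] = None) -> int:
    """Toy model (assumption): how many VMs/slots a machine would advertise."""
    if num_cpus is None:
        # Only during auto-detection do hyperthreaded logical CPUs matter.
        logical = physical_cpus * 2 if hyperthreading else physical_cpus
        num_cpus = logical if count_hyperthread_cpus else physical_cpus
    # Assumed default: one VM per CPU unless NUM_VIRTUAL_MACHINES is set.
    return num_virtual_machines if num_virtual_machines is not None else num_cpus

# The settings above: explicit NUM_CPUS/NUM_VIRTUAL_MACHINES win either way.
print(effective_slots(1, True, True, num_cpus=2, num_virtual_machines=2))   # 2
print(effective_slots(1, True, False, num_cpus=2, num_virtual_machines=2))  # 2
# Left to auto-detection, the hyperthread setting decides between 1 and 2.
print(effective_slots(1, True, False))  # 1
```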
>
>> Then I tried running with queue 40, but with each job doing 3 runs
>> (not 1,800), just to get results faster.
>
> Got it. I understand what you're doing now. A job lands on a machine
> and does 3 iterations of your algorithm in serial. Right?
>
>> In this scenario, if I submit the job without constraints, it takes
>> half the time it takes when I restrict it to just VM1, for example.
>
> This makes sense. If you have 12 machines, each with two slots, and you
> constrain your jobs to run only on slot 1, you'll run 12 instances of
> your algorithm in parallel. But if you omit the slot constraint, you'll
> run 24 instances of your algorithm in parallel, so your batch of jobs
> will finish roughly twice as fast (assuming all else is equal with
> respect to your jobs).
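The factor of two can be checked with a little arithmetic. This is a minimal sketch that assumes every job takes the same time and that jobs start in whole "waves" as slots free up; the helper name `batch_minutes` is hypothetical.

```python
import math

def batch_minutes(num_jobs: int, slots: int, minutes_per_job: float) -> float:
    """Wall-clock time for a batch, assuming equal-length jobs run in full waves."""
    waves = math.ceil(num_jobs / slots)
    return waves * minutes_per_job

JOBS = 40      # "queue 40" from the submit file
MACHINES = 12  # 12 machines, each with two slots

# "All else equal": same per-job time whether 12 or 24 slots are used.
one_slot = batch_minutes(JOBS, MACHINES * 1, minutes_per_job=1.0)
two_slots = batch_minutes(JOBS, MACHINES * 2, minutes_per_job=1.0)
print(one_slot, two_slots)  # 4.0 2.0
```

With 40 jobs, 12 slots need 4 waves and 24 slots need 2, which gives the factor of two. If per-job time grows when both hyperthreads are busy, the gain shrinks: with the 23- and 40-minute figures reported at the top of this message, the same model gives 92 versus 80 minutes.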
>
> But this is different from what you wrote in the first paragraph of
> this post. There you make it sound like, with or without a slot
> constraint, the same number of jobs, running the same number of
> algorithm iterations per job, takes the same amount of time.
> Is this the problem?
>
>> The executable is the same, so I don't know why it is behaving
>> like that.
>
> One possibility is that only half the slots on your machines are
> available to run jobs because the other half are in the Owner state.
> What does condor_status show for your pool?
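Ian's hypothesis can be tested by tallying slot states. This is a minimal sketch that assumes you have already collected one state string per slot, for example from condor_status output. Parsing that output is left out, and the helper name `usable_slots` is hypothetical.

```python
from collections import Counter
from typing import Iterable

def usable_slots(states: Iterable[str]) -> int:
    """Count slots not in the Owner state (Owner slots will not start jobs)."""
    counts = Counter(s.strip() for s in states if s.strip())
    return sum(n for state, n in counts.items() if state != "Owner")

# Hypothetical pool: 12 machines whose second VM is stuck in the Owner state.
example = ["Unclaimed", "Owner"] * 12
print(usable_slots(example))  # 12
```

If the count comes out at 12 rather than 24, the pool behaves like the VM1-only case no matter what the job requirements say.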
>
> - Ian
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>


Ing. Paula Martínez
ITU - Redes y Telecomunicaciones