Re: [condor-users] Unexplained status=128
Griffith, Brent wrote:
> You raise the question of whether or not a user is already logged into the
> remote NT machine. Could that be the cause of 128 errors? My
> understanding is that NT can only handle one user at a time.

I think Windows 2000 and XP can run Condor jobs even while a user is
logged in. I haven't tested it with NT. I have had Condor using the
machines while a user was logged in. My main problem is that if the user
is logged in and running the main GUI, which shares many DLLs with the
worker executables, then I get error 128. It took me a while to track
that down, because it looked random: a machine that ran the job fine a
couple of hours earlier would suddenly fail with error 128. Maybe you
could check your nodes to see whether the problem happens when a DLL
that your worker executables share is in use by an executable running
at that time.

That is exactly what is bothering me, and the bad thing about it is the
randomness of it. It would be nice if there were a way to know whether a
user is logged into the machine. Even though it reduces the number of
nodes that I have for computation, resubmitting works, since the user
wouldn't even know that this happened; as far as he is concerned, the
job is completed. That is one of the good things I like about Condor.

> I have been struggling with similar 128 problems, but haven't been able to
> track it down. (I am passing many DLLs found by dumpbin and loadtest... )
> The most irritating thing is that my own submit machine shows the code 128
> behavior and I know it can run the jobs.
> My workaround has been to exclude
> execute nodes that show the problem. The problem with resubmitting jobs that
> exit with 128 is that the same nodes keep accepting jobs and running through
> them quickly because they don't actually compute.

I think there is a solution for that. I have seen an option to
configure the negotiator not to send the job back immediately to the
same node after a failure, but I can't remember which option it is.
Maybe the Condor folks will enlighten us on this.
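
The exclude-nodes workaround can be partly automated. The following is a minimal Python sketch, not a Condor feature: it assumes you keep your own records of (host, exit code, runtime in seconds) per job, flags hosts that repeatedly exit with 128 almost immediately, and builds a submit-file requirements line that avoids them. The record layout, the 60-second threshold, the two-failure cutoff, and the example host names are all assumptions made for illustration.

```python
from collections import Counter

FAST_FAIL_SECONDS = 60   # assumed cutoff for "did not actually compute"
MIN_FAILURES = 2         # assumed number of fast failures before excluding


def suspect_hosts(records, exit_code=128,
                  fast_fail_seconds=FAST_FAIL_SECONDS,
                  min_failures=MIN_FAILURES):
    """Return sorted host names that repeatedly exited with exit_code quickly.

    records is an iterable of (host, exit_code, runtime_seconds) tuples.
    This layout is hypothetical; fill it from whatever job logs you keep.
    """
    fast_failures = Counter(
        host for host, code, runtime in records
        if code == exit_code and runtime < fast_fail_seconds
    )
    return sorted(h for h, n in fast_failures.items() if n >= min_failures)


def exclusion_requirements(hosts):
    """Build a submit-file requirements line that avoids the given hosts."""
    if not hosts:
        return ""
    clauses = ['(Machine != "%s")' % h for h in hosts]
    return "requirements = " + " && ".join(clauses)


if __name__ == "__main__":
    # Made-up sample data: node1 fails fast twice, node3 only once.
    records = [
        ("node1.example.com", 128, 3),
        ("node1.example.com", 128, 2),
        ("node2.example.com", 0, 5400),
        ("node3.example.com", 128, 4),
    ]
    print(exclusion_requirements(suspect_hosts(records)))
```

Requiring more than one fast failure before excluding a host is deliberate: since the 128 errors come and go on the same machine, a single failure is weak evidence, and excluding too eagerly shrinks the pool for no benefit.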