Re: [Condor-users] HoldReason = "Streaming not supported"
- Date: Wed, 9 Jul 2008 13:33:45 -0700 (PDT)
- From: Sean Manning <seangwm@xxxxxxx>
- Subject: Re: [Condor-users] HoldReason = "Streaming not supported"
Jaime Frey wrote:
>On Jun 27, 2008, at 12:36 PM, Sean Manning wrote:
>
>> Jaime Frey wrote:
>>
>>> On Jun 23, 2008, at 6:50 PM, Sean Manning wrote:
>>>
>>>> I am working on a Web Services interface to submit jobs to our
>>>> Globus
>>>> grid. It uses the condor and birdbath Java packages. We can
>>>> successfully submit the attached JDL on the command line of a condor
>>>> head node (the metascheduler of our grid) and see it complete, but
>>>> when we submit it with the Java program from an external Condor
>>>> client
>>>> machine the job stays Idle then Halts with an error. Running the
>>>> condor daemons as root got rid of one error, but now we get another
>>>> one: HoldReason = "Streaming not supported". I can't find any
>>>> information about this error in the usergroup archives. Does anyone
>>>> here have an idea what could be causing this?
>>>
>>> For GT4 GRAM jobs, if StreamOut and StreamErr aren't explicitly set
>>> to
>>> False in the job ad, then Condor assumes you want stdout and stderr
>>> to
>>> be streamed, which isn't supported by Condor for GT4 GRAM jobs. This
>>> appears to be a bug, as the default behavior for other job types is
>>> no
>>> streaming.
>>>
>>> If you add the following two attributes to your job ads, it should
>>> eliminate the problem:
>>> StreamOut = False
>>> StreamErr = False
>>>
>>> Thanks and regards,
>>> Jaime Frey
>>> UW-Madison Condor Team
>>>
>>
>> Dear Jaime,
>>
>> Thanks for the reply.
>>
>> I made that change, but jobs are still hanging with HoldReason =
>> "Streaming not supported." I can submit the new file with
>> condor_submit from the grid metascheduler and see it appear on the
>> head
>> node of a worker cluster, when condor_config has SOAP enabled. The
>> output and error come back to the machine I submitted the job from
>> just
>> like they are supposed to. But when I submit the same JDL to the
>> grid
>> metascheduler using our Web Services code, the job always holds
>> after a
>> delay.
>>
>> <snip>
>>
>> In principle, if we can submit a job to the grid using condor_submit,
>> then the web services submission should work as well. I would be very
>> grateful if you have any further advice about what I am missing.
>>
>
>
>Can you look at the values of StreamOut and StreamErr in the classad
>of the held job in the schedd? I'm guessing they're either missing or
>set to the string "False". They need to set to False (no quotes). I'll
>bet your JobHelper class isn't handling these attributes correctly.
>
>Thanks and regards,
>Jaime Frey
>UW-Madison Condor Team
>
Dear Jaime,
Thanks again for your help. I think that part of my problem is
definitely related to how I am parsing the JDL file in my class
JobHelper.java.
I discovered that all the attributes of the JDL were being
interpreted as Strings, for reasons I will explain below. Early on, I
found that if I submitted a JDL to the Web Services interface
which had worked with command-line submission, I got a
java.text.ParseException from the parser here (lines 66 and 67 of
JobHelper.java):
    Ad jobad = new Ad(); // This is an org.glite.jdl.Ad. That's all I know about it.
    jobad.fromFile(file);
with messages like this:
    Unable to parse: Doesn't seem to be a valid Expression
        at org.glite.jdl.Ad.fromString(Ad.java:497)
        at org.glite.jdl.Ad.fromFile(Ad.java:433)
        at birdbath.JobHelper.getJobAttrFromJDL(JobHelper.java:67)
<snip>
To avoid these errors, I followed my predecessor in making some changes
to the JDL:
- Terminate every line with a semicolon.
- Wrap quotes around every value,
  e.g. foo = bar becomes foo = "bar"; and
  out = out.$(Cluster).$(Process) becomes out = "$(Cluster).$(Process)";
- Add a line InputSandbox = {*} where * is the full path to the JDL, a
  comma, and a full path to the executable.
- Change the variable StreamOutput to StreamOut, and StreamError to
  StreamErr.
I don't understand the need for many of these changes, but they
appeared to work. However, the quotes around every value cause it to
be interpreted as a STRING-ATTR not a BOOLEAN-ATTR or EXPRESSION-ATTR
or whatever. I'm having trouble debugging this because it uses various
classes in the condor package, and I don't know of any detailed
documentation for that package. Right now, JobHelper.java is checking
for variables with values like "TRUE" and "false" and treating them as
booleans when it creates a condor.ClassAdStructAttr to represent that
line. This code appears to work for booleans: I see the correct line
StreamOut = FALSE; in condor_q -l where I used to see StreamOut =
"False";
Status
=-=-=-=
Right now, all lines of the JDL are being interpreted as either
Booleans or Strings. I can get jobs to run, but not to complete. If I
manually change the owner of the spool/cluster1234.proc0.subproc0
folder and its contents from root to myself, the job runs on the grid
then goes into state C (completed?). The output never gets staged
over, and the job never terminates. Alternatively, the jobs fail to run
and halt with HoldReason = "Failed to get expiration time of proxy". If I
change the owner of only the contents of the
spool/cluster1234.proc0.subproc0 folder from root to myself, the job
halts with HoldReason = "Globus error: Staging error for RSL element
fileStageOut." I have noticed that, in either case, some attributes
like GridJobId, GlobusSubmitId, GridftpUrlBase, and WallClockCheckpoint
are being left UNDEFINED. When I submit by command line, they are
either absent (the last three) or set to a specific value (GlobusJobId
= "babargt4.phys.uvic.ca#12114125215125#5330.0" or similar).
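The ownership symptom described above can be checked before a job is released. What follows is only a minimal sketch, assuming Java 8 or later (for java.nio.file and streams, which is newer than the JVM this code was written against). SpoolOwnerCheck and notOwnedBy are hypothetical names, not part of the condor or birdbath packages. It lists every entry under a spool directory whose owner is not the expected user.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical diagnostic helper; not part of condor or birdbath. */
final class SpoolOwnerCheck {
    private SpoolOwnerCheck() {}

    /**
     * Returns every path under dir (dir itself included) whose owner name
     * differs from expectedOwner.
     */
    static List<Path> notOwnedBy(Path dir, String expectedOwner) throws IOException {
        // Files.walk holds open directory handles, so close the stream when done.
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(p -> {
                try {
                    return !Files.getOwner(p).getName().equals(expectedOwner);
                } catch (IOException e) {
                    // An entry whose owner cannot be read is reported as suspicious.
                    return true;
                }
            }).collect(Collectors.toList());
        }
    }
}
```

Calling notOwnedBy on the spool/cluster1234.proc0.subproc0 folder with my own user name should return an empty list once the ownership has been corrected. The owner name format is platform-dependent, so the comparison is an assumption that holds for plain Unix user names.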
I have attached my class JobHelper.java. Could you look at the
parser code in the getJobAttrFromJDL () method and tell me if I am
doing anything wrong?
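To make the question concrete, the type guess that the parser workaround relies on can be sketched in isolation. This is a rough sketch only, assuming the value has already had its surrounding quotes stripped. JdlValueTypes and inferAttrType are hypothetical names, and the returned strings are the type names from attrTypeString in JobHelper.java. It does not use the condor or org.glite classes, so it says nothing about how they behave.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the condor, birdbath or org.glite packages. */
final class JdlValueTypes {
    // Zero, or a digit 1-9 followed by zero or more digits, with optional whitespace.
    private static final Pattern NON_NEGATIVE_INT = Pattern.compile("\\s*(0|[1-9]\\d*)\\s*");
    // At least one $(name) macro somewhere in the value.
    private static final Pattern HAS_MACRO = Pattern.compile(".*\\$\\(.+\\).*");

    private JdlValueTypes() {}

    /**
     * Guesses an attribute type name from the text of an unquoted value.
     * ASSUMPTION: "true" and "false" are always booleans, never strings.
     */
    static String inferAttrType(String val) {
        String v = val.trim();
        if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) {
            return "BOOLEAN-ATTR";
        }
        if (NON_NEGATIVE_INT.matcher(v).matches()) {
            return "INTEGER-ATTR";
        }
        if (HAS_MACRO.matcher(v).matches()) {
            return "EXPRESSION-ATTR";
        }
        return "STRING-ATTR";
    }
}
```

With this ordering, a value such as False maps to BOOLEAN-ATTR and out.$(Cluster).$(Process) maps to EXPRESSION-ATTR, while anything unrecognised stays a STRING-ATTR. Guessing from the value alone is fragile, so letting the attribute name determine the type may be the safer design.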
Regards,
Sean Manning
/*
 * jobHelper.java
 *
 * Created on November 29, 2007, 2:23 PM
 *
 * To change this template, choose Tools | Template Manager
 * and open the template in the editor.
 */
package birdbath;

import condor.ClassAdStructAttr;
import condor.Status; // SM Not used
import condor.StatusCode;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;
import org.glite.jdl.JobAd;
import org.glite.jdl.*;
import condor.classad.*;
import java.util.Iterator;
import java.util.Vector;
import condor.ClassAdStructAttr; // SM Not used
import condor.ClassAdAttrType;

/**
 *
 * @author David Gong
 */
public class JobHelper {
    public static String[] attrTypeString = {"ERROR-ATTR", "EXPRESSION-ATTR", "BOOLEAN-ATTR", "INTEGER-ATTR",
        "FLOAT-ATTR", "STRING-ATTR", "ERROR-ATTR", "UNDEFINED-ATTR"};
    private ClassAdStructAttr[] jobAttr;
    private Vector<ClassAdStructAttr> jobAttrVect;
    private Expr owner;
    private Expr jobUniverse;
    private Expr command;
    private Expr arguments;
    private Expr requirements;
    private Expr inputSandbox;

    public JobHelper(){
    }

    /** Creates a new instance of jobHelper */
    public JobHelper(String file) throws Exception{
        getJobAttrFromJDL(file);
    }

    public ClassAdStructAttr[] getJobAttrFromJDL(String file)throws NoSuchFieldException, Exception
    {
        /* See pp. 177ff of the paper manual for details of Condor ClassAds */
        String[] attrTypeString = {"ERROR-ATTR", "EXPRESSION-ATTR", "BOOLEAN-ATTR", "INTEGER-ATTR",
            "FLOAT-ATTR", "STRING-ATTR", "ERROR-ATTR", "UNDEFINED-ATTR"};
        // ERROR AND UNDEFINED NEED TO BE CONFIRMED- DG
        Vector<ClassAdStructAttr> result = new Vector <ClassAdStructAttr> ();
        Ad jobad = new Ad(); // SM This is an org.glite.jdl.Ad. That's all I know about it.
        // Vector <ClassAdStructAttr> myResult = new Vector <ClassAdStructAttr> ();
        jobad.fromFile(file);
        owner = jobad.lookup("Owner");
        jobUniverse = jobad.lookup("JobUniverse");
        command = jobad.lookup("Executable");
        arguments = jobad.lookup("Arguments");
        requirements = jobad.lookup("Requirements");
        inputSandbox = jobad.lookup("InputSandbox");
        /*
        try{
            jobad.delAttribute("Owner");
            System.out.println("Owner is deleted");
            jobad.delAttribute("JobUniverse");
            System.out.println("JobUniverse is deleted");
            jobad.delAttribute("Executable");
            System.out.println("Executable is deleted");
            jobad.delAttribute("Arguments");
            System.out.println("Arguments is deleted");
            jobad.delAttribute("Requirements");
            System.out.println("Requirements is deleted");
            jobad.delAttribute("InputSandbox");
            System.out.println("InputSandbox is deleted");
        }
        catch(Exception err){;}
        */
        String test = "$(foo).$(bar)";
        if (!containsVariables (test)) {
            System.out.println ("Error in containsVariables ... choose a better regex");
        }
        Iterator it = jobad.attributes();
        AttrName temp = null;
        int iAttrType = 0;
        ClassAdStructAttr currentAttr = null;
        while (it.hasNext()) {
            temp = (AttrName) it.next(); // a condor.classad.AttrName from classad.jar
            iAttrType = jobad.getType(temp.rawString()); // Attribute type as int
            // SM Added this line
            System.out.print ("Type of \"" + temp.toString () + "\" is " + attrTypeString[iAttrType] + "; ");
            Expr tempV = jobad.lookup(temp.toString());
            String val = null;
            String raw = temp.rawString();
            /*
             * Get the "value" attribute of the attribute in one of two ways.
             */
            if (tempV instanceof ListExpr) { // SM What is a ListExpr?
                val = jobad.lookup (temp.rawString()).toString();
            }
            else {
                // java.lang.ArithmeticException: boolean false in string context thrown here
                // when I remove quotes around a boolean value in the JDL
                val = jobad.lookup(temp.rawString()).stringValue();
            }
            /*
             * Create the ClassAdStructAttr
             */
            /*
             * The parser wrongly reads many attributes of boolean type as strings.
             * This causes obvious difficulties; for example, StreamOut (which must
             * be explicitly set to FALSE for the code to work) cannot be set because
             * it takes a boolean value not a string.
             *
             * This clause corrects the problem BY ASSUMING THAT 'TRUE' AND 'FALSE'
             * ARE ALWAYS BOOLEAN VALUES, NEVER STRINGS. If you ever need to have
             * a string "true" or similar, you will need to change this ... perhaps
             * letting name determine type.
             */
            // TODO Change interpretation of floats and integers too?
            if (val.equalsIgnoreCase ("true") || val.equalsIgnoreCase("false")) {
                System.out.print ("changed type to BOOLEAN_ATTR; ");
                currentAttr = new ClassAdStructAttr (temp.rawString (), // Name
                    ClassAdAttrType.fromString("BOOLEAN-ATTR"), // Type
                    val); // Value
            }
            else if (isNonNegativeInt (val)) { // TODO Implement integer handling
                // Until then, keep the parser-reported type so that currentAttr
                // is never left null or stale from the previous attribute.
                currentAttr = new ClassAdStructAttr (temp.rawString (), // Name
                    ClassAdAttrType.fromString(attrTypeString[iAttrType]), // Type
                    val); // Value
            }
            else if (containsVariables (val)) { // TODO Choose whether to keep
                System.out.print("changed type to EXPRESSION_ATTR; ");
                currentAttr = new ClassAdStructAttr (temp.rawString (), // Name
                    ClassAdAttrType.fromString("EXPRESSION-ATTR"), // Type
                    val); // Value
            }
            else {
                currentAttr = new ClassAdStructAttr (temp.rawString(), // Name
                    ClassAdAttrType.fromString(attrTypeString[iAttrType]), // Type
                    val); // Value
            }
            // SM Added this line.
            System.out.println ("value is \"" + currentAttr.getValue () + "\"");
            result.add (currentAttr);
        }
        jobAttrVect = result;
        return jobAttr = (ClassAdStructAttr[]) result.toArray(new ClassAdStructAttr[0]);
    }

    public static boolean isNonNegativeInt (String s) {
        /* A non-negative integer consists of 0,
         * or one digit from 1-9 followed by zero or more digits,
         * with zero or more whitespace characters before and after it.
         */
        String intFormat = "\\s*[1-9]\\d*\\s*"; // \\d* so that single digits 1-9 match too
        String zero = "\\s*0\\s*";
        return s.matches (intFormat) || s.matches (zero);
    }

    public static boolean containsVariables (String s) {
        // A variable consists of $(*), where * is one or more characters
        // It has zero or more other characters before or after it
        String varFormat = ".*\\$\\(.+\\).*";
        return s.matches (varFormat);
    }

    public Vector<ClassAdStructAttr> getJobAttrVector(){
        return jobAttrVect;
    }

    public static ClassAdStructAttr createStringAttr(String attrName, String attrValue){
        return new ClassAdStructAttr(attrName, ClassAdAttrType.fromString(attrTypeString[5]), attrValue );
    }

    public Expr getOwner(){
        return owner;
    }

    public Expr getJobUniverse(){
        return jobUniverse;
    }

    public Expr getCommand(){
        return command;
    }

    public Expr getRequirements(){
        return requirements;
    }

    public Expr getArguments(){
        return arguments;
    }

    public Expr getInputSandboxExpr(){
        return inputSandbox;
    }

    public String[] getStageInFiles(){
        Vector<String> retVal = new Vector<String>();
        if (inputSandbox instanceof ListExpr){
            Iterator it = ((ListExpr) inputSandbox).iterator();
            Expr tmp = null;
            while (it.hasNext()){
                tmp = (Expr)it.next();
                retVal.add(tmp.stringValue());
            }
        }
        else{
            retVal.add(inputSandbox.toString());
        }
        return (String[]) retVal.toArray(new String[0]);
    }

    public Vector<String> getStageInFilesVector(){
        Vector<String> retVal = new Vector<String>();
        if (inputSandbox instanceof ListExpr) {
            Iterator it = ((ListExpr) inputSandbox).iterator();
            Expr tmp = null;
            while (it.hasNext()){
                tmp = (Expr)it.next();
                retVal.add(tmp.stringValue());
            }
        }
        else {
            retVal.add(inputSandbox.toString());
        }
        return retVal;
    }

    public void addProxy()throws Exception{
    }

    public static void main(String[] args) throws Exception{
        JobHelper my = new JobHelper("/hepuser/seangwm/workspace_ganymede/CondorWSProjectRon/src/supportfiles/testjdl-gt4");
        String[] files = my.getStageInFiles();
        return;
    }

    public ClassAdStructAttr[] getJobAttr(){
        return jobAttr;
    }
}