Re: [classad-users] [classad users] Windows path problem with Java package


Date: Sat, 19 Oct 2002 10:03:23 -0500 (CDT)
From: Marvin Solomon <solomon@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [classad-users] [classad users] Windows path problem with Java package
The key to understanding this example is the distinction between a classad
Expr, which is an internal data structure that may contain strings,
and the *external representation* of that classad as a string.  The methods
to convert between these two forms are Expr.toString() and
ClassAdParser.parse().  Internally, a Java String (and hence a classad Expr
containing a string) can contain arbitrary unicode characters.  But to display
the string in a form that can be read and parsed, certain transformations need
to be made in toString(), including:

    1.  Surrounding the value with quotes so you can see where it starts and
        ends.
    2.  Preceding any embedded quotes with a backslash so that they are not
        mistaken for the quote added in step 1.
    3.  Preceding each embedded backslash with a backslash so that it is not
        mistaken for one of the backslashes added in step 2.
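
As a rough illustration of steps 1-3, here is a minimal sketch in plain Java.
The class QuoteSketch and its method quote() are hypothetical names, not part
of the classad package, and the sketch handles only quotes and backslashes
(it makes no attempt to escape control characters such as newline or formfeed):

```java
// Hypothetical helper, NOT the classad library's toString(); it only
// illustrates steps 1-3 for embedded quotes and backslashes.
class QuoteSketch {
    static String quote(String value) {
        StringBuilder out = new StringBuilder();
        out.append('"');                  // step 1: opening quote
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\');         // steps 2 and 3: escape quote or backslash
            }
            out.append(c);
        }
        out.append('"');                  // step 1: closing quote
        return out.toString();
    }

    public static void main(String[] args) {
        // The internal value has two backslashes; the external form has four.
        System.out.println(quote("C:\\Dir\\file.exe"));
        System.out.println(quote("say \"hi\""));
    }
}
```

A single loop covers steps 2 and 3 because each character is examined exactly
once, so a backslash added by the escaping is never escaped again.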

Parsing is just the inverse of this process: The initial and final quotes
are discarded, and backslash-escapes are processed:  \n becomes a newline,
\" becomes a double-quote character that does not terminate the string, \\ becomes
a single backslash, etc.  Note that the Java compiler does the same parsing
when it encounters a string constant in the program.
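
The inverse direction can be sketched the same way.  The class UnquoteSketch
below is a hypothetical stand-in, not the real ClassAdParser, and it knows
only a handful of escapes.  Its default branch, which drops the backslash of
an unrecognized escape, is an assumption modeled on the forgiving behavior
this message describes for the classad parser:

```java
// Hypothetical helper, NOT the classad library's parser.
class UnquoteSketch {
    static String unquote(String literal) {
        // Discard the surrounding quotes (assumes the input has both).
        String body = literal.substring(1, literal.length() - 1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\' || i + 1 == body.length()) {
                out.append(c);            // ordinary character
                continue;
            }
            char next = body.charAt(++i); // character after the backslash
            switch (next) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'f':  out.append('\f'); break;
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                default:   out.append(next); // unknown escape: drop the backslash
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // External text with single backslashes: \D is unknown, \f is a formfeed.
        String parsed = unquote("\"C:\\Dir\\file.exe\"");
        System.out.println(parsed.indexOf('\\'));  // -1: no backslash survives
        System.out.println(parsed.indexOf('\f'));  // 5: formfeed after "C:Dir"
    }
}
```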

Now consider your example, step by step:

>   String attrValue = "C:\\Dir\\file.exe" ;

The value of attrValue is now a string containing two backslashes.

>   String classAd =  "[ " + attr1 + "= \"" + attrValue +"\"]"  ;

The value of classAd is now also a String containing just two backslashes:

    [ First= "C:\Dir\file.exe"]

>   ClassAdParser cp = new   ClassAdParser(  classAd ) ;
>   RecordExpr re   = (RecordExpr) cp.parse() ;

In the process of parsing this classad, the ClassAdParser encounters

    "C:\Dir\file.exe"

and parses it.  The escape \f turns into a formfeed character: '\u000C'.
The escape \D is not a valid escape sequence.  The parser copes with this
by simply ignoring the backslash.  (The Java compiler, by contrast, reports
a compile-time error for an invalid escape.)  The resulting value of re is
an Expr containing an attribute First whose value is a string containing a
formfeed and no backslashes.

>   re.insertAttribute   ( attr2 , Constant.getInstance(attrValue)   ) ;

This adds an attribute to re with name Second.  The value of the attribute
is the same as the value of your variable attrValue.  As we saw above, it
has two backslashes in it.

>   System.out.println( "***   ClassAd:\n" + re ) ;

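To convert re to a string, steps 1-3 are applied to each of the
attributes, and the formfeed in First is written back out as the escape \f
(it is not a backslash that survived parsing).  The result is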

    [ First = "C:Dir\file.exe"; Second = "C:\\Dir\\file.exe" ]

Note that if you supplied this to a Java compiler:

    String First = "C:Dir\file.exe"; String Second = "C:\\Dir\\file.exe";

the value of Second would contain two backslashes (not four) and the
value of First would contain a formfeed.
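
If you want to check this for yourself, a small standalone program along
these lines should do it.  The class name LiteralDemo and the helper count()
are just names chosen for this sketch:

```java
// Counts characters in the two literals to show what the compiler produced.
class LiteralDemo {
    static int count(String s, char wanted) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == wanted) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String first = "C:Dir\file.exe";      // \f is compiled to a formfeed
        String second = "C:\\Dir\\file.exe";  // each \\ is compiled to one backslash
        System.out.println("backslashes in second: " + count(second, '\\')); // 2
        System.out.println("backslashes in first:  " + count(first, '\\'));  // 0
        System.out.println("formfeeds in first:    " + count(first, '\f'));  // 1
    }
}
```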

>   Constant co1 = (Constant) re.lookup( attr1 ) ;
>   System.out.println(  "***   Retrieve 1:\n" +  co1.stringValue()    ) ;
>   Constant co2 = (Constant) re.lookup( attr2 ) ;
>   System.out.println(  "***   Retrieve 2:\n" +  co2.stringValue()    ) ;

The result contains a non-printing character.  If you run it through "cat -v",
you will see

    ***   Retrieve 1: C:Dir^Lile.exe
    ***   Retrieve 2: C:\Dir\file.exe

The expression co1.stringValue() has type String, so the "+" operator
simply concatenates the strings, and System.out.println prints the result,
with no escape processing.  The formfeed is simply sent to your display,
which treats it like a linefeed:  It goes to the next line without returning
to the left margin.  cat -v displays it as ^L.
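
If cat -v is not at hand (on Windows, say), a few lines of Java can make the
control characters visible in roughly the same caret notation.  The class
VisibleSketch is a hypothetical helper written for this message; it covers
only ASCII control characters and does not reproduce the M- notation that
cat -v uses for non-ASCII bytes:

```java
// Hypothetical debugging helper: shows control characters in caret notation.
class VisibleSketch {
    static String visible(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\t') {
                out.append(c);                           // left alone, as cat -v does
            } else if (c < 0x20) {
                out.append('^').append((char) (c + 64)); // formfeed 0x0C becomes ^L
            } else if (c == 0x7F) {
                out.append("^?");                        // DEL
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(visible("C:Dir\file.exe"));     // C:Dir^Lile.exe
        System.out.println(visible("C:\\Dir\\file.exe"));  // C:\Dir\file.exe
    }
}
```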

To summarize, there is no bug here except perhaps for the classad parser
being more forgiving than it should be.

The simple rule is that backslashes in quoted string literals (whether in
your Java program or in strings to be parsed as classads) need to be doubled.
Internal strings (whether the value of a Java String variable or a component
of an Expr) can contain arbitrary characters.
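
Applied to your test program, the rule means the value has to be escaped
before it is pasted into the text handed to the parser.  Here is one way that
might look.  The class ClassAdTextSketch and the method recordText() are
hypothetical names for this sketch, and it escapes only backslashes and
quotes:

```java
// Hypothetical helper: builds classad text from an internal Java string.
class ClassAdTextSketch {
    static String recordText(String name, String value) {
        // Double the backslashes first, then escape the quotes, so that the
        // backslashes added for the quotes are not doubled again.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "[ " + name + " = \"" + escaped + "\" ]";
    }

    public static void main(String[] args) {
        String attrValue = "C:\\Dir\\file.exe";  // internal value: two backslashes
        // Prints text with four backslashes, ready to be parsed as a classad:
        // [ First = "C:\\Dir\\file.exe" ]
        System.out.println(recordText("First", attrValue));
    }
}
```

The resulting string is what you would pass to the ClassAdParser constructor
in place of the hand-built classAd string in your program.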

Marvin Solomon                  Professor
Computer Sciences Department    University of Wisconsin
1210 W. Dayton St.              Madison WI, 53706-1685, USA
(608) 263-2844                  solomon@xxxxxxxxxxx 
http://www.cs.wisc.edu/~solomon



Alessandro Maraschini wrote:
> 
> Hi all,
> 
> I'm working with the Java classad.jar package and I found some problems while using the Windows backslash separator. Can you please help me?
> 
> Here follows a simple test program:
> 
>     String attr1 = "First" ;
>     String attr2 = "Second" ;
>     String attrValue = "C:\\Dir\\file.exe" ;
> 
>     // Creating a RecordExpr from parser
>     String classAd =  "[ " + attr1 + "= \"" + attrValue +"\"]"  ;
>     ClassAdParser cp = new   ClassAdParser(  classAd ) ;
>     System.out.println( "***   Parse:\n"  + classAd  ) ;
>     RecordExpr re   = (RecordExpr) cp.parse() ;
> 
>     //  insertAttribute
>     System.out.println( "***   Insert:\n"  + attrValue ) ;
>     re.insertAttribute   ( attr2 , Constant.getInstance(attrValue)   ) ;
> 
>     // Print ClassAd
>     System.out.println( "***   ClassAd:\n" + re ) ;
> 
>     // retrieve info
>     Constant co1 = (Constant) re.lookup( attr1 ) ;
>     System.out.println(  "***   Retrieve 1:\n" +  co1.stringValue()    ) ;
>     Constant co2 = (Constant) re.lookup( attr2 ) ;
>     System.out.println(  "***   Retrieve 2:\n" +  co2.stringValue()    ) ;
> 
> and here's the standard output:
> 
> ***   Parse:
> [ First= "C:\Dir\file.exe"]
> ***   Insert:
> C:\Dir\file.exe
> ***   ClassAd:
> [ First = "C:Dir\file.exe"; Second = "C:\\Dir\\file.exe" ]
> ***   Retrieve 1:
> C:Dir
>      ile.exe
> ***   Retrieve 2:
> C:\Dir\file.exe
> 
> There are two different problems:
> 
> 1) If I use the ClassAdParser to parse a String and create a RecordExpr, I lose some of the backslashes; moreover, when I retrieve this value, the surviving backslashes are interpreted as Java special characters such as '\n', '\f', '\t' and so on.
> 
> 2) If I use insertAttribute, it works well when I get the value back, but when I print the created RecordExpr with the toString() method it prints two backslashes.
> 
> I guess maybe the two problems are connected someway...
> 
> thanks,
> 
> Alessandro
> 
> --
> =============================================================
> Alessandro Maraschini
> GRID R&D Group
> Defence, Space & Environment Division
> DATAMAT S.p.A.
> Via Laurentina, 760 -I- 00143 Rome - Italy
> http://www.datamat.it
> mailto:alessandro.maraschini@xxxxxxxxxx
> Phone: +39 06 5027 4501 (direct) +39 06 5027 4570 (secretary)
> Fax: +39 06 5027 4500
> =============================================================
> 
>
Condor Classads Info:
http://www.cs.wisc.edu/condor/classad/



