 /*
  * Copyright (c) 2004, 2015, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 only, as
  * published by the Free Software Foundation.
  *
  * This code is distributed in the hope that it will be useful, but WITHOUT
  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
  * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
  * version 2 for more details (a copy is included in the LICENSE file that
  * accompanied this code).
  *
  * You should have received a copy of the GNU General Public License version
  * 2 along with this work; if not, write to the Free Software Foundation,
  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
  *
  * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
  * or visit www.oracle.com if you need additional information or have any
  * questions.
  */

 /**
  * @test
  * @bug 5033550
  * @summary  JDWP back end uses modified UTF-8
  * @author jjh
  *
  * @run build TestScaffold VMConnection TargetListener TargetAdapter
  * @run compile -g UTF8Test.java
  * @run driver UTF8Test
  */

 /*
   There is UTF-8 and there is modified UTF-8, which I will call M-UTF-8.
   The two differ in the representation of binary 0, and
   in some other more esoteric representations.
   See
       http://java.sun.com/developer/technicalArticles/Intl/Supplementary/#Modified_UTF-8
       http://java.sun.com/javase/6/docs/technotes/guides/jni/spec/types.html#wp16542

   All the following are observations of the treatment
   of binary 0.  In UTF-8, this is represented as one byte:
       0x00

   while in modified UTF-8, it is represented as two bytes
       0xc0 0x80

   ** I haven't investigated if the other differences between UTF-8 and
      M-UTF-8 are handled in the same way.

  Here is how these are handled in our back end (BE), JDWP, and front end (FE):

  - Strings in .class files are M-UTF-8.

  - To get the value of a string object from the VM, our BE calls
       char * utf = JNI_FUNC_PTR(env,GetStringUTFChars)(env, string, NULL);
    which returns M-UTF-8.

 - To create a string object in the VM, our BE VirtualMachine.createString() calls
       string = JNI_FUNC_PTR(env,NewStringUTF)(env, cstring);
       This function expects the string to be M-UTF-8
       BUG:  If the string came from JDWP, then it is actually UTF-8

 - I haven't investigated strings in JVMTI.

 - The JDWP spec says that strings are UTF-8.  The intro
   says this for all strings, and the createString command and
   the StringReference.value command say it explicitly.

 - Our FE java writes strings to JDWP as UTF-8.

 - BE function outStream_writeString uses strlen meaning
   it expects no 0 bytes, meaning that it expects M-UTF-8
   This function writes the byte length and then calls
   outStream.c::writeBytes which just writes the bytes to JDWP as is.

   BUG: If such a string came from the VM via JNI, it is actually
        M-UTF-8
   FIX:  - scan the string to see if it contains an M-UTF-8 char.
           if yes,
              - call String(bytes, 0, len, "UTF8")
                to get a java string.  Will this work, i.e., when the
                input is M-UTF-8 instead of real UTF-8?
              - call some java method (NOT JNI which
                would just come back with M-UTF-8)
                on the String to get real UTF-8


 - The JDWP StringReference.value command reads a string
   from the BE out of the JDWP stream and does this to
   create a Java String for it (see PacketStream.readString):
          String readString() {
           String ret;
           int len = readInt();

           try {
               ret = new String(pkt.data, inCursor, len, "UTF8");
           } catch(java.io.UnsupportedEncodingException e) {

   This String ctor converts _both_ the M-UTF-8 0xc0 0x80
   and UTF-8 0x00  into a Java char containing 0x0000

   Does it do this for the other differences too?

 Summary:
 1.  JDWP says strings are UTF-8.
     We interpret this to mean standard UTF-8.

 2.  JVMTI will be changed to match JNI saying that strings
     are M-UTF-8.

 3.  The BE gets UTF-8 strings off JDWP and must convert them to
     M-UTF-8 before giving it to JVMTI or JNI.

 4.  The BE gets M-UTF-8 strings from JNI and JVMTI and
     must convert them to UTF-8 when writing to JDWP.


  Here is how the supplementals are represented in java Strings.
  This is from the java.lang.Character doc:
     The Java 2 platform uses the UTF-16 representation in char arrays and
     in the String and StringBuffer classes. In this representation,
     supplementary characters are represented as a pair of char values,
     the first from the high-surrogates range, (\uD800-\uDBFF), the second
     from the low-surrogates range (\uDC00-\uDFFF).
   See utf8.txt


 ----

 NSK Packet.java in the nsk/share/jdwp framework does this to write
 a string to JDWP:
  public void addString(String value) {
         final int count = JDWP.TypeSize.INT + value.length();
         addInt(value.length());
         try {
             addBytes(value.getBytes("UTF-8"), 0, value.length());
         } catch (UnsupportedEncodingException e) {
             throw new Failure("Unsupported UTF-8 ecnoding while adding string value to JDWP packet:\n\t"
                                 + e);
         }
     }
  ?? Does this get the standard UTF-8?  I would expect so.

 and the readString method does this:
         for (int i = 0; i < len; i++)
             s[i] = getByte();

         try {
             return new String(s, "UTF-8");
         } catch (UnsupportedEncodingException e) {
             throw new Failure("Unsupported UTF-8 ecnoding while extracting string value from JDWP packet:\n\t"
                                 + e);
         }
 Thus, this won't notice the modified UTF-8 coming in from JDWP.


 */
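
 /*
   A minimal sketch of the byte-level difference described in the notes above.
   It is not part of the test.  The class name Utf8VsModifiedUtf8 and its
   helper methods are hypothetical.  It assumes that DataOutputStream.writeUTF
   is an acceptable stand-in for the M-UTF-8 that JNI produces, since both
   are documented as modified UTF-8.
  */

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8VsModifiedUtf8 {
    /** Standard UTF-8 bytes of s. */
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Modified UTF-8 bytes of s, as written by DataOutputStream.writeUTF. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        // writeUTF prefixes the data with a 2-byte length; drop it.
        return Arrays.copyOfRange(all, 2, all.length);
    }

    /** Lower-case hex dump of data, one space between bytes. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        String firstSupplementary = "\ud800\udc00"; // U+10000

        System.out.println(hex(standardUtf8(nul)));                // 00
        System.out.println(hex(modifiedUtf8(nul)));                // c0 80
        System.out.println(hex(standardUtf8(firstSupplementary))); // f0 90 80 80
        System.out.println(hex(modifiedUtf8(firstSupplementary))); // ed a0 80 ed b0 80
    }
}
```

 /*
   The supplementary case shows one of the "more esoteric" differences:
   standard UTF-8 uses a single 4-byte sequence, while modified UTF-8
   encodes each surrogate separately as 3 bytes.
  */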

 import com.sun.jdi.*;
 import com.sun.jdi.event.*;
 import com.sun.jdi.request.*;
 import java.io.UnsupportedEncodingException;
 import java.util.*;

     /********** target program **********/

 /*
  * The debuggee has a few Strings the debugger reads via JDI
  */
 class UTF8Targ {
     static String[] vals = new String[] {"xx\u0000yy",           // standard UTF-8 0
                                          "xx\ud800\udc00yy",     // first supplementary
                                          "xx\udbff\udfffyy"      // last supplementary
                                          // d800 = 1101 1000 0000 0000   dc00 = 1101 1100 0000 0000
                                          // dbff = 1101 1011 1111 1111   dfff = 1101 1111 1111 1111
     };

     static String aField;

     public static void main(String[] args){
         System.out.println("Howdy!");
         gus();
         System.out.println("Goodbye from UTF8Targ!");
     }
     static void gus() {
     }
 }

     /********** test program **********/

 public class UTF8Test extends TestScaffold {
     ClassType targetClass;
     ThreadReference mainThread;
     Field targetField;
     UTF8Test (String args[]) {
         super(args);
     }

     public static void main(String[] args) throws Exception {
         new UTF8Test(args).startTests();
     }

     /********** test core **********/

     protected void runTests() throws Exception {
         /*
          * Get to the top of main()
          * to determine targetClass and mainThread
          */
         BreakpointEvent bpe = startToMain("UTF8Targ");
         targetClass = (ClassType)bpe.location().declaringType();
         targetField = targetClass.fieldByName("aField");

         ArrayReference targetVals = (ArrayReference)targetClass.getValue(targetClass.fieldByName("vals"));

         /* For each string in the debuggee's 'vals' array, verify that we can
          * read that value via JDI.
          */

         for (int ii = 0; ii < UTF8Targ.vals.length; ii++) {
             StringReference val = (StringReference)targetVals.getValue(ii);
             String valStr = val.value();

             /*
              * Verify that we can read a value correctly.
              * We read it via JDI, and access it directly from the static
              * var in the debuggee class.
              */
             if (!valStr.equals(UTF8Targ.vals[ii]) ||
                 valStr.length() != UTF8Targ.vals[ii].length()) {
                 failure("     FAILED: Expected /" + printIt(UTF8Targ.vals[ii]) +
                         "/, but got /" + printIt(valStr) + "/, length = " + valStr.length());
             }
         }

         /* Test 'all' unicode chars - send them to the debuggee via JDI
          * and then read them back.
          */
         doFancyVersion();

         resumeTo("UTF8Targ", "gus", "()V");
         try {
             Thread.sleep(1000);
         } catch (InterruptedException ee) {
         }


         /*
          * resume the target listening for events
          */

         listenUntilVMDisconnect();

         /*
          * deal with results of test
          * if anything has called failure("foo") testFailed will be true
          */
         if (!testFailed) {
             println("UTF8Test: passed");
         } else {
             throw new Exception("UTF8Test: failed");
         }
     }

     /**
      * For each unicode value, send a string containing
      * it to the debuggee via JDI, read it back via JDI, and see if
      * we get the same value.
      */
     void doFancyVersion() throws Exception {
         // This does 4 chars at a time just to save time.
         for (int ii = Character.MIN_CODE_POINT;
              ii < Character.MIN_SUPPLEMENTARY_CODE_POINT;
              ii += 4) {
             // Skip the surrogates: jump to the last block of the surrogate
             // range so that the loop's ii += 4 lands on U+E000.
             if (ii == Character.MIN_SURROGATE) {
                 ii = Character.MAX_SURROGATE - 3;
                 continue;
             }
             doFancyTest(ii, ii + 1, ii + 2, ii + 3);
         }

         // Do the supplemental chars.
         for (int ii = Character.MIN_SUPPLEMENTARY_CODE_POINT;
              ii <= Character.MAX_CODE_POINT;
              ii += 2000) {
             // Too many of these so just do a few
             doFancyTest(ii, ii + 1, ii + 2, ii + 3);
         }

     }

     void doFancyTest(int ... args) throws Exception {
         String ss = new String(args, 0, 4);
         targetClass.setValue(targetField, vm().mirrorOf(ss));

         StringReference returnedVal = (StringReference)targetClass.getValue(targetField);
         String returnedStr = returnedVal.value();

         if (!ss.equals(returnedStr)) {
             failure("Set: FAILED: Expected /" + printIt(ss) +
                     "/, but got /" + printIt(returnedStr) + "/, length = " + returnedStr.length());
         }
     }

     /**
      * Return a String containing the hexadecimal value of
      * each char in a String, separated by spaces.
      */
     String printIt(String arg) {
         StringBuilder bb = new StringBuilder(arg.length() * 5);
         for (int ii = 0; ii < arg.length(); ii++) {
             int ccc = arg.charAt(ii);
             bb.append(String.format("%1$04x ", ccc));
         }
         return bb.toString();
     }

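     /**
      * Return a String containing the hexadecimal value of
      * each byte of the standard UTF-8 encoding of a String.
      */
     String printIt1(String arg) {
         // StandardCharsets.UTF_8 is always available, so there is no
         // UnsupportedEncodingException to swallow and no null array.
         byte[] barray = arg.getBytes(java.nio.charset.StandardCharsets.UTF_8);
         StringBuilder bb = new StringBuilder(barray.length * 3);
         for (int ii = 0; ii < barray.length; ii++) {
             bb.append(String.format("%1$02x ", barray[ii]));
         }
         return bb.toString();
     }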

 }
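
 /*
  * A standalone sketch, not part of the test, of the code point arithmetic
  * that doFancyVersion and doFancyTest rely on.  The class name
  * CodePointBlocks and its methods are hypothetical.  It assumes the intent
  * of the BMP loop is to visit every 4-code-point block except those in the
  * surrogate range, and it shows that a 4-code-point String is 4 chars long
  * in the BMP but 8 chars long for supplementary characters.
  */

```java
final class CodePointBlocks {
    /** Counts the 4-code-point blocks in the BMP, skipping the surrogates. */
    static int countBmpBlocks() {
        int blocks = 0;
        for (int cp = Character.MIN_CODE_POINT;
             cp < Character.MIN_SUPPLEMENTARY_CODE_POINT;
             cp += 4) {
            if (cp == Character.MIN_SURROGATE) {
                // Jump to the last surrogate block; cp += 4 then yields U+E000.
                cp = Character.MAX_SURROGATE - 3;
                continue;
            }
            blocks++;
        }
        return blocks;
    }

    /** Builds a String from four consecutive code points starting at first. */
    static String blockString(int first) {
        int[] codePoints = {first, first + 1, first + 2, first + 3};
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        System.out.println(countBmpBlocks());                 // 15872
        System.out.println(blockString(0x41));                // ABCD
        String supplementary = blockString(Character.MIN_SUPPLEMENTARY_CODE_POINT);
        System.out.println(supplementary.length());           // 8
        System.out.println(supplementary.codePointCount(0, supplementary.length())); // 4
    }
}
```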