CoSTARS Note #7:
Strings

1.  Strings

We now know the 8 primitive types in Java (int, double, boolean, char, byte, short, long, float).  We have already seen yet another type of value -- a string, or a group of characters contained in double-quotes.  For example, consider this line from our first program:
      System.out.println("Hello World");
As you know, this prints the string "Hello World".  So you have already used strings!  Here, we will learn how to store strings in variables and how to perform various operations on those strings.

To begin with, Java has a variable type called "String".  Notice that the type is capitalized, unlike the other types we have seen so far.  This signifies an important difference:  String is a class and string values are instances or objects of this class, whereas primitive types like int or boolean are not classes and their values are not objects or instances.  In the terminology introduced in the previous note, String is a reference type and not a primitive type.  Let's see some effects of this difference.

Declaring and Initializing Strings

Strings can be declared and initialized much like primitive types:
      String s1;
      String s2 = "abc";
This declares two string variables, s1 and s2, and initializes s2 so it holds the value "abc" (which is a String).  As with primitive types, you may not use a string variable before it has been declared and initialized.

Calling Instance Methods

Consider this code:
      String s1 = "abc";
      String s2 = "de";
      System.out.println(s1.length());  // prints 3
      System.out.println(s2.length());  // prints 2
This prints out the values 3 and 2, which are the lengths of the strings "abc" and "de".  We are particularly interested in this expression:
      s1.length()
This is where we find the length of the string that variable "s1" refers to.  We do so by calling the method "length".  We have called methods before -- consider all the methods in the Math class.  If this method worked in the same manner, though, we would call it with the class name, like this:
      String.length()  // wrong!
This will not work!  Why not?  Well, which string's length should the method return -- s1, s2, some other string?  See, when you call Math.sqrt(5), there is only one possible square root of 5, and this is returned to you.  With strings, this is not the case:  the behavior of string methods depends on the actual strings involved.  For this reason, we provide the string itself, and not the String class, when calling string methods, again like this:
      String s1 = "abc";
      String s2 = "de";
      System.out.println(s1.length());  // prints 3
      System.out.println(s2.length());  // prints 2

Some vocabulary:

Instance Methods (or Member Methods, or Object Methods):  Methods that require an object.  To call these, preface the method name with the object reference.  This is usually stored in a variable, like this:
      String s = "abc";
      System.out.println(s.length());  // prints 3
But the variable is not required, and we can invoke methods on any object reference, as shown here:
      System.out.println("de".length());  // prints 2 (without a variable)

Static Methods (or Class Methods):  Methods that do not use an object.  To call these, preface the method name with the class name, as in Math.sqrt(5).
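
To see the two kinds of calls side by side, here is a small sketch.  The class name CallKinds and the helper method describe are made up for this illustration and are not part of Java.  Inside describe, the call s.length() needs a particular string, while Math.max needs only the class name.

```java
// A minimal sketch contrasting instance and static method calls.
// CallKinds and describe() are hypothetical names used only for this example.
class CallKinds {
    // Builds a short report about one string.
    static String describe(String s) {
        // Instance call: length() is called on the object that s refers to.
        return "'" + s + "' has length " + s.length();
    }

    public static void main(String[] args) {
        String s1 = "abc";
        System.out.println(describe(s1));    // prints 'abc' has length 3
        System.out.println(describe("de"));  // prints 'de' has length 2
        // Static call: no object is involved, so we use the class name.
        System.out.println(Math.max(3, 7));  // prints 7
    }
}
```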

Primitives and Methods

As their name implies, instance methods require an instance of some class (that is, an object, like a string).  Primitive values like int, boolean, double, and char are not instances of classes.  And so we see that primitive values cannot be used to call instance methods.  For example, there is no method "foo" that works like this:
      int x = 3;
      x.foo();   // impossible!
This is impossible because "x" holds an int, which is a primitive value, and so we cannot ever call any instance methods on it!

Of course, we can (and often do) use methods with primitive values.  Consider the Math methods we recently covered.  We can even use primitive values with instance methods -- consider page.fillRect, for example, which takes 4 integer values (the left, top, width, and height of the rectangle).  The distinction is that primitive values can only be used as parameters to methods, and never as the objects on which methods are called.

The "null" Value

All reference types, strings included, include a special value called "null".  This value is stored in variables that can hold a string (or some other reference type), but at the moment do not hold any particular string.  As a variable must hold some value, we store the value null in the variable for the time being.

We can explicitly assign the value null into a string, and we can test whether or not a string is assigned the null value:
      String s1 = "abc";
      String s2 = null;
      System.out.println(s1 == null);  // prints false

      System.out.println(s2 == null);  // prints true

What is the length of the null string?  That is, what happens if we call s2.length() when s2 is assigned the value null?  Let's see:
      String s1 = "abc";
      String s2 = null;
      System.out.println(s1.length());  // prints 3

      System.out.println(s2.length());  // throws a NullPointerException!
Compile and run this code.  As you can see, it compiles and runs, but when it tries to take the length of the null string, the program prints an error message and exits!  Read the error message carefully.  It says "NullPointerException".  This is Java's way of telling you that it cannot find the length of a null string.  So a null string does not have a length of zero (as does the empty string, "").  Instead, the null string has no length at all, and you may not even call the length method on it.  More generally:

You cannot call any methods on the null value -- doing so will always generate a NullPointerException.

Finally, note that the null value is only assignable to reference types, like strings, and not to primitive types.  So this fails:
      int x = null; // will not work (cannot assign null to the primitive type "int")
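
One common way to avoid a NullPointerException is to test for null before calling a method.  The sketch below assumes a made-up helper named safeLength in a made-up class NullCheck; returning -1 for null is an arbitrary choice for this example, not a Java convention.

```java
// A minimal sketch of guarding against null before calling an instance method.
// NullCheck and safeLength() are hypothetical names used only for this example.
class NullCheck {
    // Returns the length of s, or -1 if s is null (-1 is an arbitrary marker).
    static int safeLength(String s) {
        if (s == null) {
            return -1;        // we never call a method on null
        }
        return s.length();    // safe: s refers to an actual string here
    }

    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = null;
        String s3 = "";
        System.out.println(safeLength(s1)); // prints 3
        System.out.println(safeLength(s2)); // prints -1 (no exception)
        System.out.println(safeLength(s3)); // prints 0 (the empty string is not null)
    }
}
```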

2.  Scanning and Printing Strings

Scanner.next()

As with other types, we use the scanner to read in strings from the user.  From the pattern of nextInt, nextDouble, and so forth, we might expect to read in a string like this:
      String s = scanner.nextString(); // will not work (no such method!)
This will not work!  There is no such method as scanner.nextString.  Instead, we use scanner.next, like this:
      String s = scanner.next();
      System.out.println(s.length());
This short program reads in a string from the user and prints out its length.  Compile and run this program.  Enter "hello" (without the quotes) and watch it print out the value 5.  Run it again, and enter "go team" (again, without the quotes).  The program prints out the value 2.  What's going on?

The scanner.next() method returns the next word that the user entered, where words are separated by spaces (or tabs or newlines or other similar characters, which are collectively called whitespace).  The scanner.next() method skips any whitespace before the word and never includes whitespace in the string it returns.  So consider this program:
      String s1 = scanner.next();
      String s2 = scanner.next();
      String s3 = scanner.next();
      System.out.println(s1);
      System.out.println(s2);
      System.out.println(s3);
Compile and run it, entering "   a   bc   d   " (all on one line, without the quotes, but with the spaces).  As you can see, the program skips the spaces and reads in the strings "a", "bc", and "d".
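
If you would rather not retype the input each time, a Scanner can also read from a string instead of the keyboard.  The sketch below does this so the result is repeatable.  The class name NextDemo and the helper threeWords are made up for this example, and the square brackets are added only to show where each word begins and ends.

```java
import java.util.Scanner;

// A minimal sketch of scanner.next() skipping whitespace.
// NextDemo and threeWords() are hypothetical names used only for this example.
class NextDemo {
    // Reads three whitespace-separated words from the given text.
    static String threeWords(String input) {
        Scanner scanner = new Scanner(input);  // scan the string, not the keyboard
        String s1 = scanner.next();
        String s2 = scanner.next();
        String s3 = scanner.next();
        // The brackets are markers only; they are not part of the words.
        return "[" + s1 + "][" + s2 + "][" + s3 + "]";
    }

    public static void main(String[] args) {
        System.out.println(threeWords("   a   bc   d   ")); // prints [a][bc][d]
    }
}
```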

Printing Delimiters

Actually, when we print strings out, we often cannot be sure if they include some trailing whitespace.  For example:
      String s1 = "ab";
      String s2 = "ab          ";
      System.out.println(s1);
      System.out.println(s2);
Compile and run this program.  As you can see, it appears that the two strings are both equal to "ab".  This is not true, but we cannot easily see that the second string has numerous trailing spaces.  For this reason, it is helpful to print out markers, or delimiters, around strings, like this:
      String s1 = "ab";
      String s2 = "ab          ";
      System.out.println("'" + s1 + "'"); // prints 'ab'
      System.out.println("'" + s2 + "'"); // prints 'ab          '
Now we can easily see the trailing spaces.  Here we used single-quotes as delimiters, but we could use any characters, really:
      String s1 = "ab";
      String s2 = "ab          ";
      System.out.println("[" + s1 + "]"); // prints [ab]
      System.out.println("[" + s2 + "]"); // prints [ab          ]

In any case, it is important to remember that the delimiters are used to mark the ends of the string, and are not actually part of the string.

Scanner.nextLine()

Scanner.next() satisfies our needs most of the time.  On rare occasions, however, we may actually want to preserve the whitespace that the user enters.  Since scanner.next() automatically removes this whitespace, we cannot use it in this case.  Instead, we use scanner.nextLine().  This method also returns a string, but it returns an entire line at a time, including all of the whitespace.  For example:
      String s1 = scanner.nextLine();
      System.out.println("'" + s1 + "'");
      String s2 = scanner.nextLine();
      System.out.println("'" + s2 + "'");
      String s3 = scanner.nextLine();
      System.out.println("'" + s3 + "'");
Compile and run this program, entering several lines of text with extra whitespace.  See how it preserves all the whitespace?  Run it once more, and this time be sure to enter a blank line (just hit return).  What happens?  How does scanner.nextLine() differ from scanner.next() when the user enters a blank line?

Here is a word of caution when using scanner.nextLine() along with other scanner methods.  Methods like scanner.nextInt(), scanner.nextDouble(), and even scanner.next() all stop when they reach any whitespace.  This can have an unexpected consequence when combined with scanner.nextLine().  For example:
      int x = scanner.nextInt();
      System.out.println("x = " + x);
      String s = scanner.nextLine();
      System.out.println("'" + s + "'");
Before you run this program, consider what should happen if you did run it and entered "2" (without quotes) on the first line and "abc" (again without quotes) on the second line.  Compile and run this program, and enter "2" and "abc" as described.  Did it work as you predicted?  What happened?

To understand what happened, we must recall that scanner.nextInt() stopped when it reached the newline after the "2".  This newline is still in the input buffer, which means that scanner.nextLine() sees it.  Now, scanner.nextLine() reads all characters up to a newline.  Since it finds the newline immediately, it is done!  So it returns the empty line, and the second line of input, with "abc", is never even read by the program!

Note that this problem does not exist with scanner.next, as it will skip whitespace like newlines until it finds some non-whitespace.  Let's demonstrate this:
      int x = scanner.nextInt();
      System.out.println("x = " + x);
      String s = scanner.next();         // next(), not nextLine()
      System.out.println("'" + s + "'");
Compile and run this program, and enter "2" and "abc" as described.   See how this works?

Now, what if you want the whitespace-preserving behavior of nextLine, but you also need to use nextInt to read in integers?  This can be done, but then you must include an extra call to nextLine after your nextInt call to consume the newline after the integer.  If you do this, be sure to comment it well, like this:
      int x = scanner.nextInt();
      scanner.nextLine();  // consume the newline after the nextInt
      System.out.println("x = " + x);
      String s = scanner.nextLine();
      System.out.println("'" + s + "'");
Compile and run this program, and enter "2" and "abc" as described.  It works!   To further prove that it works, run it again, and this time include some extra whitespace on the second line, entering "    abc   " (without the quotes).  While scanner.next() would ignore this whitespace, here we see we can use scanner.nextLine() to capture the whitespace even after we read an integer using scanner.nextInt().
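
The two behaviors can be compared directly with the sketch below, which scans a string in place of keyboard input so that the result is the same on every run.  The class name NextLineDemo and the helpers withoutFix and withFix are made up for this example.

```java
import java.util.Scanner;

// A minimal sketch of the nextInt/nextLine pitfall and its fix.
// NextLineDemo, withoutFix() and withFix() are hypothetical names.
class NextLineDemo {
    // Reads an int and then a line, WITHOUT consuming the leftover newline.
    static String withoutFix(String input) {
        Scanner scanner = new Scanner(input);
        int x = scanner.nextInt();
        String s = scanner.nextLine();  // only sees the rest of the first line
        return x + ":'" + s + "'";
    }

    // Same as above, but with the extra nextLine call.
    static String withFix(String input) {
        Scanner scanner = new Scanner(input);
        int x = scanner.nextInt();
        scanner.nextLine();             // consume the newline after the nextInt
        String s = scanner.nextLine();  // now reads the whole second line
        return x + ":'" + s + "'";
    }

    public static void main(String[] args) {
        String input = "2\n    abc   \n";  // "2" on line one, "    abc   " on line two
        System.out.println(withoutFix(input)); // prints 2:''
        System.out.println(withFix(input));    // prints 2:'    abc   '
    }
}
```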

3.  The String Class

Java includes many more instance methods for Strings besides String.length.  Here are some that you should be familiar with.  Note that the descriptions refer to "this" string.  Which string is that?  It is the string on which the method was called.  For example:
      String s1 = "abcd";
      System.out.println(s1.length()); // prints 4
      String s2 = "efg";
      System.out.println(s2.length()); // prints 3
In the first call to the length method, "this" string is "abcd", whereas in the second call, "this" string is "efg".   We will learn more about the special Java keyword "this" when we learn to write our own classes.

Method Description
charAt Returns the character at the given index of this string, where the first index is 0 and the last index is at (length()-1).  For example:
      String s = "abcd";
      System.out.println(s.charAt(0)); // prints: a
      System.out.println(s.charAt(1)); // prints: b
      System.out.println(s.charAt(s.length()-1)); // prints: d
A common mistake is to use length() rather than (length()-1) as the last index.  This is one type of off-by-one error.  For example:
      String s = "abcd";
      System.out.println(s.charAt(s.length())); // DOES NOT WORK -- off-by-one error!
Compile and run this.  Read the error carefully.  It says StringIndexOutOfBoundsException, and even tells us that the offending index is 4 (whereas the largest legal index is 3 for the string "abcd").
compareTo Note:  In Java, we would like to compare two strings the way we compare two numbers, using something like s1 < s2.  Unfortunately, you cannot use relational operators (<, <=, >=, >) with strings, and while you can use equality operators (==, !=) with strings, you generally should not do so, because they test whether two variables refer to the same string object rather than whether the two strings contain the same characters.  Instead of these operators, you should use the compareTo method described here and the equals method described below.

This method compares "this" string to a second string (which we will call "that" string). The comparison is lexicographic, which basically means that it works the way words are sorted in the dictionary.  Returns a negative number if "this < that" -- that is, if "this" string would occur in a dictionary prior to "that" string.  Returns 0 if "this" equals "that".  And returns a positive number if "this > that".  For example:
      System.out.println("abc".compareTo("def")); // prints -3, so "abc" < "def"
      System.out.println("abc".compareTo("abc")); // prints 0 , so "abc" equals "abc"
      System.out.println("def".compareTo("abc")); // prints 3 , so "def" > "abc"

The comparison uses Unicode values.  As Unicode 'Z' is less than Unicode 'a', we see that all uppercase letters occur before the lowercase letters:
      System.out.println("ABC".compareTo("abc")); // prints -32, so "ABC" < "abc"
      System.out.println("Z".compareTo("a"));     // prints -7 , so "Z"   < "a"


Also, as Unicode '9' is less than Unicode 'A', we see that all digits occur before the uppercase or lowercase letters:
      System.out.println("123".compareTo("ABC")); // prints -16, so "123" < "ABC"

To continue, we see that all whitespace characters occur before digits and uppercase or lowercase letters:
      System.out.println(" ".compareTo("123")); // prints -17, so " " < "123"

Finally, we see that punctuation and other characters occur somewhat unpredictably:
      System.out.println("%".compareTo("123")); // prints -12, so "%" < "123"
      System.out.println("~".compareTo("xyz")); // prints 6, so "~" > "xyz"

Again, it is important to remember this rule:

Do not use relational or equality operators with strings.  Instead, use the compareTo and equals methods.
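
Here is a small sketch of why this rule matters.  It creates a second string object with new String("abc"), a form we have not covered yet and use here only to guarantee two separate objects that hold the same characters.  The class name EqualsDemo and the helpers sameText and comesBefore are made up for this example.

```java
// A minimal sketch of == versus equals and compareTo.
// EqualsDemo, sameText() and comesBefore() are hypothetical names.
class EqualsDemo {
    // True if the two strings contain the same characters.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    // True if a would occur before b in lexicographic (Unicode) order.
    static boolean comesBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = new String("abc");  // a separate object with the same characters
        System.out.println(s1 == s2);             // prints false (different objects)
        System.out.println(sameText(s1, s2));     // prints true  (same characters)
        System.out.println(comesBefore(s1, "def")); // prints true
        System.out.println(comesBefore("def", s1)); // prints false
    }
}
```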
 

compareToIgnoreCase This method works the same as compareTo, except that it ignores case (as its name implies), so that uppercase letters are considered equal to lowercase letters.  For example:
      System.out.println("ABC".compareToIgnoreCase("abc")); // prints 0, so "ABC" equals "abc"
concat Returns a new string -- the result of concatenating the argument to the end of "this" string:
      System.out.println("ABC".concat("def")); // prints ABCdef
This method is not often used, as we can achieve the same result using string concatenation:
      System.out.println("ABC" + "def");       // prints ABCdef
contains Returns the boolean true if "this" string contains the argument, and false otherwise:
      System.out.println("ABC".contains("BC")); // prints true
      System.out.println("ABC".contains("CB")); // prints false
      System.out.println("ABC".contains("AC")); // prints false
In the last example, we see that "ABC" does not contain "AC" -- even though "A" and "C" are contained, they are not adjacent, so "AC" is not contained in "ABC".
endsWith Returns the boolean true if "this" string ends with the argument, and false otherwise:
      System.out.println("ABC".endsWith("BC")); // prints true
      System.out.println("ABC".endsWith("C"));  // prints true
      System.out.println("ABC".endsWith(""));   // prints true
      System.out.println("ABC".endsWith("B"));  // prints false
equals Returns the boolean true if "this" string equals the argument, and false otherwise:
      System.out.println("ABC".equals("ABC")); // prints true
      System.out.println("ABC".equals("AB"));  // prints false
      System.out.println("ABC".equals("abc")); // prints false
equalsIgnoreCase This method works the same as equals, except that it ignores case (as its name implies), so that uppercase letters are considered equal to lowercase letters.  For example:
      System.out.println("ABC".equalsIgnoreCase("abc")); // prints true
indexOf Returns the starting index where the argument first occurs in "this" string, or -1 if it does not occur.  Note that the argument can be either a char or a string.  For example, here we find where chars occur:
      System.out.println("ABC".indexOf('A'));  // prints 0
      System.out.println("ABC".indexOf('B'));  // prints 1
      System.out.println("ABC".indexOf('D'));  // prints -1
And here we find where strings occur:
      System.out.println("ABC".indexOf("BC")); // prints 1
      System.out.println("ABC".indexOf("CD")); // prints -1

Note that indexOf can take a second parameter, the fromIndex -- in this case, the method looks for the argument starting from that index (and so the method will not find the argument if it only occurs to the left of the index).  For example:
      System.out.println("ABCDBC".indexOf("BC",1)); // prints 1
      System.out.println("ABCDBC".indexOf("BC",2)); // prints 4
      System.out.println("ABCDBC".indexOf("BC",5)); // prints -1
lastIndexOf Returns the starting index where the argument last occurs in "this" string, or -1 if it does not occur.  Note that the argument can be either a char or a string.  For example, here we find where chars last occur:
      System.out.println("ABCAB".lastIndexOf('A'));  // prints 3
      System.out.println("ABCAB".lastIndexOf('B'));  // prints 4
      System.out.println("ABCAB".lastIndexOf('D'));  // prints -1
And here we find where strings last occur:
      System.out.println("ABCAB".lastIndexOf("AB")); // prints 3
      System.out.println("ABCAB".lastIndexOf("CD")); // prints -1

Note that lastIndexOf can take a second parameter, the fromIndex -- in this case, the method searches backward starting from that index (and so the method will not find the argument if it only occurs to the right of the index).  For example:
      System.out.println("ABCDBC".lastIndexOf("BC",4)); // prints 4
      System.out.println("ABCDBC".lastIndexOf("BC",3)); // prints 1
      System.out.println("ABCDBC".lastIndexOf("BC",0)); // prints -1
length Returns the length of the string -- that is, the number of characters in the string:
      System.out.println("abcdef".length()); // prints 6
      System.out.println("g    h".length()); // prints 6
      System.out.println("".length());       // prints 0
replace Returns a new string resulting from replacing all occurrences of its first argument with its second argument.  Note that the arguments can be either strings or chars.  For example, here we will replace one char with another:
      System.out.println("abcabc".replace('a','d')); // prints dbcdbc
And here we replace one string with another:
      System.out.println("abcabc".replace("bc","e")); // prints aeae

You cannot replace a string with a char, or a char with a string:
      System.out.println("abcabc".replace("bc",'e')); // will not compile!

Note that you can use this method to remove all occurrences of a string by replacing them with the empty string ("").
      System.out.println("abcabc".replace("bc","")); // prints aa

Finally, note that you cannot remove all occurrences of a char by replacing them with the empty char (''), because there is no such thing!
      System.out.println("abcabc".replace('c','')); // will not compile!

So, again:  Even though there is an empty string (""), there is no empty char ('').
startsWith Returns the boolean true if "this" string starts with the argument, and false otherwise:
      System.out.println("ABC".startsWith(""));   // prints true
      System.out.println("ABC".startsWith("A"));  // prints true
      System.out.println("ABC".startsWith("AB")); // prints true
      System.out.println("ABC".startsWith("BC")); // prints false
substring Returns a new string composed of characters from its first argument (the beginIndex) up to, but not including, its second argument (the endIndex).  For example:
      System.out.println("ABCD".substring(0,1));   // prints A
      System.out.println("ABCD".substring(0,2));   // prints AB
      System.out.println("ABCD".substring(1,2));   // prints B
      System.out.println("ABCD".substring(1,3));   // prints BC
A common use of substring is to find a suffix -- that is, a substring from some beginIndex until the end of the string.  Because the endIndex is not included in the string, we can use string.length() as the endIndex, as such:
      System.out.println("ABCD".substring(2,4));                 // prints CD
      System.out.println("ABCD".substring(2,"ABCD".length()));   // prints CD

Actually, suffixes are so commonly used that they have their own form of substring -- if you only include one argument, the beginIndex, then the method returns the suffix starting from this beginIndex:
      System.out.println("ABCD".substring(2));   // prints CD
toLowerCase Returns a new string where all uppercase characters are converted to lowercase, and all other characters are unchanged.  For example:
      System.out.println("ABcd123".toLowerCase());   // prints abcd123
toUpperCase Returns a new string where all lowercase characters are converted to uppercase, and all other characters are unchanged.  For example:
      System.out.println("ABcd123".toUpperCase());   // prints ABCD123
trim Returns a new string which is the same as "this" string with leading and trailing whitespace omitted (but with all other whitespace included).  For example:
      String s = "   ab   cd   ";
      System.out.println("[" + s.trim() + "]"); // prints [ab   cd]
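
These methods are most useful in combination.  As one possible illustration, the sketch below uses trim, indexOf, and both forms of substring to turn a name written as "last, first" into "first last".  The class name NameParts, the helper flip, and the sample names are all made up for this example.

```java
// A minimal sketch combining trim, indexOf and substring.
// NameParts and flip() are hypothetical names used only for this example.
class NameParts {
    // Turns "Doe, Jane" into "Jane Doe".  If there is no comma,
    // the trimmed input is returned unchanged.
    static String flip(String name) {
        String trimmed = name.trim();
        int comma = trimmed.indexOf(',');      // -1 if there is no comma
        if (comma == -1) {
            return trimmed;
        }
        String last = trimmed.substring(0, comma).trim();   // before the comma
        String first = trimmed.substring(comma + 1).trim(); // suffix after the comma
        return first + " " + last;
    }

    public static void main(String[] args) {
        System.out.println("[" + flip("  Doe,   Jane  ") + "]"); // prints [Jane Doe]
        System.out.println("[" + flip("Jane") + "]");            // prints [Jane]
    }
}
```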

4.  Converting Types with Strings

If we are not too particular about formatting, then it is easy to convert any type to a string -- just use string concatenation with the empty string (though there are other ways to do this that may be more elegant, this way works just fine and is undeniably easier).  For example, this code converts several types to strings:
      int i = 32;
      double d = 3.14;
      boolean b = true;
      char c = 'Q';
      String si = "" + i;  // convert int to string
      String sd = "" + d;  // convert double to string
      String sb = "" + b;  // convert boolean to string
      String sc = "" + c;  // convert char to string
      System.out.println(si.equals("32"));   // prints true
      System.out.println(sd.equals("3.14")); // ditto
      System.out.println(sb.equals("true")); // ditto
      System.out.println(sc.equals("Q"));    // ditto
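
One of the other ways mentioned above is the static method String.valueOf, which accepts any of these primitive types and returns the corresponding string.  The sketch below is only an illustration; the class name ToStringDemo and the helper describe are made up for this example.

```java
// A minimal sketch of String.valueOf as an alternative to "" + value.
// ToStringDemo and describe() are hypothetical names used only for this example.
class ToStringDemo {
    // Converts each value to a string and joins the results with commas.
    static String describe(int i, double d, boolean b, char c) {
        String si = String.valueOf(i);  // convert int to string
        String sd = String.valueOf(d);  // convert double to string
        String sb = String.valueOf(b);  // convert boolean to string
        String sc = String.valueOf(c);  // convert char to string
        return si + "," + sd + "," + sb + "," + sc;
    }

    public static void main(String[] args) {
        System.out.println(describe(32, 3.14, true, 'Q')); // prints 32,3.14,true,Q
        // Both approaches produce the same string:
        System.out.println(String.valueOf(32).equals("" + 32)); // prints true
    }
}
```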

The other direction is slightly more complicated.  For example, say we have a string that could be viewed as an integer.  Say, the string "32".  We cannot simply assign this string into an int variable:
      String s = "32";
      int i = s;  // will not compile!
Casting will not help, either:
      String s = "32";
      int i = (int) s;  // still will not compile!

Compile this code (that is, try to compile the code, as it will not compile).  Look closely at the errors -- depending on your compiler version, they say that String and int are "incompatible types" or "inconvertible types".  So moving from string to int is neither widening nor narrowing.  It is simply prohibited.

Fortunately, Java provides a method that effectively does this conversion for us:  Integer.valueOf.
      String s = "32";
      int i = Integer.valueOf(s);
      System.out.println(i == 32); // prints true

Java also provides similar methods to convert from strings to doubles and booleans:
      double d = Double.valueOf("3.14");
      System.out.println(d == 3.14);     // prints true
      boolean b = Boolean.valueOf("true");
      System.out.println(b == true);     // prints true
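
What if the string does not actually hold a number?  In that case the conversion throws a NumberFormatException.  The sketch below shows one way to handle this, using Integer.parseInt (a close relative of Integer.valueOf that returns an int directly) and a try/catch block, which we have not covered yet.  The class name ParseDemo, the helper toIntOrDefault, and the idea of returning a fallback value are all choices made for this example.

```java
// A minimal sketch of converting a string to an int with a fallback.
// ParseDemo and toIntOrDefault() are hypothetical names used only for this example.
class ParseDemo {
    // Returns the int that text represents, or fallback if text is not a valid integer.
    static int toIntOrDefault(String text, int fallback) {
        try {
            // trim() first, because parseInt does not accept surrounding spaces
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return fallback;  // text was not an integer
        }
    }

    public static void main(String[] args) {
        System.out.println(toIntOrDefault("32", -1));   // prints 32
        System.out.println(toIntOrDefault(" 32 ", -1)); // prints 32
        System.out.println(toIntOrDefault("abc", -1));  // prints -1
        System.out.println(toIntOrDefault("3.14", -1)); // prints -1 (not an integer)
    }
}
```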

From this, you might conclude (wrongly) that the following could convert from strings to chars:
      char c = Character.valueOf("Q"); // will not compile!
      System.out.println(c == 'Q');
The problem: the Character.valueOf method does not convert from a string to a char!  Fortunately, we have seen another method that can perform this conversion:  charAt.  For example:
      String s = "Q";
      char c = s.charAt(0);         // convert string to char
      System.out.println(c == 'Q'); // prints true

5.  Practice