Sun Frowns on StringTokenizer

While working on a new Java book this weekend, I discovered that Sun is now discouraging programmers from using the java.util.StringTokenizer class, as noted in the class documentation:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

The following example illustrates how the String.split method can be used to break up a string into its basic tokens:

String[] result = "this is a test".split("\\s");
for (int x = 0; x < result.length; x++)
    System.out.println(result[x]);

prints the following output:

this
is
a
test

For the book, I decided to ignore this advice and continue to show readers how to split delimited strings into separate tokens. The class serves a useful purpose and doesn't require knowledge of regular expressions.
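For comparison, here is a minimal sketch of the StringTokenizer approach the post prefers. It produces the same four tokens as the split("\\s") example without any regular expression, relying on StringTokenizer's default delimiter set (space, tab, newline, carriage return, form feed). The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Collects the tokens StringTokenizer produces with its default
    // delimiters: " \t\n\r\f".
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input);
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokens("this is a test")) {
            System.out.println(token); // this, is, a, test on separate lines
        }
    }
}
```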

Comments

First, you forgot to escape the < character in the .split() example.

Second... wtf?
A) How, exactly, is the concept of regular expressions a liability? How can you be a programmer today and never have heard of regular expressions?

B) split() doesn't really require "knowledge" of regular expressions either; you're free to call it with arguments like ' ' or ':'. True, you'd need to make a note that strings like '+' need to be escaped to be a valid regular expression, but... jeez, how dumb do you think your readers are?
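A short sketch of the point about plain delimiters: a character such as ':' can be passed to split() as-is, while '+' has to be escaped, either by hand or with Pattern.quote(). The helper name splitLiteral is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LiteralSplitDemo {
    // Hypothetical helper: splits on a delimiter treated as plain text,
    // whatever characters it contains.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // ':' is not a regex metacharacter, so it works unescaped.
        System.out.println(Arrays.toString("a:b:c".split(":")));        // [a, b, c]

        // '+' is a metacharacter; on its own it is not a valid pattern.
        try {
            "1+2+3".split("+");
        } catch (PatternSyntaxException e) {
            System.out.println("\"+\" alone is not a valid regular expression");
        }

        // Escape it by hand ("\\+" in a Java string literal)...
        System.out.println(Arrays.toString("1+2+3".split("\\+")));      // [1, 2, 3]
        // ...or let Pattern.quote() escape any literal delimiter.
        System.out.println(Arrays.toString(splitLiteral("1+2+3", "+"))); // [1, 2, 3]
    }
}
```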

The class "serves a useful purpose"? Well, sure, but so does .split(), and as far as I can tell, StringTokenizer has a subset of the functionality of String.split(). It's not clear from the docs if Java's split() works this way, but in JavaScript, calling 'a:b:c'.split(/(:)/) returns ['a', ':', 'b', ':', 'c'] -- that is, the delimiters are included in the result, like with the three-argument form of StringTokenizer.
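On the open question above: Java's String.split() does not add capturing-group contents to its result, so "(:)" behaves like ":". One workaround, sketched below, is to split on the zero-width positions just before and just after each delimiter using lookahead and lookbehind, which leaves the ':' characters as tokens of their own. The method names are illustrative.

```java
import java.util.Arrays;

public class KeepDelimitersDemo {
    // The capturing group is ignored: the delimiters are dropped.
    static String[] splitDroppingDelims(String s) {
        return s.split("(:)");
    }

    // Split at the empty positions before (?=:) and after (?<=:) each ':',
    // so every ':' ends up as a separate token.
    static String[] splitKeepingDelims(String s) {
        return s.split("(?=:)|(?<=:)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDroppingDelims("a:b:c"))); // [a, b, c]
        System.out.println(Arrays.toString(splitKeepingDelims("a:b:c")));  // [a, :, b, :, c]
    }
}
```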

All right, if you're not convinced yet, how about this angle: StringTokenizer is a Java-only library. On the other hand, virtually every language that supports regular expressions (and, I'd imagine, some that don't) has split().

Hmm. This turned out longer than I thought it would. I guess I don't do silent anger like Shelley does....

Frankly, I think you'll be doing your readers a disservice if you promote StringTokenizer over String.split() or, worse, systematically excise every mention of regular expressions from your book.

Thanks for the correction.

I love regular expressions, and the book does cover java.util.regex. I just don't think it's a topic that should be covered in the first programming project of the book.

Not to mention the overhead. I see novice programmers doing this all the time, namely using regexps for string division based on one or a few simple non-metacharacters. REs are powerful tools, but they should be used with a little discretion.
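When the delimiter is a single ordinary character, splitting can be done with plain String.indexOf() and no regular expression at all. The helper below (splitOnChar is a hypothetical name) is one such sketch; unlike the one-argument String.split(), it keeps trailing empty fields. Note that OpenJDK's String.split() already takes a non-regex fast path for a single non-metacharacter delimiter, so the actual saving may be small; it is worth measuring before optimizing.

```java
import java.util.ArrayList;
import java.util.List;

public class CharSplitDemo {
    // Hypothetical helper: split on one literal character using indexOf().
    // Empty fields, including trailing ones, are kept.
    static List<String> splitOnChar(String s, char delim) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int next;
        while ((next = s.indexOf(delim, start)) != -1) {
            parts.add(s.substring(start, next)); // field before this delimiter
            start = next + 1;
        }
        parts.add(s.substring(start));           // text after the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnChar("A,B,,C,", ',')); // [A, B, , C, ]
    }
}
```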

Did you mean "\s" instead of "s" in your example?

In Python, both this:

"this is a test".split("s")

and this:

re.split("s", "this is a test")

give you this response:

['thi', ' i', ' a te', 't']

That is interesting. I hadn't noticed that change.

Philip: yeah, it was originally "\s", but it looks like Rogers changed it accidentally when editing to change < into &lt;

I think it's correct now. Java requires a backslash character to be represented as '\\', which makes regular expressions look particularly obtuse.

Using StringTokenizer is bug-prone.

java.util.StringTokenizer tokenizer = new java.util.StringTokenizer("A,B,,C", ",");
System.out.println(tokenizer.nextToken()); // A
System.out.println(tokenizer.nextToken()); // B
System.out.println(tokenizer.nextToken()); // C!! (and not "")

The solution is then to use the other constructor:

java.util.StringTokenizer tokenizer = new java.util.StringTokenizer("A,B,,C", ",", true);
System.out.println(tokenizer.nextToken()); // A
System.out.println(tokenizer.nextToken()); // ,
System.out.println(tokenizer.nextToken()); // B...

But then you have to handle the presence (and repetition) of the delimiter yourself.
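Handling the delimiters yourself might look like the sketch below: with returnDelims set to true, each delimiter comes back as a one-character token, so two delimiters in a row signal an empty field. The helper name fields is hypothetical, and the sketch assumes every delimiter is a single character (which is how StringTokenizer treats its delimiter string).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class EmptyFieldDemo {
    // Hypothetical helper: recovers empty fields that the two-argument
    // StringTokenizer constructor silently skips.
    static List<String> fields(String s, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(s, delims, true);
        boolean expectingField = true; // true at the start and after each delimiter
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (delims.indexOf(token.charAt(0)) >= 0) {
                // A delimiter with no field before it means that field was empty.
                if (expectingField) {
                    out.add("");
                }
                expectingField = true;
            } else {
                out.add(token);
                expectingField = false;
            }
        }
        if (expectingField) {
            out.add(""); // input was empty or ended with a delimiter
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("A,B,,C", ",")); // [A, B, , C]
    }
}
```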

I think it is pointless to spend time debugging a part of a program when an alternative exists.
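The alternative the commenter presumably means is String.split(), which keeps the empty field between ",," on its own. One caveat worth knowing: the one-argument form drops trailing empty strings, while a negative limit keeps them. The wrapper method below is illustrative only.

```java
import java.util.Arrays;

public class SplitEmptyDemo {
    // Illustrative wrapper: limit -1 keeps trailing empty fields.
    static String[] fields(String line, boolean keepTrailing) {
        return keepTrailing ? line.split(",", -1) : line.split(",");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("A,B,,C", false)));  // [A, B, , C]
        System.out.println(Arrays.toString(fields("A,B,,C,,", false))); // [A, B, , C]
        System.out.println(Arrays.toString(fields("A,B,,C,,", true)));  // [A, B, , C, , ]
    }
}
```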

One real problem with using .split() with a regular expression as the delimiter is that there is no way to tell what actually matched. Particularly if the regex is non-trivial, it may be of real interest to know what it was. You can do this with StringTokenizer, but you can't with .split().
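The Javadoc quoted in the post also points to java.util.regex, and a Matcher.find() loop is one way to keep both the tokens and the exact text each delimiter match consumed. A minimal sketch, where the method name and the sample delimiter pattern (a comma or semicolon with optional surrounding whitespace) are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchedDelimiterDemo {
    // Returns token, delimiter, token, ..., token, where each delimiter
    // entry is the text the pattern actually matched.
    static List<String> tokensAndDelimiters(String input, Pattern delimiter) {
        List<String> out = new ArrayList<>();
        Matcher m = delimiter.matcher(input);
        int last = 0;
        while (m.find()) {
            out.add(input.substring(last, m.start())); // text before this match
            out.add(m.group());                        // the matched delimiter itself
            last = m.end();
        }
        out.add(input.substring(last));                // text after the final match
        return out;
    }

    public static void main(String[] args) {
        Pattern delim = Pattern.compile("\\s*[,;]\\s*");
        for (String part : tokensAndDelimiters("a , b;c", delim)) {
            System.out.println("[" + part + "]"); // [a] [ , ] [b] [;] [c]
        }
    }
}
```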
