java.lang.Object
  andyr.jtokeniser.Tokeniser
    andyr.jtokeniser.RegexTokeniser
public class RegexTokeniser
extends Tokeniser
The RegexTokeniser class uses regular expressions to define a word, and
tokenises according to that expression. All matching is performed via Java's
Pattern and Matcher classes.
The following example shows the tokeniser in use. The code:
    RegexTokeniser ret = new RegexTokeniser("the cat sat on the mat", "\\w+");
    while (ret.hasMoreTokens()) {
        System.out.println(ret.nextToken());
    }
prints the following output:
    the
    cat
    sat
    on
    the
    mat
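To see the same result without the library, the following is a minimal sketch that uses only java.util.regex. It assumes, but does not confirm, that tokenising by "word" amounts to collecting every Matcher.find() result in order and ignoring the text between matches. The names WordTokensSketch and wordTokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the andyr.jtokeniser source.
public class WordTokensSketch {
    // Returns every substring of input that matches regex, left to right.
    static List<String> wordTokens(String input, String regex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            tokens.add(m.group()); // the matched "word"; text between matches is skipped
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : wordTokens("the cat sat on the mat", "\\w+")) {
            System.out.println(token);
        }
    }
}
```

Running main prints the six words of the example above, one per line.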
It is also possible to keep the strings in between tokens, should that be
necessary; by default these are discarded. Note that anything before the first
match or after the last match is never kept. For example, take the string
"123abc456def789" and the regular expression "\\D+" (one or more non-digits),
with keepDelim set to true:
    RegexTokeniser ret = new RegexTokeniser("123abc456def789", "\\D+", true);
    while (ret.hasMoreTokens()) {
        System.out.println(ret.nextToken());
    }
prints the following output (the leading "123" and trailing "789" are dropped):
    abc
    456
    def
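The keepDelim behaviour can be sketched the same way. This is an assumed reconstruction from the documented output, not the library's code: KeepDelimSketch and tokensWithDelims are hypothetical names, and the sketch records the text from the end of one match to the start of the next, so nothing before the first match or after the last is returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the andyr.jtokeniser source.
public class KeepDelimSketch {
    // Returns the matches plus the strings between consecutive matches.
    static List<String> tokensWithDelims(String input, String regex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int previousEnd = -1; // -1 means no match has been seen yet
        while (m.find()) {
            if (previousEnd >= 0) {
                // Text between the previous match and this one. Whether the real
                // class keeps an empty in-between string is not documented; this keeps it.
                tokens.add(input.substring(previousEnd, m.start()));
            }
            tokens.add(m.group());
            previousEnd = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokensWithDelims("123abc456def789", "\\D+")) {
            System.out.println(token);
        }
    }
}
```

Running main prints abc, 456 and def, matching the documented output.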
| Field Summary |
|---|
| Fields inherited from class andyr.jtokeniser.Tokeniser: currentTokenPosition, tokens |
| Constructor Summary | Description |
|---|---|
| RegexTokeniser(java.lang.String input) | Creates a RegexTokeniser that tokenises the input. |
| RegexTokeniser(java.lang.String input, java.lang.String regex) | Creates a RegexTokeniser that tokenises the input according to a regular expression that defines a "word" or token. |
| RegexTokeniser(java.lang.String input, java.lang.String regex, boolean keepDelim) | Creates a RegexTokeniser that tokenises the input according to a regular expression that defines a "word" or token, optionally keeping the strings between tokens. |
| Method Summary |
|---|
| Methods inherited from class andyr.jtokeniser.Tokeniser: countTokens, getTokens, hasMoreTokens, nextToken, numberOfTokens |
| Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
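The inherited fields tokens and currentTokenPosition suggest, although this page does not say so, that tokens are computed up front and handed out through a cursor. The following hypothetical sketch shows that arrangement using the inherited method names hasMoreTokens and nextToken; it does not describe the real Tokeniser class. CursorTokeniserSketch is an invented name, throwing NoSuchElementException when exhausted is an assumption borrowed from java.util.StringTokenizer, and countTokens and numberOfTokens are left out because their exact semantics are not documented here.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch, not the andyr.jtokeniser.Tokeniser source.
public class CursorTokeniserSketch {
    protected final List<String> tokens;    // all tokens, computed up front (assumed)
    protected int currentTokenPosition = 0; // index of the next token to return (assumed)

    CursorTokeniserSketch(List<String> tokens) {
        this.tokens = List.copyOf(tokens); // defensive, unmodifiable copy
    }

    boolean hasMoreTokens() {
        return currentTokenPosition < tokens.size();
    }

    String nextToken() {
        if (!hasMoreTokens()) {
            // Assumed behaviour, mirroring java.util.StringTokenizer.
            throw new NoSuchElementException();
        }
        return tokens.get(currentTokenPosition++);
    }

    public static void main(String[] args) {
        CursorTokeniserSketch t = new CursorTokeniserSketch(List.of("abc", "456", "def"));
        while (t.hasMoreTokens()) {
            System.out.println(t.nextToken());
        }
    }
}
```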
| Constructor Detail |
|---|
public RegexTokeniser(java.lang.String input,
                      java.lang.String regex,
                      boolean keepDelim)
Creates a RegexTokeniser that tokenises the input according to a regular
expression that defines a "word" or token. If keepDelim is true, the strings
between consecutive tokens are also returned as tokens.
Parameters:
input - a string from which the tokens will be extracted.
regex - the regular expression.
keepDelim - flag indicating whether to return the delimiters as tokens.
See Also:
Pattern
public RegexTokeniser(java.lang.String input,
                      java.lang.String regex)
Creates a RegexTokeniser that tokenises the input according to a regular
expression that defines a "word" or token.
Parameters:
input - a string from which the tokens will be extracted.
regex - the regular expression.
See Also:
Pattern
public RegexTokeniser(java.lang.String input)
Creates a RegexTokeniser that tokenises the input.
Parameters:
input - a string from which the tokens will be extracted.