
Regular Expressions in Java

No matter how many times I deal with regular expressions in Java, I can never remember a darn thing; I always have to look everything up again. Since I am working with some RegEx in Java right now, I thought I'd start this page to capture notes that act as quick reminders.

The quintessential source code example

Commonly, use of regular expressions in Java revolves around two classes from java.util.regex:

  • A Pattern object is a compiled representation of a regular expression.
  • A Matcher object is the engine that interprets the pattern and performs match operations against an input string.
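A minimal sketch of how the two classes are typically used together: compile a Pattern once, ask it for a Matcher bound to an input string, then loop over find(). The class name RegexQuickStart, the helper findAll, and the sample input are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexQuickStart {

    // Hypothetical helper: returns every substring of input that matches regex.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // compiled representation of the regex
        Matcher matcher = pattern.matcher(input);  // engine bound to this one input string
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {                   // advance to the next match, if any
            hits.add(matcher.group());             // the text of the current match
        }
        return hits;
    }

    public static void main(String[] args) {
        // The trailing "cat" has no character after it, so "cat." does not match there.
        System.out.println(findAll("cat.", "cats cato cat5 cat"));
        // prints [cats, cato, cat5]
    }
}
```

If the same expression is used many times, it is usually worth keeping the compiled Pattern in a static final field rather than recompiling it on every call.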

Metacharacters

Metacharacters are special characters that affect the way a pattern is matched. The metacharacters supported by this API are: <([{\^-=$!|]})?*+.>

There are two ways to force a metacharacter to be treated as an ordinary character:

  • precede the metacharacter with a backslash, or
  • enclose it within \Q (which starts the quote) and \E (which ends it).
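Both escaping styles can be tried out with a small sketch like the one below. The class and method names are invented for this page. Pattern.quote is not mentioned above, but it is a standard JDK method that produces a quoted literal pattern for you, which is handy when the literal text comes from a variable.

```java
import java.util.regex.Pattern;

public class EscapeDemo {

    // Unescaped: the dot is a metacharacter and matches any single character.
    static boolean unescaped(String input) {
        return Pattern.matches("a.b", input);
    }

    // Backslash escape: written as "\\." in Java source so the regex sees \.
    static boolean backslashEscaped(String input) {
        return Pattern.matches("a\\.b", input);
    }

    // \Q...\E quoting: everything between the markers is taken literally.
    static boolean quoted(String input) {
        return Pattern.matches("\\Qa.b\\E", input);
    }

    // Pattern.quote builds a literal pattern from an arbitrary string.
    static boolean viaPatternQuote(String input) {
        return Pattern.matches(Pattern.quote("a.b"), input);
    }

    public static void main(String[] args) {
        System.out.println(unescaped("axb"));        // true
        System.out.println(backslashEscaped("axb")); // false
        System.out.println(backslashEscaped("a.b")); // true
        System.out.println(quoted("a.b"));           // true
        System.out.println(quoted("axb"));           // false
        System.out.println(viaPatternQuote("a.b"));  // true
    }
}
```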

Cheat Sheet

RegEx Construct | Notes | Matches
. | A dot is a metacharacter that represents any character. It means "match any character here". | e.g. cats, cato, cat5 (for a pattern such as cat.)
[abc] | A character class is a set of characters enclosed within square brackets. It specifies the characters that will successfully match a single character from a given input string. | a, b, or c
[^abc] | The negation symbol ^ means "except for" when used inside the square brackets of a character class. | Any character except a, b, or c
[a-zA-Z] | The range symbol - means "through", as in "include this through this". | a through z, or A through Z
[a-d[m-p]] | Union | a through d, or m through p
[a-z&&[def]] | Intersection | d, e, or f (Sun's wording; that is, a character must be in a through z AND be one of d, e, or f)
[a-z&&[^bc]] | Subtraction | a through z, except for b and c
[a-z&&[^m-p]] | Subtraction | a through z, and not m through p
\d | Escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: "\\d") | A digit: [0-9]
\D | Escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: "\\D") | A non-digit: [^0-9]
\s | Escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: "\\s") | A whitespace character: [ \t\n\x0B\f\r]
\S | Escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: "\\S") | A non-whitespace character: [^\s]
\w | Escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: "\\w") | A word character: [a-zA-Z_0-9]
\W | Escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: "\\W") | A non-word character: [^\w]
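A few rows of the cheat sheet can be checked with a quick sketch like this one. The class CheatSheetDemo and the helper full are hypothetical names. Pattern.matches requires the whole input to match, which is why each test string is a single character or an exact fit. Note the doubled backslashes in the string literals for \d, \w, \s, and \W.

```java
import java.util.regex.Pattern;

public class CheatSheetDemo {

    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p
        System.out.println(full("[a-d[m-p]]", "n"));       // true
        // Intersection: must be in a-z AND in [def]
        System.out.println(full("[a-z&&[def]]", "e"));     // true
        System.out.println(full("[a-z&&[def]]", "a"));     // false
        // Subtraction: a through z, except b and c
        System.out.println(full("[a-z&&[^bc]]", "b"));     // false
        // Subtraction: a through z, and not m through p
        System.out.println(full("[a-z&&[^m-p]]", "q"));    // true
        // Predefined classes: two digits
        System.out.println(full("\\d\\d", "42"));          // true
        // Word characters, one whitespace character, one non-word character
        System.out.println(full("\\w+\\s\\W", "foo_1 !")); // true
    }
}
```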