No matter how many times I deal with Regular Expressions in Java, I can never remember a darn thing. I always have to look everything up again. So, since I am working with some RegEx in Java right now, I thought I'd start this page to capture notes that act as quick reminders.
The quintessential source code example
Commonly, use of Regular Expressions in Java involves two classes from the java.util.regex package:
- A Pattern object is a compiled representation of a regular expression.
- A Matcher object is the engine that interprets the pattern and performs match operations against an input string.
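As a quick reminder of how the two classes fit together, here is a minimal sketch. The class name RegexBasics and the helper findAll are made up for this page; only Pattern.compile, Pattern.matcher, Matcher.find and Matcher.group come from the java.util.regex API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for these notes, not part of any library.
class RegexBasics {

    // Returns every substring of input that matches regex, in order of appearance.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // compiled representation of the regex
        Matcher matcher = pattern.matcher(input);  // engine bound to this input string
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {                   // advance to the next match, if any
            matches.add(matcher.group());          // text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints cats, cato and cat5, one per line.
        for (String match : findAll("cat.", "cats cato cat5")) {
            System.out.println(match);
        }
    }
}
```

If the same expression is applied to many inputs, it is usually worth compiling the Pattern once and creating a new Matcher per input, rather than recompiling each time.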
Metacharacters
Metacharacters are special characters that affect the way a pattern is matched. The metacharacters supported by this API are: < ( [ { \ ^ - = $ ! | ] } ) ? * + . >
There are two ways to force a metacharacter to be treated as an ordinary character:
- precede the metacharacter with a backslash, or
- enclose it within \Q (which starts the quote) and \E (which ends it).
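Both quoting styles can be sketched roughly as follows. The class RegexQuoting and its helper matchesWhole are names invented for this note. Pattern.quote is not mentioned above; it is a standard java.util.regex method that returns a literal pattern string for its argument, and is included here as an extra convenience.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for these notes.
class RegexQuoting {

    // True only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped: the dot is a metacharacter and matches any character.
        System.out.println(matchesWhole("3.14", "3x14"));        // true

        // Backslash escape (doubled inside a Java string literal): the dot is literal.
        System.out.println(matchesWhole("3\\.14", "3x14"));      // false
        System.out.println(matchesWhole("3\\.14", "3.14"));      // true

        // \Q ... \E quotes everything in between.
        System.out.println(matchesWhole("\\Q3.14\\E", "3.14"));  // true
        System.out.println(matchesWhole("\\Q3.14\\E", "3x14"));  // false

        // Pattern.quote produces a quoted form of an arbitrary string.
        System.out.println(matchesWhole(Pattern.quote("1+1=2"), "1+1=2")); // true
        System.out.println(matchesWhole("1+1=2", "1+1=2"));                // false, + is a quantifier
    }
}
```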
Cheat Sheet
| RegEx Construct | Notes | Matches |
|---|---|---|
| . | A dot is a metacharacter that represents any character. It means, 'match any character here'. | Any single character; for example, cat. matches cats, cato, cat5 |
| [abc] | A character class is a set of characters enclosed within square brackets. It specifies the characters that will successfully match a single character from a given input string. | a, b, or c |
| [^abc] | The negation symbol ^ means "except for" when used inside the square brackets of a character class. | Any character except a, b, or c |
| [a-zA-Z] | The range symbol - means "through", as in "include this through this". | a through z, or A through Z |
| [a-d[m-p]] | union | a through d, or m through p |
| [a-z&&[def]] | intersection | d, e, or f: only characters that are in both a through z and [def] |
| [a-z&&[^bc]] | subtraction | a through z, except for b and c |
| [a-z&&[^m-p]] | subtraction | a through z, and not m through p |
| \d | escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: '\\d') | A digit: [0-9] |
| \D | escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: '\\D') | A non-digit: [^0-9] |
| \s | escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: '\\s') | A whitespace character: [ \t\n\x0B\f\r] |
| \S | escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: '\\S') | A non-whitespace character: [^\s] |
| \w | escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: '\\w') | A word character: [a-zA-Z_0-9] |
| \W | escaped construct (within a string literal, you must precede the backslash with another backslash for the string to compile: '\\W') | A non-word character: [^\w] |
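To double-check a few rows of the cheat sheet, including the intersection and subtraction forms, a small test harness along these lines can help. The class CharClassDemo and the helper matchesWhole are invented for this note; each call tests a single-character input against one construct from the table.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for these notes.
class CharClassDemo {

    // True only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p.
        System.out.println(matchesWhole("[a-d[m-p]]", "n"));    // true
        System.out.println(matchesWhole("[a-d[m-p]]", "e"));    // false

        // Intersection: must be in a-z AND in [def].
        System.out.println(matchesWhole("[a-z&&[def]]", "e"));  // true
        System.out.println(matchesWhole("[a-z&&[def]]", "a"));  // false

        // Subtraction: a-z minus the negated set.
        System.out.println(matchesWhole("[a-z&&[^bc]]", "b"));  // false
        System.out.println(matchesWhole("[a-z&&[^m-p]]", "q")); // true

        // Predefined classes; note the doubled backslash in string literals.
        System.out.println(matchesWhole("\\d", "7"));   // true
        System.out.println(matchesWhole("\\D", "7"));   // false
        System.out.println(matchesWhole("\\s", "\t"));  // true
        System.out.println(matchesWhole("\\S", "\t"));  // false
        System.out.println(matchesWhole("\\w", "_"));   // true
        System.out.println(matchesWhole("\\W", "_"));   // false
    }
}
```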