Regular expressions are a powerful and compact way to match or extract information from text. But regular expressions can become lengthy and hard to read. This is especially true when you have many capturing groups that you want to refer to later in a match. Java regex DSL uses the builder pattern to provide a fluent API that makes it easier to create large regular expressions by splitting them into reusable, named components and to extract information from the matches.
The first step is to create a Regex object that describes the regular expression using RegexBuilder. To create a Regex that matches a string followed by a number, you would write:
Regex regex = RegexBuilder.create()
.string("#name1").number("#name2")
.build();
To match the created Regex against a text, e.g. "foofoofoo1234", use:
Match match = regex.match("foofoofoo1234");
You can now access the match using the specified names "name1" and "name2":
match.getByName("name1"); //will result in "foofoofoo"
match.getByName("name2"); //will result in "1234"
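For comparison, the same extraction can be sketched with plain `java.util.regex` named groups. This is not how the DSL is implemented: the pattern it generates is not documented here, so the character classes `[A-Za-z]+` and `\d+` are assumptions chosen to reproduce the result above, and `NamedGroupSketch` is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupSketch {
    // Assumed stand-ins for string("#name1") and number("#name2").
    private static final Pattern PATTERN =
            Pattern.compile("(?<name1>[A-Za-z]+)(?<name2>\\d+)");

    /** Returns the text captured by the named group, or null if the input does not match. */
    static String extract(String input, String groupName) {
        Matcher m = PATTERN.matcher(input);
        return m.matches() ? m.group(groupName) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("foofoofoo1234", "name1"));
        System.out.println(extract("foofoofoo1234", "name2"));
    }
}
```

Even in this small case the raw pattern is harder to scan than the builder calls, which is the problem the DSL is meant to address.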
You can group expressions by using group(), which takes an optional parameter: the name under which the group can be accessed in a match. Every expression following the group is considered part of the group until the group is closed using end(). Let's say you want to parse a timestamp of the format h:m:s.ms. You could write:
Regex regex = RegexBuilder.create()
.group("#timestamp")
.number("#hour").constant(":").number("#min").constant(":").number("#secs").constant(".").number("#ms")
.end()
.build();
You can then access a match like so
Match match = regex.match("10:34:22.234");
match.getByName("timestamp"); //will return the group, i.e. 10:34:22.234
match.getByName("timestamp->ms"); //will return the ms of the group timestamp, i.e. 234
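To illustrate what a named group containing named sub-expressions looks like without the DSL, here is a minimal sketch using `java.util.regex`. It assumes `\d+` as a stand-in for number(); the class name `TimestampSketch` is hypothetical. Standard Java group names are flat, so there is no path syntax such as `timestamp->ms`; each inner group is looked up by its own name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimestampSketch {
    // Outer named group "timestamp" wraps four inner named groups.
    // \\d+ is an assumed stand-in for the DSL's number() expression.
    private static final Pattern TIMESTAMP = Pattern.compile(
            "(?<timestamp>(?<hour>\\d+):(?<min>\\d+):(?<secs>\\d+)\\.(?<ms>\\d+))");

    private static final String[] NAMES = {"timestamp", "hour", "min", "secs", "ms"};

    /** Returns group name -> captured text, or an empty map if the input does not match. */
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = TIMESTAMP.matcher(input);
        if (m.matches()) {
            for (String name : NAMES) {
                result.put(name, m.group(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("10:34:22.234"));
    }
}
```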
You can nest as many groups as you like, as this nonsense example shows:
Regex regex = RegexBuilder.create()
.group("#g1")
.group("#g2")
.group("#g3")
.string("#myString")
.end()
.end()
.end()
.build();
Match match = regex.match("foofoofoo");
match.getByName("g1->g2->g3->myString"); //will return "foofoofoo"
To mark an expression as optional, use option(). It is used in exactly the same way as a group expression.
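How the DSL reports an optional expression that did not take part in a match is not stated here. As a point of reference, the sketch below shows the plain `java.util.regex` equivalent, an optional non-capturing group `(?:...)?` around a named group, where `Matcher.group(name)` returns `null` if the optional part is absent. The class name `OptionalGroupSketch` and the hour:minute[:second] format are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupSketch {
    // The trailing ":seconds" part is wrapped in an optional non-capturing group.
    private static final Pattern TIME =
            Pattern.compile("(?<hour>\\d+):(?<min>\\d+)(?::(?<secs>\\d+))?");

    /**
     * Returns the seconds part, or null when the optional part is absent.
     * Throws IllegalArgumentException if the input does not match at all.
     */
    static String seconds(String input) {
        Matcher m = TIME.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a time: " + input);
        }
        // group(name) yields null for a group that did not participate in the match.
        return m.group("secs");
    }

    public static void main(String[] args) {
        System.out.println(seconds("10:34:22"));
        System.out.println(seconds("10:34"));
    }
}
```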
You can reuse a regex by nesting it into the builder:
Regex regexToReuse = ... //some regex
Regex regex = RegexBuilder.create()
.regex("#reused", regexToReuse) //reuse the previously created regex here
.build();
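The idea of reuse can also be sketched in plain `java.util.regex`, where one pattern is spliced into a larger one via `Pattern.pattern()`. This only illustrates the concept and makes no claim about how the DSL nests a Regex internally; `ReuseSketch` and the "from-to" range format are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReuseSketch {
    // The reusable component; it defines no named groups of its own,
    // so it can safely be embedded more than once.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Pattern.pattern() returns the source string, which is spliced
    // into a larger expression under two different group names.
    private static final Pattern RANGE = Pattern.compile(
            "(?<from>" + NUMBER.pattern() + ")-(?<to>" + NUMBER.pattern() + ")");

    /** Returns {from, to} for inputs such as "3-17", or null if the input does not match. */
    static String[] parseRange(String input) {
        Matcher m = RANGE.matcher(input);
        return m.matches() ? new String[] {m.group("from"), m.group("to")} : null;
    }

    public static void main(String[] args) {
        String[] range = parseRange("3-17");
        System.out.println(range[0] + " to " + range[1]);
    }
}
```

With raw string splicing, flags set on the embedded pattern are not carried over, and embedding a pattern that defines named groups more than once fails with a duplicate-name error. Giving each reused component its own name, as the builder does with "#reused", avoids the second problem.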
To pass through a regular expression, use the pattern() expression:
Regex regex = RegexBuilder.create()
.pattern("#myPattern", ".*(\\d)") //use any pattern you like
.build();
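A backslash in a Java string literal has to be doubled, so the regular expression `.*(\d)` is written as `".*(\\d)"` in source code. The sketch below runs that pattern directly through `java.util.regex` to show what it captures; `PassThroughSketch` is a hypothetical class name, and whether the DSL exposes the inner capturing group is not stated here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PassThroughSketch {
    // ".*" is greedy, so the single capturing group ends up holding
    // the last character of the input, provided it is a digit.
    private static final Pattern LAST_DIGIT = Pattern.compile(".*(\\d)");

    /** Returns the final digit of the input, or null if the input does not end in a digit. */
    static String lastDigit(String input) {
        Matcher m = LAST_DIGIT.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastDigit("abc123"));
    }
}
```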
Expression | Description |
---|---|
string(name) | Matches any word character |
number(name) | Matches any number, including floats (e.g. 0.2345) |
any() | Matches any character, including whitespace |
regex(name, regex) | Matches the given Regex |
pattern(name, pattern) | Matches the given regular expression |
group(name) | Starts a group (has to be closed with end() ) |
option(name) | Starts an optional expression (has to be closed with end() ) |