howl.regex
Overview
Lua, by itself, does not provide regular expressions. Instead it provides its
own, more limited, form of pattern matching. Regular expressions are provided
by the howl.regex module as an extension. To blend in with Lua, the operations
provided by the regex module closely mimic the corresponding operations found
in the Lua string standard library. Support for regular expressions is also
included in Howl’s string extensions, making them easy to use
within your code. Since regular expressions are not native to Lua, there’s no
syntactic sugar available for constructing a regular expression. Instead,
regular expressions are constructed from ordinary strings. The global function
r provides a constructor for this. Since it is available in the
global namespace, it’s possible to construct a regular expression anywhere
within Howl just by prefixing a string with r, like so:
my_regex = r'\\d+[lL]'
You can then either use the provided methods directly on the regular expression:
r'(r\\w+)\\s+(\\S+)':match('red right hand') -- => "red", "right"
Or use Howl’s string extensions, which allow for passing in regular expressions instead of Lua patterns:
local s = 'red right hand'
s:ufind(r'(\\w+)', 5) -- => 5, 9, "right"
s:umatch(r'(r\\w+)\\s+(\\S+)') -- => "red", "right"
Howl’s regular expressions are implemented as Perl-compatible regular expressions. While it’s an implementation detail, susceptible to change, they are currently implemented on top of GLib’s regular expression support. You can read more about the full syntax supported by the implementation in the “Regular expression syntax” page for GRegex.
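The doubled backslashes in the examples above come from Lua’s string escaping, not from the regular expression syntax: inside a quoted Lua string, \\ stands for a single backslash. The following sketch uses only plain Lua strings (no Howl API is called) to show what such a pattern string actually contains, and that a long-bracket string spells the same text without doubling. The variable names are illustrative only.

```lua
-- A quoted string needs '\\' to produce one backslash character.
local quoted = '\\d+[lL]'

-- A long-bracket string performs no escape processing. Level-1 brackets
-- ([=[ ... ]=]) are used because the text itself ends with ']'.
local bracketed = [=[\d+[lL]]=]

print(quoted)               -- \d+[lL]
print(#quoted)              -- 7
print(quoted == bracketed)  -- true
```

Since r takes an ordinary string, either spelling should yield the same pattern text; which one reads better is a matter of taste.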
Comparison with Lua patterns
As mentioned above, the operations provided by the regex module closely mimic
the corresponding operations found in the Lua string standard library. Thus you
have match, find and gmatch, which work similarly to their
counterparts. Lua patterns are more limited than regular expressions, but even
so they offer functionality not present in ordinary regular expressions.
One such example is the support for positional captures using an empty capture,
(), which returns the position of the match within the target string. The
methods in the regex module allow for these kinds of captures within regular
expressions as well.
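For readers who have not met positional captures before, here is a minimal sketch of how they behave in stock Lua patterns, which is the behaviour the regex module mirrors. It uses only the standard string library; the sample string and variable names are illustrative.

```lua
local s = 'red right hand'

-- '()' captures the current position instead of a piece of text.
-- The first '()' sits before 'right', the second just after it.
local start_pos, end_pos = s:match('()right()')

print(start_pos)  -- 5
print(end_pos)    -- 10
```

Note that the second position is the offset just past the match, not the offset of its last character.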
One important thing to note is the handling of offsets, both when passed as arguments and when returned as result values. In contrast to Lua, offsets are considered as character offsets, and not byte offsets. Consider for instance the following use of positional captures:
('å1')\umatch(r'()\\d') -- => 2
('å1')\match('()%d') -- => 3
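The difference between 2 and 3 above comes from 'å' occupying two bytes in UTF-8. As a rough illustration in plain Lua (no Howl API involved), the sketch below converts a byte offset returned by a Lua pattern into a character offset by skipping UTF-8 continuation bytes. The helper name char_offset is hypothetical, and the code assumes valid UTF-8 input; it is not how Howl itself does the conversion.

```lua
-- Hypothetical helper: convert a 1-based byte offset into a 1-based
-- character offset, assuming s is valid UTF-8.
local function char_offset(s, byte_pos)
  local chars = 0
  for i = 1, byte_pos do
    local b = s:byte(i)
    -- Bytes in the range 0x80..0xBF are continuation bytes and do not
    -- start a new character, so they are not counted.
    if b < 0x80 or b >= 0xC0 then
      chars = chars + 1
    end
  end
  return chars
end

-- 'å1' written with explicit byte escapes, so the example does not depend
-- on the encoding of the source file ('å' is the bytes 195 165 in UTF-8).
local s = '\195\165' .. '1'

local byte_pos = s:match('()%d')
print(byte_pos)                  -- 3
print(char_offset(s, byte_pos))  -- 2
```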
Another thing to be aware of is how optional captures are handled by the regex
module. This is not supported by Lua patterns, so there’s no corresponding
functionality to contrast with, but consider the following regular expression:
'(1)(\\w)?(2)'
This expression contains three different captures, but the second of these is
not required to match. In the case where such a capture does not actually match,
the corresponding returned match is the empty string, '':
r'(1)(\\w)?(2)'\match '1x2' -- => '1', 'x', '2'
r'(1)(\\w)?(2)'\match '12' -- => '1', '', '2'
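One practical consequence, sketched here in plain Lua with stand-in values instead of a real match call: the empty string is truthy in Lua, so an unmatched optional capture has to be detected by comparing against '' and not by a bare truth test. The values below simply copy the documented result for '12', and the helper name did_capture is hypothetical.

```lua
-- Stand-in for the documented result of matching '12': '1', '', '2'.
local first, middle, last = '1', '', '2'

-- Hypothetical helper: treat an empty string as "this capture did not match".
local function did_capture(value)
  return value ~= nil and value ~= ''
end

print(middle and 'truthy' or 'falsy')  -- truthy
print(did_capture(middle))             -- false
print(did_capture(first))              -- true
```

A side effect of this convention is that a capture that matched zero characters and a capture that did not take part in the match both show up as ''.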
See also:
- The spec for regex
- The “Regular expression syntax” page for GRegex
Properties
pattern
Holds the regular expression string used to construct the regular expression.
r('\\d+').pattern -- => '\\d+'
capture_count
Holds the number of capturing groups in the regular expression:
r('foo(bar)(\\w+)').capture_count -- => 2
Functions
is_instance (v)
Returns true if v is a regular expression instance, and false otherwise.
r (pattern)
Constructs a regular expression from pattern. As noted in the overview,
this is available globally simply as r. Raises an error if pattern is not a
valid regular expression. This function also accepts regular expressions as
parameters, in which case the passed regular expression is returned as is.
Methods
match (string [, init])
Matches the regular expression against string. If init is specified, starts
matching from that position. Has the same semantics as Lua’s string.match,
with the one significant difference that init as well as any returned
positional captures are treated as character offsets.
r'(r\\w+)\\s+(\\S+)':match('red right hand') -- => "red", "right"
r'()red()':match('red') -- => 1, 4
r'\\pL':find('!äö') -- => 2, 2
find (s [, init])
Finds the first match of the regular expression in s, optionally starting at
init if specified. Has the same semantics as Lua’s string.find, with the
one significant difference that init as well as any returned indices are
treated as character offsets.
local s = 'red right hand'
s:ufind(r'(\\w+)', 5) -- => 5, 9, "right"
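For comparison, this is roughly what the stock Lua counterpart looks like, using a Lua pattern in place of a regular expression. Because the sample string is pure ASCII, byte offsets and character offsets coincide, so the result matches the one documented above. Only the standard string library is used; the variable names are illustrative.

```lua
local s = 'red right hand'

-- string.find returns the start index, the end index, and then any captures.
-- The second argument (init) makes the search start at offset 5.
local first, last, word = s:find('(%a+)', 5)

print(first)  -- 5
print(last)   -- 9
print(word)   -- right
```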
gmatch (s)
Returns an iterator function, where each consecutive call returns the next match
of the regular expression in s. Has the same semantics as Lua’s
string.gmatch, with the one significant difference that any positional
captures are returned as character offsets.
[m for m in r'\\w+'\gmatch 'well hello there'] -- => { 'well', 'hello', 'there' }
[p for p in r'()\\s+'\gmatch 'ΘΙΚΛ ΞΟΠ ' ] -- => { 5, 9 }
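The examples above are written as MoonScript comprehensions. As a hedged sketch of the same iteration style in plain Lua, the following uses the standard string.gmatch with Lua patterns and a generic for loop. The second loop uses an ASCII string, where byte and character offsets agree; the sample strings and table names are illustrative.

```lua
-- Collect every run of letters.
local words = {}
for w in ('well hello there'):gmatch('%a+') do
  words[#words + 1] = w
end
print(table.concat(words, ', '))  -- well, hello, there

-- Collect the start position of every run of whitespace,
-- using a positional capture.
local positions = {}
for p in ('abcd efg '):gmatch('()%s+') do
  positions[#positions + 1] = p
end
print(table.concat(positions, ', '))  -- 5, 9
```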