Re ObjectsΒΆ
Re
objects are memoized
for efficiency, so they compile their pattern just once, regardless of how
many times they’re mentioned in a program.
Note that the in
test turns the sense of the matching around (compared
to the standard re
module). It asks “is the given string in the set of
items this pattern describes?” To be fancy, the Re
pattern is an
intensionally defined set (namely “all strings matching the pattern”). This
order often makes excellent sense whey you have a clear intent for the test.
For example, “is the given string within the set of all legitimate
commands?”
Second, the in
test had the side effect of setting the underscore name
_
to the result. Python doesn’t support en passant
assignment–apparently, no matter how hard you try, or how much
introspection you use. This makes it harder to both test and collect results
in the same motion, even though that’s often exactly appropriate. Collecting
them in a class variable is a fallback strategy (see the En Passant
section below for a slicker one).
If you prefer the more traditional re
calls:
if Re(pattern).search(some_string):
print Re._[1]
Re
works even better with named pattern components, which are exposed
as attributes of the returned object:
person = 'John Smith 48'
if person in Re(r'(?P<name>[\w\s]*)\s+(?P<age>\d+)'):
print Re._.name, "is", Re._.age, "years old"
else:
print "don't understand '{}'".format(person)
One trick being used here is that the returned object is not a pure
_sre.SRE_Match
that Python’s re
module returns. Nor is it a subclass.
(That class appears to be unsubclassable.)
Thus, regular expression matches return a proxy object that
exposes the match object’s numeric (positional) and
named groups through indices and attributes. If a named group has the same
name as a match object method or property, it takes precedence. Either
change the name of the match group or access the underlying property thus:
x._match.property
It’s possible also to loop over the results:
for found in Re('pattern (\w+)').finditer('pattern is as pattern does'):
print found[1]
Or collect them all in one fell swoop:
found = Re('pattern (\w+)').findall('pattern is as pattern does')
Pretty much all of the methods and properties one can access from the standard
re
module are available.