Pattern Matching

Pattern matching is a flexible way to search strings for text that fits a given pattern. In Lua, patterns are built from character classes and magic characters, which combine to form anything from simple to complex patterns.

Character classes represent sets of characters that can be used in pattern matching.

%a

Letters '[a-zA-Z]'

%c

Control Characters

%d

Digits '[0-9]'

%l

Lowercase Letters

%p

Punctuation Characters

%s

Whitespace Characters (e.g. space, tab, newline)

%u

Uppercase Letters

%w

Alphanumeric Characters (equivalent to %a and %d combined)

%x

Hexadecimal Digits '[0-9a-fA-F]'

%z

Null Character (Lua 5.1; deprecated in Lua 5.2 and later, where a pattern can contain "\0" directly)
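
As a rough illustration of the classes listed above, the sketch below applies several of them to one string. The variable names and the sample text are made up for this example; each call returns the first substring that the class matches.

```lua
-- Illustrative sample text (not from the original page).
local sample = "Room 42B!"

local letters = string.match(sample, "%a+") -- first run of letters
local digits  = string.match(sample, "%d+") -- first run of digits
local alnum   = string.match(sample, "%w+") -- first run of letters or digits
local upper   = string.match(sample, "%u")  -- first uppercase letter
local lower   = string.match(sample, "%l+") -- first run of lowercase letters
local hex     = string.match(sample, "%x+") -- first run of hexadecimal digits
local punct   = string.match(sample, "%p")  -- first punctuation character
local space   = string.match(sample, "%s")  -- first whitespace character

print(letters, digits, alnum, upper, lower, hex, punct)
-- prints: Room  42  Room  R  oom  42B  !
```

Note that %x+ returns 42B rather than 42, because B is also a valid hexadecimal digit.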

Magic characters are special characters that lose their literal meaning inside a pattern. Instead, each one performs a specific matching function.

[ ]

A character class set allows you to match any one character from the enclosed set.

string.match("abcd", "[beta]")

-- returns a

( )

A capture encloses a sub-pattern within your pattern, and the text matched by each capture is returned as a separate value. A pattern can contain multiple captures.

local field, value = string.match("id: 123456", '(%a+):%s*(%d+)')
-- returns id, 123456
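
A slightly longer sketch of multiple captures, assuming a date written as year-month-day inside some surrounding text. The input string and variable names are invented for illustration. Captures always come back as strings, and a failed match yields nil.

```lua
-- Three captures, one per date component. The hyphens are escaped
-- with % so they are read as literal characters.
local year, month, day =
  string.match("released 2024-03-15", "(%d+)%-(%d+)%-(%d+)")

-- Captures are strings; convert them when a number is needed.
local month_number = tonumber(month)

-- When nothing matches, every expected value is nil.
local a, b = string.match("no date here", "(%d+)%-(%d+)")

print(year, month, day, month_number) -- prints: 2024  03  15  3
print(a, b)                           -- prints: nil  nil
```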

.

Represents any single character.

string.match("abcd", "a..d")
-- returns abcd

%

An escape treats the following character as a literal character instead of a magic character.

string.match("50% off today!", "(%d+)%%")
-- returns 50
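
The sketch below shows what can go wrong without an escape, and one possible way to escape a whole piece of literal text. The helper name escape_pattern is hypothetical, not a built-in. It relies on the fact that any punctuation character may be preceded by % to represent itself.

```lua
-- Escaped: %$ and %. match a literal dollar sign and a literal dot.
local price = string.match("Total: $19.99 (incl. tax)", "%$(%d+%.%d+)")

-- Unescaped: the dot matches any character, so a string with no
-- decimal point still matches.
local loose = string.match("Total: 1999", "%d+.%d+")

-- Hypothetical helper: prefix every punctuation character with %.
-- The outer parentheses drop gsub's second return value (the count).
local function escape_pattern(text)
  return (string.gsub(text, "%p", "%%%0"))
end

local escaped = escape_pattern("50% (approx.)")
local first, last = string.find("save 50% (approx.) now", escaped)

print(price, loose)         -- prints: 19.99  1999
print(escaped, first, last) -- prints: 50%% %(approx%.%)  6  18
```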

Anchors are used to match the start or end of a string:

^

When used at the beginning of a pattern, it forces the pattern to match the start of a string.

When used as the first character inside [ ], ^ negates the set, so the set matches any character that is not listed.

string.match("abcd", "^%a")
-- returns a

string.match("abcd", "[^ad]")
-- returns b

$

When used at the end of a pattern, it forces the pattern to match the end of the string.

string.match("abcd", "%a$")
-- returns d
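
Combining both anchors forces the pattern to cover the entire string, which is a common way to validate input. The following is a minimal sketch; the function name is_integer is made up for this example.

```lua
-- Returns true only if the whole string is an optional minus sign
-- followed by one or more digits.
local function is_integer(text)
  return string.match(text, "^%-?%d+$") ~= nil
end

print(is_integer("123")) -- prints: true
print(is_integer("-42")) -- prints: true
print(is_integer("12a")) -- prints: false
print(is_integer(" 7"))  -- prints: false

-- Without anchors, the same digits are found anywhere in the string.
local partial = string.match("12a", "%d+")
print(partial)           -- prints: 12
```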

Modifiers allow you to specify repetitions:

+

Matches 1 or more repetitions. It will always match the longest possible chain.

string.match("foo 123 bar", "%d+")
-- returns 123

-

Matches 0 or more repetitions. This will always match the shortest possible chain.

When used inside [ ], - instead denotes a range of characters.

string.match("abcd", "abc-")
-- returns ab

string.match("123", "[0-9]")
-- returns 1

*

Matches 0 or more repetitions. This will always match the longest possible chain.

string.match("abcd", "abc*")
-- returns abc

?

Matches 0 or 1 occurrence, making the preceding character optional.

string.match("abc", "abcd?")
-- returns abc
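
The difference between the longest and shortest match is easiest to see side by side. The sketch below uses an invented HTML-like string; only the modifier changes between the first two patterns.

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- * is greedy: it runs to the last > in the string.
local greedy = string.match(html, "<.*>")

-- - is lazy: it stops at the first > it can reach.
local lazy = string.match(html, "<.->")

-- ? makes the u optional, so both spellings match.
local american = string.match("color", "colou?r")
local british  = string.match("colour", "colou?r")

print(greedy)            -- prints: <b>bold</b> and <i>italic</i>
print(lazy)              -- prints: <b>
print(american, british) -- prints: color  colour
```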

The best way to understand how pattern matching works is to try it out, using Lua functions that apply the patterns:

  • string.find - Looks for the first match of the pattern in the string and returns the start and end positions of the match.

  • string.match - Looks for the first match of the pattern in the string and returns the match.
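
As a sketch of the two kinds of lookup described above, the example below runs string.find and string.match with the same pattern. The sample text and variable names are invented for illustration.

```lua
local text = "order #4521 shipped"

-- string.find returns the start and end positions of the first match.
local first, last = string.find(text, "%d+")

-- string.match returns the matched text itself.
local number = string.match(text, "%d+")

-- The positions from string.find select the same text.
local same = string.sub(text, first, last) == number

-- With a capture, string.find returns the positions, then the capture.
local s, e, id = string.find(text, "#(%d+)")

-- Both functions return nil when there is no match.
local missing = string.find(text, "xyz")

print(first, last, number, same) -- prints: 8  11  4521  true
print(s, e, id)                  -- prints: 7  11  4521
print(missing)                   -- prints: nil
```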