regexpPattern
Description
pat = regexpPattern(expression) creates a pattern that matches the regular expression.
pat = regexpPattern(expression,Name,Value) specifies additional options with one or more name-value pair arguments. For example, you can specify 'IgnoreCase' as true to ignore case when matching.
Examples
Combine Patterns and Regular Expressions
Use regexpPattern to specify patterns using regular expressions that can be used as inputs for text-searching functions.
Find words that start with c, end with t, and contain one or more vowels in between.
txt = "bat cat can car coat court CUT ct CAT-scan"; expression = 'c[aeiou]+t';
The regular expression 'c[aeiou]+t' specifies this pattern:
- c must be the first character.
- c must be followed by one of the characters inside the brackets, [aeiou].
- The bracketed pattern must occur one or more times, as indicated by the + operator.
- t must be the last character, with no characters between the bracketed pattern and the t.
Extract the pattern. Note that the words CUT and CAT do not match because they are uppercase.
pat = regexpPattern(expression); extract(txt,pat)
ans = 2x1 string
"cat"
"coat"
Patterns created using regexpPattern can be combined with other pattern functions to create more complicated patterns. Use whitespacePattern and lettersPattern to create a new pattern that also matches words after the regular expression matches, and then extract the new pattern.
pat = regexpPattern(expression) + whitespacePattern + lettersPattern; extract(txt,pat)
ans = 2x1 string
"cat can"
"coat court"
Ignore newline Characters
Create a string containing a newline character. Use the regular expression '.' to match any character except newline characters.
txt = "First Line" + newline + "Second Line"
txt =
    "First Line
     Second Line"
expression = '.+';
The regular expression '.+' matches one or more of any character including newline characters. Count how many times the pattern matches.
pat = regexpPattern(expression); count(txt,pat)
ans = 1
Create a new regular expression pattern, but this time specify DotExceptNewline as true so that the pattern does not match newline characters. Count how many times the pattern matches.
pat = regexpPattern(expression,"DotExceptNewline",true);
count(txt,pat)
ans = 2
Ignore Whitespaces in Expressions When Matching
Create txt as a string.
txt = "Hello World";
The expression '. *' matches a single character followed by zero or more spaces, because the whitespace between . and * is treated as a literal space. Create a pattern to match the regular expression '. *', and then extract the pattern.
expression = '. *';
pat = regexpPattern(expression);
extract(txt,pat)
ans = 10x1 string
"H"
"e"
"l"
"l"
"o "
"W"
"o"
"r"
"l"
"d"
Create a new regular expression pattern, but this time specify FreeSpacing as true to ignore whitespace in the regular expression. Extract the new pattern.
pat = regexpPattern(expression,"FreeSpacing",true);
extract(txt,pat)
ans = "Hello World"
Ignore Case with Regular Expressions
Find words that start with c, end with t, and contain one or more vowels in between, regardless of case.
txt = "bat cat can car coat court CUT ct CAT-scan"; expression = 'c[aeiou]+t';
The regular expression 'c[aeiou]+t' specifies this pattern:
- c must be the first character.
- c must be followed by one of the characters inside the brackets, [aeiou].
- The bracketed pattern must occur one or more times, as indicated by the + operator.
- t must be the last character, with no characters between the bracketed pattern and the t.
Extract the pattern. Note that the words CUT and CAT do not match because they are uppercase.
pat = regexpPattern(expression); extract(txt,pat)
ans = 2x1 string
"cat"
"coat"
Create a new regular expression pattern, but this time specify IgnoreCase as true to ignore case with the regular expression. Extract the new pattern.
pat = regexpPattern(expression,"IgnoreCase",true);
extract(txt,pat)
ans = 4x1 string
"cat"
"coat"
"CUT"
"CAT"
Designate ^ and $ Anchors as Line or Text Anchors
The metacharacters ^ and $ can be used to specify line anchors or text anchors. The behavior that regexpPattern uses is specified by the Anchors option.
Create txt as a string containing newline characters.
txt = "cat" + newline + "bat" + newline + "rat";
The regular expression '^.+?$' matches one or more characters between two anchors. Create a pattern for this regular expression, and specify Anchors as "text" so that the ^ and $ anchors are treated as text anchors. Extract the pattern.
expression = '^.+?$'; pat = regexpPattern(expression,"Anchors","text"); extract(txt,pat)
ans =
    "cat
     bat
     rat"
Create a new regular expression pattern, but this time specify Anchors as "line" so that the ^ and $ anchors are treated as line anchors. Extract the new pattern.
pat = regexpPattern(expression,"Anchors","line"); extract(txt,pat)
ans = 3x1 string
"cat"
"bat"
"rat"
Input Arguments
expression — Regular expression
character vector | cell array of character vectors | string array
Regular expression, specified as a character vector, a cell array of character vectors, or a string array. Each expression can contain characters, metacharacters, operators, tokens, and flags that specify patterns to match in the text.
The following tables describe the elements of regular expressions.
Metacharacters
Metacharacters represent letters, letter ranges, digits, and space characters. Use them to construct a generalized pattern of characters.
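Metacharacter | Description | Example |
---|---|---|
. | Any single character, including white space | '..ain' matches sequences of five consecutive characters that end with 'ain'. |
[c1c2c3] | Any character contained within the square brackets. The following characters are treated literally: $ \| . * + ? and - when not used to indicate a range. | '[rp.]ain' matches 'rain' or 'pain' or '.ain'. |
[^c1c2c3] | Any character not contained within the square brackets. The following characters are treated literally: $ \| . * + ? and - when not used to indicate a range. | '[^*rp]ain' matches all four-letter sequences that end in 'ain', except 'rain', 'pain', and '*ain'. |
[c1-c2] | Any character in the range of c1 through c2 | '[A-G]' matches a single character in the range of A through G. |
\w | Any alphabetic, numeric, or underscore character. For English character sets, \w is equivalent to [a-zA-Z_0-9] | '\w*' matches a word made up of any grouping of alphabetic, numeric, or underscore characters. |
\W | Any character that is not alphabetic, numeric, or underscore. For English character sets, \W is equivalent to [^a-zA-Z_0-9] | '\W+' matches a run of characters that are not alphabetic, numeric, or underscore. |
\s | Any white-space character; equivalent to [ \f\n\r\t\v] | '\w*n\s' matches words that end with the letter n, followed by a white-space character. |
\S | Any non-white-space character; equivalent to [^ \f\n\r\t\v] | '\d\S' matches a numeric digit followed by any non-white-space character. |
\d | Any numeric digit; equivalent to [0-9] | '\d*' matches any number of consecutive digits. |
\D | Any nondigit character; equivalent to [^0-9] | '\w*\D\>' matches words that do not end with a numeric digit. |
\oN or \o{N} | Character of octal value N | '\o{40}' matches the space character, defined by octal 40. |
\xN or \x{N} | Character of hexadecimal value N | '\x2C' matches the comma character, defined by hex 2C. |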
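As a minimal sketch of how metacharacters behave once they are wrapped in a pattern, the following uses the digit class \d and a character range. The sample text and variable names are illustrative assumptions, not part of the reference page.

```matlab
% Illustrative input text (hypothetical, not from the reference page)
txt = "Order 66 shipped on 2020-09-17";

% \d+ matches each run of one or more consecutive digits
digitRuns = extract(txt, regexpPattern('\d+'));

% [A-G] matches a single uppercase letter in the range A through G
rangeHits = extract("MATLAB", regexpPattern('[A-G]'));
```

Here digitRuns should contain "66", "2020", "09", and "17", and rangeHits should contain "A", "A", and "B".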
Character Representation
Operator | Description |
---|---|
\a | Alarm (beep) |
\b | Backspace |
\f | Form feed |
\n | New line |
\r | Carriage return |
\t | Horizontal tab |
\v | Vertical tab |
\char | Any character with special meaning in regular expressions that you want to match literally (for example, use \\ to match a single backslash) |
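The effect of escaping a character with special meaning can be sketched as follows: an unescaped dot matches any character, while an escaped dot matches only a literal period. The sample text is an assumption chosen for illustration.

```matlab
% Illustrative input text (hypothetical)
txt = "a.b axb";

% Unescaped dot: matches any single character between a and b
anyChar = extract(txt, regexpPattern('a.b'));

% Escaped dot: matches only a literal period between a and b
literalDot = extract(txt, regexpPattern('a\.b'));
```

With this input, anyChar should contain both "a.b" and "axb", while literalDot should contain only "a.b".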
Quantifiers
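Quantifiers specify the number of times a pattern must occur in the matching text. expr represents any regular expression.
Quantifier | Number of Times Expression Occurs | Example |
---|---|---|
expr* | 0 or more times consecutively. | '\w*' matches a word of any length. |
expr? | 0 times or 1 time. | '\w*(\.m)?' matches words that optionally end with the extension .m. |
expr+ | 1 or more times consecutively. | '\d+' matches one or more consecutive digits. |
expr{m,n} | At least m times, but no more than n times consecutively. {0,1} is equivalent to ?. | '\S{4,8}' matches between four and eight non-white-space characters. |
expr{m,} | At least m times consecutively. {0,} and {1,} are equivalent to * and +, respectively. | '\w{3,}' matches three or more consecutive word characters. |
expr{n} | Exactly n times consecutively. Equivalent to {n,n}. | '\d{4}' matches four consecutive digits. |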
Quantifiers can appear in three modes, described in the following table. q represents any of the quantifiers in the previous table.
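Mode | Description | Example |
---|---|---|
exprq | Greedy expression: match as many characters as possible. | Given the text '<tr><td><p>text</p></td>', the expression '</?t.*>' matches all characters between <tr and /td>: '<tr><td><p>text</p></td>' |
exprq? | Lazy expression: match as few characters as necessary. | Given the text '<tr><td><p>text</p></td>', the expression '</?t.*?>' ends each match at the first occurrence of the closing angle bracket (>): '<tr>', '<td>', '</td>' |
exprq+ | Possessive expression: match as much as possible, but do not rescan any portions of the text. | Given the text '<tr><td><p>text</p></td>', the expression '</?t.*+>' does not return any matches, because the closing angle bracket is captured using .* and is not rescanned. |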
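The difference between the greedy and lazy modes can be sketched with a short HTML-like string. The sample text and variable names below are illustrative assumptions.

```matlab
% Illustrative input text (hypothetical)
txt = "<tr><td><p>text</p></td>";

% Greedy: .* consumes as much as possible, so one match spans the whole text
greedy = extract(txt, regexpPattern('</?t.*>'));

% Lazy: .*? stops at the first closing angle bracket, giving one match per tag
lazy = extract(txt, regexpPattern('</?t.*?>'));
```

The greedy pattern should return the entire text as a single match, and the lazy pattern should return "<tr>", "<td>", and "</td>" as separate matches. The <p> tags are skipped because the expression requires a t after the optional slash.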
Grouping Operators
Grouping operators allow you to capture tokens, apply one operator to multiple elements, or disable backtracking in a specific group. Tokens are portions of the matched text that you define by enclosing part of the regular expression in parentheses.
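Grouping Operator | Description | Example |
---|---|---|
(expr) | Group elements of the expression and capture tokens. | 'Joh?n\s(\w*)' captures a token that contains the last name of any person with the first name John or Jon. |
(?:expr) | Group, but do not capture tokens. | '(?:[aeiou][^aeiou]){2}' matches two consecutive patterns of a vowel followed by a nonvowel, such as 'anon'. Without grouping, '[aeiou][^aeiou]{2}' matches a vowel followed by two nonvowels. |
(?>expr) | Group atomically. Do not backtrack within the group to complete the match, and do not capture tokens. | 'A(?>.*)Z' does not match 'AtoZ', although 'A(?:.*)Z' does. Using the atomic group, Z is captured using .* and is not rescanned. |
(expr1\|expr2) | Match expression expr1 or expression expr2. If there is a match with expr1, then expr2 is ignored. You can include ?: or ?> after the opening parenthesis to suppress tokens or group atomically. | '(let\|tel)\w+' matches let or tel followed by at least one more word character. |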
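A minimal sketch of non-capturing groups follows: one group applies a quantifier to a two-element sequence, and another limits the scope of an alternation. The input strings are assumptions chosen for illustration.

```matlab
% Grouping applies {2} to the whole vowel-nonvowel pair
grouped = extract("anon", regexpPattern('(?:[aeiou][^aeiou]){2}'));

% Without grouping, {2} applies only to the nonvowel class
ungrouped = extract("anon", regexpPattern('[aeiou][^aeiou]{2}'));

% Alternation inside a group: let or tel, followed by more word characters
words = extract("letter hotel telephone", regexpPattern('(?:let|tel)\w+'));
```

With these inputs, grouped should contain "anon", ungrouped should be empty, and words should contain "letter" and "telephone" ("hotel" is skipped because nothing follows tel).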
Anchors
Anchors in the expression match the beginning or end of the input text or word.
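Anchor | Matches the... | Example |
---|---|---|
^expr | Beginning of the input text. | '^M\w*' matches a word starting with M at the beginning of the text. |
expr$ | End of the input text. | '\w*m$' matches words ending with m at the end of the text. |
\<expr | Beginning of a word. | '\<n\w*' matches any words starting with n. |
expr\> | End of a word. | '\w*e\>' matches any words ending with e. |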
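The word anchors can be sketched as follows, using an illustrative sentence that is an assumption rather than part of the reference page.

```matlab
% Illustrative input text (hypothetical)
txt = "none of the nine new names";

% \< anchors the match to the beginning of a word
startsWithN = extract(txt, regexpPattern('\<n\w*'));

% \> anchors the match to the end of a word
endsWithE = extract(txt, regexpPattern('\w*e\>'));
```

Here startsWithN should contain "none", "nine", "new", and "names", and endsWithE should contain "none", "the", and "nine". The word "names" is excluded from the second result because its e is not at the end of the word.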
Lookaround Assertions
Lookaround assertions look for patterns that immediately precede or follow the intended match, but are not part of the match.
The pointer remains at the current location, and characters that correspond to the test expression are not captured or discarded. Therefore, lookahead assertions can match overlapping character groups.
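Lookaround Assertion | Description | Example |
---|---|---|
expr(?=test) | Look ahead for characters that match test. | '\w*(?=ing)' matches terms that are followed by ing, such as 'Fly' and 'fall' in the input text 'Flying, not falling.' |
expr(?!test) | Look ahead for characters that do not match test. | 'i(?!ng)' matches instances of the letter i that are not followed by ng. |
(?<=test)expr | Look behind for characters that match test. | '(?<=re)\w*' matches terms that follow 're', such as 'new', 'use', and 'cycle' in the input text 'renew, reuse, recycle' |
(?<!test)expr | Look behind for characters that do not match test. | '(?<!\d)(\d)(?!\d)' matches single-digit numbers (digits that do not precede or follow other digits). |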
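If you specify a lookahead assertion before an expression, the operation is equivalent to a logical AND.
Operation | Description | Example |
---|---|---|
(?=test)expr | Match both test and expr. | '(?=[a-z])[^aeiou]' matches consonants. |
(?!test)expr | Match expr and do not match test. | '(?![aeiou])[a-z]' matches consonants. |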
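A minimal sketch of lookahead follows, first as a trailing assertion and then as a leading assertion that acts like a logical AND. The sample inputs are assumptions, and \w+ is used instead of \w* so that the pattern cannot produce empty matches.

```matlab
% Trailing lookahead: word characters that are immediately followed by "ing"
stems = extract("Flying, not falling.", regexpPattern('\w+(?=ing)'));

% Leading lookahead as logical AND: a lowercase letter that is also not a vowel
consonants = extract("regexp", regexpPattern('(?=[a-z])[^aeiou]'));
```

The text "ing" is tested but not consumed, so stems should contain "Fly" and "fall". The second pattern should return "r", "g", "x", and "p".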
Logical and Conditional Operators
Logical and conditional operators enable you to test the state of a given condition, and then use the outcome to determine which pattern, if any, to match next. These operators support logical OR, and if or if/else conditions.
Conditions can be tokens, lookaround operators, or dynamic expressions of the form (?@cmd). Dynamic expressions must return a logical or numeric value.
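Conditional Operator | Description | Example |
---|---|---|
expr1\|expr2 | Match expression expr1 or expression expr2. If there is a match with expr1, then expr2 is ignored. | '(let\|tel)\w+' matches let or tel followed by at least one more word character. |
(?(cond)expr) | If condition cond is true, then match expr. | |
(?(cond)expr1\|expr2) | If condition cond is true, then match expr1. Otherwise, match expr2. | |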
Token Operators
Tokens are portions of the matched text that you define by enclosing part of the regular expression in parentheses. You can refer to a token by its sequence in the text (an ordinal token), or assign names to tokens for easier code maintenance and readable output.
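Ordinal Token Operator | Description | Example |
---|---|---|
(expr) | Capture in a token the characters that match the enclosed expression. | 'Joh?n\s(\w*)' captures a token that contains the last name of any person with the first name John or Jon. |
Named Token Operator | Description | Example |
---|---|---|
(?<name>expr) | Capture in a named token the characters that match the enclosed expression. | '(?<month>\d+)-(?<day>\d+)-(?<yr>\d+)' creates named tokens for the month, day, and year in an input date of the form mm-dd-yy. |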
Note
If an expression has nested parentheses, MATLAB® captures tokens that correspond to the outermost set of parentheses. For example, given the search pattern '(and(y|rew))', MATLAB creates a token for 'andrew' but not for 'y' or 'rew'.
Comments
Characters | Description | Example |
---|---|---|
(?#comment) | Insert a comment in the regular expression. The comment text is ignored when matching the input. | '(?# Initial digit)\<\d\w+' includes a comment, and matches words that begin with a number. |
Search Flags
Search flags modify the behavior for matching expressions. An alternative to using a search flag within an expression is to pass an option input argument.
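Flag | Description |
---|---|
(?-i) | Match letter case (default for regexp and regexprep). |
(?i) | Do not match letter case (default for regexpi). |
(?s) | Match dot (.) in the pattern with any character (default). |
(?-s) | Match dot in the pattern with any character that is not a newline character. |
(?-m) | Match the ^ and $ metacharacters at the beginning and end of text (default). |
(?m) | Match the ^ and $ metacharacters at the beginning and end of a line. |
(?-x) | Include space characters and comments when matching (default). |
(?x) | Ignore space characters and comments when matching. Use '\ ' and '\#' to match space and # characters. |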
The expression that the flag modifies can appear either after the parentheses, such as
(?i)\w*
or inside the parentheses and separated from the flag with a colon (:), such as
(?i:\w*)
The latter syntax allows you to change the behavior for part of a larger expression.
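The two flag placements, and the equivalent option argument, can be sketched as follows. This assumes that regexpPattern accepts inline search flags in the same way as the flags listed above; the sample texts are illustrative.

```matlab
% Illustrative input text (hypothetical)
txt = "bat cat CUT ct";

% Inline flag: (?i) applies to the expression that follows it
inlineFlag = extract(txt, regexpPattern('(?i)c[aeiou]+t'));

% Equivalent option argument instead of an inline flag
optionArg = extract(txt, regexpPattern('c[aeiou]+t', "IgnoreCase", true));

% Scoped flag: only the leading c is case insensitive
scoped = extract("cat Cat CAT", regexpPattern('(?i:c)at'));
```

Both inlineFlag and optionArg should contain "cat" and "CUT". The scoped form should match "cat" and "Cat" but not "CAT", because the trailing at remains case sensitive.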
Data Types: char | cell | string
Note
regexpPattern does not support back references, conditions based on back references, or dynamic regular expressions.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'DotExceptNewline',true,'FreeSpacing',false
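The two name-value syntaxes can be sketched side by side. The sample text and variable names are assumptions, and the Name=Value form requires R2021a or later.

```matlab
% Illustrative text: "m" and "ma" separated by a newline (hypothetical)
txt = "m" + newline + "ma";

% Name=Value syntax (R2021a and later)
patNew = regexpPattern('m.', DotExceptNewline=true, FreeSpacing=false);

% Quoted, comma-separated syntax (works in all releases that have regexpPattern)
patOld = regexpPattern('m.', 'DotExceptNewline', true, 'FreeSpacing', false);

% Both patterns should behave identically
nNew = count(txt, patNew);
nOld = count(txt, patOld);
```

With DotExceptNewline set to true, the dot cannot match the newline after the first m, so both counts should be 1 (only "ma" matches).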
DotExceptNewline — Dot matching of newline characters
false or 0 (default) | true or 1
Dot matching of newline characters, specified as the comma-separated pair consisting of 'DotExceptNewline' and a logical scalar. Set this option to 1 (true) to omit newline characters from dot matching.
Example: pat = regexpPattern('m.','DotExceptNewline',true)
FreeSpacing — Matching white space
false or 0 (default) | true or 1
Matching white space character, specified as the comma-separated pair consisting of 'FreeSpacing' and a logical scalar. Set this option to 1 (true) to omit whitespace characters and comments when matching.
Example: pat = regexpPattern('m.','FreeSpacing',false)
IgnoreCase — Ignore case when matching
false or 0 (default) | true or 1
Ignore case when matching, specified as the comma-separated pair consisting of 'IgnoreCase' and a logical scalar. Set this option to 1 (true) to match regardless of case.
Example: pat = regexpPattern('m.','IgnoreCase',true)
Anchors — Metacharacter treatment
'text' (default) | 'line'
Metacharacter treatment, specified as the comma-separated pair consisting of 'Anchors' and one of these values:
Value | Description |
---|---|
'text' | Treat the metacharacters ^ and $ as text anchors. This anchors regular expression matches to the beginning or end of text, which might span multiple lines. |
'line' | Treat the metacharacters ^ and $ as line anchors. This anchors regular expression matches to the beginning or end of lines in the text. This option is useful when you have multiline text and do not want matches to span multiple lines. |
Example: pat = regexpPattern('\d+','Anchors','line')
Output Arguments
pat — Pattern expression
pattern object
Pattern expression, returned as a pattern object.
Extended Capabilities
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
Version History
Introduced in R2020b