Course Summary
Let's summarize the concepts we've learned so far.
Introduction to regular expressions
- A regular expression, also known as regex or regexp, is a pattern of characters we want to match or search for in a text.
- Regular expressions are widely used in Unix operating systems, text editors, programming languages, and various domains like bioinformatics and lexical analysis.
- A regular expression engine scans the input string from left to right, looking for matches of the given pattern.
- Matching an input text against a pattern lets us perform operations based on the results, such as parsing useful information, finding and replacing text, splitting a string, and extracting data.
The `String` class and its methods
- A string is a sequence of Unicode characters. In C#, a string literal is enclosed in double quotes.
- A string literal prefixed with `@` in C# is a verbatim string literal, in which escape sequences are not processed. For example, the string `"C:\\temp"` can also be written as `@"C:\temp"`.
- The `string` keyword in C# is an alias for the `System.String` class in the .NET Framework. It provides various methods and properties to work with strings.
- Strings are immutable sequences of `System.Char` objects.
- The `Concat()` method of the `String` class concatenates two or more strings.
- The `String` class itself has no `Match()` method. To match a regular expression against a string, we use the `Match()` method of the `Regex` class.
- The `Replace()` method of the `String` class returns a new string in which every occurrence of a given substring is replaced with another substring.
- The `Split()` method of the `String` class splits a string into multiple substrings based on the separator characters in an array.
- The `Substring()` method of the `String` class extracts a part of a string.
- The `Contains()` method of the `String` class checks whether a string contains a given substring.
- The `StartsWith()` and `EndsWith()` methods of the `String` class check whether a string starts with or ends with a given substring.
- The `IndexOf()` method of the `String` class finds the zero-based index of the first occurrence of a given character or substring in a string, or returns -1 if it is not found.
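
The `String` methods summarized above can be tried out with a short program. This is a minimal sketch written as C# top-level statements (assuming C# 9 or later); the sample strings and variable names are illustrative and not taken from the course.

```csharp
using System;

// A verbatim literal: the backslash is not treated as an escape character.
string path = @"C:\temp";

// Concat joins strings; Replace returns a new string (strings are immutable).
string greeting = string.Concat("Hello", ", ", "World");
string replaced = greeting.Replace("World", "C#");

// Split on any of the separator characters in the array.
string[] parts = "a,b;c".Split(new[] { ',', ';' });

// Substring(startIndex, length) extracts part of a string.
string sub = greeting.Substring(7, 5);

// Containment and prefix/suffix checks.
bool contains = greeting.Contains("World");
bool starts = greeting.StartsWith("Hello", StringComparison.Ordinal);
bool ends = greeting.EndsWith("World", StringComparison.Ordinal);

// IndexOf returns the zero-based position, or -1 when nothing is found.
int index = greeting.IndexOf("World", StringComparison.Ordinal);
int missing = greeting.IndexOf("Regex", StringComparison.Ordinal);

Console.WriteLine($"{path} | {greeting} | {replaced} | {string.Join("/", parts)}");
Console.WriteLine($"{sub} | {contains} | {starts} | {ends} | {index} | {missing}");
```

Because strings are immutable, `greeting` is unchanged after the call to `Replace()`; the result has to be captured in a new variable.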
Regular expression APIs in C#
- The `System.Text.RegularExpressions` namespace of the .NET Framework provides a set of classes and methods to create, match, and manipulate regular expressions.
- The `Regex` class is the primary type of the regular expressions API. It provides both static and instance methods, along with properties, to work with regular expressions.
- The `Match` class represents the result of a single regular expression match. It contains information about the match, such as the value of the matched string and its start index and length within the input string.
- The `MatchCollection` class contains a collection of `Match` objects.
- The `Group` class represents a matching subexpression within a regular expression match.
- The `GroupCollection` class contains a collection of `Group` objects that represent all the captured groups within a single regular expression match.
- The `Match()` method of the `Regex` class searches a string for the first occurrence of a regular expression. This method returns a `Match` object that contains information about the match.
- The `Matches()` method of the `Regex` class finds all the matches of a regular expression in a string. This method returns a `MatchCollection` of `Match` objects that contain information about all the matches.
- The `Replace()` method of the `Regex` class replaces the substrings that match a regular expression with another string.
- The `Split()` method of the `Regex` class splits a string into an array of substrings. This method splits the input string at the positions that match the regular expression.
- The `IsMatch()` method of the `Regex` class checks whether a regular expression finds a match in a given string. This method returns `true` if it does. Otherwise, it returns `false`.
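
To tie the `Regex` methods above together, here is a minimal sketch written as C# top-level statements (assuming C# 9 or later). The input strings and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Order 66 shipped, order 1138 pending";
var regex = new Regex(@"\d+");   // one or more digits

// IsMatch: does the pattern occur anywhere in the input?
bool hasNumber = regex.IsMatch(input);

// Match: the first occurrence, with its value, index, and length.
Match first = regex.Match(input);

// Matches: every occurrence, as a MatchCollection.
MatchCollection all = regex.Matches(input);

// Replace: substitute every matched substring.
string masked = regex.Replace(input, "#");

// Split (static overload): cut the input wherever the pattern matches.
string[] pieces = Regex.Split("a1b22c", @"\d+");

Console.WriteLine($"{hasNumber} | {first.Value} at {first.Index}, length {first.Length}");
Console.WriteLine($"{all.Count} matches | {masked} | {string.Join("/", pieces)}");
```

The sketch uses an instance of `Regex` for most calls and the static `Regex.Split()` for the last one; both styles are available for these methods.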
Special characters in regular expressions
- The dot (`.`) character matches any single character except the newline character (`\n`).
- The caret (`^`) character matches the start of the input string.
- The dollar sign (`$`) character matches the end of the input string (or the position just before a final newline).
- A pair of square brackets (`[]`) represents a character class.
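
The behavior of these special characters can be checked with a few `Regex.IsMatch()` calls. This is a small sketch using C# top-level statements (assuming C# 9 or later); the test words are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

// The dot matches any single character...
bool dotMatchesLetter = Regex.IsMatch("cat", "^c.t$");
// ...but not a newline (with the default options).
bool dotMatchesNewline = Regex.IsMatch("c\nt", "^c.t$");

// ^ anchors the match to the start of the input, $ to the end.
bool startsWithCat = Regex.IsMatch("concat", "^cat");
bool endsWithCat = Regex.IsMatch("concat", "cat$");

// Square brackets match one character out of a set.
bool vowelInMiddle = Regex.IsMatch("cut", "^c[aou]t$");

Console.WriteLine($"{dotMatchesLetter} {dotMatchesNewline} {startsWithCat} {endsWithCat} {vowelInMiddle}");
```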
Character classes
- A character class matches any single character from the set enclosed within the square brackets. For example, `[abc]` matches `a`, `b`, or `c`.
- A character class can also include a range of characters, represented by two characters separated by a hyphen (`-`). For example, `[A-Z]` matches any uppercase letter from `A` to `Z`.
Meta characters
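- The backslash (`\`) symbol is a metacharacter that, combined with certain letters, represents various predefined character classes.
- `\s` matches any whitespace character (such as space, tab, carriage return, newline, and form feed).
- `\d` matches any decimal digit (`0` to `9`; in .NET, also decimal digits from other Unicode scripts unless `RegexOptions.ECMAScript` is set).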
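
Character classes, ranges, and the predefined classes `\s` and `\d` can be compared side by side. The following is a minimal sketch in C# top-level statements (assuming C# 9 or later), with made-up sample strings.

```csharp
using System;
using System.Text.RegularExpressions;

// A set: matches exactly one of the listed characters.
bool inSet = Regex.IsMatch("b", "^[abc]$");
bool notInSet = Regex.IsMatch("x", "^[abc]$");

// A range: count the uppercase letters A to Z.
int upperCount = Regex.Matches("Room A7", "[A-Z]").Count;

// \d: the first digit in the input.
string firstDigit = Regex.Match("abc123", @"\d").Value;

// \s: replace each whitespace character (space, tab, newline) with an underscore.
string noWhitespace = Regex.Replace("a b\tc\nd", @"\s", "_");

Console.WriteLine($"{inSet} {notInSet} {upperCount} {firstDigit} {noWhitespace}");
```

The patterns containing a backslash are written as verbatim strings (`@"..."`) so that the backslash reaches the regular expression engine unchanged.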
Quantifiers
- The symbols `?`, `*`, and `+` are used as quantifiers.
- `X?` matches zero or one occurrence of `X`.
- `X*` matches zero or more occurrences of `X`.
- `X+` matches one or more occurrences of `X`.
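
The difference between the three quantifiers shows up when the quantified character is absent or repeated. This is a short sketch in C# top-level statements (assuming C# 9 or later); the words are arbitrary examples.

```csharp
using System;
using System.Text.RegularExpressions;

// u? : zero or one "u"
bool american = Regex.IsMatch("color", "^colou?r$");
bool british = Regex.IsMatch("colour", "^colou?r$");
bool doubled = Regex.IsMatch("colouur", "^colou?r$");

// b* : zero or more "b"
bool starZero = Regex.IsMatch("ac", "^ab*c$");
bool starMany = Regex.IsMatch("abbbc", "^ab*c$");

// b+ : one or more "b"
bool plusZero = Regex.IsMatch("ac", "^ab+c$");
bool plusMany = Regex.IsMatch("abbbc", "^ab+c$");

Console.WriteLine($"{american} {british} {doubled} {starZero} {starMany} {plusZero} {plusMany}");
```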
RegexOptions
- The `RegexOptions` enumeration controls how regular expression operations are performed.
- We can include one or more values from the `RegexOptions` enumeration in a bitwise combination by using the OR (`|`) operator. For example, if we want to perform case-insensitive and culture-insensitive matches, we use the value `RegexOptions.IgnoreCase | RegexOptions.CultureInvariant`.
- We can pass the value of the `RegexOptions` enumeration as an argument to methods that expect options. For example, we can specify the options when constructing a `Regex` object.
- `RegexOptions.CultureInvariant` specifies that cultural differences in language are ignored.
- `RegexOptions.IgnoreCase` specifies that the regular expression is case-insensitive.
- `RegexOptions.Multiline` changes `^` and `$` so that they match at the start and end of each line, not only at the start and end of the whole input string.
- `RegexOptions.Singleline` specifies that the `.` character matches all characters, including newline characters.
- `RegexOptions.IgnorePatternWhitespace` specifies that unescaped white space in the regular expression pattern is ignored.
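
The effect of each option is easiest to see by running the same pattern with and without it. The following is a minimal sketch in C# top-level statements (assuming C# 9 or later); the inputs are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Options combined with | and passed to the Regex constructor.
var insensitive = new Regex("hello", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
bool withoutIgnoreCase = Regex.IsMatch("HELLO", "hello");
bool withIgnoreCase = insensitive.IsMatch("HELLO");

// Multiline: ^ matches at the start of every line, not only the whole input.
string lines = "one\ntwo\nthree";
int defaultStarts = Regex.Matches(lines, "^[a-z]+").Count;
int multilineStarts = Regex.Matches(lines, "^[a-z]+", RegexOptions.Multiline).Count;

// Singleline: the dot also matches a newline.
bool dotDefault = Regex.IsMatch("a\nb", "a.b");
bool dotSingleline = Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline);

// IgnorePatternWhitespace: the spaces in the pattern are not matched literally.
bool spacedDefault = Regex.IsMatch("2024-01", @"\d+ - \d+");
bool spacedIgnored = Regex.IsMatch("2024-01", @"\d+ - \d+", RegexOptions.IgnorePatternWhitespace);

Console.WriteLine($"{withoutIgnoreCase} {withIgnoreCase} {defaultStarts} {multilineStarts}");
Console.WriteLine($"{dotDefault} {dotSingleline} {spacedDefault} {spacedIgnored}");
```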
Working with capture groups
- Groups, specified by parentheses (`()`), subdivide the match found by a regular expression.
- We can access groups using the `Groups` property of the `Match` object.
- The `Value` property of the `Group` object contains the value of the group that is matched.
- The `Success` property indicates whether the group took part in the match.
- The `Index` property of the `Group` object contains the zero-based position in the input string where the matched group starts.
- The `Length` property of the `Group` object contains the length of the matched group.
- The captured groups are numbered starting from 1. Group 0 always represents the entire match.
- In a replacement string, `$n` denotes the nth captured group, where n is the number of the captured group.
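
Here is a small sketch of reading captured groups and reusing them in a replacement string, written as C# top-level statements (assuming C# 9 or later). The date format and sample text are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"(\d+)-(\d+)-(\d+)";   // three numbered groups
Match m = Regex.Match("Date: 2024-05-17", pattern);

// Group 0 is the whole match; groups 1..3 are the parenthesized parts.
string whole = m.Groups[0].Value;
Group year = m.Groups[1];
string month = m.Groups[2].Value;
string day = m.Groups[3].Value;

Console.WriteLine($"{whole}: year={year.Value} (index {year.Index}, length {year.Length}, success {year.Success})");
Console.WriteLine($"month={month}, day={day}, group count={m.Groups.Count}");

// $1, $2, $3 refer to the captured groups inside a replacement string.
string reordered = Regex.Replace("2024-05-17", pattern, "$3/$2/$1");
Console.WriteLine(reordered);
```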
Working with backreferences
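- Backreferences let us reuse previously matched substrings within a regular expression pattern.
- `\n` denotes the nth captured group, where n is the number of the captured group (for example, `\1`). In a regular C# string literal, the backslash must be escaped, as in `"\\1"`; in a verbatim string, we write `@"\1"`.
- In a replacement string, `$&` denotes the entire match.
- In a replacement string, `` $` `` denotes the part of the string before the match.
- In a replacement string, `${name}` denotes the value of the named captured group `name`.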
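
The sketch below illustrates a backreference inside a pattern and the substitution elements listed above, using C# top-level statements (assuming C# 9 or later). The group names `first` and `last` and the sample inputs are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// \1 inside the pattern: the same digit that group 1 captured, repeated.
string doubledDigit = Regex.Match("12334", @"(\d)\1").Value;

// $& in the replacement: the entire match.
string bracketed = Regex.Replace("cat", "a", "[$&]");

// $` in the replacement: the text of the input before the match.
string beforeMatch = Regex.Replace("cat", "a", "$`");

// ${name} in the replacement: the value of a named group.
string swapped = Regex.Replace(
    "John Smith",
    @"(?<first>[A-Za-z]+) (?<last>[A-Za-z]+)",
    "${last}, ${first}");

Console.WriteLine($"{doubledDigit} | {bracketed} | {beforeMatch} | {swapped}");
```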
Advanced topics
- Regex patterns are often used to search for sensitive data, such as credit card numbers or social security numbers. We must make sure not to accidentally store or log this sensitive data.
- A malicious user might try to submit a string that causes our regular expression to take a long time to process. This is called a regular expression denial of service (ReDoS) attack.
- We can mitigate such attacks by specifying a match timeout, either in the `Regex` constructor or in the static methods that accept a `matchTimeout` argument. The `MatchTimeout` property exposes the configured interval, and a `RegexMatchTimeoutException` is thrown when an operation exceeds it.
- We should always use the simplest regular expressions that match the patterns we look for to ensure good performance.
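
A timeout can be attached to a `Regex` when it is constructed. This is a minimal sketch in C# top-level statements (assuming C# 9 or later); the nested-quantifier pattern, the 200 ms limit, and the input are assumptions chosen to illustrate the idea, and whether the timeout actually fires depends on the runtime's regex engine.

```csharp
using System;
using System.Text.RegularExpressions;

// Nested quantifiers like (\d+)+ are a classic source of excessive backtracking.
var regex = new Regex(@"^(\d+)+$", RegexOptions.None, TimeSpan.FromMilliseconds(200));

// A harmless input is handled normally.
bool safeResult = regex.IsMatch("12345");

// A long run of digits followed by a non-digit can force heavy backtracking.
string suspicious = new string('1', 30) + "x";
bool matched = false;
bool timedOut = false;
try
{
    matched = regex.IsMatch(suspicious);
}
catch (RegexMatchTimeoutException)
{
    // The operation exceeded the configured limit and was aborted.
    timedOut = true;
}

Console.WriteLine($"timeout={regex.MatchTimeout.TotalMilliseconds} ms, safe={safeResult}, matched={matched}, timedOut={timedOut}");
```

Catching `RegexMatchTimeoutException` lets the application reject the input instead of blocking on it; simplifying the pattern (here, `^\d+$` would do) removes the problem at its source.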
Congratulations
Congratulations on finishing this course! The lessons you’ve learned here will be invaluable as you continue to make more complex and practical C# applications.
Thanks for enrolling in this course, and good luck with your next steps as a programmer!