Advanced Regex in C#: Complex Pattern Matching and Real-World Examples
Delve into advanced regex use cases in C#, including complex pattern matching like extracting HTML tags. Learn through practical examples and enhance your text manipulation skills with powerful regular expressions.
Complex Pattern Matching
While simple patterns are useful for basic text manipulation, real-world scenarios often demand more intricate patterns.
Example: Extracting HTML Tags
Syntax
using System;
using System.Text.RegularExpressions;
string pattern = @"<(\w+)>(.*?)</\1>";
string input = "<p>This is a paragraph.</p><span>This is a span.</span>";
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
foreach (Match match in matches)
{
    // Group 1 holds the tag name, Group 2 the text between the tags.
    Console.WriteLine("Tag: {0}, Content: {1}",
        match.Groups[1].Value, match.Groups[2].Value);
}
Output
Tag: p, Content: This is a paragraph.
Tag: span, Content: This is a span.
Explanation:
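- (\w+): Captures one or more word characters as the tag name (Group 1).
- (.*?): Non-greedy capture of the content between the tags (Group 2), so each match stops at the first closing tag with the same name.
- \1: Backreference to the captured tag name, so the closing tag must match the opening one.
This pattern only handles simple tags without attributes, and because . does not match a newline by default, it misses content that spans lines. For real HTML, an HTML parser is more reliable.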
Example: Parsing Log Files
Syntax
string pattern = @"^(?\d{4}-\d{2}-\d{2})\s+(?
Output
Date: 2023-11-22, Time: 12:34:56, Level: INFO, Message: Application started
Explanation:
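- Named capture groups, written (?<name>pattern), extract specific parts of the log entry. Each part can then be read by name with match.Groups["name"].Value instead of by number.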
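The code for this example is incomplete. Below is a minimal sketch that produces the output shown, assuming log lines of the form "2023-11-22 12:34:56 INFO Application started". The group names date, time, level, and message are illustrative choices, not necessarily the original ones.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed log format: "yyyy-MM-dd HH:mm:ss LEVEL message".
// Group names (date, time, level, message) are illustrative.
string pattern = @"^(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<level>[A-Z]+)\s+(?<message>.*)$";
string line = "2023-11-22 12:34:56 INFO Application started";

Match match = Regex.Match(line, pattern);
if (match.Success)
{
    // Named groups are looked up by name rather than by index.
    Console.WriteLine("Date: {0}, Time: {1}, Level: {2}, Message: {3}",
        match.Groups["date"].Value, match.Groups["time"].Value,
        match.Groups["level"].Value, match.Groups["message"].Value);
}
```

Reading groups by name keeps the code correct even if groups are later added or reordered in the pattern.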
Lookarounds
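Lookarounds assert that a pattern does, or does not, appear immediately ahead of the current position (lookahead: (?=...) and (?!...)) or immediately behind it (lookbehind: (?<=...) and (?<!...)), without consuming any characters.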
Example: Matching Words Ending with "ing"
Syntax
using System;
using System.Text.RegularExpressions;
string pattern = @"\b\w+ing\b";
string input = "singing dancing walking";
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
foreach (Match match in matches)
{
    Console.WriteLine(match.Value);
}
Output
singing
dancing
walking
Explanation:
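- \b: A word boundary, which ensures that only complete words are matched. Like a lookaround, \b is a zero-width assertion: it checks a condition without consuming characters. The pattern itself, however, contains no lookahead or lookbehind.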
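Because the pattern above does not use a lookaround, here is a minimal sketch that does, on the same input. The patterns are illustrative: a lookahead (?=ing\b) requires the word to end in "ing" without including it in the match, and a lookbehind (?<=\bwalk) accepts "ing" only when it directly follows "walk".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "singing dancing walking";

// Lookahead: "ing" must follow, but it is not consumed, so only the stem is returned.
string[] stems = Regex.Matches(input, @"\b\w+(?=ing\b)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Lookbehind: match "ing" only where it is immediately preceded by the word "walk".
Match afterWalk = Regex.Match(input, @"(?<=\bwalk)ing\b");

Console.WriteLine(string.Join(", ", stems)); // sing, danc, walk
Console.WriteLine(afterWalk.Index);          // 20
```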
Balancing Groups
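For nested structures such as parentheses or brackets, .NET supports balancing groups. A named group like (?<Open>...) pushes each capture onto that group's capture stack, and a balancing group like (?<-Open>...) pops one, so a pattern can check that every opener has a matching closer. Standard regular expressions cannot express this kind of nesting.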
Example: Matching Balanced Parentheses
(A complete treatment requires advanced regex knowledge and is beyond the scope of this article.)
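As a starting point, here is a minimal sketch of balancing groups for parentheses only. The group names Open and Depth are arbitrary, and the sample strings are illustrative. The first pattern checks that a whole string is balanced; the second extracts each outermost parenthesized group, including any nested groups inside it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Whole-string check:
//   (?<Open>\()   pushes a capture onto the "Open" stack for each '('
//   (?<-Open>\))  pops one capture for each ')', failing if the stack is empty
//   (?(Open)(?!)) fails the match if any '(' is still unclosed at the end
string balanced = @"^(?:[^()]|(?<Open>\()|(?<-Open>\)))*(?(Open)(?!))$";

bool ok1 = Regex.IsMatch("(a(b)c)", balanced); // balanced
bool ok2 = Regex.IsMatch("(a(b)c", balanced);  // one '(' left open
bool ok3 = Regex.IsMatch("a)b(", balanced);    // ')' before any '('

// Extraction: an empty (?<Depth>) capture is pushed per inner '(' and popped per inner ')'.
string outer = @"\((?>[^()]+|\((?<Depth>)|\)(?<-Depth>))*(?(Depth)(?!))\)";
string[] groups = Regex.Matches("f(a, g(b, c)) + h(d)", outer)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine($"{ok1} {ok2} {ok3}");       // True False False
Console.WriteLine(string.Join(" | ", groups)); // (a, g(b, c)) | (d)
```

Extending this to brackets or braces means adding one push/pop pair per delimiter type, which quickly becomes hard to read; for deeply structured input, a small hand-written parser is often clearer.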
Performance Optimization
For large input strings or complex patterns, consider the following:
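- Compiling the regex: Use the RegexOptions.Compiled flag. Compilation adds a one-time startup cost but speeds up each match, so it pays off for patterns that are reused many times.
- Using indexes: If you know the approximate position of the match, pass a starting index (for example, regex.Match(input, startat)) so the engine skips the text before it.
- Breaking down complex patterns: Simplify patterns, and avoid nested quantifiers such as (a+)+, which can cause catastrophic backtracking on inputs that almost match.
- Profiling: Measure your regex operations to find the actual bottlenecks before optimizing.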
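A short sketch combining the first two tips, assuming a pattern that is reused many times. The date pattern, sample text, and one-second timeout are illustrative values; the timeout is an optional safeguard against runaway backtracking.

```csharp
using System;
using System.Text.RegularExpressions;

// Construct once and reuse: RegexOptions.Compiled pays a one-time compilation
// cost in exchange for faster matching on later calls.
// If a match runs longer than the (illustrative) one-second timeout,
// a RegexMatchTimeoutException is thrown instead of the call hanging.
var datePattern = new Regex(@"\d{4}-\d{2}-\d{2}",
    RegexOptions.Compiled, TimeSpan.FromSeconds(1));

string text = "header 1999-01-01 | body: 2023-11-22";

// The date we want is known to be in the body, so start scanning there
// and let the engine skip the header entirely.
int bodyStart = text.IndexOf("body:", StringComparison.Ordinal);
Match bodyDate = datePattern.Match(text, bodyStart);

Console.WriteLine(bodyDate.Success ? bodyDate.Value : "no match"); // 2023-11-22
```

Without the start index, the same call would return the first date in the header, 1999-01-01.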
Additional Use Cases
- Data validation: Email addresses, phone numbers, credit card numbers.
- Text extraction: Extracting specific information from documents.
- Code analysis: Finding code patterns, refactoring.
- Security: Detecting potential vulnerabilities in code or data.
- Natural language processing: Tokenization, stemming, lemmatization.
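For data validation, the whole input must match, not just a substring. A minimal sketch, assuming a deliberately loose email shape check (one '@', no whitespace, a dot in the domain); it is illustrative and not a complete email validator. It anchors with \A and \z rather than ^ and $, because in .NET $ also matches just before a trailing newline.

```csharp
using System;
using System.Text.RegularExpressions;

// Loose, illustrative shape check only: NOT a complete email validator.
// \A and \z pin the match to the entire string; \z rejects a trailing '\n'.
var emailShape = new Regex(@"\A[^@\s]+@[^@\s]+\.[^@\s]+\z");

string[] candidates =
{
    "user@example.com",
    "no-at-sign.com",
    "two@@example.com",
    "user@example.com\n",
};

bool[] results = Array.ConvertAll(candidates, c => emailShape.IsMatch(c));

for (int i = 0; i < candidates.Length; i++)
{
    // Show the newline escaped so the last candidate is distinguishable.
    Console.WriteLine($"{candidates[i].Replace("\n", "\\n")}: {results[i]}");
}
```

With ^ and $ instead, the last candidate would pass, which is a common source of validation bugs.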