Advanced Regex in C#: Complex Pattern Matching and Real-World Examples

Delve into advanced regex use cases in C#, including complex pattern matching like extracting HTML tags. Learn through practical examples and enhance your text manipulation skills with powerful regular expressions.



Complex Pattern Matching

While simple patterns are useful for basic text manipulation, real-world scenarios often demand more intricate ones.

Example: Extracting HTML Tags

Syntax

using System;
using System.Text.RegularExpressions;

string pattern = @"<(\w+)>(.*?)</\1>";
string input = "<p>This is a paragraph.</p><span>This is a span.</span>";
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);

foreach (Match match in matches)
{
    // Group 1 holds the tag name, Group 2 the content between the tags.
    Console.WriteLine("Tag: {0}, Content: {1}", match.Groups[1].Value, match.Groups[2].Value);
}
Output

Tag: p, Content: This is a paragraph.
Tag: span, Content: This is a span.
Explanation:
  • (\w+): Captures one or more word characters as the tag name (Group 1).
  • (.*?): Non-greedy capture of content between tags (Group 2).
  • \1: Backreference to the captured tag name, so the closing tag must repeat the opening tag's name.
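The pattern above matches only bare tags such as <p>. A tag with attributes (for example <p class="intro">) is skipped, and content that spans lines is missed because . does not match \n by default. Here is a minimal sketch of a relaxed variant, assuming simple, non-nested markup. ExtractTags and the sample string are hypothetical names for illustration. For real-world HTML, a dedicated parser remains the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns (tag, content) pairs for simple, non-nested elements.
//   <(\w+)\b[^>]*>  opening tag name, then any attributes up to '>'
//   (.*?)           lazily captured content
//   </\1>           closing tag that must repeat the captured name
static List<(string Tag, string Content)> ExtractTags(string html)
{
    var results = new List<(string Tag, string Content)>();
    // Singleline lets '.' match '\n', so multi-line content is captured too.
    var regex = new Regex(@"<(\w+)\b[^>]*>(.*?)</\1>", RegexOptions.Singleline);
    foreach (Match m in regex.Matches(html))
    {
        results.Add((m.Groups[1].Value, m.Groups[2].Value));
    }
    return results;
}

var tags = ExtractTags("<p class=\"intro\">Hello,\nworld</p><a href=\"/x\">link</a>");
foreach (var (tag, content) in tags)
{
    Console.WriteLine($"Tag: {tag}, Content: {content}");
}
```

The \b after the tag name is there so that the name capture always ends at a word boundary before any attributes begin.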

Example: Parsing Log Files

Syntax

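using System;
using System.Text.RegularExpressions;

// Assumed log line format: "yyyy-MM-dd HH:mm:ss LEVEL message"
string pattern = @"^(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<level>\w+)\s+(?<message>.*)$";
string input = "2023-11-22 12:34:56 INFO Application started";
Match match = Regex.Match(input, pattern);

if (match.Success)
{
    Console.WriteLine("Date: {0}, Time: {1}, Level: {2}, Message: {3}",
        match.Groups["date"].Value, match.Groups["time"].Value,
        match.Groups["level"].Value, match.Groups["message"].Value);
}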
Output

Date: 2023-11-22, Time: 12:34:56, Level: INFO, Message: Application started
Explanation:
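  • Named capture groups (?<name>...) extract specific parts of the log entry. Each part can be read by name through the Match object's Groups["name"] indexer rather than by position.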
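When a whole log file is read into one string, RegexOptions.Multiline makes ^ and $ match at every line boundary, so Matches returns one Match per entry. This is a sketch that assumes space-separated "date time level message" lines. The group names, the uppercase-only level, and the sample log are illustrative assumptions, not a fixed log format.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed format per line: yyyy-MM-dd HH:mm:ss LEVEL message
// [ \t]+ is used instead of \s+ so a match never runs across a line break.
var logPattern = new Regex(
    @"^(?<date>\d{4}-\d{2}-\d{2})[ \t]+(?<time>\d{2}:\d{2}:\d{2})[ \t]+(?<level>[A-Z]+)[ \t]+(?<message>.*)$",
    RegexOptions.Multiline);

// Hypothetical sample; lines are separated by "\n" only. With "\r\n" endings,
// $ still matches before "\n", so the message would keep a trailing "\r".
string log = "2023-11-22 12:34:56 INFO Application started\n" +
             "2023-11-22 12:35:02 WARN Disk usage at 91%\n" +
             "not a log line\n" +
             "2023-11-22 12:35:10 ERROR Connection lost";

MatchCollection entries = logPattern.Matches(log);
foreach (Match entry in entries)
{
    // Named groups are read back by name rather than by position.
    Console.WriteLine("Date: {0}, Time: {1}, Level: {2}, Message: {3}",
        entry.Groups["date"].Value, entry.Groups["time"].Value,
        entry.Groups["level"].Value, entry.Groups["message"].Value);
}
```

Lines that do not fit the format, such as "not a log line", simply produce no match instead of an error.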

Lookarounds

Lookarounds assert that a pattern does (or does not) appear immediately ahead of the current position (lookahead) or immediately behind it (lookbehind), without consuming any characters.

Example: Matching Words Ending with "ing"

Syntax

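using System;
using System.Text.RegularExpressions;

string pattern = @"\b\w+(?<=ing)\b";
string input = "singing dancing walking";
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);

foreach (Match match in matches)
{
    Console.WriteLine(match.Value);
}
Output

singing
dancing
walking
Explanation:
  • \b: Word boundary ensures complete word matching.
  • \w+: Consumes the whole word.
  • (?<=ing): Positive lookbehind that succeeds only if the three characters just before the current position are "ing". It checks them without consuming anything.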
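The next sketch applies lookaheads to the same sample words. A positive lookahead (?=ing\b) returns each word's stem without the suffix, and a negative lookahead (?!walk) excludes words that start with "walk". FindAll is a hypothetical helper used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "singing dancing walking";

// Hypothetical helper: returns the value of every match of pattern in text.
static string[] FindAll(string text, string pattern) =>
    Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// Positive lookahead: match the letters before "ing" and leave "ing" unconsumed.
string[] stems = FindAll(input, @"\b\w+(?=ing\b)");

// Negative lookahead: match whole "-ing" words that do not start with "walk".
string[] notWalking = FindAll(input, @"\b(?!walk)\w+ing\b");

Console.WriteLine(string.Join(", ", stems));      // sing, danc, walk
Console.WriteLine(string.Join(", ", notWalking)); // singing, dancing
```

Because the lookahead in the first pattern consumes nothing, the reported match ends right before "ing" even though the engine had to look at those characters.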

Balancing Groups

For nested structures such as parentheses or brackets, .NET provides balancing groups, a feature specific to the .NET regex engine.

Example: Matching Balanced Parentheses

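Balancing groups treat a named group as a stack. (?<open>\() pushes a capture each time "(" matches. (?<-open>\)) pops one each time ")" matches and fails if the stack is already empty. A final conditional, (?(open)(?!)), rejects the match if any "(" was never closed.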
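Here is a minimal sketch of the usual .NET balancing-group pattern for parentheses, assuming any character other than ( and ) may appear between them. The sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// ^                  start of input
// (?:                repeat any of:
//     [^()]          - a character that is not a parenthesis
//   | (?<open>\()    - "(" : push a capture onto the "open" stack
//   | (?<-open>\))   - ")" : pop a capture; fails if the stack is empty
// )*
// (?(open)(?!))      if any "open" captures remain, force the match to fail
// $                  end of input
var balanced = new Regex(@"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");

foreach (string s in new[] { "(a(b)c)", "((a)", "a)b(", "f(x) + g(y)" })
{
    Console.WriteLine($"{s,-12} balanced: {balanced.IsMatch(s)}");
}
```

The (?!) inside the conditional is an empty negative lookahead, which can never succeed. It is the idiomatic way to say "fail here".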

Performance Optimization

For large input strings or complex patterns, consider the following:

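  • Compiling the regex: Use the RegexOptions.Compiled flag for patterns that run many times, and reuse one Regex instance instead of constructing a new one per call.
  • Starting at an index: If you know roughly where the match begins, pass a start position (for example, regex.Match(input, startat)) so the engine skips the text before it.
  • Breaking down complex patterns: Split large patterns into simpler steps and avoid nested quantifiers such as (a+)+, which can cause catastrophic backtracking.
  • Profiling: Measure your regex operations to find the actual bottlenecks before optimizing them.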
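The following sketch combines three of these tips. It builds a Regex once with RegexOptions.Compiled and reuses it, uses Match(input, startat) to begin scanning at a known offset, and times a loop with Stopwatch as a rough profile. The date pattern and the sample text are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Build the Regex once and reuse it. Compiled trades slower construction
// for faster matching, which only pays off when the instance is reused.
var datePattern = new Regex(@"\d{4}-\d{2}-\d{2}", RegexOptions.Compiled);

string text = "header: ignore 1999-01-01 here | body: 2023-11-22 and 2024-02-29";

// If the interesting part starts after a known marker, begin matching there.
int bodyStart = text.IndexOf("body:", StringComparison.Ordinal);
Match first = datePattern.Match(text, bodyStart);
Console.WriteLine(first.Value); // 2023-11-22

// Reuse the same instance across many calls and time the loop.
var stopwatch = Stopwatch.StartNew();
int count = 0;
for (int i = 0; i < 10_000; i++)
{
    count += datePattern.Matches(text, bodyStart).Count;
}
stopwatch.Stop();
Console.WriteLine($"{count} matches in {stopwatch.ElapsedMilliseconds} ms");
```

A Stopwatch loop only gives a rough figure. A benchmarking harness gives steadier numbers when comparing two patterns.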

Additional Use Cases

  • Data validation: Email addresses, phone numbers, credit card numbers.
  • Text extraction: Extracting specific information from documents.
  • Code analysis: Finding code patterns, refactoring.
  • Security: Detecting potential vulnerabilities in code or data.
  • Natural language processing: Tokenization, stemming, lemmatization.
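The next sketch covers the validation and tokenization cases. Validation anchors the entire input with \A and \z so that partial matches are rejected. In .NET, $ also accepts a trailing "\n", but \z does not. Tokenization pulls out runs of word characters. The "NNN-NNN-NNNN" phone format is a hypothetical rule chosen for illustration; real phone, email, and card validation needs stricter rules.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical format: three digits, three digits, four digits, hyphen-separated.
// \A and \z anchor the whole string; unlike $, \z rejects a trailing "\n".
var phone = new Regex(@"\A\d{3}-\d{3}-\d{4}\z");

bool ok = phone.IsMatch("555-123-4567");         // whole string matches
bool extra = phone.IsMatch("call 555-123-4567"); // extra text before the number
bool newline = phone.IsMatch("555-123-4567\n");  // trailing newline

// Simple tokenization: each run of word characters or apostrophes is a token.
string[] tokens = Regex.Matches("Regex isn't magic, it's 2 tools in 1.", @"[\w']+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine($"{ok} {extra} {newline}"); // True False False
Console.WriteLine(string.Join(" | ", tokens));
```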