Taking the pain out of regular expressions

If you’ve ever had to write or read a regular expression, you won’t need any convincing that a tool to make this easier would be useful!

If you aren’t familiar with regexs (as they are colloquially known) then they’re worth getting to know, as they can be amazingly useful at times. Sadly, they can also be amazingly difficult to write, and impossible to read afterwards! After all, would you have a clue what this is supposed to do…

^(([-\w \.]+)|(""[-\w \.]+"") )?<([\w\-\.]+)@((\[([0-9]{1,3}\.){3}[0-9]{1,3}\])|(([\w\-]+\.)+)([a-zA-Z]{2,4}))>$

(in case you’re wondering, it validates an email address)

That one is quite simple compared to some! For this reason, most people tend to avoid regexs, and resort to writing far too much code instead. This is a shame, as regexs can make life much easier once you get the hang of them. They are built into JavaScript, and fully supported by the .NET framework.
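To give a flavour of the .NET support, here is a minimal sketch that drops the email pattern above into a C# verbatim string and checks it with Regex.IsMatch. It assumes the doubled quotes in the pattern come from C# verbatim-string escaping (where "" means a single quote character). The names EmailCheck and LooksLikeEmail, and the sample addresses, are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailCheck
{
    // The pattern shown above, as a verbatim string.
    // Assumption: "" is the verbatim-string escape for one quote character.
    private static readonly Regex Pattern = new Regex(
        @"^(([-\w \.]+)|(""[-\w \.]+"") )?<([\w\-\.]+)@((\[([0-9]{1,3}\.){3}[0-9]{1,3}\])|(([\w\-]+\.)+)([a-zA-Z]{2,4}))>$");

    // Hypothetical helper: true if the whole input matches the pattern.
    public static bool LooksLikeEmail(string input)
    {
        return Pattern.IsMatch(input);
    }

    // Prints the result for a few invented inputs.
    public static void Demo()
    {
        string[] candidates =
        {
            "<john@example.com>",
            "John Smith <john@example.com>",
            "john@example.com"
        };

        foreach (string candidate in candidates)
        {
            Console.WriteLine(candidate + ": " + LooksLikeEmail(candidate));
        }
    }
}
```

Read closely, the pattern expects the address to be wrapped in angle brackets, optionally preceded by a display name. So the first two inputs are accepted and the bare john@example.com is rejected, which is exactly the sort of detail that is easy to miss when staring at the raw regex.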

There are side benefits to being a regex expert as well…

For a good, if technical introduction to regexs, please read what MSDN has to say about them. For a more gentle introduction, you could read this one or this one instead.

I have found a couple of resources that make working with regexs much easier.

Expresso regex editor

Expresso is a free application for Windows that allows you to build and debug regexs with as much ease as could reasonably be expected. If you enter your regex, it will break it down and show you a tree representation of it, so you can see which parts do what.

If you enter some sample text, it will run the regex against it, and show you how the results were achieved. This is great for debugging (which is otherwise pretty much impossible).
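If you want a rough version of that inspection from code, the Regex class exposes each match along with its position and captured groups. The sketch below is only an illustration of that idea, not a description of how Expresso works internally. The date pattern, the MatchInspector class and the sample text are all invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchInspector
{
    // Illustrative pattern (not from the article): a yyyy-mm-dd date with named groups.
    private static readonly Regex DatePattern =
        new Regex(@"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})");

    // Returns one line per match, saying where it was found
    // and what each named group captured.
    public static List<string> Describe(string sampleText)
    {
        var lines = new List<string>();

        foreach (Match match in DatePattern.Matches(sampleText))
        {
            string year = match.Groups["year"].Value;
            string month = match.Groups["month"].Value;
            string day = match.Groups["day"].Value;

            lines.Add("'" + match.Value + "' at index " + match.Index +
                      ": year=" + year + ", month=" + month + ", day=" + day);
        }

        return lines;
    }
}
```

Calling MatchInspector.Describe("Released 2019-03-14, patched 2019-04-02.") gives one line for each of the two dates, which is a (much cruder) text version of the breakdown a visual tool shows you.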

This is excellent as it stands, but still requires you to look at the regex in all its ugliness. We would really like to avoid that, in which case we might like to consider…

Using regexs in C#

I came across a fluent library for using regexs in .NET code, which takes all of the pain out of it, and allows you to write code that you can actually understand! This is something that regex developers would never have dreamed of in The Old Days(TM).

If you add the RegexToolbox NuGet package to your project, then you can write code like this…

Regex regex = new RegexBuilder()
    .WordBoundary()
    .UppercaseLetter()
    .LowercaseLetter(RegexQuantifier.OneOrMore)
    .Whitespace(RegexQuantifier.OneOrMore)
    .UppercaseLetter()
    .LowercaseLetter(RegexQuantifier.OneOrMore)
    .WordBoundary()
    .BuildRegex();

This is equivalent to the following C#…

Regex regex = new Regex(@"\b[A-Z][a-z]+\s+[A-Z][a-z]+\b");

Yes, the fluent syntax is longer, but it’s almost readable. Once you know what the regex is supposed to be doing (matching a person’s name, which is defined as two words next to each other, both beginning with a capital letter), then it’s actually quite clear, certainly compared to the basic regex!
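To see the name pattern in action, here is a minimal sketch that uses the plain Regex class rather than the fluent builder, so it depends only on System.Text.RegularExpressions. It writes the word boundaries as \b, which is the standard regex escape for a word boundary. NameFinder, FindNames and the sample sentences are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameFinder
{
    // \b is a word boundary; [A-Z][a-z]+ is a capital letter followed by
    // one or more lowercase letters; \s+ is the whitespace between the words.
    private static readonly Regex NamePattern =
        new Regex(@"\b[A-Z][a-z]+\s+[A-Z][a-z]+\b");

    // Returns every pair of adjacent capitalised words found in the text.
    public static List<string> FindNames(string text)
    {
        var names = new List<string>();

        foreach (Match match in NamePattern.Matches(text))
        {
            names.Add(match.Value);
        }

        return names;
    }
}
```

Given "Please welcome John Smith and Mary Jones to the team.", FindNames picks out John Smith and Mary Jones. Bear in mind that the definition is a loose one: any two capitalised words side by side will match, so "I visited New York" yields New York as a "name" too.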
