
Using RegEx Validation

The Participants Database plugin offers a few ways to validate the data that is submitted when a record is created or updated on the frontend. For most fields, validation is a simple yes or no (the field is either required or it isn't), but sometimes you need more control over what counts as valid input. This is where a "regex" comes in.

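If you just need to check for a single value, the simplest regex is the word itself between delimiters: /yes/ validates on the word “yes.” Keep in mind that without anchors (explained below), this also accepts any input that merely contains “yes,” such as “yesterday”; /^yes$/ accepts only the exact word.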

Before we get into regexes, one point of primary importance: you must tell your users what you expect. Use the “help” field to let them know what input will work. It is very frustrating for users to have to guess what will satisfy the validator!

Regex

Regex is short for “regular expression.” Regexes have been around a long time, but writing them is still a fairly arcane art (meaning not many people really know how to use them), despite how widespread their use is. This short article barely scratches the surface of a very broad subject; at the end, I list a few helpful links in case you need more explanation.

Mostly, I want to give you just enough explanation to use regexes in this plugin.

Let’s say you have a form where you are collecting a user’s phone number, and you want to make sure it is entered in the right format. This is a very simple task for a regex, but it will get us started on explaining how regexes work and how to create one. The phone number must look like this example: 012-224-2334 x214. The extension is optional and can be 1 to 4 numerals long.

Delimiters

First, every regex must have delimiters: a character placed at the start and end of the expression that tells the regex interpreter where the expression begins and ends. The requirements are:

  • it must be the first character
  • it can be any non-alphanumeric, non-backslash, non-whitespace character
  • it must not be used in the expression (unless escaped)
  • it must be the last character
  • common choices are /, # and |

In the example above, /yes/, the / is the delimiter. The regex #yes# works exactly the same way.
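Python's re module has no delimiters, so to illustrate what they do, here is a hypothetical helper, split_delimited (not part of Participants Database or PHP), that separates a delimited pattern into its expression and any trailing flags. It is a simplified sketch: it does not check whether the delimiter also appears, unescaped, inside the expression.

```python
import re


def split_delimited(pattern: str) -> tuple[str, str]:
    """Hypothetical helper: split "/body/flags" into ("body", "flags")."""
    if len(pattern) < 2:
        raise ValueError("pattern is too short to have two delimiters")
    delimiter = pattern[0]
    # The delimiter may not be alphanumeric, a backslash or whitespace.
    if delimiter.isalnum() or delimiter == "\\" or delimiter.isspace():
        raise ValueError(f"invalid delimiter: {delimiter!r}")
    # The closing delimiter is the last occurrence of the same character;
    # anything after it is treated as flags.
    end = pattern.rfind(delimiter)
    if end == 0:
        raise ValueError("missing closing delimiter")
    return pattern[1:end], pattern[end + 1:]


# /yes/ and #yes# contain the same expression, just different delimiters.
for field_value in ["/yes/", "#yes#"]:
    body, flags = split_delimited(field_value)
    print(field_value, "->", repr(body), bool(re.search(body, "yes")))
```

Both delimiter styles yield the same expression, 'yes', which matches the input “yes.”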

What kind of character?

One of the main features of regular expressions is “character classes,” a way of defining which characters are acceptable at a given position. There are several ways to represent character classes, but for our purposes I will use the bracket form, which I think is easiest for beginners. If you need a character to be alphanumeric, the bracket form for that is [a-zA-Z0-9]. This means: lowercase letters from a through z, uppercase A through Z, and numerals from 0 through 9. If you need it to be a numeral, you can use [0-9].
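A quick Python sketch of the two bracket classes (Python's re treats these the same way PHP does). Each class matches exactly one character; fullmatch is used here only so each one-character test string must fit the class completely.

```python
import re

alphanumeric = re.compile(r"[a-zA-Z0-9]")  # one letter or numeral
numeral = re.compile(r"[0-9]")             # one numeral

# fullmatch succeeds only if the whole (one-character) string fits the class.
for ch in ["a", "Z", "7", "-", " "]:
    print(repr(ch),
          "alphanumeric:", bool(alphanumeric.fullmatch(ch)),
          "numeral:", bool(numeral.fullmatch(ch)))
```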

How many characters?

Once you’ve specified what kind of character you’re looking for, you can specify how many of them are acceptable. You can say 0 or more with * or 1 or more with +. For our purposes, we need to say exactly how many are wanted, and for that you use a quantifier in curly braces: {}. This can specify a range of acceptable numbers of characters, for instance {1,3} for a minimum of 1 and a maximum of 3 characters. If you want a specific number of characters, use a single number: {3}
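The four quantifiers can be compared side by side in a short Python sketch. Each pattern is wrapped in the ^ and $ anchors (covered under “Anchoring the expression”) so the whole value is tested, not just part of it.

```python
import re

# Each quantifier applies to the [0-9] class just before it.
patterns = {
    "*": r"^[0-9]*$",          # zero or more numerals
    "+": r"^[0-9]+$",          # one or more numerals
    "{1,3}": r"^[0-9]{1,3}$",  # between one and three numerals
    "{3}": r"^[0-9]{3}$",      # exactly three numerals
}

for value in ["", "7", "123", "1234"]:
    accepted = [q for q, p in patterns.items() if re.search(p, value)]
    print(repr(value), "accepted by:", accepted)
```

Note that only * accepts an empty value, and only * and + accept “1234.”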

Literal characters

If you are looking for a specific character in the data, you just use that character in the expression. If it happens to be a “metacharacter” ($ ^ * . + | \ ? ( ) { } [ ]), you’ll need to escape it with a backslash so it is treated as a plain character rather than having an operational meaning in the expression.
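To see why escaping matters, here is a Python sketch using the period, which as a metacharacter means “any single character.” The decimal-number pattern is purely illustrative.

```python
import re

# "." is a metacharacter meaning "any single character".
unescaped = r"^[0-9]+.[0-9]+$"
# "\." is an escaped period: it matches only a literal ".".
escaped = r"^[0-9]+\.[0-9]+$"

for value in ["3.5", "3x5"]:
    print(repr(value),
          "unescaped:", bool(re.search(unescaped, value)),
          "escaped:", bool(re.search(escaped, value)))
```

The unescaped version happily accepts “3x5,” which is rarely what you want.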

Putting Our Example Together

These are the four main ingredients of our regex (delimiters, character classes, quantifiers and literal characters), so with those explained, we can start assembling it. This is a simple example. First, we need a group of exactly three numerals, so we use this:

[0-9]{3}

That means “3, and only 3, characters in the range 0-9.” After that, we need a dash, which goes in as a literal character. Then comes another group of three, a second dash, and a final group of four. You get the idea:

[0-9]{3}-[0-9]{3}-[0-9]{4}
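Assembled piece by piece in Python, the pattern reads the same, and a quick check shows the literal dashes have to be in the right places:

```python
import re

three = "[0-9]{3}"
four = "[0-9]{4}"
# Literal dashes join the groups together.
pattern = three + "-" + three + "-" + four
print(pattern)  # [0-9]{3}-[0-9]{3}-[0-9]{4}

print(bool(re.search(pattern, "012-224-2334")))  # True
print(bool(re.search(pattern, "012-2242-334")))  # False: dash in the wrong place
```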

Now we add the extension, which is optional. We do that by creating a “capturing group,” which is set off with parentheses, and following the group with a question mark to indicate that the whole group is optional.

( ?x[0-9]{1,4})?

At the beginning of the group we have “ ?x” (a space, a question mark and an x), which matches the “x” character whether or not there is a space in front of it. The space and the x are literals, and the question mark makes the space optional.
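Testing the optional group on its own in Python shows what it accepts. Because the group is a capturing group, the matched extension can also be read back afterwards (Python's match.group(1) is used here just to show that).

```python
import re

extension = r"( ?x[0-9]{1,4})?"

# Tested on its own, the optional group accepts an empty string too.
for value in ["", " x214", "x214", " x", "x12345", "  x1"]:
    print(repr(value), bool(re.fullmatch(extension, value)))

# Because the group is a capturing group, the matched extension can be read back.
match = re.search(r"[0-9]{3}-[0-9]{3}-[0-9]{4}" + extension, "012-224-2334 x214")
print(repr(match.group(1)))  # ' x214'
```

Only “”, “ x214” and “x214” pass: an x with no numerals, more than four numerals, or two spaces are all rejected.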

We’re almost complete, but there is one more piece needed that I didn’t mention: anchors.

Anchoring the expression

Anchors tie the expression to the edges of the input string. There are two anchor characters, ^ and $. The caret anchors the expression to the beginning of the incoming string (the data you’re validating) so that nothing squeaks by in “front” of the expression. The dollar sign does the same for the end of the string. Anchors are needed because, without them, the expression matches if the pattern appears anywhere in the data, and any characters outside of the match are simply ignored. If you want to make sure there is nothing outside of the matched string in the data, you must use anchors.
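A Python sketch of the difference. Python's re.search looks for the pattern anywhere in the string, the same way PHP's preg_match does; the assumption here is that the plugin tests your expression in that search-anywhere fashion, which is exactly why anchors matter.

```python
import re

unanchored = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"
anchored = r"^[0-9]{3}-[0-9]{3}-[0-9]{4}$"

text = "call me at 012-224-2334 tonight"
print(bool(re.search(unanchored, text)))  # True: the number is found inside the text
print(bool(re.search(anchored, text)))    # False: extra text before and after it
print(bool(re.search(anchored, "012-224-2334")))  # True

# The same applies to the one-word example /yes/:
print(bool(re.search(r"yes", "yesterday")))    # True
print(bool(re.search(r"^yes$", "yesterday")))  # False
```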

Here is the complete expression, as you would enter it into the validation field:

/^[0-9]{3}-[0-9]{3}-[0-9]{4}( ?x[0-9]{1,4})?$/

Note it includes our delimiters and anchors.
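Putting the walkthrough's pieces together in Python reproduces exactly the string you would type into the validation field, with the anchors wrapping the pattern and the delimiters wrapping everything:

```python
import re

# The pieces from the walkthrough above.
three = "[0-9]{3}"
four = "[0-9]{4}"
extension = "( ?x[0-9]{1,4})?"

# Anchors wrap the pattern; delimiters wrap the whole thing.
body = "^" + three + "-" + three + "-" + four + extension + "$"
field_value = "/" + body + "/"
print(field_value)  # /^[0-9]{3}-[0-9]{3}-[0-9]{4}( ?x[0-9]{1,4})?$/

re.compile(body)  # would raise re.error if the pattern were malformed
```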

Giving it a test

It is very important to test regular expressions: they can be tricky, difficult to understand, and have unintended effects. Test them with a lot of different kinds of inputs to make sure they hold up. I like to use an online tester; there are several to choose from, but this is the best one I’ve found: Online Regex Tester
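If you prefer to keep your test inputs written down, a small table of inputs and expected results works well. This is a Python sketch, not something the plugin provides; the expression is the phone regex without its “/” delimiters, since Python's re does not use them.

```python
import re

# The phone expression without its "/" delimiters (Python's re does not use them).
PHONE = re.compile(r"^[0-9]{3}-[0-9]{3}-[0-9]{4}( ?x[0-9]{1,4})?$")

# Each input is paired with the result we expect from the validator.
cases = [
    ("012-224-2334", True),
    ("012-224-2334 x214", True),
    ("012-224-2334x9", True),
    ("012-224-2334 x12345", False),  # extension longer than four numerals
    ("012-224-2334 X214", False),    # capital X is not accepted
    ("012 224 2334", False),         # spaces instead of dashes
    ("", False),                     # empty input
]

failures = [
    (value, expected)
    for value, expected in cases
    if bool(PHONE.search(value)) != expected
]
for value, expected in failures:
    print(f"unexpected result for {value!r}: expected {expected}")
print(f"{len(cases) - len(failures)} of {len(cases)} cases behaved as expected")
```

Whenever you change the expression, rerun the whole table so a fix for one input doesn't quietly break another.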

Regex Flags

Most regex engines support “regex flags,” which are letters added after the closing delimiter that change how the expression is applied. Please note that only the “i” and “m” flags are supported here; using any other flag will cause the regex to fail.

For example, you can use the “i” flag to make the pattern case-insensitive:

/[a-z]+/i

would match any run of one or more letters from “a” through “z,” upper or lower case. Because this example has no anchors, it also accepts any input that merely contains such a run.
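In Python, flags are passed as an argument instead of being written after a delimiter: the “i” and “m” flags correspond to re.IGNORECASE and re.MULTILINE. A short sketch of the “i” example:

```python
import re

# /[a-z]+/ and /[a-z]+/i written with Python flags instead of delimiters.
case_sensitive = re.compile(r"[a-z]+")
case_insensitive = re.compile(r"[a-z]+", re.IGNORECASE)

print(bool(case_sensitive.search("HELLO")))          # False
print(bool(case_insensitive.search("HELLO")))        # True
# No anchors, so letters anywhere in the input are enough.
print(bool(case_insensitive.search("123 abc 456")))  # True
```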

Optional Validation

It is possible to set up a regex validator that is optional: in other words, the field does not have to be filled in, but if it is, it will be validated. Take a look at this article that explains how that is done:

Regexes with a Blank Alternative
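The general regex idea behind a blank alternative can be sketched in Python: an empty alternative (^$) is joined to the real pattern with |, so the expression accepts either nothing or a correctly formatted value. This is an assumed illustration; the linked article describes how to set this up in the plugin, including how empty fields are handled.

```python
import re

# Assumed approach: an empty alternative (^$) OR-ed with the real pattern.
# As a field value this would read: /^$|^[0-9]{3}-[0-9]{3}-[0-9]{4}$/
optional_phone = re.compile(r"^$|^[0-9]{3}-[0-9]{3}-[0-9]{4}$")

for value in ["", "012-224-2334", "012-224"]:
    print(repr(value), bool(optional_phone.search(value)))
```

A blank value and a complete number both pass; a partly filled-in number still fails.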

User interface issues

It’s very important to understand how the user is going to interact with your form, especially considering they may not understand what you’re looking for as well as you do. It’s always a good idea to tell your users what is expected, especially if you are validating the input like we are doing here.

Another thing to consider: the expression we put together here is not very forgiving of slight differences. It wouldn’t validate a capital “X,” for instance. If you can accept a wider range of inputs, craft your regex to allow them, because that makes it easier for your users to complete the form. Users dislike being told the form is not completed correctly, so do what you can to help them get it right the first time. If users can’t figure out how to enter the data correctly, they will blame the website…and they will be correct: it is the site designer’s job to make user interactions work smoothly.
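As one example of loosening the phone expression, the Python sketch below accepts “x” or “X” and any number of spaces before it. As a field value, this would be /^[0-9]{3}-[0-9]{3}-[0-9]{4}( *[xX][0-9]{1,4})?$/. Adding the “i” flag to the original expression would be another way to accept the capital X.

```python
import re

strict = re.compile(r"^[0-9]{3}-[0-9]{3}-[0-9]{4}( ?x[0-9]{1,4})?$")
# Accepts "x" or "X", and any number of spaces (including none) before it.
forgiving = re.compile(r"^[0-9]{3}-[0-9]{3}-[0-9]{4}( *[xX][0-9]{1,4})?$")

for value in ["012-224-2334 x214", "012-224-2334 X214", "012-224-2334  x214"]:
    print(repr(value), "strict:", bool(strict.search(value)),
          "forgiving:", bool(forgiving.search(value)))
```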

Learning more about regexes

The following links offer solid help and reference information on regexes. It is not a simple subject, so it will take some study to understand them well enough to make your own, but you’ll see plenty of examples to get you started. There are several programmers’ forums where practical help with specific regexes can be found. If you do post a question on such a board, include as much specific information as you can. Make it easy for people to answer and they’ll be more likely to help.

  • regular-expressions.info: great tutorials; an extensive site on regexes
  • PHP.net: regex reference
  • Online Regex Tester: an awesome way to test and learn to use regexes
  • PHP Live Regex: great for testing PHP-specific regex functions; also lets you test several possibilities at once
  • Stack Overflow: a great programmers’ forum where you can ask questions and get answers from people who know their stuff

15 thoughts on “Using RegEx Validation”

  1. Hi

    I have made a regex to verify UK postcodes to a limited number of post code areas using regex101. It works in regex101, but when I paste it into the field it doesn’t match. It should match if the first part of the postcode is NG16, DE7, NG15 or NG8 and the last part is a digit followed by 2 letters. I use NG16 1HB as a postcode that should match. The regex from “https://regex101.com/” I used is:
    /^(NG16|DE7|NG15|NG8)?\s*[0-9][A-Z]{2}$/gmi

    1. Hi Gregor,

      The issue there is the regex match field validation doesn’t support flags in the regex, so you need to build the regex without them:

      /^(NG16|DE7|NG15|NG8)?\s*[0-9][A-Z]{2}$/

      The only one that you might need is the “i” flag, if you want to allow people to use lower case in the post code. The “g” and “m” are unnecessary with a test like this.

      1. Hi Roland (and Gregor)!

        I’ve been following your discussion of postcode validation with interest. I decided I wanted to split the post code into two halves so that I could sort the database on Post Code Area (e.g. RG3) and ignore the specific Postman’s Walk (1AA). Do either of you have a suggestion for trimming leading and trailing spaces, so that the user entry doesn’t fail because they couldn’t see they’d added a surplus invisible space?
        My code so far for the Post Code Area is

        |[A-Z]{1,2}[0-9][A-Z0-9]$|

        Thanks
        John

        1. There isn’t a built-in way to trim spaces from the data that someone enters, but you can construct your expression to allow for it. If you are using anchors (^ and $), then you must explicitly allow spaces in your regex, which you can do with something like “ *” (a space followed by an asterisk), for example:

          /^ *[A-Z]{1,2}[0-9][A-Z0-9] *$/

          If you don’t want those spaces making it into the data, you can disallow them in the regex, then instruct the user to avoid adding them. Alternatively, you can write a simple snippet of custom code to trim the spaces out.

  2. Hello, how can I prevent the same email from being registered twice? I would like to allow only one email address to be registered per user.

    1. In the Participants Database settings, under the “Signup Form” tab, you can set this up using the “Duplicate Record Check Field” set to your email field. Then “Duplicate Record Preference” determines what happens when there is a match, and “Duplicate Record Error Message” sets the error message that is shown when the fields match.

  3. I need to test a field named “year of birth” and entered the following regex: ^19[2-9][0-9]|200[0-4]$. That means values between 1920 and 2004 are accepted. Testing this regex with “https://regex101.com/” works well, but it doesn’t work in PDb, with or without adding delimiters.

    Also, your phone-number example /^[[0-9]]{3}-[[0-9]]{3}-[[0-9]]{4}( ?x[[0-9]]{1,4})?$/ doesn’t work, either in PDb or in “https://regex101.com/”. Changing the regex to ^[0-9]{3}-[0-9]{3}-[0-9]{4}( ?x[0-9]{1,4})?$ works in “https://regex101.com/” but not in PDb.

    It seems to me that regex in PDb doesn’t work at all. Am I facing a configuration problem?

    1. I’m sorry about the confusion: the text editor added extra brackets to the regex string, breaking it. It should be:

      /^[0-9]{3}-[0-9]{3}-[0-9]{4}( ?x[0-9]{1,4})?$/

      Participants Database regex validations must always have delimiters. Regex101 adds them automatically.

      As to your regex not working, it’s helpful to let me know which string gave a false positive or negative.

      1. Hello Roland,
        thanks for your reply. Meanwhile, I found out that regex validation applies to the front end, and there it works fine. Unfortunately, I started testing in the backend part of PDb. Data entered in the backend is apparently not subject to field validation: any value can be added or changed regardless of the regex rules defined. So I was a little bit confused.

        I like your plugin; it provides the flexibility I need.
        Best regards,
        Peter

        1. There is a plugin setting under the “Admin” tab: “Admin Record Edits are Validated” that you can use to enforce validation in the backend. This is off by default.

  4. Hello

    Is there any way to set a “regex” field as “not required”?

    Greetings

    1. Take a look at this article:

      Regexes with a Blank Alternative

      1. What I mainly need is to not show the “required” message next to the label.

        Greetings

        1. Fran, you can do this with a CSS rule that hides it. There isn’t any way to prevent it from showing for a required field.
