A Walkthrough Guide to Regular Expressions (RegEx) in JavaScript

How to Work With Regular Expressions (RegEx) in JavaScript

A JavaScript Regular Expression (or Regex) is a sequence of characters that describes a pattern we can match against strings. Using this syntax, we can:

  • search for text in a string
  • replace substrings in a string
  • extract information from a string

Dating back to the 1950s, Regular Expressions were formalized as a concept for pattern searching in string-processing algorithms.

JavaScript has regular expressions support directly built into the language. A solid understanding of regular expressions will make you a much more effective programmer. So let’s get started!

Basic Regex Pattern

Here’s a basic pattern:

var regex = /hello/;

console.log(regex.test('hello world'));  
// true

Here the pattern is the literal text hello, and test() checks whether it appears anywhere in the test string. We’ll look at the regex test method in detail shortly.

Why Use Regular Expressions?

As mentioned, Regular Expressions are a way to describe patterns in string data. We can use them to check a string of characters, for example, to look for an e-mail address — by matching the pattern which is defined by our regular expression.

Creating a Regular Expression

In JavaScript, we can create a Regular Expression in two ways: either by using the RegExp constructor or by using forward slashes / to enclose the regex pattern.

The Constructor Method:

The syntax is like so:

new RegExp(pattern[, flags])

So for example:

let regexConst = new RegExp('abc');

The Literal Method:

The syntax is like so:

/pattern/flags

An example:

let regexLiteral = /abc/;

Note: flags are optional. We’ll look at these later in this article!

In the case where you need to create regular expressions dynamically, you’ll need to use the constructor method.
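
As a minimal sketch of the dynamic case, suppose the text to search for is only known at runtime. The helper name containsWord and the userInput variable are hypothetical, made up for this illustration, and the sketch assumes the input is plain text without special regex characters.

```javascript
// Hypothetical helper: the pattern is built from a value only known at runtime
function containsWord(str, word) {
  // Assumes `word` is plain text; characters such as . or + would be read as regex syntax
  const regex = new RegExp(word);
  return regex.test(str);
}

const userInput = 'world'; // stand-in for a value typed by a user

console.log(containsWord('hello world', userInput)); // true
console.log(containsWord('hello there', userInput)); // false
```

A regex literal can't do this, because the text between the slashes is fixed when the code is written.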

In either case, the result is a RegExp object, with the same methods and properties available for our use.

Regular Expression Methods

When testing our regular expressions, we generally use one of two methods: RegExp.prototype.test() or RegExp.prototype.exec().

RegExp.prototype.test()

We use this method to test whether a match has been found or not. It accepts a string that we test against a regular expression and returns either true or false, depending on whether a match is found.

Let’s see an example:

let regex = /hello/; 
let str = 'hello world';
let result = regex.test(str);

console.log(result); 
// true, as 'hello' is present in our string

RegExp.prototype.exec()

We use this method to receive an array describing the first match: the matched text, followed by any captured groups. If there is no match, it returns null. It accepts a string that we test against our regular expression.

An example:

let regex = /hello/;
let str = 'hello world';
let result = regex.exec(str);

console.log(result);
// returns [ 'hello', index: 0, input: 'hello world', groups: undefined ]

In this example, ‘hello’ is our matched text, index is the position in the string where the match starts, and input is the string that was passed.
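
To see what exec() returns when the pattern contains groups, or when nothing matches, here is a short sketch. The pattern and the variable names match and noMatch are chosen purely for illustration.

```javascript
let regex = /(hello) (world)/;

let match = regex.exec('hello world');
console.log(match[0]);    // 'hello world' (the full match)
console.log(match[1]);    // 'hello' (first group)
console.log(match[2]);    // 'world' (second group)
console.log(match.index); // 0

// When the pattern is not found, exec() returns null instead of an array
let noMatch = regex.exec('goodbye');
console.log(noMatch);     // null
```

Because the result can be null, it's worth checking it before reading any of its entries.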

For the rest of the article, we’ll be using the test() method.

The Power of Regex

We’ve so far seen how to create simple regular expression patterns. This is really just the tip of the iceberg. Let’s now take a dive into the syntax to see the full power of regular expressions for handling more complex tasks!

An example of a more complex task would be if we needed to match a number of email addresses. By using the special characters defined in the syntax — we can achieve this!

Let’s take a look now, so we can more fully grasp and make use of regular expressions in our programs.

Flags:

In any regular expression, we can use the following flags:

  • g: global search, finds all matches rather than stopping after the first one
  • i: makes the regex case insensitive
  • m: enables multi-line mode, where ^ and $ match the start and end of each line. Without this, ^ and $ only match the start and end of the entire string.
  • u: enables support for Unicode
  • s: short for single line (also known as dotAll), it causes the . to also match newline characters

Flags may also be combined in a single regular expression, and the flag order doesn’t matter. In regex literals, they are added after the closing slash:

/hello/ig.test('HEllo')
// returns true

If using RegExp object constructors, they’re added as the second parameter:

new RegExp('hello', 'ig').test('HEllo') 
// returns true

Character groups:

Character Set [abc]

We use character sets to match different characters in a single position. A set matches any single character in the string that is one of the characters inside the brackets:

let regex = /[hc]ello/;

console.log(regex.test('hello'));
// returns true

console.log(regex.test('cello'));
// returns true

console.log(regex.test('jello'));
// returns false

Negated Character Set [^abc]

It matches any single character that is not listed inside the brackets:

let regex = /[^hc]ello/;

console.log(regex.test('hello'));
// returns false

console.log(regex.test('cello'));
// returns false

console.log(regex.test('jello'));
// returns true

Ranges [a-z]

If we want to match any letter of the alphabet in a single position, we can use ranges. For example: [a-j] will match any letter from a to j. We can also use digits like [0-9] or capital letters like [A-Z]:

let regex = /[a-z]ello/;

console.log(regex.test('hello'));
// returns true

console.log(regex.test('cello'));
// returns true

console.log(regex.test('jello'));
// returns true

The test returns true if at least one character in the string falls within the range:

/[a-z]/.test('a')  // true
/[a-z]/.test('1')  // false
/[a-z]/.test('A')  // false (as our range is in lower case)
/[a-c]/.test('d')  // false
/[a-c]/.test('cd') // true (as 'c' is in the range)

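Ranges can also be combined by writing them one after another inside the same brackets. Note that a - placed between two ranges, as in [A-Z-0-9], is treated as a literal hyphen rather than as a way of joining them:

/[A-Z0-9]/

/[A-Z0-9]/.test('a') // false
/[A-Z0-9]/.test('1') // true
/[A-Z0-9]/.test('A') // true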

Multiple range item matches

We can check whether a string consists of exactly one character in a range. Start the regex with ^ and end with $:

/^[A-Z]$/.test('A')  // true
/^[A-Z]$/.test('AB') // false
/^[A-Z]$/.test('Ab') // false
/^[A-Z0-9]$/.test('1')  // true
/^[A-Z0-9]$/.test('A1') // false
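
Putting ranges and the ^ and $ anchors together, here is a minimal sketch of a check for a single hexadecimal digit. The function name isHexDigit is hypothetical and only serves this example.

```javascript
// Hypothetical helper: true when str is exactly one hexadecimal digit
function isHexDigit(str) {
  // Three ranges side by side: digits, upper-case A-F, lower-case a-f
  return /^[0-9A-Fa-f]$/.test(str);
}

console.log(isHexDigit('7'));  // true
console.log(isHexDigit('c'));  // true
console.log(isHexDigit('g'));  // false (outside every range)
console.log(isHexDigit('7c')); // false (more than one character)
```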

Meta-characters

Meta-characters are characters with a special meaning. Let’s take a look at some of these here:

  • \d: matches any digit, being [0-9]
  • \D: matches any character that is not a digit, effectively [^0-9]
  • \w: matches any alphanumeric character (plus underscore), equivalent to [A-Za-z_0-9]
  • \W: matches any non-alphanumeric character, equivalent to [^A-Za-z_0-9]
  • \s: matches any whitespace character: spaces, tabs, newlines, and Unicode spaces
  • \S: matches any character that is not a whitespace
  • \0: matches the NUL character (U+0000)
  • \n: matches a newline character
  • \t: matches a tab character
  • \uXXXX: matches the Unicode character with the four-digit hex code XXXX (the \u{XXXXX} form for longer code points requires the u flag)
  • .: matches any character that is not a newline char (e.g. \n) (unless you use the s flag, described above)
  • [^]: matches any character, including newline characters. It’s very useful on multi-line strings
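
A few of these meta-characters in action, as a quick sketch. The test strings are arbitrary and chosen only to show one match and one non-match per case.

```javascript
console.log(/\d/.test('abc1'));    // true  ('1' is a digit)
console.log(/\D/.test('123'));     // false (every character is a digit)
console.log(/\w/.test('_'));       // true  (underscore counts as a word character)
console.log(/\W/.test('abc_1'));   // false (only word characters present)
console.log(/\s/.test('a\tb'));    // true  (a tab is whitespace)
console.log(/a.b/.test('a\nb'));   // false (. does not match a newline)
console.log(/a[^]b/.test('a\nb')); // true  ([^] matches any character)
console.log(/\u0041/.test('A'));   // true  (U+0041 is 'A')
```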

Quantifiers

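Quantifiers are symbols that specify how many times the preceding expression must occur. The list below also covers the anchors ^ and $, and alternation with |.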

Let’s see them in action:

  • + Matches the preceding expression 1 or more times:

let regex = /\d+/;

console.log(regex.test('1'));
// true

console.log(regex.test('1122'));
// true

  • * Matches the preceding expression 0 or more times:

let regex = /go*d/;

console.log(regex.test('gd'));
// true

console.log(regex.test('good'));
// true

  • ? Matches the preceding expression 0 or 1 time, that is, the preceding pattern is optional:

let regex = /hii?d/;

console.log(regex.test('hid'));
// true

console.log(regex.test('hiid'));
// true

console.log(regex.test('hiiid'));
// false

  • ^ Matches the beginning of the string, the regex that follows should be at the start of the test string:

let regex = /^h/;

console.log(regex.test('hi'));
// true

console.log(regex.test('bye'));
// false

  • $ Matches the end of the string, the regex that precedes it should be at the end of the test string:

let regex = /\.com$/;
// the dot is escaped so that it matches a literal .

console.log(regex.test('test@email.com'));
// true

console.log(regex.test('test@email'));
// false

  • {N} Matches exactly N occurrences of the preceding regex:

let regex = /hi{2}d/;

console.log(regex.test('hiid'));
// true

console.log(regex.test('hid'));
// false

  • {N,} Matches at least N occurrences of the preceding regular expression:

let regex = /hi{2,}d/;

console.log(regex.test('hiid'));
// true

console.log(regex.test('hiiid'));
// true

console.log(regex.test('hiiiid'));
// true

  • {N,M} Matches at least N occurrences and at most M occurrences of the preceding regex (when M > N):

let regex = /hi{1,2}d/;

console.log(regex.test('hid'));
// true

console.log(regex.test('hiid'));
// true

console.log(regex.test('hiiid'));
// false

  • X|Y Alternation matches either X or Y:

let regex = /(red|yellow) bike/;

console.log(regex.test('red bike'));
// true

console.log(regex.test('yellow bike'));
// true

console.log(regex.test('brown bike'));
// false

Note: To use any special character as part of the expression, for example if you want to match a literal + or ., you’ll need to escape it with a backslash \. Like so:

let unescaped = /a+b/;
// here + is a quantifier, so this matches 'ab' or 'aab' but not the text 'a+b'

let escaped = /a\+b/;
// \+ matches a literal plus sign

console.log(unescaped.test('a+b'));
// false

console.log(escaped.test('a+b'));
// true
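
Several of these symbols can be combined in one pattern. As a sketch, here is a made-up pattern for a simple price string: one or more digits, optionally followed by a literal dot and one or two more digits. The format is an assumption for the sake of the example, not a general rule for prices.

```javascript
// ^ and $ anchor the match, + requires at least one digit,
// (...)? makes the decimal part optional, {1,2} limits it to 1 or 2 digits
let regex = /^\d+(\.\d{1,2})?$/;

console.log(regex.test('5'));      // true
console.log(regex.test('19.99'));  // true
console.log(regex.test('19.999')); // false (three decimal digits)
console.log(regex.test('.99'));    // false (no digit before the dot)
console.log(regex.test('19.'));    // false (dot without digits after it)
```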

Reviewing Regex

With these concepts fresh in our minds, let’s review what we’ve learned!

Match any 10-digit number:

let regex = /^\d{10}$/;

console.log(regex.test('4658264822'));
// true

So \d matches any digit character. {10} matches the previous expression, in this case \d, exactly 10 times. So if the test string contains fewer than or more than 10 digits, the result will be false.
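
The ^ and $ anchors are what make this an exact check. A short sketch comparing the pattern with and without them (the variable names anchored and unanchored are just for this comparison):

```javascript
let anchored = /^\d{10}$/;
let unanchored = /\d{10}/;

console.log(anchored.test('46582648221'));      // false (11 digits)
console.log(unanchored.test('46582648221'));    // true  (the first 10 digits match)
console.log(anchored.test('465826482'));        // false (9 digits)
console.log(unanchored.test('id: 4658264822')); // true  (10 digits found inside the string)
```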

Match date with the following format: DD-MM-YYYY or DD-MM-YY

let regex = /^(\d{1,2}-){2}\d{2}(\d{2})?$/;

console.log(regex.test('01-01-2000'));
// true

console.log(regex.test('01-01-00'));
// true

console.log(regex.test('01-01-200'));
// false

Here we’ve wrapped the entire expression inside ^ and $ so that the match spans the entire string. ( is the start of the first sub-expression. \d{1,2} matches at least 1 digit and at most 2 digits. - matches the literal hyphen character. ) is the end of first sub-expression.

Then {2} matches the first sub-expression exactly 2 times. \d{2} matches exactly 2 digits. (\d{2})? matches 2 more digits, but it’s optional, so the year contains either 2 digits or 4 digits.
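
It's worth noting that this pattern only checks the shape of the string, not whether the date exists, and that \d{1,2} also lets through single-digit days and months. The sketch below shows both points; the strict variant is a hypothetical alternative, not part of the original pattern, for cases where exactly two digits are wanted.

```javascript
let regex = /^(\d{1,2}-){2}\d{2}(\d{2})?$/;

console.log(regex.test('1-1-00'));     // true  (single-digit day and month are allowed)
console.log(regex.test('99-99-9999')); // true  (right shape, even though the date is invalid)
console.log(regex.test('01/01/2000')); // false (slashes are not hyphens)

// Hypothetical stricter variant: require exactly 2 digits for day and month
let strict = /^(\d{2}-){2}\d{2}(\d{2})?$/;

console.log(strict.test('1-1-00'));    // false
console.log(strict.test('01-01-00'));  // true
```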

Summary

And there we go! We’ve examined Regular Expressions from the very basics right through to more advanced implementations, including both the literal and constructor methods, testing methods, flags, and character syntax.

Regular expressions can indeed be fairly complex! However, taking the time to learn the syntax will greatly help you to identify the regex patterns more easily. Any new confidence you gain will surely have you ready to conquer the next obstacle you encounter on your coding journey!

Conclusion

If you liked this blog post, follow me on Twitter where I post daily about Tech related things!

Support Richard Rembert by becoming a sponsor. Any amount is appreciated!