Regular expressions in JavaScript

Today we will discuss one of the slightly more difficult topics: regular expressions. The most powerful and advanced expressions really are difficult, but the possibilities this tool offers are huge.

Regular expressions are a way to describe a pattern that can be compared with strings: to check whether a string matches the pattern, to search for substrings, to replace matched fragments, to validate input, etc.

For example, you can specify a list of characters that may appear at a given position:

// the letter "a" must be followed by a digit from 0 to 7:
a[01234567]

So if the letter “a” is followed by one of the listed digits, the pattern matches.

There are also shorter forms that use ranges: a[0-7], [a-z], [A-Z], etc.

A list can be negated by putting the “^” character right after the opening bracket:

// any character except a digit or a Latin letter:
[^0-9a-zA-Z]
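To see these lists in action, here is a small sketch that uses the test() method (described later in this article) on a few arbitrary sample strings. It shows that the explicit list and the range form behave the same, and how a negated list works.

```javascript
// a list of allowed characters: "a" followed by one digit from 0 to 7
const aThenOctalDigit = /a[01234567]/;
// the same list written as a range
const aThenRange = /a[0-7]/;
// a negated list: any character that is NOT a digit or a Latin letter
const notAlphanumeric = /[^0-9a-zA-Z]/;

console.log(aThenOctalDigit.test("a5"));      // true
console.log(aThenRange.test("a5"));           // true
console.log(aThenRange.test("a8"));           // false: 8 is outside 0-7
console.log(notAlphanumeric.test("abc123"));  // false: only letters and digits
console.log(notAlphanumeric.test("abc-123")); // true: "-" is not in the list
```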

Special sequences

Special sequences are shorthands for predefined character sets:

– \d — any digit: [0-9]
– \D — any character that is not a digit: [^0-9]
– \w — any digit, Latin letter (lowercase or uppercase) or underscore: [0-9a-zA-Z_]
– \W — any character that is not a digit, a Latin letter or an underscore: [^0-9a-zA-Z_]
– \s — any whitespace character: [ \t\r\n\v\f] plus Unicode spaces such as the non-breaking space (U+00A0)
– \S — any non-whitespace character (the opposite of \s)

Special characters:

– \t — horizontal tab character (0x09)
– \v — vertical tab character (0x0B)
– \r — “carriage return” (0x0D)
– \n — new line (0x0A)
– \f — form feed, i.e. page break (0x0C)
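A short sketch of the shorthand sequences applied to a made-up string containing a space, a tab and an underscore. It uses match() and split(), which are covered later in this article.

```javascript
const text = "Order 42\tshipped_today";

// \d+ : one or more digits
console.log(text.match(/\d+/)[0]);      // "42"
// \w+ : runs of letters, digits and "_", so "shipped_today" stays one word
console.log(text.match(/\w+/g));        // ["Order", "42", "shipped_today"]
// \s : any whitespace, here the space and the tab written as \t
console.log(text.split(/\s/));          // ["Order", "42", "shipped_today"]
// \s also covers Unicode spaces such as the non-breaking space U+00A0
console.log(/\s/.test("\u00a0"));       // true
// \D : any character that is not a digit
console.log("2024-05-17".match(/\D/g)); // ["-", "-"]
```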

Repeating a pattern

Sometimes a piece of the pattern should be repeated. Special quantifiers determine how many times the preceding item can (or must) occur.

The list:

{n} — exactly n occurrences of the preceding expression
{n,} — n or more occurrences of the preceding expression
{n,m} — from n to m occurrences of the preceding expression
? — preceding item is optional (may occur 0 or 1 time)
+ — preceding element can occur one or more times
* — preceding element can occur zero or more times

Example:

/t{1,2}/ matches 't' or 'tt' (one or two consecutive letters t)
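Each quantifier from the list can be tried on a few sample strings. In this sketch, “^” and “$” pin the pattern to the start and end of the whole string, so {3} really means "exactly three digits and nothing else". Note that quantifiers are greedy: they take the longest run they are allowed to.

```javascript
// {1,2}: one or two "t"; greedy, so the longest allowed run wins
console.log("butter".match(/t{1,2}/)[0]);      // "tt"
console.log("cat".match(/t{1,2}/)[0]);         // "t"
// {3} and {3,}: exactly three digits vs. three or more
console.log(/^\d{3}$/.test("1234"));           // false
console.log(/^\d{3,}$/.test("1234"));          // true
// ?: the "u" is optional
console.log("color colour".match(/colou?r/g)); // ["color", "colour"]
// + needs at least one "a", * also accepts none
console.log(/ba+d/.test("bd"));                // false
console.log(/ba*d/.test("bd"));                // true
```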

Flags

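The three classic flags for regular expressions in JavaScript are listed below (newer versions of the language add more, e.g. s, u, y, d and v):

g — global search: find all matches instead of stopping at the first one

i — case-insensitive matching

m — multi-line mode: “^” and “$” match at the beginning and end of every line, not only at the beginning and end of the whole string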

Example:

/fooBar/i;
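The effect of each flag is easiest to see side by side. This sketch runs the same patterns with and without g, i and m on two made-up strings, using match() (described later in this article).

```javascript
const text = "fooBar FOOBAR foobar";

// no flags: first case-sensitive match only
console.log(text.match(/foobar/)[0]);  // "foobar"
// g: every case-sensitive match
console.log(text.match(/foobar/g));    // ["foobar"]
// g + i: every match, ignoring case
console.log(text.match(/foobar/gi));   // ["fooBar", "FOOBAR", "foobar"]

const lines = "first line\nsecond line";
// without m, ^ matches only at the very start of the string
console.log(lines.match(/^\w+/g));     // ["first"]
// with m, ^ also matches right after each line break
console.log(lines.match(/^\w+/gm));    // ["first", "second"]
```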

Using regular expressions in JavaScript

A regular expression in JS can be created in two ways:

– with a literal,

– by calling the RegExp constructor.

In the first case, write the expression between slashes in the code (optionally followed by flags):

/expression/[flags]

In the second case, the expression is created with the following structure:

new RegExp("expression"[, "flags"])

// a
/(ab)+c/gi

// b
new RegExp("(ab)+c", "gi")

The expressions marked (a) and (b) in the example are equivalent.
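One way to check that (a) and (b) really are the same is to compare their source and flags properties, as in the sketch below. It also shows two practical consequences of the constructor form: backslashes inside the string must be doubled, and the pattern can be assembled at runtime. The word variable is a hypothetical stand-in for user input; real input containing special characters such as "." or "(" would have to be escaped first.

```javascript
const literal = /(ab)+c/gi;
const constructed = new RegExp("(ab)+c", "gi");

// both objects carry the same pattern text and the same flags
console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// inside a string the backslash itself must be escaped: /\d+/ becomes "\\d+"
console.log(new RegExp("\\d+").source === /\d+/.source); // true

// the constructor is handy when the pattern is only known at runtime;
// \b marks a word boundary, so "taller" is not matched
const word = "tall"; // hypothetical input
const wholeWord = new RegExp("\\b" + word + "\\b", "g");
console.log("John is tall, but Steve is taller.".match(wholeWord)); // ["tall"]
```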

Methods of the RegExp and String objects for regular expressions

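RegExp.compile(expr[, flags]) — recompiles an existing RegExp object with a new pattern (and flags). It was once used to speed up later processing, but the method is deprecated: JavaScript engines compile regular expressions on their own, so simply create a new RegExp instead.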

RegExp.exec(str) — searches the string str for a match and returns an array with the whole match followed by the captured groups (plus index and input properties), or null if there is no match.

Example — using the RegExp.exec() method:

// sample 1
var result = /s(amp)le/i.exec("My Sample Text");
// result: ["Sample", "amp"]

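// sample 2
var str = "John is tall, but Steve is taller.";
var regexp = /John/;
var res = regexp.exec(str);
// loop over the numbered entries only: for...in would also visit the
// extra "index", "input" and "groups" properties of the result array
for (var i = 0; i < res.length; i++) {
    document.write("result[" + i + "] = " + res[i] + "<br />");
}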
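In sample 2 the pattern matches only once. When a pattern can match several times, a common idiom is to add the g flag and call exec() in a loop: with g, the RegExp object remembers in its lastIndex property where the previous search stopped, and exec() returns null once there are no more matches. A sketch, with a pattern chosen for illustration:

```javascript
const str = "John is tall, but Steve is taller.";
// two capturing groups: the name and the word after "is"
const re = /(\w+) is (\w+)/g;
const found = [];
let m;
while ((m = re.exec(str)) !== null) {
    // m[0] is the whole match, m[1] and m[2] the groups, m.index the position
    found.push(m[1] + " -> " + m[2] + " at index " + m.index);
}
console.log(found);
// ["John -> tall at index 0", "Steve -> taller at index 18"]
```

Without the g flag the same loop would never end, because exec() would keep returning the first match.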

RegExp.test(str) — returns true if the string str contains a match for the expression, false otherwise.

Example — the RegExp.test() method:

var isOk = /sample/.test("My Sample text");
// result: false (the match is case-sensitive because the "i" flag is missing)
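Adding the i flag fixes the example above. The sketch also shows a well-known pitfall: with the g flag, test() updates lastIndex just like exec(), so calling it repeatedly on the same RegExp object can return different answers for the same input.

```javascript
// with the i flag the earlier check succeeds
console.log(/sample/i.test("My Sample text")); // true

// caution: with g, test() advances lastIndex after each successful match
const globalRe = /sample/gi;
console.log(globalRe.test("My Sample text")); // true  (lastIndex is now 9)
console.log(globalRe.test("My Sample text")); // false (search started at 9; lastIndex reset to 0)
console.log(globalRe.test("My Sample text")); // true  (searched from 0 again)
```

For a simple yes/no check, leaving out the g flag avoids this surprise.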

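String.match(regexp) — matches the string against a regular expression. Without the g flag it returns the same kind of array as exec(); with the g flag it returns an array of all matched substrings. If nothing matches, it returns null.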

Example — using the String.match() method:

var str = "Programming with RoR".match(/r?or?/gi);
// result: ["ro","RoR"]
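The difference between match() with and without the g flag matters when the pattern has groups. A short sketch on made-up strings:

```javascript
const date = "Released on 2024-05-17";
const pattern = /(\d{4})-(\d{2})-(\d{2})/;

// without g: like exec(), the array holds the whole match and the groups
const single = date.match(pattern);
console.log(single[0], single[1], single[2], single[3]); // 2024-05-17 2024 05 17

// with g: only the matched substrings; the groups are dropped
console.log("1-2 and 3-4".match(/(\d)-(\d)/g)); // ["1-2", "3-4"]

// no match at all: null, not an empty array
console.log("no digits".match(/\d/g)); // null
```

Because of the null case, code that calls a method such as join() on the result should check for null first.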

String.search(regexp) — returns the index at which the first match of the regular expression begins, or -1 if nothing matches.

Example — using the String.search() method:

var idx = "Looking for JavaScript tips…".search(/for/);
// result: 8

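String.replace(regexp, replacement) — returns a new string in which the first match (or every match, with the g flag) is replaced by replacement; the original string is not modified.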

Example — using the String.replace() method:

var re = /(\w+)\s(\w+)/;
var str = "John Doe";
var newstr = str.replace(re, "$2, $1");
alert(newstr); // result: "Doe, John"

In the replacement string, “$1” and “$2” stand for the text captured by the first and second parenthesized groups of the expression.
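Two variations on the example above, sketched on made-up strings: with the g flag every match is rewritten, and instead of a replacement string you can pass a function, which receives the matched text and returns the text to insert.

```javascript
const re = /(\w+)\s(\w+)/g;
const names = "John Doe, Jane Roe";

// with g, every "first last" pair is swapped
console.log(names.replace(re, "$2 $1"));             // "Doe John, Roe Jane"
// without g, only the first match is replaced
console.log(names.replace(/(\w+)\s(\w+)/, "$2 $1")); // "Doe John, Jane Roe"

// a function as the replacement: here every number is doubled
const doubled = "10 apples and 25 pears".replace(/\d+/g, (n) => String(Number(n) * 2));
console.log(doubled); // "20 apples and 50 pears"
```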

String.split(regexp) — splits a string into an array at every place where the pattern matches.

Example — using the String.split() method:

var words = "I am happy!".split(/\s/);
// result: ["I", "am", "happy!"]
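A few details of split() are worth knowing, shown here on made-up strings: repeated separators produce empty strings unless the pattern absorbs the whole run, a capturing group keeps the separators in the result, and an optional second argument limits the number of pieces.

```javascript
// repeated spaces give empty strings with \s, but not with \s+
console.log("I  am   happy!".split(/\s/));  // ["I", "", "am", "", "", "happy!"]
console.log("I  am   happy!".split(/\s+/)); // ["I", "am", "happy!"]
// a capturing group keeps the separators in the result
console.log("a1b2c".split(/(\d)/));          // ["a", "1", "b", "2", "c"]
// the second argument limits the number of pieces returned
console.log("a,b,c,d".split(/,/, 2));        // ["a", "b"]
```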

At this point we have the basic toolbox for working with regular expressions in JavaScript.

Finally, a few practical examples.

Example — parsing a URL:

// pattern
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

var url = "http://www.server.com:80/stuff?q#fragment";
var result = parse_url.exec(url);
var names = ['url', 'protocol', 'slash', 'server', 'port',
    'path', 'query', 'anchor'];

var blanks = '              ';
var i;
for (i = 0; i < names.length; i += 1) {
    document.writeln(names[i] + ':' + blanks.substring(
        names[i].length), result[i]);
}
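In the example above the names array has to line up with the groups by position. Engines that support named capture groups (ES2018 and later) let the pattern label its own parts with (?<name>...), and the match then exposes them through its groups property. The sketch below rewrites the same pattern that way; parseUrl is a hypothetical helper name. For real applications, the built-in URL class is usually a more robust choice than a hand-written pattern.

```javascript
// same pattern as above, with a (?<name>...) label on each group
const urlPattern = /^(?:(?<protocol>[A-Za-z]+):)?(?<slash>\/{0,3})(?<server>[0-9.\-A-Za-z]+)(?::(?<port>\d+))?(?:\/(?<path>[^?#]*))?(?:\?(?<query>[^#]*))?(?:#(?<anchor>.*))?$/;

// hypothetical helper: returns the named parts, or null if the text does not match
function parseUrl(url) {
    const match = urlPattern.exec(url);
    return match === null ? null : match.groups;
}

const parts = parseUrl("http://www.server.com:80/stuff?q#fragment");
console.log(parts.protocol, parts.server, parts.port, parts.path); // http www.server.com 80 stuff
// an optional group that did not take part in the match is undefined
console.log(parseUrl("www.server.com").port); // undefined
```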

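Example — removing HTML line breaks (<br>, <br/>, <br />) from a string (e.g. from a page header):

// "<br", optional whitespace, optional "/", then ">"; g removes every tag, i ignores letter case
var reg = new RegExp("<br\\s*/?>", "gi");
var sub_header_content = header_content.replace(reg, "");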
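The snippet above assumes header_content already holds the HTML. As a minimal sketch, the clean-up can be wrapped in a reusable function; stripBrTags is a hypothetical name, and the pattern /<br\s*\/?>/gi is an assumption about which tag spellings need to be supported (<br>, <br/> and <br /> in any letter case). As a design choice, this version replaces each tag with a single space so that the words around it do not run together.

```javascript
// hypothetical helper: replace every <br>, <br/> or <br /> tag with a space
function stripBrTags(html) {
    // "<br", optional whitespace, optional "/", then ">"
    return html.replace(/<br\s*\/?>/gi, " ");
}

console.log(stripBrTags("Main title<br>Sub<BR/>title<br />end"));
// "Main title Sub title end"
```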

The next two functions come from my standard library of JavaScript data-validation helpers. The first returns only the digits contained in the typed input, the second is a simple e-mail validator.

Example — digits only allowed:

/**
 * Allow: 0-9 digits only.
 * Returns every digit found in the input, joined together,
 * or "" if the input contains no digits.
 */
function onlyDigitsUtil(input) {
    // with the g flag, match() returns every run of digits
    // (exec() without g would stop at the first run)
    var result = String(input).match(/[0-9]+/g);

    if (result === null) {
        return "";
    }

    return result.join("");
}
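The introduction to these helpers also mentions checking whether only digits were typed. A yes/no version of that check can be sketched with test() and the anchors “^” and “$”, which force the pattern to cover the whole input; isDigitsOnly is a hypothetical name. The last line shows a shorter way to strip non-digits with replace().

```javascript
// hypothetical helper: true only if the whole string consists of digits 0-9
function isDigitsOnly(input) {
    // ^ and $ anchor the pattern to the start and end of the input
    return /^\d+$/.test(input);
}

console.log(isDigitsOnly("2024"));  // true
console.log(isDigitsOnly("20a24")); // false
console.log(isDigitsOnly(""));      // false: + requires at least one digit
console.log(/\d+/.test("20a24"));   // true: without anchors, any digit is enough

// alternative filter: remove every non-digit in a single call
console.log("tel: 555-01 23".replace(/\D/g, "")); // "5550123"
```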

Example — e-mail validation in JavaScript:

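/**
 * E-mail address validation (a quick sanity check, not a complete RFC 5322 validator)
 */
function emailValidatorQuickTest(fieldVal) {
    // local part, "@", one or more "label." groups, then a final label of 2+ characters;
    // a nested form such as ([a-zA-Z0-9]{2,4})+ accepts the same strings
    // but can backtrack heavily on long invalid input
    var filter = /^[a-zA-Z0-9_.\-]+@([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9]{2,}$/;

    return filter.test(fieldVal);
}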

Summary

Regular expressions could fill a whole book. I hope this short article helps you handle regular expressions in JavaScript efficiently.

For the curious, I recommend the Regular Expressions guide on MDN, especially its table of special characters.

/Thank you for your attention./