RegexFinder (Draft)

Preliminaries

This report presents a short survey of the regular expressions (regexes) used in some files of the NPM [1] package manager for the JavaScript [2] programming language. To obtain the data, we wrote a script, ‘RegexFinder.rb’ (RegexFinder), in the Ruby [3] programming language. As input, it takes the directory where NPM is installed. The output is information about regex usage; it is displayed on screen and saved as Markdown and PDF files.

Settings: The NPM version is 6.13.6, installed on a macOS operating system. The script runs on Ruby version 2.6.3p62; we also use Pandoc [4] version 2.9.2.1 to build the PDF report.

File directory: Normally, in macOS, NPM global libraries are installed in /usr/local/lib/. This survey considers the libraries in the following directory (and its subdirectories): /usr/local/lib/node_modules. To see the complete list of global libraries, please refer to the global libraries.txt file.

Limitations: RegexFinder only looks for regex usages that match the pattern (replace|match)\((/.+/\w*), that is, a regex literal passed as the first argument of a replace or match call. Other regex usages, such as the RegExp object and /…/ =~ ‘…’, are not considered. Additionally, there is a list of ‘skipped’ files.
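To make the limitation concrete, the following is a minimal sketch of how the search pattern behaves on a few lines of source code. It is written in JavaScript for illustration only; RegexFinder itself is a Ruby script, and the function name findRegexUsages, the field names and the sample lines are assumptions that do not come from the script.

```javascript
// The search pattern quoted above, written as a JavaScript regex literal.
const USAGE_PATTERN = /(replace|match)\((\/.+\/\w*)/;

// Hypothetical helper: scans source text line by line and collects every
// line where a regex literal is passed directly to replace(...) or match(...).
function findRegexUsages(source) {
  const usages = [];
  source.split("\n").forEach((line, index) => {
    const found = USAGE_PATTERN.exec(line);
    if (found) {
      usages.push({
        row: index + 1,           // 1-based line number
        method: found[1],         // "replace" or "match"
        pattern: found[2],        // the regex literal, including its flags
        instruction: line.trim(), // the full instruction
      });
    }
  });
  return usages;
}

// Illustrative input: the second line uses the RegExp object and is skipped.
const sample = [
  'const trimmed = name.replace(/\\s+/g, "");',
  'const re = new RegExp("a+");',
  'if (name.match(/^@[a-z]+$/i)) { ok(); }',
].join("\n");

console.log(findRegexUsages(sample)); // reports rows 1 and 3 only
```

Because .+ is greedy, a line containing a further / after the regex literal (for example inside a string argument) would be captured up to that last slash, so the extracted pattern can be longer than the literal itself.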

Skipped files: We omitted eight files from this survey because they appear to be damaged or incorrectly encoded (in UTF-8). If they are in fact correctly encoded, they may contain long strings that make Pandoc crash. Those files are:

Function types: A function in JavaScript is a parameterized block of code composed of a name (or none, if the function is anonymous), a list of parameters, the function body and a return value. During this survey, we found the following function types; see Function 1, 2, 3 and 4. Note that at least two other function types exist: function* generator(){…} and new Function().

Nested instruction: A nested instruction (or function) is an instruction defined within another. For example: function outer(){ function inner(params){ … } … }

// Function 1: Regular function (function declaration)
function declared(parameters) {
    // ...
}
// Function 2: Function expression
const expressed = function(parameters) {
    // ...
};
// Function 3: Shorthand method definition
const methods = {
    bar(parameters) {
        // ...
    },
    baz(parameters) {
        // ...
    }
};
// Function 4: Arrow function
const arrow = (parameters) => {
    // ...
};
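The nested instructions described earlier can take two common shapes: a function declared inside another function's body, and a function passed as an argument to a call. The sketch below shows both, each with a regex literal inside a replace call. All names (slugify, collapse, capitalizeWords, upper) are hypothetical and chosen only for illustration.

```javascript
// Shape 1: a function declared inside another function's body.
function slugify(title) {
  // collapse is visible only inside slugify.
  function collapse(text) {
    return text.replace(/\s+/g, "-");
  }
  return collapse(title.trim()).toLowerCase();
}

// Shape 2: a named function expression passed as an argument to a call.
function capitalizeWords(text) {
  return text.replace(/\b[a-z]/g, function upper(letter) {
    return letter.toUpperCase();
  });
}

console.log(slugify("  Hello  Big World ")); // "hello-big-world"
console.log(capitalizeWords("hello big world")); // "Hello Big World"
```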

Survey results

This section is composed of two parts. The first is a table that summarizes the information found by RegexFinder. The second is the PDF report produced by running RegexFinder.

The Summary Table shows some relevant information obtained by RegexFinder from the JavaScript files. The columns have the following meanings.

Report (output): This part briefly presents the parameters with which RegexFinder is executed, the skipped files and some data regarding the regex usages in non-skipped files. The data is as follows: the regex Pattern; a boolean flag Backreference, which shows whether the pattern is used with backreferences; Variable name, the variable that uses the pattern; a field Description, which is not used at the moment; and the line number Row and the instruction Instruction where the regex appears.
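As an illustration of the fields listed above, the following sketch builds one report record from a line of source code. It is not RegexFinder's own logic: the function name buildRecord, the rule used to detect backreferences and the rule used to pick the variable name are all assumptions made for this example.

```javascript
// Hypothetical helper that builds one report record for a source line.
function buildRecord(row, instruction) {
  const usage = /(replace|match)\((\/.+\/\w*)/.exec(instruction);
  if (!usage) return null; // no regex literal in a replace/match call

  const pattern = usage[2];
  // Assumed rule: \1..\9 or \k<name> inside the pattern is a backreference.
  const backreference = /\\[1-9]|\\k<\w+>/.test(pattern);
  // Assumed rule: the identifier right before .replace( or .match( is the variable.
  const owner = /([A-Za-z_$][\w$]*)\s*\.\s*(?:replace|match)\(/.exec(instruction);

  return {
    pattern,
    backreference,
    variableName: owner ? owner[1] : null,
    description: "", // unused at the moment, as in the report
    row,
    instruction: instruction.trim(),
  };
}

console.log(buildRecord(12, '  const t = text.replace(/(a*)b\\1/g, "");'));
```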

Note: Attached to this report is the Report-Complete.md file containing all regex occurrences found using the script with the same restrictions as presented at the beginning of this document.


Attached files

Markdown and text files

Or, if you prefer, PDF files:


Backreferences in numbered and named capturing groups

Another use of capturing groups is to reuse their contents within the regular expression itself.

Backreference by number: We can reference a capture group by its number, which typically takes values from 1 to 9. For example, in the regex (a*)b\1, the \1 matches the same text that the group (a*) captured.
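A short JavaScript sketch of the numbered backreference follows. The anchors ^ and $ are an addition made here so that the whole string must match, and the function name matchesNumbered is hypothetical.

```javascript
// (a*)b\1 with anchors added (an assumption) to require a whole-string match.
const numbered = /^(a*)b\1$/;

function matchesNumbered(text) {
  return numbered.test(text);
}

console.log(matchesNumbered("aabaa")); // true: \1 repeats the captured "aa"
console.log(matchesNumbered("aaba"));  // false: only one "a" follows the "b"
console.log(matchesNumbered("b"));     // true: the group captured the empty string
```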

Backreference by name: If a regular expression contains many capture groups, numbered references become hard to follow, so we can give the groups names to identify them. The syntax to name a capture group is (?<name>…), and we backreference it with \k<name>. Note that we can use this convention with any number of capture groups, for example: (?<part>a*)b\k<part>. As before, \k<part> matches the text captured by a*.
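The named form can be sketched in JavaScript in the same way. As before, the anchors ^ and $ are an addition made here, and the function name capturedPart is hypothetical.

```javascript
// (?<part>a*)b\k<part> with anchors added (an assumption).
const named = /^(?<part>a*)b\k<part>$/;

// Returns the text captured by the group "part", or null if there is no match.
function capturedPart(text) {
  const result = named.exec(text);
  return result ? result.groups.part : null;
}

console.log(capturedPart("aabaa")); // "aa"
console.log(capturedPart("aaba"));  // null
```

The captured text is read through result.groups.part instead of a positional index, so the code does not change if other groups are later added to the pattern.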

An execution of the RegexFinder script on the NPM library (as described above) produces the following results:


  1. https://www.npmjs.com

  2. https://www.javascript.com

  3. https://www.ruby-lang.org/en/

  4. https://pandoc.org