Ludwig.Regex

This module provides regular expression support for Ludwig.

There are many different flavours of regular expressions. This module specifically supports POSIX extended regular expressions (ERE). More information on these regular expressions can be found here: https://en.wikipedia.org/wiki/Regular_expression#POSIX_extended.

The regexes provided by this module support Unicode, with the exception of the character classes: e.g., [:alpha:] will match "l" but not "λ".

Module Members

backrefs

(Function)

Construct a ‘Replacement’ using traditional back-references.

For more info, see the documentation for sub.
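
As a small illustration, a template can refer to parenthesized groups by number. The sketch below assumes the module is in scope as Regex, as in the other examples on this page, and that groups are written with the usual ERE parentheses. The binding name swapped is made up for illustration.

```ludwig
swapped: Regex.sub {
  regex: Regex.new {pattern: '([:alpha:]+)=([:digit:]+)'},
  input: 'width=80',
  # \1 is the first group and \2 the second; backslashes are double-escaped.
  replacement: Regex.backrefs('\\2=\\1')
} # expected under these assumptions: '80=width'
```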

Type Signature

 fun (String) -> Replacement
Argument: template

Template where backreferences should be spliced in.

Type: String

Returns:

The resulting ‘Replacement’.

Type: Replacement

match

(Function)

Match a regex against an input string.

This will match the regex anywhere in the input string. If you do not want that, use "^" and "$" to anchor the regex.
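
A brief sketch of the difference anchoring makes, assuming the module is in scope as Regex as in the examples elsewhere on this page. The binding names are illustrative only.

```ludwig
# Unanchored: the pattern may match anywhere, so both runs of digits
# are expected to be reported as separate matches.
anywhere: Regex.match {
  regex: Regex.new {pattern: '[:digit:]+'},
  input: 'a 12 b 345'
}

# Anchored: the pattern has to cover the whole input, so no match is
# expected for the same string.
whole: Regex.match {
  regex: Regex.new {pattern: '^[:digit:]+$'},
  input: 'a 12 b 345'
}
```

Going by the Match and Range types documented on this page, the first call should yield two matches whose range.contents are '12' and '345', at start offsets 2 and 7.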

Type Signature

 fun { regex: Regex,
       input: String} -> List<Match>
Argument: regex

The regex to match.

Type: Regex

Argument: input

The input string.

Type: String

Returns:

The full list of matches.

Type: List<Match>

matches

(Function)

A utility function that, instead of returning every match as match does, only reports whether there was at least one match.
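
For instance, a validity check might look like the sketch below. It assumes the module is in scope as Regex, and the binding names are hypothetical.

```ludwig
isAmi: Regex.matches {
  regex: Regex.new {pattern: '^ami-[:xdigit:]+$'},
  input: 'ami-0abc123'
} # expected: True

notAmi: Regex.matches {
  regex: Regex.new {pattern: '^ami-[:xdigit:]+$'},
  input: 'i-0abc123'
} # expected: False
```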

Type Signature

 fun { regex: Regex,
       input: String} -> Bool
Argument: regex

The regex to match.

Type: Regex

Argument: input

The input string.

Type: String

Returns:

Was there at least one match?

Type: Bool

new

(Function)

Create a new regular expression.

Examples:

sha1: Regex.new {pattern: "^[:xdigit:]{40}$"}
ami: Regex.new {pattern: "ami-[:xdigit:]+"}
float: Regex.new {pattern: "[:digit:]+(\\.[:digit:]+)?"}
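
The optional arguments can be passed alongside pattern. This is a minimal sketch, assuming that optional Bool fields accept a plain True or False, in the same way the sub documentation passes a plain 1 for limit. The binding names are hypothetical.

```ludwig
# Case-insensitive: intended to accept 'AMI-0ABC' as well as 'ami-0abc'.
amiAnyCase: Regex.new {
  pattern: "^ami-[:xdigit:]+$",
  caseSensitive: False
}

# Multi-line matching enabled; caseSensitive keeps its default.
multi: Regex.new {
  pattern: "^[:digit:]+$",
  multiLine: True
}
```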

Escaping

Note that if you are creating a regex from a raw string, you will need to double-escape backslashes. For example, if you want to match the literal string [], you need to use:

Regex.new {pattern: "\\[\\]"}

Type Signature

 fun { pattern: String,
       caseSensitive: Optional<Bool>,
       multiLine: Optional<Bool> } -> Regex
Argument: pattern

The string to compile.

Type: String

Argument: caseSensitive

Compile a case-sensitive regex. Defaults to True.

Type: Optional<Bool>

Argument: multiLine

Compile a regex that can match over multiple lines. Defaults to False.

Type: Optional<Bool>

Returns:

The compiled regex.

Type: Regex

split

(Function)

Split a string according to a regex.

You can limit the number of times the string is split by specifying the limit argument. If the limit is N, the maximum number of elements in the resulting list is naturally N + 1.

Examples:

split01: Regex.split {
  regex: Regex.new {pattern: '[:space:]+'},
  input: 'So call    me maybe'
} # => ['So', 'call', 'me', 'maybe']

split02: Regex.split {
  regex: Regex.new {pattern: ',[:space:]*'},
  input: 'Tim, 24, Finance, 4,,No'
} # => ['Tim', '24', 'Finance', '4', '', 'No']
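
The limit argument can be sketched the same way. This assumes a plain Int is accepted for the optional limit and that the unsplit remainder is returned as the final element. The name split03 simply continues the numbering above.

```ludwig
split03: Regex.split {
  regex: Regex.new {pattern: ',[:space:]*'},
  input: 'Tim, 24, Finance, 4,,No',
  limit: 2
} # at most 3 elements; expected under the assumption above:
  # ['Tim', '24', 'Finance, 4,,No']
```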

Type Signature

 fun { regex: Regex,
       input: String,
       limit: Optional<Int> } -> List<String>
Argument: regex

Regex to split on.

Type: Regex

Argument: input

Input string to split.

Type: String

Argument: limit

Maximum number of splits. Defaults to no limit.

Type: Optional<Int>

Returns:

List of the substrings, split by the given regex.

Type: List<String>

sub

(Function)

Replace occurrences of the regex in the input string using a ‘Replacement’.

By default, this will replace all occurrences. If you only want to replace one, use 1 for the limit argument.

The ‘replacement’ argument is very flexible, because it is a function that computes the replacement string from the match. Some examples should make this clear:

  • We can replace the matches by a fixed string by using a function that always returns the same value:
Regex.sub {
  regex: Regex.new {pattern: '[:digit:]+'},
  input: '1 lemonade and 2 ice-tea please',
  replacement: fun(_): "100"
} # => '100 lemonade and 100 ice-tea please'
  • We can use the backrefs function to work with traditional back-references. In that case, \0 refers to the whole match, \1 refers to the first group, and so on. We need to double-escape the backslashes here as well.
Regex.sub {
  regex: Regex.new {pattern: '[:digit:]+'},
  input: '1 lemonade and 2 ice-tea please',
  replacement: Regex.backrefs('\\0 big')
} # => '1 big lemonade and 2 big ice-tea please'
  • We can use a custom function to replace the string, for example incrementing integers by one:
Regex.sub {
  regex: Regex.new {pattern: '[:digit:]+'},
  input: '1 lemonade and 2 ice-tea please',
  replacement: fun(m): case String.toInt(m.range.contents) of
    | Optional i -> Int.toString(i + 1)
    | _          -> '?'
} # => '2 lemonade and 3 ice-tea please'
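
The limit argument combines with any of these replacements. This is a minimal sketch, assuming a limit of 1 replaces the leftmost match and leaves later ones untouched:

```ludwig
Regex.sub {
  regex: Regex.new {pattern: '[:digit:]+'},
  input: '1 lemonade and 2 ice-tea please',
  replacement: fun(_): "100",
  limit: 1
} # expected: '100 lemonade and 2 ice-tea please'
```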

Type Signature

 fun { regex: Regex,
       input: String,
       replacement: Replacement,
       limit: Optional<Int> } -> String
Argument: regex

The regex to replace.

Type: Regex

Argument: input

The input string.

Type: String

Argument: replacement

How to replace the matches.

Type: Replacement

Argument: limit

Maximum number of replacements. Defaults to no limit (replace everything).

Type: Optional<Int>

Returns:

A new String with the replacements.

Type: String

Match

(Type)

type Match:
  range: Range
  groups: List<Range>

A single match.

Record {

Field: range

The range of the “whole match”.

Type: Range

Field: groups

The ranges of the submatches (if any).

Type: List<Range>

}

Range

(Type)

type Range:
  contents: String
  start: Int
  length: Int

A range of text that was matched by a regex.

Record {

Field: contents

The contents of the match.

Type: String

Field: start

The absolute, 0-based offset in the input string.

Type: Int

Field: length

The length of the match.

Type: Int

}
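
To illustrate the fields, here is a sketch of a replacement function that reads start from the range of each match. It reuses Regex.sub and Int.toString from the sub examples. The binding name offsets is hypothetical, and offsets are assumed to count characters.

```ludwig
offsets: Regex.sub {
  regex: Regex.new {pattern: '[:digit:]+'},
  input: '1 lemonade and 2 ice-tea please',
  # Replace each number with the 0-based offset at which it was found.
  replacement: fun(m): Int.toString(m.range.start)
} # expected: '0 lemonade and 15 ice-tea please'
```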

Regex

(Type)

type opaque Regex

A compiled regular expression. These can be constructed using the new function.

Replacement

(Type)

type Replacement:
  fun(Match) -> String

This is how the replacement is computed from a match.