Lumesh Regex Module

Lumesh Regular Expression Module Documentation

Module Function List

Run help regex in Lumesh to display the built-in help for this module.

Function Description

Provides string processing capabilities based on regular expressions, implemented using the lightweight regex_lite library. All functions support two calling styles:

  • Functional Call: regex.func(arg0, arg1...)
  • Imperative Call: regex.func arg0 arg1...

Function List and Detailed Description

1. regex.match

  • Functionality: Checks if the text matches the regular expression pattern.
  • Parameters:
    • pattern: string - Regular expression.
    • text: string - Text to match.
  • Return Value: boolean - Returns true if matched, otherwise false.
  • Example:
    # Check for digit match
    regex.match '\d+' '123' # => true
    regex.match '\d+' 'abc' # => false

2. regex.find

  • Functionality: Finds the position and content of the first match.
  • Parameters:
    • pattern: string - Regular expression.
    • text: string - Text to search.
  • Return Value: [start, end, text] | none
    • start: integer - Starting index of the match (0-based).
    • end: integer - Ending index of the match (exclusive, i.e. one past the last matched character).
    • text: string - Matched content.
    • Returns none if no match is found.
  • Example:
    regex.find '\d+' 'abc123def'
    # => [3, 6, '123']
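
The [start, end, text] triple uses a half-open range, so slicing the input from start to end gives back the matched text. The sketch below illustrates that convention with Python's re module rather than Lumesh itself; find_first is a hypothetical helper, and Python's engine is only assumed to agree with regex_lite on a pattern this simple.

```python
import re

def find_first(pattern, text):
    """Return [start, end, matched] for the first match, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # m.end() is exclusive, mirroring the left-closed, right-open convention.
    return [m.start(), m.end(), m.group(0)]

text = 'abc123def'
start, end, matched = find_first(r'\d+', text)
print([start, end, matched])        # [3, 6, '123']
print(text[start:end] == matched)   # True
```

Because the end index is exclusive, end - start is always the length of the match.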

3. regex.find_all

  • Functionality: Finds all matches and their positions.
  • Parameters:
    • pattern: string - Regular expression.
    • text: string - Text to search.
  • Return Value: [[start, end, text], ...] - List of match results (may be empty).
  • Example:
    regex.find_all '\d+' '12a34b56'
    # => [[0,2,'12'], [3,5,'34'], [6,8,'56']]
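
The result shape can be reproduced outside Lumesh to check one's expectations about positions. This is a minimal sketch using Python's re.finditer as a stand-in engine; find_all here is a hypothetical helper, not the Lumesh implementation.

```python
import re

def find_all(pattern, text):
    """Collect [start, end, matched] for every non-overlapping match (hypothetical helper)."""
    return [[m.start(), m.end(), m.group(0)] for m in re.finditer(pattern, text)]

print(find_all(r'\d+', '12a34b56'))  # [[0, 2, '12'], [3, 5, '34'], [6, 8, '56']]
print(find_all(r'\d+', 'abc'))       # []
```

An input with no match gives an empty list rather than none, which matches the "may be empty" note above.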

4. regex.capture

  • Functionality: Extracts the capture groups of the first match.
  • Parameters:
    • pattern: string - Regular expression with capture groups.
    • text: string - Text to search.
  • Return Value: [full, group1, group2, ...] | none
    • full: string - Full matched text.
    • groupN: string | none - Content of the N-th capture group (returns none if not matched).
  • Example:
    regex.capture '(\d+)-(\w+)' '123-abc'
    # => ['123-abc', '123', 'abc']
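
A group that takes no part in the match is reported as none, which matters when a pattern contains optional groups. The sketch below shows the same behaviour in Python's re module (where the placeholder is None); capture is a hypothetical helper and the pattern with an optional second group is an assumed example, not taken from the Lumesh sources.

```python
import re

def capture(pattern, text):
    """Return [full, group1, group2, ...] for the first match, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # m.groups() yields None for groups that did not participate in the match.
    return [m.group(0), *m.groups()]

optional = r'(\d+)(?:-(\w+))?'   # the second capture group is optional
print(capture(optional, '123-abc'))  # ['123-abc', '123', 'abc']
print(capture(optional, '123'))      # ['123', '123', None]
```

Callers should therefore test each group for none before using it as a string.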

5. regex.captures

  • Functionality: Extracts the capture groups of every match.
  • Parameters:
    • pattern: string - Regular expression with capture groups.
    • text: string - Text to search.
  • Return Value: [[full, group1, ...], ...] - One list of capture groups per match.
  • Example:
    regex.captures '(\d+)' 'a1b2'
    # => [['1', '1'], ['2', '2']]

6. regex.split

  • Functionality: Splits text by the regular expression.
  • Parameters:
    • pattern: string - Regular expression used as a delimiter.
    • text: string - Text to split.
  • Return Value: [part1, part2, ...] - List of substrings after splitting.
  • Example:
    regex.split '\s*,\s*' 'a, b, c'
    # => ['a', 'b', 'c']
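
Putting optional whitespace inside the delimiter pattern is what keeps the resulting parts trimmed. A minimal sketch of the same idea with Python's re.split, used here only as a stand-in for the Lumesh function:

```python
import re

# The delimiter is a comma plus any surrounding whitespace, so no part
# carries leading or trailing spaces.
delimiter = r'\s*,\s*'

print(re.split(delimiter, 'a, b, c'))     # ['a', 'b', 'c']
print(re.split(delimiter, 'a ,b,  c'))    # ['a', 'b', 'c']
print(re.split(r',', 'a, b, c'))          # ['a', ' b', ' c']
```

The last line shows what happens with a bare comma as the delimiter: the spaces stay attached to the parts.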

7. regex.replace

  • Functionality: Replaces all matches.
  • Parameters:
    • pattern: string - Regular expression.
    • replacement: string - Replacement text (supports $n for capture group references).
    • text: string - Text to process.
  • Return Value: string - New string after replacement.
  • Example:
    regex.replace '(\d+)' 'number:$1' '123 abc'
    # => 'number:123 abc'
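
To make the $n substitution rule concrete, the sketch below expands $n references by hand in Python. It is an illustration of the idea under stated assumptions, not the Lumesh implementation: replace_all is a hypothetical helper, only numeric references are handled, and a group that did not participate is expanded to an empty string.

```python
import re

def replace_all(pattern, replacement, text):
    """Replace every match, expanding $n in the replacement (hypothetical helper)."""
    def expand(match):
        # Substitute each $n with the text of capture group n of this match.
        return re.sub(r'\$(\d+)',
                      lambda ref: match.group(int(ref.group(1))) or '',
                      replacement)
    return re.sub(pattern, expand, text)

print(replace_all(r'(\d+)', 'number:$1', '123 abc'))      # number:123 abc
print(replace_all(r'(\w+)@(\w+)', '$2 at $1', 'user@host'))  # host at user
print(replace_all(r'(\d)', '<$1>', 'a1b2'))               # a<1>b<2>
```

The last call shows that every match is replaced, not only the first.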

8. regex.capture_name

  • Functionality: Extracts named capture groups.
  • Parameters:
    • pattern: string - Regular expression with named capture groups (e.g., (?P<name>...)).
    • text: string - Text to search.
    • names: boolean (optional) - Whether to return group names (default is false).
  • Return Value:
    • When names=false: [group1, group2, ...] | none - List of capture group values (skips the full match at index 0).
    • When names=true: [[name, value], ...] | none - List of pairs of group names and values.
  • Example:
    # Return group values
    regex.capture_name '(?P<num>\d+)(\w+)' '123abc'
    # => ['123', 'abc']

    # Return group names and values
    regex.capture_name '(?P<num>\d+)(\w+)' '123abc' true
    # => [['num', '123'], [none, 'abc']]  # Unnamed group shows as none
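
The two return shapes can be modelled as follows. This is a sketch in Python's re module, which happens to accept the same (?P<name>...) spelling; capture_name here is a hypothetical re-implementation written only to show how names pair up with values, with None standing in for none.

```python
import re

def capture_name(pattern, text, names=False):
    """Return group values, or [name, value] pairs when names is true (hypothetical helper)."""
    m = re.search(pattern, text)
    if m is None:
        return None
    values = list(m.groups())          # excludes the full match at index 0
    if not names:
        return values
    # groupindex maps name -> group number; invert it to look names up by number.
    index_to_name = {idx: name for name, idx in m.re.groupindex.items()}
    return [[index_to_name.get(i), v] for i, v in enumerate(values, start=1)]

print(capture_name(r'(?P<num>\d+)(\w+)', '123abc'))        # ['123', 'abc']
print(capture_name(r'(?P<num>\d+)(\w+)', '123abc', True))  # [['num', '123'], [None, 'abc']]
```

Pairs are kept as a list rather than a map so that unnamed groups keep their position.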

General Notes

  • Regex Syntax: Follows regex_lite specifications (a lightweight subset of PCRE).
  • Escape Handling: In double-quoted Lumesh strings, backslashes must be doubled (e.g., "\\d" represents a digit); single-quoted strings need no extra escaping ('\d').
  • Error Handling:
    • Throws errors for non-string parameters.
    • Throws errors for invalid regex patterns.
  • Indexing Rules: All position indices start from 0 (left-closed, right-open interval).

Regular Expression Syntax Reference

Follows standard regular expression syntax. To match a special character literally, escape it with a backslash:

Character  Meaning                        Example
.          Any character                  a.c matches “abc”
\d         Digit [0-9]                    \d+ matches “123”
\w         Word character [a-zA-Z0-9_]    \w+ matches “var1”
\s         Whitespace character           a\sb matches “a b”
*          0 or more times                a*b matches “b”, “ab”, “aab”
+          1 or more times                a+b matches “ab”, “aab”
?          0 or 1 time                    a?b matches “b”, “ab”
{n}        Exactly n times                a{3} matches “aaa”
^          Start of string                ^a matches strings starting with a
$          End of string                  a$ matches strings ending with a
[…]        Character set                  [aeiou] matches any vowel
[^…]       Negated character set          [^0-9] matches non-digit characters
a|b        Alternation                    a|b matches “a” or “b”
()         Grouping                       (ab)+ matches “abab”

Note: In double-quoted strings, backslashes must be escaped, e.g., "\d" should be written as "\\d", while single quotes do not require escaping: '\d'.

Advanced Usage

Performance Recommendations

  1. For frequently used regular expressions, use new to precompile and store the result.
  2. Prefer using find over match when only partial matches are needed.
  3. For simple string operations, consider using string functions instead of regular expressions.
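
The first and third recommendations can be sketched as follows. The example uses Python's re.compile as an analogue of precompiling; the Lumesh precompile call is not shown because its exact signature is not documented here, and the function names below are hypothetical.

```python
import re

# Compile once and reuse the compiled object for every line.
DIGITS = re.compile(r'\d+')

def count_lines_with_digits(lines):
    """Count lines containing at least one digit run, reusing the compiled pattern."""
    return sum(1 for line in lines if DIGITS.search(line))

def has_error_tag(line):
    """A fixed substring needs no regex; a plain string test is enough."""
    return 'ERROR' in line

print(count_lines_with_digits(['a1', 'b', '22']))  # 2
print(has_error_tag('2024 ERROR disk full'))       # True
```

The benefit of precompiling grows with the number of times the same pattern is applied.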

Error Handling

  • All functions validate the number and type of parameters.
  • Invalid regular expressions will return descriptive errors.
  • Non-matching cases typically return none instead of an error.
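
The distinction between "no match" and "bad input" can be sketched as below. This models the rules above in Python under stated assumptions: safe_find is a hypothetical wrapper, the exception types are Python's, and the actual Lumesh error messages are not reproduced.

```python
import re

def safe_find(pattern, text):
    """Return [start, end, matched] or None; raise on bad arguments (hypothetical helper)."""
    # Non-string parameters are an error, not a non-match.
    if not isinstance(pattern, str) or not isinstance(text, str):
        raise TypeError('pattern and text must be strings')
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # An invalid pattern is reported with a descriptive message.
        raise ValueError(f'invalid regex {pattern!r}: {exc}') from exc
    m = compiled.search(text)
    # A valid pattern that simply does not match yields None.
    return None if m is None else [m.start(), m.end(), m.group(0)]

print(safe_find(r'\d+', 'abc'))  # None
```

Keeping the two outcomes separate lets callers test for none in normal control flow and reserve error handling for genuine mistakes.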