Classes

The Replus Class

class replus.Replus(patterns_dir_or_dict: str | PathLike | Dict[str, Dict], whitespace_noise: str | None = None, flags: int | None = RegexFlag.V0)

The Replus engine class builds and compiles regular expressions based on templates.

Variables:
  • group_counter – a Counter object that counts the occurrences of each group name in each template

  • patterns – a list of tuples made of [(key, pattern, template), …]

  • patterns_src – a dict containing all of patterns_dir/*.json combined together, “patterns” excluded

  • patterns_all – all patterns that can be run, e.g. {“dates”: [pattern0, pattern1], …}

  • all_groups – a dict of lists with the templates as keys, e.g. {pattern_template_a: [group_0, group_1], pattern_template_b: [group_0, group_1]}

  • flags – the regex flags to compile the patterns

  • whitespace_noise – a pattern to replace white space in the template

Instantiates the Replus engine

Parameters:
  • patterns_dir_or_dict (Union[os.PathLike, Dict[str, Dict]]) – the path to the directory where the *.json pattern templates are stored or a dict of dicts with the patterns.

  • whitespace_noise (str, defaults to None) – a pattern to replace white space in the template

  • flags (int, defaults to regex.V0) – the regex flags to compile the patterns
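
The dict form of patterns_dir_or_dict and the idea of building a regular expression from templates can be sketched with the standard library alone. This is an illustrative sketch, not the library's implementation: the {{name}} placeholder syntax, the key names other than "patterns", and the expand helper are assumptions, and the stdlib re module stands in for regex.

```python
import re

# Hypothetical template dict shaped like Dict[str, Dict]: the outer key plays
# the role of a pattern type (the stem of a *.json file) and "patterns" lists
# the templates that can be run. The {{name}} placeholder syntax is assumed.
templates = {
    "dates": {
        "day": ["3[01]", "[12][0-9]", "0?[1-9]"],
        "month": ["1[0-2]", "0?[1-9]"],
        "year": ["[0-9]{4}"],
        "patterns": ["{{day}}/{{month}}/{{year}}"],
    }
}


def expand(template, source):
    """Replace each {{name}} with a non-capturing alternation of its entries."""
    def substitute(m):
        alternatives = [expand(alt, source) for alt in source[m.group(1)]]
        return "(?:" + "|".join(alternatives) + ")"
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# Everything except "patterns" acts as the source of reusable fragments.
source = {k: v for k, v in templates["dates"].items() if k != "patterns"}
compiled = [re.compile(expand(t, source)) for t in templates["dates"]["patterns"]]
found = compiled[0].search("Due on 25/12/2023.")
```

The sketch uses non-capturing groups for brevity, so it does not reproduce the named, counted groups described by group_counter and all_groups.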

parse(string: str, filters: List[str] | None = None, exclude: List[str] | None = None, pos: int | None = None, endpos: int | None = None, flags: int | None = 0, overlapped: bool | None = False, partial: bool | None = False, concurrent: bool | None = None, timeout: float | None = None, ignore_unused: bool | None = False, **kwargs: Any) → List[Match] | List[Group]

Returns a list of Match objects

Parameters:
  • string (str) – the string to parse

  • filters (List[str]) – one or more pattern types to parse; if none is provided, all will be used

  • exclude (List[str], defaults to None) – a list of pattern types to exclude

  • pos (int, defaults to None) – starting position of the matching

  • endpos (int, defaults to None) – ending position of the matching

  • flags (int, defaults to 0) – flags to use while matching

  • overlapped (bool, defaults to False) – if True will allow overlapping matches

  • partial (bool, defaults to False) – if True will allow partial matches

  • concurrent (bool, defaults to None) – if True will run concurrently

  • timeout (float, defaults to None) – the timeout for matching, in seconds

  • ignore_unused (bool, defaults to False) – if True, unused keyword arguments are ignored

Returns:

a list of Match objects

Return type:

List[Match]
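
How filters, exclude, pos and endpos narrow a parse can be illustrated with a small stand-in built on the standard library. This is a sketch under assumptions, not the Replus implementation: parse_sketch, the patterns_all contents, and the tuple result shape are hypothetical, and results are plain tuples rather than Match objects.

```python
import re

# Hypothetical stand-in for patterns_all: pattern type -> compiled patterns.
patterns_all = {
    "dates": [re.compile(r"\d{1,2}/\d{1,2}/\d{4}")],
    "years": [re.compile(r"\d{4}")],
}


def parse_sketch(string, filters=None, exclude=None, pos=None, endpos=None):
    """Return (type, value, start, end) tuples for the selected pattern types."""
    wanted = filters if filters else list(patterns_all)  # no filters: use all
    skipped = set(exclude or [])
    start = 0 if pos is None else pos
    end = len(string) if endpos is None else endpos
    results = []
    for pattern_type in wanted:
        if pattern_type in skipped:
            continue
        for pattern in patterns_all[pattern_type]:
            for m in pattern.finditer(string, start, end):
                results.append((pattern_type, m.group(0), m.start(), m.end()))
    # Order by position in the string so results read left to right.
    return sorted(results, key=lambda r: (r[2], r[3]))


text = "Paid 01/02/2020, renewed 2021"
only_dates = parse_sketch(text, exclude=["years"])
only_years = parse_sketch(text, filters=["years"])
late_part = parse_sketch(text, pos=16)
```

Note that with no filtering the "years" pattern also matches inside the date, which is the kind of overlap purge_overlaps is meant to deal with.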

static purge_overlaps(matches: List[Match] | List[Group]) → List[Match] | List[Group]

Removes overlapping instances from a list of Match or Group objects

Parameters:

matches (Union[List[Match], List[Group]]) – a list of Match or Group objects

Returns:

a list of Match or Group objects

Return type:

Union[List[Match], List[Group]]
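
The reference does not say which of two overlapping instances is kept. The sketch below assumes one plausible policy (prefer the longer span, then the earlier one) and works on plain (start, end) tuples instead of Match or Group objects; purge_overlaps_sketch is a hypothetical name.

```python
def purge_overlaps_sketch(spans):
    """Keep longer spans first; drop any span that overlaps one already kept."""
    kept = []
    # Assumed policy: longest span wins, ties broken by earliest start.
    for start, end in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


# (11, 15) lies inside (5, 15) and is dropped; (25, 29) stands alone.
spans = [(5, 15), (11, 15), (25, 29)]
purged = purge_overlaps_sketch(spans)
```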

search(string: str, filters: List[str] | None = None, exclude: List[str] | None = None, pos: int | None = None, endpos: int | None = None, flags: int | None = 0, overlapped: bool | None = False, partial: bool | None = False, concurrent: bool | None = None, timeout: float | None = None, ignore_unused: bool | None = False, **kwargs: Any) → Match | Group | None

Returns a single Match object

Parameters:
  • string (str) – the string to parse

  • filters (List[str]) – one or more pattern types to parse; if none is provided, all will be used

  • exclude (List[str], defaults to None) – a list of pattern types to exclude

  • pos (int, defaults to None) – starting position of the matching

  • endpos (int, defaults to None) – ending position of the matching

  • flags (int, defaults to 0) – flags to use while matching

  • overlapped (bool, defaults to False) – if True will allow overlapping matches

  • partial (bool, defaults to False) – if True will allow partial matches

  • concurrent (bool, defaults to None) – if True will run concurrently

  • timeout (float, defaults to None) – the timeout for matching, in seconds

  • ignore_unused (bool, defaults to False) – if True, unused keyword arguments are ignored

Returns:

a Match object, or None if no pattern matches

Return type:

Match

The Match Class

class replus.Match(match_type: str, match: Match, all_groups_names: List[str], pattern: Pattern)

A Match object is an abstract and expanded representation of a regex.regex.Match

Variables:
  • type – the type of the match, corresponding to the stem of the file of the pattern’s template

  • match – a regex.regex.Match object

  • partial – if it’s a partial match

  • value – the string value of the match

  • offset – the offset of the match {"start": int, "end": int}

  • pattern – the string representation of the pattern that matched

  • length – the length of the match (no. of characters)

  • all_group_names – all the names of all the groups for the corresponding pattern for this match

  • _start – the start offset of the Match

  • _end – the end offset of the Match

  • _span – the span of the Match (_start, _end)
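
The relationship between value, offset, length and the span can be shown with an ordinary match object. This sketch uses the standard library re module as a stand-in for regex and builds the values by hand; it does not construct a replus.Match.

```python
import re

# A plain stdlib match stands in for the wrapped regex.regex.Match object.
m = re.search(r"\d{4}", "Renewed in 2021")

value = m.group(0)                              # the matched string
offset = {"start": m.start(), "end": m.end()}   # same shape as the offset variable
length = m.end() - m.start()                    # number of characters matched
span = m.span()                                 # (start, end) tuple
```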

Instantiates a Match object

Parameters:
  • match_type (str) – the type of the match, corresponding to the stem of the file of the pattern’s template

  • match (regex.regex.Match) – a regex.regex.Match object

  • all_groups_names (List[str]) – all the names of all the groups for the corresponding pattern for this match

  • pattern (regex.regex.Pattern) – the pattern that matched

groups(group_query: str | None = None, root: bool = False) → List[Group]

Returns a list of repeated Group objects that belong to the Match object

Parameters:
  • group_query (str, defaults to None) – the name of the group to find repetitions of

  • root (bool, defaults to False) – includes the root if True

Returns:

a list of Group objects

Return type:

List[Group]
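
Querying the repetitions of a group by name can be pictured as follows. This is a sketch only: it assumes repeated groups are named key_0, key_1, and so on (as in the date_0 example given for Group.name), uses the stdlib re module, and returns tuples rather than Group objects. groups_sketch is a hypothetical helper.

```python
import re

# The same logical group ("date") appears twice, so it gets two indexed names.
pattern = re.compile(r"from (?P<date_0>\d{4}) to (?P<date_1>\d{4})")
m = pattern.search("valid from 2020 to 2024")


def groups_sketch(match, group_query=None):
    """Return (name, value, start, end) for every repetition of group_query."""
    found = []
    for name, value in match.groupdict().items():
        key = name.rpartition("_")[0]  # strip the trailing repetition index
        if value is not None and (group_query is None or key == group_query):
            found.append((name, value, match.start(name), match.end(name)))
    return found


dates = groups_sketch(m, "date")
```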

The Group Class

class replus.Group(match: Match, group_name: str, root: Match, rep_index: int = 0)

A Group object is an abstract and expanded representation of a regex.regex.Match

Variables:
  • root – the root Match object

  • match – a regex.regex.Match object

  • name – the name of the group, including its rep_index. E.g.: date_0

  • key – the key of the group, i.e. the name without the rep_index

  • value – the string value of the match

  • offset – the offset of the match {"start": int, "end": int}

  • length – the length of the match (no. of characters)

  • rep_index – the repetition index

  • _start – the start offset of the Match

  • _end – the end offset of the Match

  • _span – the span of the Match (_start, _end)
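
The relation between name, key and rep_index can be made concrete with a small helper. The helper is hypothetical and assumes the name is always the key followed by an underscore and the repetition index, as in date_0.

```python
def split_group_name(name):
    """Split a group name such as 'date_0' into its key and repetition index."""
    key, _, index = name.rpartition("_")
    return key, int(index)


key, rep_index = split_group_name("date_0")
# Keys that contain underscores themselves still split on the last one.
long_key, long_index = split_group_name("start_date_12")
```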

groups(group_query: str | None = None, root: bool = False) → List[Group]

Returns a list of repeated Group objects that belong to the Group object

Parameters:
  • group_query (str, defaults to None) – the name of the group to find repetitions of

  • root (bool, defaults to False) – includes the root if True

Returns:

a list of Group objects

Return type:

List[Group]

reps() → List[Group]

Returns a list of the Group object’s repetitions

Returns:

a list of the Group object’s repetitions

Return type:

List[Group]