Classes

The Replus Class

class replus.Replus(patterns_dir_or_dict: str | PathLike | Dict[str, Dict], whitespace_noise: str | None = None, flags: int | None = RegexFlag.V0)

The Replus engine class builds and compiles regular expressions based on templates.

Variables:
  • group_counter – a Counter object that counts the occurrences of each group name in each template

  • patterns – a list of tuples made of [(key, pattern, template), …]

  • patterns_src – a dict containing all of patterns_dir/*.json combined together, “patterns” excluded

  • patterns_all – all patterns that can be run, e.g. {“dates”: [pattern0, pattern1], …}

  • all_groups – a dict of lists with the templates as keys, e.g. {pattern_template_a: [group_0, group_1], pattern_template_b: [group_0, group_1]}

  • flags – the regex flags to compile the patterns

  • whitespace_noise – a pattern to replace white space in the template

Instantiates the Replus engine

Parameters:
  • patterns_dir_or_dict (Union[str, os.PathLike, Dict[str, Dict]]) – the path to the directory where the *.json pattern templates are stored, or a dict of dicts with the patterns.

  • whitespace_noise (str, defaults to None) – a pattern to replace white space in the template

  • flags (int, defaults to regex.V0) – the regex flags to compile the patterns
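
As a rough illustration of the whitespace_noise idea, the sketch below uses only the standard re module rather than Replus itself. It assumes that each literal space in a template is swapped for the noise pattern before compiling; the helper name apply_whitespace_noise and the sample template are hypothetical, and the library may handle whitespace differently.

```python
import re


def apply_whitespace_noise(template: str, noise: str) -> str:
    """Replace every literal space in the template with the noise pattern.

    Assumption: one replacement per space character.
    """
    return template.replace(" ", noise)


# Hypothetical template written with plain spaces for readability.
template = r"(\d{1,2}) (January|February) (\d{4})"
noisy = apply_whitespace_noise(template, r"\s+")

# The compiled pattern now tolerates runs of spaces and tabs.
pattern = re.compile(noisy)
m = pattern.search("born on 3   January\t1970")
print(noisy)
print(m.group(0) if m else None)
```

Writing templates with single spaces and widening them at compile time keeps the templates readable while still matching irregular spacing in the input.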

parse(string: str, filters: List[str] | None = None, exclude: List[str] | None = None, pos: int | None = None, endpos: int | None = None, flags: int | None = 0, overlapped: bool | None = False, partial: bool | None = False, concurrent: bool | None = None, timeout: float | None = None, ignore_unused: bool | None = False, **kwargs: Any) List[Match] | List[Group]

Returns a list of Match objects

Parameters:
  • string (str) – the string to parse

  • filters (List[str], defaults to None) – one or more pattern types to parse; if none is provided, all will be used

  • exclude (List[str], defaults to None) – a list of pattern types to exclude

  • pos (int, defaults to None) – starting position of the matching

  • endpos (int, defaults to None) – ending position of the matching

  • flags (int, defaults to 0) – flags to use while matching

  • overlapped (bool, defaults to False) – if True will allow overlapping matches

  • partial (bool, defaults to False) – if True will allow partial matches

  • concurrent (bool, defaults to None) – if True will run concurrently

  • timeout (float, defaults to None) – timeout for matching

  • ignore_unused (bool, defaults to False) – if True, ignores keyword arguments (**kwargs) that the pattern does not use

Returns:

a list of Match objects

Return type:

List[Match]
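
The way filters, exclude, pos and endpos narrow a parse can be sketched with the standard re module. This is not the Replus implementation: PATTERNS_ALL is a hypothetical stand-in for the documented patterns_all mapping, parse_sketch and select_types are illustrative names, and the result order (pattern type by pattern type) is an assumption.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for patterns_all: pattern type -> compiled patterns.
PATTERNS_ALL: Dict[str, List[re.Pattern]] = {
    "dates": [re.compile(r"\d{4}-\d{2}-\d{2}")],
    "times": [re.compile(r"\d{2}:\d{2}")],
}


def select_types(filters: Optional[List[str]] = None,
                 exclude: Optional[List[str]] = None) -> List[str]:
    """Use every pattern type unless filters is given, then drop excluded ones."""
    chosen = list(filters) if filters else list(PATTERNS_ALL)
    excluded = set(exclude or [])
    return [t for t in chosen if t not in excluded]


def parse_sketch(string: str,
                 filters: Optional[List[str]] = None,
                 exclude: Optional[List[str]] = None,
                 pos: Optional[int] = None,
                 endpos: Optional[int] = None) -> List[Tuple[str, str, Tuple[int, int]]]:
    """Return (pattern type, matched text, span) for every match found."""
    start = 0 if pos is None else pos
    end = len(string) if endpos is None else endpos
    found = []
    for pattern_type in select_types(filters, exclude):
        for pattern in PATTERNS_ALL[pattern_type]:
            for m in pattern.finditer(string, start, end):
                found.append((pattern_type, m.group(0), m.span()))
    return found


text = "Meeting on 2024-05-17 at 09:30"
print(parse_sketch(text))                     # both pattern types
print(parse_sketch(text, filters=["dates"]))  # dates only
print(parse_sketch(text, exclude=["dates"]))  # everything except dates
print(parse_sketch(text, pos=22))             # skip the first 22 characters
```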

static purge_overlaps(matches: List[Match] | List[Group]) List[Match] | List[Group]

Removes overlapping instances from a list of Match or Group objects

Parameters:

matches (Union[List[Match], List[Group]]) – a list of Match or Group objects

Returns:

a list of Match or Group objects

Return type:

Union[List[Match], List[Group]]
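
The documentation does not state which of two overlapping objects survives. The sketch below shows one plausible policy on plain (start, end) spans: the earliest span wins, and the longest wins when two spans start at the same position. The function name purge_overlaps_sketch and this tie-breaking rule are assumptions, not the library's confirmed behaviour.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def purge_overlaps_sketch(spans: List[Span]) -> List[Span]:
    """Keep a non-overlapping subset of spans (end index is exclusive)."""
    # Earliest start first; on equal starts, the longer span comes first.
    ordered = sorted(spans, key=lambda s: (s[0], -(s[1] - s[0])))
    kept: List[Span] = []
    for start, end in ordered:
        # A span is kept only if it begins at or after the end of the last kept one.
        if not kept or start >= kept[-1][1]:
            kept.append((start, end))
    return kept


print(purge_overlaps_sketch([(0, 10), (5, 8), (10, 12), (11, 15)]))
print(purge_overlaps_sketch([(0, 3), (0, 5)]))
```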

search(string: str, filters: List[str] | None = None, exclude: List[str] | None = None, pos: int | None = None, endpos: int | None = None, flags: int | None = 0, overlapped: bool | None = False, partial: bool | None = False, concurrent: bool | None = None, timeout: float | None = None, ignore_unused: bool | None = False, **kwargs: Any) Match | Group | None

Returns a single Match object

Parameters:
  • string (str) – the string to parse

  • filters (List[str], defaults to None) – one or more pattern types to parse; if none is provided, all will be used

  • exclude (List[str], defaults to None) – a list of pattern types to exclude

  • pos (int, defaults to None) – starting position of the matching

  • endpos (int, defaults to None) – ending position of the matching

  • flags (int, defaults to 0) – flags to use while matching

  • overlapped (bool, defaults to False) – if True will allow overlapping matches

  • partial (bool, defaults to False) – if True will allow partial matches

  • concurrent (bool, defaults to None) – if True will run concurrently

  • timeout (float, defaults to None) – timeout for matching

  • ignore_unused (bool, defaults to False) – if True, ignores keyword arguments (**kwargs) that the pattern does not use

Returns:

a Match object, or None if nothing matches

Return type:

Match

The AbstractMatch Class

class replus.AbstractMatch
end(group_name: str | None = None, rep_index: int | None = None) int

Returns the end character index of self or of Group with group_name

Parameters:
  • group_name (str, defaults to None) – the name of the group

  • rep_index (int, defaults to None) – the repetition index of the group

Returns:

the end index of the Match

Return type:

int

first() Group | None

Returns the first Group object or None

Returns:

the first Group object

Return type:

Union[Group, None]

group(group_name: str) Group | None

Returns a Group object with the given group_name or None

Parameters:

group_name (str) – the name of the group

Returns:

a Group object

Return type:

Optional[Group]

json(*args: Any, **kwargs: Any) str

Returns a json-string of the serialized object

Returns:

a json-string of the serialized object

Return type:

str

last() Group | None

Returns the last Group object or None

Returns:

the last Group object

Return type:

Union[Group, None]

serialize() dict

Returns a dict representation of the Match object structured as follows

o = {
    "key": self.key,
    "name": self.name,
    "offset": self.offset,
    "value": self.value,
    "groups": {subgroup_0: [group_object.serialize()]}
}

Returns:

a dict representation of the Match object

Return type:

dict
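
To make the nested shape concrete, here is a hand-written dict that follows the structure described above, together with a small recursive walk and a JSON round trip using only the standard library. The group names, offsets and values are invented for illustration, and collect_values is a hypothetical helper, not part of Replus.

```python
import json

# Hand-built example following the documented serialize() layout.
serialized = {
    "key": "date",
    "name": "date_0",
    "offset": {"start": 11, "end": 21},
    "value": "2024-05-17",
    "groups": {
        "year_0": [{
            "key": "year", "name": "year_0",
            "offset": {"start": 11, "end": 15},
            "value": "2024", "groups": {},
        }],
        "month_0": [{
            "key": "month", "name": "month_0",
            "offset": {"start": 16, "end": 18},
            "value": "05", "groups": {},
        }],
    },
}


def collect_values(node: dict) -> list:
    """Depth-first list of every "value" in a serialized object."""
    values = [node["value"]]
    for repetitions in node["groups"].values():
        for child in repetitions:
            values.extend(collect_values(child))
    return values


as_json = json.dumps(serialized, indent=2)
print(collect_values(serialized))
print(json.loads(as_json) == serialized)
```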

span(group_name: str | None = None, rep_index: int | None = None) Tuple[int, int]

Returns the span of self or of Group with group_name

Parameters:
  • group_name (str, defaults to None) – the name of the group

  • rep_index (int, defaults to None) – the repetition index of the group

Returns:

the span of the Match

Return type:

Tuple[int, int]

start(group_name: str | None = None, rep_index: int | None = None) int

Returns the start character index of self or of Group with group_name

Parameters:
  • group_name (str, defaults to None) – the name of the group

  • rep_index (int, defaults to None) – the repetition index of the group

Returns:

the start index of the Match

Return type:

int

The Match Class

class replus.Match(match_type: str, match: Match, all_groups_names: List[str], pattern: Pattern)

A Match object is an abstract and expanded representation of a regex.regex.Match

Variables:
  • type – the type of the match, corresponding to the stem of the file of the pattern’s template

  • match – a regex.regex.Match object

  • partial – if it’s a partial match

  • value – the string value of the match

  • offset – the offset of the match {"start": int, "end": int}

  • pattern – the string representation of the pattern that matched

  • length – the length of the match (no. of characters)

  • all_group_names – all the names of all the groups for the corresponding pattern for this match

  • _start – the start offset of the Match

  • _end – the end offset of the Match

  • _span – the span of the Match (_start, _end)

Instantiates a Match object

Parameters:
  • match_type (str) – the type of the match, corresponding to the stem of the file of the pattern’s template

  • match (regex.regex.Match) – a regex.regex.Match object

  • all_groups_names (List[str]) – all the names of all the groups for the corresponding pattern for this match

  • pattern (regex.regex.Pattern) – the pattern that matched

groups(group_query: str | None = None, root: bool = False) List[Group]

Returns a list of repeated Group objects that belong to the Match object

Parameters:
  • group_query (str, defaults to None) – the name of the group to find repetitions of

  • root (bool, defaults to False) – includes the root if True

Returns:

a list of Group objects

Return type:

List[Group]

The Group Class

class replus.Group(match: Match, group_name: str, root: Match, rep_index: int = 0)

A Group object is an abstract and expanded representation of a regex.regex.Match

Variables:
  • root – the root Match object

  • match – a regex.regex.Match object

  • name – the name of the group, including its rep_index. E.g.: date_0

  • key – the key of the group, i.e. the name without the rep_index

  • value – the string value of the match

  • offset – the offset of the match {"start": int, "end": int}

  • length – the length of the match (no. of characters)

  • rep_index – the repetition index

  • _start – the start offset of the Match

  • _end – the end offset of the Match

  • _span – the span of the Match (_start, _end)
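
The relationship between name, key and rep_index can be shown with a few lines of plain Python. This sketch assumes the name is always the key followed by an underscore and the repetition index, as in the date_0 example above; split_group_name is a hypothetical helper, not a Replus function.

```python
from typing import Tuple


def split_group_name(name: str) -> Tuple[str, int]:
    """Split a name such as "date_0" into its key and repetition index."""
    # rpartition splits on the last underscore, so keys may contain underscores.
    key, _, rep = name.rpartition("_")
    return key, int(rep)


print(split_group_name("date_0"))
print(split_group_name("birth_date_12"))
```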

groups(group_query: str | None = None, root: bool = False) List[Group]

Returns a list of repeated Group objects that belong to the Group object

Parameters:
  • group_query (str, defaults to None) – the name of the group to find repetitions of

  • root (bool, defaults to False) – includes the root if True

Returns:

a list of Group objects

Return type:

List[Group]

reps() List[Group]

Returns a list of the Group object’s repetitions

Returns:

a list of the Group object’s repetitions

Return type:

List[Group]