Classes
The Replus Class
- class replus.Replus(patterns_dir_or_dict: str | PathLike | Dict[str, Dict], whitespace_noise: str | None = None, flags: int | None = RegexFlag.V0)
The Replus engine class builds and compiles regular expressions based on templates.
- Variables:
group_counter – a Counter object to count group name occurrence on each template
patterns – a list of tuples made of [(key, pattern, template), …]
patterns_src – a dict containing all of patterns_dir/*.json combined together, “patterns” excluded
patterns_all – all patterns that can be run, e.g. {“dates”: [pattern0, pattern1], …}
all_groups – a dict of lists with the templates as keys, e.g. {pattern_template_a: [group_0, group_1], pattern_template_b: [group_0, group_1]}
flags – the regex flags to compile the patterns
whitespace_noise – a pattern to replace white space in the template
Instantiates the Replus engine
- Parameters:
patterns_dir_or_dict (Union[str, os.PathLike, Dict[str, Dict]]) – the path to the directory where the *.json pattern templates are stored, or a dict of dicts with the patterns
whitespace_noise (str, defaults to None) – a pattern to replace white space in the template
flags (int, defaults to regex.V0) – the regex flags to compile the patterns
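To illustrate how a template-based engine of this kind might turn a dict of patterns into a compiled regular expression, here is a minimal sketch that uses only the standard library re module. The {{name}} placeholder syntax, the expand helper and the sample src dict are assumptions made for illustration and are not taken from the Replus source. Group names receive a repetition index from a Counter, in the spirit of the group_counter variable described above.

```python
import re
from collections import Counter

# Hypothetical template source: each key maps to a list of alternatives.
src = {
    "day": [r"[0-3]?\d"],
    "month": [r"1[0-2]", r"0?[1-9]"],
    "year": [r"\d{4}"],
    "date": ["{{day}}/{{month}}/{{year}}"],
}

# Assumed placeholder syntax: {{key}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand(template: str, src: dict, counter: Counter) -> str:
    """Recursively replace {{key}} with a named group of its alternatives."""

    def replace(m: re.Match) -> str:
        key = m.group(1)
        name = f"{key}_{counter[key]}"  # e.g. date_0, date_1, ...
        counter[key] += 1
        alternatives = "|".join(expand(alt, src, counter) for alt in src[key])
        return f"(?P<{name}>{alternatives})"

    return PLACEHOLDER.sub(replace, template)


counter = Counter()
pattern = re.compile(expand("{{date}}", src, counter))
match = pattern.search("Due on 31/12/1999.")
print(match.group("date_0"), match.group("year_0"))
```

Each placeholder becomes a uniquely named group, so the same key can appear several times in one template without name clashes.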
- parse(string: str, filters: List[str] | None = None, exclude: List[str] | None = None, pos: int | None = None, endpos: int | None = None, flags: int | None = 0, overlapped: bool | None = False, partial: bool | None = False, concurrent: bool | None = None, timeout: float | None = None, ignore_unused: bool | None = False, **kwargs: Any) List[Match] | List[Group]
Returns a list of Match objects
- Parameters:
string (str) – the string to parse
filters (List[str]) – one or more pattern types to parse; if none is provided, all will be used
exclude (List[str], defaults to None) – a list of pattern types to exclude
pos (int, defaults to None) – starting position of the matching
endpos (int, defaults to None) – ending position of the matching
flags (int, defaults to 0) – flags to use while matching
overlapped (bool, defaults to False) – if True will allow overlapping matches
partial (bool, defaults to False) – if True will allow partial matches
concurrent (bool, defaults to None) – if True will run concurrently
timeout (float, defaults to None) – timeout for matching
ignore_unused (bool, defaults to False) – if True, keyword arguments not used by the pattern are ignored
- Returns:
a list of Match objects
- Return type:
List[Match]
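The interplay of filters, exclude, pos and endpos can be sketched with the standard library re module. This is not the Replus implementation: parse_sketch and the sample patterns_all dict are hypothetical, and the sketch returns plain (pattern_type, re.Match) tuples sorted by start position rather than Match objects.

```python
import re
from typing import Dict, List

# Hypothetical stand-in for patterns_all: pattern type -> compiled patterns.
patterns_all: Dict[str, List[re.Pattern]] = {
    "dates": [re.compile(r"\d{4}-\d{2}-\d{2}")],
    "times": [re.compile(r"\d{2}:\d{2}")],
}


def parse_sketch(string, patterns_all, filters=None, exclude=None,
                 pos=None, endpos=None):
    """Collect matches for the selected pattern types within [pos, endpos)."""
    wanted = filters if filters else list(patterns_all)  # no filters: use all
    skipped = set(exclude or [])
    start = 0 if pos is None else pos
    end = len(string) if endpos is None else endpos
    found = []
    for pattern_type in wanted:
        if pattern_type in skipped:
            continue
        for pattern in patterns_all[pattern_type]:
            for m in pattern.finditer(string, start, end):
                found.append((pattern_type, m))
    found.sort(key=lambda item: item[1].start())
    return found


text = "Meet on 2024-05-17 at 09:30."
everything = [(t, m.group()) for t, m in parse_sketch(text, patterns_all)]
only_dates = [(t, m.group())
              for t, m in parse_sketch(text, patterns_all, filters=["dates"])]
no_dates = [(t, m.group())
            for t, m in parse_sketch(text, patterns_all, exclude=["dates"])]
late_start = [(t, m.group())
              for t, m in parse_sketch(text, patterns_all, pos=10)]
```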
- static purge_overlaps(matches: List[Match] | List[Group]) List[Match] | List[Group]
Removes overlapping instances from a list of Match or Group objects
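The documentation does not state which of two overlapping instances is kept. As one possible policy, the sketch below works on plain (start, end) tuples, keeps the earliest span, prefers the longest span when two start at the same position, and drops anything that overlaps a span already kept. The function name purge_overlaps_sketch and this tie-breaking rule are assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def purge_overlaps_sketch(spans: List[Span]) -> List[Span]:
    """Keep the earliest span, preferring the longest on ties; drop overlaps."""
    kept: List[Span] = []
    # Sort by start position, then by descending length.
    for start, end in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        # A span is kept only if it starts at or after the end of the last kept one.
        if not kept or start >= kept[-1][1]:
            kept.append((start, end))
    return kept


spans = [(0, 5), (3, 8), (8, 10), (0, 2)]
result = purge_overlaps_sketch(spans)
print(result)
```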
- search(string: str, filters: List[str] | None = None, exclude: List[str] | None = None, pos: int | None = None, endpos: int | None = None, flags: int | None = 0, overlapped: bool | None = False, partial: bool | None = False, concurrent: bool | None = None, timeout: float | None = None, ignore_unused: bool | None = False, **kwargs: Any) Match | Group | None
Returns a single Match object
- Parameters:
string (str) – the string to parse
filters (List[str]) – one or more pattern types to parse; if none is provided, all will be used
exclude (List[str], defaults to None) – a list of pattern types to exclude
pos (int, defaults to None) – starting position of the matching
endpos (int, defaults to None) – ending position of the matching
flags (int, defaults to 0) – flags to use while matching
overlapped (bool, defaults to False) – if True will allow overlapping matches
partial (bool, defaults to False) – if True will allow partial matches
concurrent (bool, defaults to None) – if True will run concurrently
timeout (float, defaults to None) – timeout for matching
ignore_unused (bool, defaults to False) – if True, keyword arguments not used by the pattern are ignored
- Returns:
a Match object
- Return type:
Union[Match, Group, None]
The AbstractMatch Class
- class replus.AbstractMatch
- end(group_name: str | None = None, rep_index: int | None = None) int
Returns the end character index of self or of Group with group_name
- Parameters:
group_name (str, defaults to None) – the name of the group
rep_index (int, defaults to 0) – the repetition index of the group
- Returns:
the end index of the Match
- Return type:
int
- first() Group | None
Returns the first Group object or None
- Returns:
the first Group object
- Return type:
Union[Group, None]
- group(group_name: str) Group | None
Returns a Group object with the given group_name or None
- Parameters:
group_name (str) – the name of the group
- Returns:
a Group object
- Return type:
Optional[Group]
- json(*args: Any, **kwargs: Any) str
Returns a json-string of the serialized object
- Returns:
a json-string of the serialized object
- Return type:
str
- last() Group | None
Returns the last Group object or None
- Returns:
the last Group object
- Return type:
Union[Group, None]
- serialize() dict
Returns a dict representation of the Match object structured as follows
o = { "key": self.key, "name": self.name, "offset": self.offset, "value": self.value, "groups": {subgroup_0: [group_object.serialize()]} }
- Returns:
a dict representation of the Match object
- Return type:
dict
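The structure above can be mimicked with a standard library re.Match, which may help when consuming the output of serialize() or json(). This is a sketch only: serialize_match and serialize_group are hypothetical helpers, the key is assumed to be the group name without its trailing repetition index, and the top-level "name" is simply set to the key.

```python
import json
import re

pattern = re.compile(r"(?P<day_0>\d{2})/(?P<month_0>\d{2})")
m = pattern.search("Paid on 05/11.")


def serialize_group(m: re.Match, name: str) -> dict:
    # Assumption: the key is the group name without its trailing "_<index>".
    key = name.rpartition("_")[0]
    start, end = m.span(name)
    return {
        "key": key,
        "name": name,
        "offset": {"start": start, "end": end},
        "value": m.group(name),
        "groups": {},  # deeper nesting is omitted in this sketch
    }


def serialize_match(m: re.Match, key: str) -> dict:
    start, end = m.span()
    return {
        "key": key,
        "name": key,  # assumption: no repetition index at the top level
        "offset": {"start": start, "end": end},
        "value": m.group(),
        "groups": {name: [serialize_group(m, name)] for name in m.groupdict()},
    }


data = serialize_match(m, "date")
as_json = json.dumps(data)  # counterpart of the json() method
```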
- span(group_name: str | None = None, rep_index: int | None = None) Tuple[int, int]
Returns the span of self or of Group with group_name
- Parameters:
group_name (str, defaults to None) – the name of the group
rep_index (int, defaults to 0) – the repetition index of the group
- Returns:
the span of the Match
- Return type:
Tuple[int, int]
- start(group_name: str | None = None, rep_index: int | None = None) int
Returns the start character index of self or of Group with group_name
- Parameters:
group_name (str, defaults to None) – the name of the group
rep_index (int, defaults to 0) – the repetition index of the group
- Returns:
the start index of the Match
- Return type:
int
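A minimal sketch of how start, end and span could resolve an optional group_name and rep_index, using the standard library re module. It assumes that repeated groups are named <group_name>_<rep_index> (as in the date_0 example used for Group names) and that a rep_index of None is treated as 0; span_sketch is a hypothetical helper, not part of Replus.

```python
import re

pattern = re.compile(r"(?P<num_0>\d+)-(?P<num_1>\d+)")
m = pattern.search("range 10-250")


def span_sketch(m, group_name=None, rep_index=None):
    """Return the span of the whole match, or of one repetition of a group."""
    if group_name is None:
        return m.span()
    index = 0 if rep_index is None else rep_index  # assumed default
    return m.span(f"{group_name}_{index}")


whole = span_sketch(m)
first = span_sketch(m, "num")
second = span_sketch(m, "num", rep_index=1)
```

The start and end values are then simply the first and second items of the returned tuple.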
The Match Class
- class replus.Match(match_type: str, match: Match, all_groups_names: List[str], pattern: Pattern)
A Match object is an abstract and expanded representation of a regex.regex.Match
- Variables:
type – the type of the match, corresponding to the stem of the file of the pattern’s template
match – a regex.regex.Match object
partial – if it’s a partial match
value – the string value of the match
offset – the offset of the match, {"start": int, "end": int}
pattern – the string representation of the pattern that matched
length – the length of the match (no. of characters)
all_group_names – all the names of all the groups for the corresponding pattern for this match
_start – the start offset of the Match
_end – the end offset of the Match
_span – the span of the Match (_start, _end)
Instantiates a Match object
- Parameters:
match_type (str) – the type of the match, corresponding to the stem of the file of the pattern’s template
match (regex.regex.Match) – a regex.regex.Match object
all_groups_names (List[str]) – all the names of all the groups for the corresponding pattern for this match
pattern (regex.regex.Pattern) – the pattern that matched
- groups(group_query: str | None = None, root: bool = False) List[Group]
Returns a list of repeated Group objects that belong to the Match object
- Parameters:
group_query (str, defaults to None) – the name of the group to find repetitions of
root (bool, defaults to False) – includes the root if True
- Returns:
a list of Group objects
- Return type:
List[Group]
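Querying repetitions by group name can be sketched as follows with the standard library re module. The sketch assumes the <key>_<rep_index> naming convention and returns (name, value) tuples instead of Group objects; groups_sketch is hypothetical and the root parameter is left out.

```python
import re

pattern = re.compile(
    r"(?P<date_0>(?P<day_0>\d{2})/(?P<month_0>\d{2}))"
    r" to "
    r"(?P<date_1>(?P<day_1>\d{2})/(?P<month_1>\d{2}))"
)
m = pattern.search("open 01/03 to 15/04")


def groups_sketch(m, group_query=None):
    """List matched named groups, optionally only repetitions of one key."""
    found = []
    # Walk the named groups in the order they are defined in the pattern.
    for name, _ in sorted(m.re.groupindex.items(), key=lambda kv: kv[1]):
        key = name.rpartition("_")[0]  # strip the repetition index
        value = m.group(name)
        if value is None:
            continue  # the group did not take part in the match
        if group_query is None or key == group_query:
            found.append((name, value))
    return found


dates = groups_sketch(m, "date")
days = groups_sketch(m, "day")
```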
The Group Class
- class replus.Group(match: Match, group_name: str, root: Match, rep_index: int = 0)
A Group object is an abstract and expanded representation of a regex.regex.Match
- Variables:
root – the root Match object
match – a regex.regex.Match object
name – the name of the group, including its rep_index. E.g.: date_0
key – the key of the group, i.e. the name without the rep_index
value – the string value of the match
offset – the offset of the match, {"start": int, "end": int}
length – the length of the match (no. of characters)
rep_index – the repetition index
_start – the start offset of the Match
_end – the end offset of the Match
_span – the span of the Match (_start, _end)
- groups(group_query: str | None = None, root: bool = False) List[Group]
Returns a list of repeated Group objects that belong to the Group object
- Parameters:
group_query (str, defaults to None) – the name of the group to find repetitions of
root (bool, defaults to False) – includes the root if True
- Returns:
a list of Group objects
- Return type:
List[Group]