helpers
Helpers for editdistance
- symspellpy.helpers.null_distance_results(string1, string2, max_distance)
Determines the proper return value of an edit distance function when one or both strings are null.
- Parameters
string1 – Base string.
string2 – The string to compare.
max_distance (int) – The maximum distance allowed.
- Return type
int
- Returns
-1 if the distance is greater than max_distance, 0 if the strings are equivalent (both are None), otherwise a positive number whose magnitude is the length of the string which is not None.
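The return rules above can be illustrated with a minimal sketch. The function name null_distance_sketch is hypothetical, and the body is reconstructed from the description on this page rather than copied from the library, so treat it as an illustration only.

```python
from typing import Optional


def null_distance_sketch(
    string1: Optional[str], string2: Optional[str], max_distance: int
) -> int:
    """Hypothetical re-statement of the documented return rules."""
    if string1 is None and string2 is None:
        # Both strings are None, so they are treated as equivalent.
        return 0
    # Assumption: exactly one string is None; the distance is then the
    # length of the string that is not None.
    present = string2 if string1 is None else string1
    return len(present) if len(present) <= max_distance else -1


print(null_distance_sketch(None, "abc", 5))
print(null_distance_sketch("abcd", None, 2))
```

The sketch assumes the caller only invokes it when at least one argument is None, which is the situation the description covers.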
- symspellpy.helpers.prefix_suffix_prep(string1, string2)
Calculates starting position and lengths of two strings such that common prefix and suffix substrings are excluded. Expects len(string1) <= len(string2).
- Parameters
string1 – Base string.
string2 – The string to compare.
- Return type
Tuple[int, int, int]
- Returns
A tuple of the lengths of the parts of each string excluding the common prefix and suffix, and the starting position.
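As an illustration of the trimming described above, the following sketch strips the common suffix first and then the common prefix. The name trim_common_affixes is hypothetical, and the order of the two passes is an assumption of this sketch, not something this page states.

```python
from typing import Tuple


def trim_common_affixes(string1: str, string2: str) -> Tuple[int, int, int]:
    """Hypothetical sketch; expects len(string1) <= len(string2)."""
    len1, len2 = len(string1), len(string2)
    # Drop the common suffix by shrinking both lengths together.
    while len1 != 0 and string1[len1 - 1] == string2[len2 - 1]:
        len1 -= 1
        len2 -= 1
    # Skip the common prefix by advancing the start position.
    start = 0
    while start != len1 and string1[start] == string2[start]:
        start += 1
    # Remaining lengths cover only the differing middle part.
    return len1 - start, len2 - start, start


print(trim_common_affixes("abcxdef", "abcyzdef"))  # (1, 2, 3)
```

Here "abc" and "def" are shared, so only "x" (length 1) and "yz" (length 2) remain, starting at index 3.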
Helpers for symspellpy
- class symspellpy.helpers.DictIO(dictionary, separator=' ')
An iterator wrapper for a Python dictionary that formats the output as required by load_dictionary_stream() and load_dictionary_bigram_stream().
- Parameters
dictionary (Dict[str, int]) – Dictionary with words as keys and frequency counts as values.
separator (str) – Separator characters between term(s) and count.
- iteritems
An iterator object of dictionary.items().
- separator
Separator characters between term(s) and count.
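To make the wrapper idea concrete, here is a minimal sketch of an iterator that turns each dictionary item into a "term count" line. The class name DictLinesSketch is hypothetical, and the exact line format is an assumption based on the attribute descriptions above.

```python
from typing import Dict, Iterator


class DictLinesSketch:
    """Hypothetical iterator yielding one 'term<separator>count' line per item."""

    def __init__(self, dictionary: Dict[str, int], separator: str = " ") -> None:
        self.iteritems = iter(dictionary.items())
        self.separator = separator

    def __iter__(self) -> Iterator[str]:
        return self

    def __next__(self) -> str:
        # StopIteration from the underlying iterator ends the iteration.
        term, count = next(self.iteritems)
        return self.separator.join((term, str(count)))


for line in DictLinesSketch({"the": 10, "of": 5}):
    print(line)
```

An object like this can stand in for an open text file wherever the consumer only iterates over lines.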
- symspellpy.helpers.case_transfer_matching(cased_text, uncased_text)
Transfers the casing from one text to another, assuming that they are ‘matching’ texts, i.e. they have the same length.
- Parameters
cased_text (str) – Text with varied casing.
uncased_text (str) – Text that is in lowercase only.
- Return type
str
- Returns
Text with the content of uncased_text and the casing of cased_text.
- Raises
ValueError – If the input texts have different lengths.
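A character-by-character transfer of this kind might look like the sketch below. The name transfer_case_sketch is hypothetical, and the implementation is inferred from the description above rather than taken from the library source.

```python
def transfer_case_sketch(cased_text: str, uncased_text: str) -> str:
    """Hypothetical sketch: copy casing position by position."""
    if len(cased_text) != len(uncased_text):
        raise ValueError("'cased_text' and 'uncased_text' must have the same length")
    # Uppercase a character wherever the cased text is uppercase.
    return "".join(
        target.upper() if source.isupper() else target.lower()
        for source, target in zip(cased_text, uncased_text)
    )


print(transfer_case_sketch("HeLLo", "world"))  # WoRLd
```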
- symspellpy.helpers.case_transfer_similar(cased_text, uncased_text)
Transfers the casing from one text to another, for similar (not matching) text.
Uses difflib.SequenceMatcher to identify the different types of changes needed to turn cased_text into uncased_text.
For inserted sections: transfer the casing from the prior character. If there is no character before, or the character before is a space, transfer the casing from the following character.
For deleted sections: no case transfer is required.
For equal sections: swap out the text with the original, cased one; otherwise the two are the same.
For replaced sections: transfer the casing using case_transfer_matching() if the two have the same length, otherwise transfer character-by-character and carry the last casing over to any additional characters.
- Parameters
cased_text (str) – Text with varied casing.
uncased_text (str) – Text in lowercase.
- Return type
str
- Returns
Text with the content of uncased_text but the casing of cased_text.
- Raises
ValueError – If cased_text is empty.
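The four section types above correspond to the opcode tags that difflib.SequenceMatcher reports. The sketch below only lists those sections for a pair of texts; it does not perform the case transfer. The helper name describe_changes is hypothetical, and lowercasing cased_text before matching is an assumption of this sketch.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def describe_changes(cased_text: str, uncased_text: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: list (tag, cased slice, uncased slice) per section."""
    # Assumption: compare case-insensitively so only real edits show up.
    matcher = SequenceMatcher(None, cased_text.lower(), uncased_text)
    return [
        (tag, cased_text[i1:i2], uncased_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


for tag, cased_part, uncased_part in describe_changes(
    "Haaw is the weeather", "how is the weather"
):
    print(f"{tag:8} {cased_part!r} -> {uncased_part!r}")
```

Each tag is one of "equal", "insert", "delete" or "replace", and the slices together cover both texts completely, which is what allows a per-section casing rule.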
- symspellpy.helpers.increment_count(count, count_previous)
Increments count up to sys.maxsize.
- Return type
int
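A capped increment like the one described can be sketched as a saturating addition. The name saturating_add is hypothetical, and the sketch assumes both arguments are non-negative counts.

```python
import sys


def saturating_add(count: int, count_previous: int) -> int:
    """Hypothetical sketch: add two counts, capping the result at sys.maxsize."""
    # Python integers do not overflow, so the cap is applied explicitly.
    return min(count + count_previous, sys.maxsize)


print(saturating_add(1, 2))  # 3
print(saturating_add(sys.maxsize, 5) == sys.maxsize)  # True
```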
- symspellpy.helpers.is_acronym(word, contain_digits=False)
Checks if the word is all caps (acronym) and/or contains numbers.
- Parameters
word (str) – The word to check.
contain_digits (bool) – A flag to determine whether any term with digits can be considered an acronym.
- Return type
bool
- Returns
True if the word is all caps and/or contains numbers, e.g., ABCDE, AB12C, abc12, ab12c. False if the word contains lower case letters, e.g., abcde, ABCde, abcDE, abCDe.
- symspellpy.helpers.parse_words(phrase, preserve_case=False, split_by_space=False)
Creates a non-unique wordlist from sample text. Language independent (e.g. works with Chinese characters).
- Parameters
phrase (str) – Sample text that could contain one or more words.
preserve_case (bool) – A flag to determine whether to preserve the cases or convert all to lowercase.
split_by_space (bool) – Splits the phrase into words simply based on space.
- Return type
List[str]
- Returns
A list of words.
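The two splitting modes can be sketched as follows. The name split_words_sketch and the regular expression are assumptions made for illustration; the pattern treats runs of Unicode word characters (excluding underscores) with optional inner apostrophes as words, which may differ from what the library actually uses.

```python
import re
from typing import List

# Assumed pattern: word characters without underscore, optional apostrophes,
# then optional further word characters (e.g. "it's").
WORD_PATTERN = re.compile(r"[^\W_]+['’]*[^\W_]*")


def split_words_sketch(
    phrase: str, preserve_case: bool = False, split_by_space: bool = False
) -> List[str]:
    """Hypothetical sketch of the two splitting modes described above."""
    text = phrase if preserve_case else phrase.lower()
    if split_by_space:
        # Plain whitespace split keeps punctuation attached to words.
        return text.split()
    return WORD_PATTERN.findall(text)


print(split_words_sketch("Hello, World! It's 2024"))
print(split_words_sketch("Hello, World!", split_by_space=True))
```

Duplicates are kept, which matches the "non-unique wordlist" wording above.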
- symspellpy.helpers.try_parse_int64(string)
Converts the string representation of a number to its 64-bit signed integer equivalent.
- Parameters
string (str) – String representation of a number.
- Return type
Optional[int]
- Returns
The 64-bit signed integer equivalent, or None if conversion failed or if the number is less than the min value or greater than the max value of a 64-bit signed integer.
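A range-checked parse of this kind might be sketched as below. The name parse_int64_sketch is hypothetical, and the bounds follow the documented 64-bit signed range; the library's own bounds check is not reproduced here.

```python
from typing import Optional

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def parse_int64_sketch(string: str) -> Optional[int]:
    """Hypothetical sketch: parse an int, rejecting values outside int64."""
    try:
        value = int(string)
    except ValueError:
        # Not a valid integer literal.
        return None
    return value if INT64_MIN <= value <= INT64_MAX else None


print(parse_int64_sketch("42"))   # 42
print(parse_int64_sketch("abc"))  # None
```

Returning None instead of raising lets a dictionary loader skip malformed count fields without a try/except at every call site.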
Misc
- symspellpy.helpers.to_similarity(distance, length)
Calculates a similarity measure from an edit distance.
- Parameters
distance (int) – The edit distance between two strings.
length (int) – The length of the longer of the two strings the edit distance is from.
- Return type
float
- Returns
A similarity value from 0 to 1.0 (1 - (distance / length)), -1 if distance is negative.
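The similarity formula can be checked with a short sketch. The name similarity_sketch is hypothetical, and the sketch assumes length is positive and at least as large as distance.

```python
def similarity_sketch(distance: int, length: int) -> float:
    """Hypothetical sketch of 1 - (distance / length), with -1 for negatives."""
    if distance < 0:
        return -1
    return 1.0 - distance / length


print(similarity_sketch(1, 4))   # 0.75
print(similarity_sketch(0, 5))   # 1.0
print(similarity_sketch(-1, 4))  # -1
```

One edit across a four-character string gives 0.75, and identical strings (distance 0) give 1.0.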