helpers
Helpers for editdistance
Helpers for symspellpy
- class symspellpy.helpers.DictIO(dictionary, separator=' ')
An iterator wrapper for a Python dictionary that formats the output as required by load_dictionary_stream() and load_dictionary_bigram_stream().
- Parameters:
dictionary (dict[str, int]) – Dictionary with words as keys and frequency counts as values.
separator (str) – Separator characters between term(s) and count.
- iteritems
An iterator object of dictionary.items().
- separator
Separator characters between term(s) and count.
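The formatting described above can be illustrated with a minimal stand-alone sketch. It does not use DictIO itself; `dict_to_lines` is a hypothetical name, and the one-line-per-entry layout of term, separator, count is an assumption based on the description above.

```python
from typing import Dict, Iterator


def dict_to_lines(dictionary: Dict[str, int], separator: str = " ") -> Iterator[str]:
    """Hypothetical helper: yield one 'term<separator>count' line per entry."""
    for term, count in dictionary.items():
        yield f"{term}{separator}{count}"


lines = list(dict_to_lines({"hello": 5, "world": 3}))
tabbed = list(dict_to_lines({"in the": 7}, separator="\t"))
print(lines)
```

A separator other than a space is useful when the terms themselves contain spaces, as bigrams do.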
- symspellpy.helpers.case_transfer_matching(cased_text, uncased_text)
Transfers the casing from one text to another, assuming that they are 'matching' texts, i.e. they have the same length.
- Parameters:
cased_text (str) – Text with varied casing.
uncased_text (str) – Text that is in lowercase only.
- Return type:
str
- Returns:
Text with the content of uncased_text and the casing of cased_text.
- Raises:
ValueError – If the input texts have different lengths.
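As a rough illustration of position-by-position case transfer, here is a minimal sketch written independently of the library. The name `transfer_case_matching` is hypothetical, and the rule "uppercase where the cased text is uppercase, lowercase otherwise" is an assumption about what transferring the casing means.

```python
def transfer_case_matching(cased_text: str, uncased_text: str) -> str:
    """Hypothetical sketch: copy casing character by character."""
    if len(cased_text) != len(uncased_text):
        raise ValueError("both texts must have the same length")
    return "".join(
        # Uppercase the target character wherever the source character is uppercase.
        u.upper() if c.isupper() else u.lower()
        for c, u in zip(cased_text, uncased_text)
    )


result = transfer_case_matching("Haaw", "have")
print(result)
```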
- symspellpy.helpers.case_transfer_similar(cased_text, uncased_text)
Transfers the casing from one text to another, for similar (not matching) texts.
Uses difflib.SequenceMatcher to identify the types of changes needed to turn cased_text into uncased_text.
For inserted sections: transfer the casing from the prior character. If there is no prior character, or the prior character is a space, transfer the casing from the following character.
For deleted sections: no case transfer is required.
For equal sections: swap out the text with the original, cased one, as the two are otherwise the same.
For replaced sections: transfer the casing using case_transfer_matching() if the two have the same length; otherwise transfer character by character and carry the last casing over to any additional characters.
- Parameters:
cased_text (str) – Text with varied casing.
uncased_text (str) – Text in lowercase.
- Return type:
str
- Returns:
Text with the content of uncased_text but the casing of cased_text.
- Raises:
ValueError – If cased_text is empty.
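The four section types above correspond to the opcode tags reported by difflib.SequenceMatcher. The sketch below only lists those operations and does not perform the case transfer. `describe_changes` is a hypothetical name, and lowercasing cased_text before comparing is an assumption of this sketch.

```python
from difflib import SequenceMatcher


def describe_changes(cased_text, uncased_text):
    """Hypothetical helper: list the edit operations between the two texts."""
    # Assumption: compare in lowercase so that casing alone is never a difference.
    matcher = SequenceMatcher(a=cased_text.lower(), b=uncased_text)
    return [
        (tag, cased_text[a1:a2], uncased_text[b1:b2])
        for tag, a1, a2, b1, b2 in matcher.get_opcodes()
    ]


changes = describe_changes("Haaw is the Weather", "how is the weather")
for tag, old, new in changes:
    # tag is one of "equal", "insert", "delete", "replace"
    print(tag, repr(old), repr(new))
```

Concatenating the right-hand slices of every operation always reproduces uncased_text, which is why the result can be built section by section.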
- symspellpy.helpers.increment_count(count, count_previous)
Increments count up to sys.maxsize.
- Return type:
int
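A capped increment of this kind might look like the sketch below. `saturating_add` is a hypothetical name, and the assumption is that the two arguments are summed and the result is clamped at sys.maxsize.

```python
import sys


def saturating_add(count: int, count_previous: int) -> int:
    """Hypothetical sketch: add two counts, never exceeding sys.maxsize."""
    # Assumption: both counts are non-negative frequency counts.
    return min(count + count_previous, sys.maxsize)


small = saturating_add(1, 2)
capped = saturating_add(sys.maxsize, 1)
print(small, capped == sys.maxsize)
```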
- symspellpy.helpers.is_acronym(word, contain_digits=False)
Checks if the word is all caps (acronym) and/or contains numbers.
- Parameters:
word (str) – The word to check.
contain_digits (bool) – A flag to determine whether any term with digits can be considered an acronym.
- Return type:
bool
- Returns:
True if the word is all caps and/or contains numbers, e.g., ABCDE, AB12C, abc12, ab12c. False if the word contains lowercase letters, e.g., abcde, ABCde, abcDE, abCDe.
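The examples above can be reproduced with a small regex-based sketch. `looks_like_acronym` is a hypothetical name and the patterns are assumptions. In particular, this sketch accepts single characters and digit-only words, which the description above does not settle.

```python
import re

# Assumption: "all caps" means only ASCII uppercase letters and digits.
_ALL_CAPS = re.compile(r"[A-Z0-9]+")
_HAS_DIGIT = re.compile(r"\d")


def looks_like_acronym(word: str, contain_digits: bool = False) -> bool:
    """Hypothetical sketch of the all-caps / contains-digits check."""
    if _ALL_CAPS.fullmatch(word):
        return True
    # With the flag set, any word containing a digit also qualifies.
    return contain_digits and _HAS_DIGIT.search(word) is not None


print(looks_like_acronym("AB12C"), looks_like_acronym("abc12", contain_digits=True))
```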
- symspellpy.helpers.parse_words(phrase, preserve_case=False, split_by_space=False)
Creates a non-unique word list from sample text. Language independent (e.g. works with Chinese characters).
- Parameters:
phrase (str) – Sample text that could contain one or more words.
preserve_case (bool) – A flag to determine whether to preserve the casing or convert everything to lowercase.
split_by_space (bool) – Splits the phrase into words simply based on space.
- Return type:
list[str]
- Returns:
A list of words.
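To make the two flags concrete, here is a minimal tokenizer sketch that follows the description above. `tokenize` is a hypothetical name, and the regular expression (runs of Unicode letters and digits, excluding underscores) is an assumption rather than the library's pattern.

```python
import re
from typing import List

# Assumption: a word is a run of Unicode word characters other than "_".
_WORD = re.compile(r"[^\W_]+")


def tokenize(phrase: str, preserve_case: bool = False,
             split_by_space: bool = False) -> List[str]:
    """Hypothetical sketch: build a non-unique word list from a phrase."""
    text = phrase if preserve_case else phrase.lower()
    if split_by_space:
        # Plain whitespace split: punctuation stays attached to the words.
        return text.split()
    return _WORD.findall(text)


words = tokenize("Hello, world! Hello again.")
print(words)
```

Because the pattern is Unicode-aware, non-Latin text is tokenized the same way, although text written without spaces between words comes back as a single token.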
- symspellpy.helpers.try_parse_int64(string)
Converts the string representation of a number to its 64-bit signed integer equivalent.
- Parameters:
string (str) – String representation of a number.
- Return type:
Optional[int]
- Returns:
The 64-bit signed integer equivalent, or None if conversion failed or if the number is less than the min value or greater than the max value of a 64-bit signed integer.
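The documented behavior can be sketched as follows. `parse_int64` is a hypothetical name, and the bounds are the standard signed 64-bit range, which is taken here as the meaning of "min value" and "max value".

```python
from typing import Optional

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def parse_int64(string: str) -> Optional[int]:
    """Hypothetical sketch: parse a string, or return None on failure."""
    try:
        value = int(string)
    except ValueError:
        return None
    # Python integers are unbounded, so the 64-bit range is checked explicitly.
    return value if INT64_MIN <= value <= INT64_MAX else None


print(parse_int64("42"), parse_int64("abc"), parse_int64(str(2 ** 63)))
```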
Misc
- symspellpy.helpers.to_similarity(distance, length)
Calculates a similarity measure from an edit distance.
- Parameters:
distance (int) – The edit distance between two strings.
length (int) – The length of the longer of the two strings the edit distance is from.
- Return type:
float
- Returns:
A similarity value from 0 to 1.0, computed as 1 - (distance / length), or -1 if distance is negative.
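A short worked sketch of the similarity measure follows. `similarity_from_distance` is a hypothetical name, and it assumes length is positive whenever distance is non-negative.

```python
def similarity_from_distance(distance: int, length: int) -> float:
    """Hypothetical sketch: map an edit distance to a 0..1 similarity."""
    if distance < 0:
        return -1.0
    # Identical strings (distance 0) score 1.0; distance == length scores 0.0.
    return 1.0 - distance / length


score = similarity_from_distance(1, 4)
print(score)
```

For example, an edit distance of 1 on a 4-character string gives 1 - 1/4 = 0.75.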