regex - Python: Comparing two Strings and Determining 'Uniqueness'
The title is a mess, so bear with me while I explain the question in more detail (really, it's a set of semi-related questions). I'm compiling a list of words from a large text file and storing them in a dictionary, with the words as keys and their respective occurrence counts (integers) as values. I want to apply several processes to consolidate the dictionary so that 'related' words are lumped together.
The first operation is plurals. I see no reason to have both 'cat' and 'cats' as keys in the dictionary. The same goes for car vs. cars, book vs. books, and so on. I want to write a function that (upon seeing a new word not in the dictionary) checks whether the new word is the plural form of a key in the dict (and vice versa).
if new_word ends in 's' -> check the dict for a key that matches new_word[:-1]
else if new_word does not end in 's' -> check the dict for new_word + 's'
Is there a better way to approach this problem? (I know I'll have to handle edge cases for plurals... this is just the general idea at this point.)
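To make the idea concrete, here is a rough sketch of the plural check described above. The function name add_word and the sample words are made up for illustration, and it only covers the plain "add an s" case. I've assumed counts should be folded onto the singular form when both forms show up.

```python
def add_word(counts, word):
    """Count `word`, folding it into an existing singular/plural key if one exists."""
    if word in counts:
        counts[word] += 1
    elif word.endswith("s") and word[:-1] in counts:
        # e.g. 'cats' arrives after 'cat': credit the singular key
        counts[word[:-1]] += 1
    elif not word.endswith("s") and word + "s" in counts:
        # e.g. 'book' arrives after 'books': move the tally onto the singular key
        counts[word] = counts.pop(word + "s") + 1
    else:
        counts[word] = 1
    return counts


counts = {}
for w in ["cat", "cats", "books", "book", "car"]:
    add_word(counts, w)
```

This naive version would also merge unrelated pairs such as 'bus' and 'bu', and it misses irregular plurals like 'boxes' or 'mice', which is part of why I'm asking.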
On the same topic, what if I want to determine whether words are similar by consulting a database of known suffixes and prefixes and seeing whether new_word is an already-seen word with a suffix or prefix attached?
I use NLTK to handle a lot of other tasks in my program, such as splitting sentences and individual words, but I'd prefer to write this 'similar-ness' algorithm myself. Thanks in advance, guys!
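Something along these lines is what I'm picturing for the affix lookup. The SUFFIXES and PREFIXES tuples are small placeholder lists standing in for the real database, and find_base is a hypothetical name. It only strips one affix and does not handle spelling changes (e.g. 'running' -> 'run').

```python
SUFFIXES = ("ing", "ed", "ly", "es", "s")  # placeholder sample, not a real database
PREFIXES = ("un", "re", "dis")             # placeholder sample, not a real database


def find_base(word, known):
    """Return a key in `known` that `word` extends with one affix, else None."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            base = word[:-len(suffix)]  # drop the suffix
            if base in known:
                return base
    for prefix in PREFIXES:
        if word.startswith(prefix):
            base = word[len(prefix):]   # drop the prefix
            if base in known:
                return base
    return None


known = {"walk": 3, "do": 1}
base = find_base("walking", known)  # 'walk' is already a key, so it is returned
```

If find_base returns a key, the new word's count would be added to that key; if it returns None, the word gets its own entry.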