How to remove accents from a string using Python 3
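Using unicodedata
Normalizing with NFD splits each accented letter into its base letter followed by a combining mark (Unicode category 'Mn'). Removing those marks leaves the unaccented text: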
>>> import unicodedata
>>> s = 'Découvrez tous les logiciels à télécharger'
>>> s
'Découvrez tous les logiciels à télécharger'
>>> s_no_accents = ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')
>>> s_no_accents
'Decouvrez tous les logiciels a telecharger'
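The one-liner above can be wrapped in a reusable function. This is a minimal sketch, and the name remove_accents is our own (it is not part of unicodedata). The result is recomposed with NFC so that characters NFD splits without producing any marks (Hangul syllables, for instance) come back in their usual composed form. Keep in mind that this technique only removes combining marks: letters with no canonical decomposition, such as 'ø', 'Æ' or 'ß', pass through unchanged.

```python
import unicodedata


def remove_accents(text: str) -> str:
    """Strip combining marks (Unicode category 'Mn') after NFD decomposition."""
    # NFD splits a precomposed letter such as 'é' (U+00E9)
    # into 'e' followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize('NFD', text)
    # Keep every character except the combining marks.
    stripped = ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')
    # Recompose what is left (e.g. Hangul syllables that NFD broke into jamo).
    return unicodedata.normalize('NFC', stripped)


print(remove_accents('Découvrez tous les logiciels à télécharger'))
# Decouvrez tous les logiciels a telecharger
print(remove_accents('Ærøskøbing, Straße'))
# Ærøskøbing, Straße  (these letters have no decomposition, so nothing is removed)
```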
Using a third-party package (unidecode)
Install it first with pip install unidecode, then:
>>> import unidecode
>>> unidecode.unidecode('Découvrez tous les logiciels à télécharger')
'Decouvrez tous les logiciels a telecharger'
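Since unidecode is an extra dependency, one option is to use it when it is installed and fall back to unicodedata otherwise. The sketch below assumes the package was installed with pip install unidecode; strip_accents is a hypothetical helper name. The two branches do not always agree: unidecode transliterates to ASCII (for example 'ß' becomes 'ss'), whereas the fallback only removes combining marks.

```python
import unicodedata

try:
    # Third-party package: pip install unidecode
    from unidecode import unidecode
except ImportError:
    unidecode = None  # not installed: use the standard-library fallback below


def strip_accents(text: str) -> str:
    """Hypothetical helper: prefer unidecode, else drop combining marks."""
    if unidecode is not None:
        # Transliterates to ASCII, including letters with no decomposition.
        return unidecode(text)
    # Fallback: NFD decomposition, then remove category 'Mn' characters.
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')


print(strip_accents('Découvrez tous les logiciels à télécharger'))
```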
Example: create keywords
Create a list of keywords (each word with and without its accents) from a string, for example to build a simple search function:
>>> import unicodedata
>>> title = 'Découvrez tous les logiciels à télécharger by Jérôme Martin'
>>> words_list = title.split()
>>> keywords = []
>>> for word in words_list:
...     if word not in keywords:
...         keywords.append(word)
...     word_no_accents = ''.join(c for c in unicodedata.normalize('NFD', word) if unicodedata.category(c) != 'Mn')
...     if word_no_accents not in keywords:
...         keywords.append(word_no_accents)
...
>>> keywords
['Découvrez', 'Decouvrez', 'tous', 'les', 'logiciels', 'à', 'a', 'télécharger', 'telecharger', 'by', 'Jérôme', 'Jerome', 'Martin']
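The same keyword list can be produced by a small function, and a search can then compare a query against the keywords of each title. In this sketch build_keywords and search are hypothetical names; a dict serves as an ordered set (dict keys keep insertion order since Python 3.7), and casefold() makes the comparison case-insensitive. A query matches when it equals one of the keywords, so both 'Jérôme' and 'jerome' find the title.

```python
import unicodedata


def remove_accents(text):
    """Remove combining marks (category 'Mn') after NFD decomposition."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')


def build_keywords(title):
    """Return the words of title and their accent-free variants, in order, without duplicates."""
    keywords = {}  # dict used as an ordered set
    for word in title.split():
        keywords[word] = None  # re-assigning an existing key keeps its position
        keywords[remove_accents(word)] = None
    return list(keywords)


def search(query, titles):
    """Return the titles whose keywords contain query, ignoring letter case."""
    needle = query.casefold()
    return [
        title for title in titles
        if needle in {keyword.casefold() for keyword in build_keywords(title)}
    ]


title = 'Découvrez tous les logiciels à télécharger by Jérôme Martin'
print(build_keywords(title))
print(search('jerome', [title, 'Another title']))
```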
References
Links | Site |
---|---|
How to remove accents from values in columns? | stackoverflow |
How to remove accent in Python 3.5 and get a string with unicodedata or other solutions? | stackoverflow |
Unidecode | pypi.org |
How to replace accented characters in python? | stackoverflow |
Python: solving unicode hell with unidecode | stackoverflow |