How to remove accents from a string using Python 3
Using unicodedata
>>> import unicodedata
>>> s = 'Découvrez tous les logiciels à télécharger'
>>> s
'Découvrez tous les logiciels à télécharger'
>>> s_no_accents = ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')
>>> s_no_accents
'Decouvrez tous les logiciels a telecharger'
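The one-liner above works in two steps: the NFD normalization splits each accented character into a base letter plus one or more combining marks, and the filter then drops every character whose Unicode category is 'Mn' (nonspacing mark). A minimal sketch of the same idea wrapped in a reusable function follows; the name strip_accents is an assumption made for illustration, not something defined by the standard library.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFD decomposes 'é' into 'e' followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize('NFD', text)
    # Keep every character except nonspacing marks (category 'Mn').
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')


print(strip_accents('Découvrez tous les logiciels à télécharger'))
# Letters with no canonical decomposition, such as 'Ł', 'ß' or 'œ', are left as-is.
print(strip_accents('Łódź'))
```

Because the approach relies on canonical decomposition only, it removes diacritics but does not transliterate letters that are characters in their own right.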
Using a third-party package (Unidecode)
>>> import unidecode
>>> unidecode.unidecode('Découvrez tous les logiciels à télécharger')
'Decouvrez tous les logiciels a telecharger'
Example: creating keywords
Create a list of keywords (words with and without accents) from a string to develop a simple search function:
>>> import unicodedata
>>> title = 'Découvrez tous les logiciels à télécharger by Jérôme Martin'
>>> words_list = title.split()
>>> keywords = []
>>> for word in words_list:
...     if word not in keywords:
...         keywords.append(word)
...     word_no_accents = ''.join(c for c in unicodedata.normalize('NFD', word) if unicodedata.category(c) != 'Mn')
...     if word_no_accents not in keywords:
...         keywords.append(word_no_accents)
...
>>> keywords
['Découvrez', 'Decouvrez', 'tous', 'les', 'logiciels', 'à', 'a', 'télécharger', 'telecharger', 'by', 'Jérôme', 'Jerome', 'Martin']
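The loop above could be packaged as a function and paired with a small lookup, which is roughly what a simple search needs. The sketch below is only one possible arrangement: the names strip_accents, build_keywords and matches are hypothetical, and the case-insensitive comparison in matches is an added assumption that the original loop does not make.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks using NFD decomposition."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')


def build_keywords(title):
    """Return each word of title, followed by its unaccented form when it differs."""
    keywords = []
    for word in title.split():
        for candidate in (word, strip_accents(word)):
            if candidate not in keywords:
                keywords.append(candidate)
    return keywords


def matches(query, title):
    """Hypothetical search helper: True if every query word is a keyword of title.

    Lowercasing both sides is an assumption added here for convenience.
    """
    keywords = {k.lower() for k in build_keywords(title)}
    return all(word.lower() in keywords for word in query.split())


title = 'Découvrez tous les logiciels à télécharger by Jérôme Martin'
print(build_keywords(title))
print(matches('jerome telecharger', title))  # True
print(matches('jérôme python', title))       # False
```

Storing both spellings lets a query match whether or not the user types the accents, at the cost of a slightly larger keyword list.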
References
| Links | Site |
|---|---|
| How to remove accents from values in columns? | stackoverflow |
| How to remove accent in Python 3.5 and get a string with unicodedata or other solutions? | stackoverflow |
| Unidecode | pypi.org |
| How to replace accented characters in python? | stackoverflow |
| Python: solving unicode hell with unidecode | stackoverflow |
