How to remove accents from a string in Python 2

Published: January 19, 2019


Removing accents using unicodedata:

>>> import unicodedata
>>> s = 'Découvrez tous les logiciels à télécharger'
>>> s
'D\xc3\xa9couvrez tous les logiciels \xc3\xa0 t\xc3\xa9l\xc3\xa9charger'
>>> s1 = unicode(s,'utf-8')
>>> s2 = unicodedata.normalize('NFD', s1).encode('ascii', 'ignore')
>>> s2
'Decouvrez tous les logiciels a telecharger'
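
The following sketch illustrates why the two steps above work: NFD normalization splits an accented letter into a base letter plus a combining mark, and encoding to ASCII with 'ignore' then discards the mark. The variable names are illustrative only, and the u'' literals are used so the snippet should also run unchanged on Python 3.3+ (where the result is a bytes object).

```python
# -*- coding: utf-8 -*-
import unicodedata

# u'\xe9' is the single code point for "e with acute accent".
composed = u'\xe9'
decomposed = unicodedata.normalize('NFD', composed)

# After NFD, the letter and its accent are two separate code points.
names = [unicodedata.name(ch) for ch in decomposed]
# names == ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# Encoding to ASCII with 'ignore' silently drops the combining accent.
stripped = decomposed.encode('ascii', 'ignore')
```

One consequence of this approach: characters that have no canonical decomposition, such as 'ø' or 'ß', are not transliterated but removed entirely by the 'ignore' handler.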

Note: you must first determine the encoding of the input string. For example, if the string is encoded in ISO-8859-1, replace

>>> s1 = unicode(s,'utf-8')

with

>>> s1 = unicode(s,'iso-8859-1')
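
As a minimal sketch, the steps can be wrapped in a small helper that takes the encoding as a parameter. The name strip_accents and its default encoding are assumptions made for illustration, not part of unicodedata. It uses raw.decode(encoding), which for a byte string is equivalent to unicode(raw, encoding) in Python 2 and also works on Python 3 bytes.

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_accents(raw, encoding='utf-8'):
    """Hypothetical helper: decode a byte string and drop its accents.

    `raw` must be a byte string (str in Python 2, bytes in Python 3)
    and `encoding` must match how those bytes were actually encoded.
    """
    text = raw.decode(encoding)                      # bytes -> unicode
    decomposed = unicodedata.normalize('NFD', text)  # split off accents
    return decomposed.encode('ascii', 'ignore')      # drop non-ASCII

# The same word, encoded two different ways:
from_utf8 = strip_accents(b'D\xc3\xa9couvrez')
from_latin1 = strip_accents(b'D\xe9couvrez', 'iso-8859-1')
```

Both calls should give the same ASCII result. Passing the wrong encoding either raises a UnicodeDecodeError or produces incorrect output without any error, which is why the encoding has to be known in advance.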
