Removing accents using unicodedata (Python 2, where a plain string literal is a byte string):
>>> import unicodedata
>>> s = 'Découvrez tous les logiciels à télécharger'
>>> s
'D\xc3\xa9couvrez tous les logiciels \xc3\xa0 t\xc3\xa9l\xc3\xa9charger'
>>> s1 = unicode(s,'utf-8')
>>> s2 = unicodedata.normalize('NFD', s1).encode('ascii', 'ignore')
>>> s2
'Decouvrez tous les logiciels a telecharger'
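To see why this works, it can help to look at what the NFD normalization produces for a single character. The following is a minimal sketch, assuming Python 3 (where `str` is already Unicode, so no `unicode()` call is needed); the variable names are illustrative only.

```python
import unicodedata

# NFD splits an accented character into its base letter
# followed by one or more combining marks.
decomposed = unicodedata.normalize("NFD", "\u00e9")  # "\u00e9" is 'é'
names = [unicodedata.name(ch) for ch in decomposed]
print(names)

# The combining mark is not an ASCII character, so encoding
# with errors="ignore" silently drops it and keeps the base letter.
ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
print(ascii_only)
```

The base letter survives because it is plain ASCII; only the combining accent is discarded.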
Note: you first need to know the encoding of the byte string. For example, if the string is encoded in iso-8859-1, replace
>>> s1 = unicode(s,'utf-8')
with
>>> s1 = unicode(s,'iso-8859-1')
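For Python 3, the same approach might be wrapped in a small helper like the one below. This is a sketch under stated assumptions, not part of the original session: the function name `strip_accents` and its `encoding` parameter are hypothetical, and the caller is still expected to know the encoding of any byte string passed in.

```python
import unicodedata

def strip_accents(text, encoding="utf-8"):
    """Return text with accents removed (hypothetical helper)."""
    # Byte strings must be decoded first; the caller supplies the encoding.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    # Decompose accented characters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop everything that is not ASCII, including the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")

# A Unicode string needs no decoding step.
print(strip_accents("Découvrez tous les logiciels à télécharger"))

# A byte string encoded in iso-8859-1 must be decoded with that encoding.
print(strip_accents(b"D\xe9couvrez", encoding="iso-8859-1"))
```

Be aware that `encode('ascii', 'ignore')` removes every non-ASCII character, so letters that have no decomposition (for example `ø`) are dropped entirely rather than replaced by a base letter.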