Removing accents using unicodedata:
>>> import unicodedata
>>> s = 'Découvrez tous les logiciels à télécharger'
>>> s
'D\xc3\xa9couvrez tous les logiciels \xc3\xa0 t\xc3\xa9l\xc3\xa9charger'
>>> s1 = unicode(s,'utf-8')
>>> s2 = unicodedata.normalize('NFD', s1).encode('ascii', 'ignore')
>>> s2
'Decouvrez tous les logiciels a telecharger'
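The session above relies on `unicode()`, which exists only in Python 2. As a minimal sketch of the same NFD-then-ASCII approach in Python 3, where `str` is already Unicode, something like the following should work. The helper name `strip_accents` is hypothetical and not part of `unicodedata`.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: drop accents from a Python 3 str.

    NFD splits each accented letter into a base letter plus combining
    marks; encoding to ASCII with errors='ignore' then discards the marks.
    """
    decomposed = unicodedata.normalize('NFD', text)
    # encode() returns bytes in Python 3, so decode back to str.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


s = 'Découvrez tous les logiciels à télécharger'
print(strip_accents(s))  # Decouvrez tous les logiciels a telecharger
```

Note that `encode('ascii', 'ignore')` also silently drops any non-ASCII character that has no ASCII base letter after decomposition (for example `ß` or `ø`), so this sketch is only suitable when losing such characters is acceptable.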
Note: you first need to know which encoding the byte string uses. For example, if the string is encoded in ISO-8859-1, replace
>>> s1 = unicode(s,'utf-8')
with
>>> s1 = unicode(s,'iso-8859-1')
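To illustrate why the encoding matters, here is a small Python 3 sketch that decodes raw bytes with an explicit encoding before removing the accents. The function name `strip_accents_from_bytes` and its `encoding` parameter are assumptions made for this example, not an existing API.

```python
import unicodedata


def strip_accents_from_bytes(raw, encoding='utf-8'):
    """Hypothetical helper: decode bytes, then drop the accents.

    The caller must pass the encoding the bytes were actually written in.
    """
    text = raw.decode(encoding)  # bytes -> str
    decomposed = unicodedata.normalize('NFD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# The same text stored as ISO-8859-1 bytes.
latin1_bytes = 'à télécharger'.encode('iso-8859-1')

# Decoding with the matching encoding works.
result = strip_accents_from_bytes(latin1_bytes, encoding='iso-8859-1')

# Decoding these bytes as UTF-8 fails, because they are not valid UTF-8.
try:
    strip_accents_from_bytes(latin1_bytes, encoding='utf-8')
    wrong_encoding_failed = False
except UnicodeDecodeError:
    wrong_encoding_failed = True
```

In this case the wrong encoding raises an error, but that is not guaranteed in general: some byte sequences decode without error under the wrong encoding and simply produce the wrong characters.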
