Came across this in comp.lang.python. It is fairly useful for localization work, as you often have to guess what encoding something is in:

Now, how should you guess the encoding? Here is a strategy:

1. Use the encoding that was sent through the HTTP header. Be absolutely certain not to ignore this encoding.
2. Use the encoding in the XML declaration (if any).
3. Use the encoding in the http-equiv meta element (if any).
4. Use UTF-8.
5. Use Latin-1, and check that there are no characters in range(128, 160).
6. Use cp1252.
7. Use Latin-1.

In the order from 1 to 6, check whether you manage to decode the input. Notice that in step 5 decoding will definitely succeed; consider it a failure if you get any control characters (from range(128, 160)). If step 6 fails as well, try Latin-1 again in step 7, this time accepting the result.
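The strategy above can be sketched in Python roughly as follows. This is only an illustration, not code from the original post: the function name `guess_decode`, the `http_charset` parameter, and the two regular expressions are assumptions made for the example, and the regexes are a loose approximation of how an XML declaration or meta element declares its encoding.

```python
import re

# Hypothetical, simplified patterns; real-world markup can be messier.
_XML_DECL = re.compile(rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
_META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def _has_c1_controls(text):
    """True if text contains characters in range(128, 160)."""
    return any(0x80 <= ord(ch) < 0xA0 for ch in text)


def guess_decode(data, http_charset=None):
    """Return (text, encoding) for the first candidate that decodes data.

    http_charset is whatever charset the HTTP header declared, if any.
    """
    # Each candidate is (encoding label, reject result with C1 controls?).
    candidates = []
    if http_charset:                                   # step 1
        candidates.append((http_charset, False))
    match = _XML_DECL.search(data)                     # step 2
    if match:
        candidates.append((match.group(1).decode("ascii"), False))
    match = _META.search(data)                         # step 3
    if match:
        candidates.append((match.group(1).decode("ascii"), False))
    candidates += [
        ("utf-8", False),     # step 4
        ("latin-1", True),    # step 5: always decodes, so check the result
        ("cp1252", False),    # step 6
        ("latin-1", False),   # step 7: last resort, accepted as-is
    ]

    for encoding, reject_c1 in candidates:
        try:
            text = data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or a declared encoding Python does not know.
            continue
        if reject_c1 and _has_c1_controls(text):
            continue
        return text, encoding

    # Not reached in practice: the final Latin-1 attempt accepts any bytes.
    raise ValueError("could not decode input")
```

For example, `guess_decode(b"caf\xe9")` fails at the UTF-8 step and is accepted as Latin-1 in step 5, while a byte string containing cp1252 "smart quotes" such as `b"\x93hi\x94"` is rejected in step 5 because of the control characters and is decoded by cp1252 in step 6.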