I created a file in Unix System Services and FTPed it down to my Linux box. I could edit and process it with no problems until I tried to read the file using Python.
Python gave me:
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 3996: invalid start byte
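The message gives the offset of the bad byte (position 3996) but not the text around it. A minimal sketch for looking at that spot, assuming the file has been read in binary mode. The function name show_bad_byte and the sample bytes are hypothetical. It relies on the start, end and reason attributes that UnicodeDecodeError provides.

```python
def show_bad_byte(data: bytes, encoding: str = "utf-8", context: int = 8) -> str:
    """Try to decode data; on failure, describe the offending byte and its neighbours."""
    try:
        data.decode(encoding)
        return "decodes cleanly as " + encoding
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the codec rejected
        start = max(exc.start - context, 0)
        window = data[start:exc.end + context]
        return (f"byte 0x{data[exc.start]:02x} at offset {exc.start} "
                f"({exc.reason}); nearby bytes: {window!r}")


if __name__ == "__main__":
    # Hypothetical sample containing the same 0xb8 byte
    sample = b"Price: 10 \xb8 each"
    print(show_bad_byte(sample))
```

For the real file you would pass the raw contents, for example by opening it with open(path, "rb") and calling show_bad_byte on f.read().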
Running the Linux command "file pagentn.txt" gave me
pagentn.txt: ISO-8859 text
whereas other files were reported as ASCII text.
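The file command guesses the encoding by inspecting the bytes. A rough Python approximation (not how file actually works internally) is to check whether every byte is below 0x80, which means plain ASCII, and then whether the whole file decodes as UTF-8. Anything else is probably a single-byte code page such as ISO-8859-1, though this check cannot tell which one. The function name classify_bytes is hypothetical.

```python
import sys


def classify_bytes(data: bytes) -> str:
    """Very rough guess at the kind of text in data (an approximation of `file`)."""
    if data.isascii():
        # Every byte is 0x00-0x7F
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: likely an 8-bit code page such as ISO-8859-1
        return "non-UTF-8 8-bit text"


if __name__ == "__main__":
    # Usage: python classify.py file1 file2 ...
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            print(name, classify_bytes(f.read()))
```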
I changed my Python program to use
with open("/home/colinpaice/python/pagentn.txt", encoding="ISO-8859-1") as file:
and it worked!
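Why ISO-8859-1 works where UTF-8 fails: in UTF-8, 0xb8 is a continuation byte, so it cannot start a character, hence "invalid start byte". ISO-8859-1 maps every byte value 0x00 to 0xFF to a character, so decoding never raises, and 0xb8 becomes U+00B8 CEDILLA. A small sketch showing both:

```python
import unicodedata

raw = b"\xb8"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

text = raw.decode("iso-8859-1")
print(repr(text), unicodedata.name(text))

# Every byte value 0x00-0xFF decodes under ISO-8859-1, so this never raises
all_bytes = bytes(range(256))
assert len(all_bytes.decode("iso-8859-1")) == 256
```

The flip side is that ISO-8859-1 will "work" on almost any file, so the characters may still be wrong if the data really came from a different code page.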
I searched the web and found a Python package, chardet, that guesses the code page of a file:
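import chardet
infile = "/home/colinpaice/python/pagentn.txt"
# Read the raw bytes; chardet works on bytes, not decoded text
with open(infile, 'rb') as f:
    rawdata = f.read()
result = chardet.detect(rawdata)
charenc = result['encoding']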
It returned a dict:
result {'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
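chardet is a separate package (installed with pip install chardet), and its answer is a guess, as the confidence of 0.73 shows. A simpler standard-library approach is to try UTF-8 first, because it is strict and rarely accepts non-UTF-8 data by accident, then fall back to ISO-8859-1, which accepts any byte. This is a sketch under that assumption; the function names decode_first and read_text and the choice of fallback are hypothetical.

```python
import sys
from pathlib import Path

DEFAULT_ENCODINGS = ("utf-8", "iso-8859-1")


def decode_first(raw: bytes, encodings=DEFAULT_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes raw.

    ISO-8859-1 is last because it maps all 256 byte values and cannot fail,
    so it acts as a catch-all.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode the data")


def read_text(path, encodings=DEFAULT_ENCODINGS):
    """Read a file as bytes and decode it with decode_first."""
    return decode_first(Path(path).read_bytes(), encodings)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        text, enc = read_text(name)
        print(f"{name}: {len(text)} characters, decoded as {enc}")
```

Unlike chardet, this never reports a confidence; it only tells you which encoding in the list decoded without error first.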
Hi Colin, did you try utf-8 in the first place?
No, I just assumed it would work!