Python encoding problems:
Everyone who uses Python eventually runs into the following error:
Traceback (most recent call last):
File "amazon_test.py", line 30, in
print(s)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
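The error can be reproduced without any particular terminal setup. The following is a minimal sketch, assuming Python 3; the sample string and the helper name try_encode are made up for illustration. It shows that the failure comes from encoding non-ASCII text with the ascii codec, which is what print ends up doing when the output stream's encoding is ASCII.

```python
# Minimal reproduction of the error, assuming Python 3.
# The sample string and the helper name are hypothetical.
s = "\u4e2d\u6587"  # two Chinese characters, written as escapes


def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


ascii_result = try_encode(s, "ascii")  # fails: these characters have no ASCII form
utf8_result = try_encode(s, "utf-8")   # succeeds

print(ascii_result)
print(utf8_result)
```

The same string encodes fine as UTF-8, which is why the rest of the fix is about making every layer agree on UTF-8.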
Solution
First, you need a consistent environment:
- Make sure the locale is set correctly:
LANG=zh_CN.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
Specific settings:
# Add to ~/.bashrc
export LANG=zh_CN.UTF-8
export LANGUAGE=zh_CN:zh:en_US:en
export LC_ALL=en_US.UTF-8
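One way to confirm that Python actually picked up these settings (after opening a new shell) is to ask it which encodings it chose. This is only a sketch, assuming Python 3; the function name describe_encodings is hypothetical.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python picked up from the environment."""
    return {
        "preferred": locale.getpreferredencoding(False),
        # Streams can be missing or replaced in some environments.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdin": getattr(sys.stdin, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in describe_encodings().items():
    print(name, value)
```

With a UTF-8 locale these values normally report UTF-8. A value such as ascii or ANSI_X3.4-1968 usually suggests the locale was not applied to the process.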
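- The first line of a .py file is usually
#!/usr/bin/env python
and the second line is # -*- coding: utf-8 -*-
or # coding=utf-8
Make sure the file itself is saved as UTF-8 (some people set their vim environment to gbk or chinese, so a file may end up saved as GBK; watch out for this).
P.S. Recommended vimrc settings:
set encoding=utf-8 " newly created files use utf-8
set termencoding=utf-8 " terminal encoding: characters are encoded as utf-8 for display and rendering on the terminal screen
set fileencodings=utf-8,gb18030,gbk,cp936,gb2312 " lets vim open files in several encodings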
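To check whether a source file really was saved as UTF-8, one option is to read its raw bytes and try to decode them. This is a rough sketch, assuming Python 3; is_utf8_file is a hypothetical helper, and the demo writes two temporary files rather than touching real sources.

```python
import os
import tempfile


def is_utf8_file(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Demo: the same text saved once as UTF-8 and once as GBK.
text = "\u4e2d\u6587"
results = {}
for enc in ("utf-8", "gbk"):
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "wb") as f:
        f.write(text.encode(enc))
    results[enc] = is_utf8_file(path)
    os.remove(path)

print(results)
```

Note that a pure-ASCII file passes this check under either editor setting, so the test only says something about files that contain non-ASCII characters.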
Python 2
- Decode input streams
  - Reading a file
with open(file_path, 'r') as f:
    for line in f:
        line = line.decode('your_file_encoding', errors='ignore').strip()
  - Standard input
import sys
for line in sys.stdin:
    line = line.decode('your_file_encoding', errors='ignore').strip()
- Write output in a specific encoding
print >> sys.stdout, line.encode('gb18030', 'ignore')
# or, preferably, use the approach below
sys.stdout.write(line.encode('gb18030', 'ignore') + '\n')
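The decode and encode calls above all pass 'ignore', which silently drops anything that cannot be converted. The sketch below illustrates that behavior with made-up sample data. It is written for Python 3, where the handler names are the same as in Python 2.

```python
# What the 'ignore' error handler does, shown on made-up sample data.
raw = b"abc\xffdef"  # the byte 0xff can never appear in valid UTF-8

strict_failed = False
try:
    raw.decode("utf-8")  # the default handler is 'strict'
except UnicodeDecodeError:
    strict_failed = True

ignored = raw.decode("utf-8", errors="ignore")    # invalid byte is dropped
replaced = raw.decode("utf-8", errors="replace")  # invalid byte becomes U+FFFD

# Encoding side: a character with no ASCII form is dropped the same way.
encoded = "caf\u00e9".encode("ascii", "ignore")

print(strict_failed, ignored, ascii(replaced), encoded)
```

Because 'ignore' loses data without any warning, 'replace' or the default 'strict' may be a better fit when it matters whether the input was clean.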
Python 3
- Decode input streams
  - Reading a file
with open(file_path, mode='r', encoding='gb18030', errors='ignore') as f:
    for line in f:
        # line is a unicode string
        pass
  - Standard input
import io
import sys
sys.stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
for line in sys.stdin:
    pass
# or, on Python 3.7+, reconfigure the existing stream instead
import sys
sys.stdin.reconfigure(encoding='utf-8')
for line in sys.stdin:
    pass
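The TextIOWrapper approach can be tried without a real terminal by wrapping an in-memory byte stream. In this sketch, assuming Python 3, io.BytesIO stands in for sys.stdin.buffer, and decode_lines is a hypothetical helper name.

```python
import io


def decode_lines(byte_stream, encoding):
    """Wrap a binary stream and yield decoded lines without the newline."""
    text_stream = io.TextIOWrapper(byte_stream, encoding=encoding)
    for line in text_stream:
        yield line.rstrip("\n")


# io.BytesIO stands in for sys.stdin.buffer here (an assumption for the demo).
fake_stdin = io.BytesIO("\u4e2d\u6587\nabc\n".encode("gb18030"))
lines = list(decode_lines(fake_stdin, "gb18030"))

# ascii() keeps the demo output printable even on an ASCII-only terminal.
print([ascii(line) for line in lines])
```

The wrapper decodes with whatever encoding it is given, so the input's actual encoding has to be known up front; nothing here detects it.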
- Encode output
  - Writing a file
with open(file_output, encoding='your_dest_encoding', mode='w') as f:
    f.write(line)
  - Output stream
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
sys.stdout.write(line + '\n')
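To see which bytes the output wrapper actually produces, the same pattern can be pointed at an in-memory buffer. This is a sketch under stated assumptions: Python 3, io.BytesIO standing in for sys.stdout.buffer or a file opened in binary mode, and a hypothetical helper named encode_lines.

```python
import io


def encode_lines(lines, encoding):
    """Write lines through a TextIOWrapper and return the resulting bytes."""
    buffer = io.BytesIO()  # stands in for sys.stdout.buffer or a binary file
    # newline="\n" disables platform-specific newline translation.
    writer = io.TextIOWrapper(buffer, encoding=encoding, newline="\n")
    for line in lines:
        writer.write(line + "\n")
    writer.flush()
    data = buffer.getvalue()
    writer.detach()  # leave the buffer open once the wrapper is discarded
    return data


utf8_bytes = encode_lines(["\u4e2d\u6587"], "utf-8")
gb_bytes = encode_lines(["\u4e2d\u6587"], "gb18030")
print(utf8_bytes, gb_bytes)
```

The same text yields different byte sequences depending on the encoding argument, so whatever reads the output needs to be told which one was used.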