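When using Google Translate day to day, you often need to translate a large amount of text in bulk, and for that you need to call the Google Translate API.
A first option is the Python library googletrans: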
pip install googletrans
# Usage
from googletrans import Translator
translator = Translator(service_urls=['translate.google.cn'])
source = '我還是不開心!'
text = translator.translate(source,src='zh-cn',dest='en').text
print(text)
"i'm still not happy!"
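This becomes very slow when a large number of sentences need translating, so coroutines can be used instead.
Here we use the grequests library, which is built on gevent.
Reading through the core code of googletrans shows that it mainly builds a URL, sends a GET request, receives a JSON result, and extracts the translation from it.
Building the URL requires a token that is generated according to certain rules, so for convenience we still call some googletrans functions.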
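As a rough illustration of the steps just described, the sketch below assembles the query string with urllib.parse.urlencode and pulls the translated text out of an already-parsed response. The helper names build_url and extract_translation are hypothetical, the parameter list is a trimmed-down assumption, and the response shape (a list whose first element holds [translated, original, ...] segments) is inferred from how googletrans results are used in this post, not from any official documentation. The token value is a placeholder; a real one has to come from googletrans.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Endpoint used in this post; everything else here is illustrative.
BASE_URL = "https://translate.google.cn/translate_a/single"

def build_url(token, text, src="de", dest="en"):
    """Build the GET URL; urlencode percent-escapes the text safely."""
    params = [
        ("client", "t"), ("sl", src), ("tl", dest), ("hl", dest),
        ("dt", "t"), ("ie", "UTF-8"), ("oe", "UTF-8"),
        ("tk", token), ("q", text),
    ]
    return BASE_URL + "?" + urlencode(params)

def extract_translation(data):
    """Join translated segments; assumes data[0] is a list of
    [translated, original, ...] entries (an assumption, see above)."""
    return "".join(seg[0] for seg in data[0] if seg[0])

# Placeholder token, for demonstration only.
url = build_url("123.456", "Guten Morgen & gute Nacht")
# The '&' inside the sentence survives the round trip because it was escaped.
print(parse_qs(urlparse(url).query)["q"])

# Hand-written sample in the assumed response shape.
sample = [[["Good morning ", "Guten Morgen ", None],
           ["and good night", "und gute Nacht", None]]]
print(extract_translation(sample))
```

Escaping the sentence matters because characters such as & or # in the raw text would otherwise be read as part of the URL syntax and cut the query short.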
The reference code is below.
It translates German (de) into English (en); change the language codes in the code as needed.
import grequests  # import first so gevent can patch the standard library
import logging
from urllib.parse import quote
from googletrans import Translator
from googletrans.utils import format_json

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
translator = Translator(service_urls=['translate.google.cn'])
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', filename='log.txt')
logger = logging.getLogger()

# {0} is the token, {1} is the URL-encoded sentence.
URL_TEMPLATE = "https://translate.google.cn/translate_a/single?client=t&sl=de&tl=en&hl=en&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&ie=UTF-8&oe=UTF-8&otf=1&ssel=3&tsel=0&kc=1&tk={0}&q={1}"

def exception_handler(request, exception):
    logger.warning('exception when at %s :%s', request.url, exception)

def work(urls):
    # Send the whole batch concurrently, at most 20 requests in flight.
    reqs = (grequests.get(u, headers=headers, verify=True, allow_redirects=True, timeout=4) for u in urls)
    res = grequests.map(reqs, exception_handler=exception_handler, size=20)
    return res

def write_batch(urls, out):
    # Write exactly one line per request so the output stays aligned with de.txt.
    for r in work(urls):
        source = ''
        target = ''
        # A failed request comes back as None from grequests.map().
        if r is not None and r.status_code == 200:
            try:
                a = format_json(r.text)
                target = ''.join([d[0] if d[0] else '' for d in a[0]])
                source = ''.join([d[1] if d[1] else '' for d in a[0]])
            except Exception as e:
                logger.error('when format:%s', e)
                logger.error('%s', r.text)
                source = ''
                target = ''
        # A blank line marks a sentence that still has to be translated.
        if len(source) != 0 and len(target) != 0:
            out.write(target + '\n')
        else:
            out.write('\n')

def totaltranslate():
    with open('de.txt', mode='r', encoding='utf-8') as f, \
            open('de2en_en.txt', mode='a', encoding='utf-8') as file2:
        urls = []
        num = 0
        for line in f:
            num += 1
            line = line.strip()
            token = translator.token_acquirer.do(line)
            # Percent-encode the sentence so '&', '#', spaces etc. do not break the URL.
            urls.append(URL_TEMPLATE.format(token, quote(line, safe='')))
            if len(urls) >= 50:
                write_batch(urls, file2)
                urls = []
                logger.info('finish 50 sentence, now at %s', num)
        # Do not drop the final batch of fewer than 50 sentences.
        if urls:
            write_batch(urls, file2)
            logger.info('finish last batch, now at %s', num)

def sentencetranslate(line):
    # Translate a single sentence through the regular googletrans API.
    line = line.strip()
    text = translator.translate(line, src='de', dest='en').text
    return text

def completetranslate():
    i = 1
    with open('de.txt', mode='r', encoding='utf-8') as f, \
            open('de2en_en.txt', mode='r', encoding='utf-8') as file1, \
            open('new_de2en_en.txt', mode='a', encoding='utf-8') as file2:
        for line in f:
            t = file1.readline()
            # A blank line (or a missing line at end of file) means the
            # first pass produced no translation for this sentence.
            if t.strip() == '':
                text = sentencetranslate(line)
                file2.write(text + '\n')
            else:
                file2.write(t.rstrip('\n') + '\n')
            i += 1
            if i % 100 == 0:
                print(i)

if __name__ == "__main__":
    totaltranslate()
    completetranslate()
totaltranslate() has already translated most of the sentences, but for various reasons some may be left untranslated, so completetranslate() is still needed to fill in the missing results.
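The fill-in idea can be shown without touching the network. This is only a minimal sketch: fill_missing and fake_translate are hypothetical names introduced for illustration, and the stand-in translator just wraps the text in angle brackets where a real run would call the translation service.

```python
def fill_missing(sources, first_pass, translate_one):
    """Return one translation per source line, retrying only the blank ones.

    first_pass may be shorter than sources; missing entries count as blank.
    """
    merged = []
    for i, src in enumerate(sources):
        done = first_pass[i].strip() if i < len(first_pass) else ""
        merged.append(done if done else translate_one(src.strip()))
    return merged

# Stand-in for the real per-sentence translation call (assumption for the demo).
def fake_translate(text):
    return "<" + text + ">"

result = fill_missing(["eins\n", "zwei\n", "drei\n"], ["one\n", "\n"], fake_translate)
print(result)
```

Only the second and third sentences reach the fallback translator here, which is the point of the two-pass design: the slow one-at-a-time call is reserved for the few lines the fast concurrent pass missed.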
With size set to 20 in grequests.map(), this averaged about 20 translated sentences per second.
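One detail worth care when grouping sentences into batches of 50 is the final, partial batch. The generator below is a small sketch of that grouping using only the standard library; the name batched and the default batch size are illustrative choices, not part of grequests or googletrans.

```python
from itertools import islice

def batched(lines, batch_size=50):
    """Yield lists of up to batch_size items, including the last partial one."""
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

sizes = [len(b) for b in batched(range(120), 50)]
print(sizes)  # [50, 50, 20]
```

Each yielded batch could then be handed to a function that issues the requests concurrently, so no sentence at the end of the file is skipped.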