While working through the example at https://tensorflow.google.cn/tutorials/keras/text_classification_with_hub, I ran into two problems:
1. Downloading the dataset is very slow
INFO:absl:Downloading http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz into C:\Users\DEEP\tensorflow_datasets\downloads\ai.stanfor.edu_amaas_
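Solution: download the archive yourself first, serve it from a local Tomcat server, and then add the following:
import tensorflow_datasets as tfds
from tensorflow_datasets.text import imdb  # import path assumed for this TFDS version
imdb._DOWNLOAD_URL = 'http://localhost:8080/aclImdb_v1.tar.gz'  # override the download URL
# Split the training set 6:4, so that we end up with 15,000
# training samples, 10,000 validation samples, and 25,000 test samples
train_validation_split = tfds.Split.TRAIN.subsplit([6, 4])
(train_data, validation_data), test_data = tfds.load(
    name="imdb_reviews",
    split=(train_validation_split, tfds.Split.TEST),
    as_supervised=True)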
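If setting up Tomcat is more than is needed, a plain static file server is probably enough, since the dataset builder only has to fetch one file over HTTP. The sketch below swaps Tomcat for Python's standard-library http.server. The helper name serve_directory and the use of a background thread are assumptions made for illustration, not part of the original setup.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, port=8080):
    """Serve the files in `directory` over HTTP on 127.0.0.1 in a background thread."""
    # Bind the handler to the chosen directory instead of the current working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # A daemon thread lets the rest of the script keep running while the server is up.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() once the download has finished
```

Calling serve_directory on the folder that holds aclImdb_v1.tar.gz should make the file reachable at http://localhost:8080/aclImdb_v1.tar.gz, which is the address used in the override above.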
2. Dataset split error
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_datasets\core\tfrecords_reader.py", line 356, in _str_to_relative_instruction
    raise AssertionError('Unrecognized instruction format: %s' % spec)
AssertionError: Unrecognized instruction format: NamedSplit('train')(tfds.percent[0:60])
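Solution: this happens because tensorflow_datasets\core\splits.py has no special handling for converting a _SubSplit to str, so the resulting string is NamedSplit('train')(tfds.percent[0:60]).
Meanwhile, the pattern _SUB_SPEC_RE in tensorflow_datasets/core/tfrecords_reader.py,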
_SUB_SPEC_RE = re.compile(r'''
^
 (?P<split>\w+)
 (\[
  ((?P<from>-?\d+)
   (?P<from_pct>%)?)?
  :
  ((?P<to>-?\d+)
   (?P<to_pct>%)?)?
 \])?
$
''', re.X)
cannot match that string. The split therefore has to be written in the form the parser expects:
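import tensorflow_datasets as tfds
from tensorflow_datasets.text import imdb  # import path assumed for this TFDS version
imdb._DOWNLOAD_URL = 'http://localhost:8080/aclImdb_v1.tar.gz'
# Split the training set 6:4, so that we end up with 15,000
# training samples, 10,000 validation samples, and 25,000 test samples
train_validation_split = "train[:60%]", "train[60%:]"
(train_data, validation_data), test_data = tfds.load(
    name="imdb_reviews",
    split=(train_validation_split, tfds.Split.TEST),
    as_supervised=True)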
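The mismatch can be checked without TensorFlow Datasets installed. The sketch below copies the split-spec pattern into a standalone script and runs both forms through it. The group names (split, from, from_pct, to, to_pct) and the re.X flag are assumed to match the library source, and parse_spec is a hypothetical helper, not a TFDS function.

```python
import re

# Local copy of the split-spec pattern (group names and the re.X flag are assumed).
SUB_SPEC_RE = re.compile(r'''
^
 (?P<split>\w+)
 (\[
  ((?P<from>-?\d+)
   (?P<from_pct>%)?)?
  :
  ((?P<to>-?\d+)
   (?P<to_pct>%)?)?
 \])?
$
''', re.X)


def parse_spec(spec):
    """Return the named groups for a split spec, or None if it does not match."""
    match = SUB_SPEC_RE.match(spec)
    return match.groupdict() if match else None


# str() of the old subsplit object: the parentheses and quotes fall outside \w+.
print(parse_spec("NamedSplit('train')(tfds.percent[0:60])"))
# String slicing syntax: split name followed by an optional [from:to] range.
print(parse_spec("train[:60%]"))
print(parse_spec("train[60%:]"))
```

The first call prints None, which corresponds to the AssertionError in the traceback, while the two slice strings parse into a split name plus percentage bounds.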