Chapter 44 How to Use Public Datasets

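In the previous two chapters, we used TensorFlow datasets to train models in JAX. Andrew Ng has said that "public datasets provide the fuel for the rocket of machine learning research." Public datasets solve machine learning's version of the problem of cooking without rice: having no data to work with.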

Public Datasets

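Sources of public datasets include Kaggle, the UCI Machine Learning Repository, Visual Data, and others. Together they cover image recognition, natural language processing, clinical medicine, and more.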

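TensorFlow Datasets is a dataset management library. It handles downloading and installing datasets, and it also provides the tools for using them, such as splitting, batching, and iteration settings. This chapter again uses TensorFlow Datasets to show how to work with public datasets.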

Figure 1 Kaggle datasets

TensorFlow Datasets Overview

TensorFlow Datasets currently manages more than a thousand datasets, and the names of all of them can be retrieved in code. The code is as follows:


import tensorflow_datasets as tfds

def preview_datasets():

    # list_builders() returns the names of all registered dataset builders
    builders = tfds.list_builders()

    print("Length of builders = ", len(builders))
    print("builders = ", builders)

if __name__ == "__main__":

    preview_datasets()

The printed output is as follows:


Length of builders =  1142
builders =  ['abstract_reasoning', 'accentdb', 'aeslc', 'aflw2k3d', 'ag_news_subset', 'ai2_arc', 'ai2_arc_with_ir', 'amazon_us_reviews', 'anli', 'answer_equivalence', 'arc', 'asqa', 'asset', 'assin2', 'bair_robot_pushing_small', 'bccd', 'beans', 'bee_dataset', 'beir', 'big_patent', 'bigearthnet', 'billsum', 'binarized_mnist', 'binary_alpha_digits', 'ble_wind_field', 'blimp', 'booksum', 'bool_q', 'bucc', 'c4', 'c4_wsrs', 'caltech101', 'caltech_birds2010', 'caltech_birds2011', 'cardiotox', 'cars196', 'cassava', 'cats_vs_dogs', 'celeb_a', 'celeb_a_hq', 'cfq', 'cherry_blossoms', 'chexpert', 'cifar10', 'cifar100', 'cifar100_n', 'cifar10_1', 'cifar10_corrupted', 'cifar10_n', 'citrus_leaves', 'cityscapes', 'civil_comments', 'clevr', 'clic', 'clinc_oos', 'cmaterdb', 'cnn_dailymail', 'coco', 'coco_captions', 'coil100', 'colorectal_histology', 'colorectal_histology_large', 'common_voice', 'conll2002', 'conll2003', 'controlled_noisy_web_labels', 'coqa', 'cos_e', 'cosmos_qa', 'covid19', 'covid19sum', 'crema_d', 'criteo', 'cs_restaurants', 'curated_breast_imaging_ddsm', 'cycle_gan', 'd4rl_adroit_door', 'd4rl_adroit_hammer', 'd4rl_adroit_pen', 'd4rl_adroit_relocate', 'd4rl_antmaze', 'd4rl_mujoco_ant', 'd4rl_mujoco_halfcheetah', 'd4rl_mujoco_hopper', 'd4rl_mujoco_walker2d', 'dart', 'davis', 'deep1b', 'deep_weeds', 'definite_pronoun_resolution', 'dementiabank', 'diabetic_retinopathy_detection', 'diamonds', 'div2k', 'dmlab', 'doc_nli', 'dolphin_number_word', 'domainnet', 'downsampled_imagenet', 'drop', 'dsprites', 'dtd', 'duke_ultrasound', 'e2e_cleaned', 'efron_morris75', 'emnist', 'eraser_multi_rc', 'esnli', 'eurosat', 'fashion_mnist', 'flic', 'flores', 'food101', 'forest_fires', 'fuss', 'gap', 'geirhos_conflict_stimuli', 'gem', 'genomics_ood', 'german_credit_numeric', 'gigaword', 'glove100_angular', 'glue', 'goemotions', 'gov_report', 'gpt3', 'gref', 'groove', 'grounded_scan', 'gsm8k', 'gtzan', 'gtzan_music_speech', 'hellaswag', 'higgs', 'hillstrom', 'horses_or_humans', 'howell', 
'i_naturalist2017', 'i_naturalist2018', 'i_naturalist2021', 'imagenet2012', 'imagenet2012_corrupted', 'imagenet2012_fewshot', 'imagenet2012_multilabel', 'imagenet2012_real', 'imagenet2012_subset', 'imagenet_a', 'imagenet_lt', 'imagenet_pi', 'imagenet_r', 'imagenet_resized', 'imagenet_sketch', 'imagenet_v2', 'imagenette', 'imagewang', 'imdb_reviews', 'irc_disentanglement', 'iris', 'istella', 'kddcup99', 'kitti', 'kmnist', 'laion400m', 'lambada', 'lfw', 'librispeech', 'librispeech_lm', 'libritts', 'ljspeech', 'lm1b', 'locomotion', 'lost_and_found', 'lsun', 'lvis', 'malaria', 'math_dataset', 'math_qa', 'mctaco', 'media_sum', 'mlqa', 'mnist', 'mnist_corrupted', 'movie_lens', 'movie_rationales', 'movielens', 'moving_mnist', 'mrqa', 'mslr_web', 'mt_opt', 'mtnt', 'multi_news', 'multi_nli', 'multi_nli_mismatch', 'natural_instructions', 'natural_questions', 'natural_questions_open', 'newsroom', 'nsynth', 'nyu_depth_v2', 'ogbg_molpcba', 'omniglot', 'open_images_challenge2019_detection', 'open_images_v4', 'openbookqa', 'opinion_abstracts', 'opinosis', 'opus', 'oxford_flowers102', 'oxford_iiit_pet', 'para_crawl', 'pass', 'patch_camelyon', 'paws_wiki', 'paws_x_wiki', 'penguins', 'pet_finder', 'pg19', 'piqa', 'places365_small', 'placesfull', 'plant_leaves', 'plant_village', 'plantae_k', 'protein_net', 'q_re_cc', 'qa4mre', 'qasc', 'quac', 'quality', 'quickdraw_bitmap', 'race', 'radon', 'reddit', 'reddit_disentanglement', 'reddit_tifu', 'ref_coco', 'resisc45', 'rlu_atari', 'rlu_atari_checkpoints', 'rlu_atari_checkpoints_ordered', 'rlu_control_suite', 'rlu_dmlab_explore_object_rewards_few', 'rlu_dmlab_explore_object_rewards_many', 'rlu_dmlab_rooms_select_nonmatching_object', 'rlu_dmlab_rooms_watermaze', 'rlu_dmlab_seekavoid_arena01', 'rlu_locomotion', 'rlu_rwrl', 'robomimic_mg', 'robomimic_mh', 'robomimic_ph', 'robonet', 'robosuite_panda_pick_place_can', 'rock_paper_scissors', 'rock_you', 's3o4d', 'salient_span_wikipedia', 'samsum', 'savee', 'scan', 'scene_parse150', 
'schema_guided_dialogue', 'sci_tail', 'scicite', 'scientific_papers', 'scrolls', 'sentiment140', 'shapes3d', 'sift1m', 'simpte', 'siscore', 'smallnorb', 'smartwatch_gestures', 'snli', 'so2sat', 'speech_commands', 'spoken_digit', 'squad', 'squad_question_generation', 'stanford_dogs', 'stanford_online_products', 'star_cfq', 'starcraft_video', 'stl10', 'story_cloze', 'summscreen', 'sun397', 'super_glue', 'svhn_cropped', 'symmetric_solids', 'tao', 'tatoeba', 'ted_hrlr_translate', 'ted_multi_translate', 'tedlium', 'tf_flowers', 'the300w_lp', 'tiny_shakespeare', 'titanic', 'trec', 'trivia_qa', 'tydi_qa', 'uc_merced', 'ucf101', 'unified_qa', 'universal_dependencies', 'unnatural_instructions', 'user_libri_audio', 'user_libri_text', 'vctk', 'visual_domain_decathlon', 'voc', 'voxceleb', 'voxforge', 'waymo_open_dataset', 'web_graph', 'web_nlg', 'web_questions', 'webvid', 'wider_face', 'wiki40b', 'wiki_auto', 'wiki_bio', 'wiki_dialog', 'wiki_table_questions', 'wiki_table_text', 'wikiann', 'wikihow', 'wikipedia', 'wikipedia_toxicity_subtypes', 'wine_quality', 'winogrande', 'wit', 'wit_kaggle', 'wmt13_translate', 'wmt14_translate', 'wmt15_translate', 'wmt16_translate', 'wmt17_translate', 'wmt18_translate', 'wmt19_translate', 'wmt_t2t_translate', 'wmt_translate', 'wordnet', 'wsc273', 'xnli', 'xquad', 'xsum', 'xtreme_pawsx', 'xtreme_pos', 'xtreme_s', 'xtreme_xnli', 'yahoo_ltrc', 'yelp_polarity_reviews', 'yes_no', 'youtube_vis', 'huggingface:acronym_identification', 'huggingface:ade_corpus_v2', 'huggingface:adv_glue', 'huggingface:adversarial_qa', 'huggingface:aeslc', 'huggingface:afrikaans_ner_corpus', 'huggingface:ag_news', 'huggingface:ai2_arc', 'huggingface:air_dialogue', 'huggingface:ajgt_twitter_ar', 'huggingface:allegro_reviews', 'huggingface:allocine', 'huggingface:alt', 'huggingface:amazon_polarity', 'huggingface:amazon_reviews_multi', 'huggingface:amazon_us_reviews', 'huggingface:ambig_qa', 'huggingface:americas_nli', 'huggingface:ami', 'huggingface:amttl', 
'huggingface:anli', 'huggingface:app_reviews', 'huggingface:aqua_rat', 'huggingface:aquamuse', 'huggingface:ar_cov19', 'huggingface:ar_res_reviews', 'huggingface:ar_sarcasm', 'huggingface:arabic_billion_words', 'huggingface:arabic_pos_dialect', 'huggingface:arabic_speech_corpus', 'huggingface:arcd', 'huggingface:arsentd_lev', 'huggingface:art', 'huggingface:arxiv_dataset', 'huggingface:ascent_kb', 'huggingface:aslg_pc12', 'huggingface:asnq', 'huggingface:asset', 'huggingface:assin', 'huggingface:assin2', 'huggingface:atomic', 'huggingface:autshumato', 'huggingface:babi_qa', 'huggingface:banking77', 'huggingface:bbaw_egyptian', 'huggingface:bbc_hindi_nli', 'huggingface:bc2gm_corpus', 'huggingface:beans', 'huggingface:best2009', 'huggingface:bianet', 'huggingface:bible_para', 'huggingface:big_patent', 'huggingface:bigbench', 'huggingface:billsum', 'huggingface:bing_coronavirus_query_set', 'huggingface:biomrc', 'huggingface:biosses', 'huggingface:biwi_kinect_head_pose', 'huggingface:blbooks', 'huggingface:blbooksgenre', 'huggingface:blended_skill_talk', 'huggingface:blimp', 'huggingface:blog_authorship_corpus', 'huggingface:bn_hate_speech', 'huggingface:bnl_newspapers', 'huggingface:bookcorpus', 'huggingface:bookcorpusopen', 'huggingface:boolq', 'huggingface:bprec', 'huggingface:break_data', 'huggingface:brwac', 'huggingface:bsd_ja_en', 'huggingface:bswac', 'huggingface:c3', 'huggingface:c4', 'huggingface:cail2018', 'huggingface:caner', 'huggingface:capes', 'huggingface:casino', 'huggingface:catalonia_independence', 'huggingface:cats_vs_dogs', 'huggingface:cawac', 'huggingface:cbt', 'huggingface:cc100', 'huggingface:cc_news', 'huggingface:ccaligned_multilingual', 'huggingface:cdsc', 'huggingface:cdt', 'huggingface:cedr', 'huggingface:cfq', 'huggingface:chr_en', 'huggingface:cifar10', 'huggingface:cifar100', 'huggingface:circa', 'huggingface:civil_comments', 'huggingface:clickbait_news_bg', 'huggingface:climate_fever', 'huggingface:clinc_oos', 'huggingface:clue', 
'huggingface:cmrc2018', 'huggingface:cmu_hinglish_dog', 'huggingface:cnn_dailymail', 'huggingface:coached_conv_pref', 'huggingface:coarse_discourse', 'huggingface:codah', 'huggingface:code_search_net', 'huggingface:code_x_glue_cc_clone_detection_big_clone_bench', 'huggingface:code_x_glue_cc_clone_detection_poj104', 'huggingface:code_x_glue_cc_cloze_testing_all', 'huggingface:code_x_glue_cc_cloze_testing_maxmin', 'huggingface:code_x_glue_cc_code_completion_line', 'huggingface:code_x_glue_cc_code_completion_token', 'huggingface:code_x_glue_cc_code_refinement', 'huggingface:code_x_glue_cc_code_to_code_trans', 'huggingface:code_x_glue_cc_defect_detection', 'huggingface:code_x_glue_ct_code_to_text', 'huggingface:code_x_glue_tc_nl_code_search_adv', 'huggingface:code_x_glue_tc_text_to_code', 'huggingface:code_x_glue_tt_text_to_text', 'huggingface:com_qa', 'huggingface:common_gen', 'huggingface:common_language', 'huggingface:common_voice', 'huggingface:commonsense_qa', 'huggingface:competition_math', 'huggingface:compguesswhat', 'huggingface:conceptnet5', 'huggingface:conceptual_12m', 'huggingface:conceptual_captions', 'huggingface:conll2000', 'huggingface:conll2002', 'huggingface:conll2003', 'huggingface:conll2012_ontonotesv5', 'huggingface:conllpp', 'huggingface:consumer-finance-complaints', 'huggingface:conv_ai', 'huggingface:conv_ai_2', 'huggingface:conv_ai_3', 'huggingface:conv_questions', 'huggingface:coqa', 'huggingface:cord19', 'huggingface:cornell_movie_dialog', 'huggingface:cos_e', 'huggingface:cosmos_qa', 'huggingface:counter', 'huggingface:covid_qa_castorini', 'huggingface:covid_qa_deepset', 'huggingface:covid_qa_ucsd', 'huggingface:covid_tweets_japanese', 'huggingface:covost2', 'huggingface:cppe-5', 'huggingface:craigslist_bargains', 'huggingface:crawl_domain', 'huggingface:crd3', 'huggingface:crime_and_punish', 'huggingface:crows_pairs', 'huggingface:cryptonite', 'huggingface:cs_restaurants', 'huggingface:cuad', 'huggingface:curiosity_dialogs', 
'huggingface:daily_dialog', 'huggingface:dane', 'huggingface:danish_political_comments', 'huggingface:dart', 'huggingface:datacommons_factcheck', 'huggingface:dbpedia_14', 'huggingface:dbrd', 'huggingface:deal_or_no_dialog', 'huggingface:definite_pronoun_resolution', 'huggingface:dengue_filipino', 'huggingface:dialog_re', 'huggingface:diplomacy_detection', 'huggingface:disaster_response_messages', 'huggingface:discofuse', 'huggingface:discovery', 'huggingface:disfl_qa', 'huggingface:doc2dial', 'huggingface:docred', 'huggingface:doqa', 'huggingface:dream', 'huggingface:drop', 'huggingface:duorc', 'huggingface:dutch_social', 'huggingface:dyk', 'huggingface:e2e_nlg', 'huggingface:e2e_nlg_cleaned', 'huggingface:ecb', 'huggingface:ecthr_cases', 'huggingface:eduge', 'huggingface:ehealth_kd', 'huggingface:eitb_parcc', 'huggingface:electricity_load_diagrams', 'huggingface:eli5', 'huggingface:eli5_category', 'huggingface:elkarhizketak', 'huggingface:emea', 'huggingface:emo', 'huggingface:emotion', 'huggingface:emotone_ar', 'huggingface:empathetic_dialogues', 'huggingface:enriched_web_nlg', 'huggingface:enwik8', 'huggingface:eraser_multi_rc', 'huggingface:esnli', 'huggingface:eth_py150_open', 'huggingface:ethos', 'huggingface:ett', 'huggingface:eu_regulatory_ir', 'huggingface:eurlex', 'huggingface:euronews', 'huggingface:europa_eac_tm', 'huggingface:europa_ecdc_tm', 'huggingface:europarl_bilingual', 'huggingface:event2Mind', 'huggingface:evidence_infer_treatment', 'huggingface:exams', 'huggingface:factckbr', 'huggingface:fake_news_english', 'huggingface:fake_news_filipino', 'huggingface:farsi_news', 'huggingface:fashion_mnist', 'huggingface:fever', 'huggingface:few_rel', 'huggingface:financial_phrasebank', 'huggingface:finer', 'huggingface:flores', 'huggingface:flue', 'huggingface:food101', 'huggingface:fquad', 'huggingface:freebase_qa', 'huggingface:gap', 'huggingface:gem', 'huggingface:generated_reviews_enth', 'huggingface:generics_kb', 
'huggingface:german_legal_entity_recognition', 'huggingface:germaner', 'huggingface:germeval_14', 'huggingface:giga_fren', 'huggingface:gigaword', 'huggingface:glucose', 'huggingface:glue', 'huggingface:gnad10', 'huggingface:go_emotions', 'huggingface:gooaq', 'huggingface:google_wellformed_query', 'huggingface:grail_qa', 'huggingface:great_code', 'huggingface:greek_legal_code', 'huggingface:gsm8k', 'huggingface:guardian_authorship', 'huggingface:gutenberg_time', 'huggingface:hans', 'huggingface:hansards', 'huggingface:hard', 'huggingface:harem', 'huggingface:has_part', 'huggingface:hate_offensive', 'huggingface:hate_speech18', 'huggingface:hate_speech_filipino', 'huggingface:hate_speech_offensive', 'huggingface:hate_speech_pl', 'huggingface:hate_speech_portuguese', 'huggingface:hatexplain', 'huggingface:hausa_voa_ner', 'huggingface:hausa_voa_topics', 'huggingface:hda_nli_hindi', 'huggingface:head_qa', 'huggingface:health_fact', 'huggingface:hebrew_projectbenyehuda', 'huggingface:hebrew_sentiment', 'huggingface:hebrew_this_world', 'huggingface:hellaswag', 'huggingface:hendrycks_test', 'huggingface:hind_encorp', 'huggingface:hindi_discourse', 'huggingface:hippocorpus', 'huggingface:hkcancor', 'huggingface:hlgd', 'huggingface:hope_edi', 'huggingface:hotpot_qa', 'huggingface:hover', 'huggingface:hrenwac_para', 'huggingface:hrwac', 'huggingface:humicroedit', 'huggingface:hybrid_qa', 'huggingface:hyperpartisan_news_detection', 'huggingface:iapp_wiki_qa_squad', 'huggingface:id_clickbait', 'huggingface:id_liputan6', 'huggingface:id_nergrit_corpus', 'huggingface:id_newspapers_2018', 'huggingface:id_panl_bppt', 'huggingface:id_puisi', 'huggingface:igbo_english_machine_translation', 'huggingface:igbo_monolingual', 'huggingface:igbo_ner', 'huggingface:ilist', 'huggingface:imagenet-1k', 'huggingface:imagenet_sketch', 'huggingface:imdb', 'huggingface:imdb_urdu_reviews', 'huggingface:imppres', 'huggingface:indic_glue', 'huggingface:indonli', 'huggingface:indonlu', 
'huggingface:inquisitive_qg', 'huggingface:interpress_news_category_tr', 'huggingface:interpress_news_category_tr_lite', 'huggingface:irc_disentangle', 'huggingface:isixhosa_ner_corpus', 'huggingface:isizulu_ner_corpus', 'huggingface:iwslt2017', 'huggingface:jeopardy', 'huggingface:jfleg', 'huggingface:jigsaw_toxicity_pred', 'huggingface:jigsaw_unintended_bias', 'huggingface:jnlpba', 'huggingface:journalists_questions', 'huggingface:kan_hope', 'huggingface:kannada_news', 'huggingface:kd_conv', 'huggingface:kde4', 'huggingface:kelm', 'huggingface:kilt_tasks', 'huggingface:kilt_wikipedia', 'huggingface:kinnews_kirnews', 'huggingface:klue', 'huggingface:kor_3i4k', 'huggingface:kor_hate', 'huggingface:kor_ner', 'huggingface:kor_nli', 'huggingface:kor_nlu', 'huggingface:kor_qpair', 'huggingface:kor_sae', 'huggingface:kor_sarcasm', 'huggingface:labr', 'huggingface:lama', 'huggingface:lambada', 'huggingface:large_spanish_corpus', 'huggingface:laroseda', 'huggingface:lc_quad', 'huggingface:lccc', 'huggingface:lener_br', 'huggingface:lex_glue', 'huggingface:liar', 'huggingface:librispeech_asr', 'huggingface:librispeech_lm', 'huggingface:limit', 'huggingface:lince', 'huggingface:linnaeus', 'huggingface:liveqa', 'huggingface:lj_speech', 'huggingface:lm1b', 'huggingface:lst20', 'huggingface:m_lama', 'huggingface:mac_morpho', 'huggingface:makhzan', 'huggingface:masakhaner', 'huggingface:math_dataset', 'huggingface:math_qa', 'huggingface:matinf', 'huggingface:mbpp', 'huggingface:mc4', 'huggingface:mc_taco', 'huggingface:md_gender_bias', 'huggingface:mdd', 'huggingface:med_hop', 'huggingface:medal', 'huggingface:medical_dialog', 'huggingface:medical_questions_pairs', 'huggingface:medmcqa', 'huggingface:menyo20k_mt', 'huggingface:meta_woz', 'huggingface:metashift', 'huggingface:metooma', 'huggingface:metrec', 'huggingface:miam', 'huggingface:mkb', 'huggingface:mkqa', 'huggingface:mlqa', 'huggingface:mlsum', 'huggingface:mnist', 'huggingface:mocha', 'huggingface:monash_tsf', 
'huggingface:moroco', 'huggingface:movie_rationales', 'huggingface:mrqa', 'huggingface:ms_marco', 'huggingface:ms_terms', 'huggingface:msr_genomics_kbcomp', 'huggingface:msr_sqa', 'huggingface:msr_text_compression', 'huggingface:msr_zhen_translation_parity', 'huggingface:msra_ner', 'huggingface:mt_eng_vietnamese', 'huggingface:muchocine', 'huggingface:multi_booked', 'huggingface:multi_eurlex', 'huggingface:multi_news', 'huggingface:multi_nli', 'huggingface:multi_nli_mismatch', 'huggingface:multi_para_crawl', 'huggingface:multi_re_qa', 'huggingface:multi_woz_v22', 'huggingface:multi_x_science_sum', 'huggingface:multidoc2dial', 'huggingface:multilingual_librispeech', 'huggingface:mutual_friends', 'huggingface:mwsc', 'huggingface:myanmar_news', 'huggingface:narrativeqa', 'huggingface:narrativeqa_manual', 'huggingface:natural_questions', 'huggingface:ncbi_disease', 'huggingface:nchlt', 'huggingface:ncslgr', 'huggingface:nell', 'huggingface:neural_code_search', 'huggingface:news_commentary', 'huggingface:newsgroup', 'huggingface:newsph', 'huggingface:newsph_nli', 'huggingface:newspop', 'huggingface:newsqa', 'huggingface:newsroom', 'huggingface:nkjp-ner', 'huggingface:nli_tr', 'huggingface:nlu_evaluation_data', 'huggingface:norec', 'huggingface:norne', 'huggingface:norwegian_ner', 'huggingface:nq_open', 'huggingface:nsmc', 'huggingface:numer_sense', 'huggingface:numeric_fused_head', 'huggingface:oclar', 'huggingface:offcombr', 'huggingface:offenseval2020_tr', 'huggingface:offenseval_dravidian', 'huggingface:ofis_publik', 'huggingface:ohsumed', 'huggingface:ollie', 'huggingface:omp', 'huggingface:onestop_english', 'huggingface:onestop_qa', 'huggingface:open_subtitles', 'huggingface:openai_humaneval', 'huggingface:openbookqa', 'huggingface:openslr', 'huggingface:openwebtext', 'huggingface:opinosis', 'huggingface:opus100', 'huggingface:opus_books', 'huggingface:opus_dgt', 'huggingface:opus_dogc', 'huggingface:opus_elhuyar', 'huggingface:opus_euconst', 
'huggingface:opus_finlex', 'huggingface:opus_fiskmo', 'huggingface:opus_gnome', 'huggingface:opus_infopankki', 'huggingface:opus_memat', 'huggingface:opus_montenegrinsubs', 'huggingface:opus_openoffice', 'huggingface:opus_paracrawl', 'huggingface:opus_rf', 'huggingface:opus_tedtalks', 'huggingface:opus_ubuntu', 'huggingface:opus_wikipedia', 'huggingface:opus_xhosanavy', 'huggingface:orange_sum', 'huggingface:oscar', 'huggingface:para_crawl', 'huggingface:para_pat', 'huggingface:parsinlu_reading_comprehension', 'huggingface:pass', 'huggingface:paws', 'huggingface:paws-x', 'huggingface:pec', 'huggingface:peer_read', 'huggingface:peoples_daily_ner', 'huggingface:per_sent', 'huggingface:persian_ner', 'huggingface:pg19', 'huggingface:php', 'huggingface:piaf', 'huggingface:pib', 'huggingface:piqa', 'huggingface:pn_summary', 'huggingface:poem_sentiment', 'huggingface:polemo2', 'huggingface:poleval2019_cyberbullying', 'huggingface:poleval2019_mt', 'huggingface:polsum', 'huggingface:polyglot_ner', 'huggingface:prachathai67k', 'huggingface:pragmeval', 'huggingface:proto_qa', 'huggingface:psc', 'huggingface:ptb_text_only', 'huggingface:pubmed', 'huggingface:pubmed_qa', 'huggingface:py_ast', 'huggingface:qa4mre', 'huggingface:qa_srl', 'huggingface:qa_zre', 'huggingface:qangaroo', 'huggingface:qanta', 'huggingface:qasc', 'huggingface:qasper', 'huggingface:qed', 'huggingface:qed_amara', 'huggingface:quac', 'huggingface:quail', 'huggingface:quarel', 'huggingface:quartz', 'huggingface:quickdraw', 'huggingface:quora', 'huggingface:quoref', 'huggingface:race', 'huggingface:re_dial', 'huggingface:reasoning_bg', 'huggingface:recipe_nlg', 'huggingface:reclor', 'huggingface:red_caps', 'huggingface:reddit', 'huggingface:reddit_tifu', 'huggingface:refresd', 'huggingface:reuters21578', 'huggingface:riddle_sense', 'huggingface:ro_sent', 'huggingface:ro_sts', 'huggingface:ro_sts_parallel', 'huggingface:roman_urdu', 'huggingface:roman_urdu_hate_speech', 'huggingface:ronec', 
'huggingface:ropes', 'huggingface:rotten_tomatoes', 'huggingface:russian_super_glue', 'huggingface:rvl_cdip', 'huggingface:s2orc', 'huggingface:samsum', 'huggingface:sanskrit_classic', 'huggingface:saudinewsnet', 'huggingface:sberquad', 'huggingface:sbu_captions', 'huggingface:scan', 'huggingface:scb_mt_enth_2020', 'huggingface:scene_parse_150', 'huggingface:schema_guided_dstc8', 'huggingface:scicite', 'huggingface:scielo', 'huggingface:scientific_papers', 'huggingface:scifact', 'huggingface:sciq', 'huggingface:scitail', 'huggingface:scitldr', 'huggingface:search_qa', 'huggingface:sede', 'huggingface:selqa', 'huggingface:sem_eval_2010_task_8', 'huggingface:sem_eval_2014_task_1', 'huggingface:sem_eval_2018_task_1', 'huggingface:sem_eval_2020_task_11', 'huggingface:sent_comp', 'huggingface:senti_lex', 'huggingface:senti_ws', 'huggingface:sentiment140', 'huggingface:sepedi_ner', 'huggingface:sesotho_ner_corpus', 'huggingface:setimes', 'huggingface:setswana_ner_corpus', 'huggingface:sharc', 'huggingface:sharc_modified', 'huggingface:sick', 'huggingface:silicone', 'huggingface:simple_questions_v2', 'huggingface:siswati_ner_corpus', 'huggingface:smartdata', 'huggingface:sms_spam', 'huggingface:snips_built_in_intents', 'huggingface:snli', 'huggingface:snow_simplified_japanese_corpus', 'huggingface:so_stacksample', 'huggingface:social_bias_frames', 'huggingface:social_i_qa', 'huggingface:sofc_materials_articles', 'huggingface:sogou_news', 'huggingface:spanish_billion_words', 'huggingface:spc', 'huggingface:species_800', 'huggingface:speech_commands', 'huggingface:spider', 'huggingface:squad', 'huggingface:squad_adversarial', 'huggingface:squad_es', 'huggingface:squad_it', 'huggingface:squad_kor_v1', 'huggingface:squad_kor_v2', 'huggingface:squad_v1_pt', 'huggingface:squad_v2', 'huggingface:squadshifts', 'huggingface:srwac', 'huggingface:sst', 'huggingface:stereoset', 'huggingface:story_cloze', 'huggingface:stsb_mt_sv', 'huggingface:stsb_multi_mt', 
'huggingface:style_change_detection', 'huggingface:subjqa', 'huggingface:super_glue', 'huggingface:superb', 'huggingface:svhn', 'huggingface:swag', 'huggingface:swahili', 'huggingface:swahili_news', 'huggingface:swda', 'huggingface:swedish_medical_ner', 'huggingface:swedish_ner_corpus', 'huggingface:swedish_reviews', 'huggingface:swiss_judgment_prediction', 'huggingface:tab_fact', 'huggingface:tamilmixsentiment', 'huggingface:tanzil', 'huggingface:tapaco', 'huggingface:tashkeela', 'huggingface:taskmaster1', 'huggingface:taskmaster2', 'huggingface:taskmaster3', 'huggingface:tatoeba', 'huggingface:ted_hrlr', 'huggingface:ted_iwlst2013', 'huggingface:ted_multi', 'huggingface:ted_talks_iwslt', 'huggingface:telugu_books', 'huggingface:telugu_news', 'huggingface:tep_en_fa_para', 'huggingface:text2log', 'huggingface:textvqa', 'huggingface:thai_toxicity_tweet', 'huggingface:thainer', 'huggingface:thaiqa_squad', 'huggingface:thaisum', 'huggingface:the_pile', 'huggingface:the_pile_books3', 'huggingface:the_pile_openwebtext2', 'huggingface:the_pile_stack_exchange', 'huggingface:tilde_model', 'huggingface:time_dial', 'huggingface:times_of_india_news_headlines', 'huggingface:timit_asr', 'huggingface:tiny_shakespeare', 'huggingface:tlc', 'huggingface:tmu_gfm_dataset', 'huggingface:tne', 'huggingface:told-br', 'huggingface:totto', 'huggingface:trec', 'huggingface:trivia_qa', 'huggingface:truthful_qa', 'huggingface:tsac', 'huggingface:ttc4900', 'huggingface:tunizi', 'huggingface:tuple_ie', 'huggingface:turk', 'huggingface:turkic_xwmt', 'huggingface:turkish_movie_sentiment', 'huggingface:turkish_ner', 'huggingface:turkish_product_reviews', 'huggingface:turkish_shrinked_ner', 'huggingface:turku_ner_corpus', 'huggingface:tweet_eval', 'huggingface:tweet_qa', 'huggingface:tweets_ar_en_parallel', 'huggingface:tweets_hate_speech_detection', 'huggingface:twi_text_c3', 'huggingface:twi_wordsim353', 'huggingface:tydiqa', 'huggingface:ubuntu_dialogs_corpus', 'huggingface:udhr', 
'huggingface:um005', 'huggingface:un_ga', 'huggingface:un_multi', 'huggingface:un_pc', 'huggingface:universal_dependencies', 'huggingface:universal_morphologies', 'huggingface:urdu_fake_news', 'huggingface:urdu_sentiment_corpus', 'huggingface:vctk', 'huggingface:visual_genome', 'huggingface:vivos', 'huggingface:web_nlg', 'huggingface:web_of_science', 'huggingface:web_questions', 'huggingface:weibo_ner', 'huggingface:wi_locness', 'huggingface:wider_face', 'huggingface:wiki40b', 'huggingface:wiki_asp', 'huggingface:wiki_atomic_edits', 'huggingface:wiki_auto', 'huggingface:wiki_bio', 'huggingface:wiki_dpr', 'huggingface:wiki_hop', 'huggingface:wiki_lingua', 'huggingface:wiki_movies', 'huggingface:wiki_qa', 'huggingface:wiki_qa_ar', 'huggingface:wiki_snippets', 'huggingface:wiki_source', 'huggingface:wiki_split', 'huggingface:wiki_summary', 'huggingface:wikiann', 'huggingface:wikicorpus', 'huggingface:wikihow', 'huggingface:wikipedia', 'huggingface:wikisql', 'huggingface:wikitablequestions', 'huggingface:wikitext', 'huggingface:wikitext_tl39', 'huggingface:wili_2018', 'huggingface:wino_bias', 'huggingface:winograd_wsc', 'huggingface:winogrande', 'huggingface:wiqa', 'huggingface:wisesight1000', 'huggingface:wisesight_sentiment', 'huggingface:wmt14', 'huggingface:wmt15', 'huggingface:wmt16', 'huggingface:wmt17', 'huggingface:wmt18', 'huggingface:wmt19', 'huggingface:wmt20_mlqe_task1', 'huggingface:wmt20_mlqe_task2', 'huggingface:wmt20_mlqe_task3', 'huggingface:wmt_t2t', 'huggingface:wnut_17', 'huggingface:wongnai_reviews', 'huggingface:woz_dialogue', 'huggingface:wrbsc', 'huggingface:x_stance', 'huggingface:xcopa', 'huggingface:xcsr', 'huggingface:xed_en_fi', 'huggingface:xglue', 'huggingface:xnli', 'huggingface:xor_tydi_qa', 'huggingface:xquad', 'huggingface:xquad_r', 'huggingface:xsum', 'huggingface:xsum_factuality', 'huggingface:xtreme', 'huggingface:yahoo_answers_qa', 'huggingface:yahoo_answers_topics', 'huggingface:yelp_polarity', 'huggingface:yelp_review_full', 
'huggingface:yoruba_bbc_topics', 'huggingface:yoruba_gv_ner', 'huggingface:yoruba_text_c3', 'huggingface:yoruba_wordsim353', 'huggingface:youtube_caption_corrections', 'huggingface:zest', 'kubric:kubric_frames', 'kubric:movi_a', 'kubric:movi_b', 'kubric:movi_c', 'kubric:movi_d', 'kubric:movi_e', 'kubric:movi_f', 'kubric:msn_easy_frames', 'kubric:multi_shapenet_frames', 'kubric:nerf_synthetic_frames', 'kubric:nerf_synthetic_scenes', 'kubric:shapenet_pretraining', 'robotics:language_table', 'robotics:language_table_blocktoabsolute_oracle_sim', 'robotics:language_table_blocktoblock_4block_sim', 'robotics:language_table_blocktoblock_oracle_sim', 'robotics:language_table_blocktoblock_sim', 'robotics:language_table_blocktoblockrelative_oracle_sim', 'robotics:language_table_blocktorelative_oracle_sim', 'robotics:language_table_separate_oracle_sim', 'robotics:language_table_sim', 'robotics:mt_opt_rlds', 'robotics:mt_opt_sd']
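Some names in the output above contain a colon, such as huggingface:imdb, kubric:movi_a, or robotics:language_table. The part before the colon is a namespace prefix. The minimal sketch below counts names per prefix, which gives a rough view of how the builders break down. It is plain Python. The helper name count_by_namespace and the label "tfds" for unprefixed names are our own choices for illustration, not part of the TensorFlow Datasets API.

```python
from collections import Counter


def count_by_namespace(names):
    """Count dataset builder names per namespace prefix.

    Names such as 'huggingface:imdb' or 'kubric:movi_a' carry a
    'namespace:' prefix; names without a colon are grouped under the
    label 'tfds' (a label chosen here for illustration only).
    """
    counts = Counter()
    for name in names:
        namespace, sep, _ = name.partition(":")
        counts[namespace if sep else "tfds"] += 1
    return counts
```

With TensorFlow Datasets installed, `count_by_namespace(tfds.list_builders())` applies the helper to the real list. The counts depend on the installed TFDS version.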

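With this many datasets, most will be unfamiliar, and there is no need to inspect and test them one by one. The table below lists several commonly used ones, covering text, images, document summarization, natural language, object detection, recommendation, and video data.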

Category | Name | Description
Text | billsum | BillSum, summarization of US Congressional and California state bills.
Text | newsroom | NEWSROOM is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications.
Text | samsum | SAMSum Corpus contains over 16k chat dialogues with manually annotated summaries.
Image | imagenet_v2 | ImageNet-v2 is an ImageNet test set (10 per class) collected by closely following the original labelling protocol.
Image | cityscapes | Cityscapes is a dataset consisting of diverse urban street scenes across 50 different cities at varying times of the year as well as ground truths for several vision tasks including semantic segmentation, instance level segmentation (TODO), and stereo pair disparity inference.
Image | cifar100 | This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
Image | food101 | This dataset consists of 101 food categories, with 101,000 images. For each class, 250 manually reviewed test images are provided as well as 750 training images.
Image | mnist | The MNIST database of handwritten digits.
Document summarization | opinion_abstracts | Movie critics' reviews and consensus summaries, crawled from the web.
Document summarization | booksum | BookSum: A Collection of Datasets for Long-form Narrative Summarization.
Natural language | natural_questions | The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question.
Natural language | math_qa | A large-scale dataset of math word problems and an interpretable neural math problem solver that learns to map problems to operation programs.
Object detection | waymo_open_dataset | The Waymo Open Dataset is comprised of high resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions. This data is licensed for non-commercial use.
Object detection | coco | COCO is a large-scale object detection, segmentation, and captioning dataset.
Recommendation | hillstrom | This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test.
Video | bair_robot_pushing_small | This data set contains roughly 44,000 examples of robot pushing motions, including one training set (train) and two test sets of previously seen (testseen) and unseen (testnovel) objects. This is the small 64x64 version.
Video | tao | The TAO dataset is a large video object detection dataset consisting of 2,907 high resolution videos and 833 object categories.
Video | webvid | WebVid is a large-scale dataset of short videos with textual descriptions sourced from the web. The videos are diverse and rich in their content.

The full catalog is available at https://www.tensorflow.org/datasets/catalog/overview

In some environments TensorFlow Datasets is already present alongside TensorFlow. It is a separate package, however, and it can be installed on its own with the following command:


pip install tensorflow_datasets
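Whether TensorFlow Datasets is already available depends on the environment. One way to check before running the examples is to look the module up with the standard library. This is a minimal sketch. is_installed is a hypothetical helper name, and it only checks that the module can be found, not which version is installed.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    # find_spec returns None for a missing top-level module without importing it.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if not is_installed("tensorflow_datasets"):
        print("tensorflow_datasets not found; run: pip install tensorflow_datasets")
```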

Using TensorFlow Datasets

The following example uses the MNIST dataset to show the basic usage of TensorFlow Datasets. The code is as follows:


import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds

def setup():

    # Without a split argument, load returns a dict mapping split names to tf.data.Dataset objects
    mnist = tfds.load(name = "mnist", data_dir = "/tmp")
    trains, tests = mnist["train"], mnist["test"]

    assert isinstance(trains, tf.data.Dataset)

def main():

    setup()

if __name__ == "__main__":

    main()

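The code first imports tensorflow_datasets, which is the entry point for loading data. It then calls load with name and the optional data_dir to load the dataset. By default, data_dir is ~/tensorflow_datasets in the current user's home directory. Because no split is specified, load returns all splits, and the "train" and "test" keys give the training set and the test set. The printed output is as follows: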


Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /tmp/mnist/3.0.1...
Dl Completed...: 100%|#########################| 5/5 [00:07<00:00,  1.58s/ file]
Dataset mnist downloaded and prepared to /tmp/mnist/3.0.1. Subsequent calls will reuse this data.
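The last line of the log shows where the prepared data lives: /tmp/mnist/3.0.1, which is data_dir, then the dataset name, then the version. The minimal sketch below checks for that directory. It assumes the <data_dir>/<name>/<version> layout taken from the log and treats any non-empty directory as prepared. prepared_path and is_prepared are hypothetical helpers. TFDS decides on its own whether to download, so this check is only a convenience for inspecting the cache.

```python
from pathlib import Path


def prepared_path(data_dir, name, version):
    """Build <data_dir>/<name>/<version>, the layout shown in the log."""
    return Path(data_dir) / name / version


def is_prepared(data_dir, name, version):
    """Guess whether a dataset has already been downloaded and prepared.

    Assumption: a prepared dataset is a non-empty directory at
    <data_dir>/<name>/<version>. Only the file system is inspected;
    the files themselves are not validated.
    """
    path = prepared_path(data_dir, name, version)
    return path.is_dir() and any(path.iterdir())
```

For the run above, `is_prepared("/tmp", "mnist", "3.0.1")` would be expected to return True once the download has finished.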

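Because this is the first download, tfds contacts the dataset's download source to obtain the download URLs and content, so wait for the download to finish. Next, modify the code to print trains and tests, and run it again.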


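import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds

def setup():

    # Without a split argument, load returns a dict mapping split names to tf.data.Dataset objects
    mnist = tfds.load(name = "mnist", data_dir = "/tmp")
    trains, tests = mnist["train"], mnist["test"]

    assert isinstance(trains, tf.data.Dataset)

    # Printing a dataset shows its element_spec (per-feature shape and dtype)
    print(trains, tests)

def main():

    setup()

if __name__ == "__main__":

    main()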

The output is as follows:


<_PrefetchDataset element_spec={'image': TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None)}> <_PrefetchDataset element_spec={'image': TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None)}>

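This time nothing is downloaded: the dataset already exists in data_dir and is reused, and it will only be downloaded again if it is deleted from that directory. The data has also already been arranged into the appropriate shapes and dtypes. According to the printed element_spec, each MNIST image is a 3-D tensor of shape [28, 28, 1] with dtype uint8, and each label is a scalar of dtype int64.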

The load function of TensorFlow Datasets is the quickest and simplest way to build and load a tf.data.Dataset. Here it returns a dictionary, and each key (a split name) gives access to a different value (a dataset).

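In JAX, however, model inputs are almost always floating point, which clearly does not match uint8 images and int64 labels. For users who need NumPy arrays in their programs, tfds.as_numpy converts a tf.data.Dataset into an iterable that yields NumPy arrays. Note that as_numpy does not change dtypes: the images are still uint8 and must still be cast to float before they are fed to a JAX model. Example code: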


import tensorflow as tf
import tensorflow_datasets as tfds

def setup():
    
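    # Load only the training split as a tf.data.Dataset
    trains = tfds.load(name = "mnist", split = tfds.Split.TRAIN, data_dir = "/tmp")
    # Shuffle with a 1024-element buffer, batch by 128, repeat for 5 epochs, prefetch 10 batches
    trains = trains.shuffle(1024).batch(128).repeat(5).prefetch(10)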
    
    i = 0
    
    for item in tfds.as_numpy(trains):
        
        images, labels = item["image"], item["label"]
        
        print(f"i = {i}, images.shape = {images.shape}, labels.shape = {labels.shape}")
        
        i = i + 1
        
def main():
    
    setup()
    
if __name__ == "__main__":
    
    main()

The output is as follows:

…
i = 2326, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2327, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2328, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2329, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2330, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2331, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2332, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2333, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2334, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2335, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2336, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2337, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2338, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2339, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2340, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2341, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2342, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2343, images.shape = (128, 28, 28, 1), labels.shape = (128,)
i = 2344, images.shape = (96, 28, 28, 1), labels.shape = (96,)
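The final batch holds 96 examples because of the order batch(128).repeat(5): each epoch is batched separately, so every epoch of 60,000 images ends with a partial batch of 60000 - 468 × 128 = 96. That gives 469 batches per epoch and 5 × 469 = 2345 batches in total, indices 0 to 2344; the other four partial batches fall inside the part of the output elided by "…". The pure-Python sketch below reproduces this arithmetic, assuming tf.data's default drop_remainder=False. `count_batch_sizes` is a hypothetical helper, not a tf.data function.

```python
from typing import List

def count_batch_sizes(num_examples: int, batch_size: int, epochs: int,
                      batch_then_repeat: bool = True) -> List[int]:
    """Batch sizes produced by ds.batch(b).repeat(e) (batch_then_repeat=True)
    or by ds.repeat(e).batch(b) (batch_then_repeat=False), with drop_remainder=False."""
    if batch_then_repeat:
        # Each epoch is batched on its own, so every epoch may end with a partial batch.
        full, rem = divmod(num_examples, batch_size)
        one_epoch = [batch_size] * full + ([rem] if rem else [])
        return one_epoch * epochs
    # Epochs are concatenated first, so only the very last batch can be partial.
    full, rem = divmod(num_examples * epochs, batch_size)
    return [batch_size] * full + ([rem] if rem else [])

if __name__ == "__main__":
    sizes = count_batch_sizes(60000, 128, 5)
    print(len(sizes), sizes[-1])                         # 2345 96
    print([i for i, s in enumerate(sizes) if s < 128])   # [468, 937, 1406, 1875, 2344]
    other = count_batch_sizes(60000, 128, 5, batch_then_repeat=False)
    print(len(other), other[-1])                         # 2344 96
```

If a fixed batch shape matters, for instance because jit-compiled JAX functions are recompiled for every new input shape, passing drop_remainder=True to batch discards the partial batches.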

tfds.load also accepts batch_size = -1, which returns the entire split at once as tf.Tensor objects. The modified code is as follows:


import tensorflow as tf
import tensorflow_datasets as tfds

def setup():
    
    trains = tfds.load(name = "mnist", split = tfds.Split.TRAIN, data_dir = "/tmp")
    trains = trains.shuffle(1024).batch(128).repeat(5).prefetch(10)
    
    i = 0
    
    for item in tfds.as_numpy(trains):
        
        images, labels = item["image"], item["label"]
        
        print(f"i = {i}, images.shape = {images.shape}, labels.shape = {labels.shape}")
        
        i = i + 1
        
def setup_():
    
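    # batch_size = -1 loads the whole split as a single batch: a dict of tf.Tensor
    trains = tfds.load(name = "mnist", batch_size = -1, split = tfds.Split.TRAIN, data_dir = "/tmp")
    # Convert the dict of tf.Tensor into a dict of NumPy arrays
    trains = tfds.as_numpy(trains)
    
    train_images, train_labels = trains["image"], trains["label"]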
    
    print(f"train_images.shape = {train_images.shape}, train_labels.shape = {train_labels.shape}")
        
        
def main():
    
    # setup()
    setup_()
    
if __name__ == "__main__":
    
    main()

The output is as follows:


train_images.shape = (60000, 28, 28, 1), train_labels.shape = (60000,)
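With the whole split in memory as NumPy arrays, the uint8/int64 mismatch can be resolved and batching done without tf.data. The sketch below is one common approach, not the author's code: it scales images to float32 in [0, 1], one-hot encodes the labels, and yields shuffled mini-batches. `preprocess` and `minibatches` are hypothetical names, and random arrays with MNIST's shapes and dtypes (but only 1,000 examples) stand in for train_images and train_labels so the snippet runs offline.

```python
from typing import Iterator, Tuple
import numpy as np

def preprocess(images: np.ndarray, labels: np.ndarray,
               num_classes: int = 10) -> Tuple[np.ndarray, np.ndarray]:
    """uint8 images -> float32 in [0, 1]; integer labels -> float32 one-hot vectors."""
    x = images.astype(np.float32) / 255.0
    y = np.eye(num_classes, dtype=np.float32)[labels]
    return x, y

def minibatches(x: np.ndarray, y: np.ndarray, batch_size: int,
                rng: np.random.Generator) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield shuffled (x, y) mini-batches; the last batch may be smaller."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins with the same shapes/dtypes as train_images / train_labels
    train_images = rng.integers(0, 256, size=(1000, 28, 28, 1), dtype=np.uint8)
    train_labels = rng.integers(0, 10, size=(1000,), dtype=np.int64)
    x, y = preprocess(train_images, train_labels)
    for xb, yb in minibatches(x, y, 128, rng):
        print(xb.shape, xb.dtype, yb.shape)
```

The resulting float32 batches can be passed to jax.numpy.asarray or directly to a jitted JAX function.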

When load is called, the split parameter specifies which part of the data to load. To split the dataset more finely, the data can be divided by percentage into training, validation, and test sets. The code is as follows:


import tensorflow_datasets as tfds

def setup():
    
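    # Slice the "train" split into non-overlapping 50% / 25% / 25% parts
    splits = ["train[:50%]", "train[50%:75%]", "train[75%:]"]
    
    # with_info = True also returns a tfds.core.DatasetInfo;
    # as_supervised = True yields (image, label) tuples instead of dicts
    (trains, validations, tests), metas = tfds.load(name = "mnist", data_dir = "/tmp/", split = list(splits), with_info = True, as_supervised = True)
    
    print(f"trains = {trains}, validations = {validations}, tests = {tests}), metas = {metas}")
    
def main():
    
    setup()
    
if __name__ == "__main__":
    
    main()

Here splits = ["train[:50%]", "train[50%:75%]", "train[75%:]"] is passed as split = list(splits), using the TFDS percentage slicing syntax to divide the training data into training, validation, and test sets of 50%, 25%, and 25% respectively. with_info = True additionally returns the basic information about the mnist dataset, including its features, sizes, and formats, while as_supervised = True makes each element an (image, label) tuple rather than a dictionary. The output is as follows: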


trains = <_PrefetchDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>, validations = <_PrefetchDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>, tests = <_PrefetchDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>), metas = tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_path='/tmp/mnist/3.0.1',
    file_format=tfrecord,
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=uint8),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=60000, num_shards=1>,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
)
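To see how percentage slices map to example indices, the sketch below converts a simplified spec of the form name[a%:b%] into an index range. It is a hypothetical helper that handles only whole, non-negative percentages and rounds down; TFDS has its own parser and rounding rules, which agree with this one for MNIST because 60,000 is divisible by 100. The example count used here is the value reported by metas.splits["train"].num_examples in the output above. It shows that "train[:50%]", "train[50%:75%]" and "train[75%:]" are contiguous and disjoint, whereas a slice such as "train[:20%]" starts at index 0 and therefore overlaps "train[:50%]".

```python
import re
from typing import Tuple

_SPEC = re.compile(r"(\w+)\[(?:(\d+)%)?:(?:(\d+)%)?\]")

def percent_slice(spec: str, num_examples: int) -> Tuple[str, int, int]:
    """Hypothetical helper: 'train[50%:75%]' -> ('train', start, end) example indices.

    Only whole, non-negative percentages are supported; rounding is floor division.
    """
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported split spec: {spec!r}")
    name, lo, hi = m.group(1), m.group(2), m.group(3)
    start = num_examples * int(lo) // 100 if lo is not None else 0
    end = num_examples * int(hi) // 100 if hi is not None else num_examples
    return name, start, end

if __name__ == "__main__":
    num_train = 60000  # metas.splits["train"].num_examples
    for spec in ["train[:50%]", "train[50%:75%]", "train[75%:]"]:
        name, start, end = percent_slice(spec, num_train)
        print(spec, start, end, end - start)
```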

Conclusion

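These two chapters used TensorFlow Datasets as an example to introduce public datasets and how to use them in JAX. By drawing on TensorFlow Datasets and other public datasets, JAX users can readily obtain training data, which removes much of the difficulty of finding suitable datasets.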
