UNSW COMP9444 Assignment 2 Walkthrough

Problem:

Recurrent neural networks and sentiment classification: train a binary text classifier as specified, implementing several neural network models in PyTorch, and reach at least 80% accuracy.

Analysis:

The task asks you to implement two binary text classifiers, one based on an LSTM (a special kind of RNN) and one based on a CNN, and to train them to predict whether a movie review is positive or negative. You must also define the required loss function and return the prediction counts described below, which are used to compute accuracy. Note that you may not use any data beyond the provided dataset, nor may you modify the data.

Confusion matrix

True Positive (TP): a positive instance predicted as positive

True Negative (TN): a negative instance predicted as negative

False Positive (FP): a negative instance predicted as positive, i.e. a false alarm (Type I error)

False Negative (FN): a positive instance predicted as negative, i.e. a miss (Type II error)


                  Predicted Yes         Predicted No          Total
Actual Yes        TP                    FN                    P  (actual Yes)
Actual No         FP                    TN                    N  (actual No)
Total             P' (predicted Yes)    N' (predicted No)     P + N
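From these four counts, the usual metrics follow directly. A minimal pure-Python sketch with made-up counts (the numbers here are illustrative, not from the assignment):

```python
# Hypothetical confusion-matrix counts (illustrative values only).
tp, fn = 40, 10   # actual Yes: P = tp + fn = 50
fp, tn = 5, 45    # actual No:  N = fp + tn = 50

accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction of all predictions correct
precision = tp / (tp + fp)                   # of predicted Yes, how many were right
recall = tp / (tp + fn)                      # of actual Yes, how many were found

print(accuracy)   # 0.85
print(recall)     # 0.8
```

The assignment's 80% target refers to accuracy, the first of these.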

Topics covered:

RNN, CNN

For further discussion, contact the author on WeChat.

WeChat ID: tiamo-0620

Original assignment specification (PDF):

COMP9444 Neural Networks and Deep Learning

Term 3, 2019

Project 2 - Recurrent Networks and Sentiment Classification

Due: Sunday 24 November, 23:59

Marks: 24% of final assessment

NOTE: READ THIS DOCUMENT IN ITS ENTIRETY PRIOR TO STARTING THE ASSIGNMENT.

This assignment is divided into three parts:

Part 1 contains simple PyTorch questions focused on recurrent neural networks, designed to get you started and familiar with this part of the library.

Part 2 involves creating specific recurrent network structures in order to detect if a movie review is positive or negative in sentiment.

Part 3 is an unrestricted task where marks will be assigned primarily on final accuracy, and you may implement any network structure you choose.

Provided Files

Copy the archive hw2.zip into your own filespace and unzip it. This should create an hw2 directory with three skeleton files part1.py, part2.py and part3.py, as well as two subdirectories: data and .vector_cache.

Your task is to complete the skeleton files according to the specifications in this document, as well as in the comments in the files themselves. Each file contains functions or classes marked TODO: which correspond to the marking scheme shown below. This document contains general information for each task, with in-code comments supplying more detail. Parts 1 and 2 in this assignment are sufficiently specified to have only one correct answer (although there may be multiple ways to implement it). If you feel a requirement is not clear you may ask for additional information on the course forum.

There is also an additional file, imdb_dataloader.py. This is used to load the dataset provided to you in ./data for parts 2 and 3. It will also be used in our testing. Do not modify this file.

Marking Scheme

All parts of the assignment will be automarked. Marks are assigned as follows.

Part 1:

1. [0.5] RnnCell
2. [0.5] Rnn
3. [1] RnnSimplified
4. [1] Lstm
5. [1] Conv

Part 2:

1. [3] LSTM
2. [3] CNN
3. [1] Loss
4. [1] Measures

Part 3:

[12] Full Model

Similarly to the first assignment, when you submit your files through give, simple submission tests will be run to test the functionality of part 1, and to check that the code you have implemented in parts 2 and 3 is in the correct format and that we can test your models. The tests you see on submission are the only tests we will run for part 1 - so if you pass these you know you will receive full marks for part 1. After submissions have closed, we will run the final marking scripts, which will assign marks for each task. For part 2 this will test the correctness of the networks, and for part 3 this will be inference on the full test dataset. We will not release these final tests, however you will be able to see basic information outlining which sections of code were incorrect (if you do not receive full marks) when you view your marked assignment.

Groups

This assignment may be done individually, or in groups of two students. Groups are determined by an SMS field called hw2group. Every student has initially been assigned a unique hw2group which is "h" followed by their student ID number, e.g. h1234567. If you plan to complete the assignment individually, you don't need to do anything (but, if you do create a group with only you as a member, that's ok too). If you wish to form a group, go to the COMP9444 WebCMS page and click on "Groups" in the left hand column, then click "Create". Click on the menu for "Group Type" and select "hw2". After creating a group, click "Edit", search for the other member, and click "Add". WebCMS assigns a unique group ID to each group, in the form of "g" followed by six digits (e.g. g012345). We will periodically run a script to load these values into SMS.

Setting up your development environment

You should follow the instructions from Assignment 1, or use the environment you have already created there. In this assignment we will be using an additional library which you must install.

Activate your environment (not necessary if you are not using virtual envs):

conda activate COMP9444

Install torchtext:

conda install torchtext

For this assignment a GPU will speed up computation, which may be helpful for part 3. For this reason you may wish to look into Google Colab, a free service from Google that allows development in hosted notebooks able to connect to GPU and TPU (Google's custom NN chip, faster than GPUs) hardware runtimes. This is not necessary to complete the assignment but some students might find it helpful.

More information and a good getting started guide is here.

It is important to note this is just an option and not something required by this course. Some of the tutors are not familiar with Colab and will not be able to give troubleshooting advice for Colab-specific issues. If you are in doubt, develop locally.

Part 1 [4 marks]

For Part 1 of the assignment, you should work through the file part1.py and complete the functions where specified.

Part 2 [8 marks]

For Part 2, you will develop several models to solve a text classification task on movie review data. The goal is to train a classifier that can correctly identify whether a review is positive or negative. The labeled data is located in data/imdb/aclimdb and is split into train (training) and dev (development) sets, which contain 25000 and 6248 samples respectively. For each set, the balance between positive and negative reviews is equal, so you don't need to worry about class imbalances.

You should take at least 10 minutes to manually inspect the data so as to understand what is being classified. In the entire collection, no more than 30 reviews are allowed for any given movie because reviews for the same movie tend to have correlated ratings. Further, the train and dev sets contain a disjoint set of movies, so no significant performance is obtained by memorizing movie-unique terms and their association with observed labels. In the labeled train/dev sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10. Thus reviews with more neutral ratings are not included.

The provided file part2.py is what you need to complete. This code makes heavy use of torchtext, which aims to be the NLP equivalent of torchvision. It is advisable to develop a basic understanding of the package by skimming the documentation here, or reading the very good tutorial here.

Since this is not an NLP course, the following have already been implemented for you:

Dataloading: a dataloader has been provided in imdb_dataloader.py. This will load the files into memory correctly.

Preprocessing: review strings are converted to lower case, lengths of the reviews are calculated and added to the dataset. This allows for dynamic padding.

Tokenization: the review strings are broken into a list of their constituent words.

Vectorization: words are converted to vectors. Here we use 50-dimensional GloVe embeddings.

Batching: we use the BucketIterator() provided by torchtext so as to create batches of similar lengths. This isn't necessary for accuracy but will speed up training since the total sequence length can be reduced for some batches.

GloVe vectors are stored in the .vector_cache directory.

You should seek to understand the code provided as it will be a good starting point for part 3. Additionally, the code is structured to be backend-agnostic. That is, if a GPU is present, it will automatically be used; if one is not, the CPU will be used. This is the purpose of the .to(device) function being called on several operations.
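The backend-agnostic pattern described above is the standard PyTorch idiom: create a device object once, then move both the model and each batch onto it. A minimal sketch (the toy Linear model here is purely for illustration):

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(50, 2).to(device)   # model parameters move to `device`
batch = torch.randn(8, 50).to(device)       # inputs must live on the same device
out = model(batch)
print(out.shape)  # torch.Size([8, 2])
```

The same script then runs unchanged on a laptop CPU or a Colab GPU.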

For all tasks in this part, if arguments are not specified assume PyTorch defaults.

Task 1: LSTM Network

Implement an LSTM Network according to the function docstring. When combined with an appropriate loss function this model should achieve ~81% when run using the provided code.
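As a rough sketch of the overall shape such a model might take (the hidden size, single layer, and use of the final hidden state are assumptions for illustration, not the assignment's required architecture; the real answer must follow the docstring in part2.py):

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Sketch: 50-d GloVe vectors -> LSTM -> final hidden state -> one logit."""
    def __init__(self, embed_dim=50, hidden_dim=100):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim), already vectorized by the pipeline
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1]).squeeze(-1)  # one logit per review

model = LSTMClassifier()
logits = model(torch.randn(4, 20, 50))  # 4 reviews, 20 tokens each
print(logits.shape)  # torch.Size([4])
```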

Task 2: CNN Network

Implement a CNN Network according to the function docstring. When combined with an appropriate loss function this model should achieve ~82% when run using the provided code.
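A CNN for text typically runs 1-D convolutions over the token dimension and pools over time. The sketch below is one plausible shape (filter count, kernel size, and the global-max-pool choice are assumptions, not the specified architecture):

```python
import torch
import torch.nn as nn

class CNNClassifier(nn.Module):
    """Sketch: 1-D conv over tokens, global max pool, then one logit."""
    def __init__(self, embed_dim=50, n_filters=50, kernel_size=5):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size, padding=2)
        self.fc = nn.Linear(n_filters, 1)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim); Conv1d wants (batch, channels, seq_len)
        h = torch.relu(self.conv(x.transpose(1, 2)))
        h = h.max(dim=2).values          # global max pooling over time
        return self.fc(h).squeeze(-1)

model = CNNClassifier()
out = model(torch.randn(4, 20, 50))
print(out.shape)  # torch.Size([4])
```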

Task 3: Loss function

Define a loss function according to the function docstring.
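For a binary classifier that outputs raw logits, one common choice (an assumption here, not necessarily the answer the docstring requires) is binary cross-entropy with logits:

```python
import torch

# BCEWithLogitsLoss combines a sigmoid with binary cross-entropy,
# which is numerically more stable than applying them separately.
criterion = torch.nn.BCEWithLogitsLoss()
logits = torch.tensor([0.8, -1.2, 2.0])   # raw model outputs
labels = torch.tensor([1.0, 0.0, 1.0])    # 1 = positive review, 0 = negative
loss = criterion(logits, labels)
print(loss.item())
```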

Task 4: Measures

Return (in the following order) the number of true positive classifications, true negatives, false positives and false negatives. True positives are positive reviews correctly identified as positive. True negatives are negative reviews correctly identified as negative. False positives are negative reviews incorrectly identified as positive. False negatives are positive reviews incorrectly identified as negative.
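As a sketch of the four counts on plain Python lists (in the assignment these would be computed from tensors of predictions and labels; the lists here are illustrative):

```python
# 1 = predicted/actual positive, 0 = predicted/actual negative.
preds  = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 1, 0]

tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))  # correct positives
tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))  # correct negatives
fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))  # false alarms
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))  # misses

print(tp, tn, fp, fn)  # 2 2 1 1
```

Every sample lands in exactly one bucket, so the four counts always sum to the number of samples.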

Part 3 [12 marks]

The goal of this section is to simply achieve the highest accuracy you can on a holdout test set (i.e. a section of the dataset that we do not make available to you, but will test your model against).

You may use any form of model and preprocessing you like to achieve this, provided you adhere to the constraints listed below.

The provided code part3.py is essentially the same as part2.py except that it reports the overall accuracy, and at the end of training it saves the model in a file called model.pth (which you will need to submit). A good starting point would be to copy the relevant sections of code from your best model for part2.py into part3.py.

Your code must be capable of handling various batch sizes. You can check this is working ok with the submission tests. The code provided in part3.py already does this.

You can modify and change the code however you would like, however you MUST ensure that we can load your code to test it. This is done in the following way:

Import and create an instance of your network from the part3.py file you submit.

Restore this network to its trained state using the state-dict you provide.

Load a test dataset, preprocessing each sample using the text_field you specify in your PreProcessing class.

Feed this dataset into your model and record the accuracy.

You should check the docs on the torchtext.data.Field class to understand what you can and can't do to the input.

Specific to preprocessing, you may add a post-processing function to the field, as long as that function is also declared in the PreProcessing class. You may also add a custom tokenizer, stopwords, etc. Note that none of this is necessarily required, but it is possible.

You may wish to carry out some data augmentation, because in practice more data will outperform a better model. Data augmentation (transforming the data you have been provided and creating a new sample with the same label) is allowed. You are allowed to modify the main() function to create additional data in place. You may not call any remote APIs when doing this. Assume the test environment has no internet connection.

You may NOT download or load data other than what we have provided. If we find your submitted model has been trained on external data you will receive a mark of 0 for the assignment.

We understand that some of you may wish to use external libraries. This is possible. If you wish to do so, post on the course forum detailing the library you would like to use; if we think the request is reasonable we will add it to the testing environment. You need to demonstrate a real need for the library and explain why not using it would be grossly inefficient. We will keep a list of accepted packages and their versions in the FAQ.

Marks for part 3 will be based primarily on the accuracy your model achieves on the unseen test set.

When you submit part 3, in addition to the standard checks that we can run and evaluate your model, you will also see an accuracy value. This is the result of running your model on a very small number of held-out training examples (~600). These samples can be considered representative of the final test set, however the final accuracy will be calculated from significantly more samples (~18,000). The submission test should take no longer than 10s to run.

Example of a successful submission:

submission_test.py::test_rnnCell PASSED

submission_test.py::test_rnn PASSED

submission_test.py::test_rnnSimplified PASSED

submission_test.py::test_lstm PASSED

submission_test.py::test_conv PASSED

submission_test.py::test_part2_networks PASSED

submission_test.py::test_measures PASSED

submission_test.py::test_part3

Importing your code..

Using device: cuda:0

Loading model.pth ..

Loading Vocab objects..

Loading submission test samples..

Loaded 652 samples

Evaluating model..

Submission accuracy = 89.74%

PASSED

Constraints

Saved model state-dict must be under 5MB, and you cannot load external assets in the network class.

Model must be defined in a class named network.

The save file you submit must be generated by the part3.py file you submit.

Must use 50d GloVe Vectors for vectorization. This means no pretraining. We are solely interested in the problem as a classification task, so trying to use something like BERT or GPT-2 is not relevant.

While you may train on a GPU, you must ensure your model is able to be evaluated (i.e. perform inference) on a CPU.
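A standard way to satisfy the train-on-GPU, evaluate-on-CPU requirement is to save only the state dict and pass map_location when loading. A sketch with a stand-in module (the Linear layer stands in for the required network class):

```python
import torch

model = torch.nn.Linear(50, 1)   # stand-in for the assignment's `network` class

# Save only the state dict (this is the file that must stay under 5MB).
torch.save(model.state_dict(), "model.pth")

# At marking time the model must load on CPU even if it was trained on GPU;
# map_location remaps any CUDA tensors in the checkpoint onto the CPU.
restored = torch.nn.Linear(50, 1)
restored.load_state_dict(torch.load("model.pth", map_location="cpu"))
restored.eval()   # the markers call eval() before testing (see Common Questions)
```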

Common Questions:

Can I train on the full dataset if I find it? No. We are aware that it is possible to obtain the full IMDB dataset. For this reason we will be automatically searching code for the loading of external assets. If this is found you will receive 0. In addition we will retrain a random selection of submissions, and those achieving high accuracy. If we find the code used for training does not match the model output you will receive a mark of 0.

Can I train on the dev set? Yes, you should use the dev set during development to guide your architectural choices; however, more data will almost always help a model, so prior to submission it would be a good idea to train on all labeled data provided.

Can I use different word vectors? No.

My model is only slightly larger than 5MB, can you still accept it? No, the 5MB limit is part of the assignment spec and changes the way to approach the problem compared to if there were no limit.

Can we assume you will call model.eval() on our model prior to testing? Yes.

Can we assume a max length on the reviews? No. But nothing will be significantly longer than what is present in the test and dev sets.

General Advice:

You have been provided only rudimentary skeleton code that saves your model and prints the loss and accuracy at various inputs. You will almost certainly need to expand on this code so as to have a clearer understanding of what your model is doing.

If you find your local accuracy is high, but the submission accuracy is low, you are overfitting to your local data.

When doing a project like this, the effect of tooling is generally underestimated. You will need a system to log your experimental results while developing. One way to do this is a simple document with loss curves matched to hyperparameters and a git commit tag. There are more sophisticated systems such as sacred or tensorboard that you might want to look into as well.

Blindly modifying code, looking at the output, then modifying again will have you going in circles very quickly. Decide on a hypothesis you want to test, then do so and record the result. Then move onto the next idea.

You should consider the test script to be the final arbiter with regard to whether a certain approach is valid. If the submission test runs and you get a good accuracy, the approach is valid. If it causes errors, it is not.

Submission

You can test your code by typing

python3 part2.py

python3 part3.py

You should submit by typing

give cs9444 hw2 part1.py part2.py part3.py model.pth

You can submit as many times as you like - later submissions by either group member will overwrite previous submissions by either group member. You can check that your submission has been received by using the following command:

9444 classrun -check

The submission deadline is Sunday 24 November, 23:59. A 15% penalty will be applied to the (maximum) mark for every 24 hours late after the deadline.

Additional information may be found in the FAQ and will be considered as part of the specification for the project. You should check this page regularly.

Final Notes

Similarly to Assignment 1, we will be using PyTest to automatically grade submissions.

For part 2, you can pass the submission test and still have a very incorrect model. These tests just check if we can run them. You should rigorously test your code based on the specifications listed here, as well as within the provided file.

Ensure that you are passing submission tests early, because if a submission cannot be run, it will receive 0 marks for that part. There will be no special consideration given in these cases. Automated testing marks are final. "I uploaded the wrong version at the last minute" is not a valid excuse for a remark. For this reason, ensure you are in the process of uploading your solution at least 2 hours before the deadline. Do not leave this assignment to the last minute, as it is likely that close to the deadline, the wait time on submission test results will increase.

Plagiarism Policy

Your program must be entirely your own work. Plagiarism detection software will be used to compare all submissions pairwise and serious penalties will be applied, particularly in the case of repeat offences.

DO NOT COPY FROM OTHERS; DO NOT ALLOW ANYONE TO SEE YOUR CODE

Please refer to the UNSW Policy on Academic Integrity and Plagiarism if you require further clarification on this matter.

Good luck!
