Open NSFW model
This repo contains code for running Not Suitable for Work (NSFW) classification deep neural network Caffe models. Please refer to our blog post, which describes this work and the experiments in more detail.
Not suitable for work classifier
Detecting offensive / adult images is an important problem which researchers have tackled for decades. With the evolution of computer vision and deep learning, the algorithms have matured, and we are now able to classify an image as not suitable for work with greater precision.
Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another. For this reason, the model we describe below focuses only on one type of NSFW content: pornographic images. The identification of NSFW sketches, cartoons, text, images of graphic violence, or other types of unsuitable content is not addressed with this model.
Since images and user-generated content dominate the internet today, filtering nudity and other not suitable for work images becomes an important problem. In this repository we open-source a Caffe deep neural network for preliminary filtering of NSFW images.
Usage
- The network takes in an image and outputs a probability (a score between 0 and 1) which can be used to filter not suitable for work images. Scores < 0.2 indicate that the image is likely to be safe with high probability. Scores > 0.8 indicate that the image is highly probable to be NSFW. Scores in the middle range may be binned for different NSFW levels.
- Depending on the dataset, use case, and types of images, we advise developers to choose suitable thresholds. Due to the difficult nature of the problem, there will be errors, which depend on the use case, definition, and tolerance of NSFW. Ideally, developers should create an evaluation set according to the definition of what is safe for their application, then fit a ROC curve to choose a suitable threshold if they are using the model as it is.
- Results can be improved by fine-tuning the model for your dataset / use case / definition of NSFW. We do not provide any guarantees of accuracy of results. Please read the disclaimer below.
- Using human moderation for edge cases in combination with the machine-learned solution will help improve performance.
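As a rough illustration of the points above on binning scores and choosing a threshold from an evaluation set, here is a minimal, dependency-free Python sketch. The function names (`bin_score`, `roc_points`, `pick_threshold`), the bin labels, the false-positive budget, and the tiny evaluation set are all hypothetical and not part of this repo; only the 0.2 and 0.8 cut-offs come from the text.

```python
def bin_score(score, low=0.2, high=0.8):
    """Map an NSFW score in [0, 1] to a coarse bin (labels are illustrative)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score < low:
        return "likely_safe"
    if score > high:
        return "likely_nsfw"
    return "needs_review"  # e.g. route to human moderation


def roc_points(scores, labels):
    """Return (threshold, false_positive_rate, true_positive_rate) tuples.

    labels: 1 for NSFW (positive), 0 for SFW (negative).
    An image is flagged when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def pick_threshold(scores, labels, max_fpr=0.1):
    """Pick the threshold with the highest TPR whose FPR stays within max_fpr."""
    candidates = [p for p in roc_points(scores, labels) if p[1] <= max_fpr]
    if not candidates:
        return None
    # Highest TPR first; on ties prefer the higher (stricter) threshold.
    return max(candidates, key=lambda p: (p[2], p[0]))[0]


# Hypothetical evaluation set: scores from the model, labels from human review.
eval_scores = [0.05, 0.10, 0.30, 0.45, 0.60, 0.85, 0.90, 0.95]
eval_labels = [0, 0, 0, 1, 0, 1, 1, 1]

print(bin_score(0.14057905972))
print(pick_threshold(eval_scores, eval_labels, max_fpr=0.0))
```

The false-positive budget (`max_fpr`) is one possible selection rule; an application with a different tolerance might instead fix a minimum true-positive rate and minimize false positives.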
Description of model
We trained the model on a dataset with NSFW images as positives and SFW (suitable for work) images as negatives. These images were editorially labelled. We cannot release the dataset or other details due to the nature of the data.
We used CaffeOnSpark, a wonderful framework for distributed learning that brings deep learning to Hadoop and Spark clusters, to train the models for our experiments. Big thanks to the CaffeOnSpark team!
The deep model was first pretrained on the ImageNet 1000-class dataset. Then we fine-tuned the weights on the NSFW dataset. We used the thin resnet 50 1by2 architecture as the pretrained network. The model was generated using the pynetbuilder tool and replicates the residual network paper's 50-layer network (with half the number of filters in each layer). You can find more details on how the model was generated and trained here.
Please note that deeper networks, or networks with more filters, can improve accuracy. We trained the model using a thin residual network architecture, since it provides a good tradeoff in terms of accuracy, and the model is lightweight in terms of runtime (or flops) and memory (or number of parameters).
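To make the effect of halving the number of filters concrete, the following is a small worked calculation. It assumes a plain 2-D convolution without bias terms, and the layer sizes (64 versus 32 channels, a 3x3 kernel) are illustrative values, not taken from the released model definition.

```python
def conv_weight_count(in_channels, out_channels, kernel_size):
    """Number of weights in a 2-D convolution (bias terms ignored)."""
    return in_channels * out_channels * kernel_size * kernel_size


full = conv_weight_count(64, 64, 3)  # illustrative full-width layer
thin = conv_weight_count(32, 32, 3)  # same layer with half the filters

print(full, thin, full // thin)  # 36864 9216 4
```

When every layer has half the filters, an interior layer sees both its input and output channel counts halved, so its weight count (and the multiplications it performs per output position) drops by roughly a factor of four. The first layer, whose input is the image itself, shrinks by only about a factor of two.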
Docker Quickstart
This Docker quickstart guide can be used for evaluating the model quickly with minimal dependency installation.
Install Docker Engine:
- Windows Installation
- Mac OSX Installation
- Ubuntu Installation
Build a Caffe Docker image (CPU)
docker build -t caffe:cpu https://raw.githubusercontent.com/BVLC/caffe/master/docker/standalone/cpu/Dockerfile
Check the caffe installation
docker run caffe:cpu caffe --version
caffe version 1.0.0-rc3
Run the Docker image with a volume mapped to your *open_nsfw* repository. Your *test_image.jpg* should be located in this same directory.
cd open_nsfw
docker run --volume=$(pwd):/workspace caffe:cpu \
python ./classify_nsfw.py \
--model_def nsfw_model/deploy.prototxt \
--pretrained_model nsfw_model/resnet_50_1by2_nsfw.caffemodel \
test_image.jpg
We will get the NSFW score returned:
NSFW score: 0.14057905972
Running the model
To run this model, please install Caffe and its Python extension, and make sure pycaffe is available in your PYTHONPATH.
We can use the classify.py script to run the NSFW model. For convenience, this repo provides the script as well (classify_nsfw.py, used in the command that follows), and it prints the NSFW score.
python ./classify_nsfw.py \
--model_def nsfw_model/deploy.prototxt \
--pretrained_model nsfw_model/resnet_50_1by2_nsfw.caffemodel \
INPUT_IMAGE_PATH
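If you want to call the script from other code, one possible approach is to run the command above as a subprocess and parse the printed score. The sketch below assumes Python 3.5+ for the wrapper, that `python` on your PATH is the interpreter with pycaffe available, and that the script prints a line in the `NSFW score: <float>` format shown earlier. The helper names `parse_nsfw_score` and `score_image` are hypothetical and not part of this repo.

```python
import re
import subprocess

# Matches the line printed by the script, e.g. "NSFW score: 0.14057905972".
SCORE_PATTERN = re.compile(
    r"NSFW score:\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
)


def parse_nsfw_score(output):
    """Extract the float from text containing an 'NSFW score: ...' line."""
    match = SCORE_PATTERN.search(output)
    if match is None:
        raise ValueError("no 'NSFW score:' line found in output")
    return float(match.group(1))


def score_image(image_path, repo_dir="."):
    """Run classify_nsfw.py on one image and return its score.

    Assumes repo_dir is the open_nsfw checkout; image_path is resolved
    relative to repo_dir because the command runs with that working directory.
    """
    command = [
        "python", "./classify_nsfw.py",
        "--model_def", "nsfw_model/deploy.prototxt",
        "--pretrained_model", "nsfw_model/resnet_50_1by2_nsfw.caffemodel",
        image_path,
    ]
    result = subprocess.run(
        command, cwd=repo_dir, check=True,
        stdout=subprocess.PIPE, universal_newlines=True,
    )
    return parse_nsfw_score(result.stdout)


# Parsing can be checked without Caffe installed:
print(parse_nsfw_score("NSFW score: 0.14057905972"))
```

Starting a new process per image reloads the model each time, so this is only reasonable for occasional checks; for batches, importing pycaffe once in a long-running process would avoid that cost.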
Disclaimer
The definition of NSFW is subjective and contextual. This model is a general-purpose reference model, which can be used for the preliminary filtering of pornographic images. We do not provide guarantees of accuracy of output; rather, we make this available for developers to explore and enhance as an open source project. Results can be improved by fine-tuning the model for your dataset.
License
The code is licensed under the BSD 2 clause license. See the linked license file for details.
Contact
The model was trained by [Jay Mahadeokar](https://github.com/jay-mahadeokar/), in collaboration with Sachin Farfade, Amar Ramesh Kamat, Armin Kappeler and others. Special thanks to Gerry Pesavento for taking the initiative for open-sourcing this model. If you have any queries, please raise an issue and we will get back to you ASAP.