[Translation in progress] Yahoo Open Source: An Introduction to open_nsfw

Open NSFW Model

This repo contains code for running Not Suitable for Work (NSFW) classification deep neural network Caffe models. Please refer to our blog post, which describes this work and the experiments in more detail.

Not suitable for work classifier

NSFW分級(jí)器

Detecting offensive / adult images is an important problem which researchers have tackled for decades. With the evolution of computer vision and deep learning, the algorithms have matured, and we are now able to classify an image as not suitable for work with greater precision.

Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another. For this reason, the model we describe below focuses only on one type of NSFW content: pornographic images. The identification of NSFW sketches, cartoons, text, images of graphic violence, or other types of unsuitable content is not addressed with this model.

Since images and user-generated content dominate the internet today, filtering nudity and other not suitable for work images has become an important problem. In this repository we open-source a Caffe deep neural network for preliminary filtering of NSFW images.


Usage

  • The network takes in an image and outputs a probability (a score between 0 and 1) that can be used to filter not suitable for work images. Scores < 0.2 indicate that the image is safe with high probability. Scores > 0.8 indicate that the image is highly likely to be NSFW. Scores in the middle range may be binned into different NSFW levels.
  • Depending on the dataset, use case and types of images, we advise developers to choose suitable thresholds. Due to the difficult nature of the problem, there will be errors, and their rate depends on the use case, the definition of NSFW and the tolerance for mistakes. Ideally, developers using the model as is should create an evaluation set according to their application's definition of what is safe, then compute an ROC curve on that set to choose a suitable threshold.
  • Results can be improved by fine-tuning the model for your dataset, use case or definition of NSFW. We do not provide any guarantee of the accuracy of the results. Please read the disclaimer below.
  • Using human moderation for edge cases in combination with the machine learned solution will help improve performance.
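The two recommendations above (binning scores with the 0.2 / 0.8 cut-offs, and picking a threshold from an ROC sweep over your own labeled evaluation set) can be sketched in plain Python. This is an illustrative sketch, not part of open_nsfw: the function names, the "review" label for the middle range, and the "highest recall within a false-positive budget" criterion are assumptions chosen for the example.

```python
from typing import List, Optional, Tuple

# Cut-offs quoted in the usage notes; treat them as starting points only.
SAFE_BELOW = 0.2
NSFW_ABOVE = 0.8


def bin_score(score: float) -> str:
    """Map an NSFW score in [0, 1] to a coarse label (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < SAFE_BELOW:
        return "safe"
    if score > NSFW_ABOVE:
        return "nsfw"
    return "review"  # middle range: e.g. route to human moderation


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """(threshold, false_positive_rate, true_positive_rate) for each distinct score.

    Labels: 1 = NSFW, 0 = safe. An image is flagged when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("evaluation set needs both NSFW and safe examples")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / negatives, tp / positives))
    return points


def pick_threshold(scores: List[float], labels: List[int],
                   max_fpr: float) -> Optional[Tuple[float, float, float]]:
    """Highest-recall threshold whose false-positive rate stays within max_fpr."""
    best = None
    for t, fpr, tpr in roc_points(scores, labels):
        if fpr <= max_fpr and (best is None or tpr > best[2]):
            best = (t, fpr, tpr)
    return best  # None if no threshold meets the budget


if __name__ == "__main__":
    eval_scores = [0.05, 0.1, 0.3, 0.6, 0.85, 0.95]  # toy data, not real results
    eval_labels = [0, 0, 0, 1, 1, 1]
    print(bin_score(0.5), pick_threshold(eval_scores, eval_labels, max_fpr=0.0))
```

Sweeping every distinct score gives the empirical ROC curve directly, so there is nothing to "fit"; with a large evaluation set, a library routine such as scikit-learn's `roc_curve` does the same sweep more efficiently.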

Description of model

We trained the model on a dataset with NSFW images as positives and SFW (suitable for work) images as negatives. These images were editorially labelled. We cannot release the dataset or other details due to the nature of the data.
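With NSFW as the positive class and SFW as the negative class, a single score between 0 and 1 falls out naturally from a two-way softmax: the two class probabilities sum to 1, so the NSFW probability alone carries the information. The sketch below assumes the network ends in two logits ordered `[sfw, nsfw]` and that the NSFW score is the probability at index 1; that ordering is an assumption here, so check `deploy.prototxt` and `classify_nsfw.py` before relying on it.

```python
import math
from typing import List, Sequence

# Label convention from the text: NSFW = positive (1), SFW = negative (0).
SFW, NSFW = 0, 1


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nsfw_score(logits: Sequence[float]) -> float:
    """P(NSFW) from two logits, assumed ordered [sfw, nsfw]."""
    if len(logits) != 2:
        raise ValueError("expected two logits: [sfw, nsfw]")
    return softmax(logits)[NSFW]
```

For two classes this reduces to a logistic sigmoid of the logit difference, `1 / (1 + exp(sfw - nsfw))`, which is why the output behaves like a single probability.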

To train the models for our experiments, we used CaffeOnSpark, a wonderful framework for distributed learning that brings deep learning to Hadoop and Spark clusters. Big thanks to the CaffeOnSpark team!

The deep model was first pretrained on the 1000-class ImageNet dataset; we then fine-tuned the weights on the NSFW dataset. We used the thin ResNet-50 1by2 architecture as the pretrained network. The model was generated using the pynetbuilder tool and replicates the 50-layer network from the residual network paper, with half the number of filters in each layer. You can find more details on how the model was generated and trained here.

Please note that deeper networks, or networks with more filters, can improve accuracy. We trained the model using a thin residual network architecture because it offers a good trade-off: accuracy stays reasonable while the model remains lightweight in terms of runtime (FLOPs) and memory (number of parameters).
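To make the "lightweight" claim concrete, the sketch below counts the weights of a standard ResNet-50 (bottleneck stages of 3, 4, 6 and 3 blocks, widths 64/128/256/512, 4x expansion, projection shortcut on each stage's first block) with a width multiplier. Halving every layer's filters halves both the input and output channels of almost every convolution, so their weight count drops by roughly 4x. Assumptions: only convolution weights and the final fully connected layer are counted (batch-norm and scale parameters are ignored), the stem is halved along with everything else, and the head has 2 outputs for the NSFW model; the exact prototxt produced by pynetbuilder may differ in such details.

```python
def resnet50_params(width: float, num_classes: int) -> int:
    """Conv weights plus final FC weights and bias for a width-scaled ResNet-50."""
    stem = int(64 * width)
    params = 7 * 7 * 3 * stem  # conv1: 7x7 kernel, 3 input channels
    in_ch = stem
    for blocks, base in zip((3, 4, 6, 3), (64, 128, 256, 512)):
        mid = int(base * width)  # bottleneck width
        out = 4 * mid            # expansion factor of 4
        for i in range(blocks):
            params += in_ch * mid          # 1x1 reduce
            params += 3 * 3 * mid * mid    # 3x3
            params += mid * out            # 1x1 expand
            if i == 0:
                params += in_ch * out      # 1x1 projection shortcut
            in_ch = out
    params += in_ch * num_classes + num_classes  # FC weights + bias
    return params


if __name__ == "__main__":
    full = resnet50_params(1.0, 1000)   # ImageNet ResNet-50
    thin = resnet50_params(0.5, 2)      # 1by2 variant with a 2-way head
    print(full, thin, round(full / thin, 2))
```

Under these assumptions the full ImageNet network has about 25.5M such parameters and the half-width, two-class variant about 5.9M. FLOPs shrink by a similar factor, since each convolution's multiply-adds also scale with input channels times output channels.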

Docker Quickstart

This Docker quickstart guide can be used for evaluating the model quickly with minimal dependency installation.
Install Docker Engine:

  • Windows Installation
  • Mac OSX Installation
  • Ubuntu Installation

Build a caffe docker image (CPU)

docker build -t caffe:cpu https://raw.githubusercontent.com/BVLC/caffe/master/docker/standalone/cpu/Dockerfile

Check the caffe installation

docker run caffe:cpu caffe --version
caffe version 1.0.0-rc3

Run the Docker image with a volume mapped to your open_nsfw repository. Your test_image.jpg should be located in this same directory.

cd open_nsfw
docker run --volume=$(pwd):/workspace caffe:cpu \
python ./classify_nsfw.py \
--model_def nsfw_model/deploy.prototxt \
--pretrained_model nsfw_model/resnet_50_1by2_nsfw.caffemodel \
test_image.jpg

The script prints the NSFW score:

NSFW score:   0.14057905972
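If you call the Docker command from another program, you need to pull the number out of its output. Below is a minimal sketch with a hypothetical helper `parse_nsfw_score`; it assumes the score appears on a line in the `NSFW score: <number>` format shown above and simply searches the captured text for it, ignoring any other lines.

```python
import re

# Matches lines like "NSFW score:   0.14057905972" (also accepts exponents like 1e-05).
_SCORE_LINE = re.compile(r"NSFW score:\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def parse_nsfw_score(output: str) -> float:
    """Return the first NSFW score found in the script's captured output."""
    match = _SCORE_LINE.search(output)
    if match is None:
        raise ValueError("no 'NSFW score:' line found in output")
    return float(match.group(1))
```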

Running the model

To run this model, install Caffe and its Python bindings, and make sure pycaffe is available on your PYTHONPATH.

The NSFW model can be run with a classify.py-style script. For convenience, this repo provides one, classify_nsfw.py, which prints the NSFW score.

python ./classify_nsfw.py \
 --model_def nsfw_model/deploy.prototxt \
 --pretrained_model nsfw_model/resnet_50_1by2_nsfw.caffemodel \
 INPUT_IMAGE_PATH 
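To score several images, you can drive the same command from Python. This is a sketch, not a tool shipped with the repo: `build_command` and `run_on_images` are hypothetical names, the model paths are copied from the command above, and `python` should be whichever interpreter has pycaffe on its path (it defaults to `python`, as in the shell command). Pass an argument list to `subprocess.run` so that image paths containing spaces need no shell quoting.

```python
import subprocess
from typing import Dict, List

# Paths as used in the command above; adjust if your checkout differs.
MODEL_DEF = "nsfw_model/deploy.prototxt"
PRETRAINED = "nsfw_model/resnet_50_1by2_nsfw.caffemodel"


def build_command(image_path: str, python: str = "python") -> List[str]:
    """Argument list equivalent to the shell command above."""
    return [
        python, "./classify_nsfw.py",
        "--model_def", MODEL_DEF,
        "--pretrained_model", PRETRAINED,
        image_path,
    ]


def run_on_images(image_paths: List[str], cwd: str = ".",
                  python: str = "python") -> Dict[str, str]:
    """Run classify_nsfw.py once per image from the open_nsfw checkout (cwd)
    and collect each run's stdout, keyed by image path."""
    results: Dict[str, str] = {}
    for path in image_paths:
        completed = subprocess.run(
            build_command(path, python),
            cwd=cwd, capture_output=True, text=True, check=True,
        )
        results[path] = completed.stdout
    return results
```

Each call starts a fresh interpreter and reloads the model, which keeps the sketch simple but makes it slow for large batches. Loading the network once through pycaffe and looping over images in a single process avoids that cost.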

Disclaimer

The definition of NSFW is subjective and contextual. This model is a general-purpose reference model, which can be used for the preliminary filtering of pornographic images. We do not guarantee the accuracy of its output; rather, we make it available for developers to explore and enhance as an open source project. Results can be improved by fine-tuning the model for your dataset.

License

Code licensed under the BSD 2-clause license. See the linked LICENSE file for terms.

Contact

The model was trained by Jay Mahadeokar (https://github.com/jay-mahadeokar/), in collaboration with Sachin Farfade, Amar Ramesh Kamat, Armin Kappeler and others. Special thanks to Gerry Pesavento for taking the initiative to open-source this model. If you have any questions, please raise an issue and we will get back to you as soon as possible.
