Running Metascape enrichment analysis in a Docker container is quite simple. By default it analyzes a human gene list; pay attention to the -S parameter.

P.S.

My test path: sftp://root@ip:22/home/softwares/MSBio/data
Command: bin/ms.sh -u -o /data/output_single_id_txt /data/example/single_list_id.txt

Pay special attention here!


Introduction

Metascape for Bioinformaticians (MSBio) enables Metascape analyses to be carried out in batch mode on users' own hardware. Metascape is a complex piece of software with many third-party dependencies (luckily no commercial ones), so Docker container technology is used to run Metascape offline. If you do not have access to a Docker infrastructure, please continue to use the Metascape.org website. Although we have tested Docker on Mac (M1/M2 chips are not supported) and Windows, the instructions below are written with Linux in mind.

To run MSBio, you need two Docker images (~3GB each) and a valid license. Free MSBio licenses are available for non-web and non-commercial use only. To run MSBio within a commercial entity, please check what a commercial license enables.

Installation

Before you start, make sure you have access to a Docker infrastructure.
The installation package can be obtained by registering for a free license (valid for a year). Create a new folder for MSBio, then unzip the installation zip file to get three subfolders:

$ unzip msbio_v3.5.20210815.zip
$ ls
bin  data  license

To download MSBio docker images, enter the MSBio working folder, run:

bin/install.sh

Note: on Windows, use winbin/install.bat instead.

As each image is ~3GB in size, be patient while they download. If successful, you should see two Docker images (Image ID and Size will vary):

$ docker image list
REPOSITORY           TAG      IMAGE ID       CREATED        SIZE
metadocker8/msdata   latest   99ea12d0ba82   30 hours ago   3.23GB
metadocker8/msbio    latest   7914a952382e   30 hours ago   3.34GB

The Metascape Docker container requires a minimum of 6GB of memory to run, as the database consumes memory. We recommend providing 8GB+.
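As a rough sketch of how the 6GB minimum and 8GB recommendation could be checked before launching, the helper below classifies a memory size given in bytes. The function name check_mem is hypothetical and not part of MSBio, and the sketch assumes "GB" means 2^30 bytes.

```shell
# Thresholds from the text, in bytes (assumption: 1GB = 1024^3 bytes).
MIN_BYTES=$((6 * 1024 * 1024 * 1024))
REC_BYTES=$((8 * 1024 * 1024 * 1024))

# check_mem BYTES: classify a memory size against the two thresholds.
# (check_mem is a hypothetical helper, not an MSBio command.)
check_mem() {
    if [ "$1" -lt "$MIN_BYTES" ]; then
        echo "insufficient"
    elif [ "$1" -lt "$REC_BYTES" ]; then
        echo "minimum"
    else
        echo "recommended"
    fi
}

# Usage (needs a running Docker daemon; MemTotal is reported in bytes):
#   check_mem "$(docker info --format '{{.MemTotal}}')"
```

On Docker Desktop the reported total reflects the memory assigned to the Docker virtual machine, so this is where a low limit would show up.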

Note: the installation script also creates a data folder and makes it writable to all users (MSBio creates files with user ID 1002).

Usage

Container Management

The MSBio containers must be running in order to run a Metascape analysis. To launch the containers, run the following from the MSBio working folder:

bin/up.sh

After you are done with your analyses, shut down the containers to free memory:

bin/down.sh

Gene-List Analysis

To analyze your gene list(s), use bin/ms.sh. The minimum syntax is:

bin/ms.sh -o output_folder input_list_file

However, the input file format differs between single-gene-list and multi-gene-list analyses, so you must specify -u if your input follows the single-gene-list standard. Another very important point: both output_folder and input_list_file must be located under the data folder, because the data working folder is mounted inside the container as /data, so only files within it are visible inside the Docker container. For the same reason, both output_folder and input_list_file must start with /data, since this is the path within the container! (Since version v3.5.20211016, a path that starts with "data" without the leading "/" has the "/" prepended automatically and will work.)
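The path rule above can be sketched as a small helper that turns a path relative to the MSBio working folder into the path seen inside the container, and rejects anything outside data. The name to_container_path is hypothetical; MSBio itself does not ship such a function.

```shell
# to_container_path PATH: map a host-side path under the data folder to the
# corresponding container path. (Hypothetical helper, not part of MSBio.)
to_container_path() {
    case "$1" in
        /data|/data/*)   printf '%s\n' "$1" ;;         # already a container path
        data|data/*)     printf '/%s\n' "$1" ;;        # prepend the missing "/"
        ./data|./data/*) printf '/%s\n' "${1#./}" ;;   # strip "./", then prepend "/"
        *) echo "error: $1 is not under the data folder" >&2; return 1 ;;
    esac
}

# Usage from the MSBio working folder:
#   bin/ms.sh -u -o "$(to_container_path data/output_single_id_txt)" \
#       "$(to_container_path data/example/single_list_id.txt)"
```

Checking the path up front is useful on versions older than v3.5.20211016, where the leading "/" is not added automatically.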

Example for single-gene-list analysis:

bin/ms.sh -u -o /data/output_single_id_txt /data/example/single_list_id.txt

Here, -u stands for "unique", as our genes are in a column. This matters because the file format (.txt, .csv, or .xlsx) for a single gene list differs from the format used for multiple gene lists. The exact format of the input files is described in the online menu, and example input files are available under the data/example folder. We recommend always using the multiple-gene-list format; we might retire the single-gene-list format in the future, as it causes some confusion.

If the gene list is not for human, use -S. Please read the next section for important options.

If your analysis command crashes without error, chances are the process within the container was killed due to insufficient memory, so it did not get a chance to complain. You should make sure your Docker server allows 8GB+ for the container.

Example of multi-gene-list analysis:

bin/ms.sh -o /data/output_multiple_sym_txt /data/example/multiple_list_symbol.txt

Advanced Options

MSBio supports many options, but you can ignore most of them. Here we explain only a few important ones:

-o OUTPUT, --output OUTPUT

The output folder path must be provided. It must start with /data/ as this is the path within the container.

-u, --one_list

This is required when your input uses the single-list file format.

-p, --PPI

By default, MSBio performs PPI network analysis. If you do not want PPI analysis, use this option. (Note: MSBio alpha did not run PPI by default; we changed the behavior in beta.)

-G, --skip_go

By default, MSBio performs GO enrichment analysis. If you would like to skip it, use this option.

-t ID_TYPE, --id_type ID_TYPE

ID type of the genes in the input file. By default, you do not need to specify it; Metascape guesses the type automatically. You can also force Metascape to interpret your IDs as one of the following types: "Entrez", "RefSeq", "Symbol", or "dbxref". Type strings are case-sensitive.

-s, --skip_convert

If you are sure the input gene IDs are already correct Entrez Gene IDs, you can use this option to skip the ID conversion and speed things up slightly.

-S SOURCE_TAX_ID, --source_tax_id SOURCE_TAX_ID

By default, Metascape treats the source organism as human. If it is not, specify the source taxonomy ID using this option.

-T TARGET_TAX_ID, --target_tax_id TARGET_TAX_ID

By default, Metascape treats the target organism as human. If it is not, specify the target taxonomy ID using this option.

--option option.json

All settings for Metascape "Custom Analysis" and more can be changed using a JSON file. data/example/option.json is an example file containing all default settings. This is what is used if the --option is not provided. You can provide your own option.json file to customize ontology categories and annotation categories. Although not recommended, you can even overwrite gene list and PPI network size limits.

For -S and -T you can use either taxonomy ID or common names. The supported IDs are: 9606, 10090, 10116, 4932, 5833, 6239, 7227, 7955, 3702, and 4896. The supported names are: human, mouse, rat, yeast, malaria, "c. elegans", fly, zebrafish, arabidopsis, or "s. pombe".
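As a sketch, the supported names and IDs listed above can be kept in a small lookup so that scripts fail early on an unsupported organism. The pairing assumes the two lists are given in the same order, which matches the NCBI taxonomy IDs of these organisms. The function name tax_id and the input and output paths in the usage comment are hypothetical.

```shell
# tax_id NAME_OR_ID: print the taxonomy ID for a supported organism,
# or fail for anything not in the supported list.
# (Hypothetical helper, not part of MSBio.)
tax_id() {
    case "$1" in
        human|9606)        echo 9606 ;;
        mouse|10090)       echo 10090 ;;
        rat|10116)         echo 10116 ;;
        yeast|4932)        echo 4932 ;;
        malaria|5833)      echo 5833 ;;
        "c. elegans"|6239) echo 6239 ;;
        fly|7227)          echo 7227 ;;
        zebrafish|7955)    echo 7955 ;;
        arabidopsis|3702)  echo 3702 ;;
        "s. pombe"|4896)   echo 4896 ;;
        *) echo "unsupported organism: $1" >&2; return 1 ;;
    esac
}

# Usage with a mouse gene list (file and folder names are made up);
# -T is left at its default here:
#   bin/ms.sh -u -S "$(tax_id mouse)" -o /data/output_mouse /data/my_mouse_list.txt
```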

Batch Processing

At the beginning of each run, bin/ms.sh first needs to load the databases. If you need to run multiple tasks and would like to avoid this overhead, you can use a .job file as the input file; see data/example/test.job for an example. This way the databases are loaded only once and Metascape can run multiple tasks afterward.

Each line in a .job file is a JSON-format description of a Metascape task. At a minimum you must specify the input, the output, and "single":true if the input file uses the single-gene-list standard (equivalent to the -u option). You can even provide a job-specific option.json file if you want to alter the default behavior.
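A minimal sketch of generating such task lines is shown below. The key names "input" and "output" are assumptions inferred from the description above (only "single":true is spelled out in the text); check data/example/test.job for the actual keys. The function name job_line and the file name my.job are hypothetical.

```shell
# job_line INPUT OUTPUT [single]: print one JSON task line.
# Assumption: the keys are named "input" and "output".
# No JSON escaping is done, so paths must not contain quotes or backslashes.
job_line() {
    if [ "${3:-}" = "single" ]; then
        printf '{"input":"%s","output":"%s","single":true}\n' "$1" "$2"
    else
        printf '{"input":"%s","output":"%s"}\n' "$1" "$2"
    fi
}

# Usage: write a two-task job file under the data folder, then run it:
#   {
#       job_line /data/example/single_list_id.txt /data/output_single_id_txt single
#       job_line /data/example/multiple_list_symbol.txt /data/output_multiple_sym_txt
#   } > data/my.job
#   bin/ms.sh /data/my.job
```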

To run the job file:

bin/ms.sh /data/example/test.job

Since v3.5.20211016, you may also omit the "/" at the beginning of the input and output arguments, e.g.:

bin/ms.sh data/example/test.job

For debugging purposes, if you want to skip a task, use "#" to comment out that task line.

When Metascape executes a task, it encloses the output message within two lines, starting with 'START>' and 'COMPLETE>'. For example:

START> job #12, input=/data/example/multiple_list_id_bg.xlsx, output=/data/output_multiple_id_xlsx_bg
...
Cytoscape Free Memory: 1531
COMPLETE> job #12, input=/data/example/multiple_list_id_bg.xlsx, output=/data/output_multiple_id_xlsx_bg

If a task line is commented out or the input or output path for a task is missing, there will be a line:

SKIP> job #1

This is intended to make it easier to parse the batch processing output and identify failed tasks.
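One way to do this parsing is sketched below with awk. It assumes that a failed task prints a START> line but never reaches its COMPLETE> line, and that the markers appear at the beginning of a line as in the example above; neither detail is confirmed by the documentation. The function name failed_jobs and the log file name are hypothetical.

```shell
# failed_jobs: read batch output on stdin and report
#   "skipped N"    for every SKIP> line, and
#   "incomplete N" for every job with START> but no COMPLETE>.
# (Hypothetical helper; assumes a failed job never prints COMPLETE>.)
failed_jobs() {
    awk '
        # Extract the job number from the third field, e.g. "#12," -> "12".
        function jobnum(f) { sub(/^#/, "", f); sub(/,$/, "", f); return f }
        /^START> job #/    { started[jobnum($3)] = 1 }
        /^COMPLETE> job #/ { delete started[jobnum($3)] }
        /^SKIP> job #/     { print "skipped " jobnum($3) }
        END { for (n in started) print "incomplete " n }
    '
}

# Usage (batch.log is a made-up file name):
#   bin/ms.sh /data/example/test.job 2>&1 | tee batch.log
#   failed_jobs < batch.log
```

The order in which several incomplete jobs are listed is not guaranteed, since awk does not define the iteration order of arrays.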

Parallel Metascape Analyses

When one bin/ms.sh is running, you must not execute another bin/ms.sh command! The backend plotting components can only plot one task at a time, so if you run two ms.sh processes simultaneously, plots from the two gene lists may cross-talk with each other.

If you really need to run multiple tasks in parallel, use multiple MSBio containers, which are isolated from each other. Each ms.sh process in a container should process only one gene list at a time!

As an example, to launch two MSBio containers, do:

bin/up.sh
bin/up.sh 2

There will be two MSBio containers running, named msbio1 and msbio2. The first command, bin/up.sh, names the container msbio1 by default; it is equivalent to bin/up.sh 1. Now you can use both containers in parallel. The following two commands can be run at once:

bin/ms.sh -u -o /data/output_single_id_txt /data/example/single_list_id.txt &
bin/ms.sh 2 -o /data/output_multiple_sym_txt /data/example/multiple_list_symbol.txt

bin/ms.sh 2 means the command runs in the msbio2 container. bin/ms.sh followed directly by "-" means msbio1 is used. You may also use bin/ms.sh 1 if you want to refer to msbio1 explicitly.

To shut down both containers:

bin/down.sh
bin/down.sh 2

If you need more containers, just follow this usage pattern. To minimize resource consumption, only msbio1 runs the database server, and all other containers talk to msbio1. So bin/down.sh 1 will only work if there are no other containers depending on msbio1.
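The usage pattern can be sketched as a helper that prints the commands for N containers. It shuts containers down in reverse order so that msbio1, which runs the database server, goes last. The function name container_cmds is hypothetical, and the output is only a list of commands; nothing is executed until you pipe it to a shell.

```shell
# container_cmds ACTION N: print the commands that start ("up") or stop
# ("down") N containers. "down" is emitted in reverse order so that
# msbio1 is stopped last. (Hypothetical helper, not part of MSBio.)
container_cmds() {
    case "$1" in
        up)
            i=1
            while [ "$i" -le "$2" ]; do
                echo "bin/up.sh $i"
                i=$((i + 1))
            done
            ;;
        down)
            i=$2
            while [ "$i" -ge 1 ]; do
                echo "bin/down.sh $i"
                i=$((i - 1))
            done
            ;;
        *)
            echo "usage: container_cmds up|down N" >&2
            return 1
            ;;
    esac
}

# Usage from the MSBio working folder:
#   container_cmds up 3 | sh
#   container_cmds down 3 | sh
```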

Mac and Windows

If you install Docker Desktop for Mac/Windows, MSBio does work in our tests. For Mac, the commands are the same as in the examples above. For Windows, the scripts are in the winbin folder instead, so the commands bin/up.sh, bin/down.sh, and bin/ms.sh are replaced by winbin/up.bat, winbin/down.bat, and winbin/ms.bat, respectively.
