肺是人類的主要呼吸器官
箩言,其中近端氣道
和遠端肺泡
分別負責空氣傳導和氣體交換牺弄。然而料饥,在人類肺發(fā)育的胚胎階段蒲犬,近端-遠端模式的調節(jié)在很大程度上是未知的。在這里岸啡,我們使用單細胞 RNA 測序研究了受精后第 4-8 周(卡內基第 12-21 期)人類胚胎的早期肺發(fā)育原叮,并獲得了169,686 個細胞的轉錄組圖譜
。我們在第 4 周觀察到肺器官發(fā)生開始時近端和遠端上皮細胞的可識別基因表達模式巡蘸。此外奋隶,我們確定了近端 (例如 THRB 和 EGR3)
和遠端 (例如 ETV1 和 SOX6)
上皮模式化的新型轉錄調節(jié)因子。進一步解剖揭示了各種基質細胞群悦荒,包括早期胚胎 BDNF+群體唯欣,提供了具有空間特異性的近端-遠端模式生態(tài)位。此外搬味,我們闡明了氣道和血管平滑肌祖細胞在肺發(fā)育早期的細胞命運分叉和成熟境氢。總之身腻,我們的研究擴大了早期胚胎階段人類肺發(fā)育生物學的范圍产还。內在轉錄調節(jié)因子和新的生態(tài)位提供者的發(fā)現加深了對人類肺發(fā)育中上皮近端-遠端模式的理解,為再生醫(yī)學開辟了新的途徑嘀趟。
背景:
“內消呼肝胰脐区,外表感神經∷矗”呼吸系統(tǒng)源自內胚層牛隅,肺臟作為重要的組成炕柔,經歷5個主要階段!
肺在人類妊娠約 4 周時發(fā)育媒佣,是前腸內胚層的腹側生長物匕累,周圍環(huán)繞著中胚層。在形態(tài)學上默伍,發(fā)育計劃經歷五個不同的階段:胚胎(4-7 周)欢嘿、假腺(5-17 周)、小管(16-26 周)也糊、囊狀(26-38 周)和肺泡(36 周-3 歲)
炼蹦。在此過程中,肺內胚層祖細胞分化為近端氣道上皮細胞類型狸剃,例如纖毛細胞掐隐、基底細胞和分泌細胞,以及遠端肺泡細胞類型钞馁。近端-遠端模式的確定對于肺形態(tài)發(fā)生至關重要虑省。在小鼠肺發(fā)育的假腺階段,Sox2 + 和 Sox9 + 細胞分別主要位于近端和遠端上皮僧凰。然而探颈,在人類肺發(fā)育中,近端上皮表達 SOX2
训措,而遠端上皮共表達 SOX2 和 SOX9
膝擂。
-
res1: The major components required for lung development are readily available at the initiation of lung formation
人類胚胎肺進行了集中采樣,涵蓋胚胎階段和早期假腺階段 圖 1a
獲得了高質量的 scRNA-seq 圖譜隙弛,其中包含 169,686 個細胞,平均檢測到 3764 個基因和 12,118 個獨特分子標識符 (UMI)
注釋了六個人胚胎肺細胞簇狞山,包括肺基質細胞全闷、上皮祖細胞、神經嵴祖細胞萍启、內皮祖細胞总珠、原紅細胞和巨噬細胞 圖1b-d
基質細胞在整個采樣期間所占比例最高,其次是上皮細胞勘纯、神經嵴和內皮細胞(圖1d)
有趣的是局服,我們觀察到所有六種細胞類型早在第 4 周(CS12)就出現了
,揭示了一種易于組裝的機制驳遵,并表明肺器官發(fā)生是通過復雜的細胞串擾啟動的 圖1e
-
res2: 轉錄因子調節(jié)上皮細胞的近遠端模式。
上皮細胞一直是六種主要肺細胞類型中的主要研究焦點鼠哥。對上皮細胞簇(基質細胞的第二大細胞簇(17,860 個細胞))進行了深入分析熟菲。
我們將我們的數據集(最早的時間點)與從第 11 周到成人的公開數據集進行了整合。第 4-8 周是一個分支點
朴恳,大多數近端祖細胞(例如基底細胞和纖毛細胞)和遠端祖細胞(例如肺泡 1 型和 2 型細胞(AT1 和 AT2))軌跡已經出現抄罕。
使用單細胞調控網絡推理和聚類 (SCENIC) 計算第 4-8 周期間的 TF 基因(調節(jié)子)活性。接下來菜皂,使用 Waddington-OT20 (WOT) 通過計算每個相鄰時間點之間的細胞間相似性來推斷近端和遠端細胞命運的評分贞绵。最后,根據每個時間點的 WOT 軌跡得分獲得了調節(jié)子活動的加權和
恍飘。我們的研究強調了第 4-8 周期間近端-遠端模式形成的分子特征榨崩,遠遠早于小管階段遠端上皮分化的第一個形態(tài)學標志。
上皮細胞由 5 個群體組成章母,包括高度表達已知近端/遠端標記物的亞型 SOX2 (EPI_SOX2hi/ETV5lo) 和 SOX9/ETV5 (EPI_SOX9hi/ETV5hi)
RNA 速度分析29表明 EPI_SOX2hi/ETV5lo 和 EPI_SOX9hi/ETV5hi群體源自增殖亞型 EPI_DTL+ 和 EPI_TOP2A+母蛛,表明肺上皮祖細胞是預期近端/遠端上皮的來源
事實上,在 EPI_SOX2hi/ETV5lo 和 EPI_SOX9hi/ETV5hi 中觀察到可辨別的近端-遠端表達模式第 4 周的細胞(圖 2e乳怎、f)
SOX2 和 SOX9 在兩個群體中共表達
彩郊,直到自第6
周起 EPI_SOX2hi/ETV5lo 細胞中遠端因子大幅下調(圖 2e ,f)蚪缀,表明近端規(guī)范在第 6 周左右開始秫逝。一致地,SOX2 的轉錄活性在第 6 周激增(圖2c)询枚。
-
res3 基質細胞有助于上皮近端-遠端模式的多樣化微環(huán)境
基質細胞簇是第 4-8 周肺細胞類型中最大的群體(圖1 D)這些基質細胞如何影響胚胎階段的人肺上皮發(fā)育。
根據基質細胞的分子特征鑒定了 7 種基質細胞亞型寺渗,包括5 種基質亞型
和2 種平滑肌細胞 (SMC) 亞型
(即氣道 SMC (ASMC) 和血管 SMC (VSMC))
由于缺乏單細胞研究匿情,早期基質群體的異質性以前被低估了
。
為了進一步解析這些基質細胞的空間特性信殊,特別是它們與上皮的接近程度炬称,我們通過 10× Visium 技術對 6 周的肺組織切片進行了空間轉錄組學分析. 上皮亞型 EPI_SOX2hi/ETV5lo 和 EPI_SOX9hi/ETV5hi 及其分子標志物分別在近端和遠端上皮中的分布(圖 3 g)
ASMC 分布在上皮周圍,而 VSMC 分散涡拘,沒有任何典型模式玲躯。smiFISH 分別用標記物 COL9A2/TCF21 和 MYH11/LIMS2 驗證了 + 和 ASMC 的空間位置(圖 3h)。值得注意的是鳄乏,LIMS2 是本研究中新發(fā)現的 ASMC 標志物跷车。這些結果證明了基質亞型分布的特異性及其在胚胎肺器官發(fā)生過程中作為生態(tài)位提供者的不同作用
。
我們在主要肺細胞類型之間進行了配體-受體相互作用分析
基質細胞和內皮細胞與上皮細胞的連通性最高,反映了協調氣道分支形態(tài)發(fā)生的基質衍生微環(huán)境朽缴。
基質亞型對上皮生態(tài)位的貢獻。有趣的是水援,我們觀察到 BDNF+ 基質亞型僅由 7.5% 的基質細胞組成密强,負責> 33% 的基質衍生的上皮受體配體 圖4a
有趣的是,BDNF+ 基質亞型出現在第 4 周蜗元,并隨著肺發(fā)育逐漸表現出細胞比例降低,這種細胞類型的特征基因或渤,包括 BDNF 和 FGF10,在器官發(fā)生過程中也被下調.突出了這種 BDNF + 細胞群在早期胚胎階段的獨特作用奕扣。配體-受體相互作用分析進一步闡明劳坑,BDNF+ 基質亞型主要解釋配體的表達,包括 BDNF成畦、FGF9/10、WNT2/2B涝开、LAMA5 和 NMU循帐,其受體是上皮顯性的,表明高度特異性的 BDNF+ 基質-上皮通訊
我們推斷 BNDF+ 細胞衍生的配體與上皮細胞上的受體相互作用
- res 4 基質細胞來源的 SMC 命運分叉的特征
基質細胞來源的 VSMC 和 ASMC 對于調節(jié)肺功能至關重要拄养。配體-受體相互作用分析表明,VSMC 和 ASMC 共同貢獻了超過 30% 的基質-上皮細胞串擾,表明它們在肺上皮發(fā)育中起關鍵作用
編碼 VSMC 和 ASMC 的發(fā)育起源和細胞規(guī)格的程序尚不清楚瘪匿。因此跛梗,我們探討了 SMC 的發(fā)育程序及其在上皮發(fā)育中的作用。
跨時間點(第 4-6 周)的單細胞譜分別顯示在 UMAP 中棋弥,ASMC 和 VSMC 的軌跡評分分別用藍色和綠色表示
有趣的是核偿,WOT 推斷一些基質細胞的細胞命運在第 4-5 周已經偏向于 ASMC 或 VSMC,早于從解剖學上觀察到 ASMC(第 5 周)和 VSMC(第 7 周)的階段
選擇軌跡評分> 0.0001 的細胞作為 SMC 譜系顽染。力導向布局顯示 ASMC 和 VSMC 發(fā)展明顯分叉(圖 5b)漾岳。通過沿 Palantir偽時間排列單元格(5c),我們發(fā)現間充質細胞的核心 TFs粉寞,例如MEIS2尼荆、SHOX2、TWIST1
唧垦,隨著兩種類型的 SMC 的發(fā)育而逐漸下調捅儒,而真正的 SMC 標志物 ACTA2 和 TAGLN 在 ASMC 和 VSMC 群體中表達并逐漸上調。已報道的 VSMC 和 ASMC 標志物 MEF2C 和 MYH11 的表達水平沿其相應的軌跡特異性上調振亮。
我們確定了經典調節(jié)因子巧还,分別用于 VSMC 的 MEF2C 和用于 ASMC 的 ZEB1,驗證了該工作流程的穩(wěn)健性
發(fā)現了用于 SMC 開發(fā)的新型 TFs双炕,即用于 VSMC 的 EBF1 和用于 ASMC 的 FOXF1
......【討論了很多基因】
學習一下方法:
scRNA-seq analysis
Processing of sequencing files
The FASTQ files of single-cell libraries generated from Illumina NovaSeq system underwent adaptor index removal byTrim Galore (0.5.0)
. The clean FASTQ files were aligned to the Hg38 and Mm10 genomes with human and mouse gene annotation based onGencode
v35 and vM21 versions, respectively, bySTARsolo function of STAR (2.7.6a)
. For details on the quality of reads, please see Supplementary information, Table S9.Cell filtering, clustering, and visualization
Low-quality cells were filtered out by the number of UMIs and total counts on each sample. We usedScrublet
(1.0) to predict and exclude potential multiplets. To minimize batch effects, we defined the major cell clusters for each sample separately. Briefly, we appliedScanpy
(1.7.2)to reduce the dimensionality of cells byPCA
and perform clustering and visualization byLeiden
andUMAP algorithms
. Cell clusters wereassigned to known lung cell types
based on cell type-specific markers. See Supplementary information,Table S9
for the details on data quality control.
For the visualization of all scRNA-seq data, we first identified the highlyvariable TFs
(scanpy.pp.highly_variable_genes) anddifferentially expressed genes
(DEGs) (P value < 0.001
, Wilcoxon rank-sum test) among major cell types. These TFs and DEGs (1144 genes) were used forcomputing PCA
. We next applied FFT-accelerated Interpolation-based t-SNE73,74 (FIt-SNE) algorithm to display the layout in two dimensions using the top 10 principal components.Identification of the major cell type markers
All data were normalized using functionsscanpy.pp.normalize
andscanpy.pp.log1p
. DEGs of each major cell type were calculated based on the normalized data (P value < 0.01
, Wilcoxon rank-sum test,log fold change ≥ 0.25
). DEGs were listed in Supplementary information, Table S1.Clustering and defining cell subtypes
Top highly variable genes (HVGs)
of each sample were selected and merged. After PCA-based dimensionality reduction, we used harmony to correct batch effects based on top 50 PCs viaharmony.run_harmony
(max_iter_harmony?=?15, max_iter_kmeans?=?10
) implemented byPython package harmonypy
(0.0.6). Cell clusters were identified using the Leiden algorithm (resolution?=?0.5
, Scanpy) based on harmony space. Genes withP value < 0.01
(Wilcoxon rank-sum test) and genelog fold change ≥ 0.25
for each cluster were selected as DEGs.Clusters sharing top DEGs (n?=?20)
weremerged
before being classified as cell subtypes. DEGs of cell subtypes can be found in Supplementary information, Table S2.Trajectory
inference for lung epithelial cell and SMC development
To map the developmental trajectories of lung epithelial cells, we integrated and analyzed the dataset obtained in this study and the two datasets from 10× Genomics platform: Miller et al.9 (ArrayExpress: E-MTAB-8221) of 11.5 weeks, 15.0 weeks, 18.0 weeks, 21.0 weeks and Travaglini et al.13 (EGA: EGAS00001004344) of adult lung. The union of TFs within the top 5000 HVGs and top 100 HVGs per dataset (643 genes) were selected for analysis. Next, we appliedmonocle3
to pre-process data, reduce dimensionality, and visualize trajectory. To discern proximal and distal epithelium lineages, as well as VSMC and ASMC, we computed developmental trajectories utilizingWOT
(1.0.8.post2),20 an optimal transport-based approach to infer ancestor–descendant relationships between cells across adjacent time points of development. We obtained the cell trajectory scores via tmap_model.ancestors and tmap_model.trajectories implemented in the Python package WOT.
For VSMC and ASMC trajectory inference, the cells of weeks 4–5 from SC_Early withtrajectory scores > 0.0001
for VSMC and ASMC were considered asSMC progenitors
. The predicted progenitors combined with SMCs were selected to perform trajectory analysis again using Python packageharmonyTS
(0.1.4).77 The developmental pseudotime and branch probability of VSMC and ASMC were obtained via Python package Palantir (1.0.0).61 Then, we smoothed the gene expression via R packagegam
(1.20.1). The heatmaps illustrating the gene expression tendency along the developmental trajectory of ASMC and VSMC were generated by R packageComplexHeatmap
(2.14.0) (Fig. 5c). To observe the bias of cell fate at weeks 4, 5, and 6, we estimated the cell fate probability viatmap_model.fates
implemented in Python packageWOT
.TF activity inference
To discover the transcriptional regulators of lung epithelial proximal–distal patterning, we appliedpySCENIC
(0.11.0) algorithm to identify significant regulons (NES > 1). We then calculated the TF activity based on gene expression levels of each regulon. Weighted sums of TF activities were calculated based onWOT trajectory scores
at each time point. The motifs listed in this study were validated byhttps://jaspar.genereg.net
. For details on the TF list, please see Supplementary information, Tables S3 and S7.RNA velocity analysis
We applied thescVelo
(0.2.3) package to calculate the RNA velocity of epithelial cells at weeks 4, 6 and 8. For each time point, thetop 10 PCs
oftop 3000 HVGs
were used as input. For details of these processes, please see the scVelo pipeline (https://scvelo.readthedocs.io/index.html).Comparison of TFs between
human and mouse epithelia
We collected the mouse lung epithelial single-cell data from E12.0 to E14.0 with an interval of 0.5 days. Low-quality cells were filtered out by the number of detectedgenes (< 1800 & > 11,000)
and totalcounts (counts > 100,000)
. We usedScrublet
to predict and exclude potential doublet cells (score > 0.4
). Clustering and visualization were implemented by functionsscanpy.tl.leiden
and scanpy.tl.umap, respectively. The Nkx2-1 (the core TF for lung epithelium development) and Cdh1 (a universal marker for epithelial cells) positive populations were selected as the mouse lung epithelial cells. Sox2/Sox21/Klf5- and Sox9/Gata6/Etv5-expressing cells were annotated as proximal and distal epithelial cells, respectively.
Human–mouse orthologous genes were used for comparison (Human and Mouse Homologs from MGI database, https://www.informatics.jax.org/homology.shtml). We calculated the differentially expressed TFs in human and mouse (P value < 0.01, t-test
), respectively. The TFs were divided into 5 categories: proximal shared, distal shared, human-specific, mouse-specific and human–mouse inverse. The TF list can be found in Supplementary information, Table S5.Comparison of human and mouse stromal cells
To compare mesenchymal lineages between human and mouse, we integrated and analyzed stromal cells from this study and the Goodwin et al.47 (GEO:GSE153069
) mouse dataset from the 10× Genomics platform. Orthologous genes in human and mouse were extracted for comparison. Then, we calculated species-specific and cell type-specific DEGs across human and mouse stromal cell types at 6–7 weeks and E11.5 (P value < 0.01
, Wilcoxon rank-sum test). These DEG sets included ‘Common markers for stromal cells’, ‘Shared DEGs’, ‘Human SC_BDNF+ specific’, ‘Mouse mesothelium-specific’ and ‘Mouse sub-meso-specific’. For details on DEGs, please see Supplementary information, Table S10.Ligand–receptor interaction analysis
We integrated the ligand–receptor databases,cellphoneDB
and intact (https://www.ebi.ac.uk/intact). The product of the expression percentage (nUMI > 0) of a given ligand and its receptor from the corresponding provider and recipient cells was calculated as the ligand–receptor score. To calculate the significance of the interaction for each ligand–receptor pair, cell type labels were permuted 1000 times to generate the background distribution of ligand–receptor scores (standardized to mean of 0 and standard deviation of 1). P value < 0.001 (U-test) was considered significant. For details on the ligand–receptor list, please see Supplementary information, Table S6.
Spatial transcriptomic analysis
Processing of sequencing files
The FASTQ files of the spatial transcriptomic library was generated by MGISEQ2000. The FASTQ files were aligned to the hg38 genome with human gene annotation based onGencode v32
usingspaceranger (2.0.0)
. The serial number of theVisum slide
is V11T16-101, the slide file can be downloaded fromhttps://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/slidefile-download
. For the details of data quality, please see Supplementary information, Table S9.Basic analysis and visualization of the Visium data
Low-quality spots were filtered out if the number of detectedgenes < 2000
. The spatial data were normalized via functionsscanpy.pp.normalize
andscanpy.pp.log1p
. Spatial visualization was performed via functionscanpy.pl.spatial
.Mapping single-cell data to spatial transcriptomic data
We usedTangram35 (1.0.3)
to map the annotated scRNA-seq data to the spatial spots. To match these two omics, we selected the 6-week data for mapping, using unnormalized UMI counts as input. For the spatial data, we calculated the Moran’I (I, a measure of spatial autocorrelation) for each gene. Genes with absolute Igreater than 0.05
were selected asspatially variable genes (SVGs)
, as calculated by squipy.gr.spatial_autocorr function of Python packagesquidpy
(1.2.3). A total of 2450 genes of SVGs and HVGs (from single-cell data) were used for mapping, resulting in a probability matrix of the assignment of each cell to all spots. To further access the proportion of cell types in each spot, we summed the probability of each single-cell-annotated cell type. For the details of mapping probability, please see Supplementary information, Table S11.The adjacent score for stromal cells and epithelial cells
We first identified spots surrounding the proximal or distal epithelium region. These spots, as well as those in the epithelial region, were defined as proximal- or distal-associated spots, respectively. To reduce the integration noise between single-cell and spatial data, cell types witha proportion less than 0.4
were excluded in each selected spot. The adjacent score of each cell type to the epithelium was calculated by summing their proportions in proximal- or distal-associated spots. The scores were displayed by Sankey diagram via sankeyNetwork function of R packagenetworkD3
(0.4). The information can be found in Supplementary information, Table S11.Evaluating the score of cellular interactions in Visium slides
We first assigned each cell to a spot based on the maximum mapping probability from Tangram, defined as a recipient cell (expressing receptors), surrounded by provider cells (expressing ligands) in the neighboring spots. Their ligand–receptor scores and P values were calculated as described in the Ligand–receptor interaction analysis section. The ligand–receptor score between distal epithelium and SC_BDNF+ were shown in Supplementary information, Fig. S7c.