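GlusterFS Architecture in Detail

Source: https://docs.gluster.org/en/latest/Quick-Start-Guide/Architecture/
This article is mainly a translation of the official documentation, kept for easier note-taking and reference. Please point out any mistakes in the explanations. Follow-up articles on the architecture and a source code walkthrough will be published later.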

FUSE

GlusterFS is a userspace filesystem. The GlusterFS developers made this decision early on, because getting modules into the Linux kernel is a very long and difficult process.

Being a userspace filesystem, GlusterFS uses FUSE (File System in Userspace) to interact with the kernel VFS. For a long time, implementing a userspace filesystem was considered impossible, and FUSE was developed as a solution. FUSE is a kernel module that supports interaction between the kernel VFS and non-privileged user applications, and it has an API that can be accessed from userspace. Using this API, almost any type of filesystem can be written in almost any language, since there are many bindings between FUSE and other languages.

image.png

The figure above shows a "hello world" filesystem that is compiled into a binary named "hello". It is executed with the filesystem mount point /tmp/fuse. The user then issues the command ls -l on the mount point /tmp/fuse. The command reaches the VFS via glibc, and since the mount /tmp/fuse corresponds to a FUSE-based filesystem, the VFS passes it to the FUSE module. The FUSE kernel module contacts the actual filesystem binary "hello" after passing through glibc and the FUSE library in userspace (libfuse). The "hello" binary returns the result along the same path, back to the ls -l command.

Communication between the FUSE kernel module and the FUSE library (libfuse) goes through a special file descriptor obtained by opening /dev/fuse. This file can be opened multiple times, and the obtained file descriptor is passed to the mount syscall to match the descriptor with the mounted filesystem.

More about userspace filesystems

FUSE reference

Translators

Translating “translators”:

A translator converts requests from users into requests for storage.

  • The mapping can be one to one, one to many, or one to zero (e.g. caching)


image.png
  • A translator can modify requests on the way through:

    convert one request type to another (during the request transfer amongst the translators), or modify paths, flags, even data (e.g. encryption)
  • Translators can intercept or block requests (e.g. access control)
  • Or spawn new requests (e.g. pre-fetch)
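
The behaviours listed above can be sketched as a chain of objects that each hand a request to a child. This is a toy model only: the class names, the dict-based request format, and the use of ROT13 as a stand-in for encryption are all assumptions made for illustration and are not GlusterFS APIs. It shows a translator that modifies data, one that blocks requests, and a cache that answers some reads without calling its child (one to zero). Spawning new requests is not shown.

```python
import codecs


class Translator:
    """Base translator: passes every request to its child unchanged."""

    def __init__(self, child=None):
        self.child = child

    def handle(self, request):
        return self.child.handle(request)


class Storage(Translator):
    """Bottom of the stack: a dict standing in for the local file system."""

    def __init__(self):
        super().__init__()
        self.files = {}

    def handle(self, request):
        if request["op"] == "write":
            self.files[request["path"]] = request["data"]
            return "ok"
        return self.files.get(request["path"])


class Rot13(Translator):
    """Modifies data on the way through (toy stand-in for encryption)."""

    def handle(self, request):
        if request["op"] == "write":
            scrambled = codecs.encode(request["data"], "rot13")
            return self.child.handle(dict(request, data=scrambled))
        result = self.child.handle(request)
        return None if result is None else codecs.decode(result, "rot13")


class AccessControl(Translator):
    """Blocks any request whose path starts with a denied prefix."""

    def __init__(self, child, denied_prefix):
        super().__init__(child)
        self.denied_prefix = denied_prefix

    def handle(self, request):
        if request["path"].startswith(self.denied_prefix):
            return "EACCES"
        return self.child.handle(request)


class ReadCache(Translator):
    """Answers repeated reads itself, so some requests never reach the child."""

    def __init__(self, child):
        super().__init__(child)
        self.cache = {}

    def handle(self, request):
        path = request["path"]
        if request["op"] == "write":
            self.cache.pop(path, None)  # invalidate the cached copy on write
            return self.child.handle(request)
        if path not in self.cache:
            self.cache[path] = self.child.handle(request)
        return self.cache[path]


storage = Storage()
stack = ReadCache(AccessControl(Rot13(storage), "/secret"))
stack.handle({"op": "write", "path": "/a", "data": "hello"})
first_read = stack.handle({"op": "read", "path": "/a"})
```

The order in which the objects are nested plays the role of the translator graph: the request enters at the outermost object and only reaches the storage layer if nothing above it answers or rejects it first.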

How Do Translators Work?

  • Shared objects
  • Dynamically loaded according to the 'volfile':

    dlopen/dlsym; set up pointers to parents/children; call init (constructor); call IO functions through fops.
  • Conventions for validating/passing options, etc.
  • The configuration of translators (since GlusterFS 3.1) is managed through the gluster command line interface (CLI), so you don't need to know in what order to graph the translators together.
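
As a rough illustration of the "set up pointers to parents/children" step, the sketch below parses a simplified volfile-like text and links the entries into a graph. It is not the GlusterFS parser: the function name parse_volfile, the dict layout, and the volume names (demo-client-0, demo-dht) and host names are hypothetical, and only the volume / type / option / subvolumes / end-volume keywords are handled.

```python
def parse_volfile(text):
    """Parse 'volume ... end-volume' stanzas into a dict keyed by volume name."""
    graph, current = {}, None
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        if words[0] == "volume":
            current = {"name": words[1], "type": None, "options": {},
                       "children": [], "parents": []}
        elif words[0] == "type":
            current["type"] = words[1]
        elif words[0] == "option":
            current["options"][words[1]] = " ".join(words[2:])
        elif words[0] == "subvolumes":
            current["children"] = words[1:]
        elif words[0] == "end-volume":
            graph[current["name"]] = current
            current = None
    # Second pass: turn child names into references and set parent pointers.
    for xl in graph.values():
        xl["children"] = [graph[name] for name in xl["children"]]
        for child in xl["children"]:
            child["parents"].append(xl)
    return graph


VOLFILE = """
volume demo-client-0
    type protocol/client
    option remote-host server1
end-volume

volume demo-client-1
    type protocol/client
    option remote-host server2
end-volume

volume demo-dht
    type cluster/distribute
    subvolumes demo-client-0 demo-client-1
end-volume
"""

graph = parse_volfile(VOLFILE)
```

In the real system each entry would additionally be backed by a shared object whose init function is called once the pointers are in place; that loading step is left out here.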

Types of Translators

List of known translators with their current status.

| Translator Type | Functional Purpose |
| --- | --- |
| Storage | Lowest-level translator; stores and accesses data from the local file system. |
| Debug | Provide interface and statistics for errors and debugging. |
| Cluster | Handle distribution and replication of data as it relates to writing to and reading from bricks and nodes. |
| Encryption | Extension translators for on-the-fly encryption/decryption of stored data. |
| Protocol | Extension translators for client/server communication protocols. |
| Performance | Tuning translators to adjust for workload and I/O profiles. |
| Bindings | Add extensibility, e.g. the Python interface written by Jeff Darcy to extend API interaction with GlusterFS. |
| System | System access translators, e.g. interfacing with file system access control. |
| Scheduler | I/O schedulers that determine how to distribute new write operations across clustered systems. |
| Features | Add additional features such as Quotas, Filters, Locks, etc. |

The default / general hierarchy of translators in vol files :


image.png

All the translators hooked together to perform a function are called a graph. The left set of translators comprises the client stack. The right set of translators comprises the server stack.

The GlusterFS translators can be sub-divided into many categories, but two important categories are Cluster and Performance translators:

One of the most important translators, and the first one the data/request has to go through, is the fuse translator, which falls under the category of Mount Translators.

  1. Cluster Translators:
  • DHT (Distributed Hash Table)
  • AFR (Automatic File Replication)
  2. Performance Translators:
  • io-cache
  • io-threads
  • md-cache
  • O-B (open behind)
  • QR (quick read)
  • r-a (read-ahead)
  • w-b (write-behind)

For example, the output of gluster volume info shows the current setting of these performance options:

[root@node4 /]# gluster volume info
 
Volume Name: heketidbstorage
Type: Replicate
Volume ID: d141e423-cc06-4fa5-a7ef-5edbc1b405ce
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.8.4.184:/var/lib/heketi/mounts/vg_cbdb3ac545853d57be6c39db60b7f647/brick_661b5cdd36ee50aaeeeb3490737f2851/brick
Brick2: 10.8.4.182:/var/lib/heketi/mounts/vg_ddfdf2348bf510c356d5234e0ed0a0ec/brick_5ca55df9bbb54e531d6fc4205b297503/brick
Brick3: 10.8.4.183:/var/lib/heketi/mounts/vg_2ceb6870ad884b6507a767308980cf8a/brick_eaa40efdc196a5a18158c79e0b1b0459/brick
Options Reconfigured:
user.heketi.id: 4550c84383c151de59bd6679e73b9117
user.heketi.dbstoragelevel: 1
performance.readdir-ahead: off
performance.io-cache: off
performance.read-ahead: off
performance.strict-o-direct: on
performance.quick-read: off
performance.open-behind: off
performance.write-behind: off
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
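
To see at a glance which performance translators are switched on or off, the "Options Reconfigured" lines can be filtered with a few lines of Python. This is a minimal sketch that assumes the output has already been captured as text; the function name performance_options is hypothetical, and the sample is an excerpt of the listing above.

```python
def performance_options(volume_info_text):
    """Return {option: value} for the performance.* lines of the output."""
    options = {}
    for line in volume_info_text.splitlines():
        # Split at the first colon only, so brick paths with colons are safe.
        key, sep, value = line.partition(":")
        if sep and key.strip().startswith("performance."):
            options[key.strip()] = value.strip()
    return options


sample = """Status: Started
Brick1: 10.8.4.184:/var/lib/heketi/mounts/example/brick
performance.io-cache: off
performance.strict-o-direct: on
nfs.disable: on
performance.write-behind: off
"""

enabled = [name for name, value in performance_options(sample).items() if value == "on"]
```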

Other Feature Translators include:

  • changelog
  • locks - GlusterFS has a locks translator which provides the internal locking operations inodelk and entrylk. AFR uses them to synchronize operations on files or directories that conflict with each other.
  • marker
  • quota

Debug Translators

  • trace - To trace the error logs generated during the communication amongst the translators.
  • io-stats

DHT(Distributed Hash Table) Translator

What is DHT?

DHT is the real core of how GlusterFS aggregates capacity and performance across multiple servers. Its responsibility is to place each file on exactly one of its subvolumes, unlike replication (which places copies on all of its subvolumes) or striping (which places pieces onto all of its subvolumes). It is a routing function, not splitting or copying.

How does DHT work?

The basic method used in DHT is consistent hashing. Each subvolume (brick) is assigned a range within a 32-bit hash space, covering the entire range with no holes or overlaps. Each file is then also assigned a value in that same space, by hashing its name. Exactly one brick will have an assigned range that includes the file's hash value, and so the file "should" be on that brick. However, there are many cases where it is not, such as when the set of bricks (and therefore the assignment of ranges) has changed since the file was created, or when a brick is nearly full. Much of the complexity in DHT involves these special cases, which we'll discuss in a moment.

When you open() a file, the distribute translator is given one piece of information to find your file: the file name. To determine where that file is, the translator runs the file name through a hashing algorithm to turn it into a number.
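
The range assignment and lookup described above can be sketched as follows. This is a simplified model under stated assumptions: the ranges are split evenly, CRC32 from the standard library is used only as a stand-in 32-bit hash (the hash function GlusterFS actually uses is different), and the names assign_ranges, hash_name, and locate are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32


def assign_ranges(bricks):
    """Split the 32-bit hash space into contiguous inclusive ranges, one per brick."""
    step = HASH_SPACE // len(bricks)
    layout = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs the remainder so there is no hole at the top.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        layout.append((start, end, brick))
    return layout


def hash_name(name):
    """Stand-in 32-bit hash of the file name (assumption: CRC32, not the real hash)."""
    return zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF


def locate(layout, name):
    """Return the brick whose range contains the hash of the file name."""
    value = hash_name(name)
    for start, end, brick in layout:
        if start <= value <= end:
            return brick
    raise LookupError("hash value falls into a hole in the layout")


layout = assign_ranges(["brick-0", "brick-1", "brick-2"])
target = locate(layout, "report.txt")
```

Because only the file name is hashed, the same name always maps to the same brick for a given layout, which is what makes the lookup a pure routing decision.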

A few observations about DHT hash-value assignment:

  • The assignment of hash ranges to bricks is determined by extended attributes stored on directories, hence distribution is directory-specific.
  • Consistent hashing is usually thought of as hashing around a circle, but in GlusterFS it is more linear. There is no need to "wrap around" at zero, because there is always a break (between one brick's range and another's) at zero.
  • If a brick is missing, there will be a hole in the hash space. Even worse, if hash ranges are reassigned while a brick is offline, some of the new ranges might overlap with the (now out of date) range stored on that brick, creating confusion about where files should be.
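
The two failure modes in the last observation, holes and overlaps, can be detected by walking the ranges in sorted order. The sketch below is illustrative only: check_layout is a hypothetical helper, ranges are inclusive (start, end) pairs, and a small hash space of 200 is used in the examples so the numbers are easy to follow.

```python
def check_layout(ranges, space=2 ** 32):
    """Return (holes, overlaps) for a list of inclusive (start, end) ranges."""
    holes, overlaps = [], []
    cursor = 0  # first hash value not yet covered by any range
    for start, end in sorted(ranges):
        if start > cursor:
            holes.append((cursor, start - 1))
        elif start < cursor:
            overlaps.append((start, min(end, cursor - 1)))
        cursor = max(cursor, end + 1)
    if cursor < space:
        holes.append((cursor, space - 1))
    return holes, overlaps


healthy = check_layout([(0, 99), (100, 199)], space=200)
missing_brick = check_layout([(0, 99)], space=200)
stale_range = check_layout([(0, 120), (100, 199)], space=200)
```

Here missing_brick reports the uncovered span left by an absent brick, and stale_range reports the span claimed by two bricks after a reassignment.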

AFR(Automatic File Replication) Translator

The Automatic File Replication (AFR) translator in GlusterFS uses extended attributes to keep track of file operations. It is responsible for replicating data across the bricks.

RESPONSIBILITIES OF AFR

Its responsibilities include the following:

  1. Maintain replication consistency (i.e. data on both bricks should be the same, even when operations happen on the same file/directory in parallel from multiple applications/mount points, as long as all the bricks in the replica set are up).

  2. Provide a way of recovering data in case of failures, as long as there is at least one brick which has the correct data.

  3. Serve fresh data for read/stat/readdir etc.
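
The three responsibilities can be illustrated with a deliberately simplified model. This is not how AFR is implemented: the text above says AFR tracks operations in extended attributes, whereas this sketch assumes a plain per-file version counter, and the classes Replica and ToyReplicaSet are hypothetical. It only shows the idea of writing to every live replica, serving reads from the freshest copy, and healing a stale replica from a good one.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.files = {}  # path -> (version, data)


class ToyReplicaSet:
    def __init__(self, replicas):
        self.replicas = replicas

    def _live(self):
        return [r for r in self.replicas if r.up]

    def write(self, path, data):
        """Write to every live replica, bumping the version counter."""
        live = self._live()
        if not live:
            raise OSError("no replica available")
        version = 1 + max(r.files.get(path, (0, None))[0] for r in live)
        for replica in live:
            replica.files[path] = (version, data)

    def read(self, path):
        """Serve the copy with the highest version among live replicas."""
        copies = [r.files[path] for r in self._live() if path in r.files]
        if not copies:
            raise FileNotFoundError(path)
        return max(copies, key=lambda entry: entry[0])[1]

    def heal(self, path):
        """Copy the freshest version onto every live replica."""
        copies = [r.files[path] for r in self._live() if path in r.files]
        if copies:
            freshest = max(copies, key=lambda entry: entry[0])
            for replica in self._live():
                replica.files[path] = freshest


a, b = Replica("a"), Replica("b")
replica_set = ToyReplicaSet([a, b])
replica_set.write("/f", "v1")
b.up = False                    # brick b goes offline
replica_set.write("/f", "v2")   # only a receives this write
b.up = True                     # b returns with stale data
stale_copy = b.files["/f"]
fresh_read = replica_set.read("/f")
replica_set.heal("/f")
```

A single counter cannot tell apart two replicas that were each written while the other was down, which is one reason a real implementation needs richer bookkeeping than this sketch.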

Geo-Replication

Geo-replication provides asynchronous replication of data across geographically distinct locations and was introduced in GlusterFS 3.2. It mainly works across WANs and replicates the entire volume, unlike AFR, which is intra-cluster replication. It is mainly useful for backing up the entire data set for disaster recovery.

Geo-replication uses a master-slave model, whereby replication occurs between a Master (a GlusterFS volume) and a Slave (which can be a local directory or a GlusterFS volume). The slave (local directory or volume) is accessed using an SSH tunnel.

Geo-replication provides an incremental replication service over Local Area Networks (LANs), Wide Area Networks (WANs), and across the Internet.

Geo-replication over LAN

You can configure Geo-replication to mirror data over a Local Area Network.

image.png

Geo-replication over WAN

You can configure Geo-replication to replicate data over a Wide Area Network.

image.png

Geo-replication over Internet

You can configure Geo-replication to mirror data over the Internet.

image.png

Multi-site cascading Geo-replication

You can configure Geo-replication to mirror data in a cascading fashion across multiple sites.

image.png

There are mainly two aspects to asynchronously replicating data:

1. Change detection - This covers the file-operation details needed for replication. There are two methods to sync the detected changes:

i. Changelogs - Changelog is a translator which records the necessary details for the fops that occur. The changes can be written in binary format or ASCII. There are three categories, each represented by a specific changelog format. All three categories are recorded in a single changelog file.

Entry - create(), mkdir(), mknod(), symlink(), link(), rename(), unlink(), rmdir()

Data - write(), writev(), truncate(), ftruncate()

Meta - setattr(), fsetattr(), setxattr(), fsetxattr(), removexattr(), fremovexattr()

To record the type of operation and the entity it was performed on, a type identifier is used. Normally, the entity on which the operation is performed would be identified by its pathname, but GlusterFS uses its internal file identifier (GFID) instead (GlusterFS supports a GFID-based backend, the pathname field may not always be valid, and there are other reasons which are out of scope of this document). The format of the record for the three types of operation can therefore be summarized as follows:

Entry - GFID + FOP + MODE + UID + GID + PARGFID/BNAME [PARGFID/BNAME]

Meta - GFID of the file

Data - GFID of the file

GFIDs are analogous to inodes. Data and Meta fops record the GFID of the entity on which the operation was performed, thereby recording that there was a data/metadata change on the inode. Entry fops record at minimum a set of six or seven records (depending on the type of operation), which is sufficient to identify what type of operation the entity underwent. Normally this record includes the GFID of the entity, the type of file operation (an integer, an enumerated value used in GlusterFS), and the parent GFID and the basename (analogous to parent inode and basename).

The changelog file is rolled over after a specific time interval. Processing operations are then performed on the file, such as converting it to a human-readable format and keeping a private copy of the changelog. The library then consumes these logs and serves application requests.
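
The three categories and their record layouts can be summarized in a short sketch. It follows the fop lists and the field order given above, but the representation is an assumption: records are plain tuples, the fop is kept as a name rather than the enumerated integer GlusterFS uses, and the function names and GFID strings are hypothetical.

```python
ENTRY_FOPS = {"create", "mkdir", "mknod", "symlink", "link", "rename", "unlink", "rmdir"}
DATA_FOPS = {"write", "writev", "truncate", "ftruncate"}
META_FOPS = {"setattr", "fsetattr", "setxattr", "fsetxattr", "removexattr", "fremovexattr"}


def category(fop):
    """Map a file operation name to its changelog category."""
    if fop in ENTRY_FOPS:
        return "Entry"
    if fop in DATA_FOPS:
        return "Data"
    if fop in META_FOPS:
        return "Meta"
    raise ValueError(f"fop not recorded in the changelog: {fop}")


def make_record(fop, gfid, mode=None, uid=None, gid=None,
                parent_gfid=None, basename=None):
    """Build a record tuple following the layouts listed above."""
    kind = category(fop)
    if kind == "Entry":
        # Entry: GFID + FOP + MODE + UID + GID + PARGFID/BNAME
        return (kind, gfid, fop, mode, uid, gid, f"{parent_gfid}/{basename}")
    # Data and Meta records carry only the GFID of the file.
    return (kind, gfid)


data_record = make_record("write", "gfid-0002")
entry_record = make_record("mkdir", "gfid-0003", mode=0o755, uid=0, gid=0,
                           parent_gfid="gfid-0001", basename="logs")
```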

ii. Xsync - The marker translator maintains an extended attribute "xtime" for each file and directory. Whenever an update happens, it updates the xtime attribute of that file and all its ancestors, so the change is propagated from the node where it occurred all the way to the root.

image.png

Consider the above directory tree structure. At time T1 the master and slave were in sync with each other.

image.png

At time T2 a new file File2 was created. This triggers the xtime marking (where xtime is the current timestamp) from File2 up to the root, i.e. the xtime of File2, Dir3, Dir1 and finally Dir0 will all be updated.

The Geo-replication daemon crawls the file system based on the condition xtime(master) > xtime(slave). Hence in our example it would crawl only the left part of the directory structure, since the right part still has equal timestamps. Although the crawling algorithm is fast, it still needs to crawl a good part of the directory structure.

2. Replication - rsync is used for data replication. Rsync is an external utility which calculates the diff of the two files and sends this difference from source to sink.
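
The xtime marking and crawl used by Xsync, described earlier in this section, can be sketched with a small in-memory tree. This is a toy model: the Node class, the integer timestamps, and the dict standing in for the slave's recorded xtimes are assumptions. File2, Dir3, Dir1 and Dir0 follow the example in the text, while Dir2, File1 and File3 are made-up names for the unchanged part of the tree.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.xtime = 0
        if parent is not None:
            parent.children.append(self)


def mark(node, now):
    """Set xtime on the node and on every ancestor up to the root."""
    while node is not None:
        node.xtime = now
        node = node.parent


def crawl(node, slave_xtime, changed):
    """Descend only where xtime(master) > xtime(slave); skip unchanged subtrees."""
    if node.xtime <= slave_xtime.get(node.name, -1):
        return changed
    changed.append(node.name)
    for child in node.children:
        crawl(child, slave_xtime, changed)
    return changed


dir0 = Node("Dir0")
dir1 = Node("Dir1", dir0)
dir2 = Node("Dir2", dir0)
dir3 = Node("Dir3", dir1)
file1 = Node("File1", dir3)
file3 = Node("File3", dir2)

# Time T1: master and slave are in sync.
in_sync = (dir0, dir1, dir2, dir3, file1, file3)
for node in in_sync:
    node.xtime = 1
slave_xtime = {node.name: 1 for node in in_sync}

# Time T2: File2 is created on the master under Dir3.
file2 = Node("File2", dir3)
mark(file2, now=2)
changed = crawl(dir0, slave_xtime, [])
```

The crawl never enters Dir2, because its xtime on the master still equals the value recorded for the slave.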

Overall working of GlusterFS

As soon as GlusterFS is installed on a server node, a gluster management daemon (glusterd) binary is created. This daemon should be running on all participating nodes in the cluster. After starting glusterd, a trusted server pool (TSP) can be created consisting of all storage server nodes (a TSP can contain even a single node). Bricks, which are the basic units of storage, can now be created as export directories on these servers. Any number of bricks from this TSP can be combined to form a volume.

Once a volume is created, a glusterfsd process starts running for each participating brick. Along with this, configuration files known as vol files are generated inside /var/lib/glusterd/vols/. There is a configuration file for each brick in the volume, containing all the details about that particular brick. The configuration file required by a client process is also created. The filesystem is now ready to use. The volume can be mounted on a client machine as follows and used like local storage:

mount.glusterfs <IP or hostname>:<volume_name> <mount_point>
IP or hostname can be that of any node in the trusted server pool in which the required volume is created.

When the volume is mounted on the client, the client glusterfs process communicates with the servers' glusterd process. The server glusterd process sends a configuration file (vol file) containing the list of client translators, and another containing the information for each brick in the volume, with the help of which the client glusterfs process can communicate directly with each brick's glusterfsd process. The setup is now complete and the volume is ready to serve the client.

image.png

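When the client issues a system call (file operation or fop) in the mounted filesystem, the VFS (identifying the type of filesystem as glusterfs) sends the request to the FUSE kernel module. The FUSE kernel module in turn sends it, via /dev/fuse, to GlusterFS in the userspace of the client node (this has been described in the FUSE section). The GlusterFS process on the client consists of a stack of translators called the client translators, which are defined in the configuration file (vol file) sent by the storage server's glusterd process. The first of these translators is the FUSE translator, which consists of the FUSE library (libfuse). Each translator has a function corresponding to each file operation or fop supported by glusterfs, and the request hits the corresponding function in each of the translators. The main client translators include: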

  • FUSE translator
  • DHT translator - maps the request to the correct brick that contains the file or directory required.
  • AFR translator - receives the request from the previous translator and, if the volume type is replicate, duplicates the request and passes it on to the Protocol Client translators of the replicas.
  • Protocol Client translator - the last in the client translator stack. This translator is divided into multiple threads, one for each brick in the volume, and communicates directly with the glusterfsd of each brick.

In the storage server node that contains the brick in need, the request again goes through a series of translators known as server translators, the main ones being:

  • Protocol server translator
  • POSIX translator

The request finally reaches the VFS and then communicates with the underlying native filesystem. The response retraces the same path.
