上一節(jié)分析reciept產(chǎn)生過(guò)程的時(shí)候提到:reciept會(huì)為日志數(shù)據(jù)生成一個(gè)Bloom過(guò)濾器,那Bloom過(guò)濾器是用來(lái)干嘛的呢仇轻?有什么用呢?
一,Bloom過(guò)濾器的數(shù)據(jù)結(jié)構(gòu)和reciept創(chuàng)建Bloom的過(guò)程
type Bloom [BloomByteLength]byte
BloomByteLength = 256
Bloom 就是一個(gè)256個(gè)字節(jié)數(shù)組别凹。一共2048位。
我們看看怎么把龐大的收據(jù)日志數(shù)據(jù)放到bloom過(guò)濾器里面的洽糟。
func CreateBloom(receipts Receipts) Bloom {
bloomBin := new(big.Int)
for _, receipt := range receipts {
bloomBin.Or(bloomBin, LogsBloom(receipt.Logs))
}
return BytesToBloom(bloomBin.Bytes())
}
func LogsBloom(logs []*Log) *big.Int {
bin := new(big.Int)
for _, log := range logs {
bin.Or(bin, bloom9(log.Address.Bytes()))
for _, b := range log.Topics {
bin.Or(bin, bloom9(b[:]))
}
}
return bin
}
func bloom9(b []byte) *big.Int {
b = crypto.Keccak256(b[:])
r := new(big.Int)
for i := 0; i < 6; i += 2 {
t := big.NewInt(1)
b := (uint(b[i+1]) + (uint(b[i]) << 8)) & 2047
r.Or(r, t.Lsh(t, b))
}
return r
}
1炉菲,先看看bloom9(b []byte)算法函數(shù)。
1.1坤溃, 首先將傳入的數(shù)據(jù)拍霜,進(jìn)行hash256的運(yùn)算,得到一個(gè)32字節(jié)的hash
1.2薪介,然后取第0和第1字節(jié)的值合成一個(gè)2字節(jié)無(wú)符號(hào)的int祠饺,和2047做按位與運(yùn)算,得到一個(gè)小于2048的值b汁政,這個(gè)值就表示bloom里面第b位的值為1道偷。同理取第2,3 和第4,5字節(jié)合成另外兩個(gè)無(wú)符號(hào)int,增加在bloom里面的命中率记劈。
1.3勺鸦,也就是說(shuō)對(duì)于任何一個(gè)輸入,如果它對(duì)應(yīng)的三個(gè)下標(biāo)的值不都為1抠蚣,那么它肯定不在這個(gè)區(qū)塊中祝旷。 當(dāng)如如果對(duì)應(yīng)的三位都為1,也不能說(shuō)明一定在這個(gè)區(qū)塊中嘶窄。 這就是布隆過(guò)濾器的特性怀跛。
1.4,這三個(gè)數(shù)取或柄冲,得到一個(gè)bigInt吻谋,代表這個(gè)傳參數(shù)據(jù)的bloom9值。
2现横,LogsBloom(logs []*Log)方法把日志數(shù)據(jù)轉(zhuǎn)成對(duì)應(yīng)的bloom9值漓拾,包括日志的合約地址以及每個(gè)日志Topic
3,CreateBloom(receipts Receipts)方法創(chuàng)建收據(jù)的bloom
3.1戒祠,創(chuàng)建一個(gè)空的bigInt bloomBin骇两,遍歷receipts,取得receipt里的日志姜盈,調(diào)用LogsBloom(receipt.Logs)將取得所有日志的bloom值按位或和到bloomBin低千。這意味著bloomBin包括了所有日志的bloom9數(shù)據(jù)。
3.2馏颂,調(diào)用BytesToBloom(bloomBin.Bytes())方法示血,把bloomBin加入?yún)^(qū)塊的bloom過(guò)濾器中棋傍,這時(shí)Bloom過(guò)濾器就有了本次交易的所有收據(jù)。
3.3难审,需要說(shuō)明的是Bloom過(guò)濾器只是提供一個(gè)查找數(shù)據(jù)是否存在的工具瘫拣,它本身不包含任何數(shù)據(jù)。
4, BloomLookup()方法查找對(duì)應(yīng)的數(shù)據(jù)是否在bloom過(guò)濾器里面告喊。
func BloomLookup(bin Bloom, topic bytesBacked) bool {
bloom := bin.Big()
cmp := bloom9(topic.Bytes()[:])
return bloom.And(bloom, cmp).Cmp(cmp) == 0
}
先將傳入的數(shù)據(jù)轉(zhuǎn)成bloom9值麸拄,傳入的bloomBin 轉(zhuǎn)成bigInt。根據(jù)按位與操作葱绒,判斷傳入的值是否在Bloom過(guò)濾器里面感帅。
二斗锭,Bloom過(guò)濾器的實(shí)際應(yīng)用
bloom過(guò)濾器是用來(lái)快速的查找log的地淀,那以太坊是如何用bloom過(guò)濾器來(lái)查找的呢?
想要要找某一條log岖是,如果從區(qū)塊鏈的頭區(qū)塊開(kāi)始帮毁,根據(jù)區(qū)塊頭的hash依次開(kāi)始查找的話是效率比較低的,每個(gè)區(qū)塊寫(xiě)在本地?cái)?shù)據(jù)庫(kù)是散列存儲(chǔ)的豺撑, 會(huì)增加很多io請(qǐng)求烈疚,io請(qǐng)求的速度很慢的。如何能快速的找到目的區(qū)塊聪轿,這時(shí)候就要用到Chain_Indexer爷肝。以太坊的BloomIndexer具體實(shí)現(xiàn)了Chain_Indexer,可以認(rèn)為是Chain_Indexer的派生類陆错。
Chain_Indexer的初始化:
func NewChainIndexer(chainDb, indexDb ethdb.Database, backend ChainIndexerBackend, section, confirm uint64, throttling time.Duration, kind string) *ChainIndexer {
c := &ChainIndexer{
chainDb: chainDb,
indexDb: indexDb,
backend: backend,
update: make(chan struct{}, 1),
quit: make(chan chan error),
sectionSize: section,
confirmsReq: confirm,
throttling: throttling,
log: log.New("type", kind),
}
// Initialize database dependent fields and start the updater
c.loadValidSections()
go c.updateLoop()
return c
}
chainDb是整個(gè)區(qū)塊鏈的Db
indexDb是這個(gè)BloomIndexer的Db
sectionSize等于4096灯抛,把每4096個(gè)區(qū)塊劃到一個(gè)section中
loadValidSections,取得indexDb里面存放的section的數(shù)量
c.updateLoop是chainIndexer 更新的主循環(huán)音瓷,有新的區(qū)塊对嚼,或者有新的沒(méi)有在indexDb里面存放的section產(chǎn)生都會(huì)send到c.updateLoop的goroutine里面去。
func (c *ChainIndexer) updateLoop() {
var (
updating bool
updated time.Time
)
for {
select {
case errc := <-c.quit:
// Chain indexer terminating, report no failure and abort
errc <- nil
return
case <-c.update:
// Section headers completed (or rolled back), update the index
c.lock.Lock()
if c.knownSections > c.storedSections {
// Periodically print an upgrade log message to the user
if time.Since(updated) > 8*time.Second {
if c.knownSections > c.storedSections+1 {
updating = true
c.log.Info("Upgrading chain index", "percentage", c.storedSections*100/c.knownSections)
}
updated = time.Now()
}
// Cache the current section count and head to allow unlocking the mutex
section := c.storedSections
var oldHead common.Hash
if section > 0 {
oldHead = c.SectionHead(section - 1)
}
// Process the newly defined section in the background
c.lock.Unlock()
newHead, err := c.processSection(section, oldHead)
if err != nil {
c.log.Error("Section processing failed", "error", err)
}
c.lock.Lock()
// If processing succeeded and no reorgs occcurred, mark the section completed
if err == nil && oldHead == c.SectionHead(section-1) {
c.setSectionHead(section, newHead)
c.setValidSections(section + 1)
if c.storedSections == c.knownSections && updating {
updating = false
c.log.Info("Finished upgrading chain index")
}
c.cascadedHead = c.storedSections*c.sectionSize - 1
for _, child := range c.children {
c.log.Trace("Cascading chain index update", "head", c.cascadedHead)
child.newHead(c.cascadedHead, false)
}
} else {
// If processing failed, don't retry until further notification
c.log.Debug("Chain index processing failed", "section", section, "err", err)
c.knownSections = c.storedSections
}
}
// If there are still further sections to process, reschedule
if c.knownSections > c.storedSections {
time.AfterFunc(c.throttling, func() {
select {
case c.update <- struct{}{}:
default:
}
})
}
c.lock.Unlock()
}
}
}
1绳慎,c.updateLoop收到update的通知后纵竖,看是否有已知的未寫(xiě)入indexDb的section。
2杏愤,調(diào)用c.processSection(section, oldHead)生成新的section
func (c *ChainIndexer) processSection(section uint64, lastHead common.Hash) (common.Hash, error) {
c.log.Trace("Processing new chain section", "section", section)
// Reset and partial processing
if err := c.backend.Reset(section, lastHead); err != nil {
c.setValidSections(0)
return common.Hash{}, err
}
for number := section * c.sectionSize; number < (section+1)*c.sectionSize; number++ {
hash := GetCanonicalHash(c.chainDb, number)
if hash == (common.Hash{}) {
return common.Hash{}, fmt.Errorf("canonical block #%d unknown", number)
}
header := GetHeader(c.chainDb, hash, number)
if header == nil {
return common.Hash{}, fmt.Errorf("block #%d [%x…] not found", number, hash[:4])
} else if header.ParentHash != lastHead {
return common.Hash{}, fmt.Errorf("chain reorged during section processing")
}
c.backend.Process(header)
lastHead = header.Hash()
}
if err := c.backend.Commit(); err != nil {
c.log.Error("Section commit failed", "error", err)
return common.Hash{}, err
}
return lastHead, nil
}
2.1靡砌,調(diào)用c.backend.Reset(section, lastHead)產(chǎn)生一個(gè)待組裝的section,每個(gè)section中存在一個(gè)bloom過(guò)濾器珊楼。
2.2通殃,把number等于section * c.sectionSize到(section+1)*c.sectionSize的block依次加入到待組裝的section中。并把這些block的header.bloom加入到section的bloom過(guò)濾器中亥曹。
2.3邓了,調(diào)用c.backend.Commit()恨诱,把新的section寫(xiě)入db。返回最近的那block的header骗炉。
3照宝,更新sectionHead和ValidSctions,如果還有新的沒(méi)有在db里面的section的話句葵,在throttling時(shí)間后在循環(huán)更新一次厕鹃。
三,外部調(diào)用接口查找log的流程
PublicFilterAPI提供了給外部rpc調(diào)用的過(guò)濾查找接口乍丈。比如GetLogs()方法
func (api *PublicFilterAPI) GetLogs(ctx context.Context, crit FilterCriteria) ([]*types.Log, error) {
// Convert the RPC block numbers into internal representations
if crit.FromBlock == nil {
crit.FromBlock = big.NewInt(rpc.LatestBlockNumber.Int64())
}
if crit.ToBlock == nil {
crit.ToBlock = big.NewInt(rpc.LatestBlockNumber.Int64())
}
// Create and run the filter to get all the logs
filter := New(api.backend, crit.FromBlock.Int64(), crit.ToBlock.Int64(), crit.Addresses, crit.Topics)
logs, err := filter.Logs(ctx)
if err != nil {
return nil, err
}
return returnLogs(logs), err
}
1剂碴,F(xiàn)ilterCriteria是外部請(qǐng)求的過(guò)濾條件,可以根據(jù)起始區(qū)塊轻专,日志的合約地址忆矛,日志topics的hash值來(lái)設(shè)置過(guò)濾條件。
2请垛,以太坊內(nèi)部根據(jù)FilterCriteria催训,創(chuàng)建一個(gè)過(guò)濾器,把合約地址和topics的hash作為bloombit的匹配器的匹配條件宗收。
3漫拭,調(diào)用filter.Logs(ctx)來(lái)獲取日志
func (f *Filter) Logs(ctx context.Context) ([]*types.Log, error) {
// Figure out the limits of the filter range
header, _ := f.backend.HeaderByNumber(ctx, rpc.LatestBlockNumber)
if header == nil {
return nil, nil
}
head := header.Number.Uint64()
if f.begin == -1 {
f.begin = int64(head)
}
end := uint64(f.end)
if f.end == -1 {
end = head
}
// Gather all indexed logs, and finish with non indexed ones
var (
logs []*types.Log
err error
)
size, sections := f.backend.BloomStatus()
if indexed := sections * size; indexed > uint64(f.begin) {
if indexed > end {
logs, err = f.indexedLogs(ctx, end)
} else {
logs, err = f.indexedLogs(ctx, indexed-1)
}
if err != nil {
return logs, err
}
}
rest, err := f.unindexedLogs(ctx, end)
logs = append(logs, rest...)
return logs, err
}
3.1,如果沒(méi)有設(shè)定起始位置混稽,就認(rèn)為從最新區(qū)塊的header開(kāi)始采驻。找到開(kāi)始位置區(qū)塊對(duì)應(yīng)的的section,如果開(kāi)始位置在section里面就走f.indexedLogs()在chainIndexer里面找log匈勋,如果不是就調(diào)用f.unindexedLogs()不再chainIndexer里面找礼旅。
3.2,f.unindexedLogs()相對(duì)簡(jiǎn)單颓影。
func (f *Filter) unindexedLogs(ctx context.Context, end uint64) ([]*types.Log, error) {
var logs []*types.Log
for ; f.begin <= int64(end); f.begin++ {
header, err := f.backend.HeaderByNumber(ctx, rpc.BlockNumber(f.begin))
if header == nil || err != nil {
return logs, err
}
if bloomFilter(header.Bloom, f.addresses, f.topics) {
found, err := f.checkMatches(ctx, header)
if err != nil {
return logs, err
}
logs = append(logs, found...)
}
}
return logs, nil
}
3.2.1各淀,因?yàn)闆](méi)有并入section的區(qū)塊都是比較新的區(qū)塊,數(shù)量也不多诡挂。直接從最新的區(qū)塊開(kāi)始遍歷查找就可以了碎浇。
3.2.2,bloomFilter(header.Bloom, f.addresses, f.topics)方法璃俗,根據(jù)合約地址和topics的bloom9值在header的bloom過(guò)濾器中按位與操作奴璃,看是否在這個(gè)區(qū)塊中。
3.2.3城豁,如果找到這個(gè)block苟穆,調(diào)用checkMatches方法在block里面查找對(duì)應(yīng)的log
func (f *Filter) checkMatches(ctx context.Context, header *types.Header) (logs []*types.Log, err error) {
// Get the logs of the block
logsList, err := f.backend.GetLogs(ctx, header.Hash())
if err != nil {
return nil, err
}
var unfiltered []*types.Log
for _, logs := range logsList {
unfiltered = append(unfiltered, logs...)
}
logs = filterLogs(unfiltered, nil, nil, f.addresses, f.topics)
if len(logs) > 0 {
// We have matching logs, check if we need to resolve full logs via the light client
if logs[0].TxHash == (common.Hash{}) {
receipts, err := f.backend.GetReceipts(ctx, header.Hash())
if err != nil {
return nil, err
}
unfiltered = unfiltered[:0]
for _, receipt := range receipts {
unfiltered = append(unfiltered, receipt.Logs...)
}
logs = filterLogs(unfiltered, nil, nil, f.addresses, f.topics)
}
return logs, nil
}
return nil, nil
}
3.2.3.1 調(diào)用ethApi的f.backend.GetLogs(ctx, header.Hash())方法,找到這個(gè)區(qū)塊的所有收據(jù)下的所有日志。
3.2.3.2 調(diào)用filterLogs(unfiltered, nil, nil, f.addresses, f.topics)雳旅,根據(jù)f.addresses, f.topics過(guò)濾出想要的logs跟磨。如果第一個(gè)log的hash是空的,需要通過(guò)light client重現(xiàn)獲取一遍所有的日志攒盈,再走一下過(guò)濾抵拘。
3.3,f.indexedLogs() 在chainIndexer里面查找日志
func (f *Filter) indexedLogs(ctx context.Context, end uint64) ([]*types.Log, error) {
// Create a matcher session and request servicing from the backend
matches := make(chan uint64, 64)
session, err := f.matcher.Start(ctx, uint64(f.begin), end, matches)
if err != nil {
return nil, err
}
defer session.Close()
f.backend.ServiceFilter(ctx, session)
// Iterate over the matches until exhausted or context closed
var logs []*types.Log
for {
select {
case number, ok := <-matches:
// Abort if all matches have been fulfilled
if !ok {
err := session.Error()
if err == nil {
f.begin = int64(end) + 1
}
return logs, err
}
f.begin = int64(number) + 1
// Retrieve the suggested block and pull any truly matching logs
header, err := f.backend.HeaderByNumber(ctx, rpc.BlockNumber(number))
if header == nil || err != nil {
return logs, err
}
found, err := f.checkMatches(ctx, header)
if err != nil {
return logs, err
}
logs = append(logs, found...)
case <-ctx.Done():
return logs, ctx.Err()
}
}
}
indexedLogs啟動(dòng)一個(gè)匹配器來(lái)查找Filter條件下對(duì)應(yīng)的區(qū)塊型豁,這一節(jié)暫不分析f.matcher的工作原理僵蛛。
找到對(duì)應(yīng)區(qū)塊,接下來(lái)的事情就和unindexedLogs的處理一樣了迎变。
總結(jié):
以太坊的bloom過(guò)濾器大大的提高了查詢的效率充尉。以太坊先創(chuàng)建topics的bloom,再創(chuàng)建logs的bloom衣形,再創(chuàng)建收據(jù)的bloom驼侠,在創(chuàng)建header的bloom,最后創(chuàng)建block的bloom泵喘,一步一步構(gòu)建上去泪电。于此對(duì)應(yīng)的般妙,在查找日志的過(guò)程正好相反纪铺,先在block的bloom里面找,再在header的bloom里面找碟渺,再在收據(jù)的bloom里面找鲜锚,直到找到最終的日志。