H2 Full-Text Search

Earlier articles introduced some of H2's features and explained why H2 is well suited to test environments. H2 can serve as an embedded database and as an in-memory database, and in the right scenarios it can replace SQLite. Beyond that, it also provides full-text search.

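H2 ships with two full-text search (FullText Search) implementations:

Native FullText Search. Uses the full-text search built into H2 and stores the index in dedicated tables inside the database.
Apache Lucene FullText Search. H2 is written in Java, so it can rely on third-party libraries such as Apache Lucene to extend its functionality.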

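1 Using Native FullText Search from the command line

The examples below mainly involve two tables: Car and Brand (manufacturer). Some keywords appear in both tables, and full-text search can quickly locate where they occur in each.

1.1 Creating the tables

Create the table structures for cars and brands.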

create SCHEMA TEST_SCHEMA;

create table TEST_SCHEMA.cars
(
    id   bigint generated always as identity not null,
    name varchar(20),
    introduce varchar(200),
    primary key (id)
);


create table TEST_SCHEMA.brands
(
    id   bigint generated always as identity not null,
    name varchar(20),
    primary key (id)
);

After creation, the structure looks like this:

00.png

1.2 Creating the index

Use FT_INIT() to initialize full-text search. The alias below binds it to H2's built-in (native) full-text search.

create alias if not exists FT_INIT for "org.h2.fulltext.FullText.init";

CALL FT_INIT();

After these statements run, a schema named FT is created. It contains several new tables, of which INDEXES stores the rules that define each index.

01.png

Specify the tables and columns to index.

CALL FT_CREATE_INDEX('TEST_SCHEMA', 'CARS', NULL);
CALL FT_CREATE_INDEX('TEST_SCHEMA', 'BRANDS', NULL);

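The parameters of the FT_CREATE_INDEX function are:

• the first parameter is the name of the schema to index;

• the second parameter is the name of the table to index;

• the third parameter is the list of columns to index; NULL means that all columns are indexed.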

02.png
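The third parameter is passed as a single string, so indexing only some columns means joining their names with commas. The following is a minimal sketch of a helper that builds the call; the class FtIndexSql and its methods are hypothetical names introduced here for illustration and are not part of H2.

```java
import java.util.Arrays;
import java.util.List;

/** Builds FT_CREATE_INDEX calls; a hypothetical helper, not part of H2. */
class FtIndexSql {

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * @param columns columns to index; null or empty means all columns (SQL NULL)
     */
    static String createIndex(String schema, String table, List<String> columns) {
        String columnList = (columns == null || columns.isEmpty())
                ? "NULL"
                : quote(String.join(",", columns));
        return "CALL FT_CREATE_INDEX(" + quote(schema) + ", " + quote(table)
                + ", " + columnList + ")";
    }

    public static void main(String[] args) {
        // CALL FT_CREATE_INDEX('TEST_SCHEMA', 'CARS', NULL)
        System.out.println(createIndex("TEST_SCHEMA", "CARS", null));
        // CALL FT_CREATE_INDEX('TEST_SCHEMA', 'CARS', 'NAME,INTRODUCE')
        System.out.println(createIndex("TEST_SCHEMA", "CARS", Arrays.asList("NAME", "INTRODUCE")));
    }
}
```

The generated statement can then be executed like any other SQL statement, for example from the H2 console.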

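1.3 Inserting data and querying the index

-- The id column is filled in by the identity, so only the other columns are listed.
insert into TEST_SCHEMA.cars (name, introduce) values ('benz A200', 'Benz A200 L Car'), ('BMW 3', 'BMW 3 2.0L');

insert into TEST_SCHEMA.brands (name) values ('benz'), ('BMW');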

After the data is inserted, the tables look like this:

03.png

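Before searching, decide what result to expect. From the figure above, the keyword benz occurs in two records:

the record with id 1 in the cars table, where it appears in both the name and introduce columns;
the record with id 1 in the brands table, where it appears in the name column.

So a search for benz should return 2 records: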

SELECT * FROM FT_SEARCH_DATA('benz', 0, 0);

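The search result has 5 columns:

• SCHEMA: the name of the schema the matching record belongs to;

• TABLE: the name of the table the matching record belongs to;

• COLUMNS: the names of the key columns that identify the matching record;

• KEYS: the values of those key columns, which together act as the record's address;

• SCORE: the relevance score of the hit; with H2's Native FullText Search the score is always 1.0.

The query result looks like this: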

04.png
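A hit only points at a record, so fetching the record itself takes a second query. The sketch below builds that query from the SCHEMA, TABLE and COLUMNS values of one hit, assuming that COLUMNS holds the key column names and that KEYS holds the matching values to bind to the placeholders. HitLookupSql is a hypothetical helper written for this article, not an H2 class.

```java
import java.util.ArrayList;
import java.util.List;

/** Turns one FT_SEARCH_DATA hit into a parameterized lookup query (hypothetical helper). */
class HitLookupSql {

    /** Quotes an identifier with double quotes, doubling any embedded double quote. */
    static String quoteIdentifier(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    /**
     * @param keyColumns the COLUMNS array of the hit, assumed to hold key column names
     * @return SQL with one "?" placeholder per key column, to be bound with the KEYS values
     */
    static String selectRow(String schema, String table, String[] keyColumns) {
        if (keyColumns == null || keyColumns.length == 0) {
            throw new IllegalArgumentException("at least one key column is required");
        }
        List<String> conditions = new ArrayList<>();
        for (String column : keyColumns) {
            conditions.add(quoteIdentifier(column) + " = ?");
        }
        return "SELECT * FROM " + quoteIdentifier(schema) + "." + quoteIdentifier(table)
                + " WHERE " + String.join(" AND ", conditions);
    }

    public static void main(String[] args) {
        // SELECT * FROM "TEST_SCHEMA"."CARS" WHERE "ID" = ?
        System.out.println(selectRow("TEST_SCHEMA", "CARS", new String[] {"ID"}));
    }
}
```

Using placeholders rather than concatenating the key values keeps the lookup safe even when a key contains quotes.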

The search is also case-insensitive, so searching for BENZ returns the same result.

In my tests, H2's built-in full-text search tokenizes text by English-character rules, working on runs of letters and digits. Chinese text is tokenized by the same rules rather than by Chinese word segmentation.

For example:

"馬自達，創馳藍天" is kept as the single token "馬自達，創馳藍天".

"創馳藍天,2.5L" is tokenized into "創馳藍天", "2", "5", "L", "創馳藍天,2", "創馳藍天,2.5", "創馳藍天,2.5L", "2.5", "5L".

Once the tokenization rules are understood, this simple full-text search is good enough for simple scenarios.
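To make the idea of delimiter-based tokenization concrete, here is a simplified model in plain Java. It is not H2's actual code: the class name WordSplitSketch and the delimiter set are assumptions chosen for illustration. Words are upper-cased so that matching ignores case, and a full-width comma is not in the delimiter set, so text joined by one stays a single token.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

/** A simplified model of delimiter-based word splitting; not H2's actual code. */
class WordSplitSketch {

    // Assumed delimiter set: ASCII whitespace plus common ASCII punctuation.
    static final String DELIMITERS = " \t\n\r\f,.;:!?'\"()[]{}<>/\\-_=+*&%#@|^~`";

    /** Splits text on the delimiters and upper-cases each word so matching ignores case. */
    static Set<String> words(String text) {
        Set<String> result = new LinkedHashSet<>();
        StringTokenizer tokenizer = new StringTokenizer(text, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken().toUpperCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Benz A200 L Car"));             // [BENZ, A200, L, CAR]
        System.out.println(words("benz").equals(words("BENZ")));  // true
    }
}
```

With this model, a search term and the indexed text meet only if they produce an identical word, which is why knowing where the splits fall matters.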

1.4 Dropping the index

-- Drop the index on a specific table
call FT_DROP_INDEX('TEST_SCHEMA', 'CARS');

-- Drop all indexes
call FT_DROP_ALL();

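2 Using H2 full-text search from Java code

Spring Boot 2.x uses HikariCP as its default connection pool; the configuration below instead sets the data source type to H2's own org.h2.jdbcx.JdbcDataSource.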

application.properties

spring.datasource.schema=schema.sql
spring.datasource.data=data.sql
spring.datasource.type=org.h2.jdbcx.JdbcDataSource

spring.jpa.show-sql=true
spring.jpa.hibernate.ddl-auto=update

schema.sql

create table cars
(
    id   bigint generated always as identity not null,
    name varchar(20),
    introduce varchar(200),
    primary key (id)
);


create table brands
(
    id   bigint generated always as identity not null,
    name varchar(20),
    primary key (id)
);


-- Initialize H2 Native FullText Search
create alias if not exists FT_INIT for "org.h2.fulltext.FullText.init";
CALL FT_INIT();

-- Create the indexes
CALL FT_CREATE_INDEX('PUBLIC', 'CARS', NULL);
CALL FT_CREATE_INDEX('PUBLIC', 'BRANDS', NULL);

data.sql

-- The id column is filled in by the identity, so only the other columns are listed.
insert into cars (name, introduce) values ('benz A200', 'Benz A200 L Car'), ('BMW 3', 'BMW 3 2.0L');

insert into brands (name) values ('benz'), ('BMW');

Test code

@SpringBootTest
class FullTextSearchTests {

    @Autowired
    private FullTextService fullTextService;

    @Test
    void should_got_2_record_when_fulltext_search_given_2_cars_records_and_2_brands_records() throws SQLException {
        List<FullTextSearchResult> results = fullTextService.search("benz");

        then(results.size()).isEqualTo(2);
    }

}

The other classes it depends on:

Brand.java

@Data
@Entity
@Table(name = "brands")
public class Brand {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    private String name;
}

Car.java

@Data
@Entity
@Table(name = "cars")
public class Car {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    private String name;

    private String introduce;
}

BrandRepository.java

@Repository
public interface BrandRepository extends JpaRepository<Brand, Long> {
}

CarRepository.java

@Repository
public interface CarRepository extends JpaRepository<Car, Long> {
}

FullTextSearchResult.java wraps one full-text search result.

@Builder
@Data
public class FullTextSearchResult {

    private String schema;

    private String table;

    private String columns;

    private String keys;

    private BigDecimal score;
}

FullTextService.java

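import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import javax.sql.DataSource;

import org.h2.fulltext.FullText;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class FullTextService {

    // A limit of 0 means no limit; an offset of 0 starts from the first hit.
    public static final int SEARCH_RESULT_LIMIT = 0;
    public static final int SEARCH_RESULT_OFFSET = 0;

    // 1-based column positions in the result set returned by FullText.searchData.
    public static final int SCHEMA_INDEX = 1;
    public static final int TABLE_INDEX = 2;
    public static final int COLUMNS_INDEX = 3;
    public static final int KEYS_INDEX = 4;
    public static final int SCORE_INDEX = 5;

    @Autowired
    private DataSource dataSource;

    public List<FullTextSearchResult> search(String keyword) throws SQLException {
        List<FullTextSearchResult> results = new ArrayList<>();

        // try-with-resources releases the connection and the result set when done.
        try (Connection connection = dataSource.getConnection();
             ResultSet resultSet = FullText.searchData(
                     connection, keyword, SEARCH_RESULT_LIMIT, SEARCH_RESULT_OFFSET)) {
            while (resultSet.next()) {
                String schemaName = resultSet.getString(SCHEMA_INDEX);
                String tableName = resultSet.getString(TABLE_INDEX);
                // COLUMNS and KEYS are arrays; only the first entry of each is kept here.
                Object[] columns = (Object[]) resultSet.getArray(COLUMNS_INDEX).getArray();
                String column = String.valueOf(columns[0]);
                Object[] keys = (Object[]) resultSet.getArray(KEYS_INDEX).getArray();
                String key = String.valueOf(keys[0]);
                BigDecimal score = resultSet.getBigDecimal(SCORE_INDEX);

                results.add(
                        FullTextSearchResult.builder()
                                .schema(schemaName)
                                .table(tableName)
                                .columns(column)
                                .keys(key)
                                .score(score)
                                .build());
            }
        }

        return results;
    }
}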

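The classes H2 provides are not very convenient for extracting full-text search results, so it can be worth adding a proxy class around FullText that wraps the commonly used methods.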
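As one example of what such a wrapper might offer, the sketch below groups hits by table so that callers can see at a glance which rows matched where. It is only an illustration under stated assumptions: HitsByTable and its nested Hit class are hypothetical stand-ins for the result type, not H2 or Spring classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Groups search hits by table; a hypothetical convenience, not part of H2. */
class HitsByTable {

    /** Minimal stand-in for one result row: the table name and the row key. */
    static final class Hit {
        final String table;
        final String key;

        Hit(String table, String key) {
            this.table = table;
            this.key = key;
        }
    }

    /** Returns table name -> keys of the matching rows, in encounter order. */
    static Map<String, List<String>> group(List<Hit> hits) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Hit hit : hits) {
            grouped.computeIfAbsent(hit.table, t -> new ArrayList<>()).add(hit.key);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit("CARS", "1"), new Hit("BRANDS", "1"));
        System.out.println(group(hits)); // {CARS=[1], BRANDS=[1]}
    }
}
```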

3 Full-text search with Apache Lucene

Because H2 is written in Java, adding the Apache Lucene classes to the classpath is enough to extend the database. Unlike Native FullText Search, the Apache Lucene variant stores the index in Lucene, and tokenization and indexing can be extended with the features Lucene provides.

At the time of writing, the latest H2 release supports Apache Lucene 5.5 and 8.0.x.

3.1 Creating the index with Apache Lucene from the command line

Initialize with org.h2.fulltext.FullTextLucene.init:

create alias if not exists FTL_INIT for "org.h2.fulltext.FullTextLucene.init";
CALL FTL_INIT();

All other operations use functions whose names start with FTL_, for example FTL_SEARCH_DATA().

3.2 The wrapper class H2 provides for Apache Lucene

The org.h2.fulltext.FullTextLucene.searchData method can be used to search the data.

For more of the API, see the H2 Database Javadoc.