Neural search tool: Jina
Key concepts and syntax
Executor
Write your own Executor, the unit that holds your custom processing logic:
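import numpy as np
import torch
from docarray import DocumentArray
from jina import Executor, requests

class MyExecutor(Executor):
    @requests  # no `on=`: default handler for endpoints without their own handler
    def foo(self, docs: DocumentArray, **kwargs):
        # assumes the request carries at least two Documents
        docs[0].text = 'hello, world!'
        docs[1].text = 'goodbye, world!'

    @requests(on='/crunch-numbers')  # only called for the /crunch-numbers endpoint
    def bar(self, docs: DocumentArray, **kwargs):
        for doc in docs:
            doc.tensor = torch.tensor(np.random.random([10, 2]))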
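To make the routing rule concrete, here is a plain-Python sketch of the idea behind `@requests`: a method tagged with an endpoint handles only that endpoint, and an untagged method acts as the fallback. The `route` decorator, the `SketchExecutor` base class and the dict-based documents are hypothetical stand-ins that run without Jina; this is not Jina's actual dispatch code.

```python
import random
from typing import Callable, List, Optional


def route(on: Optional[str] = None) -> Callable:
    """Hypothetical decorator: tag a method with the endpoint it serves.
    on=None marks the default handler, like a bare @requests."""
    def tag(fn: Callable) -> Callable:
        fn._endpoint = on
        return fn
    return tag


class SketchExecutor:
    """Hypothetical base class: dispatch a request to the tagged method."""

    def handle(self, endpoint: str, docs: List[dict]) -> List[dict]:
        fallback = None
        for name in dir(type(self)):
            method = getattr(self, name)
            if not hasattr(method, "_endpoint"):
                continue                      # not a tagged handler
            if method._endpoint == endpoint:
                method(docs)                  # exact endpoint match wins
                return docs
            if method._endpoint is None:
                fallback = method             # remember the default handler
        if fallback is None:
            raise KeyError(f"no handler for {endpoint!r}")
        fallback(docs)
        return docs


class MyExecutor(SketchExecutor):
    @route()                       # default: any endpoint without its own handler
    def foo(self, docs: List[dict]) -> None:
        docs[0]["text"] = "hello, world!"   # assumes at least two docs
        docs[1]["text"] = "goodbye, world!"

    @route(on="/crunch-numbers")   # only the /crunch-numbers endpoint
    def bar(self, docs: List[dict]) -> None:
        for doc in docs:
            # 10x2 random matrix as nested lists (stand-in for torch.tensor)
            doc["tensor"] = [[random.random() for _ in range(2)] for _ in range(10)]


ex = MyExecutor()
docs = [{}, {}]
ex.handle("/index", docs)          # no /index handler, so foo runs
print(docs[0]["text"])             # hello, world!
ex.handle("/crunch-numbers", docs)
print(len(docs[0]["tensor"]))      # 10
```

The exact match is checked before the fallback, so `bar` always wins for `/crunch-numbers` even though `foo` is registered as the default.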
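Flow
Exposes the processing pipeline as an API; once the inputs and outputs are defined, it is quite flexible;
A project can be built from several Flows working together;
Finished Executors can be published to Hub and loaded from there quickly (e.g. `jinahub://PDFSegmenter`)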
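A Flow chains Executors so that each one's output feeds the next. The sketch below models only that chaining in plain Python; `SketchFlow` and the two string-cleaning steps are hypothetical stand-ins for Jina's `Flow().add(...)` and real Executors.

```python
from typing import Callable, List

Step = Callable[[List[str]], List[str]]


class SketchFlow:
    """Hypothetical mini 'Flow': an ordered chain of processing steps."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add(self, step: Step) -> "SketchFlow":
        self.steps.append(step)
        return self  # returning self allows Flow().add(...).add(...) chaining

    def run(self, docs: List[str]) -> List[str]:
        for step in self.steps:  # each step's output feeds the next step
            docs = step(docs)
        return docs


def lowercase(docs: List[str]) -> List[str]:
    return [d.lower() for d in docs]


def strip_punct(docs: List[str]) -> List[str]:
    return [d.replace(",", "").replace("!", "") for d in docs]


flow = SketchFlow().add(lowercase).add(strip_punct)
print(flow.run(["Hello, World!"]))  # ['hello world']
```

Returning `self` from `add` is what makes the fluent `Flow().add(...).add(...)` style in the Jina code possible.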
Hub
JCloud
Examples:
01: Search system
Overall design
- Input: movie title, description, movie genre
- Output: a list of movies
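Workflow
- Download the dataset
- Load the dataset into a DocumentArray
- Preprocess the DocumentArray (e.g. tokenization, sentence splitting), then generate vector representations (embeddings)
- Build the index
- Encode the query, find the best matches in the index, and return them through the API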
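The workflow above (load, encode, index, match) can be sketched without Jina using a bag-of-words encoder and cosine similarity. Everything here is an assumption for illustration: the `MovieIndex` class, the made-up sample records, and the word-count vectors, which stand in for the learned embeddings a real system would produce.

```python
import math
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # lowercase word tokens; a stand-in for real tokenization
    return re.findall(r"\w+", text.lower())


class MovieIndex:
    """Hypothetical in-memory index: bag-of-words vectors + cosine search."""

    def __init__(self, movies: List[Dict[str, str]]) -> None:
        self.movies = movies
        texts = [f"{m['title']} {m['description']} {m['genre']}" for m in movies]
        self.vocab: Dict[str, int] = {}
        for text in texts:                  # vocabulary built from the corpus
            for tok in tokenize(text):
                self.vocab.setdefault(tok, len(self.vocab))
        self.vectors = [self.encode(t) for t in texts]  # "build the index"

    def encode(self, text: str) -> List[float]:
        vec = [0.0] * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:           # words unseen in the corpus are ignored
                vec[self.vocab[tok]] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]      # unit length, so dot product = cosine

    def search(self, query: str, top_k: int = 2) -> List[Tuple[float, str]]:
        q = self.encode(query)              # encode the input
        scored = [
            (sum(a * b for a, b in zip(q, vec)), movie["title"])
            for movie, vec in zip(self.movies, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
        return scored[:top_k]


movies = [  # made-up sample records
    {"title": "Space Voyage", "description": "astronauts travel to a distant planet", "genre": "sci-fi"},
    {"title": "Love in Paris", "description": "two strangers fall in love in paris", "genre": "romance"},
    {"title": "Robot Uprising", "description": "robots rebel against humans in space", "genre": "sci-fi"},
]
index = MovieIndex(movies)
print([title for _, title in index.search("space planet astronauts")])
# ['Space Voyage', 'Robot Uprising']
```

Normalising every vector once at indexing time means a search only needs dot products, which is the same reason vector indexes usually store unit-length embeddings for cosine search.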
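02: Building a PDF search system
Workflow
- Prepare the PDF data
- Parse the PDFs: set up a PDF-parsing Flow
- Text processing: sentence splitting and tokenization
- Embedding
- Build the index
- Build the query Flow: match the query against the index and return the nearest entries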
from docarray import Document, DocumentArray
from jina import Flow

# Load every PDF under pdf_data/ (including subfolders), one Document per file
docs = DocumentArray.from_files("pdf_data/*.pdf", recursive=True)

# Indexing Flow: PDF -> text chunks (@c) -> sentences (@cc) -> embeddings -> index
flow = (
    Flow()
    .add(
        uses="jinahub://PDFSegmenter",
        install_requirements=True,
        name="segmenter",
    )
    .add(
        uses="jinahub://SpacySentencizer",
        uses_with={"traversal_paths": "@c"},  # split the chunks produced by the segmenter
        install_requirements=True,
        name="sentencizer",
    )
    .add(
        uses="jinahub://TransformerTorchEncoder",
        uses_with={"traversal_paths": "@cc"},  # embed the sentence-level chunks
        install_requirements=True,
        name="encoder",
    )
    .add(
        uses="jinahub://SimpleIndexer",
        uses_with={"traversal_right": "@cc"},
        install_requirements=True,
        name="indexer",
    )
)
flow.plot()  # visualise the Flow topology

with flow:
    docs = flow.index(docs, show_progress=True)

# Build the search Flow
search_flow = (
    Flow()
    .add(
        uses="jinahub://TransformerTorchEncoder",  # same encoder as used for indexing
        name="encoder",
    )
    .add(
        uses="jinahub://SimpleIndexer",
        uses_with={"traversal_right": "@cc"},
        name="indexer",
    )
)

search_term = "一種基于詞向量的hownet表示方法"  # "a HowNet representation method based on word vectors"
query_doc = Document(text=search_term)

with search_flow:
    # in Jina 3, search() returns the result Documents directly
    results = search_flow.search(query_doc, show_progress=True)

for match in results[0].matches:
    print(match.text)
    print(match.scores["cosine"].value)
    print("---")
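The traversal paths in the code (`@c`, `@cc`) address levels of nested chunks: `@r` is the root Document, `@c` its chunks, `@cc` the chunks of those chunks. The plain-Python sketch below mimics that structure for the PDF pipeline. It assumes the segmenter yields one text chunk per page (how the real PDFSegmenter splits text is not covered in these notes) and uses a naive punctuation-based splitter in place of spaCy; `Doc`, `segment`, `split_sentences` and `traverse` are hypothetical names, not docarray or Jina APIs.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document with nested chunks."""
    text: str = ""
    chunks: List["Doc"] = field(default_factory=list)


# split after Chinese or English sentence-ending punctuation (naive)
SENTENCE_END = re.compile(r"(?<=[。！？.!?])\s*")


def split_sentences(text: str) -> List[str]:
    return [s for s in SENTENCE_END.split(text) if s.strip()]


def segment(pages: List[str]) -> Doc:
    # root (@r) = the PDF; @c = one chunk per page (assumed); @cc = sentences
    root = Doc()
    for page in pages:
        page_doc = Doc(text=page, chunks=[Doc(text=s) for s in split_sentences(page)])
        root.chunks.append(page_doc)
    return root


def traverse(root: Doc, path: str) -> List[Doc]:
    # '@r' -> [root], '@c' -> its chunks, '@cc' -> chunks of those chunks
    level = [root]
    for step in path.lstrip("@"):
        if step == "r":
            continue
        if step != "c":
            raise ValueError(f"step {step!r} is not supported in this sketch")
        level = [child for doc in level for child in doc.chunks]
    return level


pdf = segment(["Jina is a framework. It supports PDFs!", "它很好用。是嗎？"])
print(len(traverse(pdf, "@c")))                # 2
print([d.text for d in traverse(pdf, "@cc")])
# ['Jina is a framework.', 'It supports PDFs!', '它很好用。', '是嗎？']
```

This is why the sentencizer runs on `@c` (it splits the segmenter's chunks) while the encoder and indexer run on `@cc` (the resulting sentences).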