Chapter 1: The Deta Natural-Language Turing System


Basic application: the language-analysis engine of the compiler for Initon (元基) catalysis and peptide computing

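Sources of knowledge. To begin with, the author was no stranger to word segmentation. Thanks to China's solid free compulsory education and paid degree education, the author studied Chinese language and English fundamentals systematically for 16 years, and has a basic systematic view and a personal understanding of linguistic details such as radicals, sounds, characters, words, sentences, paragraphs, chapters, anadiplosis, collocation, figurative rhetoric, statement, argumentation, prose, classical Chinese, poetry, proverbs, idioms, grammatical structure, polysemy, multilingual awareness, debate, polymorphism, homophones and induction.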

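Second, thanks go to Professor Pascal of ESIEE in France (2009-2010). During the six months of the MP6 French e-mail project, the author systematically studied the construction principles of cuneiform script and the meanings of Latin words, and encountered Flesch (弗萊士) segmentation, Flesch vowel and root collocation, the ways affixes combine, and the density of meaning in the Latin family of languages. Thanks also go to the 歐娜 French training course for its detailed lessons on the gender of French nouns, French pronunciation and French writing.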

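With these foundations, the author is grateful for ten months of study at ELS in the United States in 2011. From classmates from Taiwan, China (traditional characters), the Middle East, Japan, North Korea, South Korea and Russia, the author learned how the scripts of these countries and regions are composed phonetically, pronounced, written and divided. At ELS the author also relearned, fairly systematically, the whole tense grammar of American English, along with American pronunciation, the collocation of words and sentences in English essays, debate, the ordering of descriptive adjectives, proverbs, slang, and the word order of appositives in complex sentences. The author wrote an essay recalling this period.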

Story - ELS: Wearing a formal suit and a pair of black leather shoes, I stood stiffly in the hall of ELS. Happiness and some fear were intertwined in my heart. A rectangular black table stood on the soft, smooth gray carpet. I carried a package loaded with important documents. I could see a lot of international students in fashionable clothes. I felt culture shock because the faces were not familiar to me. Ordinarily I was used to seeing people with black hair, not people with yellow or brown hair. In China, all the people have dark eyes, but here the people have blue, brown or green eyes. As I looked at all the strange faces and eyes, I began to feel anxious. I was reading a newspaper when the phone rang. I glanced around the hall quickly and found everyone focused on their work. My heart became calm after I put my cell phone on mute. To the left I could see through the window into the director's office. The furnishings were a bit small but arranged in an orderly way in a small area. A corridor with access to the LTC was next to the door. I was so happy to see some Chinese people waiting there for their interviews.

To the right was a gate to the small hall. I saw a soft sofa there, and photos of ELS's teachers were hanging on the white wall next to the sofa. I stood near the back gate, where I could see some magazines on the hook rack. Straight ahead was a vast world map; I found my hometown immediately, pointed to it, and felt a friendly warmth. When I went into the Asian room, I remember very clearly that to my right was a new rectangular whiteboard where Mr. Joel wrote our English sessions. I quietly sat down in the first row and looked straight ahead at two narrow windows with white blinds. Looking through the windows, I could see many tall, straight pine trees with lush green leaves. My mood brightened a lot. Everyone has a door deep in the heart, and a beautiful environment is often the best key. As a famous saying goes, "If you can't change the world, change yourself."

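The author also thanks Yang Ye of Peking University. From September 2019, over three months, the author systematically studied calligraphic stroke research: how tadpole script, bronze inscriptions, seal script, small seal script, katakana, bamboo-joint script, Han bamboo slips, Jin-dynasty carvings and clerical script are written, layered and combined. Thanks also go to Liuyang Sanlian, where from March 2018 the author spent a two-week internship learning how to think and express himself as a teacher.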

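Finally, the author thanks the Lucene tokenizer and the Chinese Academy of Sciences' hidden-Markov probabilistic Chinese segmentation plugin built on top of Lucene, which in 2009 first made the author aware of software segmentation algorithms, as well as the papers from Huazhong University of Science and Technology and other universities on segmentation based purely on pinyin probability.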

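With these six foundations, plus five years of degree courses abroad taught in English, the author was coding the 養療經華瑞集 system when Lucene's Chinese segmentation speed gradually fell behind the demands of segmenting and searching hundreds of millions of characters. From August 2018 the author resolved to write a segmenter from scratch and has been optimizing it ever since.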

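The author used to think that missing senior-year (grade 12) English would be a lasting regret. (On transferring from Liuyang No. 3 Middle School to Liuyang No. 1 Middle School, the class there had already finished the syllabus and was revising, so the author missed many subjects.) It now turns out that years of continuous work have repaid those regrets in another form of value.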

Yaoguang Luo

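Test speed: on a single Lenovo Y7000 laptop running Windows 10, the measured peak throughput is 16.3 to 16.5 million+ Chinese characters segmented per second, with a lexicon of 65,000+ entries. Function accuracy is 100%, missing grammar functions are below 0.3%, and algorithm accuracy is above 99.7%. The source code is 100% open, in the API and in the book.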

Test result. Input: 如果從容易開始于是從容不迫天下等于是非常識時務必為俊杰沿海南方向逃跑他說的確實在理結婚的和尚未結婚的提高產品質量中外科學名著內科學是臨床醫學的基礎內科學作為臨床醫學的基礎學科重點論述人體各個系統各種疾病的病因發病機制臨床表現診斷治療與預防.

Output: 如果+從+容易+開始+于是+從容不迫+天下+等于+是非+常識+時務+必+為+俊杰+沿海+南+方向+逃跑+他+說+的+確實+在理+結婚+的+和+尚未+結婚+的+提高+產品質量+中外+科學+名著+內科學+是+臨床+醫學+的+基礎+內科學+作為+臨床+醫學+的+基礎+學科+重點+論述+人體+各個+系+統+各種+疾病+的+病因+發病+機制+臨床+表現+診斷+治療+與+預防+++++


Definition: Deta word segmentation is a rule-based word-cutting engine. It uses a neural-network index dictionary to cut the text of an article into associated character chains, then traverses the part-of-speech (POS) combinations in forward order and matches them against the collocations defined by literary grammar.

The catalytic optimizations of Deta word segmentation mainly include:

1 Splitting the index dictionary into finer pieces for speed. Finer differentiation effectively reduces the volume of data operated on in memory and the resources occupied, which improves the speed of the current stack search and operations.

2 Ordering functions by usage frequency. Once high-frequency functions are moved forward, the code gains queue-priority awareness and can be metabolized. For example, the more frequently used logic sections can be placed at the top, following the von Neumann order of execution (top to bottom, left to right).

3 Optimizing the keywords of the dynamic, convolution-like traversal kernels. The total number of dynamic kernels is directly related to computational complexity: the more complex the computation, the higher the cost and the time overhead, although adaptive accuracy rises accordingly.

4 Metabolizing function files and function-file names (by PDE), with a secondary metabolism (by Initons) that optimizes index encoding for speed. The finer the functions, the more concise the logic, the more uniform the cost of each unit call, and the more orderly this balanced operation.

5 Refining and accelerating the literary word-cutting grammar functions, so that literary word-cutting problems are handled in a more targeted way.

Defined by Yaoguang Luo

Word segmentation

1 Deta word segmentation traverses the character index one character at a time in forward order, in the manner of queueing theory. It extracts candidates by length through vocabulary matching in the index, then splits the extracted word chains by part of speech (see p. 12 onward).

2 The character index is generated by associative classification into sets of small map files (POS map, word-length map, word-class map). This speeds up the whole process and serves as a catalytic refinement (see pp. 44, 54, 92).

3 Vocabulary matching currently covers the character sets of several national languages, which can be unified or split. The current maximum processing length is 4. Cutting uses dynamic, CNN-like kernels built on StringBuilder for POS recognition; this is a kernel computation that traverses POS function statements, not the integral superposition of a true convolution (see pp. 45, 119, 120).

4 POS splitting proceeds level by level through 4-character words, 3-character words, 2-character words and single characters, inducing the result from the POS collocation grammar patterns of the vocabulary. It is optimized in flow-valve fashion by how often each POS appears in text (see pp. 97, 116).

In short, Deta Parser is a sentence and word segmentation and matching tool (NERO-NLP-POS). The computer imitates the way a person reads, taking articles and sentences word by word as a river flows. It cuts the word chain with stop methods such as index-length recognition and dictionary matching, then extracts the material for the POS process that follows. The association maps and classification sets (POS maps, word-length maps, word-class maps) store the literary corpus (NERO), and usage-frequency statistics gathered along the way drive the queue ordering that tunes the system for speed.

Yaoguang Luo
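The length-extraction step described above can be sketched as follows. This is a minimal illustration, not the Deta Parser source: the class name `MaxLengthFourSegmenter`, its `HashSet` dictionary and the greedy longest-first scan are assumptions made for this example, and the POS-based splitting that Deta applies afterwards is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: greedy forward scan with a maximum window of four characters.
final class MaxLengthFourSegmenter {
    private static final int MAX_WORD_LENGTH = 4; // the maximum length stated in the text

    private final Set<String> dictionary = new HashSet<>();

    MaxLengthFourSegmenter(List<String> words) {
        dictionary.addAll(words);
    }

    /** Tries the 4-, 3- and 2-character windows in turn, then falls back to one character. */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(MAX_WORD_LENGTH, text.length() - i);
            while (len > 1 && !dictionary.contains(text.substring(i, i + len))) {
                len--; // shrink the window until a dictionary word (or a single character) is left
            }
            tokens.add(text.substring(i, i + len));
            i += len;
        }
        return tokens;
    }
}
```

A real engine would resolve ambiguous windows with the POS collocation rules rather than always taking the longest match; the sketch only shows why a fixed upper bound of four keeps each step cheap.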


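(Deta segmentation logic: the word '卷積' (convolution), shown in red, has been corrected to '內核' (kernel). Because the fourth revised edition still contains the word 'convolution', the corrections to all original figures from the slides and books are consolidated in the fifth edition. Yaoguang Luo)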

Sorting

1 The sorting idea behind Deta word segmentation is based on Sir Charles Antony Richard Hoare's quicksort. (The text is not reproduced here for copyright reasons; see the Baidu Baike entry 快速排序算法.)

2 The source prototype is the fourth-generation quicksort source based on Introduction to Algorithms. (The source is not reproduced here for copyright reasons; see https://github.com/yaoguangluo/Data_Processor/blob/master/DP/sortProcessor/Quick_4D_Sort.java) The author reproduced the Java Quick_4D_Sort source, ran it successfully in an IDE, and has been optimizing its sort logic ever since.

3 Building on 1 and 2, the author began optimizing the sort in 2013 with the differential catalytic operator ideas of Theory on YAOGUANG's Array Split Peak Defect (see pp. 247, 248, 250, 529, 620).

4 The optimizations include the small-peak left-right comparison method (avoiding defect peaks), wave-operator filtering, tuning of constant values, balancing of the computing sets, and inductive differentiation of discrete conditions (for example De Morgan computation and flow-valve computation). The current version is TopSort5D (see p. 658; vol. 2, p. 134).

5 The function and algorithm optimizations of Deta word segmentation, including core components such as the segmentation engine, mind reading and NLP analysis, all use this differential catalytic system (see p. 661).
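For orientation, the baseline that these optimizations start from can be sketched as a plain quicksort with Hoare's partition scheme. This is a textbook illustration only; it is not the author's Quick_4D_Sort or TopSort5D, and the class name `HoareQuickSortSketch` and the middle-element pivot are choices made for this example.

```java
// Textbook baseline only: quicksort with Hoare's partition scheme.
final class HoareQuickSortSketch {
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return; // zero or one element: already sorted
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p);      // Hoare's scheme keeps index p in the left part
        sort(a, p + 1, hi);
    }

    // Moves values smaller than the pivot left and larger ones right; returns the split index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2]; // middle element, an assumption of this sketch
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) {
                return j;
            }
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }
}
```

The optimizations listed in the text (peak avoidance, constant tuning, condition reordering) would be applied on top of a baseline of this kind.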


Neural network index (Nero Network Index Forest)

1 The vocabulary dictionary of Deta word segmentation is indexed with maps. From JDK 8 onward, HashMap converts long collision chains into balanced (red-black) trees, so key lookup is already close to its peak speed (see pp. 129, 131).

2 The index keeps splitting the large map into finer classified maps, such as the word-length map, the word-class map and the POS map, which speeds up search again (see p. 55).

3 The index maps support secondary combination computation and can be cached on distributed servers. The author does not recommend secondary combination computation on a single machine (see p. 92).

4 Map keys are identified by the integer code of each char of the string (its "ASCII int" in the author's words) when finding a key. This suits binary-search storage and fast StringBuilder computation, and unifies the low-level kernel (see p. 92).
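The splitting of one large map into smaller classified maps can be sketched as below. The sketch is an assumption-laden illustration: the class `SplitIndexSketch`, its methods and the choice to split only by word length are hypothetical, and the real index also splits by word class and POS.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one small map per word length instead of a single large map.
final class SplitIndexSketch {
    // Outer key: word length (1 to 4); inner map: word -> POS label.
    private final Map<Integer, Map<String, String>> byLength = new HashMap<>();

    void add(String word, String pos) {
        byLength.computeIfAbsent(word.length(), k -> new HashMap<>()).put(word, pos);
    }

    /** Looks only in the map that matches the word's length; returns null when absent. */
    String lookup(String word) {
        Map<String, String> sameLength = byLength.get(word.length());
        return sameLength == null ? null : sameLength.get(word);
    }
}
```

Each lookup touches only the small map for its length, which is the effect the text attributes to refining the index.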


The value of the neural network index shows in two places: the associative index used for cutting, and the vocabulary map index.

The associative cutting index extracts the characters of words as chains. This chained computation links the huge number of otherwise independent words in the lexicon front to back, by length, in the manner of anadiplosis in human language and literature (NERO). It lets a large text be extracted one small, necessarily linked segment at a time (NLP), at most four characters long. It works like squeezing toothpaste: what is squeezed out is used up at once for brushing (POS).

The vocabulary map index splits those chains sensibly. It combines and matches classification maps built on different attributes, and cuts according to the rigorous definitions of POS and subject-predicate-object collocation in human language. These classification maps can be designed adaptively and extended in many ways, which increases cutting accuracy and flexibility across different scenarios. This is the toothbrush side of the analogy: the squeezed-out toothpaste is paired with different toothbrushes and brushing methods (NERO + POS) to suit different mouths.

Described by Yaoguang Luo

Application of word segmentation in linear text search

1 Deta search is built on the weight calculations of the map classes: the scores produced by superimposing different weights are sorted for output (see vol. 2, p. 64).

2 Weights are computed by sentence role (subject, predicate, object, filled for example by pronouns, nouns, verbs and adjectives) and by POS class (for example verb, noun, adjective, predicate, preposition) (see vol. 2, p. 66).

3 Weights are coupled with word length and word frequency by bitwise superposition to produce the final output; the author reports bit operations as roughly an order of magnitude faster than multiplication (see vol. 2, p. 68). The author compares the approach to CountDownLatch and CyclicBarrier logic (define first and then prove, or prove first and then conclude).

4 The ratio of weight to word length can be tuned for precision, which sets how exact the search is and records personal search preferences (see vol. 2, p. 68).

Once all of this logic was carried into Java, the author set scales for all global and local values to build the Foolishman Self-Controller components, which keep the algorithms easy and simple.

Yaoguang Luo
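One way to picture the bitwise coupling of weight, word length and frequency is to pack the three values into one integer so that a plain integer comparison ranks the results. The field widths and the method `score` below are assumptions made for this sketch; the book's actual weighting scheme is on the pages cited above.

```java
// Hypothetical sketch: pack POS weight, word length and frequency into one sortable int.
final class BitWeightScoreSketch {
    /**
     * Layout (an assumption of this sketch): POS weight in the high bits,
     * word length in bits 8-11, frequency in bits 0-7.
     * posWeight is expected to be a small non-negative number.
     */
    static int score(int posWeight, int wordLength, int frequency) {
        int len = Math.min(wordLength, 15);   // clamp to 4 bits
        int freq = Math.min(frequency, 255);  // clamp to 8 bits
        return (posWeight << 12) | (len << 8) | freq;
    }
}
```

With this layout a higher POS weight always outranks any combination of length and frequency, and shifting the fields changes that balance, which is one possible reading of the adjustable weight-to-length ratio in item 4.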


Dynamic kernel matching in the POS flow-valve refinement traversal

1 There are two kinds of dynamic kernel, the forward (prefix) kernel and the backward (postfix) kernel. Both are updated in real time according to the position being analysed, as the tokens are read one by one (see p. 97).

2 The forward kernel caches the position and POS of words, along with information such as frequency, for the POS flow-valve traversal that checks POS collocations (see p. 97).

3 The backward kernel caches the words that are about to follow in the cutting chain. It is used for corrections to POS grammar, such as conjunction matching, and for continuing the token chain (see p. 97).

4 The kernels use StringBuilder as their carrier to speed up computation (see p. 97).



Relationship diagram of the forward kernel in the POS flow-valve refinement traversal

The figure segments 如果是非常理想 as an example. First, length matching in the index dictionary forest cuts out three index-linked spans: '如果', '是非常' and '理想'. (The author's lexicon has no entry '常理'; if it had, that would be a separate discussion.) '如果' and '理想' are stable words. '是非常' is a three-character span, so flow-valve cutting begins. The three-character index has no word '是非常', so flow-valve natural-language processing continues. (If the three-character index did contain the word, the flow valve would compute the POS collocation of the three-character word and return on a match; without a match it would still refine down to two-character words. This is the strength of the algorithm.)

The span is first split two ways, '是非-常' and '是-非常'. The POS of each combination is then analysed by looking at the POS of the words linked before and after each word: the preceding link of '是非' is '如果', that of '非常' is '是', those of '常' are '是非' and '非', and those of '理想' are '常' and '非常'. These collocations are strict, fixed grammar and involve no probabilistic events. If the two-character collocations produce a grammar error or have no index association, the flow valve moves on to single-character cutting.

In the figure the computation is fortunate enough to succeed at the two-character level. Following the NERO-NLP-POS flow, the result is returned as soon as the conjunction-adverb-adverb pattern (連副副) '如果-是-非常' is evaluated. The conjunction-noun-adverb pattern (連名副) '如果-是非-常' is never reached, because the conjunction-adverb-adverb grammar has a higher flow-valve priority and is evaluated and output first.

Described by Yaoguang Luo
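The priority-ordered resolution in this example can be sketched as follows. Everything here is illustrative: the class `PosGateSketch`, its five-entry POS table and the two-pattern valve list are assumptions that reproduce only the 如果是非常理想 case, not the rule set of the Deta source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the flow valve for one ambiguous three-character span.
final class PosGateSketch {
    // Illustrative mini lexicon; tags follow the example (連 conjunction, 副 adverb, 名 noun).
    private static final Map<String, String> POS = new HashMap<>();
    static {
        POS.put("如果", "連");
        POS.put("是", "副");
        POS.put("非常", "副");
        POS.put("是非", "名");
        POS.put("常", "副");
    }

    // Patterns in valve order: the first pattern that matches wins, with no probabilities.
    private static final List<String> VALVE_ORDER = Arrays.asList("連副副", "連名副");

    /** Splits a three-character span as 1+2 or 2+1, given the word that precedes it. */
    static List<String> resolve(String previous, String span) {
        List<List<String>> candidates = Arrays.asList(
                Arrays.asList(span.substring(0, 1), span.substring(1)),
                Arrays.asList(span.substring(0, 2), span.substring(2)));
        for (String pattern : VALVE_ORDER) {
            for (List<String> candidate : candidates) {
                String tags = tag(previous) + tag(candidate.get(0)) + tag(candidate.get(1));
                if (pattern.equals(tags)) {
                    return candidate;
                }
            }
        }
        return Arrays.asList(span.split("")); // no rule matched: fall back to single characters
    }

    private static String tag(String word) {
        return POS.getOrDefault(word, "?");
    }
}
```

Calling `PosGateSketch.resolve("如果", "是非常")` returns the split 是 / 非常, because the 連副副 pattern is tried before 連名副, mirroring the priority described in the text.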


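The coding framework of this algorithm had already appeared in the author's GitHub before 18 March 2019:

https://github.com/yaoguangluo/Deta_Parser/commit/25b90c9847d15df85c5c991448f2c271e0ad8106

Note: the keyword CNN in the linked history is a terminology error on the author's part. At that time the author's academic grounding was insufficient; having met convolution only in a computer-vision theory course, the author assumed that anything computed with a kernel was called CNN convolution.

The author found one more error: assuming that any computation over a sequential linked list was a hidden Markov chain computation. The two terms CNN and hidden Markov thus stayed with the author for ten years. The errors were found, and corrected, only while writing rigorous definitions for the slides and consulting a large body of definitional literature. In the author's view, the text-analysis kernel computations that appear in the author's ANN and RNN are the true CNN convolution computations.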

POS

Deta Parser's POS tagging rests on its own POS corpus, whose entries have the form word/POS, for example 香蕉/名詞 (banana/noun). The corpus loader registers each entry in the map index, with the word as key and the POS label as value, and tests labels with String contains. This format has a great advantage: labels can be compound. With entries such as 香蕉/水果名詞 (banana/fruit noun) and 瀏陽/地理名詞城市名詞 (Liuyang/geographic noun, city noun), complex compound categories such as adjectives, predicates and specific references can be understood well by the computer.

Deta POS tagging relies on fixed collocations between adjacent words: a subject must be followed by a predicate; noun + conjunction must be followed by a noun; adjective + conjunction must be followed by an adjective; a verb must be followed by an object and object complement. These strict, fixed collocations, taken from human language and literature, gradually replaced statistical and probabilistic segmentation, so Deta Parser contains no probabilistic statistics for word segmentation. All of this is built into the Deta segmentation API. Described by Yaoguang Luo

1 The core class of Deta word segmentation contains all the functions for POS collocation splitting (see pp. 97, 116).
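The compound-label idea can be sketched in a few lines. The class `CompoundPosSketch` and its two methods are hypothetical names for this example; only the word/POS entry format and the use of `String.contains` come from the text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: word/POS entries whose labels may be compound.
final class CompoundPosSketch {
    private final Map<String, String> labels = new HashMap<>();

    /** Registers one corpus entry of the form word/POS, e.g. 香蕉/水果名詞. */
    void register(String entry) {
        int slash = entry.indexOf('/');
        labels.put(entry.substring(0, slash), entry.substring(slash + 1));
    }

    /** True when the word's label contains the given tag anywhere in it. */
    boolean is(String word, String tag) {
        String label = labels.get(word);
        return label != null && label.contains(tag);
    }
}
```

Because the test is a substring match, 瀏陽/地理名詞城市名詞 answers true for 名詞, 地理名詞 and 城市名詞 alike, which is what lets one entry carry several categories.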



NLP

Deta Parser's natural language processing rests mainly on length trimming over the vocabulary index forest. Chinese word forms are fairly uniform, unlike Western languages with their vowel patterns (for example, the Flesch definition of word difficulty, which is based on the vowel content of a word). Chinese is generally expressed in one-character classical words, two-character ordinary words, three-character colloquialisms and four-character idioms; anything of five characters or more is usually a proverb or a set phrase, and such phrases can in a sense be split again into words of one to four characters. For example, the nine characters 巧媳婦難逃無米之炊, when they occur as a proverb, can also be segmented as 巧 + 媳婦 + 難 + 逃 + 無米之炊. Mr. Yaoguang Luo therefore set the maximum length to 4.

While safeguarding segmentation accuracy, flow-valve frequency statistics showed that two-character and single-character words are the most frequent in random articles. The functions that handle two- and one-character words were therefore moved to the front, and the NLP flow-valve cutting functions of Deta gradually took shape. The flow valve of Deta POS inherits the same high-frequency-first thinking. Described by Yaoguang Luo

1 The core class of Deta word segmentation contains all the functions for splitting by word length (see pp. 119, 120).


ANN

Deta's POS convolution computation, ANN, mainly comprises four operators: the awareness ratio, the environment ratio, the motivation ratio and the emotion ratio. Combining these four operators yields higher-level decisions such as emotional weight, motivation weight, word weight, persistence, trend weight, prediction weight, conjecture weight and an awareness summary (summing). In text analysis these decisions have practical value for evaluation and decision-making. The awareness summary is also an input component of Deta's DNN computation, where it is used to identify the words that carry the central idea of the text.

1 POS convolution computation (see p. 182)

2 Used to determine the centre of the text

Operator composition (see p. 18):

1.1 S SENSING: awareness ratio

1.2 E ENVIRONMENT: environment ratio

1.3 M MOTIVATION: motivation ratio

1.4 E EMOTION: emotion ratio


On ratios

Mr. Yaoguang Luo's view is that the value of a ratio lies in the proportion it reveals. If 80 of 100 words are adjectives, a first judgement is that adjectives dominate and the article is finely expressive prose. If 80 of 100 words are verbs, verbs dominate and the article is narrative that vividly depicts activity. These proportions explain well the motivation, habits and writing style of the author of an article.

1 Motivation ratio. Suppose the words 菜刀 (kitchen knife), 頂板 (chopping board), 油鍋 (wok), 五花肉 (streaky pork) and 香料 (spices) appear in a text, and the motivation map index returns the value 烹飪 (cooking) for many of them. From these ratios the computer obtains a good deal of latent information, and reader and computer alike can tell at once that the article describes a cooking process.

2 Environment ratio. Suppose the same words appear and the environment map index returns values such as 廚房 (kitchen) and 酒店 (hotel) for many of them; a canteen, cafeteria, pizzeria, rotisserie or restaurant would serve equally. Reader and computer can then tell that the article describes a hotel chef's cooking process.

3 Literary ratio. Suppose the same words appear and most of them are nouns. Reader and computer can then tell that the article is a technical piece about a hotel chef's cooking process.

Described by Yaoguang Luo
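The ratio itself is a simple proportion, which the sketch below computes over a list of segmented tokens. The class `PosRatioSketch` and the label map passed to it are assumptions for illustration; the same function works for a motivation or environment map if that map is passed instead of a POS map.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: share of tokens whose label contains a given tag.
final class PosRatioSketch {
    /** Returns hits / tokens, or 0.0 for an empty token list. */
    static double ratio(List<String> tokens, Map<String, String> labelOf, String tag) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        long hits = tokens.stream()
                .filter(token -> labelOf.getOrDefault(token, "").contains(tag))
                .count();
        return (double) hits / tokens.size();
    }
}
```

For the five kitchen words plus one verb, the noun ratio comes out well above the verb ratio, which is the kind of signal the text uses to call an article technical rather than narrative.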


RNN

Deta's word-position convolution computation, RNN, mainly comprises the POS ratio operator, the word-distance ratio operator and the Euclidean entropy operator. These three operators are used to solve for the POS distance, the COVEX distance and the EUCLID distance. Within an article, these weighted distances make it possible to compute clearly how much each word is used, the value of its appearances, its frequency and its distribution. They are used to locate the centre of gravity of the main descriptive sentences of the text, and the result feeds the summing centre of the next step.

1 Word-position convolution computation (see p. 178)

2 Used to determine the centre of gravity of the text

Operator composition (see p. 18):

2.1.1 P POS: POS ratio

2.1.2 C CORRELATION: word-distance ratio

2.1.3 E E-DISTANCE: Euclidean entropy


On distance

Mr. Yaoguang Luo's view is that, once the centre of gravity of the main descriptive sentences has been located, the positional distances between words of different attributes and classes help to summarise the central idea of the article better. The examples continue.

Suppose the words 菜刀, 頂板, 油鍋, 五花肉 and 香料 appear in a text. If 五花肉 occurs very often, reader and computer can tell that this is a technical article about a hotel chef cooking meat. If instead 香料 occurs very often, they can tell that it is a technical article introducing how the chef uses spices.

Next, take one particular spice term, 品牌陳醋 (branded mature vinegar), in a 1,000-character article of five paragraphs. It appears in paragraphs 1, 2, 4 and 5, more than 30 times in all, 20 of them in paragraph 4. Here word distance raises the centre-of-gravity value of 品牌陳醋 and shows that, within this technical article on the use of spices, the concrete usage is described in paragraph 4.

The Euclidean entropy better traces the trajectory of the word-distance links of 品牌陳醋 and takes in the words at its margins. If the sequence in the text is 品牌陳醋 + 水餃 + 品牌陳醋 + 五花肉, then 水餃 (dumpling), although its RNN weight is low, gains weight in the DNN centre computation through the trajectory entropy of word distance. 五花肉 also gains weight because it comes at the end, where position counts for more. (The author once thought this design flawed: it came from a habit, formed while writing essays at ELS, of putting the conclusion last, on the personal view that the final paragraph is for summing up, which does not represent how everyone writes. Reconsidering on 2 April 2020, the author found the design still reasonable. In some writing styles an outlook states the central thesis at the start, the argument is then developed in parts, and a final conclusion sums up. The key words in the outlook collect a low RNN score, but their word distance becomes correspondingly large, so they still carry great weight in the final mean and the result does not easily stray from what is expected.)

Described by Yaoguang Luo
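The vinegar example can be made concrete with a much simpler stand-in for the distance operators: count a term per paragraph and report where it is densest. The class `TermGravitySketch` and its methods are assumptions for this illustration; they do not implement the Euclidean entropy or the word-distance ratio of the Deta RNN.

```java
import java.util.List;

// Hypothetical simplification: locate the paragraph where a term is concentrated.
final class TermGravitySketch {
    /** Occurrences of the term in each paragraph, in paragraph order. */
    static int[] countsPerParagraph(List<List<String>> paragraphs, String term) {
        int[] counts = new int[paragraphs.size()];
        for (int p = 0; p < paragraphs.size(); p++) {
            for (String token : paragraphs.get(p)) {
                if (token.equals(term)) {
                    counts[p]++;
                }
            }
        }
        return counts;
    }

    /** 1-based number of the paragraph with the most occurrences, or 0 if the term never occurs. */
    static int heaviestParagraph(int[] counts) {
        int best = 0;
        int bestCount = 0;
        for (int p = 0; p < counts.length; p++) {
            if (counts[p] > bestCount) {
                bestCount = counts[p];
                best = p + 1;
            }
        }
        return best;
    }
}
```

With 20 of the occurrences in the fourth paragraph, `heaviestParagraph` returns 4, matching the conclusion the text draws from the word-distance weights.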


DNN

Deta's word depth computation can be understood as the forward-order Cartesian convolution of Deta's POS convolution computation (ANN) with Deta's word-position convolution computation (RNN). Because its parameters are of two kinds, the central idea of the article (the ANN summing) and the centre-of-gravity word positions (the RNN position weights), it is suited to analysing and computing the value of the words that carry the core idea of an article, and is especially suitable for text-mining systems.

1 Word depth computation (see p. 183)

2 Used to determine the core of the text


Fast Chinese word segmentation for DNN association on large texts that combine Chinese and Western medicine

Extended applications of DNN association

DNN association can be extended in many ways; the author gives one valuable pairing as an example. Divide red into four levels on the 255-step colour scale: faint red, light red, medium red and deep red. Then classify words by their DNN scores and represent them with these four shades. For the input 香蕉 和 蘋果 都 是 水果 (bananas and apples are both fruit), suppose the DNN computation scores 香蕉 at 30, 蘋果 at 40 and 水果 at 50. All three are nouns, so red alone is enough: 水果 is rendered deep red, 蘋果 medium red and 香蕉 light red, leaving the remaining shade for other nouns. Likewise adjectives can be marked in purple (deep, medium, light), verbs in yellow (deep, medium, light), further categories in green, and so on. This shows how flexibly the Deta DNN can be applied. Since this belongs to industrial application, the author omits the details here. Defined by Yaoguang Luo
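The mapping from DNN score to shade can be sketched as a small threshold function. The thresholds below are assumptions chosen only so that the scores in the example (30, 40, 50) land on the shades named in the text; the source does not state how the four levels are bounded.

```java
// Hypothetical sketch: map a DNN score to one of four shade levels.
final class DnnShadeSketch {
    /** Thresholds are illustrative, picked to reproduce the 香蕉 / 蘋果 / 水果 example. */
    static String shade(int score) {
        if (score >= 50) {
            return "deep";
        }
        if (score >= 40) {
            return "medium";
        }
        if (score >= 30) {
            return "light";
        }
        return "faint";
    }
}
```

The hue (red for nouns, purple for adjectives, yellow for verbs) would be chosen from the word's POS, and the shade from this function.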


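2.1 Depth computation (ANN summing kernel -> RNN PCE) (refer to page 18)

To make engineering use easier, I will describe this in simple words, starting from the figure above. A data-algorithm engineer with some experience will find it easy to follow. Newcomers need not worry either, because the real difficulty is only in how the concepts are described.

Deta's DNN is a kernel algorithm built on a pre-order comparison and cumulative integration process. Its prerequisites are the final result set of the ANN and the convolution kernel of the RNN as a reference. The ANN is fairly basic, but basic as it is, it is very strong in application, and two-dimensional data can never do without it. Running the ANN over an article's vocabulary yields some general-purpose sets of information, such as the article's sensitivity, its ideas, the author's mental state and motivation, and the multilingual environment the author was in at the time. Why can these be obtained? The reason is easy to grasp: counts of commendatory and derogatory words, the proportions of the different parts of speech in the article, guesses at shifted word senses, and the classification and extension of nouns are all very simple information that needs only ordinary processing.

The RNN kernel matrix is more troublesome. Deta's RNN kernel matrix has three main dimensions: the part-of-speech statistics, the frequency of identical words together with the Euclidean-distance center of gravity of their occurrences in the article, and slope correlations, among others. Rigorous formulas are needed here to derive the kernel.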
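Two of the kernel inputs named above, word frequency and the center of gravity of a word's occurrences, can be computed in a single pass over the segmented tokens. The sketch below is only an illustration: the class `WordPositionStats` is hypothetical, and reading "center of gravity" as the mean token index of a word's occurrences is an assumption, not the author's stated formula.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WordPositionStats {
    // Per-word accumulator: how often the word occurs and where.
    static final class Stat {
        int frequency;
        double positionSum;

        // Assumption: the center of gravity is the mean token index of the occurrences.
        double centroid() {
            return positionSum / frequency;
        }
    }

    // Single pass over the token sequence, keyed by word, in first-seen order.
    static Map<String, Stat> collect(List<String> tokens) {
        Map<String, Stat> stats = new LinkedHashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            Stat s = stats.computeIfAbsent(tokens.get(i), k -> new Stat());
            s.frequency++;
            s.positionSum += i;
        }
        return stats;
    }
}
```

A word that occurs at token positions 0, 2 and 4 gets frequency 3 and centroid 2.0. The part-of-speech statistics and slope correlations would be further columns alongside these two.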

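With the ANN's final data set and the RNN's convolution kernel in hand, we can run the CNN polling pass. Deta's DNN computation is defined as follows: take the final one-dimensional ratio sequence derived from Deta's ANN matrix data, then convolve it with Deta's RNN kernel in a three-layer-deep, pre-order, cumulative-integral, probability-ratio CNN polling pass. (In pursuit of higher quality and precision, readers are free to rewrite the ideas and source code of my work and to add more dimensions. It is permanently open source, so there is no need to worry about copyright; if those who receive it later publish through a publishing house, however, they should take care when quoting the relevant text and content. The current open-source license is GPL 2.0; it was previously Apache 2.0.)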
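One way to picture "pre-order cumulative" processing is a running sum in which each word's ANN ratio is multiplied by its three kernel values and then added to everything accumulated from the words before it. The sketch below shows only that generic pattern. It is not the Deta DNN formula, which is given in the author's figure and in DETA_DNN.java; the class name `PrefixAccumulation`, the multiplicative combination and the array layout are all assumptions.

```java
final class PrefixAccumulation {
    // ann[i]: the one-dimensional ANN ratio of word i.
    // kernel[i]: three kernel values for word i (one per assumed kernel dimension).
    static double[] score(double[] ann, double[][] kernel) {
        if (ann.length != kernel.length) {
            throw new IllegalArgumentException("ann and kernel must have the same length");
        }
        double[] out = new double[ann.length];
        double running = 0.0; // carried forward from all earlier words
        for (int i = 0; i < ann.length; i++) {
            double layered = ann[i];
            for (int layer = 0; layer < 3; layer++) {
                layered *= kernel[i][layer]; // apply one kernel dimension per layer
            }
            running += layered;
            out[i] = running; // prefix sum up to and including word i
        }
        return out;
    }
}
```

For `ann = {1, 2}` and `kernel = {{1, 2, 3}, {0.5, 1, 1}}` the per-word products are 6 and 1, so the prefix sums are 6 and 7.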

The above covers the formulas, environment, principles and initial process of the ANN, RNN and CNN. The computation algorithm of the Deta DNN itself is listed in the figure.

The core part of the implementation code for this algorithm is at:

https://github.com/yaoguangluo/Data_Processor/blob/master/DP/NLPProcessor/DETA_DNN.java


Deta DNN (ANN summing kernel -> RNN PCE)

The figure above shows the foundation of the method: a pre-order matching pass with cumulative integration, built on the ANN summing result and the RNN convolution kernel. Both the ANN summing kernel and the RNN kernel fall within the domain of CNN convolution kernels. For the definitions of the Deta ANN and RNN, see their original pages. For the invention of the CNN, the author refers to Yann LeCun.

The author, Yaoguang Luo, will polish the grammar later.


Turing machine

1 Literary analysis (refer to page 168)

How the suggested words for Environment, Motivation Association, Tendency Exploration and Decision Discovery in the figure are produced.

The suggested words in Deta literary analysis come from a corpus lexicon held in a map (the "Orthos map"). Before any article is segmented, the corpus vocabulary is loaded into the map and indexed as a preprocessing step. When an article is submitted for analysis, the Deta parser segments it, and each resulting word is looked up by key in the preprocessed map index, one after another; the matched values are then counted and displayed. For example, the words "addicted" and "craving for tobacco" in the figure match "chemistry" in the map, so "chemistry" appears in the Environment row. The other rows work the same way.

As for why the rows are Environment, Motivation, Tendency and Decision: from the start the author wanted a broadly general rule for describing this component, and so went back to the basic word classes of noun, verb and adjective. The author holds that nouns can carry descriptions of the environment, verbs express the features of motivation, and adjectives convey emotion. These particular pairings explain the thinking behind an article well. Described by Luo Yaoguang (羅瑤光).
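The lookup described here, a preloaded map index queried once per segmented word, can be sketched as follows. The class `SuggestionIndex`, its method names and the sample entries are hypothetical; the real lexicon and its row definitions belong to the author's corpus.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SuggestionIndex {
    // One word -> suggested label map per row (Environment, Motivation, ...).
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();

    // Preprocessing step: load one lexicon entry into the index of a row.
    void put(String row, String word, String label) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(word, label);
    }

    // For each row, count how often each label is hit by the segmented words.
    Map<String, Map<String, Integer>> analyze(List<String> words) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (String word : words) {
                String label = row.getValue().get(word);
                if (label != null) {
                    counts.merge(label, 1, Integer::sum);
                }
            }
            result.put(row.getKey(), counts);
        }
        return result;
    }

    public static void main(String[] args) {
        SuggestionIndex index = new SuggestionIndex();
        index.put("environment", "addicted", "chemistry");
        index.put("environment", "tobacco craving", "chemistry");
        System.out.println(index.analyze(Arrays.asList("he", "addicted", "tobacco craving")));
    }
}
```

Each row is an independent map, so adding a Tendency or Decision row only means loading another lexicon; the segmentation pass is unchanged.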

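The author, Yaoguang Luo, will polish the grammar later.

Deta literary analysis is mainly used to analyze and mine the ideas in an article, for example to identify the setting of its multilingual awareness, the environment at the time, the motivation, the ideological tendency and the way decisions are expressed. (Multilingual awareness: analyzing the humanistic sentiment and popular thinking of a period through the characters' ways of speaking, language features, scene patterns and similar factors, so as to understand the national customs, social architecture and historical background of the era. The author met the term "multilingual awareness" when citing Ma Hailiang's work on humanistic architecture, and Bai Yufang asked the author at the time to state the source of the term. Taught by the author's supervisor, Bai Yufang, 2007.)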

2 Work assessment (Portrait-assessment) (refer to page 167)

Deta work assessment can be understood as an assessment of educational level. It covers grammar, lexical processing, part-of-speech statistics, counts of technical terms, and counts by word length, such as three-character words, four-character idioms and five-character slang. Examples are the ratio of advanced vocabulary in a sentence, the ratio of four-character nouns, the ratio of adjectives, and the word length of each such word. These also help identify vocabulary typical of a higher educational level.

(The idea first occurred to the author in 2009, while working at Zhang Xinjie's in Shanghai on the French e-mail project for ESIEE Amiens in France, when Professor Pascal explained FLECH, an analysis of French words by vowel weight. The design of the present project grew out of that inspiration. Deta Turing word segmentation contains no single-word analysis and no analysis of non-Chinese languages, and involves none of the ideas or logic of FLECH, which is why it has never been cited. The author holds the complete authorship rights and copyright.)
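The word-length statistics mentioned above reduce to a histogram normalized by the number of tokens. A minimal sketch, assuming the text has already been segmented; the class `WordLengthProfile` is hypothetical, and lengths are counted in Unicode code points so that one Chinese character counts as one.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordLengthProfile {
    // Returns, for each word length that occurs, its share of all tokens.
    static Map<Integer, Double> lengthRatios(List<String> tokens) {
        Map<Integer, Double> ratios = new TreeMap<>();
        if (tokens.isEmpty()) {
            return ratios;
        }
        for (String token : tokens) {
            int length = token.codePointCount(0, token.length());
            ratios.merge(length, 1.0, Double::sum); // raw count per length
        }
        // Turn the raw counts into shares of the whole token list.
        ratios.replaceAll((length, count) -> count / tokens.size());
        return ratios;
    }
}
```

The share at key 4 is then the four-character ratio and the share at key 3 the three-character ratio. Restricting the input to one sentence gives the per-sentence figures.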

3 Motivation analysis (refer to page 169)

Deta motivation analysis (Motivation-assessment) decides on a tendency by matching map keys against a motivation dictionary. It is fairly simple. Because the dictionary definitions carry the author's own subjective way of thinking, there is not much more to describe.

4 Emotion analysis (refer to page 159)

Deta emotion analysis (Emotion-assessment) decides on a sentiment by matching map keys against dictionaries of commendatory, derogatory and neutral words. It is fairly simple. Because the dictionary definitions carry the author's own subjective way of thinking, there is not much more to describe.
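Dictionary-based sentiment of this kind amounts to a tally of map hits followed by a decision rule. In the sketch below the class `SentimentTally`, the enum names and the rule "more commendatory than derogatory hits means commendatory, a tie means neutral" are assumptions; the source does not state how the decision is made.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class SentimentTally {
    enum Polarity { COMMENDATORY, DEROGATORY, NEUTRAL }

    // Counts dictionary hits per polarity; words missing from the dictionary are ignored.
    static Map<Polarity, Integer> tally(List<String> words, Map<String, Polarity> dictionary) {
        Map<Polarity, Integer> counts = new EnumMap<>(Polarity.class);
        for (Polarity p : Polarity.values()) {
            counts.put(p, 0);
        }
        for (String word : words) {
            Polarity p = dictionary.get(word);
            if (p != null) {
                counts.merge(p, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Assumed decision rule: compare commendatory and derogatory hits; a tie is NEUTRAL.
    static Polarity decide(Map<Polarity, Integer> counts) {
        int positive = counts.get(Polarity.COMMENDATORY);
        int negative = counts.get(Polarity.DEROGATORY);
        if (positive > negative) {
            return Polarity.COMMENDATORY;
        }
        if (negative > positive) {
            return Polarity.DEROGATORY;
        }
        return Polarity.NEUTRAL;
    }
}
```

The motivation analysis in the previous item follows the same pattern, with a motivation dictionary in place of the polarity dictionary.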

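5 Habit analysis (refer to page 169)

Deta habit analysis (Habit-assessment) makes a comprehensive decision across the other assessments. It uses the whole-text weights of commendatory, derogatory and neutral words, motivation words, literary-analysis data, work-assessment ratios, educational level and similar data to determine a person's writing characteristics, writing habits and writing style. Because the dictionary definitions carry the author's own subjective way of thinking, there is not much more to describe.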

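6 Educational level assessment (refer to page 168)

Deta educational level assessment (Education-assessment) looks at the valid words in an article (for example, words longer than two characters) that are also valuable words (for example nouns, verbs, adjectives, predicates and adverbials). It uses their ratios over the whole text, over each sentence and against the other POS classes to determine the syntactic character of the article. A simple example: an article whose sentences carry a high proportion of valid, valuable adjectives usually indicates an author who is strong at analytical expression and at embellishing prose. The idea comes from the author's junior-high Chinese classes.

In industrial use, education assessment serves to characterize an essay: its value as a reference, the writer's ability to state and narrate, and tutoring assessment. It can draw on all of the assessments above and combine them into one final summary that reflects educational level.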
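The "valid and valuable" filter and the resulting POS ratio can be sketched in a few lines. The `PosRatio` class, its `Token` type, the tag strings and the `minLength` parameter are hypothetical names; the set of valuable tags below is a reduced stand-in for the classes listed above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PosRatio {
    // A segmented word with its part-of-speech tag.
    static final class Token {
        final String text;
        final String pos;

        Token(String text, String pos) {
            this.text = text;
            this.pos = pos;
        }
    }

    // Assumed set of "valuable" POS tags.
    static final Set<String> VALUABLE = new HashSet<>(Arrays.asList("noun", "verb", "adjective"));

    // Valid: at least minLength characters. Valuable: tagged with one of the POS classes above.
    static boolean isValidValuable(Token token, int minLength) {
        int length = token.text.codePointCount(0, token.text.length());
        return length >= minLength && VALUABLE.contains(token.pos);
    }

    // Share of all tokens that are valid, valuable and carry the given POS tag.
    static double ratio(List<Token> tokens, String pos, int minLength) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        long hits = tokens.stream()
                .filter(t -> isValidValuable(t, minLength) && t.pos.equals(pos))
                .count();
        return (double) hits / tokens.size();
    }
}
```

Passing `minLength = 3` corresponds to the "longer than two characters" example. Calling `ratio` once per sentence and once for the whole text gives the two levels of comparison the paragraph describes.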

The author, Yaoguang Luo, will polish the grammar later.

A description of the essay-tutoring capability


Once the ratios, weights and mappings had been worked out, the author extended them into a deeper analysis. Its purpose is to recognize quickly a writer's personal habits, writing style, emotional state and educational level. The corpus, parts of speech and definitions cover POS, distance, SVO (subject-verb-object), word length and weight, and from these the NLP computation can readily assess educational level. In the figure above, for example, the input string contains only a few adjectives, so its prose score is low, at 0.1578, while the scores lean clearly towards argumentation and statement, at 0.47368 and 0.40983 respectively. The Deta NLP algorithms can be optimized further by adding components such as filters, decision rules, definitions, PCA and article types.

Applications

High-speed Chinese search


High-speed Chinese search is currently used throughout the main search-engine component of YangLiaoJing (YLJ). Its distinguishing feature is how it processes the two sides of a search: the search content and the search target.

The search content is generally handled by word segmentation and statistics, chiefly its keywords, word frequency, word length and part of speech, and is finally packaged into a map index for convenient calls. The search target is a text file, string document or DNN stream. The text is segmented with high-speed segmentation or DNN segmentation, and modules such as POS (part of speech), NLP (natural language processing) and PCA (principal component analysis) then compute match scores against the formatted data of the search content. The results are wrapped, sorted and output.

Some fixed intermediate variables in this process can be tuned for precision, so that the engine suits different industrial scenarios and adapts its output. Defined by Luo Yaoguang.
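The pipeline above, index the query, score each segmented document against the index, sort, can be sketched as follows. The class `SimpleSearch`, the weight "frequency times word length" and the `minWeight` threshold (standing in for a tunable intermediate variable) are assumptions for illustration; the POS, NLP and PCA scoring modules of the real engine are not reproduced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SimpleSearch {
    // Query index: term -> weight. Assumed weight: word length summed over occurrences.
    static Map<String, Double> indexQuery(List<String> queryTokens) {
        Map<String, Double> index = new HashMap<>();
        for (String term : queryTokens) {
            double length = term.codePointCount(0, term.length());
            index.merge(term, length, Double::sum);
        }
        return index;
    }

    // Sums the weights of all document tokens found in the query index.
    static double score(List<String> docTokens, Map<String, Double> queryIndex, double minWeight) {
        double total = 0.0;
        for (String token : docTokens) {
            Double weight = queryIndex.get(token);
            if (weight != null && weight >= minWeight) {
                total += weight;
            }
        }
        return total;
    }

    // Returns document names ordered from the highest score to the lowest.
    static List<String> rank(Map<String, List<String>> docs, List<String> queryTokens, double minWeight) {
        Map<String, Double> index = indexQuery(queryTokens);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            scores.put(doc.getKey(), score(doc.getValue(), index, minWeight));
        }
        List<String> names = new ArrayList<>(docs.keySet());
        names.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return names;
    }
}
```

Raising `minWeight` drops short query terms from the scoring, which is one simple way to trade recall for precision.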

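The author, Yaoguang Luo, will polish the grammar later.

Reflections

2021-11-12: I have been thinking about one key question. Why did my Deta word segmenter drop from nearly 23 million words per second to a little over 16 million once it was rewritten in the internationally certified coding format that Sonar enforces? It shows one thing: Sonar's formatting improves readability for the human eye, not speed of processing by the computer. That is why Yuanji (元基) index encoding is the trend.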




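Copyright documents involved:

1. Luo Yaoguang. "Deta Natural Language Turing System V10.6.1". National Copyright Administration of the People's Republic of China, Software Copyright Registration No. 3951366. 2019.

2. Luo Yaoguang. "Java Data Analysis Algorithm Engine System V1.0.0". National Copyright Administration of the People's Republic of China, Software Copyright Registration No. 4584594. 2014.

3. Luo Yaoguang, Luo Rongwu. "Humanoid DNA and Neurons: an Encoding Method Based on Catalytic Operator Mapping V_1.2.2". National Copyright Administration of the People's Republic of China, Guo Zuo Deng Zi-2021-A-00097017. 2021.

4. Luo Yaoguang, Luo Rongwu. "DNA Yuanji Catalysis and Peptide Computing, Volume 2: YangLiaoJing Application Research 20210305". National Copyright Administration of the People's Republic of China, Guo Zuo Deng Zi-2021-L-00103660. 2021.

5. Luo Yaoguang, Luo Rongwu. "DNA Yuanji Catalysis and Peptide Computing, Third Revised Edition V039010912". National Copyright Administration of the People's Republic of China, Guo Zuo Deng Zi-2021-L-00268255. 2021.

6. Luo Yaoguang. "DNA Yuanji Index ETL Chinese Script Compiler V0.0.2". National Copyright Administration of the People's Republic of China, SD-2021R11L2844054. 2021. (Registration No. 2022SR0011067) Software Copyright Registration No. 8965266.

7. "The DNA computing ideas of humanoid data life". GitHub [cited 2020-03-05]. https://github.com/yaoguangluo/Deta_Resource

8. Luo Yaoguang, Luo Rongwu. "DNA Yuanji Catalysis and Peptide Computing, Fourth Revised Edition V00919". National Copyright Administration of the People's Republic of China, SD-2022Z11L0025809. 2022. Registration No. Guo Zuo Deng Zi-2022-L-10071310.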

File resources

1 Jar: https://github.com/yaoguangluo/ChromosomeDNA/blob/main/BloomChromosome_V19001_20220108.jar

2 UML: DNA Yuanji Catalysis and Peptide Computing, Fourth Revised Edition V00919

3 PPT: https://github.com/yaoguangluo/ChromosomeDNA/tree/main/ppt

4 Book: "DNA Yuanji Catalysis and Peptide Computing, Fourth Revised Edition V00919", volumes 1 and 2

https://github.com/yaoguangluo/ChromosomeDNA/tree/main/元基催化與肽計(jì)算第四修訂版本整理

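5 Where the functions are stored in Git: Demos

GitHub: https://github.com/yaoguangluo/ChromosomeDNA/

Coding: public repository

Bitbucket: Bitbucket

Gitee: Liuyang Deta Software Development Co., Ltd. GPL 2.0 open-source big data project (DetaChina) - Gitee.com

6 Other resource links:

Zhihu: DNA Yuanji Catalysis and Peptide Computing, Fourth Revised Edition

CSDN: DNA Yuanji Catalysis and Peptide Computing UML collection, Luo Yaoguang 19850525's blog - CSDN Blog

CSDN: DNA Yuanji Catalysis and Peptide Computing, Fourth Revised Edition V00919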
