4) Bidirectional maximum matching method (two scans: one from left to right, one from right to left)
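A minimal sketch of the two-scan idea in Python. The article does not say how the two results are reconciled, so the tie-breaking rule here (fewer words, then fewer single-character words, otherwise the right-to-left result) is an assumption, as are the tiny vocabulary `VOCAB` and the function names.

```python
# Hypothetical toy vocabulary; a real segmenter would load a large dictionary.
VOCAB = {"研究", "研究生", "生命", "的", "起源"}


def max_match(text, vocab, reverse=False):
    """Greedy maximum matching, left to right or (reverse=True) right to left."""
    max_len = max((len(w) for w in vocab), default=1)
    words = []
    rest = text
    while rest:
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(rest)), 0, -1):
            piece = rest[-size:] if reverse else rest[:size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in vocab:
                words.append(piece)
                rest = rest[:-size] if reverse else rest[size:]
                break
    return words[::-1] if reverse else words


def bidirectional_max_match(text, vocab):
    """Run both scans and keep one result (selection rule is an assumption)."""
    forward = max_match(text, vocab)
    backward = max_match(text, vocab, reverse=True)
    if forward == backward:
        return forward

    def cost(words):
        # Fewer words first, then fewer single-character words.
        return (len(words), sum(1 for w in words if len(w) == 1))

    return forward if cost(forward) < cost(backward) else backward


print(bidirectional_max_match("研究生命的起源", VOCAB))
# ['研究', '生命', '的', '起源']
```

Here the two scans disagree: the left-to-right scan grabs "研究生" and leaves a stray "命", while the right-to-left scan produces fewer single-character words and is kept.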
1. Segmentation method based on understanding
4. Recognition of words
The identification of new words
To improve segmentation accuracy, marker-based segmentation and feature scanning were also introduced. Marker-based segmentation uses word markers to mark breakpoints, so the original string is divided into smaller strings that are then passed to mechanical word segmentation. Feature scanning helps combine segmentation with lexical category labeling: rich lexical category information informs the segmentation decisions, and the labeling process in turn checks and adjusts the segmentation results, which greatly improves segmentation accuracy.
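The breakpoint step described above can be sketched as follows. The article does not list which markers are used, so treating common Chinese punctuation as the markers is an assumption, and `MARKERS` and `split_on_markers` are hypothetical names.

```python
import re

# Assumed marker set: common Chinese punctuation. The article does not
# specify the actual markers, so this list is illustrative only.
MARKERS = "，。！？；、"


def split_on_markers(text, markers=MARKERS):
    """Cut the text at every marker and drop empty pieces."""
    pattern = "[" + re.escape(markers) + "]"
    return [part for part in re.split(pattern, text) if part]


print(split_on_markers("我爱上海，上海很大。"))
# ['我爱上海', '上海很大']
```

Each shorter piece would then be handed to a mechanical (dictionary-based) segmenter, which has fewer characters and fewer ambiguous boundaries to deal with.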
Two. Changes in the Chinese segmentation principle
As search engine algorithms change, idsem team member Wang Kejiang explains the principle of Chinese word segmentation and how love Shanghai segments Chinese, as follows:
New word recognition mainly concerns professional terminology and named entities such as place names, organization names, and trademarks, which love Shanghai keeps in a proprietary thesaurus.
3. Segmentation method based on statistics
There are two statistics-based segmentation methods: mutual-information statistical segmentation and machine-learning statistical segmentation.
Mutual-information statistical segmentation: after removing noise, count how often each character appears and how often characters appear next to each other, then form words from adjacent characters according to those frequencies.
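As a rough illustration of the mutual-information idea, the sketch below scores each adjacent character pair with pointwise mutual information, log2(P(ab) / (P(a) P(b))). The `min_count` threshold stands in for the noise-removal step, and the three-line corpus, the threshold value, and the function name are all assumptions made for the example.

```python
import math
from collections import Counter


def adjacent_pmi(corpus, min_count=2):
    """Score adjacent character pairs by pointwise mutual information.

    Pairs seen fewer than min_count times are treated as noise and skipped.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for line in corpus:
        char_counts.update(line)
        pair_counts.update(zip(line, line[1:]))
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (a, b), n in pair_counts.items():
        if n < min_count:
            continue  # too rare to trust
        p_ab = n / total_pairs
        p_a = char_counts[a] / total_chars
        p_b = char_counts[b] / total_chars
        scores[a + b] = math.log2(p_ab / (p_a * p_b))
    return scores


# Hypothetical toy corpus.
corpus = ["上海很大", "我爱上海", "上海的海"]
print(adjacent_pmi(corpus))
```

With this corpus only "上海" occurs often enough to pass the threshold; a pair with a high score is a candidate word, and in practice the threshold and the corpus would both be far larger.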
This includes intersection ambiguity and combination ambiguity; for details, you can refer to the encyclopedia entry on the term.
1) Maximum matching method (scanning from left to right)
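A minimal Python sketch of left-to-right maximum matching, assuming a small in-memory vocabulary. `VOCAB` and `forward_max_match` are hypothetical names, and the fallback to a single character when nothing matches is a common convention rather than something the article specifies.

```python
# Hypothetical toy vocabulary; a real segmenter would load a large dictionary.
VOCAB = {"研究", "研究生", "生命", "的", "起源"}


def forward_max_match(text, vocab):
    """Scan left to right, always taking the longest vocabulary word."""
    max_len = max((len(w) for w in vocab), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


print(forward_max_match("研究生命的起源", VOCAB))
# ['研究生', '命', '的', '起源']
```

The output shows the typical weakness of a greedy left-to-right scan: taking "研究生" first leaves "命" stranded as a single character.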
One. An explanation of the Chinese segmentation principle
The understanding-based segmentation method has the machine simulate how a person understands a sentence. Drawing on language knowledge and a thesaurus, the machine uses overall control, control mechanisms, and segmentation control to simulate how a person reads web information. It can be understood as the machine simulating human segmentation.
3) Minimum segmentation method (cut each sentence into the fewest possible words)
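One way to realize "fewest possible words" is dynamic programming over end positions; the sketch below is written under that assumption and is not taken from the article. `VOCAB` and `min_word_segment` are hypothetical names, and single characters are always allowed so that every string can be covered.

```python
# Hypothetical toy vocabulary; a real segmenter would load a large dictionary.
VOCAB = {"研究", "研究生", "生命", "的", "起源"}


def min_word_segment(text, vocab):
    """Split text into the fewest pieces, each a vocabulary word or one character."""
    n = len(text)
    max_len = max((len(w) for w in vocab), default=1)
    # best[i]: fewest pieces covering text[:i]; back[i]: start of the last piece.
    best = [0] + [n + 1] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            allowed = end - start == 1 or piece in vocab
            if allowed and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the pieces.
    words = []
    end = n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


result = min_word_segment("研究生命的起源", VOCAB)
print(len(result), result)
```

Unlike the greedy scans, this looks at every way of cutting the sentence, so it cannot be misled by an early long match; when several cuts tie on word count, it keeps the first one it finds.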
Ambiguity
2. Segmentation method based on string matching
The Chinese encyclopedia entry is not reproduced here.
Machine-learning statistical segmentation: given a large amount of already segmented text, a statistical machine learning model learns the segmentation rules and applies them to unseen text; the model can also be trained further for statistical segmentation.
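The article does not name a particular model, so the sketch below swaps in one of the simplest statistical models as a stand-in: a unigram word model estimated from segmented sentences, decoded with a Viterbi-style search for the most probable split. The training data, the `unknown_logp` penalty, and the function names are all assumptions.

```python
import math
from collections import Counter


def train_unigram(segmented_sentences):
    """Estimate log-probabilities of words from already segmented sentences."""
    counts = Counter(w for sent in segmented_sentences for w in sent)
    total = sum(counts.values())
    return {w: math.log(c / total) for w, c in counts.items()}


def segment(text, logp, unknown_logp=-20.0):
    """Return the split of text with the highest total log-probability."""
    n = len(text)
    max_len = max((len(w) for w in logp), default=1)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in logp:
                score = best[start] + logp[piece]
            elif end - start == 1:
                # Unseen single character: allowed, but heavily penalized.
                score = best[start] + unknown_logp
            else:
                continue
            if score > best[end]:
                best[end] = score
                back[end] = start
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


# Hypothetical segmented training data.
training = [["研究", "生命", "的", "起源"], ["研究生", "的", "生命"], ["研究", "起源"]]
model = train_unigram(training)
print(segment("研究生命的起源", model))
# ['研究', '生命', '的', '起源']
```

Because "命" never appears as a word in the training data, the split that starts with "研究生" pays the unknown-character penalty and loses to the split that uses only seen words.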
With Chinese word segmentation explained above, how does love Shanghai segment Chinese? The following specific examples give the details.
2) Reverse maximum matching method (scanning from right to left)
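A minimal Python sketch of the right-to-left scan, assuming a small in-memory vocabulary. `VOCAB` and `reverse_max_match` are hypothetical names, and accepting a single character when nothing matches is a common convention rather than something the article specifies.

```python
# Hypothetical toy vocabulary; a real segmenter would load a large dictionary.
VOCAB = {"研究", "研究生", "生命", "的", "起源"}


def reverse_max_match(text, vocab):
    """Scan right to left, always taking the longest vocabulary word."""
    max_len = max((len(w) for w in vocab), default=1)
    words = []
    end = len(text)
    while end > 0:
        # Try the longest window ending at `end`, shrinking from the left.
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in vocab:
                words.append(piece)
                end -= size
                break
    words.reverse()  # pieces were collected from the end of the text
    return words


print(reverse_max_match("研究生命的起源", VOCAB))
# ['研究', '生命', '的', '起源']
```

On this sentence the right-to-left scan keeps "生命" intact, because it reaches that word before it ever considers "研究生".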
String-matching segmentation scans the text and compares it against a vocabulary. According to the scan direction and the matching strategy, it is divided into four types: