Does FTS support Chinese, if not, how to make it support?

Peter_Liang · November 21, 2019, 10:03am

I find that FTP plugin seems not support Chinese, am I correct?
How can I make it support?

lugreen · November 21, 2019, 11:08am

Hi,
I think that it’s possible to achieve that point ,If you are familiar with Lucence .
See the source code of FTS addon:

You can implement the IndexWriterProvider and IndexSearcherProvider to process Chinese.

Peter_Liang · November 25, 2019, 3:00pm

Thanks @lugreen. I thought it an OOTB feature which can be enabled simply change a configuration. It looks like it isn’t.

lugreen · November 26, 2019, 1:45am

Hi @Peter_Liang,
In fact,it’s easy to support Chinese .

Create a IndexWriterProvider that extends from IndexWriterProviderBean , create an Analyzer that supports Chinese. The well known Analyzer for Chinese is IKAnalyzer. Though IKAnalyzer has no supports from original author , but there are some community edition can choose, i think GitHub - blueshen/ik-analyzer: Tokenizer support Lucene5/6/7/8/9+ version, LTS is a good choice.
Override the built-in IndexWriterProviderBean , see 扩展业务逻辑 - CUBA 框架开发者手册 to find the way.

Peter_Liang · November 27, 2019, 2:55pm

Thanks @lugreen. Yes, it is quite easy!! I used the same way you described. But I am SmartChineseAnalyzer instead of IKAnalyzer.

I added the smartcn dependencies and use SmartChineseAnalyzer as the Analyzer which supports both Chinese and English.

To use SmartChineseAnalyzer, I imported the lib in build.gradle.

-------------------build.gradle-------------------

configure(globalModule) {
dependencies {
...
    runtime 'org.apache.lucene:lucene-analyzers-smartcn:7.5.0'
    compile 'org.apache.lucene:lucene-analyzers-smartcn:7.5.0'
}
...

Then I extended IndexWriterProviderBean, overriding the createAnalyzer() method as follow.

public class IndexWriterProviderBean extends com.haulmont.fts.core.sys.IndexWriterProviderBean {

    @Override
    protected Analyzer createAnalyzer() {
        Map<String, Analyzer> analyzerPerField = new HashMap<>();
        analyzerPerField.put(FLD_LINKS, new WhitespaceAnalyzer());
       analyzerPerField.put(FLD_MORPHOLOGY_ALL, new SmartChineseAnalyzer());
       return new PerFieldAnalyzerWrapper(new EntityAttributeAnalyzer(), analyzerPerField);
    }
}