
rime-dict-creator

A simple conversion script that quickly builds a Rime dictionary for the Luna Pinyin (朙月拼音) schema from plain text.

Usage

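First, create a plain-text file encoded in UTF-8. If your source is a Word document, press Ctrl+A and then Ctrl+C to copy its contents, paste them into Notepad, VS Code, or a similar editor, and save the file from there.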

Name the file your-dict-name.txt, choosing your own value for your-dict-name. For example, if you are building a legal dictionary, you might name it law.txt.
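If the generated dictionary comes out garbled, the input file may not actually be UTF-8: older versions of Windows Notepad save in the system's ANSI code page by default, which is GBK on Simplified Chinese systems. The minimal Python sketch below is not part of this project; the helper name `ensure_utf8` is hypothetical. It checks whether a file decodes as UTF-8 and, if not, re-saves it as UTF-8, assuming the original encoding is GBK. Pass a different `src_encoding` if your file uses something else.

```python
from pathlib import Path


def ensure_utf8(path: str, src_encoding: str = "gbk") -> bool:
    """Re-save `path` as UTF-8 if it is not already valid UTF-8.

    Returns True if the file was converted, False if it was left untouched.
    `src_encoding` is an assumption about the file's original encoding.
    """
    p = Path(path)
    raw = p.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError again if the encoding guess is also wrong
        text = raw.decode(src_encoding)
        p.write_text(text, encoding="utf-8")
        return True
```

Run it once on your-dict-name.txt before calling convert.php. A second run returns False because the file is already UTF-8.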

# First clone the project with git, then run the following commands inside its directory
composer update
php convert.php {your-dict-name}.txt

The script extracts all Chinese text, segments it into words, removes duplicates, and finally writes a Rime dictionary file named luna_pinyin.{your-dict-name}.dict.yaml.
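The extract, deduplicate, and generate steps can be sketched in Python as follows. This illustrates the idea under stated assumptions and is not the project's convert.php. Contiguous runs of Chinese characters stand in for Jieba segmentation. A tiny hypothetical `TOY_PINYIN` table stands in for jifei/Pinyin. The header fields are an assumption modeled on the luna_pinyin.extended example in this README. The entry lines follow Rime's tab-separated `word<TAB>code` layout, where the code is space-separated pinyin syllables.

```python
import re

# Runs of CJK Unified Ideographs (basic block only); the real script's
# character class may differ.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")

# Hypothetical toy pinyin table; the real script uses jifei/Pinyin instead.
TOY_PINYIN = {"合": "he", "同": "tong", "法": "fa", "院": "yuan",
              "违": "wei", "约": "yue"}


def extract_words(text: str) -> list[str]:
    """Unique runs of Chinese characters, in first-seen order.

    Stand-in for segmentation: each contiguous run counts as one word,
    whereas the actual script segments with Jieba.
    """
    return list(dict.fromkeys(HAN_RUN.findall(text)))


def to_code(word: str) -> str:
    """Space-separated pinyin syllables, one per character."""
    return " ".join(TOY_PINYIN[ch] for ch in word)


def render_dict(name: str, words: list[str], version: str = "1.0") -> str:
    """Build the text of luna_pinyin.<name>.dict.yaml (assumed header)."""
    header = [
        "# Rime dictionary",
        "# encoding: utf-8",
        "",
        "---",
        f"name: luna_pinyin.{name}",
        f'version: "{version}"',
        "sort: by_weight",
        "...",
        "",
    ]
    # Each entry line: word <TAB> pinyin code
    entries = [f"{w}\t{to_code(w)}" for w in words]
    return "\n".join(header + entries) + "\n"


if __name__ == "__main__":
    words = extract_words("合同 法院，合同；违约 (contract law)")
    print(render_dict("law", words))
```

In this sketch, `dict.fromkeys` removes duplicates while keeping first-seen order, so 合同 appears only once in the output even though it occurs twice in the input.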

If you have not yet set up an extended dictionary, add the following to luna_pinyin_simp.custom.yaml:

patch:
  "translator/dictionary": luna_pinyin.extended

Then create a new file named luna_pinyin.extended.dict.yaml:

# Rime dictionary
# encoding: utf-8
# Luna Pinyin Extended Dictionary

---
name: luna_pinyin.extended      # dictionary name
version: "2022.01.25"
sort: by_weight                 # by_weight (sort by word frequency) or original (keep the order of the source table)
use_preset_vocabulary: true     # true or false: whether to import the preset vocabulary (八股文)

import_tables:
  - luna_pinyin.{your-dict-name}

If you have already set up an extended dictionary, simply add luna_pinyin.{your-dict-name} to the import_tables list of that dictionary.

Known issues

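  • Polyphonic characters (characters with more than one reading) are not handled well yet, because pinyin is extracted with a simple lookup via the jifei/Pinyin library.
  • If words are segmented incorrectly, you can tune the segmentation using the features provided by the PHP port of Jieba (结巴分词); the script currently uses the default segmentation mode.
  • The script is under 100 lines, so contributions from anyone with spare time are welcome.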
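To see why polyphonic characters are a problem, consider a naive lookup that gives every character a single default reading. The Python sketch below uses hypothetical toy tables (`CHAR_READING`, `PHRASE_READING`) that do not reflect jifei/Pinyin's data or internals. It also shows one common mitigation: match multi-character phrases from a phrase table first, using greedy longest match, and fall back to per-character readings only when no phrase matches. This technique is swapped in for illustration and is not what the script currently does.

```python
# Hypothetical toy data: one default reading per character...
CHAR_READING = {"银": "yin", "行": "xing", "走": "zou",
                "长": "chang", "城": "cheng", "市": "shi"}
# ...plus phrases whose readings override the per-character defaults.
PHRASE_READING = {"银行": "yin hang", "市长": "shi zhang"}


def naive(word: str) -> str:
    """Per-character lookup: always uses each character's default reading."""
    return " ".join(CHAR_READING[ch] for ch in word)


def phrase_aware(word: str) -> str:
    """Greedy longest match against PHRASE_READING, per-character fallback."""
    longest = max(map(len, PHRASE_READING))
    syllables = []
    i = 0
    while i < len(word):
        # Try phrase lengths from longest down to 2 at position i
        for size in range(min(longest, len(word) - i), 1, -1):
            chunk = word[i:i + size]
            if chunk in PHRASE_READING:
                syllables.append(PHRASE_READING[chunk])
                i += size
                break
        else:
            # No phrase matched: use the single-character default
            syllables.append(CHAR_READING[word[i]])
            i += 1
    return " ".join(syllables)
```

Here `naive("银行")` yields "yin xing", which is wrong, while `phrase_aware("银行")` yields "yin hang". Greedy matching can still choose a wrong split for ambiguous strings, so the quality of any such approach depends heavily on the phrase table.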

Contributors

crazywhalecc
