Comments (2)

Ayanaminn avatar Ayanaminn commented on August 16, 2024

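Thanks, this is a practical feature. Thinking it over, though, splitting on spaces has a few more cases to consider in order to avoid pointless splits: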

  1. Lines that contain English, such as

Birthday Liveについて話そうかなと思います

Here, splitting purely on the space is counterproductive. This line should not be split at all.

  2. Short phrases or single words, such as

じゃあ行きまーす せーの ふふ はい はい あー全然ダメでした

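This line has many interjections, single characters, and short phrases. Splitting at every element is probably unnecessary and only adds work when adjusting the subtitle timing afterwards.
My idea is to offer a user-defined threshold here (or a fixed setting), for example a split-strength option. With strong splitting, every element is split; with weak splitting, a split happens only when the word after the space is longer than 5 characters (5 is a provisional choice; I think this length should filter out most short Japanese words).
The example above would then be split into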

じゃあ行きまーす せーの ふふ はい はい(adjust_required)
あー全然ダメでした(adjust_required)

That should be more practical.
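
The two cases above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in n46whisper: the function name `split_by_space`, the `strength` and `min_len` parameters, and the rule for the English case (keep the space when the characters on both sides are ASCII letters) are all hypothetical. The `(adjust_required)` marker is left out.

```python
from typing import List


def split_by_space(text: str, strength: str = "weak", min_len: int = 5) -> List[str]:
    """Split a subtitle line at spaces (hypothetical helper, not the project's API).

    strength="strong": split at every space.
    strength="weak":   split only when the word after the space is longer
                       than min_len characters.
    In both modes a space between two ASCII letters is kept, on the
    assumption that it separates English words.
    """
    tokens = text.split()  # also splits on full-width spaces (U+3000)
    if not tokens:
        return []

    lines = [tokens[0]]
    for token in tokens[1:]:
        prev = lines[-1]
        # Assumed rule for case 1: ASCII letter on both sides of the space.
        between_english = (
            prev[-1].isascii() and prev[-1].isalpha()
            and token[0].isascii() and token[0].isalpha()
        )
        if between_english:
            split_here = False
        elif strength == "strong":
            split_here = True
        else:
            split_here = len(token) > min_len

        if split_here:
            lines.append(token)
        else:
            # Merged pieces are rejoined with a plain ASCII space.
            lines[-1] = prev + " " + token
    return lines


if __name__ == "__main__":
    for line in split_by_space("じゃあ行きまーす せーの ふふ はい はい あー全然ダメでした"):
        print(line)
```

With the default weak setting, the second example yields the two lines shown above, and the `Birthday Live…` line is returned unsplit. Whether 5 is the right threshold would need checking against real transcripts.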

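I have a rough idea of how to implement these cases. How about you submit a PR to add the splitting feature first, and I patch it afterwards?
It also depends on whether you have other ideas.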

from n46whisper.

Lenshyuu227 avatar Lenshyuu227 commented on August 16, 2024

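The split-strength idea is great!
What I have now simply replaces every space with a line break; it does not account for over-fragmented splits or the English case.
I'll submit a PR in a couple of days, then.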

from n46whisper.
