Comments (4)
Names and semantics should be as close to the Java version as possible.
(comment from Takaoka-san)
from sudachi.rs.
TL;DR
A nice API over multiple sentences is currently blocked in stable Rust by the in-progress GATs (generic associated types) feature; see also http://lukaskalbertodt.github.io/2018/08/03/solving-the-generalized-streaming-iterator-problem-without-gats.html.
What we want:
- An analyzer with internal mutable state that stores one sentence (and no more), for performance reasons
- An iterator over analyzed sentences that borrows the analyzer mutably, so that the next sentence cannot be accessed before the current one has been consumed
Problems:
- Current Rust iterators can't return items that borrow from `&self` without GATs, which would probably introduce a new-ish Iterator API as well
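To illustrate the problem, here is a minimal sketch of the lending ("streaming") iterator pattern that GATs enable; GATs have since been stabilized, in Rust 1.65. The trait `LendingIterator` and the type `SentenceAnalyzer` are hypothetical names, not sudachi.rs API, and whitespace splitting stands in for real morphological analysis.

```rust
// Hypothetical sketch of a lending (streaming) iterator built on GATs.
// `LendingIterator` and `SentenceAnalyzer` are illustrative names, not the
// sudachi.rs API; whitespace splitting stands in for real analysis.

/// Like `Iterator`, but each item may borrow from the iterator itself.
pub trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Holds the tokens of exactly one sentence in a reusable buffer.
pub struct SentenceAnalyzer<'t> {
    sentences: std::str::Split<'t, char>,
    tokens: Vec<&'t str>,
}

impl<'t> SentenceAnalyzer<'t> {
    pub fn new(text: &'t str) -> Self {
        SentenceAnalyzer {
            sentences: text.split('。'),
            tokens: Vec::new(),
        }
    }
}

impl<'t> LendingIterator for SentenceAnalyzer<'t> {
    // Items borrow the internal buffer, so they cannot outlive the next call.
    type Item<'a> = &'a [&'t str] where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        // Skip empty pieces, e.g. the one after a final '。'.
        let sentence = self.sentences.find(|s| !s.trim().is_empty())?;
        self.tokens.clear(); // overwrite the previous sentence's state
        self.tokens.extend(sentence.split_whitespace());
        Some(self.tokens.as_slice())
    }
}
```

With this shape, `while let Some(tokens) = it.next() { ... }` works, but a `for` loop does not, because the type is not a std `Iterator`. Keeping `tokens` alive across a second `it.next()` call is rejected by the borrow checker, which is exactly the "consume the current sentence first" guarantee wanted above.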
What to do:
- Provide a non-allocating API for:
  - sentence splitting, returning an Iterator of `&str`
  - sentence-based analysis
- Optionally, provide an allocating combined API that copies the needed information (mostly POS) out of the analyzer
- Another option would be to implement an Iterator-like pattern without the standard library traits to iterate over analyzed sentences (for consuming in a `while` loop).
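The last option can be sketched without GATs, because an inherent method (unlike `Iterator::next`) is free to return a borrow of `self`. This is a hypothetical sketch: `Analyzer`, `AnalyzedSentences`, `sentences` and `next_sentence` are illustrative names, not sudachi.rs API, and whitespace splitting stands in for real analysis.

```rust
// Hypothetical sketch of an Iterator-like cursor that does not implement the
// std `Iterator` trait. `Analyzer`, `AnalyzedSentences`, `sentences` and
// `next_sentence` are illustrative names, not the sudachi.rs API.
use std::str::SplitInclusive;

/// Analyzer state for exactly one sentence, reused between sentences.
#[derive(Default)]
pub struct Analyzer {
    tokens: Vec<String>,
}

/// Cursor over analyzed sentences. It holds `&mut Analyzer`, so only one
/// cursor can be alive at a time.
pub struct AnalyzedSentences<'a, 't> {
    analyzer: &'a mut Analyzer,
    sentences: SplitInclusive<'t, char>,
}

impl Analyzer {
    pub fn sentences<'a, 't>(&'a mut self, text: &'t str) -> AnalyzedSentences<'a, 't> {
        AnalyzedSentences {
            analyzer: self,
            sentences: text.split_inclusive('。'),
        }
    }
}

impl AnalyzedSentences<'_, '_> {
    /// An inherent method may return a borrow of `self` on stable Rust
    /// without GATs; the slice must be dropped before the next call.
    pub fn next_sentence(&mut self) -> Option<&[String]> {
        let sentence = self.sentences.find(|s| !s.trim().is_empty())?;
        let tokens = &mut self.analyzer.tokens;
        tokens.clear(); // the previous sentence's state is overwritten
        tokens.extend(
            sentence
                .trim_end_matches('。')
                .split_whitespace()
                .map(String::from),
        );
        Some(tokens.as_slice())
    }
}
```

The trade-off is that such a cursor gets none of the `Iterator` adapters (`map`, `filter`, `collect`, and so on) and cannot drive a `for` loop, only `while let Some(tokens) = cursor.next_sentence() { ... }`.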
Splitting the API into a sentence splitter and analysis:
```rust
for sentence in analyzer.split_sentences(line)? {
    let result = analyzer.analyze_sentence(sentence)?;
    for token in result.tokens() {
        // process token
    }
}
```
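A self-contained version of the loop above might look like the following, assuming `split_sentences` returns an iterator that borrows only the input line, not the analyzer; if it borrowed the analyzer, the `&mut` call to `analyze_sentence` inside the loop would not compile. All types and method bodies here (`Analyzer`, `SentenceResult`, `AnalysisError`, `process_line`) are hypothetical stand-ins, not the sudachi.rs implementation.

```rust
// Hypothetical sketch of the split-then-analyze API; every name here is
// illustrative, not the sudachi.rs API.
use std::str::SplitInclusive;

#[derive(Debug, PartialEq)]
pub struct AnalysisError(pub String);

/// Toy analyzer that reuses one buffer for the current sentence's tokens.
#[derive(Default)]
pub struct Analyzer {
    tokens: Vec<String>,
}

/// Borrowed view of the analysis result for the current sentence.
pub struct SentenceResult<'a> {
    tokens: &'a [String],
}

impl<'a> SentenceResult<'a> {
    pub fn tokens(&self) -> std::slice::Iter<'a, String> {
        self.tokens.iter()
    }
}

impl Analyzer {
    /// Non-allocating split. The returned iterator borrows only `text`, not
    /// `self`, so `analyze_sentence(&mut self)` can run inside the loop.
    /// (Never fails here; the `Result` only mirrors the `?` in the comment.)
    pub fn split_sentences<'t>(
        &self,
        text: &'t str,
    ) -> Result<SplitInclusive<'t, char>, AnalysisError> {
        Ok(text.split_inclusive('。'))
    }

    /// Analyzes one sentence into the internal buffer and lends the result out.
    pub fn analyze_sentence(
        &mut self,
        sentence: &str,
    ) -> Result<SentenceResult<'_>, AnalysisError> {
        let body = sentence.trim_end_matches('。');
        if body.trim().is_empty() {
            return Err(AnalysisError(format!("empty sentence: {sentence:?}")));
        }
        self.tokens.clear();
        self.tokens.extend(body.split_whitespace().map(String::from));
        Ok(SentenceResult { tokens: self.tokens.as_slice() })
    }
}

/// The loop from the comment: each sentence's tokens are copied out before
/// the next `analyze_sentence` call overwrites the buffer.
pub fn process_line(analyzer: &mut Analyzer, line: &str) -> Result<Vec<String>, AnalysisError> {
    let mut out = Vec::new();
    for sentence in analyzer.split_sentences(line)? {
        let result = analyzer.analyze_sentence(sentence)?;
        for token in result.tokens() {
            out.push(token.clone()); // process token
        }
    }
    Ok(out)
}
```

Returning the concrete `SplitInclusive<'t, char>` rather than `impl Iterator` keeps the analyzer borrow out of the return type regardless of edition capture rules.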
`Morpheme`'s `part_of_speech` should not return an `Option` of the POS array; instead, it should panic when given an invalid POS id.
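A minimal sketch of the proposed signature, assuming the morpheme stores a POS id that indexes a POS table owned by the grammar. `Grammar`, `pos_list` and `pos_id` are hypothetical names, not sudachi.rs API. Plain slice indexing gives the panic-on-invalid-id behavior directly, whereas `Vec::get` is what would produce the `Option`.

```rust
// Hypothetical sketch: `Grammar`, `pos_list` and `pos_id` are illustrative
// names, not the sudachi.rs API.

/// POS table loaded from the dictionary; one entry per POS id (assumed).
pub struct Grammar {
    pos_list: Vec<Vec<String>>,
}

pub struct Morpheme<'g> {
    grammar: &'g Grammar,
    pos_id: u16,
}

impl<'g> Morpheme<'g> {
    /// Returns the POS array directly rather than `Option<&[String]>`.
    ///
    /// # Panics
    ///
    /// Panics if `pos_id` is out of range for the grammar's POS list
    /// (slice indexing panics, unlike `Vec::get`, which returns `None`).
    pub fn part_of_speech(&self) -> &'g [String] {
        &self.grammar.pos_list[usize::from(self.pos_id)]
    }
}
```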
Related Issues (20)
- Python Exception Types
- Can the default resource files be embedded on compile time?
- aarch64 Linux Wheels
- Provide binary wheels for Python 3.11
- Fix CI warning regarding state and output
- Create new MorphemeList when out parameter is None
- Let required args of build/ubuild more explicit
- If a dictionary contains U+30FC hyphens(ー), it is not registered in the user dictionary.
- Add an ability to convert result of surface() method to normalized variants by specifying a projection
- Word which is joined from several katakana words is not OOV
- Support Python 3.12
- SudachiError: Invalid i16 literal when building user dictionary
- setup.py install is deprecated
- Update SplitMode implementation
- Update github workflow
- Update PyO3 to 0.21
- Import sudachipy.config and errors by default
- Raise errors in a consistent manner (python)
- create pre-tokenizer with surface-projection does not override dictionary-projection
- debug mode (`-d` option) does not work in the python version