vaporetto's Issues

Error: InvalidModel(InvalidModelError { msg: "unsupported character type: 4" })

I need a Chinese model and found some at http://www.phontron.com/kytea/download/model/as-0.4.0-1.mod.gz, but none of the models on that site can be converted.

macOS 13.4.1 with M1 chip

cargo run --release -p convert_kytea_model -- --model-in as-0.4.0-1.mod --model-out as-0.4.0-1.mod.zst
    Finished release [optimized] target(s) in 0.14s
     Running `target/release/convert_kytea_model --model-in as-0.4.0-1.mod --model-out as-0.4.0-1.mod.zst`
Loading model file...
Saving model file...
Error: InvalidModel(InvalidModelError { msg: "unsupported character type: 4" })

error: The following required arguments were not provided: --model-out <model-out>

README.md says:

%  cargo run --release -p convert_kytea_model -- --model-in jp-0.4.7-5-tokenize.model.zstd

but this happens:

# cargo run --release -p convert_kytea_model -- --model-in jp-0.4.7-5-tokenize.model.zstd
    Updating crates.io index
  Downloaded cc v1.0.72
  Downloaded structopt-derive v0.4.18
  Downloaded quote v1.0.15
  Downloaded proc-macro2 v1.0.36
  Downloaded proc-macro-error v1.0.4
  Downloaded syn v1.0.86
  Downloaded bincode v1.3.3
  Downloaded anyhow v1.0.53
  Downloaded bitflags v1.3.2
  Downloaded ansi_term v0.12.1
  Downloaded unicode-segmentation v1.9.0
  Downloaded vec_map v0.8.2
  Downloaded jobserver v0.1.24
  Downloaded zstd v0.9.2+zstd.1.5.1
  Downloaded textwrap v0.11.0
  Downloaded heck v0.3.3
  Downloaded zstd-sys v1.6.2+zstd.1.5.1
  Downloaded libc v0.2.117
  Downloaded lazy_static v1.4.0
  Downloaded clap v2.34.0
  Downloaded structopt v0.3.26
  Downloaded zstd-safe v4.1.3+zstd.1.5.1
  Downloaded unicode-width v0.1.9
  Downloaded strsim v0.8.0
  Downloaded serde_derive v1.0.136
  Downloaded proc-macro-error-attr v1.0.4
  Downloaded version_check v0.9.4
  Downloaded unicode-xid v0.2.2
  Downloaded serde v1.0.136
  Downloaded atty v0.2.14
  Downloaded byteorder v1.4.3
  Downloaded daachorse v0.2.1
  Downloaded 32 crates (2.5 MB) in 1.12s
   Compiling libc v0.2.117
   Compiling proc-macro2 v1.0.36
   Compiling unicode-xid v0.2.2
   Compiling syn v1.0.86
   Compiling version_check v0.9.4
   Compiling serde_derive v1.0.136
   Compiling serde v1.0.136
   Compiling anyhow v1.0.53
   Compiling zstd-safe v4.1.3+zstd.1.5.1
   Compiling unicode-segmentation v1.9.0
   Compiling unicode-width v0.1.9
   Compiling bitflags v1.3.2
   Compiling byteorder v1.4.3
   Compiling strsim v0.8.0
   Compiling ansi_term v0.12.1
   Compiling vec_map v0.8.2
   Compiling lazy_static v1.4.0
   Compiling textwrap v0.11.0
   Compiling daachorse v0.2.1
   Compiling heck v0.3.3
   Compiling proc-macro-error-attr v1.0.4
   Compiling proc-macro-error v1.0.4
   Compiling quote v1.0.15
   Compiling atty v0.2.14
   Compiling jobserver v0.1.24
   Compiling clap v2.34.0
   Compiling cc v1.0.72
   Compiling zstd-sys v1.6.2+zstd.1.5.1
   Compiling structopt-derive v0.4.18
   Compiling structopt v0.3.26
   Compiling zstd v0.9.2+zstd.1.5.1
   Compiling bincode v1.3.3
   Compiling vaporetto v0.2.0 (/work/vae_experiments/vaporetto/vaporetto)
   Compiling convert_kytea_model v0.1.0 (/work/vae_experiments/vaporetto/convert_kytea_model)
    Finished release [optimized] target(s) in 1m 14s
     Running `target/release/convert_kytea_model --model-in jp-0.4.7-5-tokenize.model.zstd`
error: The following required arguments were not provided:
    --model-out <model-out>

USAGE:
    convert_kytea_model --model-in <model-in> --model-out <model-out>

I think the correct command is:

% cargo run --release -p convert_kytea_model -- --model-in jp-0.4.7-5.mod --model-out jp-0.4.7-5-tokenize.model.zstd

Cannot deserialize predictor

I want to serialize the predictor, persist it to a file, and deserialize it later, to reduce model-building time when it is used as a CLI tool.

But I get a DecodeError(UnexpectedEnd { additional: 1 }) error when deserializing.

When I change the predict_tags parameter to false, deserialization works, but I need the yomi (reading) of each token.

Is this a bug, or did I miss something?

Here's my code:

use std::fs::File;
use std::io::Read;
use vaporetto::{Model, Predictor};

fn main() {
    let file = File::open("./model.zst").unwrap();
    let mut decoder = ruzstd::StreamingDecoder::new(file).unwrap();
    let mut buffer = vec![];
    decoder.read_to_end(&mut buffer).unwrap();
    let (model, _) = Model::read_slice(&buffer).unwrap();

    // Deserialization succeeds when predict_tags is set to false.
    let predictor = Predictor::new(model, true).unwrap();

    let serialized = predictor.serialize_to_vec().unwrap();
    // With predict_tags = true, this fails with
    // DecodeError(UnexpectedEnd { additional: 1 }).
    unsafe { Predictor::deserialize_from_slice_unchecked(&serialized).unwrap(); }
}
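
For readers unfamiliar with this error shape: an `UnexpectedEnd { additional: n }` style error generally means the decoder reached the end of its input while still expecting `n` more bytes. The std-only sketch below illustrates that general mechanism with a toy length-prefixed format. Everything in it (the `DecodeError` enum, `read_len_prefixed`, the one-byte length prefix) is a hypothetical stand-in and is not taken from vaporetto or its serialization library, so it does not diagnose the report above.

```rust
// Hypothetical illustration only: none of these names come from vaporetto.
#[derive(Debug, PartialEq)]
enum DecodeError {
    // The input ended while `additional` more bytes were still required.
    UnexpectedEnd { additional: usize },
}

// Reads a one-byte length prefix followed by that many payload bytes.
fn read_len_prefixed(input: &[u8]) -> Result<&[u8], DecodeError> {
    // An empty input is missing the one-byte length prefix itself.
    let (&len, rest) = input
        .split_first()
        .ok_or(DecodeError::UnexpectedEnd { additional: 1 })?;
    let len = usize::from(len);
    if rest.len() < len {
        // Report exactly how many payload bytes are missing.
        return Err(DecodeError::UnexpectedEnd {
            additional: len - rest.len(),
        });
    }
    Ok(&rest[..len])
}

fn main() {
    // Complete input: prefix 3, then three payload bytes.
    assert_eq!(read_len_prefixed(&[3, b'a', b'b', b'c']), Ok(&b"abc"[..]));
    // Truncated input: the prefix promises 3 bytes but only 2 follow.
    assert_eq!(
        read_len_prefixed(&[3, b'a', b'b']),
        Err(DecodeError::UnexpectedEnd { additional: 1 })
    );
}
```

In the report the bytes are decoded in memory right after being produced, so file truncation is not involved. An error of this kind there would instead suggest that the writer and reader disagree about the layout when tags are enabled, though that is only a guess from the error shape.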
