tomtung / omikuji

An efficient implementation of Partitioned Label Trees & its variations for extreme multi-label classification

Home Page: https://crates.io/crates/omikuji

License: MIT License

Rust 93.35% Python 6.65%
classification extreme-classification extreme-multi-label-classification machine-learning multi-label-classification rust supervised-learning


omikuji's Issues

performance measures

Could you implement other performance measures? It would be great to also have recall and nDCG computed in Rust.
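In the meantime, recall@k and nDCG@k are easy to compute outside Omikuji from its ranked predictions. A minimal Python sketch (the function names and data layout are my own, not part of the library; labels are assumed hashable):

```python
import math

def recall_at_k(true_labels, ranked_predictions, k):
    """Fraction of the true labels recovered in the top-k predictions."""
    if not true_labels:
        return 0.0
    top_k = set(ranked_predictions[:k])
    return len(top_k & set(true_labels)) / len(true_labels)

def ndcg_at_k(true_labels, ranked_predictions, k):
    """Normalized discounted cumulative gain with binary relevance."""
    relevant = set(true_labels)
    dcg = sum(1.0 / math.log2(i + 2)
              for i, label in enumerate(ranked_predictions[:k])
              if label in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0
```

Averaging these over the test set gives the usual reported numbers.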

Explainability

Hey @tomtung,

great repository! Your implementation of Parabel has significantly outperformed several DNN architectures that I tried (on a dataset of 600k samples and 20k labels), while also being much faster for both training and prediction. Thank you for the Python wrapper as well; it made the library easier and faster to try.

Can you think of any way to approach the challenge of explainability? For example, is there a way to get, for each prediction, the most important words in the document that led to it being classified one way or another?

Thanks again!

thread 'main' panicked at 'Could not determine the UTC offset on this system

I've been running this on Linux via AWS SageMaker for some experimentation with Twitter hashtags, and I think the new releases changed something related to the simple-logger update. I reverted to 0.3.3 and everything works.

I really appreciate this implementation though! It is very quick and efficient for testing out extreme classification pipelines. Thanks for writing it up!

Backtrace

thread 'main' panicked at 'Could not determine the UTC offset on this system. Possible causes are that the time crate does not implement "local_offset_at" on your system, or that you are running in a multi-threaded environment and the time crate is returning "None" from "local_offset_at" to avoid unsafe behaviour. See the time crate's documentation for more information. (https://time-rs.github.io/internal-api/time/index.html#feature-flags): IndeterminateOffset', /home/ec2-user/.cargo/registry/src/github.com-1ecc6299db9ec823/simple_logger-1.15.1/src/lib.rs:360:64
stack backtrace:
0: 0x556ab1b7413d - std::backtrace_rs::backtrace::libunwind::trace::hf6a6dfd7da937cb0
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
1: 0x556ab1b7413d - std::backtrace_rs::backtrace::trace_unsynchronized::hc596a19e4891f7f3
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x556ab1b7413d - std::sys_common::backtrace::_print_fmt::hb16700db31584325
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:67:5
3: 0x556ab1b7413d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h231c4190cfa75162
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:46:22
4: 0x556ab1b147cc - core::fmt::write::h2a1462b5f8eea807
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/fmt/mod.rs:1163:17
5: 0x556ab1b72cb4 - std::io::Write::write_fmt::h71ddfebc68685972
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/io/mod.rs:1696:15
6: 0x556ab1b73360 - std::sys_common::backtrace::_print::hcc197d4bebf2b369
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:49:5
7: 0x556ab1b73360 - std::sys_common::backtrace::print::h335a66af06738c7c
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:36:9
8: 0x556ab1b73360 - std::panicking::default_hook::{{closure}}::h6fac9ac9c8b79e52
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:210:50
9: 0x556ab1b7278a - std::panicking::default_hook::h341c1030c6a1161b
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:227:9
10: 0x556ab1b7278a - std::panicking::rust_panic_with_hook::h50680ff4b44510c6
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:624:17
11: 0x556ab1b92608 - std::panicking::begin_panic_handler::{{closure}}::h9371c0fbb1e8465a
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:521:13
12: 0x556ab1b92586 - std::sys_common::backtrace::__rust_end_short_backtrace::h9b3efa22a5768c0f
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:139:18
13: 0x556ab1b92542 - rust_begin_unwind
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
14: 0x556ab1a850e0 - core::panicking::panic_fmt::h23b9203e89cc61cf
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
15: 0x556ab1a853c2 - core::result::unwrap_failed::h32ef6b3156e8fc57
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1616:5
16: 0x556ab1b6ebdd - <simple_logger::SimpleLogger as log::Log>::log::h1c34c8b7ef19bacc
17: 0x556ab1b230a2 - omikuji::data::DataSet::load_xc_repo_data_file::hb7a7cb9826b9150f
18: 0x556ab1ac19ea - omikuji::train::h3d9bcb33a544581f
19: 0x556ab1acda31 - omikuji::main::h382f86dabe280b2b
20: 0x556ab1ab4c63 - std::sys_common::backtrace::__rust_begin_short_backtrace::h9a5620b049f38e48
21: 0x556ab1ad23f9 - main
22: 0x7f6556dff585 - __libc_start_main
23: 0x556ab1a902a5 -
24: 0x0 -

Notes

Seems like the same problem as noted here:
ravenclaw900/DietPi-Dashboard#82

Possible errors in the eurlex_train.txt and eurlex_test.txt - missing labels?

I was trying to load eurlex_train.txt and eurlex_test.txt.
As far as I understand, they are in the LibSVM format for multi-label classification.

Using sklearn.datasets.load_svmlight_file fails, though.
I've observed that eurlex_train.txt contains 28 rows with no label, where the line starts with a space.

If you run the following command

cat eurlex_train.txt | grep -n "^ " | cut -d ':' -f 1

it reports the following 28 line numbers in eurlex_train.txt where the labels are missing:

95
254
511
1529
1941
1955
4031
4428
4645
4729
5233
5764
6297
6335
6705
7085
9479
9677
10001
10490
10738
10912
11676
12282
12601
13149
14169
14724

Despite this, training with the Rust CLI (and the Python wrapper too) works without complaint.
I've observed that a check for the presence of labels in each line is performed in omikuji/src/data.rs by the parse_xc_repo_data_line function.

Since it seems I cannot rely on the otherwise excellent sklearn.datasets.load_svmlight_file, what label should I assign to those rows?
As a first simple implementation, I decided to skip the missing-label rows.

Predefined tree architecture

Hey @tomtung!

Do you think it would be possible to predefine the tree architecture for Parabel? It could be useful when labels are hierarchical by definition. Do you have any hints for an implementation, like where to start?

Regards,
rabitwhte

Cargo Installation issues

Hi,
I'm having installation issues. I was trying to install the package using cargo (both the CLI app and from source), but I get the following errors:

error[E0405]: cannot find trait `CommandFactory` in crate `clap`
 --> /Users/thanhdeku/.cargo/registry/src/github.com-1ecc6299db9ec823/omikuji-0.5.0/src/bin/omikuji.rs:9:10
  |
9 | #[derive(Parser)]
  |          ^^^^^^ not found in `clap`
  |
  = note: this error originates in the derive macro `Parser` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0412]: cannot find type `Command` in crate `clap`
 --> /Users/thanhdeku/.cargo/registry/src/github.com-1ecc6299db9ec823/omikuji-0.5.0/src/bin/omikuji.rs:9:10
  |
9 | #[derive(Parser)]
  |          ^^^^^^ not found in `clap`
  |
  = note: this error originates in the derive macro `Parser` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing this struct
  |
1 | use std::process::Command;
  |
help: if you import `Command`, refer to it directly
  |
9 - #[derive(Parser)]
9 + #[derive(Parser)]
  | 

error[E0433]: failed to resolve: could not find `Command` in `clap`
 --> /Users/thanhdeku/.cargo/registry/src/github.com-1ecc6299db9ec823/omikuji-0.5.0/src/bin/omikuji.rs:9:10
  |
9 | #[derive(Parser)]
  |          ^^^^^^ not found in `clap`
  |
  = note: this error originates in the derive macro `Parser` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing this struct
  |
1 | use std::process::Command;
  |
help: if you import `Command`, refer to it directly
  |
9 - #[derive(Parser)]
9 + #[derive(Parser)]
  | 

error[E0412]: cannot find type `Command` in crate `clap`
  --> /Users/thanhdeku/.cargo/registry/src/github.com-1ecc6299db9ec823/omikuji-0.5.0/src/bin/omikuji.rs:29:10
   |
29 | #[derive(Args)]
   |          ^^^^ not found in `clap`
   |
   = note: this error originates in the derive macro `Args` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing this struct
   |
1  | use std::process::Command;
   |
help: if you import `Command`, refer to it directly
   |
29 - #[derive(Args)]
29 + #[derive(Args)]
   | 

error[E0412]: cannot find type `Command` in crate `clap`
   --> /Users/thanhdeku/.cargo/registry/src/github.com-1ecc6299db9ec823/omikuji-0.5.0/src/bin/omikuji.rs:175:10
    |
175 | #[derive(Args)]
    |          ^^^^ not found in `clap`
    |
    = note: this error originates in the derive macro `Args` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing this struct
    |
1   | use std::process::Command;
    |
help: if you import `Command`, refer to it directly
    |
175 - #[derive(Args)]
175 + #[derive(Args)]
    | 

Some errors have detailed explanations: E0405, E0412, E0433.
For more information about an error, try `rustc --explain E0405`.
error: could not compile `omikuji` due to 5 previous errors
error: failed to compile `omikuji v0.5.0`, intermediate artifacts can be found at `/var/folders/zs/t_2_tn793rggn16t5qrgd6v80000gn/T/cargo-installNWnbDo`

Can you help me look into this? My cargo version is cargo 1.63.0-nightly (39ad1039d 2022-05-25). Many thanks!

Cannot load model if model directory contains symlinks

I'm trying to adapt Omikuji (via the integration with Annif) into a Data Version Control workflow. In DVC, large model files are typically stored in a cache outside the working tree (which is a git repository). There are several ways to keep the working directory synchronized with the cache, but one common solution is the use of symbolic links. This means that model files will be moved to the cache directory and replaced with symlinks that point to the original files.

I noticed that Omikuji has problems loading the model if the files in the model directory (settings.json, tree0.cbor, tree1.cbor, ...) aren't regular files but symlinks. Loading the model apparently succeeds, but all the predictions are empty. I was able to demonstrate this without involving DVC by editing the Python example into this:

import os
import sys
import shutil
import time

import omikuji

if __name__ == "__main__":
    # Adjust hyper-parameters as needed
    hyper_param = omikuji.Model.default_hyper_param()
    hyper_param.n_trees = 2

    # Train
    model = omikuji.Model.train_on_data("./eurlex_train.txt", hyper_param)

    # Serialize & de-serialize
    model.save("./model")
    
    # create a directory containing symlinks to the saved model files
    shutil.rmtree("./model2", ignore_errors=True)
    os.mkdir("./model2")
    for fn in os.listdir("./model"):
        os.symlink(f"../model/{fn}", f"./model2/{fn}")

    # load the model from the directory containing symlinks
    model = omikuji.Model.load("./model2")

    # Predict
    feature_value_pairs = [
        (0, 0.101468),
        (1, 0.554374),
        (2, 0.235760),
        (3, 0.065255),
        (8, 0.152305),
        (10, 0.155051),
        # ...
    ]
    label_score_pairs = model.predict(feature_value_pairs, top_k=3)
    print("Dummy prediction results: {}".format(label_score_pairs))

The result of running this:

INFO [omikuji::data] Loading data from ./eurlex_train.txt
INFO [omikuji::data] Parsing data
INFO [omikuji::data] Loaded 15539 examples; it took 0.16s
INFO [omikuji::model::train] Training model with hyper-parameters HyperParam { n_trees: 2, min_branch_size: 100, max_depth: 20, centroid_threshold: 0.0, collapse_every_n_layers: 0, linear: HyperParam { loss_type: Hinge, eps: 0.1, c: 1.0, weight_threshold: 0.1, max_iter: 20 }, cluster: HyperParam { k: 2, balanced: true, eps: 0.0001, min_size: 2 }, tree_structure_only: false, train_trees_1_by_1: false }
INFO [omikuji::model::train] Initializing tree trainer
INFO [omikuji::model::train] Computing label centroids
Labels 3786 / 3786 [==============================================================] 100.00 % 23820.55/s INFO [omikuji::model::train] Start training forest
7824 / 7824 [======================================================================] 100.00 % 3112.38/s INFO [omikuji::model::train] Model training complete; it took 3.39s
INFO [omikuji::model] Saving model...
INFO [omikuji::model] Saving tree to ./model/tree0.cbor
INFO [omikuji::model] Saving tree to ./model/tree1.cbor
INFO [omikuji::model] Model saved; it took 0.08s
INFO [omikuji::model] Loading model...
INFO [omikuji::model] Loading model settings from ./model2/settings.json...
INFO [omikuji::model] Loaded model settings Settings { n_features: 5000, classifier_loss_type: Hinge }...
INFO [omikuji::model] Model loaded; it took 0.00s
Dummy prediction results: []

The suspicious part is the model loading (it should take more than 0.00s) and then the empty list of predictions.

I wonder if there's a good reason why the model files have to be actual files. Normally it doesn't matter whether a file used for a read operation is a regular file or a symlink; as long as the symlink points to an actual file (with the correct permissions etc.) that should work fine.

Some information about the system:
Ubuntu Linux 20.04 amd64, ext4 filesystem
Python 3.8.10
Omikuji 0.4.1 installed in a virtual environment with pip

Feature format error

I got this error when experimenting with training, but the same file works with craft-ml.

What is wrong with the feature vector in question?

parabel train data/hack_svml_train.txt
2019-05-10 16:18:41 INFO  [parabel::data] Loading data from data/hack_svml_train.txt
2019-05-10 16:18:41 INFO  [parabel::data] Parsing data
thread 'main' panicked at 'Failed to load training data: Custom { kind: InvalidData, error: StringError("Feature vector is invalid in line 278 13:1 1:1 2:1 102601:1 9:1 156:1 519:1 5037:1 3856:1 311:1 156:1 1078:1 308:1 12:1") }', libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
$ uname -a
Darwin C02QRF6FG8WP 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64 x86_64
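For what it's worth, the quoted feature vector has indices that are out of order and one duplicated index (156 appears twice), which seems a plausible reason for the parser to reject the line while craft-ml tolerates it. A preprocessing sketch that sorts the pairs and merges duplicates (merging by summation is an assumption; pick the policy that matches how the duplicates arose):

```python
from collections import defaultdict

def normalize_features(pairs):
    """Sort (index, value) feature pairs by index and merge duplicates.

    Summing duplicate values is an assumed policy, not Omikuji's.
    """
    merged = defaultdict(float)
    for idx, val in pairs:
        merged[idx] += val
    return sorted(merged.items())

def normalize_line(line):
    """Rewrite the feature part of one 'labels idx:val idx:val ...' line."""
    parts = line.split()
    labels, feats = parts[0], parts[1:]
    pairs = [(int(i), float(v)) for i, v in (f.split(":") for f in feats)]
    fixed = " ".join(f"{i}:{v:g}" for i, v in normalize_features(pairs))
    return f"{labels} {fixed}"
```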

Issues when training on a large dataset

Hi Tom! First of all, I want to thank you for your great contribution. This is the best implementation of XMC I've found (and one that is also feasible to use in production).

I ran a number of experiments, and my observation is that it works great when the training set has about 1-2M samples. However, the task I'm trying to solve has 60M samples in the training set, with 1M labels and 3M tf-idf features. I always use the default Parabel-like parameters.

Once I managed to train a model on 60M samples with 260k labels, but the only machine that could fit it was a 160-vCPU, 3.4 TB RAM GCP instance, which is very expensive.

I tried a 96-vCPU, 1.4 TB machine to decrease costs, but it hangs for 3-4 hours on the "Initializing tree trainer" step and then disconnects (I guess it runs out of memory).

Do you have any tips or tricks for running training on a dataset of this size at a reasonable cost? E.g. would it be possible to train in batches on smaller/cheaper machines? Or are there any "magic" hyperparameter settings that would achieve this?

Installation issues

Hi,

I'm having installation issues. I was trying to install the package using pip, and with setup.py as well. I get the same error each time:

2019-12-27T10:47:45,077 Created temporary directory: C:\Users\myuser\AppData\Local\Temp\pip-wheel-044u2wao
2019-12-27T10:47:45,077 Destination directory: C:\Users\myuser\AppData\Local\Temp\pip-wheel-044u2wao
2019-12-27T10:47:45,078 Running command 'c:\users\myuser\appdata\local\programs\python\python36\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\myuser\AppData\Local\Temp\pip-install-3k2qvyqz\omikuji\setup.py'"'"'; file='"'"'C:\Users\myuser\AppData\Local\Temp\pip-install-3k2qvyqz\omikuji\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\myuser\AppData\Local\Temp\pip-wheel-044u2wao' --python-tag cp36
2019-12-27T10:47:45,879 running bdist_wheel
2019-12-27T10:47:45,881 running build
2019-12-27T10:47:45,881 running build_py
2019-12-27T10:47:45,882 creating build\lib
2019-12-27T10:47:45,882 creating build\lib\omikuji
2019-12-27T10:47:45,883 copying python-wrapper\omikuji_init_.py -> build\lib\omikuji
2019-12-27T10:47:45,887 error: [WinError 2] Cannot find the file specified
2019-12-27T10:47:45,911 ERROR: Failed building wheel for omikuji

Any ideas on how to deal with that?

Regards,
Jakub

Cannot set collapse_every_n_layers via Python bindings

Hi,

I'm trying to figure out how to use the AttentionXML-like hyperparameters mentioned in the top-level README via the Python bindings. This would require setting the collapse_every_n_layers hyperparameter to 5.

It appears to me that this hyperparameter is not exposed in the Python bindings (nor in the C API). There is no field called collapse_every_n_layers (or anything similar) in the HyperParam object returned by omikuji.Model.default_hyper_param(). In fact, the only source files that mention this hyperparameter are src/bin/omikuji.rs and src/model/train.rs, so it is only used within the Rust implementation, not in any of the bindings.

Installation problem and wheels for Python 3.11 and 3.12

Could wheels be provided for Python 3.11 and the just-released 3.12 to ease installation?

Also, I was unable to install Omikuji on Python 3.11 at all, due to a regexp error in Milksnake; see NatLibFi/Annif#703 (comment).

It may be that there will not be a new Milksnake release including the fix for this; it has already been some time since the fix was merged to master. So, a least-effort option on the Omikuji side seems to be to install the current Milksnake master. (I checked that this works.)

Alternatively, some projects have replaced Milksnake with Maturin, which might be more involved but also a long-term solution.

BTW, you might also want to update the classifiers for supported Python versions in setup.py.

Regarding code execution

Dear Sir/Madam,
We have tried to install the parabel-rs package as per the instructions given in the repository, but during the build process it is unable to find some directory. We are using Python 3.6+.

Below is a screenshot of the error.

[screenshot]

Python import error with fresh pip install

Hey, I'm really excited to test your code on a problem I'm working on, but I encountered an error immediately after installing and attempting to import.

The error:

Python 3.6.5 (default, Apr 12 2018, 10:53:09)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import parabel
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-e20e1e492a86> in <module>
----> 1 import parabel

~/.pyenv/versions/3.6.5/envs/venv365/lib/python3.6/site-packages/parabel/__init__.py in <module>
      2 __all__ = ["Model", "LossType", "Trainer", "init_rayon_threads"]
      3
----> 4 from ._libparabel import lib, ffi
      5
      6 try:

~/.pyenv/versions/3.6.5/envs/venv365/lib/python3.6/site-packages/parabel/_libparabel.py in <module>
      3
      4 import os
----> 5 from parabel._libparabel__ffi import ffi
      6
      7 lib = ffi.dlopen(os.path.join(os.path.dirname(__file__), '_libparabel__lib.so'), 130)

~/.pyenv/versions/3.6.5/envs/venv365/lib/python3.6/site-packages/parabel/_libparabel__ffi.py in <module>
      1 # auto-generated file
----> 2 import _cffi_backend
      3
      4 ffi = _cffi_backend.FFI('parabel._libparabel__ffi',
      5     _version = 0x2601,

ModuleNotFoundError: No module named '_cffi_backend'

Environment

$ uname -a
Darwin C02QRF6FG8WP 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64 x86_64

predict API

Hi,
Thanks for Parabel and for making it available! I've had some promising results already.

I'm using the Python bindings and I'm wondering about the API. Are there any kwargs I can use? For example, model.predict(test_example) returns the top ten ranked labels. Can I return all the rankings? I've tried reading the source code but I just don't understand it, sorry!

Wheel for Python 3.10

It would be nice to have a wheel for Python 3.10 on PyPI, especially now that Ubuntu 22.04 has been released with 3.10 as the default Python version. It would make installing Omikuji more straightforward (the version of cargo in the apt repositories is too old for Omikuji, and I struggled for some time to install a new enough cargo version in a Dockerfile).

Bypassing writing to disk with python wrapper

First of all, what a great library this is! Congrats!

I am using it on a classification problem via the Python wrapper, around which I wrote a small scikit-learn wrapper. However, I believe the great speedup achieved by the Rust code is somewhat hindered by round-tripping sparse arrays through disk.

Is there a way to pass the training features directly to the Rust code without touching the disk? Being able to use a scipy.sparse.csr_matrix, with its index and value arrays, would be great.

Slow prediction problem

First of all, I want to say this repository has been really helpful for my work. It trains really fast and efficiently, from small datasets to large ones.
However, I found prediction really slow when using a saved model. When I start predicting, the displayed rate is nearly 4k examples per second, but the actual rate is about 200. I've checked the process status: most of the time the CPU usage of every core is nearly 0%! Sometimes it increases to 20% or 50%, but only for a few seconds.
I've tried prediction on data of different scales and with different parameters, and this problem happens every time.
I'm using a Google Cloud machine with an Intel CPU and 64 GB of memory; I'm not sure if this problem is due to my machine.
Thanks again!

unbalanced cluster with python binding

Hello,
Looking at the code of the Python binding, I cannot find anything about using regular k-means clustering in either a balanced or unbalanced way.
There is no choice of clustering method in the Python binding.
Should I use the CLI app for this?

A few questions

Thanks again for this library and for craft-ml. I have a few questions about the file format.

  1. Must labels be integers, or can they be arbitrary strings?
  2. Can feature indexes be arbitrary unique integers (using feature hashing, for instance), or must they be sequential?
  3. Have you compared craft-ml against parabel-rs? Do they behave similarly? Any recommendations for one over the other?

thanks

Wheel for Python 3.9?

Hi, and thanks for the great library! We are using Omikuji in one backend of Annif, and it is performing very well on the subject indexing task.

As the title says, we are wondering if it would be possible to have a wheel for Python 3.9 (for Linux) on PyPI? It would simplify installation into a Docker image. Currently, installing Omikuji on Python 3.9 seems to also require having cargo installed (at least).

golang binding via c-api possible?

Thanks a lot for this great lib, it's wonderful!

This is not an issue per se, more of a question. I would like to use omikuji in my application, which is currently written in Go. I think my options are to use the Python API in a small dedicated prediction server, or to somehow create a Go binding myself. I am not very familiar with Rust; do you think the current C API can be used to generate the necessary stub C code for such a binding (with cgo)? If you happen to have any pointers or advice on how to do that, I would love to hear them.
(Maybe adding a REST server option to omikuji itself could be a fun project too, if I ever find some time to learn Rust...)

Thanks again!

regarding code

Dear Sir,

I have installed cargo on Ubuntu 18.04, run training and testing on the EUR-Lex dataset, and got the result in the form of a text file (screenshot attached).

However, I would like to generate results in the form of P@1, P@3, P@5, and so on.
How can I get these?
Kindly help me with this; I am stuck.
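Precision@k can be computed outside Omikuji from its ranked predictions and the gold labels of the test file. A minimal sketch (the function names are mine, not part of the library):

```python
def precision_at_k(true_labels, ranked_predictions, k):
    """P@k: fraction of the top-k predicted labels that are correct."""
    if k <= 0:
        return 0.0
    relevant = set(true_labels)
    hits = sum(1 for label in ranked_predictions[:k] if label in relevant)
    return hits / k

def mean_precision_at_k(truths, predictions, k):
    """Average P@k over a test set (two lists of equal length)."""
    scores = [precision_at_k(t, p, k) for t, p in zip(truths, predictions)]
    return sum(scores) / len(scores) if scores else 0.0
```

Running this with k = 1, 3, 5 over the per-example prediction lists gives the usual P@1/P@3/P@5 numbers.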

Different models (and results) while training on the same data

Hey @tomtung,

it's me again. I noticed that I get different results from models trained on the same data with the same hyperparameters. Is that expected? Is it because there is some randomness in the label partitioning? Can I control that somehow (e.g. by specifying a random seed)?

Regards,
rabitwhte
