Hi @xiaohan2012, thank you for your kind words.
It's true that the Python interface doesn't expose the tree_structure parameter right now (I would like to improve this functionality first before adding it), but the constructors accept **kwargs that are passed to the underlying C++ module. This actually allows using, from Python, all the undocumented parameters implemented in the src/args.cpp file (so, for experimental purposes, new options can be implemented in C++ alone, without the need to update the Python module).
So you can use tree_structure out of the box, as in the example below, which trains two PLTs: the first one constructs its tree using hierarchical k-means clustering, and the second one loads the tree created by the first one.
from napkinxc.datasets import load_dataset
from napkinxc.models import PLT
from napkinxc.measures import precision_at_k
X_train, Y_train = load_dataset("eurlex-4k", "train")
X_test, Y_test = load_dataset("eurlex-4k", "test")
plt = PLT("eurlex-model")
plt.fit(X_train, Y_train)
Y_pred = plt.predict(X_test, top_k=5)
print("Precision at k:", precision_at_k(Y_test, Y_pred, k=5))
# I added the verbose option here as a proof: it prints a confirmation that the tree was loaded from the given file.
plt2 = PLT("eurlex-model2", tree_structure="eurlex-model/tree", verbose=True)
plt2.fit(X_train, Y_train)
Y_pred = plt2.predict(X_test, top_k=5)
print("Precision at k:", precision_at_k(Y_test, Y_pred, k=5)) # The tree is the same, so this should give very similar results
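To illustrate the **kwargs passthrough described above, here is a minimal hypothetical sketch of how a Python wrapper could forward arbitrary keyword arguments to a C++ argument parser as "--option value" pairs. The function names, the snake_case to camelCase renaming and the 1/0 encoding of booleans are assumptions made for illustration; this is not napkinxc's actual code.

```python
from typing import Any, List


def to_camel_case(name: str) -> str:
    """tree_structure -> treeStructure (assumed naming of the C++ options)."""
    head, *rest = name.split("_")
    return head + "".join(word.capitalize() for word in rest)


def kwargs_to_args(**kwargs: Any) -> List[str]:
    """Flatten keyword arguments into ["--option", "value", ...] strings that a
    command-line style parser on the C++ side could consume (hypothetical)."""
    args: List[str] = []
    for key, value in kwargs.items():
        if isinstance(value, bool):
            value = int(value)  # assumption: booleans are passed as 1/0
        args += [f"--{to_camel_case(key)}", str(value)]
    return args
```

With this kind of generic forwarding, an option such as tree_structure works from Python as soon as the C++ parser knows it, which is why new options can be added in src/args.cpp alone.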
When it comes to the tree format, it's pretty strict and limited right now:
- The first line is expected to contain 2 space-separated numbers: m (the number of labels) and t (the number of tree nodes).
- Then t - 1 lines are expected, each specifying one tree node with two or three space-separated numbers: p (the id of the parent node), n (the node id) and l (the label id, optional).
- 0 is always the id of the root node.
- p and n should be < t, and l should be < m.
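The format rules above can be made concrete with a small sketch that builds a tree, serializes it, and parses it back while checking the stated constraints. The two-level tree shape and the helper names (two_level_tree, format_tree, parse_tree) are hypothetical and only meant to illustrate the format; unlike the C++ loader, this checker skips blank lines.

```python
from typing import List, Optional, Tuple

# (parent id, node id, label id or None for nodes without a label)
Node = Tuple[int, int, Optional[int]]


def two_level_tree(m: int, group_size: int) -> List[Node]:
    """Hypothetical example tree: the root (id 0) has ceil(m / group_size)
    internal children, and each of those holds up to group_size label leaves."""
    g = (m + group_size - 1) // group_size  # number of internal nodes
    nodes: List[Node] = [(0, i, None) for i in range(1, g + 1)]
    for label in range(m):
        parent = 1 + label // group_size
        nodes.append((parent, g + 1 + label, label))
    return nodes


def format_tree(m: int, nodes: List[Node]) -> str:
    """Serialize to the text format: header "m t", then t - 1 node lines."""
    t = len(nodes) + 1  # the root has no parent, so it gets no line of its own
    lines = [f"{m} {t}"]
    for p, n, l in nodes:
        lines.append(f"{p} {n}" if l is None else f"{p} {n} {l}")
    return "\n".join(lines) + "\n"


def parse_tree(text: str) -> Tuple[int, int, List[Node]]:
    """Parse the format back and check the constraints listed above."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or len(rows[0]) != 2:
        raise ValueError("first line must be 'm t'")
    m, t = map(int, rows[0])
    if len(rows) - 1 != t - 1:
        raise ValueError(f"expected {t - 1} node lines, got {len(rows) - 1}")
    nodes: List[Node] = []
    for fields in rows[1:]:
        if len(fields) not in (2, 3):
            raise ValueError("each node line needs 2 or 3 numbers")
        p, n = int(fields[0]), int(fields[1])
        l = int(fields[2]) if len(fields) == 3 else None
        if not (0 <= p < t and 0 < n < t):  # id 0 is reserved for the root
            raise ValueError(f"node ids out of range in line {fields}")
        if l is not None and not 0 <= l < m:
            raise ValueError(f"label id out of range in line {fields}")
        nodes.append((p, n, l))
    return m, t, nodes
```

For 5 labels in groups of 2, format_tree(5, two_level_tree(5, 2)) yields a 9-node tree whose first line is "5 9". Writing such text to a file and passing its path as tree_structure should presumably let you supply a hand-made tree, with m matching the number of labels in the training data; this has not been verified against napkinxc itself.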
Thank you, it works!