Comments (8)
After carefully reading the source code, I figured out what was wrong with my test.txt.
With trainMode 1, it randomly chooses one document as the RHS and treats the rest as the LHS.
So if we want to evaluate one document per user, I think it would be better to choose the last document as the RHS (as the paper mentions; maybe that is trainMode 6?).
For example:
train.txt
D_1<tab>D_2<tab>D_3
test.txt
D_1<tab>D_2<tab>D_3<tab>D_4
Then it is clear that D_1 through D_3 are the LHS and D_4 is the RHS.
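The difference between the two ways of picking the RHS can be sketched in a few lines of Python. This is only an illustration of the idea described above, not StarSpace's own code; the function names `split_random_rhs` and `split_last_rhs` are made up for this example.

```python
import random

def split_random_rhs(line, rng):
    """Pick one tab-separated document at random as RHS; the rest form the LHS."""
    docs = line.rstrip("\n").split("\t")
    i = rng.randrange(len(docs))
    return docs[:i] + docs[i + 1:], docs[i]

def split_last_rhs(line):
    """Always use the last tab-separated document as RHS (the proposed behaviour)."""
    docs = line.rstrip("\n").split("\t")
    return docs[:-1], docs[-1]

test_line = "D_1\tD_2\tD_3\tD_4"
lhs, rhs = split_last_rhs(test_line)
print(lhs, rhs)  # LHS is D_1..D_3, RHS is D_4

# With the random split, the RHS can be any of the four documents.
print(split_random_rhs(test_line, random.Random()))
```

With the deterministic split, the held-out document is known in advance, which is what makes a "predict the next document" evaluation possible.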
Anyway, thank you for your patience!
from starspace.
Hi,
- What should the basedoc file contain? Is it supposed to be a large set of unrelated, random documents?
So, for simplicity, let us assume that I trained a content-based recommendation model (DocSpace) (-trainMode 1 -fileFormat labelDoc) with:
train.txt, where each line is a user and the documents they read:
roger federer loses <tab> venus williams wins <tab> world series ended
i love cats <tab> funny lolcat links <tab> how to be a petsitter
So test.txt would follow the same format: each line is a new user with the tab-separated documents they read. At test time, is one document from each line randomly picked and ranked against all the (unrelated, random) documents from basedoc?
We updated the description in the doc to make it clearer:
Input file format:
roger federer loses <tab> venus williams wins <tab> world series ended
i love cats <tab> funny lolcat links <tab> how to be a petsitter
Each line is a user, and each tab-separated document on that line is one that the user liked. So in this case the first user likes sports and the second is interested in pets.
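To make the format concrete, here is a minimal sketch of reading such a file into one list of documents per user. It assumes real tab characters separate the documents (shown as `<tab>` in this thread); `parse_label_doc` is a hypothetical helper, not part of StarSpace.

```python
def parse_label_doc(text):
    """Return one list of documents per non-empty line (one line = one user)."""
    users = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        docs = [d.strip() for d in line.split("\t") if d.strip()]
        users.append(docs)
    return users

sample = (
    "roger federer loses\tvenus williams wins\tworld series ended\n"
    "i love cats\tfunny lolcat links\thow to be a petsitter\n"
)

for user_id, docs in enumerate(parse_label_doc(sample), start=1):
    print(f"user {user_id}: {len(docs)} documents -> {docs}")
```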
In terms of testing, you need to compare against basedocs, similar to this script for sentence matching:
https://github.com/facebookresearch/StarSpace/blob/master/examples/wikipedia_sentence_matching.sh
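As a rough picture of what "comparing against basedocs" means for a ranking metric, the toy sketch below ranks one held-out document against a set of base documents and computes Hits@k. The vectors are invented, the dot product is only an assumed similarity, and the helper names are hypothetical; none of this is taken from the StarSpace source.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rank_of_target(query, target, basedocs):
    """Rank of the true target: 1 + number of base docs that score strictly higher."""
    target_score = dot(query, target)
    return 1 + sum(1 for d in basedocs if dot(query, d) > target_score)

def hits_at_k(ranks, k):
    """Fraction of test examples whose target is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Toy embeddings (made up for illustration).
query = [1.0, 0.0]                       # embedding of the user's LHS documents
target = [0.9, 0.1]                      # embedding of the held-out RHS document
basedocs = [[0.2, 0.8], [1.0, 0.5], [-1.0, 0.0]]

rank = rank_of_target(query, target, basedocs)
print(rank, hits_at_k([rank], 1), hits_at_k([rank], 2))
```

Here exactly one base document scores higher than the target, so the target lands at rank 2: it misses Hits@1 but counts toward Hits@2.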
What about the test.txt in this example?
Does D_i represent the next clicked document for User_i? (D_i is the i-th line in test.txt.)
So does test.txt have the same number of lines as train.txt?
@kdqzzxxcc Yes, test.txt has the same format as train.txt. If you want to test on the same set of users, then yes, it has the same number of lines as train.txt. However, that is not a requirement: you can also test on new users, as long as they are represented in the same format. In that case test.txt does not necessarily have the same number of lines as train.txt.
@ledw I have tried the example mentioned above, but I am still confused about the test.txt file.
For example:
train.txt
roger federer loses <tab> venus williams wins <tab> world series ended
i love cats <tab> funny lolcat links <tab> how to be a petsitter
test.txt
I love tennis
I love pets
Each line in test.txt contains one document.
Each line (D_i) in test.txt represents User_i in train.txt.
But this gives an error: ERROR: File 'test.txt' is empty.
When I change test.txt to
I love tennis<tab>I love tennis
I love pets<tab>I love pets
then it is OK. So must each line in test.txt contain at least 2 documents?
If there are 2 documents for each user in test.txt, which one do we evaluate for the Hits@n metric?
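Going only by the behaviour reported in this comment (single-document lines lead to the "is empty" error, two-document lines do not), a quick pre-check of a test file could look like the sketch below. The two-document minimum is an assumption drawn from that observation, not from the StarSpace source, and `lines_with_too_few_docs` is a hypothetical helper.

```python
def lines_with_too_few_docs(text, min_docs=2):
    """Return 1-based numbers of lines with fewer than min_docs tab-separated documents.

    Blank lines contain zero documents, so they are reported as well.
    """
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        docs = [d for d in line.split("\t") if d.strip()]
        if len(docs) < min_docs:
            bad.append(lineno)
    return bad

one_doc_per_line = "I love tennis\nI love pets\n"
two_docs_per_line = "I love tennis\tI love tennis\nI love pets\tI love pets\n"

print(lines_with_too_few_docs(one_doc_per_line))   # every line is flagged
print(lines_with_too_few_docs(two_docs_per_line))  # nothing is flagged
```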
@kdqzzxxcc Thanks for your suggestion. Yes, I think what you suggested is a good option to have, especially at prediction time. I'll add it to our list.