Example for Docspace test (starspace, closed)

facebookresearch commented on July 30, 2024

Example for Docspace test

from starspace.

Comments (8)

kdqzzxxcc commented on July 30, 2024

After carefully reading the source code, I finally figured out what's wrong with my test.txt.

In trainMode == 1, StarSpace randomly chooses one document as the RHS and treats the rest as the LHS.

So if we want to evaluate one document per user, I think it's better to choose the last document as the RHS (as the paper mentions; maybe trainMode 6?).

For example:
train.txt
D_1<tab>D_2<tab>D_3
test.txt
D_1<tab>D_2<tab>D_3<tab>D_4

Then it is clear that D_1 to D_3 are the LHS and D_4 is the RHS.
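The two ways of choosing the RHS described in this comment can be sketched in Python as follows. It only mirrors the behaviour described here (a random pick under trainMode 1 versus always taking the last document); the function names are hypothetical, and this is not StarSpace's C++ implementation.

```python
import random

def split_random(line, rng):
    """trainMode 1 as described above: pick one document at random as the RHS
    and use the remaining documents as the LHS."""
    docs = line.rstrip("\n").split("\t")
    i = rng.randrange(len(docs))
    return docs[:i] + docs[i + 1:], docs[i]

def split_last(line):
    """The proposed alternative: always use the last document as the RHS."""
    docs = line.rstrip("\n").split("\t")
    return docs[:-1], docs[-1]

line = "D_1\tD_2\tD_3\tD_4"
print(split_last(line))  # (['D_1', 'D_2', 'D_3'], 'D_4')

lhs, rhs = split_random(line, random.Random(0))
print(lhs, rhs)  # the RHS can be any of the four documents
```

With split_last the evaluation target is deterministic, which is what makes D_4 unambiguous as the document being predicted.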

Anyway, thank you for your patience!

davidalbertonogueira commented on July 30, 2024

Hi,

What should the basedoc file contain? Is it supposed to be a large set of unrelated, random documents?

So, for simplicity, let us assume that I trained a content-based recommender (DocSpace) (-trainMode 1 -fileFormat labelDoc) with:

train.txt: each line is a user and the documents they read

roger federer loses <tab> venus williams wins <tab> world series ended
i love cats <tab> funny lolcat links <tab> how to be a petsitter

So test.txt would follow the same format, each line being a new user with the tab-separated documents they read, and at test time one document from each line is randomly picked and ranked against all the (unrelated, random) documents in basedoc?

jaseweston commented on July 30, 2024

We updated the description in the docs to make it clearer:

Input file format:

roger federer loses <tab> venus williams wins <tab> world series ended
i love cats <tab> funny lolcat links <tab> how to be a petsitter

Each line is a user, and each tab-separated document on that line is a document that the user liked. So in this case the first user likes sports, and the second is interested in pets.
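Tab separators are invisible and easily lost when text is copied, so here is a minimal Python sketch that writes the two example users to train.txt with real tab characters. The helper name and the in-memory list are hypothetical; only the layout (one user per line, documents separated by tabs) comes from the description above.

```python
# Hypothetical example data: one inner list per user, one string per document.
users = [
    ["roger federer loses", "venus williams wins", "world series ended"],
    ["i love cats", "funny lolcat links", "how to be a petsitter"],
]

def to_labeldoc_line(docs):
    """Join one user's documents with tabs (the document separator)."""
    for d in docs:
        # A tab or newline inside a document would corrupt the line structure.
        if "\t" in d or "\n" in d:
            raise ValueError(f"document contains a tab or newline: {d!r}")
    return "\t".join(docs)

with open("train.txt", "w", encoding="utf-8") as f:
    for docs in users:
        f.write(to_labeldoc_line(docs) + "\n")

# Reading the file back shows one user per line, documents split on tabs.
with open("train.txt", encoding="utf-8") as f:
    for line in f:
        print(line.rstrip("\n").split("\t"))
```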

jaseweston commented on July 30, 2024

In terms of testing, you need to compare against basedocs, similar to this script for sentence matching:

https://github.com/facebookresearch/StarSpace/blob/master/examples/wikipedia_sentence_matching.sh
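For intuition about what such a comparison measures, below is a minimal Python sketch of hits@k when each test line's held-out RHS document is ranked against a set of basedoc candidates. It assumes the documents have already been embedded as vectors (toy 2-D vectors stand in for StarSpace's learned embeddings) and assumes cosine similarity as the score; the function names are hypothetical, and this is not the code that StarSpace or the linked script runs.

```python
import math

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_of_true(lhs_vec, true_vec, basedoc_vecs):
    """0-based rank of the true RHS among {true RHS} plus the basedocs.
    Only basedocs scoring strictly higher push it down (ties favour the true doc)."""
    true_score = cosine(lhs_vec, true_vec)
    return sum(1 for b in basedoc_vecs if cosine(lhs_vec, b) > true_score)

def hits_at_k(examples, basedoc_vecs, k):
    """examples: (lhs_vec, true_rhs_vec) pairs, one per test line."""
    hits = sum(1 for lhs, rhs in examples
               if rank_of_true(lhs, rhs, basedoc_vecs) < k)
    return hits / len(examples)

# Toy 2-D vectors standing in for learned document embeddings.
basedocs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
examples = [
    ([1.0, 0.2], [1.0, 0.2]),  # true RHS points the same way as the LHS: rank 0
    ([0.0, 1.0], [1.0, 0.0]),  # true RHS orthogonal to the LHS: rank 2
]
print(hits_at_k(examples, basedocs, k=1))  # 0.5
print(hits_at_k(examples, basedocs, k=3))  # 1.0
```

The basedocs act as distractors: the more of them that score above the true document, the worse its rank, so a larger and more varied basedoc set makes hits@k a stricter test.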

kdqzzxxcc commented on July 30, 2024

What about test.txt in this example?

Does D_i represent the next clicked document for User_i? (D_i is the i-th line in test.txt.)

So does test.txt have the same number of lines as train.txt?

ledw commented on July 30, 2024

@kdqzzxxcc Yes, test.txt has the same format as train.txt. If you want to test on the same set of users, then yes, it has the same number of lines as train.txt. However, that is not a requirement: you can also test on new users (as long as they are represented in the same format), in which case test.txt does not necessarily have the same number of lines as train.txt.

kdqzzxxcc commented on July 30, 2024

@ledw I have tried the example mentioned above. I am still confused about the test.txt file.

For example:

train.txt

roger federer loses <tab> venus williams wins <tab> world series ended
i love cats <tab> funny lolcat links <tab> how to be a petsitter

test.txt
I love tennis
I love pets

Each line in test.txt contains one document, and each line (D_i) in test.txt represents User_i in train.txt.
But this gives the error ERROR: File 'test.txt' is empty.

When I change test.txt to:
I love tennis<tab>I love tennis
I love pets<tab>I love pets

Then it works. So must each line in test.txt contain at least 2 documents?
If there are 2 documents for each user in test.txt, which one is evaluated for the Hits@n metric?
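Based only on the behaviour reported in this comment (one-document lines led to ERROR: File 'test.txt' is empty, while two-document lines worked), a quick pre-check can flag short lines before running a test. The two-document minimum is an inference from this thread, not documented StarSpace behaviour, and the function name is hypothetical.

```python
def short_lines(lines, min_docs=2):
    """Return (line_number, doc_count) for every line with fewer than
    min_docs non-empty, tab-separated documents."""
    bad = []
    for n, line in enumerate(lines, start=1):
        docs = [d for d in line.rstrip("\n").split("\t") if d.strip()]
        if len(docs) < min_docs:
            bad.append((n, len(docs)))
    return bad

single = ["I love tennis\n", "I love pets\n"]
doubled = ["I love tennis\tI love tennis\n", "I love pets\tI love pets\n"]
print(short_lines(single))   # [(1, 1), (2, 1)]
print(short_lines(doubled))  # []
```

On a real file, pass the open file object directly, since iterating over it yields one line at a time.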

ledw commented on July 30, 2024

@kdqzzxxcc Thanks for your suggestion. Yes, I think what you suggested is a good option to have, especially at prediction time. I'll add it to our list.
