Comments (8)

kdqzzxxcc commented on July 30, 2024 2

After carefully reading the source code, I figured out what was wrong with my test.txt.

With trainMode 1, one document on each line is chosen at random as the RHS, and the rest are treated as the LHS.

So if we want to evaluate one specific document per user, I think it is better to use the last document as the RHS (as the paper mentions; maybe trainMode 6?).

For example:
train.txt
D_1<tab>D_2<tab>D_3
test.txt
D_1<tab>D_2<tab>D_3<tab>D_4

Then it is clear that D_1 through D_3 are the LHS and D_4 is the RHS.
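To make the two behaviours concrete, here is a minimal sketch of splitting one tab-separated line into LHS and RHS, either by picking the RHS at random (what the comment above describes for trainMode 1) or by always taking the last document. The function name `split_lhs_rhs` and its `mode` argument are hypothetical and only for illustration; this is not StarSpace's own code.

```python
import random


def split_lhs_rhs(line, mode="last", rng=None):
    """Split one tab-separated line of documents into (lhs, rhs).

    mode="random": pick one document at random as the RHS.
    mode="last":   always use the final document as the RHS.
    Illustrative helper only, not part of StarSpace.
    """
    docs = [d for d in line.rstrip("\n").split("\t") if d.strip()]
    if len(docs) < 2:
        raise ValueError("need at least two documents to form an LHS/RHS pair")

    if mode == "last":
        idx = len(docs) - 1
    elif mode == "random":
        idx = (rng or random).randrange(len(docs))
    else:
        raise ValueError("mode must be 'last' or 'random'")

    rhs = docs[idx]
    lhs = docs[:idx] + docs[idx + 1:]
    return lhs, rhs


# The test.txt line from the example above: D_4 is held out as the RHS.
lhs, rhs = split_lhs_rhs("D_1\tD_2\tD_3\tD_4", mode="last")
print(lhs, rhs)
```

With `mode="last"` the held-out document is deterministic, which is what makes the evaluation target unambiguous.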

Anyway, thank you for your patience!

from starspace.

davidalbertonogueira commented on July 30, 2024 1

Hi,

  1. What should the basedoc file contain? Is it supposed to be a large set of unrelated, random documents?

So, for simplicity, let us assume that I trained a content-based recommendation model (DocSpace) (-trainMode 1 -fileFormat labelDoc) with:

train.txt, where each line is a user and the documents they read:

roger federer loses <tab> venus williams wins <tab> world series ended
i love cats <tab> funny lolcat links <tab> how to be a petsitter

So test.txt would follow the same format, with each line being a new user and the tab-separated documents they read? And at test time, one document from each line is picked at random and ranked against all the (unrelated, random) documents from basedoc?


jaseweston commented on July 30, 2024

We updated the description in the doc to be more clear:

Input file format:

roger federer loses <tab> venus williams wins <tab> world series ended
i love cats <tab> funny lolcat links <tab> how to be a petsitter

Each line is a user, and the tab-separated documents on that line are the documents that user liked. So in this case the first user likes sports, and the second is interested in pets.
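As a rough illustration of that layout, the sketch below parses text in this format into one list of documents per user, with each document split into whitespace-separated words. The helper name `parse_label_doc` is hypothetical, and the sample string uses real tab characters where the thread writes `<tab>`.

```python
def parse_label_doc(text):
    """Parse 'one user per line, documents separated by tabs' text.

    Returns a list of users; each user is a list of documents;
    each document is a list of word tokens.
    Illustrative helper only, not part of StarSpace.
    """
    users = []
    for line in text.splitlines():
        docs = [d.strip() for d in line.split("\t") if d.strip()]
        if docs:  # skip blank lines
            users.append([d.split() for d in docs])
    return users


sample = (
    "roger federer loses\tvenus williams wins\tworld series ended\n"
    "i love cats\tfunny lolcat links\thow to be a petsitter\n"
)

for i, docs in enumerate(parse_label_doc(sample), start=1):
    print(f"user {i}: {len(docs)} documents")
```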


jaseweston commented on July 30, 2024

In terms of testing, you need to compare against basedocs, similar to this script for sentence matching:

https://github.com/facebookresearch/StarSpace/blob/master/examples/wikipedia_sentence_matching.sh
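A possible way to set this up, sketched under assumptions: the basedoc file is taken here to hold one candidate document per line, built from the unique documents in the training data, and the test invocation is assembled from the flags `-model`, `-testFile`, `-basedoc`, `-trainMode` and `-fileFormat`. The file names and both helper functions are hypothetical; check the linked script and the StarSpace README for the exact options before relying on this.

```python
def build_basedoc(lines):
    """Collect unique documents (in first-seen order) from tab-separated lines.

    Assumption: basedoc holds one candidate document per line.
    """
    seen = set()
    docs = []
    for line in lines:
        for doc in line.rstrip("\n").split("\t"):
            doc = doc.strip()
            if doc and doc not in seen:
                seen.add(doc)
                docs.append(doc)
    return docs


def starspace_test_argv(model, test_file, basedoc, train_mode=1):
    """Build an argument list for a `starspace test` run (not executed here)."""
    return [
        "starspace", "test",
        "-model", model,
        "-testFile", test_file,
        "-basedoc", basedoc,
        "-trainMode", str(train_mode),
        "-fileFormat", "labelDoc",
    ]


train_lines = [
    "roger federer loses\tvenus williams wins\tworld series ended",
    "i love cats\tfunny lolcat links\thow to be a petsitter",
]
print("\n".join(build_basedoc(train_lines)))
print(" ".join(starspace_test_argv("docspace_model", "test.txt", "basedoc.txt")))
```

The argument list could be passed to `subprocess.run` once the paths point at real files; it is only printed here.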


kdqzzxxcc commented on July 30, 2024

What about the test.txt in this example?

Does D_i represent the next clicked document for User_i? (D_i is the i-th line in test.txt.)

So does test.txt have the same number of lines as train.txt?


ledw commented on July 30, 2024

@kdqzzxxcc Yes, test.txt has the same format as train.txt. If you want to test on the same set of users, then yes, it has the same number of lines as train.txt. However, that is not a requirement: you can also test on new users, as long as they are represented in the same format, and in that case test.txt does not necessarily have the same number of lines as train.txt.


kdqzzxxcc commented on July 30, 2024

@ledw I have tried the example mentioned above. I am still confused about the test.txt file.

For example:

train.txt

roger federer loses <tab> venus williams wins <tab> world series ended
i love cats <tab> funny lolcat links <tab> how to be a petsitter

test.txt
I love tennis
I love pets

Each line in test.txt contains one document.
Each line (D_i) in test.txt represents User_i in train.txt.
But this produces an error: ERROR: File 'test.txt' is empty.

When I change test.txt to
I love tennis<tab>I love tennis
I love pets<tab>I love pets

it works. So must each line in test.txt contain at least 2 documents?
If there are 2 documents for each user in test.txt, which one is evaluated for the Hits@n metric?
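Judging from the behaviour reported above, lines with a single document appear to yield no usable LHS/RHS pair. A small pre-flight check along these lines could flag such lines before running the test; the helper `find_short_lines` is hypothetical, and the two-document minimum is inferred from this thread rather than from StarSpace's documentation.

```python
def find_short_lines(text, min_docs=2):
    """Return (line_number, doc_count) for lines with fewer than min_docs documents.

    Documents are tab-separated; blank lines are ignored.
    Illustrative helper only, not part of StarSpace.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        n_docs = sum(1 for d in line.split("\t") if d.strip())
        if n_docs < min_docs:
            problems.append((line_no, n_docs))
    return problems


bad_test = "I love tennis\nI love pets\n"
good_test = "I love tennis\tI love tennis\nI love pets\tI love pets\n"

print(find_short_lines(bad_test))   # every line has only one document
print(find_short_lines(good_test))  # nothing to report
```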


ledw commented on July 30, 2024

@kdqzzxxcc Thanks for your suggestion. Yes I think what you suggested is a good option to have especially in prediction time. I'll add that to our list.
