rueycheng / adarank

Python implementation of the AdaRank algorithm

License: MIT License

Python 100.00%

adarank's People

shalder wubo2180 swarnadeep8597 ty2009137128 neelik lucasgviedma

adarank's Issues

Display ranking

Hi @rueycheng

Finally, I have managed to get the nDCG scores from k = 1 to k = 20

Now, I would like to display a ranking with the results.

I have tried this:

# Return ranking
docno = load_docno(test_file, letor=True)
print_trec_run(qid_test, docno, pred, output=open(ranking_file, 'wb'))

However, no docnos can be found in my test_file (the docno array is empty).

In print_trec_run:
format(qid=qid[i], docno=docno[i], rank=rank, sim=pred[i], run_id=run_id))

I get this error:
IndexError: index 7 is out of bounds for axis 0 with size 0

I have seen that docnos follow one of these patterns:

docno_pattern = re.compile(r'#\s*docid\s*=\s*(\S+)')
docno_pattern = re.compile(r'#\s*(\S+)')

However, I think there are no docnos in my test file :(

It looks like this:

0 qid:1 65:0.8635398447491647 88:0.5042806128839266
0 qid:1 0:0.4336122872341365 1:0.5701433653876129 5:0.464003602765715 39:0.4336122872341365 68:0.2044479985767591 81:0.2044479985767591
0 qid:1 0:0.4116103074959742 4:0.5412136437389908 39:0.4116103074959742 67:0.5412136437389908 68:0.1940740750173357 81:0.1940740750173357
0 qid:1 60:0.351932125005294 68:0.2570511854639458 74:0.5833888176004248 76:0.6353350635618945 81:0.2570511854639458
0 qid:1 20:0.2779414512496159 24:0.5269511067979753 37:0.5269511067979753 63:0.2995265310626487 92:0.5269511067979753
0 qid:1 20:0.5645782010909166 94:0.8253795822849901

How could I get a ranking with this test file? Or what could I do to change the test file format?

Thanks in advance.
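
One possible workaround, sketched under the assumption that load_docno reads identifiers from a trailing comment matching the `#\s*docid\s*=\s*(\S+)` pattern quoted above, is to append a synthetic docid comment to every line of the test file before loading it. The helper name `add_docids` and the `doc<N>` naming scheme are made up for illustration and are not part of the library.

```python
import re

# Pattern quoted in the issue above.
DOCNO_PATTERN = re.compile(r'#\s*docid\s*=\s*(\S+)')

def add_docids(lines, prefix='doc'):
    """Append '# docid = <prefix><n>' to each non-empty line lacking a comment.

    Hypothetical helper: the ids are synthetic, numbered by line order.
    """
    out = []
    n = 0
    for line in lines:
        line = line.rstrip('\n')
        if not line.strip():
            continue  # skip blank lines
        if '#' not in line:
            line = '{} # docid = {}{}'.format(line, prefix, n)
        n += 1
        out.append(line)
    return out

sample = [
    '0 qid:1 65:0.8635398447491647 88:0.5042806128839266\n',
    '0 qid:1 20:0.5645782010909166 94:0.8253795822849901\n',
]
tagged = add_docids(sample)
docnos = [DOCNO_PATTERN.search(line).group(1) for line in tagged]
print(docnos)  # ['doc0', 'doc1']
```

The tagged lines could then be written to a new file and passed to load_docno. The ids only reflect line order, so they are useful for inspecting a ranking but carry no meaning beyond that.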

ValueError

Hi,

I am trying to use this AdaRank implementation which seems amazing.
However, when I try to train the model, I get a ValueError.

My code is:

X, y, qid = load_svmlight_file(train_file, query_id=True)
print('X shape:', X.shape)
print('y shape:', y.shape)
print('qid shape:', qid.shape)

X_test, y_test, qid_test = load_svmlight_file(test_file, query_id=True)

scorer = NDCGScorer(k=10)

# Run AdaRank for 100 iterations optimizing for NDCG@10.
# When no improvement is made within the previous 10 iterations,
# the algorithm will stop.
model = AdaRank(max_iter=100, estop=10, scorer=scorer).fit(X, y, qid)

The shapes of my data are:

X shape: (140, 105)
y shape: (140,)
qid shape: (140,)

The error happens in line:
model = AdaRank(max_iter=100, estop=10, scorer=scorer).fit(X, y, qid)

The error is:
ValueError: shapes (3,) and (91,) not aligned: 3 (dim 0) != 91 (dim 0)

Specifically, the error is in this line of the library:
weighted_average = np.dot(weights, score)

When I try to debug in the library, I get these:

Number of queries: 3
Weights shape: (3,)
Number of weak ranker scores: 105

And inside for fid, score in enumerate(weak_ranker_score):

fid: 0
Score shape: (91,)

So there seems to be a size mismatch between weights and scores, but I don't know what is causing it.

I have 3 different qid values and 140 training documents with 105 features each.

Hope you can help me,
Thanks in advance
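
A score vector of length 91 for a dataset with 3 distinct qid values would be consistent with the rows of one query not being stored contiguously, so that grouping by consecutive qid yields 91 groups rather than 3. This is only a guess from the reported shapes, not a confirmed diagnosis of the library. The sketch below (the helper names `count_runs` and `stable_order_by_qid` are hypothetical) compares the two counts and computes a stable reordering.

```python
from itertools import groupby

def count_runs(qid):
    """Number of maximal runs of consecutive equal qid values."""
    return sum(1 for _ in groupby(qid))

def stable_order_by_qid(qid):
    """Row indices that make equal qids contiguous.

    sorted() is stable, so the original order within a query is kept.
    """
    return sorted(range(len(qid)), key=lambda i: qid[i])

qid = [1, 2, 1, 3, 2, 3, 1]
print(len(set(qid)), count_runs(qid))  # 3 7

order = stable_order_by_qid(qid)
qid_sorted = [qid[i] for i in order]
print(qid_sorted)              # [1, 1, 1, 2, 2, 3, 3]
print(count_runs(qid_sorted))  # 3
```

If the two counts differ on the real data, applying the same index order to X, y and qid before calling fit would be one thing to try.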

Example data

I have been trying to use the algorithm, but I am not sure what the dataset should look like. Could you provide an example dataset for training and testing? Or, at least, could you provide more information about the types of X, y and qid?

Thank you in advance <3
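
For reference, the test file quoted in the "Display ranking" issue uses the SVMlight/LETOR text layout, where each line reads `<label> qid:<query id> <feature>:<value> ...`. As a rough stand-alone sketch of what the three outputs hold (this is not the loader the library uses, and `parse_letor_line` is a hypothetical helper), each line contributes one relevance label to y, one query id to qid, and one sparse feature row to X:

```python
def parse_letor_line(line):
    """Split one LETOR line into (label, qid, {feature_index: value})."""
    body = line.split('#', 1)[0]  # drop an optional trailing comment
    tokens = body.split()
    label = float(tokens[0])
    qid = int(tokens[1].split(':', 1)[1])
    features = {}
    for tok in tokens[2:]:
        idx, val = tok.split(':', 1)
        features[int(idx)] = float(val)
    return label, qid, features

lines = [
    '0 qid:1 65:0.8635398447491647 88:0.5042806128839266',
    '0 qid:1 20:0.5645782010909166 94:0.8253795822849901',
]
rows = [parse_letor_line(l) for l in lines]
y = [r[0] for r in rows]    # one relevance label per document
qid = [r[1] for r in rows]  # one query id per document
X = [r[2] for r in rows]    # one sparse feature row per document
print(y, qid)
print(sorted(X[0]))
```

In the "ValueError" report above, the same information shows up as an X of shape (documents, features) and y and qid arrays of shape (documents,).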

Given a query, return a ranking

Hi @rueycheng,

After doing some research I think print_trec_run is not what I'm looking for. Perhaps you can help me with this:

Once I have the AdaRank model trained with a set of documents and queries in SVMlight format, I would like to return a ranking based on a query entered by the user. That is, I am not interested in generating a ranking from a test set with different documents, but rather in having a ranking with scores for the documents with which I trained the model. I hope that makes sense.

That is, after training the model:

 X, y, qid = load_svmlight_file(svm_file, query_id=True)
 model = AdaRank(max_iter=100, estop=10, scorer=NDCGScorer(k=10)).fit(X, y, qid)

I want to calculate the scores for the same X documents given a qid_test (or more than one). And with those scores, return a ranking like:

qid_test docno   rank   score
----------------------------
1        3       1      0.7
1        5       2      0.23
1        6       3      0
2        6       1      1
2        5       2      0
2        3       3      0

Is it possible to get something like this?

Thanks in advance.
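
A table of this shape can be built from three parallel sequences: query ids, document ids, and one score per row. The sketch below assumes the scores have already been computed for those rows (for example, the pred array in the "Display ranking" snippet); how to obtain them from the model is not shown here. The function name `rank_by_query` and the sample values are made up for illustration.

```python
from collections import defaultdict

def rank_by_query(qid, docno, scores):
    """Return rows (qid, docno, rank, score), best score first within each query."""
    by_query = defaultdict(list)
    for q, d, s in zip(qid, docno, scores):
        by_query[q].append((d, s))
    rows = []
    for q in sorted(by_query):
        # Stable sort: documents with equal scores keep their input order.
        ranked = sorted(by_query[q], key=lambda ds: ds[1], reverse=True)
        for rank, (d, s) in enumerate(ranked, start=1):
            rows.append((q, d, rank, s))
    return rows

# Made-up scores, for illustration only.
qid = [1, 1, 1, 2, 2, 2]
docno = [3, 5, 6, 3, 5, 6]
scores = [0.7, 0.23, 0.0, 0.1, 0.4, 1.0]

ranking = rank_by_query(qid, docno, scores)
for row in ranking:
    print('{:<8} {:<7} {:<6} {}'.format(*row))
```

Note that each row in an SVMlight/LETOR file describes a query-document pair, so this only ranks documents under the qid values present in the data. Scoring a brand-new user query would first require computing features for that query against each document.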
