rueycheng / adarank
Python implementation of the AdaRank algorithm
License: MIT License
Hi @rueycheng
I have finally managed to get the nDCG scores from k = 1 to k = 20.
Now, I would like to display a ranking with the results.
I have tried this:
# Return ranking
docno = load_docno(test_file, letor=True)
print_trec_run(qid_test, docno, pred, output=open(ranking_file, 'wb'))
However, no docnos can be found in my test_file (the docno array is empty).
In print_trec_run, at this line:
format(qid=qid[i], docno=docno[i], rank=rank, sim=pred[i], run_id=run_id))
I get this error:
IndexError: index 7 is out of bounds for axis 0 with size 0
I have seen that docnos follow one of these patterns:
docno_pattern = re.compile(r'#\s*docid\s*=\s*(\S+)')
docno_pattern = re.compile(r'#\s*(\S+)')
However, I think there are no docnos in my test file :(
It looks like this:
0 qid:1 65:0.8635398447491647 88:0.5042806128839266
0 qid:1 0:0.4336122872341365 1:0.5701433653876129 5:0.464003602765715 39:0.4336122872341365 68:0.2044479985767591 81:0.2044479985767591
0 qid:1 0:0.4116103074959742 4:0.5412136437389908 39:0.4116103074959742 67:0.5412136437389908 68:0.1940740750173357 81:0.1940740750173357
0 qid:1 60:0.351932125005294 68:0.2570511854639458 74:0.5833888176004248 76:0.6353350635618945 81:0.2570511854639458
0 qid:1 20:0.2779414512496159 24:0.5269511067979753 37:0.5269511067979753 63:0.2995265310626487 92:0.5269511067979753
0 qid:1 20:0.5645782010909166 94:0.8253795822849901
How could I get a ranking with this test file? Or what could I do to change the test file format?
Thanks in advance.
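One way to approach the format question is to append an identifier comment to every line of the test file, so that the first pattern quoted above has something to match. The sketch below does that with synthetic ids; the function name add_docids and the "doc" prefix are made up for illustration, and whether load_docno then picks the ids up is an assumption based only on the pattern shown in the post.

```python
import re

# Same pattern quoted in the post above.
DOCNO_PATTERN = re.compile(r'#\s*docid\s*=\s*(\S+)')

def add_docids(lines, prefix="doc"):
    """Append a '# docid = <id>' comment to each data line lacking one.

    The ids are synthetic (prefix + running line number), an assumption
    made here because the file carries no document identifiers.
    """
    out = []
    for row, line in enumerate(lines):
        line = line.rstrip("\n")
        if not line.strip() or DOCNO_PATTERN.search(line):
            out.append(line)  # blank or already tagged: keep as is
        else:
            out.append(f"{line} # docid = {prefix}{row}")
    return out

sample = [
    "0 qid:1 65:0.8635398447491647 88:0.5042806128839266",
    "0 qid:1 20:0.5645782010909166 94:0.8253795822849901",
]
tagged = add_docids(sample)
for line in tagged:
    print(line)
```

Since the ids are just line numbers, they only identify rows of this particular file; real document identifiers would have to come from whatever produced the features.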
Hi,
I am trying to use this AdaRank implementation, which seems amazing.
However, when I try to train the model, I get a ValueError.
My code is:
X, y, qid = load_svmlight_file(train_file, query_id=True)
print('X shape:', X.shape)
print('y shape:', y.shape)
print('qid shape:', qid.shape)
X_test, y_test, qid_test = load_svmlight_file(test_file, query_id=True)
scorer = NDCGScorer(k=10)
'''
Run AdaRank for 100 iterations optimizing for NDCG@10.
When no improvement is made within the previous 10 iterations,
the algorithm will stop.
'''
model = AdaRank(max_iter=100, estop=10, scorer=scorer).fit(X, y, qid)
The shapes of my data are:
X shape: (140, 105)
y shape: (140,)
qid shape: (140,)
The error happens in line:
model = AdaRank(max_iter=100, estop=10, scorer=scorer).fit(X, y, qid)
The error is:
ValueError: shapes (3,) and (91,) not aligned: 3 (dim 0) != 91 (dim 0)
Specifically, the error is in this line of the library:
weighted_average = np.dot(weights, score)
When I try to debug in the library, I get these:
Number of queries: 3
Weights shape: (3,)
Number of weak ranker scores: 105
And inside the loop for fid, score in enumerate(weak_ranker_score):
fid: 0
Score shape: (91,)
So there seems to be a size mismatch between weights and score, but I don't know what's going on.
I have 3 distinct qids and 140 training documents with 105 features each.
Hope you can help me,
Thanks in advance
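One possible explanation, offered as a guess and not confirmed anywhere in this thread: the 3 versus 91 mismatch is what one would see if query ids are counted as distinct values in one place and as runs of consecutive identical values in another, which can only disagree when the rows of a query are not stored next to each other. The sketch below checks for that and regroups the rows; the helper names are hypothetical and not part of the library.

```python
from itertools import groupby

def count_contiguous_groups(qids):
    """Number of runs of identical consecutive query ids."""
    return sum(1 for _ in groupby(qids))

def sort_by_qid(rows, qids):
    """Stable sort so all rows of one query become adjacent."""
    order = sorted(range(len(qids)), key=lambda i: qids[i])
    return [rows[i] for i in order], [qids[i] for i in order]

# Toy data: 3 distinct ids, but interleaved, so there are 6 runs.
qids = [1, 2, 1, 3, 2, 1]
rows = ["a", "b", "c", "d", "e", "f"]
print(len(set(qids)), count_contiguous_groups(qids))

sorted_rows, sorted_qids = sort_by_qid(rows, qids)
print(len(set(sorted_qids)), count_contiguous_groups(sorted_qids))
```

If the two counts differ on the real qid array, reordering X, y and qid with the same index before calling fit may be worth trying; with NumPy arrays, np.argsort(qid, kind="stable") gives such an index.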
I have been trying to use the algorithm, but I am not sure what the dataset should look like. Could you provide an example dataset for training and testing? Or at least, could you provide more information about what types X, y and qid are?
Thank you in advance <3
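Judging from the other threads on this page, the data is read from SVMlight/LETOR-style text files where each line is "label qid:<id> index:value ...", and the loader returns X (one feature row per document), y (one relevance label per row) and qid (one query id per row). The following is a minimal hand-written parser meant only to show that layout; parse_letor_line is a hypothetical helper, not the loader used by the library.

```python
def parse_letor_line(line):
    """Split one 'label qid:<id> idx:val ...' line into its parts."""
    body = line.split("#", 1)[0]          # drop trailing comment, if any
    tokens = body.split()
    label = float(tokens[0])              # relevance label
    qid = int(tokens[1].split(":", 1)[1]) # query the row belongs to
    features = {}
    for tok in tokens[2:]:
        idx, val = tok.split(":", 1)
        features[int(idx)] = float(val)
    return label, qid, features

lines = [
    "0 qid:1 65:0.8635398447491647 88:0.5042806128839266",
    "0 qid:1 20:0.5645782010909166 94:0.8253795822849901",
]
parsed = [parse_letor_line(l) for l in lines]
y = [p[0] for p in parsed]    # labels, one per row
qid = [p[1] for p in parsed]  # query ids, one per row
X = [p[2] for p in parsed]    # sparse rows as {feature_index: value}
print(y, qid, X[0])
```

All three sequences have the same length, which matches the shapes (140, 105), (140,) and (140,) reported in the thread above.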
Hi @rueycheng,
After doing some research, I think print_trec_run is not what I'm looking for. Perhaps you can help me with this:
Once I have trained the AdaRank model on a set of documents and queries in SVMlight format, I would like to return a ranking based on a query entered by the user. That is, I am not interested in generating a ranking from a test set of different documents, but rather in getting a ranking with scores for the documents the model was trained on. I am not sure I am explaining myself clearly.
That is, after training the model:
X, y, qid = load_svmlight_file(svm_file, query_id=True)
model = AdaRank(max_iter=100, estop=10, scorer=NDCGScorer(k=10)).fit(X, y, qid)
I want to calculate the scores for the same X documents given a qid_test (or more than one), and with those scores return a ranking like:
qid_test docno rank score
----------------------------
1 3 1 0.7
1 5 2 0.23
1 6 3 0
2 6 1 1
2 5 2 0
2 3 3 0
Is it possible to get something like this?
Thanks in advance.
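Assuming one score per (query, document) row is already available, for instance an array like the pred used in the first thread, a table of this shape can be built by grouping rows by query and sorting each group by score. This is a rough sketch; rank_by_query and the sample numbers are hypothetical and only loosely mirror the table above.

```python
from collections import defaultdict

def rank_by_query(qids, docnos, scores):
    """Return (qid, docno, rank, score) rows, best score first per query."""
    by_query = defaultdict(list)
    for q, d, s in zip(qids, docnos, scores):
        by_query[q].append((d, s))
    rows = []
    for q in sorted(by_query):
        # sorted() is stable, so tied scores keep their input order
        ranked = sorted(by_query[q], key=lambda ds: ds[1], reverse=True)
        for rank, (d, s) in enumerate(ranked, start=1):
            rows.append((q, d, rank, s))
    return rows

# Hypothetical scores for three documents under two queries.
qids   = [1, 1, 1, 2, 2, 2]
docnos = [3, 5, 6, 3, 5, 6]
scores = [0.7, 0.23, 0.0, 0.0, 0.0, 1.0]
ranking = rank_by_query(qids, docnos, scores)
for q, d, r, s in ranking:
    print(f"{q}\t{d}\t{r}\t{s}")
```

Note that the features in each row depend on both the query and the document, so scoring a query typed in by a user would still require computing feature rows for that query first.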