Comments (7)

tqchen commented on May 7, 2024

@ChenglongChen, can you try adding seed to the Python params to see if it fixes the problem? It seems the Python module did not call seed during initialization the way the main console version does.

from xgboost.

ChenglongChen commented on May 7, 2024

Ok. Will try it soon. Thanks.

ChenglongChen commented on May 7, 2024

It seems there is actually a "bug" in my code: I randomly shuffle the data on each run. After removing that part and testing with subsample<1, the results are reproducible without explicitly setting the seed param. Very sorry :(
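
For readers who do want to shuffle, one option is to keep the shuffle but seed it. The following is only a minimal sketch using the standard library; the helper name `shuffled_rows` and the toy `rows` list are hypothetical and are not taken from the code discussed in this thread.

```python
import random


def shuffled_rows(rows, seed):
    """Return a shuffled copy of rows; the same seed gives the same order."""
    rng = random.Random(seed)  # private generator, independent of global state
    out = list(rows)           # copy so the caller's data is left untouched
    rng.shuffle(out)
    return out


rows = list(range(10))                 # stand-in for row indices of a dataset
run_a = shuffled_rows(rows, seed=42)
run_b = shuffled_rows(rows, seed=42)
print(run_a == run_b)                  # compares the order produced by two runs
```

With an unseeded shuffle the row order changes between runs, so any train/test split taken afterwards changes too, regardless of the seed given to the booster.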

tqchen commented on May 7, 2024

No worries, glad that you found the problem :) Thanks for using XGBoost.

tqchen commented on May 7, 2024

Thanks for reporting the problem. Indeed, setting the Python module's seed via
set_param was not supported in the previous code. I have made a fix; the most
recent version of the code in the repo should be able to reproduce the result.

Tianqi

On Sat, May 24, 2014 at 11:25 AM, André Panisson [email protected] wrote:

I'm trying to reproduce a test with no success.
The following code, using the Kaggle Higgs dataset, should give the same
result for all steps, but I have different values in each step:

    import numpy as np
    import xgboost as xgb
    from sklearn import metrics

    # Load the first 1000 rows; column 32 holds the 's'/'b' label
    dtrain = np.loadtxt(open('../data/training.csv'), delimiter=',', skiprows=1,
                        converters={32: lambda x: int(x == 's'.encode('utf-8'))})
    X = dtrain[:1000, 1:31]
    label = dtrain[:1000, 32]
    weight = dtrain[:1000, 31]

    # Create parameters
    param = {}
    param['objective'] = 'binary:logitraw'
    param['eval_metric'] = 'auc'
    param['nthread'] = 1
    param['seed'] = 42
    num_round = 120

    # Create train and test sets
    train_indices = np.arange(len(label)*0.8).astype(int)
    test_indices = np.arange(len(label)*0.8, len(label)).astype(int)

    X_train, X_test = X[train_indices], X[test_indices]
    y_train, y_test = label[train_indices], label[test_indices]
    w_train, w_test = weight[train_indices], weight[test_indices]

    xgmat = xgb.DMatrix(X_train, label=y_train, missing=-999.0, weight=w_train)
    xgmat_test = xgb.DMatrix(X_test, missing=-999.0)

    # Train ten times with identical settings; the AUC should not change
    for i in range(10):
        bst = xgb.train(param.items(), xgmat, num_round)
        y_out = bst.predict(xgmat_test)
        print(metrics.roc_auc_score(y_test, y_out))

I've set the seed parameter, but it does not help. Setting the python
random seed also does not help.


Reply to this email directly or view it on GitHub: https://github.com/tqchen/xgboost/issues/9#issuecomment-44095344

Sincerely,

Tianqi Chen
Computer Science & Engineering, University of Washington

tqchen commented on May 7, 2024

@panisson, it seems I mistakenly deleted your previous comment. Really sorry about that.

tqchen commented on May 7, 2024

Thanks to @panisson, the Python module should now support seeding and thus reproducible boosting results.
