Comments (7)
To @ChenglongChen: can you try adding seed to the Python params to see if it fixes the problem? It seems the Python module did not call seed during initialization, as the main console version did.
from xgboost.
Ok. Will try it soon. Thanks.
On May 21, 2014, at 0:14, Tianqi Chen [email protected] wrote:
To @ChenglongChen: can you try adding seed to the Python params to see if it fixes the problem? It seems the Python module did not call seed during initialization, as the main console version did.
It seems there is actually a "bug" in my code: I randomly shuffle the data on each run. After removing that part and testing with subsample<1, the results are reproducible even without explicitly setting the seed param. Very sorry :(
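If the data still needs shuffling, one way to keep runs reproducible is to draw the permutation from a generator with a fixed seed. A minimal sketch using NumPy's `default_rng`; the helper name `shuffled_copy` and the seed values are hypothetical and not part of the original code.

```python
import numpy as np


def shuffled_copy(X, y, seed=42):
    """Return X and y shuffled with the same row permutation.

    Hypothetical helper: the permutation comes from a generator seeded
    with `seed`, so the same seed always yields the same row order.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    return X[perm], y[perm]


# Toy data: row i of X is [2i, 2i + 1] and y[i] is i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X1, y1 = shuffled_copy(X, y, seed=7)
X2, y2 = shuffled_copy(X, y, seed=7)

# The same seed produces the identical shuffle on every call
print(np.array_equal(y1, y2))
```

Features and labels are indexed with the same permutation, so rows stay aligned; a different seed generally gives a different order that is, again, reproducible.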
No worries, glad you found the problem :) Thanks for using XGBoost.
Thanks for reporting the problem. Indeed, setting the seed via set_param was not supported by the Python module in the previous code. I have made a fix; the most recent version of the code in the repo should reproduce the result.
Tianqi
On Sat, May 24, 2014 at 11:25 AM, André Panisson [email protected] wrote:
I'm trying to reproduce a test result, without success.
The following code, using the Kaggle Higgs dataset, should give the same result at every step, but I get a different value at each step:

import numpy as np
import xgboost as xgb
from sklearn import metrics

# Load the Higgs training set; column 32 is the label ('s' = signal -> 1).
# Newer NumPy passes str to converters, older NumPy passes bytes.
dtrain = np.loadtxt('../data/training.csv', delimiter=',', skiprows=1,
                    converters={32: lambda x: int(x in ('s', b's'))})
X = dtrain[:1000, 1:31]
label = dtrain[:1000, 32]
weight = dtrain[:1000, 31]

# Create parameters
param = {}
param['objective'] = 'binary:logitraw'
param['eval_metric'] = 'auc'
param['nthread'] = 1
param['seed'] = 42
num_round = 120

# Create train and test sets (first 80% for training, last 20% for testing)
split = int(len(label) * 0.8)
train_indices = np.arange(split)
test_indices = np.arange(split, len(label))

X_train, X_test = X[train_indices], X[test_indices]
y_train, y_test = label[train_indices], label[test_indices]
w_train, w_test = weight[train_indices], weight[test_indices]

xgmat = xgb.DMatrix(X_train, label=y_train, missing=-999.0, weight=w_train)
xgmat_test = xgb.DMatrix(X_test, missing=-999.0)

# Train 10 times with identical settings; every AUC should be the same
for i in range(10):
    bst = xgb.train(param, xgmat, num_round)
    y_out = bst.predict(xgmat_test)
    print(metrics.roc_auc_score(y_test, y_out))

I've set the seed parameter, but it does not help. Setting the Python random seed does not help either.
View it on GitHub: https://github.com/tqchen/xgboost/issues/9#issuecomment-44095344
Sincerely,
Tianqi Chen
Computer Science & Engineering, University of Washington
@panisson it seems I mistakenly deleted your previous comment. Really sorry about that.
Thanks to @panisson, the Python module should now support seeding and thus produce reproducible boosting results.
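To confirm reproducibility after updating, one can rerun the same seeded training several times and check that every prediction vector matches the first. A minimal sketch of such a check; `check_reproducible`, `seeded_subsample_run` and `unseeded_run` are hypothetical helpers, and the stand-in runs use NumPy only so the sketch executes without XGBoost. With XGBoost installed, the function passed in would instead call `xgb.train(param, xgmat, num_round)` and return `bst.predict(xgmat_test)`, as in the training loop of the bug report.

```python
import numpy as np


def check_reproducible(run, n_runs=5):
    """Call `run()` n_runs times; True if every output equals the first.

    `run` is any zero-argument function returning a NumPy array, e.g. one
    that trains a model with a fixed seed and returns its predictions.
    """
    reference = run()
    return all(np.array_equal(reference, run()) for _ in range(n_runs - 1))


def seeded_subsample_run(seed=42):
    # Hypothetical stand-in for a seeded run: the row subsample
    # (as with subsample < 1) is drawn from a generator seeded with `seed`.
    rng = np.random.default_rng(seed)
    mask = rng.random(100) < 0.8
    return mask.astype(float)


def unseeded_run():
    # Fresh, unseeded generator on each call, so outputs differ between runs.
    return np.random.default_rng().random(100)


print(check_reproducible(seeded_subsample_run))
print(check_reproducible(unseeded_run))
```

Comparing whole output arrays rather than a single summary score such as AUC makes the check stricter: two runs can occasionally share an AUC while producing different predictions.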