
tqchen commented on May 7, 2024

XGBoost does not read CSV files directly so far. For text input, we
support the libsvm format. Check the example that reads data using numpy.
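Because libsvm is the supported text format here, one way to bridge the gap is to convert the CSV into libsvm lines before loading it. The sketch below uses only the standard library and rests on stated assumptions: the label is the first CSV column, there is no header row, every cell is numeric, and `-999.0` (the `missing` value used in the question) marks missing cells. `csv_to_libsvm` is a hypothetical helper, not part of XGBoost.

```python
import csv
import io

MISSING = -999.0  # assumed sentinel, taken from the question's missing=-999.0


def csv_to_libsvm(lines, missing=MISSING):
    """Convert CSV rows (label first, then features) into libsvm text lines.

    Cells equal to `missing` are omitted, because an index absent from a
    libsvm line means "no value". Zeros are written out explicitly so they
    stay real zeros. Feature indices start at 0.
    """
    out = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        label, features = row[0], row[1:]
        pairs = [
            f"{i}:{cell}"
            for i, cell in enumerate(features)
            if float(cell) != missing
        ]
        out.append(" ".join([label] + pairs))
    return out


sample = io.StringIO("1,0.5,-999.0,3\n0,0,2.25,-999.0\n")
for line in csv_to_libsvm(sample):
    print(line)
# 1 0:0.5 2:3
# 0 0:0 1:2.25
```

Dropping the sentinel instead of writing it out matters because XGBoost treats an index that is absent from a libsvm line as missing, not as zero. The 0-based indices match the record shown later in this thread.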

On Tuesday, September 16, 2014, explorerr wrote:

I have a large file with 300+ features in each record.
While trying to load the data with DMatrix in Python, I got the following
message:

dtest = xgb.DMatrix(tsDir+'xgbTest.csv', missing=-999.0)
86x397 matrix with 328730778 entries is loaded from ../data/xgbTest.csv

I know the file has 1834123 records.

I looked at line 86 of the file, and it is no different from any other
line.

What could be a possible reason for this?

Thanks very much!

Rui


Reply to this email directly or view it on GitHub
https://github.com/tqchen/xgboost/issues/78.

from xgboost.

explorerr commented on May 7, 2024

Thanks very much for your response.

I named the file .csv, but I have actually converted it to libsvm format.

I have successfully loaded the training dataset. The problem is with the test dataset.

One record of the training file looks like this:

0 0:1 1:0.0397 2:0 3:318.77 4:1 21:0 6:0.00 7:0.00 8:0 9:1 10:0.0281 11:0.01 12:0.0397 137:0.0397 76:0.00 68:0.03 16:0.00 18:74500602 19:0.00 100:0.00 22:0 23:1 24:1 25:0.00 136:0.47 140:1 27:9999999.99 28:0 29:1 30:4 32:42 33:0.0281 105:0.0419 35:4 99:0.00 36:-1.23 37:59 38:0 39:281.21 103:1 41:0 42:1 43:0 44:1 45:3 46:0 221:0 47:9999999.99 48:0 49:0 50:0.00 51:1 52:1.00 53:0.0006 54:0.0281 55:0 56:0 57:0.0000 58:0.0281 59:295.52 193:1.00 61:1.00 62:0.0397 145:1 65:0.00 66:0.0281 121:0.0281 208:0 69:0.000 71:0 146:0 74:0 75:999 14:699 77:0.00 78:0 79:0 80:0.0397 198:1 82:7 83:2.127 84:0 239:0.00 86:0 87:1 88:0 107:0 91:0 93:1 264:0 95:2608 96:1 5:0.00 92:0.05 67:0 20:1.00 101:4 102:0.0005 151:0.0281 104:0.00 141:711 106:0 175:0 186:-3 108:38 109:0 110:0.00 111:0.01 203:1 113:0 114:0 115:0.0397 116:262 72:9999 118:1 157:5 120:0 31:4 122:0 123:0.0397 124:9999999.99 125:41 126:0 127:14 128:9999 129:0.0871 130:5 248:80.19 132:0 160:115.62 134:2 260:0.00 135:5.00 94:0 15:1 138:0 139:0 26:0 117:0.00 267:1 142:1 143:-3 252:1 13:68 73:-0.46 147:1 148:-3 149:1.00 150:9999999.99 97:0 152:0 164:79 154:1 155:12 156:0 119:2 158:0.0397 159:2.10 133:1392 161:0.05 162:0 163:0 153:0.00 166:-0.68 167:679 90:0.00 169:0 265:0 171:0.00 173:0 174:0.00 177:0 170:0 179:9999 176:999.99 181:0 182:-1.59 183:0.00 184:12 34:0 89:1 40:96 188:1459 189:0 165:0 191:0 192:1 60:0 63:0 195:0.0397 196:0 197:19 81:0.00 199:0.0281 200:0.0011 201:0.00 168:1 112:1.00 204:1.00 216:47 206:0 207:0.05 70:1 209:33 210:0 211:0 212:0.00 190:1 213:2 214:0.0397 215:0.0281 205:1 217:-3 266:1.00 218:0.00 219:1 220:0 172:1.000 222:0.0281 223:0 224:0.25 225:0 226:76 187:1 227:2777.65 228:0.00 229:1 230:18 231:9999 232:0 233:1 234:1 235:9999 236:0.00 237:0 238:263 257:1 85:0 240:0.0397 241:0.0000 202:1 242:0 243:0.00 244:0.00 245:-1.53 246:5 247:0 131:0.00 249:-1.90 250:1.00 251:-711 144:999 253:0 98:0 254:1 255:-0.66 256:0.0000 178:999.99 258:0 259:0 185:1.00 261:1 270:1 263:0 194:23 180:1 17:0.00 64:1.00 268:538.23 
269:1 262:0 271:3073.17 337:1
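For reference, each libsvm line is a label followed by space-separated `index:value` pairs, and the pairs need not be sorted (the record above lists `21:0` before `6:0.00`, for instance). The minimal parser below is written only to illustrate that layout; `parse_libsvm_line` is hypothetical and is not XGBoost's own reader. It assumes the common convention that the number of columns implied by the data is the largest index plus one.

```python
def parse_libsvm_line(line):
    """Split one libsvm line into (label, {feature_index: value}).

    The first token must be the label; every later token is index:value.
    Indices may appear in any order.
    """
    tokens = line.split()
    # Raises ValueError if the first token is an index:value pair (no label).
    label = float(tokens[0])
    features = {}
    for tok in tokens[1:]:
        idx, val = tok.split(":", 1)
        features[int(idx)] = float(val)
    return label, features


label, feats = parse_libsvm_line("0 0:1 1:0.0397 4:1 21:0 6:0.00 337:1")
print(label)              # 0.0
print(sorted(feats)[:4])  # [0, 1, 4, 6]
print(max(feats) + 1)     # 338, the columns implied by this line alone
```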

The test dataset looks exactly the same, just without the first column (the label).

Thanks!

Rui
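A quick way to confirm the cause identified in the reply is to scan the test file for lines whose first token is already an `index:value` pair (that is, lines with no label column), counting records along the way. `check_libsvm` is a hypothetical diagnostic helper; it does not reproduce how XGBoost's loader itself reacts to such lines.

```python
def check_libsvm(lines):
    """Return (record_count, line numbers that lack a leading label).

    A libsvm line should start with a plain label; if the first token
    contains ':', the label column is missing.
    """
    records = 0
    unlabeled = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # blank lines are not records
        records += 1
        if ":" in tokens[0]:
            unlabeled.append(lineno)
    return records, unlabeled


train = ["0 0:1 1:0.0397\n", "1 0:0 1:0.0281\n"]
test = ["0:1 1:0.0397\n", "0:0 1:0.0281\n", "\n"]
print(check_libsvm(train))  # (2, [])
print(check_libsvm(test))   # (2, [1, 2])

# On the real file (path taken from the log in the question):
# with open("../data/xgbTest.csv") as f:
#     print(check_libsvm(f))
```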


tqchen commented on May 7, 2024

Oh, you need a dummy label for the test set as well.


Sincerely,

Tianqi Chen
Computer Science & Engineering, University of Washington
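Following that advice, the fix is to give every test line a placeholder label. A minimal sketch, assuming `0` is an acceptable placeholder: the value does not matter for prediction, since `predict` computes outputs from the features alone. `add_dummy_label` is a hypothetical helper. Lines that already start with a label are left unchanged, so running it twice is harmless.

```python
def add_dummy_label(lines, label="0"):
    """Prefix unlabeled libsvm lines with a placeholder label.

    A line whose first token contains ':' has no label, so `label` is
    prepended; lines that already have a label are kept as they are.
    Blank lines are dropped and runs of whitespace collapse to one space.
    """
    out = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if ":" in tokens[0]:
            tokens.insert(0, label)
        out.append(" ".join(tokens))
    return out


print(add_dummy_label(["0:1 1:0.0397\n", "1 0:0 1:0.0281\n"]))
# ['0 0:1 1:0.0397', '1 0:0 1:0.0281']

# Rewriting a whole file (the output name is illustrative):
# with open("../data/xgbTest.csv") as src, open("../data/xgbTest.libsvm", "w") as dst:
#     for fixed in add_dummy_label(src):
#         dst.write(fixed + "\n")
```

The rewritten file should then load with `xgb.DMatrix` the same way the training file does. Evaluation metrics computed on it would be meaningless, because the labels are placeholders.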


explorerr commented on May 7, 2024

I see, thanks very much :)


explorerr commented on May 7, 2024

You did a great job with xgboost, bravo :)

