kaggle-for-fun's Issues

Add utility functions / refactorings

Since we are reusing a lot of code, we might as well have some utility functions in place.

Suggestion:

  • start with utils.py
  • add a few functions and refactor existing code
  • submit PRs
  • repeat

Issue with the Cabin feature

Running LabelEncoder on the Cabin feature gives an error:

Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Cabin
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-121-48f3aad5f78e> in <module>()
      4     print(col)
      5     le.fit(list(train[col]) + list(cv[col]))
----> 6     train[col] = le.transform(train[col])
      7     cv[col] = le.transform(cv[col])

/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/label.py in transform(self, y)
    128         y = column_or_1d(y, warn=True)
    129 
--> 130         classes = np.unique(y)
    131         if len(np.intersect1d(classes, self.classes_)) < len(classes):
    132             diff = np.setdiff1d(classes, self.classes_)

/opt/conda/lib/python3.6/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
    208     ar = np.asanyarray(ar)
    209     if axis is None:
--> 210         return _unique1d(ar, return_index, return_inverse, return_counts)
    211     if not (-ar.ndim <= axis < ar.ndim):
    212         raise ValueError('Invalid axis kwarg specified for unique')

/opt/conda/lib/python3.6/site-packages/numpy/lib/arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
    275         aux = ar[perm]
    276     else:
--> 277         ar.sort()
    278         aux = ar
    279     flag = np.concatenate(([True], aux[1:] != aux[:-1]))

TypeError: '>' not supported between instances of 'float' and 'str'

It looks like the cause is missing values in the Cabin feature. How did you overcome this?
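One possible workaround, as a minimal sketch rather than the repo's actual fix: pandas stores missing values as float NaN while the other Cabin values are strings, so filling the gaps with a string before encoding gives LabelEncoder a column it can sort. The toy `train` and `cv` frames and the `"Missing"` placeholder below are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for the train and cv frames in the traceback (made-up values).
train = pd.DataFrame({"Cabin": ["C85", np.nan, "E46", np.nan]})
cv = pd.DataFrame({"Cabin": [np.nan, "C85", "B28"]})

MISSING = "Missing"  # hypothetical placeholder label for absent cabins

for col in ["Cabin"]:
    # Replace NaN (a float) with a string so every value in the column is a str.
    train[col] = train[col].fillna(MISSING).astype(str)
    cv[col] = cv[col].fillna(MISSING).astype(str)

    # Fit on both frames so labels seen only in cv are known to the encoder.
    le = LabelEncoder()
    le.fit(pd.concat([train[col], cv[col]]))
    train[col] = le.transform(train[col])
    cv[col] = le.transform(cv[col])

print(train["Cabin"].tolist())
print(cv["Cabin"].tolist())
```

Treating "no cabin recorded" as its own category keeps those rows in the data; whether that is a sensible signal for the model is a separate judgment call.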

Add sample / test data / instructions where needed

Suggestion:

  • Run through each project in this repo
  • If sample data is not present, download it from the Kaggle page for the challenge
  • Take the first 1000 rows of data and put them in the data/ folder
  • Try running the script!

Have fun!

Note: Please state which one you are working on so we don't duplicate efforts 😄
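The "first 1000 rows" step could be scripted along these lines. This is only a sketch using the standard library; the function name `write_sample` and the file paths in the usage note are hypothetical, not something already in the repo.

```python
import csv
from itertools import islice
from pathlib import Path


def write_sample(src, dst, n_rows=1000):
    """Copy the header plus the first n_rows data rows of src into dst."""
    dst = Path(dst)
    # Create the target folder (for example data/) if it does not exist yet.
    dst.parent.mkdir(parents=True, exist_ok=True)
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        # +1 so the header row is kept in addition to n_rows data rows.
        writer.writerows(islice(reader, n_rows + 1))
```

A call such as `write_sample("train_full.csv", "data/train.csv")` would then produce the sample file, assuming the downloaded CSV has a single header row.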

Add requirements.txt

This is a super simple one 😄

Suggestion:

  • Go through the code in each project
  • Add all required libraries into requirements.txt
  • Submit PR!
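To speed up the "go through the code" step, a rough helper like the one below could list candidate libraries. It is a sketch under some assumptions: the function names are made up, it only scans `.py` files (notebooks would need to be exported first), and import names do not always match package names (for example `sklearn` is installed as `scikit-learn`), so the output still needs a manual pass.

```python
import ast
import sys
from pathlib import Path


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import utils".
            names.add(node.module.split(".")[0])
    return names


def third_party_candidates(root="."):
    """Collect imports from all .py files under root, minus known stdlib modules."""
    found = set()
    for path in Path(root).rglob("*.py"):
        found |= imported_modules(path.read_text(encoding="utf-8"))
    # sys.stdlib_module_names exists on Python 3.10+; older versions filter nothing.
    stdlib = getattr(sys, "stdlib_module_names", set())
    return sorted(found - set(stdlib))
```

Running `print("\n".join(third_party_candidates(".")))` from the repo root gives a starting list to check against PyPI names before writing requirements.txt.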
