dat_sf_10's People

Contributors

craigsakuma, ghego, kebaler, ostegm

dat_sf_10's Issues

Homework 2 review

Hey Otto,

Overall I think you did a good job with the homework.

I think you did a great job with getting rid of the NaN in one go with:

df = pd.read_csv('iris.csv', names = column_labels, nrows =150)

In case you don't have it in your toolkit, an alternative way to get rid of the NaN is to use df.dropna().

For the mapping, I saw that you first copied the Class column into a new Class_label column and then replaced each class name with a numerical value by using replace:

df['Class_label'] = df['Class']
df.Class_label.replace(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],[1,2,3],inplace=True)
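Here is a rough sketch of both steps above (dropping the NaN rows with dropna, then mapping the labels with replace) on a tiny made-up frame. The column name Sepal_length and the values are my own placeholders, not taken from your notebook, and I assign the result of replace instead of using inplace=True because I believe newer pandas versions warn about in-place changes made through a column.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for iris.csv; the last row is all NaN, like a stray trailing line.
df = pd.DataFrame({
    "Sepal_length": [5.1, 7.0, 6.3, np.nan],
    "Class": pd.Series(
        ["Iris-setosa", "Iris-versicolor", "Iris-virginica", np.nan],
        dtype="object",
    ),
})

# Drop every row that contains a NaN instead of limiting nrows when reading.
df = df.dropna()

# Build the numeric label column by assigning the result of replace.
df["Class_label"] = df["Class"].replace(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica"], [1, 2, 3]
)

print(df)
```

Note that dropna removes any row with a missing value, so it is worth checking the row count afterwards.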

An alternative way, which I tried, is to create a dictionary from class name to number and use the map function, since map works element-wise on a Series. In the lambda, x is set to each value in the column, which is a key in the dictionary, not a value. So I had:

data['Name_mapped'] = data['Name'].map(lambda x: name_dict[x])

I think one can also use the apply method on the Series (applymap is the DataFrame-wide version), and it should all give the same result.
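To check that these routes agree, here is a minimal sketch on made-up data. The contents of name_dict and the three sample rows are assumptions for illustration; only the column name Name and the lambda come from my code above.

```python
import pandas as pd

# Hypothetical mapping from class name to number.
name_dict = {"Iris-setosa": 1, "Iris-versicolor": 2, "Iris-virginica": 3}
data = pd.DataFrame({"Name": ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]})

# map with a lambda: x is each value in the column, used as a dictionary key.
via_lambda = data["Name"].map(lambda x: name_dict[x])

# map also accepts the dictionary directly.
via_dict = data["Name"].map(name_dict)

# apply on a Series is element-wise as well.
via_apply = data["Name"].apply(lambda x: name_dict[x])

print(list(via_lambda), list(via_dict), list(via_apply))
```

One difference to keep in mind: passing the dictionary directly turns names missing from the dictionary into NaN, while the lambda versions raise a KeyError.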

The KNN and KFold parts seem fine to me. The only thing I did differently was to convert the DataFrame into an array, for which I used the values attribute:

X = data[['col_1','col_2','col_3','col_4']].values
y = data['Name_mapped'].values
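A small sketch of what that conversion produces, using a made-up three-row frame with the same column names. The numbers are placeholders, and to_numpy() is shown only as an equivalent spelling, not something either of us used.

```python
import pandas as pd

# Made-up rows; only the column names match the snippet above.
data = pd.DataFrame({
    "col_1": [5.1, 7.0, 6.3],
    "col_2": [3.5, 3.2, 3.3],
    "col_3": [1.4, 4.7, 6.0],
    "col_4": [0.2, 1.4, 2.5],
    "Name_mapped": [1, 2, 3],
})

# .values returns the underlying NumPy array.
X = data[["col_1", "col_2", "col_3", "col_4"]].values
y = data["Name_mapped"].values

# to_numpy() gives the same array.
X_alt = data[["col_1", "col_2", "col_3", "col_4"]].to_numpy()

print(type(X).__name__, X.shape, y.shape)
```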

Also, you're not alone in getting the deprecation warning :). I got it too, and I'll try to get help with it before class today.

I learned some interesting approaches while reviewing your homework. Hope this helped. If you have any questions, feel free to email me.

-Priya

@ghego, @craigsakuma, @kebaler

HW6 Review

Hey Otto,

Good stuff. You definitely put a lot of work into the column conversion! It was interesting that the bank-additional version didn't improve your CV score much. An interesting piece of the dataset is the "duration" column, which the UCI dataset library notes tends to be overly predictive (a duration of 0 always means "no") and is only known after the call has occurred.
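If you want to see how much "duration" drives the score, one option is to drop it before fitting. A minimal sketch, assuming the target column is called y as in the UCI files; the three rows are made up:

```python
import pandas as pd

# Made-up rows; "age", "duration" and "y" follow the UCI column names.
bank = pd.DataFrame({
    "age": [30, 45, 52],
    "duration": [0, 210, 480],
    "y": ["no", "no", "yes"],
})

# Leave out duration (only known after the call) along with the target.
X = bank.drop(columns=["duration", "y"])

# Encode the target as 0/1.
y = (bank["y"] == "yes").astype(int)

print(list(X.columns), list(y))
```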

One small error: you accidentally used a Gaussian estimator (left over from the original learning curve code on the site), which explains why your learning curve CV score for the Random Forest is significantly lower than your original score.
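For reference, a sketch of passing the Random Forest itself to the learning curve, on synthetic data. The data, n_estimators=20 and the four train sizes are arbitrary choices of mine, not values from your notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the bank data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Use the same estimator you scored originally, not the example's default one.
estimator = RandomForestClassifier(n_estimators=20, random_state=0)

train_sizes, train_scores, cv_scores = learning_curve(
    estimator, X, y, cv=5, train_sizes=np.linspace(0.2, 1.0, 4)
)

# One row per train size, one column per CV fold.
print(train_sizes)
print(cv_scores.mean(axis=1))
```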

-Justin

Note on PCA

Hey Otto,

I checked out your HW3 since I was curious about your PCA implementation (and I think the student I was assigned dropped out of the course). Cool stuff, and I'd forgotten about that stats package. One thing that could improve your model would be normalizing the feature scales so that features with large ranges don't overpower the rest of your data. Great idea putting class mean times into the spots for the data. I treated them all equally, which was probably a bad idea.

For PCA specifically, you were really close, actually. You had it implemented, in a way, by doing the analysis and seeing how to handle the variance. Now here is another reason why scaling is important: PCA is very sensitive to scale, since it consolidates variance and you don't want variables with large scales to dominate the components (at least that's my understanding of it). The next step would be to choose a smaller number of components based on your scree plot and then use those components for your linear regression. Reducing the number of dimensions may also improve your model's accuracy (remember that the distance between points grows quickly with higher dimensionality).
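Roughly, the steps above could look like this in scikit-learn. This is only a sketch on random data: the feature scales, the target and the choice of 3 components are all made up, and in practice you would read the number of components off your own scree plot.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random features on very different scales, plus a made-up target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([1.0, 10.0, 100.0, 1000.0, 0.1])
y = X[:, 0] + 0.01 * X[:, 2] + rng.normal(scale=0.1, size=100)

# Scree-plot data: share of variance per component after standardizing.
scaled = StandardScaler().fit_transform(X)
ratios = PCA().fit(scaled).explained_variance_ratio_
print(ratios.round(3))

# Hypothetical choice; pick this from the scree plot.
n_components = 3

# Scale, reduce, then regress on the reduced components.
model = make_pipeline(
    StandardScaler(), PCA(n_components=n_components), LinearRegression()
)
model.fit(X, y)
print(model.score(X, y))
```

Putting the scaler and PCA in a pipeline means they are refit on each training fold if you cross-validate the whole thing.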

Best,
Justin

@ghego @craigsakuma @kebaler
