machine-learning-with-python's Introduction

Machine-Learning-with-Python

Python codes for common Machine Learning Algorithms

machine-learning-with-python's People

Contributors

susanli2016

machine-learning-with-python's Issues

Dataset not found

Hello Mrs. Susan, I hope you are doing very well. I have followed your project carefully and I am very interested in it.

I could not find the dataset "sample_data_search.tar.gz" for the "Trip Segmentation by User Search Behaviors.ipynb" notebook. I tried to contact you on several platforms but could not find your contact info. I need this dataset urgently, so I would appreciate it very much if you could provide it.

dimension mismatch

I am trying to predict the class of a string (chosen at random), but clf.predict always gives a dimension mismatch error. Here I am passing the very first row to check whether it is classified correctly, but it raises the mismatch error. I have done everything the same way as in the notebook.

s = []
s.append((df['final'][0]))
print(clf.predict(count_vect.transform(s)))

019_Polo_Towers.csv

Hello Susan,

Where can I find the data file "019_Polo_Towers.csv"? This data is needed for the "Analysis - Polo Towers OCC & ADR & Rental RevPar & Time Series" project.

Thanks.

Manny

TypeError: object of type 'numpy.int64' has no len()

I'm getting the error 'TypeError: object of type 'numpy.int64' has no len()' for the last section of code.

My data file doesn't have column headings, so I used 'header=None' when reading the csv file.

My data file also uses integers as the labels rather than text.

# Load data file from GCS
%gcs read --object "gs://projectname/data/data.csv" --variable csv_as_bytes
df = pd.read_csv(BytesIO(csv_as_bytes), header=None, encoding='latin-1')
df.head()
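One plausible source of this error, assuming the failing call is a text report such as scikit-learn's `classification_report` with `target_names` taken from an integer label column, is that the report measures each name with `len()`. The sketch below reproduces that failure with plain NumPy and shows a string conversion that avoids it. `labels` and `longest_name` are hypothetical names for illustration, not code from the notebook.

```python
import numpy as np

# Hypothetical integer class labels, standing in for a label column read with header=None.
labels = np.array([0, 1, 2, 1, 0], dtype=np.int64)


def longest_name(names):
    """Width of the longest display name, as a text report might compute it."""
    return max(len(name) for name in names)


unique_labels = np.unique(labels)

try:
    longest_name(unique_labels)  # len() is undefined for numpy.int64 scalars
except TypeError as err:
    print(err)

# Converting the labels to strings first gives len() something it can measure.
target_names = [str(label) for label in unique_labels]
print(longest_name(target_names))
```

If this is the cause, passing the string names (or converting the label column with `astype(str)` right after reading the file) should be enough; the model itself can keep using the integer labels.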

Memory error on Consumer_complaints.ipynb

MemoryError Traceback (most recent call last)
in ()
3 tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2), stop_words='english')
4
----> 5 features = tfidf.fit_transform(df.Consumer_complaint_narrative).toarray()
6 labels = df.category_id
7 features.shape

~\Anaconda3\lib\site-packages\scipy\sparse\compressed.py in toarray(self, order, out)
945 if out is None and order is None:
946 order = self._swap('cf')[0]
--> 947 out = self._process_toarray_args(order, out)
948 if not (out.flags.c_contiguous or out.flags.f_contiguous):
949 raise ValueError('Output array must be C or F contiguous')

~\Anaconda3\lib\site-packages\scipy\sparse\base.py in _process_toarray_args(self, order, out)
1182 return out
1183 else:
-> 1184 return np.zeros(self.shape, dtype=self.dtype, order=order)
1185
1186

MemoryError:
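The traceback ends in `np.zeros(self.shape, ...)`, so the failure happens when `.toarray()` tries to allocate a dense array for the whole TF-IDF matrix. A rough size estimate makes the problem visible. The shape used below is a hypothetical example, not the real shape of the complaints data, and 8 bytes per value assumes float64.

```python
def dense_size_gib(n_rows, n_cols, bytes_per_value=8):
    """Approximate memory needed for a dense n_rows x n_cols array, in GiB."""
    return n_rows * n_cols * bytes_per_value / 1024 ** 3


# Hypothetical shape: 100,000 documents and 250,000 unigram/bigram features.
print(round(dense_size_gib(100_000, 250_000), 1), "GiB")
```

If the estimate for the real shape exceeds the available RAM, one option is to drop `.toarray()` and keep the sparse matrix returned by `fit_transform`, since many scikit-learn estimators accept sparse input. Lowering the feature count (for example with `max_features` or a higher `min_df`) is another.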

Dataset

I was looking for the dataset used in Logistic Regression balanced
but was not able to find it.
Can you share it?

h2o import file error

I am not able to run this line...

higgs = h2o.import_file('higgs_boston_train.csv')

Getting this error:


H2OResponseError Traceback (most recent call last)
in
----> 1 higgs = h2o.import_file('higgs_boston_train.csv')

/opt/conda/lib/python3.7/site-packages/h2o/h2o.py in import_file(path, destination_frame, parse, header, sep, col_names, col_types, na_strings, pattern, skipped_columns, custom_non_data_line_markers)
434 else:
435 return H2OFrame()._import_parse(path, pattern, destination_frame, header, sep, col_names, col_types, na_strings,
--> 436 skipped_columns, custom_non_data_line_markers)
437
438

/opt/conda/lib/python3.7/site-packages/h2o/frame.py in _import_parse(self, path, pattern, destination_frame, header, separator, column_names, column_types, na_strings, skipped_columns, custom_non_data_line_markers)
334 if H2OFrame.LOCAL_EXPANSION_ON_SINGLE_IMPORT and is_type(path, str) and "://" not in path: # fixme: delete those 2 lines, cf. PUBDEV-5717
335 path = os.path.abspath(path)
--> 336 rawkey = h2o.lazy_import(path, pattern)
337 self._parse(rawkey, destination_frame, header, separator, column_names, column_types, na_strings,
338 skipped_columns, custom_non_data_line_markers)

/opt/conda/lib/python3.7/site-packages/h2o/h2o.py in lazy_import(path, pattern)
296 assert_is_type(pattern, str, None)
297 paths = [path] if is_type(path, str) else path
--> 298 return _import_multi(paths, pattern)
299
300

/opt/conda/lib/python3.7/site-packages/h2o/h2o.py in _import_multi(paths, pattern)
302 assert_is_type(paths, [str])
303 assert_is_type(pattern, str, None)
--> 304 j = api("POST /3/ImportFilesMulti", {"paths": paths, "pattern": pattern})
305 if j["fails"]: raise ValueError("ImportFiles of '" + ".".join(paths) + "' failed on " + str(j["fails"]))
306 return j["destination_frames"]

/opt/conda/lib/python3.7/site-packages/h2o/h2o.py in api(endpoint, data, json, filename, save_to)
102 # type checks are performed in H2OConnection class
103 _check_connection()
--> 104 return h2oconn.request(endpoint, data=data, json=json, filename=filename, save_to=save_to)
105
106

/opt/conda/lib/python3.7/site-packages/h2o/backend/connection.py in request(self, endpoint, data, json, filename, save_to)
405 auth=self._auth, verify=self._verify_ssl_cert, proxies=self._proxies)
406 self._log_end_transaction(start_time, resp)
--> 407 return self._process_response(resp, save_to)
408
409 except (requests.exceptions.ConnectionError, requests.exceptions.HTTPError) as e:

/opt/conda/lib/python3.7/site-packages/h2o/backend/connection.py in _process_response(response, save_to)
741 # Client errors (400 = "Bad Request", 404 = "Not Found", 412 = "Precondition Failed")
742 if status_code in {400, 404, 412} and isinstance(data, (H2OErrorV3, H2OModelBuilderErrorV3)):
--> 743 raise H2OResponseError(data)
744
745 # Server errors (notably 500 = "Server Error")

H2OResponseError: Server error water.exceptions.H2ONotFoundArgumentException:
Error: File /tmp/Machine-Learning-with-Python/higgs_boston_train.csv does not exist
Request: POST /3/ImportFilesMulti
data: {'paths': '[/tmp/Machine-Learning-with-Python/higgs_boston_train.csv]'}
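The last lines of the traceback show that the relative file name was expanded against the current working directory (`/tmp/Machine-Learning-with-Python/`) and that no such file exists there. A small check before the import can make this easier to diagnose. This is a minimal sketch; `resolve_data_file` and its `search_dirs` parameter are hypothetical helpers, not part of h2o.

```python
from pathlib import Path


def resolve_data_file(name, search_dirs=(".",)):
    """Return the absolute path of the first existing copy of `name`, or raise."""
    for directory in search_dirs:
        candidate = Path(directory).expanduser() / name
        if candidate.is_file():
            return str(candidate.resolve())
    searched = ", ".join(str(Path(d).expanduser().resolve()) for d in search_dirs)
    raise FileNotFoundError(f"{name!r} not found in: {searched}")


try:
    csv_path = resolve_data_file("higgs_boston_train.csv")
    print("Found:", csv_path)  # this path could then be passed to h2o.import_file
except FileNotFoundError as err:
    print(err)
```

If the file is reported missing, it has to be downloaded or copied into one of the searched directories first; the check only tells you where Python is actually looking.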

`AttributeError: 'DataFrame' object has no attribute 'ix'` in `Time Series Forecastings`

Hi,
I tried to run the Time Series Forecastings example. From

first_date = store.ix[np.min(list(np.where(store['office_sales'] > store['furniture_sales'])[0])), 'Order Date']

print("Office supplies first time produced higher sales than furniture is {}.".format(first_date.date()))

I got
AttributeError: 'DataFrame' object has no attribute 'ix'
If I replace ix with iloc, based on https://stackoverflow.com/questions/59991397/attributeerror-dataframe-object-has-no-attribute-ix, then I get
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

How can I fix it? Thanks
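For context, `.ix` was removed in pandas 1.0. The original call mixes an integer row position with a column label, which `.ix` tolerated but `.iloc` does not, hence the second error. One way around it is to select the column by label and then the row by position. The sketch below uses a small hypothetical `store` frame in place of the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's `store` frame.
store = pd.DataFrame({
    "Order Date": pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01"]),
    "furniture_sales": [500.0, 400.0, 450.0],
    "office_sales": [300.0, 420.0, 480.0],
})

# Integer positions of the rows where office supplies outsold furniture.
positions = np.where(store["office_sales"] > store["furniture_sales"])[0]
first_pos = int(positions.min())  # raises ValueError if there is no such row

# Select the column by label first, then the row by integer position.
first_date = store["Order Date"].iloc[first_pos]

print("Office supplies first time produced higher sales than furniture is {}.".format(first_date.date()))
```

An equivalent label-based form is `store.loc[store.index[first_pos], "Order Date"]`.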

Logistic regression

Hi, I have a problem with the logistic regression implementation.
When I run
logit_model=sm.Logit(y,X)
I get
ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data)
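This message usually means that `X` (or `y`) contains at least one non-numeric column, so NumPy falls back to the generic `object` dtype. Newer pandas versions also return boolean columns from `get_dummies`, and mixing those with integer columns can have the same effect. The sketch below shows how to find such columns and convert the frame to floats. The data and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical feature frame with one text column.
X = pd.DataFrame({
    "age": [25, 40, 31, 52],
    "job": ["admin", "technician", "admin", "services"],
})
print(np.asarray(X).dtype)  # generic dtype caused by the text column

# List the columns that are not numeric.
non_numeric = X.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# One-hot encode them, then cast everything to float so the array is purely numeric.
X_numeric = pd.get_dummies(X, columns=non_numeric, drop_first=True).astype(float)
print(np.asarray(X_numeric).dtype)
```

Under these assumptions, passing `X_numeric` instead of `X` to `sm.Logit` should avoid the cast error.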

What's the definition of Diabetes Pedigree Function?

There is a feature in Diabetes.csv named Diabetes Pedigree Function (pedi), and I found the following description of it:

A particularly interesting attribute used in the study was the Diabetes Pedigree Function, pedi. It provided some data on diabetes mellitus history in relatives and the genetic relationship of those relatives to the patient. This measure of genetic influence gave us an idea of the hereditary risk one might have with the onset of diabetes mellitus. Based on observations in the proceeding section, it is unclear how well this function predicts the onset of diabetes.

Can anybody tell me the definition of this pedi feature? I mean the mathematical definition.

Update README file

Hey! I love how simple and amazing your repository is. I noticed that it needs a README file, and I would love to contribute one. If you think it is a good idea, we can discuss it further.

Best regards,
Rafay

Recommendation system dataset

Hello Susan:
It was nice to follow your GitHub and get in touch with so many good Python applications and code. I am a Ph.D. student who is currently learning some basic Python examples. I found your 'recommendation system' topic quite interesting, and I am trying to learn about it. However, I did not find the BX-Books.csv file you used. Would you mind sharing it with me? My email address is: [email protected]
Thanks very much.

Getting Error in Forecasting Graph

Hi

When I try to run the code in the Validating Forecasts section, it gives me an error. I have attached the code and the error. Please have a look and suggest a solution.

Many Thanks

[Attached image: getpred]

Minor Typo

Hi Susan,

In the Time Series of Price Anomaly Detection Expedia.ipynb, there is a minor typo in the function markovAnomaly.

In lines 39 and 40 of the block, it should be:
if (j < windows_size): df_anomaly.append(0)

Overall I find the notebook really helpful. Thanks.

Chase

Error in 'topic_modeling_Gensim.ipynb'

Hi,

I have tried to run 'topic_modeling_Gensim.ipynb' and I get this error at this stage in the notebook. Can anyone help?

import random
text_data = []
with open('dataset.csv') as f:
    for line in f:
        tokens = prepare_text_for_lda(line)
        if random.random() > .99:
            print(tokens)
            text_data.append(tokens)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-54-7369a1356984> in <module>()
      3 with open('dataset.csv') as f:
      4     for line in f:
----> 5         tokens = prepare_text_for_lda(line)
      6         if random.random() > .99:
      7             print(tokens)

<ipython-input-51-4f0710beb9ee> in prepare_text_for_lda(text)
      1 def prepare_text_for_lda(text):
----> 2     tokens = tokenize(text)
      3     tokens = [token for token in tokens if len(token) > 4]
      4     tokens = [token for token in tokens if token not in en_stop]
      5     tokens = [get_lemma(token) for token in tokens]

<ipython-input-45-f5c7dc83eb04> in tokenize(text)
      3 def tokenize(text):
      4     lda_tokens = []
----> 5     tokens = parser(text)
      6     for token in tokens:
      7         if token.orth_.isspace():

NameError: name 'parser' is not defined

Expected 2D array, got 1D array instead

Hi,

I'm getting the error below when I run the following cell.
What should I do?

scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train)
test_sc = scaler.transform(test)

Expected 2D array, got 1D array instead: array=[17.24 18.190001 19.219999 ... 10.47 10.18 11.04 ]. Reshape your data either using array.reshape(-1, 1)
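The error message itself points at the fix: scalers such as `MinMaxScaler` expect a 2D array of shape (n_samples, n_features), while a single column of values is 1D. The sketch below shows the reshape with plain NumPy, using a few of the values from the message as hypothetical data.

```python
import numpy as np

# A 1D array like the one in the error message (values shortened for illustration).
train = np.array([17.24, 18.190001, 19.219999, 10.47])

# -1 lets NumPy infer the number of rows; 1 means a single feature column.
train_2d = train.reshape(-1, 1)

print(train.shape, train_2d.shape)
```

If `train` is a pandas Series rather than an array, `train.values.reshape(-1, 1)` or `train.to_frame()` gives the same 2D layout, and the same conversion applies to `test`.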

DeprecationWarning, DataConversionWarning, NameErrors, FutureWarnings

Hej Susan,

I am trying to retrace your steps on this logistic regression. I started with your article “Building A Logistic Regression in Python, Step by Step” on DataScience+ and I am now working through the (latest commit of your) Jupyter notebook used to make that post.

I have tried to reproduce your results with a clone of your notebook.

I have replaced from sklearn.cross_validation import train_test_split with from sklearn.model_selection import train_test_split because of a DeprecationWarning in cell 1.

Cell 24 raises a DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)

In cell 32 there's a NameError: name 'classifier' is not defined. You have used logreg.score in cell 30. Replacing classifier with logreg in cell 32 works but obviously produces exactly the same results as in cell 30. I am not sure what you are trying to do here (using a different classifier?), or if this is just an accidental duplicate (after changing classifier to a more specific logreg).

In the last section titled “ROC Curvefrom sklearn import metrics” it looks to me like you (accidentally) converted some Python code to MarkDown. This code (cell 34) produces two FutureWarnings: pandas.tslib is deprecated and will be removed in a future version., one NameError: name 'clf1' is not defined and another NameError: name 'Y_test' is not defined.

Kind regards
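For the DataConversionWarning in cell 24, the warning text names the remedy: flatten the column vector with `ravel()` before fitting. A minimal sketch with hypothetical labels:

```python
import numpy as np

# Hypothetical target stored as a column vector, shape (n_samples, 1).
y_column = np.array([[0], [1], [1], [0]])

# ravel() flattens it to shape (n_samples,), which is what the estimator expects.
y_flat = y_column.ravel()

print(y_column.shape, y_flat.shape)
```

When the target is a one-column DataFrame, `y_train.values.ravel()` does the same thing.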

date of date_account_created is greater than timestamp_first_active in Airbnb New User Bookings.ipynb

The date in date_account_created is later than timestamp_first_active.
You can check this at Out[28]:
https://github.com/susanli2016/Machine-Learning-with-Python/blob/master/Airbnb%20New%20User%20Bookings.ipynb

affiliate_channel | affiliate_provider | age | country_destination | date_account_created | first_affiliate_tracked | first_browser | first_device_type | gender | id | language | signup_app | signup_flow | signup_method | timestamp_first_active | date_account_created_day | date_account_created_month | date_account_created_year
direct | direct | 56.0 | US | 2010-09-28 | untracked | IE | Windows Desktop | FEMALE | 4ft3gnwmtx | en | Web | 3 | basic | 2009-06-09 | Tuesday | 9 | 2010
direct | direct | 42.0 | other | 2011-12-05 | untracked | Firefox | Mac Desktop | FEMALE | bjjt8pjhuk | en | Web | 0 | facebook | 2009-10-31 | Monday | 12 | 2011
direct | direct | 41.0 | US | 2010-09-14 | untracked | Chrome | Mac Desktop | M | 87mebub9p4 | en | Web | 0 | basic | 2009-12-08 | Tuesday | 9 | 2010
other | other | NaN | US | 2010-01-01 | omg | Chrome | Mac Desktop | M | osr2jwljor | en | Web | 0 | basic | 2010-01-01 | Friday | 1 | 2010
other | craigslist | 46.0 | US | 2010-01-02 | untracked | Safari | Mac Desktop | FEMALE | lsw9q7uk0j | en | Web | 0 | basic | 2010-01-02 | Saturday | |

ValueError: Length of endogenous variable must be larger the the number of lags used in the model and the number of observations burned in the log-likelihood calculation

Hi,

I tried to run Time Series Forecastings.ipynb both in Jupyter and as a Python script. In Jupyter it seems fine. If I run it as a Python file (pasting the sections one by one and running them as a whole), then in

results.plot_diagnostics(figsize=(16, 8))
plt.show()

I got

Traceback (most recent call last):
  File "time-series.py", line 71, in <module>
    results.plot_diagnostics(figsize=(16, 8))
  File "/home/user/anaconda3/lib/python3.8/site-packages/statsmodels/tsa/statespace/mlemodel.py", line 4284, in plot_diagnostics
    raise ValueError(
ValueError: Length of endogenous variable must be larger the the number of lags used in the model and the number of observations burned in the log-likelihood calculation.

May I know the reason for it? Thanks

[bpr_OnlineRetail_Implicit.ipynb]: operands could not be broadcast together with shapes (3664,) (4338,)


ValueError Traceback (most recent call last)
Input In [9], in
28 # Create recommendations for customer with id 2
29 customer_id = 2
---> 30 recommendations = recommend(customer_id, sparse_customer_item, customer_vecs, item_vecs)
32 print(recommendations)

Input In [9], in recommend(customer_id, sparse_customer_item, customer_vecs, item_vecs, num_items)
9 min_max = MinMaxScaler()
10 rec_vector_scaled = min_max.fit_transform(rec_vector.reshape(-1,1))[:,0]
---> 11 recommend_vector = customer_interactions * rec_vector_scaled
13 item_idx = np.argsort(recommend_vector)[::-1][:num_items]
15 descriptions = []

ValueError: operands could not be broadcast together with shapes (3664,) (4338,)
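The two shapes in the message differ, so `customer_interactions` and `rec_vector_scaled` are not indexed over the same set of items. One possible cause, which cannot be confirmed from the traceback alone, is that the two factor matrices are swapped, for example because the model was fit on a matrix with the opposite orientation to what the installed library version expects. The sketch below adds an explicit shape check before the multiplication. `score_items` is a hypothetical helper and the sizes are small stand-ins for the real data.

```python
import numpy as np


def score_items(customer_id, customer_vecs, item_vecs, n_items):
    """Dot one customer's factors with every item's factors, checking shapes first."""
    if item_vecs.shape[0] != n_items:
        raise ValueError(
            f"item_vecs has {item_vecs.shape[0]} rows but the interaction matrix "
            f"has {n_items} item columns; the factor matrices may be swapped"
        )
    return customer_vecs[customer_id].dot(item_vecs.T)


# Small hypothetical sizes standing in for the real data.
n_customers, n_items, n_factors = 5, 3, 2
rng = np.random.default_rng(0)
customer_vecs = rng.normal(size=(n_customers, n_factors))
item_vecs = rng.normal(size=(n_items, n_factors))
customer_interactions = np.array([1.0, 0.0, 1.0])  # one entry per item

scores = score_items(2, customer_vecs, item_vecs, n_items)
recommend_vector = customer_interactions * scores  # both have one entry per item

try:
    # Passing the matrices in the wrong order reproduces the kind of mismatch in the issue.
    score_items(2, item_vecs, customer_vecs, n_items)
except ValueError as err:
    print(err)
```

Printing `customer_vecs.shape`, `item_vecs.shape`, and `sparse_customer_item.shape` in the notebook should show quickly whether this is what happened.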
