
machine_learning_examples's People

Contributors

bob7783, lazyprogrammer, nehhen, superman32432432


machine_learning_examples's Issues

TypeError: Bad input argument to theano function : autoencoder.py

I have tried your code, but there seems to be a problem with it.
I installed Theano 0.9 and ran the code. "Training for autoencoder:1" is displayed, and then the code crashes with the following error:

TypeError: Bad input argument to theano function with name "../autoencoder.py:43" at index 0 (0-based).
Backtrace when that variable is created:

X_in = T.matrix('X_%s' % self.id)
TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function". Value: "array([[ 0.51405417, 0.54865338, 0.54488875, ..., 0.46252157,
0.44946414, 0.5688999 ],
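
For reference, the two fixes named in the error message itself, as a minimal sketch (the array X is a hypothetical stand-in for whatever the script's data loader returns):

import numpy as np

X = np.random.random((100, 784))   # hypothetical float64 data, as NumPy returns by default
X = X.astype(np.float32)           # fix 1: explicit cast to match TensorType(float32, matrix)

# fix 2: alternatively, compile with automatic downcasting, e.g.
# theano.function(inputs=[X_in], outputs=cost, allow_input_downcast=True)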

warning with saving the model

It appears that the accuracy of the model is good, but after saving the model and loading it again, it gives unreasonable results.
The model is not saved correctly.

Why do we need a manual for loop when implementing Attention?

Please bear with me here.

This might be confusing, so I'm adding pseudocode to illustrate what's unclear to me. I've been following a tutorial, and it was mentioned that we need to define a for loop over the target sequence when doing machine translation with an attention mechanism and LSTMs.

I've written it to look roughly like Keras. This is the pseudocode:


h = encoder(input)  # encoder outputs h(1), ..., h(Tx), used for attention weights and context vectors
decoder_hidden_state = 0  # hidden state
decoder_cell_state = 0    # cell state
outputs = []

for t in range(Ty):  # Ty is the length of the target sequence
    context = calc_attention(decoder_hidden_state, h)  # uses decoder_hidden_state(t-1) and h(1), ..., h(Tx)
    decoder_output, decoder_hidden_state, decoder_cell_state = decoder_lstm(
        context, initial_state=[decoder_hidden_state, decoder_cell_state])
    probabilities = dense_layer(decoder_output)
    outputs.append(probabilities)

model = Model(input, outputs)

What is unclear to me is why we are using a for loop. It was said that "in a regular seq2seq, we pass in the entire target input sequence all at once because the output can be calculated all at once, but here we need a loop over Ty steps, since each context depends on the previous state."

But I think the same can be done in the attention case if I just remove the for loop.

Like the code below, which is the decoder part of a normal seq2seq:

decoder_inputs_placeholder = Input(shape=(max_len_target,))
decoder_embedding = Embedding(num_words_output, EMBEDDING_DIM)
decoder_inputs_x = decoder_embedding(decoder_inputs_placeholder)
decoder_lstm = LSTM(
   LATENT_DIM,
   return_sequences=True,
   return_state=True,
)

If I want to add attention, can't I just define the states here and call the calc_attention function, which would return the context for a particular timestep while decoding and could be passed into the LSTM call just as in the pseudocode above?


decoder_outputs, decoder_hidden_state, decoder_cell_state = decoder_lstm(
   decoder_inputs_x,
   initial_state=[decoder_hidden_state, decoder_cell_state]
)
decoder_outputs = decoder_dense(decoder_outputs)
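
A pure-NumPy toy (not the course's code) of why the loop cannot be removed: the attention weights at step t are computed from the decoder state left behind by step t-1, so each iteration has a data dependence on the previous one. A plain seq2seq decoder has no such per-step dependence on anything outside the LSTM itself, which is why Keras can unroll it internally in a single layer call.

import numpy as np

Tx, Ty, d = 5, 4, 8
h = np.random.randn(Tx, d)   # encoder outputs: all available up front
s = np.zeros(d)              # decoder hidden state

for t in range(Ty):
    scores = h @ s                                  # attention energies depend on s(t-1)
    alphas = np.exp(scores) / np.exp(scores).sum()  # softmax over the Tx encoder steps
    context = alphas @ h                            # context vector for step t
    s = np.tanh(context + s)                        # stand-in for one decoder-LSTM step
# Each iteration needs the s produced by the previous one, so the Ty steps
# cannot be collapsed into one layer call the way a plain seq2seq decoder can.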

RuntimeWarning: invalid value encountered in less

I'm training the attention model on different data. I've encountered a strange error after several epochs of running:

Using TensorFlow backend.
num samples: 29024
input seq: 29024
Found 5000 unique input tokens.
target seq: 29024 | inp: 29024
Found 5000 unique output tokens.
encoder_data.shape: (29024, 11)
encoder_data[0]: [ 0  0  0  0  0  0  0  0  0  0 43]
decoder_data[0]: [  3 266   1   0   0   0   0   0   0   0   0   0   0   0]
decoder_data.shape: (29024, 14)
Loading word vectors...
Found 400000 word vectors.
Filling pre-trained embeddings...
OUTPUT size: (29024, 14, 5001)
2020-03-16 09:37:03.339535: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
C:\Users\cp\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Train on 23219 samples, validate on 5805 samples
Epoch 1/50
23219/23219 [==============================] - 1056s 45ms/step - loss: 1.6740 - acc: 0.4022 - val_loss: 2.0796 - val_acc: 0.3890

Epoch 00001: val_loss improved from inf to 2.07957, saving model to ./large_files/weights/engpol-30k-epoch.01-loss.2.08.hdf5
Epoch 2/50
23219/23219 [==============================] - 1019s 44ms/step - loss: 1.2243 - acc: 0.5152 - val_loss: 1.8456 - val_acc: 0.4375

Epoch 00002: val_loss improved from 2.07957 to 1.84557, saving model to ./large_files/weights/engpol-30k-epoch.02-loss.1.85.hdf5
Epoch 3/50
23219/23219 [==============================] - 1051s 45ms/step - loss: 0.9595 - acc: 0.5739 - val_loss: 1.7147 - val_acc: 0.4640

Epoch 00003: val_loss improved from 1.84557 to 1.71466, saving model to ./large_files/weights/engpol-30k-epoch.03-loss.1.71.hdf5
Epoch 4/50
23219/23219 [==============================] - 1099s 47ms/step - loss: 0.7664 - acc: 0.6238 - val_loss: 1.6391 - val_acc: 0.4823

Epoch 00004: val_loss improved from 1.71466 to 1.63908, saving model to ./large_files/weights/engpol-30k-epoch.04-loss.1.64.hdf5
Epoch 5/50
23219/23219 [==============================] - 1021s 44ms/step - loss: 0.6217 - acc: 0.6725 - val_loss: 1.6114 - val_acc: 0.4919

Epoch 00005: val_loss improved from 1.63908 to 1.61137, saving model to ./large_files/weights/engpol-30k-epoch.05-loss.1.61.hdf5
Epoch 6/50
23219/23219 [==============================] - 1021s 44ms/step - loss: 0.5111 - acc: 0.7154 - val_loss: 1.6024 - val_acc: 0.5002

Epoch 00006: val_loss improved from 1.61137 to 1.60242, saving model to ./large_files/weights/engpol-30k-epoch.06-loss.1.60.hdf5
Epoch 7/50
23219/23219 [==============================] - 1034s 45ms/step - loss: nan - acc: 0.4895 - val_loss: nan - val_acc: 0.0000e+00
C:\Users\cp\Anaconda3\lib\site-packages\keras\callbacks\callbacks.py:709: RuntimeWarning: invalid value encountered in less
  if self.monitor_op(current, self.best):

Epoch 00007: val_loss did not improve from 1.60242
Epoch 8/50
 9796/23219 [===========>..................] - ETA: 11:31 - loss: nan - acc: 0.0000e+00
Traceback (most recent call last)

I've checked the input data and it seems OK, no missing values. What could cause this issue during training? How can I monitor what went wrong? Is it possible that some value goes to infinity (in the current matrix data format)? The problem apparently always occurs when the validation loss is close to converging.
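
Not a diagnosis of this particular run, but two standard Keras guards worth adding while hunting NaNs: stop training the moment the loss turns NaN (so the offending batch is easier to locate), and clip gradient norms, since exploding RNN gradients are a common cause of a sudden NaN after several good epochs. A sketch; the optimizer settings and loss are placeholders for whatever the script actually uses:

from keras.callbacks import TerminateOnNaN
from keras.optimizers import Adam

nan_guard = TerminateOnNaN()        # aborts fit() on the first NaN batch loss
opt = Adam(lr=0.001, clipnorm=1.0)  # clip gradient norms to tame exploding gradients

# model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['acc'])
# model.fit(..., callbacks=[nan_guard])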

Error in linear_rl_trader.py

There is an error in the code. Kindly check it and resolve it as soon as possible.
My macOS terminal says:
line 349
with open(f'{models_folder}/scaler.pkl', 'rb') as f:
^
SyntaxError: invalid syntax
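
This is most likely not a bug in the file: f-strings like f'{models_folder}/scaler.pkl' are a Python 3.6+ feature, and older interpreters report exactly this SyntaxError. A sketch of a version check plus a rewrite of the failing line that works on older Pythons (models_folder value is a hypothetical stand-in for the script's own):

import sys
print(sys.version)   # confirm which Python the terminal is actually running

import pickle
models_folder = 'linear_rl_trader_models'   # hypothetical; use the script's value

with open('{}/scaler.pkl'.format(models_folder), 'rb') as f:
    scaler = pickle.load(f)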

Required LLM section

Hey! Your repository is well maintained. I noticed that there is no section for LLMs, and I would love to contribute one. If you think it's a good idea, we can discuss what to include and what not.

Best regards,
Rafay

python spam2.py error

$ python spam2.py
...
FileNotFoundError: [Errno 2] No such file or directory: '../large_files/spam.csv'

ValueError: Unknown activation function:softmax_over_time

I try to run attention.py, and it seems it doesn't fit into memory for me, so I tried some tricks. One is to clean up and read the arrays from disk only when needed by model.fit, but at the line

model = load_model("large_files/model.h5")

I get the following error:

Traceback (most recent call last):
  File "AT_translate.py", line 32, in <module>
    model = load_model("large_files/"+MODE+"-model.h5")
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\engine\saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\engine\saving.py", line 584, in load_model
    model = _deserialize_model(h5dict, custom_objects, compile)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\engine\saving.py", line 274, in _deserialize_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\engine\saving.py", line 627, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\layers\__init__.py", line 168, in deserialize
    printable_module_name='layer')
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 147, in deserialize_keras_object
    list(custom_objects.items())))
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\engine\network.py", line 1056, in from_config
    process_layer(layer_data)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\engine\network.py", line 1042, in process_layer
    custom_objects=custom_objects)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\layers\__init__.py", line 168, in deserialize
    printable_module_name='layer')
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 149, in deserialize_keras_object
    return cls.from_config(config['config'])
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 1179, in from_config
    return cls(**config)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\layers\core.py", line 875, in __init__
    self.activation = activations.get(activation)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\activations.py", line 227, in get
    return deserialize(identifier)
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\activations.py", line 208, in deserialize
    printable_module_name='activation function')
  File "C:\Users\cp\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 167, in deserialize_keras_object
    ':' + function_name)
ValueError: Unknown activation function:softmax_over_time

Please give me a hint on how I can avoid this.
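
For what it's worth, Keras cannot deserialize a custom activation by name alone; load_model takes a custom_objects mapping for exactly this case. A sketch, assuming softmax_over_time is (or matches) the function defined in attention.py:

import keras.backend as K
from keras.models import load_model

def softmax_over_time(x):
    # softmax over the time axis (axis=1) rather than the last axis
    assert K.ndim(x) > 2
    e = K.exp(x - K.max(x, axis=1, keepdims=True))
    s = K.sum(e, axis=1, keepdims=True)
    return e / s

model = load_model("large_files/model.h5",
                   custom_objects={'softmax_over_time': softmax_over_time})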

if condition in cnn_toxic.py

Hi,

In cnn_toxic.py, shouldn't the condition be "if i < num_words" instead of "if i < MAX_VOCAB_SIZE"?

# prepare embedding matrix
print('Filling pre-trained embeddings...')
num_words = min(MAX_VOCAB_SIZE, len(word2idx) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word2idx.items():
  if i < MAX_VOCAB_SIZE:
    embedding_vector = word2vec.get(word)
    if embedding_vector is not None:
      # words not found in embedding index will be all zeros.
      embedding_matrix[i] = embedding_vector

Regards,

Error: 'the path of my train.npy file' is not UTF-8 format.

I am trying to run your linear_rl_trader.py file, but there is a problem with the train.npy file: the script runs, but the result cannot be saved.

Error: 'the path of my train.npy file' is not UTF-8 format

Do you know how I can solve this problem? I added encoding='utf-8' to the following call, but it still didn't solve it.

np.save(f'{rewards_folder}/{args.mode}.npy', portfolio_value)
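
Note that np.save does not accept an encoding argument, so that change is a no-op. Two things worth ruling out (assumptions, not confirmed fixes): non-ASCII characters in the path, and the target folder not existing, since np.save will not create it:

import os
import numpy as np

rewards_folder = 'linear_rl_trader_rewards'   # hypothetical stand-ins for the
portfolio_value = [100.0, 101.5]              # script's actual variables

os.makedirs(rewards_folder, exist_ok=True)    # ensure the folder exists first
np.save('{}/train.npy'.format(rewards_folder), portfolio_value)  # ASCII-only path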

ImportError: sys.meta_path is None, Python is likely shutting down (python save_a_video.p)

python save_a_video.p
....

avg length: 37.99
Final run with final weights: 200
Exception ignored in: <bound method Viewer.__del__ of <gym.envs.classic_control.rendering.Viewer object at 0x0000022967EDBAC8>>
Traceback (most recent call last):
File "C:\Users\AUCAR\Anaconda3\lib\site-packages\gym\envs\classic_control\rendering.py", line 143, in __del__
File "C:\Users\AUCAR\Anaconda3\lib\site-packages\gym\envs\classic_control\rendering.py", line 62, in close
File "C:\Users\AUCAR\Anaconda3\lib\site-packages\pyglet\window\win32\__init__.py", line 304, in close
File "C:\Users\AUCAR\Anaconda3\lib\site-packages\pyglet\window\__init__.py", line 772, in close
ImportError: sys.meta_path is None, Python is likely shutting down

https://www.youtube.com/watch?v=FbPDXmMj5VI

I applied all the steps, and everything you showed ran. I only received this error at the end, with python save_a_video.p in the cartpole folder; the video was still generated and saved to a new folder.

How to install Tensorflow, PyTorch, Keras, Theano, CNTK and more on Windows

Tensorflow 2 version

Hello team, can you please release a TensorFlow 2 version of the reinforcement learning code? It would be very helpful for debugging in TensorFlow 2.

Deliver requirements files

Please deliver a requirements file with exact version information for your project (or one file per sub-project if necessary); this would really help prevent errors arising from mismatched module versions.
Problems often come from a module version that is slightly different (older/newer) than the intended one.

What Is the Point of Using "n" in UCB Formula?

Hello,

In "ucb1.py" in the "rl" folder for solving the bandits problem, what is the point of using "n=total times of playing" in the UCB formula which is: mean + np.sqrt(2*np.log(n) / nj) ?
I tested the two following formulas (without "n" ) instead and they worked totally fine:
mean + np.sqrt(2 / nj)
and even
mean + (1 / nj)
I also tested them with different total number of plays, but the final results of the agents were so similar.
I would be grateful if you elaborate on the usage of n in the formula.

Best,
Parnia
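
A toy experiment (not the course file) that makes the difference visible: with the log(n) bonus, an arm's bonus keeps growing while it sits unplayed, so every arm is revisited infinitely often, which is what UCB1's logarithmic-regret guarantee relies on; a constant bonus like sqrt(2/nj) stops forcing exploration once nj is large, so it can lock onto an arm whose early sample mean was lucky. On easy bandits the two often behave identically, which matches the observation above.

import numpy as np

np.random.seed(1)
true_means = np.array([0.2, 0.5, 0.75])
K, N = 3, 100000
counts = np.ones(K)                                   # play each arm once first
means = (np.random.random(K) < true_means).astype(float)

for n in range(K, N):
    ucb = means + np.sqrt(2 * np.log(n) / counts)     # log(n) keeps the bonus of
    j = np.argmax(ucb)                                # idle arms growing over time
    x = float(np.random.random() < true_means[j])
    means[j] = (means[j] * counts[j] + x) / (counts[j] + 1)
    counts[j] += 1

print("plays per arm:", counts)                       # most plays go to the best arm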

Warning message

/Users/sarit/.pyenv/versions/machine/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d

This happens in bow_classifier.py.

problem with moore

data = pd.read_csv('moore.csv', header=None)

This code isn't working for me; "ParserError: Error tokenizing data. C error: Expected 1 fields in line 41, saw 16" keeps showing up.
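
For what it's worth, that ParserError usually means the delimiter is wrong, so pandas sees a different field count per line. A sketch assuming moore.csv is tab-separated (check the raw file to confirm):

import pandas as pd

data = pd.read_csv('moore.csv', sep='\t', header=None)
print(data.head())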

unsupervised_class3/bayes_classifier_gmm.py

python3 bayes_classifier_gmm.py
Reading in and transforming data...
Fitting gmm 0
Traceback (most recent call last):
  File "bayes_classifier_gmm.py", line 52, in <module>
    clf.fit(X, Y)
  File "bayes_classifier_gmm.py", line 23, in fit
    self.p_y[k] = len(Xk)

An easy fix:

*** bayes_classifier_gmm.py	Thu Aug  3 12:15:56 2017
--- bayes_classifier_gmm.py~	Tue Aug  1 18:33:18 2017
***************
*** 20,28 ****
      self.p_y = np.zeros(self.K)
      for k in range(self.K):
        print("Fitting gmm", k)
-       Xk = X[Y == k]
        self.p_y[k] = len(Xk)
!
        gmm = BayesianGaussianMixture(10)
        gmm.fit(Xk)
        self.gaussians.append(gmm)
--- 20,27 ----
      self.p_y = np.zeros(self.K)
      for k in range(self.K):
        print("Fitting gmm", k)
        self.p_y[k] = len(Xk)
!       Xk = X[Y == k]
        gmm = BayesianGaussianMixture(10)
        gmm.fit(Xk)
        self.gaussians.append(gmm)

Simpler more general build_state

If you want to try more than 10 bins, you can use the idea from one of your other examples: convert base NUM_BINS to base 10.

NUM_BINS = 20

def build_state(features):
    val = NUM_BINS**3 * features[3] + NUM_BINS**2 * features[2] + NUM_BINS * features[1] + features[0]
    return val
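
The same idea written as a loop, so it handles any number of features; a minimal sketch:

NUM_BINS = 20

def build_state(features):
    # treat each binned feature as one digit of a base-NUM_BINS number
    val = 0
    for f in reversed(features):
        val = val * NUM_BINS + f
    return val

print(build_state([1, 2, 3, 4]))  # 4*20**3 + 3*20**2 + 2*20 + 1 = 33241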

Getting error while running theano2.py

Hi,

I am getting the following error while running theano2.py, in the function get_normalized_data:

FileNotFoundError: File b'../large_files/train.csv' does not exist

I cannot see the folder named large_files in the repository.

Please help!

How to determine num_words variable when creating embedding matrix?

I have been following the poetry generation notebook, up to the point where we create an embedding matrix with the following code:

# prepare embedding matrix
num_words = min(MAX_VOCAB_SIZE, len(word2idx) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word2idx.items():
    if i < MAX_VOCAB_SIZE:
        embedding_vector = word2vec.get(word)
        if embedding_vector is not None:
            # words not found in embedding index will be all zeros
            embedding_matrix[i] = embedding_vector

In the tutorial, MAX_VOCAB_SIZE = 3000 and len(word2idx) + 1 = 4615. Shouldn't the second line be num_words = max(MAX_VOCAB_SIZE, len(word2idx) + 1)? If we take the min, we're essentially dropping some words from the embedding matrix, when we should be considering each and every word in word2idx. What really is the point of num_words here if we can just create the embedding matrix of size (len(word2idx) + 1) x EMBEDDING_DIM?

Some removals

regex not needed for defining x

x = int(r[2].split('[')[0])
y = int(non_decimal.sub('', r[1].split('[')[0]))

Add requirements

Hello,

It would be great if you could add the project requirements (requirements.txt, or some other form). Installing the required libraries is not enough, since we end up without the correct library versions for running the code.

Thanks

getting recommendation output

Hey there,
Thanks for your unique recommendation system with deep learning! I want to know: is there a way to get a recommended movie for a user, or a similar movie, from this code? In this course I learned how to gradually improve a movie recommendation system with various AI approaches, but I don't understand how to actually recommend a movie. I'd appreciate it if you could clear up this confusion for me!

Update after terminal state

I think there's a little bug in many of your scripts: you update the returns for the last step with a post-terminal step, so your value (and policy) functions wind up growing (unbounded?) near the terminal state. For example, in rl2/mountaincar you have a "train" boolean, but it is never set to false for the last step.

error

I have installed the rnn and tensorflow packages, but the following error shows up:

from rnn_class.brown import get_sentences_with_word2idx_limit_vocab, get_sentences_with_word2idx
ModuleNotFoundError: No module named 'rnn_class'
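
That import only resolves when Python can see the repo root (the folder containing rnn_class) on its path, so run the script from the repo root, or add the root to the path first. A sketch, assuming the repo was cloned to ~/machine_learning_examples:

import os, sys

# make the repo root importable, then the script can run from anywhere
sys.path.append(os.path.expanduser('~/machine_learning_examples'))

from rnn_class.brown import get_sentences_with_word2idx_limit_vocab, get_sentences_with_word2idx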

ann_class2/tensorflow1.py - migrate from tensorflow version 1 to version 2

I'm following the how-to-install guide of the data-science-linear-regression-in-python course.

The code for verifying that tensorflow is working correctly didn't work for me, so I investigated how to fix it. Here is the error I was experiencing:

PROMPT> python3 tensorflow1.py 
Traceback (most recent call last):
  File "tensorflow1.py", line 20, in <module>
    A = tf.placeholder(tf.float32, shape=(5, 5), name='A')
AttributeError: module 'tensorflow' has no attribute 'placeholder'

On TF's website there is a migration guide.

Change from

import tensorflow as tf

Change to

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

This solved it for me.
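
A self-contained version of the verification snippet under the compat shim; the shapes and matmul are my guess at what the guide's check does, not a copy of it:

import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

A = tf.placeholder(tf.float32, shape=(5, 5), name='A')  # the line from the traceback
v = tf.placeholder(tf.float32)
w = tf.matmul(A, v)

with tf.Session() as session:
    output = session.run(w, feed_dict={A: np.random.randn(5, 5),
                                       v: np.random.randn(5, 1)})
print(output)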

Udemy Course

PyTorch: Deep Learning and Artificial Intelligence
I just want to say that your course "PyTorch: Deep Learning and Artificial Intelligence" is very expensive on Udemy.
I like your lectures and learned a lot from your previous courses. Thank you!!

Simple Question

Hello.

I bought "Convolutional Neural Networks in Python".
And, cnn_tf.py (Chapter 5:Sample Code in Tensorflow) is worked.

Now, I have simple question. How shold I evaluate it ???
Please give me some advice if you do not mind.

thanks!!

Low GPU utilization

Hi, I use the DQN code. I can run it successfully, but I find the GPU utilization is really low, about 8%. I use a 2080 Ti, and I select the GPU by adding this code at the beginning of the DQN file:

import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

How can I improve the GPU utilization? Thanks!!
