edwardraff / inside-deep-learning
Inside Deep Learning: The math, the algorithms, the models
In Chapter 3.2.1, there is an implementation of sliding the filter over the input:
filter = [1, 0, -1]
input = [1, 0, 2, -1, 1, 2]
output = []
for i in range(len(input) - len(filter)):
    result = 0
    for j in range(len(filter)):
        result += input[i+j] * filter[j]
    output.append(result)
The outer loop does not reach the last possible slide, so it should be:
filter = [1, 0, -1]
input = [1, 0, 2, -1, 1, 2]
output = []
for i in range(len(input) - len(filter) + 1):
    result = 0
    for j in range(len(filter)):
        result += input[i+j] * filter[j]
    output.append(result)
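For what it's worth, the fixed loop can be sanity-checked against PyTorch's F.conv1d, which (like the loop above) computes a cross-correlation; a quick sketch:

import torch
import torch.nn.functional as F

x = torch.tensor([1., 0., 2., -1., 1., 2.]).reshape(1, 1, -1)  #(B=1, C=1, T=6)
w = torch.tensor([1., 0., -1.]).reshape(1, 1, -1)              #(out=1, in=1, K=3)
print(F.conv1d(x, w))  #tensor([[[-1., 1., 1., -3.]]]) -- 4 outputs, matching the fixed loop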
PS: @EdwardRaff your book is absolutely brilliant!
In Figure 3.11, which shows convolution in Chapter 3.4.2, the output should have the same dimensions as the original image.
The original image is 7x7 (without padding), yet the output of the convolution is drawn as 7x6.
Figure 6.2: the 4th "high complexity" curve is not a function, because the figure shows a single x value mapping to multiple distinct y values.
Section 6.3.2: in the algorithm annotations on page 231, "cat=3" should be "dim=3"
Section 6.6.2: "anything time 0" should be "anything times 0"
Section 6.6.2: "forget fate" -> "forget gate" in the second to last paragraph in that section.
Section 9.0: "GANS" -> "GANs" right before the start of section 9.1.
Section 9.5.1: in the algorithm annotations on page 382 "inear" -> "linear"
Section 9.5.3: "which is now always a possibility" -> "which is not always a possibility"
Section 11.2: in the first figure describing the data: "journxe" -> "journée"
Also in the last paragraph of Section 11.2: "almond" translates to "amande", not "amende" (which means a monetary fine). Perhaps a homograph example would be clearer, like "avocat", which means both "lawyer" and "avocado" in French.
In Figure 11.6, it is unclear why there is an arrow from the "z hat" box to the "Attention" box.
Section 12.2.1: "EmbeddingAttentionBad" -> "EmbeddingAttentionBag"
Section 13.1.1: The cats and dogs dataset is no longer downloadable from the link assigned to "data_url_zip" in the code example.
Section 14.3.1: It might be informative to show some additional parameter settings for the Beta distribution: in particular, settings for which the distribution is not U-shaped or is asymmetrical, and settings where it looks like a Uniform distribution.
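As a sketch of what that could look like (my own code, not from the book): Beta(0.5, 0.5) gives the U shape, Beta(1, 1) is exactly Uniform(0, 1), Beta(5, 5) is a symmetric bell, and Beta(2, 8) is asymmetric.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

x = np.linspace(0.01, 0.99, 200)
for a, b in [(0.5, 0.5), (1, 1), (5, 5), (2, 8)]:  #U-shaped, uniform, bell, skewed
    plt.plot(x, beta.pdf(x, a, b), label=f"Beta({a}, {b})")
plt.legend()
plt.xlabel("x")
plt.ylabel("density")
plt.show()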
There is a stray figure on the next-to-last page of the book. Is this intentional?
Hi @EdwardRaff - I am a beginner and currently working through Chapter 2. On page 49 of the print book, there is a figure:
I am using the code in this repo and my pytorch is v2.0.1
Can you advise what's going wrong?
Thank you.
In Chapter 3.4.4, the code is shown for creating a first CNN. For the use of nn.Flatten before the last layer, the code comment (point 10 in the book) says, "Converts from (B, C, W, H) -> (B, D) so we can use a Linear layer".
Shouldn't it actually be (B, filters, C, W, H) -> (B, filters*D)?
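For reference, a quick shape check (my own sketch, not book code). In a standard PyTorch CNN the filters of the last convolution are the channel dimension C itself, so there is no separate filters axis, and nn.Flatten maps (B, C, W, H) to (B, C*W*H):

import torch
import torch.nn as nn

x = torch.zeros(32, 16, 7, 7)  #(B, C, W, H), where C = 16 is the number of filters in the last conv
print(nn.Flatten()(x).shape)   #torch.Size([32, 784]), i.e. (B, D) with D = C*W*H = 16*7*7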
Hi,
Thanks for the examples.
I think it would make sense to add the following to the notebooks:
from google.colab import drive
drive.mount('/content/drive')
and
# Here you want to customize the path to the right location in your drive
!cp drive/MyDrive/Inside\ Deep\ Learning/idlmam.py .
The learning rate (named parameter lr) is ignored inside the train_simple_network function's body. Instead, the learning rate is hardcoded to 0.001 at line 123 of Inside-Deep-Learning/idlmam.py (commit bc4dccf). The fix is to pass lr to the created optimizer.
In the code snippet train_simple_network:
optimizer.step() # updates all the parameters theta(k+1) = theta(k) eta * gradient
it should have been theta(k+1) = theta(k) - eta * gradient (gradient descent subtracts the scaled gradient).
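A minimal sketch of the first fix (the helper name and signature here are mine, not copied from idlmam.py): build the optimizer from the lr argument instead of the literal 0.001.

import torch

def make_optimizer(model, lr=0.001):
    #Use the caller-supplied lr; substitute whichever optimizer idlmam.py actually constructs
    return torch.optim.AdamW(model.parameters(), lr=lr)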
- p47: you write W_{d,c} instead of W^{d,c}
- The Y_pred.ravel() call could have been made earlier, on p40 where it was first introduced
- test_data = torchvision.datasets.FashionMNIST("./", train=True, transform=transforms.ToTensor(), download=True) should use train=False; this changes the figures in this chapter significantly.
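For convenience, the corrected line (only the train flag changes; everything else as in the book):

import torchvision
from torchvision import transforms

#Corrected test split: train=False (the book's snippet had train=True)
test_data = torchvision.datasets.FashionMNIST("./", train=False, transform=transforms.ToTensor(), download=True)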
At the beginning of Chapter 2 (2.1.3 The training loop), you used the following steps in your train-simple-NN function:
optimizer.zero_grad()
loss.backward()
optimizer.step()
but at the end of the chapter, in the run_epoch function (2.4.2 Training and testing passes), you used a different order:
if model.training:
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
Is there any reason for switching the order? I thought we had to zero the gradients first thing at every epoch.
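For what it's worth, a small sketch (mine, not from the book) suggesting the two orderings give identical results, since all that matters is that gradients are zeroed before the next backward():

import torch

def run(zero_first, seed=0):
    torch.manual_seed(seed)
    w = torch.nn.Parameter(torch.randn(3))
    opt = torch.optim.SGD([w], lr=0.1)
    for _ in range(3):
        loss = (w ** 2).sum()
        if zero_first:  #2.1.3 order: zero_grad -> backward -> step
            opt.zero_grad(); loss.backward(); opt.step()
        else:           #2.4.2 order: backward -> step -> zero_grad
            loss.backward(); opt.step(); opt.zero_grad()
    return w.detach()

print(torch.allclose(run(True), run(False)))  #True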
Backpropagation is really an important and fundamental topic in deep learning.
Yeah, I admit that BP is a little math-heavy and a little hard for newcomers,
but I also cannot imagine that someone who cannot understand BP can understand deep learning, really.
BP hurts, but it is a good hurt and a necessary one.
You cannot omit it, so please add BP to Chapter 1.
Hi, I was getting the following error when executing this code:
import torch  #needed for torch.tensor below
from torch.utils.data import Dataset
from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True)

class SimpleDataset(Dataset):
    def __init__(self, X, y):
        super(SimpleDataset, self).__init__()
        self.X = X
        self.y = y

    def __getitem__(self, index):
        inputs = torch.tensor(self.X[index, :], dtype=torch.float32)
        targets = torch.tensor(int(self.y[index]), dtype=torch.int64)
        return inputs, targets

    def __len__(self):
        return self.X.shape[0]

dataset = SimpleDataset(X, y)
example, label = dataset[0]
InvalidIndexError: (tensor(0), slice(None, None, None))
The same was fixed when I changed the fetch_openml call to:
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
The problem was that without as_frame=False, scikit-learn imports the data as a pandas DataFrame rather than a NumPy array, and a DataFrame does not support the X[index, :] indexing used in __getitem__.
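An equivalent workaround (my own sketch, not from the book) is to convert explicitly, which works regardless of what as_frame defaults to:

import numpy as np
from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True)
X = np.asarray(X, dtype=np.float32)  #DataFrame or ndarray -> ndarray
y = np.asarray(y)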
Hi,
when executing the 2nd cell:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import *
from idlmam import *
I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-376bfb908340> in <module>()
2 import torch.nn as nn
3 import torch.nn.functional as F
----> 4 from torch.utils.data import *
5 from idlmam import *
AttributeError: module 'torch.utils.data' has no attribute 'BatchSamplerDistributedSamplerDataset'
It is solved by importing only the needed modules:
from torch.utils.data import Dataset, DataLoader, TensorDataset
I'm not sure if it has something to do with my setup on Colab, but based on this post, it is related to version 1.7.0 of PyTorch.
Thanks
When running in Colab (using a GPU) I got the following error in this cell:
rnn_3layer = nn.Sequential( #Simple old style RNN
    EmbeddingPackable(nn.Embedding(len(all_letters), 64)), #(B, T) -> (B, T, D)
    nn.RNN(64, n, num_layers=3, batch_first=True), #(B, T, D) -> ( (B,T,D) , (S, B, D) )
    LastTimeStep(rnn_layers=3), #We need to take the RNN output and reduce it to one item, (B, D)
    nn.Linear(n, len(name_language_data)), #(B, D) -> (B, classes)
)
#Apply gradient clipping to maximize its performance
for p in rnn_3layer.parameters():
    p.register_hook(lambda grad: torch.clamp(grad, -5, 5))
rnn_results = train_network(rnn_3layer, loss_func, train_lang_loader, val_loader=test_lang_loader, score_funcs={'Accuracy': accuracy_score}, device=device, epochs=10)
Error is:
/usr/local/lib/python3.6/dist-packages/torch/nn/utils/rnn.py in pack_padded_sequence(input, lengths, batch_first, enforce_sorted)
242
243 data, batch_sizes = \
--> 244 _VF._pack_padded_sequence(input, lengths, batch_first)
245 return _packed_sequence_init(data, batch_sizes, sorted_indices, None)
246
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
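The likely culprit (a sketch of the fix; the real call lives inside the book's EmbeddingPackable helper, so the exact site may differ): PyTorch 1.7+ requires the lengths argument of pack_padded_sequence to be a CPU tensor, even when the data itself is on the GPU.

import torch
from torch.nn.utils.rnn import pack_padded_sequence

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(2, 5, 8, device=device)   #(B, T, D)
lengths = torch.tensor([5, 3], device=device)

#Moving lengths to the CPU avoids "'lengths' argument should be a 1D CPU int64 tensor"
packed = pack_padded_sequence(x, lengths.cpu(), batch_first=True, enforce_sorted=False)
print(packed.data.shape)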
I think in the forward pass of the TransformerEncoder a padding mask for the attention should be used.
The padding tokens need to be excluded when calculating the attention weights. This is related to Chapter 12.2.1.
See cell 33 here; see also the PyTorch docs for reference.
It should be changed into something like this (the src_key_padding_mask needs to be True
for the values that need to be masked out):
def forward(self, input):
    if self.padding_idx is not None:
        mask = input != self.padding_idx
        src_key_padding_mask = torch.logical_not(mask)
    else:
        mask = input == input
        src_key_padding_mask = None
    x = self.embd(input) #(B, T, D)
    x = self.position(x) #(B, T, D)
    #Because the result of our code is (B, T, D), but transformers
    #take input as (T, B, D), we will have to permute the order
    #of the dimensions before and after
    x = self.transformer(x.permute(1,0,2), src_key_padding_mask=src_key_padding_mask) #(T, B, D)
    x = x.permute(1,0,2) #(B, T, D)
    #average over time, counting only the non-padded positions
    context = (x * mask.unsqueeze(-1)).sum(dim=1)/mask.sum(dim=1).unsqueeze(1)
    return self.pred(self.attn(x, context, mask=mask))
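As a quick semantics check (my own sketch, not book code): in PyTorch's TransformerEncoder, positions where src_key_padding_mask is True are excluded from attention, matching the torch.logical_not(mask) above.

import torch
import torch.nn as nn

enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model=8, nhead=2), num_layers=1)
x = torch.randn(5, 2, 8)  #(T, B, D)
pad = torch.tensor([[False]*5, [False, False, False, True, True]])  #(B, T); True = padded
print(enc(x, src_key_padding_mask=pad).shape)  #torch.Size([5, 2, 8])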
Dear Edward,
On pages 21 to 23, when we are talking about autograd, we choose to test whether the condition ||prev - cur|| < epsilon is satisfied to check whether we have reached the minimum.
My question is: why not just test whether the gradient at the current point is zero?
That is to say, can
while torch.linalg.norm(x_cur-x_prev) > epsilon:
be replaced by
epsilon = 1e-12 # a small enough value
while abs(cur.grad) > epsilon:
?
Thanks a lot!
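For reference, a runnable sketch of the gradient-based stopping rule being proposed (the objective here is just for illustration, not the book's exact example). One caveat: the gradient is also near zero at saddle points and plateaus, so neither criterion is strictly safer than the other.

import torch

x = torch.tensor([-3.0], requires_grad=True)
eta, epsilon = 0.1, 1e-8
for _ in range(10_000):
    f = (x - 2.0) ** 2  #illustrative objective with its minimum at x = 2
    f.backward()
    if torch.linalg.norm(x.grad) < epsilon:  #stop once the gradient is (nearly) zero
        break
    with torch.no_grad():
        x -= eta * x.grad
    x.grad.zero_()
print(x)  #converges to roughly 2.0, the minimum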