practical_nlp_in_pytorch's People

Contributors

keitakurita


practical_nlp_in_pytorch's Issues

Memory allocation for h_t and c_t

I think it would be a tiny bit faster if you allocated the full memory for h_t and c_t from the beginning in OptimizedLSTM, i.e.:

    if init_states is None:
        h_t, c_t = (torch.zeros(bs, self.hidden_size).to(x.device),
                    torch.zeros(bs, self.hidden_size).to(x.device))

instead of:

    if init_states is None:
        h_t, c_t = (torch.zeros(self.hidden_size).to(x.device),
                    torch.zeros(self.hidden_size).to(x.device))
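
For reference, a minimal numerical sketch with made-up sizes showing that the two initializations produce the same values; the suggestion is only about allocating the batch dimension up front instead of relying on broadcasting inside the time-step loop:

    import torch

    bs, hidden_size = 4, 8                   # made-up sizes
    gates = torch.randn(bs, hidden_size)     # stand-in for a per-batch gate computation

    h_row  = torch.zeros(hidden_size)        # current version: broadcasts up to (bs, hidden_size)
    h_full = torch.zeros(bs, hidden_size)    # proposed version: already (bs, hidden_size)

    assert torch.equal(gates + h_row, gates + h_full)   # same result either way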

Question: calculation of memory length for validation in TransformerXL

Hi, very helpful repo, learned a lot from it.

I got a question about an implementation detail in TransformerXL.

In the transformer_xl_from_scratch notebook, the memory length during validation is calculated as val_memory_length + train_bptt - val_bptt.

Why isn't it just set to val_memory_length?
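
For reference, a quick numeric sketch of what the formula does, under the (hedged) reading that the goal is to keep the attention context, memory length plus BPTT length, the same at validation time as during training; the numbers are made up:

    # Made-up lengths, just to see what the formula does.
    train_bptt, val_bptt = 64, 32
    val_memory_length = 128

    val_mem = val_memory_length + train_bptt - val_bptt   # 160
    # The total window (memory + current segment) is unchanged by the shorter val BPTT:
    assert val_mem + val_bptt == val_memory_length + train_bptt   # 192 == 192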

Looking forward to your reply.

Not enough elements in Tuple for the keyword argument in NaiveLSTM

In /deep_dives/lstm_from_scratch.ipynb,

class NaiveLSTM(nn.Module): 
    ....
    def forward(self, x: torch.Tensor, 
                init_states: Optional[Tuple[torch.Tensor]]=None
               ) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        """Assumes x is of shape (batch, sequence, feature)"""
        ...

Shouldn't init_states be:

    init_states: Optional[Tuple[torch.Tensor, torch.Tensor]] = None

instead of just:

    init_states: Optional[Tuple[torch.Tensor]] = None

init_states needs exactly two values to unpack into h_t and c_t in:

....
else:
    h_t, c_t = init_states 
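
For reference, a sketch of the signature with the two-element tuple hint suggested above (body elided as in the notebook):

    from typing import Optional, Tuple

    import torch
    from torch import nn

    class NaiveLSTM(nn.Module):
        # ...
        def forward(self, x: torch.Tensor,
                    init_states: Optional[Tuple[torch.Tensor, torch.Tensor]] = None
                   ) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
            """Assumes x is of shape (batch, sequence, feature)."""
            ...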

Transformer - decoder block does not use encoder output for keys and values in attention mechanism

From the transformer implementation here:

class DecoderBlock(nn.Module):
    level = TensorLoggingLevels.enc_dec_block
    def __init__(self, d_model=512, d_feature=64,
                 d_ff=2048, n_heads=8, dropout=0.1):
        super().__init__()
        self.masked_attn_head = MultiHeadAttention(d_model, d_feature, n_heads, dropout)
        self.attn_head = MultiHeadAttention(d_model, d_feature, n_heads, dropout)
        self.position_wise_feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )

        self.layer_norm1 = LayerNorm(d_model)
        self.layer_norm2 = LayerNorm(d_model)
        self.layer_norm3 = LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x, enc_out, 
                src_mask=None, tgt_mask=None):
        # Apply attention to inputs
        att = self.masked_attn_head(x, x, x, mask=src_mask)
        x = x + self.dropout(self.layer_norm1(att))
        # Apply attention to the encoder outputs and outputs of the previous layer
        att = self.attn_head(queries=att, keys=x, values=x, mask=tgt_mask)
        x = x + self.dropout(self.layer_norm2(att))
        # Apply position-wise feedforward network
        pos = self.position_wise_feed_forward(x)
        x = x + self.dropout(self.layer_norm2(pos))
        return x

In the forward method, shouldn't

    att = self.attn_head(queries=att, keys=x, values=x, mask=tgt_mask)

be

    att = self.attn_head(queries=att, keys=enc_out, values=enc_out, mask=tgt_mask)

so that the second attention layer actually attends over the encoder outputs?
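
For comparison, here is the forward method with only the proposed change applied (everything else kept exactly as in the quoted block):

    def forward(self, x, enc_out,
                src_mask=None, tgt_mask=None):
        # Self-attention over the decoder inputs
        att = self.masked_attn_head(x, x, x, mask=src_mask)
        x = x + self.dropout(self.layer_norm1(att))
        # Attend over the encoder outputs, as proposed in this issue
        att = self.attn_head(queries=att, keys=enc_out, values=enc_out, mask=tgt_mask)
        x = x + self.dropout(self.layer_norm2(att))
        # Apply position-wise feedforward network
        pos = self.position_wise_feed_forward(x)
        x = x + self.dropout(self.layer_norm2(pos))
        return x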

config

    class Config(dict):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            for k, v in kwargs.items():
                setattr(self, k, v)

        def set(self, key, val):
            self[key] = val
            setattr(self, key, val)

    config = Config(
        testing=False,
        bert_model_name="bert-base-uncased",
        max_lr=3e-5,
        epochs=1,
        use_fp16=False,
        bs=4,
        discriminative=False,
        max_seq_len=128,
    )

What is the importance of the set method here? Could you please give an example?
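
For what it's worth, a small sketch of what set does, based only on the Config class quoted above: it updates both the dict entry and the attribute, so both access styles stay in sync.

    # Using the Config class quoted above.
    cfg = Config(bs=4, max_seq_len=128)

    cfg.set("bs", 16)
    print(cfg["bs"], cfg.bs)      # 16 16  -- both views updated

    # By contrast, touching only one side leaves the other stale:
    cfg.epochs = 2                # attribute only; cfg["epochs"] raises KeyError
    cfg["max_lr"] = 3e-5          # dict entry only; cfg.max_lr raises AttributeError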

reader.read produced a KeyError

Hey Kei

I found your excellent tutorial when I was searching for ELMO.

I installed the latest AllenNLP (0.8.4) and downloaded your code. When I ran the cell that contains:

    train_ds, test_ds = (reader.read(DATA_ROOT / fname) for fname in ["train.csv", "test.csv"])

it produced the following error message:

    KeyError: "None of [['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']] are in the [index]"

The train.csv is from a download of the Jigsaw dataset from when I participated in the toxic comment competition. Its header looks like:

    "id","comment_text","toxic","severe_toxic","obscene","threat","insult","identity_hate"

Have you had a chance to run the code w/ the latest AllenNLP? If not, which version were you using? Just being lazy and hoping for a quick pointer before I dive in...
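
Not from the notebook, but a quick hypothetical sanity check: load the file with pandas and confirm that the label columns the reader indexes with actually match the CSV header.

    import pandas as pd

    # DATA_ROOT is the same path passed to reader.read above.
    df = pd.read_csv(DATA_ROOT / "train.csv")
    print(df.columns.tolist())

    label_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
    print([c for c in label_cols if c not in df.columns])   # should print []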

Thx,
SH

var vs tensor error in nb

;-)

    attn(q, k, v)

    RuntimeError                              Traceback (most recent call last)
    in <module>
    ----> 1 attn(q, k, v)

    /opt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        323         for hook in self._forward_pre_hooks.values():
        324             hook(self, input)
    --> 325         result = self.forward(*input, **kwargs)
        326         for hook in self._forward_hooks.values():
        327             hook_result = hook(self, input, result)

    in forward(self, q, k, v, mask)
         28         attn = attn / attn.sum(dim=-1, keepdim=True)
         29         attn = self.dropout(attn)
    ---> 30         output = torch.bmm(attn, v)  # (Batch, Seq, Feature)
         31         log_size(output, "attention output size")  # (Batch, Seq, Seq)
         32         return output

    RuntimeError: bmm(): argument 'mat2' (position 1) must be Variable, not torch.FloatTensor

    attn_head = AttentionHead(20, 20)
    attn_head(q, k, v)

    TypeError: torch.mm received an invalid combination of arguments - got (torch.FloatTensor, Variable), but expected one of:
     * (torch.FloatTensor source, torch.FloatTensor mat2)
          didn't match because some of the arguments have invalid types: (torch.FloatTensor, !Variable!)
     * (torch.SparseFloatTensor source, torch.FloatTensor mat2)
          didn't match because some of the arguments have invalid types: (!torch.FloatTensor!, !Variable!)
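
Both errors complain about mixing the pre-0.4 Variable type with a plain FloatTensor, so a hedged guess is that q, k and v were built inconsistently on an old PyTorch. A minimal sketch with made-up batch/sequence sizes (the feature size 20 matches AttentionHead(20, 20) above) that builds all three the same way; on PyTorch >= 0.4, Variable and Tensor are merged and this mixing cannot happen:

    import torch

    batch, seq_len, d_feature = 2, 5, 20      # made-up batch/seq sizes

    # Build all three inputs the same way so bmm/mm see matching types.
    q = torch.randn(batch, seq_len, d_feature)
    k = torch.randn(batch, seq_len, d_feature)
    v = torch.randn(batch, seq_len, d_feature)

    attn(q, k, v)          # attn / attn_head are the modules defined in the notebook
    attn_head(q, k, v)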

SciBert embedding

Hi!
If I want to use SciBERT embeddings in my model, is it enough to just replace this code:

    from allennlp.data.token_indexers import PretrainedBertIndexer

    token_indexer = PretrainedBertIndexer(
        pretrained_model="bert-base-uncased",
        max_pieces=config.max_seq_len,
        do_lowercase=True,
    )

    def tokenizer(s: str):
        return token_indexer.wordpiece_tokenizer(s)[:config.max_seq_len - 2]

with this code:

    from allennlp.data.token_indexers import PretrainedBertIndexer

    token_indexer = PretrainedBertIndexer(
        pretrained_model="scibert-scivocab-uncased",
        max_pieces=config.max_seq_len,
        do_lowercase=True,
    )

    def tokenizer(s: str):
        return token_indexer.wordpiece_tokenizer(s)[:config.max_seq_len - 2]
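
A hedged sketch of an alternative, assuming the AllenNLP 0.8-era API where pretrained_model also accepts a local path: "scibert-scivocab-uncased" is not one of the names the stock BERT loader knows about, so pointing at a downloaded SciBERT vocab file may be needed (the path below is hypothetical):

    from allennlp.data.token_indexers import PretrainedBertIndexer

    # Hypothetical local path to a downloaded SciBERT vocabulary.
    token_indexer = PretrainedBertIndexer(
        pretrained_model="/path/to/scibert_scivocab_uncased/vocab.txt",
        max_pieces=config.max_seq_len,
        do_lowercase=True,
    )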

ImportError: cannot import name 'PretrainedBertIndexer'

When trying to understand bert_text_classification.ipynb, this part of the notebook:

    from allennlp.data.token_indexers import PretrainedBertIndexer

    token_indexer = PretrainedBertIndexer(
        pretrained_model="bert-base-uncased",
        max_pieces=config.max_seq_len,
        do_lowercase=True,
    )

    # apparently we need to truncate the sequence here, which is a stupid design decision
    def tokenizer(s: str):
        return token_indexer.wordpiece_tokenizer(s)[:config.max_seq_len - 2]

gives this error:

    ImportError                               Traceback (most recent call last)
    in ()
    ----> 1 from allennlp.data.token_indexers import PretrainedBertIndexer
          2
          3 token_indexer = PretrainedBertIndexer(
          4     pretrained_model="bert-base-uncased",
          5     max_pieces=config.max_seq_len,

    ImportError: cannot import name 'PretrainedBertIndexer'

    NOTE: If your import is failing due to a missing package, you can
    manually install dependencies using either !pip or !apt.

    To view examples of installing some common dependencies, click the
    "Open Examples" button below.

The allennlp version used is 1.0.0.

It seems the versions differ, and I could not find a solution. What should I do?
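
PretrainedBertIndexer was removed from AllenNLP around the 1.0 release, so one option, assuming the notebook targets the pre-1.0 API, is to pin an older release (the exact version the notebook was written against is a guess here):

    # In a notebook / Colab cell; restart the runtime after installing.
    !pip install "allennlp==0.9.0"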

Size of the tensors

Hi, thank you for your tutorial. But I can't figure out why some weight tensors have shape (input_size, hidden_size) and others have shape (hidden_size, hidden_size).
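
A minimal shape sketch, assuming the convention in the from-scratch LSTM notebook where the input path and the recurrent path are separate matrix multiplies (names here are illustrative, not the notebook's):

    import torch

    input_size, hidden_size, bs = 10, 32, 4

    x_t = torch.randn(bs, input_size)            # current input
    h_t = torch.randn(bs, hidden_size)           # previous hidden state

    W_i = torch.randn(input_size, hidden_size)   # multiplies the input
    U_i = torch.randn(hidden_size, hidden_size)  # multiplies the hidden state

    out = x_t @ W_i + h_t @ U_i                  # both terms are (bs, hidden_size)
    print(out.shape)                             # torch.Size([4, 32])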

Attribute error while loading databunch

Hi,

I have created a custom databunch, which I am trying to load using load_data, but I am getting an attribute error:

    File "/home/views.py", line 641, in get
      path, r"/home/data_save.pkl")
    File "/usr/local/lib/python3.7/site-packages/fastai/basic_data.py", line 281, in load_data
      ll = torch.load(source, map_location='cpu') if defaults.device == torch.device('cpu') else torch.load(source)
    File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
      return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 702, in _legacy_load
      result = unpickler.load()
    AttributeError: Can't get attribute 'FastAiBertTokenizer' on <module '__main__' from 'manage.py'>

The FastAiBertTokenizer class has been defined in the program, but I am still getting the error.

Maybe I have to define this class or import it in the context where I'm loading the databunch, but I don't know how.

This is the code:

    path = Path()
    data = load_data(path, r"data_save.pkl")
    bert_model = CustomBertModel()
    learn = Learner(data, bert_model, metrics=[accuracy])
    st2 = torch.load(r"final_model_base.pth", map_location=torch.device('cpu'))
    learn.model.state_dict(st2)
Can you help me with this?
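
A hedged sketch of one workaround: pickle looks the class up under the module path recorded at save time (here __main__), so making FastAiBertTokenizer visible under that name before calling load_data is one thing to try; the import path below is hypothetical.

    import __main__
    from myproject.tokenizers import FastAiBertTokenizer   # hypothetical module path

    # Re-expose the class where the pickle expects to find it.
    __main__.FastAiBertTokenizer = FastAiBertTokenizer

    data = load_data(path, r"data_save.pkl")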

License

You do not have a license file.

Any restrictions on reusing / modifying your code and/or blog text?

Thanks

transformer_xl encoder

Thanks for this great tutorial on transformers. In the transformer_xl model I don't see any Encoder class, and the decoder is fed directly with the inputs. Could you clarify which part of the transformer_xl code is responsible for the encoder?
