Comments (4)
Hello, in transformer_multibranch_v2, the TransformerEncoderLayer class contains the following code:

if args.encoder_branch_type is None:  # default = None?
    self.self_attn = MultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout, self_attention=True,
    )
else:
    layers = []
    embed_dims = []
    heads = []
    num_types = len(args.

I just wonder: does args.encoder_branch_type evaluate to True here?
Hi, args.encoder_branch_type is a list containing the encoder branch types defined in your training yml file. In my case, I set encoder_branch_type in the training yml as encoder-branch-type: [attn:1:32:4, dynamic:default:32:4], where 32 is the embedding dimension and 4 is the number of attention heads. Hope this helps!
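For reference, each entry of that list is a colon-separated string; a minimal sketch of how one entry decomposes (assuming the field order type:config:embed_dim:heads, which matches the example above):

```python
# Decomposing one branch-type entry such as "attn:1:32:4".
# Assumed field order: branch type, branch-specific config,
# embedding dimension, number of attention heads.
branch = "attn:1:32:4"
branch_type, config, embed_dim, heads = branch.split(':')
embed_dim, heads = int(embed_dim), int(heads)
print(branch_type, embed_dim, heads)  # attn 32 4
```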
from lite-transformer.
Thanks! What is the meaning of [attn:1:32:4, dynamic:default:32:4]? Could you share some details about this list?
As I mentioned in my last reply, args.encoder_branch_type should not be a boolean value; instead, it should be a list recording the branch types of your encoder. As for 32 and 4, they are the embed_dim and num_head parameters used when initializing the MultiheadAttention and DynamicconvLayer modules.
You can find more details on these two parameters in the get_layer method of the TransformerEncoderLayer module:
lite-transformer/fairseq/models/transformer_multibranch_v2.py
Lines 617 to 645 in de9631c
Find more details about the MultiheadAttention module at
lite-transformer/fairseq/modules/multihead_attention.py
Lines 15 to 27 in de9631c
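One detail worth noting (an assumption based on the long-short range attention design described in the Lite Transformer paper, so please verify against MultiBranch.forward): the per-branch embed_dim values typically partition the layer's input along the channel dimension, with each branch processing its own slice:

```python
# Hypothetical sketch: splitting a feature vector across two branches
# by their embed_dims, as a MultiBranch-style module might do.
embed_dims = [32, 32]  # from [attn:1:32:4, dynamic:default:32:4]
features = list(range(sum(embed_dims)))  # stand-in for a 64-dim input

chunks, start = [], 0
for d in embed_dims:
    chunks.append(features[start:start + d])  # slice for this branch
    start += d

print([len(c) for c in chunks])  # [32, 32]
```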
from lite-transformer.
Thanks a lot! One more question. The code below appears in the encoder layer class:

for layer_type in args.encoder_branch_type:
    embed_dims.append(int(layer_type.split(':')[2]))
    heads.append(int(layer_type.split(':')[3]))
    layers.append(self.get_layer(args, index, embed_dims[-1], heads[-1], layer_type))
self.self_attn = MultiBranch(layers, embed_dims)

As you said, args.encoder_branch_type == [attn:1:160:4, lightweight:default:160:4], but this leads to some errors. How should I understand this?
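For what it's worth, the parsing part of that loop can be exercised on its own. The sketch below stubs out get_layer (which in the real code builds an attention or convolution module per branch) just to show what is extracted from each string:

```python
# Standalone sketch of the branch-parsing loop; get_layer is a stub
# for illustration, not the real module constructor.
encoder_branch_type = ["attn:1:160:4", "lightweight:default:160:4"]

def get_layer(embed_dim, heads, layer_type):
    # Stub: the real get_layer returns an nn.Module for this branch.
    return (layer_type.split(':')[0], embed_dim, heads)

layers, embed_dims, heads = [], [], []
for layer_type in encoder_branch_type:
    embed_dims.append(int(layer_type.split(':')[2]))
    heads.append(int(layer_type.split(':')[3]))
    layers.append(get_layer(embed_dims[-1], heads[-1], layer_type))

print(embed_dims)  # [160, 160]
print(heads)       # [4, 4]
print([l[0] for l in layers])  # ['attn', 'lightweight']
```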