Comments (4)
Hello, in transformer_multibranch_v2, the TransformerEncoderLayer class contains the following code:

if args.encoder_branch_type is None:  # default = None?
    self.self_attn = MultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout, self_attention=True,
    )
else:
    layers = []
    embed_dims = []
    heads = []
    num_types = len(args.

I just wonder: does args.encoder_branch_type evaluate to True here?
Hi, args.encoder_branch_type is a list containing the encoder branch types defined in your training yml file. In my case, I set encoder_branch_type in the training yml as encoder-branch-type: [attn:1:32:4, dynamic:default:32:4], where 32 is the embedding dimension and 4 is the number of attention heads. Hope this helps!
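For reference, each entry of that list is a colon-separated string; a minimal sketch of how one entry decomposes (assuming the field order type:config:embed_dim:heads, which matches the example above):

```python
# Decomposing one branch-type entry such as "attn:1:32:4".
# Assumed field order: branch type, branch-specific config,
# embedding dimension, number of attention heads.
branch = "attn:1:32:4"
branch_type, config, embed_dim, heads = branch.split(':')
embed_dim, heads = int(embed_dim), int(heads)
print(branch_type, embed_dim, heads)  # attn 32 4
```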
from lite-transformer.
Thanks! What is the meaning of [attn:1:32:4, dynamic:default:32:4]? Could you share some details about this list?
As I mentioned in my last reply, args.encoder_branch_type should not be a boolean value; instead, it should be a list recording the branch types of your encoder. As for 32 and 4, they are the embed_dim and num_head parameters used when initializing the MultiheadAttention and DynamicconvLayer modules.
You can find more details on these two parameters in the get_layer method of the TransformerEncoderLayer module:
lite-transformer/fairseq/models/transformer_multibranch_v2.py
Lines 617 to 645 in de9631c
Find more details about the MultiheadAttention module at
lite-transformer/fairseq/modules/multihead_attention.py
Lines 15 to 27 in de9631c
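One detail worth noting (an assumption based on the long-short range attention design described in the Lite Transformer paper, so please verify against MultiBranch.forward): the per-branch embed_dim values typically partition the layer's input along the channel dimension, with each branch processing its own slice:

```python
# Hypothetical sketch: splitting a feature vector across two branches
# by their embed_dims, as a MultiBranch-style module might do.
embed_dims = [32, 32]  # from [attn:1:32:4, dynamic:default:32:4]
features = list(range(sum(embed_dims)))  # stand-in for a 64-dim input

chunks, start = [], 0
for d in embed_dims:
    chunks.append(features[start:start + d])  # slice for this branch
    start += d

print([len(c) for c in chunks])  # [32, 32]
```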
from lite-transformer.
Thanks a lot! One more question. The code below appears in the encoder layer class:

for layer_type in args.encoder_branch_type:
    embed_dims.append(int(layer_type.split(':')[2]))
    heads.append(int(layer_type.split(':')[3]))
    layers.append(self.get_layer(args, index, embed_dims[-1], heads[-1], layer_type))
self.self_attn = MultiBranch(layers, embed_dims)

As you said, args.encoder_branch_type == [attn:1:160:4, lightweight:default:160:4], but this leads to some errors. How should I understand this?
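For what it's worth, the parsing part of that loop can be exercised on its own. The sketch below stubs out get_layer (which in the real code builds an attention or convolution module per branch) just to show what is extracted from each string:

```python
# Standalone sketch of the branch-parsing loop; get_layer is a stub
# for illustration, not the real module constructor.
encoder_branch_type = ["attn:1:160:4", "lightweight:default:160:4"]

def get_layer(embed_dim, heads, layer_type):
    # Stub: the real get_layer returns an nn.Module for this branch.
    return (layer_type.split(':')[0], embed_dim, heads)

layers, embed_dims, heads = [], [], []
for layer_type in encoder_branch_type:
    embed_dims.append(int(layer_type.split(':')[2]))
    heads.append(int(layer_type.split(':')[3]))
    layers.append(get_layer(embed_dims[-1], heads[-1], layer_type))

print(embed_dims)  # [160, 160]
print(heads)       # [4, 4]
print([l[0] for l in layers])  # ['attn', 'lightweight']
```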