mask

Mask styles

1. Padding mask

   embedding = token embedding + positional embedding, so the embedding input needs to be multiplied by the padding mask. The mask has shape [batch_size, len_sequence] or [batch_size, len_sequence, 1], for example:

   [[1, 1, 1, 0, 0], [1, 1, 1, 1, 0]]

   or

   [[[1], [1], [1], [0], [0]], [[1], [1], [1], [1], [0]]]
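A minimal sketch of applying the padding mask to the embedding input (PyTorch is assumed here; the token ids, pad id, and sizes are illustrative, not taken from this repo):

```python
import torch

batch = torch.tensor([[5, 9, 2, 0, 0],     # three real tokens, two [pad]
                      [7, 3, 8, 4, 0]])    # four real tokens, one [pad]
pad_id = 0

emb = torch.nn.Embedding(10, 16)(batch)    # token embedding, [batch_size, len_sequence, d_model]
# the positional embedding would be added to emb here in a full model

pad_mask = (batch != pad_id).float()       # [batch_size, len_sequence] -> [[1,1,1,0,0],[1,1,1,1,0]]
emb = emb * pad_mask.unsqueeze(-1)         # broadcast as [batch_size, len_sequence, 1]
```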

2. Attention mask

   1. Encoder self-attention mask: the attention mask is a square matrix. For example, for the sequence 'I like [pad]', the self-attention mask is [[1, 1, 0], [1, 1, 0], [0, 0, 0]].
   2. Decoder masked attention: a square matrix = padding mask & lower-triangular matrix. For 'I like [pad]', the lower-triangular matrix is [[1, 0, 0], [1, 1, 0], [1, 1, 1]] and the padding mask is [[1], [1], [0]], so the result is [[1, 0, 0], [1, 1, 0], [0, 0, 0]].
   3. Encoder-decoder attention: if the source sequence is 'I like [pad]', the attention mask is [[1, 1, 0], [1, 1, 0], [1, 1, 0], [1, 1, 0]]; the number of rows equals the number of tokens in the target sequence (see the sketch after this list for how all three masks are built).
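A minimal sketch of building the three attention masks above for the example 'I like [pad]' (PyTorch assumed; the target length of 4 is only for illustration):

```python
import torch

pad = torch.tensor([1., 1., 0.])                    # padding mask for 'I like [pad]'
L = pad.size(0)

# 1. encoder self-attention mask: zero out both the rows and the columns of [pad]
enc_mask = pad.unsqueeze(0) * pad.unsqueeze(1)      # [[1,1,0],[1,1,0],[0,0,0]]

# 2. decoder masked attention: padding mask & lower-triangular matrix
tril = torch.tril(torch.ones(L, L))                 # [[1,0,0],[1,1,0],[1,1,1]]
dec_mask = tril * pad.unsqueeze(1)                  # [[1,0,0],[1,1,0],[0,0,0]]

# 3. encoder-decoder attention: one row per target token, columns follow the source padding
tgt_len = 4
enc_dec_mask = pad.unsqueeze(0).expand(tgt_len, L)  # [[1,1,0]] repeated 4 times
```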

The last point about attention: the dot product between the queries and the keys is first multiplied by the zero mask, and then the -inf term is added, before the softmax operation.

score *= score_mask  # score is the dot product; multiplying by score_mask removes the padded positions
score += (score_mask - 1) * 1e+10  # adding ~ -inf makes the masked positions get zero weight after softmax (a score of 0 alone would still get weight exp(0) = 1)
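A minimal runnable sketch of these two steps (PyTorch assumed; q, k, and the mask are toy values):

```python
import torch
import torch.nn.functional as F

q = torch.randn(3, 8)                      # queries for 'I like [pad]'
k = torch.randn(3, 8)                      # keys
score_mask = torch.tensor([[1., 1., 0.],   # encoder self-attention mask from above
                           [1., 1., 0.],
                           [0., 0., 0.]])

score = q @ k.t() / 8 ** 0.5               # scaled dot product, shape [3, 3]
score *= score_mask                        # zero out the padded contributions
score += (score_mask - 1) * 1e+10          # masked entries become ~ -inf
attn = F.softmax(score, dim=-1)            # padded columns receive (near) zero weight
```

Note that the fully masked query row ([0, 0, 0]) ends up with uniform weights after softmax; it belongs to a [pad] token and is ignored downstream anyway.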
