I loaded the model ema_0.9999_050000.pt you shared (thanks for sharing) and found that word_embedding.weight is nearly orthogonal, which is weird! This means the trained embedding fails to learn semantic relevance between words; it just pushes words far apart from each other to tolerate the generation error.
Here is my code:
import torch

pt_path = '/home/workarea/Diffusion/DiffuSeq-Fork/diffusion_models/diffuseq_qqp_h128_lr0.0001_t2000_sqrt_lossaware_seed102_test_ori20221113-20_27_29/ema_0.9999_050000.pt'
s = torch.load(pt_path, map_location=torch.device('cpu'))
weight = s['word_embedding.weight']
# row-wise softmax over the Gram matrix: trace/size approaches 1 when each row's largest inner product is with itself
mm = torch.softmax(torch.mm(weight, weight.transpose(0, 1)), dim=-1)
print(mm.trace() / mm.size(0))
# the result is 1!
Could you please explain this phenomenon? Thanks a lot!
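For what it's worth, a more direct check of near-orthogonality (a small sketch, assuming weight has been loaded as in the snippet above) is to look at the cosine similarities themselves rather than the softmax trace:

# normalize rows so the Gram matrix holds cosine similarities
w = weight / weight.norm(dim=-1, keepdim=True)
cos = torch.mm(w, w.transpose(0, 1))
off_diag = cos - torch.diag(torch.diag(cos))
print(off_diag.abs().mean())  # close to 0 if the rows are nearly orthogonal
print(cos.diag().mean())      # exactly 1 by construction

If the mean off-diagonal cosine similarity is close to zero, the embedding rows are indeed spread out almost orthogonally.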
Hi, I am also confused about that. When I visualize the trained embedding and the BERT embedding, I find they are very different.
Here is the trained embedding; it looks like a Gaussian distribution (I also visualized Gaussian noise, and it looks very similar to the picture below):
Here is the BERT embedding (loaded from pretrained bert-base-uncased):
I notice that some papers say using the BERT embedding is useless for diffusion and that a learnable embedding is better. Why is the pretrained BERT embedding useless? Is it because its distribution is different? And why is a learnable embedding better if, after training, it still fails to learn semantic relevance between words? I hope someone can give some advice.
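In case it helps reproduce the comparison (the original figures are not shown here), here is a minimal sketch that overlays the value histograms of the two embedding tables, assuming the checkpoint has been loaded into s as in the first post and that the transformers library is available:

import matplotlib.pyplot as plt
from transformers import BertModel

# trained DiffuSeq embedding, taken from the checkpoint loaded above
diffuseq_emb = s['word_embedding.weight'].flatten()

# pretrained BERT input embedding for comparison
bert = BertModel.from_pretrained('bert-base-uncased')
bert_emb = bert.embeddings.word_embeddings.weight.detach().flatten()

# overlay the two value distributions; per the observation above, the trained
# embedding looks close to Gaussian noise while the BERT embedding does not
plt.hist(diffuseq_emb.numpy(), bins=200, density=True, alpha=0.5, label='DiffuSeq (trained)')
plt.hist(bert_emb.numpy(), bins=200, density=True, alpha=0.5, label='bert-base-uncased')
plt.legend()
plt.show()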