I loaded the model ema_0.9999_050000.pt you shared (thanks for sharing) and found that word_embedding.weight is nearly orthogonal, which is weird! This means the trained embedding fails to learn semantic relevance between words; it just pushes words far apart from each other to tolerate the generation error.
Here is my code:
import torch

pt_path = '/home/workarea/Diffusion/DiffuSeq-Fork/diffusion_models/diffuseq_qqp_h128_lr0.0001_t2000_sqrt_lossaware_seed102_test_ori20221113-20_27_29/ema_0.9999_050000.pt'
s = torch.load(pt_path, map_location=torch.device('cpu'))
weight = s['word_embedding.weight']
# row-wise softmax over the Gram matrix: trace/size approaches 1 when each row's largest inner product is with itself
mm = torch.softmax(torch.mm(weight, weight.transpose(0, 1)), dim=-1)
print(mm.trace() / mm.size(0))
# the result is 1!
Could you please explain this phenomenon? Thanks a lot!
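For what it's worth, a more direct check of near-orthogonality (a small sketch, assuming weight has been loaded as in the snippet above) is to look at the cosine similarities themselves rather than the softmax trace:

# normalize rows so the Gram matrix holds cosine similarities
w = weight / weight.norm(dim=-1, keepdim=True)
cos = torch.mm(w, w.transpose(0, 1))
off_diag = cos - torch.diag(torch.diag(cos))
print(off_diag.abs().mean())  # close to 0 if the rows are nearly orthogonal
print(cos.diag().mean())      # exactly 1 by construction

If the mean off-diagonal cosine similarity is close to zero, the embedding rows are indeed spread out almost orthogonally.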
Hi, I am also confused about that. When I visualize the trained embedding and the BERT embedding, I find they are very different.
Here is the trained embedding; it looks like a Gaussian distribution (I also visualized Gaussian noise, and it looks very similar to the picture below):
Here is the BERT embedding (loaded from pretrained bert-base-uncased):
I notice that some papers say using the BERT embedding is useless for diffusion and that a learnable embedding is better. Why is the pretrained BERT embedding useless? Is it because its distribution is different? And why is a learnable embedding better if, after training, it still fails to learn semantic relevance between words? I hope someone can give some advice.
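In case it helps reproduce the comparison (the original figures are not shown here), here is a minimal sketch that overlays the value histograms of the two embedding tables, assuming the checkpoint has been loaded into s as in the first post and that the transformers library is available:

import matplotlib.pyplot as plt
from transformers import BertModel

# trained DiffuSeq embedding, taken from the checkpoint loaded above
diffuseq_emb = s['word_embedding.weight'].flatten()

# pretrained BERT input embedding for comparison
bert = BertModel.from_pretrained('bert-base-uncased')
bert_emb = bert.embeddings.word_embeddings.weight.detach().flatten()

# overlay the two value distributions; per the observation above, the trained
# embedding looks close to Gaussian noise while the BERT embedding does not
plt.hist(diffuseq_emb.numpy(), bins=200, density=True, alpha=0.5, label='DiffuSeq (trained)')
plt.hist(bert_emb.numpy(), bins=200, density=True, alpha=0.5, label='bert-base-uncased')
plt.legend()
plt.show()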