Hello, thanks for sharing your code; it is really helpful.
I noticed there is a hyperparameter top-p (the code is here). When we run decode, this hyperparameter is set to -1, so we don't actually use top-p sampling.
But I still wonder what it is for. Did you use it in your experiments? If so, what would be an appropriate value? Could you please provide further details, or point me to any relevant literature, so that I can better understand it?
Thank you in advance for your assistance.
Hi,
We didn't use top-p sampling in our experiments. During sampling, we compute the logits of each token, and you can do top-p sampling or beam search based on these logits. These sampling strategies can be easily borrowed from the generation code of AR models, so you're free to try them. Honestly speaking, though, top-p and beam search may not help as much as you would expect, but they are still worth trying and investigating carefully.
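For reference, here is a minimal sketch of what top-p (nucleus) sampling over per-token logits could look like. This is illustrative, not the repository's implementation: the function name `top_p_sample` and the convention that a non-positive `top_p` (such as the -1 default mentioned above) disables filtering are assumptions made for this example.

```python
import numpy as np

def top_p_sample(logits, top_p, rng=None):
    """Nucleus (top-p) sampling: keep the smallest set of tokens whose
    cumulative probability reaches top_p, renormalize, then sample.
    A non-positive top_p (e.g. the -1 default) disables filtering,
    which reduces to plain sampling from the softmax distribution."""
    rng = rng or np.random.default_rng()
    # Softmax with max-subtraction for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_p > 0:
        order = np.argsort(probs)[::-1]           # tokens by descending prob
        cum = np.cumsum(probs[order])
        cutoff = np.searchsorted(cum, top_p) + 1  # smallest nucleus covering top_p
        keep = order[:cutoff]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()         # renormalize over the nucleus
    return int(rng.choice(len(probs), p=probs))
```

With a sharply peaked distribution and a small `top_p`, the nucleus collapses to the single most likely token, so the sample is deterministic; with `top_p` disabled it falls back to sampling from the full distribution.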