Hello, thanks for sharing your code; it is really helpful.
I noticed there is a hyperparameter top-p (the code is here). When we run decode, this hyperparameter is set to -1, so we don't actually use top-p sampling.
But I still wonder what it is for. Did you use it in your experiments? If so, what would be an appropriate value? Could you please provide further details, or point me to any relevant literature, so that I can better understand it?
Thank you in advance for your assistance.
Hi,
We didn't use top-p sampling in our experiments. During sampling, we compute the logits of each token, and you can do top-p sampling or beam search based on these logits. These sampling strategies can be easily borrowed from the generation code of AR models, so you're free to try them. Honestly speaking, though, top-p and beam search may not help as much as you would expect, but they are still worth trying and investigating carefully.
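For reference, here is a minimal sketch of what top-p (nucleus) sampling over per-token logits could look like. This is illustrative, not the repository's implementation: the function name `top_p_sample` and the convention that a non-positive `top_p` (such as the -1 default mentioned above) disables filtering are assumptions made for this example.

```python
import numpy as np

def top_p_sample(logits, top_p, rng=None):
    """Nucleus (top-p) sampling: keep the smallest set of tokens whose
    cumulative probability reaches top_p, renormalize, then sample.
    A non-positive top_p (e.g. the -1 default) disables filtering,
    which reduces to plain sampling from the softmax distribution."""
    rng = rng or np.random.default_rng()
    # Softmax with max-subtraction for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_p > 0:
        order = np.argsort(probs)[::-1]           # tokens by descending prob
        cum = np.cumsum(probs[order])
        cutoff = np.searchsorted(cum, top_p) + 1  # smallest nucleus covering top_p
        keep = order[:cutoff]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()         # renormalize over the nucleus
    return int(rng.choice(len(probs), p=probs))
```

With a sharply peaked distribution and a small `top_p`, the nucleus collapses to the single most likely token, so the sample is deterministic; with `top_p` disabled it falls back to sampling from the full distribution.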