summary for standalone GQ, VI, etc. #667

Open
bob-carpenter opened this issue Apr 10, 2023 · 10 comments
Labels
feature New feature or request

Comments

@bob-carpenter
Contributor

Summary

Any object returned by sampling should allow .summary() to be called on it to report mean, sd, MCMC SE, quantiles, and R-hat. This includes objects returned by

  • CmdStanMCMC: MCMC sampling
  • CmdStanGQ: generated quantities
  • CmdStanVB: variational inference
  • ???: Laplace approximation

Is Laplace approximation not supported yet in CmdStanPy? It will also return a sample of multiple draws and should also include a .summary() method.

Description

One way to do this would be to have each of these wrapper objects allow the actual draws to be extracted. Right now, there is a high-level "helper" function: if mcmc_fit is a CmdStanMCMC object, I just call mcmc_fit.summary() directly. I would rather have mcmc_fit.draws() pull out a simple draws object on which summary() operates. The other objects would then also support .draws() extraction, and summary() would be a standalone function that applies to a draws object rather than to the whole output of a run. (I don't know what else is in the CmdStanMCMC object; I only ever use the draws.)
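
As a rough illustration of the standalone-function idea, here is a minimal sketch in plain Python. The name summarize_draws, its arguments, and the toy numbers are all hypothetical and not part of CmdStanPy. It assumes the draws have already been flattened to one row per draw and one column per variable, and it covers only mean, sd, and quantiles, since MCMC SE and R-hat would also need the chain structure.

```python
from statistics import fmean, stdev


def quantile(sorted_xs, p):
    """Linearly interpolated quantile of an already sorted list."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def summarize_draws(rows, names, probs=(0.05, 0.5, 0.95)):
    """Summarize a table of draws (hypothetical helper, not a CmdStanPy API).

    rows:  one list per draw, one entry per variable
    names: variable names, in column order
    Returns {name: {"mean": ..., "sd": ..., "q5": ..., ...}}.
    """
    summary = {}
    for j, name in enumerate(names):
        column = sorted(row[j] for row in rows)
        stats = {"mean": fmean(column), "sd": stdev(column)}
        for p in probs:
            stats[f"q{round(100 * p)}"] = quantile(column, p)
        summary[name] = stats
    return summary


# Toy draws (made-up numbers) standing in for an extracted draws table.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
for name, stats in summarize_draws(rows, ["theta", "y_rep"]).items():
    print(name, stats)
```

Because the function only sees a table of numbers, the same call would apply to draws from MCMC, standalone GQ, or variational inference, which is the point of separating it from the fit object.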

Current Version

1.1.0

@bob-carpenter bob-carpenter added the feature New feature or request label Apr 10, 2023
@WardBrian
Member

Laplace approximation has not yet landed in a released version of CmdStan; see #649.

Note that the .summary method on CmdStanMCMC is just a small wrapper around the stansummary executable. I’m not sure if stansummary does the right thing for the output of ADVI.

@ahartikainen
Contributor

.draws() should return a draws object (which would then be a custom class or customized numpy array)?

@WardBrian
Member

Currently .draws() returns a numpy array which is more or less just the raw contents of the CSV file(s)

@bob-carpenter
Contributor Author

I'm not sure what "raw contents of the CSV files" means. Isn't that a string?

Is there a way to summarize the draws after extracting with .draws()?

Am I supposed to just load arviz and just use that somehow? If so, could you give me a hint? I can't understand their doc or their data types and wasted over an hour trying to get a summary out of it last time I tried and wound up giving up.

@WardBrian
Member

CmdStanPy contains essentially no analysis code. We really just do ~3 things:

  • Provide a wrapper to build CmdStan models and call them on the command line
  • Provide ways of extracting the results of those runs in a few different structures
  • Wrap additional executables which come with CmdStan, such as the auto-formatter in stanc or the stansummary command.

The current .summary function only exists because of the third bullet point above, and it has exactly the features and limitations of the wrapped executable.

In particular, the things that fall under the second bullet point (.draws, .draws_pd, .draws_xr, etc.) are all designed to make it easier to use the output with analysis code, but we don’t provide any such analysis ourselves. Arviz is probably the most common choice. You can construct Arviz objects directly from cmdstanpy objects using helper functions in Arviz; see

https://python.arviz.org/en/stable/api/generated/arviz.from_cmdstanpy.html
https://oriolabrilpla.cat/en/blog/posts/2022/einstats-hmm-cmdstanpy.html

@bob-carpenter
Contributor Author

bob-carpenter commented Apr 10, 2023

Thanks. Last time I tried this, I was trying to define unit tests for Bayes Kit; it ended in 2 hours of misery with the Arviz docs and I eventually gave up.

Edit. So, I tried again. I don't see how from_cmdstanpy is relevant. It takes a CmdStanMCMC object and I only have a CmdStanGQ object. At least I didn't waste a lot of time on it.

@WardBrian
Member

Seems like a reasonable feature request for arviz. In the meantime, I suspect you could either use arviz.from_cmdstan and point it to fit_gq.runset.csv_files, or use fit_gq.draws_xr() to get an xarray representation of the draws and feed that to arviz.convert_to_inference_data. Only the second is something I have tried before.
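
The second route might look roughly like the sketch below. It is untested against a real fit: summarize_standalone_gq is a hypothetical helper name, fit_gq is assumed to be a CmdStanGQ object, and both xarray and arviz are assumed to be installed. The import sits inside the function so that arviz stays an optional dependency.

```python
def summarize_standalone_gq(fit_gq):
    """Summarize standalone GQ draws via arviz (hypothetical helper).

    Assumes fit_gq.draws_xr() returns an xarray Dataset of the draws,
    as described in the comment above.
    """
    import arviz as az  # imported lazily; only needed when this is called

    draws = fit_gq.draws_xr()
    idata = az.convert_to_inference_data(draws)
    return az.summary(idata)
```

Calling summarize_standalone_gq(fit_gq) should then give a table with one row per variable, assuming the conversion accepts the dataset as is.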

@bob-carpenter
Contributor Author

I'll just drop standalone GQs from my tutorial. I would rather not have to introduce a dependency on arviz. Eventually, I think the right answer for CmdStanPy is to use CmdStan's stansummary function and read it back in with structure (may already be doing that for the MCMC objects).

@WardBrian
Member

WardBrian commented Apr 10, 2023

It appears that stansummary can consume the output of things like ADVI or standalone GQ, but it doesn't know anything special about them.

For example, for ADVI, it computes the statistics including the first row of the output, which is not a sample.

So, we would probably need to put a pretty big warning label on using it for anything that isn't exclusively rows of draws.
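
To make the ADVI caveat concrete, here is a small sketch in plain Python. The CSV text is made-up toy data, not real CmdStan output, and read_advi_draws is a hypothetical helper. It assumes the first data row holds the mean of the approximation and every later row is a draw, so the first row is split off before any statistics are computed.

```python
import csv
from statistics import fmean

# Made-up stand-in for an ADVI output file (toy numbers, not real output).
# Assumption: the first data row is the mean of the approximation and the
# remaining rows are draws from it.
ADVI_CSV = """\
# method = variational
lp__,log_p__,log_g__,theta
0,0,0,0.50
0,-1.2,-0.3,0.40
0,-0.9,-0.1,0.60
0,-1.5,-0.7,0.80
"""


def read_advi_draws(text):
    """Return (column names, first non-draw row, list of draw rows)."""
    # Drop blank lines and '#' comment lines before parsing the CSV body.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines)
    names = next(reader)
    rows = [[float(x) for x in row] for row in reader]
    return names, rows[0], rows[1:]


names, mean_row, draws = read_advi_draws(ADVI_CSV)
theta = names.index("theta")
print("mean over all rows:  ", fmean(r[theta] for r in [mean_row] + draws))
print("mean over draws only:", fmean(r[theta] for r in draws))
```

The two printed means differ, which is the kind of discrepancy a summary would silently carry if it treated every row as a draw.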

@ahartikainen
Contributor

ahartikainen commented Apr 10, 2023

I think arviz.summary can take the output from fit_gq.draws_xr()


3 participants