exporter: individual records no longer in existence should be removed from the GCS export #2902

Open
sha-cy opened this issue Nov 26, 2024 · 3 comments

Comments

@sha-cy

sha-cy commented Nov 26, 2024

There seems to be a mismatch between the data shown in the web app (osv.dev) and the data in storage (https://storage.googleapis.com).
In the web app, a specific vulnerability shows up with its current data, but searching for the same vulnerability in the storage.googleapis.com bucket turns up duplicates under different ecosystems.

This means that I don't know which one I should trust.

To Reproduce
Steps to reproduce the behaviour:

  1. Go to https://osv.dev/list?q=CVE-2014-8176&ecosystem=
  2. The vulnerability is only present in the 'debian' ecosystem
  3. Go to https://storage.googleapis.com/osv-vulnerabilities/index.html?prefix=Debian/ and search for CVE-2014-8176
  4. Go to https://storage.googleapis.com/osv-vulnerabilities/index.html?prefix=Ubuntu/ and search for CVE-2014-8176

Expected behaviour
The data shown in the web app should match the data in storage.

Screenshots
(three images)

@andrewpollock
Contributor

Can you provide some more details on your needs here?

If you want to retrieve individual records, you're best served using the API, e.g. https://api.osv.dev/v1/vulns/CVE-2014-8176 as this is using the same database as the web interface.
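
For reference, a minimal Python sketch of that lookup, using only the standard library. It assumes that a missing record comes back as HTTP 404 and that the response follows the OSV schema (`affected[].package.ecosystem`); `vuln_url`, `fetch_vuln`, and `ecosystems` are illustrative names, not part of any OSV client.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Endpoint taken from the example URL above.
API_BASE = "https://api.osv.dev/v1/vulns"


def vuln_url(vuln_id: str) -> str:
    """Build the per-record API URL for a vulnerability ID."""
    return f"{API_BASE}/{urllib.parse.quote(vuln_id, safe='')}"


def fetch_vuln(vuln_id: str, opener=urllib.request.urlopen):
    """Return the record as a dict, or None if the API reports it missing.

    Assumption: a record that does not exist yields HTTP 404.
    `opener` is injectable so the function can be exercised offline.
    """
    try:
        with opener(vuln_url(vuln_id)) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def ecosystems(record: dict) -> set[str]:
    """Collect the ecosystems named in the record's `affected` entries."""
    return {
        entry["package"]["ecosystem"]
        for entry in record.get("affected", [])
        if "ecosystem" in entry.get("package", {})
    }
```

Calling `ecosystems(fetch_vuln("CVE-2014-8176"))` would then list the ecosystems the live database currently associates with that record, which can be compared against where the file shows up in the bucket.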

There is currently a shortcoming with the exports, and individually exported records are not cleaned up if they are subsequently deleted. The all.zip files are canonical, see https://google.github.io/osv.dev/data/#data-dumps
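
Until that cleanup exists, one way to spot leftover individual files is to treat the IDs in all.zip as the reference set and flag any individually exported ID that is not in it. A rough sketch, assuming the archive holds one `<ID>.json` member per record; the function names are made up for illustration.

```python
import zipfile
from pathlib import PurePosixPath


def ids_in_zip(zip_file) -> set[str]:
    """Return the record IDs in an all.zip archive.

    Assumption: each member is named `<ID>.json`.
    `zip_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as zf:
        return {
            PurePosixPath(name).stem
            for name in zf.namelist()
            if name.endswith(".json")
        }


def stale_ids(individual_ids, canonical_ids) -> list[str]:
    """IDs that exist as individual files but not in the reference set."""
    return sorted(set(individual_ids) - set(canonical_ids))
```

Here `individual_ids` would be whatever IDs were collected from the per-ecosystem bucket listing; anything `stale_ids` returns is a candidate for a record that was deleted but never cleaned up.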

@andrewpollock andrewpollock changed the title from "miss match between the data on the website and the json" to "exporter: individual records no longer in existence should be removed from the GCS export" Nov 26, 2024
@sha-cy
Author

sha-cy commented Nov 26, 2024

I wanted to know whether it is a bug that the data in https://storage.googleapis.com/ doesn't match the data in the web app.

My need is to know where I can get data I can trust, since nothing in the vulnerability JSON indicates that it is no longer relevant (looking at CVE-2014-8176 in the Ubuntu ecosystem).

From what I have seen, the gs://osv-vulnerabilities//all.zip file has the same "issue" as the individual files: vulnerabilities that have changed ecosystem are still present.

@andrewpollock
Contributor

By needs, it seems like you're wanting to retrieve individual records, in JSON format.

Is the API unsuitable for this task? Or if you're needing to enumerate the entire database, are the zip files sufficient?

We have a known issue with the individual files in GCS not getting cleaned up when a record gets deleted. This needs to be better surfaced in the documentation.
