DRAFT Read replication and moving the API to a second database #4427
base: main
Conversation
This runs a full install sequence. Ready for testing on preview.
Adds the access tables to their own startup step. The admin API relies on them, or it won't start.
How did I miss those?
We should be able to inspect more tables, for debugging.
This brings up the full API, and is ready for performance testing in `preview`.
It is cleaner than it was, and may make it easier to debug the `preview` deploy.
Adds grabbing sling into the mix
Testing again
Also, added partitions. Partitions are 3x faster for downloading all the data (by year), but 10x slower than batches (via EXPLAIN). In practice, it is 2x slower to download than batches, but 3x faster than a straight download. So: probably worth it, if we can document that people downloading all the data should either 1) do so year-by-year, or 2) use batches.
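For people pulling all the data through the API, those two patterns would look roughly like the sketch below. This is illustrative only: the `general` endpoint comes from the data described later in this PR, while the base URL, the `audit_year` filter field, and the `X-Api-Key` header are assumptions; batching uses PostgREST-style `limit`/`offset` paging.

```sh
# 1) Year-by-year download: one request per year/partition (field name assumed).
curl -H "X-Api-Key: $FAC_API_KEY" \
  "https://api.fac.gov/general?audit_year=eq.2023"

# 2) Batched download: PostgREST-style paging; step the offset until no rows come back.
curl -H "X-Api-Key: $FAC_API_KEY" \
  "https://api.fac.gov/general?limit=20000&offset=0"
```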
And, redeploying...
Need to just return a non-zero exit code...
Need to *always* succeed...
Terraform plan for meta: No changes. Your infrastructure matches the configuration.
📝 Plan generated in Pull Request Checks #3931

Terraform plan for dev: Plan: 2 to add, 1 to change, 2 to destroy.

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.dev.cloudfoundry_app.postgrest will be updated in-place
  ~ resource "cloudfoundry_app" "postgrest" {
      ~ environment = (sensitive value)
        id          = "5de6f04c-52e0-400f-961a-16243b1c7c9a"
      ~ id_bg       = "************************************" -> (known after apply)
        name        = "postgrest"
        # (16 unchanged attributes hidden)
        # (1 unchanged block hidden)
    }

  # module.dev.cloudfoundry_service_key.postgrest must be replaced
-/+ resource "cloudfoundry_service_key" "postgrest" {
      ~ credentials      = (sensitive value)
      ~ id               = "************************************" -> (known after apply)
        name             = "postgrest"
      ~ service_instance = "b036f306-5950-4078-9309-cfda6ed03482" -> "86a11021-7922-411b-b0ca-4341b7a0b911" # forces replacement
    }

  # module.dev.module.cors.null_resource.cors_header must be replaced
-/+ resource "null_resource" "cors_header" {
      ~ id       = "*******************" -> (known after apply)
      ~ triggers = { # forces replacement
          ~ "always_run" = "2024-11-13T21:32:16Z" -> (known after apply)
        }
    }

Plan: 2 to add, 1 to change, 2 to destroy.
📝 Plan generated in Pull Request Checks #3931
Added max timeout to request
Now uses Django factory testing (like we do with our other tests that fall under `manage.py test`) for the tests that were written up by Matt. INCOMPLETE - check the "FIXME" comment. Currently the environment is hardcoded to "local" under the `ApiTests` class, and I haven't gotten a "success" response from the following tests:
- `test_suppressed_not_accessible_with_bad_key`
- `test_suppressed_accessible_with_good_key`
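(For reference, those tests run under the usual Django test entry point; the reviewer checklist later in this PR gives the same command:)

```sh
# Run the Django test suite inside the web container.
docker compose exec web /bin/bash -c "python manage.py test"
```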
Adding some notes where I've left off from this PR today, if this gets picked up before I come back. Before we perform any review or move this PR forward, we need to finish cleaning up the linting and the failed Django tests. Bandit is reporting a few problems that just need a simple tweak to fix, but it is also reporting 2 more critical vulnerabilities for SQL injection. Also, the Python and accessibility tests are failing; I haven't had much time to crank through these.
CREATE VIEW api_v2_0_0.combined AS
    SELECT * FROM public_data_v1_0_0.combined comb;
census_gsa_crosswalk view is missing here. This needs to be added.
Apologies; I still didn't get to that.
You could add it and do a PR against the codebase. However, it also wants to be added in the `.sql` in the... `fac-snapshot-db/post` folder.
Actually, I'd be happy to pair (Monday?) for 30m to help add this in. It would be faster, I suspect, for both of us.
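For reference, a sketch of what the missing view might look like, following the same pattern as the `combined` view above; it assumes a matching `census_gsa_crosswalk` table exists in the `public_data_v1_0_0` schema.

```sql
-- Sketch only: mirrors the combined view above; table/schema names assumed.
CREATE VIEW api_v2_0_0.census_gsa_crosswalk AS
    SELECT * FROM public_data_v1_0_0.census_gsa_crosswalk;
```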
Still in progress. This message will be removed, and the PR moved out of draft, when it is ready. Saving incrementally.
This PR represents work from @asteel-gsa, @rnovak338, and @jadudm, and has benefited greatly from conversation with many members of the team.
This is a large PR.
DB1 = `fac-db`
DB2 = `fac-snapshot-db`
Changes
- `api_v1_0_3` is removed, as it was suppressing all Tribal data, and therefore arguably incorrect.
- `admin_api_v...` is removed. No administrative access happens via the API. This also removes an entire access control mechanism. 🎉 (A good thing for maintenance and compliance.)
- `api_v1_1_0` is the new default API.
- `api_v2_0_0` is introduced.
- `public_data_v1_0_0` and `suppressed_data_v1_0_0` are the schemas. Only data that is public is in the `public` data, and only data that is suppressed (e.g. audits from Tribes/Tribal Organizations that have opted to suppress their data) is on the `suppressed` schema.
- `api_v2_0_0` uses no access control on data in the `public` schema, and applies access control to the data in the `suppressed` schema.
- Access (for suppressed data) is managed via a `config` directory, and is handled via onboarding/offboarding just like the cloud.gov access.
- The public data dump tooling has been exercised in `preview` to validate that the tooling works and is ready for inclusion in the data pipeline. (It only dumps tables from the `public_data` schema, thus guaranteeing that no suppressed data is processed.)

Additional documentation about the architectural changes is in the `jadudm/api-perf` branch (`SQL_README.md`).

Future work/possibilities
- Future pipeline work could pull from `public_data`, as opposed to doing filtering-while-generating from the internal tables.

Testing
First, pull the branch and do the clean/first-run sequence (the exact command block did not survive this copy; a presumed version is sketched below), followed by a `compose up`.
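```sh
# Presumed first-run sequence, taken from the reviewer checklist near the end of this description.
make docker-clean; make docker-first-run
```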
Watch the boot sequence
The boot sequence simulates both the daily boot sequence in production as well as the nightly GH Action work. That is, it runs all of the SQL involved in the data pipeline against DB2, and then it runs `pre` actions on DB1 (formerly `api_teardown`), then migrations, and then `post` actions on DB1 (formerly `api_standup`). The pre/post on DB1 is tiny compared to what it was before. The actions on DB2 are larger than what we used to have in `api_standup`/`api_teardown`, but that is because we're adding new/improving.

If things don't come up, it might be worth dropping a note to the dev channel with an @jadudm. This should be a clean sequence, as tested both via local development as well as in `preview`. This PR does not introduce any changes to the DB1 data models, so there should be no data migration issues. (Is that true, @rnovak338? We might introduce a model change in an admin table...) No changes to the `singleauditchecklist` or the `dissemination_` tables on DB1.

Confirm the app works
Try doing both a basic and advanced search. If both work (and probably return zero results on an empty DB), that means you should have an app that is hitting DB1 for basic search, and DB2 for advanced search.
Load a lot of data
In `backend/util/load_public_dissem_data` is a tool to help pre-populate your database with data. There is a `README` in that directory with detailed instructions. In short: some records are marked `is_public=false`, but those are actually public records. (In other words, it is all public data, and some of it is intentionally marked as suppressed for testing purposes.) Then `make build` and `make run` the container, as sketched below.

You will need your full FAC stack running, and there is a note in the README about making sure that you are on the same virtual docker network. (That is, the containers all need to talk to each other.)
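In other words, something like the following (assuming the Makefile targets are invoked from that directory):

```sh
# From backend/util/load_public_dissem_data; see the README there for specifics.
cd backend/util/load_public_dissem_data
make build   # build the loader container
make run     # load the ~300K general / ~4M federal_award records into DB1
```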
The data loading will take a few minutes. It is loading the `dissemination_` tables in DB1 with around 300K `general` records and 4M `federal_award` records (and a bunch of other data in the other tables, too).

Run the FAC again
Once you have loaded the data, bring your FAC stack down (`docker compose down`), and back up (`docker compose up`). This time, the simulation will take a lot longer. It will build the `public` and `suppressed` tables. The `SQL_README` has more information on that process. The indexing can be particularly slow on some machines.

Once the FAC is back up, you should be able to:
- Search the loaded data. A search on `93`, for example, will go much quicker. Refining the search further should generally increase the search performance.

Test the API
Once you have data, testing the API is interesting/possible.
In `backend/sql` is `test_api.py`. (I don't know where it would be better to put this file. It can be moved.)

You should have, in your environment:
- `FAC_API_KEY`, which, in truth, does not matter for local tests... but will if you want to test against `preview`
- `CYPRESS_API_GOV_USER_ID`, which is the same ID you would have available for E2E testing with Cypress
- `CYPRESS_API_GOV_JWT`, which (again) you should already have from E2E tests

Then, run the tests.
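(The exact command was a code block that did not survive this copy; a plausible sketch, assuming pytest and the file location above:)

```sh
# Hypothetical invocation; the original command block is missing from this copy.
# "env" and CAN_READ_SUPPRESSED are the variables discussed in the surrounding text.
env=local pytest backend/sql/test_api.py
```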
That will run the test script, passing the `env` variable to all tests (with it set to `"local"`). The environment variable on the command line, `CAN_READ_SUPPRESSED`, means that the key you are using does not have Tribal data access. Therefore, it should fail when you attempt to access the `api_v2_0_0` endpoints that deal with suppressed data.

If you set `CAN_READ_SUPPRESSED=1`, you should be able to run again and get failing tests. That is because your key is not approved to access suppressed data!
Getting the expected results from local testing is a good thing, even if they are failing tests. Now, you want to grant tribal access to your key, and see if the tests change.

Testing in `preview`

The script can also be used to test in `preview`.

Preview currently has the large/public data set loaded, with the faked suppressed data. (PREVIEW DOES NOT CONTAIN SUPPRESSED DATA. IT CONTAINS FAKE SUPPRESSED DATA.) By changing the environment to `preview`, and still setting the env var appropriately, it is possible to run the pytest against the "production" environment, and confirm that access controls are in place there.
, and still setting the env var appropriately, it is possible to run the pytest against the "production" environment, and confirm that access controls are in place there.You will need to grant access to your key in
preview
, which may require adding yourself to a file and restarting the app (so that the config change is read). That, or edit the DB tables directly in order to affect the change.Grant access to your key
In order to grant access to your key, you can either use the new admin panel, or edit the database directly.
To use the admin panel [RN EDIT HERE]...
Performance testing
There is an `api_perf` script floating around; it is relatively easy to inspect/modify and use against either the local stack or the `preview` site. It was primarily used to evaluate early design decisions in the API. It might be useful as another kind of check.

PR Checklist: Submitter
- Merge `main` into your branch shortly before creating the PR. (You should also be merging `main` into your branch regularly during development.)
- Run `git status | grep migrations`. If there are any results, you probably need to add them to the branch for the PR. Your PR should have only one new migration file for each of the component apps, except in rare circumstances; you may need to delete some and re-run `python manage.py makemigrations` to reduce the number to one. (Also, unless in exceptional circumstances, your PR should not delete any migration files.)

PR Checklist: Reviewer
- Run `make docker-clean; make docker-first-run && docker compose up`; then run `docker compose exec web /bin/bash -c "python manage.py test"`.
The larger the PR, the stricter we should be about these points.
Pre Merge Checklist: Merger
- `-/+ resource "null_resource" "cors_header"` should be destroying and recreating itself, and `~ resource "cloudfoundry_app" "clamav_api"` might be updating its `sha256` for the `fac-file-scanner` and `fac-av-${ENV}` by default.
- ... `main`.