[BUG] - 2024.9.1 Upgrade Bug with Auto-Creation of Group Directory Role #2766

Open
kenafoster opened this issue Oct 11, 2024 · 3 comments

@kenafoster
Contributor

Describe the bug

When running nebari upgrade from 2024.7.1 to 2024.9.1, there is a step which asks "Would you like Nebari to assign the corresponding role to all of your current groups automatically? [y/N] (N): ". This is because in 2024.9.1, Keycloak groups no longer automatically get JupyterHub shared directories created/mounted for them UNLESS the Keycloak group is manually assigned a JupyterHub Client Role (three groups get directories by default: admin, analyst, developer).

There are two issues:

First, if you choose "Y", Nebari then has to interact directly with the Keycloak REST API using the credentials in nebari-config. If you have changed the Keycloak root password to something other than the value in the config (a good practice, especially if you're committing the config file to a repo for CI/CD), it uses an invalid password. The workaround is to retrieve the valid password and temporarily make it the value of security.keycloak.initial_root_password ... just be careful not to commit the real password. Maybe there's a fix for this? The Keycloak Terraform stages are able to interact with the API even when the root password is not stored in the plaintext config... I haven't looked into exactly how that works. In any case, even if using the nebari-config file value is the intended behavior or the only possible solution, some help text would aid users in troubleshooting (or at the very least I hope they come across this issue!)
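To illustrate the kind of help text suggested above, here is a minimal sketch, not Nebari's actual code. The function name `keycloak_auth_hint` is hypothetical, and matching on the strings "401" and "invalid_grant" is an assumption based only on the error message reported in this issue.

```python
from typing import Optional


def keycloak_auth_hint(error: Exception) -> Optional[str]:
    """Return troubleshooting text for a Keycloak 401 error, else None.

    Hypothetical helper: it only inspects the error text, which is assumed
    to look like the ValueError quoted in this issue.
    """
    text = str(error)
    if "401" in text and "invalid_grant" in text:
        return (
            "Keycloak rejected the root credentials taken from the config file. "
            "If the root password was changed after the initial deployment, "
            "temporarily set security.keycloak.initial_root_password to the "
            "current password, rerun the upgrade, and do not commit that value."
        )
    # Any other error is left for the caller to report unchanged.
    return None
```

A caller could print the hint next to the original error whenever the function returns something other than `None`.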

Second, once you have a valid Keycloak credential, the next problem is that the "allow-group-directory-creation-role" client role doesn't exist. The commit 6a16cb8 that adds this role isn't present in 2024.7.1. So you have a chicken-and-egg problem... you can't get the role until the upgrade, and you can't upgrade without the role.

The workaround is to manually create the role in Keycloak. I'm currently in the process of finishing the upgrade via CI/CD... once 2024.9.1 actually deploys to AWS, I'll see whether this creates any errors.

Expected behavior

If the user enters "Y", the process of creating and assigning the role to current groups should succeed.

OS and architecture in which you are running Nebari

MacOS Sequoia 15.0.1 ARM (apple silicon)

How to Reproduce the problem?

Begin with a Nebari 2024.7.1 deployment and a corresponding

Upgrade your Nebari CLI to 2024.9.1

Run nebari upgrade -c nebari-config.yaml

You'll encounter the first problem (401: Invalid User Credentials) if you have set your Keycloak root password to something other than what is in your config file

You'll encounter the second issue (404: Could not find role) once you have gotten past the first issue (fix the value of security.keycloak.initial_root_password if needed).

Command output

First issue (not necessarily a bug per comment above):

`ValueError: Failed to connect to Keycloak server: 401: b'{"error":"invalid_grant","error_description":"Invalid user credentials"}'`

Second issue:


Would you like Nebari to assign the corresponding role to all of your current groups automatically? [y/N] (N): y
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/kfoster/miniconda3/envs/jatic-nebari-internal/lib/python3.12/site-packages/_nebari/subcom │
│ mands/upgrade.py:37 in upgrade                                                                   │
│                                                                                                  │
│   34 │   │   │   │   f"passed in configuration filename={config_filename} must exist"            │
│   35 │   │   │   )                                                                               │
│   36 │   │                                                                                       │
│ ❱ 37 │   │   do_upgrade(config_filename, attempt_fixes=attempt_fixes)                            │
│   38                                                                                             │
│                                                                                                  │
│ /Users/kfoster/miniconda3/envs/jatic-nebari-internal/lib/python3.12/site-packages/_nebari/upgrad │
│ e.py:92 in do_upgrade                                                                            │
│                                                                                                  │
│     89 │                                                                                         │
│     90 │   start_version = config.get("nebari_version", "")                                      │
│     91 │                                                                                         │
│ ❱   92 │   UpgradeStep.upgrade(                                                                  │
│     93 │   │   config, start_version, __version__, config_filename, attempt_fixes                │
│     94 │   )                                                                                     │
│     95                                                                                           │
│                                                                                                  │
│ /Users/kfoster/miniconda3/envs/jatic-nebari-internal/lib/python3.12/site-packages/_nebari/upgrad │
│ e.py:205 in upgrade                                                                              │
│                                                                                                  │
│    202 │   │   current_start_version = start_version                                             │
│    203 │   │   for stepcls in [cls._steps[str(v)] for v in step_versions]:                       │
│    204 │   │   │   step = stepcls()                                                              │
│ ❱  205 │   │   │   config = step.upgrade_step(                                                   │
│    206 │   │   │   │   config,                                                                   │
│    207 │   │   │   │   current_start_version,                                                    │
│    208 │   │   │   │   config_filename,                                                          │
│                                                                                                  │
│ /Users/kfoster/miniconda3/envs/jatic-nebari-internal/lib/python3.12/site-packages/_nebari/upgrad │
│ e.py:416 in upgrade_step                                                                         │
│                                                                                                  │
│    413 │   │   │   │   )                                                                         │
│    414 │   │                                                                                     │
│    415 │   │   # Run any version-specific tasks                                                  │
│ ❱  416 │   │   return self._version_specific_upgrade(                                            │
│    417 │   │   │   config, start_version, config_filename, *args, **kwargs                       │
│    418 │   │   )                                                                                 │
│    419                                                                                           │
│                                                                                                  │
│ /Users/kfoster/miniconda3/envs/jatic-nebari-internal/lib/python3.12/site-packages/_nebari/upgrad │
│ e.py:1293 in _version_specific_upgrade                                                           │
│                                                                                                  │
│   1290 │   │   │   # Proceed with updating group permissions                                     │
│   1291 │   │   │   client_id = keycloak_admin.get_client_id("jupyterhub")                        │
│   1292 │   │   │   role_name = "allow-group-directory-creation-role"                             │
│ ❱ 1293 │   │   │   role_id = keycloak_admin.get_client_role_id(                                  │
│   1294 │   │   │   │   client_id=client_id, role_name=role_name                                  │
│   1295 │   │   │   )                                                                             │
│   1296 │   │   │   role_representation = keycloak_admin.get_role_by_id(role_id=role_id)          │
│                                                                                                  │
│ /Users/kfoster/miniconda3/envs/jatic-nebari-internal/lib/python3.12/site-packages/keycloak/keycl │
│ oak_admin.py:2423 in get_client_role_id                                                          │
│                                                                                                  │
│   2420 │   │   :return: role_id                                                                  │
│   2421 │   │   :rtype: str                                                                       │
│   2422 │   │   """
│ ❱ 2423 │   │   role = self.get_client_role(client_id, role_name)                                 │
│   2424 │   │   return role.get("id")                                                             │
│   2425 │                                                                                         │
│   2426 │   def create_client_role(self, client_role_id, payload, skip_exists=False):             │
│                                                                                                  │
│ /Users/kfoster/miniconda3/envs/jatic-nebari-internal/lib/python3.12/site-packages/keycloak/keycl │
│ oak_admin.py:2406 in get_client_role                                                             │
│                                                                                                  │
│   2403 │   │   data_raw = self.connection.raw_get(                                               │
│   2404 │   │   │   urls_patterns.URL_ADMIN_CLIENT_ROLE.format(**params_path)                     │
│   2405 │   │   )                                                                                 │
│ ❱ 2406 │   │   return raise_error_from_response(data_raw, KeycloakGetError)                      │
│   2407 │                                                                                         │
│   2408 │   def get_client_role_id(self, client_id, role_name):                                   │
│   2409 │   │   """Get client role id by name.                                                    │
│                                                                                                  │
│ /Users/kfoster/miniconda3/envs/jatic-nebari-internal/lib/python3.12/site-packages/keycloak/excep │
│ tions.py:192 in raise_error_from_response                                                        │
│                                                                                                  │
│   189 │   │   if response.status_code == 401:                                                    │
│   190 │   │   │   error = KeycloakAuthenticationError                                            │
│   191 │                                                                                          │
│ ❱ 192 │   raise error(                                                                           │
│   193 │   │   error_message=message, response_code=response.status_code, response_body=respons   │
│   194 │   )                                                                                      │
│   195                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeycloakGetError: 404: b'{"error":"Could not find role"}'


Versions and dependencies used

Nebari 2024.9.1

Compute environment

AWS

Integrations

No response

Anything else?

No response
@kenafoster added the "type: bug 🐛" and "needs: triage 🚦" labels on Oct 11, 2024
@marcelovilla added the "area: user experience 👩🏻‍💻" and "area: integration/keycloak" labels and removed the "needs: triage 🚦" label on Oct 14, 2024
@marcelovilla
Member

> The workaround is to manually create the role in Keycloak. I'm currently in the process of finishing the upgrade via CI/CD... once 2024.9.1 actually deploys to AWS, I'll see whether this creates any errors.

@kenafoster were you able to upgrade to 2024.9.1 using the workaround?

@viniciusdc do you have any ideas on how to fix this?

@viniciusdc
Contributor

viniciusdc commented Oct 14, 2024

Thanks, @kenafoster, for the fantastic attention to detail in this issue. That's awesome.

It looks like the upgrade path for the role assumes the role is already present before proceeding with the rest of the logic. That's a flaw on my end from when I was testing it. Hopefully, things like this will soon be caught by our updates to release testing.

To answer your and @marcelovilla's questions, the way to address this is to create the role prior to the actual deployment. This can be done with an extra check in the upgrade command, but I am worried that Terraform might later complain about the role's existence while applying the changes.

I will run a test this afternoon and follow up here. If it all succeeds, then the idea above is the way to go.
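The extra check described above could look roughly like the following sketch. It is not the actual Nebari fix. It assumes a python-keycloak `KeycloakAdmin`-like object (the `get_client_id`, `get_client_role_id` and `create_client_role` calls appear in the traceback in this issue), and it assumes a missing role surfaces as an exception carrying `response_code == 404`, as `KeycloakGetError` did here. The helper name `ensure_client_role` is hypothetical.

```python
from typing import Any

ROLE_NAME = "allow-group-directory-creation-role"


def ensure_client_role(
    keycloak_admin: Any,
    client_name: str = "jupyterhub",
    role_name: str = ROLE_NAME,
) -> str:
    """Return the client role id, creating the role first if it is missing."""
    client_id = keycloak_admin.get_client_id(client_name)
    try:
        return keycloak_admin.get_client_role_id(
            client_id=client_id, role_name=role_name
        )
    except Exception as exc:
        # Assumption: a missing role is reported with response_code 404.
        # Anything else (401, 500, network errors) is re-raised untouched.
        if getattr(exc, "response_code", None) != 404:
            raise
    # Role is missing: create it, then look it up again to obtain its id.
    keycloak_admin.create_client_role(
        client_id, {"name": role_name}, skip_exists=True
    )
    return keycloak_admin.get_client_role_id(
        client_id=client_id, role_name=role_name
    )
```

This only covers the upgrade step. A role created this way sits outside Terraform state, so the concern raised above about Terraform complaining at deploy time still applies.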

@kenafoster
Contributor Author

@marcelovilla FYI the 2024.9.1 deploy fails if you have previously created allow-group-directory-creation-role manually (Keycloak API error: client role already exists).

While I guess you could try and manually import it into Terraform state then re-run the deploy, I found it easiest to manually delete the role, run the deploy (which then succeeded) and then re-create the role and manually assign it to groups.

Given that the upgrade auto-assign step doesn't work if the client role doesn't exist, and the subsequent deploy step doesn't work if the client role does exist, I don't think there's a path where this feature is functional in this implementation.
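For the last manual step (assigning the role to groups once the deploy has created it), a script along these lines might save some clicking. This is a sketch under assumptions, not something taken from Nebari: it expects a python-keycloak `KeycloakAdmin`-like object that is already authenticated against the right realm, it only walks top-level groups, and the function name `assign_role_to_all_groups` is made up for illustration.

```python
from typing import Any, List


def assign_role_to_all_groups(
    keycloak_admin: Any,
    client_name: str = "jupyterhub",
    role_name: str = "allow-group-directory-creation-role",
) -> List[str]:
    """Assign an existing client role to every top-level Keycloak group.

    Returns the names of the groups that were updated. The role must
    already exist; a missing role raises whatever the client raises.
    """
    client_id = keycloak_admin.get_client_id(client_name)
    # Full role representation, as expected by assign_group_client_roles.
    role = keycloak_admin.get_client_role(client_id, role_name)

    updated = []
    for group in keycloak_admin.get_groups():
        # Subgroups are not visited in this sketch.
        keycloak_admin.assign_group_client_roles(
            group_id=group["id"], client_id=client_id, roles=[role]
        )
        updated.append(group["name"])
    return updated
```

Running it after a successful deploy avoids the "client role already exists" conflict, since the role is then the one Terraform created.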
