Service jupyter-xyz not found #333

Open
zi-dan opened this issue Sep 16, 2019 · 2 comments

zi-dan commented Sep 16, 2019

Hi guys,

I have a problem with my JupyterHub + DockerSpawner deployment. In my system a user can choose an image to spawn (SwarmSpawner.image_whitelist) and change it by stopping the existing server and starting another one. Often during this process (and also during a plain server start) the spawn fails with "Service jupyter-xyz not found".

[D 2019-09-16 14:20:03.917 JupyterHub base:288] Refreshing auth for zi-dan
[D 2019-09-16 14:20:03.917 JupyterHub base:785] Initiating spawn for zi-dan
[D 2019-09-16 14:20:03.917 JupyterHub base:792] 0/100 concurrent spawns
[D 2019-09-16 14:20:03.917 JupyterHub base:797] 0 active servers
[D 2019-09-16 14:20:03.971 JupyterHub user:542] Calling Spawner.start for zi-dan
[D 2019-09-16 14:20:03.976 JupyterHub dockerspawner:777] Getting container 'jupyter-zi-dan'
[W 2019-09-16 14:20:03.979 JupyterHub dockerspawner:976] Removing service that should have been cleaned up: jupyter-zi-dan (id: mos0vpk)
[I 2019-09-16 14:20:03.979 JupyterHub swarmspawner:232] Removing service mos0vpkud10s82akzhckbly06
[I 2019-09-16 14:20:04.027 JupyterHub dockerspawner:990] Created service jupyter-zi-dan (id: nzl1noe) from image reg.jupyteo.com/eodata-notebook:2.1.1
[I 2019-09-16 14:20:04.027 JupyterHub dockerspawner:1013] Starting service jupyter-zi-dan (id: nzl1noe)
[D 2019-09-16 14:20:04.027 JupyterHub swarmspawner:144] Getting task of service 'jupyter-zi-dan'
[D 2019-09-16 14:20:04.027 JupyterHub dockerspawner:777] Getting container 'jupyter-zi-dan'
[E 2019-09-16 14:20:04.033 JupyterHub user:626] Unhandled error starting zi-dan's server: Service jupyter-zi-dan not found
[D 2019-09-16 14:20:04.034 JupyterHub user:724] Stopping zi-dan
[D 2019-09-16 14:20:04.034 JupyterHub swarmspawner:144] Getting task of service 'jupyter-zi-dan'
[D 2019-09-16 14:20:04.034 JupyterHub dockerspawner:777] Getting container 'jupyter-zi-dan'
[W 2019-09-16 14:20:04.040 JupyterHub swarmspawner:128] Service jupyter-zi-dan not found
[D 2019-09-16 14:20:04.048 JupyterHub user:752] Deleting oauth client jupyterhub-user-zi-dan
[D 2019-09-16 14:20:04.061 JupyterHub user:755] Finished stopping zi-dan
[E 2019-09-16 14:20:04.070 JupyterHub pages:217] Failed to spawn single-user server with form
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/site-packages/jupyterhub/handlers/pages.py", line 214, in post
        await self.spawn_single_user(user, server_name=server_name, options=options)
      File "/opt/conda/lib/python3.6/site-packages/jupyterhub/handlers/base.py", line 894, in spawn_single_user
        timedelta(seconds=self.slow_spawn_timeout), finish_spawn_future
      File "/opt/conda/lib/python3.6/site-packages/jupyterhub/handlers/base.py", line 812, in finish_user_spawn
        await spawn_future
      File "/opt/conda/lib/python3.6/site-packages/jupyterhub/user.py", line 642, in spawn
        raise e
      File "/opt/conda/lib/python3.6/site-packages/jupyterhub/user.py", line 546, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
      File "/opt/conda/lib/python3.6/site-packages/dockerspawner/dockerspawner.py", line 1017, in start
        yield self.start_object()
      File "/opt/conda/lib/python3.6/site-packages/dockerspawner/swarmspawner.py", line 252, in start_object
        raise RuntimeError("Service %s not found" % self.service_name)
    RuntimeError: Service jupyter-zi-dan not found

[I 2019-09-16 14:20:04.072 JupyterHub log:174] 200 POST /hub/spawn ([email protected]) 164.66ms

After a couple of attempts (sometimes only one is needed) the container is spawned properly and everything works well. In the logs I see that the existing service is removed and a new one is created, but the spawner still seems to look up the previous one.

Is it possible to work around this issue or prevent it? Any ideas for getting rid of this quite annoying problem?


Wildcarde commented Sep 19, 2019

I am having the same behavior here.
My setup:
A docker compose file launches the proxy and hub; these containers attach to the swarm's network, which is set as attachable.
The image is pre-pushed to all swarm nodes; home directories and other disk requirements are mounted via NFS on all docker host systems.

The hub can launch a service, and you can exec into the hub and use curl to verify that the service is running on port 8888, but sometimes the hub can't detect that the service is running and returns a 500 error or a 'service unavailable' page. Refreshing usually works around the 'service unavailable' page; the 500 error requires retrying 'start my server' from the 'home' tab, which isn't particularly reliable. It seems to work better if users first log all the way out and then log back in.

Running watch docker service ps jupyter-<username> while this happens shows the service launching on a given swarm node, and docker-compose exec hub curl http://jupyter-<username>:8888 produces output indicating that the service has started. Once the timeout is reached, jhub kills the service (usually).
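
For anyone who wants to repeat the reachability check above without curl, here is a minimal Python sketch. The helper name `port_is_open` is made up for illustration and is not part of JupyterHub or DockerSpawner. It only checks that a TCP connection can be opened, which is a weaker test than a full HTTP request.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only; it does not speak HTTP,
    it just confirms that something is listening.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure, and timeout.
        return False
```

Run from inside the hub container, `port_is_open("jupyter-<username>", 8888)` should correspond roughly to the curl check: if it returns True while the hub still reports a failure, the service itself is up and the problem is on the hub's side.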

jupyterhub_config.py:

# Configuration file for Jupyter Hub
c = get_config()

import os
import sys
import pwd, grp
import subprocess
sys.path.insert(0, '/srv/jupyterhub_config')


### required if you are launching a separate proxy server!
c.ConfigurableHTTPProxy.should_start = False
c.ConfigurableHTTPProxy.api_url = 'http://proxy:8001'

# Base configuration
c.JupyterHub.log_level = "DEBUG"
#c.JupyterHub.admin_access = True
c.JupyterHub.confirm_no_ssl = True
c.Authenticator.admin_users = ['XXXX']
c.JupyterHub.hub_ip = '0.0.0.0'



### docker spawner configuration:

c.JupyterHub.spawner_class = 'dockerspawner.SwarmSpawner'
c.DockerSpawner.host_ip = "0.0.0.0"
c.SystemUserSpawner.host_homedir_format_string = '/XXXX/home/{username}'
c.Spawner.mem_limit = '2G'
c.DockerSpawner.image = 'XXXXX:5000/XXXX/systemuser'
c.DockerSpawner.remove_containers = True
#c.DockerSpawner.remove_containers = False
c.DockerSpawner.notebook_dir = '/home/jovyan'
c.SwarmSpawner.notebook_dir = 'home/jovyan'

c.Spawner.start_timeout = 45
c.Spawner.timeout = 70
c.Spawner.poll_interval = 120

### this function mounts a specific user's home dir into their swarm container.  All files are owned by jovyan
## it mounts an individual's homedir in as the jovyan user THIS ACTUALLY WORKS
## swarm spawner test 2:
def create_homedir_hook(spawner):
    username = spawner.user.name  # get the username
    volume_path = os.path.join('/XXXXX/home/', username)
    if not os.path.exists(volume_path):
        os.mkdir(volume_path, 0o755)
        os.chown(volume_path,1000,100)
    mounts_user = [
                   {'type': 'bind',
                    'source': '/XXXX/home/' + username,
                    'target': '/home/jovyan/', },
                   {'type': 'bind',
                    'source': '/XXXX/exchange',
                    'target': '/mnt/exchange', },
                   {'type': 'bind',
                    'source': '/XXXX/classdat',
                    'target': '/mnt/classdat', }
                  ]
    ## it's not clear why the above works and alternative mounting approaches do not, but
    ## this is a reliable way to make sure volume mounts work in swarmspawner
    spawner.extra_container_spec = {
        'mounts': mounts_user
    }
    spawner.environment['SYSGROUPS']=getgroups(username)

c.Spawner.pre_spawn_hook = create_homedir_hook


## Need to capture groups - this is an utterly brute-force use of the `id` command
## this works because we are mounting pam into the jhub front end (see compose file).
def getgroups(user):
    groups=os.popen('/usr/bin/id {}'.format(user)).read()
    return groups


## docker spawner network configuration:
#network_name = os.environ['DOCKER_NETWORK_NAME'] #local var named 'network_name'
network_name = 'jhub-swarmnet'
c.SwarmSpawner.network_name = network_name
c.SwarmSpawner.extra_host_config = {'network_mode': network_name}
### this is the main fulcrum of things i don't know if work.

## some quick debugging code to check what IPs the container thinks it has
from jupyter_client.localinterfaces import public_ips
print('public IPs: ',public_ips())
c.DockerSpawner.hub_ip_connect = public_ips()[0]


### setup nbgrader service:
### i have no idea what populating this does more broadly, but it makes a list of graders here
c.JupyterHub.load_groups = {
  'formgraders-XXXX': [
    XXXX,]
  }

c.JupyterHub.services = [
    {
        'name': 'XXXX',
        'url': 'http://hub:9999',
        'command': [
            'jupyterhub-singleuser',
            '--group=formgraders-XXXX',
            '--debug',
            '--ip=0.0.0.0',

        ],
        'user': 's_nbgrader',
        'cwd': '/mnt/classdat',
        #'cwd': '/home/s_nbgrader/servicewd'
    },
    {
        'name': 'cull-idle',
        'admin': True,
        'command': [sys.executable, 'cull_idle_servers.py', '--timeout=3600'],
    }
]

## the 'user' here is a service account that will own and maintain the jhub service for grading work
## this user is uid 1999
## creates a service at pynote.pni.princeton.edu/services/XXXXX for grading work for the class
## this service is literally just a jupyter notebook with the formgrader system enabled.

Member

minrk commented Mar 12, 2021

Hi! I’m going through and cleaning up old/stale issues on this repo. Sorry for not responding in a reasonable amount of time!

I believe the issue here is a failure to wait for the previous removal to complete, as swarm, unlike regular docker, appears to leave services in a gettable state while waiting for delete to complete.

The fix should be to add a wait to SwarmSpawner.remove_object to not return until the service is gone.
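
A rough sketch of what such a wait could look like, written as a standalone polling helper rather than as the actual DockerSpawner change. The name `wait_until_gone`, its parameters, and the convention that the lookup returns None once the service is gone are all assumptions made for illustration.

```python
import asyncio
import time


async def wait_until_gone(get_object, timeout=30.0, interval=0.5):
    """Poll `get_object` until it reports that the object no longer exists.

    `get_object` is a hypothetical zero-argument coroutine function that is
    assumed to return None once the service has really been deleted.
    Returns True if the object disappeared, False if `timeout` elapsed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if await get_object() is None:
            return True
        if time.monotonic() >= deadline:
            return False
        # Give swarm some time to finish the delete before asking again.
        await asyncio.sleep(interval)
```

The idea would be to call something like this at the end of SwarmSpawner.remove_object, passing the spawner's own service lookup, so that the following create and start steps never see the half-deleted service. How the lookup signals "not found" in a given DockerSpawner version should be checked against its source before relying on this.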
