
Releases: lablup/backend.ai

24.09.1

25 Nov 02:09
df37a40

Features

  • Allow regular users to assign agents manually if the hide-agent configuration is disabled (#2614)
  • Hide the FastTrack (pipeline) menu by default when installing with the install-dev.sh script. (#3010)
  • Add a show_non_installed_images option to show all images, regardless of installation status, in the environment selection section of the session/service launcher page. (#3124)

Fixes

  • Fix the architecture condition not being applied when querying image rows (#2989)
  • Fix missing notification of cancellation or failure of background tasks when shutting down the server (#2579)
  • Disallow None id encoding in AsyncNode.to_global_id(). (#2898)
  • Update the Dellemc OneFS storage backend to correctly initialize the volume object and fix wrong HTTP request arguments (#2918)
  • Fix the parser of the order GQL query argument of group_nodes (#2927)
  • Set the postgres_readonly flag to false when beginning generic sessions (#2946)
  • Fix the wrong container registry migration script. (#2949)
  • Let the GPFS client keep polling while a GPFS job is running (#2961)
  • Handle IndexError when parsing a string into BinarySize (#2962)
  • Handle errors when converting a shmem string value into BinarySize (#2972)
  • Fix a wrong parameter when calling 'recalc_agent_resource_occupancy()' (#2982)
  • Make the project column of the image and container_registry tables nullable and improve the container registry storage config migration script. (#2978)
  • Fix wrong password limit in container registry migration script. (#2986)
  • Strengthen join condition between kernels and images to prevent incorrect matches (#2993)
  • Enable committing a session to a different registry and project. (#2997)
  • Fix a wrong field reference in the ImageNode resolver (#3002)
  • Fix the obsolete logic of HarborRegistry_v2.untag(). (#3004)
  • Fix Agent.compute_containers GraphQL field by adding missing resolver (#3011)
  • Fix the faulty argument handling logic of the backend.ai apps command. (#3015)
  • Check whether a VAST Data quota with the given name exists before creating a quota, and change the default value of the force_login config to true (#3023)
  • Fix get_logs_from_agent() to raise InstanceNotFound exception for kernels not assigned to agents (#3032)
  • Fix regression of ComputeContainer GraphQL queries due to newly introduced relationship fields (#3042)
  • Fix model service traffic not being distributed equally to all sessions when there are 10 or more replicas (#3043)
  • Fix regression of LegacyComputeSession GraphQL queries. (#3046)
  • Include missing legacy logging module in the pex. (#3054)
  • Rename deleted vfolders with a timestamp suffix when moving them to the DELETE_ONGOING status, so that the vfolder name can be reused even when the actual deletion takes a long time (#3061)
  • Fix model service not routing traffic based on the traffic ratio (#3075)
  • Fix the broken ComputeContainer.batch_load_detail caused by the misuse of selectinload, as a follow-up to #3042 (#3078)
  • Fix session status_info not being updated correctly when batch executions fail, ensuring failed batch execution states are properly reflected in the sessions table (#3085)
  • Fix the agent not loading the krunner-extractor image when the Docker instance does not support loading XZ-compressed images (#3101)

Full Changelog

Check out the full changelog until this release (24.09.1).

Full Commit Logs

Check out the full commit logs between release (24.09.1rc2) and (24.09.1).

24.09.1rc2

28 Oct 07:33
dcbac22
Pre-release

Fixes

  • Fix the architecture condition not being applied when querying image rows (#2989)

Full Changelog

Check out the full changelog until this release (24.09.1rc2).

Full Commit Logs

Check out the full commit logs between release (24.09.1rc1) and (24.09.1rc2).

24.09.1rc1

25 Oct 14:05
1d84d0f
Pre-release

Fixes

  • Fix missing notification of cancellation or failure of background tasks when shutting down the server (#2579)
  • Disallow None id encoding in AsyncNode.to_global_id(). (#2898)
  • Update the Dellemc OneFS storage backend to correctly initialize the volume object and fix wrong HTTP request arguments (#2918)
  • Fix the parser of the order GQL query argument of group_nodes (#2927)
  • Set the postgres_readonly flag to false when beginning generic sessions (#2946)
  • Fix the wrong container registry migration script. (#2949)
  • Let the GPFS client keep polling while a GPFS job is running (#2961)
  • Handle IndexError when parsing a string into BinarySize (#2962)
  • Handle errors when converting a shmem string value into BinarySize (#2972)
  • Fix a wrong parameter when calling 'recalc_agent_resource_occupancy()' (#2982)

Full Changelog

Check out the full changelog until this release (24.09.1rc1).

Full Commit Logs

Check out the full commit logs between release (24.09.0) and (24.09.1rc1).

24.09.0

21 Oct 15:00
98ab6c8

Features

  • Add support for optional payload encryption in the client SDK and CLI as a follow-up to #484 (#493)
  • Allow Unicode characters in project (user group) names and domain names. (#1663)
  • Improve exception logging stability by pre-formatting exception objects instead of pickling/unpickling them (#1759)
  • Add a new API to create a new image from a live session (#1973)
  • Clear error_logs records in the clear-history command (#1989)
  • Introduce the mgr schema dump-history and mgr schema apply-missing-revisions commands to ease major upgrades involving diverged database migration histories (#2002)
  • Update the image forget CLI command to untag the image from the registry before forgetting it from the database (#2010)
  • Update etcd-client-py to 0.3.0 (#2014)
  • Allow self-ssh in single-node single-container compute sessions. (#2032)
  • Prevent deleting mounted folders. (#2036)
  • Allow the agent to report its internal registry snapshot via a UNIX domain socket server (#2038)
  • Add a new Redis client (experimental) (#2041)
  • Expose user info to environment variables (#2043)
  • Introduce the rolling_count GraphQL field to provide the current rate limit counter for a keypair within the designated time window slice (#2050)
  • Deprecate the reliance on HTTP cookies for authenticating the pipeline service, switching to the use of HTTP headers instead (#2051)
  • Allow users to explicitly set the filename of the model definition YAML (#2063)
  • Add the backend.ai plugin scan command to inspect the plugin scan results from various entrypoint sources (#2070)
  • Bring back etcetra-backed Etcd as an option for the distributed lock backend (#2079)
  • Enable distribute-lock configuration (#2080)
  • Cache volume objects in RootContext.get_volume (#2081)
  • Revamp images GQL query by changing image filtering from flag-based to feature set-based and add aliases field to customized image GQL schema (#2136)
  • Add missing fields for keypair_resource_policy in client-py, models, etc. (#2146)
  • Add parameters to check-presets SDK function (#2153)
  • Add relay-aware VirtualFolderNode GQL Query (#2165)
  • Also perform basic model service validation process when updating model service via ModifyEndpoint (#2167)
  • Add support for mounting arbitrary VFolders on model service session (#2168)
  • Add support for CentOS 8 based kernels (#2220)
  • Clear zombie routes automatically (#2229)
  • Add scaling_group.agent_count_by_status and scaling_group.agent_total_resource_slots_by_status GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
  • Allow modifying model service session's environment variable setup (#2255)
  • Add endpoint.runtime_variant column (#2256)
  • Add a new API to list the supported inference runtimes (#2258)
  • Add support for model service provisioning without model-definition.yaml (#2260)
  • Allow superadmins to force-update session status through destroy API. (#2275)
  • Add session status check & update API. (#2312)
  • Add support for fetching container logs of a specific kernel. (#2364)
  • Introduce Python native WSProxy (#2372)
  • Implement scanning plugin entrypoints of external packages (#2377)
  • Add row_id, type and container_registry fields to the GroupNode GQL schema. (#2409)
  • Add support for PureStorage RapidFiles Toolkit v2 (#2419)
  • Add an API that extends the lifespan of the webserver's login session. (#2456)
  • Allow bulk association and disassociation of scaling groups with domains, user groups, and key pairs. (#2473)
  • Match the container's timezone to that of the container host OS when available (#2503)
  • Add a pre-setup configuration menu to the TUI installer to allow setting the public-facing address of Backend.AI components (#2541)
  • Now Backend.AI can run arbitrary container images without Backend.AI-specific metadata labels by introducing good default values and replacing intrinsic kernel-runner binaries with statically built ones (#2582)
  • Allow Bearer as a valid token type in model service authentication (#2583)
  • Introduce automatic creation of a 'model-store' group upon inserting a new domain. (#2611)
  • Add support for declaring custom description field for GraphQL relay edge types. (#2643)
  • Add an enable_LLM_playground option to show/hide the LLM playground tab on the serving page. (#2677)
  • Add max_gaudi2_devices_per_container config on webserver (#2685)
  • Add max_atom_plus_device_per_container config on webserver (#2686)
  • Introduce Account-manager component. (#2688)
  • Add a query depth limit config for GQL, add a page size limit config for GQL Connection, and set the default page size of GQL Connection to 10. (#2709)
  • Add compute session GQL Relay query schema. (#2711)
  • Allow DataLoaderManager to get a loader function by the function itself rather than by its name. (#2717)
  • Allow filtering and ordering in the endpointlist GQL request. (#2723)
  • Add new vfolder API to update sharing status. (#2740)
  • Avoid raising a type error even if a particular table in the toml file is empty, as long as the default value for all settings exists. (#2782)
  • Add an explicit scaling-group-type configuration to agent.toml so that the agent can tell whether it belongs to an SFTP resource group (#2796)
  • Add per-session priority attributes and ModifyComputeSession GraphQL mutation to update session names and priorities (#2840)
  • Add dependee/dependent/graph ComputeSessionNode connection queries (#2844)
  • Implement the priority-aware scheduler that applies to any arbitrary scheduler plugin (#2848)
  • Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)

Improvements

  • Enable robust DB connection handling by allowing pool-pre-ping setting. (#1991)
  • Enhance the update mechanism of session and kernel statuses. (#2311)
  • Remove database-level foreign key constraints in vfolders.{user,group} columns to decouple the timing of vfolder deletion and user/group deletion. (#2404)
  • Implement storage-host RBAC interface. (#2505)
  • Optimize the query latency when fetching a large number of agents with stat metrics from Redis (#2558)
  • Split out the ai.backend.logging package from ai.backend.common to improve reusability and reduce startup time (i.e., import latency) (#2760)
  • Avoid using collections.OrderedDict when not necessary in the manager API and client SDK (#2842)
  …

24.03.11

21 Oct 05:01
4cfdadb

Features

  • Add vast_use_auth_token config to utilize VASTData API token optionally. (#2901)

Fixes

  • Explicitly wait for the Docker daemon and the compose stack to become ready before populating database fixtures in install-dev.sh, for installations during the Codespaces provisioning stage and CI integration tests. (#2378)
  • Fix invalid image format log spam in Agent (#2894)
  • Update the VAST quota rather than raising an error when the quota already exists. (#2900)
  • Calculate the correct expiration time of the VAST auth token and add a vast_force_login config to enable login before every REST API call (#2911)

Full Changelog

Check out the full changelog until this release (24.03.11).

Full Commit Logs

Check out the full commit logs between release (24.03.10) and (24.03.11).

24.09.0rc1

21 Oct 03:22
5cf86c5
Pre-release

Features

  • Migrate container registry config storage from Etcd to PostgreSQL (#1917)
  • Implement ID-based client workflow to ContainerRegistry API. (#2615)
  • Refactor the base ContainerRegistry's scan_tag and implement handling of the MEDIA_TYPE_DOCKER_MANIFEST type. (#2620)
  • Support GitHub Container Registry. (#2621)
  • Support GitLab Container Registry. (#2622)
  • Support AWS ECR Public Container Registry. (#2623)
  • Support AWS ECR Private Container Registry. (#2624)
  • Replace rescan command's --local flag with local container registry record. (#2665)
  • Add a project column to the images table and refactor the ImageRef logic. (#2707)
  • Support Docker image manifest v2 schema 1. (#2815)
  • Add filter and order parameters to Group GQL Relay API. (#2863)
  • Add vast_use_auth_token config to utilize VASTData API token optionally. (#2901)
  • Use a valid value for the id field in the GQL schema query resolver for ContainerRegistry. (#2908)

Fixes

  • Explicitly wait for the Docker daemon and the compose stack to become ready before populating database fixtures in install-dev.sh, for installations during the Codespaces provisioning stage and CI integration tests. (#2378)
  • Add missing implementation of wsproxy and manager CLI's log-level customization options (#2698)
  • Add missing batch execution call after session starts (#2884)
  • Fix a regression of the unicode-aware slug update that prevented creation of dot-prefixed (automount) vfolders (#2892)
  • Fix invalid image format log spam in Agent (#2894)
  • Fix wrong creation of raw_configs in _create_kernels_in_one_agent (#2896)
  • Assign valid value to id field in ContainerRegistryNode GQL schema query resolver. (#2899)
  • Update the VAST quota rather than raising an error when the quota already exists. (#2900)
  • Calculate the correct expiration time of the VAST auth token and add a vast_force_login config to enable login before every REST API call (#2911)

Full Changelog

Check out the full changelog until this release (24.09.0rc1).

Full Commit Logs

Check out the full commit logs between release (24.09.0b1) and (24.09.0rc1).

24.03.10

27 Sep 13:25
a9d5f50

Features

  • Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)
  • Allow DataLoaderManager to get a loader function by the function itself rather than by its name. (#2717)
  • Add an explicit scaling-group-type configuration to agent.toml so that the agent can tell whether it belongs to an SFTP resource group (#2796)

Improvements

  • Avoid using collections.OrderedDict when not necessary in the manager API and client SDK (#2842)

Fixes

  • Merge kernels.role into sessions.session_type and check the image compatibility based on comparison with the ai.backend.role label (#1587)
  • Delete vfolder invitation and permission rows when deleting vfolders. (#2780)
  • Fix kernel_id assignment for main kernel log retrieval (#2820)
  • Fix the wrong count of concurrent compute sessions. (#2829)
  • Create kernels with correct scaling_group value. (#2837)
  • Fix a regression in progress bar rendering of the TUI installer after upgrading the Textual library (#2867)
  • Add scaling_group.agent_count_by_status and scaling_group.agent_total_resource_slots_by_status GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
  • Fix handling of undefined values in the ModifyImage GraphQL mutation. (#2028)
  • Silence model_ namespace warnings with pydantic-based model classes (#2765)
  • Change the initialization order of PackageContext to apply target_path correctly in the TUI installer (#2768)
  • Make the regex patterns that update configuration files work correctly with multiline text in the TUI installer (#2771)
  • Omit null parameters when calling the usage-per-period API. (#2777)
  • Handle container port mismatch when creating kernel. (#2786)
  • Explicitly set the protected service ports depending on the resource group type and the service types (#2797)
  • Correct session status determiner function. (#2803)
  • Fix endpoint_list.total_count GQL field returning incorrect value (#2805)

External Dependency Updates

  • Upgrade Python (3.12.4 -> 3.12.6) and common/tool dependencies to prepare for Python 3.13 and apply latest fixes (#2851)

Miscellaneous

  • Enhance type hints for potential None arguments (#2580)
  • Upgrade readthedocs build environment to Python 3.12 (#2814)

Full Changelog

Check out the full changelog until this release (24.03.10).

Full Commit Logs

Check out the full commit logs between release (24.03.10rc1) and (24.03.10).

24.03.10rc1

27 Sep 02:48
7113af4
Pre-release

Features

  • Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)

Improvements

  • Avoid using collections.OrderedDict when not necessary in the manager API and client SDK (#2842)

Fixes

  • Merge kernels.role into sessions.session_type and check the image compatibility based on comparison with the ai.backend.role label (#1587)
  • Delete vfolder invitation and permission rows when deleting vfolders. (#2780)
  • Fix kernel_id assignment for main kernel log retrieval (#2820)
  • Fix the wrong count of concurrent compute sessions. (#2829)
  • Create kernels with correct scaling_group value. (#2837)
  • Fix a regression in progress bar rendering of the TUI installer after upgrading the Textual library (#2867)

External Dependency Updates

  • Upgrade Python (3.12.4 -> 3.12.6) and common/tool dependencies to prepare for Python 3.13 and apply latest fixes (#2851)

Miscellaneous

  • Enhance type hints for potential None arguments (#2580)
  • Upgrade readthedocs build environment to Python 3.12 (#2814)

Full Changelog

Check out the full changelog until this release (24.03.10rc1).

Full Commit Logs

Check out the full commit logs between release (24.03.10b3) and (24.03.10rc1).

24.03.10b3

05 Sep 01:34
84e71a5
Pre-release

No significant changes.

Full Changelog

Check out the full changelog until this release (24.03.10b3).

Full Commit Logs

Check out the full commit logs between release (24.03.10b2) and (24.03.10b3).

24.03.10b2

04 Sep 15:59
81627d7
Pre-release

Fixes

  • Fix the Service.create() SDK method and the service create CLI command failing with an UnboundLocalError exception (#2806)

Full Changelog

Check out the full changelog until this release (24.03.10b2).

Full Commit Logs

Check out the full commit logs between release (24.03.10b1) and (24.03.10b2).