Releases: lablup/backend.ai
24.09.1
Features
- Allow regular users to assign agent manually if `hide-agent` configuration is disabled (#2614)
- Hide FastTrack (`pipeline`) menu by default on installation by `install-dev.sh` script. (#3010)
- Add a `show_non_installed_images` option to show all images regardless of installation on environment select section in session/service launcher page. (#3124)
Fixes
- Fix `architecture` condition not applied when querying `images` rows (#2989)
- Fix missing notification of cancellation or failure of background tasks when shutting down the server (#2579)
- Disallow `None` id encoding in `AsyncNode.to_global_id()`. (#2898)
- Update Dellemc OneFS storage backend to correctly initialize volume object and fix wrong HTTP request arguments (#2918)
- Fix `order` GQL query argument parser of `group_nodes` (#2927)
- Set the `postgres_readonly` flag to `false` when beginning generic sessions (#2946)
- Fix wrong container registry migration script. (#2949)
- Let GPFS client keep polling when GPFS job is running (#2961)
- Handle `IndexError` when parsing string to `BinarySize` (#2962)
- Handle error when converting `shmem` string value into `BinarySize` (#2972)
- Fix a wrong parameter when calling `recalc_agent_resource_occupancy()` (#2982)
- Make image, container_registry table's `project` column nullable and improve container registry storage config migration script. (#2978)
- Fix wrong password limit in container registry migration script. (#2986)
- Strengthen join condition between kernels and images to prevent incorrect matches (#2993)
- Enable session commit to different registry, project. (#2997)
- Fix wrong field reference in `ImageNode` resolver (#3002)
- Fix obsolete logic of `untag()` of `HarborRegistry_v2`. (#3004)
- Fix `Agent.compute_containers` GraphQL field by adding missing resolver (#3011)
- Fix `backend.ai apps` command's faulty argument handling logic. (#3015)
- Check Vast data quota with a given name exists before creating quota and change default value of `force_login` config to true (#3023)
- Fix `get_logs_from_agent()` to raise `InstanceNotFound` exception for kernels not assigned to agents (#3032)
- Fix regression of `ComputeContainer` GraphQL queries due to newly introduced relationship fields (#3042)
- Fix model service traffic not being distributed equally to every session when there are 10 or more replicas (#3043)
- Fix regression of `LegacyComputeSession` GraphQL queries. (#3046)
- Include missing legacy logging module in the pex. (#3054)
- Change the name of deleted vfolders with a timestamp suffix when sending them to DELETE_ONGOING status to allow reuse of the vfolder name, for cases when actual deletion takes a long time (#3061)
- Fix model service not routing traffic based on traffic ratio (#3075)
- Fix the broken `ComputeContainer.batch_load_detail` due to the misuse of `selectinload` as follow-up to #3042 (#3078)
- Fix session `status_info` not being updated correctly when batch executions fail, ensuring failed batch execution states are properly reflected in the sessions table (#3085)
- Fix agent not loading `krunner-extractor` image when Docker instance does not support loading XZ compressed images (#3101)
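Two of the fixes above (#2962, #2972) concern errors raised while converting size strings, such as a `shmem` value, into `BinarySize`. The following is a minimal sketch of that general failure mode, not Backend.AI's actual implementation: the hypothetical `parse_binary_size()` shows how indexing the last character of an empty string would raise `IndexError`, and how a guard can turn malformed input into one predictable `ValueError`. The function name, the suffix table, and the choice of error type are assumptions made for illustration.

```python
# Hypothetical helper for illustration only; this is NOT the real BinarySize class.
_SUFFIXES = {"k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}


def parse_binary_size(text: str) -> int:
    """Convert strings such as "64m" or "1GiB" into a number of bytes."""
    value = text.strip().lower()
    # Without this guard, value[-1] below raises IndexError on empty input.
    if not value:
        raise ValueError("empty size string")
    # Accept an optional trailing "ib" or "b" (e.g. "1gib", "64mb").
    for tail in ("ib", "b"):
        if value.endswith(tail) and len(value) > len(tail):
            value = value[: -len(tail)]
            break
    suffix = value[-1]
    if suffix in _SUFFIXES:
        number, multiplier = value[:-1], _SUFFIXES[suffix]
    else:
        number, multiplier = value, 1
    try:
        return int(float(number) * multiplier)
    except (ValueError, OverflowError):
        # Report every malformed input the same way, quoting the original text.
        raise ValueError(f"invalid size string: {text!r}") from None
```

With this shape, callers only need to handle a single exception type regardless of whether the input was empty, lacked a number, or carried an unknown suffix.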
Full Changelog
Check out the full changelog until this release (24.09.1).
Full Commit Logs
Check out the full commit logs between release (24.09.1rc2) and (24.09.1).
24.09.1rc2
Fixes
- Fix `architecture` condition not applied when querying `images` rows (#2989)
Full Changelog
Check out the full changelog until this release (24.09.1rc2).
Full Commit Logs
Check out the full commit logs between release (24.09.1rc1) and (24.09.1rc2).
24.09.1rc1
Fixes
- Fix missing notification of cancellation or failure of background tasks when shutting down the server (#2579)
- Disallow `None` id encoding in `AsyncNode.to_global_id()`. (#2898)
- Update Dellemc OneFS storage backend to correctly initialize volume object and fix wrong HTTP request arguments (#2918)
- Fix `order` GQL query argument parser of `group_nodes` (#2927)
- Set the `postgres_readonly` flag to `false` when beginning generic sessions (#2946)
- Fix wrong container registry migration script. (#2949)
- Let GPFS client keep polling when GPFS job is running (#2961)
- Handle `IndexError` when parsing string to `BinarySize` (#2962)
- Handle error when converting `shmem` string value into `BinarySize` (#2972)
- Fix a wrong parameter when calling `recalc_agent_resource_occupancy()` (#2982)
Full Changelog
Check out the full changelog until this release (24.09.1rc1).
Full Commit Logs
Check out the full commit logs between release (24.09.0) and (24.09.1rc1).
24.09.0
Features
- Add support for optional payload encryption in the client SDK and CLI as a follow-up to #484 (#493)
- Allow unicode characters in project(user group) name and domain name. (#1663)
- Improve exception logging stability by pre-formatting exception objects instead of pickling/unpickling them (#1759)
- Add new API to create new image from live session (#1973)
- Clear `error_logs` records in the `clear-history` command (#1989)
- Introduce `mgr schema dump-history` and `mgr schema apply-missing-revisions` command to ease the major upgrade involving deviation of database migration histories (#2002)
- Update `image forget` CLI command to untag image from registry before forgetting it from the database (#2010)
- Update `etcd-client-py` to 0.3.0 (#2014)
- Allow self-ssh in single-node single-container compute sessions. (#2032)
- Prevent deleting mounted folders. (#2036)
- Allow agent to report its internal registry snapshot via UNIX domain socket server (#2038)
- New redis client (experimental) (#2041)
- Expose user info to environment variables (#2043)
- Introduce the `rolling_count` GraphQL field to provide the current rate limit counter for a keypair within the designated time window slice (#2050)
- Deprecate the reliance on HTTP cookies for authenticating the pipeline service, switching to the use of HTTP headers instead (#2051)
- Allow user to explicitly set filename of model definition YAML (#2063)
- Add the `backend.ai plugin scan` command to inspect the plugin scan results from various entrypoint sources (#2070)
- Bring back etcetra-backed Etcd as an option for distributed lock backend (#2079)
- Enable distribute-lock configuration (#2080)
- Cache volume objects in `RootContext.get_volume` (#2081)
- Revamp images GQL query by changing image filtering from flag-based to feature set-based and add `aliases` field to customized image GQL schema (#2136)
- Added missing fields for `keypair_resource_policy` in client-py, models, etc. (#2146)
- Add parameters to `check-presets` SDK function (#2153)
- Add relay-aware `VirtualFolderNode` GQL Query (#2165)
- Also perform basic model service validation process when updating model service via `ModifyEndpoint` (#2167)
- Add support for mounting arbitrary VFolders on model service session (#2168)
- Add support for CentOS 8 based kernels (#2220)
- Clear zombie routes automatically (#2229)
- Add `scaling_group.agent_count_by_status` and `scaling_group.agent_total_resource_slots_by_status` GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
- Allow modifying model service session's environment variable setup (#2255)
- Add `endpoint.runtime_variant` column (#2256)
- Add new API to show list of supported inference runtimes (#2258)
- Add support for model service provisioning without `model-definition.yaml` (#2260)
- Allow superadmins to force-update session status through destroy API. (#2275)
- Add session status check & update API. (#2312)
- Add support for fetching container logs of a specific kernel. (#2364)
- Introduce Python native WSProxy (#2372)
- Implement scanning plugin entrypoints of external packages (#2377)
- Add `row_id`, `type` and `container_registry` fields to the `GroupNode` GQL schema. (#2409)
- Add support for PureStorage RapidFiles Toolkit v2 (#2419)
- Add API that extends lifespan of webserver's login session. (#2456)
- Allow bulk association and disassociation of scaling groups with domains, user groups, and key pairs. (#2473)
- Match container's timezone to container host OS when available (#2503)
- Add a pre-setup configuration menu to the TUI installer to allow setting the public-facing address of Backend.AI components (#2541)
- Now Backend.AI can run arbitrary container images without Backend.AI-specific metadata labels by introducing good default values and replacing intrinsic kernel-runner binaries with statically built ones (#2582)
- Allow `Bearer` as valid token type on model service authentication (#2583)
- Introduce automatic creation of a 'model-store' group upon inserting a new domain. (#2611)
- Add support for declaring custom description field for GraphQL `relay` edge types. (#2643)
- Add an `enable_LLM_playground` option to show/hide the LLM playground tab on the serving page. (#2677)
- Add `max_gaudi2_devices_per_container` config on webserver (#2685)
- Add `max_atom_plus_device_per_container` config on webserver (#2686)
- Introduce Account-manager component. (#2688)
- Add query depth limit config of GQL, add page size limit config of GQL Connection, and set default page size of GQL Connection to 10. (#2709)
- Add compute session GQL Relay query schema. (#2711)
- Allow `DataLoaderManager` to get a loader function by function itself rather than function name. (#2717)
- Allow filter and order in endpointlist GQL request. (#2723)
- Add new vfolder API to update sharing status. (#2740)
- Avoid raising a type error even if a particular table in the toml file is empty, as long as the default value for all settings exists. (#2782)
- Add an explicit configuration `scaling-group-type` to `agent.toml` so that the agent could distinguish whether itself belongs to an SFTP resource group or not (#2796)
- Add per-session priority attributes and `ModifyComputeSession` GraphQL mutation to update session names and priorities (#2840)
- Add dependee/dependent/graph ComputeSessionNode connection queries (#2844)
- Implement the priority-aware scheduler that applies to any arbitrary scheduler plugin (#2848)
- Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)
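The per-session priority attributes (#2840) and the priority-aware scheduler that wraps arbitrary scheduler plugins (#2848) can be pictured with a small sketch. This is one possible arrangement under stated assumptions, not the project's actual code: the `PendingSession` record, the `Picker` signature, and both function names are hypothetical. The idea shown is to narrow the pending queue to the highest priority first, then hand the remaining candidates to whatever picking policy the plugin implements.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class PendingSession:
    # Hypothetical record; the field names are assumptions for this sketch.
    name: str
    priority: int
    created_at: float


# A "plugin" here is simply a function that picks one session from candidates.
Picker = Callable[[Sequence[PendingSession]], PendingSession]


def fifo_picker(candidates: Sequence[PendingSession]) -> PendingSession:
    """Example plugin policy: choose the oldest pending session."""
    return min(candidates, key=lambda s: s.created_at)


def pick_with_priority(
    pending: Sequence[PendingSession], picker: Picker
) -> Optional[PendingSession]:
    """Restrict candidates to the highest priority, then delegate to the plugin."""
    if not pending:
        return None
    top = max(s.priority for s in pending)
    return picker([s for s in pending if s.priority == top])


if __name__ == "__main__":
    queue = [
        PendingSession("a", priority=10, created_at=1.0),
        PendingSession("b", priority=20, created_at=3.0),
        PendingSession("c", priority=20, created_at=2.0),
    ]
    # "c" is the oldest session among those sharing the highest priority.
    print(pick_with_priority(queue, fifo_picker).name)
```

Because the priority filter runs before the plugin is called, the plugin itself needs no knowledge of priorities, which is what lets the same wrapper apply to any picking policy.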
Improvements
- Enable robust DB connection handling by allowing `pool-pre-ping` setting. (#1991)
- Enhance update mechanism of session & kernel status. (#2311)
- Remove database-level foreign key constraints in `vfolders.{user,group}` columns to decouple the timing of vfolder deletion and user/group deletion. (#2404)
- Implement storage-host RBAC interface. (#2505)
- Optimize the query latency when fetching a large number of agents with stat metrics from Redis (#2558)
- Split out `ai.backend.logging` package from the `ai.backend.common` to improve reusability and reduce the startup time (i.e., import latencies) (#2760)
- Avoid using `collections.OrderedDict` when not necessary in the manager API and client SDK ([#2842](https://github.com/lablup/backend...
24.03.11
Features
- Add `vast_use_auth_token` config to utilize VASTData API token optionally. (#2901)
Fixes
- Explicitly wait for readiness of the Docker daemon and the compose stack before pouring database fixtures in `install-dev.sh` for when installing at the provisioning stage of Codespaces and integration tests in CI. (#2378)
- Fix invalid image format log spam in Agent (#2894)
- Update vast quota rather than raise error when quota exists. (#2900)
- Calculate correct expiration time of VAST auth token and add `vast_force_login` config to enable login before every REST API call (#2911)
Full Changelog
Check out the full changelog until this release (24.03.11).
Full Commit Logs
Check out the full commit logs between release (24.03.10) and (24.03.11).
24.09.0rc1
Features
- Migrate container registry config storage from `Etcd` to `PostgreSQL` (#1917)
- Implement ID-based client workflow to ContainerRegistry API. (#2615)
- Refactor Base ContainerRegistry's `scan_tag` and implement `MEDIA_TYPE_DOCKER_MANIFEST` type handling. (#2620)
- Support GitHub Container Registry. (#2621)
- Support GitLab Container Registry. (#2622)
- Support AWS ECR Public Container Registry. (#2623)
- Support AWS ECR Private Container Registry. (#2624)
- Replace rescan command's `--local` flag with local container registry record. (#2665)
- Add `project` column to the images table and refactor `ImageRef` logic. (#2707)
- Support docker image manifest v2 schema1. (#2815)
- Add `filter` and `order` parameters to Group GQL Relay API. (#2863)
- Add `vast_use_auth_token` config to utilize VASTData API token optionally. (#2901)
- Use a valid value for the `id` field in the GQL schema query resolver for `ContainerRegistry`. (#2908)
Fixes
- Explicitly wait for readiness of the Docker daemon and the compose stack before pouring database fixtures in `install-dev.sh` for when installing at the provisioning stage of Codespaces and integration tests in CI. (#2378)
- Add missing implementation of wsproxy and manager CLI's log-level customization options (#2698)
- Add missing batch execution call after session starts (#2884)
- Fix a regression of the unicode-aware slug update that prevented creation of dot-prefixed (automount) vfolders (#2892)
- Fix invalid image format log spam in Agent (#2894)
- Fix wrong creation of `raw_configs` in `_create_kernels_in_one_agent` (#2896)
- Assign valid value to `id` field in `ContainerRegistryNode` GQL schema query resolver. (#2899)
- Update vast quota rather than raise error when quota exists. (#2900)
- Calculate correct expiration time of VAST auth token and add `vast_force_login` config to enable login before every REST API call (#2911)
Full Changelog
Check out the full changelog until this release (24.09.0rc1).
Full Commit Logs
Check out the full commit logs between release (24.09.0b1) and (24.09.0rc1).
24.03.10
Features
- Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)
- Allow `DataLoaderManager` to get a loader function by function itself rather than function name. (#2717)
- Add an explicit configuration `scaling-group-type` to `agent.toml` so that the agent could distinguish whether itself belongs to an SFTP resource group or not (#2796)
Improvements
- Avoid using `collections.OrderedDict` when not necessary in the manager API and client SDK (#2842)
Fixes
- Merge `kernels.role` into `sessions.session_type` and check the image compatibility based on comparison with the `ai.backend.role` label (#1587)
- Delete vfolder invitation and permission rows when deleting vfolders. (#2780)
- Fix `kernel_id` assignment for main kernel log retrieval (#2820)
- Fix wrong count of concurrent compute sessions. (#2829)
- Create kernels with correct `scaling_group` value. (#2837)
- Fix a regression in progress bar rendering of the TUI installer after upgrading the Textual library (#2867)
- Add `scaling_group.agent_count_by_status` and `scaling_group.agent_total_resource_slots_by_status` GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
- Fix handling of undefined values in the ModifyImage GraphQL mutation. (#2028)
- Silence `model_` namespace warnings with pydantic-based model classes (#2765)
- Change the initialization order of PackageContext to apply `target_path` correctly in the TUI installer (#2768)
- Make the regex patterns that update configuration files work correctly with multiline texts in the TUI installer (#2771)
- Omit null parameter when calling `usage-per-period` API. (#2777)
- Handle container port mismatch when creating kernel. (#2786)
- Explicitly set the protected service ports depending on the resource group type and the service types (#2797)
- Correct session status determiner function. (#2803)
- Fix `endpoint_list.total_count` GQL field returning incorrect value (#2805)
External Dependency Updates
- Upgrade Python (3.12.4 -> 3.12.6) and common/tool dependencies to prepare for Python 3.13 and apply latest fixes (#2851)
Miscellaneous
- Enhance type hints for potential `None` arguments (#2580)
- Upgrade `readthedocs` build environment to Python 3.12 (#2814)
Full Changelog
Check out the full changelog until this release (24.03.10).
Full Commit Logs
Check out the full commit logs between release (24.03.10rc1) and (24.03.10).
24.03.10rc1
Features
- Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)
Improvements
- Avoid using `collections.OrderedDict` when not necessary in the manager API and client SDK (#2842)
Fixes
- Merge `kernels.role` into `sessions.session_type` and check the image compatibility based on comparison with the `ai.backend.role` label (#1587)
- Delete vfolder invitation and permission rows when deleting vfolders. (#2780)
- Fix `kernel_id` assignment for main kernel log retrieval (#2820)
- Fix wrong count of concurrent compute sessions. (#2829)
- Create kernels with correct `scaling_group` value. (#2837)
- Fix a regression in progress bar rendering of the TUI installer after upgrading the Textual library (#2867)
External Dependency Updates
- Upgrade Python (3.12.4 -> 3.12.6) and common/tool dependencies to prepare for Python 3.13 and apply latest fixes (#2851)
Miscellaneous
- Enhance type hints for potential `None` arguments (#2580)
- Upgrade `readthedocs` build environment to Python 3.12 (#2814)
Full Changelog
Check out the full changelog until this release (24.03.10rc1).
Full Commit Logs
Check out the full commit logs between release (24.03.10b3) and (24.03.10rc1).
24.03.10b3
No significant changes.
Full Changelog
Check out the full changelog until this release (24.03.10b3).
Full Commit Logs
Check out the full commit logs between release (24.03.10b2) and (24.03.10b3).
24.03.10b2
Fixes
- Fix `Service.create()` SDK method and `service create` CLI command not working with `UnboundLocalError` exception (#2806)
Full Changelog
Check out the full changelog until this release (24.03.10b2).
Full Commit Logs
Check out the full commit logs between release (24.03.10b1) and (24.03.10b2).