Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better logs for @conda/@pypi #2080

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Better logs for @conda/@pypi #2080

wants to merge 7 commits into from

Conversation

savingoyal
Copy link
Collaborator

@savingoyal savingoyal commented Oct 5, 2024

also drive by roughly ~40% speed up on benchmark flows

@savingoyal savingoyal changed the title speed up container bootstrap for @pypi/@conda speed up bootstrap for @pypi/@conda Oct 6, 2024
@savingoyal savingoyal changed the title speed up bootstrap for @pypi/@conda Better logs for @conda/@pypi Oct 6, 2024
def info(self):
return self._call(["config", "list", "-a"])

@functools.lru_cache(maxsize=None)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to check if there is any unintended consequence of this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an implicit None returned from this call which will be cached, if the environment has not been formed yet. This might cause problems in some cases


with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
# install micromamba, download conda and pypi packages in parallel
future_install_micromamba = executor.submit(install_micromamba, architecture)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if download from datastore possible

@@ -960,7 +1027,7 @@ def start(
ctx.obj.environment = [
e for e in ENVIRONMENTS + [MetaflowEnvironment] if e.TYPE == environment
][0](ctx.obj.flow)
ctx.obj.environment.validate_environment(ctx.obj.logger, datastore)
ctx.obj.environment.validate_environment(echo, datastore)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take care of symmetric changes for fast bakery

executor.submit(self.solvers[solver].create, *result)
if storage:
# parallel cache
executor.submit(cache, storage, [result], solver)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pypi execution can start a lot earlier

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to be careful with the timing though, as a lot of pypi steps rely on
self.micromamba.path_to_environment(id_) which can get stuck to None if called too early.

"underline": kwargs.get("underline", False),
}

if animate:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check how animate and overwrite shows up in runtime logs and notebooks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notebooks seem broken still, animate produces new lines for each frame, even though the \r echoing by itself works outside of Metaflow runtime.

Copy link
Contributor

@romain-intel romain-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't look at non CLI part.

One general comment as well is we have too many echo functions (I was bitten already). It would be nice to clean that up and make all of them take the same arguments.

I like the idea of adding a spinner like that though.

_animation_stop.clear()
click.echo("\r", nl=False, err=kwargs["err"])

if indent:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplify: if indent and not animate

line = INDENT + line

animation_kwargs = {
"fg": kwargs.get("fg"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fallbacks needed to be safe.

"underline": kwargs.get("underline", False),
}

if animate:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please!


if animate:
if animate is True:
get_frame = lambda line: f"{next(_default_spinner)} {line}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f-string not supported in 3.5

if animate is True:
get_frame = lambda line: f"{next(_default_spinner)} {line}"
else:
get_frame = animate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confused here -- what is animate? It doesn't seem to be a simple bool here so a bit confused.



def _animate(get_frame, line, err, kwargs, indent):
while not _animation_stop.is_set():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seem to respect the same arguments as the non animate case.

"underline": kwargs.get("underline", False),
}

if animate:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notebooks seem broken still, animate produces new lines for each frame, even though the \r echoing by itself works outside of Metaflow runtime.

storage = self.datastore(_datastore_packageroot(self.datastore, echo))
cache(storage, results, solver)
self.logger("Virtual environment(s) bootstrapped!")
# parallel solves
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rename the PR as the scope of this is beyond simple log changes at this point.

def info(self):
return self._call(["config", "list", "-a"])

@functools.lru_cache(maxsize=None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an implicit None returned from this call which will be cached, if the environment has not been formed yet. This might cause problems in some cases

if os.path.exists(f"{prefix}/fake.done"):
return

# somewhat expensive check
# TODO: make this less expensive
if self.path_to_environment(id_, platform):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during forming a new environment, would this not cache the path to environment as None due to it not being ready yet?

executor.submit(self.solvers[solver].create, *result)
if storage:
# parallel cache
executor.submit(cache, storage, [result], solver)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to be careful with the timing though, as a lot of pypi steps rely on
self.micromamba.path_to_environment(id_) which can get stuck to None if called too early.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants