Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plots: standardise across DVC/Studio/VS Code #9940

Closed
8 tasks done
mattseddon opened this issue Sep 12, 2023 · 37 comments
Closed
8 tasks done

plots: standardise across DVC/Studio/VS Code #9940

mattseddon opened this issue Sep 12, 2023 · 37 comments
Assignees
Labels
A: plots Related to the plots enhancement Enhances DVC product: Studio Integration with Studio product: VSCode Integration with VSCode extension

Comments

@mattseddon
Copy link
Member

mattseddon commented Sep 12, 2023

Summary / Background

Discussions regarding migrating the underlying plots' engine from Vega to Plotly were recently re-started (see iterative/dvc-render#7) due to several plots-related issues appearing in vscode-dvc (e.g. iterative/vscode-dvc#4530, iterative/vscode-dvc#4532, iterative/vscode-dvc#3837 and there are more). Initial scoping of that issue (re-)highlighted the current difference between the three product's UI/data processing. These three images show the "same" plot across the 3 products:

image

image

image

An initial step in scoping the migration effort was to implement zoom/pan functionality to Vega plots within the VS Code extension (iterative/vscode-dvc#4629). The immediate feedback provided was "But this needs to be available in Studio too!". Again (and unfortunately), that is no simple task because all three products take such different approaches. A simple template update can not simply be plugged in.

Another example of unified changes being difficult was plots per step being added in both Studio (https://github.com/iterative/studio/pull/7342) + VS Code (iterative/vscode-dvc#4372). It should also be noted that the slider is currently unavailable via the CLI output.

In order to make unified changes across the product easier we need to standardise.

Scope

As a first step to fixing the above problems, we are going to standardise the appearance of plots provided by built-in dvc-render templates across the three product's UIs.

Forcing the UIs to converge will give insights into the code. Those insights will enable us to refactor and move towards a standardised approach for dealing with plots/their data. As we remove legacy code/processes we will be able to iterate quicker on plot changes and not get bogged down every time we want to do something as simple as change a template.

At a minimum, we should display the same legend for each plot and we should also have the same underlying data in each plot.

Assumptions

  • The overall approach is up for debate.
  • There is business appetite to be able to iterate quicker in the future.
  • There is business appetite to have three standardised product UIs.
  • Do not need to attempt to improve the HTML produced by the CLI.
  • Usage of custom templates is low enough not to worry about being backwards compatible with user-defined custom templates.

Open Questions

  • Can we split out parts of dvc-render into JSON + Typescript + move into DVC? 👈🏻 This should become clearer as we get further into the process.
  • Can top-level bar plots have multiple data sources? How should they be treated?

Blockers / Dependencies

None yet.

General Approach

(after discussing with @dberenbaum)

  • Match CLI/Studio appearance to VS Code plot.
  • Add extra template anchors to the baseline templates and docs. These will include color, stroke dash and shape mappings along with a field to group data (equivalent of the current rev field used by DVC to render plots). I.e. <DVC_METRIC_COLOR> / <DVC_METRIC_STROKE_DASH> / <DVC_METRIC_SHAPE> / <DVC_METRIC_GROUP_BY>
  • Fill the anchors in each of the products based on.
  • Repeat colors for the CLI when > 7 have been selected.
  • Delay combining code for now.
  • Add <DVC_METRIC_ROW> for bar charts and only use <DVC_METRIC_SHAPE> for scatter plots.

Steps

Note: Took a detour to add tests to Studio and do some refactoring before attempting to make the required changes. All the PRs are linked to this issue.

@mattseddon
Copy link
Member Author

cc @iterative/studio-frontend @tapadipti @daavoo.

@daavoo
Copy link
Contributor

daavoo commented Sep 13, 2023

Does the approach have to be backwards compatible with user-defined custom templates?

I think the usage is too low to be worth the effort.
Also, many custom templates are most likely broken in some way if you compare the 3 UIs today.

@daavoo
Copy link
Contributor

daavoo commented Sep 13, 2023

Can we split out parts of dvc-render into JSON + Typescript + move into DVC? 👈🏻 This should become clearer as we get further into the process.

I would love it if DVC takes only care of collecting data sources and definitions (what is basically today's internal Repo API) and leaves the rendering (including building and defining templates) to the UIs.

I want to also give some background context on 2 parts:

  • The local HTML was created back when there were no other UIs.
  • dvc-render was created so DVCLive could generate live updates while the experiment was running, without depending on DVC (there was a time when the push was to keep DVCLive usable on its own). It was meant to be used by DVC, DVCLive, and Studio backend to avoid duplicating logic.

We now have the VSCode extension and Studio UIs, both supporting live updates.
Studio backend uses dvc-render but only partially while still using custom code and introducing inconsistencies.

I see the existence of the local HTML and dvc-render as legacy corner cases, and would be happy to deprioritize any changes there / drop them.

@mattseddon
Copy link
Member Author

Thanks @daavoo

I would love it if DVC takes only care of collecting data sources and definitions (what is basically today's internal Repo API) and leaves the rendering (including building and defining templates) to the UIs.

This is where we are heading.

I see the existence of the local HTML and dvc-render as legacy corner cases, and would be happy to deprioritize any changes there / drop them.

I'll see how difficult making updates there are going to be. I'm hopeful that we won't need to free or abandon rendering plots with the CLI.

@shcheklein
Copy link
Member

I would love it if DVC takes only care of collecting data sources and definitions (what is basically today's internal Repo API) and leaves the rendering (including building and defining templates) to the UIs.

(just to start the discussion, probably a bit too early, fine to ignore for now)

Can we list the implications of this decision (e.g. each product will have to implement an engine, each product will have its own set of definitions for plots, anything else?). How critical it is for us?

Btw, for basic plots I think we don't even need templates as something users deal with - we expect them to work out of the box. So, it all comes back to customizations, I guess?

@mattseddon
Copy link
Member Author

Can we list the implications of this decision (e.g. each product will have to implement an engine, each product will have its own set of definitions for plots, anything else?). How critical it is for us?

If we separate the concerns in that way we can have a shared Typescript package which can be imported by both VS Code and Studio. It will be responsible for processing/combining the data and working out the appropriate groups (see next comment) before it can be fed into the different app's components. I am also hoping to reuse that package for creating the current index.html. At this point, this is only an idea and if we do have a shared package it will bring in another set of problems.

Btw, for basic plots I think we don't even need templates as something users deal with - we expect them to work out of the box. So, it all comes back to customizations, I guess?

One reason for having default templates/making them visible is that (hopefully) users can find them and customise them to their own needs. I do however know that at this stage there is no evidence to support the idea that regular users will self-serve in this way.

@mattseddon
Copy link
Member Author

For the 4 template groups this is what we are aiming at:

Linear:

image

Note the updated tooltip. The plot can potentially contain different shapes when the data has multiple fields in multiple files.

No change to the simple template.

Bar:

image

In the above example (and whenever there are different data sources i.e. filename::field combos) we will introduce the row variable and stack groups on top of each other in separate plots.

Potentially we could introduce a column instead of a row.

image

Please let me know if you have any strong opinions on this.

Scatter:

image

Confusion Matrix:

No change for now but I will be looking into migrating to Plotly Heatmap as data is aggregated prior to rendering. The main benefit would be that we will have to send much much less data over the wire (as JSON). We may also be able to unmerge revs as we can specify a custom/standardised color scale for different plots. This would potentially give us a discrepancy in zoom + pan behaviour but we can weigh up the tradeoffs when we get to them.

@mvshmakov
Copy link

If we separate the concerns in that way we can have a shared Typescript package which can be imported by both VS Code and Studio.

+1 for it, shared codebase will help the standardization + allow us to not deviate from the shared logic as we go forward + implement it once instead of several times. It has its own set of problems, but IMO is worth it for the complex plots problem.

@mattseddon
Copy link
Member Author

mattseddon commented Sep 18, 2023

There are also differences in how the products merge plot definitions when comparing different revisions. We previously discussed this in iterative/vscode-dvc#3676.

I have found several edge cases/inconsistencies associated with updating y-axis values. When comparing different revisions we can either treat the y-axis values as mutable or immutable. Right now it feels that we are stuck between the two.

Studio tries to bypass this issue by persisting entire files that appear in plot definitions. This falls down when the required field is in a different file that was also tracked by DVC (see #9898 (comment) for some more details).

Personally, this feels like a complicated edge case that we can eliminate by making these specific definitions immutable. Here are the benefits (as I see them) of the two approaches:

Immutable:

  • Can compare revisions after moving plot files. E.g. If revision A writes metrics into train/metrics/acc.tsv & revision B writes metrics into training/metrics/acc.tsv the user will still be able to compare those revisions (this currently kind of works in Studio but doesn't anywhere else).
  • Much simpler to implement and follow.
  • Eliminates some of the aforementioned edge cases/need to save every available out for an experiment/commit.

Mutable:

  • Can update/compare different metrics across revisions if they exist for both revisions. E.g if revision A and B wrote metrics acc, acc_normalised, acc_average into train/metrics/acc.tsv but A used acc and B used acc_normalised the comparison between A and B will be between each revision's acc or acc_normalized, not acc vs acc_normalised.
  • Requires collecting all configs/definitions before data (file contents) collection and/or saving all outs (not really a benefit).

@dberenbaum @daavoo which of the above situations is more likely? Should we care about either of these cases and if so which is more useful?

@mattseddon
Copy link
Member Author

mattseddon commented Sep 20, 2023

Quick update.

What I've learned:

As an interim step, it seems to make sense to move logic out of the extension into dvc/dvc-render. We can calculate encoding updates before providing them along with partially filled templates to the extension. We can also use the encoding updates to fill the templates for the good old index.html.

Demo (of prototype)

Screen.Recording.2023-09-20.at.3.03.36.pm.mov

Having this logic in 1 less place has some benefits all by itself but it definitely keeps the door open to consolidating further.

It would be good to discuss whether or not we can release dvc/dvc-render/vscode-dvc together before making updates to Studio. The new templates are essentially a breaking change so Studio would be pinned on the previous version until such time that we can consolidate.

@dberenbaum
Copy link
Collaborator

It would be good to discuss whether or not we can release dvc/dvc-render/vscode-dvc together before making updates to Studio. The new templates are essentially a breaking change so Studio would be pinned on the previous version until such time that we can consolidate.

This could be a little dangerous if need to make other updates to Studio, but let's see how it goes. Maybe we can keep a feature branch in each project so we don't block releases? I don't think there's any rush to get it released.

@mattseddon
Copy link
Member Author

mattseddon commented Sep 22, 2023

Today I started working on the extension's implementation and came across the problem that multiple data sources cause confusion matrices. I currently plan on introducing some extra template variables to add a "facet row" for multiple data sources. Below is an example of how these will looks:

image

Open the Chart in the Vega Editor

Please let me know if you see any fundamental issues with this approach 🙏🏻.

Note: If you want to open the vega editor then edit this comment. The link is so long that it breaks the markdown.

I also now think that I may be able to fix the problem of grouping data for confusion matrices prior to sending it in a JSON payload (this will have to be done for Plotly) but this can wait until after everything else is done.

@dberenbaum
Copy link
Collaborator

Today I started working on the extension's implementation and came across the problem that multiple data sources cause confusion matrices. I currently plan on introducing some extra template variables to add a "facet row" for multiple data sources. Below is an example of how these will looks:

Is it different than what we did before?

@mattseddon
Copy link
Member Author

Is it different than what we did before?

Yes, VS Code behaved in the same way the DVC did. It concatenated the appropriate filename/field in with the rev and created all plots in a single row:

image

@dberenbaum
Copy link
Collaborator

Studio injects a custom sort order to faceted templates (i.e. confusion matrix variations). Vega does not respect this order but having it in place means that revisions are shown in the order that they appear in the provided datapoints.

Does it use the same order that the revisions are shown for all the other plots? What does VS Code do?

The main product question is: Do we want to make it possible to resize plots in Studio? Should we remove the functionality from VS Code? What's the correct thing to do from a product perspective?

What drove adding it in VS Code? How hard is it to add in Studio? @tapadipti @shcheklein Do you recall any discussion around whether we should allow resizing in Studio? I seem to remember some early discussions around how dynamic the layout should be and whether making it too flexible would be a mess.

@shcheklein
Copy link
Member

What drove adding it in VS Code?

I don't remember if we had specific feature requests or anything for this. I'm pretty sure it felt as a basic scenario, basic UX to have (exists in all other products).

Do we want to make it possible to resize plots in Studio?

We can skip the implementation part for now, but generalize across products - it makes sense.

@mattseddon
Copy link
Member Author

Does it use the same order that the revisions are shown for all the other plots? What does VS Code do?

Right now VS Code shows the revisions for multi-view plots in a random order but adding "sort": [] will fix the bug and ensure revisions are displaying in the same order as in the ribbon.

@mattseddon
Copy link
Member Author

I'll address this bug in Studio the same way that we fixed it in the extension:

image

x/y axis labels and the title will be truncated before being added to the template:

image

@mattseddon
Copy link
Member Author

mattseddon commented Oct 11, 2023

I am closing in on the last piece of the puzzle (Studio) for "make it work":

Screen.Recording.2023-10-11.at.4.00.38.pm.mov

Should be able to start "make it right" soon.

Note: This demo is still very broken but I am happy that I have rendering plots.

@mattseddon
Copy link
Member Author

I have built a very messy (but working) prototype, that presents plots differently depending on whether or not they are currently focused (e.g. legends and zoom and pan are suppressed unless the plot is focused):

Screenshot

image

However, I keep running into outdated comments/extra information (e.g. #10023 / https://github.com/iterative/studio/issues/4778 / https://github.com/iterative/studio/issues/6604). I'm taking time to get up to speed on these outstanding issues. If there are any issues that I can split out from the ongoing work I'll raise PRs.

@BradyJ27
Copy link
Contributor

Good news and bad news.

The good news is that we should be able to fix resizing multi-view plots (i.e. iterative/vscode-dvc#3757). The bad news is that all three of the products are using different grid strategies to render the plots.

Neither Studio nor dvc-render allow for resizing plots but VS Code allows users to resize both horizontally and vertically:

Screen.Recording.2023-10-06.at.7.14.56.pm.mov
dvc-render sets width and height to be 300 for all non-facet templates and width and height for each facet to 300 for each of the facets inside faceted templates (multi-view). Studio overrides the height of each plot to be 200 and the width for non-multiview plots to be "container". It has a responsive grid that then takes care of the width of each plot. It also overrides the height of the plot to be "container" when it is opened in a modal. VS Code sets both the height and width of all plots to "container" although this does not work for multi-view plots. It also lets users manipulate the size of each plot using sliders (shown in the above demo).

Currently, both VS Code and Studio are stuck with the 300 height and width set by the default template for faceted plots (i.e. the confusion matrix templates).

Right now I've put in anchors to deal with this but I think we need to standardise the grids that we are using as a follow-up step in this process. By using anchors the products can continue to overwrite things in their own specific way for the time being. We will probably have to expose these to users if they want the custom templates to "appear as expected" in all three products.

The main product question is: Do we want to make it possible to resize plots in Studio? Should we remove the functionality from VS Code? What's the correct thing to do from a product perspective?

This is secondary but we also have a very different drag-and-drop experience:

Screen.Recording.2023-10-06.at.7.21.55.pm.mov
Screen.Recording.2023-10-06.at.7.22.44.pm.mov
Would we consider standardising this as well?

Does this mean iterative/vscode-dvc#3757 will be entirely resolved and therefore vega/vega-lite#6672 is no longer relevant for multi-view plots? I was going to look into the vega-lite PR, but if it's no longer needed I will focus elsewhere.

@mattseddon
Copy link
Member Author

Does this mean iterative/vscode-dvc#3757 will be entirely resolved and therefore vega/vega-lite#6672 is no longer relevant for multi-view plots? I was going to look into the vega-lite PR, but if it's no longer needed I will focus elsewhere.

Not really, it just means that we will have a more reliable way to patch the issue. It would be much better solved on the Vega side and fixing it there would help out more people.

Note: If you're looking for something Vega/plots-related to contribute to there is also iterative/vscode-dvc#3837 👍🏻.

@dberenbaum
Copy link
Collaborator

Do we need to make a call on whether we are staying with vega lite before we have anyone spend more time on vega lite issues?

@mattseddon
Copy link
Member Author

From trying to fit top-level plots into Studio I've made the following decisions regarding multi-source horizontal bar plots:

  1. The column attribute will be used to differentiate data from different sources.
  2. These plots will be moved to the multi-view section (with confusion matrices) under these circumstances.

Demo

Screen.Recording.2023-11-14.at.12.42.08.pm.mov

IMO, these plots will not be getting much usage, let's get something that works into production and see if we get any feedback.

I have also realised that we can drop the proposed <DVC_METRIC_ROW_HEIGHT> & <DVC_METRIC_COLUMN_HEIGHT> anchors in favour of using <DVC_METRIC_PLOT_HEIGHT> & <DVC_METRIC_PLOT_WEIGHT> in combination with the isMultiView property for all of the different types. This is good because it gives us one less concept to explain.

@mattseddon
Copy link
Member Author

mattseddon commented Nov 20, 2023

Getting close now. I am currently working on moving multi-source legend entries from plots into the directories tree view on the LHS (as per the VS Code approach):

Screen.Recording.2023-11-20.at.3.10.19.pm.mov
image

@mattseddon
Copy link
Member Author

I've marked all the associated PRs as ready for review. The only one missing will be for dvclive if we decide to bump the major version of dvc-render.

@mattseddon
Copy link
Member Author

I've just released 1.0.0 of dvc-render. I am going to hold off on rolling everything else out until tomorrow morning 🙏🏻.

@dberenbaum
Copy link
Collaborator

Will you need a dvc release as part of the rollout?

@mattseddon
Copy link
Member Author

Will you need a dvc release as part of the rollout?

Already bugged @efiop. I want to double-check everything is fine with the extension/Studio before making that release. It will be today.

@mattseddon
Copy link
Member Author

mattseddon commented Dec 6, 2023

Here is the run list for today's activities:

  • make a dvclive PR with version of dvc from iterative/dvc#9931
  • test vscode-dvc
  • test studio
  • release dvc (3.33.0)
  • update dvclive PR from 1
  • release dvclive (3.4.0)
  • update vscode-dvc-demo
  • update vscode-dvc PR
  • merge vscode-dvc PR
  • update studio PRs
  • merge studio PRs

I'll link to this in the #studio-eng slack channel to let the team know that this is going live.

As I am bumping the min version of dvc in vscode-dvc I'll hold off on that release for a day or so until all packages become available.

@mattseddon
Copy link
Member Author

Only thing left to close this issue is iterative/dvc.org#4978

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: plots Related to the plots enhancement Enhances DVC product: Studio Integration with Studio product: VSCode Integration with VSCode extension
Projects
None yet
Development

No branches or pull requests

6 participants