Discussion thread on key use cases, interface rearrangement #4

Open
stschiff opened this issue Apr 28, 2020 · 16 comments
Labels
enhancement New feature or request

Comments

@stschiff
Contributor

We currently have this very nice and flexible module-based approach, where I can say ./sidora.R -m progress_table and it creates a progress table. However, it's a bit funny that - for example - the project filter is a global option. I think that design will create problems later.

For example, here is an exploratory workflow that I can imagine might be useful:

  1. query Pandora quickly for all available projects (because you don't know by heart how you've named your favourite project again)
  2. quickly list how many sites/samples/individuals or so there are per project (perhaps you have two projects with relatively similar names and you want to quickly double check the raw numbers of sites in each)
  3. Then select a specific project and output the full progress table for it.
  4. Print the progress table again with selected columns
  5. Create a pdf or html report from that customised progress table to send to your team, say.

Here, 1-2 would not allow for a project selection (because you are printing all), while 3-5 involve selecting a project.

I would therefore suggest we aim for a more controlled, less free, approach to the interface, where we have subcommands of the sort

./sidora.R list --projects -> listing all projects
./sidora.R list --projects --withStats -> as above, but with some key numbers summarised (nr of sites, individuals, ...)
./sidora.R list --tags -> listing all tags, with optional --withStats
./sidora.R list --sites -> listing all sites, with optional --withStats
./sidora.R view --project=XX -> list progress table
./sidora.R view --tag=XX -> similar
./sidora.R view --project=XX --columns=X,Y,Z -> show only the selected columns
./sidora.R view --project=XX --columns=X,Y,Z --output=html -> create report

So here I can see we already need list and view subcommands, and I'm sure we'll have a lot more in the future. This would move away from the very flexible module-based approach towards a more restrictive set of subcommands.

In terms of internal design, I think we should design this repo as a proper R package. So all functionality shown above from the command line should be also available as simple R functions (called for example sidora_list(...) and sidora_view(...)). Then, the CLI script would simply call those functions. Thereby we have covered both the interactive, programmatic use-case within R, and also the more immediate bash-based approach.
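
As a rough illustration of that split (a thin CLI layer over plain functions), here is a minimal sketch. It is written in Python purely for illustration, since the project itself is R. The names sidora_list and sidora_view follow the suggestion above, but their signatures, the PROJECTS sample data and the return values are all assumptions, not an existing API.

```python
import argparse

# Hypothetical in-memory stand-in for a Pandora query result (assumption).
PROJECTS = {
    "ProjectA": ["AAA", "ABB"],
    "ProjectB": ["ABC"],
}


def sidora_list(with_stats=False):
    """Return all project names, optionally with the number of sites."""
    if with_stats:
        return [f"{name}\t{len(sites)}" for name, sites in sorted(PROJECTS.items())]
    return sorted(PROJECTS)


def sidora_view(project, columns=None):
    """Return one row per site of a project, restricted to the chosen columns."""
    rows = [{"project": project, "site": site} for site in PROJECTS[project]]
    if columns:
        rows = [{c: row[c] for c in columns} for row in rows]
    return rows


def build_parser():
    """CLI layer: only parses arguments, all logic lives in the functions above."""
    parser = argparse.ArgumentParser(prog="sidora")
    sub = parser.add_subparsers(dest="command", required=True)

    p_list = sub.add_parser("list")
    # Accepted for symmetry with the proposal; this sketch only lists projects.
    p_list.add_argument("--projects", action="store_true")
    p_list.add_argument("--withStats", action="store_true")

    p_view = sub.add_parser("view")
    p_view.add_argument("--project", required=True)
    p_view.add_argument("--columns", type=lambda s: s.split(","))
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    if args.command == "list":
        return sidora_list(with_stats=args.withStats)
    return sidora_view(args.project, columns=args.columns)
```

The same functions can be called directly from an interactive session, e.g. sidora_view("ProjectA", columns=["site"]), while main(["view", "--project=ProjectA", "--columns=site"]) covers the command-line route.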


@jfy133
Member

jfy133 commented Apr 29, 2020

Actually, I think it's a good idea, it streamlines a lot and indeed fits with a 'cli' like tool, so probably would be clearer to a user.

One additional note (although this can come later) is that I often only need to build a report for archaeologists of a single (precious, in all cases ;)) sample.

So, if I understand correctly, for the view function, we could have a --individual (instead of --project), and that will then have a different sub-function to make a single-individual report? Is that correct?

@stschiff
Contributor Author

Yes, that's possible. I also think we should make it possible to select a specific site, or in fact multiple sites. And with respect to tags, I think we should allow a system where we list, say, everything for which a specific tag is set at a specific level. So perhaps something like ./sidora.R view --tag myTag --tagLevel Site, which would say: Show me everything that has "myTag" at the site level.
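
A minimal sketch of what such a tag-plus-level selection could look like, in Python for illustration only. The Entity record, its fields and the sample data are assumptions; Pandora's real data model may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical record: an ID, the level it lives at, and its tags."""
    entity_id: str
    level: str  # e.g. "Site" or "Individual"
    tags: set = field(default_factory=set)


# Made-up example data (assumption).
ENTITIES = [
    Entity("AAA", "Site", {"myTag"}),
    Entity("AAA001", "Individual", {"myTag"}),
    Entity("ABB", "Site", {"otherTag"}),
    Entity("ABC", "Site", {"myTag", "otherTag"}),
]


def select_by_tag(entities, tag, tag_level):
    """Return IDs of all entities carrying `tag` at exactly `tag_level`."""
    return [e.entity_id for e in entities if e.level == tag_level and tag in e.tags]
```

With this data, select_by_tag(ENTITIES, "myTag", "Site") keeps the two sites tagged "myTag" and skips the individual that carries the same tag at a different level.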

I think we need to come up with a general selection grammar or something... well, one step at a time.

@nevrome
Member

nevrome commented Apr 29, 2020

Let's revive the sidora hackhour on Friday. "Typical" workflows like the one outlined above will help to narrow down what this interface should do.

The idea to transform this repo again to an R package is good, but I believe it will be difficult to write functions that are both useful for command line data exploration and R data analysis. Within R you want tidy data structures; on the command line you want easily digestible and well readable output (and I want cool ascii plots). IMHO we should do one thing well and not try to support two different interfaces at once.

R users should rely on the core package only. Probably our work on the cli-backend-package will reveal new functions that should be part of core. Beyond the differences in output and purpose it's also confusing to require the user to effectively use two packages for exploring Pandora. One vaguely general and the other vaguely more specific.

@jfy133
Member

jfy133 commented Apr 29, 2020

Unfortunately I am finding it more and more difficult to find time to join the hackathon at the moment due to continued childcare and piling up deadlines.

But I see what you mean. Why couldn't the 'specific' reporting functions also just go in sidora.core? I suspect only people who want to get into the nitty gritty would go into sidora.core anyway.

@stschiff
Contributor Author

OK, you've convinced me about the separation: keeping the R API in sidora.core and focusing the CLI only on bash usage. Sounds all good.

@jfy133
Member

jfy133 commented May 15, 2020

Clemens and I decided to try and solidify more of the design decisions, so I will make a draft here.

General Overview

sidora.cli will have a verb - noun 'like' grammar.

e.g.
view -> project/site/sample/capture
list -> project/site/sample/capture
summarise -> sample/project/tag

etc.
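
To make the verb - noun idea concrete, here is a small sketch (Python, for illustration only) that validates a verb/noun pair against the combinations drafted above. The GRAMMAR table only covers the three verbs listed so far and check_command is a hypothetical helper name.

```python
# Allowed nouns per verb, taken from the draft above (not exhaustive).
GRAMMAR = {
    "view": {"project", "site", "sample", "capture"},
    "list": {"project", "site", "sample", "capture"},
    "summarise": {"sample", "project", "tag"},
}


def check_command(verb, noun):
    """Return (verb, noun) if the combination is allowed, else raise ValueError."""
    if verb not in GRAMMAR:
        raise ValueError(f"unknown verb: {verb}")
    if noun not in GRAMMAR[verb]:
        raise ValueError(f"'{verb}' does not accept '{noun}'")
    return verb, noun
```

Keeping the grammar in one table means a new verb or noun is a one-line change, and the error message can tell the user which combination was rejected.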

'Verb' Module Descriptions

List

Simply gives a list of all entities matching a given criterion, one per row.

E.g. I want all sites for a project:

AAA
ABB
ABC

View

Provides all information for a single 'row' of a Pandora table. This is essentially all the information that is displayed for an entry in the Pandora web UI.

image

Summarise

Gives summaries (totals, means, maps, lists etc.) of a given noun.

For example:
This project has 10 sites with 40 samples.

The samples are from these countries, with LAT:LON on a fancy ascii nerd-map.
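
A minimal sketch of the text part of such a summary, in Python for illustration only. The record layout (one dict per sample with a "site" key) and the function name summarise_project are assumptions; the map output is left out.

```python
# Made-up sample records for one project (assumption).
SAMPLES = [
    {"site": "AAA", "sample": "AAA001"},
    {"site": "AAA", "sample": "AAA002"},
    {"site": "ABB", "sample": "ABB001"},
]


def summarise_project(samples):
    """Count distinct sites and total samples and phrase them as one sentence."""
    sites = {s["site"] for s in samples}
    return f"This project has {len(sites)} sites with {len(samples)} samples."
```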

Tabulate

This provides all information for multiple entries of a Pandora table in tabular form.

The default display is a markdown table.

An export function would allow exporting as a TSV file.

| Site | Sample ID | Type |
| --- | --- | --- |
| AAA | AAA001 | 21 |
| ABB | ABB001 | 23 |
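
The two output modes could be sketched like this (Python, for illustration only), using the two example rows above. The function name tabulate and its as_tsv parameter are assumptions, not an agreed interface.

```python
COLUMNS = ["Site", "Sample ID", "Type"]
ROWS = [
    {"Site": "AAA", "Sample ID": "AAA001", "Type": 21},
    {"Site": "ABB", "Sample ID": "ABB001", "Type": 23},
]


def tabulate(rows, columns, as_tsv=False):
    """Render rows as a markdown table (default) or as tab-separated text."""
    if as_tsv:
        lines = ["\t".join(columns)]
        lines += ["\t".join(str(r[c]) for c in columns) for r in rows]
        return "\n".join(lines)
    header = "| " + " | ".join(columns) + " |"
    rule = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(r[c]) for c in columns) + " |" for r in rows]
    return "\n".join([header, rule, *body])
```

Calling tabulate(ROWS, COLUMNS) gives the human-readable default, while tabulate(ROWS, COLUMNS, as_tsv=True) produces text that could be written straight to a .tsv file.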

Report

tbc.

@stschiff
Contributor Author

So what "noun" means depends on the Verb:

  • For list: The noun should be an entity type in plural (e.g. "sites", "individuals" or "projects")
  • For view: It should be an actual entity in singular (e.g. "FSW", or "FSW001")
  • For summarise: plural or singular?

@jfy133
Member

jfy133 commented May 15, 2020

Correct.

Summarise - good question @nevrome ?

@nevrome
Member

nevrome commented May 15, 2020

Since my work with ruby I have a deep-rooted dislike for every approach that forces context-sensitive plurals. I vote for only singulars everywhere.

@stschiff
Contributor Author

That's OK, I meant it more conceptually... it's not clear to me whether summarise takes a type or an entity. With respect to list, I would suggest also showing some property columns for each entity, right? Like "Name", "Country", "Locality" per site... In the end we'll see whether we even need a summary command, depending on how fast/slow that is. But certainly OK to have it for now.

@nevrome
Member

nevrome commented May 15, 2020

Each of these modules (except list) takes an entity type and an entity id. What you suggest for list is now part of tabulate. I think we should quickly talk about it in our debriefing session later.

@nevrome
Member

nevrome commented May 15, 2020

ToDos:

  1. Fill the modules with life (as they are right now)
     • View (James)
     • Summary (Clemens)
     • Tabulate (maybe Thiseas?)
  2. Basic filter abilities (for the tabulate module?) -> maybe compare: advanced search of the Pandora web interface

Focus: Data driven analysis approach

@jfy133
Member

jfy133 commented May 15, 2020

General Roadmap discussion:

  • Development of sidora.cli should initially focus on data-onwards functionality. The users of this tool will likely be bioinformaticians, and thus development should revolve around them. This involves:
    • General filter
    • Focusing on summarising sequencing results (e.g. of a sequencing run)
    • Finding and reporting locations of FASTQ files (e.g. what @TCLamnidis had made for eager - speaking of which sidora-cli.R eager ?)

WebApp will be the focus for non-bioinformaticians, e.g. to provide summary statistics for PIs; lab tracking (e.g. progress tables) for lab techs.

James will finish simple tasks (view) then move back to Web app.

Clemens will work on summary, which will also feed into later work for James and Stephan when we start developing the report module.

jfy133 mentioned this issue May 17, 2020
jfy133 added the enhancement (New feature or request) label Jul 3, 2020

@nevrome
Member

nevrome commented Jul 9, 2020

I fell in love with this extremely neat, documentation string based CLI interface definition with docopt. How would you like an interface like this, @stschiff and @jfy133?

sidora.

Usage:
  sidora tutorial [options]
  sidora examples [options]
  sidora glance <entity_type> [options]
  sidora view <entity_type> <entity> [options]
  sidora summarise <entity_type> <entity> [options]
  sidora list <entity_type> (<entity>... | <filter_entity_type> <filter_string>) [options]
  sidora tabulate <entity_type> (<entity>... | <filter_entity_type> <filter_string>)  [--as_tsv | --as_pandora_upload] [options]

Options:
  -h --help Show this screen
  --version Show version
  --human-readable Todo
  --credentials=FILE Todo [default: .credentials]
  --cache_dir=DIR Todo [default: ?]
  --empty_cache Todo
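
For comparison, here is a rough approximation of part of the tabulate line using Python's standard argparse module, for illustration only. This is not docopt and not the proposed implementation: it covers only the `<entity>...` branch, the mutually exclusive output flags and two of the options, and build_tabulate_parser is a hypothetical name.

```python
import argparse


def build_tabulate_parser():
    """Approximate `sidora tabulate <entity_type> <entity>... [--as_tsv | --as_pandora_upload]`."""
    parser = argparse.ArgumentParser(prog="sidora tabulate")
    parser.add_argument("entity_type")
    parser.add_argument("entity", nargs="+")  # one or more entity IDs

    # At most one of the two output formats may be given.
    fmt = parser.add_mutually_exclusive_group()
    fmt.add_argument("--as_tsv", action="store_true")
    fmt.add_argument("--as_pandora_upload", action="store_true")

    # Default taken from the usage text above.
    parser.add_argument("--credentials", default=".credentials", metavar="FILE")
    parser.add_argument("--empty_cache", action="store_true")
    return parser
```

Passing both --as_tsv and --as_pandora_upload makes argparse exit with a usage error, which mirrors the `[--as_tsv | --as_pandora_upload]` alternative. The appeal of the docopt approach is that the usage text itself is the specification, so a parser like this does not have to be kept in sync by hand.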

@jfy133
Member

jfy133 commented Jul 9, 2020

That looks really nice! Much more what I would be familiar with!

@stschiff
Contributor Author

Looks beautiful, indeed!
