Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NF(WiP): SpecObject .find and .__iadd__ #418

Draft
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

yarikoptic
Copy link
Member

Sits on top of #273. Only the last commit (6e9178f to be rewritten during WiP) is pertinent to this PR ATM

Should make it possible to "join" multiple specifications, and also
avoid duplicates while retracing.

Seems to run without crashing on a sample but I do not think it works correctly yet

yarikoptic and others added 30 commits January 17, 2019 14:24
    changed diff to a_and_b
    collecting diff objects in a_only and b_only
Should make it possible to "join" multiple specifications, and also
avoid duplicates while retracing.

Seems to run without crashing on a sample but I do not think it works correctly yet
* origin/master:
  TST: resource/singularity: Adjust test for previous commit
  RF(PEP8): make resource/singularity.py pep8 compliant (besides TODO)
  BF: singularity 3.x has changed to "instance CMD" instead of "instance.CMD"
  BF: fix for singularity 3.x having new --version output format
  RF: use /bin/bash instead of /bin/ed or /sbin/iptables, define centrally
  BF: exception cannot be logged, preceding debug() already logs its exc_str
  TST: travis: Remove 'route' commands for NONETWORK run
  RF: external_versions: Drop cmd:datalad
  BF: skip: Use datalad module to check DataLad availability
  TST: orchestrator: Fix a test's prepare_remote() call
  TST: travis: Install datalad-container in the DataLad run
  DOC: run: Mention container job parameter
  ENH: orchestrators: Support containers in subdatasets
  BF: orchestrator: Fix target commit checkout for shell resource
  TST: orchestrators: Rename test and add description
  BF: orchestrators: Fix plain copy for inputs in subdirectory
  TST: orchestrators: Move repeated setup to dataset fixture
  TST: orchestrators: Check that branch is restored
…nly with files

This is a very preliminary crude implementation, needed to bring sanity to test
tracing conda and venv.  Even without tests ATM :-/   Ideally we should rewamp configuration
management so all those options could be specified at command line and have some
centralized definition/management and easy ways to mock/patch for testing
…to use

TODOs:

- possibly RF for better API (changed to return dict in get_tracer_classes)
- we should know which tracers are available without importing the classes
  which should be delayed until use
  - might as well take care about
    # TODO: automate discovery of available tracers
    which should be possible if we unify naming for submodules/tracer classes
- that list of tracers should appear in the --help for -t, --tracer
- add basic tests for selection
Before RFing into dict it was a list to which we were appending.  With switching to dict
we started to override the same tracer.  Now they all will exist and in the order they
were added
* origin/master: (114 commits)
  0.2.0 release changes
  BF: orchestrators: Make dirty check work better with subdatasets
  ENH: orchestrators: Add directory parameter to _assert_clean_repo()
  BF: orchestrators: Guard dirty check against Git configuration
  STY: orchestrators: Convert string command to list
  ENH: run: Abort if dataset is dirty
  TST: orchestrators: Switch more tests over to SSH resource
  TST: orchestrators: Move repeated instance creation to a helper
  ENH: datalad-pair-run orchestrator: Reset to original commit
  BF: orchestrators: Fix parsing status from ls-tree output
  TST: orchestrators: Drop module-scoped dataset fixture
  TST: orchestrators: Add check_orc_plain for SSH resource
  TST: orchestrators: Add ssh fixture
  TST: travis: Install DataLad system-wide
  CLN: Remove stray unicode_literals imports
  TST: Adjust ExceptionInfo use for compatiblity with pytest v5.0.0
  DOC: orchestrators: Add missing word to comment
  BF: ssh: Don't pass None values for Fabric's connect_kwargs
  TST: travis: Set up SSH for localhost
  BF: conda: Temporary pin version in miniconda URL
  ...
@codecov
Copy link

codecov bot commented Sep 12, 2019

Codecov Report

Merging #418 into master will decrease coverage by 4.8%.
The diff coverage is 77.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #418      +/-   ##
==========================================
- Coverage   89.48%   84.68%   -4.81%     
==========================================
  Files         148      148              
  Lines       12112    12282     +170     
==========================================
- Hits        10839    10401     -438     
- Misses       1273     1881     +608
Impacted Files Coverage Δ
reproman/distributions/venv.py 89.55% <100%> (+0.4%) ⬆️
reproman/distributions/redhat.py 94.54% <100%> (+0.03%) ⬆️
reproman/distributions/debian.py 95.3% <100%> (ø) ⬆️
reproman/distributions/conda.py 94.19% <100%> (+0.02%) ⬆️
reproman/distributions/vcs.py 95.85% <100%> (+0.02%) ⬆️
reproman/interface/tests/test_diff.py 100% <100%> (ø) ⬆️
reproman/interface/diff.py 95.77% <100%> (+0.06%) ⬆️
reproman/formats/reproman.py 80.16% <22.22%> (-10.22%) ⬇️
reproman/distributions/base.py 81.63% <72.89%> (-7.56%) ⬇️
reproman/interface/retrace.py 89.76% <75%> (-5.48%) ⬇️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62084cb...db188c2. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 12, 2019

Codecov Report

Merging #418 into master will decrease coverage by 4.94%.
The diff coverage is 77.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #418      +/-   ##
==========================================
- Coverage   89.54%   84.59%   -4.95%     
==========================================
  Files         148      148              
  Lines       12132    12296     +164     
==========================================
- Hits        10863    10402     -461     
- Misses       1269     1894     +625
Impacted Files Coverage Δ
reproman/distributions/venv.py 89.55% <100%> (+0.4%) ⬆️
reproman/distributions/redhat.py 94.54% <100%> (+0.03%) ⬆️
reproman/distributions/debian.py 95.3% <100%> (ø) ⬆️
reproman/distributions/conda.py 94.19% <100%> (+0.02%) ⬆️
reproman/distributions/vcs.py 95.85% <100%> (+0.02%) ⬆️
reproman/interface/tests/test_diff.py 100% <100%> (ø) ⬆️
reproman/interface/diff.py 95.77% <100%> (+0.06%) ⬆️
reproman/formats/reproman.py 80.16% <22.22%> (-10.22%) ⬇️
reproman/distributions/base.py 82% <73.07%> (-7.19%) ⬇️
reproman/interface/retrace.py 89.76% <75%> (-5.48%) ⬇️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e39954...3fffc18. Read the comment docs.

@kyleam
Copy link
Contributor

kyleam commented Nov 7, 2019

On the call, @chaselgrove asked if anyone had a snippet for triggering the duplicate distributions in order to test this PR. I wasn't able to find a snippet posted anywhere (e.g., to generate linked traces in gh-367). If my understanding of the issue is correct, the necessary condition is for a tracer to add a new file to the list of unknown files that will be picked up by a tracer on the next iteration. Because of this, the order of the tracers in retrace.py matters. The current order in retrace.py is DebTracer, RPMTracer, CondaTracer, VenvTracer, VCSTracer, DockerTracer, SingularityTracer.

On a Debian system, here's a snippet that triggers two Debian distribution entries:

cd $(mktemp -d --tmpdir reproman-retrace-XXXXXXX)
virtualenv --python=python3 venv
. ./venv/bin/activate
reproman retrace /bin/sh $PWD/venv/lib/python3.7/abc.py

So for the first pass, the DebTracer adds a debian distribution entry because it picked up dash from /bin/sh, then VenvTracer adds a venv distribution based on encountering abc.py. In the virtual environment, abc.py is a link to /usr/lib/python3.7/abc.py, so the VenvTracer passes that file back as an unknown file. On the next iteration DebTracer picks it up, and adds a second debian distribution.


On a non-Debian system I think this is harder to trigger. The easiest thing I came up with so far is to put VCSTracer before VenvTracer:

diff --git a/reproman/interface/retrace.py b/reproman/interface/retrace.py
index 62f1be60d..597b327aa 100644
--- a/reproman/interface/retrace.py
+++ b/reproman/interface/retrace.py
@@ -244,6 +244,6 @@ def get_tracer_classes():
     from reproman.distributions.vcs import VCSTracer
     from reproman.distributions.docker import DockerTracer
     from reproman.distributions.singularity import SingularityTracer
-    Tracers = [DebTracer, RPMTracer, CondaTracer, VenvTracer, VCSTracer,
+    Tracers = [DebTracer, RPMTracer, CondaTracer, VCSTracer, VenvTracer,
         DockerTracer, SingularityTracer]
     return Tracers

Then pass reproman retrace a file from a git repo and a file from a virtual environment that has an editable package for that git repo:

cd $(mktemp -d --tmpdir reproman-retrace-XXXXXXX)

git clone https://github.com/docker/docker-py

virtualenv --python=python3 venv
. ./venv/bin/activate
pip install -e docker-py

reproman retrace $PWD/venv/lib/python3.7/site-packages/six.py $PWD/docker-py/setup.py

I see the distribution for the virtual environment sandwiched between two git distributions.

output
[...]
2019-11-07 15:37:55,707 [INFO] Entering iteration #1 over Tracers 
2019-11-07 15:37:55,874 [INFO] DebTracer: 0 envs with 2 other files remaining 
2019-11-07 15:37:55,876 [INFO] RPMTracer: 0 envs with 2 other files remaining 
2019-11-07 15:37:55,897 [INFO] CondaTracer: 0 envs with 2 other files remaining 
2019-11-07 15:37:55,981 [INFO] VCSTracer: 1 envs with 1 other files remaining 
2019-11-07 15:37:57,308 [INFO] VenvTracer: 1 envs with 1 other files remaining 
2019-11-07 15:37:57,354 [INFO] SingularityTracer: 0 envs with 1 other files remaining 
2019-11-07 15:37:57,354 [INFO] Entering iteration #2 over Tracers 
2019-11-07 15:37:57,356 [INFO] RPMTracer: 0 envs with 1 other files remaining 
2019-11-07 15:37:57,364 [INFO] CondaTracer: 0 envs with 1 other files remaining 
2019-11-07 15:37:57,438 [INFO] VCSTracer: 1 envs with 0 other files remaining 
2019-11-07 15:37:57,438 [INFO] No more changes or files to track.  Exiting the loop 
# ReproMan Environment Configuration File
# This file was created by ReproMan 0.2.0 on 2019-11-07 15:37:57.438721
version: 0.0.1
distributions:
- name: git
  packages:
  - path: /tmp/reproman-retrace-2GSWKtd/docker-py
    files:
    - setup.py
    root_hexsha: 8e0b54109ad2f1a6c4d4972675d523240038bf44
    branch: master
    hexsha: a0b9c3d0b38abd4af1880ca3dde2845556dd2f70
    describe: 3.7.1-159-ga0b9c3d
    tracked_remote: origin
    remotes:
      origin:
        contains: true
        url: https://github.com/docker/docker-py
- name: venv
  path: /usr/bin/virtualenv
  venv_version: 15.1.0
  environments:
  - path: /tmp/reproman-retrace-2GSWKtd/venv
    python_version: 3.7.3
    packages:
    - name: certifi
      version: 2019.9.11
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
    - name: chardet
      version: 3.0.4
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
    - name: docker
      version: 4.1.0.dev0
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/docker-py
      editable: true
    - name: idna
      version: '2.8'
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
    - name: pip
      version: 19.3.1
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
    - name: pkg-resources
      version: 0.0.0
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
    - name: requests
      version: 2.22.0
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
    - name: setuptools
      version: 41.6.0
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
    - name: six
      version: 1.13.0
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
      files:
      - lib/python3.7/site-packages/six.py
    - name: urllib3
      version: 1.25.6
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
    - name: websocket-client
      version: 0.56.0
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
    - name: wheel
      version: 0.33.6
      local: true
      location: /tmp/reproman-retrace-2GSWKtd/venv/lib/python3.7/site-packages
- name: git
  packages:
  - path: /tmp/reproman-retrace-2GSWKtd/docker-py
    root_hexsha: 8e0b54109ad2f1a6c4d4972675d523240038bf44
    branch: master
    hexsha: a0b9c3d0b38abd4af1880ca3dde2845556dd2f70
    describe: 3.7.1-159-ga0b9c3d
    tracked_remote: origin
    remotes:
      origin:
        contains: true
        url: https://github.com/docker/docker-py

@yarikoptic
Copy link
Member Author

@chaselgrove will you have some time in the near future to address @kyleam 's comment?

@yarikoptic yarikoptic marked this pull request as draft February 15, 2023 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants