implement new iter_by_chunks() in items #133

Merged: 11 commits, Oct 23, 2019
68 changes: 68 additions & 0 deletions scrapinghub/client/items.py
@@ -1,5 +1,7 @@
from __future__ import absolute_import

import sys

from .proxy import _ItemsResourceProxy, _DownloadableProxyMixin


@@ -37,6 +39,34 @@ class Items(_DownloadableProxyMixin, _ItemsResourceProxy):
'size': 100000,
}]

- retrieve items via a generator of lists. This is most useful in cases
Contributor


Special thanks for the nice docstrings in the PR ❤️

where the job has a huge amount of items and it needs to be broken down
into chunks when consumed. This example shows a job with 3 items::

>>> gen = job.items.list_iter(chunksize=2)
>>> next(gen)
[{'name': 'Item #1'}, {'name': 'Item #2'}]
>>> next(gen)
[{'name': 'Item #3'}]
>>> next(gen)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration

- retrieving via :meth:`list_iter` also supports the `start` and `count`
params. This is useful when you want to only retrieve a subset of items in
a job. The example below belongs to a job with 10 items::

>>> gen = job.items.list_iter(chunksize=2, start=5, count=3)
>>> next(gen)
[{'name': 'Item #5'}, {'name': 'Item #6'}]
>>> next(gen)
[{'name': 'Item #7'}]
>>> next(gen)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration

- retrieve 1 item with multiple filters::

>>> filters = [("size", ">", [30000]), ("size", "<", [40000])]
@@ -59,3 +89,41 @@ def _modify_iter_params(self, params):
        if offset:
            params['start'] = '{}/{}'.format(self.key, offset)
        return params

    def list_iter(self, chunksize=1000, *args, **kwargs):
        """An alternative interface for reading items by returning them
        as a generator which yields lists of items sized as `chunksize`.

        This is a convenient method for cases when processing a large amount of
        items from a job isn't ideal in one go due to the large memory needed.
        Instead, this allows you to process it chunk by chunk.

        You can reduce I/O overhead by increasing the chunk size, but that
        also increases memory consumption.

        :param chunksize: size of list to be returned per iteration
        :param start: offset to specify the start of the item iteration
        :param count: overall number of items to be returned, which is broken
            down by `chunksize`.

        :return: an iterator over items, yielding lists of items.
        :rtype: :class:`collections.Iterable`
        """

        start = kwargs.pop("start", 0)
        count = kwargs.pop("count", sys.maxsize)
        processed = 0

        while True:
            next_key = self.key + "/" + str(start)
            items = [
                item for item in self.iter(
                    count=chunksize, start=next_key, *args, **kwargs)
            ]
            yield items
Contributor


Just as a side note: when the total number of items is >= count and count isn't divisible by chunksize, the current logic returns more than count items. For example, list_iter(chunksize=2, count=3) returns 2 batches of 2 items, so 4 items in total, not 3. This is probably fine, but if we keep it as-is, I'd ask to at least mention the behaviour in the function docstring.

Member Author

@BurnzZ BurnzZ Oct 23, 2019


Thanks @vshlapakov! It turns out there was some logic that @manycoding suggested earlier that hadn't been added to this PR. This has been fixed and I've added a new test case for it.

            processed += len(items)
            start += len(items)
            if processed >= count:
                break
            if len(items) < chunksize:
                break
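
The review note about `count` not being divisible by `chunksize` can be illustrated with a small standalone sketch. This is not the code merged in the PR: `chunked` and `fetch_from_list` are hypothetical names, and an in-memory list stands in for the remote items API. It only shows one way to clamp the last request so the total never exceeds `count`.

```python
import sys


def chunked(fetch, chunksize=1000, start=0, count=sys.maxsize):
    """Yield lists of at most `chunksize` items, never more than `count` total.

    `fetch(start, count)` is a hypothetical stand-in for the remote call;
    it returns up to `count` items beginning at offset `start`.
    """
    processed = 0
    while processed < count:
        # Clamp the request so the final chunk cannot overshoot `count`.
        wanted = min(chunksize, count - processed)
        items = list(fetch(start, wanted))
        if not items:
            break  # source exhausted; skip yielding an empty chunk
        yield items
        processed += len(items)
        start += len(items)
        if len(items) < wanted:
            break  # short read: nothing more to fetch


# Stand-in data source: ten items held in memory (an assumption for the demo).
data = [{'name': 'Item #%d' % i} for i in range(1, 11)]


def fetch_from_list(start, count):
    return data[start:start + count]


chunks = list(chunked(fetch_from_list, chunksize=2, count=3))
print([len(c) for c in chunks])  # [2, 1]
```

Unlike the loop shown in the diff, this sketch also stops before yielding an empty list when the source runs out exactly on a chunk boundary; whether that is desirable for the real client is a design choice left to the maintainers.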
@@ -0,0 +1 @@
[base64-encoded, compressed test cassette data omitted]
1 change: 1 addition & 0 deletions tests/client/cassetes/test_items/test_items_list_iter.gz
@@ -0,0 +1 @@
[base64-encoded, compressed test cassette data omitted]
@@ -0,0 +1 @@
[base64-encoded, compressed test cassette data omitted]
@@ -0,0 +1 @@
[base64-encoded, compressed test cassette data omitted]
49 changes: 47 additions & 2 deletions tests/client/test_items.py
@@ -1,9 +1,11 @@
import pytest
from six.moves import range

from .utils import normalize_job_for_tests

def _add_test_items(job):
    for i in range(3):

def _add_test_items(job, size=3):
    for i in range(size):
        job.items.write({'id': i, 'data': 'data' + str(i)})
    job.items.flush()
    job.items.close()
@@ -28,6 +30,7 @@ def test_items_iter(spider, json_and_msgpack):

def test_items_list(spider, json_and_msgpack):
    job = spider.jobs.run(meta={'state': 'running'})
    job = normalize_job_for_tests(job)
    _add_test_items(job)

    o = job.items.list()
@@ -36,3 +39,45 @@ def test_items_list(spider, json_and_msgpack):
    assert o[0] == {'id': 0, 'data': 'data0'}
    assert o[1] == {'id': 1, 'data': 'data1'}
    assert o[2] == {'id': 2, 'data': 'data2'}


def test_items_list_iter(spider, json_and_msgpack):
    job = spider.jobs.run(meta={'state': 'running'})
    job = normalize_job_for_tests(job)
    _add_test_items(job)
    job.finish()

    o = job.items.list_iter(chunksize=2)
    assert next(o) == [
        {'id': 0, 'data': 'data0'},
        {'id': 1, 'data': 'data1'},
    ]
    assert next(o) == [
        {'id': 2, 'data': 'data2'},
    ]
    with pytest.raises(StopIteration):
        next(o)


def test_items_list_iter_with_start_and_count(spider, json_and_msgpack):
    job = spider.jobs.run(meta={'state': 'running'})
    job = normalize_job_for_tests(job)
    _add_test_items(job, size=10)
    job.finish()

    o = job.items.list_iter(chunksize=3, start=3, count=7)
    assert next(o) == [
        {'id': 3, 'data': 'data3'},
        {'id': 4, 'data': 'data4'},
        {'id': 5, 'data': 'data5'},
    ]
    assert next(o) == [
        {'id': 6, 'data': 'data6'},
        {'id': 7, 'data': 'data7'},
        {'id': 8, 'data': 'data8'},
    ]
    assert next(o) == [
        {'id': 9, 'data': 'data9'},
    ]
    with pytest.raises(StopIteration):
        next(o)
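
The chunk layouts asserted in these tests follow from simple arithmetic on the job size, `chunksize`, `start` and `count`. As a rough aid for writing further cases, the sketch below computes the expected chunk lengths; `expected_chunk_sizes` is a hypothetical helper that is not part of the PR, and it assumes `count` caps the total number of items returned.

```python
def expected_chunk_sizes(total, chunksize, start=0, count=None):
    """Return the chunk lengths expected when reading a job with `total`
    items in chunks of `chunksize`, beginning at offset `start` and
    stopping after `count` items (or at the end of the job).
    """
    available = max(total - start, 0)
    remaining = available if count is None else min(available, count)
    sizes = []
    while remaining > 0:
        size = min(chunksize, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


# The two scenarios covered by the tests above.
print(expected_chunk_sizes(total=3, chunksize=2))                     # [2, 1]
print(expected_chunk_sizes(total=10, chunksize=3, start=3, count=7))  # [3, 3, 1]
```

A case where `count` is not a multiple of `chunksize` and items remain afterwards, such as `total=10, chunksize=2, count=3`, would exercise the behaviour raised in review.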
27 changes: 27 additions & 0 deletions tests/client/utils.py
@@ -13,3 +13,30 @@ def validate_default_meta(meta, state='pending', units=1,
    assert meta.get('units') == units
    assert meta.get('api_url') == TEST_DASH_ENDPOINT
    assert meta.get('portia_url')


def normalize_job_for_tests(job):
    """A temporary workaround to deal with VCR.py cassettes (snapshots).

    The existing tests rely heavily on VCR.py, which creates snapshots of real
    HTTP requests and responses and, during the test process, tries to match
    requests with the snapshots. Sometimes it's hard to run an appropriate test
    environment locally, so we allow using our servers to create snapshots
    for new tests by "normalizing" the snapshots, patching hosts/credentials
    on the fly before saving them (see #112).

    The problem here is that we patch only request data and not response data,
    which is pretty difficult to unify over the whole client. It means that if
    some test gets data from the API (say, a new job ID) and uses it to form
    another request (get the job data), it will form the HTTP request
    differently, so it won't match the snapshots during the test process and
    the test will fail.

    As a temporary workaround, the helper gets a Job instance, extracts its key,
    replaces the project ID part with TEST_PROJECT_ID, and returns a new Job.
    So, the other requests done via the new job instance (updating job items,
    accessing job logs, etc.) will be done using proper URLs matching
    existing snapshots.
    """
    normalized_key = '{}/{}'.format(TEST_PROJECT_ID, job.key.split('/', 1)[1])
    return job._client.get_job(normalized_key)
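
The key rewrite performed by the helper can be tried in isolation. The sketch below is only an illustration: `normalize_key` is a hypothetical function, the `TEST_PROJECT_ID` value is an assumed placeholder, and job keys are assumed to have the form `project/spider/job`.

```python
# Assumed placeholder; the real value lives in the test configuration.
TEST_PROJECT_ID = '2222222'


def normalize_key(job_key, project_id=TEST_PROJECT_ID):
    """Swap the project ID (the part before the first '/') for `project_id`."""
    # maxsplit=1 keeps the "spider/job" remainder intact.
    _, rest = job_key.split('/', 1)
    return '{}/{}'.format(project_id, rest)


normalized = normalize_key('123456/2/5')
print(normalized)  # 2222222/2/5
```

Only the first path segment changes, so the spider and job numbers recorded in the cassettes still line up with the requests the test issues.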