Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always trim string fields #101

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Nykakin
Copy link
Contributor

@Nykakin Nykakin commented Aug 21, 2024

Always trim string fields.

For all fields of type Optional[str] or str apply str.strip() through processors.

For all fields of type Optional[List[str]] or List[str] apply str.strip() to all elements through processors.

Extract processors to separate classes following approach found in extractors to reduce duplication in code.

@Nykakin Nykakin changed the title Always trim strings Always trim string fields Aug 21, 2024
"""Processor for string values"""
if isinstance(value, str):
return value.strip()
return value
Copy link
Contributor Author

@Nykakin Nykakin Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should return None if value is not a string instance...? Need an opinion here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should convert it to str unless it is None.

I also wonder if we should return None if not value.strip().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are easy to implement. I don't particullary mind but I don't feel I am the one to make such decision.

@Nykakin Nykakin mentioned this pull request Aug 21, 2024
def loop(values):
if not isinstance(values, Iterable):
return values
return [processor(value) for value in values]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should skip None values in the resulting list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking of following this PR with another one that would apply some sort of "clear empty items" on all fields returning lists of items.

But later in this comment I was told that by design we should keep processors transparent, so right now I'm not sure how much of post-processing is actually allowed. This particular PR also goes against this philosophy, though I'd argue is relatively non-invasive and seems useful.

I guess decision needs to be made at higher level whenever we want to stick to this design goal or are we deliberately going to break it next release, which would allow this and other similar PR to be included in the library.

Copy link
Contributor

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+0.5 to these processors.

Even if we are processing str, I feel it is justified to meet the schema requirements, which I feel are for all these fields to be trimmed, even if the schema only says that explicitly for article body.

Copy link

codecov bot commented Aug 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 158 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (cce3b94) to head (2800ff8).
Report is 6 commits behind head on main.

Files Patch % Lines
zyte_common_items/pages/product.py 0.00% 25 Missing ⚠️
zyte_common_items/pages/job_posting.py 0.00% 21 Missing ⚠️
zyte_common_items/pages/real_estate.py 0.00% 19 Missing ⚠️
zyte_common_items/pages/business_place.py 0.00% 18 Missing ⚠️
zyte_common_items/pages/article.py 0.00% 17 Missing ⚠️
zyte_common_items/pages/social_media_post.py 0.00% 10 Missing ⚠️
zyte_common_items/processors.py 0.00% 10 Missing ⚠️
zyte_common_items/pages/product_list.py 0.00% 9 Missing ⚠️
zyte_common_items/pages/product_navigation.py 0.00% 9 Missing ⚠️
zyte_common_items/pages/article_list.py 0.00% 8 Missing ⚠️
... and 2 more
Additional details and impacted files
@@          Coverage Diff           @@
##            main    #101    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files         54      55     +1     
  Lines       2107    2224   +117     
======================================
- Misses      2107    2224   +117     
Files Coverage Δ
zyte_common_items/pages/base.py 0.00% <0.00%> (ø)
zyte_common_items/pages/article_navigation.py 0.00% <0.00%> (ø)
zyte_common_items/pages/article_list.py 0.00% <0.00%> (ø)
zyte_common_items/pages/product_list.py 0.00% <0.00%> (ø)
zyte_common_items/pages/product_navigation.py 0.00% <0.00%> (ø)
zyte_common_items/pages/social_media_post.py 0.00% <0.00%> (ø)
zyte_common_items/processors.py 0.00% <0.00%> (ø)
zyte_common_items/pages/article.py 0.00% <0.00%> (ø)
zyte_common_items/pages/business_place.py 0.00% <0.00%> (ø)
zyte_common_items/pages/real_estate.py 0.00% <0.00%> (ø)
... and 2 more

@kmike
Copy link
Contributor

kmike commented Sep 9, 2024

I think that's fine to strip string fields, but we need to change the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants