-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pytx] Implement FileContent class #1680
base: main
Are you sure you want to change the base?
[pytx] Implement FileContent class #1680
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having trouble parsing your comment, did you post this accidentally before completing your thought?
Toplevel comments:
- There is some unexpected functionality in here that I think we should talk about more - see questions inline
- All of the other
ContentType
s today are powered by static/class methods, since we haven't defined what instantiation means yet. I propose we stick with static/class methods to start with, unless you have a specific proposal that depends on making these methods more objective.
@@ -21,8 +21,8 @@ def get_name(cls) -> str: | |||
|
|||
@classmethod | |||
def extract_additional_content( | |||
cls, content_arg: str | |||
) -> t.List[t.Tuple[t.Type["ContentType"], str]]: | |||
cls, content_in_file: Path, available_content: t.Sequence[t.Type["ContentType"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking question: Walk me through your thoughts about these changes. Why are they on this PR?
|
||
class FileContent(ContentType): | ||
""" | ||
Content representing a file. Determines if a file is a photo or video based on file extension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggested:
Content representing a file. Determines if a file is a photo or video based on file extension. | |
ContentType representing a generic file. | |
Determines if a file is a photo or video based on file extension. |
@classmethod | ||
def get_name(cls) -> str: | ||
return "File" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this the same as a default implementation, and so can be removed
def __init__(self, file_name: str): | ||
self.file_name = file_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking q: Currently, I don't any ContentType has an implementation for instantiation (we couldn't agree on what it should mean at the time, and so just saved it for the future). What would be your arguments to make file the first one?
Content representing a file. Determines if a file is a photo or video based on file extension. | ||
""" | ||
|
||
VALID_PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For photos, we can align with the image types that Pillow supports. Check out this code:
from PIL import Image
exts = Image.registered_extensions()
def get_content_type_from_filename(self) -> t.Type[ContentType]: | ||
""" | ||
Determines content type based on file extension. | ||
""" | ||
if any(self.file_name.endswith(ext) for ext in self.VALID_PHOTO_EXTENSIONS): | ||
return PhotoContent | ||
elif any(self.file_name.endswith(ext) for ext in self.VALID_VIDEO_EXTENSIONS): | ||
return VideoContent | ||
else: | ||
raise ValueError(f"Unknown content type for file: {self.file_name}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: Since the caller can't easily know whether the type is supported before calling, suggest making this return None
instead of throwing an exception.
from threatexchange.content_type.video import VideoContent | ||
from threatexchange.content_type.file_content import FileContent | ||
|
||
class TestFileContentType(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: In this library, we use py.test and not unittest.TestCase. Take look at some of the other tests in this file to see how to write those.
The logic overall looks good though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any tests that using py.test
All are using unittest.TestCase
@Dcallies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very definitive statement! Here's one: https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/cli/tests/cli_smoke_test.py
I will open issues for you to convert all of the old ones to py.test :)
def get_name(cls) -> str: | ||
return "File" | ||
|
||
def get_content_type_from_filename(self) -> t.Type[ContentType]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggested signature
def get_content_type_from_filename(self) -> t.Type[ContentType]: | |
@classmethod | |
def get_content_type_from_filename(cls, file_name: str) -> t.Type[ContentType]: |
I have added all the requested changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please:
- Convert the test to py.test as previously requested
- Make sure lint and testing is passing (you can run it locally before submitting)
@@ -0,0 +1,33 @@ | |||
import unittest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: Please convert this to py.test as previously requested.
Additionally, we want the tests for modules to live in the same directory as the code. Please move this file to
threatexchange/content_type/tests/test_file_content_type.py
Determines if a file is a photo or video based on file extension. | ||
""" | ||
|
||
VALID_PHOTO_EXTENSIONS = {ext.lower() for ext in Image.registered_extensions()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is doing work during module import time, which is generally considered an anti-pattern, but I don't think it's worth changing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide more details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you tell me more details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the first result for me from google for "doing work during module import time"
https://www.benkuhn.net/importtime/
As mentioned, I don't think it's worth changing.
How to run the linting and testing locally? |
It's still in the same place in https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/CONTRIBUTING.md :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: Please make sure all lint and CI are passing. You can test them locally before submitting.
from threatexchange.content_type.video import VideoContent | ||
from threatexchange.content_type.file_content import FileContent | ||
|
||
@pytest.mark.parametrize("file_name,expected_content_type", [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Determines if a file is a photo or video based on file extension. | ||
""" | ||
|
||
VALID_PHOTO_EXTENSIONS = {ext.lower() for ext in Image.registered_extensions()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the first result for me from google for "doing work during module import time"
https://www.benkuhn.net/importtime/
As mentioned, I don't think it's worth changing.
as either PhotoContent or VideoContent based on file extension. | ||
""" | ||
content_type = FileContent.get_content_type_from_filename(file_name) | ||
assert content_type == expected_content_type, f"Failed for {file_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably don't need to add a message - parameterize will do that for you.
content_type = FileContent.get_content_type_from_filename(file_name) | ||
assert content_type == expected_content_type, f"Failed for {file_name}" | ||
|
||
def test_unknown_file_type(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You can combine this into your previous test by just doing
("file.txt", None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to run linting locally before submitting! I cannot review this PR anymore today - you can save time by running tests and typing yourself!
@Dcallies |
Hey @ZeyadTarekk it looks like you are showing the the result of a Here are my suggestions:
|
I have already run the linter and it gives me that there is no problem And when I try to run the tests it is giving me error in the imports because the packages are not installed using |
Hey @ZeyadTarekk , @haianhng31 can teach you about a workaround using test selection with pytest, or you can read the command line |
Summary
Added FileContent class that detects the type of the file
Test Plan
added some unit tests to test the class