
Allow TIFF (and more) image formats in load_yolo_annotations #1554

Open
2 tasks done
patel-zeel opened this issue Sep 28, 2024 · 9 comments
Assignees
Labels
enhancement New feature or request
hacktoberfest Open for contributions during the annual Hacktoberfest event, aimed at encouraging open-source participation

Comments

@patel-zeel
Contributor

patel-zeel commented Sep 28, 2024

Search before asking

  • I have searched the Supervision issues and found no similar feature requests.

Description

  • Currently, load_yolo_annotations only allows png, jpg, and jpeg file formats. load_yolo_annotations is internally called by sv.DetectionDataset.from_yolo functionality.

directory=images_directory_path, extensions=["jpg", "jpeg", "png"]

  • Ultralytics supports a wide variety of image formats. Copied the following table from their website.
| Image Suffix | Example Predict Command | Reference |
|--------------|-------------------------|-----------|
| .bmp | yolo predict source=image.bmp | Microsoft BMP File Format |
| .dng | yolo predict source=image.dng | Adobe DNG |
| .jpeg | yolo predict source=image.jpeg | JPEG |
| .jpg | yolo predict source=image.jpg | JPEG |
| .mpo | yolo predict source=image.mpo | Multi Picture Object |
| .png | yolo predict source=image.png | Portable Network Graphics |
| .tif | yolo predict source=image.tif | Tag Image File Format |
| .tiff | yolo predict source=image.tiff | Tag Image File Format |
| .webp | yolo predict source=image.webp | WebP |
| .pfm | yolo predict source=image.pfm | Portable FloatMap |
  • Use of TIFF files is common in satellite imagery such as Sentinel-2. One may prefer to preserve the TIFF format over converting it to PNG/JPG because TIFF allows the storage of georeferencing information.
  • I see that the load_yolo_annotations uses cv2.imread to read the image files. OpenCV seems to support many of the Ultralytics-supported formats.

image = cv2.imread(image_path)

Proposals

  • P1: We can expand the hardcoded list of allowed formats.
  • P2: We may choose not to restrict the image format and let it fail later.
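Proposal P1 could be sketched roughly as follows. This is an illustrative snippet, not supervision's actual code: `ULTRALYTICS_IMAGE_EXTENSIONS` is a hypothetical name, and the extension list simply mirrors the Ultralytics table above.

```python
# Hypothetical sketch of proposal P1: expand the hardcoded extension list
# to cover the formats Ultralytics accepts (see the table above). The real
# supervision code may organise this differently.
ULTRALYTICS_IMAGE_EXTENSIONS = [
    "bmp", "dng", "jpeg", "jpg", "mpo",
    "png", "tif", "tiff", "webp", "pfm",
]

# Would replace the current hardcoded call, e.g.:
# directory=images_directory_path, extensions=ULTRALYTICS_IMAGE_EXTENSIONS
```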

Use case

  • Everyone who'd like to use formats other than png, jpg, and jpeg will be able to use this extension.

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@patel-zeel patel-zeel added the enhancement New feature or request label Sep 28, 2024
@patel-zeel patel-zeel changed the title Allow TIFF (and more) formats for images in load_yolo_annotations Allow TIFF (and more) image formats in load_yolo_annotations Sep 28, 2024
@LinasKo
Collaborator

LinasKo commented Sep 28, 2024

Hi @patel-zeel 👋

Thanks for giving a very specific example. That's invaluable to us.

Generally - sounds good!

What we need to be careful about is silent failures. For example, cv2.imread does not raise an error when an image is not loaded - it simply returns None. Reading videos was similarly problematic, as it wouldn't report a missing codec. I bet the current code doesn't handle this all that well.

Next, formats like TIFF may have image layers. We probably want the merged image in those cases. I don't know the peculiarities of other models.

Generally, if you're aware of what's restricting us from having more image formats, I'm happy for us to lift those restrictions.

@patel-zeel
Contributor Author

patel-zeel commented Sep 30, 2024

Hi @LinasKo,

cv2.imread does not raise an error when an image is not loaded - it simply returns None. Reading videos was similarly problematic, as it wouldn't report a missing codec. I bet the current code doesn't handle this all that well.

I see. Is there any reason we are sticking to cv2? Isn't from PIL import Image more Pythonic? At least for the scope of this issue, we are loading the image in memory to know its dimensions. I can test from PIL import Image for some known issues that you might have come across already.

image = cv2.imread(image_path)
lines = read_txt_file(file_path=annotation_path, skip_empty=True)
h, w, _ = image.shape
resolution_wh = (w, h)
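The PIL alternative mentioned above could look like this. A hypothetical sketch (`image_resolution_wh` is an illustrative name): PIL's `Image.open` is lazy, reading only the file header to report dimensions, and it raises an exception on unreadable files instead of returning None.

```python
from PIL import Image

def image_resolution_wh(image_path: str) -> tuple:
    """Hypothetical alternative to cv2 for the dimension check: PIL reads
    only the header, so no pixel data is decoded. Unlike cv2.imread,
    Image.open raises (e.g. FileNotFoundError, UnidentifiedImageError)
    when the file cannot be read."""
    with Image.open(image_path) as img:
        return img.size  # (width, height)
```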

Next, formats like TIFF may have image layers. We probably want the merged image in those cases. I don't know the peculiarities of other models.

That's true. We may also have to assert image.shape[2] == 3. Does supervision/ultralytics work with images with more or less than three channels? I checked that in the case of a multi-image TIFF file, from PIL import Image only loads the first image, which, in some way, takes away the burden of handling it at our end.

from PIL import Image

# Create and save a multi-image TIFF
img1 = Image.new('RGB', (111, 222), color='red')    # 111x222
img2 = Image.new('RGB', (333, 444), color='green')  # 333x444
img3 = Image.new('RGB', (555, 666), color='blue')   # 555x666
img1.save('/tmp/multi_image.tif', save_all=True, append_images=[img2, img3])

# Read the multi-image TIFF
img = Image.open('/tmp/multi_image.tif')
print(img.size)
# Output: (111, 222)

Also, if an image is corrupt, the current code will generate the following error which is uninformative.

---> h, w, _ = image.shape

AttributeError: 'NoneType' object has no attribute 'shape'

Overall, if we can replace cv2 with something else that raises an error on failure, we can lift the image format check and let the image loader raise the error. If we must use cv2, it seems challenging to surface a meaningful error explaining to the user why the image loading was unsuccessful.

@LinasKo
Collaborator

LinasKo commented Sep 30, 2024

cv2 loads the image as a numpy array, which makes all subsequent computation easier. I'd strongly prefer to stick with cv2 and avoid conversions from PIL every time.

Supervision uses transparent images very infrequently, e.g. when loading images for the IconAnnotator. If possible, it would be great to keep the channel size flexible. However, if you spot cv2.imread with no extra parameters, it means the image is 2d or 3d. cv2.IMREAD_UNCHANGED param is required to read transparent images.

I've tested it - calling an object detection model in ultralytics on an image with transparency raises a runtime error.

RuntimeError: Given groups=1, weight of size [16, 3, 3, 3], expected input[1, 4, 640, 640] to have 3 channels, but got 4 channels instead

Also, if an image is corrupt, the current code will generate the following error which is uninformative.

Agreed. After most imreads, we should check if the image is None and raise a ValueError. Because of opencv's popularity and convenience, I'd still prefer to stick with it, but we can raise ValueError("Failed to read image") for an earlier warning.

That's my line of thinking - hopefully it clears things up a little.

@patel-zeel
Contributor Author

patel-zeel commented Sep 30, 2024

cv2 loads the image as a numpy array, which makes all subsequent computation easier. I'd strongly prefer to stick with cv2 and avoid conversions from PIL every time.

I did a small benchmarking around this in this colab. PIL seems a lot faster than cv2 when checking the shape without loading the data. cv2 is faster than PIL when loading the numpy data.

  • Time (Min - Max) taken (in seconds) to load 500 images (size: 640 x 640) of each type:

| Image type | cv2 load as numpy | PIL shape check only | PIL load as numpy |
|------------|-------------------|----------------------|-------------------|
| PNG | 2.58 - 3.91 | 0.05 - 0.07 | 3.49 - 4.77 |
| JPG | 4.52 - 5.65 | 0.07 - 0.12 | 5.41 - 6.63 |
| TIF | 5.27 - 6.64 | 0.22 - 0.34 | 11.15 - 11.54 |

I've tested it - calling an object detection model in ultralytics on an image with transparency raises a runtime error.

Okay, in that case, shall we assert image.shape[2] == 3?

Agreed. After most imreads, we should check if the image is None and raise a ValueError. Because of opencv's popularity and convenience, I'd still prefer to stick to it, but we can raise ValueError("Failed to read image") for an earlier warning

Do the following changes seem reasonable to you?

If we go with cv2:

  • if image is None, we raise ValueError(f"Failed to read image from '{image_path}'").

In any case:

  • We remove type restriction by extension and allow any extension by default.
  • We assert image.shape[2] == 3

@LinasKo
Collaborator

LinasKo commented Oct 1, 2024

Interesting details about the shape check. Thanks for carrying out the speed analysis!

I'm happy for us to change the shape check to use PIL in the YOLO dataset case, as long as we leave a short comment explaining why PIL and not cv2 is used there. In most other places, I'd stick to cv2 for consistency, even if it's slower.

Do the following changes seem reasonable to you?

"Yes" to everything, but I'd steer us towards raising errors rather than using assert. While I personally like assert statements, our standard in this repo is to opt for ValueError instead. I reserve assertions only for assumption checks that we never expect users to see. (ValueError = caller made the mistake, Assertion = us developers made the mistake)

@LinasKo LinasKo added the hacktoberfest Open for contributions during the annual Hacktoberfest event, aimed at encouraging open-source participation label Oct 3, 2024
@LinasKo
Collaborator

LinasKo commented Oct 3, 2024

Marking this issue as eligible for Hacktoberfest.
It aligns very well with the theme we'd like to have, focusing on the reliability and feature completeness of supervision.

@patel-zeel
Contributor Author

@LinasKo, feel free to unassign me to open this for the community. I'm in a bit of deadline mode and I'd love to see more people contributing to this repo.

@onuralpszr
Collaborator

@patel-zeel thank you for the research and comments you have made so far. I am un-assigning it from you.

@LinasKo
Collaborator

LinasKo commented Oct 3, 2024

@onuralpszr, no, that's too hasty of a decision.

This issue starts from simple foundations but contains a lot of detail and discussion. It can be solved in different ways and at different depths. It is not specific enough for the general community.

I want it to stay Patel's domain as he's proven to have a thoughtful approach and handle both the ambiguity & edge cases.

@patel-zeel, I'll leave this assigned to you - take as much time as you need! If you don't find a good occasion to solve it by November, I'll be happy to take over 😉
