Feature Improvement: Support Image Inputs via Base64, URL, Bytes, and Dictionary Formats in Agent Messages. #1497
Summary
This PR addresses Issue #1460 by updating the agent.py file to enable support for multiple image input formats.
The agent can now handle images supplied as Base64 strings, URLs, raw bytes, or dictionaries. Previously, it could only add images to message content through URLs.
Implementation Details
Modified the image handling logic in agent.py to accommodate diverse input formats.
Ensured compatibility with existing functionality while extending support for these new formats.
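For illustration only, the kind of normalization described above might look like the sketch below. This is not the code in agent.py: the helper names (`normalize_image`, `_sniff_mime`, `_data_url`) are hypothetical, the dictionary case assumes a `url` key, and the output shape assumes the OpenAI chat-completions `image_url` content part.

```python
import base64
import binascii
from typing import Any, Dict, Union

# Hypothetical alias for the four input formats named in this PR.
ImageInput = Union[str, bytes, Dict[str, Any]]


def _sniff_mime(data: bytes) -> str:
    """Guess the MIME type from well-known magic bytes (assumed heuristic)."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return "application/octet-stream"


def _data_url(data: bytes) -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{_sniff_mime(data)};base64,{encoded}"


def normalize_image(image: ImageInput) -> Dict[str, Any]:
    """Convert a URL, Base64 string, bytes, or dict into an image_url part."""
    if isinstance(image, dict):
        # Assumption: dict inputs already carry a 'url' plus optional extras.
        if "url" not in image:
            raise ValueError("image dict must contain a 'url' key")
        return {"type": "image_url", "image_url": dict(image)}
    if isinstance(image, (bytes, bytearray)):
        return {"type": "image_url", "image_url": {"url": _data_url(bytes(image))}}
    if isinstance(image, str):
        # URLs and ready-made data URLs pass through unchanged.
        if image.startswith(("http://", "https://", "data:image")):
            return {"type": "image_url", "image_url": {"url": image}}
        # Anything else is treated as a bare Base64 payload.
        try:
            raw = base64.b64decode(image, validate=True)
        except (binascii.Error, ValueError) as exc:
            raise ValueError("string is neither a URL nor valid Base64") from exc
        return {"type": "image_url", "image_url": {"url": _data_url(raw)}}
    raise TypeError(f"unsupported image type: {type(image).__name__}")
```

Dispatching on type first and on string prefix second keeps the URL path unchanged, which is one way to preserve the existing URL-only behavior while adding the new formats.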
Testing
Below is an example script (test_image_functionality) demonstrating how to test the updated functionality with various image formats:
Notes
This functionality has been tested with the provided script using OpenAI models (gpt-4o).
Important: Testing with OSS models has not been performed. Please verify compatibility on your end and let me know if any adjustments are required.
Thanks.
cc: @ashpreetbedi @ysolanky @manthanguptaa