Feature Improvement: Support Image Inputs via Base64, URL, Bytes, and Dictionary Formats in Agent Messages. #1497

MANISH007700 · 2024-11-29T18:47:56Z

Summary
This PR addresses Issue #1460 by updating agent.py to support multiple image input formats.
Previously, the agent could only add images to message content through URLs. It can now handle images in the following formats:

URL format
Bytes format
Base64 format
Dictionary format

Implementation Details
Modified the image handling logic in agent.py to accommodate diverse input formats.
Ensured compatibility with existing functionality while extending support for these new formats.
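
For reviewers who want a concrete picture of the dispatch involved, here is a minimal sketch of how the four input formats could be normalized into a single OpenAI-style message content part. It is not the code in this PR: the helper name normalize_image, the mime_type parameter, and the assumption that a dictionary input already carries the image_url payload are all hypothetical and used only for illustration.

```python
import base64
from typing import Any, Dict, Union

# Hypothetical alias covering the four input formats listed in the summary.
ImageInput = Union[str, bytes, Dict[str, Any]]


def normalize_image(image: ImageInput, mime_type: str = "image/jpeg") -> Dict[str, Any]:
    """Hypothetical helper: turn one image input into an image_url content part."""
    if isinstance(image, dict):
        # Dictionary format: assumed to already be the payload, e.g. {"url": "..."}.
        return {"type": "image_url", "image_url": image}
    if isinstance(image, bytes):
        # Bytes format: base64-encode and wrap in a data URL.
        # The MIME type is an assumption supplied by the caller, not detected.
        encoded = base64.b64encode(image).decode("utf-8")
        return {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        }
    if isinstance(image, str):
        # URL format or base64 data URL: pass the string through unchanged.
        if image.startswith(("http://", "https://", "data:image")):
            return {"type": "image_url", "image_url": {"url": image}}
        raise ValueError("Expected an http(s) URL or a base64 data URL")
    raise TypeError(f"Unsupported image type: {type(image).__name__}")
```

Checking bytes before strings keeps raw image data from ever being treated as a URL, and raising on unknown inputs makes unsupported formats fail loudly instead of producing a malformed message.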

Testing
The example script below defines test_image_functionality, which exercises the updated functionality with each supported image format:

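import base64
import os

import requests

# Import paths below are assumed from the phidata package layout; adjust if yours differs.
from phi.agent import Agent
from phi.model.openai import OpenAIChat


def test_image_functionality():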
    agent = Agent(
        model=OpenAIChat(id="gpt-4o", api_key=os.getenv("OPENAI_API_KEY")),
        markdown=True,
    )

    # Test cases with different image formats
    image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
    
    # 1. Test with URL
    print("\n=== Testing URL input ===")
    agent.print_response("What's in this image?", images=[image_url])

    # 2. Test with bytes
    print("\n=== Testing bytes input ===")
    image_bytes = requests.get(image_url).content
    agent.print_response("Describe this image.", images=[image_bytes])

    # 3. Test with base64
    print("\n=== Testing base64 input ===")
    base64_image = base64.b64encode(image_bytes).decode('utf-8')
    base64_url = f"data:image/jpeg;base64,{base64_image}"
    agent.print_response("What do you see in this image?", images=[base64_url])

    # 4. Test with dictionary format
    print("\n=== Testing dictionary input ===")
    dict_format = {
        "url": image_url
    }
    agent.print_response("Describe this image briefly.", images=[dict_format])

if __name__ == "__main__":
    test_image_functionality()

Notes
This functionality has been tested with the provided script against a single OpenAI model (gpt-4o).

Important: Testing with OSS models has not been performed. Please verify compatibility on your end and let me know if any adjustments are required.

Thanks.

cc: @ashpreetbedi @ysolanky @manthanguptaa

MANISH007700 added 2 commits November 29, 2024 18:37

Add: Pass Image in Message via Base64, ImageURL, ImageBytes and Dict

0550239

Fix: Import error, added them.

92ecde6
