Feature Improvement: Support Image Inputs via Base64, URL, Bytes, and Dictionary Formats in Agent Messages. #1497

Open
wants to merge 2 commits into base: main

MANISH007700
Contributor

Summary
This PR addresses Issue #1460 by updating the agent.py file to enable support for multiple image input formats.
The agent can now handle images in the following formats:

  • URL format
  • Bytes format
  • Base64 format
  • Dictionary format

Previously, the agent could only add images to message content through URLs.

Implementation Details
Modified the image handling logic in agent.py to accept URL, bytes, base64, and dictionary inputs.
Existing URL-based behavior remains compatible with the new handling.
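
As an illustration of the kind of normalization described above, here is a minimal sketch. It is not the code in this PR: the function name `normalize_image`, the `mime_type` parameter, and the error messages are hypothetical. The only external assumption is the OpenAI chat content-part shape `{"type": "image_url", "image_url": {"url": ...}}`.

```python
import base64
from typing import Any, Dict, Union

# Hypothetical alias for the four supported input shapes.
ImageInput = Union[str, bytes, Dict[str, Any]]


def normalize_image(image: ImageInput, mime_type: str = "image/jpeg") -> Dict[str, Any]:
    """Convert one image input into an OpenAI-style `image_url` content part.

    Illustrative only; the actual agent.py logic may differ.
    """
    if isinstance(image, dict):
        # Dictionary format: must carry a "url" key; extra keys are passed through.
        if "url" not in image:
            raise ValueError("image dict must contain a 'url' key")
        return {"type": "image_url", "image_url": dict(image)}

    if isinstance(image, bytes):
        # Bytes format: wrap the raw bytes in a base64 data URL.
        encoded = base64.b64encode(image).decode("utf-8")
        url = f"data:{mime_type};base64,{encoded}"
        return {"type": "image_url", "image_url": {"url": url}}

    if isinstance(image, str):
        # URL format or an already-built base64 data URL.
        if image.startswith(("http://", "https://", "data:image")):
            return {"type": "image_url", "image_url": {"url": image}}
        raise ValueError("image string must be an http(s) URL or a data:image URL")

    raise TypeError(f"unsupported image type: {type(image).__name__}")
```

Dispatching on the Python type keeps the existing URL path unchanged, which is one way to preserve backward compatibility while adding the new formats.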

Testing
The example script below (test_image_functionality) exercises each supported image format. It calls the OpenAI API, so it requires the OPENAI_API_KEY environment variable and network access:

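import base64
import os

import requests
# Import paths assume the phidata package layout; adjust if your install differs.
from phi.agent import Agent
from phi.model.openai import OpenAIChat


def test_image_functionality():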
    agent = Agent(
        model=OpenAIChat(id="gpt-4o", api_key=os.getenv("OPENAI_API_KEY")),
        markdown=True,
    )

    # Test cases with different image formats
    image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
    
    # 1. Test with URL
    print("\n=== Testing URL input ===")
    agent.print_response("What's in this image?", images=[image_url])

    # 2. Test with bytes
    print("\n=== Testing bytes input ===")
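    response = requests.get(image_url, timeout=30)
    response.raise_for_status()  # fail fast instead of sending an error page as image data
    image_bytes = response.content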
    agent.print_response("Describe this image.", images=[image_bytes])

    # 3. Test with base64
    print("\n=== Testing base64 input ===")
    base64_image = base64.b64encode(image_bytes).decode('utf-8')
    base64_url = f"data:image/jpeg;base64,{base64_image}"
    agent.print_response("What do you see in this image?", images=[base64_url])

    # 4. Test with dictionary format
    print("\n=== Testing dictionary input ===")
    dict_format = {
        "url": image_url
    }
    agent.print_response("Describe this image briefly.", images=[dict_format])

if __name__ == "__main__":
    test_image_functionality()
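
The base64 case in the script hardcodes `image/jpeg` in the data URL, which is correct for the sample image but not for other formats. A minimal sketch of deriving the MIME type from the file's leading bytes follows. The helper names `sniff_mime_type` and `to_data_url` are hypothetical and not part of this PR; only the four signatures shown are recognized.

```python
import base64


def sniff_mime_type(data: bytes) -> str:
    """Guess an image MIME type from well-known file signatures."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unrecognized image format")


def to_data_url(data: bytes) -> str:
    """Build a base64 data URL whose MIME type matches the image bytes."""
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{sniff_mime_type(data)};base64,{encoded}"
```

With such a helper, the base64 test case could pass `to_data_url(image_bytes)` instead of a hand-built string.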

Notes
This functionality has been tested with the provided script using OpenAI models (gpt-4o).

Important: Testing with OSS models has not been performed. Please verify compatibility on your end and let me know if any adjustments are required.

Thanks.

cc: @ashpreetbedi @ysolanky @manthanguptaa
