microsoft/autogen

[Issue]: Accessible Interface for Autogen Studio & updated multimodal agents

Open

#1519 opened on Feb 2, 2024

Labels: help wanted, proj-studio

Description

Describe the issue

This is specific to AutoGen Studio, following my discussion with folks working on AutoGen :-)

Create a fully accessible interface for AutoGen Studio

It's very important to me that the tools I use are fully accessible. That often means multimodality for user inputs.

  • In that sense, multimodal on-device models are very useful.

User Input

  • image, audio, and text inputs using on-device models
  • audio and text outputs for AutoGen Studio responses
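
One way to picture the input and output requirements above is a thin layer that turns every input into the text an agent chat expects, and optionally attaches audio to each reply. This is only a sketch: `UserInput`, `normalize_input`, `Reply`, and `render_reply` are hypothetical names, not part of AutoGen Studio, and the `transcribe`, `describe_image`, and `synthesize` callables stand in for whichever on-device models end up being used.

```python
from dataclasses import dataclass
from typing import Callable, Literal, Optional, Union

Kind = Literal["text", "image", "audio"]

# Hypothetical hook for an on-device model: raw bytes in, text out.
ToText = Callable[[bytes], str]


@dataclass
class UserInput:
    """One piece of user input in any supported modality (hypothetical type)."""
    kind: Kind
    data: Union[str, bytes]


@dataclass
class Reply:
    """Agent reply as text, with optional synthesized audio."""
    text: str
    audio: Optional[bytes] = None


def normalize_input(item: UserInput, transcribe: ToText, describe_image: ToText) -> str:
    """Turn any supported input into plain text for the agent chat."""
    if item.kind == "text":
        return str(item.data)
    if not isinstance(item.data, bytes):
        raise TypeError(f"{item.kind} input must be raw bytes")
    if item.kind == "audio":
        return transcribe(item.data)       # speech to text
    if item.kind == "image":
        return describe_image(item.data)   # image to text
    raise ValueError(f"unsupported input kind: {item.kind}")


def render_reply(text: str, synthesize: Optional[Callable[[str], bytes]] = None) -> Reply:
    """Always return text; add audio only when a text-to-speech hook is given."""
    audio = synthesize(text) if synthesize is not None else None
    return Reply(text=text, audio=audio)
```

Keeping the models behind plain callables means the interface can be exercised with stubs before any real model is chosen.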

Multimodal agents

The current examples of multimodal agents have not taken advantage of LLaVA-Plus yet. It's a great opportunity to review and update the multimodal agents and demonstrate them in context.

Requirements

Autogen Studio

  • audio input / output
  • image input

Blog: AutoGen Studio with on-device multimodal agents

Multimodal Agent Notebook: Image Agent

  • Simple image agent that can parse image inputs in 2-way chats
  • Complex image agent: on-device model and tools demo

Multimodal Agent Notebook: Audio Agent(s)

  • Simple audio agent that can convert audio to text
  • Complex audio demo that can do text to studio :-)

Linked Issues :

My Linked Repo :

AutoGen Community Contributors!

Hey, we're all just doing our best to push our cool demos and ideas upstream. The best thing for me is to meet like-minded contributors so we can co-create the accessible interface we want to use ;-) and also organise it a bit cleanly with "my linked repo", but:

  • that said, don't be shy about contributing to this issue in your own branch :-)

Steps to reproduce

  • open AutoGen Studio, cannot type: need audio
  • open AutoGen Studio, you're 4.5 years of age: need image to text
  • open AutoGen Studio, you're driving and cannot take the laptop to read the output: need text to speech
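
The three scenarios above could also be written down as a small checklist that reports which features a given build still lacks. This is a rough sketch: the scenario keys, the `Feature` enum, and `missing_features` are hypothetical names chosen for illustration, and the mapping simply restates the three bullets.

```python
from enum import Enum
from typing import Iterable, List


class Feature(Enum):
    SPEECH_TO_TEXT = "audio input (speech to text)"
    IMAGE_TO_TEXT = "image input (image to text)"
    TEXT_TO_SPEECH = "audio output (text to speech)"


# Hypothetical scenario labels, one per bullet in the list above.
SCENARIOS = {
    "cannot_type": Feature.SPEECH_TO_TEXT,
    "young_child": Feature.IMAGE_TO_TEXT,
    "driving": Feature.TEXT_TO_SPEECH,
}


def missing_features(scenarios: Iterable[str], supported: Iterable[Feature]) -> List[Feature]:
    """Return the features the given scenarios need but the build lacks."""
    needed = {SCENARIOS[name] for name in scenarios}
    return sorted(needed - set(supported), key=lambda f: f.name)


if __name__ == "__main__":
    # Example: a build that only supports speech to text.
    for feature in missing_features(SCENARIOS, [Feature.SPEECH_TO_TEXT]):
        print("missing:", feature.value)
```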

Screenshots and logs

No response

Additional Information

No response
