Just one day after OpenAI revealed GPT-4o, which it bills as being able to understand what’s taking place in a video feed and converse about it, Google announced Project Astra, a research prototype that features similar video comprehension capabilities. It was announced by Google DeepMind CEO Demis Hassabis on Tuesday at the Google I/O conference keynote in Mountain View.
Hassabis called Astra “a universal agent helpful in everyday life.” During a demonstration, the research model showcased its capabilities by identifying sound-producing objects, providing creative alliterations, explaining code on a monitor, and locating misplaced items. The AI assistant also exhibited its potential in wearable devices, such as smart glasses, where it could analyze diagrams, suggest improvements, and generate witty responses to visual prompts.
Google says that Astra uses the camera and microphone on a user’s device to provide assistance in everyday life. By continuously processing and encoding video frames and speech input, Astra creates a timeline of events and caches the information for quick recall. The company says that this enables the AI to identify objects, answer questions, and remember things it has seen that are no longer in the camera’s frame.
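Google has not published how Astra builds or queries this timeline, so the following is only a minimal illustrative sketch of the general idea: keep a bounded, time-ordered cache of encoded observations and search it from newest to oldest. Every name here (`Event`, `Timeline`, `last_seen`) is hypothetical, and plain text labels stand in for whatever encoded representation the real system uses.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Event:
    """One observation on the timeline (hypothetical structure)."""
    timestamp: float   # seconds since the session started
    source: str        # "video" or "speech"
    labels: List[str]  # stand-in for an encoded frame or utterance


class Timeline:
    """Bounded, time-ordered cache of recent events (an assumption, not Astra's design)."""

    def __init__(self, max_events: int = 1000) -> None:
        # A deque with maxlen silently drops the oldest event once full.
        self.events: Deque[Event] = deque(maxlen=max_events)

    def add(self, event: Event) -> None:
        self.events.append(event)

    def last_seen(self, label: str) -> Optional[Event]:
        # Scan newest to oldest so the most recent sighting wins.
        for event in reversed(self.events):
            if label in event.labels:
                return event
        return None


timeline = Timeline()
timeline.add(Event(1.0, "video", ["desk", "glasses", "apple"]))
timeline.add(Event(2.5, "speech", ["what does this code do"]))
timeline.add(Event(4.0, "video", ["monitor", "code"]))

# The glasses are no longer in frame, but the cached event still records them.
hit = timeline.last_seen("glasses")
```

In this toy version, a question such as "where did I leave my glasses?" would resolve to the cached event at timestamp 1.0, even though the latest frame shows only the monitor.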