At this point, anyone who follows generative AI is used to tools that can generate passive, consumable content in the form of text, images, video, and audio. Google DeepMind’s recently unveiled Genie model (for “GENerative Interactive Environment”) does something altogether different, converting images into “interactive, playable environments that can be easily created, stepped into, and explored.”
DeepMind’s Genie announcement page shows plenty of sample GIFs of simple platform-style games generated from static starting images (children’s sketches, real-world photographs, etc.) or even text prompts passed through Imagen 2. While those slick-looking GIFs gloss over some major current limitations discussed in the full research paper, AI researchers are still excited about how Genie’s generalizable “foundational world modeling” could help supercharge machine learning going forward.
Under the hood
While Genie’s output looks similar at a glance to what might come from a basic 2D game engine, the model doesn’t actually draw sprites and code a playable platformer in the same way a human game developer might. Instead, the system treats its starting image (or images) as frames of a video and generates a best guess at what the entire next frame (or frames) should look like when given a specific input.
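To make that frame-by-frame idea concrete, here’s a minimal, purely illustrative Python sketch of the loop, assuming a generic "predict the next frame given an action" world model. None of this is DeepMind’s actual code or API; the class and function names (WorldModel, predict_next_frame) are invented for illustration.

```python
# Conceptual sketch of the interactive loop described above -- NOT
# DeepMind's Genie implementation. A real model would run a learned
# neural network; this stand-in just shows the control flow: each
# player input produces a newly predicted frame, rather than updating
# hand-coded game state the way a traditional engine would.

from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Hypothetical stand-in for a learned video-prediction model."""
    history: list = field(default_factory=list)  # frames generated so far

    def predict_next_frame(self, action: int) -> str:
        # A trained model would condition on `history` and `action`
        # to generate pixels; here we return a placeholder string.
        frame = f"frame {len(self.history)} after action {action}"
        self.history.append(frame)
        return frame


# "Playing" the generated environment: the starting image seeds the
# video, and every subsequent frame is a model prediction.
model = WorldModel(history=["starting image"])
for action in [0, 1, 1, 2]:  # e.g., latent actions like left/right/jump
    print(model.predict_next_frame(action))
```

The key design point this sketch tries to capture is that there is no explicit game logic anywhere: the "rules" of the environment exist only implicitly, in whatever the model learned about how video frames tend to follow one another.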