Researchers at MIT CSAIL found that large language models (LLMs) trained only on text data develop a meaningful understanding of visual concepts. By prompting LLMs to write code that renders images, the researchers collected a dataset of simple digital illustrations. Notably, the LLMs could iteratively improve these illustrations when prompted to do so, demonstrating visual knowledge acquired purely from textual descriptions.
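
The generate-render-refine loop described above might look roughly like the sketch below. This is a minimal illustration, not the researchers' actual pipeline: the OpenAI chat API as the LLM backend, matplotlib as the rendering target, the model name, the prompts, and the example concept are all assumptions.

```python
"""Sketch of prompting an LLM for drawing code, rendering it, and asking for refinements.
Backend, model name, prompts, and the example concept are assumptions, not from the article."""
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def query_llm(prompt: str) -> str:
    """Send one prompt to the LLM and return its text reply (model choice is an assumption)."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def strip_fences(text: str) -> str:
    """Drop any markdown code-fence lines the model wraps around its answer."""
    return "\n".join(l for l in text.splitlines() if not l.lstrip().startswith("`"))


def render(code: str, out_path: str) -> None:
    """Execute the generated matplotlib code off-screen and save whatever it draws."""
    import matplotlib
    matplotlib.use("Agg")                  # headless rendering, no display needed
    import matplotlib.pyplot as plt
    exec(code, {"plt": plt})               # caution: runs untrusted generated code
    plt.savefig(out_path)
    plt.close("all")


concept = "a red bicycle leaning against a tree"   # hypothetical example concept

code = strip_fences(query_llm(
    f"Write Python matplotlib code, and nothing else, that draws {concept}."
))
render(code, "draft_0.png")

# Iterative self-improvement: show the model its own code and ask for a better drawing.
for step in range(1, 4):
    code = strip_fences(query_llm(
        f"This matplotlib code is meant to draw {concept}:\n\n{code}\n\n"
        "Improve it so the drawing is more recognizable. Return only Python code."
    ))
    render(code, f"draft_{step}.png")
```

Each pass feeds the previous program back to the model as text, so the "improvement" signal never involves the model seeing an actual image.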

Using this LLM-generated dataset for training, the MIT team built a computer vision system that recognizes objects in real photos despite never being trained on photographs. Their approach outperformed methods trained on procedurally generated images. However, the LLMs sometimes failed to recognize human re-creations of the very images they could generate, revealing inconsistencies in how their visual knowledge is represented.
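
For intuition, here is a rough sketch of training a recognizer only on the rendered illustrations and then evaluating it on real photographs. The article does not describe the team's training recipe, so this assumes a plain supervised classifier trained from scratch; the directory names, ResNet-18 backbone, and hyperparameters are all hypothetical.

```python
"""Sketch: train on LLM-rendered drawings only, evaluate on real photos.
Paths, model, and hyperparameters are assumptions; the article gives no training details."""
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

# "llm_renders/<class>/*.png" holds the LLM-generated illustrations;
# "real_photos/<class>/*.jpg" holds held-out real photographs (hypothetical layout).
train_set = datasets.ImageFolder("llm_renders", transform=tfm)
test_set = datasets.ImageFolder("real_photos", transform=tfm)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64)

model = models.resnet18(weights=None)                      # no photo pretraining at all
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):                                    # epoch count is arbitrary
    model.train()
    for x, y in train_loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Evaluate on real photographs the model never saw during training.
model.eval()
correct = total = 0
with torch.no_grad():
    for x, y in test_loader:
        correct += (model(x).argmax(1) == y).sum().item()
        total += y.numel()
print(f"real-photo accuracy: {correct / total:.2%}")
```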


Understanding the visual knowledge of language models
LLMs trained primarily on text can generate complex visual concepts through code with self-correction. Researchers used these illustrations to train an image-free computer vision system to recognize real photos.
https://news.mit.edu/2024/understanding-visual-knowledge-language-models-0617