Cohere is focused on building large language models tailored for real-world business solutions, rather than pursuing artificial general intelligence (AGI). Its recently released Command R and Command R+ models excel at multilingual retrieval-augmented generation (RAG) and tool use. RAG, in which the model generates search queries, retrieves relevant information, and only then composes its answer, is a standout feature: grounding outputs in external data directly mitigates hallucination.
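To illustrate that retrieve-then-answer flow, here is a minimal sketch using the Cohere Python SDK's chat endpoint with caller-supplied documents. The API key, document snippets, question, and model name are placeholders, and exact parameter and field names may vary across SDK versions.

```python
import cohere

# Illustrative sketch only: assumes the Cohere Python SDK's chat endpoint
# accepts a `documents` list for grounded (RAG-style) generation.
co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Retrieved snippets the model should ground its answer in (made-up content).
docs = [
    {"title": "Q3 sales report", "snippet": "EMEA revenue grew 12% quarter over quarter."},
    {"title": "Q3 sales report", "snippet": "APAC revenue was flat at $4.2M."},
]

response = co.chat(
    model="command-r-plus",
    message="Summarize regional revenue trends for Q3.",
    documents=docs,
)

print(response.text)       # answer grounded in the supplied documents
print(response.citations)  # spans of the answer linked back to specific snippets
```

The citations are the practical payoff: because each claim in the answer points back to a source snippet, downstream users can verify the output instead of trusting it blindly.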

Cohere sees RAG and multi-step tool use, where models chain together capabilities such as web search and code execution, as key to making language models truly useful in production. However, the company does not view these models as agentic: they remain fundamentally constrained by their training data. While increasingly capable, LLMs are still specialized tools best suited to augmenting human intelligence rather than automating businesses outright.
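To make the chaining concrete, the sketch below shows a schematic multi-step tool-use loop in plain Python. It is not Cohere's API: the `plan_next_step` function, the tool registry, and the hard-coded two-step plan are hypothetical stand-ins so the control flow is runnable end to end.

```python
# Hypothetical tool registry: each tool is a plain Python callable.
# In a real deployment these would wrap a search API, a sandboxed
# interpreter, and so on.
TOOLS = {
    "web_search": lambda query: f"Top result for {query!r} (stub)",
    "run_python": lambda code: str(eval(code)),  # toy stand-in for a code sandbox
}

def plan_next_step(question, history):
    """Stand-in for an LLM call that either picks a tool or answers.

    A real implementation would send the question and the accumulated tool
    results to the model and parse its structured tool-call output; here two
    steps are hard-coded purely for illustration.
    """
    if not history:
        return {"tool": "web_search", "args": {"query": question}}
    if len(history) == 1:
        return {"tool": "run_python", "args": {"code": "40 + 2"}}
    return {"tool": None, "answer": f"Answer based on {len(history)} tool results."}

def solve(question, max_steps=5):
    history = []
    for _ in range(max_steps):
        step = plan_next_step(question, history)
        if step["tool"] is None:                          # model decides it can answer
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])      # execute the chosen tool
        history.append({"step": step, "result": result})  # feed the observation back
    return "Gave up after max_steps."

print(solve("What is the revenue growth plus 2?"))
```

The key property is the feedback loop: each tool result is appended to the history the model sees on the next step, so later calls can build on earlier ones rather than acting in isolation.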

Looking ahead, Cohere expects LLMs to keep improving rapidly at composing different capabilities and integrating with external systems. Benchmarking remains a challenge, though, since many current benchmarks do not reflect real-world business use cases. As the technology matures, evaluation should shift toward quantifying an LLM's practical utility for solving problems rather than its open-ended chat ability. Cohere encourages developers to explore its open-source chat toolkit, which incorporates the latest RAG and tool-use capabilities.