We have released #AgentCoMa, an agentic reasoning benchmark where each task requires a mix of commonsense and math to be solved.
LLM agents performing real-world tasks should be able to combine these different types of reasoning, but are they fit for the job?
🧵⬇️