A new method for quantized matrix multiplication in large language models enhances weight-only quantization with waterfilling techniques. This boosts efficiency in AI computations, reducing distortion limits while preserving performance in neural networks. https://arxiv.org/abs/2605.13768
Introducing Trajectory-Refined Distillation (TRD), a novel method that boosts large language models by addressing "prefix failure" in on-policy distillation. TRD refines student trajectories under teacher guidance, enhancing accuracy and reasoning coverage. https://arxiv.org/abs/2606.08432
The innovative DiScO framework refines large reasoning models by fostering diverse thinking schemata, leading to notable enhancements in problem-solving and error recovery for complex math tasks. https://arxiv.org/abs/2606.08974
A new GPU-accelerated algorithm reduces banded matrices to bidiagonal form, achieving up to 800× speed-up over CPU libraries. Using modern GPU architectures and memory-aware designs, this solution enables efficient linear algebra in scientific computing and AI. https://arxiv.org/abs/2510.12705
LEXRUBRIC is a new benchmark for evaluating open-ended legal tasks in Chinese, featuring over 12,000 expert criteria. It caters to the demand for reliable legal AI, demonstrating language models' varying capacities and limitations in resolving complex legal queries. https://arxiv.org/abs/2606.09389
PerspectiveGap shows LLMs struggle to write role-specific prompts for multi-agent orchestration, reaching just a 14.9% average pass rate. Spanning 110 real-world scenarios, the study points to a capability gap that matters for multi-agent system design. https://arxiv.org/abs/2606.08878
AI Firehose
Researchers created a method to analyze encrypted smartphone network traffic, revealing insights into stress, sleep disturbance, and loneliness. Their model captures behavior patterns, suggesting that encrypted traffic could support privacy-preserving mental health monitoring. https://arxiv.org/abs/2605.01616
A study introduces ClinicalBench, a benchmark showing that large language models (LLMs) excel at medical knowledge but lag behind traditional machine learning models in clinical prediction. Researchers urge caution about LLM adoption in clinical environments due to reasoning gaps. https://arxiv.org/abs/2411.06469
Researchers developed a dual-encoder framework that separates the intrinsic signals of celestial objects from sensor artifacts. Counterfactual generation on overlapping galaxy images improves astrophysical insight and enables comparisons across instruments. https://arxiv.org/abs/2604.09787
New research presents PerspectiveGap, a benchmark for assessing LLMs’ ability to create prompts for multi-agent systems. Findings indicate that GPT-5.5 surpasses rivals but still exposes weaknesses in orchestration prompting, stressing the need for improved AI communication. https://arxiv.org/abs/2606.08878
ArXiv link for Learning What's Real: Disentangling Signal and Measurement Artifacts in Multi-Sensor Data, with Applications to Astrophysics