A new method for quantized matrix multiplication in large language models applies waterfilling techniques to weight-only quantization, lowering distortion while preserving neural network performance. https://arxiv.org/abs/2605.13768

Introducing Trajectory-Refined Distillation (TRD), a novel method that boosts large language models by addressing "prefix failure" in on-policy distillation. TRD refines student trajectories under teacher guidance, enhancing accuracy and reasoning coverage. https://arxiv.org/abs/2606.08432

The innovative DiScO framework refines large reasoning models by fostering diverse thinking schemata, leading to notable enhancements in problem-solving and error recovery for complex math tasks. https://arxiv.org/abs/2606.08974

A new GPU-accelerated algorithm reduces banded matrices to bidiagonal form, achieving up to 800× speed-up over CPU libraries. Using modern GPU architectures and memory-aware designs, this solution enables efficient linear algebra in scientific computing and AI. https://arxiv.org/abs/2510.12705

LEXRUBRIC is a new benchmark for evaluating open-ended legal tasks in Chinese, featuring over 12,000 expert criteria. It caters to the demand for reliable legal AI, demonstrating language models' varying capacities and limitations in resolving complex legal queries. https://arxiv.org/abs/2606.09389

PerspectiveGap shows that LLMs struggle to write role-specific prompts for multi-agent orchestration, achieving only a 14.9% average pass rate. Spanning 110 real-world scenarios, the study points to a gap in model capabilities that matters for multi-agent system design. https://arxiv.org/abs/2606.08878

AI Firehose

Researchers created a method to analyze encrypted smartphone network traffic, revealing insights into stress, sleep disturbance, and loneliness. Their model captures behavior patterns, suggesting that encrypted data can support privacy-preserving mental health monitoring. https://arxiv.org/abs/2605.01616

A study introduces ClinicalBench, showing that Large Language Models (LLMs) excel in medical knowledge but lag behind traditional machine learning models in clinical prediction. Researchers urge caution about LLM adoption in clinical environments due to reasoning gaps. https://arxiv.org/abs/2411.06469

Researchers developed a dual-encoder framework that separates the intrinsic signals of celestial objects from sensor artifacts. Counterfactual generation on overlapping galaxy images improves astrophysical insight and enables comparisons across instruments. https://arxiv.org/abs/2604.09787

New research presents PerspectiveGap, a benchmark for assessing LLMs’ ability to create prompts for multi-agent systems. Findings indicate GPT-5.5 surpasses rivals but expose issues in orchestration prompting, stressing the need for improved AI communication. https://arxiv.org/abs/2606.08878

Learning What's Real: Disentangling Signal and Measurement Artifacts in Multi-Sensor Data, with Applications to Astrophysics

High-Rate Quantized Matrix Multiplication II

PerspectiveGap: A Benchmark for Multi-Agent Orchestration Prompting

Trajectory-Refined Distillation

Accelerating Bidiagonalization of Banded Matrices through Memory-Aware Bulge-Chasing on GPUs

Diverse Thinking Schemata Elicit Better Reasoning in Large Language Models

LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks

ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?

Learning Behavioral Signals from Encrypted Smartphone Network Traffic