Transformers without Normalization

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations

Transformers without Normalization

A Photonic SRAM with Embedded XOR Logic for Ultra-Fast In-Memory Computing

Supervised fine tuning on curated data is reinforcement learning

Magentic-UI: Towards Human-in-the-Loop Agentic Systems

Why Neural Networks Can Discover Symbolic Structures

AlphaGo Moment for Model Architecture Discovery

Self-attention transforms a prompt into a low-rank weight-update

Assessing interstellar comet 3I/ATLAS with the 10.4 m Gran Telescopio Canarias

RE#: High performance derivative-based regular expression matching (2024)

Enhancing COBOL Code Explanations: A Multi-Agents Approach Using LLMs

An exponential improvement for Ramsey lower bounds

Chain of thought monitorability: A new and fragile opportunity for AI safety

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Torqued Accelerator Using Radiation from the Sun (TARS) for Interstellar Payload

LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential

WatchWitch: Interoperability, Privacy, and Autonomy for the Apple Watch

Two-photon 3D printing of functional microstructures inside living cells [pdf]

BeePL: Correct-by-compilation kernel extensions

Dissecting the NVIDIA Blackwell Architecture with Microbenchmarks

The impact of file position on code review

Dynamical origin of Theia, the last giant impactor on Earth

Techno-feudalism and the rise of AGI: A future without economic rights?

Dynamic Chunking for End-to-End Hierarchical Sequence Modeling

Nuclear Explosion for Carbon Sequestration

The Symbol Grounding Problem (1990)

The JPEG XL Image Coding History, Features, Coding Tools, Design Rationale

Do AI Tutors Empower or Enslave Learners?
