Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Why LLMs can't really build software

LLMs aren't world models

Tell HN: I'm tired of formulaic, "LLM house style" show HN submissions

I clustered four Framework Mainboards to test LLMs

A new study analysed over 1 million research papers and found growing use of large language models, especially in computer science, where up to 22% show signs of LLM editing.

A comprehensive analysis of software package hallucinations by code generating LLMs found 19.7% of the LLM recommended packages did not exist, with open-source models hallucinating far more frequently (21.7%) compared to commercial models (5.2%)

AI’s Serious Python Bias: Concerns of LLMs Preferring One Language

LLM Inflation

LLMs applied in long-term care are prone to downplay women's health issues

Yet Another LLM Rant

Convo-Lang: LLM Programming Language and Runtime

Design Patterns for Securing LLM Agents Against Prompt Injections

Actual LLM agents are coming

DoubleAgents: Fine-Tuning LLMs for Covert Malicious Tool Calls

Evaluating LLMs playing text adventures

Blocking LLMs from your website cuts you off from next-generation search

LLM Coding Assistant Census

Show HN: Llmswap – Python package to reduce LLM API costs by 50-90% with caching

Can modern LLMs count the number of b's in "blueberry"?

An LLM does not need to understand MCP

LLMs as Parts of Systems

A new study found that LLMs like ChatGPT fell for fake clinical details 50–82% of the time. Even the best prompts could not stop all hallucinations

Why deterministic output from LLMs is nearly impossible

Show HN: Evaluating LLMs on creative writing via reader usage, not benchmarks

Fine-tuned small LLMs can beat large ones with programmatic data curation

Arch-Router: Aligning LLM Routing with Human Preferences

I let LLMs write an Elixir NIF in C; it mostly worked

LLM over DNS

LLM advises to delete the Linux dynamic linker during a troubleshooting session
