progscrape: www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html

Evaluating the Evaluation: A Benchmarking Checklist

4 years ago brendangregg.com

Benchmarking is hard, sometimes

16 days ago vondra.me database performance

Why hasn’t partial evaluation been applied to Pandas?

16 days ago reddit.com compiler

Benchmarking Strategies for Non-Standard Cognitive Architectures

6 days ago i.redd.it

Wrote about benchmarking and profiling in golang

2 days ago reddit.com go

To Mock Or Not To Mock Your Auth: The Checklist

18 days ago fusionauth.io

LongCodeBench: Evaluating Coding LLMs at 1M Context Windows

16 days ago arxiv.org llm windows

GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

21 days ago arxiv.org ai

Canadian universities grapple with evaluating students amid AI cheating fears

14 days ago cbc.ca ai canada

NASA ‘evaluating’ opportunities to launch rockets to Mars during Trump presidency

11 days ago msn.com nasa rocket trump

3 Years of Remote Work

31 days ago brendangregg.com

Benchmarking Zasper versus JupyterLab

52 days ago reddit.com go

Benchmarking Crimes Meet Formal Verification

47 days ago microkerneldude.org formalmethods osdev

Does Rust define evaluation order?

27 days ago reddit.com rust

Recursive Data Structures and Lazy Evaluation

52 days ago romanliutikov.com javascript

OpenBSD IO Benchmarking: How Many Jobs Are Worth It?

30 days ago rsadowski.de bsd openbsd testing

Small but powerful dummy Object generator for Testing & Benchmarking!

26 days ago npmjs.com javascript

Checklist for software engineers who think there's no growth without working at scale

50 days ago bhupesh.me

HealthBench – An evaluation for AI systems and human health

40 days ago openai.com ai

`overflow evaluating the requirement` when using simple trait bounds for error handling

34 days ago reddit.com rust

Can You Trust Code Copilots? Evaluating LLMs from a Code Security Perspec

34 days ago arxiv.org llm

CMU TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

49 days ago arxiv.org llm

Show HN: Pi Co-pilot – Evaluation of AI apps made easy

31 days ago withpi.ai ai show

9 Lazy Evaluation Features in Python That Optimize Your Code Quietly

21 days ago yangzhou1993.medium.com python

Design and evaluation of a parrot-to-parrot video-calling system (2023)

47 days ago smithsonianmag.com

Querying 10M rows in 11 seconds: Benchmarking ConnectorX, Asyncpg and Psycopg vs QuestDB

40 days ago reddit.com python

Re-evaluating Fan-Out-on-Write vs. Fan-Out-on-Read Under Celebrity Traffic Spikes (2025)

43 days ago codemia.io

The serum evaluation of sex hormones including DHEAs, DHT, testosterone in oral lichen planus patients

38 days ago nature.com health

Doom GPU Flame Graphs

53 days ago brendangregg.com gpu performance visualization

Benchmarking via github actions

2 months ago reddit.com github go

Evaluating Agent-Based Program Repair at Google

2 months ago arxiv.org google
