>progscrape ▒
All the news that's fit to scrape
Evaluating the Evaluation: A Benchmarking Checklist
4 years ago
brendangregg.com
Related Stories
Benchmarking is hard, sometimes
16 days ago
vondra.me
database
performance
Why hasn’t partial evaluation been applied to Pandas?
16 days ago
reddit.com
compiler
Benchmarking Strategies for Non-Standard Cognitive Architectures
6 days ago
i.redd.it
Wrote about benchmarking and profiling in golang
2 days ago
reddit.com
go
To Mock Or Not To Mock Your Auth: The Checklist
18 days ago
fusionauth.io
LongCodeBench: Evaluating Coding LLMs at 1M Context Windows
16 days ago
arxiv.org
llm
windows
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
21 days ago
arxiv.org
ai
Canadian universities grapple with evaluating students amid AI cheating fears
14 days ago
cbc.ca
ai
canada
NASA ‘evaluating’ opportunities to launch rockets to Mars during Trump presidency
11 days ago
msn.com
nasa
rocket
trump
3 Years of Remote Work
31 days ago
brendangregg.com
Benchmarking Zasper versus JupyterLab
52 days ago
reddit.com
go
Benchmarking Crimes Meet Formal Verification
47 days ago
microkerneldude.org
formalmethods
osdev
Does Rust define evaluation order?
27 days ago
reddit.com
rust
Recursive Data Structures and Lazy Evaluation
52 days ago
romanliutikov.com
javascript
OpenBSD IO Benchmarking: How Many Jobs Are Worth It?
30 days ago
rsadowski.de
bsd
openbsd
testing
Small but powerful dummy Object generator for Testing & Benchmarking!
26 days ago
npmjs.com
javascript
Checklist for software engineers who think there's no growth without working at scale
50 days ago
bhupesh.me
HealthBench – An evaluation for AI systems and human health
40 days ago
openai.com
ai
`overflow evaluating the requirement` when using simple trait bounds for error handling
34 days ago
reddit.com
rust
Can You Trust Code Copilots? Evaluating LLMs from a Code Security Perspective
34 days ago
arxiv.org
llm
CMU TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
49 days ago
arxiv.org
llm
Show HN: Pi Co-pilot – Evaluation of AI apps made easy
31 days ago
withpi.ai
ai
show
9 Lazy Evaluation Features in Python That Optimize Your Code Quietly
21 days ago
yangzhou1993.medium.com
python
Design and evaluation of a parrot-to-parrot video-calling system (2023)
47 days ago
smithsonianmag.com
Querying 10M rows in 11 seconds: Benchmarking ConnectorX, Asyncpg and Psycopg vs QuestDB
40 days ago
reddit.com
python
Re-evaluating Fan-Out-on-Write vs. Fan-Out-on-Read Under Celebrity Traffic Spikes (2025)
43 days ago
codemia.io
The serum evaluation of sex hormones including DHEAs, DHT, testosterone in oral lichen planus patients
38 days ago
nature.com
health
Doom GPU Flame Graphs
53 days ago
brendangregg.com
gpu
performance
visualization
Benchmarking via github actions
2 months ago
reddit.com
github
go
Evaluating Agent-Based Program Repair at Google
2 months ago
arxiv.org
google