A practitioner's guide to testing and running GPU clusters

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision

Dragonfly: A large vision-language model with multi-resolution zoom

Based: Simple linear attention language models

Paving the way to efficient architectures: StripedHyena-7B

Together AI raises a $102.5M Series A

RedPajama v2 Open Dataset with 30T Tokens for Training LLMs

Llama 2 on Together AI is as bad a privacy nightmare as OpenAI

Llama 32K Context Released by Together AI