NVIDIA A100 Llama 3.1 Inference Performance Testing on VALDI

Benchmarking Results

We conducted extensive benchmarks of Llama 3.1 inference on NVIDIA A100 GPUs, using a custom script to measure Tokens Per Second (TPS) throughput. We expect to refine the testing methodology over time, but here are our initial findings:

NVIDIA 1xA100 80G

Model               Batch Size   Max Input (tokens)   Avg TPS   Stdev TPS
Meta-Llama-3.1-8B   64           1000                 1663.11   63.28
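The TPS figures above are an average and standard deviation over repeated benchmark iterations. As a rough illustration of the bookkeeping involved, here is a minimal sketch of how per-run timings can be turned into those two numbers; the `tps_stats` helper and the sample timings are illustrative assumptions, not the actual benchmark harness:

```python
import statistics

def tps_stats(runs):
    """Compute average and sample stdev of tokens-per-second across runs.

    runs: list of (tokens_generated, elapsed_seconds) tuples,
    one per benchmark iteration.
    """
    tps = [tokens / seconds for tokens, seconds in runs]
    return statistics.mean(tps), statistics.stdev(tps)

# Hypothetical iterations: each generates 64,000 tokens total
# (batch size 64), timed end to end.
avg, sd = tps_stats([(64_000, 38.5), (64_000, 38.2), (64_000, 39.0)])
print(f"Avg TPS: {avg:.2f}  Stdev TPS: {sd:.2f}")
```

Averaging the per-run TPS values (rather than dividing total tokens by total time) keeps the standard deviation meaningful as a measure of run-to-run variance.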

Try It Yourself

Want to run your own Llama 3.1 inference benchmarks? Check out our GPU marketplace to rent high-performance NVIDIA GPUs. Sign up now and start optimizing in minutes!

Rent a GPU Now


Keywords: Llama 3.1, 70B model, 405B model, NVIDIA GPU, performance optimization, inference optimization, NLP, large language models