NVIDIA H100 Llama 3.1 Inference Performance Testing on VALDI
Benchmarking Results
We benchmarked Llama 3.1 inference on NVIDIA H100 GPUs using a custom script that measures throughput in Tokens Per Second (TPS). We plan to refine the testing methodology over time, but here are our initial findings:
1x NVIDIA H100 80 GB
| Model | Batch Size | Max Input Tokens | Avg TPS | Stdev TPS |
|---|---|---|---|---|
| Meta-Llama-3.1-8B | 64 | 1000 | 3621.02 | 157.94 |
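The measurement logic behind these numbers can be sketched as follows. This is a minimal illustration, not our actual script: `generate_fn` is a hypothetical callable standing in for the model's batched generation call, assumed to return the total number of tokens produced for a batch.

```python
import statistics
import time


def measure_tps(generate_fn, prompts, batch_size):
    """Time batched generation and report average and stdev of TPS.

    generate_fn(batch) is a placeholder for the model call; it is
    assumed to return the total token count generated for the batch.
    """
    tps_per_batch = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]
        start = time.perf_counter()
        tokens = generate_fn(batch)
        elapsed = time.perf_counter() - start
        # Throughput for this batch: tokens generated per wall-clock second.
        tps_per_batch.append(tokens / elapsed)
    avg = statistics.mean(tps_per_batch)
    stdev = statistics.stdev(tps_per_batch) if len(tps_per_batch) > 1 else 0.0
    return avg, stdev
```

Averaging per-batch TPS across many batches, and reporting the standard deviation alongside the mean, is what produces the "Avg TPS" and "Stdev TPS" columns above.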
Try It Yourself
Want to experiment with Llama 3.1 optimization? Check out our GPU marketplace to rent high-performance NVIDIA GPUs. Sign up now and start optimizing in minutes!
Keywords: Llama 3.1, 70B model, 405B model, NVIDIA GPU, performance optimization, inference optimization, NLP, large language models