
NVIDIA H100 Llama 3.1 Inference Performance Testing on VALDI

Benchmarking Results

We conducted benchmarks of Llama 3.1 on NVIDIA H100 GPUs, using a custom script to measure throughput in tokens per second (TPS). We expect to refine the testing methodology over time, but here are our initial findings:

1x NVIDIA H100 80GB

Model             | Batch Size | Max Input Tokens | Avg TPS | Stdev TPS
Meta-Llama-3.1-8B | 64         | 1000             | 3621.02 | 157.94
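Our benchmark script is not shown here, but the core calculation is straightforward: divide the number of tokens generated in each run by its wall-clock time, then aggregate across runs. Below is a minimal sketch of that aggregation step; the function name and the sample numbers are hypothetical, not taken from our actual harness.

```python
import statistics

def tokens_per_second(token_counts, elapsed_seconds):
    """Compute per-run TPS, then the mean and sample stdev across runs.

    token_counts: tokens generated in each benchmark run
    elapsed_seconds: wall-clock duration of each run
    """
    tps = [n / t for n, t in zip(token_counts, elapsed_seconds)]
    return statistics.mean(tps), statistics.stdev(tps)

# Hypothetical timings for three batch-64 generation runs
counts = [362_000, 358_000, 366_000]   # tokens generated per run
times = [100.0, 101.5, 98.7]           # seconds per run
avg, sd = tokens_per_second(counts, times)
print(f"Avg TPS: {avg:.2f}  Stdev: {sd:.2f}")
```

In practice, the "Avg TPS" and "Stdev TPS" columns above are produced exactly this way, from repeated runs at a fixed batch size and input length.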

Try It Yourself

Want to experiment with Llama 3.1 optimization? Check out our GPU marketplace to rent high-performance NVIDIA GPUs. Sign up now and start optimizing in minutes!

Rent a GPU Now

