NVIDIA H100 Llama 3.1 Inference Performance Testing on VALDI
Benchmarking Results
We benchmarked Llama 3.1 inference on NVIDIA H100 GPUs using a custom script that measures throughput in Tokens Per Second (TPS). We plan to refine the testing methodology over time, but here are our initial findings:
1x NVIDIA H100 80 GB
| Model | Batch Size | Max Input Tokens | Avg TPS | Stdev TPS |
|---|---|---|---|---|
| Meta-Llama-3.1-8B | 64 | 1000 | 3621.02 | 157.94 |
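The measurement logic behind these numbers can be sketched as follows. This is a minimal illustration, not our actual script: `generate_fn` is a hypothetical callable standing in for the model's batched generation call, assumed to return the total number of tokens produced for a batch.

```python
import statistics
import time


def measure_tps(generate_fn, prompts, batch_size):
    """Time batched generation and report average and stdev of TPS.

    generate_fn(batch) is a placeholder for the model call; it is
    assumed to return the total token count generated for the batch.
    """
    tps_per_batch = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]
        start = time.perf_counter()
        tokens = generate_fn(batch)
        elapsed = time.perf_counter() - start
        # Throughput for this batch: tokens generated per wall-clock second.
        tps_per_batch.append(tokens / elapsed)
    avg = statistics.mean(tps_per_batch)
    stdev = statistics.stdev(tps_per_batch) if len(tps_per_batch) > 1 else 0.0
    return avg, stdev
```

Averaging per-batch TPS across many batches, and reporting the standard deviation alongside the mean, is what produces the "Avg TPS" and "Stdev TPS" columns above.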
Try It Yourself
Want to experiment with Llama 3.1 optimization? Check out our GPU marketplace to rent high-performance NVIDIA GPUs. Sign up now and start optimizing in minutes!
Keywords: Llama 3.1, 70B model, 405B model, NVIDIA GPU, performance optimization, inference optimization, NLP, large language models