NVIDIA L40S (GPU-NVL40S)
| Specification | NVIDIA L40S |
| --- | --- |
| GPU Architecture | NVIDIA Ada Lovelace architecture |
| GPU Memory | 48 GB GDDR6 with ECC |
| Memory Bandwidth | 864 GB/s |
| Interconnect Interface | PCIe Gen4 x16: 64 GB/s bidirectional |
| NVIDIA Ada Lovelace Architecture-Based CUDA® Cores | 18,176 |
| NVIDIA Third-Generation RT Cores | 142 |
| NVIDIA Fourth-Generation Tensor Cores | 568 |
| RT Core Performance TFLOPS | 212 |
| FP32 TFLOPS | 91.6 |
| TF32 Tensor Core TFLOPS | 183 \| 366* |
| BFLOAT16 Tensor Core TFLOPS | 362.05 \| 733* |
| FP16 Tensor Core TFLOPS | 362.05 \| 733* |
| FP8 Tensor Core TFLOPS | 733 \| 1,466* |
| Peak INT8 Tensor TOPS | 733 \| 1,466* |
| Peak INT4 Tensor TOPS | 733 \| 1,466* |
| Form Factor | 4.4" (H) x 10.5" (L), dual slot |
| Display Ports | 4x DisplayPort 1.4a |
| Max Power Consumption | 350 W |
| Power Connector | 16-pin |
| Thermal | Passive |
| Virtual GPU (vGPU) Software Support | Yes |
| vGPU Profiles Supported | See virtual GPU licensing guide |
| NVENC \| NVDEC | 3x \| 3x (includes AV1 encode and decode) |
| Secure Boot With Root of Trust | Yes |
| NEBS Ready | Level 3 |
| Multi-Instance GPU (MIG) Support | No |
| NVIDIA® NVLink® Support | No |

\*With Sparsity
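As a quick sanity check on the table, peak FP32 throughput follows from core count × 2 FLOPs per core per cycle (one fused multiply-add) × clock. The ~2.52 GHz boost clock in the sketch below is an assumption, since the table does not list clock speeds:

```python
# Back-of-the-envelope check of the L40S peak FP32 figure.
cuda_cores = 18_176           # from the spec table
boost_clock_hz = 2.52e9       # assumed boost clock, ~2.52 GHz (not in the table)
flops_per_core_per_cycle = 2  # one fused multiply-add = 2 FLOPs

peak_fp32 = cuda_cores * flops_per_core_per_cycle * boost_clock_hz
print(f"Peak FP32: {peak_fp32 / 1e12:.1f} TFLOPS")  # -> ~91.6 TFLOPS
```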
- The Ada Lovelace architecture introduces a new streaming multiprocessor, fourth-generation Tensor Cores, third-generation RT Cores, and 91.6 teraflops of FP32 performance.
- Experience the power of generative AI, with LLM training and inference accelerated by the Transformer Engine's FP8 precision, tensor performance exceeding 1.5 petaflops*, and a large L2 cache (see the FP8 sketch after this list).
- Unleash unparalleled 3D graphics and rendering with 212 teraflops of RT Core performance, DLSS 3.0 AI frame generation, and Shader Execution Reordering (SER).
- Accelerate multimedia with three encode and three decode engines, four JPEG decoders, and AV1 encode and decode support (a sample invocation follows this list).
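To illustrate the FP8 path called out above, here is a minimal sketch using NVIDIA's Transformer Engine for PyTorch. The layer sizes and batch shape are arbitrary assumptions, a sketch of the basic fp8_autocast usage rather than a tuned training recipe:

```python
# FP8 forward/backward sketch with NVIDIA Transformer Engine (sizes are assumed).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID = E4M3 for forward activations/weights, E5M2 for backward gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda", requires_grad=True)

# Inside fp8_autocast, supported TE layers run their GEMMs in FP8
# on the L40S's fourth-generation Tensor Cores.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()
```

And to exercise the media engines, a hedged example of hardware AV1 encoding through FFmpeg's av1_nvenc encoder, driven from Python. The file names and preset are placeholders, and the encoder requires an FFmpeg build with NVENC AV1 support:

```python
# Hardware AV1 encode sketch using FFmpeg's av1_nvenc (file names are placeholders).
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-hwaccel", "cuda",   # decode on the GPU's NVDEC engines
        "-i", "input.mp4",
        "-c:v", "av1_nvenc",  # encode on NVENC; AV1 encode is new with Ada
        "-preset", "p5",      # mid-range quality/speed trade-off
        "output.mkv",
    ],
    check=True,
)
```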
Why NVIDIA L40S - Key Features
- Impressive performance: for LLM workloads, better performance than even HGX A100 in many scenarios, including at GPT-170B scale, except for large-scale training from scratch.
- Ideal for fine-tuning pre-trained base models from NVIDIA and open-source models. Better availability (shortened lead time; available from September).
- Includes graphics and robust multimedia engines (unavailable on A100/H100).
- 20-25% better price than A100.
Questions to help customers considering the L40S instead of the H100 or A100:
- What is the workload?
- Are you using generative AI / large language models (LLMs)? Are you training a large model from scratch on a massive dataset, or fine-tuning a pre-trained model?
- Is most of your inference based on pre-trained models?
- Are you planning to run HPC workloads such as scientific/engineering simulations? Is FP64 precision important?
- Does your workload involve graphics or video encoding/decoding/transcoding?
- Will these be edge applications?
- What are the relevant benchmarks for the workload?
- What is the scale? How many GPUs are needed?
- For example, 4,000 L40S GPUs at FP8 precision can fully train GPT-170B on 300B tokens in under 4 days, faster and cheaper than HGX A100 (see the back-of-the-envelope check after this list).
- Any specific technical requirements or bottlenecks, e.g. GPU memory capacity, memory bandwidth, GPU interconnect, or latency?
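A back-of-the-envelope check of that training-time example, using the common ≈ 6 × parameters × tokens estimate of transformer training FLOPs; the 35% sustained-utilization figure is an assumption:

```python
# Rough check of the "GPT-170B on 300B tokens in under 4 days" claim.
params = 170e9            # model parameters
tokens = 300e9            # training tokens
train_flops = 6 * params * tokens          # ~6 FLOPs per parameter per token

gpus = 4_000
fp8_dense_flops = 733e12  # per-GPU dense FP8 rate from the spec table
utilization = 0.35        # assumed sustained fraction of peak

seconds = train_flops / (gpus * fp8_dense_flops * utilization)
print(f"~{seconds / 86_400:.1f} days")     # -> ~3.5 days
```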
Important:
- The NVIDIA L40S does not support NVLink.
- The NVIDIA L40S is ~15% cheaper than the A100.
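Since NVLink is absent, peer-to-peer traffic between L40S cards rides PCIe Gen4 instead; the NCCL programming model is unchanged. A minimal sketch of a multi-GPU all-reduce with PyTorch (the script name and GPU count are assumptions):

```python
# allreduce_l40s.py - minimal NCCL all-reduce sketch (hypothetical example).
# On L40S systems NCCL routes traffic over PCIe / host memory, since the
# card has no NVLink; the API usage is identical either way.
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for us.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor; all-reduce sums them across GPUs.
    x = torch.ones(1 << 20, device="cuda") * (dist.get_rank() + 1)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}: sum element = {x[0].item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, e.g., `torchrun --nproc_per_node=4 allreduce_l40s.py`; NCCL selects the available transport (here PCIe) automatically.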
Related Pages
- Supermicro servers dedicated to the NVIDIA L40S
- Gigabyte servers dedicated to the NVIDIA L40S
- ChatGPT New Liquid-Cooled Workstations: Supermicro SYS-551A-T and Supermicro SYS-751GE-TNRT-NV1 Designed for AI
- New GIGABYTE G363-SR0 and G593-SD2 Servers for AI and HPC
- Artificial Intelligence (AI) ChatGPT, Bing, Bard - part 1
- Supermicro GPU Platforms