Large Language Models up to 4x Faster on RTX With TensorRT-LLM for Windows

Generative AI on PCs is getting up to 4x faster via TensorRT-LLM for Windows, an open-source library that accelerates LLM inference performance on NVIDIA RTX GPUs.

This is a companion discussion topic for the original entry at https://blogs.nvidia.com/blog/2023/10/17/tensorrt-llm-windows-stable-diffusion-rtx/