AI & Inference

Distributed inference with model protection

The FPGA encrypts weights in transit, the CPU orchestrates model serving, and the GPU runs inference.

FPGA

Weights + activations protection

Workloads
  • In-transit weight encryption (sketched below)
  • Model watermarking
  • Anti-model-extraction
Performance
Weight load < 5 s for a 70B model
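
The appliance performs this encryption in FPGA hardware; the sketch below is a pure software stand-in that illustrates the authenticated-encryption idea with AES-GCM on a weight shard. The names (`encrypt_shard`, `decrypt_shard`) are illustrative, not the appliance's API.

```python
# Software stand-in for the FPGA's in-transit weight encryption.
# AES-GCM authenticates as well as encrypts, so a tampered shard
# fails to decrypt instead of silently corrupting the model.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_shard(key: bytes, shard: bytes, shard_id: int) -> tuple[bytes, bytes]:
    """Encrypt one weight shard, binding its ID as associated data."""
    nonce = os.urandom(12)  # 96-bit nonce, unique per shard
    return nonce, AESGCM(key).encrypt(nonce, shard, shard_id.to_bytes(8, "big"))

def decrypt_shard(key: bytes, nonce: bytes, ct: bytes, shard_id: int) -> bytes:
    """Decrypt and authenticate a shard before it reaches GPU memory."""
    return AESGCM(key).decrypt(nonce, ct, shard_id.to_bytes(8, "big"))

key = AESGCM.generate_key(bit_length=256)
nonce, ct = encrypt_shard(key, b"\x00" * 1024, shard_id=0)
assert decrypt_shard(key, nonce, ct, shard_id=0) == b"\x00" * 1024
```

For scale: at FP16 a 70B model is roughly 140 GB of weights, so loading it in under 5 s implies sustained decrypt bandwidth on the order of 28 GB/s, which is why this stage lives in hardware rather than software.
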
CPU

Model serving + routing

Workloads
  • vLLM / TensorRT-LLM
  • Batch scheduler
  • OpenAI-compatible API (client example below)
Performance
1,000 req/s per appliance
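
Because the serving layer speaks the OpenAI protocol, any standard client can target it. A minimal sketch; the base URL, API key, and model name below are placeholders, not real appliance values.

```python
# Minimal call against the appliance's OpenAI-compatible endpoint.
# Base URL, API key, and model name are placeholders for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://appliance.local:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llama-70b",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize this contract clause."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```
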
GPU

LLM / vision inference

Workloads
  • 70B decoder Transformers (see vLLM sketch below)
  • Diffusion models
  • Multimodal LLM
Performance
300 tok/s on a 70B model
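
For a sense of what this tier runs, here is a minimal vLLM offline-inference sketch of a 70B decoder workload. The model name and tensor-parallel degree are assumptions, and on the appliance the weights would arrive through the FPGA decrypt path rather than from local disk.

```python
# Minimal vLLM sketch of a 70B decoder inference workload.
# Model name and tensor_parallel_size are assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct", tensor_parallel_size=4)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain watermarking of model weights."], params)
print(outputs[0].outputs[0].text)
```

At the quoted 300 tok/s, a 300-token answer completes in about one second.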

Multi-agent scenario

A client submits a prompt: the CPU routes it to the right model, the FPGA decrypts the weights directly into GPU memory, the GPU runs inference, and the FPGA re-encrypts the response before it is returned to the client.
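
A minimal sketch of that round trip, assuming hypothetical helpers for each tier; none of these functions are the appliance's real API, and the stubs exist only to make the flow runnable.

```python
# Hypothetical end-to-end sketch of the scenario above. Every helper is
# a stand-in for an appliance service; stubs keep the example runnable.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def route(prompt: str) -> str:
    """CPU: choose a model for the prompt (stub: always the 70B model)."""
    return "llama-70b"

def load_weights_into_gpu(model: str, key: bytes) -> None:
    """FPGA: decrypt the model's weights directly into GPU memory (stub)."""

def run_inference(model: str, prompt: str) -> str:
    """GPU: generate the response (stub)."""
    return f"[{model}] answer to: {prompt}"

def handle_prompt(prompt: str, key: bytes) -> tuple[bytes, bytes]:
    """One request: route, decrypt-load, infer, re-encrypt."""
    model = route(prompt)
    load_weights_into_gpu(model, key)
    answer = run_inference(model, prompt)
    nonce = os.urandom(12)
    # FPGA: re-encrypt the response before it leaves the appliance.
    return nonce, AESGCM(key).encrypt(nonce, answer.encode(), None)

key = AESGCM.generate_key(bit_length=256)
nonce, ciphertext = handle_prompt("Summarize this clause.", key)
```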