AI & Inference

Distributed inference with model protection

The FPGA encrypts weights in transit, the CPU orchestrates model serving, and the GPU runs inference.

FPGA

Weights + activations protection

Workloads
  • In-transit weight encryption (sketched below)
  • Model watermarking
  • Anti-model-extraction
Performance
Weight load < 5 s for a 70B model
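
The appliance performs this encryption in FPGA hardware; the sketch below is a pure software stand-in that illustrates the authenticated-encryption idea with AES-GCM on a weight shard. The names (`encrypt_shard`, `decrypt_shard`) are illustrative, not the appliance's API.

```python
# Software stand-in for the FPGA's in-transit weight encryption.
# AES-GCM authenticates as well as encrypts, so a tampered shard
# fails to decrypt instead of silently corrupting the model.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_shard(key: bytes, shard: bytes, shard_id: int) -> tuple[bytes, bytes]:
    """Encrypt one weight shard, binding its ID as associated data."""
    nonce = os.urandom(12)  # 96-bit nonce, unique per shard
    return nonce, AESGCM(key).encrypt(nonce, shard, shard_id.to_bytes(8, "big"))

def decrypt_shard(key: bytes, nonce: bytes, ct: bytes, shard_id: int) -> bytes:
    """Decrypt and authenticate a shard before it reaches GPU memory."""
    return AESGCM(key).decrypt(nonce, ct, shard_id.to_bytes(8, "big"))

key = AESGCM.generate_key(bit_length=256)
nonce, ct = encrypt_shard(key, b"\x00" * 1024, shard_id=0)
assert decrypt_shard(key, nonce, ct, shard_id=0) == b"\x00" * 1024
```

For scale: at FP16 a 70B model is roughly 140 GB of weights, so loading it in under 5 s implies sustained decrypt bandwidth on the order of 28 GB/s, which is why this stage lives in hardware rather than software.
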
CPU

Model serving + routing

Workloads
  • vLLM / TensorRT-LLM
  • Batch scheduler
  • OpenAI-compatible API (client example below)
Performance
1,000 req/s per appliance
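
Because the serving layer speaks the OpenAI protocol, any standard client can target it. A minimal sketch; the base URL, API key, and model name below are placeholders, not real appliance values.

```python
# Minimal call against the appliance's OpenAI-compatible endpoint.
# Base URL, API key, and model name are placeholders for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://appliance.local:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llama-70b",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize this contract clause."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```
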
GPU

LLM / vision inference

Workloads
  • 70B decoder Transformers (see vLLM sketch below)
  • Diffusion models
  • Multimodal LLM
Performance
300 tok/s on a 70B model
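
For a sense of what this tier runs, here is a minimal vLLM offline-inference sketch of a 70B decoder workload. The model name and tensor-parallel degree are assumptions, and on the appliance the weights would arrive through the FPGA decrypt path rather than from local disk.

```python
# Minimal vLLM sketch of a 70B decoder inference workload.
# Model name and tensor_parallel_size are assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct", tensor_parallel_size=4)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain watermarking of model weights."], params)
print(outputs[0].outputs[0].text)
```

At the quoted 300 tok/s, a 300-token answer completes in about one second.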

Multi-agent scenario

A client submits a prompt: the CPU routes it to the right model, the FPGA decrypts the weights directly into GPU memory, the GPU runs inference, and the FPGA re-encrypts the response before it is returned to the client.
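
A minimal sketch of that round trip, assuming hypothetical helpers for each tier; none of these functions are the appliance's real API, and the stubs exist only to make the flow runnable.

```python
# Hypothetical end-to-end sketch of the scenario above. Every helper is
# a stand-in for an appliance service; stubs keep the example runnable.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def route(prompt: str) -> str:
    """CPU: choose a model for the prompt (stub: always the 70B model)."""
    return "llama-70b"

def load_weights_into_gpu(model: str, key: bytes) -> None:
    """FPGA: decrypt the model's weights directly into GPU memory (stub)."""

def run_inference(model: str, prompt: str) -> str:
    """GPU: generate the response (stub)."""
    return f"[{model}] answer to: {prompt}"

def handle_prompt(prompt: str, key: bytes) -> tuple[bytes, bytes]:
    """One request: route, decrypt-load, infer, re-encrypt."""
    model = route(prompt)
    load_weights_into_gpu(model, key)
    answer = run_inference(model, prompt)
    nonce = os.urandom(12)
    # FPGA: re-encrypt the response before it leaves the appliance.
    return nonce, AESGCM(key).encrypt(nonce, answer.encode(), None)

key = AESGCM.generate_key(bit_length=256)
nonce, ciphertext = handle_prompt("Summarize this clause.", key)
```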