AI & Inference
Distributed inference with model protection
The FPGA encrypts weights in transit, the CPU orchestrates model serving, and the GPU runs inference.
FPGA
Weight and activation protection
Workloads
- In-transit weight encryption (sketched below)
- Model watermarking
- Anti-model-extraction
Performance
Weight load in < 5 s for a 70B model
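A minimal sketch of what per-shard weight decryption could look like from the host side, assuming AES-256-GCM envelopes with a 96-bit nonce prepended to each shard. The file layout, key handling, and torch loading step are illustrative placeholders; on the appliance this path is offloaded to the FPGA.

```python
# Sketch of per-shard weight decryption (assumed AES-256-GCM envelope
# layout: 12-byte nonce || ciphertext). Hypothetical, host-side version
# of what the FPGA does in hardware.
import io
from pathlib import Path

import torch
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit GCM nonce (assumed shard layout)

def load_encrypted_shard(path: Path, key: bytes) -> dict[str, torch.Tensor]:
    """Decrypt one weight shard and deserialize the tensors straight to GPU memory."""
    blob = path.read_bytes()
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    plaintext = AESGCM(key).decrypt(nonce, ciphertext, None)  # raises on tampering
    # Decrypted bytes never touch disk; tensors land directly on the GPU.
    return torch.load(io.BytesIO(plaintext), map_location="cuda:0")
```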
CPU
Model serving + routing
Workloads
- vLLM / TensorRT-LLM
- Batch scheduler
- OpenAI-compatible API (client example below)
Performance
1,000 req/s per appliance
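Because the serving layer speaks the OpenAI-compatible protocol (as vLLM's built-in server does), clients can use the standard openai Python client. The base URL, API key, and model id below are placeholders.

```python
# Minimal client sketch against the appliance's OpenAI-compatible endpoint.
# Base URL, API key, and model id are assumptions, not appliance defaults.
from openai import OpenAI

client = OpenAI(base_url="http://appliance.local:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="llama-3-70b-instruct",  # hypothetical model id served by the appliance
    messages=[{"role": "user", "content": "Summarize our Q3 incident report."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```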
GPU
LLM / vision inference
Workloads
- 70B decoder Transformers (see the sketch below)
- Diffusion models
- Multimodal LLMs
Performance
300 tok/s on a 70B model
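For offline batches, the same class of 70B model can be driven through vLLM's Python API. A minimal sketch, assuming the public meta-llama/Meta-Llama-3-70B-Instruct checkpoint and 4-way tensor parallelism; both are stand-ins for whatever the appliance actually ships.

```python
# Minimal sketch of offline batch inference on a 70B decoder with vLLM.
# Model id and tensor-parallel degree are assumptions, not appliance specs.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-70B-Instruct", tensor_parallel_size=4)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain output watermarking in one paragraph."], params)
print(outputs[0].outputs[0].text)
```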
Multi-agent scenario
A client submits a prompt: the CPU routes it to the right model, the FPGA decrypts the weights into GPU memory, the GPU runs inference, and the FPGA re-encrypts the response before returning it.
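A runnable sketch of that flow, with stub functions standing in for each component; only the ordering of the four steps comes from the scenario above.

```python
# Runnable sketch of the scenario; every function body is a hypothetical
# stub for the corresponding appliance component. Only the step ordering
# (route -> decrypt/load -> infer -> re-encrypt) comes from the text.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def route_to_model(prompt: str) -> str:
    # CPU: pick the target model (placeholder heuristic).
    return "vision-70b" if "image" in prompt.lower() else "chat-70b"

def load_weights(model_id: str) -> None:
    # FPGA: decrypt the model's weights into GPU memory (see the FPGA sketch above).
    pass

def run_inference(model_id: str, prompt: str) -> str:
    # GPU: generate the response (stubbed; see the vLLM sketch above).
    return f"[{model_id}] response to: {prompt}"

def handle_request(prompt: str, client_key: bytes) -> bytes:
    model_id = route_to_model(prompt)
    load_weights(model_id)
    answer = run_inference(model_id, prompt)
    nonce = os.urandom(12)  # FPGA: re-encrypt the response before return
    return nonce + AESGCM(client_key).encrypt(nonce, answer.encode(), None)

ciphertext = handle_request("Describe this image.", AESGCM.generate_key(bit_length=256))
```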