The DeepSeek with vLLM on EKS project deploys DeepSeek models on Amazon EKS (Elastic Kubernetes Service), with support for both GPU (NVIDIA) and AWS Neuron (NeuronCore) accelerators. Below is a step-by-step guide to setting up the infrastructure, deploying the model, and integrating the chatbot UI. The repository is organized as follows:
```
chatbot-ui/
├── application/
│   ├── Dockerfile              # Dockerfile for building the UI container
│   ├── app.py                  # Chatbot UI application (Gradio)
│   └── requirements.txt        # Python dependencies
│
├── manifests/
│   ├── deployment.yaml         # Kubernetes Deployment for UI
│   └── ingress-class.yaml      # Ingress configuration
│
├── static/images/              # UI image assets
│
├── vllm-chart/                 # Helm chart for vLLM
│   ├── values.yaml             # Default configuration values
│   └── templates/
│       ├── deployment.yaml     # Pod configuration
│       ├── service.yaml        # Service configuration
│       └── _helpers.tpl        # Template helpers
│
├── .gitignore
├── helm.tf                     # Terraform script for Helm configuration
├── main.tf                     # Terraform infrastructure setup
├── nodepool_automode.tf        # Auto-scaling node pool configuration
└── README.md                   # Documentation
```
**Step 1: Create VPC and EKS Cluster**

- Defined in `main.tf` using the `terraform-aws-modules/eks/aws` module.
- Uses EKS Auto Mode (see `nodepool_automode.tf`) to automatically select GPU/Neuron instances.
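As a rough sketch (not the project's actual `main.tf`), cluster creation with EKS Auto Mode enabled might look like the following; the cluster name, version, and network values are placeholders:

```hcl
# Illustrative only -- names, CIDRs, and versions are placeholders.
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name            = "deepseek-vpc"                     # hypothetical name
  cidr            = "10.0.0.0/16"
  azs             = ["us-west-2a", "us-west-2b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name    = "deepseek-eks"                     # hypothetical name
  cluster_version = "1.31"

  # EKS Auto Mode: EKS provisions and scales compute on its own.
  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose"]
  }

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets
}
```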
**Step 2: Configure Node Pool (`nodepool_automode.tf`)**

- Targets GPU instance families (`g5`/`g6`/`p5`) or Neuron instances (`inf2`).
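In EKS Auto Mode, custom compute is expressed as a Karpenter-style `NodePool` object, which `nodepool_automode.tf` presumably creates through Terraform. A hedged sketch for GPU nodes; the pool name, label key, and taint are assumptions:

```yaml
# Hypothetical NodePool targeting GPU instance families; swap the
# values list for ["inf2"] to target Neuron instead.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-nodepool                      # illustrative name
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com          # EKS Auto Mode's built-in NodeClass
        kind: NodeClass
        name: default
      requirements:
        - key: eks.amazonaws.com/instance-family
          operator: In
          values: ["g5", "g6", "p5"]
      taints:
        - key: nvidia.com/gpu             # keep non-GPU pods off these nodes
          effect: NoSchedule
```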
**Step 3: Deploy Helm Chart (`helm.tf`)**

- `deepseek_gpu`: runs the model on GPU nodes.
- `deepseek_neuron`: runs the model on Neuron nodes.
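With the `hashicorp/helm` provider, each release might be declared roughly as follows; the release name, namespace, and the assumption that the chart is installed from the local `vllm-chart/` directory are all illustrative:

```hcl
# Sketch of one of the two releases (deepseek_gpu shown); deepseek_neuron
# would be analogous with Neuron-specific values.
resource "helm_release" "deepseek_gpu" {
  name             = "deepseek-gpu"              # hypothetical release name
  chart            = "${path.module}/vllm-chart"
  namespace        = "deepseek"                  # hypothetical namespace
  create_namespace = true

  values = [
    file("${path.module}/vllm-chart/values.yaml")
  ]
}
```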
**Step 4: Configure vLLM Chart (`vllm-chart/`)**

- `deployment.yaml`: defines the pod that runs vLLM.
- `service.yaml`: configures the Service that routes requests to vLLM.
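The chart's knobs live in `values.yaml`. A hedged sketch of the kind of values the templates consume; the key names and model ID here are assumptions, not the chart's actual schema:

```yaml
# Illustrative values -- key names are assumed, not taken from the chart.
image:
  repository: vllm/vllm-openai                 # vLLM's OpenAI-compatible server image
  tag: latest
command:
  - vllm
  - serve
  - deepseek-ai/DeepSeek-R1-Distill-Llama-8B   # example model ID
resources:
  limits:
    nvidia.com/gpu: "1"                        # one GPU per pod (GPU variant)
service:
  type: ClusterIP
  port: 8000                                   # vLLM listens on 8000 by default
```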
**Step 5: Build UI Container (`Dockerfile`)**

- Based on `python:3.12-slim`; installs the dependencies from `requirements.txt` and starts the server with `uvicorn app:app --host 0.0.0.0 --port 7860`.
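A sketch of a Dockerfile matching that description:

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
```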
**Step 6: Deploy UI to Kubernetes (`manifests/`)**

- `deployment.yaml`: creates a pod running the UI container.
- `ingress-class.yaml`: configures Ingress with an Application Load Balancer (ALB).

The UI uses FastAPI to communicate with vLLM and Gradio for an interactive interface. Inference runs on either GPU (`g5`/`g6`/`p5`) or Neuron (`inf2`) instances.
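A condensed sketch of the two UI manifests; the image URI is a placeholder, and a matching ClusterIP `Service` named `chatbot-ui` on port 7860 is assumed:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatbot-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chatbot-ui
  template:
    metadata:
      labels:
        app: chatbot-ui
    spec:
      containers:
        - name: chatbot-ui
          image: <account>.dkr.ecr.<region>.amazonaws.com/chatbot-ui:latest  # placeholder
          ports:
            - containerPort: 7860
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: chatbot-ui
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing  # AWS Load Balancer Controller
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: chatbot-ui          # assumes a Service exposing port 7860
                port:
                  number: 7860
```

And a minimal sketch of how `app.py` could wire Gradio into FastAPI and call vLLM's OpenAI-compatible API, so that `uvicorn app:app` serves both; the in-cluster service URL and model name are placeholders, not taken from the project:

```python
import gradio as gr
import requests
from fastapi import FastAPI

# Placeholder in-cluster address of the vLLM Service.
VLLM_URL = "http://deepseek-gpu:8000/v1/completions"

def chat(prompt: str) -> str:
    # Call vLLM's OpenAI-compatible completions endpoint.
    resp = requests.post(VLLM_URL, json={
        "model": "deepseek",          # placeholder model name
        "prompt": prompt,
        "max_tokens": 512,
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

app = FastAPI()
demo = gr.Interface(fn=chat, inputs="text", outputs="text")

# Mount the Gradio UI on the FastAPI app at the root path.
app = gr.mount_gradio_app(app, demo, path="/")
```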