Set up Dify and vLLM on an AWS G4dn instance

Prepare the EC2 instance

  • Launch an AWS EC2 instance of type g4dn.xlarge (console or CLI; a CLI sketch follows)
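If you prefer the CLI to the console, a minimal sketch is below; the AMI ID, key pair name, and security group are placeholders you must replace with your own values.

# launch a g4dn.xlarge from the CLI (AMI/key/SG below are placeholders)
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name your-key \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=100}'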
# initialize the extra EBS data volume (confirm it has no filesystem yet)
sudo file -s /dev/nvme2n1
lsblk -f
sudo mkfs -t xfs /dev/nvme2n1
sudo mount /dev/nvme2n1 /mnt

# persist the mount info to `/etc/fstab`
# view the UUID
sudo blkid

# write the entry (note the key is UUID, not UID; writing to /etc/fstab needs root)
echo "UUID=xxxxx-3047-437a-81f0-xxxxx /mnt xfs defaults,nofail 0 2" | sudo tee -a /etc/fstab
  • Extend the root EBS volume
# increase the EBS volume size in the AWS console first
# extend the partition
sudo growpart /dev/nvme0n1 1
# extend the filesystem (resize2fs assumes an ext4 root; an xfs root needs xfs_growfs instead)
sudo resize2fs /dev/nvme0n1p1
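Confirm the resize took effect:

# the partition and the root filesystem should both report the new size
lsblk /dev/nvme0n1
df -h /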
  • Install docker-ce
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

sudo docker run hello-world
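Optionally, add your user to the docker group so later docker commands do not need sudo; this is a common convenience step, not part of the walkthrough itself, and takes effect after you log out and back in.

# allow the current user to run docker without sudo (re-login to apply)
sudo usermod -aG docker "$USER"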
  • Install the NVIDIA driver
sudo apt-get install -y nvidia-driver-525 nvidia-dkms-525
# view the GPU information
nvidia-smi
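The driver typically needs a reboot to load; after reconnecting, nvidia-smi should list the T4 GPU that G4dn instances carry.

# load the new driver, then confirm the T4 is visible
sudo reboot
# after reconnecting:
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv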

Enable the Docker GPU runtime

# add the NVIDIA repo to the system
distribution=ubuntu22.04 && \
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && \
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# install the NVIDIA Container Toolkit
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# configure the Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
# restart the Docker daemon
sudo systemctl restart docker
sudo docker info | grep Runtimes
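A quick end-to-end check that the runtime wiring works is to run nvidia-smi inside a CUDA base image (the tag below is assumed to be available on Docker Hub):

# the container should print the same GPU table as the host
sudo docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi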

Prepare the PyTorch environment

# pull the fully functional Docker image
docker pull pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime
# create a host path to mount into the container
mkdir -p /mnt/models
# run the container
docker run --gpus=all -it -v /mnt/models:/models -p 8000:8000 pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime bash
# inside the container: install the Hugging Face CLI
pip install "huggingface_hub[cli]"
# vLLM itself is needed for the serve step later
pip install vllm
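Inside the container you can sanity-check that PyTorch sees the GPU before downloading any weights:

# should print: True Tesla T4 (or similar)
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"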

Download the models from Hugging Face inside the container

# work under the mounted /models path so the weights land on the EBS volume
cd /models
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir=./Qwen2.5-7B-Instruct/ --cache-dir=./cache --local-dir-use-symlinks=False --resume-download
huggingface-cli download facebook/opt-125m --local-dir=./opt-125m/ --cache-dir=./cache --local-dir-use-symlinks=False --resume-download
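The 7B download is roughly 15 GB, so it is worth confirming the files landed on the mounted volume rather than the container layer:

# confirm the weights are on the /models mount
du -sh /models/Qwen2.5-7B-Instruct /models/opt-125m
ls /models/Qwen2.5-7B-Instruct   # expect config.json, tokenizer files, *.safetensors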

Launch the LLM with vLLM

# run from the container's /models folder so the model directory is on the current path
# the G4dn's T4 GPU does not support bfloat16, so pass an explicit dtype;
# --dtype=half fits the 7B model's fp16 weights (~15 GB) in the T4's 16 GB,
# while float32 would not -- if memory is still tight, also lower --max-model-len
vllm serve Qwen2.5-7B-Instruct/ --dtype=half
# the tiny opt-125m model runs fine in full float32
vllm serve opt-125m/ --dtype=float
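vLLM exposes an OpenAI-compatible API on port 8000, which the docker run above already publishes to the host. You can verify the server with curl; the "model" field must match the id vLLM registered (by default the path passed to vllm serve, so check /v1/models for the exact value):

# list the served models
curl http://localhost:8000/v1/models
# send a test chat completion
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen2.5-7B-Instruct/", "messages": [{"role": "user", "content": "Hello"}]}'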

Launch the Dify LLM platform with Docker Compose

git clone https://github.com/langgenius/dify.git
cd dify/docker
cp .env.example .env
docker compose up -d

docker compose ps
  • Access Dify at http://<your-ip>/install to create the admin account
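To wire Dify to the vLLM server, add it in Dify's model-provider settings as an OpenAI-API-compatible endpoint. Because Dify runs inside Docker, localhost there points at the Dify container, so use the host's IP in the base URL; a sketch of finding it:

# find the host's private IP to use as the vLLM base URL in Dify,
# e.g. http://<this-ip>:8000/v1
hostname -I | awk '{print $1}'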
