Setting up your Modular LLM backend server on VALDI

Ollama is a fantastic piece of software that lets you get open-source LLM models up and running quickly. This repository is the fastest way to chat with multiple LLMs, generate images using Stable Diffusion, and perform VLM analysis. The code deploys a Flask server on the hardware of your choice.

Video walkthrough

Usage

  1. Complete all the prerequisite steps.
  2. Run the program by executing python3 app.py.
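For reference, a minimal launch sequence looks like the following. This is a sketch that assumes app.py lives in the ./app directory alongside config.json and requirements.txt; adjust the path if your checkout is laid out differently.

# From the repository root (path is an assumption; adjust to wherever app.py lives)
cd ./app
python3 app.py

# The server should then be reachable on port 5000, the port opened via ufw in the prerequisites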

Interact with the server (UI)

The OllamaAPI can now run directly on VALDI by leveraging an internal proxy server to host the frontend. Alternatively, you can run the code as a backend-only server. Refer to the guide below for instructions on running the OllamaAPI on VALDI.
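When running in backend-only mode, you interact with the Flask server over HTTP on port 5000. The request below is a rough sketch only: the host placeholder and the /api/chat route are assumptions, so check app.py for the actual routes and payload fields before using it.

# Hypothetical request: replace <VALDI_HOST> with your machine's address and
# /api/chat with a real route defined in app.py
curl -X POST http://<VALDI_HOST>:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello from VALDI"}'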

Hardware specs

Ensure that you have a VALDI machine with the following minimum hardware specifications:

  1. Ubuntu OS
  2. 32 GB of RAM
  3. 6 vCPUs
  4. 50 GB of Storage
  5. NVIDIA GPU
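If you want to double-check a machine against these specs before installing anything, standard Linux tooling is enough. A quick sketch follows; note that nvidia-smi will only report the GPU once the NVIDIA drivers from the prerequisites are installed.

lsb_release -a   # Ubuntu release
nproc            # vCPU count (expect 6 or more)
free -h          # total RAM (expect 32 GB or more)
df -h /          # available storage (expect 50 GB or more)
nvidia-smi       # NVIDIA GPU and driver status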

Prerequisites

  1. To run Ollama + Stable Diffusion, you must create a read-only HuggingFace API key. Here's how.
  2. Once you've generated an API key, add it to the configuration file: ./app/config.json
  3. Finally, install the necessary dependencies and requirements:
# Update your machine (Linux Only)
sudo apt-get update

# Install pip
sudo apt-get install python3-pip 

# Navigate to the directory containing requirements.txt
cd ./app

# Run the pip install command
pip install -r requirements.txt

# Enable port 5000 (ufw)
sudo ufw allow 5000
sudo ufw status

# Install/update the NVIDIA CUDA drivers
sudo apt-get install -y cuda-drivers
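Once the commands above finish, a couple of quick checks help confirm the machine is ready. This is a sketch assuming a typical Ubuntu + NVIDIA setup; a reboot is usually needed before newly installed drivers are picked up.

# Reboot so the freshly installed NVIDIA/CUDA drivers are loaded
sudo reboot

# After the machine is back up, confirm the GPU and driver are visible
nvidia-smi

# Confirm the firewall rule for the Flask port is active
sudo ufw status | grep 5000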