Template Configuration Basics

This documentation provides an overview of a Release AI template configuration, focusing on how to use parameters as user inputs, select GPU resources, and run specific jobs, such as pulling models in Ollama and running data ingestion for a vector database.

Configuration Overview

The configuration file defines various services, resources, and workflows necessary to set up and manage an AI environment. Here’s a breakdown of the key components:

Hostnames

The hostnames section specifies the domain naming convention for each service, based on the environment ID (env_id) and the domain (domain). Note that a hostname requires a corresponding service that exposes a node_port.

hostnames:
- api: api-${env_id}-genai.${domain}
- bot: bot-${env_id}-genai.${domain}
- front-end: front-end-${env_id}-genai.${domain}
- loader: loader-${env_id}-genai.${domain}
- pdf-bot: pdf-bot-${env_id}-genai.${domain}
- ollama: ollama-${env_id}-genai.${domain}

Default Resources

Specifies the default CPU and memory limits and requests, as well as the number of replicas for scalability. These defaults apply to every service and job defined in your template and can be overridden on a per-service or per-job basis, as shown in the sketch after this block.

resources:
  cpu:
    limits: 1000m
    requests: 100m
  memory:
    limits: 1Gi
    requests: 100Mi
  replicas: 1
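
For example, a minimal sketch of a per-service override (the api service name is taken from this template, but the values are illustrative and only the overridden keys are shown):

services:
- name: api
  cpu:
    limits: 2000m
    requests: 500m
  memory:
    limits: 2Gi
    requests: 512Mi
  replicas: 2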

Ingress

Configures the ingress controller for incoming HTTP requests; the increased body size and extended timeouts accommodate document uploads.

ingress:
  proxy_body_size: 30m
  proxy_buffer_size: 64k
  proxy_buffering: true
  proxy_buffers_number: 4
  proxy_max_temp_file_size: 1024m
  proxy_read_timeout: '180'
  proxy_send_timeout: '180'
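
In this example, proxy_body_size: 30m raises the maximum request body to roughly 30 MB to accommodate larger document uploads, and the 180-second read and send timeouts give slower, model-backed responses time to complete.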

Shared Volumes

Defines shared storage volumes for persistent data. A service mounts a shared volume by referencing its name as a claim, as the Ollama service does below.

shared_volumes:
- name: models
  size: 40Gi
  type: persistent

Parameters

Parameters are user inputs that can be dynamically configured during deployment.

parameters:
- name: llm
  type: string
  description: Can be any Ollama model tag, or gpt-4 or gpt-3.5 or claudev2
  default: llama2
- name: embedding_model
  type: string
  description: Can be sentence_transformer, openai, aws, ollama or google-genai-embedding-001
  default: sentence_transformer

Because parameters are resolved at deployment time, the environment can be tailored to specific requirements without modifying the core configuration file. For instance, the llm and embedding_model parameters can be set to different models and embedding providers as needed.
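
Elsewhere in the template, a parameter is referenced with the ${parameters.<name>} syntax. For example, the Ollama service below sets an environment variable from the llm parameter:

envs:
- key: LLM
  value: "${parameters.llm}"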

Services

Defines the services that make up the AI stack, including their configurations, environment variables, and resource requests. The example Ollama service below includes a persistent volume and mount point for storing downloaded models, so models are retained even if the service is restarted.

services:
- name: ollama
  image: ollama/ollama
  command:
  - "/bin/ollama"
  args:
  - serve
  envs:
  - key: LLM
    value: "${parameters.llm}"
  - key: EMBEDDING_MODEL
    value: "${parameters.embedding_model}"
  ports:
  - type: node_port
    target_port: '11434'
    port: '11434'
    loadbalancer: false
  memory:
    limits: 16Gi
    requests: 6Gi
  cpu:
    limits: 4
    requests: 1
  replicas: 1
  node_selector:
  - key: nvidia.com/gpu
    value: 'true'
  volumes:
  - name: shmem
    type: shmem
    size: 16Gi
    mount_path: "/dev/shm"
  - claim: models
    mount_path: "/models"

GPU Resources

To schedule a service on GPU hardware, use the node_selector field to target nodes with GPU capabilities; the key and value must match the labels applied to your cluster's GPU nodes:

services:
- name: dummy-service
  image: dummy/image
  node_selector:
  - key: nvidia.com/gpu
    value: 'true'

Running Jobs

Jobs perform specific one-off tasks, such as pulling models or ingesting data. An example job configuration that pulls a model with Ollama:

jobs:
- name: pull-model
  image: releaseai/genai-stack-pull-model
  command:
  - bb
  - "-f"
  - pull_model.clj
  envs:
  - key: OLLAMA_BASE_URL
    value: http://ollama:11434
  - key: LLM
    value: "${parameters.llm}"

Workflows

Workflows orchestrate the sequence of tasks to be executed, ensuring dependencies are respected. The setup, patch, and teardown workflows manage the environment lifecycle. You can also define custom workflows, which can be triggered via the CLI or manually through the user interface.

Example setup workflow:

workflows:
- name: setup
  parallelize:
  - step: services-parallel
    wait_for_finish: false
    tasks:
    - services.front-end
  - step: services-0
    tasks:
    - services.database
    - services.ollama
  - step: services-1
    tasks:
    - jobs.pull-model
  - step: services-2
    tasks:
    - services.api
    - services.bot
    - services.loader
    - services.pdf-bot
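
In this workflow, steps run in order and the tasks within a step run together, so jobs.pull-model does not start until the database and ollama services are up; wait_for_finish: false on the services-parallel step indicates the workflow proceeds without blocking on front-end.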

Example custom workflow:

workflows:
- name: rag-ingest
  parallelize:
  - step: rag-ingest
    tasks:
    - jobs.rag-ingest
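
This workflow assumes a jobs.rag-ingest entry exists in the jobs section; a hypothetical definition is sketched under Running Jobs above.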
