Template Configuration Basics
This documentation provides an overview of a Release AI template configuration, focusing on how to use parameters as user inputs, select GPU resources, and run specific jobs such as pulling models in Ollama and running data ingestion for vector databases.
Configuration Overview
The configuration file defines various services, resources, and workflows necessary to set up and manage an AI environment. Here’s a breakdown of the key components:
Hostnames
The hostnames section specifies the domain naming convention for different services based on the environment ID (env_id) and the domain (domain). Note that hostnames require a service with node ports.
hostnames:
- api: api-${env_id}-genai.${domain}
- bot: bot-${env_id}-genai.${domain}
- front-end: front-end-${env_id}-genai.${domain}
- loader: loader-${env_id}-genai.${domain}
- pdf-bot: pdf-bot-${env_id}-genai.${domain}
- ollama: ollama-${env_id}-genai.${domain}
Default Resources
Specifies the default CPU and memory limits and requests, as well as the number of replicas for scalability. These defaults apply to all services and jobs defined in your template, and you can override them on a per-service or per-job basis (see the sketch after the example below).
resources:
  cpu:
    limits: 1000m
    requests: 100m
  memory:
    limits: 1Gi
    requests: 100Mi
  replicas: 1
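For example, a single service can declare its own cpu, memory, or replicas values, and those take precedence over these defaults for that service only. A minimal sketch, assuming a hypothetical service named example-worker:

services:
- name: example-worker
  image: example/worker
  cpu:
    limits: 2000m
    requests: 500m
  memory:
    limits: 4Gi
    requests: 512Mi
  replicas: 2

The Ollama service shown later in this document overrides its defaults in exactly this way.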
Ingress
Configures the ingress controller to manage incoming HTTP requests for uploading documents.
ingress:
  proxy_body_size: 30m
  proxy_buffer_size: 64k
  proxy_buffering: true
  proxy_buffers_number: 4
  proxy_max_temp_file_size: 1024m
  proxy_read_timeout: '180'
  proxy_send_timeout: '180'
Shared Volumes
Defines shared volumes for persistent data storage.
shared_volumes:
- name: models
  size: 40Gi
  type: persistent
Parameters
Parameters are user inputs that can be dynamically configured during deployment.
parameters:
- name: llm
  type: string
  description: Can be any Ollama model tag, or gpt-4 or gpt-3.5 or claudev2
  default: llama2
- name: embedding_model
  type: string
  description: Can be sentence_transformer, openai, aws, ollama or google-genai-embedding-001
  default: sentence_transformer
Parameters allow for dynamic user input at deployment time, so the environment can be tailored to specific requirements without modifying the core configuration file. For instance, the llm and embedding_model parameters can be set to different values depending on the desired models and embeddings.
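Parameter values are consumed elsewhere in the template through ${parameters.<name>} interpolation, as the Ollama service in the next section does. A minimal sketch, assuming a hypothetical service name:

services:
- name: example-service
  image: example/image
  envs:
  - key: LLM
    value: "${parameters.llm}"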
Services
Defines the different services that make up the AI stack, including their configurations, environment variables, and resource requests. The example Ollama service configuration includes a persistent volume and mount point to store downloaded models. This ensures that models are retained even if the service is restarted.
services:
- name: ollama
  image: ollama/ollama
  command:
  - "/bin/ollama"
  args:
  - serve
  envs:
  - key: LLM
    value: "${parameters.llm}"
  - key: EMBEDDING_MODEL
    value: "${parameters.embedding_model}"
  ports:
  - type: node_port
    target_port: '11434'
    port: '11434'
    loadbalancer: false
  memory:
    limits: 16Gi
    requests: 6Gi
  cpu:
    limits: 4
    requests: 1
  replicas: 1
  node_selector:
  - key: nvidia.com/gpu
    value: 'true'
  volumes:
  - name: shmem
    type: shmem
    size: 16Gi
    mount_path: "/dev/shm"
  - claim: models
    mount_path: "/models"
GPU Resources
To select GPU resources, use the node_selector field to schedule a service onto nodes with GPU capabilities:
services:
- name: dummy-service
  image: dummy/image
  node_selector:
  - key: nvidia.com/gpu
    value: 'true'
Running Jobs
Jobs are defined to perform specific tasks, such as pulling models or ingesting data. An example job configuration for pulling models with Ollama:
jobs:
- name: pull-model
  image: releaseai/genai-stack-pull-model
  command:
  - bb
  - "-f"
  - pull_model.clj
  envs:
  - key: OLLAMA_BASE_URL
    value: http://ollama:11434
  - key: LLM
    value: "${parameters.llm}"
Workflows
Workflows orchestrate the sequence of tasks to be executed, ensuring dependencies are respected. The setup
, patch
, and teardown
workflows are used for environment lifecycle. You can also define custom workflows which can be kicked off via the CLI or manually via the user interface.
Example setup workflow:
workflows:
- name: setup
  parallelize:
  - step: services-parallel
    wait_for_finish: false
    tasks:
    - services.front-end
  - step: services-0
    tasks:
    - services.database
    - services.ollama
  - step: services-1
    tasks:
    - jobs.pull-model
  - step: services-2
    tasks:
    - services.api
    - services.bot
    - services.loader
    - services.pdf-bot
Example custom workflow:
workflows:
- name: rag-ingest
  parallelize:
  - step: rag-ingest
    tasks:
    - jobs.rag-ingest