Fine Tuning LlamaX
Using Release.ai to fine-tune Llama-based models.
Training and tuning in-house models is a great way to improve LLM performance and results without resorting to techniques that slow down inference and increase its cost, such as RAG and context stuffing.
Creating a fine-tuning application
From the Applications page, select Create Application, then Create from Template in the drop-down. Select Runnable, as this is a job that will run and complete. For this example we'll select Axolotl, as it's the simplest to get working. Click Finish.
Configuring your fine-tuning application
To allow your job to save models to S3, you must link the application template to your S3 bucket. First you'll need to find your bucket name, which is your cluster's name, plus its Release ID, plus static-builds, separated by dashes.
To get the cluster's Release ID, click on the cluster you want to use under Configuration > Clusters and copy the last part of the URL. For our example cluster named release-ai with ID 0ezcy7w, the bucket name is release-ai-0ezcy7w-static-builds.
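The naming convention above can be sketched as a one-liner (bucket_name is a hypothetical helper for illustration, not part of Release):

```python
def bucket_name(cluster_name: str, release_id: str) -> str:
    """Assemble the static-builds bucket name: <cluster>-<release-id>-static-builds."""
    return f"{cluster_name}-{release_id}-static-builds"

# The example cluster from above:
print(bucket_name("release-ai", "0ezcy7w"))
# release-ai-0ezcy7w-static-builds
```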
At the top level of your application template, define the s3_volumes.
s3_volumes:
- bucket: release-ai-0ezcy7w-static-builds
Then, under the axolotl service, add the S3 mount to the volumes array.
- type: s3
bucket: release-ai-0ezcy7w-static-builds
mount_path: "/bucket"
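Putting both snippets together, the relevant parts of the template look roughly like this sketch (the services layout is an assumption and may differ in your template; the bucket name comes from the example above):

```yaml
# Top-level S3 volume declaration
s3_volumes:
- bucket: release-ai-0ezcy7w-static-builds

# Assumed service layout; adjust to match your template
services:
- name: axolotl
  volumes:
  - type: s3
    bucket: release-ai-0ezcy7w-static-builds
    mount_path: "/bucket"
```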
Running the job
Now click the Run Job button. Model location or name can be either a HuggingFace model ID or an S3 location of a model. For this example we'll use meta-llama/Meta-Llama-3-8B. Tuning dataset location should be an S3 path to a .json file containing tuning data, with one JSON object per line in the following format:
{"input": "...","output": "..."}
A few examples from our tuning data:
{"input": "How are Release environments built?", "output": "Release environments are built from an Application Template using default environment variables. The Application Template defines high-level concepts like the Kubernetes cluster, domain, hostnames, and resources allocated to your services."}
{"input": "How do developers interact with Release?", "output": "Developers generally interact with Release by creating a pull request in their VCS (version control system), which automatically creates an environment with the code changes in the PR. Webhooks are calls made from your VCS back to Release when certain events happen."}
{"input": "How does Release eliminate time spent managing environments?", "output": "With Release, all of the management and tooling required to build a flexible environment ecosystem comes out of the box. You can create an application template using default environment variables."}
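Before uploading your dataset, it can be worth sanity-checking that every line matches the format above. A minimal sketch (check_tuning_lines is a hypothetical helper, not part of Release or Axolotl):

```python
import json

def check_tuning_lines(lines):
    """Parse JSON-lines tuning data, requiring 'input' and 'output' keys per record."""
    records = []
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # raises on malformed JSON
        missing = {"input", "output"} - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
        records.append(record)
    return records

sample = ['{"input": "How are Release environments built?", "output": "From an Application Template."}']
print(len(check_tuning_lines(sample)))
# 1
```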
You can also use datasets from HuggingFace such as mhenrichsen/alpaca_2k_test.
For this example we'll leave the model type as llama3 and click the Run Job button.
Once the job has finished, a folder named axolotl-out-YYYY-MM-DD will be created in your S3 bucket.
You can now load this folder into your favorite inference server and start sending it queries.