Fine Tuning LlamaX
Using Release.ai to fine-tune Llama-based models.
Training and tuning in-house models is a great way to improve LLM performance and results without resorting to techniques that slow down inference and increase its cost, such as RAG and context stuffing.
Creating a fine-tuning application
From the Applications page, select Create Application, then Create from Template in the drop-down. Select Runnable, as this is a job that will run and complete. For this example we'll select Axolotl, as it's the simplest to get working. Click Finish.
Configuring your fine-tuning application
To allow your job to save models to S3, you must link the application template to your S3 bucket. First you'll need to find your bucket name, which is your cluster's name, plus its Release ID, plus static-builds, separated by dashes.
To get the cluster's Release ID, click on the cluster you want to use under Configuration > Clusters and copy the last part of the URL. For our example cluster named release-ai with ID 0ezcy7w, the bucket name is release-ai-0ezcy7w-static-builds.
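The naming convention above can be sketched as a one-liner (bucket_name is a hypothetical helper for illustration, not part of Release):

```python
def bucket_name(cluster_name: str, release_id: str) -> str:
    """Assemble the static-builds bucket name: <cluster>-<release-id>-static-builds."""
    return f"{cluster_name}-{release_id}-static-builds"

# The example cluster from above:
print(bucket_name("release-ai", "0ezcy7w"))
# release-ai-0ezcy7w-static-builds
```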
At the top level of your application template, define the s3_volumes.
s3_volumes:
- bucket: release-ai-0ezcy7w-static-builds
Then, under the axolotl service, add the S3 mount to the volumes array.
- type: s3
bucket: release-ai-0ezcy7w-static-builds
mount_path: "/bucket"
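Putting both snippets together, the relevant parts of the template look roughly like this sketch (the services layout is an assumption and may differ in your template; the bucket name comes from the example above):

```yaml
# Top-level S3 volume declaration
s3_volumes:
- bucket: release-ai-0ezcy7w-static-builds

# Assumed service layout; adjust to match your template
services:
- name: axolotl
  volumes:
  - type: s3
    bucket: release-ai-0ezcy7w-static-builds
    mount_path: "/bucket"
```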
Running the job
Now click the Run Job button. Model location or name can be either a HuggingFace model ID or an S3 location of a model. For this example we'll use meta-llama/Meta-Llama-3-8B. Tuning dataset location should be an S3 path to a .json file containing tuning data, with one JSON object per line in the following format:
{"input": "...","output": "..."}
A few examples from our tuning data:
{"input": "How are Release environments built?", "output": "Release environments are built from an Application Template using default environment variables. The Application Template defines high-level concepts like the Kubernetes cluster, domain, hostnames, and resources allocated to your services."}
{"input": "How do developers interact with Release?", "output": "Developers generally interact with Release by creating a pull request in their VCS (version control system), which automatically creates an environment with the code changes in the PR. Webhooks are calls made from your VCS back to Release when certain events happen."}
{"input": "How does Release eliminate time spent managing environments?", "output": "With Release, all of the management and tooling required to build a flexible environment ecosystem comes out of the box. You can create an application template using default environment variables."}
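Before uploading your dataset, it can be worth sanity-checking that every line matches the format above. A minimal sketch (check_tuning_lines is a hypothetical helper, not part of Release or Axolotl):

```python
import json

def check_tuning_lines(lines):
    """Parse JSON-lines tuning data, requiring 'input' and 'output' keys per record."""
    records = []
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # raises on malformed JSON
        missing = {"input", "output"} - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
        records.append(record)
    return records

sample = ['{"input": "How are Release environments built?", "output": "From an Application Template."}']
print(len(check_tuning_lines(sample)))
# 1
```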
You can also use datasets from HuggingFace such as mhenrichsen/alpaca_2k_test.
For this example we'll leave the model type as llama3 and click the Run Job button.
Once the job has finished, a folder named axolotl-out-YYYY-MM-DD will be created in your S3 bucket.
You can now load this folder into your favorite inference server and start sending it queries.