Instant datasets on AWS

How to use Instant Datasets in Release on AWS

Access to data can be a challenge when developing software.

Seed data allows you to generate a simple and consistent dataset, but such a dataset is unlikely to be reflective of what your application will be accumulating in production. As your application changes, access to data becomes yet another problem to manage.

A better solution is to keep a pool of data resources in various forms, such as a database, a cache, or search infrastructure, ready to be used immediately by an environment for the duration of the environment's life.

This is the solution Instant Datasets provide. With minimal setup, you can have production-replica data available to any of your environments, instantly, regardless of dataset size.

Release currently supports the Instant Datasets feature for databases on Google Cloud and AWS.

Why use Instant Datasets with Release?

At Release, we've used Amazon RDS to power our production application from the beginning. Amazon RDS automatically takes daily snapshots of your application databases and provides them to you in a readily accessible manner.

One of the major drawbacks to using RDS for environments other than production is the long spin-up time for any particular database, making it less than ideal for ephemeral or staging environments.

Instant Datasets solve this problem, while also providing your staging environments with production-like data.

An Instant Dataset is a collection of databases that are ready to be used by an ephemeral environment. These databases are based on the production snapshots created for you by Amazon RDS. Each time an environment that requires an Instant Dataset is created, another database is generated from the production snapshot so that you never run out of databases or have to wait for one to spin up. This process allows access to production-like data instantly for any environment you create.

Set up an Instant Dataset in Release

In order to create Instant Datasets in Release, you'll need a snapshot to restore from. Ensure you have automatic snapshots set up for your Amazon RDS database. Take a look at the Amazon RDS document Working with backups to set up automatic snapshots.
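As a quick pre-check, Amazon RDS treats a backup retention period of 0 as "automated backups disabled". The sketch below assumes you already have a response in the shape returned by the RDS `DescribeDBInstances` API and lists the instances that would have no automatic snapshots to restore from. The helper name `instances_without_backups` and the sample data are illustrative only and are not part of Release.

```python
def instances_without_backups(response):
    """Return identifiers of DB instances whose automated backups are disabled.

    `response` is expected to look like an RDS DescribeDBInstances result:
    {"DBInstances": [{"DBInstanceIdentifier": ..., "BackupRetentionPeriod": ...}]}
    A retention period of 0 means automated backups are turned off.
    """
    return [
        db["DBInstanceIdentifier"]
        for db in response.get("DBInstances", [])
        if db.get("BackupRetentionPeriod", 0) == 0
    ]


# Hypothetical sample data in the shape of a DescribeDBInstances response.
sample_response = {
    "DBInstances": [
        {"DBInstanceIdentifier": "prod-db", "BackupRetentionPeriod": 7},
        {"DBInstanceIdentifier": "scratch-db", "BackupRetentionPeriod": 0},
    ]
}

print(instances_without_backups(sample_response))
```

With real data, the response could come from `boto3.client("rds").describe_db_instances()`, assuming boto3 and AWS credentials are available in your setup.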

Each Instant Dataset is limited to a single account.

Create an Instant Dataset

Log on to Release and click on Configurations. Open the Datasets tab under the "Deploy" heading on the sidebar to go to the Datasets screen. Click the Create Dataset button to create a new Instant Dataset.

Complete the fields as follows:

Dataset Name: Give your dataset a name. You can choose any name that will help you remember what this dataset contains.
Cluster: Instant Datasets must be assigned to a cluster. You can select from a list of created clusters in the dropdown. If you have not created a cluster yet, read the clusters documentation to learn how to create one.
Select a Snapshot: These are the snapshots that are available to, and accessible by, the cluster you selected.
Instance Type: This refers to the database instance in Amazon RDS. Refer to this document for additional information.
Database Name: This is the name of the database that will be created in the Instant Dataset. This is the name an application will use to connect to the underlying database in the Instant Dataset.
User Password: This is the password an application will use to connect to the underlying database in the Instant Dataset.
Number of available Instant Databases: This is the target number of available databases in the Instant Dataset pool that are free to check out at any time. Each time an application environment is created that uses one of the instances, the system starts the creation of another database in the set. The ideal size will be the number of "spare" instances you need to keep up with environment creation. For instance, if you create two environments within a few minutes of each other before the pool is replenished, you would likely want at least two instances available at any given time.

We generally recommend starting with two available Instant Databases if you are unsure. You can increase this number later if you need to. We do not recommend setting the pool size to one, unless you are sure that you will not run out of new database instances for a new environment while a replacement is being generated.

If you ever find that environments are delayed or fail to be created during Instant Dataset checkouts, you will need to increase the number of available Instant Databases. If you find that you have a lot of database instances sitting idle all day, you can reduce the number of available Instant Databases, but take care not to make the number too small.
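The sizing advice above can be turned into a rough rule of thumb. This is only a back-of-the-envelope sketch, not a formula used by Release: it assumes you can estimate how many environments you create per hour and how long a replacement database takes to become available, and it keeps the suggested minimum of two. The function name and parameters are hypothetical.

```python
import math


def suggested_pool_size(envs_per_hour, replenish_minutes, minimum=2):
    """Estimate how many spare databases cover the environments created
    while a replacement database is still being restored."""
    expected_checkouts = envs_per_hour * replenish_minutes / 60
    return max(minimum, math.ceil(expected_checkouts))


# One environment per hour, replacements ready in 30 minutes.
print(suggested_pool_size(envs_per_hour=1, replenish_minutes=30))  # 2

# Six environments per hour, replacements ready in 45 minutes.
print(suggested_pool_size(envs_per_hour=6, replenish_minutes=45))  # 5
```

Treat the result as a starting point and adjust it based on whether environments are delayed or databases sit idle.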

Click Create to begin the process of initializing the database instances. This process could take a few minutes to a few hours to complete, depending on the size of the database.

Amazon RDS takes 5-6 minutes to create the database. Once the database is created, it takes some time to restore the data from the snapshot, depending on the size of the snapshot.

Once the dataset is ready to be used, it will transition to an Active state and environments can use it when they are deployed.

Click on a dataset to see more details about it.

Set up your application to use Instant Datasets

You will need to modify the Application Template and environment configuration for environments that will use Instant Datasets.

If an Instant Dataset was not set up to work with every ephemeral environment through its default configuration, you can add it explicitly to an environment you create. The syntax is the same whether Release automatically creates the environment or you do it manually.

In the Application Template, add a datasets line. Give it a name property, with the name of your Instant Dataset as the value:

environment_templates:
- name: ephemeral
  datasets:
  - name: release-dataset

In the environment configuration, create a mapping using the "Generated Environment Variables" section in the Instant Dataset details page.

The generated environment variables will be prefixed with the dataset name in uppercase and with underscores instead of any non-alphanumeric characters (example environment variable prefix: RELEASE_DATASET). Here is an example of how you can map the generated environment variables to your application's environment variables:

---
mapping:
  DATABASE_HOST: RELEASE_DATASET_RDS_DB_POOL_HOST
  DATABASE_PASSWORD: RELEASE_DATASET_RDS_DB_POOL_PASS
  DATABASE_USER: RELEASE_DATASET_RDS_DB_POOL_USER

Your environment variables are on the left side of the colon, and the generated environment variables are on the right.
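To make the naming rule concrete, here is a minimal sketch of how a prefix could be derived from a dataset name and how a mapping resolves to application variables. It assumes each non-alphanumeric character becomes a single underscore; the helper names and the sample values are hypothetical, and the real generated variables are listed on the Instant Dataset details page.

```python
import re


def dataset_env_prefix(dataset_name):
    """Uppercase the name and replace each non-alphanumeric character with "_"."""
    return re.sub(r"[^A-Za-z0-9]", "_", dataset_name).upper()


def resolve_mapping(mapping, generated_env):
    """Build application variables from {app_variable: generated_variable} pairs."""
    return {
        app_var: generated_env[generated_var]
        for app_var, generated_var in mapping.items()
    }


prefix = dataset_env_prefix("release-dataset")

# Hypothetical generated values, for illustration only.
generated_env = {
    f"{prefix}_RDS_DB_POOL_HOST": "db.example.internal",
    f"{prefix}_RDS_DB_POOL_USER": "app",
    f"{prefix}_RDS_DB_POOL_PASS": "example-password",
}

# Same shape as the mapping above: application variable -> generated variable.
mapping = {
    "DATABASE_HOST": f"{prefix}_RDS_DB_POOL_HOST",
    "DATABASE_PASSWORD": f"{prefix}_RDS_DB_POOL_PASS",
    "DATABASE_USER": f"{prefix}_RDS_DB_POOL_USER",
}

app_env = resolve_mapping(mapping, generated_env)
print(prefix)
print(app_env["DATABASE_HOST"])
```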

Notice: These changes will only be propagated to newly created environments. You cannot apply a new Instant Dataset configuration retroactively to an existing environment. We are working on fixes to update this experience in the future.

Deploy your application to use your Instant Dataset

Your application is now ready to use your Instant Dataset!

Whenever you deploy an ephemeral environment, it will check out one of the available (active) databases to be used while the environment exists.
The Instant Dataset will then create an additional database to maintain the available database count.
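The checkout-and-replenish behavior can be pictured with a toy model. This is a simplified illustration under stated assumptions, not Release's implementation: the class, the database names, and the `finish_creating` step are all hypothetical stand-ins for work that Release and Amazon RDS do for you.

```python
class DatasetPool:
    """Toy model of an Instant Dataset pool with a target number of spares."""

    def __init__(self, target_available):
        self.target_available = target_available
        self.available = [f"db-{i}" for i in range(1, target_available + 1)]
        self.creating = []  # replacements still being restored from the snapshot
        self.in_use = {}    # environment name -> database
        self._next_id = target_available + 1

    def checkout(self, environment):
        """Hand an available database to an environment and start a replacement."""
        if not self.available:
            raise RuntimeError("no database available; pool is still replenishing")
        database = self.available.pop(0)
        self.in_use[environment] = database
        # Start a replacement so the pool returns to its target size.
        self.creating.append(f"db-{self._next_id}")
        self._next_id += 1
        return database

    def finish_creating(self):
        """Mark all in-progress replacements as ready to check out."""
        self.available.extend(self.creating)
        self.creating = []


pool = DatasetPool(target_available=2)
first = pool.checkout("env-a")
second = pool.checkout("env-b")
# A third checkout at this point would fail until the replacements are ready.
pool.finish_creating()
third = pool.checkout("env-c")
print(first, second, third)
```

The model also shows why a pool that is too small can delay environments: when checkouts arrive faster than replacements finish, nothing is available.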

Modify an existing Instant Dataset

Change the target number of available databases

To change the target number of available databases, go to the Datasets tab in account settings and click on the dataset that you want to modify. Click the Edit button under the Actions dropdown to modify the dataset.

Reducing the number of available databases will not destroy any databases currently in use by an environment.

Change the Amazon RDS instance type

Changing the instance type will destroy ALL standby databases in the Instant Dataset. You might not be able to create a new environment during this window, because there will be no database instances available to check out. You should only perform this step during off-hours, when you are unlikely to create new environments.

To change the instance type for an Instant Dataset, go to the Datasets tab in account settings and click on the dataset that you want to modify. Click the Edit button under the Actions dropdown to modify the dataset.

Advanced Modifications

Expanding the Advanced Settings drop-down allows you to change:

Which snapshot to use for the next refresh
Which snapshot refresh strategy to use for the next refresh
How many days, if any, to delay the snapshot selection. This only applies when Latest is the selected strategy.

The default snapshot refresh strategy is Latest.

By default there is no snapshot delay.

The snapshot refresh strategy can be set to one of the following:

Latest: use the most recent available snapshot, taking any delay into account, and apply it during refresh.
Static: use the currently selected snapshot and do not update it during refresh.
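The two strategies can be sketched as a small selection function. This is an illustration only: it assumes the delay means "choose the most recent snapshot that is at least that many days old", which is an interpretation and not confirmed behavior, and the function name and snapshot identifiers are hypothetical.

```python
from datetime import datetime, timedelta


def select_snapshot(snapshots, strategy, current, delay_days=0, now=None):
    """Pick the snapshot to use for the next refresh.

    snapshots: list of (snapshot_id, created_at) tuples.
    strategy: "Latest" or "Static".
    current: the currently selected snapshot id.
    """
    if strategy == "Static":
        return current
    if strategy != "Latest":
        raise ValueError(f"unknown strategy: {strategy}")
    now = now or datetime.now()
    cutoff = now - timedelta(days=delay_days)
    eligible = [snapshot for snapshot in snapshots if snapshot[1] <= cutoff]
    if not eligible:
        # Assumption: keep the current snapshot when nothing is old enough.
        return current
    return max(eligible, key=lambda snapshot: snapshot[1])[0]


# Hypothetical daily snapshots.
snapshots = [
    ("snap-jan-08", datetime(2024, 1, 8, 3, 0)),
    ("snap-jan-09", datetime(2024, 1, 9, 3, 0)),
    ("snap-jan-10", datetime(2024, 1, 10, 3, 0)),
]
now = datetime(2024, 1, 10, 12, 0)

print(select_snapshot(snapshots, "Latest", "snap-jan-08", now=now))
print(select_snapshot(snapshots, "Latest", "snap-jan-08", delay_days=1, now=now))
print(select_snapshot(snapshots, "Static", "snap-jan-08", now=now))
```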

Change the Instant Dataset master password

On the Edit Dataset page, you can change the master password for the Instant Dataset. This will change the password for all databases in the Instant Dataset.
