Instant datasets on GCP

How to use Instant Datasets in Release on GCP

Access to data can be a challenge when developing software.

Seed data allows you to generate a simple and consistent dataset, but such a dataset is unlikely to be reflective of what your application will be accumulating in production. As your application changes, access to data becomes yet another problem to manage.

A better solution would be to have a pool of data resources in various forms, such as a database, a cache, or search infrastructure, ready to be used immediately by an environment for the duration of the environment's life.

This is the solution Instant Datasets provide. With minimal setup, you can have production-replica data available to any of your environments, instantly, regardless of dataset size.

Release currently supports the Instant Datasets feature for databases on Google Cloud and AWS.

Set up an Instant Dataset in Release

To create Instant Datasets in Release, you'll need a snapshot to restore from. Ensure you have automatic snapshots set up for your Cloud SQL database. See the Google Cloud SQL document "About Cloud SQL backups" to set up automatic snapshots.

Each Instant Dataset is limited to a single account.

Create an Instant Dataset

Log on to Release and click on Configurations. Open the Datasets tab under the "Deploy" heading on the sidebar to go to the Datasets screen. Click the Create Dataset button to create a new Instant Dataset.

Complete the fields as follows:

Dataset Name: Give your dataset a name. You can choose any name that will help you remember what this dataset contains.
Cluster: Instant Datasets must be assigned to a cluster. You can select from a list of created clusters in the dropdown. If you have not created a cluster yet, read the clusters documentation to learn how to create one.
Select a Snapshot: These are the snapshots available and accessible by the cluster you selected.
Instance Type: This refers to the database instance type in GCP Cloud SQL. Refer to this document for additional information.
Database Name: This is the name of the database that will be created in the Instant Dataset. This is the name an application will use to connect to the underlying database in the Instant Dataset.
User Password: This is the password an application will use to connect to the underlying database in the Instant Dataset.
Number of available Instant Databases: This is the target number of available databases in the Instant Dataset pool that are free to check out at any time. Each time an application environment is created that uses one of the instances, the system starts creating another database in the set. The ideal size is the number of "spare" instances you need to keep up with environment creation. For instance, if you create two environments within a few minutes of each other, before the pool is replenished, you would likely want at least two instances available at any given time.

We generally recommend starting with two available Instant Databases if you are unsure. You can increase this number later if you need to. We do not recommend setting the pool size to one unless you are sure that you will not run out of database instances for a new environment while a replacement is being generated.

If you ever find that environments are delayed or fail to be created during Instant Dataset checkouts, you will need to increase the number of available Instant Databases. If you find that you have a lot of database instances sitting idle all day, you can reduce the number of available Instant Databases, but take care not to make the number too small.
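The sizing trade-off above can be sketched with a small simulation. This is only an illustrative model, not how Release implements the pool: the function name simulate_pool, the fixed replenish_minutes value, and the idea that a request simply "misses" when the pool is empty are all assumptions made for the example.

```python
def simulate_pool(target_size, replenish_minutes, request_times):
    """Return the request times (in minutes) that found the pool empty.

    Hypothetical model: each checkout removes one ready database and
    schedules a replacement that becomes ready `replenish_minutes` later.
    """
    ready = target_size
    pending = []  # minutes at which replacement databases become ready
    missed = []
    for t in sorted(request_times):
        # Move replacements that have finished restoring into the ready pool.
        ready += sum(1 for done in pending if done <= t)
        pending = [done for done in pending if done > t]
        if ready > 0:
            ready -= 1
            pending.append(t + replenish_minutes)
        else:
            # In practice this environment would be delayed or fail to deploy.
            missed.append(t)
    return missed


# Two environments created five minutes apart, replacements assumed to take 30 minutes.
print(simulate_pool(1, 30, [0, 5]))
print(simulate_pool(2, 30, [0, 5]))
```

Under these assumptions, a pool of one leaves the second environment without a database, while a pool of two serves both, which matches the recommendation to start with two.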

Click Create to begin the process of initializing the database instances. This process could take a few minutes to a few hours to complete, depending on the size of the database.

Once the database is created, it takes some time to restore the data from the snapshot, depending on the size of the snapshot.

Once the dataset is ready to be used, it will transition to an Active state and environments can use it when they are deployed.

Click on a dataset to see more details about it.

Set up your application to use Instant Datasets

You will need to modify the Application Template and environment configuration for environments that will use Instant Datasets.

If an Instant Dataset was not set up to work with every ephemeral environment through its default configuration, you can add it explicitly to an environment you create. The syntax is the same whether Release automatically creates the environment or you do it manually.

In the Application Template, add a datasets line. Give it a name property, with the name of your Instant Dataset as the value:

environment_templates:
- name: ephemeral
  datasets:
  - name: release-dataset

In the environment configuration, create a mapping using the "Generated Environment Variables" section in the Instant Dataset details page.

---
mapping:
  DATABASE_HOST: RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_HOST
  DATABASE_PASSWORD: RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_PASS
  DATABASE_USER: RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_USER

Your environment variables are on the left side of the colon, and the generated environment variables are on the right.
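To make the direction of the mapping concrete, here is a minimal sketch of what it expresses: each name your application expects receives the value of a generated variable. The apply_mapping helper and the sample values are hypothetical and only illustrate the semantics; Release performs this resolution for you.

```python
def apply_mapping(mapping, generated):
    """Expose generated variables under the names the application expects."""
    missing = [src for src in mapping.values() if src not in generated]
    if missing:
        raise KeyError(f"generated variables not found: {missing}")
    return {app_name: generated[src] for app_name, src in mapping.items()}


# Left side: names used by your application. Right side: generated names.
mapping = {
    "DATABASE_HOST": "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_HOST",
    "DATABASE_PASSWORD": "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_PASS",
    "DATABASE_USER": "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_USER",
}

# Placeholder values standing in for what a checked-out database would provide.
generated = {
    "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_HOST": "db.example.internal",
    "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_PASS": "example-password",
    "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_USER": "example-user",
}

env = apply_mapping(mapping, generated)
print(env["DATABASE_HOST"])
```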

Notice: These changes will only be propagated to newly created environments. You cannot apply a new Instant Dataset configuration retroactively to an existing environment. We are working on fixes to update this experience in the future.

Deploy your application to use your Instant Dataset

Your application is now ready to use your Instant Dataset!

Whenever you deploy an ephemeral environment, it will check out one of the available (active) databases to be used while the environment exists.
The Instant Dataset will then create an additional database to maintain the available database count.

Modify an existing Instant Dataset

Change the target number of available databases

To change the target size of an Instant Dataset, go to the Datasets tab in account settings and click on the edit button on the right (pencil icon) for the dataset that you want to modify.

Reducing the number of available databases will not destroy any databases currently in use by an environment.

Change the Cloud SQL instance type

Changing the instance type will destroy ALL standby databases in the Instant Dataset. You might not be able to create a new environment during this window, because there will be no database instances available to check out. You should only perform this step during off-hours, when you are unlikely to create new environments.

To change the instance type for an Instant Dataset, go to the Datasets tab in account settings and click on the edit button on the right (pencil icon) for the dataset that you want to modify.

Advanced Modifications

Expanding the Advanced drop-down allows you to change:

Which snapshot to use for the next refresh
Which snapshot refresh strategy to use for the next refresh
How many days, if any, to delay the snapshot selection. This defaults to zero, which means no delay is applied. The delay only applies when Latest is the selected strategy.

The default snapshot refresh strategy is Latest.

By default there is no snapshot delay.

The snapshot refresh strategy can be set to one of the following:

Latest: Use the most recently available snapshot, taking into account any delay, and apply it during refresh.
Static: Use the currently selected snapshot and do not update it during refresh.
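The two strategies can be sketched as a small selection function. This is a guess at the behavior for illustration only: the pick_snapshot name, the snapshot dictionary, and in particular the reading of the delay as "skip snapshots newer than delay_days" are assumptions, not confirmed Release behavior.

```python
from datetime import datetime, timedelta


def pick_snapshot(snapshots, strategy, current, delay_days=0, now=None):
    """Choose the snapshot for the next refresh.

    snapshots: dict mapping snapshot name to its creation datetime.
    current:   name of the currently selected snapshot.
    """
    if strategy == "Static":
        # Static keeps the currently selected snapshot across refreshes.
        return current
    if strategy != "Latest":
        raise ValueError(f"unknown strategy: {strategy}")
    now = now or datetime.now()
    # Assumption: a delay of N days ignores snapshots created in the last N days.
    cutoff = now - timedelta(days=delay_days)
    eligible = {name: made for name, made in snapshots.items() if made <= cutoff}
    if not eligible:
        return current  # nothing old enough yet, so keep the current snapshot
    return max(eligible, key=eligible.get)


snapshots = {
    "snap-a": datetime(2024, 1, 1),
    "snap-b": datetime(2024, 1, 5),
    "snap-c": datetime(2024, 1, 9),
}
now = datetime(2024, 1, 10)
print(pick_snapshot(snapshots, "Latest", "snap-a", delay_days=0, now=now))
print(pick_snapshot(snapshots, "Latest", "snap-a", delay_days=3, now=now))
print(pick_snapshot(snapshots, "Static", "snap-a", now=now))
```

In this sketch, a delay of zero picks the newest snapshot, a delay of three days falls back to the newest snapshot that is at least three days old, and Static always returns the current selection.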

Change the Instant Dataset master password

On the Edit Dataset page, you can change the master password for the Instant Dataset. This will change the password for all databases in the Instant Dataset.
