Large Files And Sensitive Data

Some tests require data that are best not checked into source control. Such data are typically either sensitive (e.g. cryptographic keys), large (e.g. bulk seed data), or both.

Small Sensitive Data

Small pieces of sensitive data, like API keys, are best configured using config variables.
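For example, setting a repo-level config variable with the solano CLI might look like this (a sketch: SOME_API_KEY is an illustrative name, and we assume the CLI's config:add command and repo scope):

solano config:add repo SOME_API_KEY "the-secret-value"

The variable is then exported into the test environment, where your tests can read it (e.g. ENV['SOME_API_KEY'] in Ruby).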

Bulk Data: Attachments

Bulk data is best handled via a separate mechanism.

We generally recommend placing large files in an object store such as AWS’s S3, Azure Blob Storage, etc. You can pass authentication tokens into the build as sensitive data in order to create signed URLs for download during the build process if necessary. For details on handling large, sensitive objects, see below.
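If you need a signed URL, one way to generate a short-lived one locally is with the AWS CLI (a sketch; the bucket and object names are illustrative):

aws s3 presign s3://mybucket/test-data/GeoLiteCity.dat --expires-in 3600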

For less sensitive large files, usually the simplest way to pull them into the build is to use wget in a setup hook. You can cache these downloads with the build. For instance, a pre_setup hook that places GeoLiteCity.dat in the db/geo subdirectory of your repository might look like this:

hooks:
  pre_setup: |
    set -e
    mkdir -p db/geo
    wget -O db/geo/GeoLiteCity.dat https://mybucket.s3.amazonaws.com/test-data/GeoLiteCity.dat
    # other setup tasks go here

Solano CI also supports "attachments", which are references to external data sources, e.g. a file hosted in S3 or on your own web server. Unless the files you need are unusually large or sit on the wrong side of a low-bandwidth network connection, we find that most users prefer to use wget.

Solano CI will download attachments into your test environment at runtime. Solano CI may also choose to cache downloaded attachments internally to improve performance and save bandwidth. Sensitive data in attachments should be encrypted (see below).

Attachments are configured in config/solano.yml. The attachments section of the configuration file contains a hash whose keys are path names relative to the repository root. Each key points to a hash with two keys: url and hash. The url is an HTTPS URL for the file, and the hash is the SHA-1 hash of the file's contents. An example configuration for the publicly available GeoLiteCity geolocation database might look like this (note that the target directory must already exist):

attachments:
  'db/geo/GeoLiteCity.dat':
    url: 'https://mybucket.s3.amazonaws.com/test-data/GeoLiteCity.dat'
    hash: '2cae5b16ad50bd26b795f22bb30620d829648142'
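You can compute the SHA-1 hash locally before updating the configuration, for example:

sha1sum db/geo/GeoLiteCity.dat    # use shasum on macOS
# => 2cae5b16ad50bd26b795f22bb30620d829648142  db/geo/GeoLiteCity.dat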

Bulk Data: Sensitive Objects

If your tests need access to a large, sensitive object, we recommend encrypting the file and downloading it either directly from a pre_hook or through the attachment mechanism described above. In some cases, the sensitive object may be a ZIP file or tarball from GitHub (see the final section below).

Solano Labs has released a small wrapper around OpenSSL called s3store, which is automatically installed into the Solano CI environment. It automates encrypting files and uploading them to S3 from your local environment, and downloading and decrypting them inside Solano CI.

To use s3store for secure object storage, you will need to:

  1. Have an Amazon account or IAM identity with access to an S3 bucket of your choice. We recommend separate IAM identities for uploading and downloading objects, so that the identity used in Solano CI has only read permissions.
  2. Export the AWS region, key ID, and access key information to s3store locally and use it to upload an encrypted blob (see the sketch after this list).
  3. Set configuration variables for your repository that pass the read-only AWS identity (region, key ID, and access key) as well as the shared secret passphrase to the Solano CI build.
  4. Check in a Solano CI pre_hook that downloads the encrypted blob, decrypts it, and moves it into place using s3store (see below).
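A sketch of step 2, run locally; this assumes s3store store takes the local file path as an argument (the usage text later in this section elides the argument) and that the uploading identity has write access to the bucket:

export TDDIUM_S3_REGION=us-east-1
export TDDIUM_S3_KEY_ID=AKIA...             # uploading (read/write) identity
export TDDIUM_S3_SECRET=...
export TDDIUM_S3_PASSPHRASE='shared-secret-passphrase'
s3store store data/secret.dat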

The s3store command expects four environment variables to be exported inside Solano CI:

  1. TDDIUM_S3_REGION - the S3 region to use; defaults to us-east-1
  2. TDDIUM_S3_KEY_ID - the AWS key ID to use
  3. TDDIUM_S3_SECRET - the AWS secret access key value to use
  4. TDDIUM_S3_PASSPHRASE - the passphrase to use with OpenSSL for encryption and decryption
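Inside Solano CI, these are most conveniently supplied as repo-level config variables. A sketch, assuming the solano CLI's config:add command and a read-only downloading identity:

solano config:add repo TDDIUM_S3_REGION us-east-1
solano config:add repo TDDIUM_S3_KEY_ID AKIA...
solano config:add repo TDDIUM_S3_SECRET ...
solano config:add repo TDDIUM_S3_PASSPHRASE 'shared-secret-passphrase'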

You can also pass s3store a YAML configuration file with the first three values. When storing a blob, if a passphrase is not provided, s3store will automatically generate a secure one; when fetching a blob, s3store will interactively prompt for a passphrase if one is not supplied in the environment or the command line.
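A plausible layout for such a configuration file, assuming the keys mirror the environment variable names above (the actual key names are an assumption, not confirmed here):

region: us-east-1
key_id: AKIA...
secret: ...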

For example, a Rakefile-based pre_hook task that fetches the encrypted blob and moves it into place might look like this:

# Copyright (c) 2012, 2013, 2014, 2015, 2016 Solano Labs All Rights Reserved
namespace :tddium do
  desc "solano pre hook"
  task :pre_hook do
    # Download the blob from S3 and decrypt it into the working directory
    url = "s3://solano-labs.s3.amazonaws.com/s3store/todo.enc"
    Kernel.system("s3store fetch #{url}")
    # Move the decrypted file to the location the tests expect
    Kernel.system("mv todo.enc #{ENV['TDDIUM_REPO_ROOT']}/data/secret.dat")
  end
end

The s3store command implements two sub-commands: store and fetch. The store sub-command will use the value of the TDDIUM_S3_PASSPHRASE environment variable, if present, as the passphrase. Otherwise, it will use the argument to the -p option as the passphrase, or prompt for one if given the -P option. If the TDDIUM_S3_PASSPHRASE environment variable is not set and neither -P nor -p is supplied on the command line, a passphrase will be generated automatically.

Usage:
  s3store store

Options:
  -P, [--passprompt]
  -p, [--passphrase=PASSPHRASE]
  -c, [--config=CONFIG]

In the typical use case, the s3store fetch sub-command will read the passphrase from the TDDIUM_S3_PASSPHRASE environment variable inside the Solano CI environment.

Usage:
  s3store fetch  [file]

Options:
  -p, [--passphrase=PASSPHRASE]
  -c, [--config=CONFIG]

Bulk Data: Sensitive Objects (GitHub)

To download sensitive objects from GitHub, we recommend creating an OAuth token on GitHub, adding the token as a config variable (which will be exported into the environment), and then downloading the object with a curl command in a pre_hook. Assuming you add the OAuth token as a repo-level config variable called GITHUB_OAUTH_TOKEN, your pre_hook would contain a command such as:

# -L follows the redirect that GitHub's API returns for archive downloads
curl -L -u "your_login:$GITHUB_OAUTH_TOKEN" -o $TMPDIR/repo-dev.zip \
    https://api.github.com/repos/user/repo/zipball/dev
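Setting the token once as a repo-level config variable might look like this (a sketch, assuming the solano CLI's config:add command):

solano config:add repo GITHUB_OAUTH_TOKEN "your-oauth-token"

Note that -u "your_login:$GITHUB_OAUTH_TOKEN" sends the token as the basic-auth password, which the GitHub API accepts for OAuth tokens.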