Install GeneFlow

GeneFlow can be installed and run in any Linux environment, and is pre-installed in the CDC environment.

Requirements

At a minimum, GeneFlow requires a Linux environment with Python 3. The Python pip installer for GeneFlow handles all Python dependencies.
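
As a quick way to confirm these minimum requirements before installing, the following sketch checks for Python 3 and pip. The helper name check_geneflow_prereqs is hypothetical and is not part of GeneFlow; the script only reports what it finds and installs nothing.

```shell
#!/bin/sh
# Report whether the minimum requirements (Python 3 and pip) are present.
# check_geneflow_prereqs is a hypothetical helper name, not part of GeneFlow.
check_geneflow_prereqs() {
    missing=0
    if command -v python3 >/dev/null 2>&1; then
        echo "python3 found: $(python3 --version 2>&1)"
    else
        echo "python3 not found"
        missing=1
    fi
    # pip may be available as a module even when no pip executable is on PATH.
    if command -v python3 >/dev/null 2>&1 && python3 -m pip --version >/dev/null 2>&1; then
        echo "pip found"
    else
        echo "pip not found"
        missing=1
    fi
    return "$missing"
}

check_geneflow_prereqs || echo "Install the missing requirements before installing GeneFlow."
```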

Agave is required only if you want to run workflows in Agave (see https://agaveapi.co).

Install Dependencies in Ubuntu/Debian Systems

Install system-level dependencies in Ubuntu with the following command:

sudo apt install python3 python3-dev git gcc

Install Dependencies in CentOS/RHEL Systems

Install system-level dependencies in CentOS with the following command:

sudo yum install python36 python36-devel git gcc
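
If you are unsure which of the two commands applies to a given machine, a minimal sketch like the one below can pick one based on the package manager it finds. The helper name suggest_install_command is hypothetical, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Print (do not run) the system-level install command for this machine.
# suggest_install_command is a hypothetical helper name.
suggest_install_command() {
    if command -v apt >/dev/null 2>&1; then
        echo "sudo apt install python3 python3-dev git gcc"
    elif command -v yum >/dev/null 2>&1; then
        echo "sudo yum install python36 python36-devel git gcc"
    else
        echo "No apt or yum found; install Python 3, its development headers, git, and gcc manually."
    fi
}

suggest_install_command
```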

Python Modules

You may also need the following Python modules:

pip install setuptools wheel

Prepare the CDC Environment to run GeneFlow

To use the pre-installed GeneFlow in the CDC environment, use the following instructions to prepare the environment and load the module.

Prepare the CDC Environment and Load the GeneFlow Module

Use the following steps to set up your CDC Linux environment to run GeneFlow. GeneFlow can be run from Biolinux or from any CDC Linux system that has access to the “modules” environment.

1. In your home directory, create a working directory:
   mkdir ~/geneflow_work
2. Although GeneFlow output can be directed to any folder, create an output folder to help organize workflow outputs:
   mkdir ~/geneflow_work/output
3. Load the GeneFlow module. Note that older versions of GeneFlow can also be loaded by replacing “latest” with the desired version number:
   module load geneflow/latest
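
The steps above can be combined into one re-runnable script. This is a sketch that assumes the "module" command is defined only on systems with the modules environment, so it checks for the command first and prints a message instead of failing elsewhere.

```shell
#!/bin/sh
# Create the working and output directories; -p makes this safe to re-run.
mkdir -p "$HOME/geneflow_work/output"

# The "module" command exists only where the modules environment is set up.
if command -v module >/dev/null 2>&1; then
    module load geneflow/latest || echo "could not load geneflow/latest"
else
    echo "modules environment not available; skipping 'module load geneflow/latest'"
fi
```

Because a module load only affects the current shell, source the script (rather than executing it) if you want the module to stay loaded in your session.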

Prepare Agave in the CDC Environment

If you want to run workflows in Agave, you will need to initialize your CDC Agave environment.

Follow these instructions to prepare your Agave environment. Note that these instructions only need to be performed once.

1. Load the Agave CLI tools:
   module load cobra-cli/0.1
2. Initialize your client:
   cobra-init
   cobra-init will prompt you for your username and password.
3. Create an execution system:
   cobra-systems-create
   Note the name of the new execution system, which will be formatted as cobra-hpc-aspen-[USER]-[DATE].
4. Create GeneFlow output and work directories in your Agave home:
   files-mkdir -N geneflow-output /[USER]
   files-mkdir -N geneflow-work /[USER]
5. Prepare the Agave configuration file:

Create a new file with Agave environment parameters:
cd ~/geneflow_work
vi ./agave-params.yaml
Add the following to the file:
%YAML 1.1
---
agave:
  # prefix for app name. For user apps, use your username.
  # For public apps, use 'public'.
  appsPrefix: [USER]

  # must have publish rights to the execution system
  executionSystem: cobra-hpc-aspen-[USER]-[DATE]

  # location of your agave home directory
  deploymentSystem: cobra-default-public-storage

  # Apps directory where app assets will be uploaded
  # This must be an absolute path
  appsDir: /[USER]/apps-gf

  # location of workflow test data, absolute path
  testDataDir: /[USER]/testdata-gf
Replace [USER] with your Agave username.

executionSystem should be the system created in step 3, named cobra-hpc-aspen-[USER]-[DATE] with [USER] and [DATE] replaced by the actual values. To list the execution systems to which you have access, use:
systems-list -E
deploymentSystem should be left at the default value, cobra-default-public-storage.
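
As an alternative to editing the file by hand, the configuration can be written from shell variables so that every [USER] occurrence is filled in consistently. This is a sketch only: the variable names AGAVE_USER, EXEC_SYSTEM, and PARAMS_FILE are hypothetical, and the values "jdoe" and "20200101" are placeholders to be replaced with your Agave username and the execution system name reported by cobra-systems-create.

```shell
#!/bin/sh
# Placeholder values (assumptions for this sketch): replace them with your
# Agave username and the execution system name from cobra-systems-create.
AGAVE_USER="jdoe"
EXEC_SYSTEM="cobra-hpc-aspen-${AGAVE_USER}-20200101"
PARAMS_FILE="./agave-params.yaml"

# Write the configuration; the shell expands the variables inside the heredoc.
cat > "$PARAMS_FILE" <<EOF
%YAML 1.1
---
agave:
  appsPrefix: ${AGAVE_USER}
  executionSystem: ${EXEC_SYSTEM}
  deploymentSystem: cobra-default-public-storage
  appsDir: /${AGAVE_USER}/apps-gf
  testDataDir: /${AGAVE_USER}/testdata-gf
EOF

# Warn if a [USER] or [DATE] placeholder is still present in the file.
if grep -Eq '\[(USER|DATE)\]' "$PARAMS_FILE"; then
    echo "unreplaced placeholder in $PARAMS_FILE" >&2
fi
```

Run the script from ~/geneflow_work so the file is created in the same location as in the manual steps.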