Install GeneFlow¶
GeneFlow can be installed and run in any Linux environment, and is pre-installed in the CDC environment.
Requirements¶
At a minimum, GeneFlow requires a Linux environment with Python 3. The Python pip installer for GeneFlow handles all python dependencies.
Agave is optionally required if you want to run workflows in Agave (see https://agaveapi.co).
Install Dependencies in Ubuntu/Debian Systems¶
Install system-level dependencies in Ubuntu with the following commands:
sudo apt install python3 python3-dev git gcc
Install Dependencies in CentOS/RHEL Systems¶
Install system-level dependencies in CentOS with the following commands:
sudo yum install python36 python36-devel git gcc
Prepare the CDC Environment to run GeneFlow¶
To use the pre-installed GeneFlow in the CDC environment, use the following instructions to prepare the environment and load the module.
Prepare the CDC Environment and Load the GeneFlow Module¶
Use the following steps to setup your CDC Linux environment to run GeneFlow. GeneFlow can be run from Biolinux or from any CDC Linux system that has access to the “modules” environment.
In your home directory, create a working directory.
mkdir ~/geneflow_work
Although GeneFlow output can be directed to any folder, create an output folder to help organize workflow outputs:
mkdir ~/geneflow_work/output
Load the GeneFlow module. Note that older versions of GeneFlow can also be loaded by replacing “latest” with the desired version number:
module load geneflow/latest
Prepare Agave in the CDC Environment¶
If you want to run workflows in Agave, you will need to initialize your CDC Agave environment.
Follow these instructions to prepare your Agave environment. Note that these instructions only need to be performed once.
Load the Agave CLI tools:
module load cobra-cli/0.1
Initialize your client:
cobra-init
cobra-init
will prompt you for your username and password.Create an execution system:
cobra-systems-create
Note the name of the new execution system, which will be formatted as
cobra-hpc-aspen-[USER]-[DATE]
.Create GeneFlow output and work directories in your Agave home:
files-mkdir -N geneflow-output /[USER] files-mkdir -N geneflow-work /[USER]
Prepare the Agave configuration file:
Create a new file with agave environment parameters:
cd ~/geneflow_work vi ./agave-params.yaml
Add the following to the file:
%YAML 1.1 --- agave: # prefix for app name. For user apps, use your username. # For public apps, use 'public'. appsPrefix: [USER] # must have publish rights to the execution system executionSystem: cobra-hpc-aspen-[USER]-[DATE] # location of your agave home directory deploymentSystem: cobra-default-public-storage # Apps directory where app assets will be uploaded # This must be an absolute path appsDir: /[USER]/apps-gf # location of workflow test data, absolute path testDataDir: /[USER]/testdata-gf
Replace
[USER]
with your Agave username.executionSystem
should be the same system created in step 3 (e.g.,cobra-hpc-aspen-[USER]-[DATE]
, replace[USER]
and[DATE]
). To see a list of execution systems to which you have access, use:systems-list -E
deploymentSystem
should be left at the default value.
Install GeneFlow in a General Linux Environment¶
Use the following instructions to install GeneFlow in a general Linux environment.
Install GeneFlow using a Python Virtual Environment¶
GeneFlow can be installed and run in any general Linux environment. Use the following instructions to install geneflow and prepare the environment for running workflows.
Create a working directory in a Linux environment. The remainder of these instructions will assume that the working directory is in
~/geneflow_work
. If you customize this, be sure to adjust the commands in the instructions accordingly.mkdir ~/geneflow_work
Setup a Python 3 virtual environment and install dependencies with pip3. The virtual environment is optional if you have sudo access and wish to install GeneFlow and dependencies system-wide. The python3 executable must be available in your path.
Create and activate the Python 3 virtual environment:
cd ~/geneflow_work python3 -m venv gfpy source gfpy/bin/activate
You should now see a modified prompt prefixed with “(gfpy)”.
Clone the GeneFlow source code and install it:
git clone https://github.com/CDCgov/geneflow2 pip3 install ./geneflow
Prepare Agave in a General Linux Environment¶
If you want to run workflows in Agave, you will need access to an Agave tenant. Follow the instructions at https://agaveapi.co to setup Agave with a client, storage system, and execution system. Alternatively, talk to your system administrator about setting up an Agave tenant.
Create an Agave token using the Agave CLI (https://bitbucket.org/tacc-cic/cli):
auth-tokens-create -u [USER]
Create GeneFlow output and work directories in your Agave home:
files-mkdir -N geneflow-output /[USER] files-mkdir -N geneflow-work /[USER]
Note that, depending on your Agave default storage system, your home directory may be in a different place (e.g., /home/[USER]). Check with your Agave administrator if you are unsure.
Install the Agave Python wrapper:
pip3 install agavepy
Prepare the Agave configuration file:
Create a new file with Agave environment parameters:
cd ~/geneflow_work vi ./agave-params.yaml
Add the following to the file:
%YAML 1.1 --- agave: # prefix for app name. For user apps, use your username. # For public apps, use 'public'. appsPrefix: [USER] # must have publish rights to the execution system executionSystem: [execution-system] # location of your agave home directory deploymentSystem: [default-public-storage] # Apps directory where app assets will be uploaded. # This must be an absolute path. appsDir: /[USER]/apps-gf # location of workflow test data, absolute path. testDataDir: /[USER]/testdata-gf
Replace
[USER]
with your Agave username.Replace
[execution-system]
with an Agave execution system for which you have “OWNER” access.Replace
[deployment-system]
with an Agave storage system that contains your home directory.For public apps,
appsDir
andtestDataDir
can be set to a global or shared location instead of a user home directory.