Basic App: Hello World

GeneFlow workflows are composed of applications, or apps, which are modular pieces of functionality. Apps are ideally designed so that they can be combined to produce useful data analysis workflows. This tutorial covers the creation of a basic GeneFlow app that prints “Hello World!” to a text file. It is meant to be an introduction to the basic features of GeneFlow apps and does not cover all features. Advanced GeneFlow app features will be covered in later tutorials.

Configure the Environment

Load or Install GeneFlow

Configure the Linux environment by loading or installing GeneFlow. To check if GeneFlow is available, use the following command:

geneflow --help

If it’s available, you should see GeneFlow’s usage instructions:

usage: geneflow [-h] [--log_level LOG_LEVEL] [--log_file LOG_FILE]
                {add-apps,add-workflows,help,init-db,install-workflow,make-app,migrate-db,run,run-pending}
                ...

GeneFlow CLI

positional arguments:
  {add-apps,add-workflows,help,init-db,install-workflow,make-app,migrate-db,run,run-pending}
                        Functions
    add-apps            add apps to database
    add-workflows       add workflows to database
    help                GeneFlow workflow help
    init-db             initialize database
    install-workflow    install workflow
    make-app            make app from templates
    migrate-db          migrate database
    run                 run a GeneFlow workflow
    run-pending         run pending workflow jobs

optional arguments:
  -h, --help            show this help message and exit
  --log_level LOG_LEVEL
                        logging level
  --log_file LOG_FILE   log file

However, if it’s not installed, you’ll get an error message like this:

-bash: geneflow: command not found

If your system is configured with modules, try loading the GeneFlow module:

module load geneflow/latest
geneflow --help
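The availability check and the module fallback above can be combined into one step. The following is a minimal sketch and not part of GeneFlow itself: the helper name have_cmd is hypothetical, and the module name geneflow/latest is taken from the example above and may differ on your system.

```shell
# have_cmd is a hypothetical helper: succeeds if the named command
# (or shell function) can be found by the current shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd geneflow; then
    echo "geneflow found: $(command -v geneflow)"
elif have_cmd module; then
    # Assumes an environment-modules setup that provides geneflow/latest.
    module load geneflow/latest
else
    echo "geneflow not found; install it or activate its virtual environment" >&2
fi
```

If neither branch finds GeneFlow, continue with the virtual environment installation described next.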

If you need to install GeneFlow, the recommended method for installation is in a Python virtual environment, as described here: Install GeneFlow using a Python Virtual Environment.

After installation in a Python virtual environment, load GeneFlow by sourcing the virtual environment:

cd ~/geneflow_work
source gfpy/bin/activate

Clone the GeneFlow App Template

Create the “geneflow_work” directory in your home directory if it doesn’t already exist. This will be the location for all tutorial-related workflows, apps, and data:

mkdir -p ~/geneflow_work
cd ~/geneflow_work

GeneFlow’s public Apps and Workflows repository is located here: https://gitlab.com/geneflow/. In addition to public apps and workflows, this repository contains app and workflow templates. When creating new GeneFlow apps or workflows, we recommend starting from the app or workflow template rather than from scratch. Clone the app template using git:

git clone https://gitlab.com/geneflow/apps/app-template-gf2.git hello-world-gf2

This command downloads the app template into the “hello-world-gf2” directory. “hello-world-gf2” also happens to be the name of the app you’re creating in this tutorial.

The GeneFlow app template contains a simple, but fully functional application. View the contents of the app template using the “tree” command:

cd hello-world-gf2
tree .

You should see a directory structure similar to the following (it may contain some extra files):

.
├── assets
│   └── README.rst
├── app.yaml
├── README.rst
└── test
    ├── data
    └── README.rst

5 directories, 7 files
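If the “tree” command is not installed, the standard “find” command gives a comparable flat listing. A minimal sketch, in which list_tree is a hypothetical helper name; it skips the “.git” directory created by the clone:

```shell
# list_tree is a hypothetical helper: prints every path under a directory,
# skipping the .git metadata directory.
list_tree() {
    (cd "$1" && find . -path ./.git -prune -o -print | sort)
}

# List the current directory (run this from inside the app directory).
list_tree .
```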

You’ll need to update the “app.yaml” file to create the “Hello World” app. The “app.yaml” file is the main app definition file, which defines the inputs, parameters, and execution commands of the app.

It’s good practice to also update the main “README.rst” file to document the app.

Define the App

Configure the app by editing the “app.yaml” file. This file currently contains the definition of a fully functional app, so you’ll be simplifying some of the sections to create the “hello-world” app. Open the “app.yaml” file using your favorite text editor (vi and nano examples shown):

vi ./app.yaml

or:

nano ./app.yaml

The “app.yaml” file contains three main sections: Metadata, Inputs and Parameters, and Execution Methods. Edit each of these sections to create the “hello-world” app.

Metadata

The app metadata section contains the following basic information:

name:

Name of the GeneFlow app. The app name should include a “gf2” suffix. For example, if the app is meant to wrap the “mem” function in BWA, the app name should be “bwa-mem-gf2”. For this example, use “hello-world-gf2”.

description:

A title or short description of the app. For this example, use “Simple hello world GeneFlow app”.

git:

The full URL of the app’s source repository. This information is not available yet, so leave it blank.

version:

A string value that represents the app’s version. For this example, use “0.1”. We recommend starting with “0.1” for new apps and incrementing the number whenever the app changes.

In the “app.yaml” file, modify the “Metadata” section so that it looks like the following:

# name: standard GeneFlow app name
name: hello-world-gf2
# description: short description for the app
description: Simple hello world GeneFlow app
# git: link to the app's git repo
git:
# version: must be incremented every time this file, or any file in the app
# project is modified
version: '0.1'
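After saving the file, you can confirm that the metadata fields hold the intended values. The sketch below is a plain text-based check, not a GeneFlow feature. The helper name yaml_field is hypothetical, and it only handles simple top-level “key: value” lines like the ones above.

```shell
# yaml_field is a hypothetical helper: prints the value of the first
# top-level "key: value" line in a file, with surrounding quotes removed.
# Usage: yaml_field FILE KEY
yaml_field() {
    sed -n "s/^$2:[[:space:]]*//p" "$1" | sed "s/^['\"]//; s/['\"]\$//" | head -n 1
}

if [ -f ./app.yaml ]; then
    echo "name:    $(yaml_field ./app.yaml name)"
    echo "version: $(yaml_field ./app.yaml version)"
fi
```

Comment lines such as “# name: ...” are ignored because they do not start with the key itself.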

Inputs and Parameters

Each app input and parameter is defined in its own subsection with several properties. At least one input and one parameter are required for each app. The “output” parameter is required and must be manually included in the “app.yaml” file.

The example “Hello World” app doesn’t need any inputs. However, because at least one input is required, define a “dummy”, or unused, input called “file”. Modify the “Inputs and Parameters” section of the “app.yaml” file so that it looks like the following:

inputs:
  file:
    label: Dummy Input File
    description: Dummy input file
    type: File
    required: false

parameters:
  output:
    label: Output Text File
    description: Output text file
    type: File
    required: true
    test_value: output.txt

For a more detailed explanation of each input or parameter property, see App Inputs and Parameters.

Execution Methods

The “Execution Methods” section of the app configuration file defines what the app actually does when executed. Apps can be defined with multiple execution methods. The specific method executed upon app invocation is either auto-detected or specified on the command line. Execution method names are customizable and the choice of a name should depend on the execution system. For example, if the app dependencies are installed globally in the execution system, use an execution method called “environment” (indicating that dependencies are available in the environment). If the app dependencies are containerized with Singularity, use an execution method called “singularity”. For a more detailed explanation of the app “Execution Methods” section, see App Execution Methods.

The “Execution Methods” section of the “app.yaml” file contains three sub-sections: “pre_exec”, “exec_methods”, and “post_exec”.

The “pre_exec” sub-section defines any commands that should be executed before the commands in the main “exec_methods” sub-section. These usually include directory or file preparation steps that are common to all execution methods, e.g., creating an output directory. For this tutorial, no “pre_exec” commands are required, so leave it blank:

pre_exec:

The “Hello World” app simply prints “Hello World!” to a text file using the standard Linux “echo” command. Thus, define a single execution method in the “exec_methods” sub-section called “environment”, which indicates that the needed commands or tools are already available in Linux. Update the “exec_methods” sub-section so that it looks like the following:

exec_methods:
- name: environment
  if:
  - in_path: 'echo'
  exec:
  - run: echo 'Hello World!'
    stdout: ${OUTPUT_FULL}

The “if” statement is used for auto-detecting the execution method. If multiple execution methods are specified, the first execution method with an “if” statement that evaluates to “True” will be selected for execution. In this example, the statement in_path: 'echo' within the “if” statement means that the “environment” execution method will be selected if the “echo” command is available in the environment path. The “exec” statement contains a list of commands to be executed for the “environment” execution method. The “environment” execution method contains only a single command that echoes the “Hello World!” text to an output file. Here, ${OUTPUT_FULL} is the full path of the file specified by the “output” parameter.
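In plain shell terms, the “environment” method behaves roughly like the following. This is a hand-written approximation for illustration, not the wrapper script that GeneFlow generates. The function name run_environment_method is hypothetical, and the output path is passed in by hand, whereas the real wrapper derives ${OUTPUT_FULL} from the “output” parameter.

```shell
# run_environment_method is a hypothetical stand-in for the "environment"
# execution method. Its first argument plays the role of ${OUTPUT_FULL}.
run_environment_method() {
    output_full=$1
    # "if: in_path: 'echo'" -> use this method only when echo can be found.
    if command -v echo >/dev/null 2>&1; then
        # "run" plus "stdout" -> run the command and redirect its standard output.
        echo 'Hello World!' > "$output_full"
    else
        echo "no usable execution method detected" >&2
        return 1
    fi
}

# Example: write to a temporary file and show the result.
out_file=$(mktemp)
run_environment_method "$out_file" && cat "$out_file"
```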

The “post_exec” sub-section defines any commands that should be executed after commands in the main “exec_methods” sub-section. These usually include commands for cleaning up any temporary files created during app execution. For this tutorial, no clean-up commands are necessary, so leave it blank:

post_exec:

“Make” the App

Now that the app has been configured, generate the app wrapper script, the test script, and various definition files using the following commands:

First, make sure you’re still in the app directory:

cd ~/geneflow_work/hello-world-gf2

Then run the GeneFlow “make-app” command:

geneflow make-app .

GeneFlow will then generate three files:

2019-05-31 00:21:43 INFO [app_installer.py:293:make_agave()] compiling /home/[user]/geneflow_work/hello-world-gf2/agave-app-def.json.j2
2019-05-31 00:21:43 INFO [app_installer.py:325:make_wrapper()] compiling /home/[user]/geneflow_work/hello-world-gf2/assets/hello-world-gf2.sh
2019-05-31 00:21:43 INFO [app_installer.py:357:make_test()] compiling /home/[user]/geneflow_work/hello-world-gf2/test/test.sh

Test the App

The GeneFlow “make-app” command generates a “test.sh” script inside the “test” folder. If your app requires test data, that data can be placed inside the “test” folder, ideally within a sub-folder called “data”. In this example, no test data is required.

To test the app, run the following commands:

cd test
sh ./test.sh

You should see output similar to the following:

CMD=/home/[user]/geneflow_work/hello-world-gf2/test/../assets/hello-world-gf2.sh --output="output.txt" --exec_method="auto"
File:
Output: output.txt
Execution Method: auto
Detected Execution Method: environment
CMD=echo 'Hello World!'  >"/home/[user]/geneflow_work/hello-world-gf2/test/output.txt"
Exit code: 0
Exit code: 0

The “output.txt” file should also have been created in the test directory with the text “Hello World!”. View it with:

cat ./output.txt

And you should see this output:

Hello World!
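To make this check repeatable, for example before each commit, the comparison can be scripted. A minimal sketch; check_output is a hypothetical helper and is not produced by “make-app”:

```shell
# check_output is a hypothetical helper: succeeds if the file exists and its
# content matches the expected text (trailing newlines are ignored).
# Usage: check_output FILE EXPECTED_TEXT
check_output() {
    [ -f "$1" ] && [ "$(cat "$1")" = "$2" ]
}

if check_output ./output.txt 'Hello World!'; then
    echo "PASS"
else
    echo "FAIL: output.txt is missing or has unexpected content" >&2
fi
```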

Update the App README

It is best practice to update the app README file to include the app name, a short description, and descriptions for each input and parameter. Edit the README.rst file in the main app directory:

cd ~/geneflow_work/hello-world-gf2
vi ./README.rst

Modify the file so it looks like the following:

Hello World! Basic GeneFlow App
===============================

Version: 0.1

This is a basic GeneFlow app.

Inputs
------

1. file: Dummy input file, use any small file.

Parameters
----------

1. output: Output text file where "Hello World!" will be printed.

Save the file and exit the editor.

Commit the App to a Git Repo

Finally, commit the app to a git repo so that it can be used in a GeneFlow workflow. First, if you don’t already have one, create an account on GitHub, GitLab, Bitbucket, or your company or organization’s git server. Then delete the output file that was created while testing the app, since it is not part of the app definition:

cd ~/geneflow_work/hello-world-gf2
rm ./test/output.txt

Commit all changes to the local git repo and tag the app version:

git add -A
git commit -m "initial version of the hello world app"
git tag 0.1
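Before pushing, it can help to confirm that nothing was left uncommitted and that the tag exists. The following sketch uses only standard git commands; the helper name release_ready is hypothetical.

```shell
# release_ready is a hypothetical helper: succeeds if the working tree has no
# uncommitted or untracked changes and the given tag exists.
# Usage: release_ready TAG
release_ready() {
    [ -z "$(git status --porcelain)" ] &&
        git rev-parse --verify --quiet "refs/tags/$1" >/dev/null
}

if release_ready 0.1; then
    echo "working tree is clean and tag 0.1 exists"
else
    echo "uncommitted changes remain or tag 0.1 is missing" >&2
fi
```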

Push to the remote repo using the following commands, depending on where your repository is located.

GitHub

If your repository is in GitHub, you must first create the repo on the GitHub.com site. Once created, it will likely be located at a URL similar to https://github.com/[user]/hello-world-gf2.git, where [user] should be replaced with your GitHub username or group. Push your code to GitHub using the following commands:

git remote set-url origin https://github.com/[user]/hello-world-gf2.git
git push --tags origin master

Be sure to replace [user] with your GitHub username or group.

GitLab

If your repository is in GitLab, you don’t need to create the repo on the GitLab.com site. You can skip directly to pushing your code to the git URL, which will be similar to https://gitlab.com/[user]/hello-world-gf2.git, where [user] should be replaced with your GitLab username or group:

git remote set-url origin https://gitlab.com/[user]/hello-world-gf2.git
git push --tags origin master

Be sure to replace [user] with your GitLab username or group.

Organization GitLab

If you have a company or organization GitLab server, your git repo hostname will likely be different. For example, it could be hosted at https://git.biotech.cdc.gov/[user]/hello-world-gf2.git, where [user] should be replaced with your username or group:

git remote set-url origin https://git.biotech.cdc.gov/[user]/hello-world-gf2.git
git push --tags origin master

Be sure to replace [user] with your organization’s GitLab username or group.

Summary

Congratulations! You created a basic GeneFlow app, tested it using the auto-generated test script, and committed it to a git repo. The next tutorial covers the creation of a one-step GeneFlow workflow that uses this “Hello World” app.