Create a New Targets Pipeline for LPC
Introduction
The targets package is a tool for developing reproducible research workflows in R. The package's motivation and tooling are described at https://books.ropensci.org/targets/ and https://docs.ropensci.org/targets/.
Example Workflow
1. Create a new R project.
2. Run the levinmisc::populate_targets_proj() function in the R console. This function is described in detail below and will initialize your project with helpful files/folders to start building your targets pipeline.
3. Modify the Pipeline.qmd file to specify your analyses. The targets documentation can be useful for specifics.
4. Knit/render Pipeline.qmd to convert your markdown document into a series of R scripts that will actually run your analyses.
5. Run submit-targets.sh from a terminal window to submit your targets pipeline to the LPC for execution. Details of possible command line arguments to this script are described below.
6. Modify Results.qmd to present the results of your analyses. Rendering this file allows you to mix text describing your analyses/methods and include citations alongside the actual results of your pipeline. The targets::tar_read() function should be used heavily to load pre-computed results from your pipeline (see the sketch after this list).
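As a rough illustration of that last step, a chunk in Results.qmd can read a completed target back into the session with targets::tar_read(). This is a minimal sketch; the fitted_model target name is hypothetical and should be replaced with a target defined in your own pipeline.

```r
library(targets)

# Load a pre-computed result from the pipeline's data store.
# "fitted_model" is a hypothetical target name; substitute one from Pipeline.qmd.
fitted_model <- tar_read(fitted_model)

# Work with the loaded object like any other R object.
summary(fitted_model)
```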
populate_targets_proj()
The populate_targets_proj
function can be run within a
new project folder to initialize the project with the files/folders
necessary for deploying a targets
pipeline on the LPC at
Penn. This includes creating LSF templates, a Pipeline.qmd file containing boilerplate for running analyses, and a Results.qmd file which can be used to visualize the results.
```r
populate_targets_proj("test")
```
The populate_targets_proj()
function creates several
files/folders within the project directory.
.make-targets.sh is a hidden helper file that is not designed for user interaction, but is necessary for submitting jobs to the LSF scheduler. The other files are designed to be edited/used by the user:
- Pipeline.qmd - Quarto markdown file which can be used to create a Target Markdown document that specifies a targets pipeline for your analyses. See https://books.ropensci.org/targets/literate-programming.html#target-markdown for details. Remember to knit/render this document in order to generate the pipeline (a minimal sketch follows this list).
- Results.qmd - Quarto markdown file which can be used to display the results generated by the targets pipeline specified in Pipeline.qmd.
- build_logs/ - Directory where job logs are stored.
- submit-targets.sh - Bash script which can be used to run your targets pipeline once Pipeline.qmd has been knit/rendered. This script can be run directly from the submission host.
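For orientation, a Target Markdown pipeline in Pipeline.qmd is built from {targets} code chunks. The sketch below is illustrative only, not the boilerplate that populate_targets_proj() generates; the target names and the data/raw.csv path are assumptions.

```{r}
library(targets)
```

```{targets example-pipeline}
# Two illustrative targets: read an input file, then summarise it.
list(
  tar_target(raw_data, read.csv("data/raw.csv")),
  tar_target(summary_stats, summary(raw_data))
)
```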
submit-targets.sh
This script is used to actually submit your pipeline to the LPC once Pipeline.qmd has been knit/rendered. It should be run in a terminal session from the root directory of your project, and it can be run directly from a submission host (e.g. scisub7). The script accepts command line arguments, which can be useful for parallelizing your pipeline over multiple workers/CPUs:
```
Usage: ./submit-targets.sh [-n NUM_WORKERS] [-j JOB_NAME] [-o OUTPUT_LOG] [-e ERROR_LOG] [-q QUEUE] [-m MEMORY] [-s SLACK] [-h HELP]

Submit a job using the LSF scheduler with the specified number of CPUs and memory usage.

Options:
  -n NUM_WORKERS   Number of workers (cpu cores) to request for running the targets pipeline (default: 1)
  -j JOB_NAME      Name of the job (default: make_targets)
  -o OUTPUT_LOG    Path to the output log file (default: build_logs/targets_%J.out)
  -e ERROR_LOG     Path to the error log file (default: build_logs/targets_%J.err)
  -q QUEUE         Name of the queue to submit the job to (default: voltron_normal)
  -m MEMORY        Memory usage for the job in megabytes (default: 16000)
  -s SLACK         Enable slack notifications; requires setup using slackr::slack_setup() (default: false)
  -h HELP          Display this help message and exit
```
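For example, using the options above, a run with four workers, a custom job name, and 32 GB of memory might look like this (the job name is arbitrary):

```bash
# Request 4 workers, name the job, and allocate 32000 MB of memory.
./submit-targets.sh -n 4 -j my_pipeline -m 32000
```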
Slack Notifications
Slack can be used to automatically notify the user of pipeline start/finish using the -s true command line flag:

```bash
./submit-targets.sh -s true
```
Slack notifications are provided using the slackr package. The package must be configured separately before Slack notifications are enabled. See https://mrkaye97.github.io/slackr/index.html for more information about slackr setup and generation of a Slack API token.
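As a rough sketch, one-time slackr configuration might look like the following, assuming your credentials are stored in a ~/.slackr config file (see the slackr documentation for how to create one):

```r
library(slackr)

# Read Slack credentials (API token, channel, webhook) from a config file.
# The ~/.slackr path is an assumption; point this at your own config file.
slackr_setup(config_file = "~/.slackr")
```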
Use {crew} for parallelization
The crew and crew.cluster packages have enabled the use of heterogeneous workers (https://books.ropensci.org/targets/crew.html#heterogeneous-workers), which can be used to deploy targets pipelines either locally or on HPC resources. The use_crew_lsf() function is designed to return a block of code to rapidly enable the use of heterogeneous workers on the Penn LPC. By default, this function creates workers that submit to different queues (e.g. voltron_normal, voltron_long) and allocate different resources (e.g. a "normal" worker will use 1 core and 16 GB of memory, while a "long" worker will use 1 core and 10 GB of memory).
```r
use_crew_lsf()
```
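For context, the code returned by use_crew_lsf() follows the heterogeneous-workers pattern from the targets book. The sketch below is illustrative only and is not the exact code the function generates; the controller names, worker counts, and the slow_step target are assumptions, and queue/memory settings (omitted here) are configured through additional crew_controller_lsf() arguments described in the crew.cluster documentation.

```r
library(targets)
library(crew)
library(crew.cluster)

# One controller per resource profile, registered as a group.
normal <- crew_controller_lsf(name = "normal", workers = 2)
long   <- crew_controller_lsf(name = "long", workers = 2)

tar_option_set(controller = crew_controller_group(normal, long))

# A target can then be pinned to a specific controller by name.
tar_target(
  slow_step,
  Sys.sleep(60),
  resources = tar_resources(crew = tar_resources_crew(controller = "long"))
)
```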