Create a New Targets Pipeline for LPC
Introduction
The targets package is a tool for developing reproducible research workflows in R. The package's motivation and tooling are described at https://books.ropensci.org/targets/ and https://docs.ropensci.org/targets/.
Example Workflow
1. Create a new R project.
2. Run the levinmisc::populate_targets_proj() function in the R console. This function is described in detail below and will initialize your project with helpful files/folders to start building your targets pipeline.
3. Modify the Pipeline.qmd file to specify your analyses. The targets documentation can be useful for specifics.
4. Knit/render Pipeline.qmd to convert your markdown document into a series of R scripts that will actually run your analyses.
5. Run submit-targets.sh from a terminal window to submit your targets pipeline to the LPC for execution. Details of possible command line arguments to this script are described below.
6. Modify Results.qmd to present the results of your analyses. Rendering this file allows you to mix text describing your analyses/methods and include citations alongside the actual results of your pipeline. The targets::tar_read() function should be used heavily to load pre-computed results from your pipeline (see the sketch after this list).
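As a rough illustration of that last step, a chunk in Results.qmd can read a completed target back into the session with targets::tar_read(). This is a minimal sketch; the fitted_model target name is hypothetical and should be replaced with a target defined in your own pipeline.

```r
library(targets)

# Load a pre-computed result from the pipeline's data store.
# "fitted_model" is a hypothetical target name; substitute one from Pipeline.qmd.
fitted_model <- tar_read(fitted_model)

# Work with the loaded object like any other R object.
summary(fitted_model)
```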
populate_targets_proj()
The populate_targets_proj
function can be run within a
new project folder to initialize the project with the files/folders
necessary for deploying a targets
pipeline on the LPC at
Penn. This includes creating LSF templates, a Pipeline.qmd file containing boilerplate for running analyses, and a Results.qmd file which can be used to visualize the results.
```r
populate_targets_proj("test")
```
The populate_targets_proj()
function creates several
files/folders within the project directory.
.make-targets.sh is a hidden helper file that is not designed for user interaction, but is necessary for submitting jobs to the LSF scheduler. The other files are designed to be edited/used by the user:
- Pipeline.qmd - Quarto markdown file which can be used to create a Target Markdown document that specifies a targets pipeline for your analyses. See https://books.ropensci.org/targets/literate-programming.html#target-markdown for details. Remember to knit/render this document in order to generate the pipeline (a minimal sketch follows this list).
- Results.qmd - Quarto markdown file which can be used to display the results generated by the targets pipeline specified in Pipeline.qmd.
- build_logs/ - Directory where job logs are stored.
- submit-targets.sh - Bash script which can be used to run your targets pipeline once Pipeline.qmd has been knit/rendered. This script can be run directly from the submission host.
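For orientation, a Target Markdown pipeline in Pipeline.qmd is built from {targets} code chunks. The sketch below is illustrative only, not the boilerplate that populate_targets_proj() generates; the target names and the data/raw.csv path are assumptions.

```{r}
library(targets)
```

```{targets example-pipeline}
# Two illustrative targets: read an input file, then summarise it.
list(
  tar_target(raw_data, read.csv("data/raw.csv")),
  tar_target(summary_stats, summary(raw_data))
)
```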
submit-targets.sh
This script is used to actually submit your pipeline to the LPC once Pipeline.qmd has been knit/rendered. It should be run in a terminal session from the root directory of your project, and it can be run directly from a submission host (e.g. scisub7). The script accepts command line arguments, which can be useful for parallelizing your pipeline over multiple workers/CPUs:
```
Usage: ./submit-targets.sh [-n NUM_WORKERS] [-j JOB_NAME] [-o OUTPUT_LOG] [-e ERROR_LOG] [-q QUEUE] [-m MEMORY] [-s SLACK] [-h HELP]

Submit a job using the LSF scheduler with the specified number of CPUs and memory usage.

Options:
  -n NUM_WORKERS   Number of workers (cpu cores) to request for running the targets pipeline (default: 1)
  -j JOB_NAME      Name of the job (default: make_targets)
  -o OUTPUT_LOG    Path to the output log file (default: build_logs/targets_%J.out)
  -e ERROR_LOG     Path to the error log file (default: build_logs/targets_%J.err)
  -q QUEUE         Name of the queue to submit the job to (default: voltron_normal)
  -m MEMORY        Memory usage for the job in megabytes (default: 16000)
  -s SLACK         Enable slack notifications; requires setup using slackr::slack_setup() (default: false)
  -h HELP          Display this help message and exit
```
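For example, using the options above, a run with four workers, a custom job name, and 32 GB of memory might look like this (the job name is arbitrary):

```bash
# Request 4 workers, name the job, and allocate 32000 MB of memory.
./submit-targets.sh -n 4 -j my_pipeline -m 32000
```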
Slack Notifications
Slack can be used to automatically notify the user of pipeline start/finish using the -s true command line flag:

```bash
./submit-targets.sh -s true
```
Slack notifications are provided using the slackr package. The package must be configured separately before Slack notifications are enabled. See https://mrkaye97.github.io/slackr/index.html for more information about slackr setup and generation of a Slack API token.
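As a rough sketch, one-time slackr configuration might look like the following, assuming your credentials are stored in a ~/.slackr config file (see the slackr documentation for how to create one):

```r
library(slackr)

# Read Slack credentials (API token, channel, webhook) from a config file.
# The ~/.slackr path is an assumption; point this at your own config file.
slackr_setup(config_file = "~/.slackr")
```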
Use {crew} for parallelization
The crew and crew.cluster packages have enabled the use of heterogeneous workers (https://books.ropensci.org/targets/crew.html#heterogeneous-workers), which can be used to deploy targets pipelines either locally or on HPC resources. The use_crew_lsf() function is designed to return a block of code to rapidly enable the use of heterogeneous workers on the Penn LPC. By default, this function creates workers that submit to different queues (e.g. voltron_normal, voltron_long) and allocate different resources (e.g. a "normal" worker will use 1 core and 16 GB of memory, while a "long" worker will use 1 core and 10 GB of memory).
```r
use_crew_lsf()
```
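For context, the code returned by use_crew_lsf() follows the heterogeneous-workers pattern from the targets book. The sketch below is illustrative only and is not the exact code the function generates; the controller names, worker counts, and the slow_step target are assumptions, and queue/memory settings (omitted here) are configured through additional crew_controller_lsf() arguments described in the crew.cluster documentation.

```r
library(targets)
library(crew)
library(crew.cluster)

# One controller per resource profile, registered as a group.
normal <- crew_controller_lsf(name = "normal", workers = 2)
long   <- crew_controller_lsf(name = "long", workers = 2)

tar_option_set(controller = crew_controller_group(normal, long))

# A target can then be pinned to a specific controller by name.
tar_target(
  slow_step,
  Sys.sleep(60),
  resources = tar_resources(crew = tar_resources_crew(controller = "long"))
)
```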