Targets

What is Targets?

Targets is an R package for creating reproducible data analysis pipelines. It was created by Will Landau and provides a Make-like workflow system specifically designed for R. Targets tracks dependencies between analysis steps, automatically skips up-to-date targets, and provides tools for debugging and monitoring pipeline execution.

Targets is particularly well-suited for:

  • Data Analysis: Reproducible data analysis workflows, statistical pipelines
  • Research: Scientific computing workflows, research reproducibility
  • Reporting: Automated report generation, data processing pipelines
  • R-based Workflows: Any workflow that primarily uses R for data processing

Key features of Targets include:

  • R-native: Workflows are defined as R code using target objects
  • Dependency tracking: Automatically tracks dependencies between targets
  • Incremental execution: Only runs targets that are out of date
  • Reproducibility: Ensures consistent results through dependency management
  • Debugging tools: Built-in tools for inspecting and debugging pipelines
  • Parallel execution: Can run independent targets in parallel
  • Integration: Works seamlessly with other R packages (dplyr, ggplot2, etc.)

Setup

Requirements

Targets requires:

  • R version 4.0.0 or later
  • Any packages your targets use (these can be declared centrally with targets::tar_option_set())

Install Targets from CRAN:
install.packages("targets")

Or install the development version from GitHub:

# Install remotes if not already installed
install.packages("remotes")

# Install targets
remotes::install_github("ropensci/targets")

Verify Installation

Verify that Targets is installed correctly:

library(targets)
packageVersion("targets")

Additional Packages for Examples

The example scripts in this repository require additional packages:

install.packages(c("tarchetypes", "visNetwork"))

  • tarchetypes: Provides additional target types and utilities used in the examples
  • visNetwork: Required for tar_visnetwork() to visualize the dependency graph

These packages are required to run the examples in the examples/targets/ directory. If you’re creating your own pipeline from scratch, you may not need them unless you use specific features (e.g., tar_visnetwork() for visualization).

Once Targets is installed, you can:

  • Create your first pipeline (see “Getting Started” section below)
  • Check out the example scripts in the GitHub repository.
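If you would like a starter script generated for you, tar_script() writes an example _targets.R into the current working directory (a minimal sketch; called with no arguments it writes the package's built-in example pipeline):

library(targets)

# Write an example _targets.R to the working directory;
# pass your own code to tar_script() to write a custom pipeline instead
tar_script()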

Getting Started

Understanding Targets

Targets workflows are built around the concept of targets: named objects that represent steps in your analysis pipeline.

Key Concepts:

  • Target: A named object that represents a step in the pipeline (e.g., a data file, a processed dataset, a plot)
  • Target script: An R script (_targets.R) that defines all targets and their dependencies
  • Dependency graph: Targets automatically builds a dependency graph to determine execution order
  • Storage: Targets stores results in a _targets/ directory for caching and reproducibility

Targets automatically:

  • Tracks dependencies between targets
  • Skips targets that are up to date
  • Runs targets in the correct order
  • Caches results for reproducibility

A Simple Workflow

Assuming that you’re in the top level directory of the cloned GitHub repo, change to the examples folder with this command:

cd examples/targets

Here’s a basic Targets pipeline that processes data:

Create a file named _targets.R:

library(targets)
library(tarchetypes)

# Set options
tar_option_set(
  packages = c("dplyr", "readr")
)

# Define targets
list(
  # Target 1: Load raw data
  tar_target(
    name = raw_data,
    command = {
      # Simulate loading data
      data.frame(
        id = 1:10,
        value = rnorm(10, mean = 5, sd = 2)
      )
    }
  ),
  
  # Target 2: Process data
  tar_target(
    name = processed_data,
    command = {
      raw_data %>%
        dplyr::mutate(value_doubled = value * 2)
    },
    packages = "dplyr"
  ),
  
  # Target 3: Create summary
  tar_target(
    name = summary_stats,
    command = {
      list(
        mean = mean(processed_data$value),
        sd = sd(processed_data$value),
        n = nrow(processed_data)
      )
    }
  ),
  
  # Target 4: Save results
  tar_target(
    name = save_results,
    command = {
      # Create the output directory if needed, then write the file
      dir.create("results", showWarnings = FALSE)
      write.csv(processed_data, "results/processed_data.csv", row.names = FALSE)
      "results/processed_data.csv"
    },
    format = "file"
  )
)

Executing the pipeline:

library(targets)

# Run the pipeline
tar_make()

# View the pipeline (opens interactive HTML graph in browser/viewer)
tar_visnetwork()

# Read a target
tar_read(processed_data)

For convenience, you can run the pipeline or a specific target using the provided run_pipeline.R script from the command line:

# Run all targets
Rscript run_pipeline.R

# Run a specific target
Rscript run_pipeline.R processed_data

Expected output (from the repository's run_pipeline.R, whose pipeline adds a data_plot target to the four defined above, so five targets are built):

Running all targets...
+ raw_data dispatched
No data file found. Generating sample data...
 raw_data completed [109ms, 380 B]
+ processed_data dispatched
 processed_data completed [14ms, 652 B]
+ data_plot dispatched
 data_plot completed [1.2s, 46.44 kB]
+ summary_stats dispatched
 summary_stats completed [1ms, 210 B]
+ save_results dispatched
 save_results completed [321ms, 1.29 kB]
 ended pipeline [2s, 5 completed, 0 skipped]

Pipeline completed successfully!

Pipeline status:
# A tibble: 5 × 2
  name           progress 
  <chr>          <chr>    
1 raw_data       completed
2 processed_data completed
3 data_plot      completed
4 summary_stats  completed
5 save_results   completed

(Figure: view of the task dependency graph, as rendered by tar_visnetwork().)

Data storage and caching:

Targets stores all computed results, metadata, and dependency information in a _targets/ directory at the root of your project (the same directory where your _targets.R file is located). This directory contains:

  • Computed target values: Cached results of each target for fast retrieval
  • Metadata: Information about when each target was built, its dependencies, and execution status
  • Dependency graph: Internal representation of the pipeline structure

When you run tar_make(), Targets checks this cache to determine which targets need to be rebuilt (only those that have changed inputs or dependencies). This makes subsequent runs much faster and ensures reproducibility. The _targets/ directory should typically be added to .gitignore since it contains generated files.
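To see the cache at work, run the pipeline twice in a row (this sketch assumes the _targets.R defined above):

library(targets)

tar_make()      # First run: builds every target
tar_make()      # Second run: everything is up to date, so all targets are skipped
tar_outdated()  # Returns character(0) when nothing needs rebuilding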

Key points:

  • Targets are defined using tar_target()
  • Dependencies are automatically inferred from code
  • Use tar_make() to run the pipeline
  • Use tar_read() to access target results
  • Use tar_visnetwork() to visualize the dependency graph (opens interactive HTML in browser/viewer)

Inspecting and Managing the Pipeline

Beyond the basic commands shown above, Targets provides several useful functions for inspecting and managing your pipeline:

# Export the dependency graph as a static image (optional)
# First install: install.packages(c("htmlwidgets", "webshot2"))
library(htmlwidgets)
library(webshot2)
vis <- tar_visnetwork()
htmlwidgets::saveWidget(vis, "pipeline_graph.html")
webshot2::webshot("pipeline_graph.html", "pipeline_graph.png")

# Alternative: text-based diagram (returns mermaid.js graph code)
tar_mermaid()

# List all targets
tar_manifest()

# Check which targets are out of date
tar_outdated()

# View pipeline metadata
tar_meta()
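
Targets also provides helpers for cleaning up the data store (both operate on the local _targets/ directory):

# Remove stored data for targets that are no longer in the pipeline
tar_prune()

# Delete the entire _targets/ data store and start over
tar_destroy()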

All Example Scripts

You can find the example scripts and notebooks in the examples folder in the Git repository.

In addition, take a look at the examples linked in the Additional Resources section below.

Advanced Topics

Dynamic Targets

Create targets dynamically based on data:

list(
  tar_target(
    name = file_list,
    # full.names = TRUE keeps the data/ prefix so read.csv() can find the files
    command = list.files("data/", pattern = "\\.csv$", full.names = TRUE)
  ),
  tar_target(
    name = data_files,
    # Inside the pattern, file_list refers to one branch (a single file path)
    command = read.csv(file_list, stringsAsFactors = FALSE),
    pattern = map(file_list),
    iteration = "list"
  )
)
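
With map(), Targets creates one branch of data_files per element of file_list, and a later tar_make() rebuilds only the branches whose upstream elements changed. To invalidate branches when file contents change (not just file names), declare file_list with format = "file".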

Branching

Create branches for parallel processing:

tar_target(
  name = analysis,
  # process_data() is a user-defined function; data is an upstream target
  command = process_data(data),
  pattern = map(data),   # One branch per element of data
  iteration = "list"
)

Format Options

Specify storage formats for targets:

tar_target(
  name = data_file,
  command = "data.csv",
  format = "file"  # Tracks the file by its contents, so edits invalidate downstream targets
)

tar_target(
  name = large_data,
  command = big_data_frame,
  format = "fst"  # Efficient storage for large data frames (requires the fst package)
)

Custom Storage

Configure custom storage backends:

tar_option_set(
  storage = "worker",    # Parallel workers save their own results
  retrieval = "worker",  # Parallel workers load their own dependencies
  memory = "transient"   # Drop target values from memory when no longer needed
)

Pipeline Configuration

Configure pipeline behavior:

tar_option_set(
  packages = c("dplyr", "ggplot2"),
  error = "continue",  # Continue on errors
  memory = "persistent"  # Keep targets in memory
)
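
The parallel execution mentioned under key features is configured here as well. A minimal sketch using the crew backend (assumes the crew package is installed; two workers is an arbitrary choice):

library(crew)

tar_option_set(
  # Run independent targets in parallel on two local worker processes
  controller = crew_controller_local(workers = 2)
)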

Debugging

Debug pipeline issues:

# Run a specific target
tar_make(names = "processed_data")

# Inspect a target
tar_load(processed_data)
head(processed_data)

# View error messages
tar_meta(fields = error)
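
For interactive debugging, Targets can also save a workspace whenever a target fails (a sketch; the option is set in _targets.R before running the pipeline):

# In _targets.R: save a workspace when any target errors
tar_option_set(workspace_on_error = TRUE)

# After a failed tar_make(), load the failed target's dependencies
# into the global environment for inspection
tar_workspace(processed_data)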

Additional Resources

Official Documentation

  • Reference website: https://docs.ropensci.org/targets/
  • GitHub repository: https://github.com/ropensci/targets

Learning Resources

  • The targets R package user manual: https://books.ropensci.org/targets/

Community

  • tarchetypes - Additional target types and utilities
  • crew - Parallel computing backend for targets
  • stantargets - Targets integration for Stan models