Targets Examples

This directory contains example Targets pipelines and helper scripts for R-based data workflows.

Files

_targets.R - Main pipeline definition file (required for all Targets pipelines)
run_pipeline.R - Script to run the pipeline from command line
inspect_pipeline.R - Script to inspect pipeline status and dependencies
README.md - This file

Setup

Install Targets:
```
install.packages("targets")
```

Install optional dependencies:

```
install.packages(c("dplyr", "readr", "tarchetypes", "visNetwork"))
# Optional: for plotting
install.packages("ggplot2")
```

Note: visNetwork is required for tar_visnetwork() to visualize the dependency graph.

Verify installation:

```
library(targets)
packageVersion("targets")
```

Running the Example

Basic Usage

Copy _targets.R to your project directory (or work in this directory)
Run the pipeline in R:
```
library(targets)
tar_make()
```
View the dependency graph:
```
tar_visnetwork()
```
Read a target:
```
tar_read(processed_data)
```

Using Helper Scripts

Run the pipeline:

```
Rscript run_pipeline.R
```

Run specific targets:

```
Rscript run_pipeline.R processed_data summary_stats
```

Inspect the pipeline:

```
Rscript inspect_pipeline.R
```

Understanding the Pipeline

Pipeline Structure

The _targets.R file defines a pipeline with five targets:

raw_data - Generates or loads raw data
processed_data - Transforms the raw data
summary_stats - Calculates summary statistics
save_results - Saves processed data to a CSV file
data_plot - Creates a visualization (optional, requires ggplot2)

Target Dependencies

Targets automatically tracks dependencies:

processed_data depends on raw_data
summary_stats depends on processed_data
save_results depends on processed_data
data_plot depends on processed_data
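As an illustration, here is a minimal, self-contained sketch of a pipeline with this shape. It is not a copy of the _targets.R in this directory: the example data, the value > 2 filter, the output file name processed_data.csv, and the use of base R in place of dplyr/readr are assumptions, and data_plot is left out so the sketch runs without ggplot2. tar_dir() runs the code in a temporary directory, and tar_script() writes the braced code into a _targets.R file there.

```r
library(targets)

# Run the whole example in a throwaway temporary directory.
stats <- tar_dir({
  # tar_script() writes this code to _targets.R (with library(targets) prepended).
  tar_script({
    tar_option_set(error = "continue")
    list(
      # raw_data: hypothetical example data standing in for the real input.
      tar_target(
        raw_data,
        data.frame(id = 1:10, value = c(3, 8, 1, 9, 4, 7, 2, 6, 10, 5))
      ),
      # processed_data depends on raw_data because its command mentions raw_data.
      tar_target(processed_data, raw_data[raw_data$value > 2, ]),
      # summary_stats depends on processed_data.
      tar_target(
        summary_stats,
        data.frame(n = nrow(processed_data), mean_value = mean(processed_data$value))
      ),
      # save_results writes a CSV (hypothetical file name) and returns its path,
      # which format = "file" then tracks.
      tar_target(
        save_results,
        {
          path <- "processed_data.csv"
          write.csv(processed_data, path, row.names = FALSE)
          path
        },
        format = "file"
      )
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")
  tar_read(summary_stats)
})
print(stats)
```

Here 8 of the 10 example rows pass the filter, so summary_stats has n = 8 and mean_value = 6.5.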

Key Commands

```
# Run the pipeline (builds only outdated targets)
tar_make()

# Run specific targets (and any outdated targets they depend on)
tar_make(names = c("processed_data", "summary_stats"))

# View dependency graph (requires visNetwork)
tar_visnetwork()

# Read a target's value
tar_read(processed_data)

# List all targets and their commands
tar_manifest()

# Check outdated targets
tar_outdated()

# View pipeline metadata
tar_meta()

# Load targets into the current environment
tar_load(c(processed_data, summary_stats))

# Delete the _targets/ data store (all saved targets and metadata)
tar_destroy()
```

Pipeline Features

Incremental Execution

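tar_make() skips targets that are already up to date and runs only the outdated ones. If you change the command of raw_data, raw_data reruns first, and the targets downstream of it rerun if the values they depend on actually changed. Targets that do not depend on anything that changed are skipped.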
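The sketch below illustrates this with a hypothetical three-target pipeline (target names follow this README; the commands are assumptions). After one tar_make(), tar_outdated() reports nothing; after the command of raw_data is edited, it reports raw_data together with the targets downstream of it.

```r
library(targets)

result <- tar_dir({
  # First version of a hypothetical pipeline.
  tar_script({
    list(
      tar_target(raw_data, 1:5),
      tar_target(processed_data, raw_data * 2),
      tar_target(summary_stats, sum(processed_data))
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")
  before <- tar_outdated()  # nothing is outdated right after a successful run

  # Edit only the command of raw_data (1:5 becomes 1:6).
  tar_script({
    list(
      tar_target(raw_data, 1:6),
      tar_target(processed_data, raw_data * 2),
      tar_target(summary_stats, sum(processed_data))
    )
  }, ask = FALSE)
  after <- tar_outdated()  # raw_data plus the targets downstream of it
  list(before = before, after = after)
})
print(result)
```

tar_outdated() is conservative: it flags every downstream target because it cannot know in advance whether raw_data's new value will differ. During the next tar_make(), a downstream target is skipped if the values it depends on turn out to be unchanged.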

Dependency Tracking

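Targets infers dependencies by statically analyzing each target's command: if the command of processed_data mentions raw_data, processed_data depends on raw_data, with no manual declaration needed. Functions and other global objects defined in _targets.R are tracked the same way.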
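To see the inferred graph without building anything, tar_network() can be called on a pipeline. A minimal sketch with hypothetical commands (the target names follow this README):

```r
library(targets)

edges <- tar_dir({
  tar_script({
    list(
      tar_target(raw_data, data.frame(value = 1:10)),
      # Mentioning raw_data in the command is all it takes to create the edge.
      tar_target(processed_data, subset(raw_data, value > 5)),
      tar_target(summary_stats, nrow(processed_data))
    )
  }, ask = FALSE)
  # tar_network() returns the inferred graph; $edges has "from" and "to" columns.
  tar_network(targets_only = TRUE)$edges
})
print(edges)
```

Passing targets_only = FALSE instead also includes functions and other global objects as vertices.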

File Tracking

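The save_results target uses format = "file": its command returns the path of the CSV it writes, and Targets tracks that file's contents through a hash. If the file is modified or deleted outside the pipeline, save_results is marked as outdated and rebuilt by the next tar_make().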
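A minimal sketch of file tracking, using hypothetical data and a hypothetical results.csv: the command of a format = "file" target returns the path it wrote, and appending a line to that file after tar_make() makes tar_outdated() report the target again.

```r
library(targets)

outdated <- tar_dir({
  tar_script({
    list(
      # Hypothetical stand-in for the real processed_data.
      tar_target(processed_data, data.frame(id = 1:3, value = c(2, 4, 6))),
      tar_target(
        save_results,
        {
          path <- "results.csv"
          write.csv(processed_data, path, row.names = FALSE)
          path  # a format = "file" target must return its file path(s)
        },
        format = "file"
      )
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")
  # Simulate an external edit to the tracked file.
  cat("edited by hand\n", file = "results.csv", append = TRUE)
  tar_outdated()
})
print(outdated)
```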

Error Handling

The pipeline is configured with error = "continue", so when one target fails, tar_make() records the error and keeps running the remaining targets instead of stopping the whole pipeline.
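A sketch of this behavior, assuming a hypothetical broken_step target that always fails: with error = "continue", the independent summary_stats target is still built, and tar_meta() lists the error. The call to tar_make() is wrapped in try() so the script keeps going whether or not tar_make() itself signals the failure.

```r
library(targets)

result <- tar_dir({
  tar_script({
    tar_option_set(error = "continue")
    list(
      tar_target(raw_data, 1:10),
      # Hypothetical target that always fails.
      tar_target(broken_step, stop("simulated failure")),
      # Does not depend on broken_step, so it is still built.
      tar_target(summary_stats, mean(raw_data))
    )
  }, ask = FALSE)
  # try() keeps this script going even if tar_make() signals the failure.
  try(tar_make(reporter = "silent"), silent = TRUE)
  list(
    stats = tar_read(summary_stats),
    # complete_only = TRUE drops targets whose error field is empty.
    errors = tar_meta(fields = error, complete_only = TRUE)
  )
})
print(result$errors)
```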

Customization

Adding New Targets

Add targets to the list() in _targets.R:

```
tar_target(
  name = new_target,
  command = {
    # Your code here (assumes processed_data has a numeric value column)
    processed_data %>% filter(value > 5)
  },
  packages = "dplyr"
)
```

Using External Data Files

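To read raw_data from a file, add a target that returns the file path with format = "file", and have raw_data read that path. format = "file" only applies to targets whose command returns file paths, so it cannot go on a target that returns a data frame.

```
# Use these two targets in place of raw_data in the list() in _targets.R
list(
  tar_target(
    name = raw_data_file,
    command = "data/my_data.csv",
    format = "file"  # Track the file; the command must return its path
  ),
  tar_target(
    name = raw_data,
    command = readr::read_csv(raw_data_file, show_col_types = FALSE)
  )
)
```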
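A runnable sketch of the file-target pattern, where one target returns the path and another reads it. The file contents are hypothetical, raw_data_file is an illustrative name, and base read.csv() stands in for readr::read_csv() so the sketch needs no extra packages. Rewriting the CSV outside the pipeline makes both targets outdated.

```r
library(targets)

result <- tar_dir({
  # Hypothetical input file.
  dir.create("data")
  write.csv(data.frame(value = 1:3), "data/my_data.csv", row.names = FALSE)
  tar_script({
    list(
      # Returns the path; format = "file" hashes the file it points to.
      tar_target(raw_data_file, "data/my_data.csv", format = "file"),
      # Reads the file; base read.csv() stands in for readr::read_csv() here.
      tar_target(raw_data, read.csv(raw_data_file))
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")
  # Change the input file outside the pipeline.
  write.csv(data.frame(value = 1:4), "data/my_data.csv", row.names = FALSE)
  tar_outdated()
})
print(result)
```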

Parallel Execution

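Configure parallel execution in _targets.R with a crew controller (requires the crew package), then run tar_make() as usual:

```
tar_option_set(
  controller = crew::crew_controller_local(workers = 4)  # Number of parallel workers
)
```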

Troubleshooting

Common Issues

Package not found: Ensure all required packages are installed and listed in tar_option_set(packages = ...) or in the packages argument of the target that uses them
Target not found: Check that the target name is spelled correctly and exists in _targets.R
Dependency errors: Use tar_visnetwork() to visualize dependencies and identify issues
Outdated targets: Use tar_outdated() to see which targets need to be updated

Getting Help

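Inspect the pipeline state that Targets records in the _targets/ directory with tar_meta() and tar_progress() rather than reading the files directly
Use tar_meta(fields = error, complete_only = TRUE) to list only the targets that failed, with their error messages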
Consult the Targets Documentation

Additional Resources

For comprehensive documentation, tutorials, and additional resources, see the Targets documentation page.