--- title: "Advanced Features: EWMA, Quality Control, and Cohort Analysis" author: "Athlytics Package" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Advanced Features: EWMA, Quality Control, and Cohort Analysis} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 10, fig.height = 7, out.width = "100%", dpi = 150 ) ``` # Introduction This vignette demonstrates the advanced features of Athlytics for sports physiology analysis: 1. **Data Quality Control**: Detecting and flagging HR/power spikes, GPS drift 2. **ACWR with EWMA**: Exponentially weighted moving average with confidence bands 3. **Cohort Reference Analysis**: Multi-athlete percentile comparisons These features make Athlytics suitable for both individual training analysis and multi-athlete research studies. --- # 1. Data Quality Control Before calculating physiological metrics, it's crucial to ensure data quality. The `flag_quality()` function detects common issues in activity stream data. ## Basic Quality Checking ```{r eval=FALSE} library(Athlytics) library(dplyr) # Parse stream data from a FIT/TCX/GPX file stream_data <- parse_activity_file("path/to/activity.fit") # Flag quality issues flagged_data <- flag_quality( stream_data, sport = "Run", hr_range = c(30, 220), # Valid HR range pw_range = c(0, 1500), # Valid power range max_run_speed = 7.0, # Max speed (m/s) ≈ 2:23/km pace max_hr_jump = 10, # Max HR change per second max_pw_jump = 300, # Max power change per second min_steady_minutes = 20, # Min duration for steady-state steady_cv_threshold = 8 # CV% threshold for steady-state ) # View quality summary summary_stats <- summarize_quality(flagged_data) print(summary_stats) ``` **Output interpretation:** - `quality_score`: Overall quality (0-1), where 1 = perfect - `flagged_pct`: Percentage of data points with issues - `steady_state_pct`: Percentage suitable for EF/decoupling calculations ## Why Quality Control Matters **Without quality control:** - HR spikes from sensor errors inflate HRSS calculations - GPS drift creates artificially high speeds, affecting pace metrics - Non-steady-state segments contaminate Efficiency Factor (EF) calculations **With quality control:** - Only clean data contributes to metrics - Steady-state detection ensures valid EF/decoupling calculations - Researchers can report data quality in publications --- # 2. ACWR with EWMA Method Traditional ACWR uses rolling averages (RA). The EWMA method, introduced by Williams et al. (2017), offers: - **Greater sensitivity** to recent training changes - **Customizable decay rates** via half-life parameters - **Confidence bands** via bootstrap to quantify uncertainty > **Reference:** Williams, S., et al. (2017). Better way to determine the acute:chronic workload ratio? *British Journal of Sports Medicine*, 51(3), 209-210. [DOI: 10.1136/bjsports-2016-096589](https://doi.org/10.1136/bjsports-2016-096589) ## Comparing RA vs EWMA ```{r eval=FALSE} # Load activities activities <- load_local_activities("export_12345678.zip") # Calculate ACWR using Rolling Average (traditional) acwr_ra <- calculate_acwr_ewma( activities, activity_type = "Run", method = "ra", acute_period = 7, chronic_period = 28, load_metric = "duration_mins" ) # Calculate ACWR using EWMA acwr_ewma <- calculate_acwr_ewma( activities, activity_type = "Run", method = "ewma", half_life_acute = 3.5, # Acute load half-life (days) half_life_chronic = 14, # Chronic load half-life (days) load_metric = "duration_mins" ) # Compare the two methods library(ggplot2) ggplot() + geom_line( data = acwr_ra, aes(x = date, y = acwr_smooth), color = "blue", linewidth = 1, linetype = "solid" ) + geom_line( data = acwr_ewma, aes(x = date, y = acwr_smooth), color = "red", linewidth = 1, linetype = "dashed" ) + labs( title = "ACWR: Rolling Average vs EWMA", subtitle = "Blue = RA (7:28d) | Red = EWMA (3.5:14d half-life)", x = "Date", y = "ACWR" ) + theme_minimal() ``` **Key differences:** - **RA**: Equal weight to all days in window; sudden jumps when old workouts exit window - **EWMA**: Exponential decay; smooth transitions; more responsive to recent changes ## Adding Confidence Bands Confidence bands quantify uncertainty in ACWR estimates, essential for research reporting. ```{r eval=FALSE} # Calculate EWMA with 95% confidence bands acwr_ewma_ci <- calculate_acwr_ewma( activities, activity_type = "Run", method = "ewma", half_life_acute = 3.5, half_life_chronic = 14, load_metric = "duration_mins", ci = TRUE, # Enable confidence intervals B = 200, # Bootstrap iterations block_len = 7, # Block length (days) for bootstrap conf_level = 0.95 # 95% CI ) # Plot with confidence bands ggplot(acwr_ewma_ci, aes(x = date)) + geom_ribbon(aes(ymin = acwr_lower, ymax = acwr_upper), fill = "gray70", alpha = 0.5 ) + geom_line(aes(y = acwr_smooth), color = "black", linewidth = 1) + geom_hline(yintercept = c(0.8, 1.3), linetype = "dotted", color = "green") + geom_hline(yintercept = 1.5, linetype = "dotted", color = "red") + labs( title = "ACWR with 95% Confidence Bands", subtitle = "Gray band = bootstrap confidence interval", x = "Date", y = "ACWR" ) + theme_minimal() ``` **Interpretation:** - **Gray band**: 95% confidence interval from moving-block bootstrap - **Wider bands** = more uncertainty (e.g., during high load variability) - **Narrower bands** = more stable estimates ### Technical Note: Bootstrap Method The confidence bands use **moving-block bootstrap**: 1. Daily load sequence is resampled in weekly blocks (preserves within-week correlation) 2. ACWR is recalculated for each bootstrap sample (B = 200 iterations) 3. 2.5th and 97.5th percentiles form the 95% confidence band This approach is computationally efficient and doesn't require distributional assumptions. --- # 3. Cohort Reference Analysis When analyzing multiple athletes (teams, research cohorts), it's valuable to compare individuals to group norms. ## Multi-Athlete Setup ```{r eval=FALSE} # Load data for multiple athletes athlete1 <- load_local_activities("athlete1_export.zip") |> mutate(athlete_id = "athlete1", sex = "M", age_band = "25-35") athlete2 <- load_local_activities("athlete2_export.zip") |> mutate(athlete_id = "athlete2", sex = "F", age_band = "25-35") athlete3 <- load_local_activities("athlete3_export.zip") |> mutate(athlete_id = "athlete3", sex = "M", age_band = "35-45") # Combine cohort_data <- bind_rows(athlete1, athlete2, athlete3) # Calculate ACWR for each athlete cohort_acwr <- cohort_data |> group_by(athlete_id) |> group_modify(~ calculate_acwr_ewma(.x, method = "ewma")) |> ungroup() |> left_join( cohort_data |> select(athlete_id, sex, age_band) |> distinct(), by = "athlete_id" ) ``` ## Calculate Reference Percentiles ```{r eval=FALSE} # Calculate cohort reference by activity type and sex reference <- calculate_cohort_reference( cohort_acwr, metric = "acwr_smooth", by = c("type", "sex"), # Group by activity type and sex probs = c(0.05, 0.25, 0.5, 0.75, 0.95), # Percentiles min_athletes = 3 # Minimum athletes per group ) # View reference structure head(reference) ``` **Output:** ``` # A tibble: 6 × 5 date type sex percentile value n_athletes 1 2024-01-01 Run M p05 0.65 5 2 2024-01-01 Run M p25 0.82 5 3 2024-01-01 Run M p50 0.98 5 4 2024-01-01 Run M p75 1.15 5 5 2024-01-01 Run M p95 1.38 5 ``` ## Plot Individual vs Cohort ```{r eval=FALSE} # Extract one athlete's data individual <- cohort_acwr |> filter(athlete_id == "athlete1") # Plot with cohort reference p <- plot_with_reference( individual = individual, reference = reference |> filter(sex == "M"), # Match athlete's group metric = "acwr_smooth", bands = c("p25_p75", "p05_p95", "p50") ) print(p) ``` **Interpretation:** - **Black line**: Individual athlete's ACWR trend - **Dark shaded area (P25-P75)**: "Normal" range (50% of cohort) - **Light shaded area (P5-P95)**: Extended range (90% of cohort) - **Dashed line (P50)**: Median (cohort center) **Use cases:** - **Outlier detection**: Athletes consistently above P95 or below P5 - **Load matching**: Ensure similar load profiles when comparing injury rates - **Benchmarking**: Compare individual's trajectory to team norms --- # 4. Integrated Workflow Example Here's a complete research workflow combining all features: ```{r eval=FALSE} library(Athlytics) library(dplyr) library(ggplot2) # --- Step 1: Load multi-athlete cohort --- cohort_files <- c("athlete1_export.zip", "athlete2_export.zip", "athlete3_export.zip") cohort_data <- lapply(cohort_files, function(file) { load_local_activities(file) |> mutate(athlete_id = tools::file_path_sans_ext(basename(file))) }) |> bind_rows() # --- Step 2: Quality control (conceptual - for stream data) --- # If you have stream data, flag quality before calculating metrics # stream <- parse_activity_file("path/to/activity.fit") # flagged <- flag_quality(stream, sport = "Run") # summary <- summarize_quality(flagged) # --- Step 3: Calculate ACWR with EWMA + CI --- cohort_acwr <- cohort_data |> group_by(athlete_id) |> group_modify(~ calculate_acwr_ewma( .x, method = "ewma", half_life_acute = 3.5, half_life_chronic = 14, load_metric = "duration_mins", ci = TRUE, B = 100 # Reduced for speed in example )) |> ungroup() # --- Step 4: Calculate cohort reference --- reference <- calculate_cohort_reference( cohort_acwr, metric = "acwr_smooth", by = c("type"), probs = c(0.05, 0.25, 0.5, 0.75, 0.95), min_athletes = 3 ) # --- Step 5: Visualize individual with confidence + cohort --- individual <- cohort_acwr |> filter(athlete_id == "athlete1") p <- plot_with_reference(individual, reference, metric = "acwr_smooth") + # Add individual's confidence bands geom_ribbon( data = individual, aes(x = date, ymin = acwr_lower, ymax = acwr_upper), fill = "blue", alpha = 0.2 ) print(p) # --- Step 6: Export for statistical analysis --- write.csv(cohort_acwr, "cohort_acwr_with_ci.csv", row.names = FALSE) write.csv(reference, "cohort_reference.csv", row.names = FALSE) ``` --- # 5. Methodological Notes for Research ## ACWR Method Selection | Method | Pros | Cons | Recommended Use | |--------|------|------|----------------| | **RA (Rolling Average)** | Stable, well-established, simple interpretation | Sudden jumps, equal weight to all days | **Descriptive studies**, replicating prior work | | **EWMA** | Smooth, responsive, customizable decay | More parameters, less established norms | **Monitoring**, sensitive change detection | **Default parameters:** - RA: 7-day acute, 28-day chronic (most common in literature) - EWMA: 3.5-day acute half-life, 14-day chronic half-life (equivalent decay) ## Confidence Bands Limitations - **Not causal inference**: Bands reflect sampling variability, not prediction of injury - **Assumes stationarity**: If training pattern changes fundamentally, past data may not be representative - **Block length matters**: Default 7-day blocks preserve weekly structure; adjust if training cycle differs ## Cohort Reference Interpretation ⚠️ **Critical distinction**: Percentile bands are **descriptive population variability**, NOT **confidence intervals** for an individual's "true" value. **Correct interpretation:** - "Athlete was above the 75th percentile of the cohort" ✅ - "Athlete's ACWR was significantly higher than cohort" ❌ (requires statistical test) **Minimum sample sizes:** - P25-P75: n ≥ 5 athletes - P5-P95: n ≥ 20 athletes (for stable extremes) --- # 6. FAQ for Advanced Features **Q: How do I choose EWMA half-life?** A: Start with defaults (3.5:14d ≈ 7:28d RA). For more responsive monitoring, reduce both proportionally (e.g., 2.5:10d). For smoother trends, increase (e.g., 5:20d). **Q: Bootstrap is slow. Can I reduce B?** A: Yes. B = 50–100 is often sufficient for visual inspection. For publication, use B = 200–500. **Q: What if my cohort has < 5 athletes?** A: Set `min_athletes = 3`, but report this limitation. Percentiles with n < 5 are unstable. **Q: Can I use these methods for injury prediction?** A: **No**. These are **descriptive monitoring tools**. Injury prediction requires proper case-control studies, controlling for confounders, and validation. See: Carey, D. L., et al. (2018). Predictive modelling of training loads and injury in Australian football. *International Journal of Computer Science in Sport*, 17(1), 49-66. [DOI: 10.2478/ijcss-2018-0002](https://doi.org/10.2478/ijcss-2018-0002); Impellizzeri, F. M., et al. (2021). What role do chronic workloads play in the acute to chronic workload ratio? Time to dismiss ACWR and its underlying theory. *Sports Medicine*, 51(3), 581-592. [DOI: 10.1007/s40279-020-01378-6](https://doi.org/10.1007/s40279-020-01378-6) **Q: How do I cite Athlytics?** A: Use the standard software citation format with version number and repository URL. See the main tutorial for the complete citation. --- # Session Info ```{r} sessionInfo() ```