Skip to content

Add harm bound to AHR designs#640

Open
LittleBeannie wants to merge 13 commits into
mainfrom
618-add-harm-bound
Open

Add harm bound to AHR designs#640
LittleBeannie wants to merge 13 commits into
mainfrom
618-add-harm-bound

Conversation

@LittleBeannie

@LittleBeannie LittleBeannie commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

To solve issue #618.

@jdblischak: In addition to "Efficacy" and "Futility" bound, I added "Harm" bound, which sequentially leads to some changes in gs_bound_summary(). The changes in gs_bound_summary() is suggested by GPT5.5. Could you please review if these AI-suggested changes looks good to you?

@LittleBeannie LittleBeannie self-assigned this Jun 10, 2026
@LittleBeannie LittleBeannie linked an issue Jun 10, 2026 that may be closed by this pull request
LittleBeannie and others added 9 commits June 11, 2026 10:31
The helper-support-as_rtf.R is auto-sourced by testit, so the explicit
source() call failed during R CMD check. Also update as_gt and as_rtf
snapshots to reflect the new bound ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Run roxygenize to sync Rd files with code (fixes WARNING)
- Remove test-independent-as_rtf.R (the .md snapshot is sufficient)
- Replace all() with bare logical vector in test assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…alone

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yihui

yihui commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Note on snapshot test changes

The .md snapshot diffs look large but are caused by a single behavioral change: as_gt() now sorts bounds by factor level order (Efficacy, Futility, Harm) via arrange(x2, Analysis, Bound), whereas previously it used arrange(x2, Analysis) which preserved the summary() order (Futility before Efficacy, from desc(bound) alphabetical sorting).

This means every analysis section in both test-independent-as_gt.md and test-independent-as_rtf.md has its Efficacy and Futility rows swapped. The actual numeric values are unchanged.

The other visible change in as_gt.md is lrrrrrcrrrrr (first column alignment changed from left to center).

@yihui

yihui commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Correction on the alignment change: the lrrrrrcrrrrr (first column left → center) is not from this PR's code. It's from gt 1.3.0 changing its default LaTeX alignment for the first data column in row-grouped tables. The old snapshot was generated with an older gt version. Since CI also installs gt 1.3.0, the updated snapshot is correct.

@LittleBeannie

Copy link
Copy Markdown
Collaborator Author

Thank you @yihui for helping with the testit snapshot issues!

Comment thread R/gs_design_ahr.R
#'
#' # Example 8 ----
#' # Design with an additional harm bound
#' \donttest{

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: we use \donttest{} here to avoid long-running example code

xref: #147

}
\fontsize{12.0pt}{14.0pt}\selectfont
\begin{tabular*}{\linewidth}{@{\extracolsep{\fill}}lrrrrr}
\begin{tabular*}{\linewidth}{@{\extracolsep{\fill}}crrrrr}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, I think it would be cleaner if we fixed snapshot test churn in a separate PR

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good idea.

Previously we skipped these tests on CRAN (via skip_on_cran()) so we have been passively accepting upstream {gt} changes, which I find quite annoying. Now these tests run unconditionally, so we have at least one vote in our hands to restrict {gt}---if they make such changes again, they will have to rethink and inform us in advance, otherwise they will break our tests.

Comment thread R/gs_bound_summary.R
if (nrow(row_bound) == 0) return(rep(NA_real_, length(columns)))
as.numeric(unlist(row_bound[1, columns], use.names = FALSE))
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactoring looks good

harm_bound <- x |> dplyr::filter(bound == "harm") |> dplyr::pull(z)
futility_bound <- x |> dplyr::filter(bound == "lower") |> dplyr::pull(z)
(harm_bound <= futility_bound)
})

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please instruct the AI to write unit tests for every function that it edits

Comment thread R/gs_bound_summary.R

# One-sided design should not include Futility column
if (all(is.na(out[["Futility"]]))) out[["Futility"]] <- NULL
if (all(is.na(out[["Harm"]]))) out[["Harm"]] <- NULL

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good when I tested locally, but this needs some unit tests for long-term maintenance. For example, confirm that a one-sided design does not return the column Harm

Comment thread R/gs_power_npe.R
#' single value of `TRUE` (default) indicates all analyses;
#' single value of `FALSE` indicates no harm bound; otherwise,
#' a logical vector of the same length as `info` should
#' indicate which analyses will have a harm bound.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the relationship between test_harm and test_lower? Is it theoretically possible to have a harm bound but not a futility bound?

From my local testing, it appears that a harm bound is only included if both test_harm and test_lower are TRUE for a given analysis. If this is the expected behavior, please document this. If it does not make sense to have a harm bound when there is no futility bound, then we should also consider throwing an error if test_harm is TRUE and test_lower is FALSE.

Comment thread man/gs_design_ahr.Rd
upper = gs_b, upar = -qnorm(0.025), test_upper = TRUE,
lower = gs_b, lpar = -1, test_lower = TRUE,
harm = gs_b, hpar = -2, test_harm = TRUE
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to compute and return a futility or harm bound when there is only a single analysis?

If I run this with test_harm = FALSE, it only returns an efficacy bound for the single analysis. But with test_harm = TRUE, it returns all 3 bounds (efficacy, futility, harm).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add harm bound

3 participants