FSA Analyses for Multiple Groups

Demonstrate new (or improved) functionality in the FSA package for efficiently performing some analyses across multiple groups.
Abundance
Catch Curve
Depletion
Removal
group_by
purrr
Author

Derek H. Ogle

Published

Jan 8, 2025

Modified

Jan 8, 2025

Introduction

A year or so ago a user of FSA asked me if there was a way to use group_by() from dplyr() with removal() from FSA to efficiently estimate fish abundance from removal data for multiple groups. There was not, as far as either of us could tell. My naive impression was that I would need a new version of removal() that used formula notation with a data= argument. So, I modified removal to use a formula and data=.

However, this was not an adequate solution because I did not want to change the contents of the list that was returned from removal(). Furthermore required both coef() and confint() (S3 extractor functions for removal()) to extract the results the user wanted from this list. My solution for this was to modify confint() with incl.est= that would append the confidence intervals to the point estimate (extracted with coef()) when set to TRUE.

However, confint() returned a matrix which I found difficult to work with within any of the dplyr functions. Thus, I further modified confint()1 with as.df= to return a data.frame rather than a matrix when set to TRUE. Both incl.est= and as.df= default to FALSE so that the original functionality is maintained (by default).2 Thus, to get the new functionality both incl.est= and as.df= must be explicitly set to TRUE by the user.

1 And coef() and summary() while I was at it.

2 Trying not to break user’s legacy code.

These changes seemed to meet the user’s request. Thus, I made similar changes to depletion(), catchCurve(), and chapmanRobson(). In this post, I demonstrate how these new functions work vis-a-vis helper functions from dplyr (and, briefly, purrr).

The following packages are loaded for use below. Note that the modified removal(), depletion(), catchCurve(), and chapmanRobson() functionality requires FSA v0.9.6 or higher.3

3 This version is at CRAN being processed when this post was written.

library(FSA)       # for depletion(), removal(), catchCurve(), chapmanRobson(), headtail()
library(tidyverse) # for dplyr, tidyr

Abundance

Removal Estimates

The new functionality of removal() is demonstrated with the user’s (partial) data, entered manually below.4

4 I usually enter data into a CSV file external to R. This is for demonstration only.

d <- data.frame(lake=factor(rep(c("Ash Tree","Bark","Clay"),each=5)),
                year=factor(rep(c("2010","2011","2010","2011","2010","2011"),
                                times=c(2,3,3,2,2,3))),
                pass=factor(c(1,2,1,2,3,1,2,3,1,2,1,2,1,2,3)),
                catch=c(57,34,65,34,12,54,26,9,54,27,67,34,68,35,12))
d
#R|         lake year pass catch
#R|  1  Ash Tree 2010    1    57
#R|  2  Ash Tree 2010    2    34
#R|  3  Ash Tree 2011    1    65
#R|  4  Ash Tree 2011    2    34
#R|  5  Ash Tree 2011    3    12
#R|  6      Bark 2010    1    54
#R|  7      Bark 2010    2    26
#R|  8      Bark 2010    3     9
#R|  9      Bark 2011    1    54
#R|  10     Bark 2011    2    27
#R|  11     Clay 2010    1    67
#R|  12     Clay 2010    2    34
#R|  13     Clay 2011    1    68
#R|  14     Clay 2011    2    35
#R|  15     Clay 2011    3    12

There are several things to note with these data. First, they are in “long” format where each row corresponds to one observation – in this case the catch of fish on a particular pass within a given year and lake. Second, the user wanted to estimated abundance from catches (i.e., removal) on multiple passes for all possible lake-year combinations; thus, “group” is defined by the combination of lake and year. Third, the number of passes was not consistent, with some year-lake combinations having two and others having three passes. This is not an issue as long as a removal method is used that can handle both numbers of removal. The default Carle-Strub method5 will be used here, so this will not be an issue. Fourth, it is important when data are in “long” format like this that the passes are ordered consecutively from first to last. That is the case here, but if it was not then the data.frame would need to be sorted on pass (within lake and year).

5 See method= argument in removal() for other methods.

6 If the variable is not called catch as it is here, then replace with the actual variable name.

removal() from FSA is used to compute estimates of initial population abundance (No) and probability of capture (p) by using a formula of the form ~catch with the data.frame containing the variable in data=.6 The result of this function is assigned to an object which is then given to confint() to extract confidence intervals for the parameters. Including incl.est=TRUE in confint() will include the point estimate in the result. Below is an example of using depletion() for just Ash Tree lake in 2010.

tmp <- removal(~catch,data=dplyr::filter(d,lake=="Ash Tree",year=="2010"))
confint(tmp,incl.est=TRUE)
#R|             Est    95% LCI     95% UCI
#R|  No 130.0000000 78.8281665 181.1718335
#R|  p    0.4482759  0.2107166   0.6858351

The object returned by confint() is a matrix by default, which generally prints nicely. However, the result may be returned as a data.frame, which as described previously will be needed with the dplyr functions used below, by including as.df=TRUE in confint().

confint(tmp,incl.est=TRUE,as.df=TRUE)
#R|     No   No.LCI   No.UCI         p     p.LCI     p.UCI
#R|  1 130 78.82817 181.1718 0.4482759 0.2107166 0.6858351

removal() can be used to compute parameter estimates for multiple groups by first giving the variable or variables that identify groups to group_by() and then giving that to group_modify() from dplyr. The first argument to group_modify(), when piped together with %>%, is an expression that begins with ~ followed by a call to removal() nested within a call to confint(). In the removal() call data= must be set to .x. The call to confint() should include incl.est=TRUE to get the point estimates of the parameters and must include as.df=TRUE for group_modify() to work properly.7 Below I submit the result to as.data.frame() simply to remove the grouping structure and tibble class.8

7 group_modify() requires the function defined after the ~ to return a data.frame.

8 I prefer to work with data.frames rather than tibbles.

res <- d %>%
  dplyr::group_by(lake,year) %>%
  dplyr::group_modify(~confint(removal(~catch,data=.x),
                               incl.est=TRUE,as.df=TRUE)) %>%
  as.data.frame() # removes tibble and grouping structure

For display purposes, the results for the two parameters are rounded to different numbers of digits.

res %>% 
  dplyr::mutate(dplyr::across(dplyr::starts_with("No"), \(x) round(x,digits=0))) %>%
  dplyr::mutate(dplyr::across(dplyr::starts_with("p"), \(x) round(x,digits=3)))
#R|        lake year  No No.LCI No.UCI     p p.LCI p.UCI
#R|  1 Ash Tree 2010 130     79    181 0.448 0.211 0.686
#R|  2 Ash Tree 2011 121    110    132 0.558 0.440 0.676
#R|  3     Bark 2010  95     87    103 0.589 0.464 0.715
#R|  4     Bark 2011 103     74    132 0.533 0.314 0.752
#R|  5     Clay 2010 130     96    164 0.523 0.324 0.723
#R|  6     Clay 2011 125    114    136 0.564 0.449 0.679

Depletion Estimates

The Pathfinder data9 from FSAdata contains the catch and effort for three Snapper species (Pristipomoides zonatus, Pristipomoides auricilla, and Etelis carbunculUs) in a depletion experiment around Pathfinder Reef in the Mariana Archipelago. Polovina (1985) used these data to demonstrate the need for a Leslie depletion model with variable recruitment (as was evident for Pristipomoides auricilla). Here these data are used to demonstrate how to efficiently compute population (No) and catchability (q) estimates for each species using depletion() from FSA with help from dplyr.

data(Pathfinder,package="FSAdata")
headtail(Pathfinder)
#R|       date effort Pzonatus Pauricilla Ecarbunculus
#R|  1  10-Apr   27.5       98         12           42
#R|  2  11-Apr   23.7      111         17           22
#R|  3  12-Apr   21.3       47         12           41
#R|  11  5-May   20.3       40         29           13
#R|  12  6-May   22.8       35         35           21
#R|  13  7-May   24.1       30         27           15

The data are provided in “wide” format where the catch of each species is presented as a column or variable in Pathfinder. The methods used below require data to be in “long” format where each row corresponds to one observation – in this case catch (and effort and other associated date) for one species (on one day). In some cases the data may have been entered in long format, but when it is not it must be converted to long format for the process used below.

There are several methods for converting from wide to long formats. In my mind, one of the easiest is pivot_longer() from tidyr.10 The names of columns that contain the values to be converted from wide to long are given to cols=. In this example, that will be the columns containing the catches of the separate species. Because those columns are contiguous in the original data.frame they can be referred to en masse by separating the name of the left-most column from the name for the right-most column with a colon. The three catch columns from the original data.frame will be moved to two columns in the resulting data.frame – one with the catch values and one with the spcies names. A name for the column of names can be given to names_to= and a name for the values column can be given to values_to=. Thus, Pathfinder is converted from wide to long format below.

10 Some alternatives are reshape() from base R and melt() from reshape2.

Pathfinder <- Pathfinder |>
  tidyr::pivot_longer(cols=Pzonatus:Ecarbunculus,
                      names_to="species",values_to="catch")
str(Pathfinder)
#R|  tibble [39 × 4] (S3: tbl_df/tbl/data.frame)
#R|   $ date   : Factor w/ 13 levels "10-Apr","11-Apr",..: 1 1 1 2 2 2 3 3 3 4 ...
#R|   $ effort : num [1:39] 27.5 27.5 27.5 23.7 23.7 23.7 21.3 21.3 21.3 29.7 ...
#R|   $ species: chr [1:39] "Pzonatus" "Pauricilla" "Ecarbunculus" "Pzonatus" ...
#R|   $ catch  : int [1:39] 98 12 42 111 17 22 47 12 41 91 ...

Once the data are in long format, depletion() can be used within group_modify() as shown for removal(). However, note that depletion() requires a formula of the form catch~effort and that groups are defined by only the single variable species here.

res <- Pathfinder %>%
  dplyr::group_by(species) %>%
    dplyr::group_modify(~confint(depletion(catch~effort,data=.x),
                                 incl.est=TRUE,as.df=TRUE)) %>%
    as.data.frame() # removes tibble and grouping structure
res %>% 
  dplyr::mutate(dplyr::across(dplyr::starts_with("No"), \(x) round(x,digits=0))) %>%
  dplyr::mutate(dplyr::across(dplyr::starts_with("p"), \(x) round(x,digits=3)))
#R|         species   No No_LCI No_UCI            q        q_LCI         q_UCI
#R|  1 Ecarbunculus  571    224    918  0.002499027  0.000399448  0.0045986060
#R|  2   Pauricilla -165     34   -363 -0.003134290 -0.005368398 -0.0009001819
#R|  3     Pzonatus 1062    795   1328  0.003678569  0.002225597  0.0051315404

Mortality

Catch Curves

A similar process can be followed with catchCurve() to estimate mortality for several groups. However, the “long” format data.frame either needs to contain only data for ages on the descending limb for each group or groups that have the same ages on the descending limb. Both instances are illustrated below.

FHCatfish from FSAdata11 contains numbers of Flathead Catfish (Pylodictis olivaris) captured by electrofishing in three rivers – Coosa River, AL; Ocmulgee River, GA; and Satilla River, GA – for ages ONLY on the descending limb of the catch curve.

data(FHCatfish,package="FSAdata")
headtail(FHCatfish)
#R|       river age abundance
#R|  1    Coosa   2        25
#R|  2    Coosa   3        24
#R|  3    Coosa   4        18
#R|  37 Satilla   6         1
#R|  38 Satilla   7         1
#R|  39 Satilla  10         1

The process here is essentially the same as it was for depletion() except noting that catchCurve() uses a formula of the form catch~age.

res <- FHCatfish %>%
  dplyr::group_by(river) %>%
  dplyr::group_modify(~confint(catchCurve(abundance~age,data=.x),
                               incl.est=TRUE,as.df=TRUE)) %>%
  as.data.frame() # removes tibble and grouping structure
res %>% 
  dplyr::mutate(dplyr::across(dplyr::starts_with("Z"), \(x) round(x,digits=3))) %>%
  dplyr::mutate(dplyr::across(dplyr::starts_with("A"), \(x) round(x,digits=1)))
#R|       river     Z Z_LCI Z_UCI    A A_LCI A_UCI
#R|  1    Coosa 0.164 0.123 0.205 15.1  11.5  18.5
#R|  2 Ocmulgee 0.269 0.194 0.344 23.6  17.6  29.1
#R|  3  Satilla 0.683 0.195 1.171 49.5  17.7  69.0

WalleyeKS from FSAdata contains the catch of Walleye at all observed ages in eight reservoirs in Kansas.12 The descending limb was determined to start at age-2 for fish from each reservoir, even though other ages appear in the data.frame. Thus, ages2use= must be used in catchCurve() to identify the ages on the descending limb. Here a sequence of ages beginning with 2 and ending with the maximum observed age was used.13 Otherwise the code below is similar to that above.

13 This will result in a sequence of warnings from catchCurve() as some of the groups do not have fish of the maximum age.

data(WalleyeKS,package="FSAdata")
str(WalleyeKS)
#R|  'data.frame':  66 obs. of  3 variables:
#R|   $ reservoir: Factor w/ 8 levels "Cedar.Bluff",..: 1 1 1 1 1 1 1 1 2 2 ...
#R|   $ age      : int  0 1 2 3 4 5 6 7 0 1 ...
#R|   $ catch    : int  78 70 104 52 33 13 4 2 131 28 ...
( maxage <- max(WalleyeKS$age) )
#R|  [1] 11
res <- WalleyeKS %>%
  dplyr::group_by(reservoir) %>%
  dplyr::group_modify(~confint(catchCurve(catch~age,data=.x,ages2use=2:11),
                               incl.est=TRUE,as.df=TRUE)) %>%
  as.data.frame() # removes tibble and grouping structure

res %>% 
  dplyr::mutate(dplyr::across(dplyr::starts_with("Z"), \(x) round(x,digits=3))) %>%
  dplyr::mutate(dplyr::across(dplyr::starts_with("A"), \(x) round(x,digits=1)))
#R|      reservoir     Z Z_LCI Z_UCI    A A_LCI A_UCI
#R|  1 Cedar.Bluff 0.811 0.664 0.958 55.6  48.5  61.6
#R|  2      Cheney 0.551 0.174 0.928 42.4  16.0  60.5
#R|  3  Glen.Elder 0.855 0.516 1.194 57.5  40.3  69.7
#R|  4      Kirwin 0.941 0.493 1.388 61.0  38.9  75.1
#R|  5    Lovewell 0.544 0.279 0.809 41.9  24.3  55.5
#R|  6      Marion 0.613 0.346 0.879 45.8  29.3  58.5
#R|  7     Webster 0.567 0.132 1.002 43.3  12.4  63.3
#R|  8      Wilson 0.719 0.555 0.884 51.3  42.6  58.7

Chapman-Robson

The process for the Chapman-Robson method is the same as that for the catch-curve except that chapmanRobson() is used instead of catchCurve().14

14 Also note that S is returned rather than A.

res <- WalleyeKS %>%
  dplyr::group_by(reservoir) %>%
  dplyr::group_modify(~confint(chapmanRobson(catch~age,data=.x,ages2use=2:11),
                               incl.est=TRUE,as.df=TRUE)) %>%
  as.data.frame() # removes tibble and grouping structure

res %>% 
  dplyr::mutate(dplyr::across(dplyr::starts_with("S"), \(x) round(x,digits=1))) %>%
  dplyr::mutate(dplyr::across(dplyr::starts_with("Z"), \(x) round(x,digits=3)))
#R|      reservoir    S S_LCI S_UCI     Z Z_LCI Z_UCI
#R|  1 Cedar.Bluff 46.9  42.0  51.9 0.754 0.662 0.846
#R|  2      Cheney 53.5  48.1  58.9 0.623 0.162 1.085
#R|  3  Glen.Elder 45.1  42.7  47.4 0.796 0.665 0.927
#R|  4      Kirwin 39.0  31.1  47.0 0.930 0.732 1.129
#R|  5    Lovewell 59.5  56.6  62.3 0.519 0.215 0.823
#R|  6      Marion 59.2  56.5  61.9 0.524 0.308 0.739
#R|  7     Webster 57.7  50.8  64.7 0.546 0.361 0.731
#R|  8      Wilson 47.1  42.7  51.5 0.751 0.583 0.919

Example Using purrr

It is also possible to use purrr functions with these new FSA functions, though I am not sure that it is more convenient. Here purrr is used to repeat the Chapman-Robson results from above.

First, a new function must be declared that combines the confint() and chapmanRobson() functions, making sure to include incl.est=TRUE and as.df=TRUE. Note that x=x is used because the argument o chapmanRobson() that takes the formulas is x=.

chapmanRobson2 <- function(x,...) {confint(chapmanRobson(x=x,...),
                                           incl.est=TRUE,as.df=TRUE)}

The long form data.frame is then split by groups into a list with a data.frame for each group. Here WalleyeKS is split on reservoir with split(), but split() requires the “splitting variable” to be a factor so reservoir was coerced to a factor below.

WalleyeKS_list <- split(WalleyeKS,as.factor(WalleyeKS$reservoir))

map_df() from purrr takes the list as the first argument and the function to apply to each item in the list as the second argument (i.e., our new chapmanRobson2()) and returns a data.frame. The x= is the formula of form catch~age and the ages2use= to be given to chapmanRobson. The .id= provides a name for the group variable in the data.frame returned by map_df().

res <- purrr::map_df(WalleyeKS_list,chapmanRobson2,
                     x=catch~age,
                     ages2use=2:11,
                     .id="reservoir")
res %>% 
  dplyr::mutate(dplyr::across(dplyr::starts_with("S"), \(x) round(x,digits=1))) %>%
  dplyr::mutate(dplyr::across(dplyr::starts_with("Z"), \(x) round(x,digits=3)))
#R|      reservoir    S S_LCI S_UCI     Z Z_LCI Z_UCI
#R|  1 Cedar.Bluff 46.9  42.0  51.9 0.754 0.662 0.846
#R|  2      Cheney 53.5  48.1  58.9 0.623 0.162 1.085
#R|  3  Glen.Elder 45.1  42.7  47.4 0.796 0.665 0.927
#R|  4      Kirwin 39.0  31.1  47.0 0.930 0.732 1.129
#R|  5    Lovewell 59.5  56.6  62.3 0.519 0.215 0.823
#R|  6      Marion 59.2  56.5  61.9 0.524 0.308 0.739
#R|  7     Webster 57.7  50.8  64.7 0.546 0.361 0.731
#R|  8      Wilson 47.1  42.7  51.5 0.751 0.583 0.919

References

Polovina, J. J. 1985. A variable catchability version of the Leslie model with application to an intensive fishing experiment on a multispecies stock. Fishery Bulletin 84:423–428.

Reuse

Citation

BibTeX citation:
@online{h._ogle2025,
  author = {H. Ogle, Derek},
  title = {FSA {Analyses} for {Multiple} {Groups}},
  date = {2025-01-08},
  url = {https://fishr-core-team.github.io/fishR/blog/posts/2025-1-8_FSA_Mult_Groups/},
  langid = {en}
}
For attribution, please cite this work as:
H. Ogle, D. 2025, January 8. FSA Analyses for Multiple Groups. https://fishr-core-team.github.io/fishR/blog/posts/2025-1-8_FSA_Mult_Groups/.