```
library(tidyverse) # for dplyr, ggplot2 packages
library(ggtext) # for use of markdown in text/labels
library(FSA) # for catchCurve et al.
```

# Introduction

Miller et al. (2022) examined life history characteristics of Goldeye (*Hiodon alosoides*) in two Kansas reservoirs. Their Figure 4 represents a catch curve (log catch at age) for Goldeye captured in Milford Reservoir in 2020. I use `FSA`

and `ggplot2`

here to recreate this figure.

The following packages are loaded for use below. A few functions from each of `readxl`

and `scales`

are used with `::`

such that the entire packages are not attached here.

# Data Wrangling

Miller et al. (2022) provided the raw data for producing Figure 4 in their Data Supplement S2. These are the same data used to recreate Figures 2 and 3 in this post. Thus, I do not explain the data wrangling used to obtain `dat2`

below.

```
<- read.csv("../2023-3-31_Milleretal2022_Fig23/JFWM-21-090.S2.csv") |>
dat2 filter(ann==1)
head(dat2)
```

```
#R| netid tl w agecap ann bclen
#R| 1 6 385 521 6 1 210
#R| 2 6 357 466 4 1 213
#R| 3 6 397 725 5 1 209
#R| 4 8 393 610 8 1 202
#R| 5 8 373 571 4 1 193
#R| 6 8 389 656 6 1 163
```

The total catch of individuals at each age-at-capture is required to construct Figure 4. One issue that occurs with these data is that no age-2 fish were captured and Miller et al. (2022) treat this as an observed zero. If the data are simply grouped by `agecap`

and summarized then no age-2 data will be present in the result. There are several ways to deal with this but I am going to handle it by first creating a new variable `fagecap`

that is a factored version of `agecap`

. The key here, though, is to set the levels of this new variable to cover the entire range of observed ages.

```
<- dat2 |>
dat2 mutate(fagecap=factor(agecap,levels=1:8))
```

The “sample size” (i.e., catch) at each `fagecap`

is then found. It is important to use `.drop=FALSE`

so that the age-2 “level” is not removed from the resultant data frame because it did not exist in the original data frame.

```
<- dat2 |>
sum_a group_by(fagecap,.drop=FALSE) |>
summarize(catch=n())
sum_a
```

```
#R| # A tibble: 8 × 2
#R| fagecap catch
#R| <fct> <int>
#R| 1 1 122
#R| 2 2 0
#R| 3 3 5
#R| 4 4 7
#R| 5 5 10
#R| 6 6 3
#R| 7 7 2
#R| 8 8 3
```

The issue now with using these data is that `fcapage`

is a factor rather than a numeric. Thus, a new `agecap`

variable is created below that treats age as numeric.^{1}

^{1} This conversion code comes from `?factor`

.

```
<- sum_a |>
sum_a mutate(agecap=as.numeric(levels(fagecap)))
sum_a
```

```
#R| # A tibble: 8 × 3
#R| fagecap catch agecap
#R| <fct> <int> <dbl>
#R| 1 1 122 1
#R| 2 2 0 2
#R| 3 3 5 3
#R| 4 4 7 4
#R| 5 5 10 5
#R| 6 6 3 6
#R| 7 7 2 7
#R| 8 8 3 8
```

Finally, the catch curve analysis ultimately requires log-transforming the catch variable. The zero for age-2 will cause an error when log-transforming. The authors addressed this by adding 1 to all of the `catch`

es. This is common practice, but I comment on it further below. Also, for use when recreating Figure 4, the log of these modified catches are added to the data frame.

```
<- sum_a |>
sum_a mutate(catch1=catch+1,
logcatch1=log(catch1))
sum_a
```

```
#R| # A tibble: 8 × 5
#R| fagecap catch agecap catch1 logcatch1
#R| <fct> <int> <dbl> <dbl> <dbl>
#R| 1 1 122 1 123 4.81
#R| 2 2 0 2 1 0
#R| 3 3 5 3 6 1.79
#R| 4 4 7 4 8 2.08
#R| 5 5 10 5 11 2.40
#R| 6 6 3 6 4 1.39
#R| 7 7 2 7 3 1.10
#R| 8 8 3 8 4 1.39
```

# Catch Curve Analysis

Catch curves analysis^{2} may be conducted with `catchCurve()`

from `FSA`

. `catchCurve`

require a formula of the form `catch~age`

as the first argument and the corresponding data frame in `data=`

. The “weighted catch curve” that Miller et al. (2022) used requires `weighted=TRUE`

in `catchCurve()`

. The results should be saved to an object.

^{2} I assume the reader is familiar with catch curves. If not see Chapter 11 in Ogle (2016).

`<- catchCurve(catch1~agecap,data=sum_a,weighted=TRUE) cc1 `

The point estimates of \(Z\) and \(A\) may be extracted with `coef()`

or seen in the `Estimate`

column of the `summary()`

results.

`coef(cc1)`

```
#R| Z A
#R| 0.2709483 23.7344113
```

`summary(cc1)`

```
#R| Estimate Std. Error t value Pr(>|t|)
#R| Z 0.2709483 0.2544583 1.064805 0.3279271
#R| A 23.7344113 NA NA NA
```

The so-called “recruitment coefficient of determination” (RCD) in Miller et al. (2022) is just the usual r^{2}, which can be extracted from the `summary()`

of the `lm`

object in `cc1`

.

`<- summary(cc1$lm)$r.squared ) ( r2 `

`#R| [1] 0.1589346`

These results were put into a label that will ultimately be placed on the plot.^{3}

^{3} This process is discussed in several previous posts, including this one.

```
<- paste0("*Z* = ",round(coef(cc1)[["Z"]],2),
lbl "<br>*A* = ",round(coef(cc1)[["A"]],2),"%",
"<br>RCD = ",round(r2,2))
lbl
```

`#R| [1] "*Z* = 0.27<br>*A* = 23.73%<br>RCD = 0.16"`

The `cc1`

object also contains the data used in the catch curve analysis, including the weights in an object called `weights.e`

. These weights are added to the `sum_a`

data frame as they are needed to produce Figure 4.^{4}

^{4} The weights can be produced “manually” as described in Ogle (2016), but this process using `catchCurve()`

is more efficient and less prone to error.

```
<- sum_a |>
sum_a mutate(wts=cc1$weights.e)
sum_a
```

```
#R| # A tibble: 8 × 6
#R| fagecap catch agecap catch1 logcatch1 wts
#R| <fct> <int> <dbl> <dbl> <dbl> <dbl>
#R| 1 1 122 1 123 4.81 2.68
#R| 2 2 0 2 1 0 2.45
#R| 3 3 5 3 6 1.79 2.22
#R| 4 4 7 4 8 2.08 1.98
#R| 5 5 10 5 11 2.40 1.75
#R| 6 6 3 6 4 1.39 1.52
#R| 7 7 2 7 3 1.10 1.29
#R| 8 8 3 8 4 1.39 1.06
```

The data in `sum_a`

are now ready to recreate Figure 4.

# Recreating Figure 4

Figure 4 is simply a scatterplot with a regression line overlaid. Constructing these kinds of plots was discussed in detail in this post. Thus, most of the code below are not discussed in depth. A very important detail, though, is that mapping `weight=wts`

will provide the `wts`

to `lm()`

in `geom_smooth()`

as weights, so that the weighted regression is used in the same way as was done in `catch_curve()`

.^{5}

^{5} In my opinion, the y-axis title should clearly note that 1 was added to the catches. Thus, my y-axis title is different than that in Miller et al. (2022).

```
ggplot(data=sum_a,mapping=aes(x=agecap,y=logcatch1,weight=wts)) +
geom_smooth(method="lm",se=FALSE,color="black") +
geom_point() +
scale_x_continuous(name="Estimated age",
limits=c(0,8),breaks=scales::breaks_width(1),
expand=expansion(mult=c(0,0.02))) +
scale_y_continuous(name="log(catch+1)",
limits=c(0,5),breaks=scales::breaks_width(1),
expand=expansion(mult=c(0.01,0))) +
theme_bw() +
theme(panel.grid=element_blank(),
axis.title=element_text(size=12,face="bold"),
axis.text=element_text(size=10,face="bold",color="black")) +
annotate(geom="richtext",x=Inf,y=Inf,vjust=1,hjust=1,label=lbl,
label.color=NA,fontface="bold")
```

### Possible Modifications

As with other posts related to Miller et al. (2022), a confidence band can be added to this regression.^{6}

^{6} Note use of `coord_cartesian()`

as the confidence band extends outside the desired range of the y-axis.

```
ggplot(data=sum_a,mapping=aes(x=agecap,y=logcatch1,weight=wts)) +
geom_smooth(method="lm",color="black") +
geom_point() +
scale_x_continuous(name="Estimated age",
limits=c(0,8),breaks=scales::breaks_width(1),
expand=expansion(mult=c(0,0.02))) +
scale_y_continuous(name="log(catch+1)",
breaks=scales::breaks_width(1),
expand=expansion(mult=c(0.01,0))) +
coord_cartesian(ylim=c(0,5)) +
theme_bw() +
theme(panel.grid=element_blank(),
axis.title=element_text(size=12,face="bold"),
axis.text=element_text(size=10,face="bold",color="black")) +
annotate(geom="richtext",x=Inf,y=Inf,vjust=1,hjust=1,label=lbl,
label.color=NA,fontface="bold")
```

Here, I think the confidence band is critical to include because it demonstrates that the regression line is NOT statistically declining.^{7} Obviously, there is mortality in this population, so this indicates data issues.

^{7} This was also apparent by the “large” p-value for the slope in the `summary(cc1)`

results above.

A major assumption of catch curve analysis is that recruitment is constant, which it clearly is not here. The most glaring evidence of this is the complete lack of age-2 fish. I don’t know if the age-2 fish should be considered as an observed zero (i.e., that age-class is actually very low) or if there is a sampling issue (i.e., was the gear highly selective; here is some evidence for that in the length frequency histogram in their Figure 2 and in this post.)

It is common to add 1 to “integer” data that has zeroes prior to log-transformation. However, this can be problematic. For example, adding 1 to the catch of age-1 fish is relatively minor (<1% change from 122 to 123), but the same modification is relatively important for the catch of age-7 fish (50% increase from 2 to 3). This can influence mortality estimates. For example, below one was added **only** to the age-2 catch.^{8}

^{8} I am not suggesting that this is the correct thing to do. However, the authors would not have added a 1 if *one* age-2 fish had been captured. Thus, this will demonstrate some level of sensitivity to the data collection and analysis choice.

```
<- sum_a |>
sum_a mutate(catch1a=ifelse(agecap==2,1,catch))
sum_a
```

```
#R| # A tibble: 8 × 7
#R| fagecap catch agecap catch1 logcatch1 wts catch1a
#R| <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#R| 1 1 122 1 123 4.81 2.68 122
#R| 2 2 0 2 1 0 2.45 1
#R| 3 3 5 3 6 1.79 2.22 5
#R| 4 4 7 4 8 2.08 1.98 7
#R| 5 5 10 5 11 2.40 1.75 10
#R| 6 6 3 6 4 1.39 1.52 3
#R| 7 7 2 7 3 1.10 1.29 2
#R| 8 8 3 8 4 1.39 1.06 3
```

And the catch curve analysis was conducted with these data instead.

```
<- FSA::catchCurve(catch1a~agecap,data=sum_a,weighted=TRUE)
cc1a summary(cc1a)
```

```
#R| Estimate Std. Error t value Pr(>|t|)
#R| Z 0.3400406 0.2737122 1.242329 0.2604674
#R| A 28.8258572 NA NA NA
```

The issue of no statistical decline is still evident, but the point estimates of A, for example, has increased substantially.

Ignoring the age-2 “fish” completely has an even larger effect.

```
<- FSA::catchCurve(catch~agecap,data=sum_a,ages2use=-2,weighted=TRUE)
cc1b summary(cc1b)
```

```
#R| Estimate Std. Error t value Pr(>|t|)
#R| Z 0.6105809 0.1688334 3.61647 0.0152761
#R| A 45.6964676 NA NA NA
```

I am not sure what is the best way to handle this problem, but this exercise suggests to me that the published mortality estimate should be considered cautiously.

## References

## Reuse

## Citation

```
@online{h.ogle2023,
author = {Derek H. Ogle},
title = {Miller Et Al. (2022) {Catch} {Curve} {Plot}},
date = {2023-04-01},
url = {https://fishr-core-team.github.io/fishR//blog/posts/2023-4-1_Milleretal2022_Fig4},
langid = {en}
}
```