library(tidyverse) # for dplyr, ggplot2 packages
library(ggtext) # for use of markdown in text/labels
This is the first of several posts related to Miller et al. (2022). I thank the authors for making their data available with their publication.
Introduction
Miller et al. (2022) examined life history characteristics of Goldeye (Hiodon alosoides) in two Kansas reservoirs. Their Figure 1 displayed a time series of catch-per-unit-effort (CPE) from 1997 to 2020 for both reservoirs, including a regression line to demonstrate the decline in abundance. I use ggplot2
here to recreate both figures.
The following packages are loaded for use below. A few functions from each of readxl
, FSA
, scales
, and rstatix
are used with ::
such that the entire packages are not attached here.
Data Wrangling
Miller et al. (2022) provided the raw data for producing Figure 1 in the Data Supplement S1, which I loaded below. I converted the “reservoir” (impd
) variable to a factor, reordered the levels so that the facets created below would be in the order used by the authors, and replaced the reservoir abbreviations with their longer names/labels.
<- read.csv("JFWM-21-090.S1.csv") |>
dat mutate(impd=factor(impd,levels=c("MILR","LOVR"),labels=c("Milford","Lovewell")))
::headtail(dat) FSA
#R| impd yr fish nets net_area_m2 catch_m2
#R| 1 Lovewell 1997 496 16 1189.158 0.41710171
#R| 2 Lovewell 1998 499 16 1189.158 0.41962450
#R| 3 Lovewell 1999 324 16 1189.158 0.27246160
#R| 45 Milford 2017 73 20 891.870 0.08185049
#R| 46 Milford 2018 10 20 891.870 0.01121240
#R| 47 Milford 2020 49 20 891.870 0.05494074
I double-checked to make sure that the provided CPE variable (catch_m2
) was equal to catch (fish
) divided by the areal effort (net_area_m2
).1
1 all_equal()
checks equality between two vectors with a tolerance for machine precision.
all.equal(dat$fish/dat$net_area_m2,dat$catch_m2)
#R| [1] TRUE
These data are ready to recreate Figure 1.
Recreating Figure 1
Basic Plot with Regression Line
The basic time series plot is created with geom_line()
by mapping yr
to the x-axis and catch_m2
to the y-axis, and using a shade of gray and slightly thicker line to match the authors’ choices. A linear regression is added with geom_smooth()
using method="lm"
.2 Here se=FALSE
was used so a confidence band is not shown, and the line was changed to dashed, made black,3 and made a little thicker than the time series line to match the authors’ choices. A facet based on impd
was created with ncol=1
to stack the two plots and scales="free_x"
so that labeled ticks would appear on the x-axis of both facets. Finally, the y-axis was labelled (more on this below), both axis limits were controlled and were expanded in a way to try to match the authors’ choices, and the basic black-and-white theme was added.
2 The default is to add a loess smoother, lm
is for a linear model.
3 It defaults to blue.
ggplot(data=dat,mapping=aes(x=yr,y=catch_m2)) +
geom_line(color="gray50",linewidth=1) +
geom_smooth(method="lm",se=FALSE,linetype="dashed",color="black",linewidth=1.2) +
facet_wrap(vars(impd),ncol=1,scales="free_x") +
scale_y_continuous(name="Catch/m^2^",expand=expansion(mult=c(0.02,0.10))) +
scale_x_continuous(limits=c(1995,2020),expand=expansion(mult=0.03)) +
theme_bw()
The portion of the regression line for Lovewell that extends below 0 was cut off in the published Figure 1. It is tempting to accomplish this by limiting the extent of the y-axis with limits=
in scale_y_continuous()
as has been done in other posts. However, in this case, that will remove the years for which the regression extends below zero from the data used to compute the regression, effectively altering the results. The better way to handle this is to limit the extent of the y-axis with ylim=
in coord_cartesian()
. With this all data is used in the regression, but the axis is clipped for viewing.
ggplot(data=dat,mapping=aes(x=yr,y=catch_m2)) +
geom_line(color="gray50",linewidth=1) +
geom_smooth(method="lm",se=FALSE,linetype="dashed",color="black",linewidth=1.2) +
facet_wrap(vars(impd),ncol=1,scales="free_x") +
scale_y_continuous(name="Catch/m^2^",expand=expansion(mult=c(0.02,0.10))) +
scale_x_continuous(limits=c(1995,2020),expand=expansion(mult=0.03)) +
coord_cartesian(ylim=c(0,0.4)) +
theme_bw()
The theme was then modified to try to match choices made by the authors. Specifically, the grid lines were removed; the x-axis title was removed; the axes tick mark labels were made slightly larger, bold, and black; and the facet strip labels were removed. In addition, the y-axis title was formatted with element_markdown()
which forces any markdown language code in the title to be rendered appropriately. In this case, the “carets” around the “2” in name=
(in scale_y_continuous()
) is markup code to superscript the “2.” Other arguments to element_markdown()
are treated the same as those in element_text()
. Thus, the y-axis title was also made slightly larger and bold.
ggplot(data=dat,mapping=aes(x=yr,y=catch_m2)) +
geom_line(color="gray50",linewidth=1) +
geom_smooth(method="lm",se=FALSE,linetype="dashed",color="black",linewidth=1.2) +
facet_wrap(vars(impd),ncol=1,scales="free_x") +
scale_y_continuous(name="Catch/m^2^",expand=expansion(mult=c(0.02,0.10))) +
scale_x_continuous(limits=c(1995,2020),expand=expansion(mult=0.03)) +
coord_cartesian(ylim=c(0,0.4)) +
theme_bw() +
theme(panel.grid=element_blank(),
axis.title.x=element_blank(),
axis.text=element_text(size=10,face="bold",color="black"),
strip.text=element_blank(),
axis.title.y=element_markdown(size=12,face="bold"))
Reservoir and Regression Result Labels
The last part to add to the plot is the reservoir label along with the significance (of the slope) and “r-squared” value from the regression. This takes a little bit of work.
First, the regressions for each reservoir were fit “outside” of ggplot and the results returned from summary()
were saved to individual objects.
<- summary(lm(catch_m2~yr,data=filter(dat,impd=="Milford")))
mreg <- summary(lm(catch_m2~yr,data=filter(dat,impd=="Lovewell"))) lreg
These objects contain the pertinent p-value and r-squared, which will be extracted for use. The p-value is in the second row and Pr(>|t|)
column of the $coefficients
portion of the results. These values were extracted from both regression results, saved into a vector ps
, and then that vector was given to p_format()
from rstatix
for formatting. This particular use of p_format()
will include the p=
or p<
label, will use one significant digit, and will convert any p-value that is less than 0.001 to “p<0.001”.
<- c(mreg$coefficients[2,"Pr(>|t|)"],lreg$coefficients[2,"Pr(>|t|)"]) ) ( ps
#R| [1] 7.148242e-03 3.053910e-08
<-rstatix::p_format(ps,add.p=TRUE,accuracy=0.001,digits=1) ) ( ps
#R| [1] "p=0.007" "p<0.001"
The r-squared values are stored in $r.squared
of the regression results and are extracted below and placed into a vector rs
. They are then rounded to three decimal places and appended to r^2^
which is markdown language for r2.
<- c(mreg$r.squared,lreg$r.squared) ) ( rs
#R| [1] 0.2971323 0.7588716
<- paste0("r^2^=",round(rs,3)) ) ( rs
#R| [1] "r^2^=0.297" "r^2^=0.759"
Finally, a small data frame was created that had the reservoir names (in factor format to match the original data frame) and the ps
and rs
pasted together with <br>
in between in a variable called lbls
. The <br>
will force a “line break” when the markdown language is rendered, thus forming a label with the r-squared value beneath to p-value result.
<- data.frame(impd=factor(c("Milford","Lovewell"),levels=c("Milford","Lovewell")),
dlbls lbls=paste0(ps,"<br>",rs))
dlbls
#R| impd lbls
#R| 1 Milford p=0.007<br>r^2^=0.297
#R| 2 Lovewell p<0.001<br>r^2^=0.759
The reservoir names are added with geom_text()
below as described in this post. Note here that size=13/.pt
will use a 13 pt font and that fontface="bold"
will make the font bold. geom_richtext()
is very similar to geom_text()
except that it will render the markdown code appropriately. By default geom_richtext()
places a box around the text, which is removed with label.color=NA
. The reservoir label and the regression statistics were placed on the plot separately because the authors used a larger font for the reservoir label.
ggplot(data=dat,mapping=aes(x=yr,y=catch_m2)) +
geom_line(color="gray50",linewidth=1) +
geom_smooth(method="lm",se=FALSE,linetype="dashed",color="black",linewidth=1.2) +
geom_text(mapping=aes(label=impd),
x=2014,y=Inf,vjust=1.3,hjust=0,size=13/.pt,fontface="bold",
check_overlap=TRUE) +
geom_richtext(data=dlbls,mapping=aes(label=lbls),
x=2014,y=Inf,vjust=1.5,hjust=0,size=10/.pt,fontface="bold",
label.color=NA) +
facet_wrap(vars(impd),ncol=1,scales="free_x") +
scale_y_continuous(name="Catch/m^2^",expand=expansion(mult=c(0.02,0.10))) +
scale_x_continuous(limits=c(1995,2020),expand=expansion(mult=0.03)) +
coord_cartesian(ylim=c(0,0.4)) +
theme_bw() +
theme(panel.grid=element_blank(),
axis.title.x=element_blank(),
axis.text=element_text(size=10,face="bold",color="black"),
strip.text=element_blank(),
axis.title.y=element_markdown(size=12,face="bold"))
Further Thoughts
Confidence Band
In the authors’ Figure 5 they included a confidence band around the fitted von Bertalanffy growth curve. They did not include a confidence band in their Figure 1. A confidence band can be added to Figure 1 by simply removing se=FALSE
from geom_smooth()
.4
4 The default is se=TRUE
which adds the confidence band.
ggplot(data=dat,mapping=aes(x=yr,y=catch_m2)) +
geom_line(color="gray50",linewidth=1) +
geom_smooth(method="lm",linetype="dashed",color="black",linewidth=1.2) +
geom_text(mapping=aes(label=impd),
x=2014,y=Inf,vjust=1.3,hjust=0,size=13/.pt,fontface="bold",
check_overlap=TRUE) +
geom_richtext(data=dlbls,mapping=aes(label=lbls),
x=2014,y=Inf,vjust=1.5,hjust=0,size=10/.pt,fontface="bold",
label.color=NA) +
facet_wrap(vars(impd),ncol=1,scales="free_x") +
scale_y_continuous(name="Catch/m^2^",expand=expansion(mult=c(0.02,0.10))) +
scale_x_continuous(limits=c(1995,2020),expand=expansion(mult=0.03)) +
coord_cartesian(ylim=c(0,0.4)) +
theme_bw() +
theme(panel.grid=element_blank(),
axis.title.x=element_blank(),
axis.text=element_text(size=10,face="bold",color="black"),
strip.text=element_blank(),
axis.title.y=element_markdown(size=12,face="bold"))
Information Label
The process of adding the reservoir name with the regression results above was hacky because the reservoir name was in a larger font then the regression results. This, can, however be solved more elegantly, but more verbosely, using HTML code in geom_richtext()
.5 Here, though, the reservoir label and the regression results need to be wrapped in paired <span>
and <\span>
couplets, with <span>
including a style=
that sets the font size as shown below.
5 Markdown is flexible enough to render HTML code appropriately.
<- c("Milford","Lovewell")
tmp <- data.frame(impd=factor(tmp,levels=tmp),
dlbls lbls=paste0("<span style='font-size:13pt;'>",tmp,"</span><br>",
"<span style='font-size:10pt;'>",ps,"<br>",
"</span>"))
rs, dlbls
#R| impd
#R| 1 Milford
#R| 2 Lovewell
#R| lbls
#R| 1 <span style='font-size:13pt;'>Milford</span><br><span style='font-size:10pt;'>p=0.007<br>r^2^=0.297</span>
#R| 2 <span style='font-size:13pt;'>Lovewell</span><br><span style='font-size:10pt;'>p<0.001<br>r^2^=0.759</span>
With this new dlbls
data frame, the geom_text()
from above that put on the reservoir labels can be removed and, in geom_richtext()
, the size=
should be removed as the font size is set in dlbls
and vjust=
should be adjusted up a little bit.
ggplot(data=dat,mapping=aes(x=yr,y=catch_m2)) +
geom_line(color="gray50",linewidth=1) +
geom_smooth(method="lm",se=FALSE,linetype="dashed",color="black",linewidth=1.2) +
geom_richtext(data=dlbls,mapping=aes(label=lbls),
x=2014,y=Inf,vjust=1.1,hjust=0,fontface="bold",
label.color=NA) +
facet_wrap(vars(impd),ncol=1,scales="free_x") +
scale_y_continuous(name="Catch/m^2^",expand=expansion(mult=c(0.02,0.10))) +
scale_x_continuous(limits=c(1995,2020),expand=expansion(mult=0.03)) +
coord_cartesian(ylim=c(0,0.4)) +
theme_bw() +
theme(panel.grid=element_blank(),
axis.title.x=element_blank(),
axis.text=element_text(size=10,face="bold",color="black"),
strip.text=element_blank(),
axis.title.y=element_markdown(size=12,face="bold"))
If you know a little bit of markdown or HTML, then ggtext
is a simple but powerful package for handling these kinds of issues. See more about ggtext
here.
References
Reuse
Citation
@online{h. ogle2023,
author = {H. Ogle, Derek},
title = {Miller Et Al. (2022) {CPE} {Plot}},
date = {2023-03-29},
url = {https://fishr-core-team.github.io/fishR//blog/posts/2023-3-29_Milleretal2022_Fig1},
langid = {en}
}