vignettes/collectRadiiData.Rmd
The size of fish at a previous time in their life is often estimated through “back-calculation.” Back-calculation of previous lengths requires accurate measurements of annual growth on calcified structures from individual fish and a suitable model that relates growth on the structure to growth of the fish.
The FishBC software is commonly used to measure lengths on a calcified structure and apply a back-calculation model to estimate length at previous ages. However, FishBC only works on outdated computers, there are no plans to update it, and it is not open source. The functionality in the RFishBC package is meant to replace FishBC. Methods for making measurements on images of calcified structures are demonstrated in this vignette. Using those measurements to back-calculate fish length at a previous age is demonstrated in the Compute Back-Calculated Lengths vignette.
This vignette assumes that you have a basic understanding of how to back-calculate fish lengths at previous ages as described in the Short Introduction to Back-Calculation vignette. At the very least, you should be aware of what calcified structures and radial measurements are.
This vignette also assumes that you have static digital images of structures. The images must be of jpeg (.jpg), portable network graphics (.png), bitmap (.bmp), or TIFF format. Images will usually be obtained from a camera mounted on a microscope and connected to a computer. Below is an image of a Kiyi (Coregonus kiyi) scale.
Ideally, but not necessarily, the image will also contain an object of known length (e.g., a “scale-bar”) so that actual lengths on the structure can be found. If a scale-bar is absent, then the measured lengths will be on an arbitrary scale (i.e., the actual values will be meaningless but the proportion of the total structure radius to each annulus will be meaningful). An image of a Kiyi otolith with a 1-mm scale-bar is shown in later sections of this vignette.
The process described herein requires that all images to be processed be in the same directory or folder. To be most efficient (and simple), this directory should contain ONLY image files related to a particular project (e.g., one species for one water body for one year) and all image files should be of the same type (e.g., png or jpg). Additionally, as shown in the Processing Multiple Images section below, it may be more efficient if the image file names end with an underscore (i.e., a “_”) followed by the fish’s unique identification number (and then, of course, the file extension).
The working directory of R should be set with setwd() to the directory that contains the images. The following is an example of setting a working directory in R.
setwd("c:/work/aging/Kiyi2014")
The working directory may also be set interactively through a dialog box using the following code (note that choose.dir() is available only on Windows).[1]
setwd(choose.dir())
Finally, note that the process described herein will produce one R data object file (.rds files; hereafter called “R data file”) for each set of measurements made on a structure image. These R data files will be saved in the current working directory (likely the directory with the structure image files). I suggest keeping the R data files and corresponding structure image files together at all times, as the R data files serve as archives of the data collected from the structure image files.
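The digitizeRadii() function is used to open a structure image, interactively select points on that image, and save the resulting measurements to an R data file.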
For example (described more thoroughly further below), the line below identifies “Scale_1.jpg” as the structure image, gives the fish an identification number of “1”, labels this reading of the image as “DHO”,[2] and indicates that the structure margin or edge should not be considered an annulus. When this line is run, it will open the image,[3] allow the user to select points that represent a linear transect and annuli on that transect, and save information about the image and the results of this process to “Scale_1_DHO.rds”.[4]
digitizeRadii("Scale_1.jpg",id="1",reading="DHO",edgeIsAnnulus=FALSE)
Further specifics of digitizeRadii() and many of its arguments[5] are described below.
The digitizeRadii() function requires only three arguments.
img: File name (or names; see the Processing Multiple Images section) for the structure image (or images), which must be in the current working directory. A dialog box will be provided from which the image file (or files) can be selected if this argument is not given and you are using a Windows machine.
id=: The unique identifier (or identifiers if more than one image is provided in img) for the fish/structure(s). If this argument is missing, then the ID can be entered in a dialog box (if using Windows) or in the console. By default, the Windows dialog will be populated with the fish ID if that ID follows an underscore at the end of the filename (sans the extension).
edgeIsAnnulus=: A logical that indicates whether the point selected at the structure margin should be considered an annulus or not. If the fish was captured at a time when the margin shows growth, but not a complete year’s worth of growth, then use edgeIsAnnulus=FALSE. However, if the fish was captured before the current year’s growth commenced or after it has completed, then use edgeIsAnnulus=TRUE. Identifying whether the structure margin is an annulus or not is critical to properly recording radial measurements. As such, there is no default for this argument (i.e., it MUST be set by the user).
Other optional arguments that are likely to be commonly used are:
reading=: Label for the reading. The reading= argument is primarily used when the structure is read more than once. However, I suggest giving a descriptive label to reading= even if there are no plans to read the structure again.[6]
windowSize=: A value that sets the size of the separate window in which the image will appear. This value will become the larger of the two dimensions, with the other dimension scaled proportionately so that the original aspect ratio of the image is maintained. Note that the default is windowSize=7, so values larger than 7 represent “zooming in.”
device=: The image will be opened in a separate window. This will happen seamlessly with many operating systems (especially if using a Windows machine). However, device="X11" may be needed with some versions of Mac OS.[7]
Finally, note that the R data file that will be created after the annuli have been selected will have the same name as the image file[8] but with the suffix optionally provided in suffix= appended. If nothing is given in suffix=, then a suffix will be created from reading= (if it exists). For example, if the structure image file was named “Scale_1.jpg”, then the resultant R data file will be named “Scale_1_DHO.rds” if reading="DHO" and suffix= was not set, or “Scale_1_TESTING.rds” if reading="DHO" and suffix="TESTING".
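The naming rule just described can be sketched in base R. The rdsName() helper below is hypothetical (it is not part of RFishBC) and only mimics the rule as stated in the text; the behavior when neither suffix= nor reading= is given is an assumption.

```r
# Hypothetical helper (not an RFishBC function) that mimics the naming rule
# described above for the R data file.
rdsName <- function(img, reading = NULL, suffix = NULL) {
  # Drop the directory and the image extension (e.g., ".jpg")
  base <- tools::file_path_sans_ext(basename(img))
  # suffix= is used if given; otherwise fall back on reading=
  sfx <- if (!is.null(suffix)) suffix else reading
  # Assumption: with neither suffix= nor reading=, nothing is appended
  if (is.null(sfx)) paste0(base, ".rds") else paste0(base, "_", sfx, ".rds")
}

rdsName("Scale_1.jpg", reading = "DHO")
rdsName("Scale_1.jpg", reading = "DHO", suffix = "TESTING")
```

A check like this may help predict which R data files a reading will create before any images are processed.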
By default, the fish ID will be shown in the top-left corner of the image. This may be moved by giving a different location to pos.info=. For example, pos.info="bottomright" would move this information to the bottom-right corner of the image. The color of this information may be changed with col.info= and the relative size may be changed with cex.info=.
At this point, the image may look something like that below.
Other arguments to digitizeRadii() are described in the specific sections below.
A scaling factor to convert measurements on the image to actual measurements on the structure is required if actual lengths, rather than arbitrary (but proportional) lengths, are needed.[9] This scaling factor may be calculated from a scale-bar found on the image or provided by the user.
Scale-bar On Image: If a scale-bar of known length exists on the image, then use scaleBar=TRUE with the actual length of the scale-bar given in scaleBarLength=. You will then select the two end points of the scale-bar on the structure image prior to selecting points that represent annuli. Press the ‘f’ key (for “finished”) after selecting the end points of the scale-bar.[10] An appropriate scaling factor will be computed from your selections and the radial measurements on the image will be converted to actual lengths on the structure.
Separately Defined Scaling Factor: In applications where a scale-bar does not exist on the image, the user can provide a value to scalingFactor=, which will be multiplied by lengths on the structure image to derive actual lengths. One way to derive this scaling factor is to capture an image of the structure at a specific magnification on the microscope and then capture a separate image of an object of known length at that same magnification. Note that these captured images must be of the same size so that the aspect ratio is consistent. A scaling factor may then be computed from the image with the object of known length and applied to the structure image. This scaling factor can be found by giving findScalingFactor() the file name with the object of known length and that known length in knownLength=. The value returned from findScalingFactor() can then be given to scalingFactor= in digitizeRadii().[11]
If a scaling factor is neither derived from a scale-bar nor provided in scalingFactor=, then the radial measurements returned by digitizeRadii() are simply proportional to the unknown actual lengths on the structure.
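The arithmetic behind a scaling factor can be sketched in base R. The coordinates below are hypothetical, and the code only illustrates the idea described above (a factor that is multiplied by image lengths to give actual lengths); it is not necessarily how RFishBC computes the factor internally.

```r
# Hypothetical image coordinates of the two selected scale-bar end points
bar <- data.frame(x = c(10, 310), y = c(20, 20))
knownLength <- 1   # actual length of the scale-bar (e.g., 1 mm)

# Distance between the two end points in image units
barImageLength <- sqrt(diff(bar$x)^2 + diff(bar$y)^2)

# Factor that is multiplied by image lengths to give actual lengths
scalingFactor <- knownLength / barImageLength

# A radius of 150 image units then corresponds to 0.5 actual units
radiusImage <- 150
radiusActual <- radiusImage * scalingFactor
radiusActual
```

Without such a factor the radii keep their image units, which is why they remain proportional to, but not equal to, the actual lengths.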
At this point (i.e., after having selected the scale-bar endpoints), the image may look something like that below.
After the scaling factor has been determined (or provided) and if makeTransect=TRUE (the default), then you will select a transect on the structure image on which annular marks will be selected. This transect is selected by first selecting the structure focus and then selecting the structure margin/edge and pressing the ‘f’ key.[12] The color of the transect may be changed with col.transect=. The width of the transect may be increased by including a number greater than 1 in lwd.transect=.
At this point (i.e., after having selected the transect endpoints), the image may look something like that below.
If you prefer not to use a transect, for example if the “growth trajectory” is primarily curved, then set makeTransect=FALSE (which will also change snap2Transect= to FALSE). If you choose not to define a transect, then, in contrast to what is described in the next section, you will be prompted to select the structure center and successive annuli out to the margin.
Once a transect has been identified on the structure (assuming that you are using a transect), then you can select points on the structure that represent annuli. Points are selected by clicking with the first (left) mouse button at a point on the image. The most recently selected point can be removed by pressing the ‘d’ key (for “delete”). When the last point has been selected, press the ‘f’ (for “finished”) key.
Selected points will be marked with the plotting character given in pch.sel= (defaults to a filled circle) with a color given in col.sel= (defaults to yellow). Any deleted points will be marked with the color and character in col.del= and pch.del= (defaults to a red circle with an “x” in it).
If using a linear transect, many users prefer that all selected points fall exactly on the transect. In practice, some points may be selected that are slightly off the transect. Selected points will be “moved” perpendicularly to fall exactly on the transect when snap2Transect=TRUE (the default).
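The perpendicular “move” onto the transect is an orthogonal projection of the selected point onto the line through the focus and the margin. The snapToTransect() helper and the coordinates below are hypothetical and only illustrate the geometry; this is a sketch, not RFishBC's own code.

```r
# Hypothetical helper (not an RFishBC function): project point p
# perpendicularly onto the line through the focus and the margin.
snapToTransect <- function(p, focus, margin) {
  d <- margin - focus                       # direction of the transect
  t <- sum((p - focus) * d) / sum(d * d)    # position of the projection along d
  focus + t * d                             # point on the transect closest to p
}

focus  <- c(0, 0)
margin <- c(10, 0)
clicked <- c(3, 2)          # selected slightly off the transect
snapped <- snapToTransect(clicked, focus, margin)
snapped
```

Here the clicked point keeps its position along the transect (3 units from the focus) and only its offset from the line is removed.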
At this point, the image may look something like that below (note that two points were deleted in the selection process for this image).
When you have finished selecting points, information about your selections, including the calculated radial measurements, is saved to the R data file. The contents of this file are described further here.
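For a linear transect, a radial measurement can be thought of as the distance from the focus to a selected point, multiplied by the scaling factor. The base R sketch below uses hypothetical coordinates and a hypothetical scaling factor, and it assumes that the last selected point is the structure margin; it illustrates the calculation and is not taken from the RFishBC source.

```r
# Hypothetical focus and selected points (image units); the last point is
# assumed to be the structure margin.
focus <- c(x = 100, y = 100)
pts <- data.frame(x = c(130, 160, 190, 220),
                  y = c(140, 180, 220, 260))
scalingFactor <- 0.002     # hypothetical; actual units per image unit

# Straight-line distance from the focus to each point, converted to actual units
rad <- sqrt((pts$x - focus["x"])^2 + (pts$y - focus["y"])^2) * scalingFactor
rad <- unname(rad)

# Radius at capture is the measurement to the margin
radcap <- rad[length(rad)]
rad
radcap
```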
It will be common to process a number of image files, one after another. The selection of the images can be made more efficient by supplying digitizeRadii() a vector of image file names and a vector of corresponding fish IDs. These vectors can be constructed in a variety of ways before calling digitizeRadii(). First, the user can simply type the image names and fish IDs into vectors.
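As a minimal sketch of typing these vectors by hand, the file names below follow the examples used elsewhere in this vignette and the IDs are assumed to match them one-to-one.

```r
# Image file names and the corresponding fish IDs, typed by hand.
# The two vectors must be in the same order and of the same length.
imgs <- c("Scale_1.jpg", "Scale_2.jpg", "Scale_3.jpg")
ids  <- c("1", "2", "3")

# Quick sanity check before handing the vectors to digitizeRadii()
stopifnot(length(imgs) == length(ids))
```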
However, if the image file names follow a general pattern, for example always being JPG files and always containing the word “Scale” or “Bass” or something similar, then a list of all image file names in the current working directory can be obtained with listFiles(). This function takes the common extension as its first argument and the common “other” words in other=. For example, the following code finds all JPG files that also contain the word “Scale” in the current working directory.
( imgs <- listFiles("jpg",other="Scale"))
#> [1] "Scale_1.jpg" "Scale_1_A.jpg" "Scale_1_B.jpg" "Scale_1_C.jpg"
#> [5] "Scale_1_D.JPG" "Scale_2.jpg" "Scale_3.jpg"
The user would still need to manually create a vector of corresponding fish IDs. However, if the image file names follow the idiom of having the fish ID after an underscore (i.e., “_”) at the end of the file name (not including the extension), then getID() can be used to efficiently extract a vector of fish IDs from the vector of file names. For example,
( ids <- getID(imgs) )
#> [1] "1" "A" "B" "C" "D" "2" "3"
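The file-name idiom itself can be illustrated with base R. The sketch below is not how getID() is implemented; it simply removes the extension and keeps whatever follows the last underscore, which reproduces the IDs shown above for these example names.

```r
# A few of the example file names from above
fns <- c("Scale_1.jpg", "Scale_1_A.jpg", "Scale_1_D.JPG", "Scale_2.jpg")

# Remove the extension, then keep the text after the last underscore
noExt <- tools::file_path_sans_ext(fns)
idsCheck <- sub("^.*_", "", noExt)
idsCheck

# Flag any file names that do not contain an underscore at all
followsIdiom <- grepl("_", noExt)
followsIdiom
```

A check like followsIdiom may be useful for spotting files that were not named with the convention before a long processing session.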
With these vectors, the following call to digitizeRadii() will bring up the first image on which you can mark annuli as described above. When you have finished with the first image, the second image will automatically appear on which you can then mark annuli. This process will be repeated until the last image in the vector of images has been completed.
digitizeRadii(imgs,id=ids,reading="DHO",edgeIsAnnulus=FALSE)
Some directories may have such a large number of images that the user will not want to process them all at one time as the code above would do. In this case, the user could select a certain number, say 10, of the images to process at any one time. For example, the call to digitizeRadii() below would be used to process the first 10 images from the vector of image names.
digitizeRadii(imgs[1:10],id=ids[1:10],reading="DHO",edgeIsAnnulus=FALSE)
The second 10 images could then be processed with the following.
digitizeRadii(imgs[11:20],id=ids[11:20],reading="DHO",edgeIsAnnulus=FALSE)
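Rather than typing each index range by hand, the batches can be built once with base R. The file names below are hypothetical, and the digitizeRadii() call is left as a comment because it requires the images and an interactive session.

```r
# Hypothetical vector of 25 image file names and their IDs
allImgs <- sprintf("Scale_%d.jpg", 1:25)
allIds  <- sub("^.*_", "", tools::file_path_sans_ext(allImgs))

# Split the positions 1..25 into consecutive batches of (at most) 10
batchSize <- 10
batches <- split(seq_along(allImgs), ceiling(seq_along(allImgs) / batchSize))
lengths(batches)

# Positions in the second batch
b <- batches[[2]]
allImgs[b]

# In an interactive session, this batch would then be processed with
# digitizeRadii(allImgs[b], id = allIds[b], reading = "DHO", edgeIsAnnulus = FALSE)
```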
If one does not include an image file name (or vector of image file names) in digitizeRadii(), then a dialog box will appear in which a file name or names can be selected. If multiple file names are selected, then digitizeRadii() will assume that the file names use the convention of having the fish ID after the last underscore. In this way, multiple files can be chosen from a dialog box, rather than by creating the vectors described above. In this case, the call to digitizeRadii() would look like the following.
digitizeRadii(reading="DHO",edgeIsAnnulus=FALSE)
Note that when processing multiple image files as described in this section, all of the options must be the same across all of the images. In the example above, all image files must use reading="DHO" and edgeIsAnnulus=FALSE.
The user can “start over” the processing of any image by pressing the “z” key at any time during the processing. Note however that this is a “hard reset” in the sense that all points selected prior to pressing the “z” key will be lost, the original unmarked image will be reloaded, and you will need to start over processing the image (i.e., marking the scale-bar, transect, and annuli again).
The user may also abort or skip processing an image that has been loaded by pressing the “q” key at any time during the processing. This will most likely be useful when processing multiple images at one time as described previously. For example, an image may appear that is unreadable such that annuli cannot be reliably marked on the image. Note that aborting processing an image will result in no R data file being created for that image.
If the user is processing multiple images as shown in the Processing Multiple Images section, then “q” will abort the current image and move to the next image. However, pressing “k” (i.e., “kill” the process) will abort the current image and NOT move on to any other images.
As described above, digitizeRadii() has several arguments that provide flexibility when measuring radii on images. The default values for all of these arguments can be seen with RFBCoptions() (i.e., without any arguments). The value for any argument can be seen by appending the argument name to RFBCoptions() with a $. For example, the current setting for the makeTransect argument is TRUE as shown below.
RFBCoptions()$makeTransect
#> [1] TRUE
Default values for these arguments may be changed within digitizeRadii(). For example, the code below sets the “reading” label to “DHO”, treats the edge as not being an annulus, draws a thicker transect line, and identifies that a scale-bar with a known length of 0.6 mm is present.
digitizeRadii("Scale_1.jpg",id="1",reading="DHO",edgeIsAnnulus=FALSE,
lwd.transect=3,scaleBar=TRUE,scaleBarLength=0.6)
However, changing the arguments within digitizeRadii() is inefficient if you will be processing many images with the same arguments. Thus, the default values for these arguments can be set for the entire session (i.e., until you change them or close R and open it again) by including the argument name set equal to the desired default value within RFBCoptions(). For example, if the code below is run at the beginning of a session (i.e., early in the script), then every call to digitizeRadii() after that will default to using “DHO” as the reading label, not treating the edge as an annulus, using a thicker line for the transect, and identifying that a scale-bar with a known length of 0.6 mm exists on the image.
RFBCoptions(reading="DHO",edgeIsAnnulus=FALSE,lwd.transect=3,
scaleBar=TRUE,scaleBarLength=0.6)
With these changes to the default settings, the last call to digitizeRadii() above could be simplified as shown below.
digitizeRadii("Scale_1.jpg",id="1")
Argument values can still be changed from the default values for a particular call to digitizeRadii() by including that argument in the specific call. For example, if the edge was an annulus for the structure on only one of the images, then include edgeIsAnnulus=TRUE in the digitizeRadii() call for that image.
The radial measurements recorded from one structure may be seen by submitting the R data file name to combineData().[13] By default, the radial measurement that includes the “plus-growth” is omitted (as this radial measurement is equal to the radial measurement at capture and is thus redundant with the value in the radcap column).[14]
combineData("Scale_1_DHO.rds")
#>   id reading agecap ann       rad    radcap
#> 1  1     DHO      5   1 0.2208691 0.5163737
#> 2  1     DHO      5   2 0.2893299 0.5163737
#> 3  1     DHO      5   3 0.3259383 0.5163737
#> 4  1     DHO      5   4 0.4626601 0.5163737
#> 5  1     DHO      5   5 0.5017862 0.5163737
By default, the data are shown in “long” format where each row consists of one radial measurement, with all radial measurements for an individual fish distributed across several rows. The radial data can be shown in “wide” format, where each row consists of all the radial measurements (in separate columns) for an individual fish, by including formatOut="wide".
combineData("Scale_1_DHO.rds",formatOut="wide")
#>   id reading agecap    radcap      rad1      rad2      rad3      rad4      rad5
#> 1  1     DHO      5 0.5163737 0.2208691 0.2893299 0.3259383 0.4626601 0.5017862
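The relationship between the two formats can be illustrated with base R's reshape(). This is only a sketch of the long-to-wide idea using the values shown above; it is not how combineData() produces its output.

```r
# The "long" data shown above, entered by hand
long <- data.frame(
  id = "1", reading = "DHO", agecap = 5, ann = 1:5,
  rad = c(0.2208691, 0.2893299, 0.3259383, 0.4626601, 0.5017862),
  radcap = 0.5163737,
  stringsAsFactors = FALSE
)

# One row per fish, with one rad column per annulus (rad1, rad2, ...)
wide <- reshape(long,
                idvar = c("id", "reading", "agecap", "radcap"),
                timevar = "ann", v.names = "rad",
                direction = "wide", sep = "")
wide
```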
Of course, most analyses will consist of collecting radial measurements from structures from many fish. For example, suppose that “Scale_1.jpg” and “Scale_2.jpg” were both read by “DHO” using the following code. Following this, “Scale_1_DHO.rds” and “Scale_2_DHO.rds” would both exist in the current working directory.
RFBCoptions(reading="DHO",edgeIsAnnulus=FALSE)
digitizeRadii() # Select both images in a dialog box
Radial measurements from multiple structures can be combined into one data.frame with combineData() if the appropriate R data file names are listed in a vector. The listFiles() function may be used to identify all filenames in the current working directory that have the file extension given in the first argument. For example, all files in the current working directory with the “rds” extension are identified below.
listFiles("rds")
#> [1] "DWS_Oto_89765_DHO.rds" "Oto140306_DHO.rds" "Oto140306_OHD.rds"
#> [4] "Scale_1_DHO.rds" "Scale_1_ODH.rds" "Scale_1_OHD.rds"
#> [7] "Scale_2_DHO.rds" "Scale_2_OLDwNoNote.rds" "Scale_3_DHO.rds"
This list of names can be further filtered by including other key words for the filenames in other=. In this case, the list should be limited to those files that have both “Scale” and “DHO” in the name.
( fns <- listFiles("rds",other=c("Scale","DHO")) )
#> [1] "Scale_1_DHO.rds" "Scale_2_DHO.rds" "Scale_3_DHO.rds"
The listFiles() result should be saved to an object so that the names can be given to combineData() as shown below.[15]
( dfrad <- combineData(fns) )
#>    id reading agecap ann       rad    radcap
#>  1  1     DHO      5   1 0.2208691 0.5163737
#>  2  1     DHO      5   2 0.2893299 0.5163737
#>  3  1     DHO      5   3 0.3259383 0.5163737
#>  4  1     DHO      5   4 0.4626601 0.5163737
#>  5  1     DHO      5   5 0.5017862 0.5163737
#>  6  2     DHO      4   1 0.1377625 0.3908662
#>  7  2     DHO      4   2 0.2236611 0.3908662
#>  8  2     DHO      4   3 0.3026716 0.3908662
#>  9  2     DHO      4   4 0.3530492 0.3908662
#> 10  3     DHO      1   1 0.5202232 0.5202232
Again, the data can be shown in “wide” format by including formatOut="wide".
( dfrad2 <- combineData(fns,formatOut="wide") )
#>   id reading agecap    radcap      rad1      rad2      rad3      rad4      rad5
#> 1  1     DHO      5 0.5163737 0.2208691 0.2893299 0.3259383 0.4626601 0.5017862
#> 2  2     DHO      4 0.3908662 0.1377625 0.2236611 0.3026716 0.3530492        NA
#> 3  3     DHO      1 0.5202232 0.5202232        NA        NA        NA        NA
Other information about the fish (e.g., location of capture, length, sex) is likely held in a separate file. Below, example “other” data are loaded into the dffish data.frame. Note that the id variable created from processing the structure images above is of character type. In this case, read.csv() reads the id variable from the external data file as numeric because the unique IDs were simple numbers. The second line of code below converts these numeric IDs to characters so that this data.frame can be joined with the radial measurements data.frame from above.[16]
dffish <- read.csv("FishData.csv",stringsAsFactors=FALSE) %>%   # %>% and mutate() require library(dplyr)
  mutate(id=as.character(id))
The data in the dffish and dfrad data.frames are then joined by the common id variable using inner_join().
fishdat <- dffish %>%
inner_join(dfrad,by="id")
fishdat
#>    id  loc sex len reading agecap ann       rad    radcap
#>  1  1 MI-5   M 189     DHO      5   1 0.2208691 0.5163737
#>  2  1 MI-5   M 189     DHO      5   2 0.2893299 0.5163737
#>  3  1 MI-5   M 189     DHO      5   3 0.3259383 0.5163737
#>  4  1 MI-5   M 189     DHO      5   4 0.4626601 0.5163737
#>  5  1 MI-5   M 189     DHO      5   5 0.5017862 0.5163737
#>  6  2 MI-6   F 210     DHO      4   1 0.1377625 0.3908662
#>  7  2 MI-6   F 210     DHO      4   2 0.2236611 0.3908662
#>  8  2 MI-6   F 210     DHO      4   3 0.3026716 0.3908662
#>  9  2 MI-6   F 210     DHO      4   4 0.3530492 0.3908662
#> 10  3 MI-5   M 145     DHO      1   1 0.5202232 0.5202232
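The same idea can be sketched without any add-on packages. The small data.frames below are typed by hand from values shown above, and base R's merge() is swapped in for inner_join() purely for illustration (by default, merge() also keeps only the rows with matching keys).

```r
# Fish data as it might be read from a CSV: id is numeric
fish <- data.frame(id = c(1, 2, 3),
                   loc = c("MI-5", "MI-6", "MI-5"),
                   sex = c("M", "F", "M"),
                   len = c(189, 210, 145),
                   stringsAsFactors = FALSE)

# A few radial measurements: id is character; fish "3" is deliberately left out
radii <- data.frame(id = c("1", "1", "2"),
                    ann = c(1, 2, 1),
                    rad = c(0.2208691, 0.2893299, 0.1377625),
                    stringsAsFactors = FALSE)

class(fish$id)    # numeric
class(radii$id)   # character

# Make the key the same type in both data.frames, then join on it
fish$id <- as.character(fish$id)
joined <- merge(fish, radii, by = "id")
joined
```

Because only matching keys are kept, the fish without any radial measurements does not appear in the joined result.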
One could also join with the “wide” data in dfrad2.
fishdat2 <- dffish %>%
inner_join(dfrad2,by="id")
fishdat2
#>   id  loc sex len reading agecap    radcap      rad1      rad2      rad3
#> 1  1 MI-5   M 189     DHO      5 0.5163737 0.2208691 0.2893299 0.3259383
#> 2  2 MI-6   F 210     DHO      4 0.3908662 0.1377625 0.2236611 0.3026716
#> 3  3 MI-5   M 145     DHO      1 0.5202232 0.5202232        NA        NA
#>        rad4      rad5
#> 1 0.4626601 0.5017862
#> 2 0.3530492        NA
#> 3        NA        NA
Either data.frame can be written to a “comma-separated values” (CSV) file[17] with write.csv() using the R object name (e.g., fishdat or fishdat2) as the first argument and a name for the file in file=. Additionally, I prefer to have non-quoted values by using quote=FALSE and no row names by using row.names=FALSE. For example, the “one-measurement-per-line” data can be output to “Kiyi2014_BCs.csv” as follows.
write.csv(fishdat,file="Kiyi2014_BCs.csv",quote=FALSE,row.names=FALSE)
[1] In RStudio, the working directory can be set with any of the options under the Session … Set Working Directory menu. My preference is to start a script that will contain all of the code described later in this vignette. If this script is saved to the same directory as the structure images, then the working directory can be set in RStudio with the Session … Set Working Directory … To Source File Location menu items. I then copy the resultant setwd() code to my script so that I do not have to use the menu items when I run this script again.
[2] I use my initials (“DHO”) here for reading= simply as an example. You will likely want to use something else.
[3] This assumes that the Scale_1.jpg file is in the current working directory. Use getwd() to see the current working directory.
[4] Actually much more than the radial measurements are recorded in the R data file (see here). Also note that the radii are on an arbitrary scale in this case because no scale-bar was available on the image.
[5] The many specific arguments to digitizeRadii() are controlled with RFBCoptions() (described later) and described in detail here.
[6] A more detailed description about the structure can be given to description= for saving in the R data file for future use. For example, one may use description="Kiyi scale read once by Ogle on 22-Apr-18" to provide more information about the structure reading.
[7] You may find this resource useful with respect to the X11 device and Mac OS use.
[8] The extension (e.g., “png” or “jpg”) will not be included in the resultant R data object file.
[9] Some back-calculation methods require knowing the relationship between actual scale length and fish length. See the Short Introduction to Back-Calculation vignette for more details.
[10] The last selected point can be deleted by pressing the ‘d’ key. This can be done multiple times so that several of the most recent selections can be deleted. The deleted points will be marked with the plotting character and color in pch.del= and col.del= (defaults to a red circle with an ‘x’ in it).
[11] Thus, findScalingFactor() would be run prior to digitizeRadii().
[12] The last selected point can be deleted by pressing the ‘d’ key. This can be done multiple times so that several of the most recent selections can be deleted. The deleted points will be marked with the plotting character and color in pch.del= and col.del= (defaults to a red circle with an ‘x’ in it).
[13] If no file name is given, then a dialog box will appear from which the data file can be selected. It is also assumed that the file is in the current working directory.
[14] To include the radial measurement with “plus-growth”, use deletePlusGrowth=FALSE.
[15] If no file names are given, then a dialog box will appear from which multiple data files can be selected.
[16] The two files cannot be joined by the “id” variable if the “id” variables are of different types (e.g., character and numeric) in the two files. Thus, as shown here, they must be coerced to be the same type. This line of code would not be needed if the “id” variables were of the same type in the two files, as would likely occur if the “id” variable was not simply numbers.
[17] CSV files are small, portable, and can be opened directly in most spreadsheet software.