Creates a one- or two-way table of summary statistics for a quantitative variable.

Usage

sumTable(formula, ...)

# S3 method for class 'formula'
sumTable(formula, data = NULL, FUN = mean, digits = getOption("digits"), ...)

Arguments

formula

A formula with a quantitative variable on the left-hand-side and one or two factor variables on the right-hand-side. See details.

...

Other arguments to pass through to FUN.

data

An optional data frame that contains the variables in formula.

FUN

A function that returns a single value (a scalar) and identifies the summary statistic to compute. It is applied to the quantitative variable within each data subset defined by the level(s) of the factor(s). Defaults to mean.

digits

A single numeric that indicates the number of decimal places to which the result is rounded.

Value

A one-way array of values if only one factor variable is supplied on the right-hand-side of formula. A two-way matrix of values if two factor variables are supplied on the right-hand-side of formula. These are the same classes of objects returned by tapply.
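Because the returned objects have the same classes as those from tapply, they can be inspected and indexed like any named array or matrix. The sketch below uses base tapply directly rather than sumTable, on hypothetical simulated data (the names g1, g2, and dat are illustrative assumptions), to show the two shapes described above.

```r
# Hypothetical data; explicit factor levels fix the dimensions of the tables
set.seed(42)
g1 <- factor(sample(letters[1:5], 300, replace = TRUE), levels = letters[1:5])
g2 <- factor(sample(LETTERS[1:3], 300, replace = TRUE), levels = LETTERS[1:3])
dat <- rnorm(300)

# One grouping factor: a one-dimensional array named by the factor levels
one_way <- tapply(dat, g1, FUN = mean)
# Two grouping factors: a matrix with g1 levels as rows, g2 levels as columns
two_way <- tapply(dat, list(g1, g2), FUN = mean)

is.array(one_way)    # TRUE, with a single dimension
is.matrix(two_way)   # TRUE
dim(two_way)         # 5 rows (g1 levels) by 3 columns (g2 levels)

# Index by level name
one_way["c"]         # mean of dat for g1 == "c"
two_way["c", "B"]    # mean of dat for g1 == "c" and g2 == "B"
two_way[, "B"]       # all g1 means within g2 == "B"
```

Indexing by level name rather than by position keeps the lookup valid if the order of the factor levels changes.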

Details

The formula must be of the form quantitative~factor or quantitative~factor*factor2 where quantitative is the quantitative variable to construct the summaries for and factor and factor2 are factor variables that contain the levels for which separate summaries should be constructed. If the variables on the right-hand-side are not factors, then they will be coerced to be factors and a warning will be issued.
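The coercion warning can be avoided by converting the right-hand-side variables to factors before calling sumTable. This matters in practice because data.frame() in R 4.0.0 and later keeps character columns as characters by default. A minimal sketch, assuming the FSA package (which provides sumTable) is installed; the data and column names are hypothetical.

```r
# Hypothetical data: g1 and g2 start out as character vectors
d <- data.frame(g1 = sample(letters[1:5], 200, replace = TRUE),
                g2 = sample(LETTERS[1:3], 200, replace = TRUE),
                dat = rnorm(200))

# Convert the grouping variables explicitly so no coercion is needed later
d$g1 <- factor(d$g1)
d$g2 <- factor(d$g2)

# Only call sumTable if FSA is available (assumption: FSA is the host package)
if (requireNamespace("FSA", quietly = TRUE)) {
  FSA::sumTable(dat ~ g1 * g2, data = d, FUN = mean, digits = 2)
}
```

Converting explicitly also gives control over the order of the levels, for example through the levels argument of factor(), which sets the row and column order of the resulting table.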

This function is largely a wrapper for tapply(), but it only works with one quantitative variable on the left-hand-side and one or two factor variables on the right-hand-side. Consider using tapply directly for situations with more factors on the right-hand-side.
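For the case with more than two factors, a direct tapply call might look like the following sketch. It is not part of sumTable; the data frame, the third factor g3, and the rounding to two decimals are illustrative assumptions.

```r
# Hypothetical data with three grouping factors
set.seed(7)
d <- data.frame(
  g1 = factor(sample(letters[1:3], 240, replace = TRUE), levels = letters[1:3]),
  g2 = factor(sample(LETTERS[1:2], 240, replace = TRUE), levels = LETTERS[1:2]),
  g3 = factor(sample(c("x", "y"), 240, replace = TRUE), levels = c("x", "y")),
  dat = rnorm(240)
)

# Supplying a list of three factors yields a three-dimensional array
three_way <- with(d, tapply(dat, list(g1, g2, g3), FUN = mean))

# Round for display, as the digits= argument does for one- and two-way tables
round(three_way, 2)

dim(three_way)   # 3 x 2 x 2: one dimension per factor
```

Each slice along the third dimension, such as three_way[, , "x"], is a two-way table of the same form that sumTable returns for two factors.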

See also

See tapply for a more general implementation. See Summarize for a similar computation when only one factor variable is given.

Author

Derek H. Ogle, DerekOgle51@gmail.com

Examples

## The same examples as in the old aggregate.table in gdata package
## but data in data.frame to illustrate formula notation
d <- data.frame(g1=sample(letters[1:5], 1000, replace=TRUE),
                g2=sample(LETTERS[1:3], 1000, replace=TRUE),
                dat=rnorm(1000))

sumTable(dat~g1*g2,data=d,FUN=length)       # get sample size
#> Warning: First RHS variable was converted to a factor.
#> Warning: Second RHS variable was converted to a factor.
#>    A  B  C
#> a 73 67 59
#> b 71 70 64
#> c 67 72 70
#> d 66 56 65
#> e 56 73 71
sumTable(dat~g1*g2,data=d,FUN=validn)       # get sample size (better way)
#> Warning: First RHS variable was converted to a factor.
#> Warning: Second RHS variable was converted to a factor.
#>    A  B  C
#> a 73 67 59
#> b 71 70 64
#> c 67 72 70
#> d 66 56 65
#> e 56 73 71
sumTable(dat~g1*g2,data=d,FUN=mean)         # get mean
#> Warning: First RHS variable was converted to a factor.
#> Warning: Second RHS variable was converted to a factor.
#>            A          B          C
#> a  0.0821087  0.0372090  0.0204970
#> b -0.1646688 -0.1809234  0.0716924
#> c  0.0582100  0.0470679  0.0880142
#> d -0.0608258 -0.0688797  0.1181317
#> e  0.0678168  0.0245884 -0.1041628
sumTable(dat~g1*g2,data=d,FUN=sd)           # get sd
#> Warning: First RHS variable was converted to a factor.
#> Warning: Second RHS variable was converted to a factor.
#>           A         B         C
#> a 0.8969377 0.9308896 1.0627889
#> b 1.0648330 0.9646597 0.9066040
#> c 1.0624400 1.0027871 1.0061691
#> d 0.9059574 1.0125741 0.8656674
#> e 1.1076446 0.9818268 1.0423718
sumTable(dat~g1*g2,data=d,FUN=sd,digits=1)  # show digits= argument
#> Warning: First RHS variable was converted to a factor.
#> Warning: Second RHS variable was converted to a factor.
#>     A   B   C
#> a 0.9 0.9 1.1
#> b 1.1 1.0 0.9
#> c 1.1 1.0 1.0
#> d 0.9 1.0 0.9
#> e 1.1 1.0 1.0

## Also demonstrate use in the 1-way example -- but see Summarize()
sumTable(dat~g1,data=d,FUN=validn)
#> Warning: RHS variable was converted to a factor.
#>   a   b   c   d   e 
#> 199 205 209 187 200 
sumTable(dat~g1,data=d,FUN=mean)
#> Warning: RHS variable was converted to a factor.
#>          a          b          c          d          e 
#>  0.0487249 -0.0964283  0.0643538 -0.0010332 -0.0090143 

## Example with a missing value (compare to above)
d$dat[1] <- NA
sumTable(dat~g1,data=d,FUN=validn)  # note use of validn
#> Warning: RHS variable was converted to a factor.
#>   a   b   c   d   e 
#> 199 204 209 187 200 
sumTable(dat~g1,data=d,FUN=mean,na.rm=TRUE)
#> Warning: RHS variable was converted to a factor.
#>          a          b          c          d          e 
#>  0.0487249 -0.0973921  0.0643538 -0.0010332 -0.0090143