jmv

Descriptives

Descriptives are an assortment of summarising statistics and visualisations that allow you to explore the shape and distribution of your data. It is good practice to explore your data with descriptives before proceeding to more formal tests.

Example usage

library(jmv)

data('mtcars')
dat <- mtcars

# frequency tables can be provided for factors
dat$gear <- as.factor(dat$gear)

descriptives(dat, vars = vars(mpg, cyl, disp, gear), freq = TRUE)

#
#  DESCRIPTIVES
#
#  Descriptives
#  ───────────────────────────────────────────
#               mpg     cyl     disp    gear
#  ───────────────────────────────────────────
#    N            32      32      32      32
#    Missing       0       0       0       0
#    Mean       20.1    6.19     231    3.69
#    Median     19.2    6.00     196    4.00
#    Minimum    10.4    4.00    71.1       3
#    Maximum    33.9    8.00     472       5
#  ───────────────────────────────────────────
#
#
#  FREQUENCIES
#
#  Frequencies of gear
#  ────────────────────
#    Levels    Counts
#  ────────────────────
#    3             15
#    4             12
#    5              5
#  ────────────────────
#
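The numbers in the two tables above can be cross-checked without jmv. The sketch below recomputes them with base R only. Its variable names (`num_vars`, `summ`, `gear_counts`) are illustrative. The rounding to three significant digits is an assumption that only roughly matches jmv's display.

```r
# Base-R cross-check of the Descriptives and Frequencies tables (a sketch; no jmv needed)
data("mtcars")
dat <- mtcars
dat$gear <- as.factor(dat$gear)

# N, Missing, Mean, Median, Minimum, Maximum for the numeric variables
num_vars <- c("mpg", "cyl", "disp")
summ <- sapply(dat[num_vars], function(x) c(
  N       = sum(!is.na(x)),
  Missing = sum(is.na(x)),
  Mean    = mean(x, na.rm = TRUE),
  Median  = median(x, na.rm = TRUE),
  Minimum = min(x, na.rm = TRUE),
  Maximum = max(x, na.rm = TRUE)
))
# round to 3 significant digits, roughly matching the jmv display
print(signif(summ, 3))

# counts for each level of the factor gear
gear_counts <- table(dat$gear)
print(gear_counts)
```

The counts 15, 12 and 5 should agree with the "Frequencies of gear" table.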

# splitting by a variable
descriptives(formula = disp + mpg ~ cyl, data = dat,
    median = FALSE, min = FALSE, max = FALSE, n = FALSE, missing = FALSE)

# providing histograms
descriptives(formula = mpg ~ cyl, data = dat, hist = TRUE,
    median = FALSE, min = FALSE, max = FALSE, n = FALSE, missing = FALSE)

# splitting by multiple variables
descriptives(formula = mpg ~ cyl:gear, data = dat,
    median = FALSE, min = FALSE, max = FALSE, missing = FALSE)
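The split examples report one set of statistics per group. In the first and third, mean and sd are what remain switched on, plus N in the third. A rough base-R analogue uses `aggregate()`. The helper names `mean_sd` and `n_mean_sd` are hypothetical. One difference to expect: `aggregate()` drops combinations with no observations (for example cyl = 8 with gear = 4), whereas jmv may lay out its table differently.

```r
# Grouped summaries with base R's aggregate(), mirroring the split examples (a sketch)
data("mtcars")

# disp and mpg split by cyl: mean and standard deviation per group
mean_sd <- function(x) c(mean = mean(x), sd = sd(x))
by_cyl <- aggregate(cbind(disp, mpg) ~ cyl, data = mtcars, FUN = mean_sd)
print(by_cyl)

# mpg split by every observed combination of cyl and gear, with group size
# (sd is NA for groups containing a single car)
n_mean_sd <- function(x) c(n = length(x), mean = mean(x), sd = sd(x))
by_cyl_gear <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = n_mean_sd)
print(by_cyl_gear)
```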

Arguments

data the data as a data frame
vars a vector of strings naming the variables of interest in data
splitBy a vector of strings naming the variables used to split vars
freq TRUE or FALSE (default), provide frequency tables (nominal or ordinal variables only)
hist TRUE or FALSE (default), provide histograms (continuous variables only)
dens TRUE or FALSE (default), provide density plots (continuous variables only)
bar TRUE or FALSE (default), provide bar plots (nominal or ordinal variables only)
barCounts TRUE or FALSE (default), add counts to the bar plots
box TRUE or FALSE (default), provide box plots (continuous variables only)
violin TRUE or FALSE (default), provide violin plots (continuous variables only)
dot TRUE or FALSE (default), provide dot plots (continuous variables only)
dotType the type of dot plot to draw when dot = TRUE
boxMean TRUE or FALSE (default), add mean to box plot
qq TRUE or FALSE (default), provide Q-Q plots (continuous variables only)
n TRUE (default) or FALSE, provide the sample size
missing TRUE (default) or FALSE, provide the number of missing values
mean TRUE (default) or FALSE, provide the mean
median TRUE (default) or FALSE, provide the median
mode TRUE or FALSE (default), provide the mode
sum TRUE or FALSE (default), provide the sum
sd TRUE (default) or FALSE, provide the standard deviation
variance TRUE or FALSE (default), provide the variance
range TRUE or FALSE (default), provide the range
min TRUE (default) or FALSE, provide the minimum
max TRUE (default) or FALSE, provide the maximum
se TRUE or FALSE (default), provide the standard error
iqr TRUE or FALSE (default), provide the interquartile range
skew TRUE or FALSE (default), provide the skewness
kurt TRUE or FALSE (default), provide the kurtosis
sw TRUE or FALSE (default), provide the Shapiro-Wilk p-value
pcEqGr TRUE or FALSE (default), provide the quantiles that divide the data into pcNEqGr equal groups
pcNEqGr an integer (default: 4) specifying the number of equal groups
pc TRUE or FALSE (default), provide percentiles
pcValues a comma-separated list (default: 25,50,75) specifying the percentiles
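Several of the options above correspond to familiar base-R computations. The sketch below illustrates se, variance, range, iqr, sw, pcEqGr/pcNEqGr and pc/pcValues. It assumes R's default `quantile()` (type 7) and `IQR()`, and jmv's own methods may differ. The functions `equal_group_cuts` and `percentiles_from_string` are hypothetical helpers, not part of jmv. Skewness and kurtosis are left out because several competing definitions exist.

```r
# Base-R counterparts of some statistic and percentile options (a sketch)
x <- mtcars$mpg

extra_stats <- c(
  se       = sd(x) / sqrt(length(x)),   # standard error of the mean
  variance = var(x),                    # sample variance
  range    = diff(range(x)),            # maximum minus minimum
  iqr      = IQR(x),                    # interquartile range (type 7)
  sw_p     = shapiro.test(x)$p.value    # Shapiro-Wilk p-value
)
print(signif(extra_stats, 3))

# hypothetical helper for pcEqGr / pcNEqGr:
# interior cut points splitting x into n_groups equal-sized groups
equal_group_cuts <- function(x, n_groups = 4) {
  probs <- seq(0, 1, length.out = n_groups + 1)
  quantile(x, probs = probs[-c(1, n_groups + 1)], na.rm = TRUE)
}

# hypothetical helper for pc / pcValues:
# percentiles given as a comma-separated string such as "25,50,75"
percentiles_from_string <- function(x, pc_values = "25,50,75") {
  p <- as.numeric(trimws(strsplit(pc_values, ",")[[1]]))
  quantile(x, probs = p / 100, na.rm = TRUE)
}

print(equal_group_cuts(x, 4))
print(percentiles_from_string(x, "10,90"))
```

With the defaults (4 equal groups; percentiles 25, 50, 75) both helpers return the same three cut points. This also explains why pcEqGr and pc overlap at their default settings.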

Returns

A results object containing:

results$descriptives a table
results$frequencies an array of tables
results$plots an array of groups

Tables can be converted to data frames with asDF or as.data.frame(). For example:

results$descriptives$asDF

as.data.frame(results$descriptives)

Elements in arrays can be accessed with [[n]]. For example:

results$frequencies[[1]] # accesses the first element
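The snippets above assume a `results` object already exists. A complete sketch of that workflow follows. It is guarded with `requireNamespace()` so that it does nothing when jmv is not installed. It passes `vars` as a character vector, as described under Arguments. The exact column layout of the resulting data frame depends on the jmv version.

```r
# End-to-end sketch: run descriptives() and convert the table to a data frame
if (requireNamespace("jmv", quietly = TRUE)) {
  dat <- mtcars
  dat$gear <- as.factor(dat$gear)

  results <- jmv::descriptives(dat, vars = c("mpg", "cyl", "disp", "gear"),
                               freq = TRUE)

  # both conversions described above should yield data frames
  desc_df  <- results$descriptives$asDF
  desc_df2 <- as.data.frame(results$descriptives)
  str(desc_df)
}
```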