compare()
, and design_plot()
.mplot(model, which = 1)
now uses raw residuals rather than standardized or studentized residuals. This matches the behavior of plot()
.na.rm
argument to prop.test().qdata()
so that it is always a named vector.cdata()
so that it is always a data frame. Also changed names to “lo” and “hi”.
xpchisq()
caused by introducing explicit arguments and failing to retain ...
. (Issue #737)
xpt()
caused by introducing explicit arguments and failing to handle missing ncp
correctly. (Issue #736)
googleMap()
has been deprecated due to a change in policy at Google. Try leaflet_map()
as an alternative.do()
.xpt()
, xqt()
, etc. now have more explicit arguments. This provides additional help and prompts for the user.
percs()
and counts()
are re-exported from mosaicCore
confint()
, attempting to set the confidence level using conf.level
instead of level
throws an error and provides a reminder to use level
for that purpose.confint()
methods for binom.test()
have been modified a bit. See documentation for how names map to methods.
ggformula
is used for plotting in more places (replacing older lattice
code).CIdata()
now handles negative numbers correctly.mplot.lm()
now removes points with leverage 1 to avoid errors and warnings; a warning message reports which points have been removed.
TukeyHSD()
now correctly follows system = "gg"
mplot.lm()
now uses ggrepel
to place labels and offers additional controls for the smooth curve that is overlaid. [gg version of plots only]
orrr()
, oddsRatio()
, and relrisk()
now accept a 2x2 data frame to match claims in documentation.cor(~y, ~x)
prop.test()
so it handles success
argument properly for 2-way tables.ggformula
.which
argument added to mplot.TukeyHSD()
.ggformula
.mosaic
compatible with ggplot2
version 3.0.ggplot2
rather than lattice
by default.cnorm()
, ct()
, xcnorm()
, and xct()
added to find central portions of distributions.mosaicCore
mosaicCore
.ggplot2
rather than lattice
.mplot()
on linear models when system = "gg"
.formals()
.xpnorm()
and friends now use ggplot2
and can return the plot object, if requested.t.test()
has been completely reimplemented. It no longer supports “bare variable mode”, but it is more similar to stats::t.test()
in some cases.gwm()
has been removed since it no longer works with the current version of dplyr
.mosaicModel
package.props()
and counts()
have been added. They are a bit like tally()
but designed to play well with df_stats()
. Currently the formula versions drop missing data, but that will likely be determined by a user-supplied option in the future.mosaicCalc
.mosaic
depends on ggformula
, so users will have lattice
, ggplot2
, and ggformula
available after loading mosaic
.mplot()
on a data frame supports ggformula
now.ggformula
has been added.lattice
and ggformula
has been added.mosaic
to mosaicCore
. This should not affect users of mosaic
.tally()
now provides names for dimnames in cases where they were previously missing. This was needed for the refactoring of bargraph()
.bargraph()
to use tally()
for tabulation. This means the behavior of bargraph()
should match expectations of users of tally()
better than it did before. In particular, proportions now sum to 1 in each panel of a multi-panel plot.tally()
so the proportions computed when format = "proportion"
are easier to predict.prop(x ~ y)
was reporting overall proportions rather than marginal proportions.value()
, a generic with several methods for extracting a “value” from a more complicated object. Useful for extracting values from output of uniroot()
, nlm()
, integrate()
, cubature::adaptIntegrate()
without needing to know just how those values are stored in the object.prop(a ~ b)
to compute joint rather than conditional proportions.favstats()
, mean()
, sd()
, etc.) now require that the first argument be a formula. This was always the preferred method, but some functions allowed bare variable names to be used instead. As a specific example, the following code now generates an error (unless there is another object named age
in your environment).
favstats(age, data = HELPrct)
## Error in typeof(x) : object 'age' not found
Replace this with
favstats( ~ age, data = HELPrct)
## min Q1 median Q3 max mean sd n missing
## 19 30 35 40 60 35.65342 7.710266 453 0
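The same formula-first convention applies to the other aggregating functions. The following is a minimal sketch, assuming mosaic is installed; it uses the built-in mtcars data rather than HELPrct, and the variables chosen are only for illustration.

```r
library(mosaic)

# One-sided formula: summarize a single variable from a data frame.
overall_mean <- mean(~ mpg, data = mtcars)

# Two-sided formula: summarize mpg separately for each value of cyl.
group_means <- mean(mpg ~ cyl, data = mtcars)
group_sds   <- sd(mpg ~ cyl, data = mtcars)

overall_mean
group_means
```

Passing a bare variable name (for example, mean(mpg, data = mtcars)) is the pattern that the change above no longer supports.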
ggplot2
.mplot.data.frame()
allow it to work with an expression that evaluates to a data frame. ASH plots are now a choice for 1-variable plots.deltaMethod()
has been moved to a separate package (called deltaMethod
) to reduce package dependencies.
cull_for_do.lm()
now returns a data frame instead of a vector. This makes it easier for do()
to bind things together by column name.makeMap()
updated to work with new version of ggplot2
.cdata()
, ddata()
, pdata()
, qdata()
and rdata()
have been reordered so that the formula comes first.rflip()
has been improved.dfapply()
, also default value for select
changed to TRUE
.inspect()
, which is primarily intended to give an overview of the variables in a data frame, but handles some additional objects as well.
data
argument is not an environment or data frame.mm()
has been deprecated and replaced with gwm()
which does groupwise models where the response may be either categorical or quantitative.plotModel()
. This is likely still not the final version, but we are getting closer.do()
.dotPlot()
are now the same size in all panels of multi-panel plots.cdist()
has been rewritten.mplot()
on a data frame now (a) prompts the user for the type of plot to create and (b) has an added option to make line plots for time series and the like.resample()
can now do residual resampling from a linear model.do()
to create common bootstrap confidence intervals. In particular, confint()
can now calculate three kinds of intervals in many common situations.fetchData()
, fetchGoogle()
, and fetchGapminder()
have been moved to a separate package, called fetch()
.plotModel()
can be used to show data and model fits for a variety of models created with lm()
or glm()
.mosaicData
a dependency of mosaic
. This avoids the problem of users forgetting to separately load the mosaicData
package.fetchGoogle()
(and perhaps read.file()
) from future versions of the package. More and more packages are providing utilities for bringing data into R and it doesn’t make sense for us to duplicate those efforts in this package. For Google Sheets, you might take a look at the googlesheets
package, which is available via GitHub now and will be on CRAN soon.
binom.test()
, prop.test()
, and t.test()
, which have also undergone some internal restructuring. The objects returned now do a better job of reporting about the test conducted. In particular, binom.test()
and prop.test()
will report the value of success
used. (#450, #455)
binom.test()
can now compute several different kinds of confidence intervals including the Wald, Plus-4 and Agresti-Coull intervals. (#449)derivedFactor()
now handles NAs without throwing a warning. (#451)pdist()
, pdist()
and related functions now do a better (i.e., useful) job with discrete distributions (#417)t.test()
and all the “aggregating” functions like mean()
and favstats()
. In particular, it is now possible to reference variables both in the data
argument and in the calling environment. (#435)CIAdata()
now provides a message indicating the source URL for the data retrieved (#444)CIAdata()
that seem to be related to a change in file format at the CIA World Factbook website. The “inflation” data set is still broken (on the CIA website). (#441)
read.file()
now uses functions from readr
in some cases. A message is produced indicating which reader is being used. There are also some API changes. In particular, character data will be returned as character rather than factor. See factorize()
for an easy way to convert things with few unique values into factors. (#442)mutate()
is used in place of transform()
in the examples. (#452)tally()
now produces counts by default for all formula shapes. Proportions or percentages must be requested explicitly. This is to avoid common errors, especially when feeding the results into chisq.test()
.msummary()
. Usually this is identical to summary()
, but for a few kinds of objects it provides modified output that is less verbose.
do * lm( )
will now keep track of the F statistic, too.
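As a rough illustration of the F statistic being tracked, the sketch below refits a model on shuffled responses a few times. It assumes mosaic is installed; the mtcars data, the model formula, and the number of replicates are arbitrary choices for the example, and the exact set of columns returned may differ between versions.

```r
library(mosaic)

set.seed(123)  # shuffle() is random; fix the seed for reproducibility

# Refit the model 5 times, permuting the response each time.
null_fits <- do(5) * lm(shuffle(mpg) ~ wt, data = mtcars)

# Each row summarizes one refit; inspect which quantities were kept.
names(null_fits)
```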
confint()
applied to an object produced using do()
now does more appropriate things.binom.test()
and prop.test()
now set success = 1
by default on 0-1 data to treat 0 like failure and 1 like success. Similarly, prop()
and count()
set level = 1
by default.CIsim()
can now produce plots and does so by default when samples <= 200
.add=TRUE
improved for plotDist()
.swap()
which is useful for creating randomization distributions for paired designs. The current implementation is a bit slow.MAD()
, SAD()
, and quantile()
.docFile()
introduced to simplify accessing files included with package documentation. read.file()
enhanced to take a package as an argument and look among package documentation files.factorize()
introduced as a way to convert vectors with few unique values into factors. Can be applied to an entire data frame.NHANES
contains the NHANES
data set and mosaicData
contains the other data sets.MAD()
and SAD()
were added to compute mean and sum of all pairs of absolute differences.rspin()
has been added to simulate spinning a spinner.mosaic
package to simplify R for beginners.mosaic
package.plotFun()
has been improved so that it does a better job of selecting points where the function is evaluated and no longer warns about NaN
s encountered while exploring the domain of the function.oddsRatio()
has been redesigned and relrisk()
has been added. Use their summary()
methods or verbose=TRUE
to see more information (including confidence intervals).Birthdays
data set.mplot()
and several instances have been added to make a number of plots easy to generate. There are methods for objects of classes "data.frame"
, "lm"
, "summary.lm"
, "glm"
, "summary.glm"
, "TukeyHSD"
, and "hclust"
. For several of these there are also fortify
methods that return the data frame created to facilitate plotting.read.file()
now handles (some?) https URLs and accepts an optional argument filetype
that can be used to declare the type of data file when it is not identified by extension.useNA
in the tally()
function has changed to "ifany"
.mosaic
now depends on dplyr
both to use some of its functionality and to avoid naming collisions with functions like tally()
and do()
, allowing mosaic
and dplyr
to coexist more happily.dotPlot()
. In particular, the size of the dots is determined differently and works better more of the time. Dots were also shifted down by .5 units so that they
do()
that caused it to scope incorrectly in some edge cases when a variable had the same name as a function.ntiles()
has been reimplemented and now has more formatting options.derivedFactor()
for creating factors from logical “cases”.HELP
data set has been removed from the package.HELPrct
instead.plotDist()
now accepts add=TRUE
and under=TRUE
, making it easy to add plots of distributions over (or under) plots of data (e.g., histograms, densityplots, etc.) or other distributions.add=TRUE
have been reimplemented using layer
from latticeExtra
. See documentation of these functions for details.ladd()
has been completely reimplemented using layer()
from latticeExtra
. See documentation of ladd()
for details, including some behavior changes.mean()
, sd()
, var()
, et al.) now use getOption("na.rm")
to determine the default value of na.rm
. Use options(na.rm=TRUE)
to change the default behavior to remove NA
s and options(na.rm=NULL) to restore defaults.
do()
has been largely rewritten with an eye toward improved efficiency. In particular, do()
will take advantage of multiple cores if the parallel
package is available. At this point, sluggishness in applications of do()
is most likely due to the sluggishness of what is being done, not to do()
itself.deltaMethod()
from the car
package to make it easier to propagate uncertainty in some situations that commonly arise in the physical sciences and engineering.cdist()
to compute critical values for the central portion of a distribution.qdata()
. For interactive use, this should not cause any problems, but old programmatic uses of qdata()
should be checked as the object returned is now different.sum()
, mean()
, sd()
, etc.) to produce counter-intuitive results (but with a warning). The results are now what one would expect (and the warning is removed).rsquared()
for extracting r-squared from models and model-like objects (r.squared()
has been deprecated).do()
now handles ANOVA-like objects better.
maggregate()
is now built on some improved behind-the-scenes functions. Among other features, the groups
argument is now incorporated as an alternative method of specifying the groups to aggregate over and the method
argument can be set to "ddply"
to use ddply()
from the plyr
package for aggregation. This results in a different output format that may be desired in some applications.
The cdata()
, pdata()
and qdata()
functions have been largely rewritten. In addition, cdata_f()
, pdata_f()
and qdata_f()
are provided which produce similar results but have a formula in the first argument slot.doc/
and so are available from within the package as well as via links to external files.fetchGapminder()
for fetching data sets originally from Gapminder.cdata()
for finding end points of a central portion of a variable.prop()
to avoid internal :
which makes downstream processing messier.manipulate()
(RStudio)plotFun()
can be used without manipulate()
. This makes it possible to put surface plots into RMarkdown or Rnw files or to generate them outside of RStudio.do() * rflip()
now records proportion heads as well as counts of heads and tails.mosaicLatticeOptions()
and restoreLatticeOptions()
to switch back and forth between lattice
defaults and mosaic
defaults.dotPlot()
uses a different algorithm to determine dot sizes. (Still not perfect, but cex
can be used to further scale the dots.)histogram()
so that nint
matches the number of bins used more accurately.i2
: max number of drinks is at least as large as i1
: the average number of drinks.D()
and antiD()
.mPlot()
provides an interactive environment for creating lattice
and ggplot2
plots.sp2df()
for converting SpatialPolygonsDataFrame objects to regular data frames (which is useful for plotting with ggplot2
, for example). Also the Countries
data frame facilitates mapping country names among different sources of map data.do()
are now marked as such so that confint()
can behave differently for such data frames and for “regular” data frames.t.test()
can now do 1-sample t-test described using a formula.mean()
, var()
, etc. using a formula interface) have been completely reimplemented and additional aggregating functions are provided.ntiles()
function has been added to facilitate creating factors based on quantile ranges.RailTrail
dataset.xhistogram()
is now deprecated. Use histogram()
instead.mean()
, max()
, median()
, var()
, etc.) now use getOption('na.rm')
to determine default behavior.var()
allow it to work in a wider range of situations.TukeyHSD()
so that explicit use of aov()
is no longer required.
panel.lmbands()
for plotting confidence and prediction bands in linear regression.
Animals
from MASS
has been removed by renaming the data set GestationLongevity
.freqpolygon()
for making frequency polygons.r.squared()
for extracting r-squared from models and model-like objects.do()
so that hyphens (‘-’) are turned into dots (‘.’).
fetchData()
.
We are still in beta, but we hope things are beginning to stabilize as we settle on syntax and coding idioms for the package. Here are some of the key updates since 0.4:
lm()
and its cousins.makeFun()
now has methods for glm and nls objects.
D()
improved to use symbolic differentiation in more cases and allow pass through to stats::D()
when that makes sense. This allows functions like deltaMethod() from the car package to work properly even when the mosaic package is loaded.antiD()
has been modified somewhat. This may go through another revision if/when we add in symbolic differentiation, but we think we are now close to the end state.fitSpline()
and fitModel()
have been added as wrappers around linear models using ns(), bs(), and nls(). Each of these returns the model fit as a function.
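To illustrate the idea of a model fit returned as a function, here is a small sketch using makeFun() on lm and glm fits. It assumes mosaic is installed; the mtcars data and the chosen formulas are placeholders for the example, not taken from the package documentation.

```r
library(mosaic)

# Fit ordinary models, then turn each fit into an R function of the predictor.
lin_mod <- lm(mpg ~ wt, data = mtcars)
lin_fun <- makeFun(lin_mod)

logit_mod <- glm(am ~ wt, data = mtcars, family = binomial)
logit_fun <- makeFun(logit_mod)

# The resulting functions take the explanatory variable as a named argument.
lin_fun(wt = 3)    # predicted mpg at wt = 3
logit_fun(wt = 3)  # for the glm fit, a prediction on the response (probability) scale
```

Because the result is an ordinary function, it can be passed to other tools that expect one, such as plotting or root-finding helpers.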