#HistDAWass
##(Histogram-valued Data analysis using Wasserstein metric)

This document describes the main features of the HistDAWass package. The name is an acronym for Histogram-valued Data analysis using Wasserstein metric. The implemented classes and functions support the analysis of data tables in which each cell contains a histogram instead of a classical numeric value.

What is the L2 Wasserstein metric?

Given two probability density functions f and g, let F and G be their cumulative distribution functions and Qf and Qg the respective quantile functions (the inverses of the cumulative distribution functions). The L2 Wasserstein distance between f and g is

\[d_W(f,g)=\sqrt{\int\limits_0^1{(Q_f(p) - Q_g(p))^2 dp}}\]
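
As a rough illustration of the formula above, the following base R sketch approximates the distance between two histograms, each given by its bin bounds and the cumulative probabilities at those bounds. It does not use HistDAWass: the helper name `wass_l2` is hypothetical, and the sketch assumes uniform density within each bin and strictly increasing cumulative probabilities (no empty bins), so that each quantile function is piecewise linear.

```r
# Hypothetical helper (not part of HistDAWass): L2 Wasserstein distance
# between two histograms given as bin bounds (x) and cumulative
# probabilities (p). Assumes p is strictly increasing from 0 to 1.
wass_l2 <- function(x1, p1, x2, p2) {
  # With uniform density inside each bin, the quantile function is the
  # linear interpolation of the points (p, x).
  q1 <- approxfun(p1, x1)
  q2 <- approxfun(p2, x2)
  squared_gap <- function(u) (q1(u) - q2(u))^2
  # Numerical integration over the probability levels in [0, 1].
  sqrt(integrate(squared_gap, lower = 0, upper = 1)$value)
}

# Two uniform distributions: U(0, 2) and U(1, 5).
d <- wass_l2(c(0, 2), c(0, 1), c(1, 5), c(0, 1))
d
```

For two uniform distributions the distance has the closed form sqrt((c1 - c2)^2 + (r1 - r2)^2 / 3), where c is the center and r the half-width of each interval, so the value here should be close to sqrt(4 + 1/3), about 2.08.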

The implemented classes are described in the following table.

| Class | Wrapper function for initializing | Description |
|---|---|---|
| distributionH | distributionH(x, p) | A class describing a histogram distribution |
| MatH | MatH(x, nrows, ncols, rownames, varnames, by.row) | A class describing a matrix of distributions |
| TdistributionH | TdistributionH() | A class derived from distributionH equipped with a timestamp or a time window |
| HTS | HTS() | A class describing a histogram-valued time series |


    # Load the package and create a histogram distribution from x and p
    library(HistDAWass)
    mydist <- distributionH(x = c(0, 1, 2), p = c(0, 0.3, 1))
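
To make the role of the two arguments concrete: in the call above, x appears to hold the bin bounds and p the cumulative probabilities at those bounds (the first value is 0 and the last is 1). Under that reading, which is an assumption about the constructor and not something stated here, the mass and density of each bin can be recovered in base R as follows.

```r
# Same vectors as passed to distributionH() above.
x <- c(0, 1, 2)      # bin bounds: [0, 1] and [1, 2]
p <- c(0, 0.3, 1)    # assumed to be cumulative probabilities at each bound

bin_mass <- diff(p)                # probability mass of each bin
bin_density <- bin_mass / diff(x)  # bar height if the mass is spread uniformly

bin_mass
bin_density
```

The first bin carries a mass of 0.3 and the second a mass of 0.7; because both bins have width 1, the densities equal the masses.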

#From raw data to histograms

data2hist functions
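
A minimal base R sketch of the idea, which does not use the package's own data2hist functions: assuming equal-width bins chosen by `hist()`, raw values can be turned into the bin bounds and cumulative probabilities of a histogram. The data and variable names are illustrative.

```r
set.seed(1)
raw <- rnorm(200, mean = 10, sd = 2)   # illustrative raw data

h <- hist(raw, plot = FALSE)           # base R binning, no plot drawn
x <- h$breaks                          # bin bounds
p <- c(0, cumsum(h$counts)) / length(raw)  # cumulative probabilities at each bound

# x and p have the same length; p starts at 0 and ends at 1.
length(x) == length(p)
range(p)
```

Vectors of this form match the x and p arguments of the distributionH(x, p) wrapper listed in the table of classes.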

#Basic statistics for a distributionH (A histogram)
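
As a sketch of what such statistics look like, assuming uniform density within each bin, the mean and standard deviation of a histogram can be computed in base R from the bin centers, half-widths and masses. The helper name `hist_mean_sd` is hypothetical and is not a HistDAWass function.

```r
# Hypothetical helper: mean and standard deviation of a histogram given
# bin bounds (x) and cumulative probabilities (p).
hist_mean_sd <- function(x, p) {
  w <- diff(p)                                # bin masses
  center <- (head(x, -1) + tail(x, -1)) / 2   # bin midpoints
  radius <- diff(x) / 2                       # bin half-widths
  m <- sum(w * center)
  # Second moment of a uniform bin: center^2 + radius^2 / 3.
  second_moment <- sum(w * (center^2 + radius^2 / 3))
  c(mean = m, sd = sqrt(second_moment - m^2))
}

res <- hist_mean_sd(x = c(0, 1, 2), p = c(0, 0.3, 1))
res
```

For the histogram with bounds 0, 1, 2 and cumulative probabilities 0, 0.3, 1, this gives a mean of 1.2 and a standard deviation of about 0.54.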

#Basic statistics for a MatH (A matrix of histogram-valued data)

#Visualization

plot of a distributionH

plot of a MatH

plot of a HTS

#Data Analysis methods

Clustering

Dimension reduction techniques

#Methods for Histogram time series

Smoothing

Predicting

#Linear regression

A two-component model for linear regression using the least squares method