Introduction to the regfilter package

The regfilter package contains filtering techniques to remove noisy samples in regression datasets. It adapts a total of 14 classic and recent noise filters for use in regression problems, following the approach proposed in Martin et al. (2021).

Installation

The regfilter package can be installed in R from CRAN servers using the command:

install.packages("regfilter")

This command installs the package together with all its dependencies, including the regression algorithms that the noise filters need to operate. To access the functions of the package, load it with the R command:

library(regfilter)
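
In scripts that may run on machines where the package is already present, the installation can be made conditional. The following is a minimal sketch that relies only on base R functions (requireNamespace, install.packages and library). The repos value is an assumption made here so that the call also works in non-interactive sessions; any CRAN mirror can be used instead.

```r
# Install regfilter from CRAN only when it is not yet available locally.
# The mirror URL below is an assumption; replace it with a preferred mirror.
if (!requireNamespace("regfilter", quietly = TRUE)) {
  install.packages("regfilter", repos = "https://cloud.r-project.org")
}

# Attach the package so that its functions can be called directly.
library(regfilter)
```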

Documentation

All the information corresponding to each noise filter can be consulted on the CRAN website. Additionally, the help() command can be used. For example, to check the documentation of the regIPF noise filter, we can use:

help(regIPF)
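
Some base R shortcuts can complement help(). The sketch below assumes that the package is installed; it lists the objects exported by the attached package, opens the package-level help index, and shows the ? shorthand.

```r
library(regfilter)

# List every object exported by the attached package (noise filters included).
ls("package:regfilter")

# Show the help index of the whole package.
help(package = "regfilter")

# Shorthand equivalent to help(regIPF).
?regIPF
```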

Usage of regressand noise filters

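For processing noisy regression data, each noise filter in the regfilter package provides two standard ways of use: the default method, which receives the input attributes in the argument x and the output variable in the argument y, and the formula method, which receives a formula describing the model together with the data frame (argument data) that contains the variables.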

An example of how to use these two methods to filter the rock dataset with the regCNN noise filter is shown below:

data(rock)
head(rock)
#>   area    peri     shape perm
#> 1 4990 2791.90 0.0903296  6.3
#> 2 7002 3892.60 0.1486220  6.3
#> 3 7558 3930.66 0.1833120  6.3
#> 4 7352 3869.32 0.1170630  6.3
#> 5 7943 3948.54 0.1224170 17.1
#> 6 7979 4010.15 0.1670450 17.1
# Using the default method:
set.seed(9)
out.def <- regCNN(x = rock[,-ncol(rock)], y = rock[,ncol(rock)])
# Using the formula method:
set.seed(9)
out.frm <- regCNN(formula = perm ~ ., data = rock)
# Check the match of noisy indices:
all(out.def$idnoise == out.frm$idnoise)
#> [1] TRUE

Note that the $ operator is used to access the elements returned by the filter in the objects out.def and out.frm.
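
As an illustration of the $ operator, the filtered data can be reassembled into a single data frame for later modelling. This is a minimal sketch: the element names xclean and yclean are those described in the Output values section, while rock.clean and fit are names chosen here for illustration, and the linear model is only a placeholder for any downstream regression algorithm.

```r
library(regfilter)
data(rock)

set.seed(9)
out.def <- regCNN(x = rock[, -ncol(rock)], y = rock[, ncol(rock)])

# Rebuild one data frame from the clean inputs and the clean output.
rock.clean <- cbind(out.def$xclean, perm = out.def$yclean)

# Fit a regression model on the filtered data (lm is used as a placeholder).
fit <- lm(perm ~ ., data = rock.clean)
```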

Output values

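All regression noise filters return an object of class rfdata. It is designed to unify the output value of the methods included in the regfilter package. The class rfdata is a list of elements with the most relevant information of the noise filtering process: the clean samples kept by the filter (xclean and yclean), their number (numclean) and their indices in the original dataset (idclean); the samples identified as noisy (xnoise and ynoise), their number (numnoise) and their indices (idnoise); the full name of the noise filter (filter); the parameters it was run with (param); and the function call (call).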

As an example, the structure of the rfdata object returned using the regCNN noise filter is shown below:

str(out.def)
#> List of 11
#>  $ xclean  :'data.frame':    39 obs. of  3 variables:
#>   ..$ area : int [1:39] 4990 7002 7558 7352 7943 7979 9333 8209 8393 6425 ...
#>   ..$ peri : num [1:39] 2792 3893 3931 3869 3949 ...
#>   ..$ shape: num [1:39] 0.0903 0.1486 0.1833 0.1171 0.1224 ...
#>  $ yclean  : num [1:39] 6.3 6.3 6.3 6.3 17.1 17.1 17.1 17.1 119 119 ...
#>  $ numclean: int 39
#>  $ idclean : num [1:39] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ xnoise  :'data.frame':    9 obs. of  3 variables:
#>   ..$ area : int [1:9] 3469 1468 3524 5267 5048 1016 5605 8793 5514
#>   ..$ peri : num [1:9] 1377 476 1189 1645 942 ...
#>   ..$ shape: num [1:9] 0.177 0.439 0.164 0.254 0.329 ...
#>  $ ynoise  : num [1:9] 100 100 100 100 1300 1300 1300 1300 580
#>  $ numnoise: int 9
#>  $ idnoise : int [1:9] 37 38 39 40 41 42 43 44 47
#>  $ filter  : chr "Condensed Nearest Neighbors"
#>  $ param   :List of 1
#>   ..$ t: num 0.2
#>  $ call    : language regCNN(x = rock[, -ncol(rock)], y = rock[, ncol(rock)])
#>  - attr(*, "class")= chr "rfdata"
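
The elements of the list are mutually consistent, which can be verified with base R. The checks below are a sketch written for this example and are not part of the package; they recreate out.def as in the previous section so that the block can be run on its own.

```r
library(regfilter)
data(rock)

set.seed(9)
out.def <- regCNN(x = rock[, -ncol(rock)], y = rock[, ncol(rock)])

# The returned object carries the rfdata class.
inherits(out.def, "rfdata")

# Clean and noisy samples partition the original dataset.
stopifnot(
  out.def$numclean + out.def$numnoise == nrow(rock),
  length(intersect(out.def$idclean, out.def$idnoise)) == 0,
  nrow(out.def$xnoise) == out.def$numnoise,
  length(out.def$ynoise) == out.def$numnoise
)

# Compare the noisy outputs with the original outputs at the noisy indices.
all.equal(rock$perm[out.def$idnoise], out.def$ynoise)
```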

In order to display the results of the class rfdata in a friendly way in the R console, two specific print and summary functions are implemented. The print function presents the basic information of the regressand noise filter:

print(out.def)
#> 
#> ## Noise model: 
#> Condensed Nearest Neighbors
#> 
#> ## Parameters:
#> - t = 0.2
#> 
#> ## Number of noisy and clean samples values:
#> - Noisy values: 9/48 (18.75%)
#> - Clean values: 39/48 (81.25%)

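The information offered by print is as follows: the name of the noise filter (Noise model), the parameters used (Parameters), and the number and percentage of noisy and clean samples in the dataset.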
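
The percentages reported by print follow directly from the numclean and numnoise elements of the object. A small sketch, in which the variable names total, pct.noise and pct.clean are chosen here for illustration:

```r
library(regfilter)
data(rock)

set.seed(9)
out.def <- regCNN(x = rock[, -ncol(rock)], y = rock[, ncol(rock)])

# Total number of samples processed by the filter.
total <- out.def$numclean + out.def$numnoise

# Percentage of samples flagged as noisy and kept as clean.
pct.noise <- 100 * out.def$numnoise / total
pct.clean <- 100 * out.def$numclean / total
```

For the run shown above, these correspond to 9/48 (18.75%) noisy samples and 39/48 (81.25%) clean samples.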

On the other hand, the summary function displays the information of the dataset processed with the noise filter along with other additional details. This function can be called by typing the following R command:

summary(out.frm, showid = TRUE)
#> 
#> ########################################################
#>  Noise filtering process: Summary
#> ########################################################
#> 
#> ## Original call:
#> regCNN(formula = perm ~ ., data = rock)
#> 
#> ## Noise model: 
#> Condensed Nearest Neighbors
#> 
#> ## Parameters:
#> - t = 0.2
#> 
#> ## Number of noisy and clean samples values:
#> - Noisy values: 9/48 (18.75%)
#> - Clean values: 39/48 (81.25%)
#> 
#> ## Indices of noisy samples:
#> - Output class: 37, 38, 39, 40, 41, 42, 43, 44, 47

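The information offered by this function is as follows: the original call to the noise filter, followed by the same information shown by print (noise model, parameters, and number of noisy and clean samples). In this example, called with showid = TRUE, the indices of the noisy samples are also listed.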