malaytextr

library(malaytextr)

Examples

Malay root words

There is a data frame of Malay root words that can be used as a dictionary:


head(malayrootwords)
#>   Col Word Root Word
#> 1       ad       ada
#> 2       ak       aku
#> 3      akn      akan
#> 4      ank      anak
#> 5       ap       apa
#> 6      awl      awal
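
Because the dictionary is an ordinary data frame, a lookup can be done with base R. The sketch below is only an illustration and is not part of the package: dict is a hand-typed copy of the six rows shown above, and lookup_root() is a hypothetical helper name.

```r
# Toy copy of the six rows printed above; the real data ships with malaytextr.
dict <- data.frame(
  `Col Word`  = c("ad", "ak", "akn", "ank", "ap", "awl"),
  `Root Word` = c("ada", "aku", "akan", "anak", "apa", "awal"),
  check.names = FALSE,       # keep the column names with spaces as printed
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the root word if the word is in the dictionary,
# otherwise return the word unchanged.
lookup_root <- function(word, dictionary) {
  hit <- match(word, dictionary[["Col Word"]])
  ifelse(is.na(hit), word, dictionary[["Root Word"]][hit])
}

lookup_root(c("akn", "ank", "rumah"), dict)
#> [1] "akan"  "anak"  "rumah"
```

Column names that contain a space, such as "Col Word", need [[ ]] or backticks when accessed.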

Stem Malay words

stem_malay() first looks for the root word in a dictionary, for which the malayrootwords data frame can be used. It then removes the “extra suffix”, the “prefix”, and lastly the “suffix”.

To stem the word “banyaknya”, pass it together with the dictionary. stem_malay() returns a data frame with the word “banyaknya” and the stemmed word “banyak”:


stem_malay(word = "banyaknya", dictionary = malayrootwords)
#> 'Root Word' is now returned instead of 'root_word'
#>    Col Word Root Word
#> 1 banyaknya    banyak

To stem words in a data frame:

  1. Specify the data frame
  2. Specify the dictionary
  3. Specify the column that needs to be stemmed

x <- data.frame(text = c("banyaknya","sangat","terkedu", "pengetahuan"))

stem_malay(word = x,
           dictionary = malayrootwords,
           col_feature1 = "text")
#> 'Root Word' is now returned instead of 'root_word'
#>      Col Word Root Word
#> 1   banyaknya    banyak
#> 2      sangat    sangat
#> 3     terkedu      kedu
#> 4 pengetahuan      tahu
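
The returned column names contain spaces, so they need [[ ]] or backticks when used afterwards. The sketch below assumes the result has exactly the shape printed above; stemmed is a hand-typed copy of that output, not a call to the package.

```r
# Hand-typed copy of the output shown above.
stemmed <- data.frame(
  `Col Word`  = c("banyaknya", "sangat", "terkedu", "pengetahuan"),
  `Root Word` = c("banyak", "sangat", "kedu", "tahu"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Two equivalent ways to pull out the stemmed column.
stemmed[["Root Word"]]
#> [1] "banyak" "sangat" "kedu"   "tahu"
roots <- stemmed$`Root Word`

# Keep only the rows where stemming actually changed the word.
changed <- stemmed[stemmed[["Col Word"]] != stemmed[["Root Word"]], ]
changed[["Col Word"]]
#> [1] "banyaknya"   "terkedu"     "pengetahuan"
```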

Remove URLs

remove_url() removes all URLs found in a string:


x <- c("test https://t.co/fkQC2dXwnc", "another one https://www.google.com/ to try")

remove_url(x)
#> [1] "test "               "another one  to try"
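
A similar effect can be had with a regular expression in base R. This is a minimal sketch, not the implementation of remove_url(): the pattern below is an assumption and only covers http and https links.

```r
x <- c("test https://t.co/fkQC2dXwnc", "another one https://www.google.com/ to try")

# Delete "http://" or "https://" followed by any run of non-space characters.
no_url <- gsub("https?://[^[:space:]]+", "", x)
no_url
#> [1] "test "               "another one  to try"

# Optionally collapse the leftover whitespace.
tidy <- trimws(gsub("[[:space:]]+", " ", no_url))
tidy
#> [1] "test"               "another one to try"
```

Removing a URL leaves the surrounding spaces behind, so the second step is useful before tokenising.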

Malay stop words

There is a data frame of Malay stop words:


head(malaystopwords)
#> # A tibble: 6 x 1
#>   stopwords
#>   <chr>    
#> 1 ada      
#> 2 sampai   
#> 3 sana     
#> 4 itu      
#> 5 sangat   
#> 6 saya
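
The stop words can be used to filter tokens before further analysis. The sketch below is an illustration in base R: stopwords_toy is a hand-typed copy of the six rows shown above (the real malaystopwords is a tibble with a stopwords column), and the example sentence is made up.

```r
# Toy copy of the six stop words printed above.
stopwords_toy <- data.frame(
  stopwords = c("ada", "sampai", "sana", "itu", "sangat", "saya"),
  stringsAsFactors = FALSE
)

# Split a sentence into lower-case tokens on whitespace.
sentence <- "Saya sangat suka buku itu"
tokens   <- strsplit(tolower(sentence), "[[:space:]]+")[[1]]

# Keep only the tokens that are not stop words.
kept <- tokens[!tokens %in% stopwords_toy$stopwords]
kept
#> [1] "suka" "buku"
```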