How to find the statistical mode?

One more solution, which works for both numeric and character/factor data:

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }

On my dinky little machine, that can generate and find the mode of a 10M-integer vector in about half a second. If your data set might have multiple modes, the above solution takes the …
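For what it's worth, here is a small sketch of how the function behaves on tied data. `which.max` returns the index of the first maximum, so `Mode` reports whichever tied value appears first in `x`. The `Modes` helper is a hypothetical variant (the name and the tie handling are assumptions, not part of the excerpt above) that returns every tied value instead.

```r
# Single-mode version, as in the answer above.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical variant (an assumption for illustration): return all
# values that share the highest count, in order of first appearance.
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

single <- Mode(c(1, 2, 2, 3))                  # one clear mode
tied   <- Modes(c("a", "b", "a", "b", "c"))    # "a" and "b" both occur twice
first  <- Mode(c("a", "b", "a", "b", "c"))     # first tied value wins
```

Neither function does anything special with `NA`; it is counted like any other value.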

How to sum a variable by group

Using aggregate:

    aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
      Category  x
    1    First 30
    2   Second  5
    3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

    aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) …

(embedding @thelatemail's comment) aggregate has a formula interface too:

    aggregate(Frequency …
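To try this end to end, here is a minimal sketch using the formula interface. The data frame `x` is made-up sample data (an assumption), chosen so the group sums match the output shown above.

```r
# Assumed sample data; only the column names come from the answer above.
x <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# Formula interface: sum Frequency within each Category.
res <- aggregate(Frequency ~ Category, data = x, FUN = sum)
print(res)
#   Category Frequency
# 1    First        30
# 2   Second         5
# 3    Third        34
```

With the formula interface the result column keeps its original name (`Frequency`) rather than being called `x`.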

How to write trycatch in R

Well then: welcome to the R world 😉

Here you go.

Setting up the code

    urls <- c(
        "http://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html",
        "http://en.wikipedia.org/wiki/Xz",
        "xxxxx"
    )
    readUrl <- function(url) {
        out <- tryCatch(
            {
                # Just to highlight: if you want to use more than one
                # R expression in the "try" part then you'll have to
                # use …
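Since the excerpt above is cut off, here is a separate, self-contained sketch of the same tryCatch pattern that needs no network access. The `safe_log` function and its return values are assumptions made up for illustration; only `tryCatch`, its `warning`/`error` handlers, and `finally` are standard R.

```r
# Hypothetical helper: take a logarithm, turning warnings and errors
# into fallback values instead of letting them propagate.
safe_log <- function(x) {
  tryCatch(
    {
      # The "try" part: several expressions go inside curly braces.
      # The value of the last one is returned on success.
      log(x)
    },
    warning = function(w) {
      message("Caught a warning: ", conditionMessage(w))
      NA_real_   # value returned when a warning is raised
    },
    error = function(e) {
      message("Caught an error: ", conditionMessage(e))
      NULL       # value returned when an error is raised
    },
    finally = {
      # Runs in every case, after the body or the handler.
      message("Finished processing input")
    }
  )
}

ok   <- safe_log(10)    # plain success
warn <- safe_log(-1)    # log(-1) raises a warning, handler returns NA
err  <- safe_log("a")   # non-numeric input raises an error, handler returns NULL
```

Note that the handler's return value becomes the value of the whole `tryCatch` call, which is why each handler ends with an explicit fallback.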

Check existence of directory and create if doesn’t exist

Use showWarnings = FALSE:

    dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
    setwd(file.path(mainDir, subDir))

dir.create() does not crash if the directory already exists; it just prints out a warning. So if you can live with seeing warnings, there is no problem with just doing this:

    dir.create(file.path(mainDir, subDir))
    setwd(file.path(mainDir, subDir))
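If you would rather check explicitly instead of suppressing the warning, something along these lines should work. This is only a sketch: `mainDir` points at a temporary directory and `subDir` is a made-up name, both assumptions so the example can run anywhere, and `dir.exists()` needs R 3.2.0 or later.

```r
# Assumed locations for illustration; substitute your own paths.
mainDir <- tempdir()
subDir  <- "outputDirectory"
target  <- file.path(mainDir, subDir)

# Create the directory only when it is missing. recursive = TRUE also
# creates any missing parent directories along the way.
if (!dir.exists(target)) {
  dir.create(target, recursive = TRUE)
}

# A second call with showWarnings = FALSE is harmless and silent.
created_again <- dir.create(target, showWarnings = FALSE)
```

`dir.create()` invisibly returns `TRUE` or `FALSE`, so `created_again` is `FALSE` here because the directory already existed.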

How to find out which package version is loaded in R?

You can use sessionInfo() to accomplish that.

    > sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] graphics  grDevices utils     datasets  stats     grid      methods   base

    other attached packages:
    [1] ggplot2_0.9.0  reshape2_1.2.1 plyr_1.7.1

    loaded via a namespace …
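If you only care about one package rather than the whole session, `packageVersion()` from utils is a lighter option. The sketch below queries the base package stats purely as an example (an assumption; swap in whichever package you need), and the `"3.0.0"` threshold is likewise arbitrary.

```r
# Version of a single installed package, as a comparable version object.
v <- packageVersion("stats")
print(v)

# Version objects can be compared directly against version strings.
recent_enough <- v >= "3.0.0"

# Names of non-base packages currently attached (NULL if there are none).
attached <- names(sessionInfo()$otherPkgs)
```

Keep in mind that `packageVersion()` looks the package up in the library paths, so after an upgrade in the middle of a session it may differ from the copy that is already loaded.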

Quickly reading very large tables as dataframes

An update, several years later

This answer is old, and R has moved on. Tweaking read.table to run a bit faster has precious little benefit. Your options are:

Using vroom from the tidyverse package vroom for importing data from csv/tab-delimited files directly into an R tibble. See Hector’s answer.

Using fread in data.table for importing …
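As a rough sketch of the fread option, the snippet below writes a small CSV to a temporary file and reads it back. The file contents and column names are made up for illustration, and the fallback to `read.csv` when data.table is not installed is an assumption added so the example runs anywhere, not a recommendation from the answer above.

```r
# Build a small assumed CSV file to read back in.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = (1:1000) / 2),
          path, row.names = FALSE)

if (requireNamespace("data.table", quietly = TRUE)) {
  # fread guesses the separator and column types automatically.
  big <- as.data.frame(data.table::fread(file = path))
} else {
  # Base R fallback so the sketch still runs without data.table.
  big <- read.csv(path)
}

dims <- dim(big)
```

On a file this small the two readers are indistinguishable; the difference only shows up on genuinely large inputs.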

Create an empty data.frame

Just initialize it with empty vectors:

    df <- data.frame(Date=as.Date(character()),
                     File=character(),
                     User=character(),
                     stringsAsFactors=FALSE)

Here's another example with different column types:

    df <- data.frame(Doubles=double(),
                     Ints=integer(),
                     Factors=factor(),
                     Logicals=logical(),
                     Characters=character(),
                     stringsAsFactors=FALSE)

    > str(df)
    'data.frame':	0 obs. of  5 variables:
     $ Doubles   : num
     $ Ints      : int
     $ Factors   : Factor w/ 0 levels:
     $ Logicals …
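A quick way to convince yourself the column types survive is to inspect the empty frame and then add a row. This is a small sketch; the row values are made up for illustration.

```r
# Empty data frame with typed columns, as in the first example above.
df <- data.frame(Date = as.Date(character()),
                 File = character(),
                 User = character(),
                 stringsAsFactors = FALSE)

# Column classes are set even though there are no rows yet.
classes <- vapply(df, function(col) class(col)[1], character(1))

# Appending a row (values are assumptions for illustration).
new_row <- data.frame(Date = as.Date("2020-01-01"),
                      File = "report.csv",
                      User = "alice",
                      stringsAsFactors = FALSE)
df2 <- rbind(df, new_row)
```

Growing a data frame one `rbind` at a time is fine for a handful of rows, but it gets slow in a long loop.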

Run R script from command line

If you want the output to print to the terminal, it is best to use Rscript:

    Rscript a.R

Note that when you use R CMD BATCH a.R, the output is not sent to standard output and displayed on the terminal; instead, a new file called a.Rout is created.

    R CMD BATCH a.R
    # Check the output …
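For something concrete to run, here is a sketch of what a script like a.R might contain. The contents are entirely made up for illustration (the `sum_args` helper and the idea of summing the arguments are assumptions); `commandArgs(trailingOnly = TRUE)` is the standard way to read arguments passed after the script name.

```r
# Hypothetical contents of a.R. Run from a shell with, for example:
#   Rscript a.R 1 2 3

# Helper (an assumption for illustration): sum numeric command-line arguments.
sum_args <- function(args) {
  sum(as.numeric(args))
}

# Only the arguments that follow the script name on the command line.
args <- commandArgs(trailingOnly = TRUE)

if (length(args) > 0) {
  # cat() writes to standard output, so with Rscript this appears
  # directly in the terminal.
  cat("Sum:", sum_args(args), "\n")
} else {
  cat("No arguments supplied\n")
}
```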
