Command-line utility to print statistics of numbers in Linux

This is a breeze with R. For a file (nums.txt here) that looks like this:

1
2
3
4
5
6
7
8
9
10

Use this:

R -q -e "x <- read.csv('nums.txt', header = FALSE); summary(x); sd(x[ , 1])"

To get this (R also echoes the command itself first):

       V1       
 Min.   : 1.00  
 1st Qu.: 3.25  
 Median : 5.50  
 Mean   : 5.50  
 3rd Qu.: 7.75  
 Max.   :10.00  
[1] 3.02765

  • The -q flag suppresses R’s startup banner (the licensing and help text).
  • The -e flag tells R you’ll be passing an expression from the terminal.
  • x is a data.frame, which is basically a table. It’s a structure that holds multiple vectors/columns of data, which is a little odd if you’re just reading in a single vector, and it affects which functions you can use.
  • Some functions, like summary(), naturally accommodate data.frames. If x had multiple fields, summary() would provide the above descriptive stats for each.
  • But sd() takes only one vector at a time, which is why I index x for that command (x[ , 1] returns the first column of x). You could use apply(x, MARGIN = 2, FUN = sd) to get the standard deviations for all columns.
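
To make the last few points concrete, here is a minimal sketch. The two-column data frame and the temporary file are made up for illustration (they are not the nums.txt from above); everything else is base R. It shows sd() on a single column, the apply() call for all columns, and scan() as one possible way to read a single column straight into a vector instead of a data.frame.

```r
# Hypothetical two-column data frame standing in for a multi-field file
x <- data.frame(V1 = 1:10, V2 = seq(10, 100, by = 10))

# sd() wants a single vector, so pick out one column
sd_first <- sd(x[ , 1])

# Standard deviation of every column (MARGIN = 2 means "by column")
sd_all <- apply(x, MARGIN = 2, FUN = sd)

# sapply() does the same thing column by column
sd_by_col <- sapply(x, sd)

# For a one-column file, scan() returns a plain numeric vector.
# The temp file here is an assumption standing in for a real input file.
tmp <- tempfile()
writeLines(as.character(1:10), tmp)
v <- scan(tmp, quiet = TRUE)

print(sd_all)
print(summary(v))
print(sd(v))
```

If you only ever have one column, scan() is probably the simpler choice, since you get a vector back and can hand it to sd() directly without indexing.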
