Is it possible to draw a boxplot given the percentile values instead of the original inputs?

As of 2020, there is a better method than the one in the accepted answer. The matplotlib.axes.Axes class provides a bxp method, which draws the boxes and whiskers directly from the percentile values. Raw data is needed only for the outliers, and drawing those is optional. Example: import matplotlib.pyplot as plt fig, …
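Because the original example is cut off, here is a minimal sketch of how bxp can be fed precomputed percentiles. The helper name draw_percentile_boxplot, the p5/p25/p50/p75/p95 keys, and the sample numbers are made up for illustration; only the stats keys (label, whislo, q1, med, q3, whishi) come from matplotlib itself. Using the 5th and 95th percentiles as whisker ends is an assumption, not a rule.

```python
import matplotlib.pyplot as plt


def draw_percentile_boxplot(ax, summaries):
    """Draw one box per summary dict with Axes.bxp; no raw data is needed."""
    stats = [
        {
            "label": s["label"],   # tick label for this box
            "whislo": s["p5"],     # lower whisker end (assumed: 5th percentile)
            "q1": s["p25"],        # bottom of the box
            "med": s["p50"],       # median line
            "q3": s["p75"],        # top of the box
            "whishi": s["p95"],    # upper whisker end (assumed: 95th percentile)
        }
        for s in summaries
    ]
    # showfliers=False: outliers would need raw data, which we do not have.
    return ax.bxp(stats, showfliers=False)


# Hypothetical precomputed percentiles for two groups.
summaries = [
    {"label": "A", "p5": 1.0, "p25": 2.5, "p50": 4.0, "p75": 5.5, "p95": 8.0},
    {"label": "B", "p5": 0.5, "p25": 3.0, "p50": 4.5, "p75": 6.0, "p95": 9.5},
]

fig, ax = plt.subplots()
artists = draw_percentile_boxplot(ax, summaries)
ax.set_ylabel("value")
# Call plt.show() or fig.savefig(...) to view the result.
```

bxp returns a dictionary of the artists it created (boxes, medians, whiskers, and so on), which can be restyled afterwards.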

Fast algorithm for repeated calculation of percentile?

You can do it with two heaps. Not sure if there’s a less ‘contrived’ solution, but this one takes O(log n) time per update, and heaps are included in the standard libraries of most programming languages. The first heap (heap A) contains the smallest 75% of the elements; the other heap (heap B) contains the rest (the biggest 25%). The first one has the biggest …
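The excerpt stops before the details, so the following is only a sketch of the two-heap idea under some assumptions: the class name RunningPercentile is hypothetical, the percentile is given as an integer percent, and the nearest-rank definition is used (the value at rank ceil(percent * n / 100) in sorted order). Heap A is kept as a max-heap by storing negated values, so its top is the current percentile.

```python
import heapq


class RunningPercentile:
    """Track a nearest-rank percentile of a growing stream with two heaps."""

    def __init__(self, percent=75):
        if not 1 <= percent <= 100:
            raise ValueError("percent must be an integer in [1, 100]")
        self.percent = percent
        self._low = []   # heap A: max-heap (negated values), smallest percent%
        self._high = []  # heap B: min-heap, the remaining largest values

    def add(self, x):
        """Insert x in O(log n) and restore the size split between the heaps."""
        if not self._low or x <= -self._low[0]:
            heapq.heappush(self._low, -x)
        else:
            heapq.heappush(self._high, x)
        n = len(self._low) + len(self._high)
        target = (self.percent * n + 99) // 100  # ceil(percent * n / 100)
        while len(self._low) > target:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        while len(self._low) < target:
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def value(self):
        """Return the current percentile: the largest element of heap A."""
        if not self._low:
            raise ValueError("no data yet")
        return -self._low[0]


tracker = RunningPercentile(75)
for x in [5, 1, 8, 3, 7, 2, 6, 4]:
    tracker.add(x)
print(tracker.value())  # 6: rank ceil(0.75 * 8) = 6 among the values 1..8
```

Each insertion moves at most a constant number of elements between the heaps, so reading the percentile after every update stays cheap compared with re-sorting.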

Percentile calculation

I think the Wikipedia page has the formulas you need to write your own function… I tried this: public double Percentile(double[] sequence, double excelPercentile) { Array.Sort(sequence); int N = sequence.Length; double n = (N - 1) * excelPercentile + 1; // Another method: double n = (N + 1) * excelPercentile; if (n == 1d) return sequence[0]; …
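The C# function above is truncated, so this Python sketch does not claim to reproduce the rest of it. It only illustrates the same rank formula, n = (N - 1) * p + 1, combined with linear interpolation between the two neighbouring sorted values, which is an assumption about the intended behaviour. The function name percentile is illustrative.

```python
def percentile(sequence, fraction):
    """Percentile by linear interpolation, using rank (N - 1) * fraction + 1.

    `fraction` is in [0, 1]. The code works with the zero-based rank
    (N - 1) * fraction, which is the same position shifted by one.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be in [0, 1]")
    data = sorted(sequence)
    if not data:
        raise ValueError("sequence must not be empty")
    rank = (len(data) - 1) * fraction         # zero-based fractional rank
    lower = int(rank)                         # index at or below the rank
    upper = min(lower + 1, len(data) - 1)     # next index, clamped at the end
    weight = rank - lower                     # distance past the lower index
    return data[lower] + weight * (data[upper] - data[lower])


print(percentile([1, 2, 3, 4], 0.5))  # 2.5
```

The commented alternative in the excerpt, n = (N + 1) * p, is a different convention and gives different results near the ends of the data.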

nth percentile calculations in postgresql

As of PostgreSQL 9.4 there is native support for percentiles, implemented as Ordered-Set Aggregate Functions:
percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression): continuous percentile; returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed.
percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression): multiple continuous percentile; returns an array of results …

Eliminating all data over a given percentile

Use the Series.quantile() method:
In [48]: cols = list('abc')
In [49]: df = DataFrame(randn(10, len(cols)), columns=cols)
In [50]: df.a.quantile(0.95)
Out[50]: 1.5776961953820687
To filter out rows of df where df.a is greater than or equal to the 95th percentile, do:
In [72]: df[df.a < df.a.quantile(.95)]
Out[72]: a b c 0 -1.044 -0.247 -1.149 2 0.395 0.591 …
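A self-contained version of the same filter, as a sketch with deterministic data instead of random numbers so the cutoff is easy to check by hand. The column contents and the names cutoff and trimmed are illustrative only.

```python
import pandas as pd

# Deterministic stand-in data: column "a" holds 0..99.
df = pd.DataFrame({"a": range(100), "b": range(100, 200)})

# Series.quantile interpolates linearly by default: 0.95 * 99 = 94.05.
cutoff = df["a"].quantile(0.95)

# Keep only the rows strictly below the 95th percentile of column "a".
trimmed = df[df["a"] < cutoff]

print(cutoff)
print(len(trimmed))
```

To apply the cutoff per group rather than over the whole column, the same comparison can be made against a grouped quantile, at the cost of a slightly longer expression.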

matplotlib: disregard outliers when plotting

There’s no single “best” test for an outlier. Ideally, you should incorporate a priori information (e.g. “This parameter shouldn’t be over x because of blah…”). Most tests for outliers use the median absolute deviation, rather than the 95th percentile or a variance-based measurement. Otherwise, the variance/stddev that is calculated will be heavily skewed by the …
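One common way to turn the median absolute deviation (MAD) into a test is the modified z-score, sketched below with only the standard library. The function name mad_outlier_mask, the 3.5 threshold, and the handling of a zero MAD are assumptions chosen for this example; the 0.6745 factor is the usual constant that puts the MAD on the scale of a standard deviation for normally distributed data.

```python
from statistics import median


def mad_outlier_mask(points, threshold=3.5):
    """Return a list of booleans, True where a point looks like an outlier.

    Uses the modified z-score 0.6745 * |x - median| / MAD and flags values
    whose score exceeds `threshold` (3.5 is a conventional default).
    """
    center = median(points)
    deviations = [abs(p - center) for p in points]
    mad = median(deviations)
    if mad == 0:
        # Assumption: with zero MAD the score is undefined, so flag nothing.
        return [False] * len(points)
    return [0.6745 * d / mad > threshold for d in deviations]


data = [1, 2, 3, 4, 100]
mask = mad_outlier_mask(data)
kept = [x for x, is_out in zip(data, mask) if not is_out]
print(kept)  # [1, 2, 3, 4]
```

The filtered values can then be passed to the plotting call, or used to set axis limits while still plotting everything.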

Weighted percentile using numpy

Completely vectorized numpy solution. Here is the code I use. It’s not an optimal one (which I’m unable to write with numpy), but it is still much faster and more reliable than the accepted solution:
def weighted_quantile(values, quantiles, sample_weight=None, values_sorted=False, old_style=False):
    """ Very close to numpy.percentile, but supports weights. NOTE: quantiles should be in [0, 1]! :param values: …
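Since the function body is cut off, the sketch below is not the author's code. It shows one vectorized approach that fits the description: sort the values, place each one at the midpoint of its weight interval on the cumulative weight axis, normalize to [0, 1], and interpolate with numpy.interp. The name weighted_quantile_sketch and the midpoint convention are assumptions.

```python
import numpy as np


def weighted_quantile_sketch(values, quantiles, sample_weight=None):
    """Weighted quantiles via cumulative weights and linear interpolation.

    `quantiles` must lie in [0, 1]. With equal weights the result is close
    to, but not necessarily identical to, numpy.percentile.
    """
    values = np.asarray(values, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones(len(values))
    sample_weight = np.asarray(sample_weight, dtype=float)
    if np.any(quantiles < 0) or np.any(quantiles > 1):
        raise ValueError("quantiles should be in [0, 1]")

    # Sort values and carry the weights along.
    order = np.argsort(values)
    values = values[order]
    sample_weight = sample_weight[order]

    # Position of each value: midpoint of its weight interval, normalized.
    positions = np.cumsum(sample_weight) - 0.5 * sample_weight
    positions /= np.sum(sample_weight)

    # Interpolate; requests outside the positions clamp to the end values.
    return np.interp(quantiles, positions, values)


print(weighted_quantile_sketch([1, 2, 3], [0.5]))
print(weighted_quantile_sketch([1, 2, 3], [0.75], sample_weight=[1, 1, 2]))
```

Doubling a weight shifts the quantiles toward that value, much as if the observation had been repeated, which is a quick sanity check for any weighted implementation.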
