larray.Array.describe_by

Array.describe_by(*args, percentiles=None) -> Array

Descriptive summary statistics, excluding NaN values, along axes or for groups.

By default, the summary includes the number of non-NaN values, the mean, the standard deviation, the minimum, the maximum, and the 25th, 50th and 75th percentiles.

Parameters
*args : int or str or Axis or Group or any combination of those, optional

Axes or groups to include in the result after aggregating. If omitted, the whole array is aggregated.

percentiles : array-like, optional

List of integer percentiles to include. Defaults to [25, 50, 75].

Returns
Array

See also

Array.describe

Examples

>>> from larray import Array, X
>>> data = [[0, 6, 3, 5, 4, 2, 1, 3], [7, 5, 3, 2, 8, 5, 6, 4]]
>>> arr = Array(data, 'gender=Male,Female;year=2013..2020').astype(float)
>>> arr
gender\year  2013  2014  2015  2016  2017  2018  2019  2020
       Male   0.0   6.0   3.0   5.0   4.0   2.0   1.0   3.0
     Female   7.0   5.0   3.0   2.0   8.0   5.0   6.0   4.0
>>> arr.describe_by('gender')
gender\statistic  count  mean  std  min   25%  50%   75%  max
            Male    8.0   3.0  2.0  0.0  1.75  3.0  4.25  6.0
          Female    8.0   5.0  2.0  2.0  3.75  5.0  6.25  8.0
>>> arr.describe_by('gender', (X.year[:2015], X.year[2018:]))
gender  year\statistic  count  mean  std  min  25%  50%  75%  max
  Male           :2015    3.0   3.0  3.0  0.0  1.5  3.0  4.5  6.0
  Male           2018:    3.0   2.0  1.0  1.0  1.5  2.0  2.5  3.0
Female           :2015    3.0   5.0  2.0  3.0  4.0  5.0  6.0  7.0
Female           2018:    3.0   5.0  1.0  4.0  4.5  5.0  5.5  6.0
>>> arr.describe_by('gender', percentiles=[50, 90])
gender\statistic  count  mean  std  min  50%  90%  max
            Male    8.0   3.0  2.0  0.0  3.0  5.3  6.0
          Female    8.0   5.0  2.0  2.0  5.0  7.3  8.0
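
To make the statistics in the tables above concrete, the following standard-library sketch recomputes them for a single row. It is not larray's implementation: `describe` is a hypothetical helper, and the choice of a sample standard deviation (dividing by n - 1) and of linear interpolation between order statistics is an assumption inferred from the example output, not taken from the library's source.

```python
import math
import statistics


def describe(values, percentiles=(25, 50, 75)):
    """Hypothetical helper: summary statistics of one row, ignoring NaN.

    Needs at least two non-NaN values, because of the sample std.
    """
    # Drop NaN values first, since the summary excludes them.
    clean = sorted(v for v in values if not math.isnan(v))
    n = len(clean)

    def percentile(p):
        # Linear interpolation between the two closest order statistics
        # (assumed; consistent with the 25%, 75% and 90% columns above).
        pos = (n - 1) * p / 100
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        return clean[lo] + (clean[hi] - clean[lo]) * (pos - lo)

    stats = {
        'count': float(n),
        'mean': statistics.mean(clean),
        # Sample standard deviation, i.e. dividing by n - 1 (assumed).
        'std': statistics.stdev(clean),
        'min': clean[0],
    }
    for p in percentiles:
        stats[f'{p}%'] = percentile(p)
    stats['max'] = clean[-1]
    return stats


# The 'Male' row of the example array.
male = [0.0, 6.0, 3.0, 5.0, 4.0, 2.0, 1.0, 3.0]
print(describe(male))

# A NaN is ignored, so the count stays at 8.
print(describe(male + [float('nan')], percentiles=[50, 90]))
```

Under these assumptions, the values printed for `male` should agree with the Male row of `arr.describe_by('gender')` up to floating-point rounding.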