
Aggregations

Import the LArray library:

[1]:
from larray import *

Load the population array and related axes from the demography_eurostat dataset:

[2]:
# load the 'demography_eurostat' dataset
demography_eurostat = load_example_data('demography_eurostat')

# extract the 'country', 'gender' and 'time' axes
country = demography_eurostat.country
gender = demography_eurostat.gender
time = demography_eurostat.time

# extract the 'population_5_countries' array as 'population'
population = demography_eurostat.population_5_countries

# show the 'population' array
population
[2]:
    country  gender\time      2013      2014      2015      2016      2017
    Belgium         Male   5472856   5493792   5524068   5569264   5589272
    Belgium       Female   5665118   5687048   5713206   5741853   5762455
     France         Male  31772665  32045129  32174258  32247386  32318973
     France       Female  33827685  34120851  34283895  34391005  34485148
    Germany         Male  39380976  39556923  39835457  40514123  40697118
    Germany       Female  41142770  41210540  41362080  41661561  41824535
 Luxembourg         Male    268412    275117    281972    289193    296641
 Luxembourg       Female    268627    274563    280986    287056    294026
Netherlands         Male   8307339   8334385   8372858   8417135   8475102
Netherlands       Female   8472236   8494904   8527868   8561985   8606405

The LArray library provides many aggregation functions. The list is given in the Aggregation Functions subsection of the API Reference page.

Aggregation operations can be performed on axes or groups. Axes and groups can be mixed.

The main rules are:

  • Axes are separated by commas ,

  • Groups belonging to the same axis are grouped inside parentheses ()

Calculate the sum along an axis:

[3]:
population.sum(gender)
[3]:
country\time      2013      2014      2015      2016      2017
     Belgium  11137974  11180840  11237274  11311117  11351727
      France  65600350  66165980  66458153  66638391  66804121
     Germany  80523746  80767463  81197537  82175684  82521653
  Luxembourg    537039    549680    562958    576249    590667
 Netherlands  16779575  16829289  16900726  16979120  17081507

or several axes (axes are separated by commas ,):

[4]:
population.sum(country, gender)
[4]:
time       2013       2014       2015       2016       2017
      174578684  175493252  176356648  177680561  178349675

Calculate the sum along all axes except one by appending _by to the aggregation function:

[5]:
population.sum_by(time)
[5]:
time       2013       2014       2015       2016       2017
      174578684  175493252  176356648  177680561  178349675
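
To make the relationship between an aggregation and its `_by` variant concrete, here is a minimal plain-Python sketch. It is not how LArray is implemented: the helpers `sum_along` and `sum_by` are hypothetical names written for this illustration, and the data is a small subset of the table above (Belgium and Luxembourg, 2013 and 2014).

```python
from collections import defaultdict

# Axis names, in the order used by the keys of `data` (illustrative only).
AXES = ("country", "gender", "time")

# Subset of the population table above, keyed by (country, gender, time).
data = {
    ("Belgium", "Male", 2013): 5472856,
    ("Belgium", "Female", 2013): 5665118,
    ("Belgium", "Male", 2014): 5493792,
    ("Belgium", "Female", 2014): 5687048,
    ("Luxembourg", "Male", 2013): 268412,
    ("Luxembourg", "Female", 2013): 268627,
    ("Luxembourg", "Male", 2014): 275117,
    ("Luxembourg", "Female", 2014): 274563,
}


def sum_along(cells, *axes):
    """Sum out the named axes and keep all the others (hypothetical helper)."""
    keep = [i for i, name in enumerate(AXES) if name not in axes]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)


def sum_by(cells, *axes):
    """Keep only the named axes and sum out all the others (hypothetical helper)."""
    others = [name for name in AXES if name not in axes]
    return sum_along(cells, *others)


# Summing along 'gender' keeps 'country' and 'time'.
by_country_time = sum_along(data, "gender")
print(by_country_time[("Belgium", 2013)])  # 11137974, as in the table above

# Summing along 'country' and 'gender' is the same as summing "by" 'time'.
print(sum_along(data, "country", "gender") == sum_by(data, "time"))  # True
```

The design point is that `sum_by(time)` only saves typing: it names the axes to keep rather than the axes to remove, which is convenient when an array has many axes.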

Calculate the sum along groups (groups belonging to the same axis must be grouped inside parentheses ()):

[6]:
benelux = population.country['Belgium', 'Netherlands', 'Luxembourg'] >> 'benelux'
fr_de = population.country['France', 'Germany'] >> 'FR+DE'

population.sum((benelux, fr_de))
[6]:
country  gender\time      2013      2014      2015      2016      2017
benelux         Male  14048607  14103294  14178898  14275592  14361015
benelux       Female  14405981  14456515  14522060  14590894  14662886
  FR+DE         Male  71153641  71602052  72009715  72761509  73016091
  FR+DE       Female  74970455  75331391  75645975  76052566  76309683

Mixing axes and groups in aggregations:

[7]:
population.sum(gender, (benelux, fr_de))
[7]:
country\time       2013       2014       2015       2016       2017
     benelux   28454588   28559809   28700958   28866486   29023901
       FR+DE  146124096  146933443  147655690  148814075  149325774
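
As a cross-check of what a group aggregation computes, the following plain-Python sketch sums the 2013 male population over the same two sets of countries. It only illustrates the arithmetic and does not use LArray. The names `male_2013`, `groups` and `group_totals` are made up for this example, and the figures are copied from the population table above.

```python
# 2013 male population per country, copied from the table above.
male_2013 = {
    "Belgium": 5472856,
    "France": 31772665,
    "Germany": 39380976,
    "Luxembourg": 268412,
    "Netherlands": 8307339,
}

# Each group is a named collection of labels taken from one axis (country).
groups = {
    "benelux": ["Belgium", "Netherlands", "Luxembourg"],
    "FR+DE": ["France", "Germany"],
}

# One total per group: the country axis is replaced by the group names.
group_totals = {
    name: sum(male_2013[country] for country in labels)
    for name, labels in groups.items()
}

print(group_totals["benelux"])  # 14048607
print(group_totals["FR+DE"])    # 71153641
```

Both totals match the Male rows for 2013 in the output of `population.sum((benelux, fr_de))` above.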

Warning: Mixing slices and individual labels inside the [ ] will generate several groups (a tuple of groups) instead of a single group. If you want to create a single group using both slices and individual labels, you need to use the .union() method (see below).

[8]:
# mixing slices and individual labels leads to the creation of several groups (a tuple of groups)
except_2016 = time[:2015, 2017]
except_2016
[8]:
(time[:2015], time[2017])
[9]:
# leading to potentially unexpected results
population.sum(except_2016)
[9]:
    country  gender\time      :2015      2017
    Belgium         Male   16490716   5589272
    Belgium       Female   17065372   5762455
     France         Male   95992052  32318973
     France       Female  102232431  34485148
    Germany         Male  118773356  40697118
    Germany       Female  123715390  41824535
 Luxembourg         Male     825501    296641
 Luxembourg       Female     824176    294026
Netherlands         Male   25014582   8475102
Netherlands       Female   25495008   8606405
[10]:
# the union() method lets you mix slices and individual labels to create a single group
except_2016 = time[:2015].union(time[2017])
except_2016
[10]:
time[2013, 2014, 2015, 2017].set()
[11]:
population.sum(except_2016)
[11]:
country\gender       Male     Female
       Belgium   22079988   22827827
        France  128311025  136717579
       Germany  159470474  165539925
    Luxembourg    1122142    1118202
   Netherlands   33489684   34101413
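
The difference between a tuple of groups and a single union group can be checked by hand. The sketch below uses plain Python and the Belgium/Male row of the population table; it does not call LArray, and the variable names are chosen for this illustration only. As the outputs above show, the label slice `:2015` includes 2015 itself.

```python
# Belgium, Male, per year (copied from the population table above).
belgium_male = {
    2013: 5472856,
    2014: 5493792,
    2015: 5524068,
    2016: 5569264,
    2017: 5589272,
}

# time[:2015, 2017] yields two separate groups.
up_to_2015 = [2013, 2014, 2015]
only_2017 = [2017]

# A tuple of groups gives one aggregate per group (two result columns).
per_group = [sum(belgium_male[year] for year in group)
             for group in (up_to_2015, only_2017)]
print(per_group)  # [16490716, 5589272]

# A union merges the labels into one group, so there is a single aggregate.
union_labels = sorted(set(up_to_2015) | set(only_2017))
single_total = sum(belgium_male[year] for year in union_labels)
print(union_labels)  # [2013, 2014, 2015, 2017]
print(single_total)  # 22079988
```

The per-group values match the `:2015` and `2017` columns of output [9], and the single total matches the Belgium/Male cell of output [11].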