Aggregations
Import the LArray library:
[1]:
from larray import *
Load the population array and related axes from the demography_eurostat dataset:
[2]:
# load the 'demography_eurostat' dataset
demography_eurostat = load_example_data('demography_eurostat')
# extract the 'country', 'gender' and 'time' axes
country = demography_eurostat.country
gender = demography_eurostat.gender
time = demography_eurostat.time
# extract the 'population_5_countries' array as 'population'
population = demography_eurostat.population_5_countries
# show the 'population' array
population
[2]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
Luxembourg Male 268412 275117 281972 289193 296641
Luxembourg Female 268627 274563 280986 287056 294026
Netherlands Male 8307339 8334385 8372858 8417135 8475102
Netherlands Female 8472236 8494904 8527868 8561985 8606405
The LArray library provides many aggregation functions. The list is given in the Aggregation Functions subsection of the API Reference page.
Aggregation operations can be performed on axes or groups. Axes and groups can be mixed.
The main rules are:
Axes are separated by commas ,
Groups belonging to the same axis are grouped inside parentheses ()
Calculate the sum along an axis:
[3]:
population.sum(gender)
[3]:
country\time 2013 2014 2015 2016 2017
Belgium 11137974 11180840 11237274 11311117 11351727
France 65600350 66165980 66458153 66638391 66804121
Germany 80523746 80767463 81197537 82175684 82521653
Luxembourg 537039 549680 562958 576249 590667
Netherlands 16779575 16829289 16900726 16979120 17081507
or along several axes (axes are separated by commas ,):
[4]:
population.sum(country, gender)
[4]:
time 2013 2014 2015 2016 2017
174578684 175493252 176356648 177680561 178349675
Calculate the sum along all axes except one by appending _by to the aggregation function:
[5]:
population.sum_by(time)
[5]:
time 2013 2014 2015 2016 2017
174578684 175493252 176356648 177680561 178349675
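As a cross-check on the two results above, here is a minimal plain-Python sketch that does not use LArray at all. The names `population_rows` and `sum_by_time` are hypothetical, introduced only for this illustration. It re-adds the figures from the population table and suggests why summing over country and gender gives the same totals as keeping only the time axis.

```python
# Values copied from the 'population' table above, keyed by (country, gender).
# Each list holds the years 2013 to 2017, in order.
years = [2013, 2014, 2015, 2016, 2017]
population_rows = {
    ('Belgium', 'Male'): [5472856, 5493792, 5524068, 5569264, 5589272],
    ('Belgium', 'Female'): [5665118, 5687048, 5713206, 5741853, 5762455],
    ('France', 'Male'): [31772665, 32045129, 32174258, 32247386, 32318973],
    ('France', 'Female'): [33827685, 34120851, 34283895, 34391005, 34485148],
    ('Germany', 'Male'): [39380976, 39556923, 39835457, 40514123, 40697118],
    ('Germany', 'Female'): [41142770, 41210540, 41362080, 41661561, 41824535],
    ('Luxembourg', 'Male'): [268412, 275117, 281972, 289193, 296641],
    ('Luxembourg', 'Female'): [268627, 274563, 280986, 287056, 294026],
    ('Netherlands', 'Male'): [8307339, 8334385, 8372858, 8417135, 8475102],
    ('Netherlands', 'Female'): [8472236, 8494904, 8527868, 8561985, 8606405],
}


def sum_by_time(rows):
    """Hypothetical helper: add up every (country, gender) row, one total per year."""
    # zip(*...) turns the rows into per-year columns, which are then summed.
    return [sum(column) for column in zip(*rows.values())]


totals = sum_by_time(population_rows)
print(dict(zip(years, totals)))
```

Both formulations collapse the country and gender axes and leave time untouched, which is why outputs [4] and [5] are identical.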
Calculate the sum along groups (groups belonging to the same axis must be grouped inside parentheses ()):
[6]:
benelux = population.country['Belgium', 'Netherlands', 'Luxembourg'] >> 'benelux'
fr_de = population.country['France', 'Germany'] >> 'FR+DE'
population.sum((benelux, fr_de))
[6]:
country gender\time 2013 2014 2015 2016 2017
benelux Male 14048607 14103294 14178898 14275592 14361015
benelux Female 14405981 14456515 14522060 14590894 14662886
FR+DE Male 71153641 71602052 72009715 72761509 73016091
FR+DE Female 74970455 75331391 75645975 76052566 76309683
Mixing axes and groups in aggregations:
[7]:
population.sum(gender, (benelux, fr_de))
[7]:
country\time 2013 2014 2015 2016 2017
benelux 28454588 28559809 28700958 28866486 29023901
FR+DE 146124096 146933443 147655690 148814075 149325774
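To make the mixed aggregation above more concrete, the following plain-Python sketch reproduces the 2013 column by hand. It does not use LArray. The `population_2013` and `groups` dictionaries and the `sum_gender_and_groups` helper are hypothetical stand-ins for the array and the `benelux` and `fr_de` groups.

```python
# 2013 values copied from the 'population' table above: (Male, Female) per country.
population_2013 = {
    'Belgium': (5472856, 5665118),
    'France': (31772665, 33827685),
    'Germany': (39380976, 41142770),
    'Luxembourg': (268412, 268627),
    'Netherlands': (8307339, 8472236),
}

# Hypothetical stand-ins for the two country groups; the keys are the group names.
groups = {
    'benelux': ['Belgium', 'Netherlands', 'Luxembourg'],
    'FR+DE': ['France', 'Germany'],
}


def sum_gender_and_groups(data, country_groups):
    """Hypothetical helper: sum over gender first, then over each group's countries."""
    return {
        name: sum(sum(data[country]) for country in members)
        for name, members in country_groups.items()
    }


result = sum_gender_and_groups(population_2013, groups)
print(result)
```

The gender axis disappears entirely, while the country axis is reduced to one entry per group, matching the shape of output [7].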
Warning: Mixing slices and individual labels inside the [ ] will generate several groups (a tuple of groups) instead of a single group. If you want to create a single group using both slices and individual labels, you need to use the .union() method (see below).
[8]:
# mixing slices and individual labels leads to the creation of several groups (a tuple of groups)
except_2016 = time[:2015, 2017]
except_2016
[8]:
(time[:2015], time[2017])
[9]:
# leading to potentially unexpected results
population.sum(except_2016)
[9]:
country gender\time :2015 2017
Belgium Male 16490716 5589272
Belgium Female 17065372 5762455
France Male 95992052 32318973
France Female 102232431 34485148
Germany Male 118773356 40697118
Germany Female 123715390 41824535
Luxembourg Male 825501 296641
Luxembourg Female 824176 294026
Netherlands Male 25014582 8475102
Netherlands Female 25495008 8606405
[10]:
# the union() method combines slices and individual labels into a single group
except_2016 = time[:2015].union(time[2017])
except_2016
[10]:
time[2013, 2014, 2015, 2017].set()
[11]:
population.sum(except_2016)
[11]:
country\gender Male Female
Belgium 22079988 22827827
France 128311025 136717579
Germany 159470474 165539925
Luxembourg 1122142 1118202
Netherlands 33489684 34101413
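The difference between a tuple of groups and a single merged group can be sketched in plain Python, using only the Belgium / Male row. This is an illustration under simplifying assumptions, not LArray code. The names `two_groups`, `per_group`, `single_group` and `merged` are hypothetical.

```python
# Belgium / Male values copied from the 'population' table above.
belgium_male = {
    2013: 5472856,
    2014: 5493792,
    2015: 5524068,
    2016: 5569264,
    2017: 5589272,
}

# Two separate groups, mimicking time[:2015, 2017]: one aggregate per group.
two_groups = ([2013, 2014, 2015], [2017])
per_group = [sum(belgium_male[year] for year in group) for group in two_groups]

# One merged group, mimicking time[:2015].union(time[2017]): a single aggregate.
single_group = [year for group in two_groups for year in group]
merged = sum(belgium_male[year] for year in single_group)

print(per_group)
print(merged)
```

The two per-group sums correspond to the Belgium / Male row of output [9], and the merged sum to the Belgium / Male cell of output [11].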