Import the LArray library:
[1]:
from larray import *
[2]:
# load 'demography_eurostat' dataset
demography_eurostat = load_example_data('demography_eurostat')
# extract the 'population' array from the dataset
population = demography_eurostat.population
population
[2]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
Inspecting Array objects
Get the array summary: metadata, dimensions, description of axes, dtype, and size in memory
[3]:
# Array summary: metadata + dimensions + description of axes + dtype + size in memory
population.info
[3]:
title: Population on 1 January by age and sex
source: table demo_pjan from Eurostat
3 x 2 x 5
country [3]: 'Belgium' 'France' 'Germany'
gender [2]: 'Male' 'Female'
time [5]: 2013 2014 2015 2016 2017
dtype: int32
memory used: 120 bytes
Get axes
[4]:
population.axes
[4]:
AxisCollection([
Axis(['Belgium', 'France', 'Germany'], 'country'),
Axis(['Male', 'Female'], 'gender'),
Axis([2013, 2014, 2015, 2016, 2017], 'time')
])
Get axis names
[5]:
population.axes.names
[5]:
['country', 'gender', 'time']
Get number of dimensions
[6]:
population.ndim
[6]:
3
Get length of each dimension
[7]:
population.shape
[7]:
(3, 2, 5)
Get total number of elements of the array
[8]:
population.size
[8]:
30
Get type of internal data (int, float, …)
[9]:
population.dtype
[9]:
dtype('int32')
Get size in memory
[10]:
population.memory_used
[10]:
'120 bytes'
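The attributes above are related to one another: size is the product of the lengths in shape, and the memory footprint is size multiplied by the number of bytes per element. The following plain-Python check is only a sketch of that arithmetic; it does not call LArray, and the 4-byte element size is an assumption that follows from the int32 dtype shown above.

```python
import math

shape = (3, 2, 5)   # lengths of the country, gender and time axes
itemsize = 4        # assumed bytes per element for an int32 array

ndim = len(shape)                 # number of dimensions
size = math.prod(shape)           # total number of elements
memory_bytes = size * itemsize    # memory used by the data, in bytes

print(ndim, size, memory_bytes)
```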
Some Useful Functions
with_total
Add totals to one or several axes:
[11]:
population.with_total('gender', label='Total')
[11]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
Belgium Total 11137974 11180840 11237274 11311117 11351727
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
France Total 65600350 66165980 66458153 66638391 66804121
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
Germany Total 80523746 80767463 81197537 82175684 82521653
See with_total for more details and examples.
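To make the 'Total' rows concrete, here is a plain-Python sketch of the same arithmetic for Belgium, using values copied from the table above. It illustrates what is being summed and is not how LArray implements with_total; the variable names are chosen for this example only.

```python
# Belgium values copied from the population table above (2013 to 2017)
belgium = {
    'Male':   [5472856, 5493792, 5524068, 5569264, 5589272],
    'Female': [5665118, 5687048, 5713206, 5741853, 5762455],
}

# Append a 'Total' label holding the sum over the gender axis
belgium_with_total = dict(belgium)
belgium_with_total['Total'] = [
    male + female for male, female in zip(belgium['Male'], belgium['Female'])
]

print(belgium_with_total['Total'])
```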
where
The where function can be used to apply some computation depending on a condition:
[12]:
# where(condition, value if true, value if false)
where(population < population.mean('time'), -population, population)
[12]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male -5472856 -5493792 -5524068 5569264 5589272
Belgium Female -5665118 -5687048 -5713206 5741853 5762455
France Male -31772665 -32045129 32174258 32247386 32318973
France Female -33827685 -34120851 34283895 34391005 34485148
Germany Male -39380976 -39556923 -39835457 40514123 40697118
Germany Female -41142770 -41210540 -41362080 41661561 41824535
See where for more details and examples.
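The element-wise logic can be checked by hand on a single row. The sketch below applies the same condition to the Belgium/Male values in plain Python; it assumes nothing about LArray internals and only mirrors the "value if true, value if false" selection.

```python
# Belgium / Male values copied from the population table above
belgium_male = [5472856, 5493792, 5524068, 5569264, 5589272]

# Mean over the time axis for this row
mean_over_time = sum(belgium_male) / len(belgium_male)

# Negate values below the mean, keep the others unchanged
result = [-value if value < mean_over_time else value for value in belgium_male]

print(mean_over_time)
print(result)
```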
clip
Limit all values to a given range:
[13]:
# values below 10 million are set to 10 million
population.clip(minval=10**7)
[13]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 10000000 10000000 10000000 10000000 10000000
Belgium Female 10000000 10000000 10000000 10000000 10000000
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
[14]:
# values above 40 million are set to 40 million
population.clip(maxval=4*10**7)
[14]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40000000 40000000
Germany Female 40000000 40000000 40000000 40000000 40000000
[15]:
# values below 10 million are set to 10 million and
# values above 40 million are set to 40 million
population.clip(10**7, 4*10**7)
[15]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 10000000 10000000 10000000 10000000 10000000
Belgium Female 10000000 10000000 10000000 10000000 10000000
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40000000 40000000
Germany Female 40000000 40000000 40000000 40000000 40000000
[16]:
# Using vectors to define the lower and upper bounds
lower_bound = sequence(population.time, initial=5_500_000, inc=50_000)
upper_bound = sequence(population.time, initial=41_000_000, inc=100_000)
print(lower_bound, '\n')
print(upper_bound, '\n')
population.clip(lower_bound, upper_bound)
time 2013 2014 2015 2016 2017
5500000 5550000 5600000 5650000 5700000
time 2013 2014 2015 2016 2017
41000000 41100000 41200000 41300000 41400000
[16]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5500000 5550000 5600000 5650000 5700000
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41000000 41100000 41200000 41300000 41400000
See clip for more details and examples.
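When the bounds are vectors, each year is clipped against its own lower and upper limit. The following plain-Python sketch reproduces the two rows that change in the last example; clip_sketch is a hypothetical helper written for this illustration, not part of LArray.

```python
def clip_sketch(values, lower, upper):
    """Clip each value to its own [lower, upper] interval (illustrative helper)."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(values, lower, upper)]

# Per-year bounds, built the same way as the sequences above
lower_bound = [5_500_000 + 50_000 * i for i in range(5)]
upper_bound = [41_000_000 + 100_000 * i for i in range(5)]

# Rows copied from the population table above
belgium_male = [5472856, 5493792, 5524068, 5569264, 5589272]
germany_female = [41142770, 41210540, 41362080, 41661561, 41824535]

clipped_belgium_male = clip_sketch(belgium_male, lower_bound, upper_bound)
clipped_germany_female = clip_sketch(germany_female, lower_bound, upper_bound)

print(clipped_belgium_male)
print(clipped_germany_female)
```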
divnot0
Divide two arrays, returning 0 wherever the divisor is 0:
[17]:
divisor = ones(population.axes, dtype=int)
divisor['Male'] = 0
divisor
[17]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 0 0 0 0 0
Belgium Female 1 1 1 1 1
France Male 0 0 0 0 0
France Female 1 1 1 1 1
Germany Male 0 0 0 0 0
Germany Female 1 1 1 1 1
[18]:
population / divisor
/tmp/ipykernel_2813/1661386825.py:1: RuntimeWarning: divide by zero encountered during operation
population / divisor
[18]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male inf inf inf inf inf
Belgium Female 5665118.0 5687048.0 5713206.0 5741853.0 5762455.0
France Male inf inf inf inf inf
France Female 33827685.0 34120851.0 34283895.0 34391005.0 34485148.0
Germany Male inf inf inf inf inf
Germany Female 41142770.0 41210540.0 41362080.0 41661561.0 41824535.0
[19]:
# we use astype(int) since the divnot0 method
# returns a float array in this case while
# we want an integer array
population.divnot0(divisor).astype(int)
[19]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 0 0 0 0 0
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 0 0 0 0 0
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 0 0 0 0 0
Germany Female 41142770 41210540 41362080 41661561 41824535
See divnot0 for more details and examples.
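The rule applied by divnot0 can be summarised as "divide normally, but return 0 where the divisor is 0". A minimal plain-Python sketch of that rule follows; divnot0_sketch is a hypothetical stand-in written for this example and does not reflect LArray's actual implementation.

```python
def divnot0_sketch(numerators, divisors):
    """Element-wise division that yields 0.0 where the divisor is 0 (illustrative)."""
    return [n / d if d != 0 else 0.0 for n, d in zip(numerators, divisors)]

# Belgium 2013 values for Male and Female, with divisors 0 and 1 as above
values = [5472856, 5665118]
divisors = [0, 1]

as_float = divnot0_sketch(values, divisors)   # true division produces floats
as_int = [int(x) for x in as_float]           # cast back, like astype(int)

print(as_float)
print(as_int)
```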
ratio
The ratio (rationot0) method returns an array with all values divided by the sum of values along given axes:
[20]:
population.ratio('gender')
# which is equivalent to
population / population.sum('gender')
[20]:
country gender\time 2013 ... 2017
Belgium Male 0.491369076638175 ... 0.4923719536243252
Belgium Female 0.508630923361825 ... 0.5076280463756748
France Male 0.48433682137366646 ... 0.48378711546852027
France Female 0.5156631786263336 ... 0.5162128845314797
Germany Male 0.4890604071002857 ... 0.4931689625751922
Germany Female 0.5109395928997144 ... 0.5068310374248077
percent
[21]:
# or, if you want the previous ratios as percentages
population.percent('gender')
[21]:
country gender\time 2013 ... 2017
Belgium Male 49.136907663817496 ... 49.237195362432516
Belgium Female 50.863092336182504 ... 50.762804637567484
France Male 48.433682137366645 ... 48.378711546852024
France Female 51.566317862633355 ... 51.621288453147976
Germany Male 48.90604071002857 ... 49.316896257519225
Germany Female 51.09395928997143 ... 50.683103742480775
See percent for more details and examples.
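As a worked check of the two tables above, the Belgium 2013 shares can be recomputed by hand: each value is divided by the sum over the gender axis, and the percentage is that ratio multiplied by 100. This is a plain-Python sketch of the arithmetic only, with variable names chosen for the example.

```python
# Belgium 2013 values copied from the population table above
male_2013, female_2013 = 5472856, 5665118
total_2013 = male_2013 + female_2013

# Ratio: value divided by the sum along the gender axis
ratios = {'Male': male_2013 / total_2013, 'Female': female_2013 / total_2013}

# Percent: the same ratio expressed on a 0-100 scale
percents = {label: 100 * ratio for label, ratio in ratios.items()}

print(ratios)
print(percents)
```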
diff
The diff method calculates the n-th order discrete difference along a given axis.
The first order difference is given by out[n+1] = in[n+1] - in[n] along the given axis.
[22]:
# calculates 'diff[year+1] = population[year+1] - population[year]'
population.diff('time')
[22]:
country gender\time 2014 2015 2016 2017
Belgium Male 20936 30276 45196 20008
Belgium Female 21930 26158 28647 20602
France Male 272464 129129 73128 71587
France Female 293166 163044 107110 94143
Germany Male 175947 278534 678666 182995
Germany Female 67770 151540 299481 162974
[23]:
# calculates 'diff[year+2] = population[year+2] - population[year]'
population.diff('time', d=2)
[23]:
country gender\time 2015 2016 2017
Belgium Male 51212 75472 65204
Belgium Female 48088 54805 49249
France Male 401593 202257 144715
France Female 456210 270154 201253
Germany Male 454481 957200 861661
Germany Female 219310 451021 462455
[24]:
# calculates 'diff[year] = population[year+1] - population[year]'
population.diff('time', label='lower')
[24]:
country gender\time 2013 2014 2015 2016
Belgium Male 20936 30276 45196 20008
Belgium Female 21930 26158 28647 20602
France Male 272464 129129 73128 71587
France Female 293166 163044 107110 94143
Germany Male 175947 278534 678666 182995
Germany Female 67770 151540 299481 162974
See diff for more details and examples.
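The effect of the d and label arguments can be reproduced on one row. The sketch below is a plain-Python approximation for a first-order difference only; diff_sketch is a hypothetical helper, and its handling of label (keeping the later labels for 'upper', the earlier ones for 'lower') is inferred from the outputs shown above.

```python
def diff_sketch(values, labels, d=1, label='upper'):
    """First-order difference with period d, keyed by the kept labels (illustrative)."""
    deltas = [values[i + d] - values[i] for i in range(len(values) - d)]
    kept = labels[d:] if label == 'upper' else labels[:-d]
    return dict(zip(kept, deltas))

years = [2013, 2014, 2015, 2016, 2017]
belgium_male = [5472856, 5493792, 5524068, 5569264, 5589272]

diff_1 = diff_sketch(belgium_male, years)                     # like diff('time')
diff_2 = diff_sketch(belgium_male, years, d=2)                # like diff('time', d=2)
diff_lower = diff_sketch(belgium_male, years, label='lower')  # like label='lower'

print(diff_1)
print(diff_2)
print(diff_lower)
```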
growth_rate
The growth_rate method calculates the growth along a given axis.
It is roughly equivalent to a.diff(axis, d, label) / a[axis.i[:-d]]:
[25]:
population.growth_rate('time')
[25]:
country gender\time 2014 ... 2017
Belgium Male 0.0038254249700704714 ... 0.003592575248722273
Belgium Female 0.0038710579373633525 ... 0.0035880403068486778
France Male 0.008575421671427311 ... 0.0022199318729276226
France Female 0.008666451753940596 ... 0.002737430906715288
Germany Male 0.004467817151103619 ... 0.00451681997411125
Germany Female 0.001647190988842025 ... 0.0039118553431063225
See growth_rate for more details and examples.
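The "difference divided by the earlier value" relationship can be verified on the Belgium/Male row. This is a plain-Python sketch under the assumption that results are labelled with the later year, as in the output above; growth_rate_sketch is a hypothetical helper, not LArray code.

```python
def growth_rate_sketch(values, labels, d=1):
    """(values[i + d] - values[i]) / values[i], keyed by the later label (illustrative)."""
    return {
        labels[i + d]: (values[i + d] - values[i]) / values[i]
        for i in range(len(values) - d)
    }

years = [2013, 2014, 2015, 2016, 2017]
belgium_male = [5472856, 5493792, 5524068, 5569264, 5589272]

rates = growth_rate_sketch(belgium_male, years)

print(rates[2014])   # growth between 2013 and 2014
print(rates[2017])   # growth between 2016 and 2017
```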
shift
The shift method drops the first label of an axis and shifts the data one position toward later labels, so each remaining label holds the value of the label before it:
[26]:
population.shift('time')
[26]:
country gender\time 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264
Belgium Female 5665118 5687048 5713206 5741853
France Male 31772665 32045129 32174258 32247386
France Female 33827685 34120851 34283895 34391005
Germany Male 39380976 39556923 39835457 40514123
Germany Female 41142770 41210540 41362080 41661561
[27]:
# when shift is applied on an (increasing) time axis,
# it effectively brings "past" data into the future
population_shifted = population.shift('time')
stack({'population_shifted_2014': population_shifted[2014], 'population_2013': population[2013]}, 'array')
[27]:
country gender\array population_shifted_2014 population_2013
Belgium Male 5472856 5472856
Belgium Female 5665118 5665118
France Male 31772665 31772665
France Female 33827685 33827685
Germany Male 39380976 39380976
Germany Female 41142770 41142770
See shift for more details and examples.
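The relabelling performed by shift can be pictured with a dictionary keyed by year. The following plain-Python sketch handles a shift of one position on a single row; it is an illustration of the result shown above, not of how LArray stores or moves data.

```python
years = [2013, 2014, 2015, 2016, 2017]
belgium_male = dict(zip(years, [5472856, 5493792, 5524068, 5569264, 5589272]))

# Each label from the second onward receives the value of the previous label;
# the first label (2013) disappears and the last value (2017) is dropped.
shifted = {years[i + 1]: belgium_male[years[i]] for i in range(len(years) - 1)}

print(shifted)
```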
Other interesting functions
Many more useful functions are documented in the API reference, in the sections Aggregation Functions, Miscellaneous, and Utility Functions.