
Import the LArray library:

[1]:
from larray import *
[2]:
# load 'demography_eurostat' dataset
demography_eurostat = load_example_data('demography_eurostat')

# extract the 'population' array from the dataset
population = demography_eurostat.population
population
[2]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40514123  40697118
Germany       Female  41142770  41210540  41362080  41661561  41824535

Inspecting Array objects

Get array summary: metadata + dimensions + description of axes + dtype + size in memory

[3]:
# Array summary: metadata + dimensions + description of axes
population.info
[3]:
title: Population on 1 January by age and sex
source: table demo_pjan from Eurostat
3 x 2 x 5
 country [3]: 'Belgium' 'France' 'Germany'
 gender [2]: 'Male' 'Female'
 time [5]: 2013 2014 2015 2016 2017
dtype: int32
memory used: 120 bytes

Get axes

[4]:
population.axes
[4]:
AxisCollection([
    Axis(['Belgium', 'France', 'Germany'], 'country'),
    Axis(['Male', 'Female'], 'gender'),
    Axis([2013, 2014, 2015, 2016, 2017], 'time')
])

Get axis names

[5]:
population.axes.names
[5]:
['country', 'gender', 'time']

Get number of dimensions

[6]:
population.ndim
[6]:
3

Get length of each dimension

[7]:
population.shape
[7]:
(3, 2, 5)

Get total number of elements of the array

[8]:
population.size
[8]:
30

Get type of internal data (int, float, …)

[9]:
population.dtype
[9]:
dtype('int32')

Get size in memory

[10]:
population.memory_used
[10]:
'120 bytes'
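These attributes follow from simple arithmetic. ndim is the number of axes, and size is the product of the lengths in shape. The 120 bytes reported above are consistent with 30 elements of 4 bytes each (int32). Here is a plain-Python check, assuming memory_used counts only the data buffer:

```python
from math import prod

# Axis lengths as reported by population.shape above
shape = (3, 2, 5)

ndim = len(shape)        # number of axes, like population.ndim
size = prod(shape)       # total number of elements, like population.size
itemsize = 4             # bytes per int32 element (assumption: data buffer only)
memory = size * itemsize

print(ndim, size, f"{memory} bytes")
```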

Some Useful Functions

with_total

Add totals to one or several axes:

[11]:
population.with_total('gender', label='Total')
[11]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
Belgium        Total  11137974  11180840  11237274  11311117  11351727
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
 France        Total  65600350  66165980  66458153  66638391  66804121
Germany         Male  39380976  39556923  39835457  40514123  40697118
Germany       Female  41142770  41210540  41362080  41661561  41824535
Germany        Total  80523746  80767463  81197537  82175684  82521653

See with_total for more details and examples.
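Each Total row is the sum over the gender axis for a given country and year. Here is a plain-Python sketch using the Belgium rows shown above. The dictionary layout is only for illustration:

```python
# Belgium rows of the population array, years 2013 to 2017
belgium = {
    "Male": [5472856, 5493792, 5524068, 5569264, 5589272],
    "Female": [5665118, 5687048, 5713206, 5741853, 5762455],
}

# The 'Total' label holds, for each year, the sum over all gender labels
# (the right-hand side is evaluated before 'Total' is added to the dict)
belgium["Total"] = [sum(year_values) for year_values in zip(*belgium.values())]

print(belgium["Total"])
```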

where

The where function selects, element by element, between two values (or arrays) depending on a condition:

[12]:
# where(condition, value if true, value if false)
where(population < population.mean('time'), -population, population)
[12]:
country  gender\time       2013       2014       2015      2016      2017
Belgium         Male   -5472856   -5493792   -5524068   5569264   5589272
Belgium       Female   -5665118   -5687048   -5713206   5741853   5762455
 France         Male  -31772665  -32045129   32174258  32247386  32318973
 France       Female  -33827685  -34120851   34283895  34391005  34485148
Germany         Male  -39380976  -39556923  -39835457  40514123  40697118
Germany       Female  -41142770  -41210540  -41362080  41661561  41824535

See where for more details and examples.
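The where call above works element by element. Each value is compared with the mean over time for its country and gender, and the negated value is kept wherever the condition holds. Here is a plain-Python sketch of that logic for the Belgium/Male row. The helper `where_row` is hypothetical and not part of LArray:

```python
def where_row(condition, if_true, if_false):
    """Pick if_true[i] where condition[i] is True, else if_false[i]."""
    return [t if c else f for c, t, f in zip(condition, if_true, if_false)]

# Belgium / Male values for 2013 to 2017
row = [5472856, 5493792, 5524068, 5569264, 5589272]
mean = sum(row) / len(row)  # counterpart of population.mean('time') for this row

result = where_row([v < mean for v in row], [-v for v in row], row)
print(result)
```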

clip

Limit all values to a given range (values outside it are replaced by the nearest bound):

[13]:
# values below 10 million are set to 10 million
population.clip(minval=10**7)
[13]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male  10000000  10000000  10000000  10000000  10000000
Belgium       Female  10000000  10000000  10000000  10000000  10000000
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40514123  40697118
Germany       Female  41142770  41210540  41362080  41661561  41824535
[14]:
# values above 40 million are set to 40 million
population.clip(maxval=4*10**7)
[14]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40000000  40000000
Germany       Female  40000000  40000000  40000000  40000000  40000000
[15]:
# values below 10 million are set to 10 million and
# values above 40 million are set to 40 million
population.clip(10**7, 4*10**7)
[15]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male  10000000  10000000  10000000  10000000  10000000
Belgium       Female  10000000  10000000  10000000  10000000  10000000
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40000000  40000000
Germany       Female  40000000  40000000  40000000  40000000  40000000
[16]:
# Using vectors to define the lower and upper bounds
lower_bound = sequence(population.time, initial=5_500_000, inc=50_000)
upper_bound = sequence(population.time, initial=41_000_000, inc=100_000)

print(lower_bound, '\n')
print(upper_bound, '\n')

population.clip(lower_bound, upper_bound)
time     2013     2014     2015     2016     2017
      5500000  5550000  5600000  5650000  5700000

time      2013      2014      2015      2016      2017
      41000000  41100000  41200000  41300000  41400000

[16]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5500000   5550000   5600000   5650000   5700000
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40514123  40697118
Germany       Female  41000000  41100000  41200000  41300000  41400000

See clip for more details and examples.
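Clipping is an element-wise clamp. A value below the lower bound is raised to that bound, and a value above the upper bound is lowered to it. Here is a plain-Python sketch of the vector-bound example for two rows. The helper `clip_value` is hypothetical. The bound lists assume that sequence adds inc once per label, which matches the printed bounds above:

```python
def clip_value(value, lower=None, upper=None):
    """Clamp value to [lower, upper]; a None bound is ignored."""
    if lower is not None:
        value = max(value, lower)
    if upper is not None:
        value = min(value, upper)
    return value

years = [2013, 2014, 2015, 2016, 2017]
# Same bounds as lower_bound and upper_bound above: initial + i * inc
lower_bound = [5_500_000 + i * 50_000 for i in range(len(years))]
upper_bound = [41_000_000 + i * 100_000 for i in range(len(years))]

belgium_male = [5472856, 5493792, 5524068, 5569264, 5589272]
germany_female = [41142770, 41210540, 41362080, 41661561, 41824535]

clipped_be = [clip_value(v, lo, hi) for v, lo, hi in zip(belgium_male, lower_bound, upper_bound)]
clipped_de = [clip_value(v, lo, hi) for v, lo, hi in zip(germany_female, lower_bound, upper_bound)]

print(clipped_be)
print(clipped_de)
```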

divnot0

Divide one array by another, returning 0 wherever the divisor is 0:

[17]:
divisor = ones(population.axes, dtype=int)
divisor['Male'] = 0
divisor
[17]:
country  gender\time  2013  2014  2015  2016  2017
Belgium         Male     0     0     0     0     0
Belgium       Female     1     1     1     1     1
 France         Male     0     0     0     0     0
 France       Female     1     1     1     1     1
Germany         Male     0     0     0     0     0
Germany       Female     1     1     1     1     1
[18]:
population / divisor
/tmp/ipykernel_2829/1661386825.py:1: RuntimeWarning: divide by zero encountered during operation
  population / divisor
[18]:
country  gender\time        2013        2014        2015        2016        2017
Belgium         Male         inf         inf         inf         inf         inf
Belgium       Female   5665118.0   5687048.0   5713206.0   5741853.0   5762455.0
 France         Male         inf         inf         inf         inf         inf
 France       Female  33827685.0  34120851.0  34283895.0  34391005.0  34485148.0
Germany         Male         inf         inf         inf         inf         inf
Germany       Female  41142770.0  41210540.0  41362080.0  41661561.0  41824535.0
[19]:
# we use astype(int) since the divnot0 method
# returns a float array in this case while
# we want an integer array
population.divnot0(divisor).astype(int)
[19]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male         0         0         0         0         0
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male         0         0         0         0         0
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male         0         0         0         0         0
Germany       Female  41142770  41210540  41362080  41661561  41824535

See divnot0 for more details and examples.
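Element-wise, divnot0 behaves like an ordinary division, except that a zero divisor yields 0 instead of inf. Here is a plain-Python sketch of that rule for Belgium in 2013. The helper `divnot0_value` is hypothetical:

```python
def divnot0_value(numerator, denominator):
    """Return numerator / denominator, or 0 when denominator is 0."""
    return numerator / denominator if denominator != 0 else 0

# Belgium, 2013: (Male, Female) values and the divisor used above
values = [5472856, 5665118]
divisor = [0, 1]

divided = [divnot0_value(n, d) for n, d in zip(values, divisor)]
print(divided)
```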

ratio

The ratio method returns an array with all values divided by the sum of values along the given axes (rationot0 does the same but returns 0 where that sum is 0):

[20]:
population.ratio('gender')

# which is equivalent to
population / population.sum('gender')
[20]:
country  gender\time                 2013  ...                 2017
Belgium         Male    0.491369076638175  ...   0.4923719536243252
Belgium       Female    0.508630923361825  ...   0.5076280463756748
 France         Male  0.48433682137366646  ...  0.48378711546852027
 France       Female   0.5156631786263336  ...   0.5162128845314797
Germany         Male   0.4890604071002857  ...   0.4931689625751922
Germany       Female   0.5109395928997144  ...   0.5068310374248077

See ratio and rationot0 for more details and examples.
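Each ratio is a value divided by the sum over the given axis, so the shares along that axis add up to 1. The percent method expresses the same quantity multiplied by 100. Here is a plain-Python check for Belgium in 2013, with a rationot0-style guard that returns zeros when the sum is 0. The helper `ratio_values` is hypothetical:

```python
def ratio_values(values, not0=False):
    """Divide each value by the total; with not0=True, return zeros if the total is 0."""
    total = sum(values)
    if not0 and total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]

# Belgium, 2013: Male and Female populations
shares = ratio_values([5472856, 5665118])
percents = [100 * s for s in shares]

print(shares)
print(percents)
print(ratio_values([0, 0], not0=True))
```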

percent

[21]:
# or, to get the previous ratios as percentages
population.percent('gender')
[21]:
country  gender\time                2013  ...                2017
Belgium         Male  49.136907663817496  ...  49.237195362432516
Belgium       Female  50.863092336182504  ...  50.762804637567484
 France         Male  48.433682137366645  ...  48.378711546852024
 France       Female  51.566317862633355  ...  51.621288453147976
Germany         Male   48.90604071002857  ...  49.316896257519225
Germany       Female   51.09395928997143  ...  50.683103742480775

See percent for more details and examples.

diff

The diff method calculates the n-th order discrete difference along a given axis.

The first order difference is given by out[n+1] = in[n+1] - in[n] along the given axis.

[22]:
# calculates 'diff[year+1] = population[year+1] - population[year]'
population.diff('time')
[22]:
country  gender\time    2014    2015    2016    2017
Belgium         Male   20936   30276   45196   20008
Belgium       Female   21930   26158   28647   20602
 France         Male  272464  129129   73128   71587
 France       Female  293166  163044  107110   94143
Germany         Male  175947  278534  678666  182995
Germany       Female   67770  151540  299481  162974
[23]:
# calculates 'diff[year+2] = population[year+2] - population[year]'
population.diff('time', d=2)
[23]:
country  gender\time    2015    2016    2017
Belgium         Male   51212   75472   65204
Belgium       Female   48088   54805   49249
 France         Male  401593  202257  144715
 France       Female  456210  270154  201253
Germany         Male  454481  957200  861661
Germany       Female  219310  451021  462455
[24]:
# calculates 'diff[year] = population[year+1] - population[year]'
population.diff('time', label='lower')
[24]:
country  gender\time    2013    2014    2015    2016
Belgium         Male   20936   30276   45196   20008
Belgium       Female   21930   26158   28647   20602
 France         Male  272464  129129   73128   71587
 France       Female  293166  163044  107110   94143
Germany         Male  175947  278534  678666  182995
Germany       Female   67770  151540  299481  162974

See diff for more details and examples.
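The sketch below models the behaviour shown above on a single row. Here, d is the distance between the two labels being subtracted, and label chooses which of the two labels the result is attached to. The order of the difference is modelled by repeating the operation n times. The helper `diff` and its argument layout belong to this sketch, not to LArray:

```python
def diff(values, labels, d=1, n=1, label="upper"):
    """Compute values[i + d] - values[i], repeated n times.

    label="upper" keeps the later label of each pair, "lower" the earlier one.
    """
    for _ in range(n):
        values = [values[i + d] - values[i] for i in range(len(values) - d)]
        labels = labels[d:] if label == "upper" else labels[:-d]
    return dict(zip(labels, values))

years = [2013, 2014, 2015, 2016, 2017]
belgium_male = [5472856, 5493792, 5524068, 5569264, 5589272]

print(diff(belgium_male, years))                 # like population.diff('time')
print(diff(belgium_male, years, d=2))            # like population.diff('time', d=2)
print(diff(belgium_male, years, label="lower"))  # like population.diff('time', label='lower')
print(diff(belgium_male, years, n=2))            # second-order difference
```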

growth_rate

The growth_rate method calculates the growth rate along a given axis.

It is roughly equivalent to a.diff(axis, d, label) / a[axis.i[:-d]]:

[25]:
population.growth_rate('time')
[25]:
country  gender\time                   2014  ...                   2017
Belgium         Male  0.0038254249700704714  ...   0.003592575248722273
Belgium       Female  0.0038710579373633525  ...  0.0035880403068486778
 France         Male   0.008575421671427311  ...  0.0022199318729276226
 France       Female   0.008666451753940596  ...   0.002737430906715288
Germany         Male   0.004467817151103619  ...    0.00451681997411125
Germany       Female   0.001647190988842025  ...  0.0039118553431063225

See growth_rate for more details and examples.
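Following the "roughly equivalent" formula above, each growth rate is the difference between two labels divided by the earlier value. Here is a plain-Python sketch for the Belgium/Male row. The helper `growth_rate` is hypothetical:

```python
def growth_rate(values, labels, d=1):
    """(values[i + d] - values[i]) / values[i], attached to the later label."""
    return {
        labels[i + d]: (values[i + d] - values[i]) / values[i]
        for i in range(len(values) - d)
    }

years = [2013, 2014, 2015, 2016, 2017]
belgium_male = [5472856, 5493792, 5524068, 5569264, 5589272]

rates = growth_rate(belgium_male, years)
print(rates)
```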

shift

The shift method drops the first label of an axis and moves the data one position along it, so that each remaining label holds the value of the previous one:

[26]:
population.shift('time')
[26]:
country  gender\time      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264
Belgium       Female   5665118   5687048   5713206   5741853
 France         Male  31772665  32045129  32174258  32247386
 France       Female  33827685  34120851  34283895  34391005
Germany         Male  39380976  39556923  39835457  40514123
Germany       Female  41142770  41210540  41362080  41661561
[27]:
# when shift is applied on an (increasing) time axis,
# it effectively brings "past" data into the future
population_shifted = population.shift('time')
stack({'population_shifted_2014': population_shifted[2014], 'population_2013': population[2013]}, 'array')
[27]:
country  gender\array  population_shifted_2014  population_2013
Belgium          Male                  5472856          5472856
Belgium        Female                  5665118          5665118
 France          Male                 31772665         31772665
 France        Female                 33827685         33827685
Germany          Male                 39380976         39380976
Germany        Female                 41142770         41142770

See shift for more details and examples.
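In terms of plain lists, shift('time') pairs each value with the label that follows it and drops the first label. Here is a sketch for the Belgium/Male row. The helper `shift` and its n parameter are illustrative and are not LArray's implementation:

```python
def shift(values, labels, n=1):
    """Drop the first n labels and move each value n positions forward (n >= 1)."""
    return dict(zip(labels[n:], values[:-n]))

years = [2013, 2014, 2015, 2016, 2017]
belgium_male = [5472856, 5493792, 5524068, 5569264, 5589272]

shifted = shift(belgium_male, years)
print(shifted)
```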

Other interesting functions

Many more functions are available. See the Aggregation Functions, Miscellaneous and Utility Functions sections of the API reference.