
Arithmetic Operations

Import the LArray library:

[1]:
from larray import *

Load the population array from the demography_eurostat dataset:

[2]:
# load the 'demography_eurostat' dataset
demography_eurostat = load_example_data('demography_eurostat')

# extract the 'country', 'gender' and 'time' axes
country = demography_eurostat.country
gender = demography_eurostat.gender
time = demography_eurostat.time

# extract the 'population' array
population = demography_eurostat.population

# show the 'population' array
population
[2]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40514123  40697118
Germany       Female  41142770  41210540  41362080  41661561  41824535

Basics

All the usual arithmetic operations can be applied to an array; each operation is applied to every element individually:

[3]:
# 'true' division
population_in_millions = population / 1_000_000
population_in_millions
[3]:
country  gender\time       2013       2014       2015       2016       2017
Belgium         Male   5.472856   5.493792   5.524068   5.569264   5.589272
Belgium       Female   5.665118   5.687048   5.713206   5.741853   5.762455
 France         Male  31.772665  32.045129  32.174258  32.247386  32.318973
 France       Female  33.827685  34.120851  34.283895  34.391005  34.485148
Germany         Male  39.380976  39.556923  39.835457  40.514123  40.697118
Germany       Female   41.14277   41.21054   41.36208  41.661561  41.824535
[4]:
# 'floor' division
population_in_millions = population // 1_000_000
population_in_millions
[4]:
country  gender\time  2013  2014  2015  2016  2017
Belgium         Male     5     5     5     5     5
Belgium       Female     5     5     5     5     5
 France         Male    31    32    32    32    32
 France       Female    33    34    34    34    34
Germany         Male    39    39    39    40    40
Germany       Female    41    41    41    41    41

Warning: Python has two different division operators:

  • the ‘true’ division (/) always returns a float.

  • the ‘floor’ division (//) rounds the quotient down to a whole number, discarding any fractional part (the result is an integer when both operands are integers).
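
The difference between the two operators is easy to check on plain Python numbers. The short sketch below uses built-in scalars only (no LArray array is involved) and also shows two details that are easy to overlook: floor division rounds toward negative infinity, and it returns a float as soon as one operand is a float.

```python
# Plain Python scalars, no LArray involved
true_div = 7 / 2          # 3.5
exact_div = 6 / 3         # 2.0, still a float
floor_div = 7 // 2        # 3
negative_floor = -7 // 2  # -4: rounds toward negative infinity
float_floor = 7.0 // 2    # 3.0: a float operand gives a float result

print(true_div, exact_div, floor_div, negative_floor, float_floor)
```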

[5]:
# % means modulo (aka remainder of division)
population % 1_000_000
[5]:
country  gender\time    2013    2014    2015    2016    2017
Belgium         Male  472856  493792  524068  569264  589272
Belgium       Female  665118  687048  713206  741853  762455
 France         Male  772665   45129  174258  247386  318973
 France       Female  827685  120851  283895  391005  485148
Germany         Male  380976  556923  835457  514123  697118
Germany       Female  142770  210540  362080  661561  824535
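
Floor division and modulo are two halves of the same computation. As a minimal illustration on a single value from the table above (a plain Python integer, not an array), the built-in divmod returns both results at once, and they recombine into the original number:

```python
# Population of Belgian men in 2013, taken from the table above
belgium_male_2013 = 5472856

# divmod(a, b) returns (a // b, a % b)
millions, remainder = divmod(belgium_male_2013, 1_000_000)
print(millions, remainder)

# quotient and remainder always recombine into the original value
reconstructed = millions * 1_000_000 + remainder
```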
[6]:
# ** means raising to the power
print(ndtest(4))
ndtest(4) ** 3
a  a0  a1  a2  a3
    0   1   2   3
[6]:
a  a0  a1  a2  a3
    0   1   8  27

More interestingly, binary operators such as those above also work between two arrays.

Let us imagine a rate of population growth which is constant over time but different by gender and country:

[7]:
growth_rate = Array(data=[[1.011, 1.010], [1.013, 1.011], [1.010, 1.009]], axes=[country, gender])
growth_rate
[7]:
country\gender   Male  Female
       Belgium  1.011    1.01
        France  1.013   1.011
       Germany   1.01   1.009
[8]:
# we store the population of the year 2017 in a new variable
population_2017 = population[2017]
population_2017
[8]:
country\gender      Male    Female
       Belgium   5589272   5762455
        France  32318973  34485148
       Germany  40697118  41824535
[9]:
# perform an arithmetic operation between two arrays
population_2018 = population_2017 * growth_rate
population_2018
[9]:
country\gender                Male        Female
       Belgium         5650753.992    5820079.55
        France  32739119.648999996  34864484.628
       Germany         41104089.18  42200955.815

Note: Be careful when mixing different data types. You can use the method astype to change the data type of an array.

[10]:
# convert the result to an integer array (the fractional part is truncated)
population_2018 = (population_2017 * growth_rate).astype(int)
population_2018
[10]:
country\gender      Male    Female
       Belgium   5650753   5820079
        France  32739119  34864484
       Germany  41104089  42200955
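
As the output above shows, converting to int drops the fractional part instead of rounding (5650753.992 becomes 5650753). The sketch below reproduces this on two of the values with Python's built-in int() and contrasts it with round(); it only illustrates the difference between truncation and rounding on plain floats and does not use LArray.

```python
# Two values taken from the float result shown earlier
belgium_male = 5650753.992
france_male = 32739119.648999996

# int() truncates toward zero
truncated = [int(belgium_male), int(france_male)]

# round() returns the nearest integer
rounded = [round(belgium_male), round(france_male)]

print(truncated, rounded)
```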

Axis order does not matter much (except for output)

You can perform operations between arrays whose axes are in a different order. The axes of the result follow the order of the left-hand array.

[11]:
# let's change the order of axes of the 'growth_rate' array
transposed_growth_rate = growth_rate.transpose()

# look at the order of the new 'transposed_growth_rate' array:
# 'gender' is the first axis while 'country' is the second
transposed_growth_rate
[11]:
gender\country  Belgium  France  Germany
          Male    1.011   1.013     1.01
        Female     1.01   1.011    1.009
[12]:
# look at the order of the 'population_2017' array:
# 'country' is the first axis while 'gender' is the second
population_2017
[12]:
country\gender      Male    Female
       Belgium   5589272   5762455
        France  32318973  34485148
       Germany  40697118  41824535
[13]:
# LArray does not care about axis order when performing
# arithmetic operations between arrays
population_2018 = population_2017 * transposed_growth_rate
population_2018
[13]:
country\gender                Male        Female
       Belgium         5650753.992    5820079.55
        France  32739119.648999996  34864484.628
       Germany         41104089.18  42200955.815

Axes must be compatible

Arithmetic operations between two arrays only work when they have compatible axes (i.e. the same list of labels in the same order).

[14]:
# show 'population_2017'
population_2017
[14]:
country\gender      Male    Female
       Belgium   5589272   5762455
        France  32318973  34485148
       Germany  40697118  41824535

Order of labels matters

[15]:
# let us imagine that the labels of the 'country' axis
# of the 'growth_rate' array are in a different order
# than in the 'population_2017' array
reordered_growth_rate = growth_rate.reindex('country', ['Germany', 'Belgium', 'France'])
reordered_growth_rate
[15]:
country\gender   Male  Female
       Germany   1.01   1.009
       Belgium  1.011    1.01
        France  1.013   1.011
[16]:
# when doing arithmetic operations,
# the order of labels counts
try:
    population_2018 = population_2017 * reordered_growth_rate
except Exception as e:
    print(type(e).__name__, e)
ValueError incompatible axes:
Axis(['Germany', 'Belgium', 'France'], 'country')
vs
Axis(['Belgium', 'France', 'Germany'], 'country')

No extra or missing labels are permitted

[17]:
# let us imagine that the 'country' axis of
# the 'growth_rate' array has an extra
# label 'Netherlands' compared to the same axis of
# the 'population_2017' array
growth_rate_netherlands = Array([1.012, 1.], population.gender)
growth_rate_extra_country = growth_rate.append('country', growth_rate_netherlands, label='Netherlands')
growth_rate_extra_country
[17]:
country\gender   Male  Female
       Belgium  1.011    1.01
        France  1.013   1.011
       Germany   1.01   1.009
   Netherlands  1.012     1.0
[18]:
# when doing arithmetic operations,
# no extra or missing labels are permitted
try:
    population_2018 = population_2017 * growth_rate_extra_country
except Exception as e:
    print(type(e).__name__, e)
ValueError incompatible axes:
Axis(['Belgium', 'France', 'Germany', 'Netherlands'], 'country')
vs
Axis(['Belgium', 'France', 'Germany'], 'country')
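
The rule behind the two errors above can be pictured in a few lines of plain Python. This is only a sketch of the idea, not LArray's implementation: `check_compatible` and `is_compatible` are hypothetical helpers that compare two label lists and reject any difference in order or content.

```python
def check_compatible(left_labels, right_labels, axis_name):
    """Hypothetical helper: raise if two label lists are not identical."""
    if list(left_labels) != list(right_labels):
        raise ValueError(
            f"incompatible axes for {axis_name!r}: {left_labels} vs {right_labels}"
        )

countries = ['Belgium', 'France', 'Germany']

def is_compatible(other_labels):
    """Return True if other_labels matches 'countries' exactly."""
    try:
        check_compatible(countries, other_labels, 'country')
        return True
    except ValueError:
        return False

same = is_compatible(['Belgium', 'France', 'Germany'])
reordered = is_compatible(['Germany', 'Belgium', 'France'])
extra = is_compatible(['Belgium', 'France', 'Germany', 'Netherlands'])
print(same, reordered, extra)
```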

Ignoring labels (risky)

Warning: Operations between two arrays only work when they have compatible axes (i.e. the same labels), but this behavior can be overridden via the ignore_labels method. In that case only the position on the axis is used, not the labels.

Use this method at your own risk: it SHOULD NEVER BE USED IN A MODEL. Use it only for quick tests or rapid data exploration.

[19]:
# let us imagine that the labels of the 'country' axis
# of the 'growth_rate' array are the
# country codes instead of the country full names
growth_rate_country_codes = growth_rate.set_labels('country', ['BE', 'FR', 'DE'])
growth_rate_country_codes
[19]:
country\gender   Male  Female
            BE  1.011    1.01
            FR  1.013   1.011
            DE   1.01   1.009
[20]:
# use the .ignore_labels() method on axis 'country'
# to avoid the incompatible axes error (risky)
population_2018 = population_2017 * growth_rate_country_codes.ignore_labels('country')
population_2018
[20]:
country\gender                Male        Female
       Belgium         5650753.992    5820079.55
        France  32739119.648999996  34864484.628
       Germany         41104089.18  42200955.815
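
The result above is correct only because the country codes happen to be in the same order as the country names. The plain-Python sketch below (lists and dicts standing in for arrays, not LArray code) shows what position-based pairing does when the order differs: it reuses the male 2017 figures and the reordered growth rates from the earlier example, and Belgium silently receives Germany's rate.

```python
# 2017 male population, in the order Belgium, France, Germany
countries = ['Belgium', 'France', 'Germany']
population_male_2017 = [5589272, 32318973, 40697118]

# male growth rates stored in a different order: Germany, Belgium, France
rate_countries = ['Germany', 'Belgium', 'France']
rates_male = [1.010, 1.011, 1.013]

# position-based pairing: Belgium silently gets Germany's rate
by_position = {c: p * r
               for c, p, r in zip(countries, population_male_2017, rates_male)}

# label-based pairing: each country gets its own rate
rate_by_label = dict(zip(rate_countries, rates_male))
by_label = {c: p * rate_by_label[c]
            for c, p in zip(countries, population_male_2017)}

print(by_position['Belgium'], by_label['Belgium'])
```

No error is raised in the position-based case, which is exactly why ignoring labels is risky.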

Extra Or Missing Axes (Broadcasting)

The requirement that axes be compatible applies only to common axes. Arithmetic operations between two arrays having the same axes are intuitive. However, operations can also be performed when the second array has extra and/or missing axes compared to the first one. This mechanism is called broadcasting. It makes it possible to perform many arithmetic operations without writing any loop. This is a great advantage, since loops in Python (especially nested loops) can be very slow and should be avoided as much as possible.

To understand how broadcasting works, let us start with a simple example. Assume we have the total population (men and women combined) of each country:

[21]:
population_by_country = population_2017['Male'] + population_2017['Female']
population_by_country
[21]:
country   Belgium    France   Germany
         11351727  66804121  82521653

We also assume we have the proportion of each gender in the population and that proportion is supposed to be the same for all countries:

[22]:
gender_proportion = Array([0.49, 0.51], gender)
gender_proportion
[22]:
gender  Male  Female
        0.49    0.51

Using the two 1D arrays above, we can naively compute the population by country and gender as follows:

[23]:
# define a new variable with both 'country' and 'gender' axes to store the result
population_by_country_and_gender = zeros([country, gender], dtype=int)

# loop over the 'country' and 'gender' axes
for c in country:
    for g in gender:
        population_by_country_and_gender[c, g] = population_by_country[c] * gender_proportion[g]

# display the result
population_by_country_and_gender
[23]:
country\gender      Male    Female
       Belgium   5562346   5789380
        France  32734019  34070101
       Germany  40435609  42086043

Relying on the broadcasting mechanism, the calculation above becomes:

[24]:
# the outer product is done automatically.
# No need to use any loop -> saves a lot of computation time
population_by_country_and_gender = population_by_country * gender_proportion

# display the result
population_by_country_and_gender.astype(int)
[24]:
country\gender      Male    Female
       Belgium   5562346   5789380
        France  32734019  34070101
       Germany  40435609  42086043

In the calculation above, LArray automatically creates a resulting array with axes given by the union of the axes of the two arrays involved in the arithmetic operation.

Let us repeat the same calculation, this time with a common time axis:

[25]:
population_by_country_and_year = population['Male'] + population['Female']
population_by_country_and_year
[25]:
country\time      2013      2014      2015      2016      2017
     Belgium  11137974  11180840  11237274  11311117  11351727
      France  65600350  66165980  66458153  66638391  66804121
     Germany  80523746  80767463  81197537  82175684  82521653
[26]:
gender_proportion_by_year = Array([[0.49, 0.485, 0.495, 0.492, 0.498],
                                   [0.51, 0.515, 0.505, 0.508, 0.502]], [gender, time])
gender_proportion_by_year
[26]:
gender\time  2013   2014   2015   2016   2017
       Male  0.49  0.485  0.495  0.492  0.498
     Female  0.51  0.515  0.505  0.508  0.502

Without the broadcasting mechanism, the computation of the population by country, gender and year would have been:

[27]:
# define a new variable to store the result.
# Its axes are the union of the axes of the two arrays
# involved in the arithmetic operation
population_by_country_gender_year = zeros([country, gender, time], dtype=int)

# loop over axes which are not present in both arrays
# involved in the arithmetic operation
for c in country:
    for g in gender:
        # all subsets below have the same 'time' axis
        population_by_country_gender_year[c, g] = population_by_country_and_year[c] * gender_proportion_by_year[g]

population_by_country_gender_year
[27]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5457607   5422707   5562450   5565069   5653160
Belgium       Female   5680366   5758132   5674823   5746047   5698566
 France         Male  32144171  32090500  32896785  32786088  33268452
 France       Female  33456178  34075479  33561367  33852302  33535668
Germany         Male  39456635  39172219  40192780  40430436  41095783
Germany       Female  41067110  41595243  41004756  41745247  41425869

Once again, the above calculation can be simplified as:

[28]:
# No need to use any loop -> saves a lot of computation time
population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year

# display the result
population_by_country_gender_year.astype(int)
[28]:
country  time\gender      Male    Female
Belgium         2013   5457607   5680366
Belgium         2014   5422707   5758132
Belgium         2015   5562450   5674823
Belgium         2016   5565069   5746047
Belgium         2017   5653160   5698566
 France         2013  32144171  33456178
 France         2014  32090500  34075479
 France         2015  32896785  33561367
 France         2016  32786088  33852302
 France         2017  33268452  33535668
Germany         2013  39456635  41067110
Germany         2014  39172219  41595243
Germany         2015  40192780  41004756
Germany         2016  40430436  41745247
Germany         2017  41095783  41425869

Warning: Broadcasting is a powerful mechanism but can be confusing at first. It can lead to unexpected results. In particular, if axes which are supposed to be common are not, you will get a resulting array with extra axes you didn’t want.

For example, imagine that the name of the time axis is time for the first array but period for the second:

[29]:
gender_proportion_by_year = gender_proportion_by_year.rename('time', 'period')
gender_proportion_by_year
[29]:
gender\period  2013   2014   2015   2016   2017
         Male  0.49  0.485  0.495  0.492  0.498
       Female  0.51  0.515  0.505  0.508  0.502
[30]:
population_by_country_and_year
[30]:
country\time      2013      2014      2015      2016      2017
     Belgium  11137974  11180840  11237274  11311117  11351727
      France  65600350  66165980  66458153  66638391  66804121
     Germany  80523746  80767463  81197537  82175684  82521653
[31]:
# the two arrays below have a "time" axis with two different names: 'time' and 'period'.
# LArray will treat the "time" axis of the two arrays as two different "time" axes
population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year

# as a consequence, the result of the multiplication of the two arrays is not what we expected
population_by_country_gender_year.astype(int)
[31]:
country  time  gender\period      2013      2014      2015      2016      2017
Belgium  2013           Male   5457607   5401917   5513297   5479883   5546711
Belgium  2013         Female   5680366   5736056   5624676   5658090   5591262
Belgium  2014           Male   5478611   5422707   5534515   5500973   5568058
Belgium  2014         Female   5702228   5758132   5646324   5679866   5612781
Belgium  2015           Male   5506264   5450077   5562450   5528738   5596162
Belgium  2015         Female   5731009   5787196   5674823   5708535   5641111
Belgium  2016           Male   5542447   5485891   5599002   5565069   5632936
Belgium  2016         Female   5768669   5825225   5712114   5746047   5678180
Belgium  2017           Male   5562346   5505587   5619104   5585049   5653160
Belgium  2017         Female   5789380   5846139   5732622   5766677   5698566
 France  2013           Male  32144171  31816169  32472173  32275372  32668974
 France  2013         Female  33456178  33784180  33128176  33324977  32931375
 France  2014           Male  32421330  32090500  32752160  32553662  32950658
 France  2014         Female  33744649  34075479  33413819  33612317  33215321
 France  2015           Male  32564494  32232204  32896785  32697411  33096160
 France  2015         Female  33893658  34225948  33561367  33760741  33361992
 France  2016           Male  32652811  32319619  32986003  32786088  33185918
 France  2016         Female  33985579  34318771  33652387  33852302  33452472
 France  2017           Male  32734019  32399998  33068039  32867627  33268452
 France  2017         Female  34070101  34404122  33736081  33936493  33535668
Germany  2013           Male  39456635  39054016  39859254  39617683  40100825
Germany  2013         Female  41067110  41469729  40664491  40906062  40422920
Germany  2014           Male  39576056  39172219  39979894  39737591  40222196
Germany  2014         Female  41191406  41595243  40787568  41029871  40545266
Germany  2015           Male  39786793  39380805  40192780  39949188  40436373
Germany  2015         Female  41410743  41816731  41004756  41248348  40761163
Germany  2016           Male  40266085  39855206  40676963  40430436  40923490
Germany  2016         Female  41909598  42320477  41498720  41745247  41252193
Germany  2017           Male  40435609  40023001  40848218  40600653  41095783
Germany  2017         Female  42086043  42498651  41673434  41920999  41425869
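
The size of the unexpected result follows directly from the union rule. The sketch below models only axis names and lengths in plain Python; `result_axes` and `n_cells` are hypothetical helpers written for this illustration, not LArray functions. Axes are matched by name, the left array's axes come first, and axes found only on the right are appended.

```python
def result_axes(left, right):
    """Hypothetical helper: left axes first, then axes found only on the right.

    Axes are given as {name: number_of_labels} dicts; matching is by name only.
    """
    merged = dict(left)
    for name, size in right.items():
        merged.setdefault(name, size)
    return merged

def n_cells(axes):
    """Number of values in an array with the given axes."""
    total = 1
    for size in axes.values():
        total *= size
    return total

population_axes = {'country': 3, 'time': 5}

# both arrays call the axis 'time': it is shared
same_name = result_axes(population_axes, {'gender': 2, 'time': 5})

# the second array calls it 'period': it becomes an extra axis
renamed = result_axes(population_axes, {'gender': 2, 'period': 5})

print(list(same_name), n_cells(same_name))
print(list(renamed), n_cells(renamed))
```

With a shared time axis the result has 30 values; with the renamed axis it has 150, which matches the two outputs shown earlier.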

Boolean Operations

Python comparison operators are:

Operator  Meaning
==        equal
!=        not equal
>         greater than
>=        greater than or equal
<         less than
<=        less than or equal

Applying a comparison operator on an array returns a boolean array:

[32]:
# test which values are greater than 10 million
population > 10e6
[32]:
country  gender\time   2013   2014   2015   2016   2017
Belgium         Male  False  False  False  False  False
Belgium       Female  False  False  False  False  False
 France         Male   True   True   True   True   True
 France       Female   True   True   True   True   True
Germany         Male   True   True   True   True   True
Germany       Female   True   True   True   True   True

Comparison operations can be combined using Python bitwise operators:

Operator  Meaning
&         and
|         or
~         not

[33]:
# test which values are greater than 10 million and less than 40 million
(population > 10e6) & (population < 40e6)
[33]:
country  gender\time   2013   2014   2015   2016   2017
Belgium         Male  False  False  False  False  False
Belgium       Female  False  False  False  False  False
 France         Male   True   True   True   True   True
 France       Female   True   True   True   True   True
Germany         Male   True   True   True  False  False
Germany       Female  False  False  False  False  False
[34]:
# test which values are less than 10 million or greater than 40 million
(population < 10e6) | (population > 40e6)
[34]:
country  gender\time   2013   2014   2015   2016   2017
Belgium         Male   True   True   True   True   True
Belgium       Female   True   True   True   True   True
 France         Male  False  False  False  False  False
 France       Female  False  False  False  False  False
Germany         Male  False  False  False   True   True
Germany       Female   True   True   True   True   True
[35]:
# test which values are not less than 10 million
~(population < 10e6)
[35]:
country  gender\time   2013   2014   2015   2016   2017
Belgium         Male  False  False  False  False  False
Belgium       Female  False  False  False  False  False
 France         Male   True   True   True   True   True
 France       Female   True   True   True   True   True
Germany         Male   True   True   True   True   True
Germany       Female   True   True   True   True   True
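
The parentheses around each comparison in the expressions above are required: in Python, the bitwise operators bind more tightly than the comparison operators. The following sketch shows the effect on plain integers (chosen only for illustration), where omitting the parentheses changes the result.

```python
# Without parentheses, & is evaluated before the comparisons:
# 3 > 1 & 2 > 1  is parsed as  3 > (1 & 2) > 1, i.e. 3 > 0 > 1
without_parentheses = 3 > 1 & 2 > 1

# With parentheses, each comparison is evaluated first
with_parentheses = (3 > 1) & (2 > 1)

print(without_parentheses, with_parentheses)
```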

The returned boolean array can then be used in selections and assignments:

[36]:
population_copy = population.copy()

# set all values greater than 40 million to 40 million
population_copy[population_copy > 40e6] = 40e6
population_copy
[36]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40000000  40000000
Germany       Female  40000000  40000000  40000000  40000000  40000000

Boolean operations can be made between arrays:

[37]:
# test where the two arrays have the same values
population == population_copy
[37]:
country  gender\time   2013   2014   2015   2016   2017
Belgium         Male   True   True   True   True   True
Belgium       Female   True   True   True   True   True
 France         Male   True   True   True   True   True
 France       Female   True   True   True   True   True
Germany         Male   True   True   True  False  False
Germany       Female  False  False  False  False  False

To test whether all values of two arrays are equal, use the equals method:

[38]:
population.equals(population_copy)
[38]:
False
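
The distinction between an element-wise comparison and a single overall answer can be sketched with plain Python lists. The example below is only an analogy (it does not call LArray): it takes the Germany / Male row from the tables above, caps it at 40 million, then compares value by value and reduces the result to one boolean.

```python
# Germany / Male row before and after capping at 40 million
original = [39380976, 39556923, 39835457, 40514123, 40697118]
capped = [min(value, 40_000_000) for value in original]

# element-wise comparison: one boolean per value (analogous to ==)
elementwise = [a == b for a, b in zip(original, capped)]

# single answer: True only if every value matches (analogous to equals)
all_equal = all(elementwise)

print(elementwise, all_equal)
```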