Arithmetic Operations
Import the LArray library:
[1]:
from larray import *
Load the population array from the demography_eurostat dataset:
[2]:
# load the 'demography_eurostat' dataset
demography_eurostat = load_example_data('demography_eurostat')
# extract the 'country', 'gender' and 'time' axes
country = demography_eurostat.country
gender = demography_eurostat.gender
time = demography_eurostat.time
# extract the 'population' array
population = demography_eurostat.population
# show the 'population' array
population
[2]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
Basics
All the usual arithmetic operations can be applied to an array. The operation is applied to each element individually:
[3]:
# 'true' division
population_in_millions = population / 1_000_000
population_in_millions
[3]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5.472856 5.493792 5.524068 5.569264 5.589272
Belgium Female 5.665118 5.687048 5.713206 5.741853 5.762455
France Male 31.772665 32.045129 32.174258 32.247386 32.318973
France Female 33.827685 34.120851 34.283895 34.391005 34.485148
Germany Male 39.380976 39.556923 39.835457 40.514123 40.697118
Germany Female 41.14277 41.21054 41.36208 41.661561 41.824535
[4]:
# 'floor' division
population_in_millions = population // 1_000_000
population_in_millions
[4]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5 5 5 5 5
Belgium Female 5 5 5 5 5
France Male 31 32 32 32 32
France Female 33 34 34 34 34
Germany Male 39 39 39 40 40
Germany Female 41 41 41 41 41
Warning: Python has two different division operators:
the ‘true’ division (/) always returns a float.
the ‘floor’ division (//) rounds the quotient down to the nearest whole number; the result is an integer when both operands are integers.
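The difference between the two operators can be checked on plain Python scalars. The sketch below does not use LArray at all; it only assumes that element-wise array operations follow the same scalar rules, which the outputs of cells [3] and [4] suggest.

```python
# Plain Python scalars; no LArray needed for this illustration.
population = 5_472_856  # Belgium, Male, 2013 (value taken from the table above)

true_div = population / 1_000_000    # always a float
floor_div = population // 1_000_000  # an int, because both operands are ints

print(true_div)   # 5.472856
print(floor_div)  # 5

# Floor division rounds towards negative infinity, not towards zero.
negative_floor = -7 // 2
print(negative_floor)  # -4

# With a float operand, the result is floored but remains a float.
float_floor = 7.5 // 2
print(float_floor)  # 3.0
```

The last two cases are worth keeping in mind when the data contains negative or non-integer values.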
[5]:
# % means modulo (aka remainder of division)
population % 1_000_000
[5]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 472856 493792 524068 569264 589272
Belgium Female 665118 687048 713206 741853 762455
France Male 772665 45129 174258 247386 318973
France Female 827685 120851 283895 391005 485148
Germany Male 380976 556923 835457 514123 697118
Germany Female 142770 210540 362080 661561 824535
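Floor division and modulo are two halves of the same operation: the quotient and the remainder always recombine into the original value. A minimal check with the standard divmod function, on one scalar taken from the table above (plain Python, not LArray):

```python
population = 5_472_856  # Belgium, Male, 2013 (from the 'population' table)
unit = 1_000_000

# divmod returns (population // unit, population % unit) in one call
quotient, remainder = divmod(population, unit)
print(quotient, remainder)  # 5 472856

# quotient and remainder recombine into the original value
recombined = quotient * unit + remainder
print(recombined == population)  # True
```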
[6]:
# ** means raising to the power
print(ndtest(4))
ndtest(4) ** 3
a a0 a1 a2 a3
0 1 2 3
[6]:
a a0 a1 a2 a3
0 1 8 27
More interestingly, binary operators such as those above also work between two arrays.
Let us imagine a rate of population growth which is constant over time but different by gender and country:
[7]:
growth_rate = Array(data=[[1.011, 1.010], [1.013, 1.011], [1.010, 1.009]], axes=[country, gender])
growth_rate
[7]:
country\gender Male Female
Belgium 1.011 1.01
France 1.013 1.011
Germany 1.01 1.009
[8]:
# we store the population of the year 2017 in a new variable
population_2017 = population[2017]
population_2017
[8]:
country\gender Male Female
Belgium 5589272 5762455
France 32318973 34485148
Germany 40697118 41824535
[9]:
# perform an arithmetic operation between two arrays
population_2018 = population_2017 * growth_rate
population_2018
[9]:
country\gender Male Female
Belgium 5650753.992 5820079.55
France 32739119.648999996 34864484.628
Germany 41104089.18 42200955.815
Note: Be careful when mixing different data types. You can use the method astype to change the data type of an array.
[10]:
# force the resulting matrix to be an integer matrix
population_2018 = (population_2017 * growth_rate).astype(int)
population_2018
[10]:
country\gender Male Female
Belgium 5650753 5820079
France 32739119 34864484
Germany 41104089 42200955
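Note that the integer values above are truncated, not rounded: 5650753.992 became 5650753. The sketch below reproduces this on a plain Python scalar, on the assumption that astype(int) drops the fractional part the same way int() does (the output of cell [10] is consistent with that assumption). If nearest-integer values are wanted, the values have to be rounded before the conversion.

```python
# Belgium, Male: 2017 population multiplied by its growth rate
value = 5589272 * 1.011   # approximately 5650753.992

truncated = int(value)    # drops the fractional part
rounded = round(value)    # nearest integer

print(truncated)  # 5650753
print(rounded)    # 5650754
```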
Axis order does not matter much (except for output)
You can perform operations between arrays whose axes are in a different order. The result uses the axis order of the left operand.
[11]:
# let's change the order of axes of the 'growth_rate' array
transposed_growth_rate = growth_rate.transpose()
# look at the order of the new 'transposed_growth_rate' array:
# 'gender' is the first axis while 'country' is the second
transposed_growth_rate
[11]:
gender\country Belgium France Germany
Male 1.011 1.013 1.01
Female 1.01 1.011 1.009
[12]:
# look at the order of the 'population_2017' array:
# 'country' is the first axis while 'gender' is the second
population_2017
[12]:
country\gender Male Female
Belgium 5589272 5762455
France 32318973 34485148
Germany 40697118 41824535
[13]:
# LArray does not care about axis order when performing
# arithmetic operations between arrays
population_2018 = population_2017 * transposed_growth_rate
population_2018
[13]:
country\gender Male Female
Belgium 5650753.992 5820079.55
France 32739119.648999996 34864484.628
Germany 41104089.18 42200955.815
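Axis order is irrelevant because cells are matched by label rather than by position. The following sketch mimics that idea with plain dictionaries; the dictionaries are hypothetical stand-ins for the arrays above and say nothing about how LArray is implemented internally.

```python
# Hypothetical stand-ins for the arrays above: values keyed by labels.
population_2017 = {("Belgium", "Male"): 5589272, ("Belgium", "Female"): 5762455}
# same growth rates, but keyed in (gender, country) order
transposed_growth_rate = {("Male", "Belgium"): 1.011, ("Female", "Belgium"): 1.010}

# Match cells by label, not by position; keep the key order of the left operand.
population_2018 = {
    (country, gender): value * transposed_growth_rate[(gender, country)]
    for (country, gender), value in population_2017.items()
}

print(list(population_2018))  # [('Belgium', 'Male'), ('Belgium', 'Female')]
```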
Axes must be compatible
Arithmetic operations between two arrays only work when their common axes are compatible (i.e. they have the same list of labels in the same order).
[14]:
# show 'population_2017'
population_2017
[14]:
country\gender Male Female
Belgium 5589272 5762455
France 32318973 34485148
Germany 40697118 41824535
Order of labels matters
[15]:
# let us imagine that the labels of the 'country' axis
# of the 'growth_rate' array are in a different order
# than in the 'population_2017' array
reordered_growth_rate = growth_rate.reindex('country', ['Germany', 'Belgium', 'France'])
reordered_growth_rate
[15]:
country\gender Male Female
Germany 1.01 1.009
Belgium 1.011 1.01
France 1.013 1.011
[16]:
# when doing arithmetic operations,
# the order of labels counts
try:
    population_2018 = population_2017 * reordered_growth_rate
except Exception as e:
    print(type(e).__name__, e)
ValueError incompatible axes:
Axis(['Germany', 'Belgium', 'France'], 'country')
vs
Axis(['Belgium', 'France', 'Germany'], 'country')
No extra or missing labels are permitted
[17]:
# let us imagine that the 'country' axis of
# the 'growth_rate' array has an extra
# label 'Netherlands' compared to the same axis of
# the 'population_2017' array
growth_rate_netherlands = Array([1.012, 1.], population.gender)
growth_rate_extra_country = growth_rate.append('country', growth_rate_netherlands, label='Netherlands')
growth_rate_extra_country
[17]:
country\gender Male Female
Belgium 1.011 1.01
France 1.013 1.011
Germany 1.01 1.009
Netherlands 1.012 1.0
[18]:
# when doing arithmetic operations,
# no extra or missing labels are permitted
try:
    population_2018 = population_2017 * growth_rate_extra_country
except Exception as e:
    print(type(e).__name__, e)
ValueError incompatible axes:
Axis(['Belgium', 'France', 'Germany', 'Netherlands'], 'country')
vs
Axis(['Belgium', 'France', 'Germany'], 'country')
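Both failures above come down to one rule: a common axis must carry the same labels in the same order in both arrays. The helper below, check_compatible, is a hypothetical illustration of that rule written in plain Python; it is not an LArray function, and the error message only imitates the one shown above.

```python
def check_compatible(left_labels, right_labels):
    """Raise ValueError unless both axes carry the same labels in the same order."""
    if list(left_labels) != list(right_labels):
        raise ValueError(f"incompatible axes: {right_labels} vs {left_labels}")


countries = ["Belgium", "France", "Germany"]

# identical labels in identical order: passes silently
check_compatible(countries, ["Belgium", "France", "Germany"])

rejected = []
for other in (["Germany", "Belgium", "France"],                  # different order
              ["Belgium", "France", "Germany", "Netherlands"]):  # extra label
    try:
        check_compatible(countries, other)
    except ValueError as e:
        rejected.append(other)
        print(type(e).__name__, e)
```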
Ignoring labels (risky)
Warning: Operations between two arrays only work when they have compatible axes (i.e. the same labels), but this behavior can be overridden via the ignore_labels method. In that case, only the position on the axis is used, not the labels.
Use this method at your own risk. It SHOULD NEVER BE USED IN A MODEL. Use it only for quick tests or rapid data exploration.
[19]:
# let us imagine that the labels of the 'country' axis
# of the 'growth_rate' array are the
# country codes instead of the country full names
growth_rate_country_codes = growth_rate.set_labels('country', ['BE', 'FR', 'DE'])
growth_rate_country_codes
[19]:
country\gender Male Female
BE 1.011 1.01
FR 1.013 1.011
DE 1.01 1.009
[20]:
# use the .ignore_labels() method on axis 'country'
# to avoid the incompatible axes error (risky)
population_2018 = population_2017 * growth_rate_country_codes.ignore_labels('country')
population_2018
[20]:
country\gender Male Female
Belgium 5650753.992 5820079.55
France 32739119.648999996 34864484.628
Germany 41104089.18 42200955.815
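The result above is correct only because the country codes happen to be in the same order as the country names. The plain-Python sketch below (lists instead of arrays, with a hypothetical reordered data layout) shows what goes wrong when positions and labels disagree: positional matching raises no error but silently pairs the wrong values.

```python
countries = ["Belgium", "France", "Germany"]
population_2017_male = [5589272, 32318973, 40697118]

# growth rates listed in a different country order (hypothetical data layout)
reordered_countries = ["Germany", "Belgium", "France"]
reordered_rates = [1.010, 1.011, 1.013]

# positional matching: no error, but Belgium is multiplied by Germany's rate
by_position = [p * r for p, r in zip(population_2017_male, reordered_rates)]

# label matching: look up each rate by its country name
rate_of = dict(zip(reordered_countries, reordered_rates))
by_label = [p * rate_of[c] for c, p in zip(countries, population_2017_male)]

print(by_position[0])  # Belgium computed with Germany's rate
print(by_label[0])     # Belgium computed with Belgium's rate
```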
Extra Or Missing Axes (Broadcasting)
The condition that axes must be compatible only applies to common axes. Arithmetic operations between two arrays that have the same axes are intuitive. However, arithmetic operations between two arrays can also be performed when the second array has extra and/or missing axes compared to the first one. This mechanism is called broadcasting. It makes it possible to perform many arithmetic operations without writing any loop. This is a great advantage, since loops in Python can be very time consuming (especially nested loops) and should be avoided as much as possible.
To understand how broadcasting works, let us start with a simple example. We assume we have the total population (men and women combined) for each country:
[21]:
population_by_country = population_2017['Male'] + population_2017['Female']
population_by_country
[21]:
country Belgium France Germany
11351727 66804121 82521653
We also assume we have the proportion of each gender in the population and that proportion is supposed to be the same for all countries:
[22]:
gender_proportion = Array([0.49, 0.51], gender)
gender_proportion
[22]:
gender Male Female
0.49 0.51
Using the two 1D arrays above, we can naively compute the population by country and gender as follows:
[23]:
# define a new variable with both 'country' and 'gender' axes to store the result
population_by_country_and_gender = zeros([country, gender], dtype=int)
# loop over the 'country' and 'gender' axes
for c in country:
    for g in gender:
        population_by_country_and_gender[c, g] = population_by_country[c] * gender_proportion[g]
# display the result
population_by_country_and_gender
[23]:
country\gender Male Female
Belgium 5562346 5789380
France 32734019 34070101
Germany 40435609 42086043
Relying on the broadcasting mechanism, the calculation above becomes:
[24]:
# the outer product is done automatically.
# No need to use any loop -> saves a lot of computation time
population_by_country_and_gender = population_by_country * gender_proportion
# display the result
population_by_country_and_gender.astype(int)
[24]:
country\gender Male Female
Belgium 5562346 5789380
France 32734019 34070101
Germany 40435609 42086043
In the calculation above, LArray automatically creates a resulting array with axes given by the union of the axes of the two arrays involved in the arithmetic operation.
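The union-of-axes rule can be sketched on axis names alone. The function broadcast_axes below is a hypothetical helper, not part of LArray. It assumes that the result keeps the axes of the left array first and then appends the axes found only in the right array, which is the order visible in the outputs of this section.

```python
def broadcast_axes(left, right):
    """Return the axis names of the result: left axes first, then axes only in right."""
    return list(left) + [name for name in right if name not in left]


# two 1D arrays with no common axis: the result is 2D
axes_1d = broadcast_axes(["country"], ["gender"])
print(axes_1d)  # ['country', 'gender']

# one common axis ('time'): it appears only once in the result
axes_common = broadcast_axes(["country", "time"], ["gender", "time"])
print(axes_common)  # ['country', 'time', 'gender']
```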
Let us do the same calculation, but with an additional common time axis:
[25]:
population_by_country_and_year = population['Male'] + population['Female']
population_by_country_and_year
[25]:
country\time 2013 2014 2015 2016 2017
Belgium 11137974 11180840 11237274 11311117 11351727
France 65600350 66165980 66458153 66638391 66804121
Germany 80523746 80767463 81197537 82175684 82521653
[26]:
gender_proportion_by_year = Array([[0.49, 0.485, 0.495, 0.492, 0.498],
[0.51, 0.515, 0.505, 0.508, 0.502]], [gender, time])
gender_proportion_by_year
[26]:
gender\time 2013 2014 2015 2016 2017
Male 0.49 0.485 0.495 0.492 0.498
Female 0.51 0.515 0.505 0.508 0.502
Without the broadcasting mechanism, the computation of the population by country, gender and year would have been:
[27]:
# define a new variable to store the result.
# Its axes are the union of the axes of the two arrays
# involved in the arithmetic operation
population_by_country_gender_year = zeros([country, gender, time], dtype=int)
# loop over axes which are not present in both arrays
# involved in the arithmetic operation
for c in country:
    for g in gender:
        # all subsets below have the same 'time' axis
        population_by_country_gender_year[c, g] = population_by_country_and_year[c] * gender_proportion_by_year[g]
population_by_country_gender_year
[27]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5457607 5422707 5562450 5565069 5653160
Belgium Female 5680366 5758132 5674823 5746047 5698566
France Male 32144171 32090500 32896785 32786088 33268452
France Female 33456178 34075479 33561367 33852302 33535668
Germany Male 39456635 39172219 40192780 40430436 41095783
Germany Female 41067110 41595243 41004756 41745247 41425869
Once again, the above calculation can be simplified as:
[28]:
# No need to use any loop -> saves a lot of computation time
population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year
# display the result
population_by_country_gender_year.astype(int)
[28]:
country time\gender Male Female
Belgium 2013 5457607 5680366
Belgium 2014 5422707 5758132
Belgium 2015 5562450 5674823
Belgium 2016 5565069 5746047
Belgium 2017 5653160 5698566
France 2013 32144171 33456178
France 2014 32090500 34075479
France 2015 32896785 33561367
France 2016 32786088 33852302
France 2017 33268452 33535668
Germany 2013 39456635 41067110
Germany 2014 39172219 41595243
Germany 2015 40192780 41004756
Germany 2016 40430436 41745247
Germany 2017 41095783 41425869
Warning: Broadcasting is a powerful mechanism, but it can be confusing at first and can lead to unexpected results. In particular, if axes that are supposed to be common are not, the resulting array will have extra axes you did not want.
For example, imagine that the time axis is named time in the first array but period in the second:
[29]:
gender_proportion_by_year = gender_proportion_by_year.rename('time', 'period')
gender_proportion_by_year
[29]:
gender\period 2013 2014 2015 2016 2017
Male 0.49 0.485 0.495 0.492 0.498
Female 0.51 0.515 0.505 0.508 0.502
[30]:
population_by_country_and_year
[30]:
country\time 2013 2014 2015 2016 2017
Belgium 11137974 11180840 11237274 11311117 11351727
France 65600350 66165980 66458153 66638391 66804121
Germany 80523746 80767463 81197537 82175684 82521653
[31]:
# the two arrays below have a "time" axis with two different names: 'time' and 'period'.
# LArray will treat the "time" axis of the two arrays as two different "time" axes
population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year
# as a consequence, the result of the multiplication of the two arrays is not what we expected
population_by_country_gender_year.astype(int)
[31]:
country time gender\period 2013 2014 2015 2016 2017
Belgium 2013 Male 5457607 5401917 5513297 5479883 5546711
Belgium 2013 Female 5680366 5736056 5624676 5658090 5591262
Belgium 2014 Male 5478611 5422707 5534515 5500973 5568058
Belgium 2014 Female 5702228 5758132 5646324 5679866 5612781
Belgium 2015 Male 5506264 5450077 5562450 5528738 5596162
Belgium 2015 Female 5731009 5787196 5674823 5708535 5641111
Belgium 2016 Male 5542447 5485891 5599002 5565069 5632936
Belgium 2016 Female 5768669 5825225 5712114 5746047 5678180
Belgium 2017 Male 5562346 5505587 5619104 5585049 5653160
Belgium 2017 Female 5789380 5846139 5732622 5766677 5698566
France 2013 Male 32144171 31816169 32472173 32275372 32668974
France 2013 Female 33456178 33784180 33128176 33324977 32931375
France 2014 Male 32421330 32090500 32752160 32553662 32950658
France 2014 Female 33744649 34075479 33413819 33612317 33215321
France 2015 Male 32564494 32232204 32896785 32697411 33096160
France 2015 Female 33893658 34225948 33561367 33760741 33361992
France 2016 Male 32652811 32319619 32986003 32786088 33185918
France 2016 Female 33985579 34318771 33652387 33852302 33452472
France 2017 Male 32734019 32399998 33068039 32867627 33268452
France 2017 Female 34070101 34404122 33736081 33936493 33535668
Germany 2013 Male 39456635 39054016 39859254 39617683 40100825
Germany 2013 Female 41067110 41469729 40664491 40906062 40422920
Germany 2014 Male 39576056 39172219 39979894 39737591 40222196
Germany 2014 Female 41191406 41595243 40787568 41029871 40545266
Germany 2015 Male 39786793 39380805 40192780 39949188 40436373
Germany 2015 Female 41410743 41816731 41004756 41248348 40761163
Germany 2016 Male 40266085 39855206 40676963 40430436 40923490
Germany 2016 Female 41909598 42320477 41498720 41745247 41252193
Germany 2017 Male 40435609 40023001 40848218 40600653 41095783
Germany 2017 Female 42086043 42498651 41673434 41920999 41425869
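A cheap safeguard against this kind of mistake is to compare the axis names of a result with the names you expect. The sketch below works on a plain dictionary of axis names and lengths copied from the output above. With a real array, the names would have to be read from the array itself, for instance through its axes attribute (an assumption to check against your LArray version).

```python
from math import prod

# axis names and lengths of the result shown above
result_axes = {"country": 3, "time": 5, "gender": 2, "period": 5}
expected_axes = {"country", "time", "gender"}

# any axis that is present in the result but was not expected
unexpected = sorted(set(result_axes) - expected_axes)
print(unexpected)  # ['period']

# number of cells: 3 * 5 * 2 * 5 = 150 instead of the intended 3 * 5 * 2 = 30
size = prod(result_axes.values())
print(size)  # 150
```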
Boolean Operations
Python comparison operators are:
Operator | Meaning |
---|---|
== | equal |
!= | not equal |
> | greater than |
>= | greater than or equal |
< | less than |
<= | less than or equal |
Applying a comparison operator on an array returns a boolean array:
[32]:
# test which values are greater than 10 million
population > 10e6
[32]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male False False False False False
Belgium Female False False False False False
France Male True True True True True
France Female True True True True True
Germany Male True True True True True
Germany Female True True True True True
Comparison operations can be combined using Python bitwise operators:
Operator | Meaning |
---|---|
& | and |
\| | or |
~ | not |
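The bitwise operators are used because Python's and, or and not keywords cannot be overloaded to work element by element. They also bind more tightly than the comparison operators, so each comparison must be wrapped in parentheses. The scalar example below (plain Python, with a single value standing in for one array element) shows how omitting the parentheses changes the result.

```python
x = 50  # a single value standing in for one array element

# with parentheses: each comparison is evaluated first, then combined
with_parens = (x > 10) & (x < 40)

# without parentheses: & binds tighter than > and <, so this is parsed as
# x > (10 & x) < 40, i.e. 50 > 2 < 40
without_parens = x > 10 & x < 40

print(with_parens)     # False
print(without_parens)  # True
```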
[33]:
# test which values are greater than 10 million and less than 40 million
(population > 10e6) & (population < 40e6)
[33]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male False False False False False
Belgium Female False False False False False
France Male True True True True True
France Female True True True True True
Germany Male True True True False False
Germany Female False False False False False
[34]:
# test which values are less than 10 million or greater than 40 million
(population < 10e6) | (population > 40e6)
[34]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male True True True True True
Belgium Female True True True True True
France Male False False False False False
France Female False False False False False
Germany Male False False False True True
Germany Female True True True True True
[35]:
# test which values are not less than 10 million
~(population < 10e6)
[35]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male False False False False False
Belgium Female False False False False False
France Male True True True True True
France Female True True True True True
Germany Male True True True True True
Germany Female True True True True True
The returned boolean array can then be used in selections and assignments:
[36]:
population_copy = population.copy()
# set all values greater than 40 million to 40 million
population_copy[population_copy > 40e6] = 40e6
population_copy
[36]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40000000 40000000
Germany Female 40000000 40000000 40000000 40000000 40000000
Comparison operations can also be performed between two arrays:
[37]:
# test where the two arrays have the same values
population == population_copy
[37]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male True True True True True
Belgium Female True True True True True
France Male True True True True True
France Female True True True True True
Germany Male True True True False False
Germany Female False False False False False
To test whether all values of two arrays are equal, use the equals method:
[38]:
population.equals(population_copy)
[38]:
False