Arithmetic Operations And Aggregations

Import the LArray library:

[2]:
from larray import *

Check the version of LArray:

[3]:
from larray import __version__
__version__
[3]:
'0.31'

Arithmetic operations

Import a subset of the test array pop:

[4]:
# import a 6 x 2 x 2 subset of the 'pop' example array
pop = load_example_data('demography').pop[2016, 'BruCap', 90:95]
pop
[4]:
age  sex\nat    BE   FO
 90        M   539   74
 90        F  1477  136
 91        M   499   49
 91        F  1298  105
 92        M   332   35
 92        F  1141   78
 93        M   287   27
 93        F   906   74
 94        M   237   23
 94        F   739   65
 95        M   154   19
 95        F   566   53

All the usual arithmetic operations can be applied to an array; the operation is applied to each element individually:

[5]:
# addition
pop + 200
[5]:
age  sex\nat    BE   FO
 90        M   739  274
 90        F  1677  336
 91        M   699  249
 91        F  1498  305
 92        M   532  235
 92        F  1341  278
 93        M   487  227
 93        F  1106  274
 94        M   437  223
 94        F   939  265
 95        M   354  219
 95        F   766  253
[6]:
# multiplication
pop * 2
[6]:
age  sex\nat    BE   FO
 90        M  1078  148
 90        F  2954  272
 91        M   998   98
 91        F  2596  210
 92        M   664   70
 92        F  2282  156
 93        M   574   54
 93        F  1812  148
 94        M   474   46
 94        F  1478  130
 95        M   308   38
 95        F  1132  106
[7]:
# ** means raising to the power (squaring in this case)
pop ** 2
[7]:
age  sex\nat       BE     FO
 90        M   290521   5476
 90        F  2181529  18496
 91        M   249001   2401
 91        F  1684804  11025
 92        M   110224   1225
 92        F  1301881   6084
 93        M    82369    729
 93        F   820836   5476
 94        M    56169    529
 94        F   546121   4225
 95        M    23716    361
 95        F   320356   2809
[8]:
# % means modulo (aka remainder of division)
pop % 10
[8]:
age  sex\nat  BE  FO
 90        M   9   4
 90        F   7   6
 91        M   9   9
 91        F   8   5
 92        M   2   5
 92        F   1   8
 93        M   7   7
 93        F   6   4
 94        M   7   3
 94        F   9   5
 95        M   4   9
 95        F   6   3

More interestingly, arithmetic operations also work between two arrays:

[9]:
# load mortality equivalent array
mortality = load_example_data('demography').qx[2016, 'BruCap', 90:95]

# compute number of deaths
death = pop * mortality
death
[9]:
age  sex\nat                  BE                  FO
 90        M   94.00000000000001  13.000000000000004
 90        F  204.00000000000003  19.000000000000004
 91        M                95.0                 9.0
 91        F  200.00000000000006                16.0
 92        M                70.0                 7.0
 92        F  195.00000000000006  13.000000000000004
 93        M   66.00000000000001                 6.0
 93        F  171.99999999999997                14.0
 94        M                59.0                 6.0
 94        F  155.00000000000003                14.0
 95        M                41.0                 5.0
 95        F               130.0  12.000000000000004

Note: Be careful when mixing different data types. You can use the method astype to change the data type of an array.

[10]:
# to get the number of deaths as integers, one can use the .astype() method
# (note: the conversion truncates, e.g. 171.99999999999997 becomes 171)
death = (pop * mortality).astype(int)
death
[10]:
age  sex\nat   BE  FO
 90        M   94  13
 90        F  204  19
 91        M   95   9
 91        F  200  16
 92        M   70   7
 92        F  195  13
 93        M   66   6
 93        F  171  14
 94        M   59   6
 94        F  155  14
 95        M   41   5
 95        F  130  12
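
The integer table above shows 171 for age 93, sex F, nat BE, where the float table showed 171.99999999999997: the conversion to int drops the fractional part instead of rounding. The following is a minimal plain-Python sketch of that difference. It uses the built-in int and round on a single value copied from the output above, and is not a call into LArray itself.

```python
# Plain-Python illustration of truncation versus rounding.
# The value is copied from the float output for age 93, sex F, nat BE.
raw_deaths = 171.99999999999997

truncated = int(raw_deaths)   # conversion to int drops the fractional part
rounded = round(raw_deaths)   # rounding to the nearest integer first

print(truncated)  # 171
print(rounded)    # 172
```

If the nearest integer is what you want, round the values before converting them.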

Warning: Operations between two arrays only work when they have compatible axes (i.e. the same labels). This check can be bypassed with the ignore_labels method, but at your own risk: in that case only the position along the axis is used, not the labels.

[11]:
pop[90:92] * mortality[93:95]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-3e6b95e7cc66> in <module>
----> 1 pop[90:92] * mortality[93:95]

~/checkouts/readthedocs.org/user_builds/larray/conda/0.31/lib/python3.6/site-packages/larray-0.31-py3.6.egg/larray/core/array.py in opmethod(self, other)
   5493             if isinstance(other, LArray):
   5494                 # TODO: first test if it is not already broadcastable
-> 5495                 (self, other), res_axes = make_numpy_broadcastable([self, other])
   5496                 other = other.data
   5497             return LArray(super_method(self.data, other), res_axes)

~/checkouts/readthedocs.org/user_builds/larray/conda/0.31/lib/python3.6/site-packages/larray-0.31-py3.6.egg/larray/core/array.py in make_numpy_broadcastable(values, min_axes)
   9404     Axis.iscompatible : tests if axes are compatible between them.
   9405     """
-> 9406     all_axes = AxisCollection.union(*[get_axes(v) for v in values])
   9407     if min_axes is not None:
   9408         if not isinstance(min_axes, AxisCollection):

~/checkouts/readthedocs.org/user_builds/larray/conda/0.31/lib/python3.6/site-packages/larray-0.31-py3.6.egg/larray/core/axis.py in union(self, *args, **kwargs)
   1705             if not isinstance(a, AxisCollection):
   1706                 a = AxisCollection(a)
-> 1707             result.extend(a, validate=validate, replace_wildcards=replace_wildcards)
   1708         return result
   1709     __or__ = union

~/checkouts/readthedocs.org/user_builds/larray/conda/0.31/lib/python3.6/site-packages/larray-0.31-py3.6.egg/larray/core/axis.py in extend(self, axes, validate, replace_wildcards)
   2050                 # check that common axes are the same
   2051                 if validate and not old_axis.iscompatible(axis):
-> 2052                     raise ValueError("incompatible axes:\n%r\nvs\n%r" % (axis, old_axis))
   2053                 if replace_wildcards and old_axis.iswildcard:
   2054                     self[old_axis] = axis

ValueError: incompatible axes:
Axis([93, 94, 95], 'age')
vs
Axis([90, 91, 92], 'age')
[12]:
pop[90:92] * mortality[93:95].ignore_labels('age')
[12]:
age  sex\nat                  BE                  FO
 90        M  123.95121951219514  16.444444444444443
 90        F    280.401766004415   25.72972972972973
 91        M  124.22362869198312  12.782608695652174
 91        F  272.24627875507446  22.615384615384617
 92        M   88.38961038961038   9.210526315789473
 92        F  262.06713780918733   17.66037735849057
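
To make the difference between matching by label and matching by position concrete, here is a small plain-Python sketch. The dictionaries, the two helper functions and the numbers are all hypothetical stand-ins chosen for illustration; they are not LArray objects and do not reproduce LArray's implementation.

```python
# Hypothetical stand-ins for two one-dimensional labelled arrays.
# The numbers are made up for illustration; they are not LArray data.
pop_by_age = {90: 100, 91: 80, 92: 60}
rate_by_age = {93: 0.5, 94: 0.25, 95: 0.125}


def multiply_by_label(left, right):
    """Multiply values that share a label; refuse if the labels differ."""
    if list(left) != list(right):
        raise ValueError(f"incompatible labels: {list(right)} vs {list(left)}")
    return {label: left[label] * right[label] for label in left}


def multiply_by_position(left, right):
    """Ignore the labels of `right` and pair values by position instead."""
    return {label: value * other
            for (label, value), other in zip(left.items(), right.values())}


try:
    multiply_by_label(pop_by_age, rate_by_age)
except ValueError as error:
    print(error)

by_position = multiply_by_position(pop_by_age, rate_by_age)
print(by_position)  # {90: 50.0, 91: 20.0, 92: 7.5}
```

As in the ignore_labels output above, the positional result keeps the labels of the left operand, so nothing in the result tells you that the rates actually came from ages 93 to 95.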

Boolean Operations

[13]:
pop2 = pop.copy()
pop2['F'] = -pop2['F']
pop2
[13]:
age  sex\nat     BE    FO
 90        M    539    74
 90        F  -1477  -136
 91        M    499    49
 91        F  -1298  -105
 92        M    332    35
 92        F  -1141   -78
 93        M    287    27
 93        F   -906   -74
 94        M    237    23
 94        F   -739   -65
 95        M    154    19
 95        F   -566   -53
[14]:
# testing for equality is done using == (a single = assigns the value)
pop == pop2
[14]:
age  sex\nat     BE     FO
 90        M   True   True
 90        F  False  False
 91        M   True   True
 91        F  False  False
 92        M   True   True
 92        F  False  False
 93        M   True   True
 93        F  False  False
 94        M   True   True
 94        F  False  False
 95        M   True   True
 95        F  False  False
[15]:
# testing for inequality
pop != pop2
[15]:
age  sex\nat     BE     FO
 90        M  False  False
 90        F   True   True
 91        M  False  False
 91        F   True   True
 92        M  False  False
 92        F   True   True
 93        M  False  False
 93        F   True   True
 94        M  False  False
 94        F   True   True
 95        M  False  False
 95        F   True   True
[16]:
# what was our original array like again?
pop
[16]:
age  sex\nat    BE   FO
 90        M   539   74
 90        F  1477  136
 91        M   499   49
 91        F  1298  105
 92        M   332   35
 92        F  1141   78
 93        M   287   27
 93        F   906   74
 94        M   237   23
 94        F   739   65
 95        M   154   19
 95        F   566   53
[17]:
# & means (boolean array) and
(pop >= 500) & (pop <= 1000)
[17]:
age  sex\nat     BE     FO
 90        M   True  False
 90        F  False  False
 91        M  False  False
 91        F  False  False
 92        M  False  False
 92        F  False  False
 93        M  False  False
 93        F   True  False
 94        M  False  False
 94        F   True  False
 95        M  False  False
 95        F   True  False
[18]:
# | means (boolean array) or
(pop < 500) | (pop > 1000)
[18]:
age  sex\nat     BE    FO
 90        M  False  True
 90        F   True  True
 91        M   True  True
 91        F   True  True
 92        M   True  True
 92        F   True  True
 93        M   True  True
 93        F  False  True
 94        M   True  True
 94        F  False  True
 95        M   True  True
 95        F  False  True
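
The parentheses around each comparison in the two cells above are required: in Python, & and | bind more tightly than comparison operators such as >= and <=. A minimal sketch with a plain integer (not an array) shows how the result changes when they are left out:

```python
x = 2

with_parentheses = (x >= 5) & (x <= 10)
# Without parentheses, & is evaluated first: x >= (5 & x) <= 10
without_parentheses = x >= 5 & x <= 10

print(with_parentheses)     # False
print(without_parentheses)  # True
```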

Arithmetic operations with missing axes

[19]:
pop.sum('age')
[19]:
sex\nat    BE   FO
      M  2048  227
      F  6127  511
[20]:
# pop has 3 dimensions
pop.info
[20]:
6 x 2 x 2
 age [6]: 90 91 92 93 94 95
 sex [2]: 'M' 'F'
 nat [2]: 'BE' 'FO'
dtype: int64
memory used: 192 bytes
[21]:
# and pop.sum('age') has two
pop.sum('age').info
[21]:
2 x 2
 sex [2]: 'M' 'F'
 nat [2]: 'BE' 'FO'
dtype: int64
memory used: 32 bytes
[22]:
# pop.sum('age') has no 'age' axis, but the division still works
pop / pop.sum('age')
[22]:
age  sex\nat                   BE                   FO
 90        M        0.26318359375  0.32599118942731276
 90        F   0.2410641423208748  0.26614481409001955
 91        M        0.24365234375  0.21585903083700442
 91        F   0.2118491921005386   0.2054794520547945
 92        M          0.162109375  0.15418502202643172
 92        F  0.18622490615309287  0.15264187866927592
 93        M        0.14013671875  0.11894273127753303
 93        F  0.14787008323812634  0.14481409001956946
 94        M        0.11572265625   0.1013215859030837
 94        F  0.12061367716663947  0.12720156555772993
 95        M         0.0751953125  0.08370044052863436
 95        F  0.09237799902072792  0.10371819960861056
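
The division above can be read as "reuse the same total for every label of the missing axis". Here is a plain-Python sketch of that idea, using the 'BE' column of the table shown earlier. The dictionary layout is an assumption made for illustration only and says nothing about how LArray stores or broadcasts data internally.

```python
# Values for nat 'BE' copied from the 6 x 2 x 2 table above,
# stored as {(age, sex): value}; the dict layout is only for illustration.
pop_be = {
    (90, 'M'): 539, (90, 'F'): 1477,
    (91, 'M'): 499, (91, 'F'): 1298,
    (92, 'M'): 332, (92, 'F'): 1141,
    (93, 'M'): 287, (93, 'F'): 906,
    (94, 'M'): 237, (94, 'F'): 739,
    (95, 'M'): 154, (95, 'F'): 566,
}

# Aggregate over 'age': the result is indexed by sex only.
total_by_sex = {}
for (age, sex), value in pop_be.items():
    total_by_sex[sex] = total_by_sex.get(sex, 0) + value

# Divide: the total has no 'age' axis, so the same total is reused for every age.
share = {(age, sex): value / total_by_sex[sex]
         for (age, sex), value in pop_be.items()}

print(total_by_sex)      # {'M': 2048, 'F': 6127}
print(share[(90, 'M')])  # 0.26318359375
```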

Axis order does not matter much (except for output)

You can perform operations between arrays whose axes are in a different order. The result takes the axis order of the left operand.

[23]:
pop
[23]:
age  sex\nat    BE   FO
 90        M   539   74
 90        F  1477  136
 91        M   499   49
 91        F  1298  105
 92        M   332   35
 92        F  1141   78
 93        M   287   27
 93        F   906   74
 94        M   237   23
 94        F   739   65
 95        M   154   19
 95        F   566   53
[24]:
# let us change the order of axes
pop_transposed = pop.T
pop_transposed
[24]:
nat  sex\age    90    91    92   93   94   95
 BE        M   539   499   332  287  237  154
 BE        F  1477  1298  1141  906  739  566
 FO        M    74    49    35   27   23   19
 FO        F   136   105    78   74   65   53
[25]:
# this works even though the two arrays have their axes in a different order
pop_transposed + pop
[25]:
nat  sex\age    90    91    92    93    94    95
 BE        M  1078   998   664   574   474   308
 BE        F  2954  2596  2282  1812  1478  1132
 FO        M   148    98    70    54    46    38
 FO        F   272   210   156   148   130   106
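
One way to picture why the axis order does not affect the values is to think of each cell as being identified by its axis names and labels rather than by its position. The sketch below follows that idea in plain Python. The cell helper and the frozenset keys are hypothetical devices for illustration, not LArray's data model.

```python
# Each cell is keyed by {axis name: label}; frozenset makes the key order-free.
# This layout is a hypothetical illustration, not how LArray stores data.
def cell(**labels):
    return frozenset(labels.items())


# Same data entered in two different axis orders (values from the table above).
pop_age_first = {cell(age=90, sex='M'): 539, cell(age=90, sex='F'): 1477}
pop_sex_first = {cell(sex='M', age=90): 539, cell(sex='F', age=90): 1477}

# Matching on axis names pairs each cell with its counterpart,
# whichever order the axes were written in.
total = {key: pop_sex_first[key] + pop_age_first[key] for key in pop_sex_first}

print(total[cell(age=90, sex='M')])  # 1078
print(total[cell(age=90, sex='F')])  # 2954
```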

Aggregates

Calculate the sum along an axis:

[26]:
pop = load_example_data('demography').pop[2016, 'BruCap']
pop.sum('age')
[26]:
sex\nat      BE      FO
      M  375261  204534
      F  401554  206541

Or calculate the sum along all axes except one by appending _by to the name of the aggregation function:

[27]:
pop[90:95].sum_by('age')
# is equivalent to
pop[90:95].sum('sex', 'nat')
[27]:
age    90    91    92    93    94   95
     2226  1951  1586  1294  1064  792
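
To see what "all axes except one" means cell by cell, the following plain-Python sketch recomputes the first two totals from the 6 x 2 x 2 table shown at the start of this section. The dictionary keyed by (age, sex, nat) is only an illustrative layout, not an LArray structure.

```python
# Values copied from the 6 x 2 x 2 table earlier in this section (ages 90 and 91),
# stored as {(age, sex, nat): value} purely for illustration.
cells = {
    (90, 'M', 'BE'): 539, (90, 'M', 'FO'): 74,
    (90, 'F', 'BE'): 1477, (90, 'F', 'FO'): 136,
    (91, 'M', 'BE'): 499, (91, 'M', 'FO'): 49,
    (91, 'F', 'BE'): 1298, (91, 'F', 'FO'): 105,
}

# "Sum by age": keep the age axis and add up everything else.
sum_by_age = {}
for (age, sex, nat), value in cells.items():
    sum_by_age[age] = sum_by_age.get(age, 0) + value

print(sum_by_age)  # {90: 2226, 91: 1951}
```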

Calculate the sum along one group:

[28]:
teens = pop.age[10:20]

pop.sum(teens)
[28]:
sex\nat     BE     FO
      M  53834  19145
      F  51740  18871

Calculate the sum along two groups:

[29]:
pensioners = pop.age[67:]

# groups from the same axis must be grouped in a tuple
pop.sum((teens, pensioners))
[29]:
  age  sex\nat     BE     FO
10:20        M  53834  19145
10:20        F  51740  18871
  67:        M  44138   9939
  67:        F  70314  13241

Mixing axes and groups in aggregations:

[30]:
pop.sum((teens, pensioners), 'nat')
[30]:
age\sex      M      F
  10:20  72979  70611
    67:  54077  83555

More On Aggregations

There are many other aggregation functions:

  • mean, min, max, median, percentile, var (variance), std (standard deviation)

  • labelofmin, labelofmax (label indirect minimum/maximum – labels where the value is minimum/maximum)

  • indexofmin, indexofmax (positional indirect minimum/maximum – position along axis where the value is minimum/maximum)

  • cumsum, cumprod (cumulative sum, cumulative product)
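
The difference between the "label" and "index" variants, and what a cumulative sum produces, can be sketched on a single axis with the Python standard library. The example reuses the totals by age from the sum_by('age') output above. The variable names are hypothetical, and the code only mimics the meaning of these functions on a plain list; it does not call the LArray methods themselves.

```python
from itertools import accumulate
from statistics import mean, median

# Totals by age copied from the sum_by('age') output above.
ages = [90, 91, 92, 93, 94, 95]
totals = [2226, 1951, 1586, 1294, 1064, 792]

index_of_max = totals.index(max(totals))  # position of the maximum
label_of_max = ages[index_of_max]         # label at that position
index_of_min = totals.index(min(totals))  # position of the minimum
label_of_min = ages[index_of_min]         # label at that position
running_total = list(accumulate(totals))  # cumulative sum

print(index_of_max, label_of_max)  # 0 90
print(index_of_min, label_of_min)  # 5 95
print(running_total)               # [2226, 4177, 5763, 7057, 8121, 8913]
print(mean(totals), median(totals))
```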