Arithmetic Operations And Aggregations¶
Import the LArray library:
[2]:
from larray import *
Check the version of LArray:
[3]:
from larray import __version__
__version__
[3]:
'0.30'
Arithmetic operations¶
Import a subset of the test array pop:
[4]:
# import a 6 x 2 x 2 subset of the 'pop' example array
pop = load_example_data('demography').pop[2016, 'BruCap', 90:95]
pop
[4]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
All the usual arithmetic operations can be applied to an array. The operation is applied to each element individually:
[5]:
# addition
pop + 200
[5]:
age sex\nat BE FO
90 M 739 274
90 F 1677 336
91 M 699 249
91 F 1498 305
92 M 532 235
92 F 1341 278
93 M 487 227
93 F 1106 274
94 M 437 223
94 F 939 265
95 M 354 219
95 F 766 253
[6]:
# multiplication
pop * 2
[6]:
age sex\nat BE FO
90 M 1078 148
90 F 2954 272
91 M 998 98
91 F 2596 210
92 M 664 70
92 F 2282 156
93 M 574 54
93 F 1812 148
94 M 474 46
94 F 1478 130
95 M 308 38
95 F 1132 106
[7]:
# ** means raising to the power (squaring in this case)
pop ** 2
[7]:
age sex\nat BE FO
90 M 290521 5476
90 F 2181529 18496
91 M 249001 2401
91 F 1684804 11025
92 M 110224 1225
92 F 1301881 6084
93 M 82369 729
93 F 820836 5476
94 M 56169 529
94 F 546121 4225
95 M 23716 361
95 F 320356 2809
[8]:
# % means modulo (aka remainder of division)
pop % 10
[8]:
age sex\nat BE FO
90 M 9 4
90 F 7 6
91 M 9 9
91 F 8 5
92 M 2 5
92 F 1 8
93 M 7 7
93 F 6 4
94 M 7 3
94 F 9 5
95 M 4 9
95 F 6 3
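As a rough mental model of the four cells above, the same element-by-element behaviour can be sketched in plain Python. This is only an illustration and not how LArray is implemented; the dict `pop_be_m` is a hypothetical stand-in holding the first three 'BE' / 'M' values of `pop`.

```python
# Plain-Python sketch, not LArray's implementation.
# Belgian men aged 90 to 92, copied from the 'pop' output above.
pop_be_m = {90: 539, 91: 499, 92: 332}

# Each operation is applied to every value; the labels are left untouched.
plus_200 = {age: count + 200 for age, count in pop_be_m.items()}
squared = {age: count ** 2 for age, count in pop_be_m.items()}
last_digit = {age: count % 10 for age, count in pop_be_m.items()}

print(plus_200)    # {90: 739, 91: 699, 92: 532}
print(squared)     # {90: 290521, 91: 249001, 92: 110224}
print(last_digit)  # {90: 9, 91: 9, 92: 2}
```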
More interestingly, arithmetic operations also work between two arrays:
[9]:
# load the corresponding mortality array (same year, region and ages)
mortality = load_example_data('demography').qx[2016, 'BruCap', 90:95]
# compute number of deaths
death = pop * mortality
death
[9]:
age sex\nat BE FO
90 M 94.00000000000001 13.000000000000004
90 F 204.00000000000003 19.000000000000004
91 M 95.0 9.0
91 F 200.00000000000006 16.0
92 M 70.0 7.0
92 F 195.00000000000006 13.000000000000004
93 M 66.00000000000001 6.0
93 F 171.99999999999997 14.0
94 M 59.0 6.0
94 F 155.00000000000003 14.0
95 M 41.0 5.0
95 F 130.0 12.000000000000004
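Note: Be careful when mixing different data types. You can use the astype method to change the data type of an array. Converting floats to int drops the fractional part rather than rounding: in the output below, 171.99999999999997 (age 93, sex F) becomes 171.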
[10]:
# to be sure to get number of deaths as integers
# one can use .astype() method
death = (pop * mortality).astype(int)
death
[10]:
age sex\nat BE FO
90 M 94 13
90 F 204 19
91 M 95 9
91 F 200 16
92 M 70 7
92 F 195 13
93 M 66 6
93 F 171 14
94 M 59 6
94 F 155 14
95 M 41 5
95 F 130 12
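The conversion applied above can be reproduced with plain Python floats. The sketch assumes that `astype(int)` discards the fractional part the way Python's `int()` does, which is consistent with the cell for age 93, sex F; it does not use LArray itself.

```python
# Plain-Python illustration (no LArray needed): truncation versus rounding.
raw_deaths = 171.99999999999997   # value shown for age 93, sex F before the cast

truncated = int(raw_deaths)       # drops the fractional part
rounded = int(round(raw_deaths))  # rounds to the nearest integer first

print(truncated)  # 171
print(rounded)    # 172
```

When the nearest whole number is wanted, rounding before the conversion avoids this off-by-one effect.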
Warning: Operations between two arrays only work when they have compatible axes (i.e. the same labels). This check can be overridden with ignore_labels, at your own risk: in that case only the position along the axis is used, not the labels.
[11]:
pop[90:92] * mortality[93:95]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-3e6b95e7cc66> in <module>
----> 1 pop[90:92] * mortality[93:95]
~/checkouts/readthedocs.org/user_builds/larray/conda/0.30/lib/python3.6/site-packages/larray-0.30-py3.6.egg/larray/core/array.py in opmethod(self, other)
5439 if isinstance(other, LArray):
5440 # TODO: first test if it is not already broadcastable
-> 5441 (self, other), res_axes = make_numpy_broadcastable([self, other])
5442 other = other.data
5443 return LArray(super_method(self.data, other), res_axes)
~/checkouts/readthedocs.org/user_builds/larray/conda/0.30/lib/python3.6/site-packages/larray-0.30-py3.6.egg/larray/core/array.py in make_numpy_broadcastable(values, min_axes)
9350 Axis.iscompatible : tests if axes are compatible between them.
9351 """
-> 9352 all_axes = AxisCollection.union(*[get_axes(v) for v in values])
9353 if min_axes is not None:
9354 if not isinstance(min_axes, AxisCollection):
~/checkouts/readthedocs.org/user_builds/larray/conda/0.30/lib/python3.6/site-packages/larray-0.30-py3.6.egg/larray/core/axis.py in union(self, *args, **kwargs)
1702 if not isinstance(a, AxisCollection):
1703 a = AxisCollection(a)
-> 1704 result.extend(a, validate=validate, replace_wildcards=replace_wildcards)
1705 return result
1706 __or__ = union
~/checkouts/readthedocs.org/user_builds/larray/conda/0.30/lib/python3.6/site-packages/larray-0.30-py3.6.egg/larray/core/axis.py in extend(self, axes, validate, replace_wildcards)
2047 # check that common axes are the same
2048 if validate and not old_axis.iscompatible(axis):
-> 2049 raise ValueError("incompatible axes:\n%r\nvs\n%r" % (axis, old_axis))
2050 if replace_wildcards and old_axis.iswildcard:
2051 self[old_axis] = axis
ValueError: incompatible axes:
Axis([93, 94, 95], 'age')
vs
Axis([90, 91, 92], 'age')
[12]:
pop[90:92] * mortality[93:95].ignore_labels('age')
[12]:
age sex\nat BE FO
90 M 123.95121951219514 16.444444444444443
90 F 280.401766004415 25.72972972972973
91 M 124.22362869198312 12.782608695652174
91 F 272.24627875507446 22.615384615384617
92 M 88.38961038961038 9.210526315789473
92 F 262.06713780918733 17.66037735849057
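The difference between matching by label and matching by position can be sketched in plain Python. This is an illustration under stated assumptions, not LArray's implementation: the two helper functions are hypothetical, each "array" is a dict from age label to value, and the rates are made up.

```python
# Plain-Python sketch, not LArray's implementation: each "array" is a dict
# mapping an age label to a value.
pop_by_age = {90: 539, 91: 499, 92: 332}
rate_by_age = {93: 0.25, 94: 0.5, 95: 0.75}   # made-up rates for illustration


def multiply_by_label(left, right):
    """Multiply values that share the same label; refuse mismatched labels."""
    if list(left) != list(right):
        raise ValueError(f"incompatible labels: {list(left)} vs {list(right)}")
    return {label: left[label] * right[label] for label in left}


def multiply_by_position(left, right):
    """Multiply the i-th value of left by the i-th value of right, keeping left's labels."""
    if len(left) != len(right):
        raise ValueError("both inputs must have the same length")
    return {label: value * other
            for (label, value), other in zip(left.items(), right.values())}


try:
    multiply_by_label(pop_by_age, rate_by_age)
except ValueError as error:
    print(error)  # incompatible labels: [90, 91, 92] vs [93, 94, 95]

print(multiply_by_position(pop_by_age, rate_by_age))
# {90: 134.75, 91: 249.5, 92: 249.0}
```

The positional variant silently pairs age 90 with the rate for age 93, which is exactly why ignoring labels should be a deliberate choice.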
Boolean Operations¶
[13]:
pop2 = pop.copy()
pop2['F'] = -pop2['F']
pop2
[13]:
age sex\nat BE FO
90 M 539 74
90 F -1477 -136
91 M 499 49
91 F -1298 -105
92 M 332 35
92 F -1141 -78
93 M 287 27
93 F -906 -74
94 M 237 23
94 F -739 -65
95 M 154 19
95 F -566 -53
[14]:
# testing for equality is done using == (a single = assigns the value)
pop == pop2
[14]:
age sex\nat BE FO
90 M True True
90 F False False
91 M True True
91 F False False
92 M True True
92 F False False
93 M True True
93 F False False
94 M True True
94 F False False
95 M True True
95 F False False
[15]:
# testing for inequality
pop != pop2
[15]:
age sex\nat BE FO
90 M False False
90 F True True
91 M False False
91 F True True
92 M False False
92 F True True
93 M False False
93 F True True
94 M False False
94 F True True
95 M False False
95 F True True
[16]:
# what was our original array like again?
pop
[16]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
[17]:
# & means (boolean array) and
(pop >= 500) & (pop <= 1000)
[17]:
age sex\nat BE FO
90 M True False
90 F False False
91 M False False
91 F False False
92 M False False
92 F False False
93 M False False
93 F True False
94 M False False
94 F True False
95 M False False
95 F True False
[18]:
# | means (boolean array) or
(pop < 500) | (pop > 1000)
[18]:
age sex\nat BE FO
90 M False True
90 F True True
91 M True True
91 F True True
92 M True True
92 F True True
93 M True True
93 F False True
94 M True True
94 F False True
95 M True True
95 F False True
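The comparisons in the two cells above are wrapped in parentheses because, in Python, & and | bind more tightly than comparison operators. A small sketch with a plain integer (not an array) shows how the result changes when the parentheses are left out:

```python
# Plain integers are enough to show the precedence issue.
x = 4

with_parentheses = (x >= 2) & (x <= 3)   # True & False
without_parentheses = x >= 2 & x <= 3    # parsed as x >= (2 & x) <= 3

print(with_parentheses)     # False
print(without_parentheses)  # True
```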
Arithmetic operations with missing axes¶
[19]:
pop.sum('age')
[19]:
sex\nat BE FO
M 2048 227
F 6127 511
[20]:
# pop has 3 dimensions
pop.info
[20]:
6 x 2 x 2
age [6]: 90 91 92 93 94 95
sex [2]: 'M' 'F'
nat [2]: 'BE' 'FO'
dtype: int64
memory used: 192 bytes
[21]:
# and pop.sum('age') has two
pop.sum('age').info
[21]:
2 x 2
sex [2]: 'M' 'F'
nat [2]: 'BE' 'FO'
dtype: int64
memory used: 32 bytes
[22]:
# operations between arrays are allowed even when one of them lacks an axis, so this works
pop / pop.sum('age')
[22]:
age sex\nat BE FO
90 M 0.26318359375 0.32599118942731276
90 F 0.2410641423208748 0.26614481409001955
91 M 0.24365234375 0.21585903083700442
91 F 0.2118491921005386 0.2054794520547945
92 M 0.162109375 0.15418502202643172
92 F 0.18622490615309287 0.15264187866927592
93 M 0.14013671875 0.11894273127753303
93 F 0.14787008323812634 0.14481409001956946
94 M 0.11572265625 0.1013215859030837
94 F 0.12061367716663947 0.12720156555772993
95 M 0.0751953125 0.08370044052863436
95 F 0.09237799902072792 0.10371819960861056
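What happens in the division above can be sketched for a single sex and nationality in plain Python. This is an illustration rather than LArray's implementation; `pop_be_m` is a hypothetical dict holding the 'BE' / 'M' values of `pop`.

```python
# Plain-Python sketch, not LArray's implementation.
# Population of Belgian men by age, copied from the 'pop' output above.
pop_be_m = {90: 539, 91: 499, 92: 332, 93: 287, 94: 237, 95: 154}

# Aggregating over 'age' leaves a single value with no age axis.
total_be_m = sum(pop_be_m.values())

# Dividing by it reuses that one total for every age label.
share_by_age = {age: count / total_be_m for age, count in pop_be_m.items()}

print(total_be_m)        # 2048
print(share_by_age[90])  # 0.26318359375
```

The array without the age axis behaves as if its values were repeated along that axis, so each age is divided by the total of its own sex and nationality.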
Axis order does not matter much (except for output)¶
You can perform operations between arrays whose axes are in a different order. The axes of the result follow the order of the left-hand array.
[23]:
pop
[23]:
age sex\nat BE FO
90 M 539 74
90 F 1477 136
91 M 499 49
91 F 1298 105
92 M 332 35
92 F 1141 78
93 M 287 27
93 F 906 74
94 M 237 23
94 F 739 65
95 M 154 19
95 F 566 53
[24]:
# let us change the order of axes
pop_transposed = pop.T
pop_transposed
[24]:
nat sex\age 90 91 92 93 94 95
BE M 539 499 332 287 237 154
BE F 1477 1298 1141 906 739 566
FO M 74 49 35 27 23 19
FO F 136 105 78 74 65 53
[25]:
# axes are matched by name, not by position, so this just works
pop_transposed + pop
[25]:
nat sex\age 90 91 92 93 94 95
BE M 1078 998 664 574 474 308
BE F 2954 2596 2282 1812 1478 1132
FO M 148 98 70 54 46 38
FO F 272 210 156 148 130 106
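Matching axes by name rather than by position can be sketched in plain Python. This is a simplified illustration, not LArray's implementation: an "array" is represented here as a hypothetical pair of axis names and a dict keyed by label tuples, using the age 90 values of `pop`.

```python
# Plain-Python sketch, not LArray's implementation: an "array" is a pair
# (axis names, {label tuple: value}).
left = (("nat", "sex"), {("BE", "M"): 539, ("BE", "F"): 1477,
                         ("FO", "M"): 74, ("FO", "F"): 136})
right = (("sex", "nat"), {("M", "BE"): 539, ("M", "FO"): 74,
                          ("F", "BE"): 1477, ("F", "FO"): 136})


def add_by_axis_name(left, right):
    """Add two arrays with the same axes in any order; the result keeps left's order."""
    left_axes, left_values = left
    right_axes, right_values = right
    if sorted(left_axes) != sorted(right_axes):
        raise ValueError("both arrays must have the same axis names")
    # For each axis of the left array, find its position in the right array.
    positions = [right_axes.index(name) for name in left_axes]
    result = {}
    for right_key, right_value in right_values.items():
        # Reorder the right-hand key so that it follows the left axis order.
        left_key = tuple(right_key[position] for position in positions)
        result[left_key] = left_values[left_key] + right_value
    return left_axes, result


axes, values = add_by_axis_name(left, right)
print(axes)                 # ('nat', 'sex')
print(values[("BE", "M")])  # 1078
```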
Aggregates¶
Calculate the sum along an axis:
[26]:
pop = load_example_data('demography').pop[2016, 'BruCap']
pop.sum('age')
[26]:
sex\nat BE FO
M 375261 204534
F 401554 206541
or along all axes except one by appending _by to the aggregation function:
[27]:
pop[90:95].sum_by('age')
# is equivalent to
pop[90:95].sum('sex', 'nat')
[27]:
age 90 91 92 93 94 95
2226 1951 1586 1294 1064 792
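The equivalence between the two calls above can be sketched in plain Python: summing "by" an axis keeps that axis and collapses all the others. The helper `sum_by` below is a hypothetical stand-in written for this illustration, not LArray's method, and the data covers only ages 90 and 91.

```python
# Plain-Python sketch, not LArray's implementation.
# Values for ages 90 and 91, copied from the 6 x 2 x 2 'pop' subset shown earlier;
# keys are (age, sex, nat).
pop_subset = {
    (90, "M", "BE"): 539, (90, "M", "FO"): 74,
    (90, "F", "BE"): 1477, (90, "F", "FO"): 136,
    (91, "M", "BE"): 499, (91, "M", "FO"): 49,
    (91, "F", "BE"): 1298, (91, "F", "FO"): 105,
}
AXES = ("age", "sex", "nat")


def sum_by(values, kept_axis):
    """Sum over every axis except kept_axis."""
    position = AXES.index(kept_axis)
    totals = {}
    for key, value in values.items():
        label = key[position]
        totals[label] = totals.get(label, 0) + value
    return totals


print(sum_by(pop_subset, "age"))  # {90: 2226, 91: 1951}
```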
Calculate the sum along one group:
[28]:
teens = pop.age[10:20]
pop.sum(teens)
[28]:
sex\nat BE FO
M 53834 19145
F 51740 18871
Calculate the sum along two groups:
[29]:
pensioners = pop.age[67:]
# groups from the same axis must be grouped in a tuple
pop.sum((teens, pensioners))
[29]:
age sex\nat BE FO
10:20 M 53834 19145
10:20 F 51740 18871
67: M 44138 9939
67: F 70314 13241
Mixing axes and groups in aggregations:
[30]:
pop.sum((teens, pensioners), 'nat')
[30]:
age\sex M F
10:20 72979 70611
67: 54077 83555
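Aggregating over groups can be sketched in plain Python as one total per range of labels. Note that label slices include both bounds, as the 90:95 subset with its six ages shows. The helper `sum_group` and the two groups below are hypothetical and chosen only for illustration; this is not LArray's implementation.

```python
# Plain-Python sketch, not LArray's implementation.
# Belgian men by age, copied from the 90:95 subset shown earlier.
pop_be_m = {90: 539, 91: 499, 92: 332, 93: 287, 94: 237, 95: 154}


def sum_group(values, first, last):
    """Sum the values whose label lies between first and last, both included."""
    return sum(value for label, value in values.items() if first <= label <= last)


# Two hypothetical groups on the same axis give one total per group.
groups = {"90:92": (90, 92), "93:95": (93, 95)}
totals = {name: sum_group(pop_be_m, first, last)
          for name, (first, last) in groups.items()}

print(totals)  # {'90:92': 1370, '93:95': 678}
```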
More On Aggregations¶
There are many other aggregation functions:
mean, min, max, median, percentile, var (variance), std (standard deviation)
labelofmin, labelofmax (label indirect minimum/maximum – labels where the value is minimum/maximum)
indexofmin, indexofmax (positional indirect minimum/maximum – position along axis where the value is minimum/maximum)
cumsum, cumprod (cumulative sum, cumulative product)
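To make the distinction between these functions concrete, here is a plain-Python sketch on a single series of values, using only the standard library. It mirrors the idea behind median, indexofmin, labelofmin and cumsum but is not LArray code; the lists `ages` and `counts` are hypothetical stand-ins for the 'BE' / 'M' values of the 90:95 subset.

```python
# Plain-Python sketch using only the standard library, not LArray itself.
from itertools import accumulate
from statistics import median

# Belgian men by age, copied from the 90:95 subset shown earlier.
ages = [90, 91, 92, 93, 94, 95]
counts = [539, 499, 332, 287, 237, 154]

smallest = min(counts)
index_of_min = counts.index(smallest)     # position along the axis (cf. indexofmin)
label_of_min = ages[index_of_min]         # label at that position (cf. labelofmin)
running_total = list(accumulate(counts))  # cumulative sum (cf. cumsum)

print(median(counts))   # 309.5
print(index_of_min)     # 5
print(label_of_min)     # 95
print(running_total)    # [539, 1038, 1370, 1657, 1894, 2048]
```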