Arithmetic Operations And Aggregations

Import the LArray library:

[1]:
from larray import *

Arithmetic operations

[2]:
arr = ndtest((3, 3))
arr
[2]:
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
 a2   6   7   8

All the usual arithmetic operations can be used on an array; each operation is applied to every element individually:

[3]:
# addition
arr + 10
[3]:
a\b  b0  b1  b2
 a0  10  11  12
 a1  13  14  15
 a2  16  17  18
[4]:
# multiplication
arr * 2
[4]:
a\b  b0  b1  b2
 a0   0   2   4
 a1   6   8  10
 a2  12  14  16
[5]:
# 'true' division
arr / 2
[5]:
a\b   b0   b1   b2
 a0  0.0  0.5  1.0
 a1  1.5  2.0  2.5
 a2  3.0  3.5  4.0
[6]:
# 'floor' division
arr // 2
[6]:
a\b  b0  b1  b2
 a0   0   0   1
 a1   1   2   2
 a2   3   3   4

Warning: Python has two different division operators:

  • the ‘true’ division (/) always returns a float.

  • the ‘floor’ division (//) rounds the result down to the nearest whole number, discarding any fractional part (the result is an integer when both operands are integers).
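
The difference is easiest to see on plain Python numbers, before any array is involved. The snippet below is only an illustration in standard Python (it does not use LArray), and the variable names are made up for the example:

```python
# True division always produces a float, even when the result is whole.
true_result = 6 / 2         # 3.0

# Floor division rounds down to the nearest whole number.
floor_result = 7 // 2       # 3

# "Down" means toward negative infinity, not toward zero.
negative_floor = -7 // 2    # -4

# With a float operand the result is still floored, but stays a float.
float_floor = 7.0 // 2      # 3.0
```

The same rules apply element by element when the operators are used on an array.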

[7]:
# % means modulo (aka remainder of division)
arr % 5
[7]:
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   0
 a2   1   2   3
[8]:
# ** means raising to the power
arr ** 3
[8]:
a\b   b0   b1   b2
 a0    0    1    8
 a1   27   64  125
 a2  216  343  512

More interestingly, binary operators such as those above also work between two arrays:

[9]:
# load the 'demography_eurostat' dataset
demo_eurostat = load_example_data('demography_eurostat')

# extract the 'pop' array
pop = demo_eurostat.pop
pop
[10]:
aggregation_matrix = Array([[1, 0, 0], [0, 1, 1]], axes=(Axis('country=Belgium,France+Germany'), pop.country))
aggregation_matrix
[11]:
# @ means matrix product
aggregation_matrix @ pop['Male']

Note: Be careful when mixing different data types. You can use the method astype to change the data type of an array.

[12]:
aggregation_matrix = Array([[1, 0, 0], [0, 0.5, 0.5]], axes=(Axis('country=Belgium,France+Germany/2'), pop.country))
aggregation_matrix
[13]:
aggregation_matrix @ pop['Male']
[14]:
# force the resulting matrix to be an integer matrix
(aggregation_matrix @ pop['Male']).astype(int)
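
To make the role of the data type concrete, here is a plain-Python sketch of the matrix product used above. It does not use LArray: the population figures are invented for the illustration, and `aggregate` is a hypothetical helper, not a library function.

```python
# Invented figures for Belgium, France and Germany (not the Eurostat data).
population = [10, 60, 80]


def aggregate(matrix, vector):
    """Matrix-vector product: each output value is a weighted sum of `vector`."""
    return [sum(weight * value for weight, value in zip(row, vector))
            for row in matrix]


# Integer weights: Belgium alone, then France + Germany.
sums = aggregate([[1, 0, 0], [0, 1, 1]], population)

# Float weights: Belgium alone, then the mean of France and Germany.
means = aggregate([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]], population)

# Converting back to integers truncates any fractional part.
means_as_int = [int(value) for value in means]

print(sums)          # [10, 140]
print(means)         # [10.0, 70.0]
print(means_as_int)  # [10, 70]
```

Integer weights keep the result integer, float weights turn every value into a float, and the explicit conversion plays the role of astype(int).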

Axis order does not matter much (except for output)

You can perform operations between arrays whose axes are in a different order. The axes of the result follow the order of the left-hand array.

[15]:
# extract the 'births' array
births = demo_eurostat.births

# let's change the order of axes of the 'births' array
births_transposed = births.transpose()
births_transposed
[16]:
# LArray does not depend on the order of axes when performing
# arithmetic operations between arrays
pop + births_transposed
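
The idea that cells are matched by axis rather than by position can be sketched in plain Python. This is not how LArray is implemented; the dictionaries and their values are invented purely to show the principle:

```python
# Invented values. `pop` is keyed by (country, gender) ...
pop = {("Belgium", "Male"): 5, ("Belgium", "Female"): 6,
       ("France", "Male"): 32, ("France", "Female"): 34}

# ... while `births_transposed` is keyed by (gender, country).
births_transposed = {("Male", "Belgium"): 1, ("Male", "France"): 4,
                     ("Female", "Belgium"): 2, ("Female", "France"): 3}

# Match cells by what the labels mean, not by where the axis sits.
# The result keeps the key order of the left operand: (country, gender).
total = {(country, gender): value + births_transposed[(gender, country)]
         for (country, gender), value in pop.items()}
```

Every key of `total` is a (country, gender) pair, mirroring the rule that the result follows the axis order of the left operand.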

Axes must be compatible

Arithmetic operations between two arrays only work when they have compatible axes (i.e. same labels).

[17]:
# the 'pop' and 'births' have compatible axes
pop + births
[18]:
# Now, let's replace the country names with the country codes
births_codes = births.set_labels('country', ['BE', 'FR', 'DE'])
births_codes
[19]:
# arithmetic operations between arrays
# having incompatible axes raise an error
try:
    pop + births_codes
except Exception as e:
    print(type(e).__name__, e)

Warning: Operations between two arrays only work when they have compatible axes (i.e. same labels), but this behavior can be overridden via the ignore_labels method. In that case only the position on the axis is used, not the labels. Use this method at your own risk.

[20]:
# use the .ignore_labels() method on axis 'country'
# to avoid the incompatible axes error (risky)
pop + births_codes.ignore_labels('country')
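
Why ignoring labels is risky can be shown with a small plain-Python sketch. It does not use LArray: `add_checked` and `add_by_position` are hypothetical helpers, and all values are invented. The point is that position-only matching gives a wrong result without any error when the two label lists are ordered differently.

```python
def add_checked(labels_a, values_a, labels_b, values_b):
    """Add position by position, but only if both label lists match exactly."""
    if list(labels_a) != list(labels_b):
        raise ValueError(f"incompatible labels: {labels_a} vs {labels_b}")
    return [a + b for a, b in zip(values_a, values_b)]


def add_by_position(values_a, values_b):
    """Add position by position without looking at labels at all."""
    return [a + b for a, b in zip(values_a, values_b)]


names = ["Belgium", "France", "Germany"]
codes = ["BE", "DE", "FR"]   # DE and FR are swapped relative to `names`
pop = [11, 66, 81]           # invented values, ordered like `names`
births = [2, 1, 3]           # invented values, ordered like `codes`

try:
    add_checked(names, pop, codes, births)
    error = None
except ValueError as exc:
    error = str(exc)         # the label check refuses to guess

# No error here, but France silently receives Germany's births.
silently_wrong = add_by_position(pop, births)
```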

Extra Or Missing Axes (Broadcasting)

The condition that axes must be compatible only applies to common axes. Arithmetic operations between two arrays can be performed even if the second array has extra or missing axes compared to the first one:

[21]:
# let's define a 'multiplicator' vector with
# one value defined for each gender
multiplicator = Array([-1, 1], axes=pop.gender)
multiplicator
[22]:
# the multiplication below has been propagated to the
# 'country' and 'time' axes.
# This behavior is called broadcasting
pop * multiplicator
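
Broadcasting can be pictured as reusing the smaller array for every combination of the axes it lacks. The following plain-Python sketch illustrates this with invented values; it is not LArray code:

```python
# Invented values keyed by (country, gender).
pop = {("Belgium", "Male"): 5, ("Belgium", "Female"): 6,
       ("France", "Male"): 32, ("France", "Female"): 34}

# One factor per gender; this mapping has no 'country' axis.
multiplicator = {"Male": -1, "Female": 1}

# The gender factor is reused for every country.
signed = {(country, gender): value * multiplicator[gender]
          for (country, gender), value in pop.items()}
```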

Boolean Operations

Python comparison operators are:

Operator  Meaning
==        equal
!=        not equal
>         greater than
>=        greater than or equal
<         less than
<=        less than or equal

Applying a comparison operator on an array returns a boolean array:

[23]:
# test which values are greater than 10 million
pop > 10e6

Comparison operations can be combined using Python bitwise operators:

Operator  Meaning
&         and
|         or
~         not

[24]:
# test which values are greater than 10 million and less than 40 million
(pop > 10e6) & (pop < 40e6)
[25]:
# test which values are less than 10 million or greater than 40 million
(pop < 10e6) | (pop > 40e6)
[26]:
# test which values are not less than 10 million
~(pop < 10e6)
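
The bitwise operators are used because Python lets a class define what &, | and ~ do, whereas `and`, `or` and `not` cannot be redefined to work element by element. Also note that & and | bind more tightly than comparison operators, which is why each comparison is wrapped in parentheses. The sketch below shows the mechanism with a hypothetical `BoolVector` class and invented figures; it is not LArray's implementation.

```python
class BoolVector:
    """Tiny stand-in for a boolean array (hypothetical, for illustration only)."""

    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def __and__(self, other):   # called for: self & other
        return BoolVector(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):    # called for: self | other
        return BoolVector(a or b for a, b in zip(self.values, other.values))

    def __invert__(self):       # called for: ~self
        return BoolVector(not a for a in self.values)


values = [5e6, 20e6, 60e6]      # invented figures
above_10m = BoolVector(v > 10e6 for v in values)
below_40m = BoolVector(v < 40e6 for v in values)

in_range = (above_10m & below_40m).values         # [False, True, False]
out_of_range = (~above_10m | ~below_40m).values   # [True, False, True]
```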

The returned boolean array can then be used in selections and assignments:

[27]:
pop_copy = pop.copy()

# set all values greater than 40 million to 40 million
pop_copy[pop_copy > 40e6] = 40e6
pop_copy
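
The same copy, mask and assign pattern can be written out step by step in plain Python. The figures are invented and the loop only illustrates what the boolean assignment does conceptually:

```python
values = [8e6, 45e6, 70e6]          # invented figures
capped = list(values)               # work on a copy; the original stays untouched

# One boolean per value: True where the value exceeds the cap.
mask = [v > 40e6 for v in capped]

# Overwrite only the positions where the mask is True.
for i, flagged in enumerate(mask):
    if flagged:
        capped[i] = 40e6
```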

Boolean operations can be made between arrays:

[28]:
# test where the two arrays have the same values
pop == pop_copy

To test whether all values of two arrays are equal, use the equals method:

[29]:
pop.equals(pop_copy)
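
The distinction between the two kinds of comparison, one answer per element versus a single overall answer, can be sketched in plain Python with invented lists:

```python
a = [1, 2, 3]
b = [1, 2, 4]

# Element-wise comparison: one boolean per position.
elementwise = [x == y for x, y in zip(a, b)]

# Overall comparison: a single boolean, True only if every element matches.
all_equal = len(a) == len(b) and all(elementwise)
```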

Aggregates

The LArray library provides many aggregation functions. The list is given in the Aggregation Functions subsection of the API Reference page.

Aggregation operations can be performed on axes or groups. Axes and groups can be mixed.

The main rules are:

  • Axes are separated by commas ,

  • Groups belonging to the same axis are grouped inside parentheses ()

Calculate the sum along an axis:

[30]:
pop.sum('gender')

or several axes (axes are separated by commas ,):

[31]:
pop.sum('country', 'gender')

Calculate the sum along all axes except one by appending _by to the aggregation function:

[32]:
pop.sum_by('time')
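
What summing along axes means, and how the _by variant relates to it, can be sketched in plain Python. The data are invented, and `sum_over` and `sum_by` are hypothetical helpers written for this illustration; they only mimic the behavior described above.

```python
from collections import defaultdict

AXES = ("country", "gender", "time")

# Invented values keyed by (country, gender, time).
pop = {
    ("Belgium", "Male", 2013): 5, ("Belgium", "Female", 2013): 6,
    ("Belgium", "Male", 2014): 7, ("Belgium", "Female", 2014): 8,
    ("France", "Male", 2013): 30, ("France", "Female", 2013): 32,
    ("France", "Male", 2014): 31, ("France", "Female", 2014): 33,
}


def sum_over(data, *axes_to_drop):
    """Sum over the named axes; the remaining axes index the result."""
    keep = [i for i, name in enumerate(AXES) if name not in axes_to_drop]
    result = defaultdict(int)
    for key, value in data.items():
        result[tuple(key[i] for i in keep)] += value
    return dict(result)


def sum_by(data, *axes_to_keep):
    """Sum over every axis except the named ones."""
    return sum_over(data, *(name for name in AXES if name not in axes_to_keep))


by_country_time = sum_over(pop, "gender")        # 'gender' disappears
by_time = sum_over(pop, "country", "gender")     # only 'time' remains
same_by_time = sum_by(pop, "time")               # identical to by_time
```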

Calculate the sum along groups (the groups belonging to the same axis must be grouped inside parentheses ()):

[33]:
even_years = pop.time[2014::2] >> 'even_years'
odd_years = pop.time[2013::2] >> 'odd_years'

pop.sum((odd_years, even_years))

Mixing axes and groups in aggregations:

[34]:
pop.sum('gender', (odd_years, even_years))
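
A group aggregation produces one value per group instead of one value per label. The plain-Python sketch below mimics the odd/even split; the year labels and totals are assumptions made for the example, and `every_second_from` is a hypothetical helper that emulates a label-based slice with a step of 2.

```python
years = [2013, 2014, 2015, 2016, 2017]                         # assumed labels
totals = {2013: 10, 2014: 20, 2015: 30, 2016: 40, 2017: 50}    # invented values


def every_second_from(labels, start):
    """Take every second label, beginning at the label `start`."""
    return labels[labels.index(start)::2]


odd_years = every_second_from(years, 2013)     # [2013, 2015, 2017]
even_years = every_second_from(years, 2014)    # [2014, 2016]

# One aggregated value per group, named after the group.
group_sums = {
    "odd_years": sum(totals[y] for y in odd_years),
    "even_years": sum(totals[y] for y in even_years),
}
```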