{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Arithmetic Operations\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the LArray library:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from larray import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the `population` array from the `demography_eurostat` dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load the 'demography_eurostat' dataset\n", "demography_eurostat = load_example_data('demography_eurostat')\n", "\n", "# extract the 'country', 'gender' and 'time' axes\n", "country = demography_eurostat.country\n", "gender = demography_eurostat.gender\n", "time = demography_eurostat.time\n", "\n", "# extract the 'population' array\n", "population = demography_eurostat.population\n", "\n", "# show the 'population' array\n", "population" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basics\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can do all usual arithmetic operations on an array, it will apply the operation to all elements individually\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 'true' division\n", "population_in_millions = population / 1_000_000\n", "population_in_millions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 'floor' division\n", "population_in_millions = population // 1_000_000\n", "population_in_millions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "**Warning:** Python has two different division operators: \n", "\n", "- the 'true' division (/) always returns a float.\n", "- the 'floor' division (//) returns an integer result (discarding any fractional result).\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# % means modulo (aka remainder of division)\n", "population % 1_000_000" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ** means raising to the power\n", "print(ndtest(4))\n", "ndtest(4) ** 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More interestingly, binary operators as above also works between two arrays.\n", "\n", "Let us imagine a rate of population growth which is constant over time but different by gender and country:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "growth_rate = Array(data=[[1.011, 1.010], [1.013, 1.011], [1.010, 1.009]], axes=[country, gender])\n", "growth_rate" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# we store the population of the year 2017 in a new variable\n", "population_2017 = population[2017]\n", "population_2017" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# perform an arithmetic operation between two arrays\n", "population_2018 = population_2017 * growth_rate\n", "population_2018" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "**Note:** Be careful when mixing different data types.\n", "You can use the method [astype](../_generated/larray.Array.astype.rst#larray.Array.astype) to change the data type of an array.\n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# force the resulting matrix to be an integer matrix\n", "population_2018 = (population_2017 * growth_rate).astype(int)\n", "population_2018" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Axis order does not matter much (except for output)\n", "\n", "You can do operations between arrays having different axes order.\n", "The axis order of the result is the same as the left array\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# let's change the order of axes of the 'constant_growth_rate' array\n", "transposed_growth_rate = growth_rate.transpose()\n", "\n", "# look at the order of the new 'transposed_growth_rate' array:\n", "# 'gender' is the first axis while 'country' is the second\n", "transposed_growth_rate" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# look at the order of the 'population_2017' array:\n", "# 'country' is the first axis while 'gender' is the second\n", "population_2017" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# LArray doesn't care of axes order when performing \n", "# arithmetic operations between arrays\n", "population_2018 = population_2017 * transposed_growth_rate\n", "population_2018" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Axes must be compatible\n", "\n", "Arithmetic operations between two arrays only works when they have compatible axes (i.e. same list of labels in the same order)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# show 'population_2017'\n", "population_2017" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Order of labels matters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# let us imagine that the labels of the 'country' axis \n", "# of the 'constant_growth_rate' array are in a different order\n", "# than in the 'population_2017' array\n", "reordered_growth_rate = growth_rate.reindex('country', ['Germany', 'Belgium', 'France'])\n", "reordered_growth_rate" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# when doing arithmetic operations, \n", "# the order of labels counts\n", "try:\n", " population_2018 = population_2017 * reordered_growth_rate\n", "except Exception as e:\n", " print(type(e).__name__, e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### No extra or missing labels are permitted" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# let us imagine that the 'country' axis of \n", "# the 'constant_growth_rate' array has an extra \n", "# label 'Netherlands' compared to the same axis of \n", "# the 'population_2017' array\n", "growth_rate_netherlands = Array([1.012, 1.], population.gender)\n", "growth_rate_extra_country = growth_rate.append('country', growth_rate_netherlands, label='Netherlands')\n", "growth_rate_extra_country" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# when doing arithmetic operations, \n", "# no extra or missing labels are permitted \n", "try:\n", " population_2018 = population_2017 * growth_rate_extra_country\n", "except Exception as e:\n", " print(type(e).__name__, e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ignoring labels (risky)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " **Warning:** Operations between two arrays only works when they have compatible axes (i.e. same labels) but this behavior can be override via the [ignore_labels](../_generated/larray.Array.ignore_labels.rst#larray.Array.ignore_labels) method.\n", "In that case only the position on the axis is used and not the labels.\n", "\n", "Using this method is done at your own risk and SHOULD NEVER BEEN USED IN A MODEL. \n", "Use this method only for quick tests or rapid data exploration. \n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# let us imagine that the labels of the 'country' axis \n", "# of the 'constant_growth_rate' array are the \n", "# country codes instead of the country full names\n", "growth_rate_country_codes = growth_rate.set_labels('country', ['BE', 'FR', 'DE'])\n", "growth_rate_country_codes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# use the .ignore_labels() method on axis 'country'\n", "# to avoid the incompatible axes error (risky)\n", "population_2018 = population_2017 * growth_rate_country_codes.ignore_labels('country')\n", "population_2018" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extra Or Missing Axes (Broadcasting)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The condition that axes must be compatible only applies on common axes. \n", "Making arithmetic operations between two arrays having the same axes is intuitive. \n", "However, arithmetic operations between two arrays can be performed even if the second array has extra and/or missing axes compared to the first one. Such mechanism is called ``broadcasting``. It allows to make a lot of arithmetic operations without using any loop. This is a great advantage since using loops in Python can be highly time consuming (especially nested loops) and should be avoided as much as possible. \n", "\n", "To understand how broadcasting works, let us start with a simple example. \n", "We assume we have the population of both men and women cumulated for each country:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population_by_country = population_2017['Male'] + population_2017['Female']\n", "population_by_country" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also assume we have the proportion of each gender in the population and that proportion is supposed to be the same for all countries:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gender_proportion = Array([0.49, 0.51], gender)\n", "gender_proportion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the two 1D arrays above, we can naively compute the population by country and gender as follow: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define a new variable with both 'country' and 'gender' axes to store the result\n", "population_by_country_and_gender = zeros([country, gender], dtype=int)\n", "\n", "# loop over the 'country' and 'gender' axes \n", "for c in country:\n", " for g in gender:\n", " population_by_country_and_gender[c, g] = population_by_country[c] * gender_proportion[g]\n", "\n", "# display the result\n", "population_by_country_and_gender" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Relying on the ``broadcasting`` mechanism, the calculation above becomes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the outer product is done automatically.\n", "# No need to use any loop -> saves a lot of computation time\n", "population_by_country_and_gender = population_by_country * gender_proportion\n", "\n", "# display the result\n", "population_by_country_and_gender.astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the calculation above, ``LArray`` automatically creates a resulting array with axes given by the union of the axes of the two arrays involved in the arithmetic operation.\n", "\n", "Let us do the same calculation but we add a common `time` axis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population_by_country_and_year = population['Male'] + population['Female']\n", "population_by_country_and_year" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gender_proportion_by_year = Array([[0.49, 0.485, 0.495, 0.492, 0.498], \n", " [0.51, 0.515, 0.505, 0.508, 0.502]], [gender, time])\n", "gender_proportion_by_year" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without the ``broadcasting`` mechanism, the computation of the population by country, gender and year would have been:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define a new variable to store the result.\n", "# Its axes is the union of the axes of the two arrays \n", "# involved in the arithmetic operation\n", "population_by_country_gender_year = zeros([country, gender, time], dtype=int)\n", "\n", "# loop over axes which are not present in both arrays\n", "# involved in the arithmetic operation\n", "for c in country:\n", " for g in gender:\n", " # all subsets below have the same 'time' axis\n", " population_by_country_gender_year[c, g] = population_by_country_and_year[c] * gender_proportion_by_year[g]\n", " \n", "population_by_country_gender_year" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again, the above calculation can be simplified as:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# No need to use any loop -> saves a lot of computation time\n", "population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year\n", "\n", "# display the result\n", "population_by_country_gender_year.astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " **Warning:** Broadcasting is a powerful mechanism but can be confusing at first. It can lead to unexpected results. \n", " In particular, if axes which are supposed to be common are not, you will get a resulting array with extra axes you didn't want. \n", "
\n", "\n", "For example, imagine that the name of the `time` axis is `time` for the first array but `period` for the second:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gender_proportion_by_year = gender_proportion_by_year.rename('time', 'period')\n", "gender_proportion_by_year" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population_by_country_and_year" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the two arrays below have a \"time\" axis with two different names: 'time' and 'period'.\n", "# LArray will treat the \"time\" axis of the two arrays as two different \"time\" axes\n", "population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year\n", "\n", "# as a consequence, the result of the multiplication of the two arrays is not what we expected\n", "population_by_country_gender_year.astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Boolean Operations\n", "\n", "Python comparison operators are: \n", "\n", "| Operator | Meaning |\n", "|-----------|-------------------------|\n", "|``==`` | equal | \n", "|``!=`` | not equal | \n", "|``>`` | greater than | \n", "|``>=`` | greater than or equal | \n", "|``<`` | less than | \n", "|``<=`` | less than or equal |\n", "\n", "Applying a comparison operator on an array returns a boolean array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test which values are greater than 10 millions\n", "population > 10e6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparison operations can be combined using Python bitwise operators:\n", "\n", "| Operator | Meaning |\n", "|----------|------------------------------------- |\n", "| & | and |\n", "| \\| | or |\n", "| ~ | not |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test which values are greater than 10 millions and less than 40 millions\n", "(population > 10e6) & (population < 40e6)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test which values are less than 10 millions or greater than 40 millions\n", "(population < 10e6) | (population > 40e6)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test which values are not less than 10 millions\n", "~(population < 10e6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned boolean array can then be used in selections and assignments:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population_copy = population.copy()\n", "\n", "# set all values greater than 40 millions to 40 millions\n", "population_copy[population_copy > 40e6] = 40e6\n", "population_copy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Boolean operations can be made between arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test where the two arrays have the same values\n", "population == population_copy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To test if all values between are equals, use the [equals](../_generated/larray.Array.equals.rst#larray.Array.equals) method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population.equals(population_copy)" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "livereveal": { "autolaunch": false, "scroll": true } }, "nbformat": 4, "nbformat_minor": 2 }