{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Indexing, Selecting and Assigning\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import the LArray library:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from larray import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import the test array ``population``:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# let's start with\n",
    "population = load_example_data('demography_eurostat').population\n",
    "population"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Selecting (Subsets)\n",
    "\n",
    "The ``Array`` class allows to select a subset either by labels or indices (positions)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting by Labels\n",
    "\n",
    "To take a subset of an array using labels, use brackets [ ].\n",
    "\n",
    "Let's start by selecting a single element:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "population['Belgium', 'Female', 2017]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As long as there is no ambiguity (i.e. axes sharing one or several same label(s)), the order of indexing does not matter. \n",
    "So you usually do not care/have to remember about axes positions during computation. It only matters for output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# order of index doesn't matter\n",
    "population['Female', 2017, 'Belgium']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Selecting a subset is done by using slices or lists of labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "population[['Belgium', 'Germany'], 2014:2016]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Slices bounds are optional:\n",
    "if not given, start is assumed to be the first label and stop is the last one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select all years starting from 2015\n",
    "population[2015:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select all first years until 2015\n",
    "population[:2015]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Slices can also have a step (defaults to 1), to take every Nth labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select all even years starting from 2014\n",
    "population[2014::2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "**Warning:** Selecting by labels as in above examples works well as long as there is no ambiguity.\n",
    "   When two or more axes have common labels, it leads to a crash.\n",
    "   The solution is then to precise to which axis belong the labels.\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "immigration = load_example_data('demography_eurostat').immigration\n",
    "\n",
    "# the 'immigration' array has two axes (country and citizenship) which share the same labels\n",
    "immigration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# LArray doesn't use the position of the labels used inside the brackets \n",
    "# to determine the corresponding axes. Instead LArray will try to guess the \n",
    "# corresponding axis for each label whatever is its position.\n",
    "# Then, if a label is shared by two or more axes, LArray will not be able \n",
    "# to choose between the possible axes and will raise an error.\n",
    "try:\n",
    "    immigration['Belgium', 'Netherlands']\n",
    "except Exception as e:\n",
    "    print(type(e).__name__, ':', e)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the solution is simple. You need to precise the axes on which you make a selection\n",
    "immigration[immigration.country['Belgium'], immigration.citizenship['Netherlands']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ambiguous Cases - Specifying Axes Using The Special Variable X\n",
    "\n",
    "When selecting, assigning or using aggregate functions, an axis can be\n",
    "referred via the special variable ``X``:\n",
    "\n",
    "-  population[X.time[2015:]]\n",
    "-  population.sum(X.time)\n",
    "\n",
    "This gives you access to axes of the array you are manipulating. The main\n",
    "drawback of using ``X`` is that you lose the autocompletion available from\n",
    "many editors. It only works with non-anonymous axes for which names do not contain whitespaces or special characters.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the previous example can also be written as\n",
    "immigration[X.country['Belgium'], X.citizenship['Netherlands']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting by Indices\n",
    "\n",
    "Sometimes it is more practical to use indices (positions) along the axis, instead of labels.\n",
    "You need to add the character ``i`` before the brackets: ``.i[indices]``.\n",
    "As for selection with labels, you can use a single index, a slice or a list of indices.\n",
    "Indices can be also negative (-1 represent the last element of an axis).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "**Note:** Remember that indices (positions) are always **0-based** in Python.\n",
    "So the first element is at index 0, the second is at index 1, etc.\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select the last year\n",
    "population[X.time.i[-1]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# same but for the last 3 years\n",
    "population[X.time.i[-3:]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# using a list of indices\n",
    "population[X.time.i[0, 2, 4]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "**Warning:** The end *indice* (position) is EXCLUSIVE while the end label is INCLUSIVE.\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "year = 2015\n",
    "\n",
    "# with labels\n",
    "population[X.time[:year]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# with indices (i.e. using the .i[indices] syntax)\n",
    "index_year = population.time.index(year)\n",
    "population[X.time.i[:index_year]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use ``.i[]`` selection directly on array instead of axes.\n",
    "In this context, if you want to select a subset of the first and third axes for example, you must use a full slice ``:`` for the second one.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select first country and last three years\n",
    "population.i[0, :, -3:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using Groups In Selections\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "even_years = population.time[2014::2]\n",
    "\n",
    "population[even_years]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Boolean Filtering\n",
    "\n",
    "Boolean filtering can be used to extract subsets. Filtering can be done on axes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select even years\n",
    "population[X.time % 2 == 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select population for the year 2017\n",
    "population_2017 = population[2017]\n",
    "\n",
    "# select all data with a value greater than 30 million\n",
    "population_2017[population_2017 > 30e6]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "**Note:** Be aware that after boolean filtering, several axes may have merged.\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Arrays can also be used to create boolean filters:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "start_year = Array([2015, 2016, 2017], axes=population.country)\n",
    "start_year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "population[X.time >= start_year]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Iterating over an axis\n",
    "\n",
    "Iterating over an axis is straightforward:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for year in population.time:\n",
    "    print(year)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Assigning subsets\n",
    "\n",
    "### Assigning A Value\n",
    "\n",
    "Assigning a value to a subset is simple:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "population[2017] = 0\n",
    "population"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's store a subset in a new variable and modify it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# store the data associated with the year 2016 in a new variable\n",
    "population_2016 = population[2016]\n",
    "population_2016"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# now, we modify the new variable\n",
    "population_2016['Belgium'] = 0\n",
    "\n",
    "# and we can see that the original array has been also modified\n",
    "population"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One very important gotcha though...\n",
    "\n",
    "<div class=\"alert alert-warning\">\n",
    "**Warning:** Storing a subset of an array in a new variable and modifying it after may also impact the original array. The reason is that selecting a contiguous subset of the data does not return a copy of the selected subset, but rather a view on a subset of the array. To avoid such behavior, use the ``.copy()`` method.\n",
    "</div>\n",
    "\n",
    "Remember:\n",
    "\n",
    "-  taking a contiguous subset of an array is extremely fast (no data is copied)\n",
    "-  if one modifies that subset, one also **modifies the original array**\n",
    "-  **.copy()** returns a copy of the subset (takes speed and memory) but\n",
    "   allows you to change the subset without modifying the original array\n",
    "   in the same time\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same warning apply for entire arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# reload the 'population' array\n",
    "population = load_example_data('demography_eurostat').population\n",
    "\n",
    "# create a second 'population2' variable\n",
    "population2 = population\n",
    "population2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set all data corresponding to the year 2017 to 0\n",
    "population2[2017] = 0\n",
    "population2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# and now take a look of what happened to the original array 'population'\n",
    "# after modifying the 'population2' array\n",
    "population"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "**Warning:** The syntax ``new_array = old_array`` does not create a new array but rather an 'alias' variable. To actually create a new array as a copy of a previous one, the ``.copy()`` method must be called.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# reload the 'population' array\n",
    "population = load_example_data('demography_eurostat').population\n",
    "\n",
    "# copy the 'population' array and store the copy in a new variable\n",
    "population2 = population.copy()\n",
    "\n",
    "# modify the copy\n",
    "population2[2017] = 0\n",
    "population2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the data from the original array have not been modified\n",
    "population"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assigning Arrays And Broadcasting\n",
    "\n",
    "Instead of a value, we can also assign an array to a subset. In that\n",
    "case, that array can have less axes than the target but those which are\n",
    "present must be compatible with the subset being targeted.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select population for the year 2015\n",
    "population_2015 = population[2015]\n",
    "\n",
    "# propagate population for the year 2015 to all next years\n",
    "population[2016:] = population_2015\n",
    "\n",
    "population"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "**Warning:** The array being assigned must have compatible axes (i.e. same axes names and same labels) with the target subset.\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# replace 'Male' and 'Female' labels by 'M' and 'F'\n",
    "population_2015 = population_2015.set_labels('gender', 'M,F')\n",
    "population_2015"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# now let's try to repeat the assignement operation above with the new labels.\n",
    "# An error is raised because of incompatible axes\n",
    "try:\n",
    "    population[2016:] = population_2015\n",
    "except Exception as e:\n",
    "    print(type(e).__name__, ':', e)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Edit Metadata",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "livereveal": {
   "autolaunch": false,
   "scroll": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}