{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Started" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The purpose of the present **Getting Started** section is to give a quick overview\n", "of the main objects and features of the LArray library.\n", "To get a more detailed presentation of all capabilities of LArray, read the next sections of the tutorial.\n", "\n", "The [API Reference](../api.rst#api-reference) section of the documentation give you the list of all objects, methods and functions with their individual documentation and examples.\n", "\n", "To use the LArray library, the first thing to do is to import it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "%xmode Minimal" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from larray import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To know the version of the LArray library installed on your machine, type:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from larray import __version__\n", "__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Warning:** \n", "The tutorial is generated from Jupyter notebooks which work in the \"interactive\" mode (like in the LArray Editor console). \n", "In the interactive mode, there is no need to use the print() function to display the content of a variable.\n", "Simply writing its name is enough.\n", "The same remark applies for the returned value of an expression.

\n", "In a Python script (file with .py extension), you always need to use the print() function to display the content of \n", "a variable or the value returned by a function or an expression.\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s = 1 + 2\n", "\n", "# In the interactive mode, there is no need to use the print() function \n", "# to display the content of the variable 's'.\n", "# Simply typing 's' is enough\n", "s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# In the interactive mode, there is no need to use the print() function \n", "# to display the result of an expression\n", "1 + 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create an array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Working with the LArray library mainly consists of manipulating [Array](../api.rst#array) data structures.\n", "They represent N-dimensional labelled arrays and are composed of raw data (NumPy ndarray), axes and optionally some metadata.\n", "\n", "An [Axis](../api.rst#axis) object represents a dimension of an array. \n", "It contains a list of labels and has a name.\n", "They are several ways to create an axis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create an axis using one string\n", "age = Axis('age=0-9,10-17,18-66,67+')\n", "# labels generated using the special syntax start..end\n", "time = Axis('time=2015..2017')\n", "# labels given as a list\n", "gender = Axis(['female', 'male'], 'gender')\n", "\n", "age, gender, time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Warning:** When using the string syntax `\"axis_name=list,of,labels\"` or `\"axis_name=start..end\"`, LArray will automatically infer the type of labels. For example, `age = Axis(\"age=0..100\")` will create an age axis with labels of type `int`.

\n", " Mixing numbers with letters or special characters like `+` will create an axis with labels of type `str` instead of `int`.
\n", " For example, `age = Axis(\"age=0..98,99+\")` will create an age axis with labels of type `str` instead of `int`! \n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The labels allow to select subsets and to manipulate the data without working with the positions\n", "of array elements directly.\n", "\n", "To create an array from scratch, you need to supply data and axes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define some data. This is the belgian population (in thousands). Source: eurostat.\n", "data = [[[633, 635, 634],\n", " [663, 665, 664]],\n", " [[484, 486, 491],\n", " [505, 511, 516]],\n", " [[3572, 3581, 3583],\n", " [3600, 3618, 3616]],\n", " [[1023, 1038, 1053],\n", " [756, 775, 793]]]\n", "\n", "# create an Array object\n", "population = Array(data, axes=[age, gender, time])\n", "population" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can optionally attach some metadata to an array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# attach some metadata to the population array\n", "population.meta.title = 'population by age, gender and year'\n", "population.meta.source = 'Eurostat'\n", "\n", "# display metadata\n", "population.meta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get a short summary of an array, type:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Array summary: metadata + dimensions + description of axes\n", "population.info" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get the axes of an array, type:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population.axes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also possible to extract one axis belonging to an array using its name:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# extract the 'time' axis belonging to the 'population' array\n", "time = population.time\n", "time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create an array filled with predefined values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays filled with predefined values can be generated through [dedicated functions](../api.rst#array-creation-functions):\n", "\n", " - `zeros` : creates an array filled with 0\n", " - `ones` : creates an array filled with 1\n", " - `full` : creates an array filled with a given value\n", " - `sequence` : creates an array by sequentially applying modifications to the array along axis.\n", " - `ndtest` : creates a test array with increasing numbers as data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zeros([age, gender])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ones([age, gender])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "full([age, gender], fill_value=10.0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# With initial=1.0 and inc=0.5, we generate the sequence 1.0, 1.5, 2.0, 2.5, 3.0, ...\n", "sequence(age, initial=1.0, inc=0.5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ndtest([age, gender])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save/Load an array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The LArray library offers many I/O functions to read and write arrays in various formats\n", "(CSV, Excel, HDF5). For example, to save an array in a CSV file, call the method `to_csv`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# save our population array to a CSV file\n", "population.to_csv('population_belgium.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The content of the CSV file is then:\n", "\n", " age,gender\\time,2015,2016,2017\n", " 0-9,female,633,635,634\n", " 0-9,male,663,665,664\n", " 10-17,female,484,486,491\n", " 10-17,male,505,511,516\n", " 18-66,female,3572,3581,3583\n", " 18-66,male,3600,3618,3616\n", " 67+,female,1023,1038,1053\n", " 67+,male,756,775,793 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** In CSV or Excel files, the last dimension is horizontal and the names of the\n", "last two dimensions are separated by a backslash \\.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To load a saved array, call the function `read_csv`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population = read_csv('population_belgium.csv')\n", "population" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other input/output functions are described in the [Input/Output](../api.rst#input-output) section of the API documentation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting a subset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To select an element or a subset of an array, use brackets [ ].\n", "In Python we usually use the term *indexing* for this operation.\n", "\n", "Let us start by selecting a single element:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population['67+', 'female', 2017]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Labels can be given in arbitrary order:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population[2017, 'female', '67+']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When selecting a larger subset the result is an array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population['female']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When selecting several labels for the same axis, they must be given as a list (enclosed by ``[ ]``)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population['female', ['0-9', '10-17']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also select *slices*, which are all labels between two bounds (we usually call them the `start` and `stop`\n", "bounds). Specifying the `start` and `stop` bounds of a slice is optional: when not given, `start` is the first label\n", "of the corresponding axis, `stop` the last one:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# in this case '10-17':'67+' is equivalent to ['10-17', '18-66', '67+']\n", "population['female', '10-17':'67+']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# :'18-66' selects all labels between the first one and '18-66'\n", "# 2017: selects all labels between 2017 and the last one\n", "population[:'18-66', 2017:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** Contrary to slices on normal Python lists, the stop bound is included in the selection.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Warning:** Selecting by labels as above only works as long as there is no ambiguity.\n", "When several axes have some labels in common and you do not specify explicitly\n", "on which axis to work, it fails with an error ending with something like:\n", "`ValueError: is ambiguous (valid in , )`\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, imagine you need to work with an 'immigration' array containing two axes sharing some common labels:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "country = Axis(['Belgium', 'Netherlands', 'Germany'], 'country')\n", "citizenship = Axis(['Belgium', 'Netherlands', 'Germany'], 'citizenship')\n", "\n", "immigration = ndtest((country, citizenship, time))\n", "\n", "immigration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we try to get the number of Belgians living in the Netherlands for the year 2017, we might try something like:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", " immigration['Netherlands', 'Belgium', 2017]\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... but we receive back a volley of insults:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "[some long error message ending with the line below]\n", "[...]\n", "ValueError: Netherlands is ambiguous (valid in country, citizenship)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In that case, we have to specify explicitly which axes the 'Netherlands' and 'Belgium' labels we want to select belong to:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "immigration[country['Netherlands'], citizenship['Belgium'], 2017]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aggregation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The LArray library includes many [aggregations methods](../api.rst#aggregation-functions): sum, mean, min, max, std, var, ...\n", "\n", "For example, assuming we still have an array in the ``population`` variable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can sum along the 'gender' axis using:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population.sum(gender)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or sum along both 'age' and 'gender':" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population.sum(age, gender)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is sometimes more convenient to aggregate along all axes **except** some. In that case, use the aggregation\n", "methods ending with `_by`. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population.sum_by(time)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A [Group](../api.rst#group) object represents a subset of labels or positions of an axis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "children = age['0-9', '10-17']\n", "children" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is often useful to attach them an explicit name using the ``>>`` operator:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "working = age['18-66'] >> 'working'\n", "working" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nonworking = age['0-9', '10-17', '67+'] >> 'nonworking'\n", "nonworking" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Still using the same ``population`` array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Groups can be used in selections:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population[working]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population[nonworking]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or aggregations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population.sum(nonworking)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When aggregating several groups, the names we set above using ``>>`` determines the label on the aggregated axis.\n", "Since we did not give a name for the children group, the resulting label is generated automatically :" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population.sum((children, working, nonworking))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Warning:** Mixing slices and individual labels inside the `[ ]` will generate **several groups** (a tuple of groups) instead of a single group.
\n", " If you want to create a single group using both slices and individual labels, you need to use the `.union()` method (see below). \n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "age_100 = Axis('age=0..100')\n", "\n", "# mixing slices and individual labels leads to the creation of several groups (a tuple of groups)\n", "age_100[0:10, 20, 30, 40]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the union() method allows to mix slices and individual labels to create a single group\n", "age_100[0:10].union(age_100[20, 30, 40])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grouping arrays in a Session" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variables (arrays) may be grouped in [Session](../api.rst#session) objects.\n", "A session is an ordered dict-like container with special I/O methods:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population = zeros([age, gender, time])\n", "births = zeros([age, gender, time])\n", "deaths = zeros([age, gender, time])\n", "\n", "# create a session containing the arrays of the model\n", "demography_session = Session(population=population, births=births, deaths=deaths)\n", "\n", "# get an array (option 1)\n", "demography_session['population']\n", "\n", "# get an array (option 2)\n", "demography_session.births\n", "\n", "# modify an array\n", "demography_session.deaths['male'] = 1\n", "\n", "# add an array\n", "demography_session.foreigners = zeros([age, gender, time])\n", "\n", "# displays names of arrays contained in the session\n", "# (in alphabetical order)\n", "demography_session.names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the main interests of using sessions is to save and load many arrays at once:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# dump all arrays contained in demography_session in one HDF5 file\n", "demography_session.save('demography.h5')\n", "# load all arrays saved in the HDF5 file 'demography.h5' and store them in the 'demography_session' variable\n", "demography_session = Session('demography.h5')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, development tools like PyCharm do not provide *autocomplete* for objects in ``Session`` objects.\n", "\n", "*Autocomplete* is the feature in which development tools try to predict the variable or function a user intends to enter after only a few characters have been typed (like word completion in cell phones).\n", "\n", "Another way to group objects of a model is to use [CheckedSession](../api.rst#checkedsession). \n", "The ``CheckedSession`` provide the same methods than ``Session`` but enable the *autocomplete* feature on objects it contains.\n", "\n", "For more details about ``Session`` and ``CheckedSession``, see the [Working With Sessions](tutorial_sessions.ipynb#Working-With-Sessions) section of the tutorial.\n", "\n", "To get the list of methods belonging to the ``Session`` and ``CheckedSession`` ojects, check the [corresponding section](../api.rst#session) in the API Reference." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Graphical User Interface (Editor)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The LArray project provides an optional package called [larray-editor](../api.rst#editor) allowing users to explore and edit arrays through a graphical interface.\n", "\n", "The `view()` function displays the content of (an) array(s) in a graphical user interface in read-only mode. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For instance, the statement\n", "```python\n", "view(population)\n", "```\n", "will open a new window showing the values and axes of the 'population' array. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The statement\n", "```python\n", "view(demography_session)\n", "```\n", "will show all arrays contained in the 'demography_session'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A session can be directly loaded from a file\n", "```python\n", "view('demography.h5')\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling\n", "```python\n", "view()\n", "```\n", "with no passed argument creates a session with all existing arrays from the current namespace and shows its content." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Notes**: \n", "\n", " - Calling `view()` will block the execution of the rest of code until the graphical user interface is closed!\n", " - The larray-editor tool is automatically available when installing the **larrayenv** metapackage from conda.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To open the user interface in edit mode, call the `edit()` function instead.\n", "\n", "![compare](../_static/editor.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, you can also visually compare two arrays or sessions using the `compare()` function:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", " arr0 = ndtest((3, 3))\n", " arr1 = ndtest((3, 3))\n", " arr1[['a1', 'a2']] = -arr1[['a1', 'a2']]\n", " compare(arr0, arr1)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![compare](../_static/compare.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### For Windows Users" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Installing the ``larray-editor`` package on Windows will create a ``LArray`` menu in the\n", "Windows Start Menu. This menu contains:\n", "\n", " * a shortcut to open the documentation of the last stable version of the library\n", " * a shortcut to open the graphical interface in edit mode.\n", " * a shortcut to update `larrayenv`.\n", "\n", "![menu_windows](../_static/menu_windows.png)\n", "\n", "![editor_new](../_static/editor_new.png)\n", "\n", "Once the graphical interface is open, all LArray objects and functions are directly accessible.\n", "No need to start by `from larray import *`." ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 2 }