Interactive online version: Binder badge

Indexing, Selecting and Assigning

Import the LArray library:

[1]:
from larray import *

Import the test array population:

[2]:
# let's start with
population = load_example_data('demography_eurostat').population
population
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 2
      1 # let's start with
----> 2 population = load_example_data('demography_eurostat').population
      3 population

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/example.py:97, in load_example_data(name)
     95     available_datasets = list(AVAILABLE_EXAMPLE_DATA.keys())
     96     raise ValueError(f"example_data must be chosen from list {available_datasets}")
---> 97 return la.Session(AVAILABLE_EXAMPLE_DATA[name])

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:98, in Session.__init__(self, meta, *args, **kwargs)
     94     elements = {a.name: a for a in args}
     96 if isinstance(elements, (str, Path)):
     97     # assume elements is a filename
---> 98     self.load(elements)
     99     self.update(**kwargs)
    100 else:
    101     # iterable of tuple or dict-like

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:438, in Session.load(self, fname, names, engine, display, **kwargs)
    436 else:
    437     handler = handler_cls(fname)
--> 438 metadata, objects = handler.read(names, display=display, **kwargs)
    439 self._update_from_iterable(objects.items())
    440 self.meta = metadata

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/common.py:139, in FileHandler.read(self, keys, display, ignore_exceptions, *args, **kwargs)
    114 def read(self, keys, *args, display=False, ignore_exceptions=False, **kwargs) -> Tuple[Metadata, dict]:
    115     r"""
    116     Read file content (HDF, Excel, CSV, ...) and returns a dictionary containing loaded objects.
    117
   (...)
    137         Dictionary containing the loaded objects.
    138     """
--> 139     self._open_for_read()
    140     metadata = self._read_metadata()
    141     item_types = self.item_types()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/hdf.py:138, in PandasHDFHandler._open_for_read(self)
    137 def _open_for_read(self):
--> 138     self.handle = HDFStore(self.fname, mode='r')

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/io/pytables.py:566, in HDFStore.__init__(self, path, mode, complevel, complib, fletcher32, **kwargs)
    563 if "format" in kwargs:
    564     raise ValueError("format is not a defined argument for HDFStore")
--> 566 tables = import_optional_dependency("tables")
    568 if complib is not None and complib not in tables.filters.all_complibs:
    569     raise ValueError(
    570         f"complib only supports {tables.filters.all_complibs} compression."
    571     )

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/compat/_optional.py:135, in import_optional_dependency(name, extra, errors, min_version)
    130 msg = (
    131     f"Missing optional dependency '{install_name}'. {extra} "
    132     f"Use pip or conda to install {install_name}."
    133 )
    134 try:
--> 135     module = importlib.import_module(name)
    136 except ImportError:
    137     if errors == "raise":

File ~/.asdf/installs/python/3.11.9/lib/python3.11/importlib/__init__.py:126, in import_module(name, package)
    124             break
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1204, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1176, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1147, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:690, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:940, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/__init__.py:44
     40     raise RuntimeError("Blosc2 library not found. "
     41                        f"I looked for \"{', '.join(blosc2_search_paths)}\"")
     43 # Necessary imports to get versions stored on the cython extension
---> 44 from .utilsextension import get_hdf5_version as _get_hdf5_version
     46 from ._version import __version__
     48 hdf5_version = _get_hdf5_version()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/utilsextension.pyx:1, in init tables.utilsextension()

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Selecting (Subsets)

The Array class allows to select a subset either by labels or indices (positions)

Selecting by Labels

To take a subset of an array using labels, use brackets [ ].

Let’s start by selecting a single element:

[3]:
population['Belgium', 'Female', 2017]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 population['Belgium', 'Female', 2017]

NameError: name 'population' is not defined

As long as there is no ambiguity (i.e. axes sharing one or several same label(s)), the order of indexing does not matter. So you usually do not care/have to remember about axes positions during computation. It only matters for output.

[4]:
# order of index doesn't matter
population['Female', 2017, 'Belgium']
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[4], line 2
      1 # order of index doesn't matter
----> 2 population['Female', 2017, 'Belgium']

NameError: name 'population' is not defined

Selecting a subset is done by using slices or lists of labels:

[5]:
population[['Belgium', 'Germany'], 2014:2016]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 population[['Belgium', 'Germany'], 2014:2016]

NameError: name 'population' is not defined

Slices bounds are optional: if not given, start is assumed to be the first label and stop is the last one.

[6]:
# select all years starting from 2015
population[2015:]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[6], line 2
      1 # select all years starting from 2015
----> 2 population[2015:]

NameError: name 'population' is not defined
[7]:
# select all first years until 2015
population[:2015]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[7], line 2
      1 # select all first years until 2015
----> 2 population[:2015]

NameError: name 'population' is not defined

Slices can also have a step (defaults to 1), to take every Nth labels:

[8]:
# select all even years starting from 2014
population[2014::2]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 2
      1 # select all even years starting from 2014
----> 2 population[2014::2]

NameError: name 'population' is not defined

Warning: Selecting by labels as in above examples works well as long as there is no ambiguity. When two or more axes have common labels, it leads to a crash. The solution is then to precise to which axis belong the labels.

[9]:
immigration = load_example_data('demography_eurostat').immigration

# the 'immigration' array has two axes (country and citizenship) which share the same labels
immigration
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 immigration = load_example_data('demography_eurostat').immigration
      3 # the 'immigration' array has two axes (country and citizenship) which share the same labels
      4 immigration

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/example.py:97, in load_example_data(name)
     95     available_datasets = list(AVAILABLE_EXAMPLE_DATA.keys())
     96     raise ValueError(f"example_data must be chosen from list {available_datasets}")
---> 97 return la.Session(AVAILABLE_EXAMPLE_DATA[name])

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:98, in Session.__init__(self, meta, *args, **kwargs)
     94     elements = {a.name: a for a in args}
     96 if isinstance(elements, (str, Path)):
     97     # assume elements is a filename
---> 98     self.load(elements)
     99     self.update(**kwargs)
    100 else:
    101     # iterable of tuple or dict-like

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:438, in Session.load(self, fname, names, engine, display, **kwargs)
    436 else:
    437     handler = handler_cls(fname)
--> 438 metadata, objects = handler.read(names, display=display, **kwargs)
    439 self._update_from_iterable(objects.items())
    440 self.meta = metadata

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/common.py:139, in FileHandler.read(self, keys, display, ignore_exceptions, *args, **kwargs)
    114 def read(self, keys, *args, display=False, ignore_exceptions=False, **kwargs) -> Tuple[Metadata, dict]:
    115     r"""
    116     Read file content (HDF, Excel, CSV, ...) and returns a dictionary containing loaded objects.
    117
   (...)
    137         Dictionary containing the loaded objects.
    138     """
--> 139     self._open_for_read()
    140     metadata = self._read_metadata()
    141     item_types = self.item_types()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/hdf.py:138, in PandasHDFHandler._open_for_read(self)
    137 def _open_for_read(self):
--> 138     self.handle = HDFStore(self.fname, mode='r')

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/io/pytables.py:566, in HDFStore.__init__(self, path, mode, complevel, complib, fletcher32, **kwargs)
    563 if "format" in kwargs:
    564     raise ValueError("format is not a defined argument for HDFStore")
--> 566 tables = import_optional_dependency("tables")
    568 if complib is not None and complib not in tables.filters.all_complibs:
    569     raise ValueError(
    570         f"complib only supports {tables.filters.all_complibs} compression."
    571     )

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/compat/_optional.py:135, in import_optional_dependency(name, extra, errors, min_version)
    130 msg = (
    131     f"Missing optional dependency '{install_name}'. {extra} "
    132     f"Use pip or conda to install {install_name}."
    133 )
    134 try:
--> 135     module = importlib.import_module(name)
    136 except ImportError:
    137     if errors == "raise":

File ~/.asdf/installs/python/3.11.9/lib/python3.11/importlib/__init__.py:126, in import_module(name, package)
    124             break
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1204, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1176, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1147, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:690, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:940, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/__init__.py:44
     40     raise RuntimeError("Blosc2 library not found. "
     41                        f"I looked for \"{', '.join(blosc2_search_paths)}\"")
     43 # Necessary imports to get versions stored on the cython extension
---> 44 from .utilsextension import get_hdf5_version as _get_hdf5_version
     46 from ._version import __version__
     48 hdf5_version = _get_hdf5_version()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/utilsextension.pyx:1, in init tables.utilsextension()

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
[10]:
# LArray doesn't use the position of the labels used inside the brackets
# to determine the corresponding axes. Instead LArray will try to guess the
# corresponding axis for each label whatever is its position.
# Then, if a label is shared by two or more axes, LArray will not be able
# to choose between the possible axes and will raise an error.
try:
    immigration['Belgium', 'Netherlands']
except Exception as e:
    print(type(e).__name__, ':', e)
NameError : name 'immigration' is not defined
[11]:
# the solution is simple. You need to precise the axes on which you make a selection
immigration[immigration.country['Belgium'], immigration.citizenship['Netherlands']]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 2
      1 # the solution is simple. You need to precise the axes on which you make a selection
----> 2 immigration[immigration.country['Belgium'], immigration.citizenship['Netherlands']]

NameError: name 'immigration' is not defined

Ambiguous Cases - Specifying Axes Using The Special Variable X

When selecting, assigning or using aggregate functions, an axis can be referred via the special variable X:

  • population[X.time[2015:]]

  • population.sum(X.time)

This gives you access to axes of the array you are manipulating. The main drawback of using X is that you lose the autocompletion available from many editors. It only works with non-anonymous axes for which names do not contain whitespaces or special characters.

[12]:
# the previous example can also be written as
immigration[X.country['Belgium'], X.citizenship['Netherlands']]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[12], line 2
      1 # the previous example can also be written as
----> 2 immigration[X.country['Belgium'], X.citizenship['Netherlands']]

NameError: name 'immigration' is not defined

Selecting by Indices

Sometimes it is more practical to use indices (positions) along the axis, instead of labels. You need to add the character i before the brackets: .i[indices]. As for selection with labels, you can use a single index, a slice or a list of indices. Indices can be also negative (-1 represent the last element of an axis).

Note: Remember that indices (positions) are always 0-based in Python. So the first element is at index 0, the second is at index 1, etc.

[13]:
# select the last year
population[X.time.i[-1]]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[13], line 2
      1 # select the last year
----> 2 population[X.time.i[-1]]

NameError: name 'population' is not defined
[14]:
# same but for the last 3 years
population[X.time.i[-3:]]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[14], line 2
      1 # same but for the last 3 years
----> 2 population[X.time.i[-3:]]

NameError: name 'population' is not defined
[15]:
# using a list of indices
population[X.time.i[0, 2, 4]]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[15], line 2
      1 # using a list of indices
----> 2 population[X.time.i[0, 2, 4]]

NameError: name 'population' is not defined

Warning: The end indice (position) is EXCLUSIVE while the end label is INCLUSIVE.

[16]:
year = 2015

# with labels
population[X.time[:year]]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[16], line 4
      1 year = 2015
      3 # with labels
----> 4 population[X.time[:year]]

NameError: name 'population' is not defined
[17]:
# with indices (i.e. using the .i[indices] syntax)
index_year = population.time.index(year)
population[X.time.i[:index_year]]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[17], line 2
      1 # with indices (i.e. using the .i[indices] syntax)
----> 2 index_year = population.time.index(year)
      3 population[X.time.i[:index_year]]

NameError: name 'population' is not defined

You can use .i[] selection directly on array instead of axes. In this context, if you want to select a subset of the first and third axes for example, you must use a full slice : for the second one.

[18]:
# select first country and last three years
population.i[0, :, -3:]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[18], line 2
      1 # select first country and last three years
----> 2 population.i[0, :, -3:]

NameError: name 'population' is not defined

Using Groups In Selections

[19]:
even_years = population.time[2014::2]

population[even_years]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 even_years = population.time[2014::2]
      3 population[even_years]

NameError: name 'population' is not defined

Boolean Filtering

Boolean filtering can be used to extract subsets. Filtering can be done on axes:

[20]:
# select even years
population[X.time % 2 == 0]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[20], line 2
      1 # select even years
----> 2 population[X.time % 2 == 0]

NameError: name 'population' is not defined

or data:

[21]:
# select population for the year 2017
population_2017 = population[2017]

# select all data with a value greater than 30 million
population_2017[population_2017 > 30e6]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[21], line 2
      1 # select population for the year 2017
----> 2 population_2017 = population[2017]
      4 # select all data with a value greater than 30 million
      5 population_2017[population_2017 > 30e6]

NameError: name 'population' is not defined

Note: Be aware that after boolean filtering, several axes may have merged.

Arrays can also be used to create boolean filters:

[22]:
start_year = Array([2015, 2016, 2017], axes=population.country)
start_year
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 start_year = Array([2015, 2016, 2017], axes=population.country)
      2 start_year

NameError: name 'population' is not defined
[23]:
population[X.time >= start_year]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[23], line 1
----> 1 population[X.time >= start_year]

NameError: name 'population' is not defined

Iterating over an axis

Iterating over an axis is straightforward:

[24]:
for year in population.time:
    print(year)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[24], line 1
----> 1 for year in population.time:
      2     print(year)

NameError: name 'population' is not defined

Assigning subsets

Assigning A Value

Assigning a value to a subset is simple:

[25]:
population[2017] = 0
population
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[25], line 1
----> 1 population[2017] = 0
      2 population

NameError: name 'population' is not defined

Now, let’s store a subset in a new variable and modify it:

[26]:
# store the data associated with the year 2016 in a new variable
population_2016 = population[2016]
population_2016
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[26], line 2
      1 # store the data associated with the year 2016 in a new variable
----> 2 population_2016 = population[2016]
      3 population_2016

NameError: name 'population' is not defined
[27]:
# now, we modify the new variable
population_2016['Belgium'] = 0

# and we can see that the original array has been also modified
population
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[27], line 2
      1 # now, we modify the new variable
----> 2 population_2016['Belgium'] = 0
      4 # and we can see that the original array has been also modified
      5 population

NameError: name 'population_2016' is not defined

One very important gotcha though…

Warning: Storing a subset of an array in a new variable and modifying it after may also impact the original array. The reason is that selecting a contiguous subset of the data does not return a copy of the selected subset, but rather a view on a subset of the array. To avoid such behavior, use the .copy() method.

Remember:

  • taking a contiguous subset of an array is extremely fast (no data is copied)

  • if one modifies that subset, one also modifies the original array

  • .copy() returns a copy of the subset (takes speed and memory) but allows you to change the subset without modifying the original array in the same time

The same warning apply for entire arrays:

[28]:
# reload the 'population' array
population = load_example_data('demography_eurostat').population

# create a second 'population2' variable
population2 = population
population2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[28], line 2
      1 # reload the 'population' array
----> 2 population = load_example_data('demography_eurostat').population
      4 # create a second 'population2' variable
      5 population2 = population

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/example.py:97, in load_example_data(name)
     95     available_datasets = list(AVAILABLE_EXAMPLE_DATA.keys())
     96     raise ValueError(f"example_data must be chosen from list {available_datasets}")
---> 97 return la.Session(AVAILABLE_EXAMPLE_DATA[name])

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:98, in Session.__init__(self, meta, *args, **kwargs)
     94     elements = {a.name: a for a in args}
     96 if isinstance(elements, (str, Path)):
     97     # assume elements is a filename
---> 98     self.load(elements)
     99     self.update(**kwargs)
    100 else:
    101     # iterable of tuple or dict-like

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:438, in Session.load(self, fname, names, engine, display, **kwargs)
    436 else:
    437     handler = handler_cls(fname)
--> 438 metadata, objects = handler.read(names, display=display, **kwargs)
    439 self._update_from_iterable(objects.items())
    440 self.meta = metadata

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/common.py:139, in FileHandler.read(self, keys, display, ignore_exceptions, *args, **kwargs)
    114 def read(self, keys, *args, display=False, ignore_exceptions=False, **kwargs) -> Tuple[Metadata, dict]:
    115     r"""
    116     Read file content (HDF, Excel, CSV, ...) and returns a dictionary containing loaded objects.
    117
   (...)
    137         Dictionary containing the loaded objects.
    138     """
--> 139     self._open_for_read()
    140     metadata = self._read_metadata()
    141     item_types = self.item_types()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/hdf.py:138, in PandasHDFHandler._open_for_read(self)
    137 def _open_for_read(self):
--> 138     self.handle = HDFStore(self.fname, mode='r')

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/io/pytables.py:566, in HDFStore.__init__(self, path, mode, complevel, complib, fletcher32, **kwargs)
    563 if "format" in kwargs:
    564     raise ValueError("format is not a defined argument for HDFStore")
--> 566 tables = import_optional_dependency("tables")
    568 if complib is not None and complib not in tables.filters.all_complibs:
    569     raise ValueError(
    570         f"complib only supports {tables.filters.all_complibs} compression."
    571     )

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/compat/_optional.py:135, in import_optional_dependency(name, extra, errors, min_version)
    130 msg = (
    131     f"Missing optional dependency '{install_name}'. {extra} "
    132     f"Use pip or conda to install {install_name}."
    133 )
    134 try:
--> 135     module = importlib.import_module(name)
    136 except ImportError:
    137     if errors == "raise":

File ~/.asdf/installs/python/3.11.9/lib/python3.11/importlib/__init__.py:126, in import_module(name, package)
    124             break
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1204, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1176, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1147, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:690, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:940, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/__init__.py:44
     40     raise RuntimeError("Blosc2 library not found. "
     41                        f"I looked for \"{', '.join(blosc2_search_paths)}\"")
     43 # Necessary imports to get versions stored on the cython extension
---> 44 from .utilsextension import get_hdf5_version as _get_hdf5_version
     46 from ._version import __version__
     48 hdf5_version = _get_hdf5_version()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/utilsextension.pyx:1, in init tables.utilsextension()

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
[29]:
# set all data corresponding to the year 2017 to 0
population2[2017] = 0
population2
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[29], line 2
      1 # set all data corresponding to the year 2017 to 0
----> 2 population2[2017] = 0
      3 population2

NameError: name 'population2' is not defined
[30]:
# and now take a look of what happened to the original array 'population'
# after modifying the 'population2' array
population
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[30], line 3
      1 # and now take a look of what happened to the original array 'population'
      2 # after modifying the 'population2' array
----> 3 population

NameError: name 'population' is not defined

Warning: The syntax new_array = old_array does not create a new array but rather an ‘alias’ variable. To actually create a new array as a copy of a previous one, the .copy() method must be called.

[31]:
# reload the 'population' array
population = load_example_data('demography_eurostat').population

# copy the 'population' array and store the copy in a new variable
population2 = population.copy()

# modify the copy
population2[2017] = 0
population2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[31], line 2
      1 # reload the 'population' array
----> 2 population = load_example_data('demography_eurostat').population
      4 # copy the 'population' array and store the copy in a new variable
      5 population2 = population.copy()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/example.py:97, in load_example_data(name)
     95     available_datasets = list(AVAILABLE_EXAMPLE_DATA.keys())
     96     raise ValueError(f"example_data must be chosen from list {available_datasets}")
---> 97 return la.Session(AVAILABLE_EXAMPLE_DATA[name])

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:98, in Session.__init__(self, meta, *args, **kwargs)
     94     elements = {a.name: a for a in args}
     96 if isinstance(elements, (str, Path)):
     97     # assume elements is a filename
---> 98     self.load(elements)
     99     self.update(**kwargs)
    100 else:
    101     # iterable of tuple or dict-like

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:438, in Session.load(self, fname, names, engine, display, **kwargs)
    436 else:
    437     handler = handler_cls(fname)
--> 438 metadata, objects = handler.read(names, display=display, **kwargs)
    439 self._update_from_iterable(objects.items())
    440 self.meta = metadata

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/common.py:139, in FileHandler.read(self, keys, display, ignore_exceptions, *args, **kwargs)
    114 def read(self, keys, *args, display=False, ignore_exceptions=False, **kwargs) -> Tuple[Metadata, dict]:
    115     r"""
    116     Read file content (HDF, Excel, CSV, ...) and returns a dictionary containing loaded objects.
    117
   (...)
    137         Dictionary containing the loaded objects.
    138     """
--> 139     self._open_for_read()
    140     metadata = self._read_metadata()
    141     item_types = self.item_types()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/hdf.py:138, in PandasHDFHandler._open_for_read(self)
    137 def _open_for_read(self):
--> 138     self.handle = HDFStore(self.fname, mode='r')

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/io/pytables.py:566, in HDFStore.__init__(self, path, mode, complevel, complib, fletcher32, **kwargs)
    563 if "format" in kwargs:
    564     raise ValueError("format is not a defined argument for HDFStore")
--> 566 tables = import_optional_dependency("tables")
    568 if complib is not None and complib not in tables.filters.all_complibs:
    569     raise ValueError(
    570         f"complib only supports {tables.filters.all_complibs} compression."
    571     )

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/compat/_optional.py:135, in import_optional_dependency(name, extra, errors, min_version)
    130 msg = (
    131     f"Missing optional dependency '{install_name}'. {extra} "
    132     f"Use pip or conda to install {install_name}."
    133 )
    134 try:
--> 135     module = importlib.import_module(name)
    136 except ImportError:
    137     if errors == "raise":

File ~/.asdf/installs/python/3.11.9/lib/python3.11/importlib/__init__.py:126, in import_module(name, package)
    124             break
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1204, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1176, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1147, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:690, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:940, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/__init__.py:44
     40     raise RuntimeError("Blosc2 library not found. "
     41                        f"I looked for \"{', '.join(blosc2_search_paths)}\"")
     43 # Necessary imports to get versions stored on the cython extension
---> 44 from .utilsextension import get_hdf5_version as _get_hdf5_version
     46 from ._version import __version__
     48 hdf5_version = _get_hdf5_version()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/utilsextension.pyx:1, in init tables.utilsextension()

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
[32]:
# the data from the original array have not been modified
population
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[32], line 2
      1 # the data from the original array have not been modified
----> 2 population

NameError: name 'population' is not defined

Assigning Arrays And Broadcasting

Instead of a value, we can also assign an array to a subset. In that case, that array can have less axes than the target but those which are present must be compatible with the subset being targeted.

[33]:
# select population for the year 2015
population_2015 = population[2015]

# propagate population for the year 2015 to all next years
population[2016:] = population_2015

population
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[33], line 2
      1 # select population for the year 2015
----> 2 population_2015 = population[2015]
      4 # propagate population for the year 2015 to all next years
      5 population[2016:] = population_2015

NameError: name 'population' is not defined

Warning: The array being assigned must have compatible axes (i.e. same axes names and same labels) with the target subset.

[34]:
# replace 'Male' and 'Female' labels by 'M' and 'F'
population_2015 = population_2015.set_labels('gender', 'M,F')
population_2015
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[34], line 2
      1 # replace 'Male' and 'Female' labels by 'M' and 'F'
----> 2 population_2015 = population_2015.set_labels('gender', 'M,F')
      3 population_2015

NameError: name 'population_2015' is not defined
[35]:
# now let's try to repeat the assignement operation above with the new labels.
# An error is raised because of incompatible axes
try:
    population[2016:] = population_2015
except Exception as e:
    print(type(e).__name__, ':', e)
NameError : name 'population_2015' is not defined