Interactive online version: Binder badge

Pythonic VS String Syntax

Import the LArray library:

[1]:
from larray import *

The LArray library offers two syntaxes to build axes and make selections and aggregations. The first one is more Pythonic (uses Python structures) For example, you can create an age_category axis as follows:

[2]:
age_category = Axis(["0-9", "10-17", "18-66", "67+"], "age_category")
age_category
[2]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')

The second one consists of using strings that are parsed. It is shorter to type. The same age_category axis could have been generated as follows:

[3]:
age_category = Axis("age_category=0-9,10-17,18-66,67+")
age_category
[3]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')

Warning: The drawback of the string syntax is that some characters such as , ; = : .. [ ] >> have a special meaning and cannot be used with the String syntax. If you need to work with labels containing such special characters (when importing data from an external source for example), you have to use the Pythonic syntax which allows to use any character in labels.

String Syntax

Axes And Arrays creation

The string syntax allows to easily create axes.

When creating one axis, the labels are separated using ,:

[4]:
a = Axis('a=a0,a1,a2,a3')
a
[4]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')

The special syntax start..stop generates a sequence of labels:

[5]:
a = Axis('a=a0..a3')
a
[5]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')

When creating an array, it is possible to define several axes in the same string using ;

[6]:
arr = zeros("a=a0..a2; b=b0,b1; c=c0..c5")
arr
[6]:
 a  b\c   c0   c1   c2   c3   c4   c5
a0   b0  0.0  0.0  0.0  0.0  0.0  0.0
a0   b1  0.0  0.0  0.0  0.0  0.0  0.0
a1   b0  0.0  0.0  0.0  0.0  0.0  0.0
a1   b1  0.0  0.0  0.0  0.0  0.0  0.0
a2   b0  0.0  0.0  0.0  0.0  0.0  0.0
a2   b1  0.0  0.0  0.0  0.0  0.0  0.0

Selection

Starting from the array:

[7]:
immigration = load_example_data('demography_eurostat').immigration
immigration.info
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 immigration = load_example_data('demography_eurostat').immigration
      2 immigration.info

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/example.py:97, in load_example_data(name)
     95     available_datasets = list(AVAILABLE_EXAMPLE_DATA.keys())
     96     raise ValueError(f"example_data must be chosen from list {available_datasets}")
---> 97 return la.Session(AVAILABLE_EXAMPLE_DATA[name])

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:98, in Session.__init__(self, meta, *args, **kwargs)
     94     elements = {a.name: a for a in args}
     96 if isinstance(elements, (str, Path)):
     97     # assume elements is a filename
---> 98     self.load(elements)
     99     self.update(**kwargs)
    100 else:
    101     # iterable of tuple or dict-like

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/core/session.py:438, in Session.load(self, fname, names, engine, display, **kwargs)
    436 else:
    437     handler = handler_cls(fname)
--> 438 metadata, objects = handler.read(names, display=display, **kwargs)
    439 self._update_from_iterable(objects.items())
    440 self.meta = metadata

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/common.py:139, in FileHandler.read(self, keys, display, ignore_exceptions, *args, **kwargs)
    114 def read(self, keys, *args, display=False, ignore_exceptions=False, **kwargs) -> Tuple[Metadata, dict]:
    115     r"""
    116     Read file content (HDF, Excel, CSV, ...) and returns a dictionary containing loaded objects.
    117
   (...)
    137         Dictionary containing the loaded objects.
    138     """
--> 139     self._open_for_read()
    140     metadata = self._read_metadata()
    141     item_types = self.item_types()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/larray/inout/hdf.py:138, in PandasHDFHandler._open_for_read(self)
    137 def _open_for_read(self):
--> 138     self.handle = HDFStore(self.fname, mode='r')

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/io/pytables.py:566, in HDFStore.__init__(self, path, mode, complevel, complib, fletcher32, **kwargs)
    563 if "format" in kwargs:
    564     raise ValueError("format is not a defined argument for HDFStore")
--> 566 tables = import_optional_dependency("tables")
    568 if complib is not None and complib not in tables.filters.all_complibs:
    569     raise ValueError(
    570         f"complib only supports {tables.filters.all_complibs} compression."
    571     )

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/pandas/compat/_optional.py:135, in import_optional_dependency(name, extra, errors, min_version)
    130 msg = (
    131     f"Missing optional dependency '{install_name}'. {extra} "
    132     f"Use pip or conda to install {install_name}."
    133 )
    134 try:
--> 135     module = importlib.import_module(name)
    136 except ImportError:
    137     if errors == "raise":

File ~/.asdf/installs/python/3.11.9/lib/python3.11/importlib/__init__.py:126, in import_module(name, package)
    124             break
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1204, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1176, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1147, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:690, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:940, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/__init__.py:44
     40     raise RuntimeError("Blosc2 library not found. "
     41                        f"I looked for \"{', '.join(blosc2_search_paths)}\"")
     43 # Necessary imports to get versions stored on the cython extension
---> 44 from .utilsextension import get_hdf5_version as _get_hdf5_version
     46 from ._version import __version__
     48 hdf5_version = _get_hdf5_version()

File ~/checkouts/readthedocs.org/user_builds/larray/envs/0.34.3/lib/python3.11/site-packages/tables/utilsextension.pyx:1, in init tables.utilsextension()

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

an example of a selection using the Pythonic syntax is:

[8]:
# since the labels 'Belgium' and 'Netherlands' also exists in the 'citizenship' axis,
# we need to explicitly specify that we want to make a selection over the 'country' axis
immigration_subset = immigration[X.country['Belgium', 'Netherlands'], 'Female', 2015:]
immigration_subset
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 3
      1 # since the labels 'Belgium' and 'Netherlands' also exists in the 'citizenship' axis, 
      2 # we need to explicitly specify that we want to make a selection over the 'country' axis
----> 3 immigration_subset = immigration[X.country['Belgium', 'Netherlands'], 'Female', 2015:]
      4 immigration_subset

NameError: name 'immigration' is not defined

Using the String syntax, the same selection becomes:

[9]:
immigration_subset = immigration['country[Belgium,Netherlands]', 'Female', 2015:]
immigration_subset
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 immigration_subset = immigration['country[Belgium,Netherlands]', 'Female', 2015:]
      2 immigration_subset

NameError: name 'immigration' is not defined

Aggregation

An example of an aggregation using the Pythonic syntax is:

[10]:
immigration.mean((X.time[2014::2] >> 'even_years', X.time[::2] >> 'odd_years'), 'citizenship')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 immigration.mean((X.time[2014::2] >> 'even_years', X.time[::2] >> 'odd_years'), 'citizenship')

NameError: name 'immigration' is not defined

Using the String syntax, the same aggregation becomes:

[11]:
immigration.mean('time[2014::2] >> even_years; time[::2] >> odd_years', 'citizenship')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 immigration.mean('time[2014::2] >> even_years; time[::2] >> odd_years', 'citizenship')

NameError: name 'immigration' is not defined

where we used ; to separate groups of labels from the same axis.