Pythonic VS String Syntax
Import the LArray library:
[1]:
from larray import *
The LArray library offers two syntaxes to build axes and to make selections and aggregations. The first one is more Pythonic (it uses Python structures). For example, you can create an age_category axis as follows:
[2]:
age_category = Axis(["0-9", "10-17", "18-66", "67+"], "age_category")
age_category
[2]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')
The second one consists of using strings that are parsed. It is shorter to type. The same age_category axis could have been generated as follows:
[3]:
age_category = Axis("age_category=0-9,10-17,18-66,67+")
age_category
[3]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')
Warning: The drawback of the string syntax is that some characters such as , ; = : .. [ ] >> have a special meaning and cannot be used in labels with the String syntax. If you need to work with labels containing such special characters (when importing data from an external source, for example), you have to use the Pythonic syntax, which allows any character in labels.
String Syntax
Axes And Arrays creation
The string syntax makes it easy to create axes.
When creating one axis, the labels are separated using a comma (,):
[4]:
a = Axis('a=a0,a1,a2,a3')
a
[4]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')
The special syntax start..stop generates a sequence of labels:
[5]:
a = Axis('a=a0..a3')
a
[5]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')
When creating an array, it is possible to define several axes in the same string using a semicolon (;):
[6]:
arr = zeros("a=a0..a2; b=b0,b1; c=c0..c5")
arr
[6]:
 a  b\c   c0   c1   c2   c3   c4   c5
a0   b0  0.0  0.0  0.0  0.0  0.0  0.0
a0   b1  0.0  0.0  0.0  0.0  0.0  0.0
a1   b0  0.0  0.0  0.0  0.0  0.0  0.0
a1   b1  0.0  0.0  0.0  0.0  0.0  0.0
a2   b0  0.0  0.0  0.0  0.0  0.0  0.0
a2   b1  0.0  0.0  0.0  0.0  0.0  0.0
Selection
Starting from the array:
[7]:
immigration = load_example_data('demography_eurostat').immigration
immigration.info
An example of a selection using the Pythonic syntax is:
[8]:
# since the labels 'Belgium' and 'Netherlands' also exist in the 'citizenship' axis,
# we need to explicitly specify that we want to make a selection over the 'country' axis
immigration_subset = immigration[X.country['Belgium', 'Netherlands'], 'Female', 2015:]
immigration_subset
Using the String syntax, the same selection becomes:
[9]:
immigration_subset = immigration['country[Belgium,Netherlands]', 'Female', 2015:]
immigration_subset
Aggregation
An example of an aggregation using the Pythonic syntax is:
[10]:
immigration.mean((X.time[2014::2] >> 'even_years', X.time[::2] >> 'odd_years'), 'citizenship')
Using the String syntax, the same aggregation becomes:
[11]:
immigration.mean('time[2014::2] >> even_years; time[::2] >> odd_years', 'citizenship')
where we used ; to separate groups of labels from the same axis.