Pythonic VS String Syntax

Import the LArray library:

[1]:
from larray import *

The LArray library offers two syntaxes to build axes and to make selections and aggregations. The first one is more Pythonic (it uses Python structures). For example, you can create an age_category axis as follows:

[2]:
age_category = Axis(["0-9", "10-17", "18-66", "67+"], "age_category")
age_category
[2]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')

The second one uses strings that the library parses, which is shorter to type. The same age_category axis could have been generated as follows:

[3]:
age_category = Axis("age_category=0-9,10-17,18-66,67+")
age_category
[3]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')

Warning: The drawback of the String syntax is that some characters, such as , ; = : .. [ ] >>, have a special meaning and cannot be used in labels. If you need to work with labels containing such special characters (when importing data from an external source, for example), you have to use the Pythonic syntax, which allows any character in labels.
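One way to decide which syntax a set of labels needs is to scan the labels for the special tokens listed in the warning. The helper below is a plain-Python sketch: `needs_pythonic_syntax` is a hypothetical name, not an LArray function, and it only checks for the tokens named above.

```python
# Tokens listed in the warning as having a special meaning in the String syntax.
SPECIAL_TOKENS = (",", ";", "=", ":", "..", "[", "]", ">>")


def needs_pythonic_syntax(label):
    """Return True if `label` contains one of the special tokens.

    Hypothetical helper for illustration only; it is not part of LArray.
    """
    return any(token in str(label) for token in SPECIAL_TOKENS)


labels = ["0-9", "10-17", "18:66", "67+"]
problematic = [label for label in labels if needs_pythonic_syntax(label)]
print(problematic)
```

Labels flagged this way would have to be passed as a Python list, as in the first age_category example.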

String Syntax

Axes And Arrays Creation

The String syntax makes it easy to create axes.

When creating one axis, the labels are separated using ,:

[4]:
a = Axis('a=a0,a1,a2,a3')
a
[4]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')

The special syntax start..stop generates a sequence of labels:

[5]:
a = Axis('a=a0..a3')
a
[5]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')
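To make the start..stop expansion concrete, here is a simplified plain-Python sketch of the idea. It is not LArray's parser: `expand_range` is a hypothetical helper that only handles two bounds sharing the same prefix followed by an integer, which is enough to reproduce the a0..a3 case.

```python
import re


def expand_range(spec):
    """Expand 'a0..a3' into ['a0', 'a1', 'a2', 'a3'].

    Simplified illustration, not LArray's implementation: both bounds must be
    the same prefix followed by a non-negative integer.
    """
    start, stop = spec.split("..")
    m_start = re.fullmatch(r"(.*?)(\d+)", start)
    m_stop = re.fullmatch(r"(.*?)(\d+)", stop)
    if not m_start or not m_stop or m_start.group(1) != m_stop.group(1):
        raise ValueError(f"unsupported range: {spec!r}")
    prefix = m_start.group(1)
    first, last = int(m_start.group(2)), int(m_stop.group(2))
    # Both bounds are included, as in the a0..a3 example.
    return [f"{prefix}{i}" for i in range(first, last + 1)]


print(expand_range("a0..a3"))
```

Note that, unlike a Python `range`, the stop label is part of the result.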

When creating an array, several axes can be defined in the same string by separating them with ;:

[6]:
arr = zeros("a=a0..a2; b=b0,b1; c=c0..c5")
arr
[6]:
 a  b\c   c0   c1   c2   c3   c4   c5
a0   b0  0.0  0.0  0.0  0.0  0.0  0.0
a0   b1  0.0  0.0  0.0  0.0  0.0  0.0
a1   b0  0.0  0.0  0.0  0.0  0.0  0.0
a1   b1  0.0  0.0  0.0  0.0  0.0  0.0
a2   b0  0.0  0.0  0.0  0.0  0.0  0.0
a2   b1  0.0  0.0  0.0  0.0  0.0  0.0
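The string passed to zeros has two levels of separators: ; between axes and = between an axis name and its labels. The sketch below shows that structure in plain Python. `split_axes` is a hypothetical helper, not an LArray function; it assumes every axis is named and leaves the label definitions unexpanded.

```python
def split_axes(spec):
    """Split 'a=a0..a2; b=b0,b1' into {'a': 'a0..a2', 'b': 'b0,b1'}.

    Illustrative only (not part of LArray): each part must have the form
    name=labels, and the labels are returned as an unparsed string.
    """
    axes = {}
    for part in spec.split(";"):
        name, labels = part.split("=", 1)
        axes[name.strip()] = labels.strip()
    return axes


axes = split_axes("a=a0..a2; b=b0,b1; c=c0..c5")
print(axes)
```

Each entry corresponds to one dimension of the array shown above, which has 3 labels on a, 2 on b and 6 on c.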

Selection

Starting from the array:

[7]:
immigration = load_example_data('demography_eurostat').immigration
immigration.info

An example of a selection using the Pythonic syntax is:

[8]:
# Since the labels 'Belgium' and 'Netherlands' also exist in the 'citizenship' axis,
# we need to explicitly specify that the selection is made over the 'country' axis.
immigration_subset = immigration[X.country['Belgium', 'Netherlands'], 'Female', 2015:]
immigration_subset

Using the String syntax, the same selection becomes:

[9]:
immigration_subset = immigration['country[Belgium,Netherlands]', 'Female', 2015:]
immigration_subset
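Both selections name the country axis explicitly because the labels 'Belgium' and 'Netherlands' appear on more than one axis, whereas 'Female' does not. The plain-Python sketch below illustrates that ambiguity. The axis contents are assumptions made up for the example (including the 'gender' axis name), not the real dataset, and `axes_containing` is a hypothetical helper.

```python
# Illustrative axes; names and labels are assumptions, not the real dataset's.
axes = {
    "country": ["Belgium", "France", "Netherlands"],
    "citizenship": ["Belgium", "Netherlands", "Other"],
    "gender": ["Male", "Female"],
}


def axes_containing(label):
    """Return the names of all axes in which `label` appears."""
    return [name for name, labels in axes.items() if label in labels]


# One match: the axis can be inferred from the label alone.
print(axes_containing("Female"))
# Two matches: the axis has to be given explicitly.
print(axes_containing("Belgium"))
```

A label found on exactly one axis can be written on its own, while a label found on several axes needs the X.country[...] or 'country[...]' form.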

Aggregation

An example of an aggregation using the Pythonic syntax is:

[10]:
immigration.sum((X.time[2014::2] >> 'even_years', X.time[::2] >> 'odd_years'), 'citizenship')

Using the String syntax, the same aggregation becomes:

[11]:
immigration.sum('time[2014::2] >> even_years; time[::2] >> odd_years', 'citizenship')

where we used ; to separate groups of labels from the same axis.
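To see what the two groups contain, the sketch below splits the aggregation string on ; and >> and emulates the label-based slices on a plain Python list. The time labels 2013 to 2017 are an assumption chosen for illustration, not taken from the dataset, and the code does not use LArray itself.

```python
# Assumed time labels, for illustration only (not read from the dataset).
years = list(range(2013, 2018))

spec = "time[2014::2] >> even_years; time[::2] >> odd_years"

groups = {}
for part in spec.split(";"):
    # '>>' separates a group definition from the name given to it.
    definition, name = (piece.strip() for piece in part.split(">>"))
    groups[name] = definition

# Label-based slices emulated with list positions:
# 2014::2 starts at the label 2014, ::2 starts at the first label.
even_years = years[years.index(2014)::2]
odd_years = years[::2]

print(groups)
print(even_years, odd_years)
```

With these assumed labels, the second group holds odd years only because the axis happens to start at an odd year.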