Pythonic VS String Syntax
Import the LArray library:
[1]:
from larray import *
The LArray library offers two syntaxes to build axes and to make selections and aggregations. The first one is more Pythonic (it uses Python structures). For example, you can create an age_category axis as follows:
[2]:
age_category = Axis(["0-9", "10-17", "18-66", "67+"], "age_category")
age_category
[2]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')
The second one uses strings that are parsed by the library. It is shorter to type. The same age_category axis could have been generated as follows:
[3]:
age_category = Axis("age_category=0-9,10-17,18-66,67+")
age_category
[3]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')
Warning: The drawback of the string syntax is that some characters such as , ; = : .. [ ] >> have a special meaning and cannot be used in labels with the String syntax. If you need to work with labels containing such special characters (when importing data from an external source, for example), you have to use the Pythonic syntax, which allows any character in labels.
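When labels come from an external source, one way to decide which syntax to use is to check them against the special sequences listed in the warning. The sketch below is a plain-Python illustration and not part of LArray: the helper name `needs_pythonic_syntax` and the sample labels are hypothetical, and the list of sequences is taken only from the warning above.

```python
# Special sequences listed in the warning above. '..' and '>>' are
# two-character sequences, so a single '.' or '>' is not flagged here.
SPECIAL_SEQUENCES = (",", ";", "=", ":", "..", "[", "]", ">>")


def needs_pythonic_syntax(label):
    """Return True if `label` contains a sequence reserved by the string syntax.

    Hypothetical helper for illustration only (not an LArray function).
    """
    return any(seq in str(label) for seq in SPECIAL_SEQUENCES)


# Hypothetical labels, as they might arrive from an external file.
labels = ["0-9", "10-17", "income=low", "v1.2", "a..b"]
flagged = [label for label in labels if needs_pythonic_syntax(label)]
print(flagged)
```

If any label is flagged, building the axis with the Pythonic syntax (a list of labels plus the axis name) avoids the parsing step altogether.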
String Syntax
Axes And Arrays creation
The string syntax makes it easy to create axes. When creating a single axis, the labels are separated by commas (,):
[4]:
a = Axis('a=a0,a1,a2,a3')
a
[4]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')
The special syntax start..stop generates a sequence of labels from start to stop, both included:
[5]:
a = Axis('a=a0..a3')
a
[5]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')
When creating an array, it is possible to define several axes in the same string by separating them with a semicolon (;):
[6]:
arr = zeros("a=a0..a2; b=b0,b1; c=c0..c5")
arr
[6]:
 a  b\c   c0   c1   c2   c3   c4   c5
a0   b0  0.0  0.0  0.0  0.0  0.0  0.0
a0   b1  0.0  0.0  0.0  0.0  0.0  0.0
a1   b0  0.0  0.0  0.0  0.0  0.0  0.0
a1   b1  0.0  0.0  0.0  0.0  0.0  0.0
a2   b0  0.0  0.0  0.0  0.0  0.0  0.0
a2   b1  0.0  0.0  0.0  0.0  0.0  0.0
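To make the roles of ; = , and .. concrete, here is a simplified plain-Python sketch of how such an axes definition could be broken down. It is not LArray's actual parser: the function names `expand_labels` and `parse_axes` are hypothetical, and the range expansion assumes that both ends share a common prefix followed by an integer.

```python
import re


def expand_labels(spec):
    """Expand 'b0,b1' or 'a0..a3' into a list of labels (simplified sketch)."""
    labels = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            start, stop = (s.strip() for s in part.split("..", 1))
            # Assumption: both ends are a shared prefix followed by an integer.
            m_start = re.fullmatch(r"(.*?)(\d+)", start)
            m_stop = re.fullmatch(r"(.*?)(\d+)", stop)
            if not m_start or not m_stop or m_start.group(1) != m_stop.group(1):
                raise ValueError(f"unsupported range: {part!r}")
            prefix = m_start.group(1)
            first, last = int(m_start.group(2)), int(m_stop.group(2))
            # The stop label is included, as in the Axis('a=a0..a3') output above.
            labels.extend(f"{prefix}{i}" for i in range(first, last + 1))
        else:
            labels.append(part)
    return labels


def parse_axes(definition):
    """Split 'name=labels; name=labels' into a {name: labels} dict."""
    axes = {}
    for axis_def in definition.split(";"):
        name, spec = axis_def.split("=", 1)
        axes[name.strip()] = expand_labels(spec)
    return axes


axes = parse_axes("a=a0..a2; b=b0,b1; c=c0..c5")
shape = tuple(len(labels) for labels in axes.values())
print(axes)
print(shape)
```

The resulting shape has one entry per axis, which corresponds to the 3 x 2 rows and 6 columns of the array displayed above.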
Selection
Starting from the array:
[7]:
immigration = load_example_data('demography_eurostat').immigration
immigration.info
an example of a selection using the Pythonic syntax is:
[8]:
# since the labels 'Belgium' and 'Netherlands' also exist in the 'citizenship' axis,
# we need to explicitly specify that we want to make a selection over the 'country' axis
immigration_subset = immigration[X.country['Belgium', 'Netherlands'], 'Female', 2015:]
immigration_subset
Using the String syntax, the same selection becomes:
[9]:
immigration_subset = immigration['country[Belgium,Netherlands]', 'Female', 2015:]
immigration_subset
Aggregation
An example of an aggregation using the Pythonic syntax is:
[10]:
immigration.sum((X.time[2014::2] >> 'even_years', X.time[::2] >> 'odd_years'), 'citizenship')
Using the String syntax, the same aggregation becomes:
[11]:
immigration.sum('time[2014::2] >> even_years; time[::2] >> odd_years', 'citizenship')
where we used ; to separate groups of labels from the same axis.
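The structure of such a group string can be illustrated with a small plain-Python sketch that splits it into its parts: the axis name, the key between brackets, and the group name after >>. This is only an illustration of how the string reads, not LArray's parser, and the function name `parse_groups` is hypothetical.

```python
def parse_groups(spec):
    """Split 'axis[key] >> name; axis[key] >> name' into its components.

    Hypothetical helper for illustration only (not an LArray function).
    """
    groups = []
    for part in spec.split(";"):
        # Text after '>>' is the name given to the group.
        selection, _, name = part.partition(">>")
        # Text before '[' is the axis name, text inside the brackets is the key.
        axis, _, key = selection.strip().partition("[")
        groups.append({
            "axis": axis.strip(),
            "key": key.strip().rstrip("]"),
            "name": name.strip() or None,
        })
    return groups


groups = parse_groups("time[2014::2] >> even_years; time[::2] >> odd_years")
for group in groups:
    print(group)
```

Each dictionary corresponds to one of the `X.time[...] >> '...'` expressions of the Pythonic version.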