Pythonic VS String Syntax

Import the LArray library:

[1]:
from larray import *

The LArray library offers two syntaxes to build axes and make selections and aggregations. The first one is more Pythonic (it uses Python structures). For example, you can create an age_category axis as follows:

[2]:
age_category = Axis(["0-9", "10-17", "18-66", "67+"], "age_category")
age_category
[2]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')

The second one consists of using strings that are parsed. It is shorter to type. The same age_category axis could have been generated as follows:

[3]:
age_category = Axis("age_category=0-9,10-17,18-66,67+")
age_category
[3]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')

Warning: The drawback of the string syntax is that some characters such as , ; = : .. [ ] >> have a special meaning and cannot be used in labels with the string syntax. If you need to work with labels containing such special characters (when importing data from an external source, for example), you have to use the Pythonic syntax, which allows any character in labels.
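To see why a special character inside a label is a problem, here is a minimal sketch of a parser that splits on commas. The helper naive_parse_axis is hypothetical and written only for illustration; it is not LArray's actual parser, which handles more syntax than this.

```python
def naive_parse_axis(spec):
    """Split 'name=label1,label2,...' into (name, labels).

    Hypothetical helper for illustration only, not part of LArray.
    """
    name, _, labels = spec.partition("=")
    return name.strip(), [label.strip() for label in labels.split(",")]


# The first label is meant to be 'Washington, D.C.', but the comma it
# contains is treated as a separator, so it is split into two labels.
name, labels = naive_parse_axis("city=Washington, D.C.,Paris")
print(name, labels)
```

With the Pythonic syntax the labels are passed as an ordinary Python list, for example Axis(['Washington, D.C.', 'Paris'], 'city'), so no parsing takes place and the comma is kept as part of the label.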

String Syntax

Axes and Arrays Creation

The string syntax makes it easy to create axes.

When creating a single axis, the labels are separated by commas (,):

[4]:
a = Axis('a=a0,a1,a2,a3')
a
[4]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')

The special syntax start..stop generates a sequence of labels:

[5]:
a = Axis('a=a0..a3')
a
[5]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')
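As the output above shows, both the start and the stop labels are included in the generated sequence. The expansion can be sketched in plain Python as follows. The function expand_range is a hypothetical helper, not LArray's implementation, and it assumes that both bounds share the same prefix followed by an integer.

```python
import re


def expand_range(spec):
    """Expand 'a0..a3' into ['a0', 'a1', 'a2', 'a3'] (both bounds included).

    Hypothetical helper for illustration only, not part of LArray.
    Assumes both bounds are a common non-digit prefix followed by an integer.
    """
    start, stop = spec.split("..")
    m_start = re.fullmatch(r"(\D*)(\d+)", start)
    m_stop = re.fullmatch(r"(\D*)(\d+)", stop)
    if not m_start or not m_stop or m_start.group(1) != m_stop.group(1):
        raise ValueError(f"unsupported range: {spec!r}")
    prefix = m_start.group(1)
    first, last = int(m_start.group(2)), int(m_stop.group(2))
    # range() excludes its upper bound, hence the + 1
    return [f"{prefix}{i}" for i in range(first, last + 1)]


print(expand_range("a0..a3"))  # ['a0', 'a1', 'a2', 'a3']
```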

When creating an array, it is possible to define several axes in the same string by separating them with ; as follows:

[6]:
arr = zeros("a=a0..a2; b=b0,b1; c=c0..c5")
arr
[6]:
 a  b\c   c0   c1   c2   c3   c4   c5
a0   b0  0.0  0.0  0.0  0.0  0.0  0.0
a0   b1  0.0  0.0  0.0  0.0  0.0  0.0
a1   b0  0.0  0.0  0.0  0.0  0.0  0.0
a1   b1  0.0  0.0  0.0  0.0  0.0  0.0
a2   b0  0.0  0.0  0.0  0.0  0.0  0.0
a2   b1  0.0  0.0  0.0  0.0  0.0  0.0

Selection

Starting from the array:

[7]:
immigration = load_example_data('demography_eurostat').immigration
immigration.info
[7]:
title: Immigration by age group, sex and citizenship
source: table migr_imm1ctz from Eurostat
3 x 3 x 2 x 5
 country [3]: 'Belgium' 'Luxembourg' 'Netherlands'
 citizenship [3]: 'Belgium' 'Luxembourg' 'Netherlands'
 gender [2]: 'Male' 'Female'
 time [5]: 2013 2014 2015 2016 2017
dtype: int32
memory used: 360 bytes

an example of a selection using the Pythonic syntax is:

[8]:
# since the labels 'Belgium' and 'Netherlands' also exist in the 'citizenship' axis,
# we need to explicitly specify that we want to make a selection over the 'country' axis
immigration_subset = immigration[X.country['Belgium', 'Netherlands'], 'Female', 2015:]
immigration_subset
[8]:
    country  citizenship\time   2015   2016   2017
    Belgium           Belgium   6486   6560   6454
    Belgium        Luxembourg    114    108    118
    Belgium       Netherlands   3942   3664   3632
Netherlands           Belgium   1181   1340   1449
Netherlands        Luxembourg     46     60     97
Netherlands       Netherlands  18084  19815  20894

Using the String syntax, the same selection becomes:

[9]:
immigration_subset = immigration['country[Belgium,Netherlands]', 'Female', 2015:]
immigration_subset
[9]:
    country  citizenship\time   2015   2016   2017
    Belgium           Belgium   6486   6560   6454
    Belgium        Luxembourg    114    108    118
    Belgium       Netherlands   3942   3664   3632
Netherlands           Belgium   1181   1340   1449
Netherlands        Luxembourg     46     60     97
Netherlands       Netherlands  18084  19815  20894

Aggregation

An example of an aggregation using the Pythonic syntax is:

[10]:
immigration.mean((X.time[2014::2] >> 'even_years', X.time[::2] >> 'odd_years'), 'citizenship')
[10]:
    country  gender\time          even_years           odd_years
    Belgium         Male   5039.166666666667   4900.555555555556
    Belgium       Female  3433.3333333333335  3369.6666666666665
 Luxembourg         Male   577.8333333333334   559.4444444444445
 Luxembourg       Female   430.1666666666667   417.5555555555556
Netherlands         Male   7560.333333333333    7564.11111111111
Netherlands       Female   6621.833333333333   6633.333333333333

Using the String syntax, the same aggregation becomes:

[11]:
immigration.mean('time[2014::2] >> even_years; time[::2] >> odd_years', 'citizenship')
[11]:
    country  gender\time          even_years           odd_years
    Belgium         Male   5039.166666666667   4900.555555555556
    Belgium       Female  3433.3333333333335  3369.6666666666665
 Luxembourg         Male   577.8333333333334   559.4444444444445
 Luxembourg       Female   430.1666666666667   417.5555555555556
Netherlands         Male   7560.333333333333    7564.11111111111
Netherlands       Female   6621.833333333333   6633.333333333333

where we used ; to separate groups of labels from the same axis.
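The group names follow from the labels of the time axis (2013 to 2017): a slice that starts at the label 2014 with a step of 2 picks the even years, while a slice with no start and a step of 2 picks the odd years. The plain-Python sketch below reproduces this label selection. The function label_slice is a hypothetical helper written for illustration, not an LArray function.

```python
time = [2013, 2014, 2015, 2016, 2017]


def label_slice(labels, start=None, step=None):
    """Select labels from the `start` label onward, taking every `step`-th one.

    Hypothetical helper for illustration only, not part of LArray.
    """
    first = labels.index(start) if start is not None else 0
    return labels[first::step]


even_years = label_slice(time, start=2014, step=2)  # like time[2014::2]
odd_years = label_slice(time, step=2)               # like time[::2]
from_2015 = label_slice(time, start=2015)           # like 2015: in the selection example

print(even_years)  # [2014, 2016]
print(odd_years)   # [2013, 2015, 2017]
print(from_2015)   # [2015, 2016, 2017]
```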