Pythonic VS String Syntax
Import the LArray library:
[1]:
from larray import *
The LArray library offers two syntaxes to build axes and to make selections and aggregations. The first one is more Pythonic
(it uses Python structures). For example, you can create an age_category axis as follows:
[2]:
age_category = Axis(["0-9", "10-17", "18-66", "67+"], "age_category")
age_category
[2]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')
The second one uses strings that LArray parses, which is shorter to type. The same age_category axis can be created as follows:
[3]:
age_category = Axis("age_category=0-9,10-17,18-66,67+")
age_category
[3]:
Axis(['0-9', '10-17', '18-66', '67+'], 'age_category')
Warning: The drawback of the String syntax is that some characters, such as , ; = : .. [ ] >>, have a special meaning and cannot be used in labels. If you need to work with labels containing such special characters (when importing data from an external source, for example), you have to use the Pythonic syntax, which allows any character in labels.
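As a quick pre-check before choosing a syntax, the sketch below tests whether a label contains any of the tokens listed in the warning. The helper name needs_pythonic_syntax and the SPECIAL_TOKENS tuple are illustrative assumptions, not part of LArray, and the token list only mirrors the warning, so it may not be exhaustive.

```python
# Tokens listed in the warning as having a special meaning in the String syntax.
# Assumption: this tuple is copied from that list and may not be exhaustive.
SPECIAL_TOKENS = (",", ";", "=", ":", "..", "[", "]", ">>")


def needs_pythonic_syntax(label):
    """Return True if the label contains a token reserved by the String syntax.

    Hypothetical helper for illustration only; it is not an LArray function.
    """
    text = str(label)
    return any(token in text for token in SPECIAL_TOKENS)


# The first four labels come from the age_category axis; the last two are
# made-up examples of labels that might come from an external source.
labels = ["0-9", "10-17", "18-66", "67+", "income: low", "a,b"]
problematic = [label for label in labels if needs_pythonic_syntax(label)]
print(problematic)
```

If the resulting list is not empty, building the axis from a Python list of labels is the safer choice.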
String Syntax
Axes And Arrays creation
The String syntax makes it easy to create axes.
When creating a single axis, the labels are separated by commas (,):
[4]:
a = Axis('a=a0,a1,a2,a3')
a
[4]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')
The special syntax start..stop generates a sequence of labels:
[5]:
a = Axis('a=a0..a3')
a
[5]:
Axis(['a0', 'a1', 'a2', 'a3'], 'a')
When creating an array, it is possible to define several axes in the same string by separating them with semicolons (;):
[6]:
arr = zeros("a=a0..a2; b=b0,b1; c=c0..c5")
arr
[6]:
 a  b\c   c0   c1   c2   c3   c4   c5
a0   b0  0.0  0.0  0.0  0.0  0.0  0.0
a0   b1  0.0  0.0  0.0  0.0  0.0  0.0
a1   b0  0.0  0.0  0.0  0.0  0.0  0.0
a1   b1  0.0  0.0  0.0  0.0  0.0  0.0
a2   b0  0.0  0.0  0.0  0.0  0.0  0.0
a2   b1  0.0  0.0  0.0  0.0  0.0  0.0
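The three separators described in this section (, between labels, .. for a range, ; between axes) can be mimicked in plain Python to see how one definition string maps to axes. This is a simplified sketch, not LArray's parser: the helpers parse_labels and parse_axes are hypothetical, and ranges are assumed to share a prefix and end with an integer, with both bounds included.

```python
import re

# Splits a label such as 'a10' into a prefix ('a') and a trailing integer ('10').
_LABEL_PATTERN = re.compile(r"^(.*?)(\d+)$")


def parse_labels(labels_spec):
    """Turn 'b0,b1' or 'c0..c5' into a list of labels (illustrative only)."""
    labels = []
    for part in labels_spec.split(","):
        part = part.strip()
        if ".." in part:
            start, stop = part.split("..")
            prefix, first = _LABEL_PATTERN.match(start).groups()
            _, last = _LABEL_PATTERN.match(stop).groups()
            # Assumption: both bounds are included, as in the a0..a3 example.
            labels.extend(f"{prefix}{i}" for i in range(int(first), int(last) + 1))
        else:
            labels.append(part)
    return labels


def parse_axes(definition):
    """Turn 'a=a0..a2; b=b0,b1' into {axis name: labels} (illustrative only)."""
    axes = {}
    for axis_spec in definition.split(";"):
        name, labels_spec = axis_spec.split("=")
        axes[name.strip()] = parse_labels(labels_spec)
    return axes


axes = parse_axes("a=a0..a2; b=b0,b1; c=c0..c5")
shape = tuple(len(labels) for labels in axes.values())
print(axes)
print(shape)
```

The resulting shape, 3 x 2 x 6, matches the array displayed above: six rows (a combined with b) and six columns (c).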
Selection
Starting from the array:
[7]:
immigration = load_example_data('demography_eurostat').immigration
immigration.info
[7]:
title: Immigration by age group, sex and citizenship
source: table migr_imm1ctz from Eurostat
3 x 3 x 2 x 5
country [3]: 'Belgium' 'Luxembourg' 'Netherlands'
citizenship [3]: 'Belgium' 'Luxembourg' 'Netherlands'
gender [2]: 'Male' 'Female'
time [5]: 2013 2014 2015 2016 2017
dtype: int32
memory used: 360 bytes
An example of a selection using the Pythonic syntax is:
[8]:
# since the labels 'Belgium' and 'Netherlands' also exist in the 'citizenship' axis,
# we need to explicitly specify that we want to make a selection over the 'country' axis
immigration_subset = immigration[X.country['Belgium', 'Netherlands'], 'Female', 2015:]
immigration_subset
[8]:
    country  citizenship\time   2015   2016   2017
    Belgium           Belgium   6486   6560   6454
    Belgium        Luxembourg    114    108    118
    Belgium       Netherlands   3942   3664   3632
Netherlands           Belgium   1181   1340   1449
Netherlands        Luxembourg     46     60     97
Netherlands       Netherlands  18084  19815  20894
Using the String syntax, the same selection becomes:
[9]:
immigration_subset = immigration['country[Belgium,Netherlands]', 'Female', 2015:]
immigration_subset
[9]:
    country  citizenship\time   2015   2016   2017
    Belgium           Belgium   6486   6560   6454
    Belgium        Luxembourg    114    108    118
    Belgium       Netherlands   3942   3664   3632
Netherlands           Belgium   1181   1340   1449
Netherlands        Luxembourg     46     60     97
Netherlands       Netherlands  18084  19815  20894
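Both selections name the country axis explicitly because some labels belong to more than one axis. The sketch below illustrates that ambiguity with a plain dictionary built from the immigration.info output; the helper axes_containing is hypothetical and only mimics the lookup, it is not how LArray resolves labels internally.

```python
# Axes and labels copied from the immigration.info output shown earlier.
axes = {
    "country": ["Belgium", "Luxembourg", "Netherlands"],
    "citizenship": ["Belgium", "Luxembourg", "Netherlands"],
    "gender": ["Male", "Female"],
    "time": [2013, 2014, 2015, 2016, 2017],
}


def axes_containing(label):
    """Return the names of all axes in which the label appears (illustrative only)."""
    return [name for name, labels in axes.items() if label in labels]


print(axes_containing("Female"))   # found in a single axis, so no axis name is needed
print(axes_containing("Belgium"))  # found in two axes, so the axis must be named
```

A label such as 'Female' or 2015 identifies its axis on its own, whereas 'Belgium' does not, which is why the selection is written X.country[...] or 'country[...]'.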
Aggregation
An example of an aggregation using the Pythonic syntax is:
[10]:
immigration.mean((X.time[2014::2] >> 'even_years', X.time[::2] >> 'odd_years'), 'citizenship')
[10]:
    country  gender\time          even_years           odd_years
    Belgium         Male   5039.166666666667   4900.555555555556
    Belgium       Female  3433.3333333333335  3369.6666666666665
 Luxembourg         Male   577.8333333333334   559.4444444444445
 Luxembourg       Female   430.1666666666667   417.5555555555556
Netherlands         Male   7560.333333333333    7564.11111111111
Netherlands       Female   6621.833333333333   6633.333333333333
Using the String syntax, the same aggregation becomes:
[11]:
immigration.mean('time[2014::2] >> even_years; time[::2] >> odd_years', 'citizenship')
[11]:
    country  gender\time          even_years           odd_years
    Belgium         Male   5039.166666666667   4900.555555555556
    Belgium       Female  3433.3333333333335  3369.6666666666665
 Luxembourg         Male   577.8333333333334   559.4444444444445
 Luxembourg       Female   430.1666666666667   417.5555555555556
Netherlands         Male   7560.333333333333    7564.11111111111
Netherlands       Female   6621.833333333333   6633.333333333333
Here, ; separates groups of labels from the same axis.
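To see which time labels end up in each group, the two label-based slices can be mimicked on a plain list. This is a minimal sketch under stated assumptions: the labels are taken from immigration.info, the helper label_slice is hypothetical and only supports a start label and a step, and it does not reproduce LArray's own slicing code.

```python
# Time labels copied from the immigration.info output shown earlier.
time_labels = [2013, 2014, 2015, 2016, 2017]


def label_slice(labels, start=None, step=1):
    """Mimic a label-based slice 'start::step' on a list of labels (illustrative only)."""
    first = 0 if start is None else labels.index(start)
    return labels[first::step]


groups = {
    "even_years": label_slice(time_labels, start=2014, step=2),  # time[2014::2]
    "odd_years": label_slice(time_labels, step=2),               # time[::2]
}
print(groups)

# The mean is also taken over the 3 labels of the citizenship axis,
# so each cell averages len(group) * 3 values.
values_per_cell = {name: len(years) * 3 for name, years in groups.items()}
print(values_per_cell)
```

The slice time[::2] starts at the first label, 2013, which is why it yields the odd years, while time[2014::2] starts at 2014 and yields the even ones.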