Presenting LArray objects (Axis, Groups, Array, Session)

Import the LArray library:

[1]:
from larray import *
Note: The tutorial is generated from Jupyter notebooks, which run in “interactive” mode (as in the LArray Editor console). In interactive mode, there is no need to call print() to display the content of a variable: writing its name is enough. The same applies to the value returned by an expression. In a Python script (a file with the .py extension), you always need to call print() to display the content of a variable or the value returned by a function or an expression.
[2]:
s = 1 + 2

# In the interactive mode, there is no need to use the print() function
# to display the content of the variable 's'.
# Simply typing 's' is enough
s
[2]:
3
[3]:
# In the interactive mode, there is no need to use the print() function
# to display the result of an expression
1 + 2
[3]:
3
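
For comparison, here is a minimal sketch of the same statements as they would behave in a script. The file name script.py is only an assumption used for illustration:

```python
# script.py: a hypothetical file, run with `python script.py`
s = 1 + 2

# In a script, a bare expression is evaluated but nothing is displayed
s

# print() is required to display the value
print(s)
```

Only the print() call produces output when this file is run as a script.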

Axis

An Axis represents a dimension of an Array object. It consists of a name and a list of labels.

There are several ways to create an axis:

[4]:
# labels given as a list
time = Axis([2007, 2008, 2009, 2010], 'time')
# create an axis using one string
gender = Axis('gender=M,F')
# labels generated using the special syntax start..end
age = Axis('age=0..100')

time, gender, age
[4]:
(Axis([2007, 2008, 2009, 2010], 'time'),
 Axis(['M', 'F'], 'gender'),
 Axis([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], 'age'))
Warning:
When using the string syntax "axis_name=list,of,labels" or "axis_name=start..end", LArray automatically infers the type of the labels. For instance, the statement age = Axis("age=0..100") creates an age axis with labels of type int. Mixing special characters such as + with numbers produces labels of type str instead of int. As a consequence, the statement age = Axis("age=0..98,99+") creates an age axis with labels of type str, not int!
[5]:
# When a string is passed to the Axis() constructor, LArray will automatically infer the type of the labels
age = Axis("age=0..5")
age
[5]:
Axis([0, 1, 2, 3, 4, 5], 'age')
[6]:
# Mixing special characters like + with numbers creates an axis with labels of type str instead of int.
age = Axis("age=0..4,5+")
age
[6]:
Axis(['0', '1', '2', '3', '4', '5+'], 'age')
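
The inference rule described in the warning can be pictured with a small pure-Python sketch. This is not LArray's actual parser: the function toy_infer_labels is hypothetical, handles only integer start..end ranges and comma-separated labels, and merely reproduces the two results shown above.

```python
def toy_infer_labels(spec):
    """Toy model: expand 'start..end' ranges, then pick int or str labels."""
    labels = []
    for part in spec.split(','):
        part = part.strip()
        if '..' in part:
            # integer range with both bounds included
            start, stop = part.split('..')
            labels.extend(str(i) for i in range(int(start), int(stop) + 1))
        else:
            labels.append(part)
    # labels become int only if every single label is an integer literal
    if all(label.lstrip('-').isdigit() for label in labels):
        return [int(label) for label in labels]
    return labels


int_labels = toy_infer_labels("0..5")
str_labels = toy_infer_labels("0..4,5+")
print(int_labels)
print(str_labels)
```

A single non-numeric label such as 5+ is enough to turn every label into a string, which is why selecting the label 3 (an int) would no longer match the label '3' on such an axis.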

See the Axis section of the API Reference to explore all methods of Axis objects.

Groups

A Group represents a selection of labels from an Axis. It can optionally have a name (using operator >>). Groups can be used when selecting a subset of an array and in aggregations.

Group objects are created as follows:

[7]:
age = Axis('age=0..100')

# create an anonymous Group object 'teens'
teens = age[10:18]
teens
[7]:
age[10:18]
[8]:
# create a Group object 'pensioners' with a name
pensioners = age[67:] >> 'pensioners'
pensioners
[8]:
age[67:] >> 'pensioners'

It is possible to set a name or to rename a group after its declaration:

[9]:
# method 'named' returns a new group with the given name
teens = teens.named('teens')

# the >> operator is a shortcut for calling the 'named' method
teens = teens >> 'teens'

teens
[9]:
age[10:18] >> 'teens'
Warning: Mixing slices and individual labels inside the [ ] generates several groups (a tuple of groups) instead of a single group. To create a single group from both slices and individual labels, use the .union() method (see below).
[10]:
# mixing slices and individual labels leads to the creation of several groups (a tuple of groups)
age[0:10, 20, 30, 40]
[10]:
(age[0:10], age[20], age[30], age[40])
[11]:
# the union() method makes it possible to combine slices and individual labels into a single group
age[0:10].union(age[20, 30, 40])
[11]:
age[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40].set()
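
Note that the result above contains the label 10: a slice of labels includes its stop label, unlike a positional Python slice. The following pure-Python sketch mimics that behaviour with plain lists. The helpers label_slice and union are hypothetical and only stand in for what the Group objects do:

```python
labels = list(range(101))  # stand-in for the labels of the 'age' axis


def label_slice(labels, start, stop):
    """Return the labels from start to stop, both bounds included."""
    first = labels.index(start)
    last = labels.index(stop)
    return labels[first:last + 1]


def union(*groups):
    """Merge label lists, keeping first-seen order and dropping duplicates."""
    merged = []
    for group in groups:
        for label in group:
            if label not in merged:
                merged.append(label)
    return merged


single_group = union(label_slice(labels, 0, 10), [20, 30, 40])
print(single_group)
```

The merged list holds 14 labels (0 to 10, then 20, 30 and 40), matching the labels of the single group shown above.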

See the Group section of the API Reference to explore all methods of Group objects.

Array

An Array object represents a multidimensional array with labeled axes.

Create an array from scratch

To create an array from scratch, you need to provide the data and a list of axes. Optionally, metadata (title, description, creation date, authors, …) can be associated with the array:

[12]:
# define axes
age = Axis('age=0-9,10-17,18-66,67+')
gender = Axis('gender=female,male')
time = Axis('time=2015..2017')
# list of the axes
axes = [age, gender, time]

# define some data. This is the Belgian population (in thousands). Source: Eurostat.
data = [[[633, 635, 634],
         [663, 665, 664]],
        [[484, 486, 491],
         [505, 511, 516]],
        [[3572, 3581, 3583],
         [3600, 3618, 3616]],
        [[1023, 1038, 1053],
         [756, 775, 793]]]

# metadata
meta = {'title': 'random array'}

arr = Array(data, axes, meta=meta)
arr
[12]:
  age  gender\time  2015  2016  2017
  0-9       female   633   635   634
  0-9         male   663   665   664
10-17       female   484   486   491
10-17         male   505   511   516
18-66       female  3572  3581  3583
18-66         male  3600  3618  3616
  67+       female  1023  1038  1053
  67+         male   756   775   793
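
The nested list passed as data has one nesting level per axis, in the same order as the list of axes. A minimal pure-Python sketch (the helper nested_shape is hypothetical and not part of LArray) checks this for the data above:

```python
age_labels = ['0-9', '10-17', '18-66', '67+']
gender_labels = ['female', 'male']
time_labels = [2015, 2016, 2017]

data = [[[633, 635, 634],
         [663, 665, 664]],
        [[484, 486, 491],
         [505, 511, 516]],
        [[3572, 3581, 3583],
         [3600, 3618, 3616]],
        [[1023, 1038, 1053],
         [756, 775, 793]]]


def nested_shape(nested):
    """Length of each nesting level, following the first element each time."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


expected_shape = (len(age_labels), len(gender_labels), len(time_labels))
print(nested_shape(data), expected_shape)

# positions follow the axes order: age, then gender, then time
value = data[age_labels.index('67+')][gender_labels.index('female')][time_labels.index(2017)]
print(value)
```

The last lookup returns 1053, the value displayed in the row 67+ / female and the column 2017.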

Metadata can be added to an array at any time using:

[13]:
arr.meta.description = 'array containing random values between 0 and 100'

arr.meta
[13]:
title: random array
description: array containing random values between 0 and 100

Warning:

  • Currently, only the HDF (.h5) file format supports saving and loading array metadata.
  • Metadata is not kept when actions or methods are applied to an array, except for operations that modify the object in place, such as population[age < 10] = 0, and when the copy() method is called. Do not add metadata to an array if you know you will apply actions or methods to it before dumping it.

Array creation functions

Arrays can also be generated in an easier way through creation functions:

  • ndtest: creates a test array with increasing numbers as data
  • empty: creates an array but leaves its allocated memory unchanged (i.e., it contains “garbage”, so be careful!)
  • zeros: fills an array with 0
  • ones: fills an array with 1
  • full: fills an array with a given value
  • sequence: creates an array from an axis by iteratively applying a function to a given initial value

Except for ndtest, a list of axes must be provided. Axes can be passed in different ways:

  • as Axis objects
  • as integers defining the lengths of auto-generated wildcard axes
  • as a string: 'gender=M,F;time=2007,2008,2009' (axis names are optional)
  • as pairs (name, labels)

Optionally, the type of data stored by the array can be specified using the dtype argument.

[14]:
# start defines the starting value of data
ndtest((3, 3), start=-1)
[14]:
a\b  b0  b1  b2
 a0  -1   0   1
 a1   2   3   4
 a2   5   6   7
[15]:
# start defines the starting value of data
# label_start defines the starting index of labels
ndtest((3, 3), start=-1, label_start=2)
[15]:
a\b  b2  b3  b4
 a2  -1   0   1
 a3   2   3   4
 a4   5   6   7
[16]:
# empty generates an uninitialised array with the correct axes
# (much faster, but use with care!).
# The content is not really random either: it just reuses a portion
# of available memory, with whatever content is there.
# Use it only if performance matters and make sure all data
# will be overwritten.
empty([age, gender, time])
[16]:
  age  gender\time  ...
  0-9       female  ...
  0-9         male  ...
10-17       female  ...
10-17         male  ...
18-66       female  ...
18-66         male  ...
  67+       female  ...
  67+         male  ...
[17]:
zeros([age, gender, time])
[17]:
  age  gender\time  2015  2016  2017
  0-9       female   0.0   0.0   0.0
  0-9         male   0.0   0.0   0.0
10-17       female   0.0   0.0   0.0
10-17         male   0.0   0.0   0.0
18-66       female   0.0   0.0   0.0
18-66         male   0.0   0.0   0.0
  67+       female   0.0   0.0   0.0
  67+         male   0.0   0.0   0.0
[18]:
# dtype=int stores int data instead of the default float
ones([age, gender, time], dtype=int)
[18]:
  age  gender\time  2015  2016  2017
  0-9       female     1     1     1
  0-9         male     1     1     1
10-17       female     1     1     1
10-17         male     1     1     1
18-66       female     1     1     1
18-66         male     1     1     1
  67+       female     1     1     1
  67+         male     1     1     1
[19]:
full([age, gender, time], fill_value=1.23)
[19]:
  age  gender\time  2015  2016  2017
  0-9       female  1.23  1.23  1.23
  0-9         male  1.23  1.23  1.23
10-17       female  1.23  1.23  1.23
10-17         male  1.23  1.23  1.23
18-66       female  1.23  1.23  1.23
18-66         male  1.23  1.23  1.23
  67+       female  1.23  1.23  1.23
  67+         male  1.23  1.23  1.23

The empty, zeros, ones and full functions also exist in (func)_like variants (e.g. ones_like), which take their axes from another array:

[20]:
ones_like(arr)
[20]:
  age  gender\time  2015  2016  2017
  0-9       female     1     1     1
  0-9         male     1     1     1
10-17       female     1     1     1
10-17         male     1     1     1
18-66       female     1     1     1
18-66         male     1     1     1
  67+       female     1     1     1
  67+         male     1     1     1

Create an array using the special sequence function (see the documentation of sequence in the API Reference for more examples):

[21]:
# With initial=1.0 and inc=0.5, we generate the sequence 1.0, 1.5, 2.0, 2.5, 3.0, ...
sequence(age, initial=1.0, inc=0.5)
[21]:
age  0-9  10-17  18-66  67+
     1.0    1.5    2.0  2.5
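
The values above can be reproduced by hand. The sketch below is a pure-Python illustration of the idea only: toy_sequence is a hypothetical helper that supports a constant increment, not LArray's sequence function.

```python
def toy_sequence(labels, initial=0, inc=0):
    """Assign initial to the first label, then add inc for each next label."""
    values = []
    value = initial
    for _ in labels:
        values.append(value)
        value = value + inc
    return dict(zip(labels, values))


age_labels = ['0-9', '10-17', '18-66', '67+']
result = toy_sequence(age_labels, initial=1.0, inc=0.5)
print(result)
```

Each label receives the previous value plus the increment, giving 1.0, 1.5, 2.0 and 2.5 for the four age labels.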

Inspecting Array objects

[22]:
# create a test array
ndtest([age, gender, time])
[22]:
  age  gender\time  2015  2016  2017
  0-9       female     0     1     2
  0-9         male     3     4     5
10-17       female     6     7     8
10-17         male     9    10    11
18-66       female    12    13    14
18-66         male    15    16    17
  67+       female    18    19    20
  67+         male    21    22    23

Get array summary: metadata + dimensions + description of axes + dtype + size in memory

[23]:
arr.info
[23]:
title: random array
description: array containing random values between 0 and 100
4 x 2 x 3
 age [4]: '0-9' '10-17' '18-66' '67+'
 gender [2]: 'female' 'male'
 time [3]: 2015 2016 2017
dtype: int64
memory used: 192 bytes

Get axes

[24]:
arr.axes
[24]:
AxisCollection([
    Axis(['0-9', '10-17', '18-66', '67+'], 'age'),
    Axis(['female', 'male'], 'gender'),
    Axis([2015, 2016, 2017], 'time')
])

Get number of dimensions

[25]:
arr.ndim
[25]:
3

Get length of each dimension

[26]:
arr.shape
[26]:
(4, 2, 3)

Get total number of elements of the array

[27]:
arr.size
[27]:
24

Get type of internal data (int, float, …)

[28]:
arr.dtype
[28]:
dtype('int64')

Get size in memory

[29]:
arr.memory_used
[29]:
'192 bytes'
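
The values returned by ndim, size and memory_used all follow from the shape and the dtype. As a quick check in plain Python (no LArray involved), assuming each int64 value occupies 8 bytes:

```python
import math

shape = (4, 2, 3)   # lengths of the age, gender and time axes
itemsize = 8        # an int64 value takes 64 bits, i.e. 8 bytes

ndim = len(shape)               # number of dimensions
size = math.prod(shape)         # total number of elements
memory = size * itemsize        # bytes used by the data

print(ndim, size, memory)
```

This gives 3 dimensions, 24 elements and 192 bytes, in line with the outputs above.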

Display the array in the viewer (graphical user interface) in read-only mode. This opens a new window and blocks execution of the rest of the code until the window is closed! It requires PyQt to be installed.

view(arr)

Or load it in Excel:

arr.to_excel()

Extract an axis from an array

It is possible to extract an axis belonging to an array using its name:

[30]:
# extract the 'time' axis belonging to the 'arr' array
time = arr.time
time
[30]:
Axis([2015, 2016, 2017], 'time')

More on Array objects

To learn how to save and load arrays in CSV, Excel or HDF format, please refer to the Loading and Dumping Arrays section of the tutorial.

See the Array section of the API Reference to explore all methods of Array objects.

Session

A Session object is a dictionary-like object used to gather several arrays, axes and groups. A session is particularly well suited to gathering all the input objects of a model or the output arrays from different scenarios. As with arrays, metadata can be associated with sessions.

Creating Sessions

To create a session, you can first create an empty session and then populate it with arrays, axes and groups:

[31]:
gender = Axis("gender=Male,Female")
time = Axis("time=2013..2017")

# create an empty session
demography_session = Session()

# add axes to the session
demography_session.gender = gender
demography_session.time = time

# add arrays to the session
demography_session.population = zeros((gender, time))
demography_session.births = zeros((gender, time))
demography_session.deaths = zeros((gender, time))

# add metadata after creation
demography_session.meta.title = 'Demographic Model of Belgium'
demography_session.meta.description = 'Models the demography of Belgium'

# print content of the session
print(demography_session.summary())
Metadata:
        title: Demographic Model of Belgium
        description: Models the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015 2016 2017] (5)
population: gender, time (2 x 5) [float64]
births: gender, time (2 x 5) [float64]
deaths: gender, time (2 x 5) [float64]

or you can create and populate a session in one step:

[32]:
gender = Axis("gender=Male,Female")
time = Axis("time=2013..2017")

demography_session = Session(gender=gender, time=time,
                             population=zeros((gender, time)),
                             births=zeros((gender, time)),
                             deaths=zeros((gender, time)),
                             meta=Metadata(title='Demographic Model of Belgium',
                                           description='Models the demography of Belgium'))

# print content of the session
print(demography_session.summary())
Metadata:
        title: Demographic Model of Belgium
        description: Models the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015 2016 2017] (5)
population: gender, time (2 x 5) [float64]
births: gender, time (2 x 5) [float64]
deaths: gender, time (2 x 5) [float64]

Warning:

  • Contrary to array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5).
  • Metadata is not kept when actions or methods are applied to a session, except for operations that modify the session in place, such as s.arr1 = 0. Do not add metadata to a session if you know you will apply actions or methods to it before dumping it.

More on Session objects

To learn how to save and load sessions in CSV, Excel or HDF format, please refer to the Loading and Dumping Sessions section of the tutorial.

To see how to work with sessions, please read the Working With Sessions section of the tutorial.

Finally, see the Session section of the API Reference to explore all methods of Session objects.