Presenting LArray objects (Axis, Groups, Array, Session)

Import the LArray library:

[1]:
from larray import *

Axis

An Axis represents a dimension of an Array object. It consists of a name and a list of labels.

There are several ways to create an axis:

[2]:
# create a wildcard axis
age = Axis(3, 'age')
# labels given as a list
time = Axis([2007, 2008, 2009], 'time')
# create an axis using one string
gender = Axis('gender=M,F')
# labels generated using a special syntax
other = Axis('other=A01..C03')

age, gender, time, other
[2]:
(Axis(3, 'age'),
 Axis(['M', 'F'], 'gender'),
 Axis([2007, 2008, 2009], 'time'),
 Axis(['A01', 'A02', 'A03', 'B01', 'B02', 'B03', 'C01', 'C02', 'C03'], 'other'))

See the Axis section of the API Reference to explore all methods of Axis objects.

Groups

A Group represents a selection of labels from an Axis. It can optionally have a name (using operator >>). Groups can be used when selecting a subset of an array and in aggregations.

Group objects are created as follows:

[3]:
# define an Axis object 'age'
age = Axis('age=0..100')

# create an anonymous Group object 'teens'
teens = age[10:20]
# create a Group object 'pensioners' with a name
pensioners = age[67:] >> 'pensioners'

teens
[3]:
age[10:20]

It is possible to set a name or to rename a group after its declaration:

[4]:
# method 'named' returns a new group with the given name
teens = teens.named('teens')

# operator >> is just a shortcut for calling the method 'named'
teens = teens >> 'teens'

teens
[4]:
age[10:20] >> 'teens'

See the Group section of the API Reference to explore all methods of Group objects.

Array

An Array object represents a multidimensional array with labeled axes.

Create an array from scratch

To create an array from scratch, you need to provide the data and a list of axes. Optionally, metadata (title, description, creation date, authors, …) can be associated with the array:

[5]:
import numpy as np

# list of the axes
axes = [age, gender, time, other]
# data (the shape of data array must match axes lengths)
data = np.random.randint(100, size=[len(axis) for axis in axes])
# metadata
meta = [('title', 'random array')]

arr = Array(data, axes, meta=meta)
arr
[5]:
age  gender  time\other  A01  A02  A03  B01  B02  B03  C01  C02  C03
  0       M        2007   33   46   54   37   16   98   44    8   23
  0       M        2008    8   42   60   49   64   76    9   88   52
  0       M        2009   73   71   31   99   32   54   70   70   52
  0       F        2007   96   26   53   19   61   32   58    6   25
  0       F        2008   85   17   93   90   67   38   77   51   92
...     ...         ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
100       M        2008   17    9   40   65   34   99   76   97   93
100       M        2009    8   55   77   21   84   96    1   14   64
100       F        2007   10   25   70   93   74    8   11   67   35
100       F        2008   19   85   36   91   82   45   98   64   63
100       F        2009   75   98   20   13   11   20   86   91   10

Metadata can be added to an array at any time using:

[6]:
arr.meta.description = 'array containing random values between 0 and 100'

arr.meta
[6]:
title: random array
description: array containing random values between 0 and 100

Warning:

  • Currently, only the HDF (.h5) file format supports saving and loading array metadata.

  • Metadata is not kept when operations or methods are applied to an array, except for operations that modify the object in place (such as population[age < 10] = 0) and calls to the method copy(). Do not add metadata to an array if you know you will apply operations or methods to it before dumping it.

Array creation functions

Arrays can also be generated in an easier way through creation functions:

  • ndtest : creates a test array with increasing numbers as data

  • empty : creates an array but leaves its allocated memory unchanged (i.e., it contains “garbage”; be careful!)

  • zeros : fills an array with 0

  • ones : fills an array with 1

  • full : fills an array with a given value

  • sequence : creates an array from an axis by iteratively applying a function to a given initial value.

Except for ndtest, a list of axes must be provided. Axes can be passed in different ways:

  • as Axis objects

  • as integers defining the lengths of auto-generated wildcard axes

  • as a string : ‘gender=M,F;time=2007,2008,2009’ (name is optional)

  • as pairs (name, labels)

Optionally, the type of data stored by the array can be specified using argument dtype.

[7]:
# start defines the starting value of data
ndtest(['age=0..2', 'gender=M,F', 'time=2007..2009'], start=-1)
[7]:
age  gender\time  2007  2008  2009
  0            M    -1     0     1
  0            F     2     3     4
  1            M     5     6     7
  1            F     8     9    10
  2            M    11    12    13
  2            F    14    15    16
[8]:
# start defines the starting value of data
# label_start defines the starting index of labels
ndtest((3, 3), start=-1, label_start=2)
[8]:
a\b  b2  b3  b4
 a2  -1   0   1
 a3   2   3   4
 a4   5   6   7
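
The values shown in the two outputs above are consistent with consecutive integers laid out in row-major order. The plain NumPy sketch below reproduces the data of the first example under that assumption; it illustrates the pattern and is not LArray's own implementation.

```python
import numpy as np

shape = (3, 2, 3)   # age x gender x time, as in the first ndtest example
start = -1

# consecutive integers beginning at `start`, laid out in row-major order
n_elements = int(np.prod(shape))
data = np.arange(start, start + n_elements).reshape(shape)

print(data[0, 0])   # age=0, gender=M: values for 2007, 2008, 2009
print(data[2, 1])   # age=2, gender=F: last row of the table
```
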
[9]:
# empty generates uninitialised array with correct axes
# (much faster but use with care!).
# This is not really random either, it just reuses a portion
# of memory that is available, with whatever content is there.
# Use it only if performance matters and make sure all data
# will be overridden.
empty(['age=0..2', 'gender=M,F', 'time=2007..2009'])
[9]:
age  gender\time  ...
  0            M  ...
  0            F  ...
  1            M  ...
  1            F  ...
  2            M  ...
  2            F  ...
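
The same precaution applies to NumPy's `np.empty`, which also returns uninitialised memory. The sketch below shows the intended usage pattern with plain NumPy (the name `result` and the values written are arbitrary): allocate, then overwrite every cell before reading anything back.

```python
import numpy as np

shape = (3, 2, 3)

# uninitialised buffer: its content is arbitrary until written
result = np.empty(shape)

# overwrite every cell before using the array
for i in range(shape[0]):
    result[i] = i * 10.0

print(result[2])
```
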
[10]:
# example with anonymous axes
zeros(['0..2', 'M,F', '2007..2009'])
[10]:
{0}  {1}\{2}  2007  2008  2009
  0        M   0.0   0.0   0.0
  0        F   0.0   0.0   0.0
  1        M   0.0   0.0   0.0
  1        F   0.0   0.0   0.0
  2        M   0.0   0.0   0.0
  2        F   0.0   0.0   0.0
[11]:
# dtype=int stores the data as int instead of the default float
ones(['age=0..2', 'gender=M,F', 'time=2007..2009'], dtype=int)
[11]:
age  gender\time  2007  2008  2009
  0            M     1     1     1
  0            F     1     1     1
  1            M     1     1     1
  1            F     1     1     1
  2            M     1     1     1
  2            F     1     1     1
[12]:
full(['age=0..2', 'gender=M,F', 'time=2007..2009'], 1.23)
[12]:
age  gender\time  2007  2008  2009
  0            M  1.23  1.23  1.23
  0            F  1.23  1.23  1.23
  1            M  1.23  1.23  1.23
  1            F  1.23  1.23  1.23
  2            M  1.23  1.23  1.23
  2            F  1.23  1.23  1.23

All the above functions exist in *(func)_like* variants, which take their axes from another array:

[13]:
ones_like(arr)
[13]:
age  gender  time\other  A01  A02  A03  B01  B02  B03  C01  C02  C03
  0       M        2007    1    1    1    1    1    1    1    1    1
  0       M        2008    1    1    1    1    1    1    1    1    1
  0       M        2009    1    1    1    1    1    1    1    1    1
  0       F        2007    1    1    1    1    1    1    1    1    1
  0       F        2008    1    1    1    1    1    1    1    1    1
...     ...         ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
100       M        2008    1    1    1    1    1    1    1    1    1
100       M        2009    1    1    1    1    1    1    1    1    1
100       F        2007    1    1    1    1    1    1    1    1    1
100       F        2008    1    1    1    1    1    1    1    1    1
100       F        2009    1    1    1    1    1    1    1    1    1

Create an array using the special sequence function (see the documentation of sequence in the API Reference for more examples):

[14]:
# With initial=1.0 and inc=0.5, we generate the sequence 1.0, 1.5, 2.0, 2.5, 3.0, ...
sequence('gender=M,F', initial=1.0, inc=0.5)
[14]:
gender    M    F
        1.0  1.5
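
The idea of "iteratively applying a function to a given initial value" can be sketched in plain Python. The helper `sequence_values` below is hypothetical and is not part of LArray; it only mimics how the successive values are obtained.

```python
def sequence_values(length, initial, func):
    """Return `length` values: initial, func(initial), func(func(initial)), ..."""
    values = [initial]
    for _ in range(length - 1):
        # each new value is computed from the previous one
        values.append(func(values[-1]))
    return values


# adding 0.5 at each step, as with initial=1.0 and inc=0.5
print(sequence_values(5, 1.0, lambda v: v + 0.5))   # [1.0, 1.5, 2.0, 2.5, 3.0]

# multiplying by 2 at each step
print(sequence_values(4, 1.0, lambda v: v * 2))     # [1.0, 2.0, 4.0, 8.0]
```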

Inspecting Array objects

[15]:
# create a test array
arr = ndtest([age, gender, time, other])

Get array summary : metadata + dimensions + description of axes + dtype + size in memory

[16]:
arr.info
[16]:
101 x 2 x 3 x 9
 age [101]: 0 1 2 ... 98 99 100
 gender [2]: 'M' 'F'
 time [3]: 2007 2008 2009
 other [9]: 'A01' 'A02' 'A03' ... 'C01' 'C02' 'C03'
dtype: int64
memory used: 42.61 Kb

Get axes

[17]:
arr.axes
[17]:
AxisCollection([
    Axis([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], 'age'),
    Axis(['M', 'F'], 'gender'),
    Axis([2007, 2008, 2009], 'time'),
    Axis(['A01', 'A02', 'A03', 'B01', 'B02', 'B03', 'C01', 'C02', 'C03'], 'other')
])

Get number of dimensions

[18]:
arr.ndim
[18]:
4

Get length of each dimension

[19]:
arr.shape
[19]:
(101, 2, 3, 9)

Get total number of elements of the array

[20]:
arr.size
[20]:
5454

Get type of internal data (int, float, …)

[21]:
arr.dtype
[21]:
dtype('int64')

Get size in memory

[22]:
arr.memory_used
[22]:
'42.61 Kb'
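
The figures reported above fit together: the size is the product of the shape, and the memory used is the size times the width of one element. A quick check with plain NumPy, assuming 1 Kb is counted as 1024 bytes:

```python
import numpy as np

shape = (101, 2, 3, 9)
itemsize = np.dtype('int64').itemsize    # 8 bytes per element

n_elements = int(np.prod(shape))         # 5454
n_bytes = n_elements * itemsize          # 43632
print(f"{n_bytes / 1024:.2f} Kb")        # 42.61 Kb
```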

Display the array in the viewer (graphical user interface) in read-only mode. This will open a new window and block execution of the rest of the code until the window is closed! Requires PyQt to be installed.

view(arr)

Or load it in Excel:

arr.to_excel()

Extract an axis from an array

It is possible to extract an axis belonging to an array using its name:

[23]:
# extract the 'time' axis belonging to the 'arr' array
time = arr.time
time
[23]:
Axis([2007, 2008, 2009], 'time')

More on Array objects

To learn how to save and load arrays in CSV, Excel or HDF format, please refer to the Loading and Dumping Arrays section of the tutorial.

See the Array section of the API Reference to explore all methods of Array objects.

Session

A Session object is a dictionary-like object used to gather several arrays, axes and groups. A session is particularly well suited to gathering all input objects of a model or the output arrays from different scenarios. As with arrays, it is possible to associate metadata with sessions.

Creating Sessions

To create a session, you can first create an empty session and then populate it with arrays, axes and groups:

[24]:
# create an empty session
demography_session = Session()

# add axes to the session
gender = Axis("gender=Male,Female")
demography_session.gender = gender
time = Axis("time=2013..2017")
demography_session.time = time

# add arrays to the session
demography_session.population = zeros((gender, time))
demography_session.births = zeros((gender, time))
demography_session.deaths = zeros((gender, time))

# add metadata after creation
demography_session.meta.title = 'Demographic Model of Belgium'
demography_session.meta.description = 'Models the demography of Belgium'

# print content of the session
print(demography_session.summary())
Metadata:
        title: Demographic Model of Belgium
        description: Models the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015 2016 2017] (5)
population: gender, time (2 x 5) [float64]
births: gender, time (2 x 5) [float64]
deaths: gender, time (2 x 5) [float64]

or you can create and populate a session in one step:

[25]:
gender = Axis("gender=Male,Female")
time = Axis("time=2013..2017")

# create and populate a new session in one step
# Python <= 3.5
demography_session = Session([('gender', gender), ('time', time), ('population', zeros((gender, time))),
                              ('births', zeros((gender, time))), ('deaths', zeros((gender, time)))],
                             meta=[('title', 'Demographic Model of Belgium'),
                                   ('description', 'Models the demography of Belgium')])
# Python 3.6+
demography_session = Session(gender=gender, time=time, population=zeros((gender, time)),
                             births=zeros((gender, time)), deaths=zeros((gender, time)),
                             meta=Metadata(title='Demographic Model of Belgium',
                                           description='Models the demography of Belgium'))

# print content of the session
print(demography_session.summary())
Metadata:
        title: Demographic Model of Belgium
        description: Models the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015 2016 2017] (5)
population: gender, time (2 x 5) [float64]
births: gender, time (2 x 5) [float64]
deaths: gender, time (2 x 5) [float64]

Warning:

  • Contrary to array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5).

  • Metadata is not kept when operations or methods are applied to a session, except for operations that modify the session in place, such as s.arr1 = 0. Do not add metadata to a session if you know you will apply operations or methods to it before dumping it.

More on Session objects

To learn how to save and load sessions in CSV, Excel or HDF format, please refer to the Loading and Dumping Sessions section of the tutorial.

To see how to work with sessions, please read the Working With Sessions section of the tutorial.

Finally, see the Session section of the API Reference to explore all methods of Session objects.