
Presenting LArray objects (Axis, Groups, Array, Session)

Import the LArray library:

[1]:
from larray import *

Axis

An Axis represents a dimension of an Array object. It consists of a name and a list of labels.

There are several ways to create an axis:

[2]:
# create a wildcard axis
age = Axis(3, 'age')
# labels given as a list
time = Axis([2007, 2008, 2009], 'time')
# create an axis using one string
gender = Axis('gender=M,F')
# labels generated using a special syntax
other = Axis('other=A01..C03')

age, gender, time, other
[2]:
(Axis(3, 'age'),
 Axis(['M', 'F'], 'gender'),
 Axis([2007, 2008, 2009], 'time'),
 Axis(['A01', 'A02', 'A03', 'B01', 'B02', 'B03', 'C01', 'C02', 'C03'], 'other'))

See the Axis section of the API Reference to explore all methods of Axis objects.

Groups

A Group represents a selection of labels from an Axis. It can optionally have a name (using operator >>). Groups can be used when selecting a subset of an array and in aggregations.

Group objects are created as follows:

[3]:
# define an Axis object 'age'
age = Axis('age=0..100')

# create an anonymous Group object 'teens'
teens = age[10:20]
# create a Group object 'pensioners' with a name
pensioners = age[67:] >> 'pensioners'

teens
[3]:
age[10:20]

It is possible to set a name or to rename a group after its declaration:

[4]:
# method 'named' returns a new group with the given name
teens = teens.named('teens')

# operator >> is just a shortcut for the call of the method named
teens = teens >> 'teens'

teens
[4]:
age[10:20] >> 'teens'

See the Group section of the API Reference to explore all methods of Group objects.

Array

An Array object represents a multidimensional array with labeled axes.

Create an array from scratch

To create an array from scratch, you need to provide the data and a list of axes. Optionally, metadata (title, description, creation date, authors, …) can be associated with the array:

[5]:
import numpy as np

# list of the axes
axes = [age, gender, time, other]
# data (the shape of data array must match axes lengths)
data = np.random.randint(100, size=[len(axis) for axis in axes])
# metadata
meta = [('title', 'random array')]

arr = Array(data, axes, meta=meta)
arr
[5]:
age  gender  time\other  A01  A02  A03  B01  B02  B03  C01  C02  C03
  0       M        2007   83   72   98   63   14   18   87   84   85
  0       M        2008   87   54   28    9   68   36   95    8   51
  0       M        2009   27   58    4   35   65   68   74   53    0
  0       F        2007   29   16   92    4   49   94    5    9    2
  0       F        2008   98   83   25   45    2   72    1   98   34
...     ...         ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
100       M        2008   20   37   27   79   25   58   38   98   51
100       M        2009   13    3    1   79   60   82   67    4    1
100       F        2007   78   20   30   73    6   52   89   63   82
100       F        2008   34   99   64   49   41   74   88   32   93
100       F        2009   16   30   49   43   35   35    3   21    8

Metadata can be added to an array at any time using:

[6]:
arr.meta.description = 'array containing random values between 0 and 100'

arr.meta
[6]:
title: random array
description: array containing random values between 0 and 100

Warning:

  • Currently, only the HDF (.h5) file format supports saving and loading array metadata.
  • Metadata is not kept when actions or methods are applied on an array, except for operations modifying the object in-place, such as `population[age < 10] = 0`, and when the method `copy()` is called. Do not add metadata to an array if you know you will apply actions or methods on it before dumping it.
    

Array creation functions

Arrays can also be generated in an easier way through creation functions:

  • ndtest : creates a test array with increasing numbers as data
  • empty : creates an array but leaves its allocated memory unchanged (i.e. it contains “garbage”, so be careful!)
  • zeros : fills an array with 0
  • ones : fills an array with 1
  • full : fills an array with a given value
  • sequence : creates an array from an axis by iteratively applying a function to a given initial value.

Except for ndtest, a list of axes must be provided. Axes can be passed in different ways:

  • as Axis objects
  • as integers defining the lengths of auto-generated wildcard axes
  • as a string : ‘gender=M,F;time=2007,2008,2009’ (name is optional)
  • as pairs (name, labels)

Optionally, the type of data stored by the array can be specified using argument dtype.

[7]:
# start defines the starting value of data
ndtest(['age=0..2', 'gender=M,F', 'time=2007..2009'], start=-1)
[7]:
age  gender\time  2007  2008  2009
  0            M    -1     0     1
  0            F     2     3     4
  1            M     5     6     7
  1            F     8     9    10
  2            M    11    12    13
  2            F    14    15    16
[8]:
# start defines the starting value of data
# label_start defines the starting index of labels
ndtest((3, 3), start=-1, label_start=2)
[8]:
a\b  b2  b3  b4
 a2  -1   0   1
 a3   2   3   4
 a4   5   6   7
[9]:
# empty generates uninitialised array with correct axes
# (much faster but use with care!).
# This is not really random either, it just reuses a portion
# of memory that is available, with whatever content is there.
# Use it only if performance matters and make sure all data
# will be overridden.
empty(['age=0..2', 'gender=M,F', 'time=2007..2009'])
[9]:
age  gender\time  ...
  0            M  ...
  0            F  ...
  1            M  ...
  1            F  ...
  2            M  ...
  2            F  ...
[10]:
# example with anonymous axes
zeros(['0..2', 'M,F', '2007..2009'])
[10]:
{0}  {1}\{2}  2007  2008  2009
  0        M   0.0   0.0   0.0
  0        F   0.0   0.0   0.0
  1        M   0.0   0.0   0.0
  1        F   0.0   0.0   0.0
  2        M   0.0   0.0   0.0
  2        F   0.0   0.0   0.0
[11]:
# dtype=int stores the data as int instead of the default float
ones(['age=0..2', 'gender=M,F', 'time=2007..2009'], dtype=int)
[11]:
age  gender\time  2007  2008  2009
  0            M     1     1     1
  0            F     1     1     1
  1            M     1     1     1
  1            F     1     1     1
  2            M     1     1     1
  2            F     1     1     1
[12]:
full(['age=0..2', 'gender=M,F', 'time=2007..2009'], 1.23)
[12]:
age  gender\time  2007  2008  2009
  0            M  1.23  1.23  1.23
  0            F  1.23  1.23  1.23
  1            M  1.23  1.23  1.23
  1            F  1.23  1.23  1.23
  2            M  1.23  1.23  1.23
  2            F  1.23  1.23  1.23

All the above functions exist in *(func)_like* variants, which take their axes from another array:

[13]:
ones_like(arr)
[13]:
age  gender  time\other  A01  A02  A03  B01  B02  B03  C01  C02  C03
  0       M        2007    1    1    1    1    1    1    1    1    1
  0       M        2008    1    1    1    1    1    1    1    1    1
  0       M        2009    1    1    1    1    1    1    1    1    1
  0       F        2007    1    1    1    1    1    1    1    1    1
  0       F        2008    1    1    1    1    1    1    1    1    1
...     ...         ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
100       M        2008    1    1    1    1    1    1    1    1    1
100       M        2009    1    1    1    1    1    1    1    1    1
100       F        2007    1    1    1    1    1    1    1    1    1
100       F        2008    1    1    1    1    1    1    1    1    1
100       F        2009    1    1    1    1    1    1    1    1    1

Create an array using the special sequence function (see the documentation of sequence in the API Reference for more examples):

[14]:
# With initial=1.0 and inc=0.5, we generate the sequence 1.0, 1.5, 2.0, 2.5, 3.0, ...
sequence('gender=M,F', initial=1.0, inc=0.5)
[14]:
gender    M    F
        1.0  1.5

Inspecting Array objects

[15]:
# create a test array
arr = ndtest([age, gender, time, other])

Get array summary: metadata + dimensions + description of axes + dtype + size in memory

[16]:
arr.info
[16]:
101 x 2 x 3 x 9
 age [101]: 0 1 2 ... 98 99 100
 gender [2]: 'M' 'F'
 time [3]: 2007 2008 2009
 other [9]: 'A01' 'A02' 'A03' ... 'C01' 'C02' 'C03'
dtype: int64
memory used: 42.61 Kb

Get axes

[17]:
arr.axes
[17]:
AxisCollection([
    Axis([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], 'age'),
    Axis(['M', 'F'], 'gender'),
    Axis([2007, 2008, 2009], 'time'),
    Axis(['A01', 'A02', 'A03', 'B01', 'B02', 'B03', 'C01', 'C02', 'C03'], 'other')
])

Get number of dimensions

[18]:
arr.ndim
[18]:
4

Get length of each dimension

[19]:
arr.shape
[19]:
(101, 2, 3, 9)

Get total number of elements of the array

[20]:
arr.size
[20]:
5454

Get type of internal data (int, float, …)

[21]:
arr.dtype
[21]:
dtype('int64')

Get size in memory

[22]:
arr.memory_used
[22]:
'42.61 Kb'

Display the array in the viewer (graphical user interface) in read-only mode. This opens a new window and blocks execution of the rest of the code until the window is closed! It requires PyQt to be installed.

view(arr)

Or load it in Excel:

arr.to_excel()

Extract an axis from an array

It is possible to extract an axis belonging to an array using its name:

[23]:
# extract the 'time' axis belonging to the 'arr' array
time = arr.time
time
[23]:
Axis([2007, 2008, 2009], 'time')

More on Array objects

To learn how to save and load arrays in CSV, Excel or HDF format, please refer to the Loading and Dumping Arrays section of the tutorial.

See the Array section of the API Reference to explore all methods of Array objects.

Session

A Session object is a dictionary-like object used to gather several arrays, axes and groups. A session is particularly well suited to gathering all input objects of a model or the output arrays from different scenarios. As with arrays, it is possible to associate metadata with sessions.

Creating Sessions

To create a session, you can first create an empty session and then populate it with arrays, axes and groups:

[24]:
# create an empty session
demography_session = Session()

# add axes to the session
gender = Axis("gender=Male,Female")
demography_session.gender = gender
time = Axis("time=2013..2017")
demography_session.time = time

# add arrays to the session
demography_session.population = zeros((gender, time))
demography_session.births = zeros((gender, time))
demography_session.deaths = zeros((gender, time))

# add metadata after creation
demography_session.meta.title = 'Demographic Model of Belgium'
demography_session.meta.description = 'Models the demography of Belgium'

# print content of the session
print(demography_session.summary())
Metadata:
        title: Demographic Model of Belgium
        description: Models the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015 2016 2017] (5)
population: gender, time (2 x 5) [float64]
births: gender, time (2 x 5) [float64]
deaths: gender, time (2 x 5) [float64]

or you can create and populate a session in one step:

[25]:
gender = Axis("gender=Male,Female")
time = Axis("time=2013..2017")

# create and populate a new session in one step
# Python <= 3.5
demography_session = Session([('gender', gender), ('time', time), ('population', zeros((gender, time))),
                              ('births', zeros((gender, time))), ('deaths', zeros((gender, time)))],
                             meta=[('title', 'Demographic Model of Belgium'),
                                   ('description', 'Models the demography of Belgium')])
# Python 3.6+
demography_session = Session(gender=gender, time=time, population=zeros((gender, time)),
                             births=zeros((gender, time)), deaths=zeros((gender, time)),
                             meta=Metadata(title='Demographic Model of Belgium',
                                           description='Models the demography of Belgium'))

# print content of the session
print(demography_session.summary())
Metadata:
        title: Demographic Model of Belgium
        description: Models the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015 2016 2017] (5)
population: gender, time (2 x 5) [float64]
births: gender, time (2 x 5) [float64]
deaths: gender, time (2 x 5) [float64]

Warning:

  • Contrary to array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5).
  • Metadata is not kept when actions or methods are applied on a session, except for operations modifying a session in-place, such as `s.arr1 = 0`. Do not add metadata to a session if you know you will apply actions or methods on it before dumping it.
    

More on Session objects

To learn how to save and load sessions in CSV, Excel or HDF format, please refer to the Loading and Dumping Sessions section of the tutorial.

To see how to work with sessions, please read the Working With Sessions section of the tutorial.

Finally, see the Session section of the API Reference to explore all methods of Session objects.