Presenting LArray objects (Axis, Groups, Array, Session)¶
Import the LArray library:
[1]:
from larray import *
[2]:
s = 1 + 2
# In the interactive mode, there is no need to use the print() function
# to display the content of the variable 's'.
# Simply typing 's' is enough
s
[2]:
3
[3]:
# In the interactive mode, there is no need to use the print() function
# to display the result of an expression
1 + 2
[3]:
3
Axis¶
An Axis represents a dimension of an Array object. It consists of a name and a list of labels.
There are several ways to create an axis:
[4]:
# labels given as a list
time = Axis([2007, 2008, 2009, 2010], 'time')
# create an axis using one string
gender = Axis('gender=M,F')
# labels generated using the special syntax start..end
age = Axis('age=0..100')
time, gender, age
[4]:
(Axis([2007, 2008, 2009, 2010], 'time'),
Axis(['M', 'F'], 'gender'),
Axis([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], 'age'))
When a string of the form "axis_name=list,of,labels" or "axis_name=start..end" is passed to the Axis() constructor, LArray will automatically infer the type of the labels. For instance, the command line age = Axis("age=0..100") will create an age axis with labels of type int. Mixing special characters like + with numbers will create an axis with labels of type str instead of int. As a consequence, the command line age = Axis("age=0..98,99+") will create an age axis with labels of type str instead of int.
[5]:
# When a string is passed to the Axis() constructor, LArray will automatically infer the type of the labels
age = Axis("age=0..5")
age
[5]:
Axis([0, 1, 2, 3, 4, 5], 'age')
[6]:
# Mixing special characters like + with numbers leads to the creation of an axis with labels of type str instead of int.
age = Axis("age=0..4,5+")
age
[6]:
Axis(['0', '1', '2', '3', '4', '5+'], 'age')
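The inference rule can be illustrated with a minimal pure-Python sketch. The helper parse_axis_string below is hypothetical. It only mimics the behaviour described above: labels become int when every label is an integer, and they stay str otherwise. It is not LArray's actual parser, which supports more syntaxes than the two shown here.

```python
def parse_axis_string(definition):
    """Hypothetical helper (not part of LArray): split 'name=labels' and
    expand inclusive 'start..end' integer ranges.

    Mimics the rule described above: labels are converted to int only if
    every label is an integer; otherwise all labels are kept as strings.
    """
    name, _, spec = definition.partition('=')
    raw_labels = []
    for part in spec.split(','):
        part = part.strip()
        if '..' in part:
            start, end = part.split('..')
            # expand an inclusive integer range, e.g. '0..4' -> '0', '1', '2', '3', '4'
            raw_labels.extend(str(i) for i in range(int(start), int(end) + 1))
        else:
            raw_labels.append(part)
    # infer the label type: int only if all labels look like integers
    if all(label.lstrip('-').isdigit() for label in raw_labels):
        return name, [int(label) for label in raw_labels]
    return name, raw_labels


print(parse_axis_string("age=0..5"))     # ('age', [0, 1, 2, 3, 4, 5])
print(parse_axis_string("age=0..4,5+"))  # ('age', ['0', '1', '2', '3', '4', '5+'])
```

A single non-numeric label such as '5+' is enough to keep every label as a string. That is why age=0..4,5+ yields '0' rather than 0.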
See the Axis section of the API Reference to explore all methods of Axis objects.
Groups¶
A Group represents a selection of labels from an Axis. It can optionally have a name (using the operator >>). Groups can be used when selecting a subset of an array and in aggregations.
Group objects are created as follows:
[7]:
age = Axis('age=0..100')
# create an anonymous (unnamed) Group object and store it in the variable 'teens'
teens = age[10:18]
teens
[7]:
age[10:18]
[8]:
# create a Group object 'pensioners' with a name
pensioners = age[67:] >> 'pensioners'
pensioners
[8]:
age[67:] >> 'pensioners'
It is possible to set a name or to rename a group after its declaration:
[9]:
# method 'named' returns a new group with the given name
teens = teens.named('teens')
# operator >> is just a shortcut for a call to the method 'named'
teens = teens >> 'teens'
teens
[9]:
age[10:18] >> 'teens'
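The equivalence between >> and named() can be sketched with plain operator overloading. LabelGroup below is a hypothetical stand-in and not LArray's Group class. It stores the labels 10 to 18 because label slices in LArray include both bounds: age[0:10] contains 0 to 10.

```python
class LabelGroup:
    """Hypothetical stand-in for a Group: a selection of labels plus an optional name."""

    def __init__(self, labels, name=None):
        self.labels = tuple(labels)
        self.name = name

    def named(self, name):
        # return a NEW group with the given name; the original group is unchanged
        return LabelGroup(self.labels, name)

    def __rshift__(self, name):
        # 'group >> name' simply delegates to group.named(name)
        return self.named(name)

    def __repr__(self):
        return f"LabelGroup({list(self.labels)!r}, name={self.name!r})"


teens = LabelGroup(range(10, 19))    # labels 10..18, both bounds included
named_teens = teens >> 'teens'
print(teens.name, named_teens.name)  # None teens
```

Because named() returns a new object, the tutorial reassigns the result (teens = teens >> 'teens') rather than relying on in-place modification.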
Mixing slices and individual labels inside the [ ] will generate several groups (a tuple of groups) instead of a single group. If you want to create a single group using both slices and individual labels, you need to use the .union() method (see below).
[10]:
# mixing slices and individual labels leads to the creation of several groups (a tuple of groups)
age[0:10, 20, 30, 40]
[10]:
(age[0:10], age[20], age[30], age[40])
[11]:
# the union() method allows mixing slices and individual labels to create a single group
age[0:10].union(age[20, 30, 40])
[11]:
age[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40].set()
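The effect of union() on the labels can be sketched in plain Python. The sketch concatenates the selections, drops duplicates and keeps the order of first appearance. That order-preserving set behaviour is an assumption based on the output above. union_labels is a hypothetical helper that works on plain label lists rather than on Group objects.

```python
def union_labels(*selections):
    """Hypothetical helper: concatenate label selections, dropping duplicates
    while keeping the order of first appearance (an ordered set)."""
    seen = {}  # dicts preserve insertion order (Python 3.7+)
    for selection in selections:
        for label in selection:
            seen.setdefault(label, None)
    return list(seen)


# age[0:10] includes both bounds, hence range(0, 11)
print(union_labels(range(0, 11), [20, 30, 40]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40]
```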
See the Group section of the API Reference to explore all methods of Group objects.
Array¶
An Array object represents a multidimensional array with labeled axes.
Create an array from scratch¶
To create an array from scratch, you need to provide the data and a list of axes. Optionally, metadata (title, description, creation date, authors, …) can be associated with the array:
[12]:
# define axes
age = Axis('age=0-9,10-17,18-66,67+')
gender = Axis('gender=female,male')
time = Axis('time=2015..2017')
# list of the axes
axes = [age, gender, time]
# define some data. This is the Belgian population (in thousands). Source: Eurostat.
data = [[[633, 635, 634],
         [663, 665, 664]],
        [[484, 486, 491],
         [505, 511, 516]],
        [[3572, 3581, 3583],
         [3600, 3618, 3616]],
        [[1023, 1038, 1053],
         [756, 775, 793]]]
# metadata
meta = {'title': 'random array'}
arr = Array(data, axes, meta=meta)
arr
[12]:
age gender\time 2015 2016 2017
0-9 female 633 635 634
0-9 male 663 665 664
10-17 female 484 486 491
10-17 male 505 511 516
18-66 female 3572 3581 3583
18-66 male 3600 3618 3616
67+ female 1023 1038 1053
67+ male 756 775 793
Metadata can be added to an array at any time using:
[13]:
arr.meta.description = 'array containing random values between 0 and 100'
arr.meta
[13]:
title: random array
description: array containing random values between 0 and 100
Warning:
- Currently, only the HDF (.h5) file format supports saving and loading array metadata.
- Metadata is not kept when actions or methods are applied to an array, except for operations modifying the object in-place, such as `population[age < 10] = 0`, and when the method `copy()` is called. Do not add metadata to an array if you know you will apply actions or methods to it before dumping it.
Array creation functions¶
Arrays can also be generated in an easier way through creation functions:
- ndtest: creates a test array with increasing numbers as data
- empty: creates an array but leaves its allocated memory unchanged (i.e., it contains “garbage”. Be careful!)
- zeros: fills an array with 0
- ones: fills an array with 1
- full: fills an array with a given value
- sequence: creates an array from an axis by iteratively applying a function to a given initial value.
Except for ndtest, a list of axes must be provided. Axes can be passed in different ways:
- as Axis objects
- as integers defining the lengths of auto-generated wildcard axes
- as a string: ‘gender=M,F;time=2007,2008,2009’ (name is optional)
- as pairs (name, labels)
Optionally, the type of data stored by the array can be specified using the dtype argument.
[14]:
# start defines the starting value of data
ndtest((3, 3), start=-1)
[14]:
a\b b0 b1 b2
a0 -1 0 1
a1 2 3 4
a2 5 6 7
[15]:
# start defines the starting value of data
# label_start defines the starting index of labels
ndtest((3, 3), start=-1, label_start=2)
[15]:
a\b b2 b3 b4
a2 -1 0 1
a3 2 3 4
a4 5 6 7
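The roles of start and label_start can be made explicit with a pure-Python sketch restricted to two dimensions. ndtest_sketch is hypothetical and only reproduces the pattern visible in the two cells above. Axes are named a and b. Labels a<i> and b<j> are numbered from label_start. Consecutive values are filled row by row from start.

```python
def ndtest_sketch(shape, start=0, label_start=0):
    """Hypothetical pure-Python mimic of the 2D ndtest pattern shown above:
    rows are labeled 'a<i>', columns 'b<j>' (numbered from label_start),
    and data are consecutive integers starting at 'start', filled row by row."""
    n_rows, n_cols = shape
    row_labels = [f"a{i}" for i in range(label_start, label_start + n_rows)]
    col_labels = [f"b{j}" for j in range(label_start, label_start + n_cols)]
    values = [[start + r * n_cols + c for c in range(n_cols)] for r in range(n_rows)]
    return row_labels, col_labels, values


rows, cols, values = ndtest_sketch((3, 3), start=-1, label_start=2)
print(rows)    # ['a2', 'a3', 'a4']
print(cols)    # ['b2', 'b3', 'b4']
print(values)  # [[-1, 0, 1], [2, 3, 4], [5, 6, 7]]
```

start only shifts the data and label_start only shifts the label numbering, so the two arguments are independent.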
[16]:
# empty generates an uninitialised array with correct axes
# (much faster but use with care!).
# This is not really random either, it just reuses a portion
# of memory that is available, with whatever content is there.
# Use it only if performance matters and make sure all data
# will be overwritten.
empty([age, gender, time])
[16]:
age gender\time 2015 2016 2017
0-9 female 4.66582253358486e-310 4.66582254568986e-310 0.0
0-9 male 0.0 0.0 0.0
10-17 female 0.0 0.0 0.0
10-17 male 0.0 0.0 0.0
18-66 female 0.0 0.0 0.0
18-66 male 0.0 0.0 0.0
67+ female 0.0 0.0 0.0
67+ male 0.0 0.0 0.0
[17]:
zeros([age, gender, time])
[17]:
age gender\time 2015 2016 2017
0-9 female 0.0 0.0 0.0
0-9 male 0.0 0.0 0.0
10-17 female 0.0 0.0 0.0
10-17 male 0.0 0.0 0.0
18-66 female 0.0 0.0 0.0
18-66 male 0.0 0.0 0.0
67+ female 0.0 0.0 0.0
67+ male 0.0 0.0 0.0
[18]:
# dtype=int stores the data as int instead of the default float
ones([age, gender, time], dtype=int)
[18]:
age gender\time 2015 2016 2017
0-9 female 1 1 1
0-9 male 1 1 1
10-17 female 1 1 1
10-17 male 1 1 1
18-66 female 1 1 1
18-66 male 1 1 1
67+ female 1 1 1
67+ male 1 1 1
[19]:
full([age, gender, time], fill_value=1.23)
[19]:
age gender\time 2015 2016 2017
0-9 female 1.23 1.23 1.23
0-9 male 1.23 1.23 1.23
10-17 female 1.23 1.23 1.23
10-17 male 1.23 1.23 1.23
18-66 female 1.23 1.23 1.23
18-66 male 1.23 1.23 1.23
67+ female 1.23 1.23 1.23
67+ male 1.23 1.23 1.23
All the above functions exist in *(func)_like* variants, which take their axes from another array:
[20]:
ones_like(arr)
[20]:
age gender\time 2015 2016 2017
0-9 female 1 1 1
0-9 male 1 1 1
10-17 female 1 1 1
10-17 male 1 1 1
18-66 female 1 1 1
18-66 male 1 1 1
67+ female 1 1 1
67+ male 1 1 1
Create an array using the special sequence function (see the documentation of sequence in the API reference for more examples):
[21]:
# With initial=1.0 and inc=0.5, we generate the sequence 1.0, 1.5, 2.0, 2.5, 3.0, ...
sequence(age, initial=1.0, inc=0.5)
[21]:
age 0-9 10-17 18-66 67+
1.0 1.5 2.0 2.5
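The additive case shown above can be sketched as a simple loop, where each value equals the previous one plus inc. sequence_sketch is a hypothetical helper. It works on a plain list of a given length instead of an axis. LArray's sequence also supports other ways of generating values, and this sketch does not cover them.

```python
def sequence_sketch(length, initial=0.0, inc=1.0):
    """Hypothetical pure-Python mimic of an additive sequence:
    value[0] = initial and value[i] = value[i - 1] + inc."""
    if length <= 0:
        return []
    values = [initial]
    for _ in range(length - 1):
        values.append(values[-1] + inc)
    return values


# one value per label of the 4-label age axis used above
print(sequence_sketch(4, initial=1.0, inc=0.5))  # [1.0, 1.5, 2.0, 2.5]
```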
Inspecting Array objects¶
[22]:
# create a test array
ndtest([age, gender, time])
[22]:
age gender\time 2015 2016 2017
0-9 female 0 1 2
0-9 male 3 4 5
10-17 female 6 7 8
10-17 male 9 10 11
18-66 female 12 13 14
18-66 male 15 16 17
67+ female 18 19 20
67+ male 21 22 23
Get array summary: metadata + dimensions + description of axes + dtype + size in memory
[23]:
arr.info
[23]:
title: random array
description: array containing random values between 0 and 100
4 x 2 x 3
age [4]: '0-9' '10-17' '18-66' '67+'
gender [2]: 'female' 'male'
time [3]: 2015 2016 2017
dtype: int64
memory used: 192 bytes
Get axes
[24]:
arr.axes
[24]:
AxisCollection([
Axis(['0-9', '10-17', '18-66', '67+'], 'age'),
Axis(['female', 'male'], 'gender'),
Axis([2015, 2016, 2017], 'time')
])
Get number of dimensions
[25]:
arr.ndim
[25]:
3
Get length of each dimension
[26]:
arr.shape
[26]:
(4, 2, 3)
Get total number of elements of the array
[27]:
arr.size
[27]:
24
Get type of internal data (int, float, …)
[28]:
arr.dtype
[28]:
dtype('int64')
Get size in memory
[29]:
arr.memory_used
[29]:
'192 bytes'
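The reported size matches the number of elements multiplied by the size of one int64 element. The arithmetic below uses only the standard library. It assumes that memory_used reports the size of the data buffer alone, without any object overhead.

```python
import struct
from math import prod  # Python 3.8+

shape = (4, 2, 3)                 # same as arr.shape
size = prod(shape)                # same as arr.size: 24 elements
itemsize = struct.calcsize('=q')  # bytes per signed 64-bit integer (int64): 8
print(size, itemsize, size * itemsize)  # 24 8 192
```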
Display the array in the viewer (graphical user interface) in read-only mode. This will open a new window and block execution of the rest of the code until the window is closed! Requires PyQt to be installed.
view(arr)
Or load it in Excel:
arr.to_excel()
Extract an axis from an array¶
It is possible to extract an axis belonging to an array using its name:
[30]:
# extract the 'time' axis belonging to the 'arr' array
time = arr.time
time
[30]:
Axis([2015, 2016, 2017], 'time')
Session¶
A Session object is a dictionary-like object used to gather several arrays, axes and groups. A session is particularly well suited for gathering all input objects of a model or the output arrays from different scenarios. As with arrays, it is possible to associate metadata with sessions.
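To illustrate what "dictionary-like" means here, the hypothetical SessionSketch class below is a plain dict whose items can also be read and set as attributes. This mirrors the demography_session.population = ... style used in the next cells. It is only a sketch of the access pattern under that assumption and does not reflect how LArray's Session is implemented.

```python
class SessionSketch(dict):
    """Hypothetical stand-in: a dict whose items can also be accessed as
    attributes (session.population), like the access pattern shown below."""

    def __getattr__(self, key):
        # only called when normal attribute lookup fails (e.g. not a dict method)
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # attribute-style assignment stores the value as a dict item
        self[key] = value


s = SessionSketch(gender=['Male', 'Female'])
s.time = [2013, 2014, 2015, 2016, 2017]   # adds a 'time' item
print(list(s.keys()))                      # ['gender', 'time']
print(s['time'] is s.time)                 # True
```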
Creating Sessions¶
To create a session, you can first create an empty session and then populate it with arrays, axes and groups:
[31]:
gender = Axis("gender=Male,Female")
time = Axis("time=2013..2017")
# create an empty session
demography_session = Session()
# add axes to the session
demography_session.gender = gender
demography_session.time = time
# add arrays to the session
demography_session.population = zeros((gender, time))
demography_session.births = zeros((gender, time))
demography_session.deaths = zeros((gender, time))
# add metadata after creation
demography_session.meta.title = 'Demographic Model of Belgium'
demography_session.meta.description = 'Models the demography of Belgium'
# print content of the session
print(demography_session.summary())
Metadata:
title: Demographic Model of Belgium
description: Models the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015 2016 2017] (5)
population: gender, time (2 x 5) [float64]
births: gender, time (2 x 5) [float64]
deaths: gender, time (2 x 5) [float64]
or you can create and populate a session in one step:
[32]:
gender = Axis("gender=Male,Female")
time = Axis("time=2013..2017")
demography_session = Session(gender=gender, time=time, population=zeros((gender, time)),
births=zeros((gender, time)), deaths=zeros((gender, time)),
meta=Metadata(title='Demographic Model of Belgium', description='Models the demography of Belgium'))
# print content of the session
print(demography_session.summary())
Metadata:
title: Demographic Model of Belgium
description: Models the demography of Belgium
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015 2016 2017] (5)
population: gender, time (2 x 5) [float64]
births: gender, time (2 x 5) [float64]
deaths: gender, time (2 x 5) [float64]
Warning:
- Unlike array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5).
- Metadata is not kept when actions or methods are applied to a session, except for operations modifying a session in-place, such as `s.arr1 = 0`. Do not add metadata to a session if you know you will apply actions or methods to it before dumping it.
More on Session objects¶
To learn how to save and load sessions in CSV, Excel or HDF format, please refer to the Loading and Dumping Sessions section of the tutorial.
To see how to work with sessions, please read the Working With Sessions section of the tutorial.
Finally, see the Session section of the API Reference to explore all methods of Session objects.