Getting Started
This Getting Started section gives a quick overview of the main objects and features of the LArray library. For a more detailed presentation of all the capabilities of LArray, read the next sections of the tutorial. The API Reference section of the documentation gives you the list of all objects, methods and functions, with their individual documentation and examples.
To use the LArray library, the first thing to do is to import it:
In [1]: from larray import *
Create an array
Working with the LArray library mainly consists of manipulating LArray data structures. They represent N-dimensional labelled arrays and are composed of raw data (NumPy ndarray), axes and optionally some metadata.
An axis represents a dimension of an array. It contains a list of labels and has a name:
# define some axes to be used later
In [2]: age = Axis(['0-9', '10-17', '18-66', '67+'], 'age')
In [3]: sex = Axis(['F', 'M'], 'sex')
In [4]: year = Axis([2015, 2016, 2017], 'year')
Labels let you select subsets and manipulate the data without working directly with the positions of array elements.
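To make the idea concrete, here is a minimal sketch in plain Python and NumPy of what label-based access saves you from doing by hand. It is not how LArray is implemented: the helper `position_of`, the variable names and the numbers are hypothetical and used only for illustration.

```python
import numpy as np

# Labels of a hypothetical 'age' axis, in the same order as the data.
age_labels = ['0-9', '10-17', '18-66', '67+']

# Raw, unlabelled data: one value per age group (illustrative numbers only).
values = np.array([10, 20, 30, 40])


def position_of(labels, label):
    """Return the position of `label` within `labels` (hypothetical helper)."""
    return labels.index(label)


# Without labels, you must know that '18-66' is stored at position 2.
by_position = values[2]

# With labels, the position is looked up for you.
by_label = values[position_of(age_labels, '18-66')]
```

Both expressions read the same element; the second one keeps working if the order of the age groups changes, as long as the labels and the data are reordered together.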
To create an array from scratch, you need to supply data and axes:
# define some data. This is the Belgian population (in thousands). Source: Eurostat.
In [5]: data = [[[633, 635, 634],
...: [663, 665, 664]],
...: [[484, 486, 491],
...: [505, 511, 516]],
...: [[3572, 3581, 3583],
...: [3600, 3618, 3616]],
...: [[1023, 1038, 1053],
...: [756, 775, 793]]]
...:
# create an LArray object
In [6]: pop = LArray(data, axes=[age, sex, year])
In [7]: pop
Out[7]:
age sex\year 2015 2016 2017
0-9 F 633 635 634
0-9 M 663 665 664
10-17 F 484 486 491
10-17 M 505 511 516
18-66 F 3572 3581 3583
18-66 M 3600 3618 3616
67+ F 1023 1038 1053
67+ M 756 775 793
You can optionally attach some metadata to an array:
# attach some metadata to the pop array
In [8]: pop.meta.title = 'population by age, sex and year'
In [9]: pop.meta.source = 'Eurostat'
# display metadata
In [10]: pop.meta
Out[10]:
title: population by age, sex and year
source: Eurostat
To get a short summary of an array, type:
# Array summary: metadata + dimensions + description of axes
In [11]: pop.info
Out[11]:
title: population by age, sex and year
source: Eurostat
4 x 2 x 3
age [4]: '0-9' '10-17' '18-66' '67+'
sex [2]: 'F' 'M'
year [3]: 2015 2016 2017
dtype: int64
memory used: 192 bytes
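The `memory used` figure follows from the shape and the dtype: 4 x 2 x 3 = 24 elements of 8 bytes each (int64). A quick check with plain NumPy, assuming the raw data is held in an ordinary int64 ndarray of that shape:

```python
import numpy as np

# Same shape and dtype as the pop array summarised above.
raw = np.zeros((4, 2, 3), dtype=np.int64)

n_elements = raw.size              # number of cells: 4 * 2 * 3
bytes_per_element = raw.itemsize   # size of one int64 value
total_bytes = raw.nbytes           # n_elements * bytes_per_element
```

The figure covers the data only; the labels stored in the axes are not part of this product.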
Create an array filled with predefined values
Arrays filled with predefined values can be generated through dedicated functions:
zeros(): creates an array filled with 0
In [12]: zeros([age, sex])
Out[12]:
age\sex F M
0-9 0.0 0.0
10-17 0.0 0.0
18-66 0.0 0.0
67+ 0.0 0.0
ones(): creates an array filled with 1
In [13]: ones([age, sex])
Out[13]:
age\sex F M
0-9 1.0 1.0
10-17 1.0 1.0
18-66 1.0 1.0
67+ 1.0 1.0
full(): creates an array filled with a given value
In [14]: full([age, sex], fill_value=10.0)
Out[14]:
age\sex F M
0-9 10.0 10.0
10-17 10.0 10.0
18-66 10.0 10.0
67+ 10.0 10.0
sequence(): creates an array by sequentially applying modifications to the array along an axis
In [15]: sequence(age)
Out[15]:
age 0-9 10-17 18-66 67+
0 1 2 3
ndtest(): creates a test array with increasing numbers as data
In [16]: ndtest([age, sex])
Out[16]:
age\sex F M
0-9 0 1
10-17 2 3
18-66 4 5
67+ 6 7
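For comparison, the unlabelled values shown for `ndtest([age, sex])` are what plain NumPy gives for a range reshaped to the axis lengths. This sketch only reproduces the values, not the labels, and is not claimed to be how `ndtest()` is implemented:

```python
import numpy as np

n_age, n_sex = 4, 2  # lengths of the age and sex axes defined earlier

# Increasing integers laid out row by row (the last axis varies fastest).
raw = np.arange(n_age * n_sex).reshape(n_age, n_sex)
```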
Save/Load an array
The LArray library offers many I/O functions to read and write arrays in various formats (CSV, Excel, HDF5). For example, to save an array in a CSV file, call the method to_csv():
# save our pop array to a CSV file
In [17]: pop.to_csv('belgium_pop.csv')
The content of the CSV file is then:
age,sex\year,2015,2016,2017
0-9,F,633,635,634
0-9,M,663,665,664
10-17,F,484,486,491
10-17,M,505,511,516
18-66,F,3572,3581,3583
18-66,M,3600,3618,3616
67+,F,1023,1038,1053
67+,M,756,775,793
Note
In CSV or Excel files, the last dimension is horizontal and the names of the last two dimensions are separated by a backslash (\).
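As an illustration of this layout, the sketch below parses a header row of that form with Python's standard `csv` module and recovers the axis names and the labels of the last axis. The function name `parse_header` is hypothetical, and this is not the parsing code used by LArray itself.

```python
import csv
import io


def parse_header(header_line):
    """Split a header row into axis names and last-axis labels.

    Hypothetical helper: assumes the names of the last two axes share one
    cell, separated by a backslash, and that the cells after it are the
    labels of the last (horizontal) axis.
    """
    cells = next(csv.reader(io.StringIO(header_line)))
    # Find the cell of the form "second_to_last\last".
    combined_idx = next(i for i, cell in enumerate(cells) if '\\' in cell)
    second_to_last, last = cells[combined_idx].split('\\')
    axis_names = cells[:combined_idx] + [second_to_last, last]
    # Labels come back as strings; converting them is left to the caller.
    last_axis_labels = cells[combined_idx + 1:]
    return axis_names, last_axis_labels


names, labels = parse_header('age,sex\\year,2015,2016,2017')
```

Here `names` holds the three axis names in order and `labels` holds the column labels of the horizontal axis.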
To load a saved array, call the function read_csv():
In [18]: pop = read_csv('belgium_pop.csv')
In [19]: pop
Out[19]:
age sex\year 2015 2016 2017
0-9 F 633 635 634
0-9 M 663 665 664
10-17 F 484 486 491
10-17 M 505 511 516
18-66 F 3572 3581 3583
18-66 M 3600 3618 3616
67+ F 1023 1038 1053
67+ M 756 775 793
Other input/output functions are described in the Input/Output section of the API documentation.
Selecting a subset
To select an element or a subset of an array, use brackets [ ]. In Python, this operation is usually called indexing.
Let us start by selecting a single element:
In [20]: pop['67+', 'F', 2017]
Out[20]: 1053
Labels can be given in arbitrary order:
In [21]: pop[2017, 'F', '67+']
Out[21]: 1053
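One way to see why the order does not matter: when every label belongs to exactly one axis, each label identifies its axis on its own. The sketch below illustrates that idea with plain Python and NumPy, reusing the axes and data of the pop array. The function `locate` is hypothetical and this is not LArray's actual lookup code.

```python
import numpy as np

# Axes and data copied from the pop array above (thousands of people).
axes = [
    ('age', ['0-9', '10-17', '18-66', '67+']),
    ('sex', ['F', 'M']),
    ('year', [2015, 2016, 2017]),
]
data = np.array([[[633, 635, 634], [663, 665, 664]],
                 [[484, 486, 491], [505, 511, 516]],
                 [[3572, 3581, 3583], [3600, 3618, 3616]],
                 [[1023, 1038, 1053], [756, 775, 793]]])


def locate(labels):
    """Map one label per axis, given in any order, to positions (hypothetical)."""
    positions = [None] * len(axes)
    for label in labels:
        # Collect the axes that contain this label; it must match exactly one.
        matches = [i for i, (_, axis_labels) in enumerate(axes)
                   if label in axis_labels]
        if len(matches) != 1:
            raise ValueError(f"label {label!r} matches {len(matches)} axes")
        axis_idx = matches[0]
        positions[axis_idx] = axes[axis_idx][1].index(label)
    return tuple(positions)


# The same three labels in two different orders give the same positions.
a = data[locate(['67+', 'F', 2017])]
b = data[locate([2017, 'F', '67+'])]
```

A label that appeared on several axes would be ambiguous under this scheme, which is why the sketch raises an error in that case.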
When selecting a larger subset, the result is an array:
In [22]: pop[2017]
Out[22]:
age\sex F M
0-9 634 664
10-17 491 516
18-66 3583 3616
67+ 1053 793
In [23]: pop['M']