Working With Sessions
Import the LArray library:
[2]:
from larray import *
Check the version of LArray:
[3]:
from larray import __version__
__version__
[3]:
'0.31'
Before Continuing
If you are not yet comfortable with creating, saving and loading sessions, please first read the Creating Sessions and Loading and Dumping Sessions sections of the tutorial.
Exploring Content
To get the list of item names of a session, use the names shortcut (be careful: the list is sorted alphabetically and does not follow the internal order!):
[4]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('population_session.h5')
s_pop = Session(filepath_hdf)
# print the names of the items of the session
print(s_pop.names)
['births', 'country', 'deaths', 'even_years', 'gender', 'odd_years', 'pop', 'time']
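To illustrate the difference between the internal order of a session and the alphabetical order returned by names, here is a toy model that uses a plain Python dict in place of a session (this is not LArray code). The internal order is assumed to match the order printed by summary in the next cell.

```python
# Toy stand-in for a session (not LArray code): a dict keeps insertion order.
# The order below is assumed to mirror the order printed by s_pop.summary().
items = {
    'country': None, 'gender': None, 'time': None,
    'even_years': None, 'odd_years': None,
    'births': None, 'deaths': None, 'pop': None,
}

internal_order = list(items)   # order in which items were added
names = sorted(items)          # alphabetical, like the names shortcut

print(internal_order)
print(names)  # ['births', 'country', 'deaths', 'even_years', 'gender', 'odd_years', 'pop', 'time']
```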
To get more information about the items of a session, use the summary method. It provides not only the names of the items but also the list of labels of axes and groups, and the list of axes, the shape and the dtype of arrays:
[5]:
# print the content of the session
print(s_pop.summary())
country: country ['Belgium' 'France' 'Germany'] (3)
gender: gender ['Male' 'Female'] (2)
time: time [2013 2014 2015] (3)
even_years: time['2014'] >> even_years (1)
odd_years: time[2013 2015] >> odd_years (2)
births: country, gender, time (3 x 2 x 3) [int32]
deaths: country, gender, time (3 x 2 x 3) [int32]
pop: country, gender, time (3 x 2 x 3) [int32]
Selecting And Filtering Items
To select an item, simply use the syntax <session_var>.<item_name>:
[6]:
s_pop.pop
[6]:
country gender\time 2013 2014 2015
Belgium Male 5472856 5493792 5524068
Belgium Female 5665118 5687048 5713206
France Male 31772665 31936596 32175328
France Female 33827685 34005671 34280951
Germany Male 39380976 39556923 39835457
Germany Female 41142770 41210540 41362080
To return a new session with selected items, use the syntax <session_var>[list, of, item, names]:
[7]:
s_pop_new = s_pop['pop', 'births', 'deaths']
s_pop_new.names
[7]:
['births', 'deaths', 'pop']
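Both selection syntaxes can be mimicked with a tiny class. ToySession below is a hypothetical stand-in written for illustration only; it is not LArray's Session class, and the item values are made up.

```python
class ToySession:
    """Hypothetical, minimal stand-in for a session (not LArray's code)."""

    def __init__(self, items):
        # a regular dict keeps the internal (insertion) order of the items
        self._items = dict(items)

    def __getattr__(self, name):
        # only called when normal lookup fails: s.pop -> item named 'pop'
        try:
            return self._items[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key):
        # s['pop'] -> one item; s['pop', 'births'] -> new session with these items
        if isinstance(key, str):
            return self._items[key]
        return ToySession((k, self._items[k]) for k in key)

    @property
    def names(self):
        # alphabetical order, like the names shortcut
        return sorted(self._items)


s = ToySession([('pop', [1, 2]), ('births', [3, 4]), ('deaths', [5, 6]),
                ('gender', ['Male', 'Female'])])
print(s.pop)                               # [1, 2]
print(s['pop', 'births', 'deaths'].names)  # ['births', 'deaths', 'pop']
```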
The filter method allows you to select all items of the same kind (i.e. all axes, all groups or all arrays) or all items with names satisfying a given pattern:
[8]:
# select only arrays of a session
s_pop.filter(kind=LArray)
[8]:
Session(births, deaths, pop)
[9]:
# select all items with a name starting with a letter between a and k
s_pop.filter(pattern='[a-k]*')
[9]:
Session(country, gender, even_years, births, deaths)
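The filtering logic can be sketched on a plain dict. This assumes shell-style patterns as implemented by Python's fnmatch module; filter_items is a hypothetical helper, and LArray's own matching rules may differ. Axes and groups are modeled as tuples and arrays as lists, with made-up values, and items are listed in the internal order assumed from summary.

```python
from fnmatch import fnmatchcase


def filter_items(items, kind=None, pattern=None):
    """Hypothetical helper: keep items matching an optional type and name pattern."""
    return {
        name: value
        for name, value in items.items()
        if (kind is None or isinstance(value, kind))
        and (pattern is None or fnmatchcase(name, pattern))
    }


# toy values: axes/groups as tuples, arrays as lists
items = {
    'country': ('Belgium', 'France', 'Germany'),
    'gender': ('Male', 'Female'),
    'time': (2013, 2014, 2015),
    'even_years': (2014,),
    'odd_years': (2013, 2015),
    'births': [1, 2],
    'deaths': [3, 4],
    'pop': [5, 6],
}

print(list(filter_items(items, kind=list)))         # ['births', 'deaths', 'pop']
print(list(filter_items(items, pattern='[a-k]*')))  # ['country', 'gender', 'even_years', 'births', 'deaths']
```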
Arithmetic Operations On Sessions
Session objects accept binary operations with a scalar:
[10]:
# get population, births and deaths in millions
s_pop_div = s_pop / 1e6
s_pop_div.pop
[10]:
country gender\time 2013 2014 2015
Belgium Male 5.472856 5.493792 5.524068
Belgium Female 5.665118 5.687048 5.713206
France Male 31.772665 31.936596 32.175328
France Female 33.827685 34.005671 34.280951
Germany Male 39.380976 39.556923 39.835457
Germany Female 41.14277 41.21054 41.36208
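Conceptually, a binary operation with a scalar is applied to every value of every array. The toy model below (plain dicts of lists, not LArray's implementation) reproduces the first row of the pop array shown above.

```python
import operator


def session_scalar_op(items, scalar, op):
    """Toy model: apply op(value, scalar) to every value of every array."""
    return {name: [op(v, scalar) for v in values] for name, values in items.items()}


pop = {'pop': [5472856, 5493792, 5524068]}  # Belgium/Male row of the pop array
print(session_scalar_op(pop, 1e6, operator.truediv))
# {'pop': [5.472856, 5.493792, 5.524068]}
```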
with an array (please read the documentation of the random.choice function first if you are not familiar with it):
[11]:
from larray import random
random_multiplicator = random.choice([0.98, 1.0, 1.02], p=[0.15, 0.7, 0.15], axes=s_pop.pop.axes)
random_multiplicator
[11]:
country gender\time 2013 2014 2015
Belgium Male 1.0 1.0 1.0
Belgium Female 1.0 1.0 1.0
France Male 0.98 0.98 1.0
France Female 1.0 1.0 1.0
Germany Male 0.98 1.0 1.0
Germany Female 1.0 1.0 1.02
[12]:
# multiply all variables of a session by a common array
s_pop_rand = s_pop * random_multiplicator
s_pop_rand.pop
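Multiplying a session by an array combines every array of the session element-wise with that common array. The toy model below flattens arrays to lists and assumes they all share the labels (hence the length and order) of the multiplier; the values and the helper session_array_op are made up for illustration and are not LArray's API.

```python
import operator


def session_array_op(items, other, op):
    """Toy model: combine each array element-wise with one common array.

    Arrays are flattened to lists and assumed to share the labels of `other`.
    """
    result = {}
    for name, values in items.items():
        if len(values) != len(other):
            raise ValueError(f"shape mismatch for item {name!r}")
        result[name] = [op(v, w) for v, w in zip(values, other)]
    return result


items = {'births': [100, 200, 300], 'deaths': [50, 50, 50]}  # made-up values
multiplier = [1.0, 0.5, 2.0]
print(session_array_op(items, multiplier, operator.mul))
# {'births': [100.0, 100.0, 600.0], 'deaths': [50.0, 25.0, 100.0]}
```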
with another session:
[13]:
# compute the difference between each array of the two sessions
s_diff = s_pop - s_pop_rand
s_diff.births
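An operation between two sessions pairs their arrays by name. The sketch below is a toy model on dicts of lists, not LArray's implementation; in particular, the rule that an item present in only one session yields an array of NaN is an assumption of this sketch.

```python
import math
import operator


def session_session_op(left, right, op):
    """Toy model: pair arrays of two sessions by name and combine them element-wise.

    Assumption: an item present in only one session yields an array of NaN.
    """
    result = {}
    for name in {**left, **right}:  # union of names, left-hand order first
        a, b = left.get(name), right.get(name)
        if a is None or b is None:
            size = len(a if a is not None else b)
            result[name] = [math.nan] * size
        else:
            result[name] = [op(x, y) for x, y in zip(a, b)]
    return result


s1 = {'births': [10, 20], 'pop': [100, 200]}  # made-up values
s2 = {'births': [9, 20], 'pop': [100, 150], 'deaths': [1, 2]}
diff = session_session_op(s1, s2, operator.sub)
print(diff)  # {'births': [1, 0], 'pop': [0, 50], 'deaths': [nan, nan]}
```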
Applying Functions On All Arrays
In addition to the classical arithmetic operations, the apply method can be used to apply the same function to all arrays. This function should take a single argument (the array) and return a single value:
[14]:
# force conversion to type int
def as_type_int(array):
return array.astype(int)
s_pop_rand_int = s_pop_rand.apply(as_type_int)
print('pop array before calling apply:')
print(s_pop_rand.pop)
print()
print('pop array after calling apply:')
print(s_pop_rand_int.pop)
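The idea behind apply can be sketched as a dict comprehension that calls the function on each array. This toy model (not LArray's implementation) assumes that, by default, only arrays are transformed while axes and groups are left unchanged; arrays are modeled as lists and axes as tuples, with made-up values.

```python
def apply_to_arrays(items, func):
    """Toy model of apply: call func on each list-based array, keep other items."""
    return {
        name: func(value) if isinstance(value, list) else value
        for name, value in items.items()
    }


def as_type_int(array):
    # convert every value of a (list-based) array to int, truncating toward zero
    return [int(v) for v in array]


items = {'time': (2013, 2014, 2015), 'pop': [5.47, 31.77, 39.38]}
print(apply_to_arrays(items, as_type_int))
# {'time': (2013, 2014, 2015), 'pop': [5, 31, 39]}
```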
It is also possible to pass a function taking additional arguments, which are given as extra arguments to apply:
[15]:
# passing the LArray.astype method directly with argument
# dtype defined as int
s_pop_rand_int = s_pop_rand.apply(LArray.astype, dtype=int)
print('pop array before calling apply:')
print(s_pop_rand.pop)
print()
print('pop array after calling apply:')
print(s_pop_rand_int.pop)
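Passing extra arguments amounts to forwarding them to the function for each array. In the toy model below, astype is a hypothetical list-based analogue of LArray.astype, and apply_to_arrays is not LArray's API; only list-based arrays are transformed.

```python
def apply_to_arrays(items, func, *args, **kwargs):
    """Toy model: call func(array, *args, **kwargs) on each list-based array."""
    return {
        name: func(value, *args, **kwargs) if isinstance(value, list) else value
        for name, value in items.items()
    }


def astype(array, dtype):
    # hypothetical list-based analogue of LArray.astype
    return [dtype(v) for v in array]


items = {'gender': ('Male', 'Female'), 'births': [12.9, 7.1]}
print(apply_to_arrays(items, astype, dtype=int))
# {'gender': ('Male', 'Female'), 'births': [12, 7]}
```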
It is also possible to apply a function on non-LArray objects of a session. Please refer to the documentation of the apply method.
Comparing Sessions
Being able to compare two sessions may be useful when you want to compare two different models expected to give the same results, or when you have updated your model and want to see the consequences of the recent changes.
Session objects provide two methods to compare two sessions: equals and element_equals.
The equals method returns True if all items from both sessions are identical, False otherwise:
[16]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('population_session.h5')
s_pop = Session(filepath_hdf)
# create a copy of the original session
s_pop_copy = Session(filepath_hdf)
# 'equals' returns True if the two sessions contain exactly the same items
s_pop.equals(s_pop_copy)
[16]:
True
[17]:
# create a copy of the original session but with the array
# 'births' slightly modified for some label combinations
s_pop_alternative = Session(filepath_hdf)
s_pop_alternative.births *= random_multiplicator
# 'equals' returns False if at least one item differs between the two sessions in values or axes
s_pop.equals(s_pop_alternative)
[17]:
False
[18]:
# add an array to the session
s_pop_new_output = Session(filepath_hdf)
s_pop_new_output.gender_ratio = s_pop_new_output.pop.ratio('gender')
# 'equals' returns False if at least one item is not present in both sessions
s_pop.equals(s_pop_new_output)
[18]:
False
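The three cases above (identical sessions, a modified array, an extra item) can be reproduced with a toy model on dicts of lists. This is not LArray's implementation; in particular it compares values only and does not model axes.

```python
def sessions_equal(left, right):
    """Toy model of equals: same item names and identical values for each item."""
    return left.keys() == right.keys() and all(left[k] == right[k] for k in left)


base = {'pop': [1, 2], 'births': [3, 4]}  # made-up values
print(sessions_equal(base, {'pop': [1, 2], 'births': [3, 4]}))                  # True
print(sessions_equal(base, {'pop': [1, 2], 'births': [3, 5]}))                  # False: values differ
print(sessions_equal(base, {'pop': [1, 2], 'births': [3, 4], 'ratio': [0.5]}))  # False: extra item
```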
The element_equals method compares the items of two sessions one by one and returns an array of boolean values:
[19]:
# 'element_equals' compares items one by one
s_pop.element_equals(s_pop_copy)
[19]:
name country gender time even_years odd_years births deaths pop
True True True True True True True True
[20]:
# array 'births' is different between the two sessions
s_pop.element_equals(s_pop_alternative)
[20]:
name country gender time even_years odd_years births deaths pop
True True True True True False True True
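Unlike equals, element_equals reports one result per item. The toy model below returns a dict of booleans instead of an LArray array; reporting an item missing from one session as False is an assumption of this sketch, and the values are made up.

```python
def element_equals(left, right):
    """Toy model: compare items one by one over the union of names.

    Assumption: an item missing from one session is reported as False.
    """
    names = list({**left, **right})
    return {
        name: name in left and name in right and left[name] == right[name]
        for name in names
    }


base = {'country': ('BE', 'FR'), 'births': [3, 4], 'pop': [1, 2]}
alternative = {'country': ('BE', 'FR'), 'births': [3, 5], 'pop': [1, 2]}
print(element_equals(base, alternative))
# {'country': True, 'births': False, 'pop': True}
```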
The == operator returns a new session containing boolean arrays, with the elements of each array compared element-wise:
[21]:
s_same_values = s_pop == s_pop_alternative
s_same_values.births
[21]:
country gender\time 2013 2014 2015
Belgium Male True True True
Belgium Female True True True
France Male False False True
France Female True True True
Germany Male False True True
Germany Female True True False
This also works for axes and groups:
[22]:
s_same_values.country
[22]:
country Belgium France Germany
True True True
The != operator does the opposite of the == operator:
[23]:
s_different_values = s_pop != s_pop_alternative
s_different_values.births
[23]:
country gender\time 2013 2014 2015
Belgium Male False False False
Belgium Female False False False
France Male True True False
France Female False False False
Germany Male True False False
Germany Female False False True
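Both operators compare values element-wise within each item. The toy model below (plain dicts of lists, not LArray's implementation) assumes both sessions hold the same item names and shapes, and checks that != gives the negation of ==; the values are made up.

```python
import operator


def compare_sessions(left, right, op):
    """Toy model of == / != between sessions: element-wise comparison, item by item.

    Assumes both sessions hold the same item names and array lengths.
    """
    return {name: [op(x, y) for x, y in zip(left[name], right[name])] for name in left}


base = {'births': [10, 20, 30]}
alternative = {'births': [10, 19, 31]}
same = compare_sessions(base, alternative, operator.eq)
different = compare_sessions(base, alternative, operator.ne)
print(same)       # {'births': [True, False, False]}
print(different)  # {'births': [False, True, True]}
```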
A more visual way is to use the compare function, which opens the Editor:
compare(s_pop, s_pop_alternative, names=['baseline', 'lower_birth_rate'])