Working With Sessions¶
Import the LArray library:
[1]:
from larray import *
Before Continuing¶
If you are not yet comfortable with creating, saving and loading sessions, please first read the Creating Sessions and Loading and Dumping Sessions sections of the tutorial before going further.
Exploring Content¶
To get the list of item names of a session, use the names shortcut (be careful: the list is sorted alphabetically and does not follow the internal order!):
[2]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('demography_eurostat.h5')
s_pop = Session(filepath_hdf)
# print the content of the session
print(s_pop.names)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-6d556d84191c> in <module>
1 # load a session representing the results of a demographic model
2 filepath_hdf = get_example_filepath('demography_eurostat.h5')
----> 3 s_pop = Session(filepath_hdf)
4
5 # print the content of the session
~/checkouts/readthedocs.org/user_builds/larray/conda/0.32/lib/python3.6/site-packages/larray-0.32-py3.6.egg/larray/core/session.py in __init__(self, *args, **kwargs)
94 if isinstance(a0, str):
95 # assume a0 is a filename
---> 96 self.load(a0)
97 else:
98 # iterable of tuple or dict-like
~/checkouts/readthedocs.org/user_builds/larray/conda/0.32/lib/python3.6/site-packages/larray-0.32-py3.6.egg/larray/core/session.py in load(self, fname, names, engine, display, **kwargs)
426 else:
427 handler = handler_cls(fname)
--> 428 metadata, objects = handler.read(names, display=display, **kwargs)
429 for k, v in objects.items():
430 self[k] = v
~/checkouts/readthedocs.org/user_builds/larray/conda/0.32/lib/python3.6/site-packages/larray-0.32-py3.6.egg/larray/inout/common.py in read(self, keys, *args, **kwargs)
128 print("loading", type, "object", key, "...", end=' ')
129 try:
--> 130 res[key] = self._read_item(key, type, *args, **kwargs)
131 except Exception:
132 if not ignore_exceptions:
~/checkouts/readthedocs.org/user_builds/larray/conda/0.32/lib/python3.6/site-packages/larray-0.32-py3.6.egg/larray/inout/hdf.py in _read_item(self, key, type, *args, **kwargs)
137 else:
138 raise TypeError()
--> 139 return read_hdf(self.handle, hdf_key, *args, **kwargs)
140
141 def _dump_item(self, key, value, *args, **kwargs):
~/checkouts/readthedocs.org/user_builds/larray/conda/0.32/lib/python3.6/site-packages/larray-0.32-py3.6.egg/larray/inout/hdf.py in read_hdf(filepath_or_buffer, key, fill_value, na, sort_rows, sort_columns, name, **kwargs)
81 cartesian_prod = writer != 'LArray'
82 res = df_asarray(pd_obj, sort_rows=sort_rows, sort_columns=sort_columns, fill_value=fill_value,
---> 83 parse_header=False, cartesian_prod=cartesian_prod)
84 if _meta is not None:
85 res.meta = _meta
~/checkouts/readthedocs.org/user_builds/larray/conda/0.32/lib/python3.6/site-packages/larray-0.32-py3.6.egg/larray/inout/pandas.py in df_asarray(df, sort_rows, sort_columns, raw, parse_header, wide, cartesian_prod, **kwargs)
338 unfold_last_axis_name = isinstance(axes_names[-1], basestring) and '\\' in axes_names[-1]
339 res = from_frame(df, sort_rows=sort_rows, sort_columns=sort_columns, parse_header=parse_header,
--> 340 unfold_last_axis_name=unfold_last_axis_name, cartesian_prod=cartesian_prod, **kwargs)
341
342 # ugly hack to avoid anonymous axes converted as axes with name 'Unnamed: x' by pandas
~/checkouts/readthedocs.org/user_builds/larray/conda/0.32/lib/python3.6/site-packages/larray-0.32-py3.6.egg/larray/inout/pandas.py in from_frame(df, sort_rows, sort_columns, parse_header, unfold_last_axis_name, fill_value, meta, cartesian_prod, **kwargs)
241 raise ValueError('sort_rows and sort_columns cannot not be used when cartesian_prod is set to False. '
242 'Please call the method sort_axes on the returned array to sort rows or columns')
--> 243 axes_labels = index_to_labels(df.index, sort=False)
244
245 # Pandas treats column labels as column names (strings) so we need to convert them to values
~/checkouts/readthedocs.org/user_builds/larray/conda/0.32/lib/python3.6/site-packages/larray-0.32-py3.6.egg/larray/inout/pandas.py in index_to_labels(idx, sort)
41 Returns unique labels for each dimension.
42 """
---> 43 if isinstance(idx, pd.core.index.MultiIndex):
44 if sort:
45 return list(idx.levels)
AttributeError: module 'pandas.core' has no attribute 'index'
To get more information about the items of a session, use the summary method. It provides not only the names of the items but also the list of labels for axes and groups, and the list of axes, the shape and the dtype for arrays:
[3]:
# print the content of the session
print(s_pop.summary())
Selecting And Filtering Items¶
Session objects work like ordinary Python dict objects. To select an item, use the usual syntax <session_var>['<item_name>']:
[4]:
s_pop['pop']
A simpler way is to use the syntax <session_var>.<item_name>:
[5]:
s_pop.pop
Warning: The syntax <session_var>.<item_name> will work as long as you don’t use any special character like , ; : in the item’s name.
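The restriction comes from Python's own rules for attribute names. As a rough plain-Python illustration (the `MiniSession` class below is hypothetical and is not LArray's implementation), only names that are valid identifiers can be reached with the dot syntax, while the bracket syntax accepts any string:

```python
class MiniSession:
    """Hypothetical stand-in for a session: a dict with attribute access."""

    def __init__(self, **items):
        self._items = dict(items)

    def __getitem__(self, name):
        return self._items[name]

    def __setitem__(self, name, value):
        self._items[name] = value

    def __getattr__(self, name):
        # only called when the normal attribute lookup fails
        if name == '_items':
            raise AttributeError(name)
        try:
            return self._items[name]
        except KeyError:
            raise AttributeError(name)


s = MiniSession(births=10)
s['deaths:2020'] = 7  # an item name containing a special character

dot_value = s.births            # valid identifier: the dot syntax works
key_value = s['deaths:2020']    # the bracket syntax works for any name

# 'deaths:2020' is not a valid identifier, so s.deaths:2020 cannot be written
dot_syntax_possible = 'deaths:2020'.isidentifier()
```

When in doubt about a name, the bracket syntax is the safe choice.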
To return a new session with selected items, use the syntax <session_var>[list, of, item, names]:
[6]:
s_pop_new = s_pop['pop', 'births', 'deaths']
s_pop_new.names
The filter method allows you to select all items of the same kind (i.e. all axes, or groups or arrays) or all items with names satisfying a given pattern:
[7]:
# select only arrays of a session
s_pop.filter(kind=Array)
[8]:
# select all items with a name starting with a letter between a and k
s_pop.filter(pattern='[a-k]*')
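To get a feeling for what a pattern such as '[a-k]*' selects, here is a small sketch using the standard fnmatch module. It assumes the pattern is interpreted as a shell-style wildcard (an assumption about the matching rules, not something this page states), and the list of names is purely illustrative:

```python
from fnmatch import fnmatchcase

# illustrative item names, not necessarily those of the example session
names = ['births', 'country', 'deaths', 'gender', 'pop', 'time']

# '[a-k]' matches one character between a and k, '*' matches anything after it
pattern = '[a-k]*'

selected = [name for name in names if fnmatchcase(name, pattern)]
print(selected)  # ['births', 'country', 'deaths', 'gender']
```

Names starting with p or t fall outside the a to k range and are therefore dropped.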
Iterating over Items¶
Like the built-in Python dict objects, Session objects provide methods to iterate over items:
[9]:
# iterate over item names
for key in s_pop.keys():
    print(key)
[10]:
# iterate over items
for value in s_pop.values():
    if isinstance(value, Array):
        print(value.info)
    else:
        print(repr(value))
    print()
[11]:
# iterate over names and items
for key, value in s_pop.items():
    if isinstance(value, Array):
        print(key, ':')
        print(value.info)
    else:
        print(key, ':', repr(value))
    print()
Arithmetic Operations On Sessions¶
Session objects accept binary operations with a scalar:
[12]:
# get population, births and deaths in millions
s_pop_div = s_pop / 1e6
s_pop_div.pop
with an array (please read the documentation of the random.choice function first if you don’t know it):
[13]:
from larray import random
random_increment = random.choice([-1, 0, 1], p=[0.3, 0.4, 0.3], axes=s_pop.pop.axes) * 1000
random_increment
[14]:
# add a common array to some variables of a session
s_pop_rand = s_pop['pop', 'births', 'deaths'] + random_increment
s_pop_rand.pop
with another session:
[15]:
# compute the difference between each array of the two sessions
s_diff = s_pop - s_pop_rand
s_diff.births
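The three cases above (scalar, array, other session) share one idea: the operation is applied item by item. A minimal plain-Python sketch of that idea follows, with floats standing in for arrays and a hypothetical `session_op` helper that is not part of LArray. Items missing from one side are simply skipped here, which is a simplification and not a statement about LArray's behaviour:

```python
import operator


def session_op(op, left, right):
    """Apply the binary operation `op` item by item (hypothetical helper).

    `left` is a dict of named values; `right` is either a single value
    applied to every item, or another dict matched by item name.
    """
    if isinstance(right, dict):
        return {name: op(value, right[name])
                for name, value in left.items() if name in right}
    return {name: op(value, right) for name, value in left.items()}


s = {'pop': 11_000_000.0, 'births': 120_000.0}
other = {'pop': 10_999_000.0, 'births': 120_000.0}

# same operand for every item
in_millions = session_op(operator.truediv, s, 1e6)

# one operand per item, matched by name
diff = session_op(operator.sub, s, other)
print(diff)  # {'pop': 1000.0, 'births': 0.0}
```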
Applying Functions On All Arrays¶
In addition to the classical arithmetic operations, the apply method can be used to apply the same function to all arrays. This function should take a single item as argument and return a single value:
[16]:
# add the next year to all arrays
def add_next_year(array):
    if 'time' in array.axes.names:
        last_year = array.time.i[-1]
        return array.append('time', 0, last_year + 1)
    else:
        return array

s_pop_with_next_year = s_pop.apply(add_next_year)

print('pop array before calling apply:')
print(s_pop.pop)
print()
print('pop array after calling apply:')
print(s_pop_with_next_year.pop)
It is possible to pass a function with additional arguments:
[17]:
# add the next year to all arrays.
# Use the 'copy_values_from_last_year' flag to indicate
# whether or not to copy values from the last year
def add_next_year(array, copy_values_from_last_year):
    if 'time' in array.axes.names:
        last_year = array.time.i[-1]
        value = array[last_year] if copy_values_from_last_year else 0
        return array.append('time', value, last_year + 1)
    else:
        return array

s_pop_with_next_year = s_pop.apply(add_next_year, True)

print('pop array before calling apply:')
print(s_pop.pop)
print()
print('pop array after calling apply:')
print(s_pop_with_next_year.pop)
It is also possible to apply a function to non-Array objects of a session. Please refer to the documentation of the apply method.
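The mechanism of forwarding extra arguments and restricting the function to one kind of item can be pictured with the following plain-Python sketch. The `apply_to_items` helper and its `kind` parameter are hypothetical names chosen for illustration, lists stand in for arrays, and nothing here is LArray's actual implementation:

```python
def apply_to_items(items, func, *args, kind=list, **kwargs):
    """Return a new dict where `func` was applied to every item of type `kind`.

    Extra positional and keyword arguments are forwarded to `func`.
    Items of any other type are kept unchanged.
    """
    return {name: func(value, *args, **kwargs) if isinstance(value, kind) else value
            for name, value in items.items()}


def add_next_year(values, copy_values_from_last_year):
    # append either a copy of the last value or 0
    return values + [values[-1] if copy_values_from_last_year else 0]


items = {'pop': [10, 11, 12], 'label': 'demo'}

result = apply_to_items(items, add_next_year, True)
print(result)  # {'pop': [10, 11, 12, 12], 'label': 'demo'}
```

The original dict is left untouched, which mirrors the way the examples above store the result of apply in a new variable.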
Comparing Sessions¶
Being able to compare two sessions may be useful when you want to compare two different models expected to give the same results, or when you have updated your model and want to see the consequences of the recent changes.
Session objects provide two methods to compare two sessions, equals and element_equals:
The equals method will return True if all items from both sessions are identical, False otherwise.
The element_equals method will compare items of two sessions one by one and return an array of boolean values.
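The relation between the two methods can be sketched in plain Python as follows. The two functions below are simplified, hypothetical stand-ins working on dicts of lists (the real element_equals returns an array, not a dict), written only to show that equals is the "all items match" summary of the per-item comparison:

```python
def element_equals(left, right):
    """Compare two dicts item by item; an item missing on one side counts as different."""
    names = sorted(set(left) | set(right))
    return {name: name in left and name in right and left[name] == right[name]
            for name in names}


def equals(left, right):
    """True only if every item is present on both sides and identical."""
    return all(element_equals(left, right).values())


base = {'pop': [1, 2], 'births': [3]}

same = dict(base)
modified = {'pop': [1, 3], 'births': [3]}
extended = dict(base, gender_ratio=[0.5])

print(element_equals(base, modified))  # {'births': True, 'pop': False}
print(equals(base, same))              # True
print(equals(base, extended))          # False
```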
[18]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('demography_eurostat.h5')
s_pop = Session(filepath_hdf)
# create a copy of the original session
s_pop_copy = s_pop.copy()
[19]:
# 'element_equals' compares arrays one by one
s_pop.element_equals(s_pop_copy)
[20]:
# 'equals' returns True if the two sessions have exactly the same items
s_pop.equals(s_pop_copy)
[21]:
# slightly modify the 'pop' array for some label combinations
s_pop_copy.pop += random_increment
[22]:
# the 'pop' array is different between the two sessions
s_pop.element_equals(s_pop_copy)
[23]:
# 'equals' returns False if at least one item differs in values or axes between the two sessions
s_pop.equals(s_pop_copy)
[24]:
# reset the 'copy' session as a copy of the original session
s_pop_copy = s_pop.copy()
# add an array to the 'copy' session
s_pop_copy.gender_ratio = s_pop_copy.pop.ratio('gender')
[25]:
# the 'gender_ratio' array is not present in the original session
s_pop.element_equals(s_pop_copy)
[26]:
# 'equals' returns False if at least one item is missing from one of the two sessions
s_pop.equals(s_pop_copy)
The == operator returns a new session of boolean arrays, with items compared element-wise:
[27]:
# reset the 'copy' session as a copy of the original session
s_pop_copy = s_pop.copy()
# slightly modify the 'pop' array for some label combinations
s_pop_copy.pop += random_increment
[28]:
s_check_same_values = s_pop == s_pop_copy
s_check_same_values.pop
This also works for axes and groups:
[29]:
s_check_same_values.time
The != operator does the opposite of the == operator:
[30]:
s_check_different_values = s_pop != s_pop_copy
s_check_different_values.pop
A more visual way is to use the compare function, which will open the Editor:
compare(s_pop, s_pop_alternative, names=['baseline', 'lower_birth_rate'])