Some Useful Functions
Import the LArray library:
[1]:
from larray import *
[2]:
# load 'demography_eurostat' dataset
demo_eurostat = load_example_data('demography_eurostat')
# extract the 'pop' array from the dataset
pop = demo_eurostat.pop
pop
with_total
Add totals to one or several axes:
[3]:
pop.with_total('gender', label='Total')
See with_total for more details and examples.
where
The where function can be used to apply a computation depending on a condition:
[4]:
# where(condition, value if true, value if false)
where(pop < pop.mean('time'), -pop, pop)
See where for more details and examples.
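To make the element-wise behaviour concrete, here is a minimal pure-Python sketch of the same idea on plain lists. The helper name `where_sketch` and the sample numbers are made up for illustration and are not part of LArray, which additionally aligns arrays by axis labels.

```python
def where_sketch(condition, if_true, if_false):
    """Return if_true[i] where condition[i] holds, otherwise if_false[i]."""
    return [t if c else f for c, t, f in zip(condition, if_true, if_false)]

# hypothetical sample values standing in for a population array
values = [3, 8, 5, 12]
mean = sum(values) / len(values)

# negate every value that lies below the mean, keep the others
below_mean = [v < mean for v in values]
negated = [-v for v in values]
result = where_sketch(below_mean, negated, values)
print(result)  # [-3, 8, -5, 12]
```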
clip
Limit all values to a given range:
[5]:
# values below 10 million are set to 10 million
pop.clip(minval=10**7)
[6]:
# values above 40 million are set to 40 million
pop.clip(maxval=4*10**7)
[7]:
# values below 10 million are set to 10 million and
# values above 40 million are set to 40 million
pop.clip(10**7, 4*10**7)
See clip for more details and examples.
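The three calls above differ only in which bound is supplied. A small sketch on a plain list shows the effect of each bound; `clip_sketch` and the sample figures are hypothetical and only mirror the `minval`/`maxval` arguments used in the cells above.

```python
def clip_sketch(values, minval=None, maxval=None):
    """Raise values below minval to minval and lower values above maxval to maxval."""
    clipped = []
    for v in values:
        if minval is not None and v < minval:
            v = minval
        if maxval is not None and v > maxval:
            v = maxval
        clipped.append(v)
    return clipped

# hypothetical population figures
sample = [5_000_000, 20_000_000, 60_000_000]

only_min = clip_sketch(sample, minval=10**7)
only_max = clip_sketch(sample, maxval=4 * 10**7)
both = clip_sketch(sample, 10**7, 4 * 10**7)

print(only_min)  # [10000000, 20000000, 60000000]
print(only_max)  # [5000000, 20000000, 40000000]
print(both)      # [10000000, 20000000, 40000000]
```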
divnot0
Divide two arrays, returning 0 wherever the divisor is 0:
[8]:
divisor = ones(pop.axes, dtype=int)
divisor['Male'] = 0
divisor
[9]:
pop / divisor
[10]:
# we use astype(int) since the divnot0 method
# returns a float array in this case while
# we want an integer array
pop.divnot0(divisor).astype(int)
See divnot0 for more details and examples.
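The rule can be sketched in a few lines of plain Python. The helper `divnot0_sketch` and its inputs are illustrative assumptions, not LArray code; the sketch also shows why a cast back to integers may be wanted, since true division yields floats.

```python
def divnot0_sketch(numerators, divisors):
    """Divide element-wise, using 0.0 wherever the divisor is 0."""
    return [n / d if d != 0 else 0.0 for n, d in zip(numerators, divisors)]

# hypothetical values; the second divisor is 0
result = divnot0_sketch([10, 20, 30], [2, 0, 5])
print(result)  # [5.0, 0.0, 6.0]

# the quotients are floats, so convert explicitly if integers are wanted
as_int = [int(x) for x in result]
print(as_int)  # [5, 0, 6]
```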
ratio
The ratio method (or rationot0, which returns 0 where the sum is 0) returns an array with all values divided by the sum of values along the given axes:
[11]:
pop.ratio('gender')
# which is equivalent to
pop / pop.sum('gender')
percent
[12]:
# or, if you want the previous ratios as percentages
pop.percent('gender')
See percent for more details and examples.
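As a quick arithmetic check of what ratio and percent compute along a single axis, the following sketch uses a plain dictionary with made-up figures for the two gender labels. It only illustrates the formula and does not use LArray.

```python
# hypothetical population by gender
population = {'Male': 4_000_000, 'Female': 6_000_000}
total = sum(population.values())

# ratio: each value divided by the sum along the axis
ratio = {label: value / total for label, value in population.items()}

# percent: the same ratios expressed on a 0-100 scale
percent = {label: 100 * value / total for label, value in population.items()}

print(ratio)    # {'Male': 0.4, 'Female': 0.6}
print(percent)  # {'Male': 40.0, 'Female': 60.0}
```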
diff
The diff method calculates the n-th order discrete difference along a given axis.
The first order difference is given by out[n+1] = in[n+1] - in[n] along the given axis.
[13]:
# calculates 'diff[year+1] = pop[year+1] - pop[year]'
pop.diff('time')
[14]:
# calculates 'diff[year+2] = pop[year+2] - pop[year]'
pop.diff('time', d=2)
[15]:
# calculates 'diff[year] = pop[year+1] - pop[year]'
pop.diff('time', label='lower')
See diff for more details and examples.
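The roles of the `d` and `label` arguments in the three cells above can be sketched with a dictionary keyed by year. The helper `diff_sketch` and the sample series are assumptions made for this illustration; it covers first-order differences only, not repeated (n-th order) differencing.

```python
def diff_sketch(series, d=1, label='upper'):
    """Difference between values d labels apart.

    series: dict mapping ordered labels to values.
    label: 'upper' stores each result under the later label,
           'lower' stores it under the earlier one.
    """
    labels = list(series)
    out = {}
    for lower, upper in zip(labels[:-d], labels[d:]):
        key = upper if label == 'upper' else lower
        out[key] = series[upper] - series[lower]
    return out

# hypothetical yearly values
pop_by_year = {2013: 100, 2014: 103, 2015: 109, 2016: 110}

first = diff_sketch(pop_by_year)
two_apart = diff_sketch(pop_by_year, d=2)
lower = diff_sketch(pop_by_year, label='lower')

print(first)      # {2014: 3, 2015: 6, 2016: 1}
print(two_apart)  # {2015: 9, 2016: 7}
print(lower)      # {2013: 3, 2014: 6, 2015: 1}
```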
growth_rate
The growth_rate method calculates the growth along a given axis.
It is roughly equivalent to a.diff(axis, d, label) / a[axis.i[:-d]]:
[16]:
pop.growth_rate('time')
See growth_rate for more details and examples.
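The formula above amounts to dividing each difference by the earlier value. A minimal sketch on a dictionary keyed by year, with the hypothetical helper `growth_rate_sketch` and invented numbers, assuming results are stored under the later label:

```python
def growth_rate_sketch(series, d=1):
    """Relative change between values d labels apart, keyed by the later label."""
    labels = list(series)
    return {
        upper: (series[upper] - series[lower]) / series[lower]
        for lower, upper in zip(labels[:-d], labels[d:])
    }

# hypothetical yearly values: +10% then -10%
pop_by_year = {2013: 100, 2014: 110, 2015: 99}
rates = growth_rate_sketch(pop_by_year)
print(rates)
```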
shift
The shift method shifts the cells of an array along an axis: the first label of the axis is dropped and each remaining label receives the value of the label before it.
[17]:
pop.shift('time')
[18]:
# when shift is applied on an (increasing) time axis,
# it effectively brings "past" data into the future
pop_shifted = pop.shift('time')
stack({'pop_shifted_2014': pop_shifted[2014], 'pop_2013': pop[2013]}, 'array')
See shift for more details and examples.
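The relabelling can be pictured with a dictionary keyed by year. This is a sketch with the hypothetical helper `shift_sketch` and invented values, handling only positive shifts:

```python
def shift_sketch(series, n=1):
    """Move each value n labels forward; the first n labels and last n values are dropped."""
    labels = list(series)
    values = list(series.values())
    return dict(zip(labels[n:], values[:-n]))

# hypothetical yearly values
pop_by_year = {2013: 100, 2014: 103, 2015: 109}
shifted = shift_sketch(pop_by_year)
print(shifted)  # {2014: 100, 2015: 103}

# the shifted value for 2014 is the original value for 2013
print(shifted[2014] == pop_by_year[2013])  # True
```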
Other interesting functions
Many more functions are documented in the API reference, in the sections Aggregation Functions, Miscellaneous and Utility Functions.