Change log
Version 0.32
Released on 2019-11-17.
CORE
Backward incompatible changes
Because it was broken, the ability to dump and load Axis and Group objects contained in a session has been removed for the CSV and Excel formats. Fixing it would have taken too much time given how rarely it is used (no one reported that it was broken), so it was removed instead. This is still possible using the HDF format. Closes issue 815.
Miscellaneous improvements
conda channel to install or update the larray, larray-editor, larray-eurostat and larrayenv packages switched from gdementen to larray-project (closes issue 560).
Fixes
fixed binary operations between a session and an array object (closes issue 807).
fixed Array.reindex() printing a spurious warning message when the axes_to_reindex argument was the name of the axis to reindex (closes issue 812).
fixed zip_array_values() and zip_array_items() functions not available when importing the entire larray library as from larray import * (closes issue 816).
fixed wrong axes and groups names when loading a session from an HDF file (closes issue 803).
EDITOR
New features
added debug() function which opens an editor window with an extra widget to navigate back in the call stack (the chain of functions called to reach the current line of code).
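To make the notion of a call stack concrete, here is a small standard-library sketch. It does not use larray-editor, and the names call_chain, outer and inner are hypothetical. It only lists the chain of functions called to reach the current line, which is the kind of information the extra widget of debug() lets one navigate.

```python
import inspect


def call_chain():
    """Return the names of the functions on the call stack, outermost first.

    The frame of call_chain() itself is skipped.
    """
    frames = inspect.stack()[1:]
    return [frame.function for frame in reversed(frames)]


def inner():
    # the "current line of code" whose call chain we want to inspect
    return call_chain()


def outer():
    return inner()


chain = outer()
# the two innermost entries are the functions defined above
print(chain[-2:])  # ['outer', 'inner']
```

Entries before 'outer' depend on how the script is run (module level, test runner, notebook), which is why only the last two are shown.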
Miscellaneous improvements
Sizes of the main window and the resizable components are saved when closing the viewer and restored when it is reopened (closes issue 165).
added keyword arguments rtol, atol and nans_equal to the compare() function (closes issue 172).
run_editor_on_exception() now uses debug() so that one can inspect what the state was in all functions traversed to reach the code which triggered the exception.
Version 0.31
Released on 2019-08-09.
added the ExcelReport class, which makes it possible to generate multiple graphs in an Excel file at once (closes issue 676).
fixed binary operations (+, -, *, etc.) between an LArray and a (scalar) Group which silently gave a wrong result (closes issue 797).
fixed taking a subset of an array with boolean labels for an axis if the user explicitly specifies the axis (closes issue 735). When the user does not specify the axis, it currently fails, but it is unclear what to do in that case (see issue 794).
fixed a regression in 0.30: X.axis_name[groups] failed when groups were originally defined on axes with the same name (i.e. when the operation was not actually needed). Closes issue 787.
Version 0.30
Released on 2019-06-27.
stack() axis argument was renamed to axes to reflect the fact that the function can now stack along multiple axes at once (see below).
to accommodate the “simpler pattern language” now supported for those functions, using a regular expression in Axis.matching() or Group.matching() now requires passing the pattern as an explicit regex keyword argument instead of just the first argument of those methods. For example my_axis.matching('test.*') becomes my_axis.matching(regex='test.*').
LArray.as_table() is deprecated because it duplicated functionality found in LArray.dump(). Please only use LArray.dump() from now on.
renamed a_min and a_max arguments of LArray.clip() to minval and maxval respectively and made them optional (closes issue 747).
modified the behavior of the pattern argument of Session.filter() to actually support patterns instead of only checking if the object names start with the pattern. Special characters include ? for matching any single character and * for matching any number of characters. Closes issue 703.
Warning
If you were using Session.filter, you must add a * to your pattern to keep your code working. For example, my_session.filter('test') must be changed to my_session.filter('test*').
LArray.equals() now returns True for arrays even when axes are in a different order or some axes are missing on either side (but the data is constant over that axis on the other side). Closes issue 237.
Warning
If you were using LArray.equals() and want to keep the old, stricter, behavior, you must add check_axes=True.
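The effect of the new pattern semantics on existing code can be illustrated with the standard-library fnmatch module, whose ? and * wildcards behave as described above. This is only a sketch under that assumption, not larray's implementation: filter_names is a hypothetical helper, and fnmatch additionally treats [seq] specially, which the notes above do not mention for Session.filter().

```python
from fnmatch import fnmatchcase

names = ['test', 'test1', 'test_total', 'other_test', 'pop']


def filter_names(names, pattern):
    """Keep the names matching a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# without a wildcard, only the exact name matches
print(filter_names(names, 'test'))   # ['test']
# a trailing * restores the old "starts with" behavior
print(filter_names(names, 'test*'))  # ['test', 'test1', 'test_total']
# ? stands for exactly one character
print(filter_names(names, 'test?'))  # ['test1']
```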
added set_options() and get_options() functions to respectively set and get options for larray. Available options currently include display_precision for controlling the number of decimal digits used when showing floating point numbers, display_maxlines to control the maximum number of lines to use when displaying an array, etc. set_options() can be used either like a normal function to set the options globally or within a with block to set them only temporarily. Closes issue 274.
implemented read_stata() and LArray.to_stata() to read arrays from and write arrays to Stata .dta files.
implemented LArray.isin() method to check whether each value of an array is contained in a list (or array) of values.
implemented LArray.unique() method to compute unique values (or sub-arrays) for an array, optionally along axes.
implemented LArray.apply() method to apply a python function to all values of an array or to all sub-arrays along some axes of an array and return the result. This is an extremely versatile method as it can be used with both aggregating functions and element-wise functions.
implemented LArray.apply_map() method to apply a transformation mapping to array elements. For example, this can be used to transform some numeric codes to labels.
implemented LArray.reverse() method to reverse one or several axes of an array (closes issue 631).
implemented LArray.roll() method to roll the cells of an array n-times to the right along an axis. This is similar to LArray.shift(), except that cells which are pushed “outside of the axis” are reintroduced on the opposite side of the axis instead of being dropped.
implemented Axis.apply() method to transform the labels of an axis by a function and return a new Axis.
added Session.update() method to add and modify items from an existing session by passing either another session or a dict-like object or an iterable object with (key, value) pairs (closes issue 754).
implemented AxisCollection.rename() to rename axes of an AxisCollection, independently of any array.
implemented AxisCollection.set_labels() (closes issue 782).
implemented wrap_elementwise_array_func() function to make a function defined in another library work with LArray arguments instead of with numpy arrays.
implemented LArray.keys(), LArray.values() and LArray.items() methods to respectively loop on the labels, values or (key, value) pairs of an array.
implemented zip_array_values() and zip_array_items() to loop respectively on the values or (key, value) pairs of several arrays.
implemented AxisCollection.iter_labels() to iterate over all (possible combinations of) labels of the axes of the collection.
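The difference between rolling and shifting described in the LArray.roll() entry above can be sketched on a plain Python list. This is an illustration of the idea only, not larray code: the roll and shift helpers below are hypothetical, assume n >= 0, and the shift variant pairs the surviving values with the labels they end up under.

```python
labels = ['b0', 'b1', 'b2', 'b3']
values = [0, 1, 2, 3]


def roll(values, n=1):
    """Move every cell n positions to the right, wrapping around."""
    if not values:
        return []
    n %= len(values)
    return values[-n:] + values[:-n]


def shift(labels, values, n=1):
    """Move every cell n positions to the right, dropping what falls off.

    Returns the remaining (label, value) pairs.
    """
    return list(zip(labels[n:], values[:len(values) - n]))


# the last cell is reintroduced at the start of the axis
print(roll(values))           # [3, 0, 1, 2]
# the last cell is dropped, so the result is one cell shorter
print(shift(labels, values))  # [('b1', 0), ('b2', 1), ('b3', 2)]
```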
improved speed of read_hdf() function when reading a stored LArray object dumped with the current and future version of larray. To benefit from the speedup when reading arrays dumped with older versions of larray, please read and re-dump them. Closes issue 563.
allowed to not specify the axes in LArray.set_labels() (closes issue 634):
>>> a = ndtest('nat=BE,FO;sex=M,F')
>>> a
nat\sex  M  F
     BE  0  1
     FO  2  3
>>> a.set_labels({'M': 'Men', 'BE': 'Belgian'})
nat\sex  Men  F
Belgian    0  1
     FO    2  3
LArray.set_labels() can now take functions to transform axes labels (closes issue 536).
>>> arr = ndtest((2, 2))
>>> arr
a\b  b0  b1
 a0   0   1
 a1   2   3
>>> arr.set_labels('a', str.upper)
a\b  b0  b1
 A0   0   1
 A1   2   3
implemented the same “simpler pattern language” in Axis.matching() and Group.matching() as in Session.filter(), in addition to regular expressions (which now require using the regex argument).
stack() can now stack along several axes at once (closes issue 56).
>>> country = Axis('country=BE,FR,DE')
>>> gender = Axis('gender=M,F')
>>> stack({('BE', 'M'): 0,
...        ('BE', 'F'): 1,
...        ('FR', 'M'): 2,
...        ('FR', 'F'): 3,
...        ('DE', 'M'): 4,
...        ('DE', 'F'): 5},
...       (country, gender))
country\gender  M  F
            BE  0  1
            FR  2  3
            DE  4  5
stack() using a dictionary as elements can now use a simple axis name instead of requiring a full axis object. This will print a warning on Python < 3.7 though because the ordering of labels is not guaranteed in that case. Closes issue 755 and issue 581.
stack() using keyword arguments can now use a simple axis name instead of requiring a full axis object, even on Python < 3.6. This will print a warning though because the ordering of labels is not guaranteed in that case.
added password argument to Workbook.save() to allow protecting Excel files with a password.
added option exact to join argument of Axis.align() and LArray.align() methods. Instead of aligning, passing join='exact' to the align method will raise an error when axes are not equal. Closes issue 338.
made Axis.by() and Group.by() return a list of named groups instead of anonymous groups. By default, group names are defined as <start>:<end>. This can be changed via the new template argument:
>>> age = Axis('age=0..6')
>>> age
Axis([0, 1, 2, 3, 4, 5, 6], 'age')
>>> age.by(3)
(age.i[0:3] >> '0:2', age.i[3:6] >> '3:5', age.i[6:7] >> '6')
>>> age.by(3, step=2)
(age.i[0:3] >> '0:2', age.i[2:5] >> '2:4', age.i[4:7] >> '4:6', age.i[6:7] >> '6')
>>> age.by(3, template='{start}-{end}')
(age.i[0:3] >> '0-2', age.i[3:6] >> '3-5', age.i[6:7] >> '6')
Closes issue 669.
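The naming rule shown in the Axis.by() example can be reproduced with a few lines of plain Python. The sketch below is an assumption about the rule inferred from the outputs above (a single-label group is named after that label), not larray's actual code, and group_names is a hypothetical helper.

```python
def group_names(labels, length, step=None, template='{start}:{end}'):
    """Name consecutive groups of `length` labels, one group every `step` labels."""
    step = length if step is None else step
    names = []
    for start in range(0, len(labels), step):
        chunk = labels[start:start + length]
        if len(chunk) == 1:
            # a group holding a single label is named after that label
            names.append(str(chunk[0]))
        else:
            names.append(template.format(start=chunk[0], end=chunk[-1]))
    return names


age = list(range(7))  # labels 0..6, as in the example above
print(group_names(age, 3))                           # ['0:2', '3:5', '6']
print(group_names(age, 3, step=2))                   # ['0:2', '2:4', '4:6', '6']
print(group_names(age, 3, template='{start}-{end}')) # ['0-2', '3-5', '6']
```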
allowed to specify an axis by its position when selecting a subset of an array using the string notation:
>>> pop_mouv = ndtest('geo_from=BE,FR,UK;geo_to=BE,FR,UK')
>>> pop_mouv
geo_from\geo_to  BE  FR  UK
             BE   0   1   2
             FR   3   4   5
             UK   6   7   8
>>> pop_mouv['0[BE, UK]']   # equivalent to pop_mouv[pop_mouv.geo_from['BE,UK']]
geo_from\geo_to  BE  FR  UK
             BE   0   1   2
             UK   6   7   8
>>> pop_mouv['1.i[0, 2]']   # equivalent to pop_mouv[pop_mouv.geo_to.i[0, 2]]
geo_from\geo_to  BE  UK
             BE   0   2
             FR   3   5
             UK   6   8
Closes issue 671.
added documentation and examples for where(), maximum() and minimum() functions (closes issue 700)
updated the Working With Sessions section of the tutorial (closes issue 568).
added dtype argument to LArray to set the type of the array explicitly instead of relying on auto-detection.
added dtype argument to stack to set the type of the resulting array explicitly instead of relying on auto-detection.
allowed to pass a single axis or group as axes_to_reindex argument of the LArray.reindex() method (closes issue 712).
LArray.dump() gained a few extra arguments to further customize output:
- axes_names: to specify whether or not the output should contain the axes names (and which)
- maxlines and edgeitems: to dump only the start and end of large arrays
- light: to output axes labels only when they change instead of repeating them on each line
- na_repr: to specify how to represent N/A (NaN) values
substantially improved performance of creating, iterating, and doing a few other operations over larray objects. This solves a few pathological cases of slow operations, especially those involving many small-ish arrays, but sadly the overall performance improvement is negligible over most of the real-world models using larray that we tested these changes on.
fixed dumping to Excel arrays of “object” dtype containing NaN values using numpy float types (fixes the infamous 65535 bug).
fixed LArray.divnot0() being slow when the divisor has many axes and many zeros (closes issue 705).
fixed maximum length of sheet names (31 characters instead of 30 characters) when adding a new sheet to an Excel Workbook (closes issue 713).
fixed missing documentation of many functions in Utility Functions section of the API Reference (closes issue 698).
fixed arithmetic operations between two sessions returning a nan value for each axis and group (closes issue 725).
fixed dumping sessions with metadata in HDF format (closes issue 702).
fixed minimum version of pandas to install. The minimum version is now 0.20.0.
fixed from_frame for dataframes with non string index names.
fixed creating an LSet from an IGroup with a (single) scalar key
>>> a = Axis('a=a0,a1,a2')
>>> a.i[1].set()
a['a1'].set()
Version 0.29
Released on 2018-09-07.
Syntax changes
deprecated title attribute of LArray objects and title argument of array creation functions. A title is now considered metadata and must be added as:
>>> # add title at array creation
>>> arr = ndtest((3, 3), meta=[('title', 'array for testing')])
>>> # or after array creation
>>> arr = ndtest((3, 3))
>>> arr.meta.title = 'array for testing'
See below for more information about metadata handling.
renamed LArray.drop_labels() to LArray.ignore_labels() to avoid confusion with the new LArray.drop() method (closes issue 672).
renamed Session.array_equals() to Session.element_equals() because this method now also compares axes and groups in addition to arrays.
renamed Sheet.load() and Range.load() nb_index argument to nb_axes to be consistent with all other input functions (read_*). Sheet and Range are the objects one gets when taking subsets of the excel Workbook objects obtained via open_excel() (closes issue 648).
deprecated the element_equal() function in favor of the LArray.eq() method (closes issue 630) to be consistent with other future methods for operations between two arrays.
renamed nan_equals argument of LArray.equals() and LArray.eq() methods to nans_equal because it is grammatically more correct and is explained more naturally as “whether two nans should be considered equal”.
LArray.insert() pos and axis arguments are deprecated because those were only useful for very specific cases and those can easily be rewritten by using an indices group (axis.i[pos]) for the before argument instead (closes issue 652).
New features
allowed arrays to have metadata (e.g. title, description, authors, …).
Metadata can be added when creating arrays:
>>> # for Python <= 3.5
>>> arr = ndtest((3, 3), meta=[('title', 'array for testing'), ('author', 'John Smith')])
>>> # for Python >= 3.6
>>> arr = ndtest((3, 3), meta=Metadata(title='array for testing', author='John Smith'))
To access all existing metadata, use array.meta, for example:
>>> arr.meta
title: array for testing
author: John Smith
To access some specific existing metadata, use array.meta.<name>, for example:
>>> arr.meta.author
'John Smith'
Updating some existing metadata, or creating new metadata (the metadata is added if there was no metadata using that name) should be done using array.meta.<name> = <value>. For example:
>>> arr.meta.city = 'London'
To remove some metadata, use del array.meta.<name>, for example:
>>> del arr.meta.city
Note
Currently, only the HDF (.h5) file format supports saving and loading array metadata.
Metadata is not kept when actions or methods are applied on an array except for operations modifying the object in-place, such as pop[age < 10] = 0, and when the method copy() is called. Do not add metadata to an array if you know you will apply actions or methods on it before dumping it.
allowed sessions to have metadata. Session metadata is created and accessed using the same syntax as for arrays (session.meta.<name>), for example to add metadata to a session at creation:
>>> # Python <= 3.5
>>> s = Session([('arr1', ndtest(2)), ('arr2', ndtest(3))], meta=[('title', 'my title'), ('author', 'John Smith')])
>>> # Python 3.6+
>>> s = Session(arr1=ndtest(2), arr2=ndtest(3), meta=Metadata(title='my title', author='John Smith'))
Note
Contrary to array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5)
Metadata is not kept when actions or methods are applied on a session except for operations modifying a specific array, such as: s['arr1'] = 0. Do not add metadata to a session if you know you will apply actions or methods on it before dumping it.
Closes issue 640.
implemented LArray.drop() to return an array without some labels or indices along an axis (closes issue 506).
>>> arr1 = ndtest((2, 4))
>>> arr1
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5   6   7
>>> a, b = arr1.axes
Dropping a single label
>>> arr1.drop('b1')
a\b  b0  b2  b3
 a0   0   2   3
 a1   4   6   7
Dropping multiple labels
>>> # arr1.drop('b1,b3')
>>> arr1.drop(['b1', 'b3'])
a\b  b0  b2
 a0   0   2
 a1   4   6
Dropping a slice
>>> # arr1.drop('b1:b3')
>>> arr1.drop(b['b1':'b3'])
a\b  b0
 a0   0
 a1   4
Dropping labels by position requires specifying the axis
>>> # arr1.drop('b.i[1]')
>>> arr1.drop(b.i[1])
a\b  b0  b2  b3
 a0   0   2   3
 a1   4   6   7
added new module to create arrays with values generated randomly following a few different distributions, or shuffle an existing array along an axis:
>>> from larray.random import *
Generate integers between two bounds (0 and 10 in this example)
>>> randint(0, 10, axes='a=a0..a2')
a  a0  a1  a2
    3   6   2
Generate values following a uniform distribution
>>> uniform(axes='a=a0..a2')
a                   a0                  a1                  a2
   0.33293756929238394  0.5331412592583252  0.6748786766763107
Generate values following a normal distribution (\(\mu\) = 1 and \(\sigma\) = 2 in this example)
>>> normal(1, scale=2, axes='a=a0..a2')
a                   a0                 a1                  a2
   -0.9216651561025018  5.119734598931103  4.4467876992838935
Randomly shuffle an existing array along one axis
>>> arr = ndtest((3, 3))
>>> arr
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
 a2   6   7   8
>>> permutation(arr, axis='b')
a\b  b1  b2  b0
 a0   1   2   0
 a1   4   5   3
 a2   7   8   6
Generate values by randomly choosing between specified values (5, 10 and 15 in this example), potentially with a specified probability for each value (respectively a 30%, 50%, 20% probability of occurring in this example).
>>> choice([5, 10, 15], p=[0.3, 0.5, 0.2], axes='a=a0,a1;b=b0..b2')
a\b  b0  b1  b2
 a0  15  10  10
 a1  10   5  10
Same as above with labels and probabilities given as a one dimensional LArray
>>> proba = LArray([0.3, 0.5, 0.2], Axis([5, 10, 15], 'outcome'))
>>> proba
outcome    5   10   15
         0.3  0.5  0.2
>>> choice(p=proba, axes='a=a0,a1;b=b0..b2')
a\b  b0  b1  b2
 a0  10  15   5
 a1  10   5  10
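For readers without larray at hand, the weighted choice above has a rough standard-library analogue. The sketch below uses random.choices rather than larray.random, so it returns nested lists instead of a labelled array; the seed and the names outcomes, probabilities and rows are illustrative assumptions.

```python
import random

outcomes = [5, 10, 15]
probabilities = [0.3, 0.5, 0.2]

# seeded only so that the sketch is repeatable
rng = random.Random(0)

# draw 2 x 3 = 6 values, then arrange them as 2 rows (a0, a1) of 3 columns (b0..b2)
flat = rng.choices(outcomes, weights=probabilities, k=6)
rows = [flat[i:i + 3] for i in range(0, 6, 3)]
print(rows)
```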
made a few useful constants accessible directly from the larray module: nan, inf, pi, e and euler_gamma. Like for any Python functionality, you can choose how to import and use them. For example, for pi:
>>> from larray import *
>>> pi
3.141592653589793
OR
>>> from larray import pi
>>> pi
3.141592653589793
OR
>>> import larray as la
>>> la.pi
3.141592653589793
added Group.equals() method which compares group names, associated axis names and labels between two groups:
>>> a = Axis('a=a0..a3')
>>> a02 = a['a0:a2'] >> 'group_a'
>>> # different group name
>>> a02.equals(a['a0:a2'])
False
>>> # different axis name
>>> other_axis = a.rename('other_name')
>>> a02.equals(other_axis['a0:a2'] >> 'group_a')
False
>>> # different labels
>>> a02.equals(a['a1:a3'] >> 'group_a')
False
Miscellaneous improvements
completely rewritten the ‘Load And Dump Arrays, Sessions, Axes And Groups’ section of the tutorial (closes issue 645)
saving or loading a session from a file now includes Axis and Group objects in addition to arrays (closes issue 578).
Create a session containing axes, groups and arrays
>>> a, b = Axis("a=a0..a2"), Axis("b=b0..b2")
>>> a01 = a['a0,a1'] >> 'a01'
>>> arr1, arr2 = ndtest((a, b)), ndtest(a)
>>> s = Session([('a', a), ('b', b), ('a01', a01), ('arr1', arr1), ('arr2', arr2)])
Saving a session will save axes, groups and arrays
>>> s.save('session.h5')
Loading a session will load axes, groups and arrays
>>> s2 = s.load('session.h5')
>>> s2
Session(arr1, arr2, a, b, a01)
Note
All axes and groups of a session are stored in the same CSV file/Excel sheet/HDF group named respectively __axes__ and __groups__.
vastly improved indexing using arrays (of labels, indices or booleans). Many advanced cases did not work, including when combining several indexing arrays, or when (one of) the indexing array(s) had an axis present in the array.
First let’s create some test axes
>>> a, b, c = ndtest((2, 3, 2)).axes
Then create a test array.
>>> arr = ndtest((a, b))
>>> arr
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
If the key array has an axis not already present in arr (e.g. c), the target axis (a) is replaced by the extra axis (c). This already worked previously.
>>> key = LArray(['a1', 'a0'], c)
>>> key
c  c0  c1
   a1  a0
>>> arr[key]
c\b  b0  b1  b2
 c0   3   4   5
 c1   0   1   2
If the key array has the target axis, the axis stays the same, but the data is reordered (this also worked previously):
>>> key = LArray(['b1', 'b0', 'b2'], b)
>>> key
b  b0  b1  b2
   b1  b0  b2
>>> arr[key]
a\b  b0  b1  b2
 a0   1   0   2
 a1   4   3   5
From here on, the examples shown did not work previously…
Now, if the key contains another axis present in the array (b) which is not the target axis (a), the target axis completely disappears (both axes are replaced by the key axis):
>>> key = LArray(['a0', 'a1', 'a0'], b)
>>> key
b  b0  b1  b2
   a0  a1  a0
>>> arr[key]
b  b0  b1  b2
    0   4   2
If the key has both the target axis (a) and another existing axis (b)
>>> key
a\b  b0  b1  b2
 a0  a0  a1  a0
 a1  a1  a0  a1
>>> arr[key]
a\b  b0  b1  b2
 a0   0   4   2
 a1   3   1   5
If the key has both another existing axis (a) and an extra axis (c)
>>> key
a\c  c0  c1
 a0  b0  b1
 a1  b2  b0
>>> arr[key]
a\c  c0  c1
 a0   0   1
 a1   5   3
It also works if the key has the target axis (a), another existing axis (b) and an extra axis (c), but this is not shown for brevity.
updated Session.summary() so as to display all kinds of objects and allowed to pass a function returning a string representation of an object instead of passing a pre-defined string template (closes issue 608):
>>> axis1 = Axis("a=a0..a2")
>>> group1 = axis1['a0,a1'] >> 'a01'
>>> arr1 = ndtest((2, 2), title='array 1', dtype=np.int64)
>>> arr2 = ndtest(4, title='array 2', dtype=np.int64)
>>> arr3 = ndtest((3, 2), title='array 3', dtype=np.int64)
>>> s = Session([('axis1', axis1), ('group1', group1), ('arr1', arr1), ('arr2', arr2), ('arr3', arr3)])
Using the default template
>>> print(s.summary())
axis1: a ['a0' 'a1' 'a2'] (3)
group1: a['a0', 'a1'] >> a01 (2)
arr1: a, b (2 x 2) [int64]
    array 1
arr2: a (4) [int64]
    array 2
arr3: a, b (3 x 2) [int64]
    array 3
Using a specific template
>>> def print_array(key, array):
...     axes_names = ', '.join(array.axes.display_names)
...     shape = ' x '.join(str(i) for i in array.shape)
...     return "{} -> {} ({})\\n  title = {}\\n  dtype = {}".format(key, axes_names, shape,
...                                                                array.title, array.dtype)
>>> template = {Axis: "{key} -> {name} [{labels}] ({length})",
...             Group: "{key} -> {name}: {axis_name} {labels} ({length})",
...             LArray: print_array}
>>> print(s.summary(template))
axis1 -> a ['a0' 'a1' 'a2'] (3)
group1 -> a01: a ['a0', 'a1'] (2)
arr1 -> a, b (2 x 2)
  title = array 1
  dtype = int64
arr2 -> a (4)
  title = array 2
  dtype = int64
arr3 -> a, b (3 x 2)
  title = array 3
  dtype = int64
methods Session.equals() and Session.element_equals() now also compare axes and groups in addition to arrays (closes issue 610):
>>> a = Axis('a=a0..a2')
>>> a01 = a['a0,a1'] >> 'a01'
>>> s1 = Session([('a', a), ('a01', a01), ('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
>>> s2 = Session([('a', a), ('a01', a01), ('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
Identical sessions
>>> s1.element_equals(s2)
name     a   a01  arr1  arr2
      True  True  True  True
Different value(s) between two arrays
>>> s2.arr1['a1'] = 0
>>> s1.element_equals(s2)
name     a   a01   arr1  arr2
      True  True  False  True
Different label(s)
>>> s2.arr2 = ndtest("b=b0,b1; a=a0,a1")
>>> s2.a = Axis('a=a0,a1')
>>> s1.element_equals(s2)
name      a   a01   arr1   arr2
      False  True  False  False
Extra/missing objects
>>> s2.arr3 = ndtest((3, 3))
>>> del s2.a
>>> s1.element_equals(s2)
name      a   a01   arr1   arr2   arr3
      False  True  False  False  False
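The name-by-name comparison illustrated above, including the treatment of extra or missing objects, can be sketched with plain dictionaries standing in for sessions. This is a simplified model under stated assumptions (objects are compared with ==, and a name present on only one side counts as not equal); the element_equals function below is hypothetical and is not the Session method.

```python
def element_equals(session1, session2):
    """Compare two name -> object mappings, name by name."""
    # names of the first mapping, followed by names only found in the second
    names = list(session1) + [name for name in session2 if name not in session1]
    return {name: (name in session1 and name in session2
                   and session1[name] == session2[name])
            for name in names}


s1 = {'a': ['a0', 'a1', 'a2'], 'a01': ['a0', 'a1'], 'arr1': [0, 1]}
s2 = {'a01': ['a0', 'a1'], 'arr1': [0, 0], 'arr3': [0, 1, 2]}

result = element_equals(s1, s2)
print(result)
# {'a': False, 'a01': True, 'arr1': False, 'arr3': False}

# a single overall answer, in the spirit of an equals()-style check
print(all(result.values()))  # False
```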
added arguments wide and value_name to methods LArray.as_table() and LArray.dump() like in LArray.to_excel() and LArray.to_csv() (closes issue 653).
the from_series() function supports Pandas series with a MultiIndex (closes issue 465)
the stack() function supports any array-like object instead of only LArray objects.
>>> stack(a0=[1, 2, 3], a1=[4, 5, 6], axis='a')
{0}*\a  a0  a1
     0   1   4
     1   2   5
     2   3   6
made some operations on Excel Workbooks a bit faster by telling Excel to avoid updating the screen when the Excel instance is not visible anyway. This affects all workbooks opened via open_excel() as well as read_excel() and LArray.to_excel() when using the default xlwings engine.
made the documentation link in Windows start menu version-specific (instead of always pointing to the latest release) so that users do not inadvertently use the latest release syntax when using an older version of larray (closes issue 142).
added menu bar with undo/redo when editing single arrays (as a byproduct of issue 133).
Fixes
fixed Copy(to Excel)/Paste/Plot in the editor not working for 1D and 2D arrays (closes issue 140).
fixed Excel add-ins not loaded when opening an Excel Workbook by calling the LArray.to_excel() method with no path or via “Copy to Excel (CTRL+E)” in the editor (closes issue 154).
made LArray support Pandas versions >= 0.21 (closes issue 569)
fixed current active Excel Workbook being closed when calling the LArray.to_excel() method on an array with -1 as filepath argument (closes issue 473).
fixed LArray.split_axes() when splitting a single axis and using the names argument (e.g. arr.split_axes('bd', names=('b', 'd'))).
fixed splitting an anonymous axis without specifying the names argument.
>>> combined = ndtest('a0_b0,a0_b1,a0_b2,a1_b0,a1_b1,a1_b2')
>>> combined
{0}  a0_b0  a0_b1  a0_b2  a1_b0  a1_b1  a1_b2
         0      1      2      3      4      5
>>> combined.split_axes(0)
{0}\{1}  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
fixed LArray.combine_axes() with wildcard=True.
fixed taking a subset of an array by giving an index along a specific axis using a string (strings like "axisname.i[pos]").
fixed the editor not working with Python 2 or recent Qt4 versions.
Version 0.28
Released on 2018-03-15.
Backward incompatible changes
changed behavior of operators session1 == session2 and session1 != session2: returns a session of boolean arrays (closes issue 516):
>>> s1 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
>>> s2 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
>>> (s1 == s2).arr1
a    a0    a1
   True  True
>>> s2.arr1['a1'] = 0
>>> (s1 == s2).arr1
a    a0     a1
   True  False
>>> (s1 != s2).arr1
a     a0    a1
   False  True
New features
made it possible to run the tutorial online (as a Jupyter notebook) by clicking on the launch|binder badge on top of the tutorial web page (closes issue 73)
added methods array_equals and equals to Session object to compare arrays from two sessions. The method array_equals returns a boolean value for each array while the method equals returns a single boolean value (True if all arrays of both sessions are equal, False otherwise):
>>> s1 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
>>> s2 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
>>> s1.array_equals(s2)
name  arr1  arr2
      True  True
>>> s1.equals(s2)
True
Different value(s)
>>> s2.arr1['a1'] = 0
>>> s1.array_equals(s2)
name   arr1  arr2
      False  True
>>> s1.equals(s2)
False
Different label(s)
>>> from larray import ndrange
>>> s2.arr2 = ndrange("b=b0,b1; a=a0,a1")
>>> s1.array_equals(s2)
name   arr1   arr2
      False  False
>>> s1.equals(s2)
False
Extra/missing array(s)
>>> s2.arr3 = ndtest((3, 3))
>>> s1.array_equals(s2)
name   arr1   arr2   arr3
      False  False  False
>>> s1.equals(s2)
False
Closes issue 517.
added method equals to LArray object to compare two arrays:
>>> arr1 = ndtest((2, 3))
>>> arr1
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
>>> arr2 = arr1.copy()
>>> arr1.equals(arr2)
True
>>> arr2['b1'] += 1
>>> arr1.equals(arr2)
False
>>> arr3 = arr1.set_labels('a', ['x0', 'x1'])
>>> arr1.equals(arr3)
False
Arrays with nan values
>>> arr1 = ndtest((2, 3), dtype=float)
>>> arr1['a1', 'b1'] = nan
>>> arr1
a\b   b0   b1   b2
 a0  0.0  1.0  2.0
 a1  3.0  nan  5.0
>>> arr2 = arr1.copy()
>>> # By default, an array containing nan values is never equal to another array,
>>> # even if that other array also contains nan values at the same positions.
>>> # The reason is that a nan value is different from *anything*, including itself.
>>> arr1.equals(arr2)
False
>>> # set flag nan_equals to True to override this behavior
>>> arr1.equals(arr2, nan_equals=True)
True
This method also includes the arguments rtol (relative tolerance) and atol (absolute tolerance) allowing to test the equality between two arrays within a given relative or absolute tolerance:
>>> arr1 = LArray([6., 8.], "a=a0,a1")
>>> arr1
a   a0   a1
   6.0  8.0
>>> arr2 = LArray([5.999, 8.001], "a=a0,a1")
>>> arr2
a     a0     a1
   5.999  8.001
>>> arr1.equals(arr2)
False
>>> # equals returns True if abs(array1 - array2) <= (atol + rtol * abs(array2))
>>> arr1.equals(arr2, atol=0.01)
True
>>> arr1.equals(arr2, rtol=0.01)
True
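The tolerance rule quoted in the comment above, abs(array1 - array2) <= (atol + rtol * abs(array2)), can be checked by hand on the same numbers with plain Python floats. The sketch below is not larray's implementation: isclose and equals are hypothetical helpers working on lists, and the treat_nans_as_equal flag is an illustrative name for the nan handling described earlier.

```python
import math


def isclose(x, y, rtol=0.0, atol=0.0, treat_nans_as_equal=False):
    """Element test: abs(x - y) <= atol + rtol * abs(y)."""
    if math.isnan(x) or math.isnan(y):
        # nan differs from everything, including itself, unless told otherwise
        return treat_nans_as_equal and math.isnan(x) and math.isnan(y)
    return abs(x - y) <= atol + rtol * abs(y)


def equals(values1, values2, **kwargs):
    """True only if both sequences have the same length and all elements are close."""
    return len(values1) == len(values2) and all(
        isclose(x, y, **kwargs) for x, y in zip(values1, values2))


arr1 = [6.0, 8.0]
arr2 = [5.999, 8.001]
print(equals(arr1, arr2))             # False: the differences are about 0.001
print(equals(arr1, arr2, atol=0.01))  # True: 0.001 <= 0.01
print(equals(arr1, arr2, rtol=0.01))  # True: 0.001 <= 0.01 * 5.999
```

Note that the rule is asymmetric in its arguments, since the relative tolerance is scaled by the second array only.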
added Load from Script in the File menu of the editor allowing to load commands from an existing Python file (closes issue 96).
added Edit menu allowing to undo and redo changes of array values by editing cells and removed Apply and Discard buttons. Changes are now kept when switching from an array to another instead of losing them as previously (closes issue 32).
allowed to provide an absolute or relative tolerance value when comparing arrays through the compare function (closes issue 131).
made the editor able to detect and display plot objects stored in tuple, list or arrays. For example, arrays of plot objects are returned when using subplots=True option in calls of plot method:
>>> a = ndtest('sex=M,F; nat=BE,FO; year=2000..2017')
>>> # display 4 plots vertically placed (one plot for each pair (sex, nationality))
>>> a.plot(subplots=True)
>>> # display 4 plots ordered in a 2 x 2 grid
>>> a.plot(subplots=True, layout=(2, 2))
Closes issue 135.
Miscellaneous improvements
functions local_arrays, global_arrays and arrays return a session excluding arrays whose name starts with an underscore by default. To include them, set the flag include_private to True (closes issue 513):
>>> global_arr1 = ndtest((2, 2))
>>> _global_arr2 = ndtest((3, 3))
>>> def foo():
...     local_arr1 = ndtest(2)
...     _local_arr2 = ndtest(3)
...
...     # exclude arrays starting with '_' by default
...     s = arrays()
...     print(s.names)
...
...     # use flag 'include_private' to include arrays starting with '_'
...     s = arrays(include_private=True)
...     print(s.names)
>>> foo()
['global_arr1', 'local_arr1']
['_global_arr2', '_local_arr2', 'global_arr1', 'local_arr1']
implemented binary operations between sessions and non-session objects (closes issue 514 and issue 515):
>>> s = Session(arr1=ndtest((2, 2)), arr2=ndtest((3, 3)))
>>> s.arr1
a\b  b0  b1
 a0   0   1
 a1   2   3
>>> s.arr2
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
 a2   6   7   8
Add a scalar to all arrays
>>> # equivalent to s2 = 3 + s
>>> s2 = s + 3
>>> s2.arr1
a\b  b0  b1
 a0   3   4
 a1   5   6
>>> s2.arr2
a\b  b0  b1  b2
 a0   3   4   5
 a1   6   7   8
 a2   9  10  11
Apply binary operations between two sessions
>>> sdiff = (s2 - s) / s
>>> sdiff.arr1
a\b   b0   b1
 a0  inf  3.0
 a1  1.5  1.0
>>> sdiff.arr2
a\b   b0    b1     b2
 a0  inf   3.0    1.5
 a1  1.0  0.75    0.6
 a2  0.5  0.43  0.375
added possibility to call the method reindex with a group (closes issue 531):
>>> arr = ndtest((2, 2))
>>> arr
a\b  b0  b1
 a0   0   1
 a1   2   3
>>> b = Axis("b=b2..b0")
>>> arr.reindex('b', b['b1':])
a\b  b1  b0
 a0   1   0
 a1   3   2
added possibility to call the methods diff and growth_rate with a group (closes issue 532):
>>> data = [[2, 4, 5, 4, 6], [4, 6, 3, 6, 9]]
>>> a = LArray(data, "sex=M,F; year=2016..2020")
>>> a
sex\year  2016  2017  2018  2019  2020
       M     2     4     5     4     6
       F     4     6     3     6     9
>>> a.diff(a.year[2017:])
sex\year  2018  2019  2020
       M     1    -1     2
       F    -3     3     3
>>> a.growth_rate(a.year[2017:])
sex\year  2018  2019  2020
       M  0.25  -0.2   0.5
       F  -0.5   1.0   0.5
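The numbers in the diff and growth_rate example can be reproduced by hand, which also shows why the result starts at 2018 when the group starts at 2017: each value is compared with the one before it within the selected years. The sketch below uses plain lists rather than larray, and the diff and growth_rate helpers are illustrative assumptions about the arithmetic, not the library's methods.

```python
years = [2016, 2017, 2018, 2019, 2020]
data = {'M': [2, 4, 5, 4, 6], 'F': [4, 6, 3, 6, 9]}


def diff(values):
    """Difference between each value and the previous one."""
    return [cur - prev for prev, cur in zip(values, values[1:])]


def growth_rate(values):
    """Difference with the previous value, relative to that previous value."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


# restrict to the years from 2017 on, like a.year[2017:] in the example above
start = years.index(2017)
for sex, values in data.items():
    subset = values[start:]
    print(sex, years[start + 1:], diff(subset), growth_rate(subset))
# M [2018, 2019, 2020] [1, -1, 2] [0.25, -0.2, 0.5]
# F [2018, 2019, 2020] [-3, 3, 3] [-0.5, 1.0, 0.5]
```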
function ndrange has been deprecated in favor of sequence or ndtest. Also, an Axis or a list/tuple/collection of axes can be passed to the ndtest function (closes issue 534):
>>> ndtest("nat=BE,FO;sex=M,F")
nat\sex  M  F
     BE  0  1
     FO  2  3
allowed to pass a group for argument axis of stack function (closes issue 535):
>>> b = Axis('b=b0..b2')
>>> stack(b0=ndtest(2), b1=ndtest(2), axis=b[:'b1'])
a\b  b0  b1
 a0   0   0
 a1   1   1
renamed argument nb_index of read_csv, read_excel, read_sas, from_lists and from_string functions as nb_axes. The relation between nb_index and nb_axes is given by nb_axes = nb_index + 1:
For a given file ‘arr.csv’ with content
a,b\c,c0,c1
a0,b0,0,1
a0,b1,2,3
a1,b0,4,5
a1,b1,6,7
previous code to read this array such as:
>>> # deprecated
>>> arr = read_csv('arr.csv', nb_index=2)
must be updated as follows:
>>> arr = read_csv('arr.csv', nb_axes=3)
Closes issue 548.
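The relation nb_axes = nb_index + 1 follows from the file layout shown above: the last index column carries two axis names separated by a backslash. The standard-library sketch below counts both quantities from the header line. It is only an illustration that assumes this header convention; it does not call read_csv, and the variable names are hypothetical.

```python
import csv
import io

# same content as the 'arr.csv' file shown above
content = r"""a,b\c,c0,c1
a0,b0,0,1
a0,b1,2,3
a1,b0,4,5
a1,b1,6,7
"""

header = next(csv.reader(io.StringIO(content)))

# index columns come first; the last one is the header cell containing "\"
nb_index = next(i for i, cell in enumerate(header) if '\\' in cell) + 1
# that last cell names two axes (rows and columns), hence one more axis than index columns
nb_axes = nb_index + 1
axes_names = header[:nb_index - 1] + header[nb_index - 1].split('\\')

print(nb_index, nb_axes, axes_names)  # 2 3 ['a', 'b', 'c']
```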
deprecated nan_equal function in favor of element_equal function. The element_equal function has the same optional arguments as the LArray.equals method but compares two arrays element-wise and returns an array of booleans:
>>> arr1 = LArray([6., np.nan, 8.], "a=a0..a2") >>> arr1 a a0 a1 a2 6.0 nan 8.0 >>> arr2 = LArray([5.999, np.nan, 8.001], "a=a0..a2") >>> arr2 a a0 a1 a2 5.999 nan 8.001 >>> element_equal(arr1, arr2) a a0 a1 a2 False False False >>> element_equal(arr1, arr2, nan_equals=True) a a0 a1 a2 False True False >>> element_equal(arr1, arr2, atol=0.01, nan_equals=True) a a0 a1 a2 True True True >>> element_equal(arr1, arr2, rtol=0.01, nan_equals=True) a a0 a1 a2 True True True
Closes issue 593.
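The effect of rtol, atol and nan_equals can be illustrated with a small pure-Python sketch. The function `element_equal_sketch` is hypothetical and works on plain lists; the tolerance rule it uses (`abs(x - y) <= atol + rtol * abs(y)`, as in `numpy.isclose`) is an assumption, not something stated in this change log.

```python
import math


def element_equal_sketch(left, right, rtol=0.0, atol=0.0, nan_equals=False):
    """Compare two sequences of floats element-wise and return a list of booleans."""
    result = []
    for x, y in zip(left, right):
        if math.isnan(x) or math.isnan(y):
            # NaN is only considered equal to NaN, and only when nan_equals is True
            result.append(nan_equals and math.isnan(x) and math.isnan(y))
        else:
            # assumed tolerance rule (the one used by numpy.isclose)
            result.append(abs(x - y) <= atol + rtol * abs(y))
    return result


nan = float('nan')
values1 = [6.0, nan, 8.0]
values2 = [5.999, nan, 8.001]
print(element_equal_sketch(values1, values2))                              # [False, False, False]
print(element_equal_sketch(values1, values2, nan_equals=True))             # [False, True, False]
print(element_equal_sketch(values1, values2, atol=0.01, nan_equals=True))  # [True, True, True]
print(element_equal_sketch(values1, values2, rtol=0.01, nan_equals=True))  # [True, True, True]
```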
renamed argument transpose by wide in to_csv method.
added argument wide in to_excel method. When argument wide is set to False, the array is exported in “narrow” format, i.e. one column per axis plus one value column:
>>> arr = ndtest((2, 3)) >>> arr a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
Default behavior (wide=True):
>>> arr.to_excel('my_file.xlsx') a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
With wide=False:
>>> arr.to_excel('my_file.xlsx', wide=False) a b value a0 b0 0 a0 b1 1 a0 b2 2 a1 b0 3 a1 b1 4 a1 b2 5
Argument transpose has a different purpose than wide and is mainly useful to allow multiple axes as header when exporting arrays with more than 2 dimensions. Closes issue 575 and issue 371.
added argument wide to read_csv and read_excel functions. If False, the array to be loaded is assumed to be stored in “narrow” format:
>>> # assuming the array was saved using command: arr.to_excel('my_file.xlsx', wide=False) >>> read_excel('my_file.xlsx', wide=False) a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
Closes issue 574.
added argument name to to_series method allowing to set a name to the Pandas Series returned by the method.
added argument value_name to to_csv and to_excel allowing to change the default name (‘value’) of the column containing the values when the argument wide is set to False:
>>> arr.to_csv('my_file.csv', wide=False, value_name='data') a,b,data a0,b0,0 a0,b1,1 a0,b2,2 a1,b0,3 a1,b1,4 a1,b2,5
Closes issue 549.
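As a rough illustration of what the “narrow” layout contains, the following pure-Python sketch converts a small 2D table into narrow rows. The helper `wide_to_narrow` is hypothetical and only mimics the layout; it is not how larray writes files.

```python
def wide_to_narrow(axes_names, row_labels, column_labels, data, value_name='value'):
    """Return the rows of the "narrow" layout: one column per axis plus one value column."""
    rows = [list(axes_names) + [value_name]]
    for row_label, line in zip(row_labels, data):
        for column_label, value in zip(column_labels, line):
            rows.append([row_label, column_label, value])
    return rows


narrow = wide_to_narrow(('a', 'b'), ['a0', 'a1'], ['b0', 'b1', 'b2'],
                        [[0, 1, 2], [3, 4, 5]], value_name='data')
for row in narrow:
    print(','.join(str(cell) for cell in row))
```

Each cell of the wide table becomes one row, so a 2 x 3 array yields six data rows after the header.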
renamed argument sheetname of read_excel function as sheet (closes issue 587).
renamed argument sheet_name of LArray.to_excel to sheet since it can also be an index (closes issue 580).
allowed to create axes with zero padded string labels (closes issue 533):
>>> Axis('zero_padding=01,02,03,10,11,12') Axis(['01', '02', '03', '10', '11', '12'], 'zero_padding')
added a dropdown menu containing recently used files in dialog boxes of Save Command History To Script and Load from Script from File menu.
Fixes¶
fixed passing a scalar group from an external axis to get a subset of an array (closes issue 178):
>>> arr = ndtest((3, 2)) >>> arr['a1'] b b0 b1 2 3 >>> alt_a = Axis("alt_a=a1..a2") >>> arr[alt_a['a1']] b b0 b1 2 3 >>> arr[alt_a.i[0]] b b0 b1 2 3
fixed subscripting a string LGroup key (closes issue 437):
>>> axis = Axis("a=a0,a1") >>> axis['a0'][0] 'a'
fixed Axis.union, Axis.intersection and Axis.difference when passed value is a single string (closes issue 489):
>>> a = Axis('a=a0..a2') >>> a.union('a1') Axis(['a0', 'a1', 'a2'], 'a') >>> a.union('a3') Axis(['a0', 'a1', 'a2', 'a3'], 'a') >>> a.union('a1..a3') Axis(['a0', 'a1', 'a2', 'a3'], 'a') >>> a.intersection('a1..a3') Axis(['a1', 'a2'], 'a') >>> a.difference('a1..a3') Axis(['a0'], 'a')
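The behavior shown above can be approximated on plain lists of labels. This sketch is an assumption-laden illustration: `expand_labels` is a hypothetical helper that only understands a single label or a simple `prefixN..prefixM` range (no zero padding), which is far less than what larray's string syntax supports.

```python
import re


def expand_labels(spec):
    """Expand 'a1..a3' into ['a1', 'a2', 'a3']; any other string is a single label."""
    match = re.fullmatch(r'([A-Za-z_]*)(\d+)\.\.([A-Za-z_]*)(\d+)', spec)
    if match is None or match.group(1) != match.group(3):
        return [spec]
    prefix, start, stop = match.group(1), int(match.group(2)), int(match.group(4))
    step = 1 if stop >= start else -1
    return [f'{prefix}{number}' for number in range(start, stop + step, step)]


def union(labels, spec):
    """Keep existing labels and append the new ones, preserving order."""
    return labels + [label for label in expand_labels(spec) if label not in labels]


def intersection(labels, spec):
    other = expand_labels(spec)
    return [label for label in labels if label in other]


def difference(labels, spec):
    other = expand_labels(spec)
    return [label for label in labels if label not in other]


a = ['a0', 'a1', 'a2']
print(union(a, 'a1'))             # ['a0', 'a1', 'a2']
print(union(a, 'a3'))             # ['a0', 'a1', 'a2', 'a3']
print(union(a, 'a1..a3'))         # ['a0', 'a1', 'a2', 'a3']
print(intersection(a, 'a1..a3'))  # ['a1', 'a2']
print(difference(a, 'a1..a3'))    # ['a0']
```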
fixed to_excel applied on >= 2D arrays using transpose=True (closes issue 579):
>>> arr = ndtest((2, 3)) >>> arr.to_excel('my_file.xlsx', transpose=True) b\a a0 a1 b0 0 3 b1 1 4 b2 2 5
fixed aggregation on arrays containing zero padded string labels (closes issue 522):
>>> arr = ndtest('zero_padding=01,02,03,10,11,12') >>> arr zero_padding 01 02 03 10 11 12 0 1 2 3 4 5 >>> arr.sum('01,02,03 >> 01_03; 10') zero_padding 01_03 10 3 3
Version 0.27¶
Released on 2017-11-30.
Syntax changes¶
Backward incompatible changes¶
labels are checked during array subset assignment (closes issue 269):
>>> arr = ndtest(4) >>> arr a a0 a1 a2 a3 0 1 2 3 >>> arr['a0,a1'] = arr['a2,a3'] ValueError: incompatible axes: Axis(['a0', 'a1'], 'a') vs Axis(['a2', 'a3'], 'a')
previous behavior can be recovered through drop_labels or by changing labels via set_labels or set_axes:
>>> arr['a0,a1'] = arr['a2,a3'].drop_labels('a') >>> arr['a0,a1'] = arr['a2,a3'].set_labels('a', {'a2': 'a0', 'a3': 'a1'})
from_frame parse_header argument defaults to False instead of True.
New features¶
implemented Axis.insert and LArray.insert to add values at a given position of an axis (closes issue 54).
>>> arr1 = ndtest((2, 3))
>>> arr1
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
>>> arr1.insert(42, before='b1', label='b0.5')
a\b  b0  b0.5  b1  b2
 a0   0    42   1   2
 a1   3    42   4   5
insert an array
>>> arr2 = ndtest(2)
>>> arr2
a  a0  a1
    0   1
>>> arr1.insert(arr2, after='b0', label='b0.5')
a\b  b0  b0.5  b1  b2
 a0   0     0   1   2
 a1   3     1   4   5
insert an array which already has the axis
>>> arr3 = ndrange('a=a0,a1;b=b0.1,b0.2') + 42
>>> arr3
a\b  b0.1  b0.2
 a0    42    43
 a1    44    45
>>> arr1.insert(arr3, before='b1')
a\b  b0  b0.1  b0.2  b1  b2
 a0   0    42    43   1   2
 a1   3    44    45   4   5
added new items in the Help menu of the editor:
Report Issue…: to report an issue on the Github project website.
Users Discussion…: redirect to the LArray Users Google Group (you need to be registered to participate).
New Releases And Announces Mailing List…: redirect to the LArray Announce mailing list.
About: give information about the editor and the versions of packages currently installed on your computer (closes issue 88).
added Save Command History To Script in the File menu of the editor allowing to save executed commands in a new or existing Python file.
added possibility to show only rows with differences when comparing arrays or sessions through the compare function in the editor (closes issue 102).
added ascending argument to methods indicesofsorted and labelsofsorted. Values are sorted in ascending order by default. Set to False to sort values in descending order:
>>> arr = LArray([[1, 5], [3, 2], [0, 4]], "nat=BE,FR,IT; sex=M,F") >>> arr nat\sex M F BE 1 5 FR 3 2 IT 0 4 >>> arr.indicesofsorted("nat", ascending=False) nat\sex M F 0 1 0 1 0 2 2 2 1 >>> arr.labelsofsorted("nat", ascending=False) nat\sex M F 0 FR BE 1 BE IT 2 IT FR
Closes issue 490.
Miscellaneous improvements¶
allowed to sort values of an array along an axis (closes issue 225):
>>> a = LArray([[10, 2, 4], [3, 7, 1]], "sex=M,F; nat=EU,FO,BE") >>> a sex\nat EU FO BE M 10 2 4 F 3 7 1 >>> a.sort_values(axis='sex') sex*\nat EU FO BE 0 3 2 1 1 10 7 4 >>> a.sort_values(axis='nat') sex\nat* 0 1 2 M 2 4 10 F 1 3 7
method LArray.sort_values can be called without argument (closes issue 478):
>>> arr = LArray([0, 1, 6, 3, -1], "a=a0..a4") >>> arr a a0 a1 a2 a3 a4 0 1 6 3 -1 >>> arr.sort_values() a a4 a0 a1 a3 a2 -1 0 1 3 6
If the array has more than one dimension, axes are combined together:
>>> a = LArray([[10, 2, 4], [3, 7, 1]], "sex=M,F; nat=EU,FO,BE") >>> a sex\nat EU FO BE M 10 2 4 F 3 7 1 >>> a.sort_values() sex_nat F_BE M_FO F_EU M_BE F_FO M_EU 1 2 3 4 7 10
when appending/prepending/extending an array, both the original array and the added values will be converted to a data type which can hold both without loss of information. It used to convert the added values to the type of the original array. For example, given an array of integers like:
>>> arr = ndtest(3)
>>> arr
a  a0  a1  a2
    0   1   2
Trying to add a floating point number to that array used to result in:
>>> arr.append('a', 2.5, 'a3') a a0 a1 a2 a3 0 1 2 2
Now it will result in:
>>> arr.append('a', 2.5, 'a3') a a0 a1 a2 a3 0.0 1.0 2.0 2.5
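The difference between the old and new behavior can be mimicked with plain Python lists. This is a simplified sketch: the promotion ladder `bool < int < float < complex` and the helper names are assumptions made for illustration, whereas larray relies on NumPy data types.

```python
_PROMOTION_ORDER = [bool, int, float, complex]


def common_type(first, second):
    """Return the smallest type of the ladder able to hold values of both types."""
    if first in _PROMOTION_ORDER and second in _PROMOTION_ORDER:
        return _PROMOTION_ORDER[max(_PROMOTION_ORDER.index(first),
                                    _PROMOTION_ORDER.index(second))]
    return object


def append_old(values, new_value):
    """Old behavior: the added value is cast to the type of the existing values."""
    return values + [type(values[0])(new_value)]


def append_new(values, new_value):
    """New behavior: both sides are converted to a common type first."""
    target = common_type(type(values[0]), type(new_value))
    if target is object:
        return values + [new_value]
    return [target(value) for value in values] + [target(new_value)]


print(append_old([0, 1, 2], 2.5))  # [0, 1, 2, 2]
print(append_new([0, 1, 2], 2.5))  # [0.0, 1.0, 2.0, 2.5]
```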
made the editor more responsive when switching to or changing the filter of large arrays (closes issue 93).
added support for coloring numeric values for object arrays (e.g. arrays containing both strings and numbers).
documentation links in the Help menu of the editor point to the version of the documentation corresponding to the installed version of larray (closes issue 105).
Fixes¶
fixed array values being editable in view() (instead of only in edit()).
Version 0.26.1¶
Released on 2017-10-25.
Miscellaneous improvements¶
Made handling Excel sheets with many blank columns/rows after the data much faster (but still slower than sheets without such blank cells).
Fixes¶
fixed reading from and writing to Excel sheets with 16384 columns or 1048576 rows (Excel’s maximum).
fixed LArray.split_axes using a custom separator and not using sort=True or when the split labels are ambiguous with labels from other axes (closes issue 485).
fixed reading 1D arrays with non-string labels (closes issue 495).
fixed read_csv(sort_columns=True) for 1D arrays (closes issue 497).
Version 0.26¶
Released on 2017-10-13.
Syntax changes¶
renamed special variable x to X to let users define an x variable in their code without breaking all subsequent code using that special variable (closes issue 167).
renamed Axis.startswith, endswith and matches to startingwith, endingwith and matching to avoid a possible confusion with str.startswith and endswith which return booleans (closes issue 432).
renamed na argument of read_csv, read_excel, read_hdf and read_sas functions to fill_value to avoid confusion as to what the argument does and to be consistent with reindex and align (closes issue 394).
renamed split_axis to split_axes to reflect the fact that it can now split several axes at once (see below).
renamed sort_axis to sort_axes to reflect the fact that it can sort multiple axes at once (and does so by default).
renamed several methods with more explicit names (closes issue 50):
argmax, argmin, argsort to labelofmax, labelofmin, labelsofsorted
posargmax, posargmin, posargsort to indexofmax, indexofmin, indicesofsorted
renamed PGroup to IGroup to be consistent with other methods, especially the .i methods on axes and arrays (I is for Index – P was for Position).
Backward incompatible changes¶
getting a subset using a boolean selection returns an array with labels combined with underscore by default (for consistency with split_axes and combine_axes). Closes issue 376:
>>> arr = ndtest((2, 2)) >>> arr a\b b0 b1 a0 0 1 a1 2 3 >>> arr[arr < 3] a_b a0_b0 a0_b1 a1_b0 0 1 2
New features¶
added global_arrays() and arrays() functions to complement the local_arrays() function. They return a Session containing respectively all arrays defined in global variables and all available arrays (whether they are defined in local or global variables).
When used outside of a function, these three functions should have the same results, but inside a function local_arrays() will return only arrays local to the function, global_arrays() will return only arrays defined globally and arrays() will return arrays defined either locally or globally. Closes issue 416.
a * symbol is appended to the window title when unsaved changes are detected in the viewer (closes issue 21).
implemented Axis.containing to create a Group with all labels of an axis containing some substring (closes issue 402).
>>> people = Axis(['Bruce Wayne', 'Bruce Willis', 'Arthur Dent'], 'people') >>> people.containing('Will') people['Bruce Willis']
implemented Group.containing, startingwith, endingwith and matching to create a group with all labels of a group matching some criterion (closes issue 108).
>>> group = people.startingwith('Bru') >>> group people['Bruce Wayne', 'Bruce Willis'] >>> group.containing('Will') people['Bruce Willis']
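For readers unfamiliar with these selectors, here is a sketch of the same filtering on a plain list of labels. The function names are hypothetical stand-ins for the Axis and Group methods, and treating the `matching` pattern as a regular expression applied from the start of the label is an assumption.

```python
import re


def labels_containing(labels, substring):
    return [label for label in labels if substring in label]


def labels_startingwith(labels, prefix):
    return [label for label in labels if label.startswith(prefix)]


def labels_endingwith(labels, suffix):
    return [label for label in labels if label.endswith(suffix)]


def labels_matching(labels, pattern):
    # assumption: pattern is a regular expression tested from the start of each label
    return [label for label in labels if re.match(pattern, label)]


people = ['Bruce Wayne', 'Bruce Willis', 'Arthur Dent']
group = labels_startingwith(people, 'Bru')
print(group)                                # ['Bruce Wayne', 'Bruce Willis']
print(labels_containing(group, 'Will'))     # ['Bruce Willis']
print(labels_endingwith(people, 'Dent'))    # ['Arthur Dent']
print(labels_matching(people, r'^A.*t$'))   # ['Arthur Dent']
```

Unlike `str.startswith`, which returns a boolean, each selector returns the subset of labels satisfying the criterion, which is the reason given above for the renaming.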
implemented nan_equal() function to create an array of booleans telling whether each cell of the first array is equal to the corresponding cell in the other array, even in the presence of NaN.
>>> arr1 = ndtest(3, dtype=float) >>> arr1['a1'] = nan >>> arr1 a a0 a1 a2 0.0 nan 2.0 >>> arr2 = arr1.copy() >>> arr1 == arr2 a a0 a1 a2 True False True >>> nan_equal(arr1, arr2) a a0 a1 a2 True True True
implemented from_frame() to convert a Pandas DataFrame to an array:
>>> df = ndtest((2, 2, 2)).to_frame()
>>> df
c      c0  c1
a  b
a0 b0   0   1
   b1   2   3
a1 b0   4   5
   b1   6   7
>>> from_frame(df)
 a  b\c  c0  c1
a0   b0   0   1
a0   b1   2   3
a1   b0   4   5
a1   b1   6   7
implemented Axis.split to split an axis into several.
>>> a_b = Axis('a_b=a0_b0,a0_b1,a0_b2,a1_b0,a1_b1,a1_b2') >>> a_b.split() [Axis(['a0', 'a1'], 'a'), Axis(['b0', 'b1', 'b2'], 'b')]
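The idea behind splitting a combined axis can be sketched on plain strings. The helpers `split_name` and `split_labels` are hypothetical, and keeping labels in first-seen order is an assumption made for this illustration.

```python
def split_name(name, sep='_'):
    """Split a combined axis name such as 'a_b' into its parts."""
    return name.split(sep)


def split_labels(combined_labels, sep='_'):
    """Return one list of unique labels per part, in order of first appearance."""
    parts = [label.split(sep) for label in combined_labels]
    result = []
    for column in zip(*parts):
        unique = []
        for label in column:
            if label not in unique:
                unique.append(label)
        result.append(unique)
    return result


labels = ['a0_b0', 'a0_b1', 'a0_b2', 'a1_b0', 'a1_b1', 'a1_b2']
print(split_name('a_b'))     # ['a', 'b']
print(split_labels(labels))  # [['a0', 'a1'], ['b0', 'b1', 'b2']]
```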
added the possibility to load the example dataset used in the tutorial via the menu File > Load Example in the viewer.
Miscellaneous improvements¶
view() and edit() without argument now display global arrays in addition to local ones (closes issue 54).
using the mouse scrollwheel on filter combo boxes will switch to the previous/next label.
implemented a combobox to choose which color gradient to use and provide a few gradients.
inverted background colors in the viewer (red for low values and blue for high values). Closes issue 18.
allowed to pass an array of labels as new_axis argument to reindex method (closes issue 384):
>>> arr = ndrange('a=v0..v1;b=v0..v2') >>> arr a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 >>> arr.reindex('a', arr.b.labels) a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 v2 nan nan nan
allowed to call the reindex method using a differently named axis for labels (closes issue 386):
>>> arr = ndrange('a=v0..v1;b=v0..v2') >>> arr a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 >>> arr.reindex('a', arr.b) a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 v2 nan nan nan
arguments fill_value, sort_rows and sort_columns of read_excel function are also supported by the default xlwings engine (closes issue 393).
allowed to pass a label or group as sheet_name argument of the method to_excel or to a Workbook (open_excel). Same for key argument of the method to_hdf. Closes issue 328.
>>> arr = ndtest((4, 4, 4))
>>> # iterate over labels of a given axis
>>> with open_excel('my_file.xlsx') as wb:
...     for label in arr.a:
...         wb[label] = arr[label].dump()
...     wb.save()
>>> for label in arr.a:
...     arr[label].to_hdf('my_file.h5', label)
>>> # create and use a group >>> even = arr.a['a0,a2'] >> 'even' >>> arr[even].to_excel('my_file.xlsx', even) >>> arr[even].to_hdf('my_file.h5', even)
>>> # special characters : \ / ? * [ or ] in labels or groups are replaced by an _ when exporting to excel >>> # sheet names cannot exceed 31 characters >>> g = arr.a['a1,a3,a4'] >> '?name:with*special\/[char]' >>> arr[g].to_excel('my_file.xlsx', g) >>> print(open_excel('my_file.xlsx').sheet_names()) ['_name_with_special___char_'] >>> # special characters \ or / in labels or groups are replaced by an _ when exporting to HDF file
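The replacement rule described in the comments above can be sketched as follows. The function names are hypothetical; the set of replaced characters is taken from the comment and from the example output (where `:` is also replaced), and truncating names longer than 31 characters is an assumption, since the text only says such names are not allowed.

```python
import re

MAX_SHEET_NAME_LENGTH = 31


def sanitize_sheet_name(name):
    """Replace characters that are not allowed in Excel sheet names by '_'."""
    cleaned = re.sub(r'[\\/?*\[\]:]', '_', name)
    # assumption: names which are too long are simply truncated
    return cleaned[:MAX_SHEET_NAME_LENGTH]


def sanitize_hdf_key(name):
    """Replace backslashes and slashes by '_' for use as an HDF key."""
    return re.sub(r'[\\/]', '_', name)


print(sanitize_sheet_name('?name:with*special\\/[char]'))  # _name_with_special___char_
print(sanitize_hdf_key('with/slash\\and backslash'))       # with_slash_and backslash
```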
allowed to pass a Group to read_excel/read_hdf as sheetname/key argument (closes issue 439).
>>> a, b, c = arr.a, arr.b, arr.c
>>> # For Excel >>> new_from_excel = zeros((a, b, c), dtype=int) >>> for label in a: ... new_from_excel[label] = read_excel('my_file.xlsx', label) >>> # But, to avoid loading the file in Excel repeatedly (which is very inefficient), >>> # this particular example should rather be written like this: >>> new_from_excel = zeros((a, b, c), dtype=int) >>> with open_excel('my_file.xlsx') as wb: ... for label in a: ... new_from_excel[label] = wb[label].load()
>>> # For HDF >>> new_from_hdf = zeros((a, b, c), dtype=int) >>> for label in a: ... new_from_hdf[label] = read_hdf('my_file.h5', label)
allowed setting the name of a Group using another Group or Axis (closes issue 341):
>>> arr = ndrange('axis=a,a0..a3,b,b0..b3,c,c0..c3')
>>> arr
axis  a  a0  a1  a2  a3  b  b0  b1  b2  b3   c  c0  c1  c2  c3
      0   1   2   3   4  5   6   7   8   9  10  11  12  13  14
>>> # matching('^.$') will select labels with only one character: 'a', 'b' and 'c'
>>> groups = tuple(arr.axis.startingwith(code) >> code for code in arr.axis.matching('^.$'))
>>> groups
(axis['a', 'a0', 'a1', 'a2', 'a3'] >> 'a', axis['b', 'b0', 'b1', 'b2', 'b3'] >> 'b', axis['c', 'c0', 'c1', 'c2', 'c3'] >> 'c')
>>> arr.sum(groups)
axis   a   b   c
      10  35  60
allowed to test if an array contains a label using the in operator (closes issue 343):
>>> arr = ndrange('age=0..99;sex=M,F') >>> 'M' in arr True >>> 'Male' in arr False >>> # this can be useful for example in an 'if' statement >>> if 102 not in arr: ... # with 'reindex', we extend 'age' axis to 102 ... arr = arr.reindex('age', Axis('age=0..102'), fill_value=0) >>> arr.info 103 x 2 age [103]: 0 1 2 ... 100 101 102 sex [2]: 'M' 'F'
allowed to create a group on an axis using labels of another axis (closes issue 362):
>>> year = Axis('year=2000..2017') >>> even_year = Axis(range(2000, 2017, 2), 'even_year') >>> group_even_year = year[even_year] >>> group_even_year year[2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016]
split_axes (formerly split_axis) now allows to split several axes at once (closes issue 366):
>>> combined = ndrange('a_b = a0_b0..a1_b1; c_d = c0_d0..c1_d1') >>> combined a_b\c_d c0_d0 c0_d1 c1_d0 c1_d1 a0_b0 0 1 2 3 a0_b1 4 5 6 7 a1_b0 8 9 10 11 a1_b1 12 13 14 15 >>> combined.split_axes(['a_b', 'c_d']) a b c\d d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15 >>> combined.split_axes({'a_b': ('A', 'B'), 'c_d': ('C', 'D')}) A B C\D d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15
argument axes of split_axes has become optional: defaults to all axes whose name contains the specified delimiter (closes issue 365):
>>> combined = ndrange('a_b = a0_b0..a1_b1; c_d = c0_d0..c1_d1') >>> combined a_b\c_d c0_d0 c0_d1 c1_d0 c1_d1 a0_b0 0 1 2 3 a0_b1 4 5 6 7 a1_b0 8 9 10 11 a1_b1 12 13 14 15 >>> combined.split_axes() a b c\d d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15
allowed to perform several axes combinations at once with the combine_axes() method (closes issue 382):
>>> arr = ndtest((2, 2, 2, 2)) >>> arr a b c\d d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15 >>> arr.combine_axes([('a', 'c'), ('b', 'd')]) a_c\b_d b0_d0 b0_d1 b1_d0 b1_d1 a0_c0 0 1 4 5 a0_c1 2 3 6 7 a1_c0 8 9 12 13 a1_c1 10 11 14 15 >>> # set output axes names by passing a dictionary >>> arr.combine_axes({('a', 'c'): 'ac', ('b', 'd'): 'bd'}) ac\bd b0_d0 b0_d1 b1_d0 b1_d1 a0_c0 0 1 4 5 a0_c1 2 3 6 7 a1_c0 8 9 12 13 a1_c1 10 11 14 15
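The labels of a combined axis, as displayed above, are the Cartesian product of the labels of the original axes joined by an underscore. A minimal sketch on plain lists (the helper names are hypothetical):

```python
from itertools import product


def combine_name(*names, sep='_'):
    """Build the default name of the combined axis."""
    return sep.join(names)


def combine_labels(*axes_labels, sep='_'):
    """Build the labels of the combined axis, with the last axis varying fastest."""
    return [sep.join(combo) for combo in product(*axes_labels)]


print(combine_name('a', 'c'))                      # a_c
print(combine_labels(['a0', 'a1'], ['c0', 'c1']))  # ['a0_c0', 'a0_c1', 'a1_c0', 'a1_c1']
```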
allowed to use keyword arguments in set_labels (closes issue 383):
>>> a = ndrange('nat=BE,FO;sex=M,F') >>> a nat\sex M F BE 0 1 FO 2 3 >>> a.set_labels(sex='Men,Women', nat='Belgian,Foreigner') nat\sex Men Women Belgian 0 1 Foreigner 2 3
allowed passing an axis to set_labels as ‘labels’ argument (closes issue 408).
added data type (dtype) to array.info (closes issue 454):
>>> arr = ndtest((2, 2), dtype=float) >>> arr a\b b0 b1 a0 0.0 1.0 a1 2.0 3.0 >>> arr.info 2 x 2 a [2]: 'a0' 'a1' b [2]: 'b0' 'b1' dtype: float64
To create a 1D array using from_string() and the default separator " ", a tabulation character \t (instead of - previously) must be added in front of the data line:
>>> from_string('''sex M F
...                \t 0 1''')
sex  M  F
     0  1
viewer window title also includes the dtype of the current displayed array (closes issue 85)
viewer window title uses only the file name instead of the entire file path as it made titles too long in some cases.
when editing .csv files, the viewer window title will be “directory\fname.csv - axes_info” instead of having the file name repeated as before (“dir\fname.csv - fname: axes_info”).
the viewer will not update digits/scientific notation nor colors when the filter changes, so that numbers are more easily comparable when quickly changing the filter, especially using the scrollwheel on filter boxes.
NaN values display as grey in the viewer so that they stand out more.
compare() will color values depending on relative difference instead of absolute difference as this is usually more useful.
compare(sessions) uses nan_equal to compare arrays so that identical arrays are not marked different when they contain NaN values.
changed compare() “stacked axis” names: arrays -> array and sessions -> session because that reads a bit more naturally.
Fixes¶
fixed array creation with axis(es) given as string containing only one label (axis name and label were inverted).
fixed reading an array from a CSV or Excel file when the columns axis is not explicitly named (via \). For example, let’s say we want to read a CSV file ‘pop.csv’ with the following content (indented for clarity):
sex, 2015, 2016
  F,   11,   13
  M,   12,   10
The result of function read_csv is:
>>> pop = read_csv('pop.csv') >>> pop sex\{1} 2015 2016 F 11 13 M 12 10
Closes issue 372.
fixed converting a 1xN Pandas DataFrame to an array using aslarray (closes issue 427):
>>> df = pd.DataFrame([[1, 2, 3]], index=['a0'], columns=['b0', 'b1', 'b2']) >>> df b0 b1 b2 a0 1 2 3 >>> aslarray(df) {0}\{1} b0 b1 b2 a0 1 2 3
>>> # setting name to index and columns >>> df.index.name = 'a' >>> df.columns.name = 'b' >>> df b b0 b1 b2 a a0 1 2 3 >>> aslarray(df) a\b b0 b1 b2 a0 1 2 3
fixed original file being deleted when trying to overwrite a file via Session.save or open_excel failed (closes issue 441)
fixed loading arrays from Excel sheets containing blank cells below or right of the array to read (closes issue 443)
fixed unary and binary operations between sessions failing entirely when the operation failed/was invalid on any array. Now the result will be nan for that array but the operation will carry on for other arrays.
fixed stacking sessions failing entirely when the stacking failed on any array. Now the result will be nan for that array but the operation will carry on for other arrays.
fixed stacking arrays with anonymous axes.
fixed applying split_axes on an array with labels of type ‘Object’ (could happen when an array is read from a file).
fixed background color in the viewer when using filters in the compare() dialog (closes issue 66)
fixed autoresize of columns by double clicking between column headers (closes issue 43)
fixed representing a 0D array (scalar) in the viewer (closes issue 71)
fixed viewer not displaying an error message when saving or loading a file failed (closes issue 75)
fixed array.split_axis when the combined axis does not contain all the combination of labels resulting from the split (closes issue 369).
fixed array.split_axis when combined labels are not sorted by the first part then second part (closes issue 364).
fixed the names of variables created when opening .csv files in the editor: they are now named using only the filename without extension (instead of the full path of the file, which made them almost useless). Closes issue 90.
fixed deleting a variable (using the del key in the list) not marking the session/file as being modified.
fixed the link to the tutorial (Help->Online Tutorial) (closes issue 92).
fixed inplace modifications of arrays in the console (via array[xxx] = value) not updating the view (closes issue 94).
fixed background color in compare() being wrong after changing axes order by drag-and-dropping them (closes issue 89).
fixed the whole array/compare being the same color in the presence of -inf or +inf in the array.
Version 0.25.2¶
Released on 2017-09-06.
Miscellaneous improvements¶
Excel Workbooks opened with open_excel(visible=False) will use the global Excel instance by default and those using visible=True will use a new Excel instance by default (closes issue 405).
Fixes¶
fixed view() which did not show any array (closes issue 57).
fixed exceptions in the viewer crashing it when a Qt app was created (e.g. from a plot) before the viewer was started (closes issue 58).
fixed compare() arrays names not being determined correctly (closes issue 61).
fixed filters and title not being updated when displaying array created via the console (closes issue 55).
fixed array grid not being updated when selecting a variable when no variable was selected (closes issue 56).
fixed copying or plotting multiple rows in the editor when they were selected via drag and drop on headers (closes issue 59).
fixed digits not being automatically updated when changing filters.
Version 0.25.1¶
Released on 2017-09-04.
Miscellaneous improvements¶
Deprecated methods display a warning message when they are still used (replaced DeprecationWarning by FutureWarning). Closes issue 310.
updated documentation of method with_total (closes issue 89).
trying to set values of a subset by passing an array with incompatible axes displays a better error message (closes issue 268).
Fixes¶
fixed error raised in viewer when switching between arrays when a filter was set.
fixed displaying empty array when starting the viewer or a new session in it.
fixed Excel instance created via to_excel() and open_excel() without any filename being closed at the end of the Python program (closes issue 390).
fixed the view(), edit() and compare() functions not being available in the viewer console.
fixed row and column resizing by double clicking on the edge of an header cell.
fixed New and Open in the menu File of the viewer when IPython console is not available.
fixed getting a subset of an array by mixing boolean filters and other filters (closes issue 246):
>>> arr = ndrange('a=a0..a2;b=0..3') >>> arr a\b 0 1 2 3 a0 0 1 2 3 a1 4 5 6 7 a2 8 9 10 11 >>> arr['a0,a2', x.b < 2] a\b 0 1 a0 0 1 a2 8 9
Warning: when mixed with other filters, boolean filters are limited to one dimension.
fixed setting an array values using array.points[key] = value when value is an LArray (closes issue 368).
fixed using syntax ‘int..int’ in a selection (closes issue 350):
>>> arr = ndrange('a=2017..2012') >>> arr a 2017 2016 2015 2014 2013 2012 0 1 2 3 4 5 >>> arr['2012..2015'] a 2012 2013 2014 2015 5 4 3 2
fixed mixing ‘..’ sequences and spaces in an indexing string (closes issue 389):
>>> arr = ndtest(7) >>> arr a a0 a1 a2 a3 a4 a5 a6 0 1 2 3 4 5 6 >>> arr['a0, a2, a4..a6'] a a0 a2 a4 a5 a6 0 2 4 5 6
fixed indexing/aggregating using groups with renaming (using >>) when the axis has mixed type labels (object dtype).
Version 0.25¶
Released on 2017-08-22.
New features¶
viewer functions (view, edit and compare) have been moved to the separate larray-editor package, which needs to be installed separately, unless you are using larrayenv. Closes issue 332.
installing larray-editor (or larrayenv) from conda environment creates a new menu ‘LArray’ in the Windows start menu. It contains a link to open the documentation, a shortcut to launch the user interface in edition mode and a shortcut to update larrayenv. Closes issue 281.
added possibility to transpose an array in the viewer by dragging and dropping axes’ names in the filter bar.
implemented array.align(other_array) which makes two arrays compatible with each other (by making all common axes compatible). This is done by adding, removing or reordering labels for each common axis according to the join method used:
outer: will use a label if it is in the axis of either array (ordered like the first array). This is the default as it results in no information loss.
inner: will use a label if it is in the axis of both arrays (ordered like the first array)
left: will use the first array axis labels
right: will use the other array axis labels
The fill value for missing labels defaults to nan.
>>> arr1 = ndtest((2, 3))
>>> arr1
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
>>> arr2 = -ndtest((3, 2))
>>> # reorder array to make the test more interesting
>>> arr2 = arr2[['b1', 'b0']]
>>> arr2
a\b  b1  b0
 a0  -1   0
 a1  -3  -2
 a2  -5  -4
Align arr1 and arr2
>>> aligned1, aligned2 = arr1.align(arr2) >>> aligned1 a\b b0 b1 b2 a0 0.0 1.0 2.0 a1 3.0 4.0 5.0 a2 nan nan nan >>> aligned2 a\b b0 b1 b2 a0 0.0 -1.0 nan a1 -2.0 -3.0 nan a2 -4.0 -5.0 nan
After aligning all common axes, one can then do operations between the two arrays
>>> aligned1 + aligned2 a\b b0 b1 b2 a0 0.0 0.0 nan a1 1.0 1.0 nan a2 nan nan nan
The fill value for missing labels defaults to nan but can be changed to any compatible value.
>>> aligned1, aligned2 = arr1.align(arr2, fill_value=0) >>> aligned1 a\b b0 b1 b2 a0 0 1 2 a1 3 4 5 a2 0 0 0 >>> aligned2 a\b b0 b1 b2 a0 0 -1 0 a1 -2 -3 0 a2 -4 -5 0 >>> aligned1 + aligned2 a\b b0 b1 b2 a0 0 0 2 a1 1 1 5 a2 -4 -5 0
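The four join methods can be illustrated on plain lists of labels. This is a sketch under stated assumptions: `align_labels` and `reindex_values` are hypothetical helpers, and for the outer join the labels found only in the second axis are assumed to be appended after those of the first.

```python
def align_labels(first, other, join='outer'):
    """Return the labels of the aligned axis for the given join method."""
    if join == 'outer':
        return first + [label for label in other if label not in first]
    if join == 'inner':
        return [label for label in first if label in other]
    if join == 'left':
        return list(first)
    if join == 'right':
        return list(other)
    raise ValueError("join must be 'outer', 'inner', 'left' or 'right'")


def reindex_values(labels, values, new_labels, fill_value=float('nan')):
    """Reorder values to follow new_labels, using fill_value for missing labels."""
    lookup = dict(zip(labels, values))
    return [lookup.get(label, fill_value) for label in new_labels]


b1 = ['b0', 'b1', 'b2']
b2 = ['b1', 'b0']
print(align_labels(b1, b2))                # ['b0', 'b1', 'b2']
print(align_labels(b1, b2, join='inner'))  # ['b0', 'b1']
print(align_labels(b1, b2, join='right'))  # ['b1', 'b0']
# row a0 of arr2 from the example above, aligned on the outer join of the b axes
print(reindex_values(b2, [-1, 0], align_labels(b1, b2), fill_value=0))  # [0, -1, 0]
```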
implemented Session.transpose(axes) to reorder axes of all arrays within a session, ignoring missing axes for each array. For example, let us first create a test session and a small helper function to display sessions as a short summary.
>>> arr1 = ndtest((2, 2, 2)) >>> arr2 = ndtest((2, 2)) >>> sess = Session([('arr1', arr1), ('arr2', arr2)]) >>> def print_summary(s): ... print(s.summary("{name} -> {axes_names}")) >>> print_summary(sess) arr1 -> a, b, c arr2 -> a, b
Put the ‘b’ axis in front of all arrays
>>> print_summary(sess.transpose('b')) arr1 -> b, a, c arr2 -> b, a
Axes missing on an array are ignored (‘c’ for arr2 in this case)
>>> print_summary(sess.transpose('c', 'b')) arr1 -> c, b, a arr2 -> b, a
Use ... to move axes to the end
>>> print_summary(sess.transpose(..., 'a')) arr1 -> b, c, a arr2 -> b, a
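The reordering rule (requested axes first, missing axes ignored, `...` standing for all remaining axes) can be sketched on lists of axis names. The function `transpose_names` is a hypothetical illustration, not the Session implementation, and it assumes `...` appears at most once.

```python
def transpose_names(axes, *requested):
    """Return the axis names of one array after a session-wide transpose."""
    # ignore requested axes which are not present in this array
    present = [name for name in requested if name is Ellipsis or name in axes]
    # without an explicit ..., remaining axes go to the end
    if not any(name is Ellipsis for name in present):
        present.append(Ellipsis)
    named = [name for name in present if name is not Ellipsis]
    remaining = [name for name in axes if name not in named]
    result = []
    for name in present:
        if name is Ellipsis:
            result.extend(remaining)
        else:
            result.append(name)
    return result


print(transpose_names(['a', 'b', 'c'], 'b'))       # ['b', 'a', 'c']
print(transpose_names(['a', 'b'], 'c', 'b'))       # ['b', 'a']
print(transpose_names(['a', 'b', 'c'], ..., 'a'))  # ['b', 'c', 'a']
```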
implemented unary operations on Session, which means one can negate all arrays in a Session or take the absolute value of all arrays in a Session without writing an explicit loop for that.
>>> arr1 = ndtest(2) >>> arr1 a a0 a1 0 1 >>> arr2 = ndtest(4) - 1 >>> arr2 a a0 a1 a2 a3 -1 0 1 2 >>> sess1 = Session([('arr1', arr1), ('arr2', arr2)]) >>> sess2 = -sess1 >>> sess2.arr1 a a0 a1 0 -1 >>> sess2.arr2 a a0 a1 a2 a3 1 0 -1 -2 >>> sess3 = abs(sess1) >>> sess3.arr2 a a0 a1 a2 a3 1 0 1 2
implemented stacking sessions using stack().
Let us first create two test sessions. For example suppose we have a session storing the results of a baseline simulation:
>>> arr1 = ndtest(2) >>> arr1 a a0 a1 0 1 >>> arr2 = ndtest(3) >>> arr2 a a0 a1 a2 0 1 2 >>> baseline = Session([('arr1', arr1), ('arr2', arr2)])
and another session with a variant
>>> arr1variant = arr1 * 2 >>> arr1variant a a0 a1 0 2 >>> arr2variant = 2 - arr2 / 2 >>> arr2variant a a0 a1 a2 2.0 1.5 1.0 >>> variant = Session([('arr1', arr1variant), ('arr2', arr2variant)])
then we stack them together
>>> stacked = stack([('baseline', baseline), ('variant', variant)], 'sessions') >>> stacked Session(arr1, arr2) >>> stacked.arr1 a\sessions baseline variant a0 0 0 a1 1 2 >>> stacked.arr2 a\sessions baseline variant a0 0.0 2.0 a1 1.0 1.5 a2 2.0 1.0
Combined with the fact that we can compute some very simple expressions on sessions, this can be extremely useful to quickly compare all arrays of several sessions (e.g. simulation variants):
>>> diff = variant - baseline >>> # compute the absolute difference and relative difference for each array of the sessions >>> stacked = stack([('baseline', baseline), ('variant', variant), ('diff', diff), ('abs diff', abs(diff)), ('rel diff', diff / baseline)], 'sessions') >>> stacked Session(arr1, arr2) >>> stacked.arr2 a\sessions baseline variant diff abs diff rel diff a0 0.0 2.0 2.0 2.0 inf a1 1.0 1.5 0.5 0.5 0.5 a2 2.0 1.0 -1.0 1.0 -0.5
implemented Axis.align(other_axis) and AxisCollection.align(other_collection) which makes two axes / axis collections compatible with each other, see LArray.align above.
implemented Session.apply(function) to apply a function to all elements (arrays) of a Session and return a new Session.
Let us first create a test session
>>> arr1 = ndtest(2) >>> arr1 a a0 a1 0 1 >>> arr2 = ndtest(3) >>> arr2 a a0 a1 a2 0 1 2 >>> sess1 = Session([('arr1', arr1), ('arr2', arr2)]) >>> sess1 Session(arr1, arr2)
Then define the function we want to apply to all arrays of our session
>>> def increment(element): ... return element + 1
Apply it
>>> sess2 = sess1.apply(increment) >>> sess2.arr1 a a0 a1 1 2 >>> sess2.arr2 a a0 a1 a2 1 2 3
implemented setting the value of multiple points using array.points[labels] = value
>>> arr = ndtest((3, 4)) >>> arr a\b b0 b1 b2 b3 a0 0 1 2 3 a1 4 5 6 7 a2 8 9 10 11
Now, suppose you want to retrieve several specific combinations of labels, for example (a0, b1), (a0, b3), (a1, b0) and (a2, b2). You could write a loop like this:
>>> values = [] >>> for a, b in [('a0', 'b1'), ('a0', 'b3'), ('a1', 'b0'), ('a2', 'b2')]: ... values.append(arr[a, b]) >>> values [1, 3, 4, 10]
but you could also (this already worked in previous versions) use array.points like:
>>> arr.points[['a0', 'a0', 'a1', 'a2'], ['b1', 'b3', 'b0', 'b2']] a,b a0,b1 a0,b3 a1,b0 a2,b2 1 3 4 10
which has the advantage of being both much faster and keeping more information. Now suppose you want to set the value of those points, you could write:
>>> for a, b in [('a0', 'b1'), ('a0', 'b3'), ('a1', 'b0'), ('a2', 'b2')]: ... arr[a, b] = 42 >>> arr a\b b0 b1 b2 b3 a0 0 42 2 42 a1 42 5 6 7 a2 8 9 42 11
but now you can also use the faster alternative:
>>> arr.points[['a0', 'a0', 'a1', 'a2'], ['b1', 'b3', 'b0', 'b2']] = 42
Miscellaneous improvements¶
added icon to display in Windows start menu and editor windows.
viewer keeps labels visible even when scrolling (label rows and columns are now frozen).
added ‘Getting Started’ section in documentation.
implemented axes argument to ipfp to specify on which axes the fitting procedure should be applied (closes issue 185). For example, let us assume you have a 3D array, such as:
>>> initial = ndrange('a=a0..a9;b=b0..b9;year=2000..2016')
and you want to apply a 2D fitting procedure for each value of the year axis. Previously, you had to loop on that year axis explicitly and call ipfp within the loop, like:
>>> result = zeros(initial.axes)
>>> for year in initial.year:
...     current = initial[year]
...     # assume you have some targets for each year
...     current_targets = [current.sum(x.a) + 1, current.sum(x.b) + 1]
...     result[year] = ipfp(current_targets, current)
Now you can apply the procedure to all years at once, by specifying the axes on which the fitting procedure should be done. This is a bit shorter to type and also much faster.
>>> all_targets = [initial.sum(x.a) + 1, initial.sum(x.b) + 1]
>>> result = ipfp(all_targets, initial, axes=(x.a, x.b))
made ipfp 10 to 20% faster (even without using the axes argument).
implemented Session.to_globals(inplace=True) which will update the content of existing arrays instead of creating new variables and overwriting them. This ensures the arrays have the same axes in the session as the existing variables.
added the ability to provide a pattern when loading several .csv files as a session. Among others, patterns can use * to match any number of characters and ? to match any single character.
>>> s = Session()
>>> # load all .csv files starting with "output" in the data directory
>>> s.load('data/output*.csv')
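The wildcard rules mentioned above can be tried out with Python's standard fnmatch module. This is only an illustration of how * and ? behave, assuming the session loader follows the usual shell-style rules. The file names are hypothetical and nothing here calls larray.

```python
from fnmatch import fnmatchcase

# hypothetical file names found in a data directory
filenames = ['output1.csv', 'output_2016.csv', 'input.csv', 'output.h5', 'out1.csv']

# '*' matches any number of characters (including none)
starting_with_output = [f for f in filenames if fnmatchcase(f, 'output*.csv')]

# '?' matches exactly one character
one_char_suffix = [f for f in filenames if fnmatchcase(f, 'output?.csv')]

print(starting_with_output)
print(one_char_suffix)
```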
stack can be used with keyword arguments when labels are “simple strings” (i.e. no integers, no punctuation, no string starting with integers, etc.). This is an attractive alternative but as it only works in the usual case and not in all cases, it is not recommended to use it except in the interactive console.
>>> arr1 = ones('nat=BE,FO')
>>> arr1
nat   BE   FO
     1.0  1.0
>>> arr2 = zeros('nat=BE,FO')
>>> arr2
nat   BE   FO
     0.0  0.0
>>> stack(M=arr1, F=arr2, axis='sex=M,F')
nat\sex    M    F
     BE  1.0  0.0
     FO  1.0  0.0
Without passing an explicit order for labels like above (or an axis object), it should only be used on Python 3.6 or later because keyword arguments are NOT ordered on earlier Python versions.
>>> # use this only on Python 3.6 and later
>>> stack(M=arr1, F=arr2, axis='sex')
nat\sex    M    F
     BE  1.0  0.0
     FO  1.0  0.0
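The ordering guarantee this relies on can be checked with plain Python. The helper below is hypothetical and only shows that, from Python 3.6 on, keyword arguments reach a function in the order they were written at the call site.

```python
import sys


def labels_in_call_order(**kwargs):
    # hypothetical helper: return the keyword names in the order received
    return list(kwargs)


labels = labels_in_call_order(M='males', F='females')

# on Python 3.6 and later, the order is the one used in the call
print(sys.version_info >= (3, 6), labels)
```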
binary operations between sessions now ignore type errors. For example, if you are comparing two sessions with many arrays by computing the difference between them but a few arrays contain strings, the whole operation will not fail; the arrays concerned will be assigned nan instead.
added optional argument ignore_exceptions to Session.load to ignore exceptions during load. This is mostly useful when trying to load many .csv files in a Session and some of them have an invalid format but you want to load the others.
Fixes¶
fixed disambiguating an ambiguous key by adding the axis within the string, for example arr['axis_name[ambiguouslabel]'] (closes issue 331).
fixed converting a string group to integer or float using int() and float() (when that makes sense).
>>> a = Axis('a=10,20,30,total')
>>> a
Axis(['10', '20', '30', 'total'], 'a')
>>> str(a.i[0])
'10'
>>> int(a.i[0])
10
>>> float(a.i[0])
10.0
Version 0.24¶
Released on 2017-06-14.
New features¶
implemented Session.to_globals which creates global variables from variables stored in the session (closes issue 276). Note that this should usually only be used in an interactive console and not in a script. Code editors are confused by this kind of manipulation and will likely consider as invalid the code using variables created in this way. Additionally, when using this method auto-completion, “show definition”, “go to declaration” and other similar code editor features will probably not work for the variables created in this way and any variable derived from them.
>>> s = Session(arr1=ndtest(3), arr2=ndtest((2, 2)))
>>> s.to_globals()
>>> arr1
a  a0  a1  a2
    0   1   2
>>> arr2
a\b  b0  b1
 a0   0   1
 a1   2   3
added new boolean argument 'overwrite' to Session.save, Session.to_hdf, Session.to_excel and Session.to_pickle methods (closes issue 293). If overwrite=True and the target file already exists, it is deleted and replaced by a new one. This is the new default behavior. If overwrite=False, an existing file is updated (as in previous larray versions):
>>> arr1, arr2, arr3 = ndtest((2, 2)), ndtest(4), ndtest((3, 2))
>>> s = Session([('arr1', arr1), ('arr2', arr2), ('arr3', arr3)])
>>> # save arr1, arr2 and arr3 in file output.h5
>>> s.save('output.h5')
>>> # replace arr1 and create arr4 + put them in a second session
>>> arr1, arr4 = ndtest((3, 3)), ndtest((2, 3))
>>> s2 = Session([('arr1', arr1), ('arr4', arr4)])
>>> # replace arr1 and add arr4 in file output.h5
>>> s2.save('output.h5', overwrite=False)
>>> # erase content of 'output.h5' and save only arrays contained in the second session
>>> s2.save('output.h5')
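The difference between the two modes can be pictured with a toy model in which both the file and the session are plain dictionaries. The save function below is a hypothetical stand-in written for this illustration; it is not how larray writes files.

```python
def save(file_content, session, overwrite=True):
    """Toy model: 'file_content' and 'session' are dicts mapping names to arrays."""
    if overwrite:
        # the existing file is deleted and replaced by a new one
        file_content.clear()
    # arrays with the same name are replaced, the others are added
    file_content.update(session)
    return file_content


file_content = {}
save(file_content, {'arr1': 'old arr1', 'arr2': 'arr2', 'arr3': 'arr3'})

# overwrite=False: arr1 is replaced, arr4 is added, arr2 and arr3 are kept
save(file_content, {'arr1': 'new arr1', 'arr4': 'arr4'}, overwrite=False)
updated = dict(file_content)

# overwrite=True (the default): only the arrays of the second session remain
save(file_content, {'arr1': 'new arr1', 'arr4': 'arr4'})
replaced = dict(file_content)

print(sorted(updated))
print(sorted(replaced))
```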
Miscellaneous improvements¶
renamed create_sequential() to sequence() (closes issue 212).
improved auto-completion in ipython interactive consoles (e.g. the viewer console) for Axis, AxisCollection, Group and Workbook objects. These objects can now complete keys within [].
>>> gender = Axis('gender=Male,Female')
>>> gender
Axis(['Male', 'Female'], 'gender')
>>> gender['Fe<tab>  # will be completed to `gender['Female`
>>> arr = ndrange(gender)
>>> arr.axes['gen<tab>  # will be completed to `arr.axes['gender`
>>> wb = open_excel()
>>> wb['Sh<tab>  # will be completed to `wb['Sheet1`
added documentation for Session methods (closes issue 277).
allowed providing explicit names for arrays or sessions in compare(). Closes issue 307.
Fixes¶
fixed title argument of ndtest creation function: title was not passed to the returned array.
fixed create_sequential when arguments initial and inc are array and scalar respectively (closes issue 288).
fixed auto-completion of attributes of LArray and Group objects (closes issue 302).
fixed name of arrays/sessions in compare() not being inferred correctly (closes issue 306).
fixed indexing Excel sheets by position to always yield the requested shape even when bounds are outside the range of used cells. Closes issue 273.
fixed the array() method on excel.Sheet returning float labels when int labels are expected.
fixed getting float data instead of int when converting an Excel Sheet or Range to an larray or numpy array.
fixed some warning messages to point to the correct line in user code.
fixed crash of Session.save method when it contained 0D arrays. They are now skipped when saving a session (closes issue 291).
fixed Session.save and Session.to_excel failing to create new Excel files (it only worked if the file already existed). Closes issue 313.
fixed Session.load(file, engine='pandas_excel'): axes were considered as anonymous.
Version 0.23¶
Released on 2017-05-30.
Miscellaneous improvements¶
changed display of arrays (closes issue 243):
>>> ndtest((2, 3))
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
instead of
>>> ndtest((2, 3))
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
.. can now be used within keys (between []). Previously it could only be used to define new axes. As a reminder, it generates increasing values between the two bounds. It is slightly different from : which takes everything between the two bounds in the axis order.
>>> arr = ndrange('a=a1,a0,a2,a3')
>>> arr
a  a1  a0  a2  a3
    0   1   2   3
>>> arr['a1..a3']
a  a1  a2  a3
    0   2   3
this is different from : which takes everything in between the two bounds:
>>> arr['a1:a3']
a  a1  a0  a2  a3
    0   1   2   3
in both axes definitions and keys (within []) .. can now be mixed with , and other .. :
>>> arr = ndrange('code=A,C..E,G,X..Z')
>>> arr
code  A  C  D  E  G  X  Y  Z
      0  1  2  3  4  5  6  7
>>> arr['A,Z..X,G']
code  A  Z  Y  X  G
      0  7  6  5  4
within .. extra zeros are only padded to numbers if zeros are present in the pattern.
>>> ndrange('code=A1..A12')
code  A1  A2  A3  A4  A5  A6  A7  A8  A9  A10  A11  A12
       0   1   2   3   4   5   6   7   8    9   10   11
>>> ndrange('code=A01..A12')
code  A01  A02  A03  A04  A05  A06  A07  A08  A09  A10  A11  A12
        0    1    2    3    4    5    6    7    8    9   10   11
in previous larray versions, the two above definitions returned the second array.
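The padding rule can be sketched in a few lines of plain Python. The expand_range function below is a hypothetical helper written for this illustration, not larray's implementation. It only handles labels made of an optional prefix followed by digits, and pads with zeros only when one of the bounds has a leading zero.

```python
import re


def expand_range(pattern):
    """Expand 'A1..A12' or 'A01..A12' into a list of labels (illustrative only)."""
    start, stop = pattern.split('..')
    prefix, first = re.fullmatch(r'(\D*)(\d+)', start).groups()
    _, last = re.fullmatch(r'(\D*)(\d+)', stop).groups()
    # zeros are padded only if they are present in the pattern
    padded = any(len(bound) > 1 and bound.startswith('0') for bound in (first, last))
    width = max(len(first), len(last)) if padded else 0
    return [prefix + str(i).zfill(width) for i in range(int(first), int(last) + 1)]


unpadded = expand_range('A1..A12')
padded = expand_range('A01..A12')
print(unpadded)
print(padded)
```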
set sep argument of from_string function to ' ' by default (closes issue 271). For 1D arrays, a "-" must be added in front of the data line.
>>> from_string('''sex  M  F
...                  -  0  1''')
sex  M  F
     0  1
>>> from_string('''nat\\sex  M  F
...                      BE  0  1
...                      FO  2  3''')
nat\sex  M  F
     BE  0  1
     FO  2  3
improved error message when trying to access nonexistent sheet in an Excel workbook (closes issue 266).
when creating an Axis from a Group and no explicit name was given, reuse the name of the group axis.
>>> a = Axis('a=a0..a2') >>> Axis(a[:'a1']) Axis(['a0', 'a1'], 'a')
allowed to create an array using a single group as if it was an Axis.
>>> a = Axis('a=a0..a2') >>> ndrange(a) a a0 a1 a2 0 1 2 >>> # using a group as an axis >>> ndrange(a[:'a1']) a a0 a1 0 1
allowed to use axes (Axis objects) to subset arrays (part of issue 210).
>>> arr = ndtest((2, 3)) >>> arr a\b b0 b1 b2 a0 0 1 2 a1 3 4 5 >>> b2 = Axis('b=b0,b2') >>> arr[b2] a\b b0 b2 a0 0 2 a1 3 5
improved string representation of Excel workbooks and sheets (they mention the actual file/sheet they correspond to). This is mostly useful in the interactive console to check what an object corresponds to.
>>> wb = open_excel() >>> wb <larray.io.excel.Workbook [Book1]> >>> wb[0] <larray.io.excel.Sheet [Book1]Sheet1>
Fixes¶
open_excel('non existent file') will raise an explicit error immediately when overwrite_file is False, instead of failing at a seemingly random point later on (closes issue 265).
integer-like strings in axis definition strings using , are converted to integers to be consistent with string definitions using .. (previously, ndrange('a=1,2,3') did not create the same array as ndrange('a=1..3')).
fixed reading a single cell from an Excel sheet.
fixed script execution not resuming after quitting the viewer when it was called using view(a_single_array).
fixed opening the viewer after showing a plot window.
do not display an error when setting the value of an element of a non LArray sequence in the viewer console:
>>> l = [1, 2, 3]
>>> l[0] = 42
Version 0.22¶
Released on 2017-05-11.
New features¶
viewer: added a menu bar with the ability to clear the current session, save all its arrays to a file (.h5, .xlsx, or a directory containing multiple .csv files), and load arrays from such a file (closes issue 88).
WARNING: Only array objects are currently saved. This means that scalars, functions or other non-LArray objects defined in the console are not saved in the file.
implemented a new describe() method on arrays to give quick summary statistics. By default, it includes the number of non-NaN values, the mean, standard deviation, minimum, 25, 50 and 75 percentiles and maximum.
>>> arr = ndrange('gender=Male,Female;year=2014..2020').astype(float)
>>> arr
gender\year | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020
       Male |  0.0 |  1.0 |  2.0 |  3.0 |  4.0 |  5.0 |  6.0
     Female |  7.0 |  8.0 |  9.0 | 10.0 | 11.0 | 12.0 | 13.0
>>> arr.describe()
statistic | count | mean |               std | min |  25% | 50% |  75% |  max
          |  14.0 |  6.5 | 4.031128874149275 | 0.0 | 3.25 | 6.5 | 9.75 | 13.0
an optional keyword argument allows specifying different percentiles to include:
>>> arr.describe(percentiles=[20, 40, 60, 80])
statistic | count | mean |               std | min | 20% | 40% | 60% |  80% |  max
          |  14.0 |  6.5 | 4.031128874149275 | 0.0 | 2.6 | 5.2 | 7.8 | 10.4 | 13.0
its sister method, describe_by(), was also implemented to give quick summary statistics along axes or groups.
>>> arr.describe_by('gender')
gender\statistic | count | mean | std | min | 25% |  50% |  75% |  max
            Male |   7.0 |  3.0 | 2.0 | 0.0 | 1.5 |  3.0 |  4.5 |  6.0
          Female |   7.0 | 10.0 | 2.0 | 7.0 | 8.5 | 10.0 | 11.5 | 13.0
>>> arr.describe_by('gender', (x.year[:2015], x.year[2019:]))
gender | year\statistic | count | mean | std |  min |   25% |  50% |   75% |  max
  Male |          :2015 |   2.0 |  0.5 | 0.5 |  0.0 |  0.25 |  0.5 |  0.75 |  1.0
  Male |          2019: |   2.0 |  5.5 | 0.5 |  5.0 |  5.25 |  5.5 |  5.75 |  6.0
Female |          :2015 |   2.0 |  7.5 | 0.5 |  7.0 |  7.25 |  7.5 |  7.75 |  8.0
Female |          2019: |   2.0 | 12.5 | 0.5 | 12.0 | 12.25 | 12.5 | 12.75 | 13.0
This closes issue 184.
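The figures shown for describe() on the 14 values 0.0 to 13.0 can be cross-checked with Python's standard statistics module (Python 3.8 or later for quantiles). This is an independent check, not a description of how describe() is implemented. It suggests that the std shown in that example is the population standard deviation and that the percentiles use linear interpolation between the data points.

```python
import statistics

# the 14 values of the example array, flattened
data = [float(i) for i in range(14)]

count = len(data)
mean = statistics.mean(data)
# population standard deviation (divides by n)
std = statistics.pstdev(data)
# quartiles with linear interpolation, treating min and max as the 0th and 100th percentiles
q25, q50, q75 = statistics.quantiles(data, n=4, method='inclusive')

print(count, mean, std, min(data), q25, q50, q75, max(data))
```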
implemented reindex allowing to change the order of labels and add/remove some of them to one or several axes:
>>> arr = ndtest((2, 2))
>>> arr
a\b | b0 | b1
 a0 |  0 |  1
 a1 |  2 |  3
>>> arr.reindex(x.b, ['b1', 'b2', 'b0'], fill_value=-1)
a\b | b1 | b2 | b0
 a0 |  1 | -1 |  0
 a1 |  3 | -1 |  2
>>> a = Axis('a', ['a1', 'a2', 'a0'])
>>> b = Axis('b', ['b2', 'b1', 'b0'])
>>> arr.reindex({'a': a, 'b': b}, fill_value=-1)
a\b | b2 | b1 | b0
 a1 | -1 |  3 |  2
 a2 | -1 | -1 | -1
 a0 | -1 |  1 |  0
using reindex one can make an array compatible with another array which has more/less labels or with labels in a different order:
>>> arr2 = ndtest((3, 3))
>>> arr2
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
 a2 |  6 |  7 |  8
>>> arr.reindex(arr2.axes, fill_value=0)
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  0
 a1 |  2 |  3 |  0
 a2 |  0 |  0 |  0
>>> arr.reindex(arr2.axes, fill_value=0) + arr2
a\b | b0 | b1 | b2
 a0 |  0 |  2 |  2
 a1 |  5 |  7 |  5
 a2 |  6 |  7 |  8
This closes issue 18.
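The idea behind reindexing one axis can be sketched with plain lists: each new label either picks the existing column with that label or gets the fill value. The reindex_columns helper below is hypothetical and only mimics the first reindex example (the b axis of a 2 x 2 array); it is not larray's implementation.

```python
def reindex_columns(rows, old_labels, new_labels, fill_value):
    """Reorder, add or drop columns of a row-major table (illustrative only)."""
    position = {label: i for i, label in enumerate(old_labels)}
    return [[row[position[label]] if label in position else fill_value
             for label in new_labels]
            for row in rows]


# same data as ndtest((2, 2)): rows are a0 and a1, columns are b0 and b1
data = [[0, 1],
        [2, 3]]
result = reindex_columns(data, ['b0', 'b1'], ['b1', 'b2', 'b0'], fill_value=-1)
print(result)
```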
added load_example_data function to load datasets used in tutorial and be able to reproduce examples. The name of the dataset must be provided as argument (there is currently only one available dataset). Datasets are returned as Session objects:
>>> demo = load_example_data('demography') >>> demo.pop.info 26 x 3 x 121 x 2 x 2 time [26]: 1991 1992 1993 ... 2014 2015 2016 geo [3]: 'BruCap' 'Fla' 'Wal' age [121]: 0 1 2 ... 118 119 120 sex [2]: 'M' 'F' nat [2]: 'BE' 'FO' >>> demo.qx.info 26 x 3 x 121 x 2 x 2 time [26]: 1991 1992 1993 ... 2014 2015 2016 geo [3]: 'BruCap' 'Fla' 'Wal' age [121]: 0 1 2 ... 118 119 120 sex [2]: 'M' 'F' nat [2]: 'BE' 'FO'
(closes issue 170)
implemented Axis.union, intersection and difference which produce new axes by combining the labels of the axis with the other labels.
>>> letters = Axis('letters=a,b')
>>> letters.union(Axis('letters=b,c'))
Axis(['a', 'b', 'c'], 'letters')
>>> letters.union(['b', 'c'])
Axis(['a', 'b', 'c'], 'letters')
>>> letters.intersection(['b', 'c'])
Axis(['b'], 'letters')
>>> letters.difference(['b', 'c'])
Axis(['a'], 'letters')
implemented Group.union, intersection and difference which produce new groups by combining the labels of the group with the other labels.
>>> letters = Axis('letters=a..d')
>>> letters['a', 'b'].union(letters['b', 'c'])
letters['a', 'b', 'c'].set()
>>> letters['a', 'b'].union(['b', 'c'])
letters['a', 'b', 'c'].set()
>>> letters['a', 'b'].intersection(['b', 'c'])
letters['b'].set()
>>> letters['a', 'b'].difference(['b', 'c'])
letters['a'].set()
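The three operations behave like set operations that keep the labels in their original order and drop duplicates. A minimal sketch on plain lists follows; the function names are hypothetical helpers for this illustration and do not come from larray.

```python
def union(labels, other):
    # keep first occurrences, in order: labels first, then the new ones from other
    return list(dict.fromkeys(list(labels) + list(other)))


def intersection(labels, other):
    other = set(other)
    return [label for label in labels if label in other]


def difference(labels, other):
    other = set(other)
    return [label for label in labels if label not in other]


letters = ['a', 'b']
print(union(letters, ['b', 'c']))
print(intersection(letters, ['b', 'c']))
print(difference(letters, ['b', 'c']))
```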
viewer: added possibility to delete an array by pressing Delete on keyboard (closes issue 116).
Excel sheets in workbooks opened via open_excel can be renamed by changing their .name attribute:
>>> wb = open_excel() >>> wb['old_sheet_name'].name = 'new_sheet_name'
Excel sheets in workbooks opened via open_excel can be deleted using “del”:
>>> wb = open_excel() >>> del wb['sheet_name']
implemented PGroup.set() to transform a positional group to an LSet.
>>> a = Axis('a=a0..a5') >>> a.i[:2].set() a['a0', 'a1'].set()
Miscellaneous improvements¶
inverted name and labels arguments when creating an Axis and made name argument optional (to create anonymous axes). Now, it is also possible to create an Axis by passing a single string of the kind ‘name=labels’:
>>> anonymous = Axis('0..100') >>> age = Axis('age=0..100') >>> gender = Axis('M,F', 'gender')
(closes issue 152)
renamed Session.dump, dump_hdf, dump_excel and dump_csv to save, to_hdf, to_excel and to_csv (closes issue 217).
changed default value of ddof argument for var and std functions from 0 to 1 (closes issue 190).
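To see what this change means numerically, assume ddof has its usual meaning (the sum of squared deviations is divided by n - ddof). The standard statistics module offers both variants, so the effect can be shown without larray; the data below is made up for the illustration.

```python
import statistics

# made-up sample: mean is 5, sum of squared deviations is 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

# equivalent of ddof=0 (old default): divide by n
var_ddof0 = statistics.pvariance(data)
std_ddof0 = statistics.pstdev(data)

# equivalent of ddof=1 (new default): divide by n - 1
var_ddof1 = statistics.variance(data)
std_ddof1 = statistics.stdev(data)

print(var_ddof0, std_ddof0)
print(var_ddof1, std_ddof1)
```

With ddof=1 the result is always slightly larger, and the gap shrinks as the number of values grows.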
implemented a new syntax for stack(): stack({label1: value1, label2: value2}, axis)
>>> nat = Axis('nat', 'BE, FO') >>> sex = Axis('sex', 'M, F') >>> males = ones(nat) >>> males nat | BE | FO | 1.0 | 1.0 >>> females = zeros(nat) >>> females nat | BE | FO | 0.0 | 0.0
In the case the axis has already been defined in a variable, this gives:
>>> stack({'M': males, 'F': females}, sex) nat\sex | M | F BE | 1.0 | 0.0 FO | 1.0 | 0.0
Additionally, axis can now be an axis string definition in addition to an Axis object, which means one can write this:
>>> stack({'M': males, 'F': females}, 'sex=M,F')
It is better than the simpler but highly discouraged alternative:
>>> stack([males, females], sex)
because it is all too easy to invert labels. It is very hard to spot the error in the following line, and larray cannot spot it for you either:
>>> stack([females, males], sex)
nat\sex |   M |   F
     BE | 0.0 | 1.0
     FO | 0.0 | 1.0
When creating an axis from scratch (it does not already exist in a variable), one might want to use this:
>>> stack([males, females], 'sex=M,F')
even if this could suffer, to a lesser extent, from the same problem as above when stacking many arrays.
handle ... in transpose method to avoid having to list all axes. This can be useful, for example, to change which axis is displayed in columns (closes issue 188).
>>> arr.transpose(..., 'time') >>> arr.transpose('gender', ..., 'time')
made scalar Groups behave even more like their value: any method available on the value is available on the Group. For example, if the Group has a string value, the string methods are available on it (closes issue 202).
>>> test = Axis('test', ['abc', 'a1-a2']) >>> test.i[0].upper() 'ABC' >>> test.i[1].split('-') ['a1', 'a2']
updated AxisCollection.replace so as to replace one, several or all axes and to accept axis definitions as new axes.
>>> arr = ndtest((2, 3)) >>> axes = arr.axes >>> axes AxisCollection([ Axis(['a0', 'a1'], 'a'), Axis(['b0', 'b1', 'b2'], 'b') ]) >>> row = Axis(['r0', 'r1'], 'row') >>> column = Axis(['c0', 'c1', 'c2'], 'column')
Replace several axes (keywords, list of tuple or dictionary)
>>> axes.replace(a=row, b=column) >>> # or >>> axes.replace(a="row=r0,r1", b="column=c0,c1,c2") >>> # or >>> axes.replace([(x.a, row), (x.b, column)]) >>> # or >>> axes.replace({x.a: row, x.b: column}) AxisCollection([ Axis(['r0', 'r1'], 'row'), Axis(['c0', 'c1', 'c2'], 'column') ])
added possibility to delete an array from a session:
>>> s = Session({'a': ndtest((3, 3)), 'b': ndtest((2, 4)), 'c': ndtest((4, 2))}) >>> s.names ['a', 'b', 'c'] >>> del s.b >>> del s['c'] >>> s.names ['a']
made create_sequential axis argument accept axis definitions in addition to Axis objects like, for example, using a string definition (closes issue 160).
>>> create_sequential('year=2016..2019') year | 2016 | 2017 | 2018 | 2019 | 0 | 1 | 2 | 3
replaced *args, **kwargs by explicit arguments in documentation of aggregation functions (sum, prod, mean, std, var, …). Closes issue 41.
improved documentation of plot method (closes issue 169).
improved auto-completion in ipython interactive consoles for both LArray and Session objects. LArray objects can now complete keys within [].
>>> a = ndrange('sex=Male,Female')
>>> a
sex | Male | Female
    |    0 |      1
>>> a['Fe<tab>
will autocomplete to a['Female. Sessions will now auto-complete both attributes (using session.) and keys (using session[).
>>> s = Session({'a_nice_test_array': ndtest(10)})
>>> s.a_<tab>
will autocomplete to s.a_nice_test_array and s['a_<tab> will be completed to s['a_nice_test_array
made warning messages for division by 0 and invalid values (usually caused by 0 / 0) point to the user code line, instead of the corresponding line in the larray module.
preserve order of arrays in a session when saving to/loading from an .xlsx file.
when creating a session from a directory containing CSV files, the directory may now contain other (non-CSV) files.
several calls to open_excel from within the same program/script will now reuse a single global Excel instance. This makes Excel I/O much faster without having to create an instance manually using xlwings.App, and still without risking interfering with other instances of Excel opened manually (closes issue 245).
improved error message when trying to copy a sheet from one instance of Excel to another (closes issue 231).
Fixes¶
fixed keyword arguments such as out, ddof, … for aggregation functions (closes issue 189).
fixed percentile(_by) with multiple percentiles values, i.e. when argument q is a list/tuple (closes issue 192).
fixed group aggregates on integer arrays for median, percentile, var and std (closes issue 193).
fixed group sum over boolean arrays (closes issue 194).
fixed set_labels when inplace=True.
fixed array creation functions not raising an exception when called with wrong syntax func(axis1, axis2, …) instead of func([axis1, axis2, …]) (closes issue 203).
fixed position of added sheets in excel workbook: new sheets are appended instead of prepended (closes issue 229).
fixed Workbook behavior in case of new workbook: the first added sheet replaces the default sheet Sheet1 (closes issue 230).
fixed name of Workbook sheets created by copying another sheet (closes issue 244).
>>> wb = open_excel() >>> wb['name_of_new_sheet'] = wb['name_of_sheet_to_copy']
fixed with_axes warning to refer to set_axes instead of replace_axes.
fixed displayed title in viewer: shows path to file associated with current session + current array info + extra info (closes issue 181)
Version 0.21¶
Released on 2017-03-28.
New features¶
implemented set_axes() method to replace one, several or all axes of an array (closes issue 67). The method with_axes() is now deprecated (set_axes() must be used instead).
>>> arr = ndtest((2, 3)) >>> arr a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5 >>> row = Axis('row', ['r0', 'r1']) >>> column = Axis('column', ['c0', 'c1', 'c2'])
Replace one axis (second argument new_axis must be provided)
>>> arr.set_axes(x.a, row) row\b | b0 | b1 | b2 r0 | 0 | 1 | 2 r1 | 3 | 4 | 5
Replace several axes (keywords, list of tuple or dictionary)
>>> arr.set_axes(a=row, b=column)
>>> # or
>>> arr.set_axes([(x.a, row), (x.b, column)])
>>> # or
>>> arr.set_axes({x.a: row, x.b: column})
row\column | c0 | c1 | c2
        r0 |  0 |  1 |  2
        r1 |  3 |  4 |  5
Replace all axes (list of axes or AxisCollection)
>>> arr.set_axes([row, column]) row\column | c0 | c1 | c2 r0 | 0 | 1 | 2 r1 | 3 | 4 | 5 >>> arr2 = ndrange([row, column]) >>> arr.set_axes(arr2.axes) row\column | c0 | c1 | c2 r0 | 0 | 1 | 2 r1 | 3 | 4 | 5
implemented Axis.replace to replace some labels from an axis:
>>> sex = Axis('sex', ['M', 'F']) >>> sex Axis('sex', ['M', 'F']) >>> sex.replace('M', 'Male') Axis('sex', ['Male', 'F']) >>> sex.replace({'M': 'Male', 'F': 'Female'}) Axis('sex', ['Male', 'Female'])
implemented from_string() method to create an array from a string (closes issue 96).
>>> from_string('''age,nat\\sex, M, F
...                0, BE, 0, 1
...                0, FO, 2, 3
...                1, BE, 4, 5
...                1, FO, 6, 7''')
age | nat\sex | M | F
  0 |      BE | 0 | 1
  0 |      FO | 2 | 3
  1 |      BE | 4 | 5
  1 |      FO | 6 | 7
allowed to use a regular expression in split_axis method (closes issue 106):
>>> combined = ndrange('a_b = a0b0..a1b2') >>> combined a_b | a0b0 | a0b1 | a0b2 | a1b0 | a1b1 | a1b2 | 0 | 1 | 2 | 3 | 4 | 5 >>> combined.split_axis(x.a_b, regex='(\w{2})(\w{2})') a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5
one can assign a new axis to several groups at the same time by using axis[groups]:
>>> group1 = year[2001:2004] >>> group2 = year[2008,2009] >>> # let us change the year axis by time >>> x.time[group1, group2] (x.time[2001:2004], x.time[2008, 2009])
implemented Axis.by() which is equivalent to axis[:].by() and divides the axis into several groups of specified length:
>>> year = Axis('year', '2010..2016')
>>> year.by(3)
(year.i[0:3], year.i[3:6], year.i[6:7])
which is equivalent to (year[2010:2012], year[2013:2015], year[2016]). As with groups, the optional second argument specifies the step between groups:
>>> year.by(3, step=4)
(year.i[0:3], year.i[4:7])
which is equivalent to (year[2010:2012], year[2014:2016]). And if step is smaller than length, we get overlapping groups, which can be useful for example for moving averages.
>>> year.by(3, 2)
(year.i[0:3], year.i[2:5], year.i[4:7], year.i[6:7])
which is equivalent to (year[2010:2012], year[2012:2014], year[2014:2016], year[2016])
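The positional slices listed above follow a simple rule: a group starts every step positions, spans at most length positions and is truncated at the end of the axis. The by_positions function below is a hypothetical helper reproducing that rule on plain integers; it is a sketch, not larray's implementation.

```python
def by_positions(axis_length, length, step=None):
    """Return (start, stop) position pairs for groups of 'length' labels (illustrative only)."""
    if step is None:
        # by default, groups are contiguous and do not overlap
        step = length
    return [(start, min(start + length, axis_length))
            for start in range(0, axis_length, step)]


# the year axis 2010..2016 has 7 labels
contiguous = by_positions(7, 3)
with_gaps = by_positions(7, 3, step=4)
overlapping = by_positions(7, 3, 2)
print(contiguous)
print(with_gaps)
print(overlapping)
```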
implemented larray_nan_equal to test whether two arrays are identical even in the presence of nan values. Two arrays are considered identical by larray_equal if they have exactly the same axes and data. However, since a nan value has the odd property of not being equal to itself, larray_equal returns False if either array contains a nan value. larray_nan_equal returns True if all not-nan data is equal and both arrays have nans at the same place.
>>> arr1 = ndtest((2, 3), dtype=float)
>>> arr1['a1', 'b1'] = nan
>>> arr1
a\b |  b0 |  b1 |  b2
 a0 | 0.0 | 1.0 | 2.0
 a1 | 3.0 | nan | 5.0
>>> arr2 = arr1.copy()
>>> arr2
a\b |  b0 |  b1 |  b2
 a0 | 0.0 | 1.0 | 2.0
 a1 | 3.0 | nan | 5.0
>>> larray_equal(arr1, arr2)
False
>>> larray_nan_equal(arr1, arr2)
True
>>> arr2['b1'] = 0.0
>>> larray_nan_equal(arr1, arr2)
False
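The odd property of nan mentioned above is easy to reproduce in plain Python. The nan_equal function below is a hypothetical helper that compares two flat sequences of floats, treating nans at the same position as equal. It only illustrates the comparison of the data; it ignores axes and is not larray's implementation.

```python
import math


def nan_equal(values1, values2):
    """True if both sequences hold equal values and have nans at the same positions."""
    if len(values1) != len(values2):
        return False
    return all(a == b or (math.isnan(a) and math.isnan(b))
               for a, b in zip(values1, values2))


nan = math.nan
row1 = [3.0, nan, 5.0]
row2 = list(row1)

# element by element, nan is never equal to itself
plain_equal = all(a == b for a, b in zip(row1, row2))
same_with_nans = nan_equal(row1, row2)

# replacing the nan by a number makes the two rows differ
row2[1] = 0.0
same_after_change = nan_equal(row1, row2)

print(plain_equal, same_with_nans, same_after_change)
```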
Miscellaneous improvements¶
viewer: make keyboard shortcuts work even when the focus is not on the array editor widget. It means that, for example, plotting an array (via Ctrl-P) or opening it in Excel (Ctrl-E) can be done directly even when interacting with the list of arrays or within the interactive console (closes issue 102).
viewer: automatically display plots done in the viewer console in a separate window (see example below), unless “%matplotlib inline” is used.
>>> arr = ndtest((3, 3)) >>> arr.plot()
viewer: when calling view(an_array) from within the viewer, the new window opened does not block the initial window, which means you can have several windows open at the same time. view() without argument can still result in odd behavior though.
improved LArray.set_labels to make it possible to replace only some labels of an axis, instead of all of them and to replace labels from several axes at the same time.
>>> a = ndrange('nat=BE,FO;sex=M,F') >>> a nat\sex | M | F BE | 0 | 1 FO | 2 | 3
to replace only some labels, one must give a mapping giving the new label for each label to replace
>>> a.set_labels(x.sex, {'M': 'Men'}) nat\sex | Men | F BE | 0 | 1 FO | 2 | 3
to replace labels for several axes at the same time, one should give a mapping giving the new labels for each changed axis
>>> a.set_labels({'sex': 'Men,Women', 'nat': 'Belgian,Foreigner'}) nat\sex | Men | Women Belgian | 0 | 1 Foreigner | 2 | 3
one can also replace some labels in several axes by giving a mapping of mappings
>>> a.set_labels({'sex': {'M': 'Men'}, 'nat': {'BE': 'Belgian'}}) nat\sex | Men | F Belgian | 0 | 1 FO | 2 | 3
allowed matrix multiplication (@ operator) between arrays with dimension != 2 (closes issue 122).
improved LArray.plot to get nicer plots by default. The axes are transposed compared to what they used to be, because the last axis is often used for time series. Also, it considers a 1D array as a single series, not N series of 1 point.
added installation instructions (closes issue 101).
Axis.group and Axis.all are now deprecated (closes issue 148).
>>> city.group(['London', 'Brussels'], name='capitals')
>>> # should be written as:
>>> city[['London', 'Brussels']] >> 'capitals'
and
>>> city.all()
>>> # should be written as:
>>> city[:] >> 'all'
Fixes¶
viewer: allow changing the number of displayed digits even for integer arrays as that makes sense when using scientific notation (closes issue 100).
viewer: fixed opening a viewer via view(), edit() or compare() from within the viewer (closes issue 109).
viewer: fixed compare() colors when arrays have values which are very close but not exactly equal (closes issue 123)
viewer: fixed legend when plotting arbitrary rows (it always displayed the labels of the first rows) (closes issue 136).
viewer: fixed labels on the x axis when zooming on a plot (closes issue 143)
viewer: fixed storing an array in a variable with a name which existed previously but which was not displayable in the viewer, such as the name of any function or special object. In some cases, this error led to a crash of the viewer. For example, this code failed when run in the viewer console, because x is already defined (for the x. syntax):
>>> x = ndtest(3)
fixed indexing an array using a positional group with a position which corresponds to a label on that axis. This used to return the wrong data (the data corresponding to the position as if it was the key).
>>> a = Axis('a', '1..3') >>> arr = ndrange(a) >>> arr a | 1 | 2 | 3 | 0 | 1 | 2 >>> # this used to return 0 ! >>> arr[a.i[1]] 1
fixed == for positional groups (closes issue 93)
>>> years = Axis('years', '1995..1997') >>> years Axis('years', [1995, 1996, 1997]) >>> # this used to return False >>> years.i[0] == 1995 True
fixed using positional groups for their value in many cases (slice bounds, within list of values, within other groups, etc.). For example, this used to fail:
>>> arr = ndtest((2, 4)) >>> arr a\b | b0 | b1 | b2 | b3 a0 | 0 | 1 | 2 | 3 a1 | 4 | 5 | 6 | 7 >>> b = arr.b >>> start = b.i[0] # equivalent to start = 'b0' >>> stop = b.i[2] # equivalent to stop = 'b2' >>> arr[start:stop] a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 4 | 5 | 6 >>> arr[[b.i[0], b.i[2]]] a\b | b0 | b2 a0 | 0 | 2 a1 | 4 | 6
fixed posargsort labels (closes issue 137).
fixed labels when doing group aggregates using positional groups. Previously, it used the positions as labels. This was most visible when using the Group.by() method (which creates positional groups).
>>> years = Axis('years', '2010..2015') >>> arr = ndrange(years) >>> arr years | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 0 | 1 | 2 | 3 | 4 | 5 >>> arr.sum(years.by(3)) years | 2010:2012 | 2013:2015 | 3 | 12
While this used to return:
>>> arr.sum(years.by(3)) years | 0:3 | 3:6 | 3 | 12
fixed Group.by() when the group was a slice with either bound unspecified. For example, years[2010:2015].by(3) worked but years[:].by(3), years[2010:].by(3) and years[:2015].by(3) did not.
fixed a speed regression in version 0.18 and later versions compared to 0.17. In some cases, it was up to 40% slower than it should be (closes issue 165).
Version 0.20¶
Released on 2017-02-09.
IMPORTANT¶
To make sure all users have all optional dependencies installed and use the same version of packages, and to simplify the update process, we created a new “larrayenv” package which will install larray itself AND all its dependencies (including the optional ones). This means that this version needs to be installed using:
conda install larrayenv
in the future, to update from one version to the next, it should always be enough to do:
conda update larrayenv
New features¶
implemented from_lists() to create constant arrays (instead of using LArray directly as that is very error prone). We are not really happy with its name though, so it might change in the future. Any suggestion of a better name is very welcome (closes issue 30).
>>> from_lists([['sex\\year', 1991, 1992, 1993], ... [ 'M', 0, 1, 2], ... [ 'F', 3, 4, 5]]) sex\year | 1991 | 1992 | 1993 M | 0 | 1 | 2 F | 3 | 4 | 5
added support for loading sparse arrays via open_excel().
For example, assuming you have a sheet like this:
age | sex\year | 2015 | 2016
 10 |        F |  0.0 |  1.0
 10 |        M |  2.0 |  3.0
 20 |        M |  4.0 |  5.0
loading it will yield:
>>> wb = open_excel('test_sparse.xlsx')
>>> arr = wb['Sheet1'].load()
>>> arr
age | sex\year | 2015 | 2016
 10 |        F |  0.0 |  1.0
 10 |        M |  2.0 |  3.0
 20 |        F |  nan |  nan
 20 |        M |  4.0 |  5.0
Miscellaneous improvements¶
allowed to get an axis from an array by using array.axis_name in addition to array.axes.axis_name:
>>> arr = ndtest((2, 3))
>>> arr.axes
AxisCollection([
    Axis('a', ['a0', 'a1']),
    Axis('b', ['b0', 'b1', 'b2'])
])
>>> arr.a
Axis('a', ['a0', 'a1'])
viewer: several rows/columns can be plotted together. It draws a separate line for each row except if only one column has been selected.
viewer: the array labels are used as “ticks” in plots.
‘_by’ aggregation methods accept groups in addition to axes (closes issue 59). It will keep only the mentioned groups and aggregate all other dimensions:
>>> arr = ndtest((2, 3, 4))
>>> arr
 a | b\c | c0 | c1 | c2 | c3
a0 |  b0 |  0 |  1 |  2 |  3
a0 |  b1 |  4 |  5 |  6 |  7
a0 |  b2 |  8 |  9 | 10 | 11
a1 |  b0 | 12 | 13 | 14 | 15
a1 |  b1 | 16 | 17 | 18 | 19
a1 |  b2 | 20 | 21 | 22 | 23
>>> arr.sum_by('c0,c1;c1:c3')
c | c0,c1 | c1:c3
  |   126 |   216
viewer: view() and edit() now accept as argument a path to a file containing arrays.
>>> view('myfile.h5')
this is a shortcut for:
>>> view(Session('myfile.h5'))
AxisCollection.without now accepts a single integer position (to exclude an axis by position).
>>> a = ndtest((2, 3))
>>> a.axes
AxisCollection([
    Axis('a', ['a0', 'a1']),
    Axis('b', ['b0', 'b1', 'b2'])
])
>>> a.axes.without(0)
AxisCollection([
    Axis('b', ['b0', 'b1', 'b2'])
])
nicer display (repr) for LSet (closes issue 44).
>>> x.b['b0,b2'].set()
x.b['b0', 'b2'].set()
implemented sep argument for LArray & AxisCollection.combine_axes() to allow using a custom delimiter (closes issue 53).
added a check that ipfp target sums have the expected axes (closes issue 42).
when the nb_index argument is not provided explicitly in read_excel(engine='xlrd'), it is autodetected from the position of the first “\” (closes issue 66).
allow any special character except “.” and whitespace when creating axes labels using “..” syntax (previously only _ was allowed).
added many more I/O tests to hopefully lower our regression rate in the future (closes issue 70).
Fixes¶
viewer: selection of entire rows/columns will load any remaining data, if any (closes issue 37). Previously if you selected entire rows or columns of a large dataset (which is not loaded entirely from the start), it only selected (and thus copied/plotted) the part of the data which was already loaded.
viewer: filtering on anonymous axes is now possible (closes issue 33).
fixed loading sparse files using read_excel() (fixes issue 29).
fixed nb_index argument for read_excel().
fixed creating range axes with a negative start bound using string notation (e.g. Axis('name', '-1..10')) (fixes issue 51).
fixed ptp() function.
fixed with_axes() to copy the title of the array.
fixed Group >> ‘name’.
fixed workbook[sheet_position] when using open_excel().
fixed plotting in the viewer when using Qt4.
Version 0.19¶
Released on 2017-01-19.
New features¶
Implemented a “by” variant to all aggregate methods (e.g. sum_by, mean_by, etc.). These methods aggregate all axes except those listed, which means the only axes remaining after the aggregate operation will be those listed. For example: arr.sum_by(x.a) is equivalent to arr.sum(arr.axes - x.a)
>>> arr = ndtest((2, 3, 4))
>>> arr
 a | b\c | c0 | c1 | c2 | c3
a0 |  b0 |  0 |  1 |  2 |  3
a0 |  b1 |  4 |  5 |  6 |  7
a0 |  b2 |  8 |  9 | 10 | 11
a1 |  b0 | 12 | 13 | 14 | 15
a1 |  b1 | 16 | 17 | 18 | 19
a1 |  b2 | 20 | 21 | 22 | 23
>>> arr.sum_by(x.b)
b | b0 | b1 |  b2
  | 60 | 92 | 124
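For readers more familiar with NumPy, the “by” semantics can be approximated as follows. This is a sketch on raw data, not larray code: sum_by here is a hypothetical stand-alone function and the axis names are passed explicitly because plain NumPy arrays have none.

```python
import numpy as np


def sum_by(data, axes_names, keep):
    """Sum `data` over every axis whose name is not in `keep`."""
    to_sum = tuple(i for i, name in enumerate(axes_names) if name not in keep)
    return data.sum(axis=to_sum)


data = np.arange(24).reshape(2, 3, 4)   # same values as ndtest((2, 3, 4))
by_b = sum_by(data, ('a', 'b', 'c'), keep={'b'})
```

Only the “b” axis remains in by_b, which is what arr.sum(arr.axes - x.a) style calls achieve by listing the axes to remove instead.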
Added .extend() method to Axis class
>>> a = Axis('a', 'a0..a2')
>>> a
Axis('a', ['a0', 'a1', 'a2'])
>>> other = Axis('other', 'a3..a5')
>>> a.extend(other)
Axis('a', ['a0', 'a1', 'a2', 'a3', 'a4', 'a5'])
or directly specify the extra labels as a list or as a “label string”:
>>> a.extend('a3..a5')
Axis('a', ['a0', 'a1', 'a2', 'a3', 'a4', 'a5'])
Added title argument to all array creation functions (ndrange, zeros, ones, …) and display it in the .info of array objects.
>>> a = ndrange(3, title='a simple test array')
>>> a.info
a simple test array
3
 {0}* [3]: 0 1 2
implemented creating an Axis using a group:
>>> arr = ndtest((2, 3))
>>> arr
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> a, b = arr.axes
>>> zeros((a, b[:'b1']))
a\b |  b0 |  b1
 a0 | 0.0 | 0.0
 a1 | 0.0 | 0.0
made Axis.startswith, .endswith and .matches accept Group instances
>>> a = Axis('a', 'a0..b2')
>>> a
Axis('a', ['a0', 'a1', 'a2', 'b0', 'b1', 'b2'])
>>> prefix = Axis('prefix', 'a,b')
>>> a.startswith(prefix['a'])
a['a0', 'a1', 'a2']
>>> a.startswith(prefix.i[1])
a['b0', 'b1', 'b2']
implemented all usual binary operations (+, -, *, /, …) on Group
>>> year = Axis('year', '2011..2016')
>>> year[2013] + 1
2014
>>> year.i[2] + 1
2014
made the viewer much more useful as a debugger in the middle of a function by generalizing SessionEditor to handle any mapping instead of only Session objects, while still listing and displaying only array objects. To view the value of a non-array variable, one should type its name in the console. Given those changes, view() will superficially behave as before but, behind the scenes, all variables which were defined in the scope where view() was called will be available in the viewer console, even though they will not appear in the list on the left. This means that the viewer console will be able to use scalars defined at that point and call other functions of your code. In other words, there are more chances you can execute some code from the function calling view() by simply copy-pasting the code line.
Backward incompatible changes¶
LGroup lost set-like operations (intersection and union) to the profit of a specific subclass (LSet). In other words, this no longer works:
>>> letters = Axis('letters', 'a..z')
>>> letters[':c'] & letters['b:']
To make it work, we need to convert the LGroup(s) to LSets explicitly:
>>> letters[':c'].set() & letters['b:d'].set()
letters.set[OrderedSet(['b', 'c'])]
>>> letters[':c'].set() | letters['b:d'].set()
letters.set[OrderedSet(['a', 'b', 'c', 'd'])]
>>> letters[':c'].set() - 'b'
letters.set[OrderedSet(['a', 'c'])]
group aggregates produce simple string labels for the new aggregated axis instead of using the group themselves as labels. This means one can no longer know where a group comes from but this simplifies the code and fixes a few issues, most notably export of aggregated arrays to Excel, and some operations between two aggregated arrays.
>>> arr = ndtest((3, 4))
>>> arr
a\b | b0 | b1 | b2 | b3
 a0 |  0 |  1 |  2 |  3
 a1 |  4 |  5 |  6 |  7
 a2 |  8 |  9 | 10 | 11
>>> agg = arr.sum(':b2 >> tob2;b2,b3 >> other')
>>> agg
a\b | tob2 | other
 a0 |    3 |     5
 a1 |   15 |    13
 a2 |   27 |    21
>>> agg.info
3 x 2
 a [3]: 'a0' 'a1' 'a2'
 b [2]: 'tob2' 'other'
>>> agg.axes.b.labels[0]
'tob2'
In previous versions this would have returned:
>>> agg.axes.b.labels[0]
LGroup(':b2', name='tob2', axis=Axis('b', ['b0', 'b1', 'b2', 'b3']))
a string containing only a single “integer-like” is no longer transformed to an integer e.g. “10” will evaluate to (the string) “10” (like in version 0.17 and earlier) while “10,20” will evaluate to the list of integers: [10, 20]
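The rule can be sketched in plain Python as follows. parse_labels is a hypothetical helper written for this illustration, not larray's actual parser, and converting each comma-separated part independently is an assumption of the sketch.

```python
def parse_labels(text):
    """Mimic the rule above: a lone label stays a string, a list is converted."""
    def to_int_if_possible(part):
        part = part.strip()
        try:
            return int(part)
        except ValueError:
            return part

    if ',' not in text:
        # a single label is kept as-is, even if it looks like an integer
        return text
    return [to_int_if_possible(part) for part in text.split(',')]


single = parse_labels('10')
several = parse_labels('10,20')
```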
Other changes¶
changed how Group instances are displayed.
>>> a = Axis('a', 'a0..a2')
>>> a['a1,a2']
a['a1', 'a2']
Fixes¶
fixed > and >= on Group using slices
avoid a division by 0 warning when using divnot0
viewer: fixed plots when Qt5 is installed. This also removes the matplotlib warning people got when running the viewer with Qt5 installed.
viewer: display array when typing its name in the console even when no array was selected previously
Misc¶
misc code cleanup, improved docstrings, …
Version 0.18¶
Released on 2016-12-20.
Major improvements¶
the documentation (docstrings) of many functions was vastly improved (thanks to Alix)
implemented a new optional syntax to generate sequences of labels for axes by using patterns
integer strings generate integers
>>> ndrange('age=0..10')
age | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
you can combine letters and numbers. The number part is treated like increasing (or decreasing) numbers
>>> ndrange('lipro=P01..P12')
lipro | P01 | P02 | P03 | P04 | P05 | P06 | P07 | P08 | P09 | P10 | P11 | P12
      |   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |  10 |  11
letter patterns generate all combinations of letters between the start and end:
>>> ndrange('test=AA..CC')
test | AA | AB | AC | BA | BB | BC | CA | CB | CC
     |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
other characters are left intact (and should be the same in the start and end patterns):
>>> ndrange('test=A_1..C_2')
test | A_1 | A_2 | B_1 | B_2 | C_1 | C_2
     |   0 |   1 |   2 |   3 |   4 |   5
this also works within Axis()
>>> Axis('age', '0..10')
Axis('age', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
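The pattern rules listed above can be approximated with the following plain-Python sketch. It is an illustration under simplifying assumptions, not larray's implementation: expand is a hypothetical function, it only handles increasing ranges, and it does not support negative numbers.

```python
import re
from itertools import product

# a pattern is cut into runs of digits, runs of letters and runs of anything else
TOKEN = re.compile(r'\d+|[A-Za-z]+|[^\dA-Za-z]+')


def expand(pattern):
    start, stop = pattern.split('..')
    if start.isdigit() and stop.isdigit():
        # purely numeric patterns yield integers, not strings
        return list(range(int(start), int(stop) + 1))
    choices = []
    for first, last in zip(TOKEN.findall(start), TOKEN.findall(stop)):
        if first.isdigit():
            # keep the zero padding of the start pattern (P01, P02, ...)
            width = len(first)
            choices.append([str(n).zfill(width)
                            for n in range(int(first), int(last) + 1)])
        elif first.isalpha():
            # each letter position ranges independently (AA, AB, AC, BA, ...)
            per_char = [[chr(c) for c in range(ord(a), ord(b) + 1)]
                        for a, b in zip(first, last)]
            choices.append([''.join(chars) for chars in product(*per_char)])
        else:
            # other characters are left intact
            choices.append([first])
    return [''.join(parts) for parts in product(*choices)]
```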
implemented new syntax for defining groups using strings:
>>> arr = ndtest((3, 4))
>>> arr
a\b | b0 | b1 | b2 | b3
 a0 |  0 |  1 |  2 |  3
 a1 |  4 |  5 |  6 |  7
 a2 |  8 |  9 | 10 | 11
groups can be named using “>>” instead of “=” previously
>>> arr.sum('b1,b3 >> b13;b0:b2 >> b012')
a\b | b13 | b012
 a0 |   4 |    3
 a1 |  12 |   15
 a2 |  20 |   27
if some labels are ambiguous, one can specify the axis by using “axis_name[labels]”:
>>> arr.sum('b[b1,b3] >> b13;b[b0:b2] >> b012')
a\b | b13 | b012
 a0 |   4 |    3
 a1 |  12 |   15
 a2 |  20 |   27
groups can also be defined by position using this syntax:
>>> arr.sum('b.i[1,3] >> b13;b.i[0:3] >> b012')
a\b | b13 | b012
 a0 |   4 |    3
 a1 |  12 |   15
 a2 |  20 |   27
A few notes:
the goal was to have that syntax as close to the “normal” syntax as possible (just remove the “x.” and all inner quotes).
in models, the normal syntax should be preferred, so that the groups can be stored in a variable and reused in several places
strings representing integers are evaluated as integers.
there is experimental support for evaluating expressions within string groups by using “{expr}”, but this is fragile and might be removed in the future.
implemented combine_axes & split_axis on arrays:
>>> arr = ndtest((2, 3, 4))
>>> arr
 a | b\c | c0 | c1 | c2 | c3
a0 |  b0 |  0 |  1 |  2 |  3
a0 |  b1 |  4 |  5 |  6 |  7
a0 |  b2 |  8 |  9 | 10 | 11
a1 |  b0 | 12 | 13 | 14 | 15
a1 |  b1 | 16 | 17 | 18 | 19
a1 |  b2 | 20 | 21 | 22 | 23
>>> arr2 = arr.combine_axes((x.a, x.b))
>>> arr2
a_b\c | c0 | c1 | c2 | c3
a0_b0 |  0 |  1 |  2 |  3
a0_b1 |  4 |  5 |  6 |  7
a0_b2 |  8 |  9 | 10 | 11
a1_b0 | 12 | 13 | 14 | 15
a1_b1 | 16 | 17 | 18 | 19
a1_b2 | 20 | 21 | 22 | 23
>>> arr2.split_axis(x.a_b)
 a | b\c | c0 | c1 | c2 | c3
a0 |  b0 |  0 |  1 |  2 |  3
a0 |  b1 |  4 |  5 |  6 |  7
a0 |  b2 |  8 |  9 | 10 | 11
a1 |  b0 | 12 | 13 | 14 | 15
a1 |  b1 | 16 | 17 | 18 | 19
a1 |  b2 | 20 | 21 | 22 | 23
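On raw data, combining and splitting axes amounts to a reshape plus joining or splitting the labels. The NumPy sketch below shows that correspondence; it is not how larray implements these methods, and the variable names are only illustrative.

```python
import numpy as np
from itertools import product

a_labels, b_labels = ['a0', 'a1'], ['b0', 'b1', 'b2']
data = np.arange(24).reshape(2, 3, 4)   # same values as ndtest((2, 3, 4))

# combine: one label per (a, b) pair and a single axis of length 2 * 3
combined_labels = ['_'.join(pair) for pair in product(a_labels, b_labels)]
combined = data.reshape(len(a_labels) * len(b_labels), -1)

# split: recover the label pairs and the original shape
split_pairs = [label.split('_') for label in combined_labels]
restored = combined.reshape(len(a_labels), len(b_labels), -1)
```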
implemented .by() method on groups which splits them into subgroups of specified length
>>> arr = ndtest((5, 2))
>>> arr
a\b | b0 | b1
 a0 |  0 |  1
 a1 |  2 |  3
 a2 |  4 |  5
 a3 |  6 |  7
 a4 |  8 |  9
>>> a, b = arr.axes
>>> arr.sum(a['a0':'a4'].by(2))
         a\b | b0 | b1
a['a0' 'a1'] |  2 |  4
a['a2' 'a3'] | 10 | 12
     a['a4'] |  8 |  9
there is also an optional second argument to specify the “step” between groups
>>> arr.sum(a['a0':'a4'].by(2, step=3))
         a\b | b0 | b1
a['a0' 'a1'] |  2 |  4
a['a3' 'a4'] | 14 | 16
if the step is smaller than the group size, you get overlapping groups:
>>> arr.sum(a['a0':'a4'].by(2, step=1))
         a\b | b0 | b1
a['a0' 'a1'] |  2 |  4
a['a1' 'a2'] |  6 |  8
a['a2' 'a3'] | 10 | 12
a['a3' 'a4'] | 14 | 16
     a['a4'] |  8 |  9
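The splitting logic visible in these three outputs can be reproduced with a few lines of plain Python. This is a sketch inferred from the examples above, not the library code; by here is a hypothetical free function working on a plain list of labels.

```python
def by(labels, length, step=None):
    """Split `labels` into chunks of `length`, starting a new chunk every `step`."""
    if step is None:
        # by default, chunks follow each other without gap or overlap
        step = length
    return [labels[start:start + length]
            for start in range(0, len(labels), step)]


labels = ['a0', 'a1', 'a2', 'a3', 'a4']
plain = by(labels, 2)
spaced = by(labels, 2, step=3)
overlapping = by(labels, 2, step=1)
```

Note that the last chunk can be shorter than the requested length, as with the trailing a['a4'] group above.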
groups can be renamed using >> (in addition to the “named” method)
>>> arr = ndtest((2, 3))
>>> arr
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> arr.sum((x.b['b0,b1'] >> 'b01', x.b['b1,b2'] >> 'b12'))
a\b | b01 | b12
 a0 |   1 |   3
 a1 |   7 |   9
implemented rationot0
>>> a = Axis('a', 'a0,a1')
>>> b = Axis('b', 'b0,b1,b2')
>>> arr = LArray([[6, 0, 2],
...               [4, 0, 8]], [a, b])
>>> arr
a\b | b0 | b1 | b2
 a0 |  6 |  0 |  2
 a1 |  4 |  0 |  8
>>> arr.sum()
20
>>> arr.rationot0()
a\b |  b0 |  b1 |  b2
 a0 | 0.3 | 0.0 | 0.1
 a1 | 0.2 | 0.0 | 0.4
>>> arr.rationot0(x.a)
a\b |  b0 |  b1 |  b2
 a0 | 0.6 | 0.0 | 0.2
 a1 | 0.4 | 0.0 | 0.8
for reference, the normal ratio method would return:
>>> arr.ratio(x.a)
a\b |  b0 |  b1 |  b2
 a0 | 0.6 | nan | 0.2
 a1 | 0.4 | nan | 0.8
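The difference between the two methods comes down to how a zero total is handled. A NumPy sketch of the “not0” variant, assuming it simply returns 0 wherever the total is 0 (ratio_not0 is a hypothetical stand-alone function, not the larray method):

```python
import numpy as np


def ratio_not0(data, axis=None):
    """Divide `data` by its sum along `axis`, returning 0 where the sum is 0."""
    data = np.asarray(data, dtype=float)
    total = data.sum(axis=axis, keepdims=True)
    result = np.zeros_like(data)
    # cells whose total is 0 are skipped and thus keep their initial value (0)
    np.divide(data, total, out=result, where=total != 0)
    return result


values = [[6, 0, 2],
          [4, 0, 8]]
overall = ratio_not0(values)
along_a = ratio_not0(values, axis=0)
```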
Misc improvements¶
implemented [] on groups so that you can further subset them
added a new “condensed” option for ipfp’s display_progress argument to get back the old behavior
changed how named groups are displayed (only the name is displayed)
positional groups gained a few features and are almost on par with label groups now
when iterating over an axis (for example when doing “for y in year_axis:”), it yields groups (instead of raw labels) so that it works even in the presence of ambiguous labels.
Axis.startswith, endswith, matches create groups which include the axis (so that those groups work even if the labels exist on several axes)
Bug fixes¶
fixed Session.summary() when arrays in the session have axes without name
fixed full() and full_like() with an explicit dtype (the dtype was ignored)
Version 0.17¶
Released on 2016-11-29.
Core¶
added ndtest function to create n-dimensional test arrays (of given shape). Axes are named by single letters starting from ‘a’. Axes labels are constructed using a ‘{axis_name}{label_pos}’ pattern (e.g. ‘a0’).
>>> ndtest(6)
a | a0 | a1 | a2 | a3 | a4 | a5
  |  0 |  1 |  2 |  3 |  4 |  5
>>> ndtest((2, 3))
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> ndtest((2, 3), label_start=1)
a\b | b1 | b2 | b3
 a1 |  0 |  1 |  2
 a2 |  3 |  4 |  5
allow naming “one-shot” groups in group aggregates.
>>> arr = ndtest((2, 3))
>>> arr
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> arr.sum('g1=b0;g2=b1,b2;g3=b0:b2')
a\b | 'g1' ('b0') | 'g2' (['b1' 'b2']) | 'g3' ('b0':'b2')
 a0 |           0 |                  3 |                3
 a1 |           3 |                  9 |               12
implemented argmin, argmax, posargmin, posargmax without an axis argument (works on the full array).
>>> arr = ndtest((2, 3))
>>> arr
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
>>> arr.argmin()
('a0', 'b0')
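In NumPy terms, the positional variant corresponds to unravelling the flat argmin, and the label variant to looking those positions up in each axis. A minimal sketch, with the axes represented as plain lists of labels (an assumption of this example):

```python
import numpy as np

axes = [['a0', 'a1'], ['b0', 'b1', 'b2']]
data = np.arange(6).reshape(2, 3)       # same values as ndtest((2, 3))

# position of the minimum over the full array (what posargmin would give)
positions = np.unravel_index(data.argmin(), data.shape)
# corresponding labels, one per axis (what argmin would give)
labels = tuple(axis[pos] for axis, pos in zip(axes, positions))
```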
added preliminary code to add a title attribute to LArray.
This needs a lot more work to be really useful though, as it can currently only be used in the LArray() function itself and is only used in Session.summary() (see below). There are many places where this should be used, but this is not done yet.
added Session.summary() which displays a list of all arrays, their dimension names and title if any.
This can be used in combination with local_arrays() to produce some kind of codebook with all the arrays of a function.
>>> arr = LArray([[1, 2], [3, 4]], 'sex=M,F;nat=BE,FO', title='a test array')
>>> arr
sex\nat | BE | FO
      M |  1 |  2
      F |  3 |  4
>>> s = Session({'arr': arr})
>>> s
Session(arr)
>>> print(s.summary())
arr: sex, nat
    a test array
fixed using groups from other (compatible) axis
fixed group aggregates using groups without axis
fixed axis[another_label_group] when said group had a non-string Axis
fixed axis.group(another_label_group, name='a_name') (name was not set correctly)
fixed ipfp progress message when progress is negative
viewer¶
when setting part of an array in the console (by using e.g. arr[‘M’] = 10), display that array
when typing in the console the name of an existing array, select it in the list
fixed missing tooltips for arrays added to the session from within the session viewer
fixed window title (with axes info) not updating in many cases
fixed the filters bar not being cleared when displaying a non-LArray object after an LArray object
misc¶
improved messages in ipfp(display_progress=True)
improved tests, docstrings, …
Version 0.16.1¶
Released on 2016-11-04.
Viewer¶
renamed “Ok” button in array/session viewer to “Close”.
added apply and discard buttons in session editor, which permanently apply or discard changes to the current array.
Core¶
fixed array[sequence, scalar] = value
fixed array.to_excel() which was broken in 0.16 (by the upgrade to xlwings 0.9+).
improved a few tests
Version 0.16¶
Released on 2016-10-26.
Warning: this release needs to be installed using:
conda update larray
conda update xlwings
New features¶
implemented support for xlwings 0.9+. This allowed us to change the way we interact with Excel:
by default, the Excel instance we use is configured to be both hidden and silent (for example, it does not prompt to update/edit links).
by default, we now use a dedicated Excel instance for each call to open_excel, instead of reusing any existing instance if there was any open. In practice, it means input/output from/to Excel is more reliable and does not risk altering any workbook you had open (except if you ask for that explicitly). The cost of this is that it is slower by default. If you open many different workbooks, it is recommended that you create a single Excel instance and reuse it. This can be done with:
>>> from larray import *
>>> import xlwings as xw
>>> app = xw.App(visible=False, add_book=False)
>>> wb1 = open_excel('workbook1.xlsx', app=app)
>>> # use wb1 as before
>>> wb1.close()
>>> wb2 = open_excel('workbook2.xlsx', app=app)
>>> # use wb2 as before
>>> wb2.close()
>>> app.quit()
added ipfp function which does Iterative Proportional Fitting Procedure (also known as bi-proportional fitting in statistics or RAS algorithm in economics). Note that this new function is currently not in the core module, so it needs a specific import command:
>>> from larray.ipfp import ipfp
>>> a = Axis('a', 2)
>>> b = Axis('b', 2)
>>> initial = LArray([[2, 1],
...                   [1, 2]], [a, b])
>>> initial
a*\b* | 0 | 1
    0 | 2 | 1
    1 | 1 | 2
>>> target_sum_along_a = LArray([2, 1], b)
>>> target_sum_along_b = LArray([1, 2], a)
>>> ipfp([target_sum_along_a, target_sum_along_b], initial, threshold=0.01)
a*\b* |                  0 |                   1
    0 | 0.8450704225352113 | 0.15492957746478875
    1 | 1.1538461538461537 |  0.8461538461538463
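For readers unfamiliar with the procedure, here is a minimal two-dimensional NumPy sketch of iterative proportional fitting. It illustrates the general algorithm only: ipf is a hypothetical function, and its stopping rule and max_iter argument are assumptions of this sketch rather than a description of larray's ipfp.

```python
import numpy as np


def ipf(initial, col_target, row_target, threshold=0.01, max_iter=1000):
    """Rescale `initial` until its column and row sums match the targets."""
    result = np.asarray(initial, dtype=float).copy()
    for _ in range(max_iter):
        # fit the sums along the first axis (one target per column)
        result *= col_target / result.sum(axis=0)
        # fit the sums along the second axis (one target per row)
        result *= (row_target / result.sum(axis=1))[:, None]
        # rows now match exactly; stop once columns are close enough too
        if np.abs(result.sum(axis=0) - col_target).max() < threshold:
            break
    return result


col_target = np.array([2.0, 1.0])
row_target = np.array([1.0, 2.0])
fitted = ipf([[2, 1], [1, 2]], col_target, row_target)
```

Both sets of target sums must have the same total (3 here), otherwise the procedure cannot converge.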
made it possible to create arrays more succinctly in some usual cases (especially for quick arrays for testing purposes). Previously, when one created an array from scratch, one had to provide Axis object(s) (or another array). Note that the following examples use zeros() but this change affects all array creation functions (ones, zeros, ndrange, full, empty):
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> zeros([nat, sex])
nat\sex |   M |   F
     BE | 0.0 | 0.0
     FO | 0.0 | 0.0
Now, when you have axis names and axis labels but do not have/want to reuse an existing axis, you can use this syntax:
>>> zeros([('nat', ['BE', 'FO']),
...        ('sex', ['M', 'F'])])
nat\sex |   M |   F
     BE | 0.0 | 0.0
     FO | 0.0 | 0.0
If additionally all axis names and labels are strings (not integers or other types) which do not contain any special character (“=”, “,” or “;”) you can use:
>>> zeros('nat=BE,FO;sex=M,F')
nat\sex |   M |   F
     BE | 0.0 | 0.0
     FO | 0.0 | 0.0
See below (*) for some more alternate syntaxes and an explanation of how this works.
added additional, less error-prone syntax for stack:
>>> nat = Axis('nat', 'BE,FO')
>>> arr1 = ones(nat)
>>> arr1
nat |  BE |  FO
    | 1.0 | 1.0
>>> arr2 = zeros(nat)
>>> arr2
nat |  BE |  FO
    | 0.0 | 0.0
>>> stack([('M', arr1), ('F', arr2)], 'sex')
nat\sex |   M |   F
     BE | 1.0 | 0.0
     FO | 1.0 | 0.0
in addition to the still supported but discouraged (because one has to remember the order of labels):
>>> sex = Axis('sex', ['M', 'F'])
>>> stack((arr1, arr2), sex)
nat\sex |   M |   F
     BE | 1.0 | 0.0
     FO | 1.0 | 0.0
added LArray.compact and Session.compact() to detect and remove “useless” axes (ie axes for which values are constant over the whole axis)
>>> a = LArray([[1, 2], [1, 2]], [Axis('sex', 'M,F'), Axis('nat', 'BE,FO')])
>>> a
sex\nat | BE | FO
      M |  1 |  2
      F |  1 |  2
>>> a.compact()
nat | BE | FO
    |  1 |  2
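Detecting a “useless” axis can be sketched with NumPy by comparing the whole array to its first slice along each axis. This is an illustration of the idea on raw data; the stand-alone compact function below is hypothetical and ignores labels entirely.

```python
import numpy as np


def compact(data):
    """Drop every axis along which all slices are identical."""
    data = np.asarray(data)
    # go from the last axis to the first so that the positions of the
    # axes not yet examined stay valid after an axis is dropped
    for axis in reversed(range(data.ndim)):
        first = np.take(data, [0], axis=axis)   # keeps the axis, with length 1
        if (data == first).all():
            data = np.take(data, 0, axis=axis)  # drops the axis
    return data


compacted = compact([[1, 2],
                     [1, 2]])
```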
made Session keep the order in which arrays were added to it. The main goal was to make this work:
>>> b, a = s['b', 'a']
Previously, since sessions were always traversed alphabetically, this was a dangerous operation because if the keys (a and b) were not sorted alphabetically, the result would not be in the expected order:
s['b', 'a'] previously returned a, b instead of b, a !!
Session.names is still sorted alphabetically though (Session.keys() is not)
added LArray.with_axes(axes) to return a new LArray with the same data but different axes
>>> a = ndrange(2)
>>> a
{0}* | 0 | 1
     | 0 | 1
>>> a.with_axes([Axis('sex', 'H,F')])
sex | H | F
    | 0 | 1
changed width from which an LArray is summarized (using “…”) from 80 characters to 200.
implemented memory_used property which displays nbytes in human-readable form
>>> a = ndrange('sex=H,F;nat=BE,FO')
>>> a.memory_used
'16 bytes'
>>> a = ndrange(100000)
>>> a.memory_used
'390.62 Kb'
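The formatting shown above can be approximated by repeatedly dividing the number of bytes by 1024. The sketch below is a guess at the logic based on these two outputs only; human_size is a hypothetical function, and the 4 bytes per element are inferred from the '16 bytes' result for 4 integers.

```python
def human_size(nbytes):
    """Format a number of bytes using binary multiples (1 Kb = 1024 bytes)."""
    if nbytes < 1024:
        return f"{nbytes} bytes"
    size = float(nbytes)
    for unit in ("Kb", "Mb", "Gb", "Tb"):
        size /= 1024
        # stop at the first unit giving a value below 1024 (or at the last unit)
        if size < 1024 or unit == "Tb":
            return f"{size:.2f} {unit}"


small = human_size(4 * 4)         # 4 integers of 4 bytes each
large = human_size(100000 * 4)    # 100000 integers of 4 bytes each
```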
implemented Axis + AxisCollection
>>> a = ndrange('sex=M,F;type=t1,t2')
>>> Axis('nat', 'BE,FO') + a.axes
AxisCollection([
    Axis('nat', ['BE', 'FO']),
    Axis('sex', ['M', 'F']),
    Axis('type', ['t1', 't2'])
])
(*) For the curious, there are also many syntaxes supported for array creation functions. In fact, during array creation, at any place a list or tuple of values is expected, you can specify it using a single string, which will be split successively at the following characters if present: “;” then “=” then “,”. If you apply that algorithm to ‘nat=BE,FO;sex=M,F’, you get:
‘nat=BE,FO;sex=M,F’
(‘nat=BE,FO’, ‘sex=M,F’)
((‘nat’, ‘BE,FO’), (‘sex’, ‘M,F’))
((‘nat’, (‘BE’, ‘FO’)), (‘sex’, (‘M’, ‘F’)))
Recognise this last syntax? This is the same as above, except above we replaced some () with [] for clarity. In fact all the intermediate forms here above are valid (and equivalent) in array creation functions.
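The successive splitting described above can be written as a short recursive function. This is a sketch of the algorithm as explained in this note, not the library's code; split_nested is a hypothetical name.

```python
def split_nested(value, separators=(';', '=', ',')):
    """Split `value` at the first separator present, then recurse on the parts."""
    if not separators:
        return value
    sep, remaining = separators[0], separators[1:]
    if sep in value:
        return tuple(split_nested(part, remaining) for part in value.split(sep))
    # this separator is absent: try the next one on the same string
    return split_nested(value, remaining)


axes_def = split_nested('nat=BE,FO;sex=M,F')
```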
Version 0.15¶
Released on 2016-09-23.
Core¶
added new methods on axes: matches, startswith, endswith
>>> country = Axis('country', ['FR', 'BE', 'DE', 'BR'])
>>> country.matches('BE|FR')
LGroup(['FR', 'BE'])
>>> country.matches('^..$') # labels 2 characters long
LGroup(['FR', 'BE', 'DE', 'BR'])
>>> country.startswith('B')
LGroup(['BE', 'BR'])
>>> country.endswith('R')
LGroup(['FR', 'BR'])
implemented set-like operations on LGroup: & (intersection), | (union), - (difference). Slice groups do not work yet on axes references (x.) but that will come in the future…
>>> alpha = Axis('alpha', 'a,b,c,d')
>>> alpha['a', 'b'] | alpha['c', 'd']
LGroup(['a', 'b', 'c', 'd'], axis=…)
>>> alpha['a', 'b', 'c'] | alpha['c', 'd']
LGroup(['a', 'b', 'c', 'd'], axis=…)
a name is computed automatically when both operands are named
>>> r = alpha['a', 'b'].named('ab') | alpha['c', 'd'].named('cd')
>>> r.name
'ab | cd'
>>> r.key
['a', 'b', 'c', 'd']
numeric axes work too
>>> num = Axis('num', range(10))
>>> num[:2] | num[8:]
num[0, 1, 2, 8, 9]
>>> num[:2] | num[5]
num[0, 1, 2, 5]
intersection
>>> LGroup(['a', 'b', 'c']) & LGroup(['c', 'd'])
LGroup(['c'])
difference
>>> LGroup(['a', 'b', 'c']) - LGroup(['c', 'd'])
LGroup(['a', 'b'])
>>> LGroup(['a', 'b', 'c']) - 'b'
LGroup(['a', 'c'])
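Unlike Python's built-in set, these operations keep the labels in their order of first appearance. A plain-Python sketch of such order-preserving operations on lists of labels (hypothetical helper functions, independent of LGroup):

```python
def union(first, second):
    # dict keys are unique and keep their insertion order
    return list(dict.fromkeys([*first, *second]))


def intersection(first, second):
    return [label for label in first if label in second]


def difference(first, second):
    return [label for label in first if label not in second]


both = union(['a', 'b', 'c'], ['c', 'd'])
common = intersection(['a', 'b', 'c'], ['c', 'd'])
only_first = difference(['a', 'b', 'c'], ['c', 'd'])
```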
fixed loading 1D arrays using open_excel
Viewer¶
added tooltip with the axes labels corresponding to each cell of the array viewer
added name and dimensions of the current array to the window title bar in the session viewer
added tooltip with each array .info() in the list of arrays of the session viewer
fixed eval box throwing an exception when trying to set a new variable (if qtconsole is not present)
fixed group aggregates using LGroups defined using axes references (x.), for example:
>>> arr.sum(x.age[:10])
fixed group aggregates using anonymous axes
Version 0.14.1¶
Released on 2016-08-12.
Fixes¶
fixed support for loading arrays without axis names from Excel files (in that case index_col/nb_index are necessary)
fixed using a single int for index_col in read_excel() and sheet.load()
fixed loading empty Excel sheets via xlwings correctly (ie do not crash)
fixed dumping a session loaded from an H5 file to Excel
Version 0.14¶
Released on 2016-08-10.
Important warning¶
This version is not compatible with the new version of xlwings that just came out. Consequently, upgrading to this version is different from the usual “conda update larray”. You should rather use:
conda update larray --no-update-deps
To get the most out of this release, you should also install the “qtconsole” package via:
conda install qtconsole
Viewer¶
upgraded session viewer/editor to work like a super-calculator. The input box below the array view can be used to type any expression, e.g. array1.sum(x.age) / array2, which will be displayed in the viewer. One can also type assignment commands, like array3 = array1.sum(x.age) / array2, in which case the new array will be displayed in the viewer AND added to the session (it appears in the list on the left), so that you can use it in other expressions.
If you have the “qtconsole” package installed (see above), that input box will be a full ipython console. This means:
history of typed commands,
tab-completion (for example, type “nd<tab>” and it will change to “ndrange”),
syntax highlighting,
calltips (show the documentation of functions when typing commands using them),
help on functions using “?”. For example, type “ndrange?<enter>” to get the full documentation about ndrange (use <ESC> or <q> to quit that screen),
etc.
When having the “qtconsole” package installed, you might get a warning when starting the viewer:
WARNING:root:Message signing is disabled. This is insecure and not recommended!
This is totally harmless and can be safely ignored !
made view() and edit() without argument equivalent to view(local_arrays()) and edit(local_arrays()) respectively.
made the viewer on large arrays start a lot faster by using a small subset of the array to guess the number of decimals to display and whether or not to use scientific notation.
improved compare():
added support for comparing sessions. Arrays with differences between sessions are colored in red.
use a single array widget instead of 3. This is done by stacking arrays together to create a new dimension. This has the following advantages:
the filter and scrollbars are de-facto automatically synchronized.
any number of arrays can be compared, not just 2. All arrays are compared to the first one.
arrays with different sets of compatible axes can be compared (eg compare an array with its mean along an axis).
added label to show maximum absolute difference.
implemented edit(session) in addition to view(session).
Excel support¶
added support for copying sheets via wb['x'] = wb['y']. If the 'x' sheet already existed, it is completely overwritten.
Core¶
improved performance. My test models run about 10% faster than with 0.13.
made cumsum and cumprod aggregate on the last axis by default so that the axis does not need to be specified when there is only one.
implemented much better support for operations using arrays of different types. For example,
fixed create_sequential when mult, inc and initial are of different types eg create_sequential(…, initial=1, inc=0.1) had an unexpected integer result because it always used the type of the initial value for the output
when appending a string label to an integer axis (eg adding total to an age axis by using with_total()), the resulting axis should have a mixed type, and not be suddenly all string.
stack() now supports arrays with different types.
made stack support arrays with different axes (the result has the union of all axes)
For completeness¶
use xlwings (ie live Excel instance) by default for all Excel input/output, including read_excel(), session.dump and session.load/Session(filename). This has the advantage of more coherent results among the different ways to load/save data to Excel and that simple sessions correctly survive a round-trip to an .xlsx workbook (ie (named) axes are detected properly). However, given the very different library involved, we lose most options that read_excel used to provide (courtesy of pandas.read_excel) and some bugs were probably introduced in the conversion.
fixed creating a new file via open_excel()
fixed loading 1D arrays (ranges with height 1 or width 1) via open_excel()
fixed sheet[‘A1’] = array in some cases
wb.close() only really closes the workbook if it was not already open in Excel when open_excel was called (so that we do not close a workbook a user is actually viewing).
added support for wb.save(filename), or actually for using any relative path, instead of a full absolute path.
when dumping a session to Excel, sort sheets alphabetically instead of dumping them in a “random” order.
try to convert float to int in more situations
added support for using stack() without providing an axis. It creates an anonymous wildcard axis of the correct length.
added aslarray() top-level function to translate anything into an LArray if it is not already one
made labels_array available via from larray import *
fixed binary operations between an array and an axis where the array appeared first (eg array > axis). Confusingly, axis < array already worked.
added check in “a[bool_larray_key]” to make sure key.axes are compatible with a.axes
made create_sequential a lot faster when mult or inc are constants
made axes without name compatible with any name (this is the equivalent of a wildcard name for labels)
misc cleanup/docstring improvements/improved tests/improved error messages
Version 0.13¶
Released on 2016-07-11.
New features¶
implemented a new way to do input/output from/to Excel
>>> a = ndrange((2, 3))
>>> wb = open_excel('c:/tmp/y.xlsx')
>>> # put a at A1 in Sheet1, excluding headers (labels)
>>> wb['Sheet1'] = a
>>> # dump a at A1 in Sheet2, including headers (labels)
>>> wb['Sheet2'] = a.dump()
>>> # save the file to disk
>>> wb.save()
>>> # close it
>>> wb.close()
>>> wb = open_excel('c:/tmp/y.xlsx')
>>> # load a from the data starting at A1 in Sheet1, assuming the absence of headers.
>>> a1 = wb['Sheet1']
>>> # load a from the data starting at A1 in Sheet1, assuming the presence of (correctly formatted) headers.
>>> a2 = wb['Sheet2'].load()
>>> wb.close()
>>> wb = open_excel('c:/tmp/y.xlsx')
>>> # note that Sheet2 must exist
>>> sheet2 = wb['Sheet2']
>>> # write a without labels starting at C5
>>> sheet2['C5'] = a
>>> # write a with its labels starting at A10
>>> sheet2['A10'] = a.dump()
load an array with its axes information from a range. As you might have guessed, we could also use the sheet2 variable here
>>> b = wb['Sheet2']['A10:D12'].load()
>>> b
{0}*\{1}* | 0 | 1 | 2
        0 | 0 | 1 | 2
        1 | 3 | 4 | 5
load an array (raw data) with no axis information from a range.
>>> c = sheet2['B11:D12']
>>> # in fact, this is not really an LArray ...
>>> c
<larray.excel.Range at 0x1ff1bae22e8>
>>> # but it can be used as such (this is currently very experimental)
>>> c.sum(axis=0)
{0}* |   0 |   1 |   2
     | 3.0 | 5.0 | 7.0
>>> # ... and it can be used for other stuff, like setting the formula instead of the value:
>>> c.formula = '=D10+1'
>>> # in the future, we should also be able to set font name, size, style, etc.
implemented LArray.rename({axis: new_name}) as well as using kwargs to rename several axes at once
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> a = ndrange([nat, sex])
>>> a
nat\sex | M | F
     BE | 0 | 1
     FO | 2 | 3
>>> a.rename(nat='nat2', sex='gender')
nat2\gender | M | F
         BE | 0 | 1
         FO | 2 | 3
>>> a.rename({'nat': 'nat2', 'sex': 'gender'})
nat2\gender | M | F
         BE | 0 | 1
         FO | 2 | 3
made tab-completion of axes names possible in an interactive console
For completeness¶
taking a subset of an array with wildcard axes now returns an array with wildcard axes
fixed a case where wildcard axes were considered incompatible when they actually were compatible
better support for anonymous axes
fix for obscure bugs, better doctests, cleaner implementation for a few functions, …
Version 0.12¶
Released on 2016-06-21.
New features¶
implemented boolean indexing by using axes objects:
>>> sex = Axis('sex', 'M,F')
>>> age = Axis('age', range(5))
>>> a = ndrange((sex, age))
>>> a
sex\age | 0 | 1 | 2 | 3 | 4
      M | 0 | 1 | 2 | 3 | 4
      F | 5 | 6 | 7 | 8 | 9
>>> a[age < 3]
sex\age | 0 | 1 | 2
      M | 0 | 1 | 2
      F | 5 | 6 | 7
This new syntax is equivalent to (but currently much slower than):
>>> a[age[:2]]
sex\age | 0 | 1 | 2
      M | 0 | 1 | 2
      F | 5 | 6 | 7
However, the power of this new syntax comes from the fact that you are not limited to scalar constants
>>> age_limit = LArray([2, 3], sex) >>> age_limit sex | M | F | 2 | 3
>>> a[age < age_limit] sex,age | M,0 | M,1 | F,0 | F,1 | F,2 | 0 | 1 | 5 | 6 | 7
Notice that the concerned axes are merged, so you cannot do much as much with them. For example, a[age < age_limit].sum(x.age) would not work since there is no “age” axis anymore.
To keep axes intact, one can often set the values of the corresponding cells to 0 or nan instead.
>>> a[age < age_limit] = 0
>>> a
sex\age | 0 | 1 | 2 | 3 | 4
      M | 0 | 0 | 2 | 3 | 4
      F | 0 | 0 | 0 | 8 | 9
>>> # in this case, the sum *is* valid (but the mean would not -- one should use nan for that)
>>> a.sum(x.age)
sex | M |  F
    | 9 | 17
To keep axes intact, this idiom is also often useful:
>>> b = a * (age >= age_limit)
>>> b
sex\age | 0 | 1 | 2 | 3 | 4
      M | 0 | 0 | 2 | 3 | 4
      F | 0 | 0 | 0 | 8 | 9
This also works with axes references (x.axis_name), though this is experimental and the filter value is only computed as late as possible (during []), so you cannot display it before that, like you can with “real” axes.
Using “real” axes:
>>> filter1 = age < age_limit
>>> filter1
age\sex |     M |     F
      0 |  True |  True
      1 |  True |  True
      2 | False |  True
      3 | False | False
      4 | False | False
>>> a[filter1]
sex,age | M,0 | M,1 | F,0 | F,1 | F,2
        |   0 |   1 |   5 |   6 |   7
With axes references:
>>> filter2 = x.age < age_limit
>>> filter2
<larray.core.BinaryOp at 0x1332ae3b588>
>>> a[filter2]
sex,age | M,0 | M,1 | F,0 | F,1 | F,2
        |   0 |   1 |   5 |   6 |   7
>>> a * ~filter2
sex\age | 0 | 1 | 2 | 3 | 4
      M | 0 | 0 | 2 | 3 | 4
      F | 0 | 0 | 0 | 8 | 9
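The difference between filtering (which merges the axes) and masking (which keeps them) can be sketched in plain Python, without larray. This is only an illustration: the dict-of-lists layout and the names data, filtered and masked are assumptions of this sketch, not part of the library.

```python
# Plain-Python sketch (not larray code): filtering vs. masking with a per-sex age limit.
ages = range(5)
data = {'M': [0, 1, 2, 3, 4], 'F': [5, 6, 7, 8, 9]}
age_limit = {'M': 2, 'F': 3}

# Filtering keeps only the matching cells, so the result is a flat collection of
# (sex, age) points and no longer has a separate "age" axis to aggregate on.
filtered = {(sex, age): data[sex][age]
            for sex in data for age in ages if age < age_limit[sex]}

# Masking multiplies by the negated condition instead, so the shape is preserved.
masked = {sex: [value * (age >= age_limit[sex]) for age, value in zip(ages, row)]
          for sex, row in data.items()}

print(filtered)  # {('M', 0): 0, ('M', 1): 1, ('F', 0): 5, ('F', 1): 6, ('F', 2): 7}
print(masked)    # {'M': [0, 0, 2, 3, 4], 'F': [0, 0, 0, 8, 9]}
```

Because masked keeps one list per sex, a per-sex sum is still possible, which mirrors why the zero-filling idiom above keeps a.sum(x.age) valid.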
implemented LArray.divnot0
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> a = ndrange((nat, sex))
>>> a
nat\sex | M | F
     BE | 0 | 1
     FO | 2 | 3
>>> b = ndrange(sex)
>>> b
sex | M | F
    | 0 | 1
>>> a / b
nat\sex |   M |   F
     BE | nan | 1.0
     FO | inf | 3.0
>>> a.divnot0(b)
nat\sex |   M |   F
     BE | 0.0 | 1.0
     FO | 0.0 | 3.0
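Judging from the output above, divnot0 yields 0 wherever the divisor is 0. A minimal plain-Python sketch of that rule follows; it works on lists rather than arrays and is not larray's implementation (the function below merely borrows the name for readability).

```python
# Plain-Python sketch (not larray code): divide, but yield 0.0 where the divisor is 0.
def divnot0(numerators, divisors):
    return [n / d if d != 0 else 0.0 for n, d in zip(numerators, divisors)]

b = [0, 1]                       # divisor, one value per sex
row_be = divnot0([0, 1], b)      # row "BE" of the example above
row_fo = divnot0([2, 3], b)      # row "FO" of the example above
print(row_be, row_fo)            # [0.0, 1.0] [0.0, 3.0]
```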
implemented .named() on groups to name groups after the fact
>>> a = ndrange(Axis('age', range(100)))
>>> a
age | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99
>>> a.sum((x.age[10:19].named('teens'), x.age[20:29].named('twenties')))
age | 'teens' (10:19) | 'twenties' (20:29)
    |             145 |                245
made all array creation functions (ndrange, zeros, ones, full, LArray, …) more flexible:
They accept a single Axis argument instead of requiring a tuple/list of them
>>> sex = Axis('sex', 'M,F')
>>> a = ndrange(sex)
>>> a
sex | M | F
    | 0 | 1
Shortcut definitions for axes work
>>> ndrange("a,b,c")
{0} | a | b | c
    | 0 | 1 | 2
>>> ndrange(["1:3", "d,e"])
{0}\{1} | d | e
      1 | 0 | 1
      2 | 2 | 3
      3 | 4 | 5
>>> LArray([1, 5, 7], "a,b,c")
{0} | a | b | c
    | 1 | 5 | 7
One can mix Axis objects and ints (for axes without labels)
>>> sex = Axis('sex', 'M,F')
>>> ndrange([sex, 3])
sex\{1}* | 0 | 1 | 2
       M | 0 | 1 | 2
       F | 3 | 4 | 5
made it possible to iterate on labels of a group (eg a slice of an Axis):
>>> for year in a.axes.year[2010:]:
...     # do stuff
changed representation of anonymous axes from “axisN” (where N is the position of the axis) to “{N}”. The problem was that “axisN” was not recognizable enough as an anonymous axis, and it was thus misleading. For example “a[x.axis0[…]]” would not work.
better overall support for arrays with anonymous axes or several axes with the same name
fixed all output functions (to_csv, to_excel, to_hdf, …) when the last axis has no name but other axes have one
implemented eye() which creates 2D arrays with ones on the diagonal and zeros elsewhere.
>>> eye(sex)
sex\sex |   M |   F
      M | 1.0 | 0.0
      F | 0.0 | 1.0
implemented the @ operator to do matrix multiplication (Python3.5+ only)
implemented inverse() to return the (matrix) inverse of a (square) 2D array
>>> a = eye(sex) * 2
>>> a
sex\sex |   M |   F
      M | 2.0 | 0.0
      F | 0.0 | 2.0
>>> a @ inverse(a)
sex\sex |   M |   F
      M | 1.0 | 0.0
      F | 0.0 | 1.0
implemented diag() to extract a diagonal or construct a diagonal array.
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> a = ndrange([nat, sex], start=1)
>>> a
nat\sex | M | F
     BE | 1 | 2
     FO | 3 | 4
>>> d = diag(a)
>>> d
nat,sex | BE,M | FO,F
        |    1 |    4
>>> diag(d)
nat\sex | M | F
     BE | 1 | 0
     FO | 0 | 4
>>> a = ndrange(sex, start=1)
>>> a
sex | M | F
    | 1 | 2
>>> diag(a)
sex\sex | M | F
      M | 1 | 0
      F | 0 | 2
For completeness¶
added Axis.rename method, which returns a copy of the axis with a different name, and deprecated Axis._rename
added labels_array as a generalized version of identity (which is deprecated)
implemented LArray.ipoints[…] to do point selection using coordinates instead of labels (aka numpy indexing)
raise an error when trying to do a[key_with_more_axes_than_a] = value instead of silently ignoring extra axes.
allow using a single int for index_col in read_csv in addition to a list of ints
implemented __getitem__ for “x”. You can now write stuff like:
>>> a = ndrange((3, 4))
>>> a[x[0][1:]]
{0}\{1}* | 0 | 1 |  2 |  3
       1 | 4 | 5 |  6 |  7
       2 | 8 | 9 | 10 | 11
>>> a[x[1][2:]]
{0}*\{1} |  2 |  3
       0 |  2 |  3
       1 |  6 |  7
       2 | 10 | 11
>>> a.sum(x[0])
{0}* |  0 |  1 |  2 |  3
     | 12 | 15 | 18 | 21
produce normal axes instead of wildcard axes on LArray.points[…]. This is (much) slower but more correct/informative.
changed the way we store axes internally, which has several consequences
better overall support for anonymous axes
better support for arrays with several axes with the same name
small performance improvement
the same axis object cannot be added twice in an array (one should use axis.copy() if that need arises)
changes the way groups with an axis are displayed
fixed sum, min, max functions on non-LArray arguments
changed __repr__ for wildcard axes to not display their labels but their length
>>> ndrange(3).axes[0]
Axis(None, 3)
fixed aggregates on several groups “forgetting” the name of groups which had been created using axis.all()
allow Axis(…, long) in addition to int (Python2 only)
better docstrings/tests/comments/error messages/thoughts/…
Version 0.11.1¶
Released on 2016-05-25.
Fixes¶
fixed new functions full, full_like and create_sequential not being available when using from larray import *
Version 0.11¶
Released on 2016-05-25.
Viewer¶
implemented “Copy to Excel” in context menu (Ctrl+E), to open the selection in a new Excel sheet directly, without the need to use paste. If nothing is selected, copies the whole array.
when nothing is selected, Ctrl+C selects & copies the whole array to the clipboard.
when nothing is selected, Ctrl+V pastes at the top-left corner
implemented view(dict_with_array_values)
>>> view({'a': array1, 'b': array2})
fixed copy (ctrl-C) when viewing a 2D array: it did not include labels from the first axis in that case
Core¶
implemented LArray.growth_rate to compute the growth along an axis
>>> sex = Axis('sex', ['M', 'F'])
>>> year = Axis('year', [2015, 2016, 2017])
>>> a = ndrange([sex, year]).cumsum(x.year)
>>> a
sex\year | 2015 | 2016 | 2017
       M |    0 |    1 |    3
       F |    3 |    7 |   12
>>> a.growth_rate()
sex\year |          2016 |           2017
       M |           inf |            2.0
       F | 1.33333333333 | 0.714285714286
>>> a.growth_rate(d=2)
sex\year | 2017
       M |  inf
       F |  3.0
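As a rough illustration of the numbers above, the following plain-Python sketch computes the same growth rates for one row at a time. It is not larray's implementation: the list-based interface and the explicit handling of a zero previous value are assumptions of this sketch; only the d argument is taken from the example.

```python
import math

# Plain-Python sketch (not larray code): relative change between each value
# and the value d positions earlier.
def growth_rate(values, d=1):
    rates = []
    for prev, cur in zip(values, values[d:]):
        change = cur - prev
        if prev != 0:
            rates.append(change / prev)
        elif change == 0:
            rates.append(math.nan)                        # 0 / 0
        else:
            rates.append(math.copysign(math.inf, change))  # x / 0
    return rates

print(growth_rate([0, 1, 3]))        # row M: [inf, 2.0]
print(growth_rate([3, 7, 12], d=2))  # row F: [3.0]
```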
implemented LArray.diff (difference along an axis)
>>> sex = Axis('sex', ['M', 'F'])
>>> xtype = Axis('type', ['type1', 'type2', 'type3'])
>>> a = ndrange([sex, xtype]).cumsum(x.type)
>>> a
sex\type | type1 | type2 | type3
       M |     0 |     1 |     3
       F |     3 |     7 |    12
>>> a.diff()
sex\type | type2 | type3
       M |     1 |     2
       F |     4 |     5
>>> a.diff(n=2)
sex\type | type3
       M |     1
       F |     1
>>> a.diff(x.sex)
sex\type | type1 | type2 | type3
       F |     3 |     6 |     9
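The outputs above are consistent with n meaning "apply the difference n times". A small plain-Python sketch of that reading, working on one row as a list (an assumption of this sketch, not larray's implementation):

```python
# Plain-Python sketch (not larray code): repeated first-order differences.
def diff(values, n=1):
    for _ in range(n):
        values = [cur - prev for prev, cur in zip(values, values[1:])]
    return values

print(diff([0, 1, 3]))        # row M: [1, 2]
print(diff([3, 7, 12]))       # row F: [4, 5]
print(diff([3, 7, 12], n=2))  # row F: [1]
```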
implemented round() (as a nicer alias to around() and round_())
>>> a = ndrange(5) + 0.5
>>> a
axis0 |   0 |   1 |   2 |   3 |   4
      | 0.5 | 1.5 | 2.5 | 3.5 | 4.5
>>> round(a)
axis0 |   0 |   1 |   2 |   3 |   4
      | 0.0 | 2.0 | 2.0 | 4.0 | 4.0
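Note that the output above rounds halves to the nearest even value (0.5 gives 0.0, 2.5 gives 2.0). Python's built-in round() follows the same rule, which the snippet below shows without larray; the list values only mirrors the example data.

```python
# Plain-Python illustration of round-half-to-even, using the built-in round().
values = [0.5, 1.5, 2.5, 3.5, 4.5]
rounded = [float(round(v)) for v in values]
print(rounded)  # [0.0, 2.0, 2.0, 4.0, 4.0]
```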
implemented Session[['list', 'of', 'str']] to get a subset of a Session
>>> s = Session({'a': ndrange(3), 'b': ndrange(4), 'c': ndrange(5)})
>>> s
Session(a, b, c)
>>> s['a', 'c']
Session(a, c)
implemented LArray.points to do pointwise indexing instead of the default orthogonal indexing when indexing several dimensions at the same time.
>>> a = Axis('a', ['a1', 'a2', 'a3'])
>>> b = Axis('b', ['b1', 'b2', 'b3'])
>>> arr = ndrange((a, b))
>>> arr
a\b | b1 | b2 | b3
 a1 |  0 |  1 |  2
 a2 |  3 |  4 |  5
 a3 |  6 |  7 |  8
>>> arr[['a1', 'a3'], ['b1', 'b2']]
a\b | b1 | b2
 a1 |  0 |  1
 a3 |  6 |  7
# this selects the points ('a1', 'b1') and ('a3', 'b2')
>>> arr.points[['a1', 'a3'], ['b1', 'b2']]
a,b* | 0 | 1
     | 0 | 7
Note that .ipoints (to do pointwise indexing with positions instead of labels – aka numpy indexing) is planned but not functional yet.
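The two indexing modes described above can be contrasted in plain Python. This is a sketch only: the dict keyed by label pairs stands in for a 2D array and is an assumption of this example, not how larray stores data.

```python
# Plain-Python sketch (not larray code): orthogonal vs. pointwise indexing.
a_labels = ['a1', 'a2', 'a3']
b_labels = ['b1', 'b2', 'b3']
arr = {(a, b): i * len(b_labels) + j
       for i, a in enumerate(a_labels) for j, b in enumerate(b_labels)}

a_key, b_key = ['a1', 'a3'], ['b1', 'b2']

# Orthogonal indexing: every combination of the selected labels (a 2 x 2 block).
orthogonal = [[arr[a, b] for b in b_key] for a in a_key]

# Pointwise indexing: the keys are paired up, giving one cell per pair.
pointwise = [arr[a, b] for a, b in zip(a_key, b_key)]

print(orthogonal)  # [[0, 1], [6, 7]]
print(pointwise)   # [0, 7]
```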
made “arr1.drop_labels() * arr2” use the labels from arr2 if any
>>> a = Axis('a', ['a1', 'a2'])
>>> b = Axis('b', ['b1', 'b2'])
>>> b2 = Axis('b', ['b2', 'b3'])
>>> arr1 = ndrange([a, b])
>>> arr1
a\b | b1 | b2
 a1 |  0 |  1
 a2 |  2 |  3
>>> arr1.drop_labels(b)
a\b* | 0 | 1
  a1 | 0 | 1
  a2 | 2 | 3
>>> arr1.drop_labels([a, b])
a*\b* | 0 | 1
    0 | 0 | 1
    1 | 2 | 3
>>> arr2 = ndrange([a, b2])
>>> arr2
a\b | b2 | b3
 a1 |  0 |  1
 a2 |  2 |  3
>>> arr1 * arr2
Traceback (most recent call last):
...
ValueError: incompatible axes: Axis('b', ['b2', 'b3']) vs Axis('b', ['b1', 'b2'])
>>> arr1 * arr2.drop_labels()
a\b | b1 | b2
 a1 |  0 |  1
 a2 |  4 |  9
# in versions < 0.11, it used to return:
# >>> arr1.drop_labels() * arr2
# a*\b* | 0 | 1
#     0 | 0 | 1
#     1 | 2 | 3
>>> arr1.drop_labels() * arr2
a\b | b2 | b3
 a1 |  0 |  1
 a2 |  4 |  9
>>> arr1.drop_labels('a') * arr2.drop_labels('b')
a\b | b1 | b2
 a1 |  0 |  1
 a2 |  4 |  9
made .plot a property, like in Pandas, so that we can do stuff like:
>>> a.plot.bar()
# instead of
>>> a.plot(kind='bar')
made labels from different types not match against each other even if their value is the same. This might break some code but it is both more efficient and more convenient in some cases, so let us see how it goes:
>>> a = ndrange(4)
>>> a
axis0 | 0 | 1 | 2 | 3
      | 0 | 1 | 2 | 3
>>> a[1]
1
>>> # This used to "work" (and return 1)
>>> a[True]
…
ValueError: True is not a valid label for any axis
>>> a[1.0]
…
ValueError: 1.0 is not a valid label for any axis
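The reason this needs special care is that, in Python, 1, 1.0 and True compare equal and hash identically, so a plain dict lookup cannot tell them apart. The sketch below shows one way to make a label lookup type-strict; it is an illustration only, and the names loose, strict and find are hypothetical, not larray internals.

```python
# Plain-Python sketch (not larray code): type-strict label lookup.
labels = [0, 1, 2, 3]

# A plain dict treats 1, 1.0 and True as the same key because they compare equal.
loose = {label: pos for pos, label in enumerate(labels)}

# Keying on (type, value) makes the lookup sensitive to the label type.
strict = {(type(label), label): pos for pos, label in enumerate(labels)}

def find(label):
    try:
        return strict[type(label), label]
    except KeyError:
        raise ValueError(f"{label!r} is not a valid label for any axis") from None

print(loose[True], loose[1.0])  # 1 1
print(find(1))                  # 1
```

With this layout, find(True) and find(1.0) both raise ValueError, matching the behaviour shown above.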
implemented read_csv(dialect='liam2') to read .csv files formatted like in LIAM2 (with the axes names on a separate line from the last axis labels)
implemented Session[boolean LArray]
>>> a = ndrange(3)
>>> b = ndrange(4)
>>> s1 = Session({'a': a, 'b': b})
>>> s2 = Session({'a': a + 1, 'b': b})
>>> s1 == s2
name |     a |    b
     | False | True
>>> s1[s1 == s2]
Session(b)
>>> s1[s1 != s2]
Session(a)
implemented experimental support for creating an array sequentially. Comments on the name of the function and syntax (especially compared to ndrange) would be appreciated.
>>> year = Axis('year', range(2016, 2020))
>>> sex = Axis('sex', ['M', 'F'])
>>> create_sequential(year)
year | 2016 | 2017 | 2018 | 2019
     |    0 |    1 |    2 |    3
>>> create_sequential(year, 1.0, 0.1)
year | 2016 | 2017 | 2018 | 2019
     |  1.0 |  1.1 |  1.2 |  1.3
>>> create_sequential(year, 1.0, mult=1.1)
year | 2016 | 2017 | 2018 |  2019
     |  1.0 |  1.1 | 1.21 | 1.331
>>> inc = LArray([1, 2], [sex])
>>> inc
sex | M | F
    | 1 | 2
>>> create_sequential(year, 1.0, inc)
sex\year | 2016 | 2017 | 2018 | 2019
       M |  1.0 |  2.0 |  3.0 |  4.0
       F |  1.0 |  3.0 |  5.0 |  7.0
>>> mult = LArray([2, 3], [sex])
>>> mult
sex | M | F
    | 2 | 3
>>> create_sequential(year, 1.0, mult=mult)
sex\year | 2016 | 2017 | 2018 | 2019
       M |  1.0 |  2.0 |  4.0 |  8.0
       F |  1.0 |  3.0 |  9.0 | 27.0
>>> initial = LArray([3, 4], [sex])
>>> initial
sex | M | F
    | 3 | 4
>>> create_sequential(year, initial, inc, mult)
sex\year | 2016 | 2017 | 2018 | 2019
       M |    3 |    7 |   15 |   31
       F |    4 |   14 |   44 |  134
>>> def modify(prev_value):
...     return prev_value / 2
>>> create_sequential(year, 8, func=modify)
year | 2016 | 2017 | 2018 | 2019
     |    8 |    4 |    2 |    1
>>> create_sequential(3)
axis0* | 0 | 1 | 2
       | 0 | 1 | 2
>>> create_sequential(x.year, axes=(sex, year))
sex\year | 2016 | 2017 | 2018 | 2019
       M |    0 |    1 |    2 |    3
       F |    0 |    1 |    2 |    3
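The scalar cases above are consistent with the recurrence value[i] = value[i - 1] * mult + inc. The sketch below reproduces them in plain Python under that reading. It is not the library's implementation: the function name sequence, the length argument, and the default rule for inc (1, or 0 when a multiplier is given) are assumptions inferred from the outputs shown, and the func and array-valued variants are not covered.

```python
# Plain-Python sketch (not larray code) of the recurrence behind the examples above.
def sequence(length, initial=0, inc=None, mult=1):
    if inc is None:
        # Assumption inferred from the outputs above: the increment defaults
        # to 1, unless a multiplier is given, in which case it defaults to 0.
        inc = 1 if mult == 1 else 0
    values = [initial]
    for _ in range(length - 1):
        values.append(values[-1] * mult + inc)
    return values

print(sequence(4))                   # [0, 1, 2, 3]
print(sequence(4, 1.0, mult=2))      # [1.0, 2.0, 4.0, 8.0]
print(sequence(4, 3, inc=1, mult=2)) # [3, 7, 15, 31]   (row M of the last 2D example)
print(sequence(4, 4, inc=2, mult=3)) # [4, 14, 44, 134] (row F of the last 2D example)
```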
implemented full and full_like to create arrays initialized to something other than zeros or ones
>>> nat = Axis('nat', ['BE', 'FO'])
>>> sex = Axis('sex', ['M', 'F'])
>>> full([nat, sex], 42.0)
nat\sex |    M |    F
     BE | 42.0 | 42.0
     FO | 42.0 | 42.0
>>> initial_value = ndrange([sex])
>>> initial_value
sex | M | F
    | 0 | 1
>>> full([nat, sex], initial_value)
nat\sex | M | F
     BE | 0 | 1
     FO | 0 | 1
performance improvements when using label keys: a[key] is faster, especially if key is large
Fixes¶
to_excel(filepath) only closes the file if it was not open before
removed code which forced labels from .csv files to be strings (as it caused problems in many cases, e.g. ages in LIAM2 files)
Misc. stuff for completeness¶
made LGroups usable in Python’s builtin range() and convertible to int and float
implemented AxisCollection.union (equivalent to AxisCollection | Axis)
fixed boolean array keys (boolean filter) in combination with scalar keys (for other dimensions)
fixed support for older numpy
fixed LArray.shift(n=0)
still more work on making arrays with anonymous axes usable (not there yet)
added more tests
better docstrings/error messages…
misc. code cleanup/simplification/improved comments
Version 0.10.1¶
Released on 2016-03-25.
New features¶
A single change in this release: a much more powerful to_excel function which (by default) uses Excel itself to write files. Additional functionality includes:
write in an existing file without overwriting existing data/sheet/…
write at a precise position
view an array in a live Excel instance (a new OR an existing workbook)
See the to_excel() documentation for details.
Version 0.10¶
Released on 2016-03-22.
Core¶
implemented dropna argument for to_csv, to_frame and to_series to avoid writing lines with either ‘all’ or ‘any’ NA values.
implemented read_sas. Needs pandas >= 0.18 (though it seems still buggy on some files).
implemented experimental support for __getattr__ and __setattr__ on LArray. One can use arr.H instead of arr['H']. It only works for single string labels though (not for slices or list of labels nor integer labels). Not sure it is a good idea :).
implemented Session +-*/
E.g. sess1 - sess2 will compute the difference on each array present in either session. If an array is present in one session and not in the other, it is replaced by “NaN”.
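The rule described above (operate on the union of names, use NaN where one side is missing) can be sketched with plain dicts of lists standing in for sessions of arrays. The function name subtract_sessions and the data layout are hypothetical and only illustrate the rule; this is not the Session implementation.

```python
import math

# Plain-Python sketch (not larray code): subtraction over the union of names.
def subtract_sessions(sess1, sess2):
    result = {}
    for name in sorted(set(sess1) | set(sess2)):
        if name in sess1 and name in sess2:
            result[name] = [x - y for x, y in zip(sess1[name], sess2[name])]
        else:
            # present in only one session: replaced by NaN
            result[name] = math.nan
    return result

diff = subtract_sessions({'a': [1, 2], 'b': [3]}, {'a': [1, 1]})
print(diff)  # {'a': [0, 1], 'b': nan}
```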
added .nbytes property to LArray objects (to know how many bytes of memory the array uses)
made sort_axis accept a tuple of axes
raises an error on a.i[tuple_with_len_greater_than_array_ndim]
slightly better support for axes with no name (no, still no complete support yet ;-))
improved AxisCollection: implemented __delitem__(slice), __setitem__(list), __setitem__(slice)
fixed exception on AxisCollection.index(invalid_index)
better docstrings for a few functions
misc code cleanups, refactoring & improved tests
Editor¶
added .dirty property on ArrayEditorWidget
fixed viewing arrays with “inf” (infinite)
fixed a few edge cases for the ndigit detection code
fixed colors in some cases in edit()
made copy-paste of large regions faster in some cases
Version 0.9.2¶
Released on 2016-03-02.
Core¶
much better support for unnamed axes overall. Still a long way to go for full support, but it’s getting there…
Editor¶
fixed edit() for arrays with the same labels on several axes
Version 0.9.1¶
Released on 2016-03-01.
Core¶
better .info for arrays with groups in axes
>>> # example using groups without a name
>>> reg = la.sum((fla, wal, bru, belgium))
>>> reg.info
4 x 15
 geo [4]: ['A11' ... 'A73'] ['A25' ... 'A93'] 'A21' ['A11' ... 'A21']
 lipro [15]: 'P01' 'P02' 'P03' ... 'P13' 'P14' 'P15'
>>> # example using groups with a name
>>> fla = geo.group(fla_str, name='Flanders')
>>> wal = geo.group(wal_str, name='Wallonia')
>>> bru = geo.group(bru_str, name='Brussels')
>>> reg = la.sum((fla, wal, bru))
>>> reg.info
3 x 15
 geo [3]: 'Flanders' (['A11' ... 'A73']) 'Wallonia' (['A25' ... 'A93']) 'Brussels' ('A21')
 lipro [15]: 'P01' 'P02' 'P03' ... 'P13' 'P14' 'P15'
Editor¶
fixed edit() with non-string labels in axes
fixed edit() with filters in some more cases
fixed ArrayEditorWidget.reject_changes and accept_changes to update the model & view accordingly (in case the widget is kept open)
avoid (harmless) error messages in some cases
Version 0.9¶
Released on 2016-02-25.
A minor but backward incompatible version (hence the bump in version number)!
Core¶
fixed int_array.mean() to return floats instead of int (regression in 0.8)
larray_equal returns False when either value is not an LArray, instead of raising an exception
Session¶
changed Session == Session to return an array of booleans instead of a single boolean, so that we know which array(s) differ. Code like session1 == session2 should be changed to all(session1 == session2).
implemented Session != Session
implemented Session.get(k, default) (returns default if k does not exist in Session)
implemented len() for Session objects to know how many objects are in the Session
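The change to Session == Session described above can be pictured with plain dicts: the comparison yields one boolean per name, and all() collapses it back to a single answer. The function compare_sessions and the dict layout are hypothetical stand-ins for this sketch, not the Session API.

```python
# Plain-Python sketch (not larray code): per-name comparison, then all().
def compare_sessions(sess1, sess2):
    names = sorted(set(sess1) | set(sess2))
    return {name: sess1.get(name) == sess2.get(name) for name in names}

s1 = {'a': [0, 1, 2], 'b': [0, 1, 2, 3]}
s2 = {'a': [1, 2, 3], 'b': [0, 1, 2, 3]}

equal = compare_sessions(s1, s2)
print(equal)                # {'a': False, 'b': True}
print(all(equal.values()))  # False
```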
Viewer¶
fixed view() (regression in 0.8.1)
fixed edit() to actually apply changes on “OK”/accept_changes even when no filter change occurred after the last edit.
Version 0.8.1¶
Released on 2016-02-24.
Viewer¶
implemented min/maxvalue arguments for edit()
do not close the window when pressing Enter
allow to start editing cells by pressing Enter
fixed copy of changed cells (copy the changed value)
fixed pasted values to not be accepted directly (they go to “changes” like for manual edits)
fixed color updates on paste
disabled experimental tooltips on headers
better error message when entering invalid values
Core¶
implemented indexing by position on several dimensions at once (like numpy)
>>> # takes the first item in the first and third dimensions, leave the second dimension intact
>>> arr.i[0, :, 0]
<some result>
>>> # sets all the cells corresponding to the first item in the first dimension and the second item in the fourth
>>> # dimension
>>> arr.i[0, :, :, 1] = 42
added optional ‘readonly’ argument to expand() to produce a readonly view (much faster since no copying is done)
Version 0.8¶
Released on 2016-02-16.
Core¶
implemented skipna argument for most aggregate functions. It defaults to True.
implemented LArray.sort_values(key)
implemented percentile and median
added isnan and isinf toplevel functions
made axis argument optional for argsort & posargsort on 1D arrays
fixed a[key] = value when key corresponds to a single cell of the array
fixed keepaxes argument for aggregate functions
fixed a[int_array] (when the axis needs to be guessed)
fixed empty_like
fixed aggregates on several axes given as integers e.g. arr.sum(axis=(0, 2))
fixed “kind” argument in posargsort
Viewer¶
added title argument to edit() (set automatically if not provided, like for view())
fixed edit() on filtered arrays
fixed view(expression). Anything which was not stored in a variable was broken in 0.7.1
reset background color when setting values if necessary (still buggy in some cases, but much less so ;-))
background color for headers is always on
view() => array cells are not editable, instead of being editable and ignoring entered values
fixed compare() colors when arrays are entirely equal
fixed error message for compare() when PyQt is not available
Misc¶
bump numpy requirement to 1.10, implicitly dropping support for python 3.3
renamed view module to editor to not collide with view function
improved/added a few tests
Version 0.7.1¶
Released on 2016-01-29.
Viewer¶
implemented paste (ctrl-V)
implemented experimental array comparator:
>>> compare(array1, array2)
Known limitation: the arrays must have exactly the same axes and the background color is buggy when using filters
when no title is specified in view(), it is determined automatically by inspecting the local variables of the function where view() is called and using the names of the ones matching the object passed. If several matches, up to 3 are displayed.
added axes names to copy (ctrl-C)
fixed copy (ctrl-C) of 0d array
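The automatic title for view() mentioned above relies on inspecting the caller's local variables. A minimal sketch of that idea follows, using the standard inspect module. It is not the viewer's actual code: the function guess_title, its max_names parameter and the comma-separated output are assumptions made for illustration.

```python
import inspect

# Sketch (not the viewer's code): find the caller's local names bound to obj.
def guess_title(obj, max_names=3):
    caller_locals = inspect.currentframe().f_back.f_locals
    names = sorted(name for name, value in caller_locals.items() if value is obj)
    return ', '.join(names[:max_names])

def demo():
    population = [1, 2, 3]
    pop_alias = population  # a second name bound to the same object
    return guess_title(population)

print(demo())  # pop_alias, population
```

Matching is done by identity (is), so two distinct objects with equal contents are not confused with each other.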
Input/Output¶
added ‘dialect’ argument to to_csv. For example, dialect=’classic’ does not include the last (horizontal) axis name.
fixed loading .csv files without "\" (i.e. 'classic' .csv files), though one needs to specify nb_index in that case if ndim > 2
strip spaces around axes names so that you can use “axis0<space>\<space>axis1” instead of “axis0\axis1” in .csv files
fixed 1d arrays I/O
more precise parsing of input headers: 1 and 0 come out as int, not bool
Misc¶
nicer error message when using an invalid axis name
changed LArray .df property to a to_frame() method so that we can pass options to it
Version 0.7¶
Released on 2016-01-26.
Viewer¶
implemented view() on Session objects
added axes length in window title and add axes info even if title is provided manually (concatenate both)
ndecimals are recomputed when toggling the scientific checkbox
allow viewing (some) non-ndarray stuff (e.g. python lists)
refactored viewer code so that the filter drop downs can be reused too
Known regression: the viewer is slow on large arrays (this will be fixed in a later release, obviously)
Session¶
implemented local_arrays() to return all LArray in locals() as a Session
implemented Session.__getitem__(int_position)
implemented Session(filename) to directly load all arrays from a file. Equivalent to:
>>> s = Session()
>>> s.load(filename)
implemented Session.__eq__, so that you can compare two sessions and see if all arrays are equal. Suppose you want to refactor your code and make sure you get the same results.
>>> # put results in a Session
>>> res = Session({'array1': array1, 'array2': array2})
>>> # before refactoring
>>> res.dump('results.h5')
>>> # after refactoring
>>> assert Session('results.h5') == res
you can load all sheets/arrays of a file (if you do not specify which ones you want, it takes all)
loading several sheets from an excel file is now MUCH faster because the same file is kept open (apparently xlrd parses the whole file each time we open it).
you can specify a subset of arrays to dump
implemented rudimentary session I/O for .csv files, usage is a bit different from .h5 & excel files
>>> # need to specify format manually
>>> s.dump('directory_name', fmt='csv')
>>> # need to specify format manually
>>> s = Session()
>>> s.load('directory_name', fmt='csv')
pass *args and **kwargs to lower level functions in Session.load
fail when trying to read a nonexistent H5 file through Session, instead of creating it
Other new features¶
added start argument in ndrange to specify starting value
implemented Axis._rename. Not sure it’s a good idea though…
implemented identity function which takes an Axis and returns an LArray with the axis labels as values
implemented size property on AxisCollection
allow a single int in AxisCollection.without
Fixes¶
fixed broadcast_with when other_axes contains 0-len axes
fixed a[bool_array] = value when the first axis of a is not in bool_array
fixed view() on arrays with unnamed axes
fixed view() on arrays of Python objects
various other small bugs fixed
Version 0.6.1¶
Released on 2016-01-13.
New features¶
added dtype argument to all array creation functions to override default data type
aggregates can take an explicit “axis” keyword argument which can be used to target an axis by index
>>> arr.sum(axis=0)
implemented LGroup.__getitem__ & LGroup.__iter__, so that for list-based groups (ie not slices) you can write:
>>> for v in my_group:
...     # some code
or
>>> my_group[0]
Miscellaneous improvements¶
renamed LabelGroup to LGroup and PositionalKey to PGroup. We might want to rename the latter to IGroup (to be consistent with axis.i[…]).
slightly better support for axes without name
better docstrings for a few functions
misc cleanup
Fixes¶
fixed XXX_like(a) functions to use the same dtype as a instead of always float
fixed to_XXX with 1d arrays (e.g. to_clipboard())
fixed all() and any() toplevel functions without argument
fixed LArray without axes in some cases
fixed array creation functions with only shapes on python2
Version 0.6¶
Released on 2016-01-12.
New features¶
a[bool_array_key] broadcasts missing/differently ordered dimensions and returns an LArray with combined axes
a[bool_array_key] = value broadcasts missing/differently ordered dimensions on both key and value
implemented argmin, argmax, argsort, posargmin, posargmax, posargsort.
They perform indirect operations along an axis. E.g. argmin gives the label of the minimum value, argsort gives the labels which would sort the array along that dimension. posargXXX gives the positions/indexes instead of the labels.
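The distinction between the label-based and position-based variants can be shown on a single labelled row in plain Python. The variables below only borrow the function names for readability; the lists labels and values are made-up data, and this is not larray code.

```python
# Plain-Python sketch (not larray code): label-based vs. position-based results.
labels = ['a1', 'a2', 'a3']
values = [5, 2, 9]

posargmin = min(range(len(values)), key=values.__getitem__)  # position of the minimum
argmin = labels[posargmin]                                   # label of the minimum

posargsort = sorted(range(len(values)), key=values.__getitem__)  # positions in sorted order
argsort = [labels[i] for i in posargsort]                        # labels in sorted order

print(posargmin, argmin)  # 1 a2
print(posargsort)         # [1, 0, 2]
print(argsort)            # ['a2', 'a1', 'a3']
```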
implemented Axis.__iter__ so that one can write:
>>> for label in an_array.axes.an_axis:
...     <some code>
instead of
>>> for label in an_array.axes.an_axis.labels:
...     <some code>
implemented the .info property on AxisCollection
implement all/any top level functions, so that you can use them in with_total.
Miscellaneous improvements¶
renamed ValueGroup to LabelGroup. We might want to rename it to LGroup to be consistent with LArray?
allow a single int as argument to LArray creation functions (ndrange et al.)
e.g. ndrange(10) is now allowed instead of ndrange([10])
use display_name in .info (ie add * next to wildcard axes in .info).
allow specifying a custom window title in view()
viewer displays booleans as True/False instead of 1/0
slightly better support for axes with no name (None). There is still a long way to go for full support though.
improved a few docstrings
nicer errors when tests results are different from expected
removed debug prints from viewer
misc cleanups
Fixes¶
fixed view() on all-negative arrays
fixed view() on string arrays
Version 0.5¶
Released on 2015-12-15.
New features¶
experimental support for indexing an LArray by another (integer) LArray
>>> array[other_array]
experimental support for LArray.drop_labels and the concept of wildcard axes
added LArray.display_name and AxisCollection.display_names which add ‘*’ next to wildcard axes
implemented where(cond, array1, array2)
implemented LArray.__iter__ so that this works:
>>> for value in array:
...     <some code>
implemented keepaxes=label or keepaxes=True for aggregate functions on full axes
>>> array.sum(x.age, keepaxes='total')
AxisCollection.replace can replace several axes in one call
implemented .expand(out=) to expand into an existing array
Miscellaneous improvements¶
removed Axis.sorted()
removed LArray.axes_names & axes_labels. One should use .axes.names & .axes.labels instead.
raise an error when trying to convert an array with more than one value to a Boolean. For example, this will fail:
>>> arr = ndrange([sex])
>>> if arr:
...     <some code>
convert value to self.dtype in append/prepend
faster .extend, .append, .prepend and .expand
some code cleanup, better tests, …
Fixes¶
fixed .extend when other has longer axes than self
Version 0.4¶
Released on 2015-12-09.
New features¶
implemented LArray.expand to add dimensions
implemented prepend
implemented sort_axis
allow creating 0d (scalar) LArrays
Miscellaneous improvements¶
made extend expand its arguments
made .append expand its value before appending
changed read_* to not sort data by default
more minor stuff :)
Fixes¶
fixed loading 1d arrays
Version 0.3¶
Released on 2015-11-26.
New features¶
implemented LArray.with_total(): appends axes or group aggregates to the array.
Without argument, it adds totals on all axes. It has optional keyword only arguments:
label: specify the label (“total” by default)
op: specify the aggregate function (sum by default, all other aggregates should work too)
With multiple arguments, it adds totals sequentially. There are some tricky cases. For example when, for the same axis, you add group aggregates and axis aggregates:
>>> # works but "wrong" for x.geo (double what is expected because the total also
>>> # includes fla wal & bru)
>>> la.with_total(x.sex, (fla, wal, bru), x.geo, x.lipro)
>>> # correct total but the order is not very nice
>>> la.with_total(x.sex, x.geo, (fla, wal, bru), x.lipro)
>>> # the correct way to do it, but it is probably not entirely obvious
>>> la.with_total(x.sex, (fla, wal, bru, x.geo.all()), x.lipro)
>>> # we probably want to display a warning (or even an error?) in that case.
>>> # If the user really wants that behavior, he can split the operation:
>>> # .with_total((fla, wal, bru)).with_total(x.geo)
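The double counting mentioned in the first comment above comes from adding totals sequentially: once the group aggregates have been appended to the axis, a total over the whole axis also includes them. A plain-Python sketch of the effect, with made-up values and one label per region (both assumptions of this example, not larray code):

```python
# Plain-Python sketch (not larray code): why sequential totals can double count.
values = {'A11': 1, 'A25': 2, 'A21': 3}  # made-up data for three geo labels
groups = {'fla': ['A11'], 'wal': ['A25'], 'bru': ['A21']}

# Step 1: append the group aggregates to the axis.
with_groups = dict(values)
for name, members in groups.items():
    with_groups[name] = sum(values[m] for m in members)

# Step 2a: a total over everything now on the axis counts each region twice.
naive_total = sum(with_groups.values())
# Step 2b: a total over the original labels only gives the expected result.
correct_total = sum(values.values())

print(naive_total, correct_total)  # 12 6
```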
implemented group aggregates without using keyword arguments. As a consequence of this, one can no longer use axis numbers in aggregates. Eg. a.sum(0) does not sum on the first axis anymore (but you can do a.sum(a.axes[0]) if needed)
implemented LArray.percent: equivalent to ratio * 100
implemented Session.filter -> returns a new Session with only objects matching the filter
implemented Session.dump -> dumps all LArray in the Session to a file
implemented Session.load -> load several LArrays from a file to a Session
Version 0.2.6¶
Released on 2015-11-24.
Fixes¶
fixed LArray.cumsum and cumprod.
fixed all doctests just enough so that they run.
Version 0.2.5¶
Released on 2015-10-29.
Miscellaneous improvements¶
many methods got (improved) docstrings (Thanks to Johan).
Fixes¶
fixed mixing keys without axis (e.g. arr[10:15]) with key with axes (e.g. arr[x.age[10:15]]).
Version 0.2.4¶
Released on 2015-10-27.
New features¶
includes an experimental (slightly inefficient) version of guess axis, so that one can write:
>>> arr[10:20]
instead of
>>> arr[age[10:20]]
Version 0.2.3¶
Released on 2015-10-19.
New features¶
positional slicing via “x.” syntax (x.axis.i[:5])
Fixes¶
view(array) is usable when doing from larray import *
fixed a nasty bug for doing “group” aggregates when there is only one dimension
Version 0.2.2¶
Released on 2015-10-15.
New features¶
implement AxisCollection.replace(old_axis, new_axis)
implement positional indexing
Miscellaneous improvements¶
more powerful AxisCollection.pop: added support for .pop(name) or .pop(Axis object)
LArray.set_labels returns a new LArray by default; use inplace=True to get the previous behavior
include ndrange and __version__ in __all__
Fixes¶
fixed shift with n <= 0
Version 0.2.1¶
Released on 2015-10-14.
New features¶
implemented LArray.shift(axis, n=1)
Miscellaneous improvements¶
change set_labels API (axis, new_labels)
transform Axis.labels into a property so that _mapping is kept in sync
Fixes¶
hopefully fix build
Version 0.2¶
Released on 2015-10-13.
New features¶
added to_clipboard.
added embryonic documentation.
added sort_columns and na arguments to read_hdf.
added sort_rows, sort_columns and na arguments to read_excel.
added setup.py to install the module.
Miscellaneous improvements¶
IO functions (to_*/read_*) now support unnamed axes. The set of supported operations is very limited with such arrays though.
to_excel sheet_name defaults to “Sheet1” like in Pandas.
reorganised files.
somewhat automated releases (added a rudimentary release script).
Fixes¶
column titles are no longer converted to lowercase.
Version 0.1¶
Released on 2014-10-22.