Change log
Version 0.34.4
Released on 2024-07-23.
CORE
Fixes
fixed our documentation examples.
Version 0.34.3
Released on 2024-07-18.
CORE
New features
added support for Python 3.12 (closes issue 1109).
Miscellaneous improvements
improved the error message when selecting several labels at the same time and some of them were wrong. In that case, it was hard to know which labels were wrong. This was especially annoying if the axis was long and thus not shown in its entirety in the error message. For example, given the following array:
>>> arr = la.ndtest('a=a0,a1;b=b0..b2,b4..b7')
>>> arr
a\b  b0  b1  b2  b4  b5  b6  b7
 a0   0   1   2   3   4   5   6
 a1   7   8   9  10  11  12  13
This code:
>>> arr['b0,b2,b3,b7']
used to produce the following error:
ValueError: 'b0,b2,b3,b7' is not a valid label for any axis:
 a [2]: 'a0' 'a1'
 b [7]: 'b0' 'b1' 'b2' ... 'b5' 'b6' 'b7'
which did not contain enough information to determine the problem was with ‘b3’. It now produces this instead:
ValueError: 'b0,b2,b3,b7' is not a valid subset for any axis:
 a [2]: 'a0' 'a1'
 b [7]: 'b0' 'b1' 'b2' ... 'b5' 'b6' 'b7'
Some of those labels are valid though:
 * axis 'b' contains 3 out of 4 labels (missing labels: 'b3')
Closes issue 1101.
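The diagnostic shown in the new message can be illustrated with a minimal sketch in plain Python. This is not larray's implementation: `diagnose_labels` and the dictionary-of-lists representation of the axes are hypothetical names chosen for this illustration only.

```python
def diagnose_labels(requested, axes):
    """Return {axis_name: (n_found, missing_labels)} for every axis that
    contains at least one of the requested labels.

    `axes` is a plain dict mapping an axis name to its list of labels
    (a stand-in for real axis objects).
    """
    report = {}
    for name, labels in axes.items():
        present = set(labels)
        missing = [label for label in requested if label not in present]
        found = len(requested) - len(missing)
        if found:
            report[name] = (found, missing)
    return report


axes = {
    'a': ['a0', 'a1'],
    'b': ['b0', 'b1', 'b2', 'b4', 'b5', 'b6', 'b7'],
}
requested = ['b0', 'b2', 'b3', 'b7']
report = diagnose_labels(requested, axes)
for name, (found, missing) in report.items():
    missing_str = ', '.join(repr(label) for label in missing)
    print(f"axis {name!r} contains {found} out of {len(requested)} labels "
          f"(missing labels: {missing_str})")
```

Reporting only the axes which match at least one label keeps the message short even when the array has many axes.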
Fixes
using a boolean array as a filter to take a subset of another array now raises an error when the two arrays have incompatible axes instead of producing a wrong result (closes issue 1085).
fixed copying a sheet from one Excel workbook to another when the destination sheet is given by position (closes issue 1092).
fixed Array.values() and Array.items() on the first axis given by position (e.g. my_array.values(axes=0)). Closes issue 1093.
fixed Array.dump() axes_names argument for 1D arrays (closes issue 1094).
fixed Axis.difference(), Axis.intersection() and Axis.union() with a Group argument (closes issue 1104).
fixed converting a scalar Array (an Array with 0 dimensions) to string with numpy 1.22+.
avoid warnings and errors with recent versions of our dependencies (Numpy 2+, Pandas 2.2+ and xlwings 0.30.2+). Closes issue 1100, issue 1107 and issue 1108.
EDITOR
Fixes
changes made to arrays in the console using the “points” syntax (for example: arr.points['a0,a1', 'b0,b1'] = 0) and the other special .something[] syntaxes were not detected by the viewer and thus not displayed (closes issue 269).
fixed copying to clipboard an array filtered on all dimensions (to a single value). Closes issue 270.
Version 0.34.2
Released on 2023-10-23.
CORE
New features
added support for evaluating expressions using X.axis_name when calling some built-in functions, most notably where(). For example, the following code now works (previously it seemed to work but produced the wrong result – see the fixes section below):
>>> arr = ndtest("age=0..3")
>>> arr
age  0  1  2  3
     0  1  2  3
>>> where(X.age == 2, 42, arr)
age  0  1   2  3
     0  1  42  3
Fixes
fixed Array.reindex when using an axis object from the array as axes_to_reindex (closes issue 1088).
fixed Array.reindex({axis: list_of_labels}) (closes issue 1068).
Array.split_axes now raises an explicit error when some labels contain more separators than others, instead of silently dropping part of those labels, or even some data (closes issue 1089).
a boolean condition including only X.axis_name and scalars (e.g. X.age == 0) raises an error when Python needs to know whether it is True or not (because there is no array to extract the axis labels from) instead of always evaluating to True. This was especially dangerous in the context of a where() function, which always evaluated to its left side (e.g. where(X.age > 0, arr, 0) evaluated to arr for all ages). Closes issue 1083.
expressions using X.axis_name and an Array now evaluate correctly when the Array is not involved in the first operation. For example, this already worked:
>>> arr = ndtest("age=0..3")
>>> arr
age  0  1  2  3
     0  1  2  3
>>> arr * (X.age != 2)
age  0  1  2  3
     0  1  0  3
but this did not:
>>> (X.age != 2) * arr
fixed plots with fewer than 6 integer labels in the x axis. In that case, it interpolated the values, which usually looks wrong for integer labels (e.g. year). Closes issue 1076.
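The X.axis_name truthiness fix listed above (issue 1083) relies on a general Python mechanism: an object which does not define `__bool__` is always considered true, so a "deferred" condition silently passes any `if` test. The following sketch shows how raising from `__bool__` turns that silent behavior into an explicit error. `AxisCondition` is a hypothetical stand-in written for this illustration, not larray's actual class.

```python
class AxisCondition:
    """Hypothetical stand-in for a condition such as X.age == 0, which
    cannot be evaluated until an array provides the axis labels."""

    def __init__(self, description):
        self.description = description

    def __bool__(self):
        # Without this method, Python would treat the object as True.
        raise ValueError(
            f"cannot evaluate the truth value of {self.description!r} "
            "without an array to take the axis labels from"
        )


cond = AxisCondition("X.age == 0")
try:
    outcome = "evaluated as True" if cond else "evaluated as False"
except ValueError:
    outcome = "raised"
print(outcome)
```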
EDITOR
Fixes
fixed the viewer being unusable after showing a matplotlib plot (closes issue 261).
silence spurious debugger warning on Python 3.11 (closes issue 263).
when code in the interactive console creates and shows a plot window, avoid showing it a second time (closes issue 265).
depending on the system regional settings, comparator tolerance sometimes did not allow simple fractional numbers (e.g. 0.1). The only way to specify the tolerance was the scientific notation (closes issue 260).
Version 0.34.1
Released on 2023-09-14.
CORE
New features
added support for Python 3.11.
added support for stacking all arrays of a Session by simply doing: stack(my_session) instead of stack(my_session.items()) (closes issue 1057).
Fixes
avoid warnings with recent versions of Pandas or Numpy (closes issue 1061).
EDITOR
New features
added support for Python 3.11.
Fixes
Version 0.34
Released on 2023-03-14.
CORE
Syntax changes
made Array.append() work for the cases previously covered by Array.extend() (when the appended value already has the axis being extended) and deprecated Array.extend() (closes issue 887).
renamed Array.sort_axes() to Array.sort_labels() (closes issue 861).
renamed Array.percentile() and Array.percentile_by() interpolation argument to method to follow numpy and thus support additional “interpolation” methods.
deprecated the ability to target a label in an aggregated array using the group that created it. The aggregated array label should be used instead. This is a seldom used feature which is complex to keep working and has a significant performance cost in some cases, even when the feature is not used (closes issue 994).
In other words, the following code will now raise a warning:
>>> arr = ndtest(4)
>>> arr
a  a0  a1  a2  a3
    0   1   2   3
>>> group1 = arr.a['a0', 'a2'] >> 'a0_a2'
>>> group2 = arr.a['a1', 'a3'] >> 'a1_a3'
>>> agg_arr = arr.sum((group1, group2))
>>> agg_arr
a  a0_a2  a1_a3
       2      4
>>> agg_arr[group1]
FutureWarning: Using a Group object which was used to create an aggregate to target its aggregated label is deprecated. Please use the aggregated label directly instead. In this case, you should use 'a0_a2' instead of using a['a0', 'a2'] >> 'a0_a2'.
2
One should use the label on the aggregated array instead:
>>> agg_arr['a0_a2']
2
deprecated passing individual session elements as non-keyword arguments to Session(). This means that, for example, Session(axis1, axis2, array1=array1) should be rewritten as Session(axis1name=axis1, axis2name=axis2, array1=array1) instead. Closes issue 1024.
deprecated Session.add(). Please use Session.update() instead (closes issue 999).
Backward incompatible changes
dropped support for Python 3.6.
deprecations dating to version 0.29 or earlier (released more than 3 years ago) now raise errors instead of printing a warning.
New features
added support for Python 3.10.
implemented Array.value_counts(), which computes the number of occurrences of each unique value in an array.
added Session.nbytes and Session.memory_used attributes.
added display argument to Array.compact() to display a message if some axes were “compacted”.
Miscellaneous improvements
made all I/O functions/methods/constructors accept pathlib.Path objects in addition to strings for all arguments representing a path (closes issue 896).
added type hints for all remaining functions and methods which improves autocompletion in editors (such as PyCharm). Closes issue 864.
made several error messages more useful when trying to get an invalid subset of an array (closes issue 875).
when a key is not valid on any axis, the error message includes the array axes
when a key is not valid for the axis specified by the user, the error message includes that axis labels
when a label is ambiguous (valid on several axes), the error message includes the axes labels in addition to the axes names
when several parts of a key seem to target the same axis, the error message includes the bad key in addition to the axis.
made ipfp() faster (the smaller the array, the larger the improvement). For example, for small arrays it is several times faster than before, for 1000x1000 arrays it is about 30% faster.
made arithmetic operations between two Arrays with the same axes much faster.
made Array[] faster in the usual/simple cases.
made Array.i[] much faster.
Fixes
fixed displaying plots made via Array.plot() outside of the LArray editor (closes issue 1019).
fixed Array.insert() when no label is provided (closes issue 879).
fixed Array.insert() when (one of) the inserted label(s) is ambiguous on the value.
fixed comparison between Array and None returning False instead of an array of boolean values (closes issue 988).
fixed binary operations between an Array and an Axis returning False.
fixed AxisCollection.split_axes() with anonymous axes.
fixed the names argument in Array.split_axes() and AxisCollection.split_axes() not working in some cases.
fixed taking a subset of an Excel range (e.g. myworkbook['my_sheet']['A2:C5'][1:])
fixed setting the first sheet via position in a new workbook opened via open_excel(overwrite_file=True):
>>> with open_excel(fpath, overwrite_file=True) as wb:
...     wb[0] = <something>
fixed Array.ipoints[] when not all dimensions are given in the key.
EDITOR
New features
added support for Python 3.10.
the initial column width is now set depending on the content and resized automatically when changing the number of digits (closes issue 145).
Miscellaneous improvements
plot window titles now include the expression used to make the plot (the name of the array in most cases) (closes issue 233).
when displaying an expression (computed array), the window title includes the actual expression instead of using ‘<expr>’.
compare() can now take filepaths as argument (and will load them as a Session) to make it easier to compare an in-memory Session with an earlier Session saved on disk. Those filepaths can be given as either str or Path objects. Closes issue 229.
added support for Path objects (in addition to str representing paths) in view() and edit(). See issue 896.
when the editor displays currently defined variables (via debug(), edit() or view() without argument within user code or via an exception when run_editor_on_exception is active), LArray functions are not made available systematically in the console anymore (what is available in the console is really what was available in the user's code). This closes issue 199.
added support for incomplete slices in “save command history”, like in Python slices. For example, one can save from line 10 onwards by using “10:” or “10..”, i.e. without specifying the last line. See issue 225.
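To make the “incomplete slice” notation for saving the command history concrete, here is a minimal parsing sketch. It is not the editor's actual code: `parse_line_range` is a hypothetical helper, and it deliberately returns the raw bounds without deciding whether the last line is included, since the entry above does not specify it.

```python
def parse_line_range(text):
    """Parse a line range such as '10:', '10..', ':5' or '3:7'.

    Returns a (first, last) tuple where a bound that was left out is None.
    Accepting both ':' and '..' as separators mirrors the two spellings
    mentioned in the change log entry.
    """
    for separator in ('..', ':'):
        if separator in text:
            first, _, last = text.partition(separator)
            break
    else:
        raise ValueError(f"{text!r} is not a line range")

    def to_bound(part):
        part = part.strip()
        return int(part) if part else None

    return to_bound(first), to_bound(last)


print(parse_line_range('10:'))
print(parse_line_range('10..'))
print(parse_line_range('3:7'))
```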
Fixes
fixed run_editor_on_exception() so that the LArray editor is not opened when trying to stop a program (via Ctrl-C or the IDE stop button). Closes issue 231.
improved the situation when trying to stop a program (via Ctrl-C or the IDE stop button) with an LArray Editor window open. It used to ignore such events altogether, forcing the IDE to send a “kill” event when pressing the button a second time, which could leave some resources open (e.g. Excel instances). Now, the LArray Editor closes itself when its parent program is asked to stop but, so far, it only does so once the window is active again. This makes for an odd behavior but at least cleans up the program properly (closes issue 231).
when save command history fails, do not do so silently. Closes issue 225.
fixed saving command history to a path containing spaces. Closes issue 244.
fixed compare() background color being red for two equal integer arrays instead of white (closes issue 246).
Version 0.33.1
Released on 2021-09-22.
CORE
Miscellaneous improvements
added type hints for many Array methods (see issue 864) which improves autocompletion in editors (such as PyCharm).
Fixes
fixed CheckedSession with pydantic version >1.5 (closes issue 958).
removed the constraint on pydantic version in larrayenv, making it actually installable.
fixed using labels for x and y in Array.plot() and Array.plot.scatter() functions, as well as Array.plot.pie() (closes issue 969).
fixed wrong “source code line” in “field is not declared” warning in CheckedSession.__init__() (closes issue 968).
fixed Array.growth_rate() returning nans instead of zeros when consecutive values are zeros (closes issue 903).
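The growth rate fix above concerns the 0/0 case: when a value and its predecessor are both zero, a naive "difference divided by previous value" yields nan. A minimal sketch of the corrected behavior on a plain Python list follows. This is an illustration only, not larray's implementation, and the handling of a non-zero value following a zero (reported as an infinity here) is an assumption of this sketch.

```python
import math


def growth_rate(values):
    """Return the relative change between consecutive values.

    A pair of equal values (including 0 followed by 0) gives 0.0 instead
    of the nan that 0 / 0 would produce.
    """
    rates = []
    for previous, current in zip(values, values[1:]):
        diff = current - previous
        if diff == 0:
            rates.append(0.0)
        elif previous == 0:
            # assumption of this sketch: growth from zero is infinite
            rates.append(math.copysign(math.inf, diff))
        else:
            rates.append(diff / previous)
    return rates


print(growth_rate([2, 4, 0, 0, 5]))
```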
EDITOR
Fixes
Version 0.33
Released on 2021-08-17.
CORE
New features
added official support for Python 3.9 (0.32.3 already supports it even though it was not mentioned).
added CheckedSession, CheckedParameters and CheckedArray objects.
CheckedSession is intended to be inherited by user defined classes in which the variables of a model are declared. By declaring variables, users will speed up the development of their models using the auto-completion (the feature in which development tools like PyCharm try to predict the variable or function a user intends to enter after only a few characters have been typed). All user defined classes inheriting from CheckedSession will have access to the same methods as Session objects.
CheckedParameters is the same as CheckedSession but the declared variables cannot be modified after initialization.
The special CheckedArray type represents an Array object with fixed axes and/or dtype. It is intended to be only used along with CheckedSession.
Closes issue 832.
Miscellaneous improvements
greatly improved Array.plot() method and “submethods” (Array.plot.bar(), etc.)
support x, y and by arguments in plot functions where it makes sense. When only some of them are specified, the other arguments pick from remaining available axes. This means a lot of plots can now be expressed more intuitively and concisely (you do not need to transpose your array to get the result you want, you just specify the axes you want to use in ‘x’ or ‘y’).
subplots argument now accepts an axis (or tuple of them) in addition to a boolean to specify which axes to use as subplots.
support for labels (instead of axes) in x and y for line plot and scatter.
support passing a dict as legend to customize the legend.
many tweaks to make several plots look better out of the box.
eye() now supports an AxisCollection as argument, so you can use axes from another array by using eye(other_array.axes).
added arguments rtol, atol and nans_equal to the Session.element_equals() and Session.equals() methods (closes issue 990).
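For readers unfamiliar with the rtol, atol and nans_equal arguments, the sketch below shows what such tolerances usually mean for a pair of scalars. It assumes the convention used by numpy.isclose (absolute difference at most atol + rtol * |b|); whether larray follows exactly this formula, and the default values used here, are assumptions of this sketch. `values_equal` is a hypothetical helper, not part of larray.

```python
import math


def values_equal(a, b, rtol=0.0, atol=0.0, nans_equal=False):
    """Compare two floats with a relative and an absolute tolerance.

    With nans_equal=True, two nan values are considered equal, which is
    never the case with the == operator.
    """
    if math.isnan(a) or math.isnan(b):
        return nans_equal and math.isnan(a) and math.isnan(b)
    return abs(a - b) <= atol + rtol * abs(b)


nan = float('nan')
print(values_equal(1.0, 1.0000001))
print(values_equal(1.0, 1.0000001, rtol=1e-6))
print(values_equal(nan, nan))
print(values_equal(nan, nan, nans_equal=True))
```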
Fixes
fixed Array.values(), zip_array_values and zip_array_items when axes=() (closes issue 883).
fixed several edge cases in sequence().
fixed set_labels(labels_dict) with several labels from the same axis (closes issue 851).
fixed loading arrays with anonymous axes and numeric labels from Excel using Pandas 1.3+ (closes issue 950).
fixed read_hdf() opening in RW mode instead of read mode (closes issue 980).
EDITOR
Version 0.32.3
Released on 2021-06-08.
CORE
Backward incompatible changes
dropped support for Python 2 (closes issue 567).
New features
added support for Python 3.8 (closes issue 850).
Miscellaneous improvements
scalar objects (i.e. of type int, float, bool, string, date, time or datetime) belonging to a session are now also saved and loaded when using the HDF5 or pickle format (closes issue 842).
implemented Axis.astype() method (closes issue 880).
added min_y, max_y and xticks_spacing keyword arguments to the ReportSheet.add_graph() and ReportSheet.add_graphs() methods (closes issue 901).
implemented isscalar() function (closes issue 872).
implemented the Array.allclose() method (closes issue 871).
implemented Axis.min() and Axis.max() methods (closes issue 874).
Fixes
EDITOR
Backward incompatible changes
dropped Python 2 support (closes issue 132).
Fixes
Version 0.32.2
Released on 2020-04-03.
CORE
Fixes
EDITOR
Fixes
fixed spurious warning in the console when an expression results in an empty sequence (array, list, tuple).
fixed displaying arrays entirely filled with NaN.
Version 0.32.1
Released on 2019-12-19.
CORE
Miscellaneous improvements
improved the tutorial and some examples to make them more intuitive (closes issue 829).
Fixes
EDITOR
Fixes
fixed the “Cancel” button of the confirmation dialog when trying to quit the editor with unsaved modifications. It was equivalent to discard, potentially leading to data loss.
fixed (harmless) error messages appearing when trying to display any variable via the console when using matplotlib 3.1+
Version 0.32
Released on 2019-11-17.
CORE
Syntax changes
Backward incompatible changes
Because it was broken, the possibility to dump and load Axis and Group objects contained in a session has been removed for the CSV and Excel formats. Fixing it would have taken too much time considering it is very rarely used (no one complained that it was broken), so the decision was taken to remove it. However, this is still possible using the HDF format. Closes issue 815.
Miscellaneous improvements
conda channel to install or update the larray, larray-editor, larray-eurostat and larrayenv packages switched from gdementen to larray-project (closes issue 560).
Fixes
fixed binary operations between a session and an array object (closes issue 807).
fixed Array.reindex() printing a spurious warning message when the axes_to_reindex argument was the name of the axis to reindex (closes issue 812).
fixed zip_array_values() and zip_array_items() functions not available when importing the entire larray library as from larray import * (closes issue 816).
fixed wrong axes and groups names when loading a session from an HDF file (closes issue 803).
EDITOR
New features
added debug() function which opens an editor window with an extra widget to navigate back in the call stack (the chain of functions called to reach the current line of code).
Miscellaneous improvements
Sizes of the main window and the resizable components are saved when closing the viewer and restored when it is reopened (closes issue 165).
added keyword arguments rtol, atol and nans_equal to the compare() function (closes issue 172).
run_editor_on_exception() now uses debug() so that one can inspect what the state was in all functions traversed to reach the code which triggered the exception.
Version 0.31
Released on 2019-08-09.
CORE
New features
added the ExcelReport class, which allows generating multiple graphs in an Excel file at once (closes issue 676).
Fixes
fixed binary operations (+, -, *, etc.) between an LArray and a (scalar) Group which silently gave a wrong result (closes issue 797).
fixed taking a subset of an array with boolean labels for an axis if the user explicitly specifies the axis (closes issue 735). When the user does not specify the axis, it currently fails but it is unclear what to do in that case (see issue 794).
fixed a regression in 0.30: X.axis_name[groups] failed when groups were originally defined on axes with the same name (i.e. when the operation was not actually needed). Closes issue 787.
EDITOR
New features
implemented run_editor_on_exception() function. If you call this function in your code (for example at the top of your main script), Python will open an larray editor if any unexpected error happens anywhere in your script (closes issue 180).
Fixes
Version 0.30
Released on 2019-06-27.
CORE
Syntax changes
stack() axis argument was renamed to axes to reflect the fact that the function can now stack along multiple axes at once (see below).
to accommodate the “simpler pattern language” now supported for those functions, using a regular expression in Axis.matching() or Group.matching() now requires passing the pattern as an explicit regex keyword argument instead of just the first argument of those methods. For example my_axis.matching('test.*') becomes my_axis.matching(regex='test.*').
LArray.as_table() is deprecated because it duplicated functionality found in LArray.dump(). Please only use LArray.dump() from now on.
renamed a_min and a_max arguments of LArray.clip() to minval and maxval respectively and made them optional (closes issue 747).
Backward incompatible changes
modified the behavior of the pattern argument of Session.filter() to actually support patterns instead of only checking if the object names start with the pattern. Special characters include ? for matching any single character and * for matching any number of characters. Closes issue 703.
Warning
If you were using Session.filter, you must add a * to your pattern to keep your code working. For example, my_session.filter('test') must be changed to my_session.filter('test*').
LArray.equals() now returns True for arrays even when axes are in a different order or some axes are missing on either side (but the data is constant over that axis on the other side). Closes issue 237.
Warning
If you were using LArray.equals() and want to keep the old, stricter, behavior, you must add check_axes=True.
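The change to Session.filter() patterns described above can be illustrated with the standard library `fnmatch` module, which implements the same two wildcards (? and *). This is only a sketch of the semantics: larray's own pattern language may differ in details (fnmatch also supports [seq] character sets, for instance), and `filter_names` and `old_filter_names` are hypothetical helpers operating on plain strings rather than on a Session.

```python
from fnmatch import fnmatchcase


def filter_names(names, pattern):
    """New behavior: the whole name must match the wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


def old_filter_names(names, prefix):
    """Old behavior: keep the names which start with the given string."""
    return [name for name in names if name.startswith(prefix)]


names = ['test', 'test1', 'test_final', 'my_test', 'best1']
print(old_filter_names(names, 'test'))
print(filter_names(names, 'test'))
print(filter_names(names, 'test*'))
print(filter_names(names, '?est1'))
```

Adding a trailing * to an old pattern reproduces the previous prefix-based selection, which is why the warning above recommends it.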
New features
added set_options() and get_options() functions to respectively set and get options for larray. Available options currently include display_precision for controlling the number of decimal digits used when showing floating point numbers, display_maxlines to control the maximum number of lines to use when displaying an array, etc. set_options() can be used either like a normal function to set the options globally or within a with block to set them only temporarily. Closes issue 274.
implemented read_stata() and LArray.to_stata() to read arrays from and write arrays to Stata .dta files.
implemented LArray.isin() method to check whether each value of an array is contained in a list (or array) of values.
implemented LArray.unique() method to compute unique values (or sub-arrays) for an array, optionally along axes.
implemented LArray.apply() method to apply a python function to all values of an array or to all sub-arrays along some axes of an array and return the result. This is an extremely versatile method as it can be used both with aggregating functions or element-wise functions.
implemented LArray.apply_map() method to apply a transformation mapping to array elements. For example, this can be used to transform some numeric codes to labels.
implemented LArray.reverse() method to reverse one or several axes of an array (closes issue 631).
implemented LArray.roll() method to roll the cells of an array n-times to the right along an axis. This is similar to LArray.shift(), except that cells which are pushed “outside of the axis” are reintroduced on the opposite side of the axis instead of being dropped.
implemented Axis.apply() method to transform an axis labels by a function and return a new Axis.
added Session.update() method to add and modify items from an existing session by passing either another session or a dict-like object or an iterable object with (key, value) pairs (closes issue 754).
implemented AxisCollection.rename() to rename axes of an AxisCollection, independently of any array.
implemented AxisCollection.set_labels() (closes issue 782).
implemented wrap_elementwise_array_func() function to make a function defined in another library work with LArray arguments instead of with numpy arrays.
implemented LArray.keys(), LArray.values() and LArray.items() methods to respectively loop on an array labels, values or (key, value) pairs.
implemented zip_array_values() and zip_array_items() to loop respectively on several arrays values or (key, value) pairs.
implemented AxisCollection.iter_labels() to iterate over all (possible combinations of) labels of the axes of the collection.
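The difference between LArray.roll() and LArray.shift() mentioned in the list above can be pictured on a one-dimensional series of (label, value) pairs. The sketch below uses plain Python lists and is not larray code. It assumes that shift drops the labels which are left without a value, and both functions here are hypothetical stand-ins limited to a non-negative n.

```python
def shift(labels, values, n=1):
    """Move each value n positions to the right; values pushed past the
    last label are dropped, as are the first n labels (assumption)."""
    return list(zip(labels[n:], values[:len(values) - n]))


def roll(labels, values, n=1):
    """Move each value n positions to the right; values pushed past the
    last label come back at the start of the axis."""
    n %= len(values)
    rolled = values[len(values) - n:] + values[:len(values) - n]
    return list(zip(labels, rolled))


years = [2019, 2020, 2021]
data = [0, 1, 2]
print(shift(years, data))
print(roll(years, data))
```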
Miscellaneous improvements
improved speed of read_hdf() function when reading a stored LArray object dumped with the current and future versions of larray. To benefit from the speedup when reading arrays dumped with older versions of larray, please read and re-dump them. Closes issue 563.
made specifying the axes in LArray.set_labels() optional (closes issue 634):
>>> a = ndtest('nat=BE,FO;sex=M,F')
>>> a
nat\sex  M  F
     BE  0  1
     FO  2  3
>>> a.set_labels({'M': 'Men', 'BE': 'Belgian'})
nat\sex  Men  F
Belgian    0  1
     FO    2  3
LArray.set_labels() can now take functions to transform axes labels (closes issue 536).
>>> arr = ndtest((2, 2))
>>> arr
a\b  b0  b1
 a0   0   1
 a1   2   3
>>> arr.set_labels('a', str.upper)
a\b  b0  b1
 A0   0   1
 A1   2   3
implemented the same “simpler pattern language” in Axis.matching() and Group.matching() as in Session.filter(), in addition to regular expressions (which now require using the regex argument).
stack() can now stack along several axes at once (closes issue 56).
>>> country = Axis('country=BE,FR,DE')
>>> gender = Axis('gender=M,F')
>>> stack({('BE', 'M'): 0,
...        ('BE', 'F'): 1,
...        ('FR', 'M'): 2,
...        ('FR', 'F'): 3,
...        ('DE', 'M'): 4,
...        ('DE', 'F'): 5},
...       (country, gender))
country\gender  M  F
            BE  0  1
            FR  2  3
            DE  4  5
stack() using a dictionary as elements can now use a simple axis name instead of requiring a full axis object. This will print a warning on Python < 3.7 though because the ordering of labels is not guaranteed in that case. Closes issue 755 and issue 581.
stack() using keyword arguments can now use a simple axis name instead of requiring a full axis object, even on Python < 3.6. This will print a warning though because the ordering of labels is not guaranteed in that case.
added password argument to Workbook.save() to allow protecting Excel files with a password.
added option exact to join argument of Axis.align() and LArray.align() methods. Instead of aligning, passing join='exact' to the align method will raise an error when axes are not equal. Closes issue 338.
made Axis.by() and Group.by() return a list of named groups instead of anonymous groups. By default, group names are defined as <start>:<end>. This can be changed via the new template argument:
>>> age = Axis('age=0..6')
>>> age
Axis([0, 1, 2, 3, 4, 5, 6], 'age')
>>> age.by(3)
(age.i[0:3] >> '0:2', age.i[3:6] >> '3:5', age.i[6:7] >> '6')
>>> age.by(3, step=2)
(age.i[0:3] >> '0:2', age.i[2:5] >> '2:4', age.i[4:7] >> '4:6', age.i[6:7] >> '6')
>>> age.by(3, template='{start}-{end}')
(age.i[0:3] >> '0-2', age.i[3:6] >> '3-5', age.i[6:7] >> '6')
Closes issue 669.
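The naming rule visible in the Axis.by() output above can be reproduced with a few lines of plain Python. This is a sketch written for illustration, not larray's implementation: `group_by` is a hypothetical function working on a list of labels, and the rule that a group holding a single label is named after that label alone is inferred from the examples.

```python
def group_by(labels, length, step=None, template='{start}:{end}'):
    """Split labels into groups of `length` labels, starting a new group
    every `step` labels, and name each group using `template`."""
    if step is None:
        step = length
    groups = []
    for first in range(0, len(labels), step):
        chunk = labels[first:first + length]
        if len(chunk) > 1:
            name = template.format(start=chunk[0], end=chunk[-1])
        else:
            # a single-label group is named after its only label
            name = str(chunk[0])
        groups.append((name, chunk))
    return groups


ages = list(range(7))
print([name for name, _ in group_by(ages, 3)])
print([name for name, _ in group_by(ages, 3, step=2)])
print([name for name, _ in group_by(ages, 3, template='{start}-{end}')])
```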
allowed to specify an axis by its position when selecting a subset of an array using the string notation:
>>> pop_mouv = ndtest('geo_from=BE,FR,UK;geo_to=BE,FR,UK')
>>> pop_mouv
geo_from\geo_to  BE  FR  UK
             BE   0   1   2
             FR   3   4   5
             UK   6   7   8
>>> pop_mouv['0[BE, UK]']   # equivalent to pop_mouv[pop_mouv.geo_from['BE,UK']]
geo_from\geo_to  BE  FR  UK
             BE   0   1   2
             UK   6   7   8
>>> pop_mouv['1.i[0, 2]']   # equivalent to pop_mouv[pop_mouv.geo_to.i[0, 2]]
geo_from\geo_to  BE  UK
             BE   0   2
             FR   3   5
             UK   6   8
Closes issue 671.
added documentation and examples for where(), maximum() and minimum() functions (closes issue 700).
updated the Working With Sessions section of the tutorial (closes issue 568).
added dtype argument to LArray to set the type of the array explicitly instead of relying on auto-detection.
added dtype argument to stack to set the type of the resulting array explicitly instead of relying on auto-detection.
made it possible to pass a single axis or group as axes_to_reindex argument of the LArray.reindex() method (closes issue 712).
LArray.dump() gained a few extra arguments to further customize output:
axes_names: to specify whether or not the output should contain the axes names (and which)
maxlines and edgeitems: to dump only the start and end of large arrays
light: to output axes labels only when they change instead of repeating them on each line
na_repr: to specify how to represent N/A (NaN) values
substantially improved performance of creating, iterating, and doing a few other operations over larray objects. This solves a few pathological cases of slow operations, especially those involving many small-ish arrays but sadly the overall performance improvement is negligible over most of the real-world models using larray that we tested these changes on.
Fixes
fixed dumping to Excel arrays of “object” dtype containing NaN values using numpy float types (fixes the infamous 65535 bug).
fixed LArray.divnot0() being slow when the divisor has many axes and many zeros (closes issue 705).
fixed maximum length of sheet names (31 characters instead of 30 characters) when adding a new sheet to an Excel Workbook (closes issue 713).
fixed missing documentation of many functions in Utility Functions section of the API Reference (closes issue 698).
fixed arithmetic operations between two sessions returning a nan value for each axis and group (closes issue 725).
fixed dumping sessions with metadata in HDF format (closes issue 702).
fixed minimum version of pandas to install. The minimum version is now 0.20.0.
fixed from_frame for dataframes with non string index names.
fixed creating an LSet from an IGroup with a (single) scalar key:
>>> a = Axis('a=a0,a1,a2')
>>> a.i[1].set()
a['a1'].set()
EDITOR
Miscellaneous improvements
display the filename and line number in the status bar when the editor is called from a Python script (closes issue 173).
Fixes
Version 0.29
Released on 2018-09-07.
Syntax changes
deprecated title attribute of LArray objects and title argument of array creation functions. A title is now considered metadata and must be added as:
>>> # add title at array creation
>>> arr = ndtest((3, 3), meta=[('title', 'array for testing')])

>>> # or after array creation
>>> arr = ndtest((3, 3))
>>> arr.meta.title = 'array for testing'
See below for more information about metadata handling.
renamed LArray.drop_labels() to LArray.ignore_labels() to avoid confusion with the new LArray.drop() method (closes issue 672).
renamed Session.array_equals() to Session.element_equals() because this method now also compares axes and groups in addition to arrays.
renamed Sheet.load() and Range.load() nb_index argument to nb_axes to be consistent with all other input functions (read_*). Sheet and Range are the objects one gets when taking subsets of the excel Workbook objects obtained via open_excel() (closes issue 648).
deprecated the element_equal() function in favor of the LArray.eq() method (closes issue 630) to be consistent with other future methods for operations between two arrays.
renamed nan_equals argument of LArray.equals() and LArray.eq() methods to nans_equal because it is grammatically more correct and is explained more naturally as “whether two nans should be considered equal”.
LArray.insert() pos and axis arguments are deprecated because those were only useful for very specific cases and those can easily be rewritten by using an indices group (axis.i[pos]) for the before argument instead (closes issue 652).
New features
allowed arrays to have metadata (e.g. title, description, authors, …).
Metadata can be added when creating arrays:
>>> # for Python <= 3.5
>>> arr = ndtest((3, 3), meta=[('title', 'array for testing'), ('author', 'John Smith')])

>>> # for Python >= 3.6
>>> arr = ndtest((3, 3), meta=Metadata(title='array for testing', author='John Smith'))
To access all existing metadata, use array.meta, for example:
>>> arr.meta
title: array for testing
author: John Smith
To access some specific existing metadata, use array.meta.<name>, for example:
>>> arr.meta.author
'John Smith'
Updating some existing metadata, or creating new metadata (the metadata is added if there was no metadata using that name) should be done using array.meta.<name> = <value>. For example:
>>> arr.meta.city = 'London'
To remove some metadata, use del array.meta.<name>, for example:
>>> del arr.meta.city
Note
Currently, only the HDF (.h5) file format supports saving and loading array metadata.
Metadata is not kept when actions or methods are applied on an array except for operations modifying the object in-place, such as pop[age < 10] = 0, and when the method copy() is called. Do not add metadata to an array if you know you will apply actions or methods on it before dumping it.
allowed sessions to have metadata. Session metadata is created and accessed using the same syntax as for arrays (session.meta.<name>), for example to add metadata to a session at creation:
>>> # Python <= 3.5
>>> s = Session([('arr1', ndtest(2)), ('arr2', ndtest(3))], meta=[('title', 'my title'), ('author', 'John Smith')])

>>> # Python 3.6+
>>> s = Session(arr1=ndtest(2), arr2=ndtest(3), meta=Metadata(title='my title', author='John Smith'))
Note
Contrary to array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5).
Metadata is not kept when actions or methods are applied on a session except for operations modifying a specific array, such as: s['arr1'] = 0. Do not add metadata to a session if you know you will apply actions or methods on it before dumping it.
Closes issue 640.
implemented LArray.drop() to return an array without some labels or indices along an axis (closes issue 506).
>>> arr1 = ndtest((2, 4))
>>> arr1
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5   6   7
>>> a, b = arr1.axes
Dropping a single label
>>> arr1.drop('b1') a\b b0 b2 b3 a0 0 2 3 a1 4 6 7
Dropping multiple labels
>>> # arr1.drop('b1,b3') >>> arr1.drop(['b1', 'b3']) a\b b0 b2 a0 0 2 a1 4 6
Dropping a slice
>>> # arr1.drop('b1:b3') >>> arr1.drop(b['b1':'b3']) a\b b0 a0 0 a1 4
Dropping labels by position requires specifying the axis
>>> # arr1.drop('b.i[1]') >>> arr1.drop(b.i[1]) a\b b0 b2 b3 a0 0 2 3 a1 4 6 7
added new module to create arrays with values generated randomly following a few different distributions, or shuffle an existing array along an axis:
>>> from larray.random import *
Generate integers between two bounds (0 and 10 in this example)
>>> randint(0, 10, axes='a=a0..a2') a a0 a1 a2 3 6 2
Generate values following a uniform distribution
>>> uniform(axes='a=a0..a2') a a0 a1 a2 0.33293756929238394 0.5331412592583252 0.6748786766763107
Generate values following a normal distribution (\(\mu\) = 1 and \(\sigma\) = 2 in this example)
>>> normal(1, scale=2, axes='a=a0..a2') a a0 a1 a2 -0.9216651561025018 5.119734598931103 4.4467876992838935
Randomly shuffle an existing array along one axis
>>> arr = ndtest((3, 3)) >>> arr a\b b0 b1 b2 a0 0 1 2 a1 3 4 5 a2 6 7 8 >>> permutation(arr, axis='b') a\b b1 b2 b0 a0 1 2 0 a1 4 5 3 a2 7 8 6
Generate values by randomly choosing between specified values (5, 10 and 15 in this example), potentially with a specified probability for each value (respectively a 30%, 50%, 20% probability of occurring in this example).
>>> choice([5, 10, 15], p=[0.3, 0.5, 0.2], axes='a=a0,a1;b=b0..b2') a\b b0 b1 b2 a0 15 10 10 a1 10 5 10
Same as above with labels and probabilities given as a one dimensional LArray
>>> proba = LArray([0.3, 0.5, 0.2], Axis([5, 10, 15], 'outcome')) >>> proba outcome 5 10 15 0.3 0.5 0.2 >>> choice(p=proba, axes='a=a0,a1;b=b0..b2') a\b b0 b1 b2 a0 10 15 5 a1 10 5 10
made a few useful constants accessible directly from the larray module: nan, inf, pi, e and euler_gamma. Like for any Python functionality, you can choose how to import and use them. For example, for pi:
>>> from larray import *
>>> pi
3.141592653589793
OR
>>> from larray import pi
>>> pi
3.141592653589793
OR
>>> import larray as la
>>> la.pi
3.141592653589793
added Group.equals() method which compares group names, associated axis names and labels between two groups:
>>> a = Axis('a=a0..a3')
>>> a02 = a['a0:a2'] >> 'group_a'
>>> # different group name
>>> a02.equals(a['a0:a2'])
False
>>> # different axis name
>>> other_axis = a.rename('other_name')
>>> a02.equals(other_axis['a0:a2'] >> 'group_a')
False
>>> # different labels
>>> a02.equals(a['a1:a3'] >> 'group_a')
False
Miscellaneous improvements
completely rewrote the ‘Load And Dump Arrays, Sessions, Axes And Groups’ section of the tutorial (closes issue 645).
saving or loading a session from a file now includes Axis and Group objects in addition to arrays (closes issue 578).
Create a session containing axes, groups and arrays
>>> a, b = Axis("a=a0..a2"), Axis("b=b0..b2") >>> a01 = a['a0,a1'] >> 'a01' >>> arr1, arr2 = ndtest((a, b)), ndtest(a) >>> s = Session([('a', a), ('b', b), ('a01', a01), ('arr1', arr1), ('arr2', arr2)])
Saving a session will save axes, groups and arrays
>>> s.save('session.h5')
Loading a session will load axes, groups and arrays
>>> s2 = s.load('session.h5') >>> s2 Session(arr1, arr2, a, b, a01)
Note
All axes and groups of a session are stored in the same CSV file/Excel sheet/HDF group named respectively __axes__ and __groups__.
vastly improved indexing using arrays (of labels, indices or booleans). Many advanced cases did not work, including when combining several indexing arrays, or when (one of) the indexing array(s) had an axis present in the array.
First let’s create some test axes
>>> a, b, c = ndtest((2, 3, 2)).axes
Then create a test array.
>>> arr = ndtest((a, b)) >>> arr a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
If the key array has an axis not already present in arr (e.g. c), the target axis (a) is replaced by the extra axis (c). This already worked previously.
>>> key = LArray(['a1', 'a0'], c) >>> key c c0 c1 a1 a0 >>> arr[key] c\b b0 b1 b2 c0 3 4 5 c1 0 1 2
If the key array has the target axis, the axis stays the same, but the data is reordered (this also worked previously):
>>> key = LArray(['b1', 'b0', 'b2'], b) >>> key b b0 b1 b2 b1 b0 b2 >>> arr[key] a\b b0 b1 b2 a0 1 0 2 a1 4 3 5
From here on, the examples shown did not work previously…
Now, if the key contains another axis present in the array (b) which is not the target axis (a), the target axis completely disappears (both axes are replaced by the key axis):
>>> key = LArray(['a0', 'a1', 'a0'], b) >>> key b b0 b1 b2 a0 a1 a0 >>> arr[key] b b0 b1 b2 0 4 2
If the key has both the target axis (a) and another existing axis (b)
>>> key a\b b0 b1 b2 a0 a0 a1 a0 a1 a1 a0 a1 >>> arr[key] a\b b0 b1 b2 a0 0 4 2 a1 3 1 5
If the key has both another existing axis (a) and an extra axis (c)
>>> key a\c c0 c1 a0 b0 b1 a1 b2 b0 >>> arr[key] a\c c0 c1 a0 0 1 a1 5 3
It also works if the key has the target axis (a), another existing axis (b) and an extra axis (c), but this is not shown for brevity.
updated Session.summary() so as to display all kinds of objects and allowed to pass a function returning a string representation of an object instead of passing a pre-defined string template (closes issue 608):
>>> axis1 = Axis("a=a0..a2")
>>> group1 = axis1['a0,a1'] >> 'a01'
>>> arr1 = ndtest((2, 2), title='array 1', dtype=np.int64)
>>> arr2 = ndtest(4, title='array 2', dtype=np.int64)
>>> arr3 = ndtest((3, 2), title='array 3', dtype=np.int64)
>>> s = Session([('axis1', axis1), ('group1', group1), ('arr1', arr1), ('arr2', arr2), ('arr3', arr3)])
Using the default template
>>> print(s.summary()) axis1: a ['a0' 'a1' 'a2'] (3) group1: a['a0', 'a1'] >> a01 (2) arr1: a, b (2 x 2) [int64] array 1 arr2: a (4) [int64] array 2 arr3: a, b (3 x 2) [int64] array 3
Using a specific template
>>> def print_array(key, array):
...     axes_names = ', '.join(array.axes.display_names)
...     shape = ' x '.join(str(i) for i in array.shape)
...     return "{} -> {} ({})\n title = {}\n dtype = {}".format(key, axes_names, shape,
...                                                            array.title, array.dtype)
>>> template = {Axis: "{key} -> {name} [{labels}] ({length})",
...             Group: "{key} -> {name}: {axis_name} {labels} ({length})",
...             LArray: print_array}
>>> print(s.summary(template))
axis1 -> a ['a0' 'a1' 'a2'] (3)
group1 -> a01: a ['a0', 'a1'] (2)
arr1 -> a, b (2 x 2)
 title = array 1
 dtype = int64
arr2 -> a (4)
 title = array 2
 dtype = int64
arr3 -> a, b (3 x 2)
 title = array 3
 dtype = int64
methods Session.equals() and Session.element_equals() now also compare axes and groups in addition to arrays (closes issue 610):
>>> a = Axis('a=a0..a2')
>>> a01 = a['a0,a1'] >> 'a01'
>>> s1 = Session([('a', a), ('a01', a01), ('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
>>> s2 = Session([('a', a), ('a01', a01), ('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
Identical sessions
>>> s1.element_equals(s2) name a a01 arr1 arr2 True True True True
Different value(s) between two arrays
>>> s2.arr1['a1'] = 0 >>> s1.element_equals(s2) name a a01 arr1 arr2 True True False True
Different label(s)
>>> s2.arr2 = ndtest("b=b0,b1; a=a0,a1") >>> s2.a = Axis('a=a0,a1') >>> s1.element_equals(s2) name a a01 arr1 arr2 False True False False
Extra/missing objects
>>> s2.arr3 = ndtest((3, 3)) >>> del s2.a >>> s1.element_equals(s2) name a a01 arr1 arr2 arr3 False True False False False
added arguments wide and value_name to methods LArray.as_table() and LArray.dump() like in LArray.to_excel() and LArray.to_csv() (closes issue 653).
the from_series() function supports Pandas series with a MultiIndex (closes issue 465)
the stack() function supports any array-like object instead of only LArray objects.
>>> stack(a0=[1, 2, 3], a1=[4, 5, 6], axis='a')
{0}*\a  a0  a1
     0   1   4
     1   2   5
     2   3   6
made some operations on Excel Workbooks a bit faster by telling Excel to avoid updating the screen when the Excel instance is not visible anyway. This affects all workbooks opened via open_excel() as well as read_excel() and LArray.to_excel() when using the default xlwings engine.
made the documentation link in Windows start menu version-specific (instead of always pointing to the latest release) so that users do not inadvertently use the latest release syntax when using an older version of larray (closes issue 142).
added menu bar with undo/redo when editing single arrays (as a byproduct of issue 133).
Fixes
fixed Copy(to Excel)/Paste/Plot in the editor not working for 1D and 2D arrays (closes issue 140).
fixed Excel add-ins not loaded when opening an Excel Workbook by calling the LArray.to_excel() method with no path or via “Copy to Excel (CTRL+E)” in the editor (closes issue 154).
made LArray support Pandas versions >= 0.21 (closes issue 569)
fixed current active Excel Workbook being closed when calling the LArray.to_excel() method on an array with -1 as filepath argument (closes issue 473).
fixed LArray.split_axes() when splitting a single axis and using the names argument (e.g. arr.split_axes('bd', names=('b', 'd'))).
fixed splitting an anonymous axis without specifying the names argument.
>>> combined = ndtest('a0_b0,a0_b1,a0_b2,a1_b0,a1_b1,a1_b2') >>> combined {0} a0_b0 a0_b1 a0_b2 a1_b0 a1_b1 a1_b2 0 1 2 3 4 5 >>> combined.split_axes(0) {0}\{1} b0 b1 b2 a0 0 1 2 a1 3 4 5
fixed LArray.combine_axes() with wildcard=True.
fixed taking a subset of an array by giving an index along a specific axis using a string (strings like "axisname.i[pos]").
fixed the editor not working with Python 2 or recent Qt4 versions.
Version 0.28
Released on 2018-03-15.
Backward incompatible changes
changed behavior of operators session1 == session2 and session1 != session2: returns a session of boolean arrays (closes issue 516):
>>> s1 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))]) >>> s2 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))]) >>> (s1 == s2).arr1 a a0 a1 True True >>> s2.arr1['a1'] = 0 >>> (s1 == s2).arr1 a a0 a1 True False >>> (s1 != s2).arr1 a a0 a1 False True
New features
made it possible to run the tutorial online (as a Jupyter notebook) by clicking on the launch|binder badge on top of the tutorial web page (closes issue 73)
added methods array_equals and equals to Session object to compare arrays from two sessions. The method array_equals returns a boolean value for each array while the method equals returns a single boolean value (True if all arrays of both sessions are equal, False otherwise):
>>> s1 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))]) >>> s2 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))]) >>> s1.array_equals(s2) name arr1 arr2 True True >>> s1.equals(s2) True
Different value(s)
>>> s2.arr1['a1'] = 0 >>> s1.array_equals(s2) name arr1 arr2 False True >>> s1.equals(s2) False
Different label(s)
>>> from larray import ndrange >>> s2.arr2 = ndrange("b=b0,b1; a=a0,a1") >>> s1.array_equals(s2) name arr1 arr2 False False >>> s1.equals(s2) False
Extra/missing array(s)
>>> s2.arr3 = ndtest((3, 3)) >>> s1.array_equals(s2) name arr1 arr2 arr3 False False False >>> s1.equals(s2) False
Closes issue 517.
added method equals to LArray object to compare two arrays:
>>> arr1 = ndtest((2, 3)) >>> arr1 a\b b0 b1 b2 a0 0 1 2 a1 3 4 5 >>> arr2 = arr1.copy() >>> arr1.equals(arr2) True >>> arr2['b1'] += 1 >>> arr1.equals(arr2) False >>> arr3 = arr1.set_labels('a', ['x0', 'x1']) >>> arr1.equals(arr3) False
Arrays with nan values
>>> arr1 = ndtest((2, 3), dtype=float) >>> arr1['a1', 'b1'] = nan >>> arr1 a\b b0 b1 b2 a0 0.0 1.0 2.0 a1 3.0 nan 5.0 >>> arr2 = arr1.copy() >>> # By default, an array containing nan values is never equal to another array, >>> # even if that other array also contains nan values at the same positions. >>> # The reason is that a nan value is different from *anything*, including itself. >>> arr1.equals(arr2) False >>> # set flag nan_equal to True to override this behavior >>> arr1.equals(arr2, nan_equal=True) True
This method also includes the arguments rtol (relative tolerance) and atol (absolute tolerance) allowing to test the equality between two arrays within a given relative or absolute tolerance:
>>> arr1 = LArray([6., 8.], "a=a0,a1") >>> arr1 a a0 a1 6.0 8.0 >>> arr2 = LArray([5.999, 8.001], "a=a0,a1") >>> arr2 a a0 a1 5.999 8.001 >>> arr1.equals(arr2) False >>> # equals returns True if abs(array1 - array2) <= (atol + rtol * abs(array2)) >>> arr1.equals(arr2, atol=0.01) True >>> arr1.equals(arr2, rtol=0.01) True
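The tolerance rule quoted in the comment above, abs(array1 - array2) <= (atol + rtol * abs(array2)), can be sketched in plain Python. This is only an illustration of the formula and of the nan_equal flag, not larray's implementation; the helper names values_equal and all_equal are hypothetical and the arrays are represented as plain lists.

```python
import math

def values_equal(x, y, rtol=0.0, atol=0.0, nan_equal=False):
    """Return True if x and y are equal within the given tolerances."""
    if math.isnan(x) or math.isnan(y):
        # a nan differs from everything, unless nan_equal is set and both are nan
        return nan_equal and math.isnan(x) and math.isnan(y)
    return abs(x - y) <= atol + rtol * abs(y)

def all_equal(xs, ys, **kwargs):
    """Return a single boolean: True only if every pair of values is equal."""
    return len(xs) == len(ys) and all(values_equal(x, y, **kwargs)
                                      for x, y in zip(xs, ys))

arr1 = [6.0, 8.0]
arr2 = [5.999, 8.001]
print(all_equal(arr1, arr2))             # False
print(all_equal(arr1, arr2, atol=0.01))  # True
print(all_equal(arr1, arr2, rtol=0.01))  # True
```

Note that the formula is not symmetric: the relative tolerance is scaled by the second operand only.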
added Load from Script in the File menu of the editor allowing to load commands from an existing Python file (closes issue 96).
added Edit menu allowing to undo and redo changes of array values by editing cells and removed Apply and Discard buttons. Changes are now kept when switching from an array to another instead of losing them as previously (closes issue 32).
allowed to provide an absolute or relative tolerance value when comparing arrays through the compare function (closes issue 131).
made the editor able to detect and display plot objects stored in tuple, list or arrays. For example, arrays of plot objects are returned when using subplots=True option in calls of plot method:
>>> a = ndtest('sex=M,F; nat=BE,FO; year=2000..2017') >>> # display 4 plots vertically placed (one plot for each pair (sex, nationality)) >>> a.plot(subplots=True) >>> # display 4 plots ordered in a 2 x 2 grid >>> a.plot(subplots=True, layout=(2, 2))
Closes issue 135.
Miscellaneous improvements
functions local_arrays, global_arrays and arrays return a session excluding arrays whose name starts with an underscore by default. To include them, set the flag include_private to True (closes issue 513):
>>> global_arr1 = ndtest((2, 2)) >>> _global_arr2 = ndtest((3, 3)) >>> def foo(): ... local_arr1 = ndtest(2) ... _local_arr2 = ndtest(3) ... ... # exclude arrays starting with '_' by default ... s = arrays() ... print(s.names) ... ... # use flag 'include_private' to include arrays starting with '_' ... s = arrays(include_private=True) ... print(s.names) >>> foo() ['global_arr1', 'local_arr1'] ['_global_arr2', '_local_arr2', 'global_arr1', 'local_arr1']
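The filtering rule itself is simple enough to sketch without larray. The helper visible_names below is hypothetical and works on a plain dictionary standing in for the variables that arrays() would inspect; it only illustrates the effect of include_private on names.

```python
def visible_names(namespace, include_private=False):
    """Return sorted variable names, hiding '_'-prefixed ones unless requested."""
    return sorted(name for name in namespace
                  if include_private or not name.startswith('_'))

# stand-in for local and global variables: plain objects instead of real arrays
namespace = {'global_arr1': object(), '_global_arr2': object(),
             'local_arr1': object(), '_local_arr2': object()}

print(visible_names(namespace))
# ['global_arr1', 'local_arr1']
print(visible_names(namespace, include_private=True))
# ['_global_arr2', '_local_arr2', 'global_arr1', 'local_arr1']
```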
implemented binary operations between sessions and non-session objects (closes issue 514 and issue 515):
>>> s = Session(arr1=ndtest((2, 2)), arr2=ndtest((3, 3))) >>> s.arr1 a\b b0 b1 a0 0 1 a1 2 3 >>> s.arr2 a\b b0 b1 b2 a0 0 1 2 a1 3 4 5 a2 6 7 8
Add a scalar to all arrays
>>> # equivalent to s2 = 3 + s >>> s2 = s + 3 >>> s2.arr1 a\b b0 b1 a0 3 4 a1 5 6 >>> s2.arr2 a\b b0 b1 b2 a0 3 4 5 a1 6 7 8 a2 9 10 11
Apply binary operations between two sessions
>>> sdiff = (s2 - s) / s >>> sdiff.arr1 a\b b0 b1 a0 inf 3.0 a1 1.5 1.0 >>> sdiff.arr2 a\b b0 b1 b2 a0 inf 3.0 1.5 a1 1.0 0.75 0.6 a2 0.5 0.43 0.375
added possibility to call the method reindex with a group (closes issue 531):
>>> arr = ndtest((2, 2)) >>> arr a\b b0 b1 a0 0 1 a1 2 3 >>> b = Axis("b=b2..b0") >>> arr.reindex('b', b['b1':]) a\b b1 b0 a0 1 0 a1 3 2
added possibility to call the methods diff and growth_rate with a group (closes issue 532):
>>> data = [[2, 4, 5, 4, 6], [4, 6, 3, 6, 9]] >>> a = LArray(data, "sex=M,F; year=2016..2020") >>> a sex\year 2016 2017 2018 2019 2020 M 2 4 5 4 6 F 4 6 3 6 9 >>> a.diff(a.year[2017:]) sex\year 2018 2019 2020 M 1 -1 2 F -3 3 3 >>> a.growth_rate(a.year[2017:]) sex\year 2018 2019 2020 M 0.25 -0.2 0.5 F -0.5 1.0 0.5
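To make the arithmetic behind the output above explicit, here is a plain-Python sketch for the M row. It assumes the group simply restricts the computation to the years from 2017 onwards, which is what the displayed result suggests; the helper names are illustrative and this is not larray code.

```python
def diff(values):
    """Difference between each value and the previous one."""
    return [current - previous
            for previous, current in zip(values, values[1:])]

def growth_rate(values):
    """Relative change between each value and the previous one."""
    return [(current - previous) / previous
            for previous, current in zip(values, values[1:])]

years = list(range(2016, 2021))
males = [2, 4, 5, 4, 6]

# restrict to the years from 2017 onwards before computing, as with a.year[2017:]
start = years.index(2017)
subset_years, subset = years[start:], males[start:]

print(subset_years[1:])     # [2018, 2019, 2020]
print(diff(subset))         # [1, -1, 2]
print(growth_rate(subset))  # [0.25, -0.2, 0.5]
```

The first year of the selection (2017) has no predecessor inside the group, which is why the result starts at 2018.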
function ndrange has been deprecated in favor of sequence or ndtest. Also, an Axis or a list/tuple/collection of axes can be passed to the ndtest function (closes issue 534):
>>> ndtest("nat=BE,FO;sex=M,F") nat\sex M F BE 0 1 FO 2 3
allowed to pass a group for argument axis of stack function (closes issue 535):
>>> b = Axis('b=b0..b2') >>> stack(b0=ndtest(2), b1=ndtest(2), axis=b[:'b1']) a\b b0 b1 a0 0 0 a1 1 1
renamed argument nb_index of read_csv, read_excel, read_sas, from_lists and from_string functions as nb_axes. The relation between nb_index and nb_axes is given by nb_axes = nb_index + 1:
For a given file ‘arr.csv’ with content
a,b\c,c0,c1 a0,b0,0,1 a0,b1,2,3 a1,b0,4,5 a1,b1,6,7
previous code to read this array such as:
>>> # deprecated
>>> arr = read_csv('arr.csv', nb_index=2)
must be updated as follows:
>>> arr = read_csv('arr.csv', nb_axes=3)
Closes issue 548.
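The relation nb_axes = nb_index + 1 can be checked on the header of the example file. The sketch below uses only the csv module; the helper count_axes is hypothetical and assumes the header follows the row\column convention shown above, with the last index column holding both axis names.

```python
import csv
import io

content = "a,b\\c,c0,c1\na0,b0,0,1\na0,b1,2,3\na1,b0,4,5\na1,b1,6,7\n"

def count_axes(header):
    """Return (nb_index, nb_axes) for a header using the 'row\\column' convention."""
    for position, cell in enumerate(header):
        if "\\" in cell:
            # index columns run up to and including the 'row\column' cell
            nb_index = position + 1
            # the column axis (c here) is the extra axis not counted by nb_index
            return nb_index, nb_index + 1
    raise ValueError("no 'row\\column' cell found in header")

header = next(csv.reader(io.StringIO(content)))
nb_index, nb_axes = count_axes(header)
print(header)             # ['a', 'b\\c', 'c0', 'c1']
print(nb_index, nb_axes)  # 2 3
```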
deprecated nan_equal function in favor of element_equal function. The element_equal function has the same optional arguments as the LArray.equals method but compares two arrays element-wise and returns an array of booleans:
>>> arr1 = LArray([6., np.nan, 8.], "a=a0..a2") >>> arr1 a a0 a1 a2 6.0 nan 8.0 >>> arr2 = LArray([5.999, np.nan, 8.001], "a=a0..a2") >>> arr2 a a0 a1 a2 5.999 nan 8.001 >>> element_equal(arr1, arr2) a a0 a1 a2 False False False >>> element_equal(arr1, arr2, nan_equals=True) a a0 a1 a2 False True False >>> element_equal(arr1, arr2, atol=0.01, nan_equals=True) a a0 a1 a2 True True True >>> element_equal(arr1, arr2, rtol=0.01, nan_equals=True) a a0 a1 a2 True True True
Closes issue 593.
renamed argument transpose by wide in to_csv method.
added argument wide in to_excel method. When argument wide is set to False, the array is exported in “narrow” format, i.e. one column per axis plus one value column:
>>> arr = ndtest((2, 3)) >>> arr a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
Default behavior (wide=True):
>>> arr.to_excel('my_file.xlsx') a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
With wide=False:
>>> arr.to_excel('my_file.xlsx', wide=False) a b value a0 b0 0 a0 b1 1 a0 b2 2 a1 b0 3 a1 b1 4 a1 b2 5
Argument transpose has a different purpose than wide and is mainly useful to allow multiple axes as header when exporting arrays with more than 2 dimensions. Closes issue 575 and issue 371.
added argument wide to read_csv and read_excel functions. If False, the array to be loaded is assumed to be stored in “narrow” format:
>>> # assuming the array was saved using command: arr.to_excel('my_file.xlsx', wide=False) >>> read_excel('my_file.xlsx', wide=False) a\b b0 b1 b2 a0 0 1 2 a1 3 4 5
Closes issue 574.
added argument name to to_series method allowing to set a name to the Pandas Series returned by the method.
added argument value_name to to_csv and to_excel allowing to change the default name (‘value’) of the column containing the values when the argument wide is set to False:
>>> arr.to_csv('my_file.csv', wide=False, value_name='data') a,b,data a0,b0,0 a0,b1,1 a0,b2,2 a1,b0,3 a1,b1,4 a1,b2,5
Closes issue 549.
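For readers unfamiliar with the “narrow” layout, the following standard-library sketch builds the same CSV text from a small wide table. The function narrow_csv is a hypothetical helper written for this illustration, not part of larray; it only shows how each cell of the wide table becomes one row made of its labels plus the value.

```python
import csv
import io

def narrow_csv(row_axis, col_axis, data, value_name="value"):
    """Write a 2D table in narrow format: one column per axis plus a value column."""
    row_name, row_labels = row_axis
    col_name, col_labels = col_axis
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([row_name, col_name, value_name])
    for row_label, row in zip(row_labels, data):
        for col_label, value in zip(col_labels, row):
            writer.writerow([row_label, col_label, value])
    return buffer.getvalue()

text = narrow_csv(("a", ["a0", "a1"]), ("b", ["b0", "b1", "b2"]),
                  [[0, 1, 2], [3, 4, 5]], value_name="data")
print(text)
```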
renamed argument sheetname of read_excel function as sheet (closes issue 587).
Renamed sheet_name of LArray.to_excel to sheet since it can also be an index (closes issue 580).
allowed to create axes with zero padded string labels (closes issue 533):
>>> Axis('zero_padding=01,02,03,10,11,12') Axis(['01', '02', '03', '10', '11', '12'], 'zero_padding')
added a dropdown menu containing recently used files in dialog boxes of Save Command History To Script and Load from Script from File menu.
Fixes
fixed passing a scalar group from an external axis to get a subset of an array (closes issue 178):
>>> arr = ndtest((3, 2)) >>> arr['a1'] b b0 b1 2 3 >>> alt_a = Axis("alt_a=a1..a2") >>> arr[alt_a['a1']] b b0 b1 2 3 >>> arr[alt_a.i[0]] b b0 b1 2 3
fixed subscripting a string LGroup key (closes issue 437):
>>> axis = Axis("a=a0,a1") >>> axis['a0'][0] 'a'
fixed Axis.union, Axis.intersection and Axis.difference when passed value is a single string (closes issue 489):
>>> a = Axis('a=a0..a2') >>> a.union('a1') Axis(['a0', 'a1', 'a2'], 'a') >>> a.union('a3') Axis(['a0', 'a1', 'a2', 'a3'], 'a') >>> a.union('a1..a3') Axis(['a0', 'a1', 'a2', 'a3'], 'a') >>> a.intersection('a1..a3') Axis(['a1', 'a2'], 'a') >>> a.difference('a1..a3') Axis(['a0'], 'a')
fixed to_excel applied on >= 2D arrays using transpose=True (closes issue 579)
>>> arr = ndtest((2, 3)) >>> arr.to_excel('my_file.xlsx', transpose=True) b\a a0 a1 b0 0 3 b1 1 4 b2 2 5
fixed aggregation on arrays containing zero padded string labels (closes issue 522):
>>> arr = ndtest('zero_padding=01,02,03,10,11,12') >>> arr zero_padding 01 02 03 10 11 12 0 1 2 3 4 5 >>> arr.sum('01,02,03 >> 01_03; 10') zero_padding 01_03 10 3 3
Version 0.27
Released on 2017-11-30.
Syntax changes
Backward incompatible changes
labels are checked during array subset assignment (closes issue 269):
>>> arr = ndtest(4) >>> arr a a0 a1 a2 a3 0 1 2 3 >>> arr['a0,a1'] = arr['a2,a3'] ValueError: incompatible axes: Axis(['a0', 'a1'], 'a') vs Axis(['a2', 'a3'], 'a')
previous behavior can be recovered through drop_labels or by changing labels via set_labels or set_axes:
>>> arr['a0,a1'] = arr['a2,a3'].drop_labels('a') >>> arr['a0,a1'] = arr['a2,a3'].set_labels('a', {'a2': 'a0', 'a3': 'a1'})
from_frame parse_header argument defaults to False instead of True.
New features
implemented Axis.insert and LArray.insert to add values at a given position of an axis (closes issue 54).
>>> arr1 = ndtest((2, 3))
>>> arr1
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
>>> arr1.insert(42, before='b1', label='b0.5')
a\b  b0  b0.5  b1  b2
 a0   0    42   1   2
 a1   3    42   4   5
insert an array
>>> arr2 = ndtest(2)
>>> arr2
a  a0  a1
    0   1
>>> arr1.insert(arr2, after='b0', label='b0.5')
a\b  b0  b0.5  b1  b2
 a0   0     0   1   2
 a1   3     1   4   5
insert an array which already has the axis
>>> arr3 = ndrange('a=a0,a1;b=b0.1,b0.2') + 42
>>> arr3
a\b  b0.1  b0.2
 a0    42    43
 a1    44    45
>>> arr1.insert(arr3, before='b1')
a\b  b0  b0.1  b0.2  b1  b2
 a0   0    42    43   1   2
 a1   3    44    45   4   5
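As a rough mental model of the scalar case, inserting before a label amounts to finding the position of that label and splicing a new label and a new column at that position. The plain-list sketch below uses a hypothetical helper insert_before and is not how larray implements insert.

```python
def insert_before(labels, rows, value, before, label):
    """Insert a constant column named `label` before the column named `before`."""
    position = labels.index(before)
    new_labels = labels[:position] + [label] + labels[position:]
    new_rows = [row[:position] + [value] + row[position:] for row in rows]
    return new_labels, new_rows

b_labels = ['b0', 'b1', 'b2']
data = [[0, 1, 2],
        [3, 4, 5]]

new_labels, new_data = insert_before(b_labels, data, 42, before='b1', label='b0.5')
print(new_labels)  # ['b0', 'b0.5', 'b1', 'b2']
print(new_data)    # [[0, 42, 1, 2], [3, 42, 4, 5]]
```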
added new items in the Help menu of the editor:
Report Issue…: to report an issue on the Github project website.
Users Discussion…: redirect to the LArray Users Google Group (you need to be registered to participate).
New Releases And Announces Mailing List…: redirect to the LArray Announce mailing list.
About: give information about the editor and the versions of packages currently installed on your computer (closes issue 88).
added Save Command History To Script in the File menu of the editor allowing to save executed commands in a new or existing Python file.
added possibility to show only rows with differences when comparing arrays or sessions through the compare function in the editor (closes issue 102).
added ascending argument to methods indicesofsorted and labelsofsorted. Values are sorted in ascending order by default. Set to False to sort values in descending order:
>>> arr = LArray([[1, 5], [3, 2], [0, 4]], "nat=BE,FR,IT; sex=M,F") >>> arr nat\sex M F BE 1 5 FR 3 2 IT 0 4 >>> arr.indicesofsorted("nat", ascending=False) nat\sex M F 0 1 0 1 0 2 2 2 1 >>> arr.labelsofsorted("nat", ascending=False) nat\sex M F 0 FR BE 1 BE IT 2 IT FR
Closes issue 490.
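The result above can be reproduced column by column with the built-in sorted function. This is a sketch of the ordering logic only, using plain lists and dictionaries rather than arrays; the helper indices_of_sorted is illustrative, not the library's method.

```python
def indices_of_sorted(column, ascending=True):
    """Positions of the values of `column`, ordered by value."""
    return sorted(range(len(column)), key=lambda i: column[i],
                  reverse=not ascending)

nat = ['BE', 'FR', 'IT']
# one list of values per label of the 'sex' axis, ordered along 'nat'
data = {'M': [1, 3, 0], 'F': [5, 2, 4]}

indices = {sex: indices_of_sorted(column, ascending=False)
           for sex, column in data.items()}
labels = {sex: [nat[i] for i in positions]
          for sex, positions in indices.items()}

print(indices)  # {'M': [1, 0, 2], 'F': [0, 2, 1]}
print(labels)   # {'M': ['FR', 'BE', 'IT'], 'F': ['BE', 'IT', 'FR']}
```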
Miscellaneous improvements
allowed to sort values of an array along an axis (closes issue 225):
>>> a = LArray([[10, 2, 4], [3, 7, 1]], "sex=M,F; nat=EU,FO,BE") >>> a sex\nat EU FO BE M 10 2 4 F 3 7 1 >>> a.sort_values(axis='sex') sex*\nat EU FO BE 0 3 2 1 1 10 7 4 >>> a.sort_values(axis='nat') sex\nat* 0 1 2 M 2 4 10 F 1 3 7
method LArray.sort_values can be called without argument (closes issue 478):
>>> arr = LArray([0, 1, 6, 3, -1], "a=a0..a4") >>> arr a a0 a1 a2 a3 a4 0 1 6 3 -1 >>> arr.sort_values() a a4 a0 a1 a3 a2 -1 0 1 3 6
If the array has more than one dimension, axes are combined together:
>>> a = LArray([[10, 2, 4], [3, 7, 1]], "sex=M,F; nat=EU,FO,BE") >>> a sex\nat EU FO BE M 10 2 4 F 3 7 1 >>> a.sort_values() sex_nat F_BE M_FO F_EU M_BE F_FO M_EU 1 2 3 4 7 10
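Both cases can be mimicked with plain Python to see where the combined labels come from. The sketch assumes labels of different axes are joined with an underscore, as in the output above; sort_values_1d is a hypothetical helper and not larray's implementation.

```python
from itertools import product

def sort_values_1d(labels, values):
    """Sort (label, value) pairs by value and return the two reordered lists."""
    pairs = sorted(zip(labels, values), key=lambda pair: pair[1])
    return [label for label, _ in pairs], [value for _, value in pairs]

# 1D case
labels, values = sort_values_1d(['a0', 'a1', 'a2', 'a3', 'a4'], [0, 1, 6, 3, -1])
print(labels)  # ['a4', 'a0', 'a1', 'a3', 'a2']
print(values)  # [-1, 0, 1, 3, 6]

# 2D case: flatten the data and combine the labels of both axes first
sex, nat = ['M', 'F'], ['EU', 'FO', 'BE']
data = [[10, 2, 4],
        [3, 7, 1]]
combined_labels = [s + '_' + n for s, n in product(sex, nat)]
flat_values = [value for row in data for value in row]
labels_2d, values_2d = sort_values_1d(combined_labels, flat_values)
print(labels_2d)  # ['F_BE', 'M_FO', 'F_EU', 'M_BE', 'F_FO', 'M_EU']
print(values_2d)  # [1, 2, 3, 4, 7, 10]
```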
when appending/prepending/extending an array, both the original array and the added values will be converted to a data type which can hold both without loss of information. It used to convert the added values to the type of the original array. For example, given an array of integers like:
>>> arr = ndtest(3)
>>> arr
a  a0  a1  a2
    0   1   2
Trying to add a floating point number to that array used to result in:
>>> arr.append('a', 2.5, 'a3') a a0 a1 a2 a3 0 1 2 2
Now it will result in:
>>> arr.append('a', 2.5, 'a3') a a0 a1 a2 a3 0.0 1.0 2.0 2.5
made the editor more responsive when switching to or changing the filter of large arrays (closes issue 93).
added support for coloring numeric values for object arrays (e.g. arrays containing both strings and numbers).
documentation links in the Help menu of the editor point to the version of the documentation corresponding to the installed version of larray (closes issue 105).
Fixes
fixed array values being editable in view() (instead of only in edit()).
Version 0.26.1
Released on 2017-10-25.
Miscellaneous improvements
Made handling Excel sheets with many blank columns/rows after the data much faster (but still slower than sheets without such blank cells).
Fixes
fixed reading from and writing to Excel sheets with 16384 columns or 1048576 rows (Excel’s maximum).
fixed LArray.split_axes using a custom separator and not using sort=True or when the split labels are ambiguous with labels from other axes (closes issue 485).
fixed reading 1D arrays with non-string labels (closes issue 495).
fixed read_csv(sort_columns=True) for 1D arrays (closes issue 497).
Version 0.26
Released on 2017-10-13.
Syntax changes
renamed special variable x to X to let users define an x variable in their code without breaking all subsequent code using that special variable (closes issue 167).
renamed Axis.startswith, endswith and matches to startingwith, endingwith and matching to avoid a possible confusion with str.startswith and endswith which return booleans (closes issue 432).
renamed na argument of read_csv, read_excel, read_hdf and read_sas functions to fill_value to avoid confusion as to what the argument does and to be consistent with reindex and align (closes issue 394).
renamed split_axis to split_axes to reflect the fact that it can now split several axes at once (see below).
renamed sort_axis to sort_axes to reflect the fact that it can sort multiple axes at once (and does so by default).
renamed several methods with more explicit names (closes issue 50):
argmax, argmin, argsort to labelofmax, labelofmin, labelsofsorted
posargmax, posargmin, posargsort to indexofmax, indexofmin, indicesofsorted
renamed PGroup to IGroup to be consistent with other methods, especially the .i methods on axes and arrays (I is for Index – P was for Position).
Backward incompatible changes
getting a subset using a boolean selection returns an array with labels combined with underscore by default (for consistency with split_axes and combine_axes). Closes issue 376:
>>> arr = ndtest((2, 2)) >>> arr a\b b0 b1 a0 0 1 a1 2 3 >>> arr[arr < 3] a_b a0_b0 a0_b1 a1_b0 0 1 2
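The shape of that result can be illustrated without larray: each selected cell is identified by the labels of all its axes joined with an underscore. The helper select_where below is hypothetical and returns a dictionary instead of an array.

```python
from itertools import product

def select_where(axes, data, condition, sep='_'):
    """Keep the cells satisfying `condition`, keyed by their combined labels."""
    row_labels, col_labels = axes
    selected = {}
    for (i, row_label), (j, col_label) in product(enumerate(row_labels),
                                                  enumerate(col_labels)):
        value = data[i][j]
        if condition(value):
            selected[row_label + sep + col_label] = value
    return selected

subset = select_where((['a0', 'a1'], ['b0', 'b1']),
                      [[0, 1], [2, 3]],
                      lambda value: value < 3)
print(subset)  # {'a0_b0': 0, 'a0_b1': 1, 'a1_b0': 2}
```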
New features
added global_arrays() and arrays() functions to complement the local_arrays() function. They return a Session containing respectively all arrays defined in global variables and all available arrays (whether they are defined in local or global variables).
When used outside of a function, these three functions should have the same results, but inside a function local_arrays() will return only arrays local to the function, global_arrays() will return only arrays defined globally and arrays() will return arrays defined either locally or globally. Closes issue 416.
a * symbol is appended to the window title when unsaved changes are detected in the viewer (closes issue 21).
implemented Axis.containing to create a Group with all labels of an axis containing some substring (closes issue 402).
>>> people = Axis(['Bruce Wayne', 'Bruce Willis', 'Arthur Dent'], 'people') >>> people.containing('Will') people['Bruce Willis']
implemented Group.containing, startingwith, endingwith and matching to create a group with all labels of a group matching some criterion (closes issue 108).
>>> group = people.startingwith('Bru') >>> group people['Bruce Wayne', 'Bruce Willis'] >>> group.containing('Will') people['Bruce Willis']
implemented nan_equal() function to create an array of booleans telling whether each cell of the first array is equal to the corresponding cell in the other array, even in the presence of NaN.
>>> arr1 = ndtest(3, dtype=float) >>> arr1['a1'] = nan >>> arr1 a a0 a1 a2 0.0 nan 2.0 >>> arr2 = arr1.copy() >>> arr1 == arr2 a a0 a1 a2 True False True >>> nan_equal(arr1, arr2) a a0 a1 a2 True True True
implemented from_frame() to convert a Pandas DataFrame to an array:
>>> df = ndtest((2, 2, 2)).to_frame()
>>> df
c      c0  c1
a  b
a0 b0   0   1
   b1   2   3
a1 b0   4   5
   b1   6   7
>>> from_frame(df)
 a  b\c  c0  c1
a0   b0   0   1
a0   b1   2   3
a1   b0   4   5
a1   b1   6   7
implemented Axis.split to split an axis into several.
>>> a_b = Axis('a_b=a0_b0,a0_b1,a0_b2,a1_b0,a1_b1,a1_b2') >>> a_b.split() [Axis(['a0', 'a1'], 'a'), Axis(['b0', 'b1', 'b2'], 'b')]
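Conceptually, splitting a combined axis means splitting both its name and each of its labels on the separator, then keeping the distinct parts in order of first appearance. The sketch below shows this with plain strings; split_axis is a hypothetical helper that returns (name, labels) pairs rather than Axis objects.

```python
def split_axis(name, labels, sep='_'):
    """Split a combined axis into (name, labels) pairs, one per component."""
    names = name.split(sep)
    parts = [label.split(sep) for label in labels]
    # dict.fromkeys removes duplicates while keeping first-occurrence order
    split_labels = [list(dict.fromkeys(column)) for column in zip(*parts)]
    return list(zip(names, split_labels))

combined = ['a0_b0', 'a0_b1', 'a0_b2', 'a1_b0', 'a1_b1', 'a1_b2']
axes = split_axis('a_b', combined)
print(axes)  # [('a', ['a0', 'a1']), ('b', ['b0', 'b1', 'b2'])]
```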
added the possibility to load the example dataset used in the tutorial via the menu File > Load Example in the viewer
Miscellaneous improvements
view() and edit() without argument now display global arrays in addition to local ones (closes issue 54).
using the mouse scrollwheel on filter combo boxes will switch to the previous/next label.
implemented a combobox to choose which color gradient to use and provide a few gradients.
inverted background colors in the viewer (red for low values and blue for high values). Closes issue 18.
allowed to pass an array of labels as new_axis argument to reindex method (closes issue 384):
>>> arr = ndrange('a=v0..v1;b=v0..v2') >>> arr a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 >>> arr.reindex('a', arr.b.labels) a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 v2 nan nan nan
allowed to call the reindex method using a differently named axis for labels (closes issue 386):
>>> arr = ndrange('a=v0..v1;b=v0..v2') >>> arr a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 >>> arr.reindex('a', arr.b) a\b v0 v1 v2 v0 0 1 2 v1 3 4 5 v2 nan nan nan
arguments fill_value, sort_rows and sort_columns of read_excel function are also supported by the default xlwings engine (closes issue 393).
allowed to pass a label or group as sheet_name argument of the method to_excel or to a Workbook (open_excel). Same for key argument of the method to_hdf. Closes issue 328.
>>> arr = ndtest((4, 4, 4))
>>> # iterate over labels of a given axis
>>> with open_excel('my_file.xlsx') as wb:
...     for label in arr.a:
...         wb[label] = arr[label].dump()
...     wb.save()
>>> for label in arr.a:
...     arr[label].to_hdf('my_file.h5', label)
>>> # create and use a group >>> even = arr.a['a0,a2'] >> 'even' >>> arr[even].to_excel('my_file.xlsx', even) >>> arr[even].to_hdf('my_file.h5', even)
>>> # special characters : \ / ? * [ or ] in labels or groups are replaced by an _ when exporting to excel >>> # sheet names cannot exceed 31 characters >>> g = arr.a['a1,a3,a4'] >> '?name:with*special\/[char]' >>> arr[g].to_excel('my_file.xlsx', g) >>> print(open_excel('my_file.xlsx').sheet_names()) ['_name_with_special___char_'] >>> # special characters \ or / in labels or groups are replaced by an _ when exporting to HDF file
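The replacement rule for Excel sheet names described in the comments above can be sketched with a regular expression. The function sanitize_sheet_name is hypothetical; the set of replaced characters is taken from the example output, and truncating to 31 characters is an assumption of this sketch (the source only states that sheet names cannot exceed that length).

```python
import re

# characters replaced by '_' in the example above: : \ / ? * [ ]
INVALID_SHEET_CHARS = re.compile(r"[:\\/?*\[\]]")
MAX_SHEET_NAME_LENGTH = 31

def sanitize_sheet_name(name):
    """Replace invalid characters by '_' and cut the name to 31 characters."""
    cleaned = INVALID_SHEET_CHARS.sub("_", str(name))
    # assumption: truncation is one possible way to respect the length limit
    return cleaned[:MAX_SHEET_NAME_LENGTH]

print(sanitize_sheet_name("?name:with*special\\/[char]"))
# _name_with_special___char_
```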
allowed to pass a Group to read_excel/read_hdf as sheetname/key argument (closes issue 439).
>>> a, b, c = arr.a, arr.b, arr.c
>>> # For Excel >>> new_from_excel = zeros((a, b, c), dtype=int) >>> for label in a: ... new_from_excel[label] = read_excel('my_file.xlsx', label) >>> # But, to avoid loading the file in Excel repeatedly (which is very inefficient), >>> # this particular example should rather be written like this: >>> new_from_excel = zeros((a, b, c), dtype=int) >>> with open_excel('my_file.xlsx') as wb: ... for label in a: ... new_from_excel[label] = wb[label].load()
>>> # For HDF >>> new_from_hdf = zeros((a, b, c), dtype=int) >>> for label in a: ... new_from_hdf[label] = read_hdf('my_file.h5', label)
allowed setting the name of a Group using another Group or Axis (closes issue 341):
>>> arr = ndrange('axis=a,a0..a3,b,b0..b3,c,c0..c3')
>>> arr
axis  a  a0  a1  a2  a3  b  b0  b1  b2  b3   c  c0  c1  c2  c3
      0   1   2   3   4  5   6   7   8   9  10  11  12  13  14
>>> # matching('^.$') will select labels with only one character: 'a', 'b' and 'c'
>>> groups = tuple(arr.axis.startingwith(code) >> code for code in arr.axis.matching('^.$'))
>>> groups
(axis['a', 'a0', 'a1', 'a2', 'a3'] >> 'a',
 axis['b', 'b0', 'b1', 'b2', 'b3'] >> 'b',
 axis['c', 'c0', 'c1', 'c2', 'c3'] >> 'c')
>>> arr.sum(groups)
axis   a   b   c
      10  35  60
allowed to test if an array contains a label using the in operator (closes issue 343):
>>> arr = ndrange('age=0..99;sex=M,F') >>> 'M' in arr True >>> 'Male' in arr False >>> # this can be useful for example in an 'if' statement >>> if 102 not in arr: ... # with 'reindex', we extend 'age' axis to 102 ... arr = arr.reindex('age', Axis('age=0..102'), fill_value=0) >>> arr.info 103 x 2 age [103]: 0 1 2 ... 100 101 102 sex [2]: 'M' 'F'
allowed to create a group on an axis using labels of another axis (closes issue 362):
>>> year = Axis('year=2000..2017') >>> even_year = Axis(range(2000, 2017, 2), 'even_year') >>> group_even_year = year[even_year] >>> group_even_year year[2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016]
split_axes (formerly split_axis) now allows to split several axes at once (closes issue 366):
>>> combined = ndrange('a_b = a0_b0..a1_b1; c_d = c0_d0..c1_d1') >>> combined a_b\c_d c0_d0 c0_d1 c1_d0 c1_d1 a0_b0 0 1 2 3 a0_b1 4 5 6 7 a1_b0 8 9 10 11 a1_b1 12 13 14 15 >>> combined.split_axes(['a_b', 'c_d']) a b c\d d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15 >>> combined.split_axes({'a_b': ('A', 'B'), 'c_d': ('C', 'D')}) A B C\D d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15
argument axes of split_axes has become optional: defaults to all axes whose name contains the specified delimiter (closes issue 365):
>>> combined = ndrange('a_b = a0_b0..a1_b1; c_d = c0_d0..c1_d1') >>> combined a_b\c_d c0_d0 c0_d1 c1_d0 c1_d1 a0_b0 0 1 2 3 a0_b1 4 5 6 7 a1_b0 8 9 10 11 a1_b1 12 13 14 15 >>> combined.split_axes() a b c\d d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15
allowed to perform several axes combinations at once with the combine_axes() method (closes issue 382):
>>> arr = ndtest((2, 2, 2, 2)) >>> arr a b c\d d0 d1 a0 b0 c0 0 1 a0 b0 c1 2 3 a0 b1 c0 4 5 a0 b1 c1 6 7 a1 b0 c0 8 9 a1 b0 c1 10 11 a1 b1 c0 12 13 a1 b1 c1 14 15 >>> arr.combine_axes([('a', 'c'), ('b', 'd')]) a_c\b_d b0_d0 b0_d1 b1_d0 b1_d1 a0_c0 0 1 4 5 a0_c1 2 3 6 7 a1_c0 8 9 12 13 a1_c1 10 11 14 15 >>> # set output axes names by passing a dictionary >>> arr.combine_axes({('a', 'c'): 'ac', ('b', 'd'): 'bd'}) ac\bd b0_d0 b0_d1 b1_d0 b1_d1 a0_c0 0 1 4 5 a0_c1 2 3 6 7 a1_c0 8 9 12 13 a1_c1 10 11 14 15
allowed to use keyword arguments in set_labels (closes issue 383):
>>> a = ndrange('nat=BE,FO;sex=M,F') >>> a nat\sex M F BE 0 1 FO 2 3 >>> a.set_labels(sex='Men,Women', nat='Belgian,Foreigner') nat\sex Men Women Belgian 0 1 Foreigner 2 3
allowed passing an axis to set_labels as ‘labels’ argument (closes issue 408).
added data type (dtype) to array.info (closes issue 454):
>>> arr = ndtest((2, 2), dtype=float) >>> arr a\b b0 b1 a0 0.0 1.0 a1 2.0 3.0 >>> arr.info 2 x 2 a [2]: 'a0' 'a1' b [2]: 'b0' 'b1' dtype: float64
To create a 1D array using from_string() and the default separator ' ', a tabulation character \t (instead of - previously) must be added in front of the data line:
>>> from_string('''sex  M  F
...                \t   0  1''')
sex  M  F
     0  1
viewer window title also includes the dtype of the current displayed array (closes issue 85)
viewer window title uses only the file name instead of the entire file path as it made titles too long in some cases.
when editing .csv files, the viewer window title will be “directory\fname.csv - axes_info” instead of having the file name repeated as before (“dir\fname.csv - fname: axes_info”).
the viewer will not update digits/scientific notation nor colors when the filter changes, so that numbers are more easily comparable when quickly changing the filter, especially using the scrollwheel on filter boxes.
NaN values display as grey in the viewer so that they stand out more.
compare() will color values depending on relative difference instead of absolute difference as this is usually more useful.
compare(sessions) uses nan_equal to compare arrays so that identical arrays are not marked different when they contain NaN values.
changed compare() “stacked axis” names: arrays -> array and sessions -> session because that reads a bit more naturally.
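The NaN handling mentioned above for compare(sessions) matters because, with standard floating point rules, NaN is never equal to itself. The following is a minimal sketch in plain Python; `nan_aware_equal` is a hypothetical scalar helper written for illustration, not larray's implementation.

```python
import math


def nan_aware_equal(a, b):
    """Hypothetical helper: True if a == b, or if both values are NaN."""
    both_nan = (isinstance(a, float) and isinstance(b, float)
                and math.isnan(a) and math.isnan(b))
    return both_nan or a == b


left = [1.0, float('nan'), 3.0]
right = [1.0, float('nan'), 3.0]

# a plain element-wise comparison reports a difference because nan != nan
plain = all(x == y for x, y in zip(left, right))
# the NaN-aware comparison treats the two sequences as identical
nan_aware = all(nan_aware_equal(x, y) for x, y in zip(left, right))
print(plain, nan_aware)  # False True
```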
Fixes
fixed array creation with axis(es) given as string containing only one label (axis name and label were inverted).
fixed reading an array from a CSV or Excel file when the columns axis is not explicitly named (via \). For example, let’s say we want to read a CSV file ‘pop.csv’ with the following content (indented for clarity):
sex, 2015, 2016
  F,   11,   13
  M,   12,   10
The result of function read_csv is:
>>> pop = read_csv('pop.csv') >>> pop sex\{1} 2015 2016 F 11 13 M 12 10
Closes issue 372.
fixed converting a 1xN Pandas DataFrame to an array using aslarray (closes issue 427):
>>> df = pd.DataFrame([[1, 2, 3]], index=['a0'], columns=['b0', 'b1', 'b2']) >>> df b0 b1 b2 a0 1 2 3 >>> aslarray(df) {0}\{1} b0 b1 b2 a0 1 2 3
>>> # setting name to index and columns >>> df.index.name = 'a' >>> df.columns.name = 'b' >>> df b b0 b1 b2 a a0 1 2 3 >>> aslarray(df) a\b b0 b1 b2 a0 1 2 3
fixed original file being deleted when trying to overwrite a file via Session.save or open_excel failed (closes issue 441)
fixed loading arrays from Excel sheets containing blank cells below or right of the array to read (closes issue 443)
fixed unary and binary operations between sessions failing entirely when the operation failed/was invalid on any array. Now the result will be nan for that array but the operation will carry on for other arrays.
fixed stacking sessions failing entirely when the stacking failed on any array. Now the result will be nan for that array but the operation will carry on for other arrays.
fixed stacking arrays with anonymous axes.
fixed applying split_axes on an array with labels of type ‘Object’ (could happen when an array is read from a file).
fixed background color in the viewer when using filters in the compare() dialog (closes issue 66)
fixed autoresize of columns by double clicking between column headers (closes issue 43)
fixed representing a 0D array (scalar) in the viewer (closes issue 71)
fixed viewer not displaying an error message when saving or loading a file failed (closes issue 75)
fixed array.split_axis when the combined axis does not contain all the combination of labels resulting from the split (closes issue 369).
fixed array.split_axis when combined labels are not sorted by the first part then second part (closes issue 364).
fixed variable names when opening .csv files in the editor: variables are now named using only the filename without extension (instead of the full path of the file, which made them almost useless). Closes issue 90.
fixed deleting a variable (using the del key in the list) not marking the session/file as being modified.
fixed the link to the tutorial (Help->Online Tutorial) (closes issue 92).
fixed inplace modifications of arrays in the console (via array[xxx] = value) not updating the view (closes issue 94).
fixed background color in compare() being wrong after changing axes order by drag-and-dropping them (closes issue 89).
fixed the whole array/compare being the same color in the presence of -inf or +inf in the array.
Version 0.25.2
Released on 2017-09-06.
Miscellaneous improvements
Excel Workbooks opened with open_excel(visible=False) will use the global Excel instance by default and those using visible=True will use a new Excel instance by default (closes issue 405).
Fixes
fixed view() which did not show any array (closes issue 57).
fixed exceptions in the viewer crashing it when a Qt app was created (e.g. from a plot) before the viewer was started (closes issue 58).
fixed compare() arrays names not being determined correctly (closes issue 61).
fixed filters and title not being updated when displaying array created via the console (closes issue 55).
fixed array grid not being updated when selecting a variable when no variable was selected (closes issue 56).
fixed copying or plotting multiple rows in the editor when they were selected via drag and drop on headers (closes issue 59).
fixed digits not being automatically updated when changing filters.
Version 0.25.1
Released on 2017-09-04.
Miscellaneous improvements
Deprecated methods display a warning message when they are still used (replaced DeprecationWarning by FutureWarning). Closes issue 310.
updated documentation of method with_total (closes issue 89).
trying to set values of a subset by passing an array with incompatible axes displays a better error message (closes issue 268).
Fixes
fixed error raised in viewer when switching between arrays when a filter was set.
fixed displaying empty array when starting the viewer or a new session in it.
fixed Excel instance created via to_excel() and open_excel() without any filename being closed at the end of the Python program (closes issue 390).
fixed the view(), edit() and compare() functions not being available in the viewer console.
fixed row and column resizing by double clicking on the edge of an header cell.
fixed New and Open in the menu File of the viewer when IPython console is not available.
fixed getting a subset of an array by mixing boolean filters and other filters (closes issue 246):
>>> arr = ndrange('a=a0..a2;b=0..3') >>> arr a\b 0 1 2 3 a0 0 1 2 3 a1 4 5 6 7 a2 8 9 10 11 >>> arr['a0,a2', x.b < 2] a\b 0 1 a0 0 1 a2 8 9
Warning: when mixed with other filters, boolean filters are limited to one dimension.
fixed setting an array values using array.points[key] = value when value is an LArray (closes issue 368).
fixed using syntax ‘int..int’ in a selection (closes issue 350):
>>> arr = ndrange('a=2017..2012') >>> arr a 2017 2016 2015 2014 2013 2012 0 1 2 3 4 5 >>> arr['2012..2015'] a 2012 2013 2014 2015 5 4 3 2
fixed mixing ‘..’ sequences and spaces in an indexing string (closes issue 389):
>>> arr = ndtest(7) >>> arr a a0 a1 a2 a3 a4 a5 a6 0 1 2 3 4 5 6 >>> arr['a0, a2, a4..a6'] a a0 a2 a4 a5 a6 0 2 4 5 6
fixed indexing/aggregating using groups with renaming (using >>) when the axis has mixed type labels (object dtype).
Version 0.25
Released on 2017-08-22.
New features
viewer functions (view, edit and compare) have been moved to the separate larray-editor package, which needs to be installed separately, unless you are using larrayenv. Closes issue 332.
installing larray-editor (or larrayenv) from conda environment creates a new menu ‘LArray’ in the Windows start menu. It contains a link to open the documentation, a shortcut to launch the user interface in edition mode and a shortcut to update larrayenv. Closes issue 281.
added possibility to transpose an array in the viewer by dragging and dropping axes’ names in the filter bar.
implemented array.align(other_array) which makes two arrays compatible with each other (by making all common axes compatible). This is done by adding, removing or reordering labels for each common axis according to the join method used:
outer: will use a label if it is in either array's axis (ordered like the first array). This is the default as it results in no information loss.
inner: will use a label if it is in both arrays' axes (ordered like the first array)
left: will use the first array axis labels
right: will use the other array axis labels
The fill value for missing labels defaults to nan.
>>> arr1 = ndtest((2, 3))
>>> arr1
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
>>> arr2 = -ndtest((3, 2))
>>> # reorder array to make the test more interesting
>>> arr2 = arr2[['b1', 'b0']]
>>> arr2
a\b  b1  b0
 a0  -1   0
 a1  -3  -2
 a2  -5  -4
Align arr1 and arr2
>>> aligned1, aligned2 = arr1.align(arr2) >>> aligned1 a\b b0 b1 b2 a0 0.0 1.0 2.0 a1 3.0 4.0 5.0 a2 nan nan nan >>> aligned2 a\b b0 b1 b2 a0 0.0 -1.0 nan a1 -2.0 -3.0 nan a2 -4.0 -5.0 nan
After aligning all common axes, one can then do operations between the two arrays
>>> aligned1 + aligned2 a\b b0 b1 b2 a0 0.0 0.0 nan a1 1.0 1.0 nan a2 nan nan nan
The fill value for missing labels defaults to nan but can be changed to any compatible value.
>>> aligned1, aligned2 = arr1.align(arr2, fill_value=0) >>> aligned1 a\b b0 b1 b2 a0 0 1 2 a1 3 4 5 a2 0 0 0 >>> aligned2 a\b b0 b1 b2 a0 0 -1 0 a1 -2 -3 0 a2 -4 -5 0 >>> aligned1 + aligned2 a\b b0 b1 b2 a0 0 0 2 a1 1 1 5 a2 -4 -5 0
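The four join methods listed above can be summarised by what they do to the labels of one common axis. The sketch below works on plain lists of labels; `align_labels` is a hypothetical helper written for illustration and is not larray's implementation. It assumes that, for the outer join, labels found only in the other axis are appended after those of the first axis.

```python
def align_labels(first, other, join='outer'):
    """Hypothetical helper returning the labels of an aligned axis."""
    if join == 'outer':
        # labels of the first axis, then labels only present in the other one
        return list(first) + [label for label in other if label not in first]
    if join == 'inner':
        # labels present in both axes, ordered like the first axis
        return [label for label in first if label in other]
    if join == 'left':
        return list(first)
    if join == 'right':
        return list(other)
    raise ValueError(f"unknown join method: {join!r}")


# labels of the 'a' and 'b' axes of arr1 and arr2 in the example above
a1, a2 = ['a0', 'a1'], ['a0', 'a1', 'a2']
b1, b2 = ['b0', 'b1', 'b2'], ['b1', 'b0']

print(align_labels(a1, a2))           # ['a0', 'a1', 'a2']
print(align_labels(b1, b2))           # ['b0', 'b1', 'b2']
print(align_labels(b1, b2, 'inner'))  # ['b0', 'b1']
print(align_labels(b1, b2, 'right'))  # ['b1', 'b0']
```

With the default outer join, both aligned arrays end up with axes a0..a2 and b0..b2, which is why the missing cells are filled with nan (or with fill_value) in the output shown above.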
implemented Session.transpose(axes) to reorder axes of all arrays within a session, ignoring missing axes for each array. For example, let us first create a test session and a small helper function to display sessions as a short summary.
>>> arr1 = ndtest((2, 2, 2)) >>> arr2 = ndtest((2, 2)) >>> sess = Session([('arr1', arr1), ('arr2', arr2)]) >>> def print_summary(s): ... print(s.summary("{name} -> {axes_names}")) >>> print_summary(sess) arr1 -> a, b, c arr2 -> a, b
Put the ‘b’ axis in front of all arrays
>>> print_summary(sess.transpose('b')) arr1 -> b, a, c arr2 -> b, a
Axes missing on an array are ignored (‘c’ for arr2 in this case)
>>> print_summary(sess.transpose('c', 'b')) arr1 -> c, b, a arr2 -> b, a
Use ... to move axes to the end
>>> print_summary(sess.transpose(..., 'a')) arr1 -> b, c, a arr2 -> b, a
implemented unary operations on Session, which means one can negate all arrays in a Session or take the absolute value of all arrays in a Session without writing an explicit loop for that.
>>> arr1 = ndtest(2) >>> arr1 a a0 a1 0 1 >>> arr2 = ndtest(4) - 1 >>> arr2 a a0 a1 a2 a3 -1 0 1 2 >>> sess1 = Session([('arr1', arr1), ('arr2', arr2)]) >>> sess2 = -sess1 >>> sess2.arr1 a a0 a1 0 -1 >>> sess2.arr2 a a0 a1 a2 a3 1 0 -1 -2 >>> sess3 = abs(sess1) >>> sess3.arr2 a a0 a1 a2 a3 1 0 1 2
implemented stacking sessions using stack().
Let us first create two test sessions. For example suppose we have a session storing the results of a baseline simulation:
>>> arr1 = ndtest(2) >>> arr1 a a0 a1 0 1 >>> arr2 = ndtest(3) >>> arr2 a a0 a1 a2 0 1 2 >>> baseline = Session([('arr1', arr1), ('arr2', arr2)])
and another session with a variant
>>> arr1variant = arr1 * 2 >>> arr1variant a a0 a1 0 2 >>> arr2variant = 2 - arr2 / 2 >>> arr2variant a a0 a1 a2 2.0 1.5 1.0 >>> variant = Session([('arr1', arr1variant), ('arr2', arr2variant)])
then we stack them together
>>> stacked = stack([('baseline', baseline), ('variant', variant)], 'sessions') >>> stacked Session(arr1, arr2) >>> stacked.arr1 a\sessions baseline variant a0 0 0 a1 1 2 >>> stacked.arr2 a\sessions baseline variant a0 0.0 2.0 a1 1.0 1.5 a2 2.0 1.0
Combined with the fact that we can compute some very simple expressions on sessions, this can be extremely useful to quickly compare all arrays of several sessions (e.g. simulation variants):
>>> diff = variant - baseline >>> # compute the absolute difference and relative difference for each array of the sessions >>> stacked = stack([('baseline', baseline), ('variant', variant), ('diff', diff), ('abs diff', abs(diff)), ('rel diff', diff / baseline)], 'sessions') >>> stacked Session(arr1, arr2) >>> stacked.arr2 a\sessions baseline variant diff abs diff rel diff a0 0.0 2.0 2.0 2.0 inf a1 1.0 1.5 0.5 0.5 0.5 a2 2.0 1.0 -1.0 1.0 -0.5
implemented Axis.align(other_axis) and AxisCollection.align(other_collection) which makes two axes / axis collections compatible with each other, see LArray.align above.
implemented Session.apply(function) to apply a function to all elements (arrays) of a Session and return a new Session.
Let us first create a test session
>>> arr1 = ndtest(2) >>> arr1 a a0 a1 0 1 >>> arr2 = ndtest(3) >>> arr2 a a0 a1 a2 0 1 2 >>> sess1 = Session([('arr1', arr1), ('arr2', arr2)]) >>> sess1 Session(arr1, arr2)
Then define the function we want to apply to all arrays of our session
>>> def increment(element): ... return element + 1
Apply it
>>> sess2 = sess1.apply(increment) >>> sess2.arr1 a a0 a1 1 2 >>> sess2.arr2 a a0 a1 a2 1 2 3
implemented setting the value of multiple points using array.points[labels] = value
>>> arr = ndtest((3, 4)) >>> arr a\b b0 b1 b2 b3 a0 0 1 2 3 a1 4 5 6 7 a2 8 9 10 11
Now, suppose you want to retrieve several specific combinations of labels, for example (a0, b1), (a0, b3), (a1, b0) and (a2, b2). You could write a loop like this:
>>> values = [] >>> for a, b in [('a0', 'b1'), ('a0', 'b3'), ('a1', 'b0'), ('a2', 'b2')]: ... values.append(arr[a, b]) >>> values [1, 3, 4, 10]
but you could also (this already worked in previous versions) use array.points like:
>>> arr.points[['a0', 'a0', 'a1', 'a2'], ['b1', 'b3', 'b0', 'b2']] a,b a0,b1 a0,b3 a1,b0 a2,b2 1 3 4 10
which has the advantages of being both much faster and keeping more information. Now suppose you want to set the value of those points. You could write:
>>> for a, b in [('a0', 'b1'), ('a0', 'b3'), ('a1', 'b0'), ('a2', 'b2')]: ... arr[a, b] = 42 >>> arr a\b b0 b1 b2 b3 a0 0 42 2 42 a1 42 5 6 7 a2 8 9 42 11
but now you can also use the faster alternative:
>>> arr.points[['a0', 'a0', 'a1', 'a2'], ['b1', 'b3', 'b0', 'b2']] = 42
Miscellaneous improvements
added icon to display in Windows start menu and editor windows.
viewer keeps labels visible even when scrolling (label rows and columns are now frozen).
added ‘Getting Started’ section in documentation.
implemented axes argument to ipfp to specify on which axes the fitting procedure should be applied (closes issue 185). For example, let us assume you have a 3D array, such as:
>>> initial = ndrange('a=a0..a9;b=b0..b9;year=2000..2016')
and you want to apply a 2D fitting procedure for each value of the year axis. Previously, you had to loop on that year axis explicitly and call ipfp within the loop, like:
>>> result = zeros(initial.axes) >>> for year in initial.year: ... current = initial[year] ... # assume you have some targets for each year ... current_targets = [current.sum(x.a) + 1, current.sum(x.b) + 1] ... result[year] = ipfp(current_targets, current)
Now you can apply the procedure on all years at once, by specifying the axes on which the fitting procedure should be applied. This is a bit shorter to type, and it is also much faster.
>>> all_targets = [initial.sum(x.a) + 1, initial.sum(x.b) + 1] >>> result = ipfp(all_targets, initial, axes=(x.a, x.b))
made ipfp 10 to 20% faster (even without using the axes argument).
implemented Session.to_globals(inplace=True) which will update the content of existing arrays instead of creating new variables and overwriting them. This ensures the arrays in the session have the same axes as the existing variables.
added the ability to provide a pattern when loading several .csv files as a session. Among others, patterns can use * to match any number of characters and ? to match any single character.
>>> s = Session() >>> # load all .csv files starting with "output" in the data directory >>> s.load('data/output*.csv')
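To illustrate the wildcard semantics only (* for any number of characters, ? for a single character), here is a small sketch using Python's standard fnmatch module. Whether larray relies on this module internally is not stated above, and the file names are made up for the example.

```python
from fnmatch import fnmatchcase

# hypothetical directory content
filenames = ['output_2016.csv', 'output_2017.csv', 'input.csv', 'output.h5']

# '*' matches any number of characters (including none)
matching = [name for name in filenames if fnmatchcase(name, 'output*.csv')]
print(matching)  # ['output_2016.csv', 'output_2017.csv']

# '?' matches exactly one character
print(fnmatchcase('out1.csv', 'out?.csv'))   # True
print(fnmatchcase('out12.csv', 'out?.csv'))  # False
```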
stack can be used with keyword arguments when labels are “simple strings” (i.e. no integers, no punctuation, no string starting with integers, etc.). This is an attractive alternative but as it only works in the usual case and not in all cases, it is not recommended to use it except in the interactive console.
>>> arr1 = ones('nat=BE,FO')
>>> arr1
nat   BE   FO
     1.0  1.0
>>> arr2 = zeros('nat=BE,FO')
>>> arr2
nat   BE   FO
     0.0  0.0
>>> stack(M=arr1, F=arr2, axis='sex=M,F')
nat\sex    M    F
     BE  1.0  0.0
     FO  1.0  0.0
Without passing an explicit order for labels like above (or an axis object), it should only be used on Python 3.6 or later because keyword arguments are NOT ordered on earlier Python versions.
>>> # use this only on Python 3.6 and later
>>> stack(M=arr1, F=arr2, axis='sex')
nat\sex    M    F
     BE  1.0  0.0
     FO  1.0  0.0
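The ordering caveat above comes from how Python itself handles keyword arguments: since Python 3.6, **kwargs preserves the order in which the arguments were passed. A minimal sketch, where `labels_from_kwargs` is a hypothetical function used only to show this:

```python
def labels_from_kwargs(**kwargs):
    """Hypothetical helper: return keyword names in the order they were passed."""
    return list(kwargs)


print(labels_from_kwargs(M=1, F=0))  # ['M', 'F'] on Python 3.6 and later
print(labels_from_kwargs(F=0, M=1))  # ['F', 'M'] on Python 3.6 and later
```

On earlier Python versions the order of the returned names is not guaranteed, which is why passing an explicit axis definition such as 'sex=M,F' is the safer form.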
binary operations between sessions now ignore type errors. For example, if you are comparing two sessions with many arrays by computing the difference between them but a few arrays contain strings, the whole operation will not fail; the affected arrays will be assigned nan instead.
added optional argument ignore_exceptions to Session.load to ignore exceptions during load. This is mostly useful when trying to load many .csv files in a Session and some of them have an invalid format but you want to load the others.
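The behaviour described above for binary operations between sessions (a failing entry yields nan while the other entries are still computed) can be pictured with plain dictionaries. This is only a sketch under that assumption; `subtract_sessions` is a hypothetical stand-in and not larray's implementation.

```python
import math


def subtract_sessions(first, second):
    """Hypothetical stand-in for `first - second` on dict-based "sessions"."""
    result = {}
    for name, value in first.items():
        try:
            result[name] = value - second[name]
        except TypeError:
            # mimic the documented behaviour: failing entries become nan
            result[name] = math.nan
    return result


baseline = {'pop': 10.0, 'label': 'text'}
variant = {'pop': 12.5, 'label': 'other text'}

diff = subtract_sessions(variant, baseline)
print(diff['pop'])                # 2.5
print(math.isnan(diff['label']))  # True
```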
Fixes
fixed disambiguating an ambiguous key by adding the axis within the string, for example arr[‘axis_name[ambiguouslabel]’] (closes issue 331).
fixed converting a string group to integer or float using int() and float() (when that makes sense).
>>> a = Axis('a=10,20,30,total') >>> a Axis(['10', '20', '30', 'total'], 'a') >>> str(a.i[0]) '10' >>> int(a.i[0]) 10 >>> float(a.i[0]) 10.0
Version 0.24.1
Released on 2017-06-14.
Fixes
updated the tutorial to use version 0.24 syntax.
Version 0.24
Released on 2017-06-14.
New features
implemented Session.to_globals which creates global variables from variables stored in the session (closes issue 276). Note that this should usually only be used in an interactive console and not in a script. Code editors are confused by this kind of manipulation and will likely consider as invalid the code using variables created in this way. Additionally, when using this method auto-completion, “show definition”, “go to declaration” and other similar code editor features will probably not work for the variables created in this way and any variable derived from them.
>>> s = Session(arr1=ndtest(3), arr2=ndtest((2, 2))) >>> s.to_globals() >>> arr1 a a0 a1 a2 0 1 2 >>> arr2 a\b b0 b1 a0 0 1 a1 2 3
added new boolean argument ‘overwrite’ to Session.save, Session.to_hdf, Session.to_excel and Session.to_pickle methods (closes issue 293). If overwrite=True and the target file already existed, it is deleted and replaced by a new one. This is the new default behavior. If overwrite=False, an existing file is updated (like it was in previous larray versions):
>>> arr1, arr2, arr3 = ndtest((2, 2)), ndtest(4), ndtest((3, 2)) >>> s = Session([('arr1', arr1), ('arr2', arr2), ('arr3', arr3)])
>>> # save arr1, arr2 and arr3 in file output.h5 >>> s.save('output.h5')
>>> # replace arr1 and create arr4 + put them in a second session
>>> arr1, arr4 = ndtest((3, 3)), ndtest((2, 3))
>>> s2 = Session([('arr1', arr1), ('arr4', arr4)])
>>> # replace arr1 and add arr4 in file output.h5 >>> s2.save('output.h5', overwrite=False)
>>> # erase content of 'output.h5' and save only arrays contained in the second session >>> s2.save('output.h5')
Miscellaneous improvements
renamed create_sequential() to sequence() (closes issue 212).
improved auto-completion in ipython interactive consoles (e.g. the viewer console) for Axis, AxisCollection, Group and Workbook objects. These objects can now complete keys within [].
>>> gender = Axis('gender=Male,Female')
>>> gender
Axis(['Male', 'Female'], 'gender')
>>> gender['Fe<tab>  # will be completed to `gender['Female`
>>> arr = ndrange(gender) >>> arr.axes['gen<tab> # will be completed to `arr.axes['gender`
>>> wb = open_excel() >>> wb['Sh<tab> # will be completed to `wb['Sheet1`
added documentation for Session methods (closes issue 277).
allowed to provide explicit names for arrays or sessions in compare(). Closes issue 307.
Fixes
fixed title argument of ndtest creation function: title was not passed to the returned array.
fixed create_sequential when arguments initial and inc are array and scalar respectively (closes issue 288).
fixed auto-completion of attributes of LArray and Group objects (closes issue 302).
fixed name of arrays/sessions in compare() not being inferred correctly (closes issue 306).
fixed indexing Excel sheets by position to always yield the requested shape even when bounds are outside the range of used cells. Closes issue 273.
fixed the array() method on excel.Sheet returning float labels when int labels are expected.
fixed getting float data instead of int when converting an Excel Sheet or Range to an larray or numpy array.
fixed some warning messages to point to the correct line in user code.
fixed crash of Session.save method when it contained 0D arrays. They are now skipped when saving a session (closes issue 291).
fixed Session.save and Session.to_excel failing to create new Excel files (it only worked if the file already existed). Closes issue 313.
fixed Session.load(file, engine='pandas_excel'): axes were considered as anonymous.
Version 0.23
Released on 2017-05-30.
Miscellaneous improvements
changed display of arrays (closes issue 243):
>>> ndtest((2, 3))
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
instead of
>>> ndtest((2, 3))
a\b | b0 | b1 | b2
 a0 |  0 |  1 |  2
 a1 |  3 |  4 |  5
.. can now be used within keys (between []). Previously it could only be used to define new axes. As a reminder, it generates increasing values between the two bounds. It is slightly different from : which takes everything between the two bounds in the axis order.
>>> arr = ndrange('a=a1,a0,a2,a3') >>> arr a a1 a0 a2 a3 0 1 2 3 >>> arr['a1..a3'] a a1 a2 a3 0 2 3
This is different from : which takes everything between the two bounds, in axis order:
>>> arr['a1:a3'] a a1 a0 a2 a3 0 1 2 3
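The difference between the two notations can be reproduced on a plain list of labels. In this sketch, `slice_labels` and `generate_labels` are hypothetical helpers written for illustration: the first one selects existing labels by position, like :, while the second one generates labels from the bounds, like .. does.

```python
axis_labels = ['a1', 'a0', 'a2', 'a3']


def slice_labels(labels, start, stop):
    """Labels between two bounds (inclusive), in axis order, like 'a1:a3'."""
    first, last = labels.index(start), labels.index(stop)
    return labels[first:last + 1]


def generate_labels(prefix, start, stop):
    """Increasing labels between two bounds (inclusive), like 'a1..a3'."""
    return [f"{prefix}{number}" for number in range(start, stop + 1)]


print(slice_labels(axis_labels, 'a1', 'a3'))  # ['a1', 'a0', 'a2', 'a3']
print(generate_labels('a', 1, 3))             # ['a1', 'a2', 'a3']
```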
in both axes definitions and keys (within []) .. can now be mixed with , and other .. :
>>> arr = ndrange('code=A,C..E,G,X..Z') >>> arr code A C D E G X Y Z 0 1 2 3 4 5 6 7 >>> arr['A,Z..X,G'] code A Z Y X G 0 7 6 5 4
within .. extra zeros are only padded to numbers if zeros are present in the pattern.
>>> ndrange('code=A1..A12') code A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 0 1 2 3 4 5 6 7 8 9 10 11
>>> ndrange('code=A01..A12') code A01 A02 A03 A04 A05 A06 A07 A08 A09 A10 A11 A12 0 1 2 3 4 5 6 7 8 9 10 11
in previous larray versions, the two above definitions returned the second array.
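The padding rule can be sketched in a few lines of plain Python. `expand_range` is a hypothetical helper, not larray's parser, and it assumes labels made of an optional non-digit prefix followed by digits; zeros are padded only when the first bound itself starts with a zero.

```python
import re


def expand_range(start, stop):
    """Hypothetical helper expanding 'start..stop' for labels like A1 or A01."""
    prefix, first = re.fullmatch(r'(\D*)(\d+)', start).groups()
    _, last = re.fullmatch(r'(\D*)(\d+)', stop).groups()
    # pad with zeros only when the pattern itself contains a leading zero
    padded = len(first) > 1 and first.startswith('0')
    width = len(first) if padded else 0
    return [prefix + str(number).zfill(width)
            for number in range(int(first), int(last) + 1)]


print(expand_range('A8', 'A11'))   # ['A8', 'A9', 'A10', 'A11']
print(expand_range('A08', 'A11'))  # ['A08', 'A09', 'A10', 'A11']
```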
set sep argument of from_string function to ‘ ‘ by default (closes issue 271). For 1D array, a “-” must be added in front of the data line.
>>> from_string('''sex M F - 0 1''') sex M F 0 1 >>> from_string('''nat\\sex M F BE 0 1 FO 2 3''') nat\sex M F BE 0 1 FO 2 3
improved error message when trying to access nonexistent sheet in an Excel workbook (closes issue 266).
when creating an Axis from a Group and no explicit name was given, reuse the name of the group axis.
>>> a = Axis('a=a0..a2') >>> Axis(a[:'a1']) Axis(['a0', 'a1'], 'a')
allowed to create an array using a single group as if it was an Axis.
>>> a = Axis('a=a0..a2') >>> ndrange(a) a a0 a1 a2 0 1 2 >>> # using a group as an axis >>> ndrange(a[:'a1']) a a0 a1 0 1
allowed to use axes (Axis objects) to subset arrays (part of issue 210).
>>> arr = ndtest((2, 3)) >>> arr a\b b0 b1 b2 a0 0 1 2 a1 3 4 5 >>> b2 = Axis('b=b0,b2') >>> arr[b2] a\b b0 b2 a0 0 2 a1 3 5
improved string representation of Excel workbooks and sheets (they mention the actual file/sheet they correspond to). This is mostly useful in the interactive console to check what an object corresponds to.
>>> wb = open_excel() >>> wb <larray.io.excel.Workbook [Book1]> >>> wb[0] <larray.io.excel.Sheet [Book1]Sheet1>
Fixes
open_excel(‘non existent file’) will raise an explicit error immediately when overwrite_file is False, instead of failing at a seemingly random point later on (closes issue 265).
integer-like strings in axis definition strings using , are converted to integers to be consistent with string definitions using '..'. In other words, ndrange('a=1,2,3') previously did not create the same array as ndrange('a=1..3').
fixed reading a single cell from an Excel sheet.
fixed script execution not resuming after quitting the viewer when it was called using view(a_single_array).
fixed opening the viewer after showing a plot window.
do not display an error when setting the value of an element of a non LArray sequence in the viewer console:
>>> l = [1, 2, 3] >>> l[0] = 42
Version 0.22
Released on 2017-05-11.
New features
viewer: added a menu bar with the ability to clear the current session, save all its arrays to a file (.h5, .xlsx, or a directory containing multiple .csv files), and load arrays from such a file (closes issue 88).
WARNING: Only array objects are currently saved. It means that scalars, functions or other non-LArray objects defined in the console are not saved in the file.
implemented a new describe() method on arrays to give quick summary statistics. By default, it includes the number of non-NaN values, the mean, standard deviation, minimum, 25, 50 and 75 percentiles and maximum.
>>> arr = ndrange('gender=Male,Female;year=2014..2020').astype(float) >>> arr gender\year | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 Male | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 Female | 7.0 | 8.0 | 9.0 | 10.0 | 11.0 | 12.0 | 13.0 >>> arr.describe() statistic | count | mean | std | min | 25% | 50% | 75% | max | 14.0 | 6.5 | 4.031128874149275 | 0.0 | 3.25 | 6.5 | 9.75 | 13.0
an optional keyword argument allows to specify different percentiles to include
>>> arr.describe(percentiles=[20, 40, 60, 80]) statistic | count | mean | std | min | 20% | 40% | 60% | 80% | max | 14.0 | 6.5 | 4.031128874149275 | 0.0 | 2.6 | 5.2 | 7.8 | 10.4 | 13.0
its sister method, describe_by() was also implemented to give quick summary statistics along axes or groups.
>>> arr.describe_by('gender') gender\statistic | count | mean | std | min | 25% | 50% | 75% | max Male | 7.0 | 3.0 | 2.0 | 0.0 | 1.5 | 3.0 | 4.5 | 6.0 Female | 7.0 | 10.0 | 2.0 | 7.0 | 8.5 | 10.0 | 11.5 | 13.0 >>> arr.describe_by('gender', (x.year[:2015], x.year[2019:])) gender | year\statistic | count | mean | std | min | 25% | 50% | 75% | max Male | :2015 | 2.0 | 0.5 | 0.5 | 0.0 | 0.25 | 0.5 | 0.75 | 1.0 Male | 2019: | 2.0 | 5.5 | 0.5 | 5.0 | 5.25 | 5.5 | 5.75 | 6.0 Female | :2015 | 2.0 | 7.5 | 0.5 | 7.0 | 7.25 | 7.5 | 7.75 | 8.0 Female | 2019: | 2.0 | 12.5 | 0.5 | 12.0 | 12.25 | 12.5 | 12.75 | 13.0
This closes issue 184.
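The percentile values shown in the describe() output above are consistent with linear interpolation between the two closest ranks. The sketch below recomputes them in plain Python for the same 14 values (0.0 to 13.0); `percentile` is a hypothetical helper, and the assumption that describe() uses this interpolation rule is inferred from the displayed numbers, not stated in the text.

```python
import math


def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


data = [float(i) for i in range(14)]  # same values as the example array
quartiles = [percentile(data, q) for q in (25, 50, 75)]
print(quartiles)  # [3.25, 6.5, 9.75]
```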
implemented reindex allowing to change the order of labels and add/remove some of them to one or several axes:
>>> arr = ndtest((2, 2))
>>> arr
a\b | b0 | b1
 a0 |  0 |  1
 a1 |  2 |  3
>>> arr.reindex(x.b, ['b1', 'b2', 'b0'], fill_value=-1)
a\b | b1 | b2 | b0
 a0 |  1 | -1 |  0
 a1 |  3 | -1 |  2
>>> a = Axis(['a1', 'a2', 'a0'], 'a')
>>> b = Axis(['b2', 'b1', 'b0'], 'b')
>>> arr.reindex({'a': a, 'b': b}, fill_value=-1)
a\b | b2 | b1 | b0
 a1 | -1 |  3 |  2
 a2 | -1 | -1 | -1
 a0 | -1 |  1 |  0
using reindex one can make an array compatible with another array which has more/less labels or with labels in a different order:
>>> arr2 = ndtest((3, 3)) >>> arr2 a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5 a2 | 6 | 7 | 8 >>> arr.reindex(arr2.axes, fill_value=0) a\b | b0 | b1 | b2 a0 | 0 | 1 | 0 a1 | 2 | 3 | 0 a2 | 0 | 0 | 0 >>> arr.reindex(arr2.axes, fill_value=0) + arr2 a\b | b0 | b1 | b2 a0 | 0 | 2 | 2 a1 | 5 | 7 | 5 a2 | 6 | 7 | 8
This closes issue 18.
added load_example_data function to load datasets used in tutorial and be able to reproduce examples. The name of the dataset must be provided as argument (there is currently only one available dataset). Datasets are returned as Session objects:
>>> demo = load_example_data('demography') >>> demo.pop.info 26 x 3 x 121 x 2 x 2 time [26]: 1991 1992 1993 ... 2014 2015 2016 geo [3]: 'BruCap' 'Fla' 'Wal' age [121]: 0 1 2 ... 118 119 120 sex [2]: 'M' 'F' nat [2]: 'BE' 'FO' >>> demo.qx.info 26 x 3 x 121 x 2 x 2 time [26]: 1991 1992 1993 ... 2014 2015 2016 geo [3]: 'BruCap' 'Fla' 'Wal' age [121]: 0 1 2 ... 118 119 120 sex [2]: 'M' 'F' nat [2]: 'BE' 'FO'
(closes issue 170)
implemented Axis.union, intersection and difference which produce new axes by combining the labels of the axis with the other labels.
>>> letters = Axis('letters=a,b') >>> letters.union(Axis('letters=b,c')) Axis(['a', 'b', 'c'], 'letters') >>> letters.union(['b', 'c']) Axis(['a', 'b', 'c'], 'letters') >>> letters.intersection(['b', 'c']) Axis(['b'], 'letters') >>> letters.difference(['b', 'c']) Axis(['a'], 'letters')
implemented Group.union, intersection and difference which produce new groups by combining the labels of the group with the other labels.
>>> letters = Axis('letters=a..d') >>> letters['a', 'b'].union(letters['b', 'c']) letters['a', 'b', 'c'].set() >>> letters['a', 'b'].union(['b', 'c']) letters['a', 'b', 'c'].set() >>> letters['a', 'b'].intersection(['b', 'c']) letters['b'].set() >>> letters['a', 'b'].difference(['b', 'c']) letters['a'].set()
viewer: added possibility to delete an array by pressing Delete on keyboard (closes issue 116).
Excel sheets in workbooks opened via open_excel can be renamed by changing their .name attribute:
>>> wb = open_excel() >>> wb['old_sheet_name'].name = 'new_sheet_name'
Excel sheets in workbooks opened via open_excel can be deleted using “del”:
>>> wb = open_excel() >>> del wb['sheet_name']
implemented PGroup.set() to transform a positional group to an LSet.
>>> a = Axis('a=a0..a5') >>> a.i[:2].set() a['a0', 'a1'].set()
Miscellaneous improvements
inverted name and labels arguments when creating an Axis and made name argument optional (to create anonymous axes). Now, it is also possible to create an Axis by passing a single string of the kind ‘name=labels’:
>>> anonymous = Axis('0..100') >>> age = Axis('age=0..100') >>> gender = Axis('M,F', 'gender')
(closes issue 152)
renamed Session.dump, dump_hdf, dump_excel and dump_csv to save, to_hdf, to_excel and to_csv (closes issue 217).
changed default value of ddof argument for var and std functions from 0 to 1 (closes issue 190).
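To see what this change means numerically, here is a small sketch using only the Python standard library (it does not call larray). It assumes ddof has the usual "delta degrees of freedom" meaning: the sum of squared deviations is divided by N - ddof.

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0]

# ddof=0 (old default): divide by N, i.e. the population variance
var_ddof0 = statistics.pvariance(data)
# ddof=1 (new default): divide by N - 1, i.e. the sample variance
var_ddof1 = statistics.variance(data)

print(var_ddof0)  # 1.25
print(var_ddof1)  # 5 / 3, slightly larger than with ddof=0
```

Code which relied on the old default should pass ddof=0 explicitly to keep its previous results.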
implemented a new syntax for stack(): stack({label1: value1, label2: value2}, axis)
>>> nat = Axis('nat', 'BE, FO') >>> sex = Axis('sex', 'M, F') >>> males = ones(nat) >>> males nat | BE | FO | 1.0 | 1.0 >>> females = zeros(nat) >>> females nat | BE | FO | 0.0 | 0.0
In the case the axis has already been defined in a variable, this gives:
>>> stack({'M': males, 'F': females}, sex) nat\sex | M | F BE | 1.0 | 0.0 FO | 1.0 | 0.0
Additionally, axis can now be an axis string definition in addition to an Axis object, which means one can write this:
>>> stack({'M': males, 'F': females}, 'sex=M,F')
It is better than the simpler but highly discouraged alternative:
>>> stack([males, females], sex)
because it is all too easy to invert labels. It is very hard to spot the error in the following line, and larray cannot spot it for you either:
>>> stack([females, males], sex) nat\sex | M | F BE | 0.0 | 1.0 FO | 0.0 | 1.0
When creating an axis from scratch (it does not already exist in a variable), one might want to use this:
>>> stack([males, females], 'sex=M,F')
even if this could suffer, to a lesser extent, the same problem as above when stacking many arrays.
handle ... (Ellipsis) in the transpose method to avoid having to list all axes. This can be useful, for example, to change which axis is displayed in columns (closes issue 188).
>>> arr.transpose(..., 'time') >>> arr.transpose('gender', ..., 'time')
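The way ... stands for "all axes not mentioned explicitly" can be sketched in plain Python as follows. This is only an illustration of the idea, not larray's code: `resolve_axes_order` is a hypothetical helper and the axis names are made up for the example.

```python
def resolve_axes_order(all_axes, requested):
    """Expand a single ... in `requested` to the axes not listed explicitly."""
    requested = list(requested)
    if requested.count(Ellipsis) != 1:
        raise ValueError("this sketch expects exactly one ... in the request")
    explicit = [name for name in requested if name is not Ellipsis]
    # axes not mentioned explicitly keep their original relative order
    remaining = [name for name in all_axes if name not in explicit]
    pos = requested.index(Ellipsis)
    return requested[:pos] + remaining + requested[pos + 1:]


axes = ['time', 'geo', 'age', 'gender']
print(resolve_axes_order(axes, (..., 'time')))
# ['geo', 'age', 'gender', 'time']
print(resolve_axes_order(axes, ('gender', ..., 'time')))
# ['gender', 'geo', 'age', 'time']
```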
made scalar Groups behave even more like their value: any method available on the value is available on the Group. For example, if the Group has a string value, the string methods are available on it (closes issue 202).
>>> test = Axis('test', ['abc', 'a1-a2']) >>> test.i[0].upper() 'ABC' >>> test.i[1].split('-') ['a1', 'a2']
updated AxisCollection.replace so as to replace one, several or all axes and to accept axis definitions as new axes.
>>> arr = ndtest((2, 3)) >>> axes = arr.axes >>> axes AxisCollection([ Axis(['a0', 'a1'], 'a'), Axis(['b0', 'b1', 'b2'], 'b') ]) >>> row = Axis(['r0', 'r1'], 'row') >>> column = Axis(['c0', 'c1', 'c2'], 'column')
Replace several axes (keywords, list of tuple or dictionary)
>>> axes.replace(a=row, b=column) >>> # or >>> axes.replace(a="row=r0,r1", b="column=c0,c1,c2") >>> # or >>> axes.replace([(x.a, row), (x.b, column)]) >>> # or >>> axes.replace({x.a: row, x.b: column}) AxisCollection([ Axis(['r0', 'r1'], 'row'), Axis(['c0', 'c1', 'c2'], 'column') ])
added possibility to delete an array from a session:
>>> s = Session({'a': ndtest((3, 3)), 'b': ndtest((2, 4)), 'c': ndtest((4, 2))}) >>> s.names ['a', 'b', 'c'] >>> del s.b >>> del s['c'] >>> s.names ['a']
made create_sequential axis argument accept axis definitions in addition to Axis objects like, for example, using a string definition (closes issue 160).
>>> create_sequential('year=2016..2019') year | 2016 | 2017 | 2018 | 2019 | 0 | 1 | 2 | 3
replaced *args, **kwargs by explicit arguments in documentation of aggregation functions (sum, prod, mean, std, var, …). Closes issue 41.
improved documentation of plot method (closes issue 169).
improved auto-completion in ipython interactive consoles for both LArray and Session objects. LArray objects can now complete keys within [].
>>> a = ndrange('sex=Male,Female') >>> a sex | Male | Female | 0 | 1 >>> a['Fe<tab>
will autocomplete to a['Female. Sessions will now auto-complete both attributes (using session.) and keys (using session[).
>>> s = Session({'a_nice_test_array': ndtest(10)}) >>> s.a_<tab>
will autocomplete to s.a_nice_test_array, and s['a_<tab> will be completed to s['a_nice_test_array.
made warning messages for division by 0 and invalid values (usually caused by 0 / 0) point to the user code line, instead of the corresponding line in the larray module.
preserve order of arrays in a session when saving to/loading from an .xlsx file.
when creating a session from a directory containing CSV files, the directory may now contain other (non-CSV) files.
several calls to open_excel from within the same program/script will now reuse a single global Excel instance. This makes Excel I/O much faster without having to create an instance manually using xlwings.App, and still without risking interfering with other instances of Excel opened manually (closes issue 245).
improved error message when trying to copy a sheet from one instance of Excel to another (closes issue 231).
Fixes
fixed keyword arguments such as out, ddof, … for aggregation functions (closes issue 189).
fixed percentile(_by) with multiple percentiles values, i.e. when argument q is a list/tuple (closes issue 192).
fixed group aggregates on integer arrays for median, percentile, var and std (closes issue 193).
fixed group sum over boolean arrays (closes issue 194).
fixed set_labels when inplace=True.
fixed array creation functions not raising an exception when called with wrong syntax func(axis1, axis2, …) instead of func([axis1, axis2, …]) (closes issue 203).
fixed position of added sheets in excel workbook: new sheets are appended instead of prepended (closes issue 229).
fixed Workbook behavior in case of new workbook: the first added sheet replaces the default sheet Sheet1 (closes issue 230).
fixed name of Workbook sheets created by copying another sheet (closes issue 244).
>>> wb = open_excel() >>> wb['name_of_new_sheet'] = wb['name_of_sheet_to_copy']
fixed with_axes warning to refer to set_axes instead of replace_axes.
fixed displayed title in viewer: shows path to file associated with current session + current array info + extra info (closes issue 181)
Version 0.21
Released on 2017-03-28.
New features
implemented set_axes() method to replace one, several or all axes of an array (closes issue 67). The method with_axes() is now deprecated (set_axes() must be used instead).
>>> arr = ndtest((2, 3)) >>> arr a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5 >>> row = Axis('row', ['r0', 'r1']) >>> column = Axis('column', ['c0', 'c1', 'c2'])
Replace one axis (second argument new_axis must be provided)
>>> arr.set_axes(x.a, row) row\b | b0 | b1 | b2 r0 | 0 | 1 | 2 r1 | 3 | 4 | 5
Replace several axes (keywords, list of tuple or dictionary)
>>> arr.set_axes(a=row, b=column) >>> # or >>> arr.set_axes([(x.a, row), (x.b, column)]) >>> # or >>> arr.set_axes({x.a: row, x.b: column}) row\column | c0 | c1 | c2 r0 | 0 | 1 | 2 r1 | 3 | 4 | 5
Replace all axes (list of axes or AxisCollection)
>>> arr.set_axes([row, column]) row\column | c0 | c1 | c2 r0 | 0 | 1 | 2 r1 | 3 | 4 | 5 >>> arr2 = ndrange([row, column]) >>> arr.set_axes(arr2.axes) row\column | c0 | c1 | c2 r0 | 0 | 1 | 2 r1 | 3 | 4 | 5
implemented Axis.replace to replace some labels from an axis:
>>> sex = Axis('sex', ['M', 'F']) >>> sex Axis('sex', ['M', 'F']) >>> sex.replace('M', 'Male') Axis('sex', ['Male', 'F']) >>> sex.replace({'M': 'Male', 'F': 'Female'}) Axis('sex', ['Male', 'Female'])
implemented from_string() method to create an array from a string (closes issue 96).
>>> from_string('''age,nat\\sex, M, F ... 0, BE, 0, 1 ... 0, FO, 2, 3 ... 1, BE, 4, 5 ... 1, FO, 6, 7''') age | nat\sex | M | F 0 | BE | 0 | 1 0 | FO | 2 | 3 1 | BE | 4 | 5 1 | FO | 6 | 7
allowed to use a regular expression in split_axis method (closes issue 106):
>>> combined = ndrange('a_b = a0b0..a1b2') >>> combined a_b | a0b0 | a0b1 | a0b2 | a1b0 | a1b1 | a1b2 | 0 | 1 | 2 | 3 | 4 | 5 >>> combined.split_axis(x.a_b, regex=r'(\w{2})(\w{2})') a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5
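To make the role of the regular expression more concrete, the following standard-library sketch applies the same pattern to the combined labels: each capture group provides the label for one of the new axes. This only illustrates the idea and is not how split_axis is implemented.

```python
import re

labels = ['a0b0', 'a0b1', 'a0b2', 'a1b0', 'a1b1', 'a1b2']
pattern = re.compile(r'(\w{2})(\w{2})')

# one tuple of (first axis label, second axis label) per combined label
split_labels = [pattern.fullmatch(label).groups() for label in labels]

# unique labels for each new axis, in order of first appearance
a_labels = list(dict.fromkeys(a for a, _ in split_labels))
b_labels = list(dict.fromkeys(b for _, b in split_labels))

print(split_labels[0])  # ('a0', 'b0')
print(a_labels)         # ['a0', 'a1']
print(b_labels)         # ['b0', 'b1', 'b2']
```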
one can assign a new axis to several groups at the same time by using axis[groups]:
>>> group1 = year[2001:2004] >>> group2 = year[2008,2009] >>> # let us change the year axis by time >>> x.time[group1, group2] (x.time[2001:2004], x.time[2008, 2009])
implemented Axis.by() which is equivalent to axis[:].by() and divides the axis into several groups of specified length:
>>> year = Axis('year', '2010..2016') >>> year.by(3) (year.i[0:3], year.i[3:6], year.i[6:7])
which is equivalent to (year[2010:2012], year[2013:2015], year[2016]). As for groups, the optional second argument specifies the step between groups:
>>> year.by(3, step=4) (year.i[0:3], year.i[4:7])
which is equivalent to (year[2010:2012], year[2014:2016]). And if step is smaller than length, we get overlapping groups, which can be useful for example for moving averages.
>>> year.by(3, 2) (year.i[0:3], year.i[2:5], year.i[4:7], year.i[6:7])
which is equivalent to (year[2010:2012], year[2012:2014], year[2014:2016], year[2016])
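The positional slices shown in these examples can be reproduced with a few lines of plain Python. This is a sketch of the apparent rule (one group starting every `step` positions, truncated at the end of the axis), inferred from the outputs above; the function below is hypothetical and works on positions only.

```python
def by(n_labels, length, step=None):
    """Return (start, stop) position pairs for groups of `length` labels."""
    if step is None:
        # without an explicit step, groups are adjacent and do not overlap
        step = length
    return [(start, min(start + length, n_labels))
            for start in range(0, n_labels, step)]


# an axis with 7 labels, like year=2010..2016
print(by(7, 3))          # [(0, 3), (3, 6), (6, 7)]
print(by(7, 3, step=4))  # [(0, 3), (4, 7)]
print(by(7, 3, step=2))  # [(0, 3), (2, 5), (4, 7), (6, 7)]
```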
implemented larray_nan_equal to test whether two arrays are identical even in the presence of nan values. Two arrays are considered identical by larray_equal if they have exactly the same axes and data. However, since a nan value has the odd property of not being equal to itself, larray_equal returns False if either array contains a nan value. larray_nan_equal returns True if all non-nan data is equal and both arrays have nans at the same positions.
>>> arr1 = ndtest((2, 3), dtype=float) >>> arr1['a1', 'b1'] = nan >>> arr1 a\b | b0 | b1 | b2 a0 | 0.0 | 1.0 | 2.0 a1 | 3.0 | nan | 5.0 >>> arr2 = arr1.copy() >>> arr2 a\b | b0 | b1 | b2 a0 | 0.0 | 1.0 | 2.0 a1 | 3.0 | nan | 5.0 >>> larray_equal(arr1, arr2) False >>> larray_nan_equal(arr1, arr2) True >>> arr2['b1'] = 0.0 >>> larray_nan_equal(arr1, arr2) False
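The underlying issue can be reproduced without larray. The sketch below compares two flat lists of floats, first element by element with ==, then with a nan-aware test; `nan_equal` is a hypothetical helper which ignores axes and only looks at the data.

```python
import math


def nan_equal(a, b):
    """True if both sequences hold equal values, with nans at the same positions."""
    if len(a) != len(b):
        return False
    return all(x == y or (math.isnan(x) and math.isnan(y))
               for x, y in zip(a, b))


nan = float('nan')
arr1 = [0.0, 1.0, 2.0, 3.0, nan, 5.0]
arr2 = list(arr1)

# element-wise == fails because nan != nan
print(all(x == y for x, y in zip(arr1, arr2)))  # False
print(nan_equal(arr1, arr2))                    # True

arr2[4] = 0.0  # replace the nan by a normal value
print(nan_equal(arr1, arr2))                    # False
```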
Miscellaneous improvements
viewer: make keyboard shortcuts work even when the focus is not on the array editor widget. It means that, for example, plotting an array (via Ctrl-P) or opening it in Excel (Ctrl-E) can be done directly even when interacting with the list of arrays or within the interactive console (closes issue 102).
viewer: automatically display plots done in the viewer console in a separate window (see example below), unless “%matplotlib inline” is used.
>>> arr = ndtest((3, 3)) >>> arr.plot()
viewer: when calling view(an_array) from within the viewer, the new window opened does not block the initial window, which means you can have several windows open at the same time. view() without argument can still result in odd behavior though.
improved LArray.set_labels to make it possible to replace only some labels of an axis, instead of all of them and to replace labels from several axes at the same time.
>>> a = ndrange('nat=BE,FO;sex=M,F') >>> a nat\sex | M | F BE | 0 | 1 FO | 2 | 3
to replace only some labels, one must give a mapping giving the new label for each label to replace
>>> a.set_labels(x.sex, {'M': 'Men'}) nat\sex | Men | F BE | 0 | 1 FO | 2 | 3
to replace labels for several axes at the same time, one should give a mapping giving the new labels for each changed axis
>>> a.set_labels({'sex': 'Men,Women', 'nat': 'Belgian,Foreigner'}) nat\sex | Men | Women Belgian | 0 | 1 Foreigner | 2 | 3
one can also replace some labels in several axes by giving a mapping of mappings
>>> a.set_labels({'sex': {'M': 'Men'}, 'nat': {'BE': 'Belgian'}}) nat\sex | Men | F Belgian | 0 | 1 FO | 2 | 3
allowed matrix multiplication (@ operator) between arrays with dimension != 2 (closes issue 122).
improved LArray.plot to get nicer plots by default. The axes are transposed compared to what they used to be, because the last axis is often used for time series. Also, it now treats a 1D array as a single series, not N series of 1 point.
added installation instructions (closes issue 101).
Axis.group and Axis.all are now deprecated (closes issue 148).
>>> city.group(['London', 'Brussels'], name='capitals') # should be written as: >>> city[['London', 'Brussels']] >> 'capitals'
and
>>> city.all() # should be written as: >>> city[:] >> 'all'
Fixes
viewer: allow changing the number of displayed digits even for integer arrays as that makes sense when using scientific notation (closes issue 100).
viewer: fixed opening a viewer via view() edit() or compare() from within the viewer (closes issue 109)
viewer: fixed compare() colors when arrays have values which are very close but not exactly equal (closes issue 123)
viewer: fixed legend when plotting arbitrary rows (it always displayed the labels of the first rows) (closes issue 136).
viewer: fixed labels on the x axis when zooming on a plot (closes issue 143)
viewer: fixed storing an array in a variable with a name which existed previously but which was not displayable in the viewer, such as the name of any function or special object. In some cases, this error led to a crash of the viewer. For example, this code failed when run in the viewer console, because x is already defined (for the x. syntax):
>>> x = ndtest(3)
fixed indexing an array using a positional group with a position which corresponds to a label on that axis. This used to return the wrong data (the data corresponding to the position as if it was the key).
>>> a = Axis('a', '1..3') >>> arr = ndrange(a) >>> arr a | 1 | 2 | 3 | 0 | 1 | 2 >>> # this used to return 0 ! >>> arr[a.i[1]] 1
fixed == for positional groups (closes issue 93)
>>> years = Axis('years', '1995..1997') >>> years Axis('years', [1995, 1996, 1997]) >>> # this used to return False >>> years.i[0] == 1995 True
fixed using positional groups for their value in many cases (slice bounds, within list of values, within other groups, etc.). For example, this used to fail:
>>> arr = ndtest((2, 4)) >>> arr a\b | b0 | b1 | b2 | b3 a0 | 0 | 1 | 2 | 3 a1 | 4 | 5 | 6 | 7 >>> b = arr.b >>> start = b.i[0] # equivalent to start = 'b0' >>> stop = b.i[2] # equivalent to stop = 'b2' >>> arr[start:stop] a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 4 | 5 | 6 >>> arr[[b.i[0], b.i[2]]] a\b | b0 | b2 a0 | 0 | 2 a1 | 4 | 6
fixed posargsort labels (closes issue 137).
fixed labels when doing group aggregates using positional groups. Previously, it used the positions as labels. This was most visible when using the Group.by() method (which creates positional groups).
>>> years = Axis('years', '2010..2015') >>> arr = ndrange(years) >>> arr years | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 0 | 1 | 2 | 3 | 4 | 5 >>> arr.sum(years.by(3)) years | 2010:2012 | 2013:2015 | 3 | 12
While this used to return:
>>> arr.sum(years.by(3)) years | 0:3 | 3:6 | 3 | 12
fixed Group.by() when the group was a slice with either bound unspecified. For example, years[2010:2015].by(3) worked but years[:].by(3), years[2010:].by(3) and years[:2015].by(3) did not.
fixed a speed regression in version 0.18 and later versions compared to 0.17. In some cases, it was up to 40% slower than it should have been (closes issue 165).
Version 0.20
Released on 2017-02-09.
IMPORTANT
To make sure all users have all optional dependencies installed and use the same version of packages, and to simplify the update process, we created a new “larrayenv” package which will install larray itself AND all its dependencies (including the optional ones). This means that this version needs to be installed using:
conda install larrayenv
In the future, to update from one version to the next, it should always be enough to do:
conda update larrayenv
New features
implemented from_lists() to create constant arrays (instead of using LArray directly as that is very error prone). We are not really happy with its name though, so it might change in the future. Any suggestion of a better name is very welcome (closes issue 30).
>>> from_lists([['sex\\year', 1991, 1992, 1993], ... [ 'M', 0, 1, 2], ... [ 'F', 3, 4, 5]]) sex\year | 1991 | 1992 | 1993 M | 0 | 1 | 2 F | 3 | 4 | 5
added support for loading sparse arrays via open_excel().
For example, assuming you have a sheet like this:
age | sex\year | 2015 | 2016 10 | F | 0.0 | 1.0 10 | M | 2.0 | 3.0 20 | M | 4.0 | 5.0
loading it will yield:
>>> wb = open_excel('test_sparse.xlsx') >>> arr = wb['Sheet1'].load() >>> arr age | sex\year | 2015 | 2016 10 | F | 0.0 | 1.0 10 | M | 2.0 | 3.0 20 | F | nan | nan 20 | M | 4.0 | 5.0
Miscellaneous improvements
allowed to get an axis from an array by using array.axis_name in addition to array.axes.axis_name:
>>> arr = ndtest((2, 3)) >>> arr.axes AxisCollection([ Axis('a', ['a0', 'a1']), Axis('b', ['b0', 'b1', 'b2']) ]) >>> arr.a Axis('a', ['a0', 'a1'])
viewer: several rows/columns can be plotted together. It draws a separate line for each row except if only one column has been selected.
viewer: the array labels are used as “ticks” in plots.
‘_by’ aggregation methods accept groups in addition to axes (closes issue 59). It will keep only the mentioned groups and aggregate all other dimensions:
>>> arr = ndtest((2, 3, 4)) >>> arr a | b\c | c0 | c1 | c2 | c3 a0 | b0 | 0 | 1 | 2 | 3 a0 | b1 | 4 | 5 | 6 | 7 a0 | b2 | 8 | 9 | 10 | 11 a1 | b0 | 12 | 13 | 14 | 15 a1 | b1 | 16 | 17 | 18 | 19 a1 | b2 | 20 | 21 | 22 | 23
>>> arr.sum_by('c0,c1;c1:c3') c | c0,c1 | c1:c3 | 126 | 216
viewer: view() and edit() now accept as argument a path to a file containing arrays.
>>> view('myfile.h5')
this is a shortcut for:
>>> view(Session('myfile.h5'))
AxisCollection.without now accepts a single integer position (to exclude an axis by position).
>>> a = ndtest((2, 3)) >>> a.axes AxisCollection([ Axis('a', ['a0', 'a1']), Axis('b', ['b0', 'b1', 'b2']) ]) >>> a.axes.without(0) AxisCollection([ Axis('b', ['b0', 'b1', 'b2']) ])
nicer display (repr) for LSet (closes issue 44).
>>> x.b['b0,b2'].set() x.b['b0', 'b2'].set()
implemented sep argument for LArray & AxisCollection.combine_axes() to allow using a custom delimiter (closes issue 53).
added a check that ipfp target sums have the expected axes (closes issue 42).
when the nb_index argument is not provided explicitly in read_excel(engine='xlrd'), it is autodetected from the position of the first "\" (closes issue 66).
allow any special character except “.” and whitespace when creating axes labels using “..” syntax (previously only _ was allowed).
added many more I/O tests to hopefully lower our regression rate in the future (closes issue 70).
Fixes
viewer: selection of entire rows/columns will load any remaining data, if any (closes issue 37). Previously if you selected entire rows or columns of a large dataset (which is not loaded entirely from the start), it only selected (and thus copied/plotted) the part of the data which was already loaded.
viewer: filtering on anonymous axes is now possible (closes issue 33).
fixed loading sparse files using read_excel() (fixes issue 29).
fixed nb_index argument for read_excel().
fixed creating range axes with a negative start bound using string notation (e.g. Axis('name', '-1..10')) (fixes issue 51).
fixed ptp() function.
fixed with_axes() to copy the title of the array.
fixed Group >> 'name'.
fixed workbook[sheet_position] when using open_excel().
fixed plotting in the viewer when using Qt4.
Version 0.19
Released on 2017-01-19.
New features
Implemented a “by” variant to all aggregate methods (e.g. sum_by, mean_by, etc.). These methods aggregate all axes except those listed, which means the only axes remaining after the aggregate operation will be those listed. For example: arr.sum_by(x.a) is equivalent to arr.sum(arr.axes - x.a)
>>> arr = ndtest((2, 3, 4)) >>> arr a | b\c | c0 | c1 | c2 | c3 a0 | b0 | 0 | 1 | 2 | 3 a0 | b1 | 4 | 5 | 6 | 7 a0 | b2 | 8 | 9 | 10 | 11 a1 | b0 | 12 | 13 | 14 | 15 a1 | b1 | 16 | 17 | 18 | 19 a1 | b2 | 20 | 21 | 22 | 23 >>> arr.sum_by(x.b) b | b0 | b1 | b2 | 60 | 92 | 124
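For readers who want to check the numbers by hand, the same aggregate can be computed on nested lists in plain Python. This sketch only mirrors the arithmetic of the example above (keep axis b, sum over a and c) and does not use larray.

```python
# 2 x 3 x 4 data for axes (a, b, c), filled with 0..23 like ndtest((2, 3, 4))
data = [[[(i * 3 + j) * 4 + k for k in range(4)]
         for j in range(3)]
        for i in range(2)]

# keep axis b (the middle one) and sum over every other axis
sum_by_b = [sum(data[i][j][k] for i in range(2) for k in range(4))
            for j in range(3)]

print(sum_by_b)  # [60, 92, 124]
```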
Added .extend() method to Axis class
>>> a = Axis('a', 'a0..a2') >>> a Axis('a', ['a0', 'a1', 'a2']) >>> other = Axis('other', 'a3..a5') >>> a.extend(other) Axis('a', ['a0', 'a1', 'a2', 'a3', 'a4', 'a5'])
or directly specify the extra labels as a list or as a “label string”:
>>> a.extend('a3..a5') Axis('a', ['a0', 'a1', 'a2', 'a3', 'a4', 'a5'])
Added title argument to all array creation functions (ndrange, zeros, ones, …) and display it in the .info of array objects.
>>> a = ndrange(3, title='a simple test array') >>> a.info a simple test array 3 {0}* [3]: 0 1 2
implemented creating an Axis using a group:
>>> arr = ndtest((2, 3)) >>> arr a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5 >>> a, b = arr.axes >>> zeros((a, b[:'b1'])) a\b | b0 | b1 a0 | 0.0 | 0.0 a1 | 0.0 | 0.0
made Axis.startswith, .endswith and .matches accept Group instances
>>> a = Axis('a', 'a0..b2') >>> a Axis('a', ['a0', 'a1', 'a2', 'b0', 'b1', 'b2'])
>>> prefix = Axis('prefix', 'a,b') >>> a.startswith(prefix['a']) a['a0', 'a1', 'a2'] >>> a.startswith(prefix.i[1]) a['b0', 'b1', 'b2']
implemented all usual binary operations (+, -, *, /, …) on Group
>>> year = Axis('year', '2011..2016') >>> year[2013] + 1 2014 >>> year.i[2] + 1 2014
made the viewer much more useful as a debugger in the middle of a function by generalizing SessionEditor to handle any mapping instead of only Session objects, while still listing and displaying only array objects. To view the value of a non-array variable, one should type its name in the console. Given those changes, view() will superficially behave as before but, behind the scenes, all variables which were defined in the scope where view() was called will be available in the viewer console, even though they will not appear in the list on the left. This means that the viewer console will be able to use scalars defined at that point and call other functions of your code. In other words, there is a better chance that you can execute some code from the function calling view() by simply copy-pasting the code line.
Backward incompatible changes
LGroup lost set-like operations (intersection and union) in favor of a specific subclass (LSet). In other words, this no longer works:
>>> letters = Axis('letters', 'a..z') >>> letters[':c'] & letters['b:']
To make it work, we need to convert the LGroup(s) to LSets explicitly:
>>> letters[':c'].set() & letters['b:d'].set() letters.set[OrderedSet(['b', 'c'])]
>>> letters[':c'].set() | letters['b:d'].set() letters.set[OrderedSet(['a', 'b', 'c', 'd'])]
>>> letters[':c'].set() - 'b' letters.set[OrderedSet(['a', 'c'])]
group aggregates produce simple string labels for the new aggregated axis instead of using the group themselves as labels. This means one can no longer know where a group comes from but this simplifies the code and fixes a few issues, most notably export of aggregated arrays to Excel, and some operations between two aggregated arrays.
>>> arr = ndtest((3, 4)) >>> arr a\b | b0 | b1 | b2 | b3 a0 | 0 | 1 | 2 | 3 a1 | 4 | 5 | 6 | 7 a2 | 8 | 9 | 10 | 11 >>> agg = arr.sum(':b2 >> tob2;b2,b3 >> other') >>> agg a\b | tob2 | other a0 | 3 | 5 a1 | 15 | 13 a2 | 27 | 21 >>> agg.info 3 x 2 a [3]: 'a0' 'a1' 'a2' b [2]: 'tob2' 'other' >>> agg.axes.b.labels[0] 'tob2'
In previous versions this would have returned:
>>> agg.axes.b.labels[0] LGroup(':b2', name='tob2', axis=Axis('b', ['b0', 'b1', 'b2', 'b3']))
a string containing only a single “integer-like” is no longer transformed to an integer e.g. “10” will evaluate to (the string) “10” (like in version 0.17 and earlier) while “10,20” will evaluate to the list of integers: [10, 20]
Other changes
changed how Group instances are displayed.
>>> a = Axis('a', 'a0..a2') >>> a['a1,a2'] a['a1', 'a2']
Fixes
fixed > and >= on Group using slices
avoid a division by 0 warning when using divnot0
viewer: fixed plots when Qt5 is installed. This also removes the matplotlib warning people got when running the viewer with Qt5 installed.
viewer: display array when typing its name in the console even when no array was selected previously
Misc
misc code cleanup, improved docstrings, …
Version 0.18
Released on 2016-12-20.
Major improvements
the documentation (docstrings) of many functions was vastly improved (thanks to Alix)
implemented a new optional syntax to generate sequences of labels for axes by using patterns
integer strings generate integers
>>> ndrange('age=0..10') age | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
you can combine letters and numbers. The number part is treated like increasing (or decreasing) numbers:
>>> ndrange('lipro=P01..P12') lipro | P01 | P02 | P03 | P04 | P05 | P06 | P07 | P08 | P09 | P10 | P11 | P12 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
letter patterns generate all combinations of letters between the start and end:
>>> ndrange('test=AA..CC') test | AA | AB | AC | BA | BB | BC | CA | CB | CC | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
other characters are left intact (and should be the same in the start and end patterns):
>>> ndrange('test=A_1..C_2') test | A_1 | A_2 | B_1 | B_2 | C_1 | C_2 | 0 | 1 | 2 | 3 | 4 | 5
this also works within Axis()
>>> Axis('age', '0..10') Axis('age', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
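The pattern rules listed above can be approximated with a short standard-library sketch. It is not larray's parser: `expand_label_range` is a hypothetical function written only to reproduce the examples of this section, and it handles increasing ranges only.

```python
import itertools
import re


def expand_label_range(pattern):
    """Expand a 'start..stop' pattern into a list of labels."""
    start, stop = pattern.split('..')
    # tokens are runs of digits, single letters, or any other single character
    start_tokens = re.findall(r'\d+|[A-Za-z]|.', start)
    stop_tokens = re.findall(r'\d+|[A-Za-z]|.', stop)
    if len(start_tokens) != len(stop_tokens):
        raise ValueError("start and stop patterns must have the same structure")
    choices = []
    for lo, hi in zip(start_tokens, stop_tokens):
        if lo.isdigit() and hi.isdigit():
            # numeric part: consecutive numbers, zero-padded like the start
            width = len(lo)
            choices.append([str(n).zfill(width)
                            for n in range(int(lo), int(hi) + 1)])
        elif lo.isalpha() and hi.isalpha():
            # letter part: every letter between the two bounds
            choices.append([chr(c) for c in range(ord(lo), ord(hi) + 1)])
        elif lo == hi:
            # other characters are kept as they are
            choices.append([lo])
        else:
            raise ValueError("incompatible parts: %r and %r" % (lo, hi))
    labels = [''.join(parts) for parts in itertools.product(*choices)]
    if len(choices) == 1 and start.isdigit():
        # purely numeric patterns produce integers rather than strings
        return [int(label) for label in labels]
    return labels


print(expand_label_range('0..10'))
print(expand_label_range('P01..P12'))
print(expand_label_range('AA..CC'))
print(expand_label_range('A_1..C_2'))
```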
implemented new syntax for defining groups using strings:
>>> arr = ndtest((3, 4)) >>> arr a\b | b0 | b1 | b2 | b3 a0 | 0 | 1 | 2 | 3 a1 | 4 | 5 | 6 | 7 a2 | 8 | 9 | 10 | 11
groups can now be named using “>>” (previously “=” was used)
>>> arr.sum('b1,b3 >> b13;b0:b2 >> b012') a\b | b13 | b012 a0 | 4 | 3 a1 | 12 | 15 a2 | 20 | 27
if some labels are ambiguous, one can specify the axis by using “axis_name[labels]”:
>>> arr.sum('b[b1,b3] >> b13;b[b0:b2] >> b012') a\b | b13 | b012 a0 | 4 | 3 a1 | 12 | 15 a2 | 20 | 27
groups can also be defined by position using this syntax:
>>> arr.sum('b.i[1,3] >> b13;b.i[0:3] >> b012') a\b | b13 | b012 a0 | 4 | 3 a1 | 12 | 15 a2 | 20 | 27
A few notes:
the goal was to have that syntax as close to the “normal” syntax as possible (just remove the “x.” and all inner quotes).
in models, the normal syntax should be preferred, so that the groups can be stored in a variable and reused in several places
strings representing integers are evaluated as integers.
there is experimental support for evaluating expressions within string groups by using “{expr}”, but this is fragile and might be removed in the future.
implemented combine_axes & split_axis on arrays:
>>> arr = ndtest((2, 3, 4)) >>> arr a | b\c | c0 | c1 | c2 | c3 a0 | b0 | 0 | 1 | 2 | 3 a0 | b1 | 4 | 5 | 6 | 7 a0 | b2 | 8 | 9 | 10 | 11 a1 | b0 | 12 | 13 | 14 | 15 a1 | b1 | 16 | 17 | 18 | 19 a1 | b2 | 20 | 21 | 22 | 23
>>> arr2 = arr.combine_axes((x.a, x.b)) >>> arr2 a_b\c | c0 | c1 | c2 | c3 a0_b0 | 0 | 1 | 2 | 3 a0_b1 | 4 | 5 | 6 | 7 a0_b2 | 8 | 9 | 10 | 11 a1_b0 | 12 | 13 | 14 | 15 a1_b1 | 16 | 17 | 18 | 19 a1_b2 | 20 | 21 | 22 | 23
>>> arr2.split_axis(x.a_b) a | b\c | c0 | c1 | c2 | c3 a0 | b0 | 0 | 1 | 2 | 3 a0 | b1 | 4 | 5 | 6 | 7 a0 | b2 | 8 | 9 | 10 | 11 a1 | b0 | 12 | 13 | 14 | 15 a1 | b1 | 16 | 17 | 18 | 19 a1 | b2 | 20 | 21 | 22 | 23
implemented .by() method on groups which splits them into subgroups of specified length
>>> arr = ndtest((5, 2)) >>> arr a\b | b0 | b1 a0 | 0 | 1 a1 | 2 | 3 a2 | 4 | 5 a3 | 6 | 7 a4 | 8 | 9
>>> arr.sum(a['a0':'a4'].by(2)) a\b | b0 | b1 a['a0' 'a1'] | 2 | 4 a['a2' 'a3'] | 10 | 12 a['a4'] | 8 | 9
there is also an optional second argument to specify the “step” between groups
>>> arr.sum(a['a0':'a4'].by(2, step=3)) a\b | b0 | b1 a['a0' 'a1'] | 2 | 4 a['a3' 'a4'] | 14 | 16
if the step is < the group size, you get overlapping groups:
>>> arr.sum(a['a0':'a4'].by(2, step=1)) a\b | b0 | b1 a['a0' 'a1'] | 2 | 4 a['a1' 'a2'] | 6 | 8 a['a2' 'a3'] | 10 | 12 a['a3' 'a4'] | 14 | 16 a['a4'] | 8 | 9
groups can be renamed using >> (in addition to the “named” method)
>>> arr = ndtest((2, 3)) >>> arr a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5 >>> arr.sum((x.b['b0,b1'] >> 'b01', x.b['b1,b2'] >> 'b12')) a\b | b01 | b12 a0 | 1 | 3 a1 | 7 | 9
implemented rationot0
>>> a = Axis('a', 'a0,a1') >>> b = Axis('b', 'b0,b1,b2') >>> arr = LArray([[6, 0, 2], ... [4, 0, 8]], [a, b]) >>> arr a\b | b0 | b1 | b2 a0 | 6 | 0 | 2 a1 | 4 | 0 | 8 >>> arr.sum() 20 >>> arr.rationot0() a\b | b0 | b1 | b2 a0 | 0.3 | 0.0 | 0.1 a1 | 0.2 | 0.0 | 0.4 >>> arr.rationot0(x.a) a\b | b0 | b1 | b2 a0 | 0.6 | 0.0 | 0.2 a1 | 0.4 | 0.0 | 0.8
for reference, the normal ratio method would return:
>>> arr.ratio(x.a) a\b | b0 | b1 | b2 a0 | 0.6 | nan | 0.2 a1 | 0.4 | nan | 0.8
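The difference between the two methods comes down to how a division by a zero total is handled. A plain-Python sketch of the computation along the first axis, using the same data as above, could look like this (`ratio_not0` is a hypothetical helper, not the library function):

```python
def ratio_not0(rows):
    """Divide each cell by its column total, returning 0.0 where the total is 0."""
    totals = [sum(column) for column in zip(*rows)]
    return [[value / total if total != 0 else 0.0
             for value, total in zip(row, totals)]
            for row in rows]


arr = [[6, 0, 2],
       [4, 0, 8]]
print(ratio_not0(arr))  # [[0.6, 0.0, 0.2], [0.4, 0.0, 0.8]]
```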
Misc improvements
implemented [] on groups so that you can further subset them
added a new “condensed” option for ipfp’s display_progress argument to get back the old behavior
changed how named groups are displayed (only the name is displayed)
positional groups gained a few features and are almost on par with label groups now
when iterating over an axis (for example when doing “for y in year_axis:”), it yields groups (instead of raw labels) so that it works even in the presence of ambiguous labels.
Axis.startswith, endswith, matches create groups which include the axis (so that those groups work even if the labels exist on several axes)
Bug fixes
fixed Session.summary() when arrays in the session have axes without name
fixed full() and full_like() with an explicit dtype (the dtype was ignored)
Version 0.17
Released on 2016-11-29.
Core
added ndtest function to create n-dimensional test arrays (of given shape). Axes are named by single letters starting from ‘a’. Axes labels are constructed using a ‘{axis_name}{label_pos}’ pattern (e.g. ‘a0’).
>>> ndtest(6) a | a0 | a1 | a2 | a3 | a4 | a5 | 0 | 1 | 2 | 3 | 4 | 5 >>> ndtest((2, 3)) a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5 >>> ndtest((2, 3), label_start=1) a\b | b1 | b2 | b3 a1 | 0 | 1 | 2 a2 | 3 | 4 | 5
allow naming “one-shot” groups in group aggregates.
>>> arr = ndtest((2, 3)) >>> arr a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5 >>> arr.sum('g1=b0;g2=b1,b2;g3=b0:b2') a\b | 'g1' ('b0') | 'g2' (['b1' 'b2']) | 'g3' ('b0':'b2') a0 | 0 | 3 | 3 a1 | 3 | 9 | 12
implemented argmin, argmax, posargmin, posargmax without an axis argument (works on the full array).
>>> arr = ndtest((2, 3)) >>> arr a\b | b0 | b1 | b2 a0 | 0 | 1 | 2 a1 | 3 | 4 | 5 >>> arr.argmin() ('a0', 'b0')
added preliminary code to add a title attribute to LArray.
This needs a lot more work to be really useful though, as it can currently only be used in the LArray() function itself and is only used in Session.summary() (see below). There are many places where this should be used, but this is not done yet.
added Session.summary() which displays a list of all arrays, their dimension names and title if any.
This can be used in combination with local_arrays() to produce some kind of codebook with all the arrays of a function.
>>> arr = LArray([[1, 2], [3, 4]], 'sex=M,F;nat=BE,FO', title='a test array') >>> arr sex\nat | BE | FO M | 1 | 2 F | 3 | 4 >>> s = Session({'arr': arr}) >>> s Session(arr) >>> print(s.summary()) arr: sex, nat a test array
fixed using groups from other (compatible) axis
fixed group aggregates using groups without axis
fixed axis[another_label_group] when said group had a non-string Axis
fixed axis.group(another_label_group, name='a_name') (name was not set correctly)
fixed ipfp progress message when progress is negative
viewer
when setting part of an array in the console (by using e.g. arr['M'] = 10), display that array
when typing in the console the name of an existing array, select it in the list
fixed missing tooltips for arrays added to the session from within the session viewer
fixed window title (with axes info) not updating in many cases
fixed the filters bar not being cleared when displaying a non-LArray object after an LArray object
misc
improved messages in ipfp(display_progress=True)
improved tests, docstrings, …
Version 0.16.1
Released on 2016-11-04.
Viewer
renamed “Ok” button in array/session viewer to “Close”.
added apply and discard buttons in session editor, which permanently apply or discard changes to the current array.
Core
fixed array[sequence, scalar] = value
fixed array.to_excel() which was broken in 0.16 (by the upgrade to xlwings 0.9+).
improved a few tests
Version 0.16
Released on 2016-10-26.
Warning: this release needs to be installed using:
conda update larray
conda update xlwings
New features
implemented support for xlwings 0.9+. This allowed us to change the way we interact with Excel:
by default, the Excel instance we use is configured to be both hidden and silent (for example, it does not prompt to update/edit links).
by default, we now use a dedicated Excel instance for each call to open_excel, instead of reusing any existing instance if there was any open. In practice, it means input/output from/to Excel is more reliable and does not risk altering any workbook you had open (except if you ask for that explicitly). The cost of this is that it is slower by default. If you open many different workbooks, it is recommended that you create a single Excel instance and reuse it. This can be done with:
>>> from larray import *
>>> import xlwings as xw
>>> app = xw.App(visible=False, add_book=False)
>>> wb1 = open_excel('workbook1.xlsx', app=app)
>>> # use wb1 as before
>>> wb1.close()
>>> wb2 = open_excel('workbook2.xlsx', app=app)
>>> # use wb2 as before
>>> wb2.close()
>>> app.quit()
added ipfp function which does Iterative Proportional Fitting Procedure (also known as bi-proportional fitting in statistics or RAS algorithm in economics). Note that this new function is currently not in the core module, so it needs a specific import command:
>>> from larray.ipfp import ipfp
>>> a = Axis('a', 2) >>> b = Axis('b', 2) >>> initial = LArray([[2, 1], ... [1, 2]], [a, b]) >>> initial a*\b* | 0 | 1 0 | 2 | 1 1 | 1 | 2 >>> target_sum_along_a = LArray([2, 1], b) >>> target_sum_along_b = LArray([1, 2], a) >>> ipfp([target_sum_along_a, target_sum_along_b], initial, threshold=0.01) a*\b* | 0 | 1 0 | 0.8450704225352113 | 0.15492957746478875 1 | 1.1538461538461537 | 0.8461538461538463
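For readers unfamiliar with the procedure, here is a minimal plain-Python sketch of iterative proportional fitting on a 2D table. It is not the code used by ipfp: the function name `ipf_2d`, its parameters and the stopping rule are assumptions made for illustration. It alternately rescales columns and rows until both sets of sums are close to their targets.

```python
# Minimal sketch of iterative proportional fitting on a 2D table.
# `ipf_2d` and its parameters are illustrative names, not larray's API.
def ipf_2d(initial, col_targets, row_targets, threshold=0.01, max_iter=1000):
    table = [row[:] for row in initial]  # work on a copy
    for _ in range(max_iter):
        # Step 1: rescale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Step 2: rescale each row so that it sums to its target.
        for row, target in zip(table, row_targets):
            row_sum = sum(row)
            for j in range(len(row)):
                row[j] *= target / row_sum
        # Rows now match exactly; stop once the columns are close enough too.
        max_diff = max(abs(sum(row[j] for row in table) - target)
                       for j, target in enumerate(col_targets))
        if max_diff < threshold:
            break
    return table


fitted = ipf_2d([[2.0, 1.0], [1.0, 2.0]],
                col_targets=[2.0, 1.0], row_targets=[1.0, 2.0])
row_sums = [sum(row) for row in fitted]
col_sums = [sum(col) for col in zip(*fitted)]
```

With the same initial table and targets as in the example above, the row sums end up equal to their targets and the column sums are within the threshold of theirs.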
made it possible to create arrays more succinctly in some usual cases (especially for quick arrays for testing purposes). Previously, when creating an array from scratch, one had to provide Axis object(s) (or another array). Note that the following examples use zeros() but this change affects all array creation functions (ones, zeros, ndrange, full, empty):
>>> nat = Axis('nat', ['BE', 'FO']) >>> sex = Axis('sex', ['M', 'F']) >>> zeros([nat, sex]) nat\sex | M | F BE | 0.0 | 0.0 FO | 0.0 | 0.0
Now, when you have axis names and axis labels but do not have/want to reuse an existing axis, you can use this syntax:
>>> zeros([('nat', ['BE', 'FO']), ... ('sex', ['M', 'F'])]) nat\sex | M | F BE | 0.0 | 0.0 FO | 0.0 | 0.0
If additionally all axis names and labels are strings (not integers or other types) which do not contain any special character (“=”, “,” or “;”) you can use:
>>> zeros('nat=BE,FO;sex=M,F') nat\sex | M | F BE | 0.0 | 0.0 FO | 0.0 | 0.0
See below (*) for some more alternate syntaxes and an explanation of how this works.
added additional, less error-prone syntax for stack:
>>> nat = Axis('nat', 'BE,FO')
>>> arr1 = ones(nat)
>>> arr1
nat |  BE |  FO
    | 1.0 | 1.0
>>> arr2 = zeros(nat)
>>> arr2
nat |  BE |  FO
    | 0.0 | 0.0
>>> stack([('M', arr1), ('F', arr2)], 'sex')
nat\sex |   M |   F
     BE | 1.0 | 0.0
     FO | 1.0 | 0.0
in addition to the still supported but discouraged (because one has to remember the order of labels):
>>> sex = Axis('sex', ['M', 'F'])
>>> stack((arr1, arr2), sex)
nat\sex |   M |   F
     BE | 1.0 | 0.0
     FO | 1.0 | 0.0
added LArray.compact() and Session.compact() to detect and remove “useless” axes (i.e. axes for which values are constant over the whole axis)
>>> a = LArray([[1, 2], [1, 2]], [Axis('sex', 'M,F'), Axis('nat', 'BE,FO')]) >>> a sex\nat | BE | FO M | 1 | 2 F | 1 | 2 >>> a.compact() nat | BE | FO | 1 | 2
made Session keep the order in which arrays were added to it. The main goal was to make this work:
>>> b, a = s['b', 'a']
Previously, since sessions were always traversed alphabetically, this was a dangerous operation because if the keys (a and b) were not sorted alphabetically, the result would not be in the expected order:
s['b', 'a'] previously returned a, b instead of b, a!
Session.names is still sorted alphabetically though (Session.keys() is not)
added LArray.with_axes(axes) to return a new LArray with the same data but different axes
>>> a = ndrange(2) >>> a {0}* | 0 | 1 | 0 | 1 >>> a.with_axes([Axis('sex', 'H,F')]) sex | H | F | 0 | 1
changed width from which an LArray is summarized (using “…”) from 80 characters to 200.
implemented memory_used property which displays nbytes in human-readable form
>>> a = ndrange('sex=H,F;nat=BE,FO') >>> a.memory_used '16 bytes' >>> a = ndrange(100000) >>> a.memory_used '390.62 Kb'
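The formatting shown above can be approximated in a few lines of plain Python. This is only a sketch: `human_readable_size` is a hypothetical helper, the unit names beyond “Kb” are assumptions, and the 400000-byte figure assumes 100000 four-byte integers, as the first example suggests.

```python
# Hypothetical formatter in plain Python; unit names follow the example output above.
def human_readable_size(nbytes):
    units = ['bytes', 'Kb', 'Mb', 'Gb', 'Tb']
    size = float(nbytes)
    unit_index = 0
    # divide by 1024 until the value fits in the current unit
    while size >= 1024 and unit_index < len(units) - 1:
        size /= 1024
        unit_index += 1
    if unit_index == 0:
        return f"{nbytes} bytes"
    return f"{size:.2f} {units[unit_index]}"


small = human_readable_size(16)       # 4 cells of 4 bytes each
large = human_readable_size(400000)   # 100000 cells of 4 bytes each (assumed)
```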
implemented Axis + AxisCollection
>>> a = ndrange('sex=M,F;type=t1,t2') >>> Axis('nat', 'BE,FO') + a.axes AxisCollection([ Axis('nat', ['BE', 'FO']), Axis('sex', ['M', 'F']), Axis('type', ['t1', 't2']) ])
(*) For the curious, there are also many syntaxes supported for array creation functions. In fact, during array creation, at any place a list or tuple of values is expected, you can specify it using a single string, which will be split successively at the following characters if present: “;” then “=” then “,”. If you apply that algorithm to ‘nat=BE,FO;sex=M,F’, you get:
'nat=BE,FO;sex=M,F'
('nat=BE,FO', 'sex=M,F')
(('nat', 'BE,FO'), ('sex', 'M,F'))
(('nat', ('BE', 'FO')), ('sex', ('M', 'F')))
Recognise this last syntax? This is the same as above, except above we replaced some () with [] for clarity. In fact all the intermediate forms here above are valid (and equivalent) in array creation functions.
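The successive splitting described above can be sketched in plain Python as follows. `split_nested` is a hypothetical helper written for this note, not the parser larray actually uses.

```python
# Illustrative sketch only; `split_nested` is a hypothetical helper, not larray's parser.
def split_nested(text, separators=(';', '=', ',')):
    """Split `text` at each separator in turn, nesting the pieces as tuples."""
    if not separators:
        return text
    first, rest = separators[0], separators[1:]
    if first not in text:
        # this separator is absent: try the next one on the same string
        return split_nested(text, rest)
    return tuple(split_nested(part, rest) for part in text.split(first))


axes_def = split_nested('nat=BE,FO;sex=M,F')
labels_only = split_nested('a,b,c')
```

Here `axes_def` is the fully nested form shown in the last step above, and a string without “;” or “=” simply becomes a tuple of labels.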
Version 0.15
Released on 2016-09-23.
Core
added new methods on axes: matches, startswith, endswith
>>> country = Axis('country', ['FR', 'BE', 'DE', 'BR']) >>> country.matches('BE|FR') LGroup(['FR', 'BE']) >>> country.matches('^..$') # labels 2 characters long LGroup(['FR', 'BE', 'DE', 'BR'])
>>> country.startswith('B') LGroup(['BE', 'BR']) >>> country.endswith('R') LGroup(['FR', 'BR'])
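For intuition, the three methods behave like the following filters on a plain list of labels. This is a sketch using only the standard library; the function names mirror the methods above but are stand-ins, and anchoring the pattern at the start of each label (as `re.match` does) is an assumption.

```python
import re

# Hypothetical stand-ins for the Axis methods, working on a plain list of labels.
def matches(labels, pattern):
    regex = re.compile(pattern)
    # re.match anchors the pattern at the start of each label
    return [label for label in labels if regex.match(label)]

def startswith(labels, prefix):
    return [label for label in labels if label.startswith(prefix)]

def endswith(labels, suffix):
    return [label for label in labels if label.endswith(suffix)]


country = ['FR', 'BE', 'DE', 'BR']
be_or_fr = matches(country, 'BE|FR')
two_chars = matches(country, '^..$')
b_countries = startswith(country, 'B')
r_countries = endswith(country, 'R')
```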
implemented set-like operations on LGroup: & (intersection), | (union), - (difference). Slice groups do not work yet on axes references (x.) but that will come in the future…
>>> alpha = Axis('alpha', 'a,b,c,d') >>> alpha['a', 'b'] | alpha['c', 'd'] LGroup(['a', 'b', 'c', 'd'], axis=…) >>> alpha['a', 'b', 'c'] | alpha['c', 'd'] LGroup(['a', 'b', 'c', 'd'], axis=…)
a name is computed automatically when both operands are named
>>> r = alpha['a', 'b'].named('ab') | alpha['c', 'd'].named('cd') >>> r.name 'ab | cd' >>> r.key ['a', 'b', 'c', 'd']
numeric axes work too
>>> num = Axis('num', range(10))
>>> num[:2] | num[8:]
num[0, 1, 2, 8, 9]
>>> num[:2] | num[5]
num[0, 1, 2, 5]
intersection
>>> LGroup(['a', 'b', 'c']) & LGroup(['c', 'd']) LGroup(['c'])
difference
>>> LGroup(['a', 'b', 'c']) - LGroup(['c', 'd']) LGroup(['a', 'b']) >>> LGroup(['a', 'b', 'c']) - 'b' LGroup(['a', 'c'])
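Unlike Python sets, these operations keep the labels in order, as the results above show. A plain-list sketch of that behaviour could look like this (the three functions are hypothetical helpers, not LGroup methods):

```python
# Order-preserving set-like operations on plain lists of labels (illustrative only).
def union(left, right):
    result = []
    for label in list(left) + list(right):
        if label not in result:  # keep the first occurrence only
            result.append(label)
    return result

def intersection(left, right):
    return [label for label in left if label in right]

def difference(left, right):
    return [label for label in left if label not in right]


abcd = union(['a', 'b', 'c'], ['c', 'd'])
common = intersection(['a', 'b', 'c'], ['c', 'd'])
only_left = difference(['a', 'b', 'c'], ['c', 'd'])
```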
fixed loading 1D arrays using open_excel
Viewer
added tooltip with the axes labels corresponding to each cell of the array viewer
added name and dimensions of the current array to the window title bar in the session viewer
added tooltip with each array .info() in the list of arrays of the session viewer
fixed eval box throwing an exception when trying to set a new variable (if qtconsole is not present)
fixed group aggregates using LGroups defined using axes references (x.), for example:
>>> arr.sum(x.age[:10])
fixed group aggregates using anonymous axes
Version 0.14.1
Released on 2016-08-12.
Fixes
fixed support for loading arrays without axis names from Excel files (in that case index_col/nb_index are necessary)
fixed using a single int for index_col in read_excel() and sheet.load()
fixed loading empty Excel sheets via xlwings (it no longer crashes)
fixed dumping a session loaded from an H5 file to Excel
Version 0.14
Released on 2016-08-10.
Important warning
This version is not compatible with the new version of xlwings that just came out. Consequently, upgrading to this version is different from the usual “conda update larray”. You should rather use:
conda update larray --no-update-deps
To get the most of this release, you should also install the “qtconsole” package via:
conda install qtconsole
Viewer
upgraded session viewer/editor to work like a super-calculator. The input box below the array view can be used to type any expression. eg array1.sum(x.age) / array2, which will be displayed in the viewer. One can also type assignment commands, like: array3 = array1.sum(x.age) / array2 In which case, the new array will be displayed in the viewer AND added to the session (appear on the list on the left), so that you can use it in other expressions.
If you have the “qtconsole” package installed (see above), that input box will be a full IPython console. This means:
history of typed commands,
tab-completion (for example, type “nd<tab>” and it will change to “ndrange”),
syntax highlighting,
calltips (show the documentation of functions when typing commands using them),
help on functions using “?”. For example, type “ndrange?<enter>” to get the full documentation about ndrange (use <ESC> or <q> to quit that screen),
etc.
When having the “qtconsole” package installed, you might get a warning when starting the viewer:
WARNING:root:Message signing is disabled. This is insecure and not recommended!
This is totally harmless and can be safely ignored !
made view() and edit() without argument equivalent to view(local_arrays()) and edit(local_arrays()) respectively.
made the viewer on large arrays start a lot faster by using a small subset of the array to guess the number of decimals to display and whether or not to use scientific notation.
improved compare():
added support for comparing sessions. Arrays with differences between sessions are colored in red.
use a single array widget instead of 3. This is done by stacking arrays together to create a new dimension. This has the following advantages:
the filter and scrollbars are de-facto automatically synchronized.
any number of arrays can be compared, not just 2. All arrays are compared to the first one.
arrays with different sets of compatible axes can be compared (eg compare an array with its mean along an axis).
added label to show maximum absolute difference.
implemented edit(session) in addition to view(session).
Excel support
added support for copying sheets via: wb['x'] = wb['y']. If the 'x' sheet already exists, it is completely overwritten.
Core
improved performance. My test models run about 10% faster than with 0.13.
made cumsum and cumprod aggregate on the last axis by default so that the axis does not need to be specified when there is only one.
implemented much better support for operations using arrays of different types. For example,
fixed create_sequential when mult, inc and initial are of different types, e.g. create_sequential(…, initial=1, inc=0.1) had an unexpected integer result because it always used the type of the initial value for the output.
when appending a string label to an integer axis (eg adding total to an age axis by using with_total()), the resulting axis should have a mixed type, and not be suddenly all string.
stack() now supports arrays with different types.
made stack support arrays with different axes (the result has the union of all axes)
For completeness
Excel support
use xlwings (i.e. a live Excel instance) by default for all Excel input/output, including read_excel(), session.dump and session.load/Session(filename). This has the advantage of more coherent results among the different ways to load/save data to Excel and that simple sessions correctly survive a round-trip to an .xlsx workbook (i.e. (named) axes are detected properly). However, given the very different library involved, we lose most options that read_excel used to provide (courtesy of pandas.read_excel) and some bugs were probably introduced in the conversion.
fixed creating a new file via open_excel()
fixed loading 1D arrays (ranges with height 1 or width 1) via open_excel()
fixed sheet[‘A1’] = array in some cases
wb.close() now only closes the workbook if it was not already open in Excel when open_excel was called (so that we do not close a workbook a user is actually viewing).
added support for wb.save(filename), or actually for using any relative path, instead of a full absolute path.
when dumping a session to Excel, sort sheets alphabetically instead of dumping them in a “random” order.
try to convert float to int in more situations
Core
added support for using stack() without providing an axis. It creates an anonymous wildcard axis of the correct length.
added aslarray() top-level function to translate anything into an LArray if it is not already one
made labels_array available via from larray import *
fixed binary operations between an array and an axis where the array appeared first (eg array > axis). Confusingly, axis < array already worked.
added check in “a[bool_larray_key]” to make sure key.axes are compatible with a.axes
made create_sequential a lot faster when mult or inc are constants
made axes without name compatible with any name (this is the equivalent of a wildcard name for labels)
misc cleanup/docstring improvements/improved tests/improved error messages
Version 0.13
Released on 2016-07-11.
New features
implemented a new way to do input/output from/to Excel
>>> a = ndrange((2, 3)) >>> wb = open_excel('c:/tmp/y.xlsx') # put a at A1 in Sheet1, excluding headers (labels) >>> wb['Sheet1'] = a # dump a at A1 in Sheet2, including headers (labels) >>> wb['Sheet2'] = a.dump() # save the file to disk >>> wb.save() # close it >>> wb.close()
>>> wb = open_excel('c:/tmp/y.xlsx') # load a from the data starting at A1 in Sheet1, assuming the absence of headers. >>> a1 = wb['Sheet1'] # load a from the data starting at A1 in Sheet1, assuming the presence of (correctly formatted) headers. >>> a2 = wb['Sheet2'].load() >>> wb.close()
>>> wb = open_excel('c:/tmp/y.xlsx') # note that Sheet2 must exist >>> sheet2 = wb['Sheet2'] # write a without labels starting at C5 >>> sheet2['C5'] = a # write a with its labels starting at A10 >>> sheet2['A10'] = a.dump()
load an array with its axes information from a range. As you might have guessed, we could also use the sheet2 variable here
>>> b = wb['Sheet2']['A10:D12'].load() >>> b {0}*\{1}* | 0 | 1 | 2 0 | 0 | 1 | 2 1 | 3 | 4 | 5
load an array (raw data) with no axis information from a range.
>>> c = sheet2['B11:D12']
>>> # in fact, this is not really an LArray ...
>>> c
<larray.excel.Range at 0x1ff1bae22e8>
>>> # but it can be used as such (this is currently very experimental)
>>> c.sum(axis=0)
{0}* |   0 |   1 |   2
     | 3.0 | 5.0 | 7.0
>>> # ... and it can be used for other stuff, like setting the formula instead of the value:
>>> c.formula = '=D10+1'
>>> # in the future, we should also be able to set font name, size, style, etc.
implemented LArray.rename({axis: new_name}) as well as using kwargs to rename several axes at once
>>> nat = Axis('nat', ['BE', 'FO']) >>> sex = Axis('sex', ['M', 'F']) >>> a = ndrange([nat, sex]) >>> a nat\sex | M | F BE | 0 | 1 FO | 2 | 3 >>> a.rename(nat='nat2', sex='gender') nat2\gender | M | F BE | 0 | 1 FO | 2 | 3 >>> a.rename({'nat': 'nat2', 'sex': 'gender'}) nat2\gender | M | F BE | 0 | 1 FO | 2 | 3
made tab-completion of axes names possible in an interactive console
For completeness
taking a subset of an array with wildcard axes now returns an array with wildcard axes
fixed a case where wildcard axes were considered incompatible when they actually were compatible
better support for anonymous axes
fix for obscure bugs, better doctests, cleaner implementation for a few functions, …
Version 0.12
Released on 2016-06-21.
New features
implemented boolean indexing by using axes objects:
>>> sex = Axis('sex', 'M,F') >>> age = Axis('age', range(5)) >>> a = ndrange((sex, age)) >>> a sex\age | 0 | 1 | 2 | 3 | 4 M | 0 | 1 | 2 | 3 | 4 F | 5 | 6 | 7 | 8 | 9
>>> a[age < 3] sex\age | 0 | 1 | 2 M | 0 | 1 | 2 F | 5 | 6 | 7
This new syntax is equivalent to (but currently much slower than):
>>> a[age[:2]] sex\age | 0 | 1 | 2 M | 0 | 1 | 2 F | 5 | 6 | 7
However, the power of this new syntax comes from the fact that you are not limited to scalar constants
>>> age_limit = LArray([2, 3], sex) >>> age_limit sex | M | F | 2 | 3
>>> a[age < age_limit] sex,age | M,0 | M,1 | F,0 | F,1 | F,2 | 0 | 1 | 5 | 6 | 7
Notice that the concerned axes are merged, so you cannot do as much with them. For example, a[age < age_limit].sum(x.age) would not work since there is no “age” axis anymore.
To keep axes intact, one can often set the values of the corresponding cells to 0 or nan instead.
>>> a[age < age_limit] = 0 >>> a sex\age | 0 | 1 | 2 | 3 | 4 M | 0 | 0 | 2 | 3 | 4 F | 0 | 0 | 0 | 8 | 9 >>> # in this case, the sum *is* valid (but the mean would not -- one should use nan for that) >>> a.sum(x.age) sex | M | F | 9 | 17
To keep axes intact, this idiom is also often useful:
>>> b = a * (age >= age_limit) >>> b sex\age | 0 | 1 | 2 | 3 | 4 M | 0 | 0 | 2 | 3 | 4 F | 0 | 0 | 0 | 8 | 9
This also works with axes references (x.axis_name), though this is experimental and the filter value is only computed as late as possible (during []), so you cannot display it before that, like you can with “real” axes.
Using “real” axes:
>>> filter1 = age < age_limit >>> filter1 age\sex | M | F 0 | True | True 1 | True | True 2 | False | True 3 | False | False 4 | False | False >>> a[filter1] sex,age | M,0 | M,1 | F,0 | F,1 | F,2 | 0 | 1 | 5 | 6 | 7
With axes references:
>>> filter2 = x.age < age_limit >>> filter2 <larray.core.BinaryOp at 0x1332ae3b588> >>> a[filter2] sex,age | M,0 | M,1 | F,0 | F,1 | F,2 | 0 | 1 | 5 | 6 | 7 >>> a * ~filter2 sex\age | 0 | 1 | 2 | 3 | 4 M | 0 | 0 | 2 | 3 | 4 F | 0 | 0 | 0 | 8 | 9
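The “multiply by a boolean condition” idiom shown above can be reproduced with plain Python containers, which may help to see why the axes stay intact. This sketch uses dictionaries keyed by sex instead of LArray objects; all names are illustrative.

```python
# Plain-Python sketch of `a * (age >= age_limit)`: one row of values per sex.
ages = [0, 1, 2, 3, 4]
a = {'M': [0, 1, 2, 3, 4], 'F': [5, 6, 7, 8, 9]}
age_limit = {'M': 2, 'F': 3}

# A boolean multiplies as 0 or 1, so cells below the limit become 0
# while the shape (one value per age, per sex) is preserved.
b = {sex: [value * (age >= age_limit[sex]) for age, value in zip(ages, values)]
     for sex, values in a.items()}

# Because every age is still present, summing over ages remains possible.
totals = {sex: sum(values) for sex, values in b.items()}
```

The totals are 9 for M and 17 for F, the same as the sums obtained above after setting the filtered cells to 0.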
implemented LArray.divnot0
>>> nat = Axis('nat', ['BE', 'FO']) >>> sex = Axis('sex', ['M', 'F']) >>> a = ndrange((nat, sex)) >>> a nat\sex | M | F BE | 0 | 1 FO | 2 | 3 >>> b = ndrange(sex) >>> b sex | M | F | 0 | 1 >>> a / b nat\sex | M | F BE | nan | 1.0 FO | inf | 3.0 >>> a.divnot0(b) nat\sex | M | F BE | 0.0 | 1.0 FO | 0.0 | 3.0
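In plain Python, the idea behind divnot0 is simply “divide, but return 0 wherever the divisor is 0”. A sketch on lists (the helper name `divnot0_rows` is made up for this example):

```python
# Illustrative sketch: divide each row by `divisors`, yielding 0.0 where the divisor is 0.
def divnot0_rows(rows, divisors):
    return [[value / divisor if divisor != 0 else 0.0
             for value, divisor in zip(row, divisors)]
            for row in rows]


a = [[0, 1],   # BE
     [2, 3]]   # FO
b = [0, 1]     # one divisor per sex (M, F)
safe = divnot0_rows(a, b)
```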
implemented .named() on groups to name groups after the fact
>>> a = ndrange(Axis('age', range(100))) >>> a age | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 >>> a.sum((x.age[10:19].named('teens'), x.age[20:29].named('twenties'))) age | 'teens' (10:19) | 'twenties' (20:29) | 145 | 245
made all array creation functions (ndrange, zeros, ones, full, LArray, …) more flexible:
They accept a single Axis argument instead of requiring a tuple/list of them
>>> sex = Axis('sex', 'M,F') >>> a = ndrange(sex) >>> a sex | M | F | 0 | 1
Shortcut definition for axes work
>>> ndrange("a,b,c") {0} | a | b | c | 0 | 1 | 2 >>> ndrange(["1:3", "d,e"]) {0}\{1} | d | e 1 | 0 | 1 2 | 2 | 3 3 | 4 | 5 >>> LArray([1, 5, 7], "a,b,c") {0} | a | b | c | 1 | 5 | 7
One can mix Axis objects and ints (for axes without labels)
>>> sex = Axis('sex', 'M,F') >>> ndrange([sex, 3]) sex\{1}* | 0 | 1 | 2 M | 0 | 1 | 2 F | 3 | 4 | 5
made it possible to iterate on labels of a group (eg a slice of an Axis):
>>> for year in a.axes.year[2010:]:
...     # do stuff
changed representation of anonymous axes from “axisN” (where N is the position of the axis) to “{N}”. The problem was that “axisN” was not recognizable enough as an anonymous axis, and it was thus misleading. For example “a[x.axis0[…]]” would not work.
better overall support for arrays with anonymous axes or several axes with the same name
fixed all output functions (to_csv, to_excel, to_hdf, …) when the last axis has no name but other axes have one
implemented eye() which creates 2D arrays with ones on the diagonal and zeros elsewhere.
>>> eye(sex) sex\sex | M | F M | 1.0 | 0.0 F | 0.0 | 1.0
implemented the @ operator to do matrix multiplication (Python3.5+ only)
implemented inverse() to return the (matrix) inverse of a (square) 2D array
>>> a = eye(sex) * 2 >>> a sex\sex | M | F M | 2.0 | 0.0 F | 0.0 | 2.0
>>> a @ inverse(a) sex\sex | M | F M | 1.0 | 0.0 F | 0.0 | 1.0
implemented diag() to extract a diagonal or construct a diagonal array.
>>> nat = Axis('nat', ['BE', 'FO']) >>> sex = Axis('sex', ['M', 'F']) >>> a = ndrange([nat, sex], start=1) >>> a nat\sex | M | F BE | 1 | 2 FO | 3 | 4 >>> d = diag(a) >>> d nat,sex | BE,M | FO,F | 1 | 4 >>> diag(d) nat\sex | M | F BE | 1 | 0 FO | 0 | 4 >>> a = ndrange(sex, start=1) >>> a sex | M | F | 1 | 2 >>> diag(a) sex\sex | M | F M | 1 | 0 F | 0 | 2
For completeness
added Axis.rename method which returns a copy of the axis with a different name and deprecate Axis._rename
added labels_array as a generalized version of identity (which is deprecated)
implemented LArray.ipoints[…] to do point selection using coordinates instead of labels (aka numpy indexing)
raise an error when trying to do a[key_with_more_axes_than_a] = value instead of silently ignoring extra axes.
allow using a single int for index_col in read_csv in addition to a list of ints
implemented __getitem__ for “x”. You can now write stuff like:
>>> a = ndrange((3, 4)) >>> a[x[0][1:]] {0}\{1}* | 0 | 1 | 2 | 3 1 | 4 | 5 | 6 | 7 2 | 8 | 9 | 10 | 11 >>> a[x[1][2:]] {0}*\{1} | 2 | 3 0 | 2 | 3 1 | 6 | 7 2 | 10 | 11 >>> a.sum(x[0]) {0}* | 0 | 1 | 2 | 3 | 12 | 15 | 18 | 21
produce normal axes instead of wildcard axes on LArray.points[…]. This is (much) slower but more correct/informative.
changed the way we store axes internally, which has several consequences
better overall support for anonymous axes
better support for arrays with several axes with the same name
small performance improvement
the same axis object cannot be added twice in an array (one should use axis.copy() if that need arises)
changes the way groups with an axis are displayed
fixed sum, min, max functions on non-LArray arguments
changed __repr__ for wildcard axes to not display their labels but their length
>>> ndrange(3).axes[0] Axis(None, 3)
fixed aggregates on several groups “forgetting” the name of groups which had been created using axis.all()
allow Axis(…, long) in addition to int (Python2 only)
better docstrings/tests/comments/error messages/thoughts/…
Version 0.11.1
Released on 2016-05-25.
Fixes
fixed new functions full, full_like and create_sequential not being available when using from larray import *
Version 0.11
Released on 2016-05-25.
Viewer
implemented “Copy to Excel” in context menu (Ctrl+E), to open the selection in a new Excel sheet directly, without the need to use paste. If nothing is selected, copies the whole array.
when nothing is selected, Ctrl C selects & copies the whole array to the clipboard.
when nothing is selected, Ctrl V paste at top-left corner
implemented view(dict_with_array_values)
>>> view({'a': array1, 'b': array2})
fixed copy (ctrl-C) when viewing a 2D array: it did not include labels from the first axis in that case
Core
implemented LArray.growth_rate to compute the growth along an axis
>>> sex = Axis('sex', ['M', 'F']) >>> year = Axis('year', [2015, 2016, 2017]) >>> a = ndrange([sex, year]).cumsum(x.year) >>> a sex\year | 2015 | 2016 | 2017 M | 0 | 1 | 3 F | 3 | 7 | 12 >>> a.growth_rate() sex\year | 2016 | 2017 M | inf | 2.0 F | 1.33333333333 | 0.714285714286 >>> a.growth_rate(d=2) sex\year | 2017 M | inf F | 3.0
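The numbers above are consistent with computing (current - previous) / previous, where “previous” is d positions earlier. A plain-Python sketch under that assumption (the handling of a zero previous value mimics floating point division and is also an assumption):

```python
import math

# Sketch of a growth rate along one list of values, with a lag of `d` positions.
def growth_rate(values, d=1):
    rates = []
    for previous, current in zip(values, values[d:]):
        change = current - previous
        if previous != 0:
            rates.append(change / previous)
        elif change == 0:
            rates.append(math.nan)  # 0 / 0
        else:
            rates.append(math.copysign(math.inf, change))  # x / 0
    return rates


male = growth_rate([0, 1, 3])
female = growth_rate([3, 7, 12])
female_lag2 = growth_rate([3, 7, 12], d=2)
```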
implemented LArray.diff (difference along an axis)
>>> sex = Axis('sex', ['M', 'F']) >>> xtype = Axis('type', ['type1', 'type2', 'type3']) >>> a = ndrange([sex, xtype]).cumsum(x.type) >>> a sex\type | type1 | type2 | type3 M | 0 | 1 | 3 F | 3 | 7 | 12 >>> a.diff() sex\type | type2 | type3 M | 1 | 2 F | 4 | 5 >>> a.diff(n=2) sex\type | type3 M | 1 F | 1 >>> a.diff(x.sex) sex\type | type1 | type2 | type3 F | 3 | 6 | 9
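As the example suggests, n is the order of the difference: the operation is applied n times. A small plain-Python sketch of that reading (`diff_values` is a hypothetical helper working on one list at a time):

```python
# Sketch: apply a first-order difference `n` times to a list of values.
def diff_values(values, n=1):
    for _ in range(n):
        values = [current - previous
                  for previous, current in zip(values, values[1:])]
    return values


male = [0, 1, 3]
female = [3, 7, 12]
first_order = [diff_values(male), diff_values(female)]
second_order = [diff_values(male, n=2), diff_values(female, n=2)]
# difference along the other axis (F minus M), cell by cell
along_sex = [f - m for m, f in zip(male, female)]
```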
implemented round() (as a nicer alias to around() and round_())
>>> a = ndrange(5) + 0.5 >>> a axis0 | 0 | 1 | 2 | 3 | 4 | 0.5 | 1.5 | 2.5 | 3.5 | 4.5 >>> round(a) axis0 | 0 | 1 | 2 | 3 | 4 | 0.0 | 2.0 | 2.0 | 4.0 | 4.0
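Note that the results above follow the “round half to even” rule (0.5 becomes 0.0 and 2.5 becomes 2.0). Python's built-in round() applies the same rule to plain floats, which gives a quick way to check what to expect:

```python
# Python's built-in round() also rounds exact halves to the nearest even integer.
values = [0.5, 1.5, 2.5, 3.5, 4.5]
rounded = [float(round(value)) for value in values]
```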
implemented Session[[‘list’, ‘of’, ‘str’]] to get a subset of a Session
>>> s = Session({'a': ndrange(3), 'b': ndrange(4), 'c': ndrange(5)}) >>> s Session(a, b, c) >>> s['a', 'c'] Session(a, c)
implemented LArray.points to do pointwise indexing instead of the default orthogonal indexing when indexing several dimensions at the same time.
>>> a = Axis('a', ['a1', 'a2', 'a3'])
>>> b = Axis('b', ['b1', 'b2', 'b3'])
>>> arr = ndrange((a, b))
>>> arr
a\b | b1 | b2 | b3
 a1 |  0 |  1 |  2
 a2 |  3 |  4 |  5
 a3 |  6 |  7 |  8
>>> arr[['a1', 'a3'], ['b1', 'b2']]
a\b | b1 | b2
 a1 |  0 |  1
 a3 |  6 |  7
>>> # this selects the points ('a1', 'b1') and ('a3', 'b2')
>>> arr.points[['a1', 'a3'], ['b1', 'b2']]
a,b* | 0 | 1
     | 0 | 7
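The difference between the two indexing modes can be spelled out with nested lists: orthogonal indexing takes every combination of the selected labels, while pointwise indexing pairs them up. The names below are illustrative, not part of larray.

```python
a_labels = ['a1', 'a2', 'a3']
b_labels = ['b1', 'b2', 'b3']
data = [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]

def cell(a_label, b_label):
    """Value at the intersection of two labels."""
    return data[a_labels.index(a_label)][b_labels.index(b_label)]

a_key = ['a1', 'a3']
b_key = ['b1', 'b2']

# orthogonal indexing: the cross product of both keys (a 2 x 2 result)
orthogonal = [[cell(a, b) for b in b_key] for a in a_key]
# pointwise indexing: keys are paired, giving ('a1', 'b1') and ('a3', 'b2')
pointwise = [cell(a, b) for a, b in zip(a_key, b_key)]
```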
Note that .ipoints (to do pointwise indexing with positions instead of labels – aka numpy indexing) is planned but not functional yet.
made “arr1.drop_labels() * arr2” use the labels from arr2 if any
>>> a = Axis('a', ['a1', 'a2']) >>> b = Axis('b', ['b1', 'b2']) >>> b2 = Axis('b', ['b2', 'b3']) >>> arr1 = ndrange([a, b]) >>> arr1 a\b | b1 | b2 a1 | 0 | 1 a2 | 2 | 3 >>> arr1.drop_labels(b) a\b* | 0 | 1 a1 | 0 | 1 a2 | 2 | 3 >>> arr1.drop_labels([a, b]) a*\b* | 0 | 1 0 | 0 | 1 1 | 2 | 3 >>> arr2 = ndrange([a, b2]) >>> arr2 a\b | b2 | b3 a1 | 0 | 1 a2 | 2 | 3 >>> arr1 * arr2 Traceback (most recent call last): ... ValueError: incompatible axes: Axis('b', ['b2', 'b3']) vs Axis('b', ['b1', 'b2']) >>> arr1 * arr2.drop_labels() a\b | b1 | b2 a1 | 0 | 1 a2 | 4 | 9 # in versions < 0.11, it used to return: # >>> arr1.drop_labels() * arr2 # a*\b* | 0 | 1 # 0 | 0 | 1 # 1 | 2 | 3 >>> arr1.drop_labels() * arr2 a\b | b2 | b3 a1 | 0 | 1 a2 | 4 | 9 >>> arr1.drop_labels('a') * arr2.drop_labels('b') a\b | b1 | b2 a1 | 0 | 1 a2 | 4 | 9
made .plot a property, like in Pandas, so that we can do stuff like:
>>> a.plot.bar()
>>> # instead of
>>> a.plot(kind='bar')
made labels from different types not match against each other even if their value is the same. This might break some code but it is both more efficient and more convenient in some cases, so let us see how it goes:
>>> a = ndrange(4)
>>> a
axis0 | 0 | 1 | 2 | 3
      | 0 | 1 | 2 | 3
>>> a[1]
1
>>> # This used to "work" (and return 1)
>>> a[True]
…
ValueError: True is not a valid label for any axis
>>> a[1.0]
…
ValueError: 1.0 is not a valid label for any axis
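Some background on why this needed a change: in Python, True == 1 == 1.0 and all three hash to the same value, so a plain dictionary lookup cannot tell them apart. The sketch below shows one possible way to make lookups type-aware (keying on the type as well as the value); it is an illustration under that assumption, not a description of how larray does it.

```python
labels = [0, 1, 2, 3]

# Plain mapping: True and 1.0 both find the entry for the integer 1.
loose = {label: position for position, label in enumerate(labels)}
loose_true = loose[True]
loose_float = loose[1.0]

# Type-aware mapping (illustrative): the key includes the label's type.
strict = {(type(label), label): position for position, label in enumerate(labels)}

def strict_lookup(key):
    try:
        return strict[(type(key), key)]
    except KeyError:
        raise ValueError(f"{key!r} is not a valid label for any axis") from None

strict_int = strict_lookup(1)
try:
    strict_lookup(True)
except ValueError as exc:
    error_message = str(exc)
```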
implemented read_csv(dialect='liam2') to read .csv files formatted like in LIAM2 (with the axes names on a separate line from the last axis labels)
implemented Session[boolean LArray]
>>> a = ndrange(3) >>> b = ndrange(4) >>> s1 = Session({'a': a, 'b': b}) >>> s2 = Session({'a': a + 1, 'b': b}) >>> s1 == s2 name | a | b | False | True >>> s1[s1 == s2] Session(b) >>> s1[s1 != s2] Session(a)
implemented experimental support for creating an array sequentially. Comments on the name of the function and syntax (especially compared to ndrange) would be appreciated.
>>> year = Axis('year', range(2016, 2020)) >>> sex = Axis('sex', ['M', 'F']) >>> create_sequential(year) year | 2016 | 2017 | 2018 | 2019 | 0 | 1 | 2 | 3 >>> create_sequential(year, 1.0, 0.1) year | 2016 | 2017 | 2018 | 2019 | 1.0 | 1.1 | 1.2 | 1.3 >>> create_sequential(year, 1.0, mult=1.1) year | 2016 | 2017 | 2018 | 2019 | 1.0 | 1.1 | 1.21 | 1.331 >>> inc = LArray([1, 2], [sex]) >>> inc sex | M | F | 1 | 2 >>> create_sequential(year, 1.0, inc) sex\year | 2016 | 2017 | 2018 | 2019 M | 1.0 | 2.0 | 3.0 | 4.0 F | 1.0 | 3.0 | 5.0 | 7.0 >>> mult = LArray([2, 3], [sex]) >>> mult sex | M | F | 2 | 3 >>> create_sequential(year, 1.0, mult=mult) sex\year | 2016 | 2017 | 2018 | 2019 M | 1.0 | 2.0 | 4.0 | 8.0 F | 1.0 | 3.0 | 9.0 | 27.0 >>> initial = LArray([3, 4], [sex]) >>> initial sex | M | F | 3 | 4 >>> create_sequential(year, initial, inc, mult) sex\year | 2016 | 2017 | 2018 | 2019 M | 3 | 7 | 15 | 31 F | 4 | 14 | 44 | 134 >>> def modify(prev_value): ... return prev_value / 2 >>> create_sequential(year, 8, func=modify) year | 2016 | 2017 | 2018 | 2019 | 8 | 4 | 2 | 1 >>> create_sequential(3) axis0* | 0 | 1 | 2 | 0 | 1 | 2 >>> create_sequential(x.year, axes=(sex, year)) sex\year | 2016 | 2017 | 2018 | 2019 M | 0 | 1 | 2 | 3 F | 0 | 1 | 2 | 3
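The outputs above are consistent with the recurrence value[i] = value[i - 1] * mult + inc. Here is a plain-Python sketch of that recurrence for a single series; `sequence_values` is a hypothetical helper and, to avoid guessing at the real default values, it takes all of its arguments explicitly.

```python
# Sketch of the recurrence value[i] = value[i - 1] * mult + inc for one series.
def sequence_values(length, initial, inc, mult):
    values = [initial]
    for _ in range(length - 1):
        values.append(values[-1] * mult + inc)
    return values


plain = sequence_values(4, initial=0, inc=1, mult=1)
male = sequence_values(4, initial=3, inc=1, mult=2)
female = sequence_values(4, initial=4, inc=2, mult=3)
```

The `male` and `female` series reproduce the two rows of the create_sequential(year, initial, inc, mult) example above.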
implemented full and full_like to create arrays initialized to something other than zeros or ones
>>> nat = Axis('nat', ['BE', 'FO']) >>> sex = Axis('sex', ['M', 'F']) >>> full([nat, sex], 42.0) nat\sex | M | F BE | 42.0 | 42.0 FO | 42.0 | 42.0 >>> initial_value = ndrange([sex]) >>> initial_value sex | M | F | 0 | 1 >>> full([nat, sex], initial_value) nat\sex | M | F BE | 0 | 1 FO | 0 | 1
performance improvements when using label keys: a[key] is faster, especially if key is large
Fixes
to_excel(filepath) only closes the file if it was not open before
removed code which forced labels from .csv files to be strings (as it caused problems in many cases, e.g. ages in LIAM2 files)
Misc. stuff for completeness
made LGroups usable in Python’s builtin range() and convertible to int and float
implemented AxisCollection.union (equivalent to AxisCollection | Axis)
fixed boolean array keys (boolean filter) in combination with scalar keys (for other dimensions)
fixed support for older numpy
fixed LArray.shift(n=0)
still more work on making arrays with anonymous axes usable (not there yet)
added more tests
better docstrings/error messages…
misc. code cleanup/simplification/improved comments
Version 0.10.1
Released on 2016-03-25.
New features
A single change in this release: a much more powerful to_excel function which (by default) uses Excel itself to write files. Additional functionality includes:
write in an existing file without overwriting existing data/sheet/…
write at a precise position
view an array in a live Excel instance (a new OR an existing workbook)
See the to_excel() documentation for details.
Version 0.10
Released on 2016-03-22.
Core
implemented dropna argument for to_csv, to_frame and to_series to avoid writing lines with either ‘all’ or ‘any’ NA values.
implemented read_sas. Needs pandas >= 0.18 (though it seems still buggy on some files).
implemented experimental support for __getattr__ and __setattr__ on LArray. One can use arr.M instead of arr['M']. It only works for single string labels though (not for slices or lists of labels, nor for integer labels). Not sure it is a good idea :).
implemented Session +, -, * and /
E.g. sess1 - sess2 will compute the difference on each array present in either session. If an array is present in one session and not in the other, it is replaced by “NaN”.
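To make the rule concrete, here is a sketch using plain dictionaries of lists in place of sessions of arrays. `subtract_sessions` is a hypothetical helper and the key ordering of the result is an arbitrary choice.

```python
import math

# Sketch: element-wise difference for names present in both "sessions",
# NaN for names present in only one of them.
def subtract_sessions(sess1, sess2):
    names = list(sess1) + [name for name in sess2 if name not in sess1]
    result = {}
    for name in names:
        if name in sess1 and name in sess2:
            result[name] = [x - y for x, y in zip(sess1[name], sess2[name])]
        else:
            result[name] = math.nan
    return result


sess1 = {'a': [1, 2, 3], 'b': [10, 20]}
sess2 = {'a': [1, 1, 1], 'c': [5]}
difference = subtract_sessions(sess1, sess2)
```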
added .nbytes property to LArray objects (to know how many bytes of memory the array uses)
made sort_axis accept a tuple of axes
raises an error on a.i[tuple_with_len_greater_than_array_ndim]
slightly better support for axes with no name (no, still no complete support yet ;-))
improved AxisCollection: implemented __delitem__(slice), __setitem__(list), __setitem__(slice)
fixed exception on AxisCollection.index(invalid_index)
better docstrings for a few functions
misc code cleanups, refactoring & improved tests
Editor
added .dirty property on ArrayEditorWidget
fixed viewing arrays with “inf” (infinite)
fixed a few edge cases for the ndigit detection code
fixed colors in some cases in edit()
made copy-paste of large regions faster in some cases
Version 0.9.2
Released on 2016-03-02.
Core
much better support for unnamed axes overall. Still a long way to go for full support, but it’s getting there…
Editor
fixed edit() for arrays with the same labels on several axes
Version 0.9.1
Released on 2016-03-01.
Core
better .info for arrays with groups in axes
>>> # example using groups without a name >>> reg = la.sum((fla, wal, bru, belgium)) >>> reg.info 4 x 15 geo [4]: ['A11' ... 'A73'] ['A25' ... 'A93'] 'A21' ['A11' ... 'A21'] lipro [15]: 'P01' 'P02' 'P03' ... 'P13' 'P14' 'P15'
>>> # example using groups with a name >>> fla = geo.group(fla_str, name='Flanders') >>> wal = geo.group(wal_str, name='Wallonia') >>> bru = geo.group(bru_str, name='Brussels') >>> reg = la.sum((fla, wal, bru)) >>> reg.info 3 x 15 geo [3]: 'Flanders' (['A11' ... 'A73']) 'Wallonia' (['A25' ... 'A93']) 'Brussels' ('A21') lipro [15]: 'P01' 'P02' 'P03' ... 'P13' 'P14' 'P15'
Editor
fixed edit() with non-string labels in axes
fixed edit() with filters in some more cases
fixed ArrayEditorWidget.reject_changes and accept_changes to update the model & view accordingly (in case the widget is kept open)
avoid (harmless) error messages in some cases
Version 0.9
Released on 2016-02-25.
A minor but backward incompatible version (hence the bump in version number)!
Core
fixed int_array.mean() to return floats instead of int (regression in 0.8)
larray_equal returns False when either value is not an LArray, instead of raising an exception
Session
changed Session == Session to return an array of booleans instead of a single boolean, so that we know which array(s) differ. Code like session1 == session2, should be changed to all(session1 == session2).
implemented Session != Session
implemented Session.get(k, default) (returns default if k does not exist in Session)
implemented len() for Session objects to know how many objects are in the Session
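As a rough illustration of why a per-array comparison is more useful than a single boolean, here is a sketch using plain dicts. `session_eq` is a hypothetical helper, and because it returns a dict, the overall check is written `all(flags.values())` rather than the `all(session1 == session2)` form used with real Session objects:

```python
def session_eq(sess1, sess2):
    """Return one boolean per name instead of a single overall boolean."""
    names = sorted(set(sess1) | set(sess2))
    return {
        name: name in sess1 and name in sess2 and sess1[name] == sess2[name]
        for name in names
    }


before = {"a": [1, 2], "b": [3, 4]}
after = {"a": [1, 2], "b": [3, 5]}

flags = session_eq(before, after)
# The per-name flags tell us which entries differ, not just that something does
differing = [name for name, same in flags.items() if not same]
everything_equal = all(flags.values())
```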
Viewer
fixed view() (regression in 0.8.1)
fixed edit() to actually apply changes on “OK”/accept_changes even when no filter change occurred after the last edit.
Version 0.8.1
Released on 2016-02-24.
Viewer
implemented min/maxvalue arguments for edit()
do not close the window when pressing Enter
allow starting to edit cells by pressing Enter
fixed copy of changed cells (copy the changed value)
fixed pasted values to not be accepted directly (they go to “changes” like for manual edits)
fixed color updates on paste
disabled experimental tooltips on headers
better error message when entering invalid values
Core
implemented indexing by position on several dimensions at once (like numpy)
>>> # takes the first item in the first and third dimensions, leaves the second dimension intact
>>> arr.i[0, :, 0]
<some result>
>>> # sets all the cells corresponding to the first item in the first dimension and the second item in the fourth
>>> # dimension
>>> arr.i[0, :, :, 1] = 42
added optional ‘readonly’ argument to expand() to produce a readonly view (much faster since no copying is done)
Version 0.8
Released on 2016-02-16.
Core
implemented skipna argument for most aggregate functions. It defaults to True.
implemented LArray.sort_values(key)
implemented percentile and median
added isnan and isinf toplevel functions
made axis argument optional for argsort & posargsort on 1D arrays
fixed a[key] = value when key corresponds to a single cell of the array
fixed keepaxes argument for aggregate functions
fixed a[int_array] (when the axis needs to be guessed)
fixed empty_like
fixed aggregates on several axes given as integers e.g. arr.sum(axis=(0, 2))
fixed “kind” argument in posargsort
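To make the effect of the skipna argument concrete, here is a minimal pure-Python sketch of a sum that either ignores or propagates NaN values. The function `total` is hypothetical and only mimics the idea, not the library's implementation:

```python
import math


def total(values, skipna=True):
    """Sum values, ignoring NaN entries when skipna is True."""
    if skipna:
        values = [v for v in values if not math.isnan(v)]
    # With skipna=False any NaN in the input makes the whole sum NaN
    return sum(values)


data = [1.0, math.nan, 2.5]
with_skip = total(data)
without_skip = total(data, skipna=False)
```

With the default, `with_skip` only reflects the two valid numbers, while `without_skip` is NaN.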
Viewer
added title argument to edit() (set automatically if not provided, like for view())
fixed edit() on filtered arrays
fixed view(expression): anything which was not stored in a variable was broken in 0.7.1
reset background color when setting values if necessary (still buggy in some cases, but much less so ;-))
background color for headers is always on
view() => array cells are not editable, instead of being editable and ignoring entered values
fixed compare() colors when arrays are entirely equal
fixed error message for compare() when PyQt is not available
Misc
bump numpy requirement to 1.10, implicitly dropping support for python 3.3
renamed view module to editor to not collide with view function
improved/added a few tests
Version 0.7.1
Released on 2016-01-29.
Viewer
implemented paste (ctrl-V)
implemented experimental array comparator:
>>> compare(array1, array2)
Known limitation: the arrays must have exactly the same axes and the background color is buggy when using filters
when no title is specified in view(), it is determined automatically by inspecting the local variables of the function where view() is called and using the names of the ones matching the object passed. If there are several matches, up to 3 are displayed.
added axes names to copy (ctrl-C)
fixed copy (ctrl-C) of 0d array
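The automatic title detection for view() described above relies on inspecting the caller's local variables. A minimal sketch of that technique with the standard `inspect` module follows; `guess_title` is a hypothetical name, and sorting the matches is a choice made here for a predictable result, not something the change log specifies:

```python
import inspect


def guess_title(obj, max_names=3):
    """Return the names of the caller's local variables bound to obj."""
    caller = inspect.currentframe().f_back
    try:
        # Match by identity: we want names bound to this very object
        names = sorted(name for name, value in caller.f_locals.items()
                       if value is obj)
    finally:
        # Drop the frame reference to avoid keeping the frame alive
        del caller
    return ", ".join(names[:max_names])


def demo():
    population = [1, 2, 3]
    pop_alias = population
    return guess_title(population)


def demo_expression():
    # The argument is not stored in any variable, so nothing matches
    return guess_title([1, 2, 3])
```

An object passed as an unnamed expression yields an empty title, which is the kind of case where a fallback title is needed.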
Input/Output
added ‘dialect’ argument to to_csv. For example, dialect='classic' does not include the last (horizontal) axis name.
fixed loading .csv files without \ (i.e. ‘classic’ .csv files), though one needs to specify nb_index in that case if ndim > 2
strip spaces around axes names so that you can use “axis0<space>\<space>axis1” instead of “axis0\axis1” in .csv files
fixed 1d arrays I/O
more precise parsing of input headers: 1 and 0 come out as int, not bool
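The header handling above can be sketched in a few lines, assuming the backslash-separated axes-name cell (as in the `a\b` header shown in array displays in this documentation). Both helper names are hypothetical and this is not the library's parser:

```python
def parse_axes_names(cell):
    """Split a header cell such as 'axis0 \\ axis1' into stripped axis names."""
    return [name.strip() for name in cell.split("\\")]


def parse_label(text):
    """Convert a header label to int when possible, else keep the string."""
    try:
        return int(text)
    except ValueError:
        return text


names_spaced = parse_axes_names("age  \\  sex")
names_compact = parse_axes_names("age\\sex")
# '0' and '1' become integers, never booleans
labels = [parse_label(t) for t in ["0", "1", "F"]]
```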
Misc
nicer error message when using invalid axes names
changed LArray .df property to a to_frame() method so that we can pass options to it
Version 0.7
Released on 2016-01-26.
Viewer
implemented view() on Session objects
added axes length in window title and add axes info even if title is provided manually (concatenate both)
ndecimals are recomputed when toggling the scientific checkbox
allow viewing (some) non-ndarray stuff (e.g. python lists)
refactored viewer code so that the filter drop downs can be reused too
Known regression: the viewer is slow on large arrays (this will be fixed in a later release, obviously)
Session
implemented local_arrays() to return all LArray in locals() as a Session
implemented Session.__getitem__(int_position)
implemented Session(filename) to directly load all arrays from a file. Equivalent to:
>>> s = Session()
>>> s.load(filename)
implemented Session.__eq__, so that you can compare two sessions and see if all arrays are equal. Suppose you want to refactor your code and make sure you get the same results.
>>> # put results in a Session
>>> res = Session({'array1': array1, 'array2': array2})
>>> # before refactoring
>>> res.dump('results.h5')
>>> # after refactoring
>>> assert Session('results.h5') == res
you can load all sheets/arrays of a file (if you do not specify which ones you want, it takes all)
loading several sheets from an excel file is now MUCH faster because the same file is kept open (apparently xlrd parses the whole file each time we open it).
you can specify a subset of arrays to dump
implemented rudimentary session I/O for .csv files, usage is a bit different from .h5 & excel files
>>> # need to specify format manually
>>> s.dump('directory_name', fmt='csv')
>>> # need to specify format manually
>>> s = Session()
>>> s.load('directory_name', fmt='csv')
pass *args and **kwargs to lower level functions in Session.load
fail when trying to read a nonexistent H5 file through Session, instead of creating it
Other new features
added start argument in ndrange to specify starting value
implemented Axis._rename. Not sure it’s a good idea though…
implemented identity function which takes an Axis and returns an LArray with the axis labels as values
implemented size property on AxisCollection
allow a single int in AxisCollection.without
Fixes
fixed broadcast_with when other_axes contains 0-len axes
fixed a[bool_array] = value when the first axis of a is not in bool_array
fixed view() on arrays with unnamed axes
fixed view() on arrays of Python objects
various other small bugs fixed
Version 0.6.1
Released on 2016-01-13.
New features
added dtype argument to all array creation functions to override default data type
aggregates can take an explicit “axis” keyword argument which can be used to target an axis by index
>>> arr.sum(axis=0)
implemented LGroup.__getitem__ & LGroup.__iter__, so that for list-based groups (ie not slices) you can write:
>>> for v in my_group:
...     # some code
or
>>> my_group[0]
Miscellaneous improvements
renamed LabelGroup to LGroup and PositionalKey to PGroup. We might want to rename the latter to IGroup (to be consistent with axis.i[…]).
slightly better support for axes without name
better docstrings for a few functions
misc cleanup
Fixes
fixed XXX_like(a) functions to use the same dtype as a instead of always float
fixed to_XXX with 1d arrays (e.g. to_clipboard())
fixed all() and any() toplevel functions without argument
fixed LArray without axes in some cases
fixed array creation functions with only shapes on python2
Version 0.6
Released on 2016-01-12.
New features
a[bool_array_key] broadcasts missing/differently ordered dimensions and returns an LArray with combined axes
a[bool_array_key] = value broadcasts missing/differently ordered dimensions on both key and value
implemented argmin, argmax, argsort, posargmin, posargmax, posargsort.
They do indirect operations along an axis. E.g. argmin gives the label of the minimum value, argsort gives the labels which would sort the array along that dimension. posargXXX gives the positions/indexes instead of the labels.
implemented Axis.__iter__ so that one can write:
>>> for label in an_array.axes.an_axis:
...     <some code>
instead of
>>> for label in an_array.axes.an_axis.labels:
...     <some code>
implemented the .info property on AxisCollection
implemented all/any top level functions, so that you can use them in with_total.
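The difference between the label-based argXXX functions and the position-based posargXXX functions can be sketched on a single axis with plain lists. The function names below mirror the ones in this change log but are standalone illustrations that take the labels explicitly:

```python
labels = ["a0", "a1", "a2"]
values = [5, 1, 3]


def posargmin(values):
    """Position of the minimum value."""
    return min(range(len(values)), key=values.__getitem__)


def argmin(labels, values):
    """Label of the minimum value."""
    return labels[posargmin(values)]


def posargsort(values):
    """Positions which would sort the values."""
    return sorted(range(len(values)), key=values.__getitem__)


def argsort(labels, values):
    """Labels which would sort the values."""
    return [labels[i] for i in posargsort(values)]
```

For these inputs the minimum sits at position 1, whose label is 'a1'; the label-based variants simply map the positions back through the axis labels.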
Miscellaneous improvements
renamed ValueGroup to LabelGroup. We might want to rename it to LGroup to be consistent with LArray?
allow a single int as argument to LArray creation functions (ndrange et al.)
e.g. ndrange(10) is now allowed instead of ndrange([10])
use display_name in .info (ie add * next to wildcard axes in .info).
allow specifying a custom window title in view()
viewer displays booleans as True/False instead of 1/0
slightly better support for axes with no name (None). There is still a long way to go for full support though.
improved a few docstrings
nicer errors when tests results are different from expected
removed debug prints from viewer
misc cleanups
Fixes
fixed view() on all-negative arrays
fixed view() on string arrays
Version 0.5
Released on 2015-12-15.
New features
experimental support for indexing an LArray by another (integer) LArray
>>> array[other_array]
experimental support for LArray.drop_labels and the concept of wildcard axes
added LArray.display_name and AxisCollection.display_names which add ‘*’ next to wildcard axes
implemented where(cond, array1, array2)
implemented LArray.__iter__ so that this works:
>>> for value in array:
...     <some code>
implemented keepaxes=label or keepaxes=True for aggregate functions on full axes
>>> array.sum(x.age, keepaxes='total')
AxisCollection.replace can replace several axes in one call
implemented .expand(out=) to expand into an existing array
Miscellaneous improvements
removed Axis.sorted()
removed LArray.axes_names & axes_labels. One should use .axes.names & .axes.labels instead.
raise an error when trying to convert an array with more than one value to a Boolean. For example, this will fail:
>>> arr = ndrange([sex])
>>> if arr:
...     <some code>
convert value to self.dtype in append/prepend
faster .extend, .append, .prepend and .expand
some code cleanup, better tests, …
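The refusal to convert a multi-valued array to a Boolean can be sketched with Python's `__bool__` hook. The class `Values` below is a hypothetical stand-in for an array, and the error message is invented for the example:

```python
class Values:
    """Tiny container refusing ambiguous truth-value conversions."""

    def __init__(self, data):
        self.data = list(data)

    def __bool__(self):
        if len(self.data) > 1:
            # Neither "any" nor "all" is an obviously right answer here
            raise ValueError("truth value of a container with more than "
                             "one value is ambiguous")
        return bool(self.data[0]) if self.data else False


single_true = bool(Values([2]))
single_false = bool(Values([0]))
```

Writing `if Values([1, 2]):` raises ValueError instead of silently picking an interpretation.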
Fixes
fixed .extend when other has longer axes than self
Version 0.4
Released on 2015-12-09.
New features
implemented LArray.expand to add dimensions
implemented prepend
implemented sort_axis
allow creating 0d (scalar) LArrays
Miscellaneous improvements
made extend expand its arguments
made .append expand its value before appending
changed read_* to not sort data by default
more minor stuff :)
Fixes
fixed loading 1d arrays
Version 0.3
Released on 2015-11-26.
New features
implemented LArray.with_total(): appends axes or group aggregates to the array.
Without argument, it adds totals on all axes. It has optional keyword only arguments:
label: specify the label (“total” by default)
op: specify the aggregate function (sum by default, all other aggregates should work too)
With multiple arguments, it adds totals sequentially. There are some tricky cases. For example when, for the same axis, you add group aggregates and axis aggregates:
>>> # works but "wrong" for x.geo (double what is expected because the total also
>>> # includes fla, wal & bru)
>>> la.with_total(x.sex, (fla, wal, bru), x.geo, x.lipro)
>>> # correct total but the order is not very nice
>>> la.with_total(x.sex, x.geo, (fla, wal, bru), x.lipro)
>>> # the correct way to do it, but it is probably not entirely obvious
>>> la.with_total(x.sex, (fla, wal, bru, x.geo.all()), x.lipro)
>>> # we probably want to display a warning (or even an error?) in that case.
>>> # If the user really wants that behavior, they can split the operation:
>>> # .with_total((fla, wal, bru)).with_total(x.geo)
implemented group aggregates without using keyword arguments. As a consequence of this, one can no longer use axis numbers in aggregates. E.g. a.sum(0) does not sum on the first axis anymore (but you can do a.sum(a.axes[0]) if needed)
implemented LArray.percent: equivalent to ratio * 100
implemented Session.filter -> returns a new Session with only objects matching the filter
implemented Session.dump -> dumps all LArray in the Session to a file
implemented Session.load -> load several LArrays from a file to a Session
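The double-counting pitfall mentioned for with_total with several arguments can be reproduced on a one-dimensional example with plain dicts. The region codes, values and helper functions below are made up for the illustration and do not use the library:

```python
values = {"A11": 1, "A21": 2, "A25": 3}
groups = {"fla": ["A11"], "bru": ["A21"], "wal": ["A25"]}


def add_group_totals(data, groups):
    """Append one aggregate per group, computed from its member labels."""
    out = dict(data)
    for name, members in groups.items():
        out[name] = sum(data[m] for m in members)
    return out


def add_axis_total(data, label="total"):
    """Append the sum of everything currently on the axis."""
    out = dict(data)
    out[label] = sum(data.values())
    return out


# Groups first, then axis total: the total also sums the group aggregates
doubled = add_axis_total(add_group_totals(values, groups))
# Axis total first, then groups: correct total, but it sits before the groups
correct = add_group_totals(add_axis_total(values), groups)
```

In `doubled` the total is twice the real sum because every value is counted once directly and once through its group, while `correct` has the right total but places it in the middle of the axis.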
Version 0.2.6
Released on 2015-11-24.
Fixes
fixed LArray.cumsum and cumprod.
fixed all doctests just enough so that they run.
Version 0.2.5
Released on 2015-10-29.
Miscellaneous improvements
many methods got (improved) docstrings (Thanks to Johan).
Fixes
fixed mixing keys without axis (e.g. arr[10:15]) with key with axes (e.g. arr[x.age[10:15]]).
Version 0.2.4
Released on 2015-10-27.
New features
includes an experimental (slightly inefficient) version of guess axis, so that one can write:
>>> arr[10:20]
instead of
>>> arr[age[10:20]]
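One simple way to guess the axis for a key, sketched here under the assumption that the axis is identified by checking which one contains all requested labels, is a linear scan over the axes. `guess_axis` and the sample axes are hypothetical, and the library's actual strategy may differ:

```python
def guess_axis(axes, key_labels):
    """Return the name of the single axis containing every label of the key."""
    matches = [name for name, labels in axes.items()
               if all(label in labels for label in key_labels)]
    if not matches:
        raise ValueError(f"{key_labels!r} is not valid for any axis")
    if len(matches) > 1:
        raise ValueError(f"{key_labels!r} is ambiguous: valid for {matches}")
    return matches[0]


axes = {"age": list(range(100)), "sex": ["M", "F"]}
age_guess = guess_axis(axes, [10, 15, 19])
sex_guess = guess_axis(axes, ["F"])
```

Scanning every axis for every key is what makes such an approach slightly inefficient, and labels present on several axes have to be rejected as ambiguous.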
Version 0.2.3
Released on 2015-10-19.
New features
positional slicing via “x.” syntax (x.axis.i[:5])
Fixes
view(array) is usable when doing from larray import *
fixed a nasty bug for doing “group” aggregates when there is only one dimension
Version 0.2.2
Released on 2015-10-15.
New features
implemented AxisCollection.replace(old_axis, new_axis)
implemented positional indexing
Miscellaneous improvements
more powerful AxisCollection.pop: added support for .pop(name) or .pop(Axis object)
LArray.set_labels returns a new LArray by default. Use inplace=True to get the previous behavior
include ndrange and __version__ in __all__
Fixes
fixed shift with n <= 0
Version 0.2.1
Released on 2015-10-14.
New features
implemented LArray.shift(axis, n=1)
Miscellaneous improvements
change set_labels API (axis, new_labels)
transform Axis.labels into a property so that _mapping is kept in sync
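The idea of turning labels into a property so that a derived lookup stays in sync can be sketched as follows. `SketchAxis` is a hypothetical class written for this illustration, with the `_mapping` attribute assumed to be a label-to-position dict:

```python
class SketchAxis:
    """Axis-like object whose label lookup is rebuilt on every assignment."""

    def __init__(self, name, labels):
        self.name = name
        self.labels = labels  # goes through the property setter below

    @property
    def labels(self):
        return self._labels

    @labels.setter
    def labels(self, labels):
        self._labels = list(labels)
        # Rebuild the label -> position lookup whenever labels change
        self._mapping = {label: i for i, label in enumerate(self._labels)}

    def index(self, label):
        return self._mapping[label]


axis = SketchAxis("sex", ["M", "F"])
before = axis.index("F")
axis.labels = ["F", "M"]
after = axis.index("F")
```

With a plain attribute, the second assignment would leave the lookup stale; the setter guarantees that `index` always reflects the current labels.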
Fixes
hopefully fix build
Version 0.2
Released on 2015-10-13.
New features
added to_clipboard.
added embryonic documentation.
added sort_columns and na arguments to read_hdf.
added sort_rows, sort_columns and na arguments to read_excel.
added setup.py to install the module.
Miscellaneous improvements
IO functions (to_*/read_*) now support unnamed axes. The set of supported operations is very limited with such arrays though.
to_excel sheet_name defaults to “Sheet1” like in Pandas.
reorganised files.
somewhat automated releases (added a rudimentary release script).
Fixes
column titles are no longer converted to lowercase.
Version 0.1
Released on 2014-10-22.