Change log

Version 0.32.2

Released on 2020-04-03.

CORE

Fixes

  • fixed using Pandas >= 1.0 (closes issue 845).
  • fixed the missing space between parameter names and types in the API documentation (closes issue 849).
  • fixed a few issues for Python 2.7 and/or Linux.

EDITOR

Fixes

  • fixed spurious warning in the console when an expression results in an empty sequence (array, list, tuple).
  • fixed displaying arrays entirely filled with NaN.

Version 0.32.1

Released on 2019-12-19.

CORE

Miscellaneous improvements

  • improved the tutorial and some examples to make them more intuitive (closes issue 829).

Fixes

  • fixed loading arrays with more than 2 dimensions but no axes names (even when specifying nb_axes explicitly). This case mostly occurs when trying to load a specific range of an Excel file (closes issue 830 and issue 831).

EDITOR

Fixes

  • fixed the “Cancel” button of the confirmation dialog when trying to quit the editor with unsaved modifications. It behaved like “Discard”, potentially leading to data loss.
  • fixed (harmless) error messages appearing when trying to display any variable via the console when using matplotlib 3.1+.

Version 0.32

Released on 2019-11-17.

CORE

Syntax changes

Backward incompatible changes

  • Because it was broken, the ability to dump and load Axis and Group objects contained in a session has been removed for the CSV and Excel formats. Fixing it would have taken too much time considering how rarely it is used (no one complained that it was broken), so it was removed instead. However, this is still possible using the HDF format. Closes issue 815.

Miscellaneous improvements

  • conda channel to install or update the larray, larray-editor, larray-eurostat and larrayenv packages switched from gdementen to larray-project (closes issue 560).

Fixes

  • fixed binary operations between a session and an array object (closes issue 807).
  • fixed Array.reindex() printing a spurious warning message when the axes_to_reindex argument was the name of the axis to reindex (closes issue 812).
  • fixed zip_array_values() and zip_array_items() functions not available when importing the entire larray library as from larray import * (closes issue 816).
  • fixed wrong axes and groups names when loading a session from an HDF file (closes issue 803).

EDITOR

New features

  • added debug() function which opens an editor window with an extra widget to navigate back in the call stack (the chain of functions called to reach the current line of code).

Miscellaneous improvements

  • Sizes of the main window and the resizable components are saved when closing the viewer and restored when it is reopened (closes issue 165).
  • added keyword arguments rtol, atol and nans_equal to the compare() function (closes issue 172).
  • run_editor_on_exception() now uses debug() so that one can inspect what the state was in all functions traversed to reach the code which triggered the exception.
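
The idea behind debug() and run_editor_on_exception(), namely making the state of every function in the call chain inspectable, can be illustrated with the standard library alone. The sketch below is not how the editor is implemented; frames_from_exception, inner and outer are hypothetical names used only for this illustration.

```python
import traceback


def frames_from_exception(exc):
    """Return (function name, local variables) for each frame of the traceback,
    ordered from the outermost call to the one which raised."""
    return [(frame.f_code.co_name, dict(frame.f_locals))
            for frame, _lineno in traceback.walk_tb(exc.__traceback__)]


def inner(x):
    y = x * 2
    return y / 0          # raises ZeroDivisionError


def outer():
    value = 21
    return inner(value)


try:
    outer()
except ZeroDivisionError as exc:
    call_chain = frames_from_exception(exc)

# the last two entries are the frames of outer() and inner()
for name, local_vars in call_chain[-2:]:
    print(name, local_vars)
```

Each entry pairs a function name with the variables that were local to it when the exception was raised, which is the kind of information a call stack widget lets one browse.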

Version 0.31

Released on 2019-08-09.

  • added the ExcelReport class, which allows generating multiple graphs in an Excel file at once (closes issue 676).
  • fixed binary operations (+, -, *, etc.) between an LArray and a (scalar) Group which silently gave a wrong result (closes issue 797).
  • fixed taking a subset of an array with boolean labels for an axis if the user explicitly specifies the axis (closes issue 735). When the user does not specify the axis, it currently fails but it is unclear what to do in that case (see issue 794).
  • fixed a regression in 0.30: X.axis_name[groups] failed when groups were originally defined on axes with the same name (i.e. when the operation was not actually needed). Closes issue 787.

Version 0.30

Released on 2019-06-27.

  • stack() axis argument was renamed to axes to reflect the fact that the function can now stack along multiple axes at once (see below).
  • to accommodate the “simpler pattern language” now supported for those functions, using a regular expression in Axis.matching() or Group.matching() now requires passing the pattern as an explicit regex keyword argument instead of just the first argument of those methods. For example my_axis.matching('test.*') becomes my_axis.matching(regex='test.*').
  • LArray.as_table() is deprecated because it duplicated functionality found in LArray.dump(). Please only use LArray.dump() from now on.
  • renamed a_min and a_max arguments of LArray.clip() to minval and maxval respectively and made them optional (closes issue 747).
  • modified the behavior of the pattern argument of Session.filter() to actually support patterns instead of only checking if the object names start with the pattern. Special characters include ? for matching any single character and * for matching any number of characters. Closes issue 703.

    Warning

    If you were using Session.filter, you must add a * to your pattern to keep your code working. For example, my_session.filter('test') must be changed to my_session.filter('test*').
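
The difference between the old prefix check and the new pattern matching can be sketched with the standard fnmatch module. This is only an illustration of the semantics described above, not larray's implementation: filter_by_prefix and filter_by_pattern are hypothetical helpers, and fnmatch also understands character sets such as [abc], which this change log does not mention.

```python
from fnmatch import fnmatchcase

names = ['test', 'test_2019', 'my_test', 'tesla']


def filter_by_prefix(names, prefix):
    # old behavior: keep the names which start with the given string
    return [name for name in names if name.startswith(prefix)]


def filter_by_pattern(names, pattern):
    # new behavior (sketch): the whole name must match the pattern, where
    # ? matches any single character and * matches any number of characters
    return [name for name in names if fnmatchcase(name, pattern)]


print(filter_by_prefix(names, 'test'))      # ['test', 'test_2019']
print(filter_by_pattern(names, 'test'))     # ['test']
print(filter_by_pattern(names, 'test*'))    # ['test', 'test_2019']
print(filter_by_pattern(names, 'tes?a'))    # ['tesla']
```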

  • LArray.equals() now returns True for arrays even when axes are in a different order or some axes are missing on either side (but the data is constant over that axis on the other side). Closes issue 237.

    Warning

    If you were using LArray.equals() and want to keep the old, stricter, behavior, you must add check_axes=True.

  • added set_options() and get_options() functions to respectively set and get options for larray. Available options currently include display_precision for controlling the number of decimal digits used when showing floating point numbers, display_maxlines to control the maximum number of lines to use when displaying an array, etc. set_options() can be used either like a normal function to set the options globally or within a with block to set them only temporarily. Closes issue 274.
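
One common way to get an object that works both as a plain call and in a with block is to apply the change in the constructor and undo it when the block exits. The sketch below shows that pattern only; it is not larray's code, set_options_sketch and _OPTIONS are hypothetical, and the default values are made up for the example.

```python
# hypothetical option store: the option names come from the entry above,
# the default values are invented for this example
_OPTIONS = {'display_precision': None, 'display_maxlines': 200}


class set_options_sketch:
    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(_OPTIONS)
        if unknown:
            raise ValueError("unknown option(s): {}".format(', '.join(sorted(unknown))))
        # remember the previous values, then apply the new ones immediately
        self.previous = {key: _OPTIONS[key] for key in kwargs}
        _OPTIONS.update(kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # leaving the with block restores the previous values
        _OPTIONS.update(self.previous)
        return False


set_options_sketch(display_maxlines=50)          # global change
with set_options_sketch(display_precision=2):    # temporary change
    inside = dict(_OPTIONS)
after = dict(_OPTIONS)
print(inside)
print(after)
```
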
  • implemented read_stata() and LArray.to_stata() to read arrays from and write arrays to Stata .dta files.
  • implemented LArray.isin() method to check whether each value of an array is contained in a list (or array) of values.
  • implemented LArray.unique() method to compute unique values (or sub-arrays) for an array, optionally along axes.
  • implemented LArray.apply() method to apply a python function to all values of an array or to all sub-arrays along some axes of an array and return the result. This is an extremely versatile method as it can be used both with aggregating functions or element-wise functions.
  • implemented LArray.apply_map() method to apply a transformation mapping to array elements. For example, this can be used to transform some numeric codes to labels.
  • implemented LArray.reverse() method to reverse one or several axes of an array (closes issue 631).
  • implemented LArray.roll() method to roll the cells of an array n-times to the right along an axis. This is similar to LArray.shift(), except that cells which are pushed “outside of the axis” are reintroduced on the opposite side of the axis instead of being dropped.
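
The difference between rolling and shifting can be shown on a plain Python list. This is a sketch of the behavior described above for a single row of cells; roll and shift_dropping are hypothetical helpers, and how larray handles the axis labels is not shown.

```python
def roll(cells, n=1):
    """Move every cell n positions to the right; cells pushed past the end
    reappear at the start (a negative n rolls to the left)."""
    if not cells:
        return []
    n %= len(cells)                      # rolling by len(cells) changes nothing
    return cells[-n:] + cells[:-n] if n else cells[:]


def shift_dropping(cells, n=1):
    """Move every cell n positions to the right and drop the cells pushed
    past the end (assumes n >= 0)."""
    return cells[:max(len(cells) - n, 0)]


cells = [0, 1, 2, 3]
print(roll(cells))             # [3, 0, 1, 2]
print(roll(cells, 2))          # [2, 3, 0, 1]
print(shift_dropping(cells))   # [0, 1, 2]
```
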
  • implemented Axis.apply() method to transform the labels of an axis using a function and return a new Axis.
  • added Session.update() method to add and modify items from an existing session by passing either another session or a dict-like object or an iterable object with (key, value) pairs (closes issue 754).
  • implemented AxisCollection.rename() to rename axes of an AxisCollection, independently of any array.
  • implemented AxisCollection.set_labels() (closes issue 782).
  • implemented wrap_elementwise_array_func() function to make a function defined in another library work with LArray arguments instead of with numpy arrays.
  • implemented LArray.keys(), LArray.values() and LArray.items() methods to respectively loop over the labels, values or (key, value) pairs of an array.
  • implemented zip_array_values() and zip_array_items() to respectively loop over the values or (key, value) pairs of several arrays.
  • implemented AxisCollection.iter_labels() to iterate over all (possible combinations of) labels of the axes of the collection.
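
What "all possible combinations of labels" means can be sketched with itertools.product on plain lists of labels. The axes below are hypothetical, and the order in which larray yields the combinations is an assumption here (last axis varying fastest).

```python
from itertools import product

# hypothetical axes, written as plain lists of labels
axes = {'nat': ['BE', 'FO'], 'sex': ['M', 'F']}

# every combination of one label per axis, the last axis varying fastest
combinations = list(product(*axes.values()))
print(combinations)
# [('BE', 'M'), ('BE', 'F'), ('FO', 'M'), ('FO', 'F')]
```
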
  • improved speed of read_hdf() function when reading a stored LArray object dumped with the current and future versions of larray. To benefit from the speedup when reading arrays dumped with older versions of larray, please read and re-dump them. Closes issue 563.

  • allowed omitting the axes in LArray.set_labels() (closes issue 634):

    >>> a = ndtest('nat=BE,FO;sex=M,F')
    >>> a
    nat\sex  M  F
         BE  0  1
         FO  2  3
    >>> a.set_labels({'M': 'Men', 'BE': 'Belgian'})
    nat\sex  Men  F
    Belgian    0  1
         FO    2  3
    
  • LArray.set_labels() can now take functions to transform axes labels (closes issue 536).

    >>> arr = ndtest((2, 2))
    >>> arr
    a\b  b0  b1
     a0   0   1
     a1   2   3
    >>> arr.set_labels('a', str.upper)
    a\b  b0  b1
     A0   0   1
     A1   2   3
    
  • implemented the same “simpler pattern language” in Axis.matching() and Group.matching() as in Session.filter(), in addition to regular expressions (which now require using the regex argument).

  • stack() can now stack along several axes at once (closes issue 56).

    >>> country = Axis('country=BE,FR,DE')
    >>> gender = Axis('gender=M,F')
    >>> stack({('BE', 'M'): 0,
    ...        ('BE', 'F'): 1,
    ...        ('FR', 'M'): 2,
    ...        ('FR', 'F'): 3,
    ...        ('DE', 'M'): 4,
    ...        ('DE', 'F'): 5},
    ...       (country, gender))
    country\gender  M  F
                BE  0  1
                FR  2  3
                DE  4  5
    
  • stack() using a dictionary as elements can now use a simple axis name instead of requiring a full axis object. This will print a warning on Python < 3.7 though because the ordering of labels is not guaranteed in that case. Closes issue 755 and issue 581.

  • stack() using keyword arguments can now use a simple axis name instead of requiring a full axis object, even on Python < 3.6. This will print a warning though because the ordering of labels is not guaranteed in that case.

  • added password argument to Workbook.save() to allow protecting Excel files with a password.

  • added option exact to join argument of Axis.align() and LArray.align() methods. Instead of aligning, passing join='exact' to the align method will raise an error when axes are not equal. Closes issue 338.

  • made Axis.by() and Group.by() return a list of named groups instead of anonymous groups. By default, group names are defined as <start>:<end>. This can be changed via the new template argument:

    >>> age = Axis('age=0..6')
    >>> age
    Axis([0, 1, 2, 3, 4, 5, 6], 'age')
    >>> age.by(3)
    (age.i[0:3] >> '0:2', age.i[3:6] >> '3:5', age.i[6:7] >> '6')
    >>> age.by(3, step=2)
    (age.i[0:3] >> '0:2', age.i[2:5] >> '2:4', age.i[4:7] >> '4:6', age.i[6:7] >> '6')
    >>> age.by(3, template='{start}-{end}')
    (age.i[0:3] >> '0-2', age.i[3:6] >> '3-5', age.i[6:7] >> '6')
    

    Closes issue 669.
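
The naming rule visible in the outputs above can be reproduced on a plain list of labels. This is a sketch inferred from those outputs, not larray's implementation; by_sketch is a hypothetical helper which returns (name, labels) pairs instead of group objects.

```python
def by_sketch(labels, length, step=None, template='{start}:{end}'):
    """Split labels into groups of `length` labels, starting a new group
    every `step` labels, and name each group using the template."""
    labels = list(labels)
    step = length if step is None else step
    groups = []
    for first in range(0, len(labels), step):
        chunk = labels[first:first + length]
        if len(chunk) > 1:
            name = template.format(start=chunk[0], end=chunk[-1])
        else:
            # a group with a single label is simply named after that label
            name = str(chunk[0])
        groups.append((name, chunk))
    return groups


age = range(7)
print([name for name, _ in by_sketch(age, 3)])
print([name for name, _ in by_sketch(age, 3, step=2)])
print([name for name, _ in by_sketch(age, 3, template='{start}-{end}')])
```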

  • allowed to specify an axis by its position when selecting a subset of an array using the string notation:

    >>> pop_mouv = ndtest('geo_from=BE,FR,UK;geo_to=BE,FR,UK')
    >>> pop_mouv
    geo_from\geo_to  BE  FR  UK
                 BE   0   1   2
                 FR   3   4   5
                 UK   6   7   8
    >>> pop_mouv['0[BE, UK]']   # equivalent to pop_mouv[pop_mouv.geo_from['BE,UK']]
    geo_from\geo_to  BE  FR  UK
                 BE   0   1   2
                 UK   6   7   8
    >>> pop_mouv['1.i[0, 2]']   # equivalent to pop_mouv[pop_mouv.geo_to.i[0, 2]]
    geo_from\geo_to  BE  UK
                 BE   0   2
                 FR   3   5
                 UK   6   8
    

    Closes issue 671.

  • added documentation and examples for where(), maximum() and minimum() functions (closes issue 700)

  • updated the Working With Sessions section of the tutorial (closes issue 568).

  • added dtype argument to LArray to set the type of the array explicitly instead of relying on auto-detection.

  • added dtype argument to stack to set the type of the resulting array explicitly instead of relying on auto-detection.

  • allowed to pass a single axis or group as axes_to_reindex argument of the LArray.reindex() method (closes issue 712).

  • LArray.dump() gained a few extra arguments to further customize the output:

    • axes_names: to specify whether or not the output should contain the axes names (and which).
    • maxlines and edgeitems: to dump only the start and end of large arrays.
    • light: to output axes labels only when they change instead of repeating them on each line.
    • na_repr: to specify how to represent N/A (NaN) values.

  • substantially improved performance of creating, iterating, and doing a few other operations over larray objects. This solves a few pathological cases of slow operations, especially those involving many small-ish arrays but sadly the overall performance improvement is negligible over most of the real-world models using larray that we tested these changes on.

  • fixed dumping to Excel arrays of “object” dtype containing NaN values using numpy float types (fixes the infamous 65535 bug).

  • fixed LArray.divnot0() being slow when the divisor has many axes and many zeros (closes issue 705).

  • fixed maximum length of sheet names (31 characters instead of 30 characters) when adding a new sheet to an Excel Workbook (closes issue 713).

  • fixed missing documentation of many functions in Utility Functions section of the API Reference (closes issue 698).

  • fixed arithmetic operations between two sessions returning a nan value for each axis and group (closes issue 725).

  • fixed dumping sessions with metadata in HDF format (closes issue 702).

  • fixed minimum version of pandas to install. The minimum version is now 0.20.0.

  • fixed from_frame for dataframes with non string index names.

  • fixed creating an LSet from an IGroup with a (single) scalar key:

    >>> a = Axis('a=a0,a1,a2')
    >>> a.i[1].set()
    a['a1'].set()
    

Version 0.29

Released on 2018-09-07.

Syntax changes

  • deprecated title attribute of LArray objects and title argument of array creation functions. A title is now considered metadata and must be added as follows:

    >>> # add title at array creation
    >>> arr = ndtest((3, 3), meta=[('title', 'array for testing')])
    
    >>> # or after array creation
    >>> arr = ndtest((3, 3))
    >>> arr.meta.title = 'array for testing'
    

    See below for more information about metadata handling.

  • renamed LArray.drop_labels() to LArray.ignore_labels() to avoid confusion with the new LArray.drop() method (closes issue 672).

  • renamed Session.array_equals() to Session.element_equals() because this method now also compares axes and groups in addition to arrays.

  • renamed Sheet.load() and Range.load() nb_index argument to nb_axes to be consistent with all other input functions (read_*). Sheet and Range are the objects one gets when taking subsets of the excel Workbook objects obtained via open_excel() (closes issue 648).

  • deprecated the element_equal() function in favor of the LArray.eq() method (closes issue 630) to be consistent with other future methods for operations between two arrays.

  • renamed nan_equals argument of LArray.equals() and LArray.eq() methods to nans_equal because it is grammatically more correct and is explained more naturally as “whether two nans should be considered equal”.

  • LArray.insert() pos and axis arguments are deprecated because those were only useful for very specific cases and those can easily be rewritten by using an indices group (axis.i[pos]) for the before argument instead (closes issue 652).

New features

  • allowed arrays to have metadata (e.g. title, description, authors, …).

    Metadata can be added when creating arrays:

    >>> # for Python <= 3.5
    >>> arr = ndtest((3, 3), meta=[('title', 'array for testing'), ('author', 'John Smith')])
    
    >>> # for Python >= 3.6
    >>> arr = ndtest((3, 3), meta=Metadata(title='array for testing', author='John Smith'))
    

    To access all existing metadata, use array.meta, for example:

    >>> arr.meta
    title: array for testing
    author: John Smith
    

    To access some specific existing metadata, use array.meta.<name>, for example:

    >>> arr.meta.author
    'John Smith'
    

    Updating some existing metadata, or creating new metadata (the metadata is added if there was no metadata using that name) should be done using array.meta.<name> = <value>. For example:

    >>> arr.meta.city = 'London'
    

    To remove some metadata, use del array.meta.<name>, for example:

    >>> del arr.meta.city
    

    Note

    • Currently, only the HDF (.h5) file format supports saving and loading array metadata.
    • Metadata is not kept when actions or methods are applied on an array except for operations modifying the object in-place, such as pop[age < 10] = 0, and when the method copy() is called. Do not add metadata to an array if you know you will apply actions or methods on it before dumping it.

    Closes issue 78 and issue 79.

  • allowed sessions to have metadata. Session metadata is created and accessed using the same syntax as for arrays (session.meta.<name>), for example to add metadata to a session at creation:

    >>> # Python <= 3.5
    >>> s = Session([('arr1', ndtest(2)), ('arr2', ndtest(3))], meta=[('title', 'my title'), ('author', 'John Smith')])
    
    >>> # Python 3.6+
    >>> s = Session(arr1=ndtest(2), arr2=ndtest(3), meta=Metadata(title='my title', author='John Smith'))
    

    Note

    • Contrary to array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5)
    • Metadata is not kept when actions or methods are applied on a session except for operations modifying a specific array, such as: s['arr1'] = 0. Do not add metadata to a session if you know you will apply actions or methods on it before dumping it.

    Closes issue 640.

  • implemented LArray.drop() to return an array without some labels or indices along an axis (closes issue 506).

    >>> arr1 = ndtest((2, 4))
    >>> arr1
    a\b  b0  b1  b2  b3
     a0   0   1   2   3
     a1   4   5   6   7
    >>> a, b = arr1.axes
    

    Dropping a single label

    >>> arr1.drop('b1')
    a\b  b0  b2  b3
     a0   0   2   3
     a1   4   6   7
    

    Dropping multiple labels

    >>> # arr1.drop('b1,b3')
    >>> arr1.drop(['b1', 'b3'])
    a\b  b0  b2
     a0   0   2
     a1   4   6
    

    Dropping a slice

    >>> # arr1.drop('b1:b3')
    >>> arr1.drop(b['b1':'b3'])
    a\b  b0
     a0   0
     a1   4
    

    Dropping labels by position requires specifying the axis

    >>> # arr1.drop('b.i[1]')
    >>> arr1.drop(b.i[1])
    a\b  b0  b2  b3
     a0   0   2   3
     a1   4   6   7
    
  • added new module to create arrays with values generated randomly following a few different distributions, or shuffle an existing array along an axis:

    >>> from larray.random import *
    

    Generate integers between two bounds (0 and 10 in this example)

    >>> randint(0, 10, axes='a=a0..a2')
    a  a0  a1  a2
        3   6   2
    

    Generate values following a uniform distribution

    >>> uniform(axes='a=a0..a2')
    a                   a0                  a1                  a2
       0.33293756929238394  0.5331412592583252  0.6748786766763107
    

    Generate values following a normal distribution (μ = 1 and σ = 2 in this example)

    >>> normal(1, scale=2, axes='a=a0..a2')
    a                   a0                 a1                  a2
       -0.9216651561025018  5.119734598931103  4.4467876992838935
    

    Randomly shuffle an existing array along one axis

    >>> arr = ndtest((3, 3))
    >>> arr
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
     a2   6   7   8
    >>> permutation(arr, axis='b')
    a\b  b1  b2  b0
     a0   1   2   0
     a1   4   5   3
     a2   7   8   6
    

    Generate values by randomly choosing between specified values (5, 10 and 15 in this example), potentially with a specified probability for each value (respectively a 30%, 50%, 20% probability of occurring in this example).

    >>> choice([5, 10, 15], p=[0.3, 0.5, 0.2], axes='a=a0,a1;b=b0..b2')
    a\b  b0  b1  b2
     a0  15  10  10
     a1  10   5  10
    

    Same as above with labels and probabilities given as a one dimensional LArray

    >>> proba = LArray([0.3, 0.5, 0.2], Axis([5, 10, 15], 'outcome'))
    >>> proba
    outcome    5   10   15
             0.3  0.5  0.2
    >>> choice(p=proba, axes='a=a0,a1;b=b0..b2')
    a\b  b0  b1  b2
     a0  10  15   5
     a1  10   5  10
    
  • made a few useful constants accessible directly from the larray module: nan, inf, pi, e and euler_gamma. As with any Python functionality, you can choose how to import and use them. For example, for pi:

    >>> from larray import *
    >>> pi
    3.141592653589793
    >>> # or
    >>> from larray import pi
    >>> pi
    3.141592653589793
    >>> # or
    >>> import larray as la
    >>> la.pi
    3.141592653589793
    
  • added Group.equals() method which compares group names, associated axis names and labels between two groups:

    >>> a = Axis('a=a0..a3')
    >>> a02 = a['a0:a2'] >> 'group_a'
    >>> # different group name
    >>> a02.equals(a['a0:a2'])
    False
    >>> # different axis name
    >>> other_axis = a.rename('other_name')
    >>> a02.equals(other_axis['a0:a2'] >> 'group_a')
    False
    >>> # different labels
    >>> a02.equals(a['a1:a3'] >> 'group_a')
    False
    

Miscellaneous improvements

  • completely rewrote the ‘Load And Dump Arrays, Sessions, Axes And Groups’ section of the tutorial (closes issue 645).

  • saving or loading a session from a file now includes Axis and Group objects in addition to arrays (closes issue 578).

    Create a session containing axes, groups and arrays

    >>> a, b = Axis("a=a0..a2"), Axis("b=b0..b2")
    >>> a01 = a['a0,a1'] >> 'a01'
    >>> arr1, arr2 = ndtest((a, b)), ndtest(a)
    >>> s = Session([('a', a), ('b', b), ('a01', a01), ('arr1', arr1), ('arr2', arr2)])
    

    Saving a session will save axes, groups and arrays

    >>> s.save('session.h5')
    

    Loading a session will load axes, groups and arrays

    >>> s2 = Session('session.h5')
    >>> s2
    Session(arr1, arr2, a, b, a01)
    

    Note

    All axes and groups of a session are stored in the same CSV file/Excel sheet/HDF group named respectively __axes__ and __groups__.

  • vastly improved indexing using arrays (of labels, indices or booleans). Many advanced cases did not work, including when combining several indexing arrays, or when (one of) the indexing array(s) had an axis present in the array.

    First let’s create some test axes

    >>> a, b, c = ndtest((2, 3, 2)).axes
    

    Then create a test array.

    >>> arr = ndtest((a, b))
    >>> arr
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
    

    If the key array has an axis not already present in arr (e.g. c), the target axis (a) is replaced by the extra axis (c). This already worked previously.

    >>> key = LArray(['a1', 'a0'], c)
    >>> key
    c  c0  c1
       a1  a0
    >>> arr[key]
    c\b  b0  b1  b2
     c0   3   4   5
     c1   0   1   2
    

    If the key array has the target axis, the axis stays the same, but the data is reordered (this also worked previously):

    >>> key = LArray(['b1', 'b0', 'b2'], b)
    >>> key
    b  b0  b1  b2
       b1  b0  b2
    >>> arr[key]
    a\b  b0  b1  b2
     a0   1   0   2
     a1   4   3   5
    

    From here on, the examples shown did not work previously…

    Now, if the key contains another axis present in the array (b) which is not the target axis (a), the target axis completely disappears (both axes are replaced by the key axis):

    >>> key = LArray(['a0', 'a1', 'a0'], b)
    >>> key
    b  b0  b1  b2
       a0  a1  a0
    >>> arr[key]
    b  b0  b1  b2
        0   4   2
    

    If the key has both the target axis (a) and another existing axis (b):

    >>> key = LArray([['a0', 'a1', 'a0'], ['a1', 'a0', 'a1']], [a, b])
    >>> key
    a\b  b0  b1  b2
     a0  a0  a1  a0
     a1  a1  a0  a1
    >>> arr[key]
    a\b  b0  b1  b2
     a0   0   4   2
     a1   3   1   5
    

    If the key has both another existing axis (a) and an extra axis (c):

    >>> key = LArray([['b0', 'b1'], ['b2', 'b0']], [a, c])
    >>> key
    a\c  c0  c1
     a0  b0  b1
     a1  b2  b0
    >>> arr[key]
    a\c  c0  c1
     a0   0   1
     a1   5   3
    

    It also works if the key has the target axis (a), another existing axis (b) and an extra axis (c), but this is not shown for brevity.

  • updated Session.summary() so as to display all kinds of objects and allowed to pass a function returning a string representation of an object instead of passing a pre-defined string template (closes issue 608):

    >>> axis1 = Axis("a=a0..a2")
    >>> group1 = axis1['a0,a1'] >> 'a01'
    >>> arr1 = ndtest((2, 2), title='array 1', dtype=np.int64)
    >>> arr2 = ndtest(4, title='array 2', dtype=np.int64)
    >>> arr3 = ndtest((3, 2), title='array 3', dtype=np.int64)
    >>> s = Session([('axis1', axis1), ('group1', group1), ('arr1', arr1), ('arr2', arr2), ('arr3', arr3)])
    

    Using the default template

    >>> print(s.summary())
    axis1: a ['a0' 'a1' 'a2'] (3)
    group1: a['a0', 'a1'] >> a01 (2)
    arr1: a, b (2 x 2) [int64]
        array 1
    arr2: a (4) [int64]
        array 2
    arr3: a, b (3 x 2) [int64]
        array 3
    

    Using a specific template

    >>> def print_array(key, array):
    ...     axes_names = ', '.join(array.axes.display_names)
    ...     shape = ' x '.join(str(i) for i in array.shape)
    ...     return "{} -> {} ({})\n  title = {}\n  dtype = {}".format(key, axes_names, shape,
    ...                                                                 array.title, array.dtype)
    >>> template = {Axis:  "{key} -> {name} [{labels}] ({length})",
    ...             Group: "{key} -> {name}: {axis_name} {labels} ({length})",
    ...             LArray: print_array}
    >>> print(s.summary(template))
    axis1 -> a ['a0' 'a1' 'a2'] (3)
    group1 -> a01: a ['a0', 'a1'] (2)
    arr1 -> a, b (2 x 2)
      title = array 1
      dtype = int64
    arr2 -> a (4)
      title = array 2
      dtype = int64
    arr3 -> a, b (3 x 2)
      title = array 3
      dtype = int64
    
  • methods Session.equals() and Session.element_equals() now also compare axes and groups in addition to arrays (closes issue 610):

    >>> a = Axis('a=a0..a2')
    >>> a01 = a['a0,a1'] >> 'a01'
    >>> s1 = Session([('a', a), ('a01', a01), ('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
    >>> s2 = Session([('a', a), ('a01', a01), ('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
    

    Identical sessions

    >>> s1.element_equals(s2)
    name     a   a01  arr1  arr2
          True  True  True  True
    

    Different value(s) between two arrays

    >>> s2.arr1['a1'] = 0
    >>> s1.element_equals(s2)
    name     a   a01   arr1  arr2
          True  True  False  True
    

    Different label(s)

    >>> s2.arr2 = ndtest("b=b0,b1; a=a0,a1")
    >>> s2.a = Axis('a=a0,a1')
    >>> s1.element_equals(s2)
    name      a   a01   arr1   arr2
          False  True  False  False
    

    Extra/missing objects

    >>> s2.arr3 = ndtest((3, 3))
    >>> del s2.a
    >>> s1.element_equals(s2)
    name      a   a01   arr1   arr2   arr3
          False  True  False  False  False
    
  • added arguments wide and value_name to methods LArray.as_table() and LArray.dump() like in LArray.to_excel() and LArray.to_csv() (closes issue 653).

  • the from_series() function supports Pandas series with a MultiIndex (closes issue 465)

  • the stack() function supports any array-like object instead of only LArray objects.

    >>> stack(a0=[1, 2, 3], a1=[4, 5, 6], axis='a')
    {0}*\a  a0  a1
         0   1   4
         1   2   5
         2   3   6
    
  • made some operations on Excel Workbooks a bit faster by telling Excel to avoid updating the screen when the Excel instance is not visible anyway. This affects all workbooks opened via open_excel() as well as read_excel() and LArray.to_excel() when using the default xlwings engine.

  • made the documentation link in Windows start menu version-specific (instead of always pointing to the latest release) so that users do not inadvertently use the latest release syntax when using an older version of larray (closes issue 142).

  • added menu bar with undo/redo when editing single arrays (as a byproduct of issue 133).

Fixes

  • fixed Copy(to Excel)/Paste/Plot in the editor not working for 1D and 2D arrays (closes issue 140).

  • fixed Excel add-ins not loaded when opening an Excel Workbook by calling the LArray.to_excel() method with no path or via “Copy to Excel (CTRL+E)” in the editor (closes issue 154).

  • made LArray support Pandas versions >= 0.21 (closes issue 569)

  • fixed current active Excel Workbook being closed when calling the LArray.to_excel() method on an array with -1 as filepath argument (closes issue 473).

  • fixed LArray.split_axes() when splitting a single axis and using the names argument (e.g. arr.split_axes('bd', names=('b', 'd'))).

  • fixed splitting an anonymous axis without specifying the names argument.

    >>> combined = ndtest('a0_b0,a0_b1,a0_b2,a1_b0,a1_b1,a1_b2')
    >>> combined
    {0}  a0_b0  a0_b1  a0_b2  a1_b0  a1_b1  a1_b2
             0      1      2      3      4      5
    >>> combined.split_axes(0)
    {0}\{1}  b0  b1  b2
         a0   0   1   2
         a1   3   4   5
    
  • fixed LArray.combine_axes() with wildcard=True.

  • fixed taking a subset of an array by giving an index along a specific axis using a string (strings like "axisname.i[pos]").

  • fixed the editor not working with Python 2 or recent Qt4 versions.

Version 0.28

Released on 2018-03-15.

Backward incompatible changes

  • changed behavior of operators session1 == session2 and session1 != session2: they now return a session of boolean arrays (closes issue 516):

    >>> s1 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
    >>> s2 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
    >>> (s1 == s2).arr1
    a    a0    a1
       True  True
    >>> s2.arr1['a1'] = 0
    >>> (s1 == s2).arr1
    a    a0     a1
       True  False
    >>> (s1 != s2).arr1
    a     a0    a1
       False  True
    

New features

  • made it possible to run the tutorial online (as a Jupyter notebook) by clicking on the launch|binder badge on top of the tutorial web page (closes issue 73)

  • added methods array_equals and equals to Session object to compare arrays from two sessions. The method array_equals returns a boolean value for each array while the method equals returns a single boolean value (True if all arrays of both sessions are equal, False otherwise):

    >>> s1 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
    >>> s2 = Session([('arr1', ndtest(2)), ('arr2', ndtest((2, 2)))])
    >>> s1.array_equals(s2)
    name  arr1  arr2
          True  True
    >>> s1.equals(s2)
    True
    

    Different value(s)

    >>> s2.arr1['a1'] = 0
    >>> s1.array_equals(s2)
    name   arr1  arr2
          False  True
    >>> s1.equals(s2)
    False
    

    Different label(s)

    >>> from larray import ndrange
    >>> s2.arr2 = ndrange("b=b0,b1; a=a0,a1")
    >>> s1.array_equals(s2)
    name   arr1   arr2
          False  False
    >>> s1.equals(s2)
    False
    

    Extra/missing array(s)

    >>> s2.arr3 = ndtest((3, 3))
    >>> s1.array_equals(s2)
    name   arr1   arr2   arr3
          False  False  False
    >>> s1.equals(s2)
    False
    

    Closes issue 517.

  • added method equals to LArray object to compare two arrays:

    >>> arr1 = ndtest((2, 3))
    >>> arr1
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
    >>> arr2 = arr1.copy()
    >>> arr1.equals(arr2)
    True
    >>> arr2['b1'] += 1
    >>> arr1.equals(arr2)
    False
    >>> arr3 = arr1.set_labels('a', ['x0', 'x1'])
    >>> arr1.equals(arr3)
    False
    

    Arrays with nan values

    >>> arr1 = ndtest((2, 3), dtype=float)
    >>> arr1['a1', 'b1'] = nan
    >>> arr1
    a\b   b0   b1   b2
     a0  0.0  1.0  2.0
     a1  3.0  nan  5.0
    >>> arr2 = arr1.copy()
    >>> # By default, an array containing nan values is never equal to another array,
    >>> # even if that other array also contains nan values at the same positions.
    >>> # The reason is that a nan value is different from *anything*, including itself.
    >>> arr1.equals(arr2)
    False
    >>> # set flag nan_equal to True to override this behavior
    >>> arr1.equals(arr2, nan_equal=True)
    True
    

    This method also includes the arguments rtol (relative tolerance) and atol (absolute tolerance) allowing to test the equality between two arrays within a given relative or absolute tolerance:

    >>> arr1 = LArray([6., 8.], "a=a0,a1")
    >>> arr1
    a   a0   a1
       6.0  8.0
    >>> arr2 = LArray([5.999, 8.001], "a=a0,a1")
    >>> arr2
    a     a0     a1
       5.999  8.001
    >>> arr1.equals(arr2)
    False
    >>> # equals returns True if abs(array1 - array2) <= (atol + rtol * abs(array2))
    >>> arr1.equals(arr2, atol=0.01)
    True
    >>> arr1.equals(arr2, rtol=0.01)
    True

    Closes issue 488 and issue 518.
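
The tolerance rule quoted in the comment above can be sketched in plain Python. This is only an illustration of the formula on lists of floats, not larray's implementation; the helper names within_tolerance and sequences_equal are hypothetical.

```python
import math


def within_tolerance(x, y, rtol=0.0, atol=0.0, nan_equal=False):
    """Return True if x equals y under abs(x - y) <= atol + rtol * abs(y)."""
    if math.isnan(x) or math.isnan(y):
        # nan differs from everything, itself included, unless nan_equal is set
        return nan_equal and math.isnan(x) and math.isnan(y)
    return abs(x - y) <= atol + rtol * abs(y)


def sequences_equal(xs, ys, **kwargs):
    """Return a single boolean: True only if every pair of values matches."""
    return len(xs) == len(ys) and all(
        within_tolerance(x, y, **kwargs) for x, y in zip(xs, ys)
    )


arr1 = [6.0, 8.0]
arr2 = [5.999, 8.001]
print(sequences_equal(arr1, arr2))             # strict comparison
print(sequences_equal(arr1, arr2, atol=0.01))  # absolute tolerance
print(sequences_equal(arr1, arr2, rtol=0.01))  # relative tolerance
```

Note that the rule is asymmetric: the relative tolerance is scaled by the second operand only.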

  • added Load from Script in the File menu of the editor allowing to load commands from an existing Python file (closes issue 96).

  • added Edit menu allowing to undo and redo changes made to array values by editing cells, and removed the Apply and Discard buttons. Changes are now kept when switching from one array to another instead of being lost as before (closes issue 32).

  • allowed to provide an absolute or relative tolerance value when comparing arrays through the compare function (closes issue 131).

  • made the editor able to detect and display plot objects stored in tuples, lists or arrays. For example, arrays of plot objects are returned when using the subplots=True option in calls of the plot method:

    >>> a = ndtest('sex=M,F; nat=BE,FO; year=2000..2017')
    >>> # display 4 plots vertically placed (one plot for each pair (sex, nationality))
    >>> a.plot(subplots=True)
    >>> # display 4 plots ordered in a 2 x 2 grid
    >>> a.plot(subplots=True, layout=(2, 2))
    

    Closes issue 135.

Miscellaneous improvements

  • functions local_arrays, global_arrays and arrays return a session excluding arrays starting with an underscore by default. To include them, set the flag include_private to True (closes issue 513):

    >>> global_arr1 = ndtest((2, 2))
    >>> _global_arr2 = ndtest((3, 3))
    >>> def foo():
    ...     local_arr1 = ndtest(2)
    ...     _local_arr2 = ndtest(3)
    ...
    ...     # exclude arrays starting with '_' by default
    ...     s = arrays()
    ...     print(s.names)
    ...
    ...     # use flag 'include_private' to include arrays starting with '_'
    ...     s = arrays(include_private=True)
    ...     print(s.names)
    >>> foo()
    ['global_arr1', 'local_arr1']
    ['_global_arr2', '_local_arr2', 'global_arr1', 'local_arr1']
    
  • implemented binary operations between sessions and non-session objects (closes issue 514 and issue 515):

    >>> s = Session(arr1=ndtest((2, 2)), arr2=ndtest((3, 3)))
    >>> s.arr1
    a\b  b0  b1
     a0   0   1
     a1   2   3
    >>> s.arr2
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
     a2   6   7   8
    

    Add a scalar to all arrays

    >>> # equivalent to s2 = 3 + s
    >>> s2 = s + 3
    >>> s2.arr1
    a\b  b0  b1
     a0   3   4
     a1   5   6
    >>> s2.arr2
    a\b  b0  b1  b2
     a0   3   4   5
     a1   6   7   8
     a2   9  10  11
    

    Apply binary operations between two sessions

    >>> sdiff = (s2 - s) / s
    >>> sdiff.arr1
    a\b   b0   b1
     a0  inf  3.0
     a1  1.5  1.0
    >>> sdiff.arr2
    a\b   b0    b1     b2
     a0  inf   3.0    1.5
     a1  1.0  0.75    0.6
     a2  0.5  0.43  0.375
    
  • added possibility to call the method reindex with a group (closes issue 531):

    >>> arr = ndtest((2, 2))
    >>> arr
    a\b  b0  b1
     a0   0   1
     a1   2   3
    >>> b = Axis("b=b2..b0")
    >>> arr.reindex('b', b['b1':])
    a\b  b1  b0
     a0   1   0
     a1   3   2
    
  • added possibility to call the methods diff and growth_rate with a group (closes issue 532):

    >>> data = [[2, 4, 5, 4, 6], [4, 6, 3, 6, 9]]
    >>> a = LArray(data, "sex=M,F; year=2016..2020")
    >>> a
    sex\year  2016  2017  2018  2019  2020
           M     2     4     5     4     6
           F     4     6     3     6     9
    >>> a.diff(a.year[2017:])
    sex\year  2018  2019  2020
           M     1    -1     2
           F    -3     3     3
    >>> a.growth_rate(a.year[2017:])
    sex\year  2018  2019  2020
           M  0.25  -0.2   0.5
           F  -0.5   1.0   0.5
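
The numbers above can be reproduced with a small plain-Python sketch, assuming diff is the difference with the previous period and growth_rate is that difference divided by the previous value. The functions below are hypothetical stand-ins operating on lists, not larray's methods.

```python
years = [2016, 2017, 2018, 2019, 2020]
data = {"M": [2, 4, 5, 4, 6], "F": [4, 6, 3, 6, 9]}


def diff(values):
    """Difference between each value and the previous one."""
    return [cur - prev for prev, cur in zip(values, values[1:])]


def growth_rate(values):
    """Difference with the previous value, relative to the previous value."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


# restrict the computation to the years from 2017 onwards
start = years.index(2017)
result_years = years[start + 1:]
for sex, values in data.items():
    subset = values[start:]
    print(sex, result_years, diff(subset), growth_rate(subset))
```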
    
  • function ndrange has been deprecated in favor of sequence or ndtest. Also, an Axis or a list/tuple/collection of axes can be passed to the ndtest function (closes issue 534):

    >>> ndtest("nat=BE,FO;sex=M,F")
    nat\sex  M  F
         BE  0  1
         FO  2  3
    
  • allowed to pass a group for argument axis of stack function (closes issue 535):

    >>> b = Axis('b=b0..b2')
    >>> stack(b0=ndtest(2), b1=ndtest(2), axis=b[:'b1'])
    a\b  b0  b1
     a0   0   0
     a1   1   1
    
  • renamed argument nb_index of read_csv, read_excel, read_sas, from_lists and from_string functions as nb_axes. The relation between nb_index and nb_axes is given by nb_axes = nb_index + 1:

    For a given file ‘arr.csv’ with content

    a,b\c,c0,c1
    a0,b0,0,1
    a0,b1,2,3
    a1,b0,4,5
    a1,b1,6,7
    

    previous code to read this array such as:

    >>> # deprecated
    >>> arr = read_csv('arr.csv', nb_index=2)
    

    must be updated as follows:

    >>> arr = read_csv('arr.csv', nb_axes=3)
    

    Closes issue 548.
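
When many calls need updating, the relation nb_axes = nb_index + 1 can be applied mechanically. The helper below is a hypothetical sketch (it is not provided by larray) that rewrites a dictionary of keyword arguments.

```python
def migrate_nb_index(kwargs):
    """Return a copy of kwargs with the deprecated nb_index replaced by nb_axes."""
    updated = dict(kwargs)
    if "nb_index" in updated:
        # relation stated in the change log: nb_axes = nb_index + 1
        updated["nb_axes"] = updated.pop("nb_index") + 1
    return updated


print(migrate_nb_index({"nb_index": 2}))
```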

  • deprecated nan_equal function in favor of element_equal function. The element_equal function has the same optional arguments as the LArray.equals method but compares two arrays element-wise and returns an array of booleans:

    >>> arr1 = LArray([6., np.nan, 8.], "a=a0..a2")
    >>> arr1
    a   a0   a1   a2
       6.0  nan  8.0
    >>> arr2 = LArray([5.999, np.nan, 8.001], "a=a0..a2")
    >>> arr2
    a     a0   a1     a2
       5.999  nan  8.001
    >>> element_equal(arr1, arr2)
    a     a0     a1     a2
       False  False  False
    >>> element_equal(arr1, arr2, nan_equals=True)
    a     a0    a1     a2
       False  True  False
    >>> element_equal(arr1, arr2, atol=0.01, nan_equals=True)
    a    a0    a1    a2
       True  True  True
    >>> element_equal(arr1, arr2, rtol=0.01, nan_equals=True)
    a    a0    a1    a2
       True  True  True
    

    Closes issue 593.

  • renamed argument transpose to wide in to_csv method.

  • added argument wide in to_excel method. When argument wide is set to False, the array is exported in “narrow” format, i.e. one column per axis plus one value column:

    >>> arr = ndtest((2, 3))
    >>> arr
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
    

    Default behavior (wide=True):

    >>> arr.to_excel('my_file.xlsx')
    a\b  b0  b1  b2
    a0    0   1   2
    a1    3   4   5
    

    With wide=False:

    >>> arr.to_excel('my_file.xlsx', wide=False)
     a   b  value
    a0  b0      0
    a0  b1      1
    a0  b2      2
    a1  b0      3
    a1  b1      4
    a1  b2      5
    

    Argument transpose has a different purpose than wide and is mainly useful to allow multiple axes as header when exporting arrays with more than 2 dimensions. Closes issue 575 and issue 371.
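
The relation between the two layouts can be sketched without larray. The function to_narrow below is a hypothetical helper that turns the wide table shown above into narrow rows, one row per (a, b) pair.

```python
from itertools import product

a_labels = ["a0", "a1"]
b_labels = ["b0", "b1", "b2"]
wide = [[0, 1, 2],
        [3, 4, 5]]


def to_narrow(rows, row_labels, col_labels):
    """Return one (row label, column label, value) tuple per cell."""
    return [
        (r, c, rows[i][j])
        for (i, r), (j, c) in product(enumerate(row_labels), enumerate(col_labels))
    ]


narrow = to_narrow(wide, a_labels, b_labels)
for row in narrow:
    print(*row)
```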

  • added argument wide to read_csv and read_excel functions. If False, the array to be loaded is assumed to be stored in “narrow” format:

    >>> # assuming the array was saved using command: arr.to_excel('my_file.xlsx', wide=False)
    >>> read_excel('my_file.xlsx', wide=False)
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
    

    Closes issue 574.

  • added argument name to to_series method allowing to set a name to the Pandas Series returned by the method.

  • added argument value_name to to_csv and to_excel allowing to change the default name (‘value’) of the column containing the values when the argument wide is set to False:

    >>> arr.to_csv('my_file.csv', wide=False, value_name='data')
    a,b,data
    a0,b0,0
    a0,b1,1
    a0,b2,2
    a1,b0,3
    a1,b1,4
    a1,b2,5
    

    Closes issue 549.

  • renamed argument sheetname of read_excel function as sheet (closes issue 587).

  • renamed argument sheet_name of LArray.to_excel to sheet since it can also be an index (closes issue 580).

  • allowed to create axes with zero padded string labels (closes issue 533):

    >>> Axis('zero_padding=01,02,03,10,11,12')
    Axis(['01', '02', '03', '10', '11', '12'], 'zero_padding')
    
  • added a dropdown menu containing recently used files in dialog boxes of Save Command History To Script and Load from Script from File menu.

Fixes

  • fixed passing a scalar group from an external axis to get a subset of an array (closes issue 178):

    >>> arr = ndtest((3, 2))
    >>> arr['a1']
    b  b0  b1
        2   3
    >>> alt_a = Axis("alt_a=a1..a2")
    >>> arr[alt_a['a1']]
    b  b0  b1
        2   3
    >>> arr[alt_a.i[0]]
    b  b0  b1
        2   3
    
  • fixed subscripting a string LGroup key (closes issue 437):

    >>> axis = Axis("a=a0,a1")
    >>> axis['a0'][0]
    'a'
    
  • fixed Axis.union, Axis.intersection and Axis.difference when passed value is a single string (closes issue 489):

    >>> a = Axis('a=a0..a2')
    >>> a.union('a1')
    Axis(['a0', 'a1', 'a2'], 'a')
    >>> a.union('a3')
    Axis(['a0', 'a1', 'a2', 'a3'], 'a')
    >>> a.union('a1..a3')
    Axis(['a0', 'a1', 'a2', 'a3'], 'a')
    >>> a.intersection('a1..a3')
    Axis(['a1', 'a2'], 'a')
    >>> a.difference('a1..a3')
    Axis(['a0'], 'a')
    
  • fixed to_excel applied on >= 2D arrays using transpose=True (closes issue 579):

    >>> arr = ndtest((2, 3))
    >>> arr.to_excel('my_file.xlsx', transpose=True)
    b\a  a0  a1
    b0    0   3
    b1    1   4
    b2    2   5
    
  • fixed aggregation on arrays containing zero padded string labels (closes issue 522):

    >>> arr = ndtest('zero_padding=01,02,03,10,11,12')
    >>> arr
    zero_padding  01  02  03  10  11  12
                   0   1   2   3   4   5
    >>> arr.sum('01,02,03 >> 01_03; 10')
    zero_padding  01_03  10
                      3   3
    

Version 0.27

Released on 2017-11-30.

Syntax changes

  • renamed Axis.translate to Axis.index (closes issue 479).
  • deprecated reverse argument of sort_values and sort_axes methods in favor of ascending argument (defaults to True). Closes issue 540.

Backward incompatible changes

  • labels are checked during array subset assignment (closes issue 269):

    >>> arr = ndtest(4)
    >>> arr
    a  a0  a1  a2  a3
        0   1   2   3
    >>> arr['a0,a1'] = arr['a2,a3']
    ValueError: incompatible axes:
    Axis(['a0', 'a1'], 'a')
    vs
    Axis(['a2', 'a3'], 'a')
    

    previous behavior can be recovered through drop_labels or by changing labels via set_labels or set_axes:

    >>> arr['a0,a1'] = arr['a2,a3'].drop_labels('a')
    >>> arr['a0,a1'] = arr['a2,a3'].set_labels('a', {'a2': 'a0', 'a3': 'a1'})
    
  • from_frame parse_header argument defaults to False instead of True.

New features

  • implemented Axis.insert and LArray.insert to add values at a given position of an axis (closes issue 54).

    >>> arr1 = ndtest((2, 3))
    >>> arr1
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
    >>> arr1.insert(42, before='b1', label='b0.5')
    a\b  b0  b0.5  b1  b2
     a0   0    42   1   2
     a1   3    42   4   5
    

    insert an array

    >>> arr2 = ndtest(2)
    >>> arr2
    a  a0  a1
        0   1
    >>> arr1.insert(arr2, after='b0', label='b0.5')
    a\b  b0  b0.5  b1  b2
     a0   0     0   1   2
     a1   3     1   4   5
    

    insert an array which already has the axis

    >>> arr3 = ndrange('a=a0,a1;b=b0.1,b0.2') + 42
    >>> arr3
    a\b  b0.1  b0.2
     a0    42    43
     a1    44    45
    >>> arr1.insert(arr3, before='b1')
    a\b  b0  b0.1  b0.2  b1  b2
     a0   0    42    43   1   2
     a1   3    44    45   4   5
    
  • added new items in the Help menu of the editor:

    • Report Issue…: to report an issue on the GitHub project website.
    • Users Discussion…: redirects to the LArray Users Google Group (you need to be registered to participate).
    • New Releases And Announces Mailing List…: redirects to the LArray Announce mailing list.
    • About: gives information about the editor and the versions of packages currently installed on your computer (closes issue 88).
  • added Save Command History To Script in the File menu of the editor allowing to save executed commands in a new or existing Python file.

  • added possibility to show only rows with differences when comparing arrays or sessions through the compare function in the editor (closes issue 102).

  • added ascending argument to methods indicesofsorted and labelsofsorted. Values are sorted in ascending order by default. Set to False to sort values in descending order:

    >>> arr = LArray([[1, 5], [3, 2], [0, 4]], "nat=BE,FR,IT; sex=M,F")
    >>> arr
    nat\sex  M  F
         BE  1  5
         FR  3  2
         IT  0  4
    >>> arr.indicesofsorted("nat", ascending=False)
    nat\sex  M  F
          0  1  0
          1  0  2
          2  2  1
    >>> arr.labelsofsorted("nat", ascending=False)
    nat\sex   M   F
          0  FR  BE
          1  BE  IT
          2  IT  FR
    

    Closes issue 490.
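
The descending results above can be checked column by column with plain Python. The two functions below are hypothetical list-based stand-ins for the methods, written only to illustrate what the ascending flag changes.

```python
nat = ["BE", "FR", "IT"]
columns = {"M": [1, 3, 0], "F": [5, 2, 4]}


def indices_of_sorted(values, ascending=True):
    """Positions of the values once sorted."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)


def labels_of_sorted(values, labels, ascending=True):
    """Labels of the values once sorted."""
    return [labels[i] for i in indices_of_sorted(values, ascending)]


for sex, values in columns.items():
    print(sex,
          indices_of_sorted(values, ascending=False),
          labels_of_sorted(values, nat, ascending=False))
```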

Miscellaneous improvements

  • allowed to sort values of an array along an axis (closes issue 225):

    >>> a = LArray([[10, 2, 4], [3, 7, 1]], "sex=M,F; nat=EU,FO,BE")
    >>> a
    sex\nat  EU  FO  BE
          M  10   2   4
          F   3   7   1
    >>> a.sort_values(axis='sex')
    sex*\nat  EU  FO  BE
           0   3   2   1
           1  10   7   4
    >>> a.sort_values(axis='nat')
    sex\nat*  0  1   2
           M  2  4  10
           F  1  3   7
    
  • method LArray.sort_values can be called without argument (closes issue 478):

    >>> arr = LArray([0, 1, 6, 3, -1], "a=a0..a4")
    >>> arr
    a  a0  a1  a2  a3  a4
        0   1   6   3  -1
    >>> arr.sort_values()
    a  a4  a0  a1  a3  a2
       -1   0   1   3   6
    

    If the array has more than one dimension, axes are combined together:

    >>> a = LArray([[10, 2, 4], [3, 7, 1]], "sex=M,F; nat=EU,FO,BE")
    >>> a
    sex\nat  EU  FO  BE
          M  10   2   4
          F   3   7   1
    >>> a.sort_values()
    sex_nat  F_BE  M_FO  F_EU  M_BE  F_FO  M_EU
                1     2     3     4     7    10
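
A rough plain-Python equivalent of the multi-dimensional case, assuming the combined labels are built by joining the labels of each axis with an underscore (as the output above suggests):

```python
sex = ["M", "F"]
nat = ["EU", "FO", "BE"]
data = [[10, 2, 4],
        [3, 7, 1]]

# flatten the 2D data into (combined label, value) pairs
flat = [(f"{s}_{n}", data[i][j])
        for i, s in enumerate(sex)
        for j, n in enumerate(nat)]

# sort the pairs by value, in ascending order
flat.sort(key=lambda pair: pair[1])
print(flat)
```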
    
  • when appending/prepending/extending an array, both the original array and the added values will be converted to a data type which can hold both without loss of information. It used to convert the added values to the type of the original array. For example, given an array of integers like:

    >>> arr = ndtest(3)
    >>> arr
    a  a0  a1  a2
        0   1   2
    

    Trying to add a floating point number to that array used to result in:

    >>> arr.append('a', 2.5, 'a3')
    a  a0  a1  a2  a3
        0   1   2   2
    

    Now it will result in:

    >>> arr.append('a', 2.5, 'a3')
    a   a0   a1   a2   a3
       0.0  1.0  2.0  2.5
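
The change can be illustrated with plain Python lists. Both functions are hypothetical and only mimic the two behaviours for integers and floats; the real type promotion rules are not described here.

```python
def append_old(values, new_value):
    """Former behaviour: cast the added value to the type of the existing values."""
    kind = type(values[0])
    return values + [kind(new_value)]


def append_new(values, new_value):
    """Current behaviour: promote everything to a type able to hold all values."""
    combined = values + [new_value]
    if any(isinstance(v, float) for v in combined):
        return [float(v) for v in combined]
    return combined


print(append_old([0, 1, 2], 2.5))  # the fractional part is lost
print(append_new([0, 1, 2], 2.5))  # all values become floats
```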
    
  • made the editor more responsive when switching to or changing the filter of large arrays (closes issue 93).

  • added support for coloring numeric values for object arrays (e.g. arrays containing both strings and numbers).

  • documentation links in the Help menu of the editor point to the version of the documentation corresponding to the installed version of larray (closes issue 105).

Fixes

  • fixed array values being editable in view() (instead of only in edit()).

Version 0.26.1

Released on 2017-10-25.

Miscellaneous improvements

  • Made handling Excel sheets with many blank columns/rows after the data much faster (but still slower than sheets without such blank cells).

Fixes

  • fixed reading from and writing to Excel sheets with 16384 columns or 1048576 rows (Excel’s maximum).
  • fixed LArray.split_axes using a custom separator and not using sort=True or when the split labels are ambiguous with labels from other axes (closes issue 485).
  • fixed reading 1D arrays with non-string labels (closes issue 495).
  • fixed read_csv(sort_columns=True) for 1D arrays (closes issue 497).

Version 0.26

Released on 2017-10-13.

Syntax changes

  • renamed special variable x to X to let users define an x variable in their code without breaking all subsequent code using that special variable (closes issue 167).

  • renamed Axis.startswith, endswith and matches to startingwith, endingwith and matching to avoid a possible confusion with str.startswith and endswith which return booleans (closes issue 432).

  • renamed na argument of read_csv, read_excel, read_hdf and read_sas functions to fill_value to avoid confusion as to what the argument does and to be consistent with reindex and align (closes issue 394).

  • renamed split_axis to split_axes to reflect the fact that it can now split several axes at once (see below).

  • renamed sort_axis to sort_axes to reflect the fact that it can sort multiple axes at once (and does so by default).

  • renamed several methods with more explicit names (closes issue 50):

    • argmax, argmin, argsort to labelofmax, labelofmin, labelsofsorted
    • posargmax, posargmin, posargsort to indexofmax, indexofmin, indicesofsorted
  • renamed PGroup to IGroup to be consistent with other methods, especially the .i methods on axes and arrays (I is for Index – P was for Position).

Backward incompatible changes

  • getting a subset using a boolean selection returns an array with labels combined with an underscore by default (for consistency with split_axes and combine_axes). Closes issue 376:

    >>> arr = ndtest((2, 2))
    >>> arr
    a\b  b0  b1
     a0   0   1
     a1   2   3
    >>> arr[arr < 3]
    a_b  a0_b0  a0_b1  a1_b0
             0      1      2
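
In plain Python terms, the selection keeps the cells satisfying the condition and labels each of them with the combination of its axis labels. The sketch below is illustrative only and uses a dictionary in place of an array.

```python
a = ["a0", "a1"]
b = ["b0", "b1"]
data = [[0, 1],
        [2, 3]]

# keep the cells lower than 3, keyed by their combined "a_b" label
selected = {
    f"{la}_{lb}": data[i][j]
    for i, la in enumerate(a)
    for j, lb in enumerate(b)
    if data[i][j] < 3
}
print(selected)
```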
    

New features

  • added global_arrays() and arrays() functions to complement the local_arrays() function. They return a Session containing respectively all arrays defined in global variables and all available arrays (whether they are defined in local or global variables).

    When used outside of a function, these three functions should have the same results, but inside a function local_arrays() will return only arrays local to the function, global_arrays() will return only arrays defined globally and arrays() will return arrays defined either locally or globally. Closes issue 416.

  • a * symbol is appended to the window title when unsaved changes are detected in the viewer (closes issue 21).

  • implemented Axis.containing to create a Group with all labels of an axis containing some substring (closes issue 402).

    >>> people = Axis(['Bruce Wayne', 'Bruce Willis', 'Arthur Dent'], 'people')
    >>> people.containing('Will')
    people['Bruce Willis']
    
  • implemented Group.containing, startingwith, endingwith and matching to create a group with all labels of a group matching some criterion (closes issue 108).

    >>> group = people.startingwith('Bru')
    >>> group
    people['Bruce Wayne', 'Bruce Willis']
    >>> group.containing('Will')
    people['Bruce Willis']
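
The difference with str.startswith, which returns a boolean, can be seen in a plain-Python sketch. The functions below are hypothetical list-based analogues that return the matching labels themselves.

```python
people = ["Bruce Wayne", "Bruce Willis", "Arthur Dent"]


def starting_with(labels, prefix):
    """Labels beginning with the given prefix."""
    return [label for label in labels if label.startswith(prefix)]


def ending_with(labels, suffix):
    """Labels ending with the given suffix."""
    return [label for label in labels if label.endswith(suffix)]


def containing(labels, substring):
    """Labels containing the given substring."""
    return [label for label in labels if substring in label]


group = starting_with(people, "Bru")
print(group)
print(containing(group, "Will"))
print("Bruce Wayne".startswith("Bru"))  # str method: a boolean, not labels
```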
    
  • implemented nan_equal() function to create an array of booleans telling whether each cell of the first array is equal to the corresponding cell in the other array, even in the presence of NaN.

    >>> arr1 = ndtest(3, dtype=float)
    >>> arr1['a1'] = nan
    >>> arr1
    a   a0   a1   a2
       0.0  nan  2.0
    >>> arr2 = arr1.copy()
    >>> arr1 == arr2
    a    a0     a1    a2
       True  False  True
    >>> nan_equal(arr1, arr2)
    a    a0    a1    a2
       True  True  True
    
  • implemented from_frame() to convert a Pandas DataFrame to an array:

    >>> df = ndtest((2, 2, 2)).to_frame()
    >>> df
    c      c0  c1
    a  b
    a0 b0   0   1
       b1   2   3
    a1 b0   4   5
       b1   6   7
    >>> from_frame(df)
     a  b\c  c0  c1
    a0   b0   0   1
    a0   b1   2   3
    a1   b0   4   5
    a1   b1   6   7
    
  • implemented Axis.split to split an axis into several.

    >>> a_b = Axis('a_b=a0_b0,a0_b1,a0_b2,a1_b0,a1_b1,a1_b2')
    >>> a_b.split()
    [Axis(['a0', 'a1'], 'a'), Axis(['b0', 'b1', 'b2'], 'b')]
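
A minimal sketch of the idea, assuming that both the axis name and the labels are split on the same separator and that labels keep their first-seen order. The function split_labels is hypothetical and returns (name, labels) pairs instead of Axis objects.

```python
def split_labels(name, labels, sep="_"):
    """Split a combined axis into one (name, labels) pair per component."""
    names = name.split(sep)
    parts = [label.split(sep) for label in labels]
    axes = []
    for pos, axis_name in enumerate(names):
        # dict.fromkeys removes duplicates while keeping first-seen order
        unique = list(dict.fromkeys(part[pos] for part in parts))
        axes.append((axis_name, unique))
    return axes


combined = ["a0_b0", "a0_b1", "a0_b2", "a1_b0", "a1_b1", "a1_b2"]
print(split_labels("a_b", combined))
```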
    
  • added the possibility to load the example dataset used in the tutorial via the menu File > Load Example in the viewer

Miscellaneous improvements

  • view() and edit() without argument now display global arrays in addition to local ones (closes issue 54).

  • using the mouse scrollwheel on filter combo boxes will switch to the previous/next label.

  • implemented a combobox to choose which color gradient to use and provide a few gradients.

  • inverted background colors in the viewer (red for low values and blue for high values). Closes issue 18.

  • allowed to pass an array of labels as new_axis argument to reindex method (closes issue 384):

    >>> arr = ndrange('a=v0..v1;b=v0..v2')
    >>> arr
    a\b  v0  v1  v2
     v0   0   1   2
     v1   3   4   5
    >>> arr.reindex('a', arr.b.labels)
    a\b   v0   v1   v2
     v0    0    1    2
     v1    3    4    5
     v2  nan  nan  nan
    
  • allowed to call the reindex method using a differently named axis for labels (closes issue 386):

    >>> arr = ndrange('a=v0..v1;b=v0..v2')
    >>> arr
    a\b  v0  v1  v2
     v0   0   1   2
     v1   3   4   5
    >>> arr.reindex('a', arr.b)
    a\b   v0   v1   v2
     v0    0    1    2
     v1    3    4    5
     v2  nan  nan  nan
    
  • arguments fill_value, sort_rows and sort_columns of read_excel function are also supported by the default xlwings engine (closes issue 393).

  • allowed to pass a label or group as sheet_name argument of the method to_excel or to a Workbook (open_excel). Same for key argument of the method to_hdf. Closes issue 328.

    >>> arr = ndtest((4, 4, 4))
    
    >>> # iterate over labels of a given axis
    >>> with open_excel('my_file.xlsx') as wb:
    ...     for label in arr.a:
    ...         wb[label] = arr[label].dump()
    ...     wb.save()
    >>> for label in arr.a:
    ...     arr[label].to_hdf('my_file.h5', label)
    
    >>> # create and use a group
    >>> even = arr.a['a0,a2'] >> 'even'
    >>> arr[even].to_excel('my_file.xlsx', even)
    >>> arr[even].to_hdf('my_file.h5', even)
    
    >>> # special characters : \ / ? * [ or ] in labels or groups are replaced by an _ when exporting to excel
    >>> # sheet names cannot exceed 31 characters
    >>> g = arr.a['a1,a3'] >> '?name:with*special\/[char]'
    >>> arr[g].to_excel('my_file.xlsx', g)
    >>> print(open_excel('my_file.xlsx').sheet_names())
    ['_name_with_special___char_']
    >>> # special characters \ or / in labels or groups are replaced by an _ when exporting to HDF file
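
The replacement of special characters can be sketched with a regular expression. The function below is hypothetical and not larray's code: the colon is included because the output above shows it being replaced, and truncating to 31 characters is an assumption made for this sketch.

```python
import re


def sanitize_sheet_name(name, max_length=31):
    """Replace characters that Excel rejects in sheet names by an underscore."""
    cleaned = re.sub(r"[\\/?*\[\]:]", "_", name)
    # assumption: names that are too long are simply truncated
    return cleaned[:max_length]


print(sanitize_sheet_name("?name:with*special\\/[char]"))
```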
    
  • allowed to pass a Group to read_excel/read_hdf as sheetname/key argument (closes issue 439).

    >>> a, b, c = arr.a, arr.b, arr.c
    
    >>> # For Excel
    >>> new_from_excel = zeros((a, b, c), dtype=int)
    >>> for label in a:
    ...     new_from_excel[label] = read_excel('my_file.xlsx', label)
    >>> # But, to avoid loading the file in Excel repeatedly (which is very inefficient),
    >>> # this particular example should rather be written like this:
    >>> new_from_excel = zeros((a, b, c), dtype=int)
    >>> with open_excel('my_file.xlsx') as wb:
    ...     for label in a:
    ...         new_from_excel[label] = wb[label].load()
    
    >>> # For HDF
    >>> new_from_hdf = zeros((a, b, c), dtype=int)
    >>> for label in a:
    ...     new_from_hdf[label] = read_hdf('my_file.h5', label)
    
  • allowed setting the name of a Group using another Group or Axis (closes issue 341):

    >>> arr = ndrange('axis=a,a0..a3,b,b0..b3,c,c0..c3')
    >>> arr
    axis  a  a0  a1  a2  a3  b  b0  b1  b2  b3   c  c0  c1  c2  c3
          0   1   2   3   4  5   6   7   8   9  10  11  12  13  14
    >>> # matching('^.$') will select labels with only one character: 'a', 'b' and 'c'
    >>> groups = tuple(arr.axis.startingwith(code) >> code for code in arr.axis.matching('^.$'))
    >>> groups
    (axis['a', 'a0', 'a1', 'a2', 'a3'] >> 'a',
     axis['b', 'b0', 'b1', 'b2', 'b3'] >> 'b',
     axis['c', 'c0', 'c1', 'c2', 'c3'] >> 'c')
    >>> arr.sum(groups)
    axis   a   b   c
          10  35  60
    
  • allowed to test if an array contains a label using the in operator (closes issue 343):

    >>> arr = ndrange('age=0..99;sex=M,F')
    >>> 'M' in arr
    True
    >>> 'Male' in arr
    False
    >>> # this can be useful for example in an 'if' statement
    >>> if 102 not in arr:
    ...     # with 'reindex', we extend 'age' axis to 102
    ...     arr = arr.reindex('age', Axis('age=0..102'), fill_value=0)
    >>> arr.info
    103 x 2
     age [103]: 0 1 2 ... 100 101 102
     sex [2]: 'M' 'F'
    
  • allowed to create a group on an axis using labels of another axis (closes issue 362):

    >>> year = Axis('year=2000..2017')
    >>> even_year = Axis(range(2000, 2017, 2), 'even_year')
    >>> group_even_year = year[even_year]
    >>> group_even_year
    year[2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016]
    
  • split_axes (formerly split_axis) now allows to split several axes at once (closes issue 366):

    >>> combined = ndrange('a_b = a0_b0..a1_b1; c_d = c0_d0..c1_d1')
    >>> combined
    a_b\c_d  c0_d0  c0_d1  c1_d0  c1_d1
      a0_b0      0      1      2      3
      a0_b1      4      5      6      7
      a1_b0      8      9     10     11
      a1_b1     12     13     14     15
    >>> combined.split_axes(['a_b', 'c_d'])
     a   b  c\d  d0  d1
    a0  b0   c0   0   1
    a0  b0   c1   2   3
    a0  b1   c0   4   5
    a0  b1   c1   6   7
    a1  b0   c0   8   9
    a1  b0   c1  10  11
    a1  b1   c0  12  13
    a1  b1   c1  14  15
    >>> combined.split_axes({'a_b': ('A', 'B'), 'c_d': ('C', 'D')})
     A   B  C\D  d0  d1
    a0  b0   c0   0   1
    a0  b0   c1   2   3
    a0  b1   c0   4   5
    a0  b1   c1   6   7
    a1  b0   c0   8   9
    a1  b0   c1  10  11
    a1  b1   c0  12  13
    a1  b1   c1  14  15
    
  • argument axes of split_axes has become optional: defaults to all axes whose name contains the specified delimiter (closes issue 365):

    >>> combined = ndrange('a_b = a0_b0..a1_b1; c_d = c0_d0..c1_d1')
    >>> combined
    a_b\c_d  c0_d0  c0_d1  c1_d0  c1_d1
      a0_b0      0      1      2      3
      a0_b1      4      5      6      7
      a1_b0      8      9     10     11
      a1_b1     12     13     14     15
    >>> combined.split_axes()
     a   b  c\d  d0  d1
    a0  b0   c0   0   1
    a0  b0   c1   2   3
    a0  b1   c0   4   5
    a0  b1   c1   6   7
    a1  b0   c0   8   9
    a1  b0   c1  10  11
    a1  b1   c0  12  13
    a1  b1   c1  14  15
    
  • allowed to perform several axes combinations at once with the combine_axes() method (closes issue 382):

    >>> arr = ndtest((2, 2, 2, 2))
    >>> arr
     a   b  c\d  d0  d1
    a0  b0   c0   0   1
    a0  b0   c1   2   3
    a0  b1   c0   4   5
    a0  b1   c1   6   7
    a1  b0   c0   8   9
    a1  b0   c1  10  11
    a1  b1   c0  12  13
    a1  b1   c1  14  15
    >>> arr.combine_axes([('a', 'c'), ('b', 'd')])
    a_c\b_d  b0_d0  b0_d1  b1_d0  b1_d1
      a0_c0      0      1      4      5
      a0_c1      2      3      6      7
      a1_c0      8      9     12     13
      a1_c1     10     11     14     15
    >>> # set output axes names by passing a dictionary
    >>> arr.combine_axes({('a', 'c'): 'ac', ('b', 'd'): 'bd'})
    ac\bd  b0_d0  b0_d1  b1_d0  b1_d1
    a0_c0      0      1      4      5
    a0_c1      2      3      6      7
    a1_c0      8      9     12     13
    a1_c1     10     11     14     15
    
  • allowed to use keyword arguments in set_labels (closes issue 383):

    >>> a = ndrange('nat=BE,FO;sex=M,F')
    >>> a
    nat\sex  M  F
         BE  0  1
         FO  2  3
    >>> a.set_labels(sex='Men,Women', nat='Belgian,Foreigner')
      nat\sex  Men  Women
      Belgian    0      1
    Foreigner    2      3
    
  • allowed passing an axis to set_labels as ‘labels’ argument (closes issue 408).

  • added data type (dtype) to array.info (closes issue 454):

    >>> arr = ndtest((2, 2), dtype=float)
    >>> arr
    a\b   b0   b1
     a0  0.0  1.0
     a1  2.0  3.0
    >>> arr.info
    2 x 2
     a [2]: 'a0' 'a1'
     b [2]: 'b0' 'b1'
    dtype: float64
    
  • To create a 1D array using from_string() and the default separator “ ” (a single space), a tabulation character \t (instead of - previously) must be added in front of the data line:

    >>> from_string('''sex  M  F
    ...                \t   0  1''')
    sex  M  F
         0  1
    
  • viewer window title also includes the dtype of the current displayed array (closes issue 85)

  • viewer window title uses only the file name instead of the entire file path as it made titles too long in some cases.

  • when editing .csv files, the viewer window title will be “directory\fname.csv - axes_info” instead of having the file name repeated as before (“dir\fname.csv - fname: axes_info”).

  • the viewer will not update digits/scientific notation nor colors when the filter changes, so that numbers are more easily comparable when quickly changing the filter, especially using the scrollwheel on filter boxes.

  • NaN values display as grey in the viewer so that they stand out more.

  • compare() will color values depending on relative difference instead of absolute difference as this is usually more useful.

  • compare(sessions) uses nan_equal to compare arrays so that identical arrays are not marked different when they contain NaN values.

  • changed compare() “stacked axis” names: arrays -> array and sessions -> session because that reads a bit more naturally.

Fixes

  • fixed array creation with axis(es) given as string containing only one label (axis name and label were inverted).

  • fixed reading an array from a CSV or Excel file when the columns axis is not explicitly named (via \). For example, let’s say we want to read a CSV file ‘pop.csv’ with the following content (indented for clarity):

    sex, 2015, 2016
      F,   11,   13
      M,   12,   10
    

    The result of function read_csv is:

    >>> pop = read_csv('pop.csv')
    >>> pop
    sex\{1}  2015  2016
          F    11    13
          M    12    10
    

    Closes issue 372.

  • fixed converting a 1xN Pandas DataFrame to an array using aslarray (closes issue 427):

    >>> df = pd.DataFrame([[1, 2, 3]], index=['a0'], columns=['b0', 'b1', 'b2'])
    >>> df
        b0  b1  b2
    a0   1   2   3
    >>> aslarray(df)
    {0}\{1}  b0  b1  b2
         a0   1   2   3
    
    >>> # setting name to index and columns
    >>> df.index.name = 'a'
    >>> df.columns.name = 'b'
    >>> df
    b   b0  b1  b2
    a
    a0   1   2   3
    >>> aslarray(df)
    a\b  b0  b1  b2
     a0   1   2   3
    
  • fixed original file being deleted when trying to overwrite a file via Session.save or open_excel failed (closes issue 441)

  • fixed loading arrays from Excel sheets containing blank cells below or right of the array to read (closes issue 443)

  • fixed unary and binary operations between sessions failing entirely when the operation failed/was invalid on any array. Now the result will be nan for that array but the operation will carry on for other arrays.

  • fixed stacking sessions failing entirely when the stacking failed on any array. Now the result will be nan for that array but the operation will carry on for other arrays.
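
The "carry on despite failures" behaviour described in the two fixes above can be sketched as follows. This is only an illustration with dictionaries of scalars standing in for sessions of arrays; session_binop is a hypothetical name.

```python
import math
import operator


def session_binop(left, right, op):
    """Apply op name by name; a failure on one name yields nan for that name only."""
    result = {}
    for name in sorted(set(left) | set(right)):
        try:
            result[name] = op(left[name], right[name])
        except Exception:
            # invalid operation or missing name: mark it and keep going
            result[name] = math.nan
    return result


s1 = {"arr1": 4, "arr2": "text"}
s2 = {"arr1": 2, "arr2": 3, "arr3": 1}
print(session_binop(s1, s2, operator.truediv))
```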

  • fixed stacking arrays with anonymous axes.

  • fixed applying split_axes on an array with labels of type ‘Object’ (could happen when an array is read from a file).

  • fixed background color in the viewer when using filters in the compare() dialog (closes issue 66)

  • fixed autoresize of columns by double clicking between column headers (closes issue 43)

  • fixed representing a 0D array (scalar) in the viewer (closes issue 71)

  • fixed viewer not displaying an error message when saving or loading a file failed (closes issue 75)

  • fixed array.split_axis when the combined axis does not contain all the combinations of labels resulting from the split (closes issue 369).

  • fixed array.split_axis when combined labels are not sorted by the first part then second part (closes issue 364).

  • fixed the names of variables created when opening .csv files in the editor: they are now named using only the filename without extension, instead of the full path of the file, which made them almost useless. Closes issue 90.

  • fixed deleting a variable (using the del key in the list) not marking the session/file as being modified.

  • fixed the link to the tutorial (Help->Online Tutorial) (closes issue 92).

  • fixed inplace modifications of arrays in the console (via array[xxx] = value) not updating the view (closes issue 94).

  • fixed background color in compare() being wrong after changing axes order by drag-and-dropping them (closes issue 89).

  • fixed the whole array/compare being the same color in the presence of -inf or +inf in the array.

Version 0.25.2

Released on 2017-09-06.

Miscellaneous improvements

  • Excel Workbooks opened with open_excel(visible=False) will use the global Excel instance by default and those using visible=True will use a new Excel instance by default (closes issue 405).

Fixes

  • fixed view() which did not show any array (closes issue 57).
  • fixed exceptions in the viewer crashing it when a Qt app was created (e.g. from a plot) before the viewer was started (closes issue 58).
  • fixed compare() arrays names not being determined correctly (closes issue 61).
  • fixed filters and title not being updated when displaying array created via the console (closes issue 55).
  • fixed array grid not being updated when selecting a variable when no variable was selected (closes issue 56).
  • fixed copying or plotting multiple rows in the editor when they were selected via drag and drop on headers (closes issue 59).
  • fixed digits not being automatically updated when changing filters.

Version 0.25.1

Released on 2017-09-04.

Miscellaneous improvements

  • Deprecated methods display a warning message when they are still used (replaced DeprecationWarning by FutureWarning). Closes issue 310.
  • updated documentation of method with_total (closes issue 89).
  • trying to set values of a subset by passing an array with incompatible axes displays a better error message (closes issue 268).

Fixes

  • fixed error raised in viewer when switching between arrays when a filter was set.

  • fixed displaying empty array when starting the viewer or a new session in it.

  • fixed Excel instance created via to_excel() and open_excel() without any filename being closed at the end of the Python program (closes issue 390).

  • fixed the view(), edit() and compare() functions not being available in the viewer console.

  • fixed row and column resizing by double clicking on the edge of a header cell.

  • fixed New and Open in the menu File of the viewer when IPython console is not available.

  • fixed getting a subset of an array by mixing boolean filters and other filters (closes issue 246):

    >>> arr = ndrange('a=a0..a2;b=0..3')
    >>> arr
    a\b  0  1   2   3
     a0  0  1   2   3
     a1  4  5   6   7
     a2  8  9  10  11
    >>> arr['a0,a2', x.b < 2]
    a\b  0  1
     a0  0  1
     a2  8  9
    

    Warning: when mixed with other filters, boolean filters are limited to one dimension.

  • fixed setting an array values using array.points[key] = value when value is an LArray (closes issue 368).

  • fixed using syntax ‘int..int’ in a selection (closes issue 350):

    >>> arr = ndrange('a=2017..2012')
    >>> arr
    a  2017  2016  2015  2014  2013  2012
          0     1     2     3     4     5
    >>> arr['2012..2015']
    a  2012  2013  2014  2015
          5     4     3     2
    
  • fixed mixing ‘..’ sequences and spaces in an indexing string (closes issue 389):

    >>> arr = ndtest(7)
    >>> arr
    a  a0  a1  a2  a3  a4  a5  a6
        0   1   2   3   4   5   6
    >>> arr['a0, a2, a4..a6']
    a  a0  a2  a4  a5  a6
        0   2   4   5   6
    
  • fixed indexing/aggregating using groups with renaming (using >>) when the axis has mixed type labels (object dtype).

Version 0.25

Released on 2017-08-22.

New features

  • viewer functions (view, edit and compare) have been moved to the separate larray-editor package, which needs to be installed separately, unless you are using larrayenv. Closes issue 332.

  • installing larray-editor (or larrayenv) from conda environment creates a new menu ‘LArray’ in the Windows start menu. It contains a link to open the documentation, a shortcut to launch the user interface in edition mode and a shortcut to update larrayenv. Closes issue 281.

  • added possibility to transpose an array in the viewer by dragging and dropping axes’ names in the filter bar.

  • implemented array.align(other_array) which makes two arrays compatible with each other (by making all common axes compatible). This is done by adding, removing or reordering labels for each common axis according to the join method used:

    • outer: will use a label if it is in the axis of either array (ordered like the first array). This is the default as it results in no information loss.
    • inner: will use a label if it is in the axes of both arrays (ordered like the first array)
    • left: will use the first array axis labels
    • right: will use the other array axis labels

    The fill value for missing labels defaults to nan.

    >>> arr1 = ndtest((2, 3))
    >>> arr1
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
    >>> arr2 = -ndtest((3, 2))
    >>> # reorder array to make the test more interesting
    >>> arr2 = arr2[['b1', 'b0']]
    >>> arr2
    a\b  b1  b0
     a0  -1   0
     a1  -3  -2
     a2  -5  -4
    

    Align arr1 and arr2

    >>> aligned1, aligned2 = arr1.align(arr2)
    >>> aligned1
    a\b   b0   b1   b2
     a0  0.0  1.0  2.0
     a1  3.0  4.0  5.0
     a2  nan  nan  nan
    >>> aligned2
    a\b    b0    b1   b2
     a0   0.0  -1.0  nan
     a1  -2.0  -3.0  nan
     a2  -4.0  -5.0  nan
    

    After aligning all common axes, one can then do operations between the two arrays

    >>> aligned1 + aligned2
    a\b   b0   b1   b2
     a0  0.0  0.0  nan
     a1  1.0  1.0  nan
     a2  nan  nan  nan
    

    The fill value for missing labels defaults to nan but can be changed to any compatible value.

    >>> aligned1, aligned2 = arr1.align(arr2, fill_value=0)
    >>> aligned1
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
     a2   0   0   0
    >>> aligned2
    a\b  b0  b1  b2
     a0   0  -1   0
     a1  -2  -3   0
     a2  -4  -5   0
    >>> aligned1 + aligned2
    a\b  b0  b1  b2
     a0   0   0   2
     a1   1   1   5
     a2  -4  -5   0
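
    The four join methods can be sketched on plain lists of labels. This is only an illustration of the rules listed above, not larray's implementation; the helper name align_labels is hypothetical, and appending the labels found only in the other axis after those of the first one is an assumption that matches the outputs shown above.

```python
def align_labels(first, other, join="outer"):
    """Return the labels both axes would share after alignment.

    Plain-Python illustration of the four join methods; `first` and `other`
    are lists of labels. This is not larray's implementation.
    """
    if join == "outer":
        # every label of the first axis, then the labels only found in the other one
        return list(first) + [label for label in other if label not in first]
    if join == "inner":
        # labels present on both axes, in the order of the first axis
        return [label for label in first if label in other]
    if join == "left":
        return list(first)
    if join == "right":
        return list(other)
    raise ValueError("unknown join method: {!r}".format(join))


# labels of the arr1 and arr2 arrays used above
a1, b1 = ["a0", "a1"], ["b0", "b1", "b2"]
a2, b2 = ["a0", "a1", "a2"], ["b1", "b0"]

print(align_labels(a1, a2))            # ['a0', 'a1', 'a2']
print(align_labels(b1, b2))            # ['b0', 'b1', 'b2']
print(align_labels(b1, b2, "inner"))   # ['b0', 'b1']
print(align_labels(b1, b2, "right"))   # ['b1', 'b0']
```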
    
  • implemented Session.transpose(axes) to reorder axes of all arrays within a session, ignoring missing axes for each array. For example, let us first create a test session and a small helper function to display sessions as a short summary.

    >>> arr1 = ndtest((2, 2, 2))
    >>> arr2 = ndtest((2, 2))
    >>> sess = Session([('arr1', arr1), ('arr2', arr2)])
    >>> def print_summary(s):
    ...     print(s.summary("{name} -> {axes_names}"))
    >>> print_summary(sess)
    arr1 -> a, b, c
    arr2 -> a, b
    

    Put the ‘b’ axis in front of all arrays

    >>> print_summary(sess.transpose('b'))
    arr1 -> b, a, c
    arr2 -> b, a
    

    Axes missing on an array are ignored (‘c’ for arr2 in this case)

    >>> print_summary(sess.transpose('c', 'b'))
    arr1 -> c, b, a
    arr2 -> b, a
    

    Use ... to move axes to the end

    >>> print_summary(sess.transpose(..., 'a'))
    arr1 -> b, c, a
    arr2 -> b, a
    
  • implemented unary operations on Session, which means one can negate all arrays in a Session or take the absolute value of all arrays in a Session without writing an explicit loop for that.

    >>> arr1 = ndtest(2)
    >>> arr1
    a  a0  a1
        0   1
    >>> arr2 = ndtest(4) - 1
    >>> arr2
    a  a0  a1  a2  a3
       -1   0   1   2
    >>> sess1 = Session([('arr1', arr1), ('arr2', arr2)])
    >>> sess2 = -sess1
    >>> sess2.arr1
    a  a0  a1
        0  -1
    >>> sess2.arr2
    a  a0  a1  a2  a3
        1   0  -1  -2
    >>> sess3 = abs(sess1)
    >>> sess3.arr2
    a  a0  a1  a2  a3
        1   0   1   2
    
  • implemented stacking sessions using stack().

    Let us first create two test sessions. For example suppose we have a session storing the results of a baseline simulation:

    >>> arr1 = ndtest(2)
    >>> arr1
    a  a0  a1
        0   1
    >>> arr2 = ndtest(3)
    >>> arr2
    a  a0  a1  a2
        0   1   2
    >>> baseline = Session([('arr1', arr1), ('arr2', arr2)])
    

    and another session with a variant

    >>> arr1variant = arr1 * 2
    >>> arr1variant
    a  a0  a1
        0   2
    >>> arr2variant = 2 - arr2 / 2
    >>> arr2variant
    a   a0   a1   a2
       2.0  1.5  1.0
    >>> variant = Session([('arr1', arr1variant), ('arr2', arr2variant)])
    

    then we stack them together

    >>> stacked = stack([('baseline', baseline), ('variant', variant)], 'sessions')
    >>> stacked
    Session(arr1, arr2)
    >>> stacked.arr1
    a\sessions  baseline  variant
            a0         0        0
            a1         1        2
    >>> stacked.arr2
    a\sessions  baseline  variant
            a0       0.0      2.0
            a1       1.0      1.5
            a2       2.0      1.0
    

    Combined with the fact that we can compute some very simple expressions on sessions, this can be extremely useful to quickly compare all arrays of several sessions (e.g. simulation variants):

    >>> diff = variant - baseline
    >>> # compute the absolute difference and relative difference for each array of the sessions
    >>> stacked = stack([('baseline', baseline),
    ...                  ('variant', variant),
    ...                  ('diff', diff),
    ...                  ('abs diff', abs(diff)),
    ...                  ('rel diff', diff / baseline)], 'sessions')
    >>> stacked
    Session(arr1, arr2)
    >>> stacked.arr2
    a\sessions  baseline  variant  diff  abs diff  rel diff
            a0       0.0      2.0   2.0       2.0       inf
            a1       1.0      1.5   0.5       0.5       0.5
            a2       2.0      1.0  -1.0       1.0      -0.5
    
  • implemented Axis.align(other_axis) and AxisCollection.align(other_collection) which makes two axes / axis collections compatible with each other, see LArray.align above.

  • implemented Session.apply(function) to apply a function to all elements (arrays) of a Session and return a new Session.

    Let us first create a test session

    >>> arr1 = ndtest(2)
    >>> arr1
    a  a0  a1
        0   1
    >>> arr2 = ndtest(3)
    >>> arr2
    a  a0  a1  a2
        0   1   2
    >>> sess1 = Session([('arr1', arr1), ('arr2', arr2)])
    >>> sess1
    Session(arr1, arr2)
    

    Then define the function we want to apply to all arrays of our session

    >>> def increment(element):
    ...     return element + 1
    

    Apply it

    >>> sess2 = sess1.apply(increment)
    >>> sess2.arr1
    a  a0  a1
        1   2
    >>> sess2.arr2
    a  a0  a1  a2
        1   2   3
    
  • implemented setting the value of multiple points using array.points[labels] = value

    >>> arr = ndtest((3, 4))
    >>> arr
    a\b  b0  b1  b2  b3
     a0   0   1   2   3
     a1   4   5   6   7
     a2   8   9  10  11
    

    Now, suppose you want to retrieve several specific combinations of labels, for example (a0, b1), (a0, b3), (a1, b0) and (a2, b2). You could write a loop like this:

    >>> values = []
    >>> for a, b in [('a0', 'b1'), ('a0', 'b3'), ('a1', 'b0'), ('a2', 'b2')]:
    ...     values.append(arr[a, b])
    >>> values
    [1, 3, 4, 10]
    

    but you could also (this already worked in previous versions) use array.points like:

    >>> arr.points[['a0', 'a0', 'a1', 'a2'], ['b1', 'b3', 'b0', 'b2']]
    a,b  a0,b1  a0,b3  a1,b0  a2,b2
             1      3      4     10
    

    which has the advantage of being both much faster and keeping more information. Now suppose you want to set the value of those points, you could write:

    >>> for a, b in [('a0', 'b1'), ('a0', 'b3'), ('a1', 'b0'), ('a2', 'b2')]:
    ...     arr[a, b] = 42
    >>> arr
    a\b  b0  b1  b2  b3
     a0   0  42   2  42
     a1  42   5   6   7
     a2   8   9  42  11
    

    but now you can also use the faster alternative:

    >>> arr.points[['a0', 'a0', 'a1', 'a2'], ['b1', 'b3', 'b0', 'b2']] = 42
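
    To make the pairing explicit: with array.points, the two lists of labels are read element by element, so the first label of one list goes with the first label of the other, and so on. The plain-Python sketch below mimics that on a nested dictionary holding the same values as the test array. It is only an illustration (the variable names are made up for it) and says nothing about how larray implements this.

```python
# the 3 x 4 test array from the example, stored as a nested dict (illustration only)
a_labels = ["a0", "a1", "a2"]
b_labels = ["b0", "b1", "b2", "b3"]
arr = {a: {b: i * len(b_labels) + j for j, b in enumerate(b_labels)}
       for i, a in enumerate(a_labels)}

a_key = ["a0", "a0", "a1", "a2"]
b_key = ["b1", "b3", "b0", "b2"]

# points-style selection: the keys are paired element by element
points = [arr[a][b] for a, b in zip(a_key, b_key)]
print(points)  # [1, 3, 4, 10]

# points-style assignment touches exactly those four cells
for a, b in zip(a_key, b_key):
    arr[a][b] = 42
print(arr["a0"])  # {'b0': 0, 'b1': 42, 'b2': 2, 'b3': 42}
```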
    

Miscellaneous improvements

  • added icon to display in Windows start menu and editor windows.

  • viewer keeps labels visible even when scrolling (label rows and columns are now frozen).

  • added ‘Getting Started’ section in documentation.

  • implemented axes argument to ipfp to specify on which axes the fitting procedure should be applied (closes issue 185). For example, let us assume you have a 3D array, such as:

    >>> initial = ndrange('a=a0..a9;b=b0..b9;year=2000..2016')
    

    and you want to apply a 2D fitting procedure for each value of the year axis. Previously, you had to loop on that year axis explicitly and call ipfp within the loop, like:

    >>> result = zeros(initial.axes)
    >>> for year in initial.year:
    ...     current = initial[year]
    ...     # assume you have some targets for each year
    ...     current_targets = [current.sum(x.a) + 1, current.sum(x.b) + 1]
    ...     result[year] = ipfp(current_targets, current)
    

    Now you can apply the procedure on all years at once, by specifying that the fitting procedure should be applied on the other axes. This is a bit shorter to type and also much faster.

    >>> all_targets = [initial.sum(x.a) + 1, initial.sum(x.b) + 1]
    >>> result = ipfp(all_targets, initial, axes=(x.a, x.b))
    
  • made ipfp 10 to 20% faster (even without using the axes argument).

  • implemented Session.to_globals(inplace=True) which will update the content of existing arrays instead of creating new variables and overwriting them. This ensures the arrays in the session have the same axes as the existing variables.

  • added the ability to provide a pattern when loading several .csv files as a session. Among others, patterns can use * to match any number of characters and ? to match any single character.

    >>> s = Session()
    >>> # load all .csv files starting with "output" in the data directory
    >>> s.load('data/output*.csv')
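
    The wildcard rules can be tried out with Python's standard fnmatch module. That larray's patterns follow the same shell-style rules is an assumption here, and the file names below are hypothetical.

```python
from fnmatch import fnmatchcase

# hypothetical file names, only used to illustrate the wildcard rules
filenames = ["output.csv", "output_2016.csv", "output1.csv", "notes.txt", "input.csv"]

# * matches any number of characters (including none)
star = [name for name in filenames if fnmatchcase(name, "output*.csv")]
print(star)  # ['output.csv', 'output_2016.csv', 'output1.csv']

# ? matches exactly one character
question = [name for name in filenames if fnmatchcase(name, "output?.csv")]
print(question)  # ['output1.csv']
```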
    
  • stack can be used with keyword arguments when labels are “simple strings” (i.e. no integers, no punctuation, no string starting with integers, etc.). This is an attractive alternative but as it only works in the usual case and not in all cases, it is not recommended to use it except in the interactive console.

    >>> arr1 = ones('nat=BE,FO')
    >>> arr1
    nat   BE   FO
         1.0  1.0
    >>> arr2 = zeros('nat=BE,FO')
    >>> arr2
    nat   BE   FO
         0.0  0.0
    >>> stack(M=arr1, F=arr2, axis='sex=M,F')
    nat\sex    M    F
         BE  1.0  0.0
         FO  1.0  0.0
    

    Without passing an explicit order for labels like above (or an axis object), it should only be used on Python 3.6 or later because keyword arguments are NOT ordered on earlier Python versions.

    >>> # use this only on Python 3.6 and later
    >>> stack(M=arr1, F=arr2, axis='sex')
    nat\sex    M    F
         BE  1.0  0.0
         FO  1.0  0.0
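
    The ordering guarantee this relies on can be checked without larray. The small function below is a hypothetical stand-in that only reports the order in which it received its keyword arguments.

```python
import sys


def labels_in_call_order(**arrays):
    """Return the keyword names in the order the function received them."""
    return list(arrays)


# keyword argument order is only guaranteed from Python 3.6 on
print(sys.version_info >= (3, 6))
print(labels_in_call_order(M=1, F=0))   # ['M', 'F'] on Python 3.6 or later
```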
    
  • binary operations between sessions now ignore type errors. For example, if you are comparing two sessions with many arrays by computing the difference between them but a few arrays contain strings, the whole operation will not fail; the arrays concerned will be assigned nan instead.
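
    The behavior can be pictured with ordinary dictionaries standing in for sessions. This is a rough sketch, not larray code: subtract_sessions is a hypothetical helper and plain numbers and strings replace the arrays.

```python
import math


def subtract_sessions(first, second):
    """Subtract two dict-based 'sessions' item by item (illustration only)."""
    result = {}
    for name in first:
        if name not in second:
            continue
        try:
            result[name] = first[name] - second[name]
        except TypeError:
            # the operation is invalid for this item: store nan and carry on
            result[name] = math.nan
    return result


baseline = {"pop": 10, "label": "abc"}
variant = {"pop": 12, "label": "abd"}
diff = subtract_sessions(variant, baseline)
print(diff["pop"])                 # 2
print(math.isnan(diff["label"]))   # True
```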

  • added optional argument ignore_exceptions to Session.load to ignore exceptions during load. This is mostly useful when trying to load many .csv files in a Session and some of them have an invalid format but you want to load the others.

Fixes

  • fixed disambiguating an ambiguous key by adding the axis within the string, for example arr['axis_name[ambiguouslabel]'] (closes issue 331).

  • fixed converting a string group to integer or float using int() and float() (when that makes sense).

    >>> a = Axis('a=10,20,30,total')
    >>> a
    Axis(['10', '20', '30', 'total'], 'a')
    >>> str(a.i[0])
    '10'
    >>> int(a.i[0])
    10
    >>> float(a.i[0])
    10.0
    

Version 0.24.1

Released on 2017-06-14.

Fixes

  • updated the tutorial to use version 0.24 syntax.

Version 0.24

Released on 2017-06-14.

New features

  • implemented Session.to_globals which creates global variables from variables stored in the session (closes issue 276). Note that this should usually only be used in an interactive console and not in a script. Code editors are confused by this kind of manipulation and will likely consider as invalid the code using variables created in this way. Additionally, when using this method auto-completion, “show definition”, “go to declaration” and other similar code editor features will probably not work for the variables created in this way and any variable derived from them.

    >>> s = Session(arr1=ndtest(3), arr2=ndtest((2, 2)))
    >>> s.to_globals()
    >>> arr1
    a  a0  a1  a2
        0   1   2
    >>> arr2
    a\b  b0  b1
     a0   0   1
     a1   2   3
    
  • added new boolean argument ‘overwrite’ to Session.save, Session.to_hdf, Session.to_excel and Session.to_pickle methods (closes issue 293). If overwrite=True and the target file already existed, it is deleted and replaced by a new one. This is the new default behavior. If overwrite=False, an existing file is updated (like it was in previous larray versions):

    >>> arr1, arr2, arr3 = ndtest((2, 2)), ndtest(4), ndtest((3, 2))
    >>> s = Session([('arr1', arr1), ('arr2', arr2), ('arr3', arr3)])
    
    >>> # save arr1, arr2 and arr3 in file output.h5
    >>> s.save('output.h5')
    
    >>> # replace arr1 and create arr4 + put them in a second session
    >>> arr1, arr4 = ndtest((3, 3)), ndtest((2, 3))
    >>> s2 = Session([('arr1', arr1), ('arr4', arr4)])
    
    >>> # replace arr1 and add arr4 in file output.h5
    >>> s2.save('output.h5', overwrite=False)
    
    >>> # erase content of 'output.h5' and save only arrays contained in the second session
    >>> s2.save('output.h5')
    

Miscellaneous improvements

  • renamed create_sequential() to sequence() (closes issue 212).

  • improved auto-completion in IPython interactive consoles (e.g. the viewer console) for Axis, AxisCollection, Group and Workbook objects. These objects can now complete keys within [].

    >>> gender = Axis('gender=Male,Female')
    >>> gender
    Axis(['Male', 'Female'], 'gender')
    >>> gender['Fe<tab>  # will be completed to `gender['Female`
    
    >>> arr = ndrange(gender)
    >>> arr.axes['gen<tab>  # will be completed to `arr.axes['gender`
    
    >>> wb = open_excel()
    >>> wb['Sh<tab>  # will be completed to `wb['Sheet1`
    
  • added documentation for Session methods (closes issue 277).

  • allowed to provide explicit names for arrays or sessions in compare(). Closes issue 307.

Fixes

  • fixed title argument of ndtest creation function: title was not passed to the returned array.
  • fixed create_sequential when arguments initial and inc are array and scalar respectively (closes issue 288).
  • fixed auto-completion of attributes of LArray and Group objects (closes issue 302).
  • fixed name of arrays/sessions in compare() not being inferred correctly (closes issue 306).
  • fixed indexing Excel sheets by position to always yield the requested shape even when bounds are outside the range of used cells. Closes issue 273.
  • fixed the array() method on excel.Sheet returning float labels when int labels are expected.
  • fixed getting float data instead of int when converting an Excel Sheet or Range to an larray or numpy array.
  • fixed some warning messages to point to the correct line in user code.
  • fixed crash of Session.save method when it contained 0D arrays. They are now skipped when saving a session (closes issue 291).
  • fixed Session.save and Session.to_excel failing to create new Excel files (it only worked if the file already existed). Closes issue 313.
  • fixed Session.load(file, engine='pandas_excel'): axes were considered as anonymous.

Version 0.23

Released on 2017-05-30.

Miscellaneous improvements

  • changed display of arrays (closes issue 243):

    >>> ndtest((2, 3))
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
    

    instead of

    >>> ndtest((2, 3))
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  3 |  4 |  5
    
  • .. can now be used within keys (between []). Previously it could only be used to define new axes. As a reminder, it generates increasing values between the two bounds. It is slightly different from : which takes everything between the two bounds in the axis order.

    >>> arr = ndrange('a=a1,a0,a2,a3')
    >>> arr
    a  a1  a0  a2  a3
        0   1   2   3
    >>> arr['a1..a3']
    a  a1  a2  a3
        0   2   3
    

    this is different from : which takes everything in between the two bounds :

    >>> arr['a1:a3']
    a  a1  a0  a2  a3
        0   1   2   3
    
  • in both axes definitions and keys (within []) .. can now be mixed with , and other .. :

    >>> arr = ndrange('code=A,C..E,G,X..Z')
    >>> arr
    code  A  C  D  E  G  X  Y  Z
          0  1  2  3  4  5  6  7
    >>> arr['A,Z..X,G']
    code  A  Z  Y  X  G
          0  7  6  5  4
    
  • within .. extra zeros are only padded to numbers if zeros are present in the pattern.

    >>> ndrange('code=A1..A12')
    code  A1  A2  A3  A4  A5  A6  A7  A8  A9  A10  A11  A12
           0   1   2   3   4   5   6   7   8    9   10   11
    
    >>> ndrange('code=A01..A12')
    code  A01  A02  A03  A04  A05  A06  A07  A08  A09  A10  A11  A12
            0    1    2    3    4    5    6    7    8    9   10   11
    

    in previous larray versions, the two above definitions returned the second array.
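
    The rules described in the last bullets (values generated by .. versus axis order for :, and zero padding only when the pattern contains zeros) can be sketched in plain Python. The two helpers below are hypothetical and deliberately limited: they only handle labels made of an optional prefix followed by digits, they always return strings, and they do not reproduce larray's actual parsing.

```python
import re


def expand_range(start, stop):
    """Expand 'start..stop' into a list of labels (illustrative helper, not larray code)."""
    prefix, first = re.fullmatch(r"(\D*)(\d+)", start).groups()
    stop_prefix, last = re.fullmatch(r"(\D*)(\d+)", stop).groups()
    if prefix != stop_prefix:
        raise ValueError("both bounds must share the same prefix")
    # pad with zeros only when one of the bounds is itself zero-padded
    padded = any(len(num) > 1 and num.startswith("0") for num in (first, last))
    width = max(len(first), len(last)) if padded else 0
    step = 1 if int(last) >= int(first) else -1
    numbers = range(int(first), int(last) + step, step)
    return [prefix + str(number).zfill(width) for number in numbers]


def slice_labels(axis_labels, start, stop):
    """Take everything between two labels in axis order, both bounds included."""
    return axis_labels[axis_labels.index(start):axis_labels.index(stop) + 1]


axis_labels = ["a1", "a0", "a2", "a3"]
print(expand_range("a1", "a3"))                # ['a1', 'a2', 'a3']
print(slice_labels(axis_labels, "a1", "a3"))   # ['a1', 'a0', 'a2', 'a3']
print(expand_range("A1", "A12")[:3])           # ['A1', 'A2', 'A3']
print(expand_range("A01", "A12")[:3])          # ['A01', 'A02', 'A03']
```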

  • set sep argument of from_string function to ' ' by default (closes issue 271). For a 1D array, a “-” must be added in front of the data line.

    >>> from_string('''sex  M  F
                       -    0  1''')
    sex  M  F
         0  1
    >>> from_string('''nat\\sex  M  F
                       BE        0  1
                       FO        2  3''')
    nat\sex  M  F
         BE  0  1
         FO  2  3
    
  • improved error message when trying to access nonexistent sheet in an Excel workbook (closes issue 266).

  • when creating an Axis from a Group and no explicit name was given, reuse the name of the group axis.

    >>> a = Axis('a=a0..a2')
    >>> Axis(a[:'a1'])
    Axis(['a0', 'a1'], 'a')
    
  • allowed to create an array using a single group as if it was an Axis.

    >>> a = Axis('a=a0..a2')
    >>> ndrange(a)
    a  a0  a1  a2
        0   1   2
    >>> # using a group as an axis
    >>> ndrange(a[:'a1'])
    a  a0  a1
        0   1
    
  • allowed to use axes (Axis objects) to subset arrays (part of issue 210).

    >>> arr = ndtest((2, 3))
    >>> arr
    a\b  b0  b1  b2
     a0   0   1   2
     a1   3   4   5
    >>> b2 = Axis('b=b0,b2')
    >>> arr[b2]
    a\b  b0  b2
     a0   0   2
     a1   3   5
    
  • improved string representation of Excel workbooks and sheets (they mention the actual file/sheet they correspond to). This is mostly useful in the interactive console to check what an object corresponds to.

    >>> wb = open_excel()
    >>> wb
    <larray.io.excel.Workbook [Book1]>
    >>> wb[0]
    <larray.io.excel.Sheet [Book1]Sheet1>
    

Fixes

  • open_excel('non existent file') will raise an explicit error immediately when overwrite_file is False, instead of failing at a seemingly random point later on (closes issue 265).

  • integer-like strings in axis definition strings using , are converted to integers to be consistent with string definitions using .. (previously, ndrange('a=1,2,3') did not create the same array as ndrange('a=1..3')).

  • fixed reading a single cell from an Excel sheet.

  • fixed script execution not resuming after quitting the viewer when it was called using view(a_single_array).

  • fixed opening the viewer after showing a plot window.

  • do not display an error when setting the value of an element of a non-LArray sequence in the viewer console:

    >>> l = [1, 2, 3]
    >>> l[0] = 42
    

Version 0.22

Released on 2017-05-11.

New features

  • viewer: added a menu bar with the ability to clear the current session, save all its arrays to a file (.h5, .xlsx, or a directory containing multiple .csv files), and load arrays from such a file (closes issue 88).

    WARNING: Only array objects are currently saved. It means that scalars, functions or other non-LArray objects defined in the console are not saved in the file.

  • implemented a new describe() method on arrays to give quick summary statistics. By default, it includes the number of non-NaN values, the mean, standard deviation, minimum, 25, 50 and 75 percentiles and maximum.

    >>> arr = ndrange('gender=Male,Female;year=2014..2020').astype(float)
    >>> arr
    gender\year | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020
           Male |  0.0 |  1.0 |  2.0 |  3.0 |  4.0 |  5.0 |  6.0
         Female |  7.0 |  8.0 |  9.0 | 10.0 | 11.0 | 12.0 | 13.0
    >>> arr.describe()
    statistic | count | mean |               std | min |  25% | 50% |  75% |  max
              |  14.0 |  6.5 | 4.031128874149275 | 0.0 | 3.25 | 6.5 | 9.75 | 13.0
    

    an optional keyword argument allows specifying different percentiles to include

    >>> arr.describe(percentiles=[20, 40, 60, 80])
    statistic | count | mean |               std | min | 20% | 40% | 60% |  80% |  max
              |  14.0 |  6.5 | 4.031128874149275 | 0.0 | 2.6 | 5.2 | 7.8 | 10.4 | 13.0
    

    its sister method, describe_by() was also implemented to give quick summary statistics along axes or groups.

    >>> arr.describe_by('gender')
    gender\statistic | count | mean | std | min | 25% |  50% |  75% |  max
                Male |   7.0 |  3.0 | 2.0 | 0.0 | 1.5 |  3.0 |  4.5 |  6.0
              Female |   7.0 | 10.0 | 2.0 | 7.0 | 8.5 | 10.0 | 11.5 | 13.0
    >>> arr.describe_by('gender', (x.year[:2015], x.year[2019:]))
    gender | year\statistic | count | mean | std |  min |   25% |  50% |   75% |  max
      Male |          :2015 |   2.0 |  0.5 | 0.5 |  0.0 |  0.25 |  0.5 |  0.75 |  1.0
      Male |          2019: |   2.0 |  5.5 | 0.5 |  5.0 |  5.25 |  5.5 |  5.75 |  6.0
    Female |          :2015 |   2.0 |  7.5 | 0.5 |  7.0 |  7.25 |  7.5 |  7.75 |  8.0
    Female |          2019: |   2.0 | 12.5 | 0.5 | 12.0 | 12.25 | 12.5 | 12.75 | 13.0
    

    This closes issue 184.
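
    The figures in the first describe() output above can be reproduced by hand. The sketch below assumes percentiles are obtained by linear interpolation between the two closest values and that the standard deviation shown is the population one (ddof=0). Both assumptions are consistent with the numbers displayed, but neither is stated in these notes, and the percentile helper is hypothetical.

```python
import math


def percentile(values, pct):
    """Percentile by linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# the 14 values of the example array: 0.0, 1.0, ..., 13.0
values = [float(v) for v in range(14)]

mean = sum(values) / len(values)
# population standard deviation (ddof=0)
std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

print(mean)                                # 6.5
print(round(std, 4))                       # 4.0311
print(percentile(values, 25))              # 3.25
print(round(percentile(values, 20), 2))    # 2.6
```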

  • implemented reindex, which allows changing the order of labels and adding/removing some of them on one or several axes:

    >>> arr = ndtest((2, 2))
    >>> arr
    a\b | b0 | b1
     a0 |  0 |  1
     a1 |  2 |  3
    >>> arr.reindex(x.b, ['b1', 'b2', 'b0'], fill_value=-1)
    a\b | b1 | b2 | b0
     a0 |  1 | -1 |  0
     a1 |  3 | -1 |  2
    >>> a = Axis('a', ['a1', 'a2', 'a0'])
    >>> b = Axis('b', ['b2', 'b1', 'b0'])
    >>> arr.reindex({'a': a, 'b': b}, fill_value=-1)
    a\b | b2 | b1 | b0
     a1 | -1 |  3 |  2
     a2 | -1 | -1 | -1
     a0 | -1 |  1 |  0
    

    using reindex one can make an array compatible with another array which has more/less labels or with labels in a different order:

    >>> arr2 = ndtest((3, 3))
    >>> arr2
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  3 |  4 |  5
     a2 |  6 |  7 |  8
    >>> arr.reindex(arr2.axes, fill_value=0)
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  0
     a1 |  2 |  3 |  0
     a2 |  0 |  0 |  0
    >>> arr.reindex(arr2.axes, fill_value=0) + arr2
    a\b | b0 | b1 | b2
     a0 |  0 |  2 |  2
     a1 |  5 |  7 |  5
     a2 |  6 |  7 |  8
    

    This closes issue 18.

  • added load_example_data function to load datasets used in tutorial and be able to reproduce examples. The name of the dataset must be provided as argument (there is currently only one available dataset). Datasets are returned as Session objects:

    >>> demo = load_example_data('demography')
    >>> demo.pop.info
    26 x 3 x 121 x 2 x 2
     time [26]: 1991 1992 1993 ... 2014 2015 2016
     geo [3]: 'BruCap' 'Fla' 'Wal'
     age [121]: 0 1 2 ... 118 119 120
     sex [2]: 'M' 'F'
     nat [2]: 'BE' 'FO'
    >>> demo.qx.info
    26 x 3 x 121 x 2 x 2
     time [26]: 1991 1992 1993 ... 2014 2015 2016
     geo [3]: 'BruCap' 'Fla' 'Wal'
     age [121]: 0 1 2 ... 118 119 120
     sex [2]: 'M' 'F'
     nat [2]: 'BE' 'FO'
    

    (closes issue 170)

  • implemented Axis.union, intersection and difference which produce new axes by combining the labels of the axis with the other labels.

    >>> letters = Axis('letters=a,b')
    >>> letters.union(Axis('letters=b,c'))
    Axis(['a', 'b', 'c'], 'letters')
    >>> letters.union(['b', 'c'])
    Axis(['a', 'b', 'c'], 'letters')
    >>> letters.intersection(['b', 'c'])
    Axis(['b'], 'letters')
    >>> letters.difference(['b', 'c'])
    Axis(['a'], 'letters')
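
    Unlike Python sets, the results above keep the labels in a well-defined order. A plain-Python equivalent on lists of labels could look like the sketch below, where the three helper functions are hypothetical and only mirror the outputs shown.

```python
def union(labels, other):
    # keep the order of first appearance and drop duplicates
    return list(dict.fromkeys(list(labels) + list(other)))


def intersection(labels, other):
    # labels of the first list that are also in the other one
    return [label for label in labels if label in other]


def difference(labels, other):
    # labels of the first list that are not in the other one
    return [label for label in labels if label not in other]


letters = ["a", "b"]
print(union(letters, ["b", "c"]))          # ['a', 'b', 'c']
print(intersection(letters, ["b", "c"]))   # ['b']
print(difference(letters, ["b", "c"]))     # ['a']
```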
    
  • implemented Group.union, intersection and difference which produce new groups by combining the labels of the group with the other labels.

    >>> letters = Axis('letters=a..d')
    >>> letters['a', 'b'].union(letters['b', 'c'])
    letters['a', 'b', 'c'].set()
    >>> letters['a', 'b'].union(['b', 'c'])
    letters['a', 'b', 'c'].set()
    >>> letters['a', 'b'].intersection(['b', 'c'])
    letters['b'].set()
    >>> letters['a', 'b'].difference(['b', 'c'])
    letters['a'].set()
    
  • viewer: added possibility to delete an array by pressing Delete on keyboard (closes issue 116).

  • Excel sheets in workbooks opened via open_excel can be renamed by changing their .name attribute:

    >>> wb = open_excel()
    >>> wb['old_sheet_name'].name = 'new_sheet_name'
    
  • Excel sheets in workbooks opened via open_excel can be deleted using “del”:

    >>> wb = open_excel()
    >>> del wb['sheet_name']
    
  • implemented PGroup.set() to transform a positional group to an LSet.

    >>> a = Axis('a=a0..a5')
    >>> a.i[:2].set()
    a['a0', 'a1'].set()
    

Miscellaneous improvements

  • inverted name and labels arguments when creating an Axis and made name argument optional (to create anonymous axes). Now, it is also possible to create an Axis by passing a single string of the kind ‘name=labels’:

    >>> anonymous = Axis('0..100')
    >>> age = Axis('age=0..100')
    >>> gender = Axis('M,F', 'gender')
    

    (closes issue 152)

  • renamed Session.dump, dump_hdf, dump_excel and dump_csv to save, to_hdf, to_excel and to_csv (closes issue 217).

  • changed default value of ddof argument for var and std functions from 0 to 1 (closes issue 190).

  • implemented a new syntax for stack(): stack({label1: value1, label2: value2}, axis)

    >>> nat = Axis('nat', 'BE, FO')
    >>> sex = Axis('sex', 'M, F')
    >>> males = ones(nat)
    >>> males
    nat |  BE |  FO
        | 1.0 | 1.0
    >>> females = zeros(nat)
    >>> females
    nat |  BE |  FO
        | 0.0 | 0.0
    

    In the case the axis has already been defined in a variable, this gives:

    >>> stack({'M': males, 'F': females}, sex)
    nat\sex |   M |   F
         BE | 1.0 | 0.0
         FO | 1.0 | 0.0
    

    Additionally, axis can now be an axis string definition in addition to an Axis object, which means one can write this:

    >>> stack({'M': males, 'F': females}, 'sex=M,F')
    

    It is better than the simpler but highly discouraged alternative:

    >>> stack([males, females], sex)
    

    because it is all too easy to invert labels. It is very hard to spot the error in the following line, and larray cannot spot it for you either:

    >>> stack([females, males], sex)
    nat\sex |   M |   F
         BE | 0.0 | 1.0
         FO | 0.0 | 1.0
    

    When creating an axis from scratch (it does not already exist in a variable), one might want to use this:

    >>> stack([males, females], 'sex=M,F')
    

    even if this could suffer, to a lesser extent, from the same problem as above when stacking many arrays.

  • handle ... (Ellipsis) in the transpose method to avoid having to list all axes. This can be useful, for example, to change which axis is displayed in columns (closes issue 188).

    >>> arr.transpose(..., 'time')
    >>> arr.transpose('gender', ..., 'time')
    
  • made scalar Groups behave even more like their value: any method available on the value is available on the Group. For example, if the Group has a string value, the string methods are available on it (closes issue 202).

    >>> test = Axis(['abc', 'a1-a2'], 'test')
    >>> test.i[0].upper()
    'ABC'
    >>> test.i[1].split('-')
    ['a1', 'a2']
    
  • updated AxisCollection.replace so as to replace one, several or all axes and to accept axis definitions as new axes.

    >>> arr = ndtest((2, 3))
    >>> axes = arr.axes
    >>> axes
    AxisCollection([
        Axis(['a0', 'a1'], 'a'),
        Axis(['b0', 'b1', 'b2'], 'b')
    ])
    >>> row = Axis(['r0', 'r1'], 'row')
    >>> column = Axis(['c0', 'c1', 'c2'], 'column')
    

    Replace several axes (keywords, list of tuples or dictionary)

    >>> axes.replace(a=row, b=column)
    >>> # or
    >>> axes.replace(a="row=r0,r1", b="column=c0,c1,c2")
    >>> # or
    >>> axes.replace([(x.a, row), (x.b, column)])
    >>> # or
    >>> axes.replace({x.a: row, x.b: column})
    AxisCollection([
        Axis(['r0', 'r1'], 'row'),
        Axis(['c0', 'c1', 'c2'], 'column')
    ])
    
  • added possibility to delete an array from a session:

    >>> s = Session({'a': ndtest((3, 3)), 'b': ndtest((2, 4)), 'c': ndtest((4, 2))})
    >>> s.names
    ['a', 'b', 'c']
    >>> del s.b
    >>> del s['c']
    >>> s.names
    ['a']
    
  • made the axis argument of create_sequential accept axis definitions (for example, a string definition) in addition to Axis objects (closes issue 160).

    >>> create_sequential('year=2016..2019')
    year | 2016 | 2017 | 2018 | 2019
         |    0 |    1 |    2 |    3
    
  • replaced *args, **kwargs by explicit arguments in documentation of aggregation functions (sum, prod, mean, std, var, …). Closes issue 41.

  • improved documentation of plot method (closes issue 169).

  • improved auto-completion in ipython interactive consoles for both LArray and Session objects. LArray objects can now complete keys within [].

    >>> a = ndrange('sex=Male,Female')
    >>> a
    sex | Male | Female
        |    0 |      1
    >>> a['Fe<tab>
    

    will autocomplete to a['Female. Sessions will now auto-complete both attributes (using session.) and keys (using session[).

    >>> s = Session({'a_nice_test_array': ndtest(10)})
    >>> s.a_<tab>
    

    will autocomplete to s.a_nice_test_array, and s['a_<tab> will be completed to s['a_nice_test_array.

  • made warning messages for division by 0 and invalid values (usually caused by 0 / 0) point to the user code line, instead of the corresponding line in the larray module.

  • preserve order of arrays in a session when saving to/loading from an .xlsx file.

  • when creating a session from a directory containing CSV files, the directory may now contain other (non-CSV) files.

  • several calls to open_excel from within the same program/script will now reuse a single global Excel instance. This makes Excel I/O much faster without having to create an instance manually using xlwings.App, and still without risking interfering with other instances of Excel opened manually (closes issue 245).

  • improved error message when trying to copy a sheet from one instance of Excel to another (closes issue 231).
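
As an illustration of the ddof change listed above, the following sketch (plain Python, not larray code) assumes ddof has its usual meaning of “delta degrees of freedom”, so that the divisor is N - ddof. The helper name variance is hypothetical:

```python
import statistics


def variance(values, ddof=1):
    """Variance with divisor N - ddof (hypothetical helper, not larray's API)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


data = [1.0, 2.0, 3.0, 4.0]
old_default = variance(data, ddof=0)  # population variance (previous default)
new_default = variance(data, ddof=1)  # sample variance (new default)

# The two conventions correspond to the standard library's two estimators.
print(old_default, statistics.pvariance(data))
print(new_default, statistics.variance(data))
```

For non-constant data the ddof=1 result is larger by a factor N / (N - 1), so results computed with the old default can be reproduced by passing ddof=0 explicitly.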

Fixes

  • fixed keyword arguments such as out, ddof, … for aggregation functions (closes issue 189).

  • fixed percentile(_by) with multiple percentiles values, i.e. when argument q is a list/tuple (closes issue 192).

  • fixed group aggregates on integer arrays for median, percentile, var and std (closes issue 193).

  • fixed group sum over boolean arrays (closes issue 194).

  • fixed set_labels when inplace=True.

  • fixed array creation functions not raising an exception when called with wrong syntax func(axis1, axis2, …) instead of func([axis1, axis2, …]) (closes issue 203).

  • fixed position of added sheets in excel workbook: new sheets are appended instead of prepended (closes issue 229).

  • fixed Workbook behavior in case of new workbook: the first added sheet replaces the default sheet Sheet1 (closes issue 230).

  • fixed name of Workbook sheets created by copying another sheet (closes issue 244).

    >>> wb = open_excel()
    >>> wb['name_of_new_sheet'] = wb['name_of_sheet_to_copy']
    
  • fixed with_axes warning to refer to set_axes instead of replace_axes.

  • fixed displayed title in viewer: shows path to file associated with current session + current array info + extra info (closes issue 181)

Version 0.21

Released on 2017-03-28.

New features

  • implemented set_axes() method to replace one, several or all axes of an array (closes issue 67). The method with_axes() is now deprecated (set_axes() must be used instead).

    >>> arr = ndtest((2, 3))
    >>> arr
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  3 |  4 |  5
    >>> row = Axis('row', ['r0', 'r1'])
    >>> column = Axis('column', ['c0', 'c1', 'c2'])
    

    Replace one axis (second argument new_axis must be provided)

    >>> arr.set_axes(x.a, row)
    row\b | b0 | b1 | b2
       r0 |  0 |  1 |  2
       r1 |  3 |  4 |  5
    

    Replace several axes (keywords, list of tuples or dictionary)

    >>> arr.set_axes(a=row, b=column)
    >>> # or
    >>> arr.set_axes([(x.a, row), (x.b, column)])
    >>> # or
    >>> arr.set_axes({x.a: row, x.b: column})
    row\column | c0 | c1 | c2
            r0 |  0 |  1 |  2
            r1 |  3 |  4 |  5
    

    Replace all axes (list of axes or AxisCollection)

    >>> arr.set_axes([row, column])
    row\column | c0 | c1 | c2
            r0 |  0 |  1 |  2
            r1 |  3 |  4 |  5
    >>> arr2 = ndrange([row, column])
    >>> arr.set_axes(arr2.axes)
    row\column | c0 | c1 | c2
            r0 |  0 |  1 |  2
            r1 |  3 |  4 |  5
    
  • implemented Axis.replace to replace some labels from an axis:

    >>> sex = Axis('sex', ['M', 'F'])
    >>> sex
    Axis('sex', ['M', 'F'])
    >>> sex.replace('M', 'Male')
    Axis('sex', ['Male', 'F'])
    >>> sex.replace({'M': 'Male', 'F': 'Female'})
    Axis('sex', ['Male', 'Female'])
    
  • implemented from_string() method to create an array from a string (closes issue 96).

    >>> from_string('''age,nat\\sex, M, F
    ...                0,  BE,       0, 1
    ...                0,  FO,       2, 3
    ...                1,  BE,       4, 5
    ...                1,  FO,       6, 7''')
    age | nat\sex | M | F
      0 |      BE | 0 | 1
      0 |      FO | 2 | 3
      1 |      BE | 4 | 5
      1 |      FO | 6 | 7
    
  • allowed to use a regular expression in split_axis method (closes issue 106):

    >>> combined = ndrange('a_b = a0b0..a1b2')
    >>> combined
    a_b | a0b0 | a0b1 | a0b2 | a1b0 | a1b1 | a1b2
        |    0 |    1 |    2 |    3 |    4 |    5
    >>> combined.split_axis(x.a_b, regex='(\w{2})(\w{2})')
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  3 |  4 |  5
    
  • one can assign a new axis to several groups at the same time by using axis[groups]:

    >>> group1 = year[2001:2004]
    >>> group2 = year[2008,2009]
    >>> # let us change the year axis by time
    >>> x.time[group1, group2]
    (x.time[2001:2004], x.time[2008, 2009])
    
  • implemented Axis.by() which is equivalent to axis[:].by() and divides the axis into several groups of specified length:

    >>> year = Axis('year', '2010..2016')
    >>> year.by(3)
    (year.i[0:3], year.i[3:6], year.i[6:7])
    

    which is equivalent to (year[2010:2012], year[2013:2015], year[2016]). Like for groups, the optional second argument specifies the step between groups

    >>> year.by(3, step=4)
    (year.i[0:3], year.i[4:7])
    

    which is equivalent to (year[2010:2012], year[2014:2016]). And if step is smaller than length, we get overlapping groups, which can be useful for example for moving averages.

    >>> year.by(3, 2)
    (year.i[0:3], year.i[2:5], year.i[4:7], year.i[6:7])
    

    which is equivalent to (year[2010:2012], year[2012:2014], year[2014:2016], year[2016])

  • implemented larray_nan_equal to test whether two arrays are identical even in the presence of nan values. Two arrays are considered identical by larray_equal if they have exactly the same axes and data. However, since a nan value has the odd property of not being equal to itself, larray_equal returns False if either array contains a nan value. larray_nan_equal returns True if all not-nan data is equal and both arrays have nans at the same place.

    >>> arr1 = ndtest((2, 3), dtype=float)
    >>> arr1['a1', 'b1'] = nan
    >>> arr1
    a\b |  b0 |  b1 |  b2
     a0 | 0.0 | 1.0 | 2.0
     a1 | 3.0 | nan | 5.0
    >>> arr2 = arr1.copy()
    >>> arr2
    a\b |  b0 |  b1 |  b2
     a0 | 0.0 | 1.0 | 2.0
     a1 | 3.0 | nan | 5.0
    >>> larray_equal(arr1, arr2)
    False
    >>> larray_nan_equal(arr1, arr2)
    True
    >>> arr2['b1'] = 0.0
    >>> larray_nan_equal(arr1, arr2)
    False
    
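
The difference between larray_equal and larray_nan_equal comes from nan not being equal to itself. A minimal sketch of the same idea on plain Python lists (the two helper functions are hypothetical and ignore axes entirely):

```python
import math


def lists_equal(a, b):
    """Plain element-wise equality: any nan makes the result False."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def lists_nan_equal(a, b):
    """Equality where nans located at the same position count as equal."""
    return len(a) == len(b) and all(
        x == y or (math.isnan(x) and math.isnan(y)) for x, y in zip(a, b)
    )


first = [0.0, 1.0, 2.0, 3.0, math.nan, 5.0]
second = list(first)

print(math.nan == math.nan)            # False: nan is not equal to itself
print(lists_equal(first, second))      # False
print(lists_nan_equal(first, second))  # True

second[4] = 0.0                        # nan on one side only
print(lists_nan_equal(first, second))  # False
```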

Miscellaneous improvements

  • viewer: make keyboard shortcuts work even when the focus is not on the array editor widget. It means that, for example, plotting an array (via Ctrl-P) or opening it in Excel (Ctrl-E) can be done directly even when interacting with the list of arrays or within the interactive console (closes issue 102).

  • viewer: automatically display plots done in the viewer console in a separate window (see example below), unless “%matplotlib inline” is used.

    >>> arr = ndtest((3, 3))
    >>> arr.plot()
    
  • viewer: when calling view(an_array) from within the viewer, the new window opened does not block the initial window, which means you can have several windows open at the same time. view() without argument can still result in odd behavior though.

  • improved LArray.set_labels to make it possible to replace only some labels of an axis, instead of all of them and to replace labels from several axes at the same time.

    >>> a = ndrange('nat=BE,FO;sex=M,F')
    >>> a
    nat\sex | M | F
         BE | 0 | 1
         FO | 2 | 3
    

    to replace only some labels, one must give a mapping giving the new label for each label to replace

    >>> a.set_labels(x.sex, {'M': 'Men'})
    nat\sex | Men | F
         BE |   0 | 1
         FO |   2 | 3
    

    to replace labels for several axes at the same time, one should give a mapping giving the new labels for each changed axis

    >>> a.set_labels({'sex': 'Men,Women', 'nat': 'Belgian,Foreigner'})
      nat\sex | Men | Women
      Belgian |   0 |     1
    Foreigner |   2 |     3
    

    one can also replace some labels in several axes by giving a mapping of mappings

    >>> a.set_labels({'sex': {'M': 'Men'}, 'nat': {'BE': 'Belgian'}})
    nat\sex | Men | F
    Belgian |   0 | 1
         FO |   2 | 3
    
  • allowed matrix multiplication (@ operator) between arrays with dimension != 2 (closes issue 122).

  • improved LArray.plot to get nicer plots by default. The axes are transposed compared to what they used to be, because the last axis is often used for time series. Also, a 1D array is now treated as a single series, not as N series of 1 point.

  • added installation instructions (closes issue 101).

  • Axis.group and Axis.all are now deprecated (closes issue 148).

    >>> city.group(['London', 'Brussels'], name='capitals')
    # should be written as:
    >>> city[['London', 'Brussels']] >> 'capitals'
    

    and

    >>> city.all()
    # should be written as:
    >>> city[:] >> 'all'
    
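
The partial replacement of labels accepted by set_labels (see above) boils down to a lookup with a fallback to the existing label. A sketch on plain lists, where replace_labels is a hypothetical helper and not part of larray:

```python
def replace_labels(labels, mapping):
    """Return new labels, replacing only those present in `mapping`."""
    return [mapping.get(label, label) for label in labels]


sex = ['M', 'F']
nat = ['BE', 'FO']

print(replace_labels(sex, {'M': 'Men'}))       # ['Men', 'F']
print(replace_labels(nat, {'BE': 'Belgian'}))  # ['Belgian', 'FO']
```

Labels absent from the mapping are left untouched, which is what makes it possible to rename a single label without listing all the others.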

Fixes

  • viewer: allow changing the number of displayed digits even for integer arrays as that makes sense when using scientific notation (closes issue 100).

  • viewer: fixed opening a viewer via view(), edit() or compare() from within the viewer (closes issue 109).

  • viewer: fixed compare() colors when arrays have values which are very close but not exactly equal (closes issue 123)

  • viewer: fixed legend when plotting arbitrary rows (it always displayed the labels of the first rows) (closes issue 136).

  • viewer: fixed labels on the x axis when zooming on a plot (closes issue 143)

  • viewer: fixed storing an array in a variable with a name which existed previously but which was not displayable in the viewer, such as the name of any function or special object. In some cases, this error led to a crash of the viewer. For example, this code failed when run in the viewer console, because x is already defined (for the x. syntax):

    >>> x = ndtest(3)
    
  • fixed indexing an array using a positional group with a position which corresponds to a label on that axis. This used to return the wrong data (the data corresponding to the position as if it was the key).

    >>> a = Axis('a', '1..3')
    >>> arr = ndrange(a)
    >>> arr
    a | 1 | 2 | 3
      | 0 | 1 | 2
    >>> # this used to return 0 !
    >>> arr[a.i[1]]
    1
    
  • fixed == for positional groups (closes issue 93)

    >>> years = Axis('years', '1995..1997')
    >>> years
    Axis('years', [1995, 1996, 1997])
    >>> # this used to return False
    >>> years.i[0] == 1995
    True
    
  • fixed using positional groups for their value in many cases (slice bounds, within list of values, within other groups, etc.). For example, this used to fail:

    >>> arr = ndtest((2, 4))
    >>> arr
    a\b | b0 | b1 | b2 | b3
     a0 |  0 |  1 |  2 |  3
     a1 |  4 |  5 |  6 |  7
    >>> b = arr.b
    >>> start = b.i[0]  # equivalent to start = 'b0'
    >>> stop = b.i[2]   # equivalent to stop = 'b2'
    >>> arr[start:stop]
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  4 |  5 |  6
    >>> arr[[b.i[0], b.i[2]]]
    a\b | b0 | b2
     a0 |  0 |  2
     a1 |  4 |  6
    
  • fixed posargsort labels (closes issue 137).

  • fixed labels when doing group aggregates using positional groups. Previously, it used the positions as labels. This was most visible when using the Group.by() method (which creates positional groups).

    >>> years = Axis('years', '2010..2015')
    >>> arr = ndrange(years)
    >>> arr
    years | 2010 | 2011 | 2012 | 2013 | 2014 | 2015
          |    0 |    1 |    2 |    3 |    4 |    5
    >>> arr.sum(years.by(3))
    years | 2010:2012 | 2013:2015
          |         3 |        12
    

    While this used to return:

    >>> arr.sum(years.by(3))
    years | 0:3 | 3:6
          |   3 |  12
    
  • fixed Group.by() when the group was a slice with either bound unspecified. For example, years[2010:2015].by(3) worked but years[:].by(3), years[2010:].by(3) and years[:2015].by(3) did not.

  • fixed a speed regression in version 0.18 and later versions compared to 0.17. In some cases, it was up to 40% slower than it should have been (closes issue 165).
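
Several of the fixes above involve Group.by(). As a reminder of which groups it produces, the sketch below recomputes the positional windows in plain Python. The function by and its size parameter are assumptions made for illustration; the expected windows are those shown for Axis.by() in the new features of this version:

```python
def by(size, length, step=None):
    """Split positions 0..size-1 into (start, stop) windows of `length` items.

    `step` is the distance between two window starts and defaults to `length`.
    The last window is truncated at the end of the axis.
    """
    if step is None:
        step = length
    return [(start, min(start + length, size)) for start in range(0, size, step)]


years = list(range(2010, 2017))  # 7 labels: 2010..2016

for windows in (by(len(years), 3), by(len(years), 3, step=4), by(len(years), 3, step=2)):
    # show the positional slices and the labels each of them covers
    print(windows, [years[start:stop] for start, stop in windows])
```

A step smaller than the length gives overlapping windows, while a larger step skips some labels.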

Version 0.20

Released on 2017-02-09.

IMPORTANT

To make sure all users have all optional dependencies installed and use the same version of packages, and to simplify the update process, we created a new “larrayenv” package which will install larray itself AND all its dependencies (including the optional ones). This means that this version needs to be installed using:

conda install larrayenv

In the future, to update from one version to the next, it should always be enough to do:

conda update larrayenv

New features

  • implemented from_lists() to create constant arrays (instead of using LArray directly as that is very error prone). We are not really happy with its name though, so it might change in the future. Any suggestion of a better name is very welcome (closes issue 30).

    >>> from_lists([['sex\\year', 1991, 1992, 1993],
    ...             [ 'M',           0,    1,    2],
    ...             [ 'F',           3,    4,    5]])
    sex\year | 1991 | 1992 | 1993
           M |    0 |    1 |    2
           F |    3 |    4 |    5
    
  • added support for loading sparse arrays via open_excel().

    For example, assuming you have a sheet like this:

    age | sex\year | 2015 | 2016
     10 |        F |  0.0 |  1.0
     10 |        M |  2.0 |  3.0
     20 |        M |  4.0 |  5.0
    

    loading it will yield:

    >>> wb = open_excel('test_sparse.xlsx')
    >>> arr = wb['Sheet1'].load()
    >>> arr
    age | sex\year | 2015 | 2016
     10 |        F |  0.0 |  1.0
     10 |        M |  2.0 |  3.0
     20 |        F |  nan |  nan
     20 |        M |  4.0 |  5.0
    
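
Loading a sparse sheet, as described above, amounts to building every combination of the index labels and filling the missing ones with nan. A rough sketch in plain Python (not the actual open_excel implementation; keeping labels in order of first appearance is an assumption of this sketch):

```python
from itertools import product
from math import nan

# rows as found in the sheet: (age, sex) -> values for 2015 and 2016
sparse_rows = {
    (10, 'F'): [0.0, 1.0],
    (10, 'M'): [2.0, 3.0],
    (20, 'M'): [4.0, 5.0],
}


def unique_in_order(values):
    """Unique values, kept in order of first appearance."""
    return list(dict.fromkeys(values))


ages = unique_in_order(age for age, _ in sparse_rows)
sexes = unique_in_order(sex for _, sex in sparse_rows)

# every (age, sex) combination gets a row; missing ones are filled with nan
dense_rows = {
    key: sparse_rows.get(key, [nan, nan]) for key in product(ages, sexes)
}

for key, values in dense_rows.items():
    print(key, values)
```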

Miscellaneous improvements

  • allowed to get an axis from an array by using array.axis_name in addition to array.axes.axis_name:

    >>> arr = ndtest((2, 3))
    >>> arr.axes
    AxisCollection([
        Axis('a', ['a0', 'a1']),
        Axis('b', ['b0', 'b1', 'b2'])
    ])
    >>> arr.a
    Axis('a', ['a0', 'a1'])
    
  • viewer: several rows/columns can be plotted together. It draws a separate line for each row except if only one column has been selected.

  • viewer: the array labels are used as “ticks” in plots.

  • ‘_by’ aggregation methods accept groups in addition to axes (closes issue 59). It will keep only the mentioned groups and aggregate all other dimensions:

    >>> arr = ndtest((2, 3, 4))
    >>> arr
     a | b\c | c0 | c1 | c2 | c3
    a0 |  b0 |  0 |  1 |  2 |  3
    a0 |  b1 |  4 |  5 |  6 |  7
    a0 |  b2 |  8 |  9 | 10 | 11
    a1 |  b0 | 12 | 13 | 14 | 15
    a1 |  b1 | 16 | 17 | 18 | 19
    a1 |  b2 | 20 | 21 | 22 | 23
    
    >>> arr.sum_by('c0,c1;c1:c3')
    c | c0,c1 | c1:c3
      |   126 |   216
    
  • viewer: view() and edit() now accept as argument a path to a file containing arrays.

    >>> view('myfile.h5')
    

    this is a shortcut for:

    >>> view(Session('myfile.h5'))
    
  • AxisCollection.without now accepts a single integer position (to exclude an axis by position).

    >>> a = ndtest((2, 3))
    >>> a.axes
    AxisCollection([
        Axis('a', ['a0', 'a1']),
        Axis('b', ['b0', 'b1', 'b2'])
    ])
    >>> a.axes.without(0)
    AxisCollection([
        Axis('b', ['b0', 'b1', 'b2'])
    ])
    
  • nicer display (repr) for LSet (closes issue 44).

    >>> x.b['b0,b2'].set()
    x.b['b0', 'b2'].set()
    
  • implemented sep argument for LArray & AxisCollection.combine_axes() to allow using a custom delimiter (closes issue 53).

  • added a check that ipfp target sums have the expected axes (closes issue 42).

  • when the nb_index argument is not provided explicitly in read_excel(engine='xlrd'), it is autodetected from the position of the first “\” (closes issue 66).

  • allow any special character except “.” and whitespace when creating axes labels using “..” syntax (previously only _ was allowed).

  • added many more I/O tests to hopefully lower our regression rate in the future (closes issue 70).

Fixes

  • viewer: selection of entire rows/columns will load any remaining data, if any (closes issue 37). Previously if you selected entire rows or columns of a large dataset (which is not loaded entirely from the start), it only selected (and thus copied/plotted) the part of the data which was already loaded.
  • viewer: filtering on anonymous axes is now possible (closes issue 33).
  • fixed loading sparse files using read_excel() (fixes issue 29).
  • fixed nb_index argument for read_excel().
  • fixed creating range axes with a negative start bound using string notation (e.g. Axis('name', '-1..10')) (fixes issue 51).
  • fixed ptp() function.
  • fixed with_axes() to copy the title of the array.
  • fixed Group >> 'name'.
  • fixed workbook[sheet_position] when using open_excel().
  • fixed plotting in the viewer when using Qt4.
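
To illustrate the range notation concerned by the negative start bound fix, here is a deliberately simplified expansion of an integer “start..stop” string. The helper expand_int_range is hypothetical; larray's own parser supports more patterns than this sketch:

```python
def expand_int_range(pattern):
    """Expand 'start..stop' into the inclusive list of integers it denotes."""
    start, stop = pattern.split('..')
    return list(range(int(start), int(stop) + 1))


print(expand_int_range('-1..3'))       # [-1, 0, 1, 2, 3]
print(expand_int_range('1995..1997'))  # [1995, 1996, 1997]
```

Both bounds are included, unlike Python's own range().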

Version 0.19

Released on 2017-01-19.

New features

  • Implemented a “by” variant to all aggregate methods (e.g. sum_by, mean_by, etc.). These methods aggregate all axes except those listed, which means the only axes remaining after the aggregate operation will be those listed. For example: arr.sum_by(x.a) is equivalent to arr.sum(arr.axes - x.a)

    >>> arr = ndtest((2, 3, 4))
    >>> arr
     a | b\c | c0 | c1 | c2 | c3
    a0 |  b0 |  0 |  1 |  2 |  3
    a0 |  b1 |  4 |  5 |  6 |  7
    a0 |  b2 |  8 |  9 | 10 | 11
    a1 |  b0 | 12 | 13 | 14 | 15
    a1 |  b1 | 16 | 17 | 18 | 19
    a1 |  b2 | 20 | 21 | 22 | 23
    >>> arr.sum_by(x.b)
    b | b0 | b1 |  b2
      | 60 | 92 | 124
    
  • Added .extend() method to Axis class

    >>> a = Axis('a', 'a0..a2')
    >>> a
    Axis('a', ['a0', 'a1', 'a2'])
    >>> other = Axis('other', 'a3..a5')
    >>> a.extend(other)
    Axis('a', ['a0', 'a1', 'a2', 'a3', 'a4', 'a5'])
    

    or directly specify the extra labels as a list or as a “label string”:

    >>> a.extend('a3..a5')
    Axis('a', ['a0', 'a1', 'a2', 'a3', 'a4', 'a5'])
    
  • Added title argument to all array creation functions (ndrange, zeros, ones, …) and display it in the .info of array objects.

    >>> a = ndrange(3, title='a simple test array')
    >>> a.info
    a simple test array
    3
     {0}* [3]: 0 1 2
    
  • implemented creating an Axis using a group:

    >>> arr = ndtest((2, 3))
    >>> arr
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  3 |  4 |  5
    >>> a, b = arr.axes
    >>> zeros((a, b[:'b1']))
    a\b |  b0 |  b1
     a0 | 0.0 | 0.0
     a1 | 0.0 | 0.0
    
  • made Axis.startswith, .endswith and .matches accept Group instances

    >>> a = Axis('a', 'a0..b2')
    >>> a
    Axis('a', ['a0', 'a1', 'a2', 'b0', 'b1', 'b2'])
    
    >>> prefix = Axis('prefix', 'a,b')
    >>> a.startswith(prefix['a'])
    a['a0', 'a1', 'a2']
    >>> a.startswith(prefix.i[1])
    a['b0', 'b1', 'b2']
    
  • implemented all usual binary operations (+, -, *, /, …) on Group

    >>> year = Axis('year', '2011..2016')
    >>> year[2013] + 1
    2014
    >>> year.i[2] + 1
    2014
    
  • made the viewer much more useful as a debugger in the middle of a function by generalizing SessionEditor to handle any mapping instead of only Session objects, while still listing and displaying only array objects. To view the value of a non-array variable, one should type its name in the console. Given those changes, view() will superficially behave as before but, behind the scenes, all variables which were defined in the scope where view() was called will be available in the viewer console, even though they will not appear in the list on the left. This means that the viewer console will be able to use scalars defined at that point and call other functions of your code. In other words, there is a better chance that you can execute some code from the function calling view() by simply copy-pasting the code line.
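
The semantics of the “by” variants introduced in this version can be sketched in plain Python as follows. The cells dictionary stands in for ndtest((2, 3, 4)) and the sum_by function is a hypothetical stand-in for the method, written only to show which axes are kept:

```python
from itertools import product

axes = ('a', 'b', 'c')
shape = (2, 3, 4)

# same values as ndtest((2, 3, 4)): cells numbered 0..23 in row-major order
cells = {
    index: value
    for value, index in enumerate(product(*(range(n) for n in shape)))
}


def sum_by(cells, axes, keep):
    """Sum over every axis except those named in `keep`."""
    keep_positions = [axes.index(name) for name in keep]
    totals = {}
    for index, value in cells.items():
        key = tuple(index[pos] for pos in keep_positions)
        totals[key] = totals.get(key, 0) + value
    return totals


print(sum_by(cells, axes, ['b']))  # {(0,): 60, (1,): 92, (2,): 124}
```

The totals are the same as those of the arr.sum_by(x.b) example above: only the b axis remains and everything else is aggregated.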

Backward incompatible changes

  • LGroup lost set-like operations (intersection and union) in favor of a specific subclass (LSet). In other words, this no longer works:

    >>> letters = Axis('letters', 'a..z')
    >>> letters[':c'] & letters['b:']
    

    To make it work, we need to convert the LGroup(s) to LSets explicitly:

    >>> letters[':c'].set() & letters['b:d'].set()
    letters.set[OrderedSet(['b', 'c'])]
    
    >>> letters[':c'].set() | letters['b:d'].set()
    letters.set[OrderedSet(['a', 'b', 'c', 'd'])]
    
    >>> letters[':c'].set() - 'b'
    letters.set[OrderedSet(['a', 'c'])]
    
  • group aggregates produce simple string labels for the new aggregated axis instead of using the groups themselves as labels. This means one can no longer know where a group comes from, but this simplifies the code and fixes a few issues, most notably export of aggregated arrays to Excel, and some operations between two aggregated arrays.

    >>> arr = ndtest((3, 4))
    >>> arr
    a\b | b0 | b1 | b2 | b3
     a0 |  0 |  1 |  2 |  3
     a1 |  4 |  5 |  6 |  7
     a2 |  8 |  9 | 10 | 11
    >>> agg = arr.sum(':b2 >> tob2;b2,b3 >> other')
    >>> agg
    a\b | tob2 | other
     a0 |    3 |     5
     a1 |   15 |    13
     a2 |   27 |    21
    >>> agg.info
    3 x 2
     a [3]: 'a0' 'a1' 'a2'
     b [2]: 'tob2' 'other'
    >>> agg.axes.b.labels[0]
    'tob2'
    

    In previous versions this would have returned:

    >>> agg.axes.b.labels[0]
    LGroup(':b2', name='tob2', axis=Axis('b', ['b0', 'b1', 'b2', 'b3']))
    
  • a string containing only a single “integer-like” is no longer transformed to an integer e.g. “10” will evaluate to (the string) “10” (like in version 0.17 and earlier) while “10,20” will evaluate to the list of integers: [10, 20]
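
The set-like behaviour moved to LSet can be pictured with order-preserving operations on plain lists. The helper names below are hypothetical, and the real LSet also keeps track of its axis:

```python
def ordered_intersection(first, second):
    """Labels present in both groups, in the order of `first`."""
    other = set(second)
    return [label for label in first if label in other]


def ordered_union(first, second):
    """Labels of `first` followed by the new labels of `second`, without duplicates."""
    return list(dict.fromkeys([*first, *second]))


def ordered_difference(first, second):
    """Labels of `first` which are not in `second`."""
    other = set(second)
    return [label for label in first if label not in other]


up_to_c = ['a', 'b', 'c']  # labels of letters[':c']
b_to_d = ['b', 'c', 'd']   # labels of letters['b:d']

print(ordered_intersection(up_to_c, b_to_d))  # ['b', 'c']
print(ordered_union(up_to_c, b_to_d))         # ['a', 'b', 'c', 'd']
print(ordered_difference(up_to_c, ['b']))     # ['a', 'c']
```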

Other changes

  • changed how Group instances are displayed.

    >>> a = Axis('a', 'a0..a2')
    >>> a['a1,a2']
    a['a1', 'a2']
    

Fixes

  • fixed > and >= on Group using slices
  • avoid a division by 0 warning when using divnot0
  • viewer: fixed plots when Qt5 is installed. This also removes the matplotlib warning people got when running the viewer with Qt5 installed.
  • viewer: display array when typing its name in the console even when no array was selected previously

Misc

  • misc code cleanup, improved docstrings, …

Version 0.18

Released on 2016-12-20.

Major improvements

  • the documentation (docstrings) of many functions was vastly improved (thanks to Alix)

  • implemented a new optional syntax to generate sequences of labels for axes by using patterns

    integer strings generate integers

    >>> ndrange('age=0..10')
    age | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
        | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
    

    you can combine letters and numbers. The number part is treated as increasing (or decreasing) numbers:

    >>> ndrange('lipro=P01..P12')
    lipro | P01 | P02 | P03 | P04 | P05 | P06 | P07 | P08 | P09 | P10 | P11 | P12
          |   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |  10 |  11
    

    letter patterns generate all combinations of letters between the start and end:

    >>> ndrange('test=AA..CC')
    test | AA | AB | AC | BA | BB | BC | CA | CB | CC
         |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
    

    other characters are left intact (and should be the same in the start and end patterns):

    >>> ndrange('test=A_1..C_2')
    test | A_1 | A_2 | B_1 | B_2 | C_1 | C_2
         |   0 |   1 |   2 |   3 |   4 |   5
    

    this also works within Axis()

    >>> Axis('age', '0..10')
    Axis('age', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    
  • implemented new syntax for defining groups using strings:

    >>> arr = ndtest((3, 4))
    >>> arr
    a\b | b0 | b1 | b2 | b3
     a0 |  0 |  1 |  2 |  3
     a1 |  4 |  5 |  6 |  7
     a2 |  8 |  9 | 10 | 11
    

    groups can now be named using “>>” (instead of “=” previously)

    >>> arr.sum('b1,b3 >> b13;b0:b2 >> b012')
    a\b | b13 | b012
     a0 |   4 |    3
     a1 |  12 |   15
     a2 |  20 |   27
    

    if some labels are ambiguous, one can specify the axis by using “axis_name[labels]”:

    >>> arr.sum('b[b1,b3] >> b13;b[b0:b2] >> b012')
    a\b | b13 | b012
     a0 |   4 |    3
     a1 |  12 |   15
     a2 |  20 |   27
    

    groups can also be defined by position using this syntax:

    >>> arr.sum('b.i[1,3] >> b13;b.i[0:3] >> b012')
    a\b | b13 | b012
     a0 |   4 |    3
     a1 |  12 |   15
     a2 |  20 |   27
    

    A few notes:

    • the goal was to have that syntax as close to the “normal” syntax as possible (just remove the “x.” and all inner quotes).
    • in models, the normal syntax should be preferred, so that the groups can be stored in a variable and reused in several places
    • strings representing integers are evaluated as integers.
    • there is experimental support for evaluating expressions within string groups by using “{expr}”, but this is fragile and might be removed in the future.
  • implemented combine_axes & split_axis on arrays:

    >>> arr = ndtest((2, 3, 4))
    >>> arr
     a | b\c | c0 | c1 | c2 | c3
    a0 |  b0 |  0 |  1 |  2 |  3
    a0 |  b1 |  4 |  5 |  6 |  7
    a0 |  b2 |  8 |  9 | 10 | 11
    a1 |  b0 | 12 | 13 | 14 | 15
    a1 |  b1 | 16 | 17 | 18 | 19
    a1 |  b2 | 20 | 21 | 22 | 23
    
    >>> arr2 = arr.combine_axes((x.a, x.b))
    >>> arr2
    a_b\c | c0 | c1 | c2 | c3
    a0_b0 |  0 |  1 |  2 |  3
    a0_b1 |  4 |  5 |  6 |  7
    a0_b2 |  8 |  9 | 10 | 11
    a1_b0 | 12 | 13 | 14 | 15
    a1_b1 | 16 | 17 | 18 | 19
    a1_b2 | 20 | 21 | 22 | 23
    
    >>> arr2.split_axis(x.a_b)
     a | b\c | c0 | c1 | c2 | c3
    a0 |  b0 |  0 |  1 |  2 |  3
    a0 |  b1 |  4 |  5 |  6 |  7
    a0 |  b2 |  8 |  9 | 10 | 11
    a1 |  b0 | 12 | 13 | 14 | 15
    a1 |  b1 | 16 | 17 | 18 | 19
    a1 |  b2 | 20 | 21 | 22 | 23
    
  • implemented .by() method on groups which splits them into subgroups of specified length

    >>> arr = ndtest((5, 2))
    >>> arr
    a\b | b0 | b1
     a0 |  0 |  1
     a1 |  2 |  3
     a2 |  4 |  5
     a3 |  6 |  7
     a4 |  8 |  9
    
    >>> arr.sum(a['a0':'a4'].by(2))
             a\b | b0 | b1
    a['a0' 'a1'] |  2 |  4
    a['a2' 'a3'] | 10 | 12
         a['a4'] |  8 |  9
    

    there is also an optional second argument to specify the “step” between groups

    >>> arr.sum(a['a0':'a4'].by(2, step=3))
             a\b | b0 | b1
    a['a0' 'a1'] |  2 |  4
    a['a3' 'a4'] | 14 | 16
    

    if the step is < the group size, you get overlapping groups:

    >>> arr.sum(a['a0':'a4'].by(2, step=1))
             a\b | b0 | b1
    a['a0' 'a1'] |  2 |  4
    a['a1' 'a2'] |  6 |  8
    a['a2' 'a3'] | 10 | 12
    a['a3' 'a4'] | 14 | 16
         a['a4'] |  8 |  9
    
  • groups can be renamed using >> (in addition to the “named” method)

    >>> arr = ndtest((2, 3))
    >>> arr
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  3 |  4 |  5
    >>> arr.sum((x.b['b0,b1'] >> 'b01', x.b['b1,b2'] >> 'b12'))
    a\b | b01 | b12
     a0 |   1 |   3
     a1 |   7 |   9
    
  • implemented rationot0

    >>> a = Axis('a', 'a0,a1')
    >>> b = Axis('b', 'b0,b1,b2')
    >>> arr = LArray([[6, 0, 2],
    ...               [4, 0, 8]], [a, b])
    >>> arr
    a\b | b0 | b1 | b2
     a0 |  6 |  0 |  2
     a1 |  4 |  0 |  8
    >>> arr.sum()
    20
    >>> arr.rationot0()
    a\b |  b0 |  b1 |  b2
     a0 | 0.3 | 0.0 | 0.1
     a1 | 0.2 | 0.0 | 0.4
    >>> arr.rationot0(x.a)
    a\b |  b0 |  b1 |  b2
     a0 | 0.6 | 0.0 | 0.2
     a1 | 0.4 | 0.0 | 0.8
    

    for reference, the normal ratio method would return:

    >>> arr.ratio(x.a)
    a\b |  b0 |  b1 |  b2
     a0 | 0.6 | nan | 0.2
     a1 | 0.4 | nan | 0.8
    
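
What rationot0 computes along one axis can be sketched as follows, using the columns of the example array. The ratio_not0 helper is hypothetical; the only assumption is that a zero total yields 0.0 instead of nan, as in the output shown above:

```python
def ratio_not0(column):
    """Share of each value in the column total, using 0.0 when the total is 0."""
    total = sum(column)
    return [value / total if total != 0 else 0.0 for value in column]


# columns b0, b1 and b2 of the example array (values for a0 and a1)
columns = {'b0': [6, 4], 'b1': [0, 0], 'b2': [2, 8]}

for name, column in columns.items():
    print(name, ratio_not0(column))
```

The b1 column, whose total is 0, is where ratio and rationot0 differ: the plain division gives nan while the guarded one gives 0.0.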

Misc improvements

  • implemented [] on groups so that you can further subset them
  • added a new “condensed” option for ipfp’s display_progress argument to get back the old behavior
  • changed how named groups are displayed (only the name is displayed)
  • positional groups gained a few features and are almost on par with label groups now
  • when iterating over an axis (for example when doing “for y in year_axis:”), it yields groups (instead of raw labels) so that it works even in the presence of ambiguous labels.
  • Axis.startswith, endswith, matches create groups which include the axis (so that those groups work even if the labels exist on several axes)

Bug fixes

  • fixed Session.summary() when arrays in the session have axes without name
  • fixed full() and full_like() with an explicit dtype (the dtype was ignored)

Version 0.17

Released on 2016-11-29.

Core

  • added ndtest function to create n-dimensional test arrays (of given shape). Axes are named by single letters starting from ‘a’. Axes labels are constructed using a ‘{axis_name}{label_pos}’ pattern (e.g. ‘a0’).

    >>> ndtest(6)
    a | a0 | a1 | a2 | a3 | a4 | a5
      |  0 |  1 |  2 |  3 |  4 |  5
    >>> ndtest((2, 3))
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  3 |  4 |  5
    >>> ndtest((2, 3), label_start=1)
    a\b | b1 | b2 | b3
     a1 |  0 |  1 |  2
     a2 |  3 |  4 |  5
    
  • allow naming “one-shot” groups in group aggregates.

    >>> arr = ndtest((2, 3))
    >>> arr
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  3 |  4 |  5
    >>> arr.sum('g1=b0;g2=b1,b2;g3=b0:b2')
    a\b | 'g1' ('b0') | 'g2' (['b1' 'b2']) | 'g3' ('b0':'b2')
     a0 |           0 |                  3 |                3
     a1 |           3 |                  9 |               12
    
  • implemented argmin, argmax, posargmin, posargmax without an axis argument (works on the full array).

    >>> arr = ndtest((2, 3))
    >>> arr
    a\b | b0 | b1 | b2
     a0 |  0 |  1 |  2
     a1 |  3 |  4 |  5
    >>> arr.argmin()
    ('a0', 'b0')
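
A rough plain-Python picture of what an argmin or argmax over the full array returns: the tuple of labels (one per axis) of the extreme cell. The dictionary layout below is an assumption made for illustration, not how larray stores data.

```python
from itertools import product

a_labels = ['a0', 'a1']
b_labels = ['b0', 'b1', 'b2']
# map each (a, b) label pair to its value, numbered like ndtest((2, 3))
cells = {key: pos for pos, key in enumerate(product(a_labels, b_labels))}

# the key of the smallest / largest value is a tuple with one label per axis
smallest = min(cells, key=cells.get)
largest = max(cells, key=cells.get)
```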
    
  • added preliminary code to add a title attribute to LArray.

    This needs a lot more work to be really useful though, as it can currently only be used in the LArray() function itself and is only used in Session.summary() (see below). There are many places where this should be used, but this is not done yet.

  • added Session.summary() which displays a list of all arrays, their dimension names and title if any.

    This can be used in combination with local_arrays() to produce some kind of codebook with all the arrays of a function.

    >>> arr = LArray([[1, 2], [3, 4]], 'sex=M,F;nat=BE,FO', title='a test array')
    >>> arr
    sex\nat | BE | FO
          M |  1 |  2
          F |  3 |  4
    >>> s = Session({'arr': arr})
    >>> s
    Session(arr)
    >>> print(s.summary())
    arr: sex, nat
        a test array
    
  • fixed using groups from other (compatible) axis

  • fixed group aggregates using groups without axis

  • fixed axis[another_label_group] when said group had a non-string Axis

  • fixed axis.group(another_label_group, name='a_name') (name was not set correctly)

  • fixed ipfp progress message when progress is negative

viewer

  • when setting part of an array in the console (by using e.g. arr['M'] = 10), display that array
  • when typing in the console the name of an existing array, select it in the list
  • fixed missing tooltips for arrays added to the session from within the session viewer
  • fixed window title (with axes info) not updating in many cases
  • fixed the filters bar not being cleared when displaying a non-LArray object after an LArray object

misc

  • improved messages in ipfp(display_progress=True)
  • improved tests, docstrings, …

Version 0.16.1

Released on 2016-11-04.

Viewer

  • renamed “Ok” button in array/session viewer to “Close”.
  • added apply and discard buttons in session editor, which permanently apply or discard changes to the current array.

Core

  • fixed array[sequence, scalar] = value
  • fixed array.to_excel() which was broken in 0.16 (by the upgrade to xlwings 0.9+).
  • improved a few tests

Version 0.16

Released on 2016-10-26.

Warning: this release needs to be installed using:

conda update larray
conda update xlwings

New features

  • implemented support for xlwings 0.9+. This allowed us to change the way we interact with Excel:

    • by default, the Excel instance we use is configured to be both hidden and silent (for example, it does not prompt to update/edit links).

    • by default, we now use a dedicated Excel instance for each call to open_excel, instead of reusing any existing instance if there was any open. In practice, it means input/output from/to Excel is more reliable and does not risk altering any workbook you had open (except if you ask for that explicitly). The cost of this is that it is slower by default. If you open many different workbooks, it is recommended that you create a single Excel instance and reuse it. This can be done with:

      >>> from larray import *
      >>> import xlwings as xw
      
      >>> app = xw.App(visible=False, add_book=False)
      >>> wb1 = open_excel('workbook1.xlsx', app=app)
      # use wb1 as before
      >>> wb1.close()
      >>> wb2 = open_excel('workbook2.xlsx', app=app)
      # use wb2 as before
      >>> wb2.close()
      >>> app.quit()
      
  • added ipfp function which does Iterative Proportional Fitting Procedure (also known as bi-proportional fitting in statistics or RAS algorithm in economics). Note that this new function is currently not in the core module, so it needs a specific import command:

    >>> from larray.ipfp import ipfp
    
    >>> a = Axis('a', 2)
    >>> b = Axis('b', 2)
    >>> initial = LArray([[2, 1],
    ...                   [1, 2]], [a, b])
    >>> initial
    a*\b* | 0 | 1
        0 | 2 | 1
        1 | 1 | 2
    >>> target_sum_along_a = LArray([2, 1], b)
    >>> target_sum_along_b = LArray([1, 2], a)
    >>> ipfp([target_sum_along_a, target_sum_along_b], initial, threshold=0.01)
    a*\b* |                  0 |                   1
        0 | 0.8450704225352113 | 0.15492957746478875
        1 | 1.1538461538461537 |  0.8461538461538463
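
For readers unfamiliar with the procedure, here is a minimal pure-Python sketch of iterative proportional fitting on a 2D table. It is not the larray implementation: the function name `ipfp_2d`, its parameters and its stopping rule are assumptions made for this illustration. It alternately rescales columns and rows until the sums match the targets.

```python
def ipfp_2d(initial, col_targets, row_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2D table until its column and row sums match the targets."""
    cells = [list(row) for row in initial]
    for _ in range(max_iter):
        # step 1: scale each column so that it sums to its target
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            for row in cells:
                row[j] *= target / col_sum
        # step 2: scale each row so that it sums to its target
        for i, target in enumerate(row_targets):
            row_sum = sum(cells[i])
            cells[i] = [value * target / row_sum for value in cells[i]]
        # row sums match after step 2, so only the columns need checking
        max_error = max(abs(sum(row[j] for row in cells) - target)
                        for j, target in enumerate(col_targets))
        if max_error < tol:
            break
    return cells


# same inputs as above: sums along a are [2, 1], sums along b are [1, 2]
fitted = ipfp_2d([[2, 1], [1, 2]], col_targets=[2, 1], row_targets=[1, 2])
```

With a much tighter tolerance than the threshold=0.01 used above, the sketch ends up close to, but not identical to, the values displayed by ipfp.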
    
  • made it possible to create arrays more succinctly in some usual cases (especially for quick arrays for testing purposes). Previously, when creating an array from scratch, one had to provide Axis object(s) (or another array). Note that the following examples use zeros() but this change affects all array creation functions (ones, zeros, ndrange, full, empty):

    >>> nat = Axis('nat', ['BE', 'FO'])
    >>> sex = Axis('sex', ['M', 'F'])
    >>> zeros([nat, sex])
    nat\sex |   M |   F
         BE | 0.0 | 0.0
         FO | 0.0 | 0.0
    

    Now, when you have axis names and axis labels but do not have/want to reuse an existing axis, you can use this syntax:

    >>> zeros([('nat', ['BE', 'FO']),
    ...        ('sex', ['M', 'F'])])
    nat\sex |   M |   F
         BE | 0.0 | 0.0
         FO | 0.0 | 0.0
    

    If additionally all axis names and labels are strings (not integers or other types) which do not contain any special character (“=”, “,” or “;”) you can use:

    >>> zeros('nat=BE,FO;sex=M,F')
    nat\sex |   M |   F
         BE | 0.0 | 0.0
         FO | 0.0 | 0.0
    

    See below (*) for some more alternate syntaxes and an explanation of how this works.

  • added additional, less error-prone syntax for stack:

    >>> nat = Axis('nat', 'BE,FO')
    >>> arr1 = ones(nat)
    >>> arr1
    nat |  BE |  FO
        | 1.0 | 1.0
    >>> arr2 = zeros(nat)
    >>> arr2
    nat |  BE |  FO
        | 0.0 | 0.0
    >>> stack([('M', arr1), ('F', arr2)], 'sex')
    nat\sex |   M |   F
         BE | 1.0 | 0.0
         FO | 1.0 | 0.0
    

    in addition to the still supported but discouraged (because one has to remember the order of labels):

    >>> sex = Axis('sex', ['M', 'F'])
    >>> stack((arr1, arr2), sex)
    nat\sex |   M |   F
         BE | 1.0 | 0.0
         FO | 1.0 | 0.0
    
  • added LArray.compact() and Session.compact() to detect and remove “useless” axes (i.e. axes for which values are constant over the whole axis)

    >>> a = LArray([[1, 2], [1, 2]], [Axis('sex', 'M,F'), Axis('nat', 'BE,FO')])
    >>> a
    sex\nat | BE | FO
          M |  1 |  2
          F |  1 |  2
    >>> a.compact()
    nat | BE | FO
        |  1 |  2
    
  • made Session keep the order in which arrays were added to it. The main goal was to make this work:

    >>> b, a = s['b', 'a']
    

    Previously, since sessions were always traversed alphabetically, this was a dangerous operation because if the keys (a and b) were not sorted alphabetically, the result would not be in the expected order:

    s['b', 'a'] previously returned a, b instead of b, a!

    Session.names is still sorted alphabetically though (Session.keys() is not)

  • added LArray.with_axes(axes) to return a new LArray with the same data but different axes

    >>> a = ndrange(2)
    >>> a
    {0}* | 0 | 1
         | 0 | 1
    >>> a.with_axes([Axis('sex', 'H,F')])
    sex | H | F
        | 0 | 1
    
  • changed width from which an LArray is summarized (using “…”) from 80 characters to 200.

  • implemented memory_used property which displays nbytes in human-readable form

    >>> a = ndrange('sex=H,F;nat=BE,FO')
    >>> a.memory_used
    '16 bytes'
    >>> a = ndrange(100000)
    >>> a.memory_used
    '390.62 Kb'
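
The human-readable form can be approximated as follows. This is a sketch only: the function name `human_readable_size` and the unit list are assumptions, and the figures above imply 4-byte integers (4 cells give 16 bytes, 100000 cells give 400000 bytes).

```python
def human_readable_size(nbytes):
    """Format a byte count using binary (1024-based) units."""
    units = ['bytes', 'Kb', 'Mb', 'Gb', 'Tb']
    size = float(nbytes)
    for unit in units:
        # stop at the first unit in which the size is below 1024
        if size < 1024 or unit == units[-1]:
            break
        size /= 1024
    if unit == 'bytes':
        return '%d bytes' % nbytes
    return '%.2f %s' % (size, unit)


small = human_readable_size(4 * 4)        # 4 cells of 4 bytes each
large = human_readable_size(100000 * 4)   # 100000 cells of 4 bytes each
```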
    
  • implemented Axis + AxisCollection

    >>> a = ndrange('sex=M,F;type=t1,t2')
    >>> Axis('nat', 'BE,FO') + a.axes
    AxisCollection([
        Axis('nat', ['BE', 'FO']),
        Axis('sex', ['M', 'F']),
        Axis('type', ['t1', 't2'])
    ])
    

(*) For the curious, there are also many syntaxes supported for array creation functions. In fact, during array creation, at any place a list or tuple of values is expected, you can specify it using a single string, which will be split successively at the following characters if present: “;” then “=” then “,”. If you apply that algorithm to ‘nat=BE,FO;sex=M,F’, you get:

  1. ‘nat=BE,FO;sex=M,F’
  2. (‘nat=BE,FO’, ‘sex=M,F’)
  3. ((‘nat’, ‘BE,FO’), (‘sex’, ‘M,F’))
  4. ((‘nat’, (‘BE’, ‘FO’)), (‘sex’, (‘M’, ‘F’)))

Recognise this last syntax? This is the same as above, except above we replaced some () with [] for clarity. In fact all the intermediate forms here above are valid (and equivalent) in array creation functions.
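
The successive splitting described above can be reproduced with a few lines of plain Python. The function name `split_axes_spec` is hypothetical, and the sketch only handles the full 'name=labels;name=labels' form, not the other intermediate forms larray accepts.

```python
def split_axes_spec(spec):
    """Split 'nat=BE,FO;sex=M,F' at ';', then '=', then ','."""
    axes = []
    for axis_spec in spec.split(';'):        # one chunk per axis
        name, labels = axis_spec.split('=')  # axis name vs. its labels
        axes.append((name, tuple(labels.split(','))))
    return tuple(axes)


parsed = split_axes_spec('nat=BE,FO;sex=M,F')
```

The result is the nested tuple shown in step 4.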

Version 0.15

Released on 2016-09-23.

Core

  • added new methods on axes: matches, startswith, endswith

    >>> country = Axis('country', ['FR', 'BE', 'DE', 'BR'])
    >>> country.matches('BE|FR')
    LGroup(['FR', 'BE'])
    >>> country.matches('^..$') # labels 2 characters long
    LGroup(['FR', 'BE', 'DE', 'BR'])
    
    >>> country.startswith('B')
    LGroup(['BE', 'BR'])
    >>> country.endswith('R')
    LGroup(['FR', 'BR'])
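
In plain Python terms, these methods behave roughly like filters over the axis labels. The sketch below assumes that matches anchors the pattern at the start of each label (as re.match does), which is enough to reproduce the results above; the actual larray implementation may differ.

```python
import re

labels = ['FR', 'BE', 'DE', 'BR']


def matches(labels, pattern):
    """Keep the labels matching a regular expression, in axis order."""
    regex = re.compile(pattern)
    return [label for label in labels if regex.match(label)]


be_or_fr = matches(labels, 'BE|FR')
two_chars = matches(labels, '^..$')
starts_with_b = [label for label in labels if label.startswith('B')]
ends_with_r = [label for label in labels if label.endswith('R')]
```

Note that the labels come back in axis order, not in the order in which they appear in the pattern.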
    
  • implemented set-like operations on LGroup: & (intersection), | (union), - (difference). Slice groups do not work yet on axes references (x.) but that will come in the future…

    >>> alpha = Axis('alpha', 'a,b,c,d')
    >>> alpha['a', 'b'] | alpha['c', 'd']
    LGroup(['a', 'b', 'c', 'd'], axis=…)
    >>> alpha['a', 'b', 'c'] | alpha['c', 'd']
    LGroup(['a', 'b', 'c', 'd'], axis=…)
    

    a name is computed automatically when both operands are named

    >>> r = alpha['a', 'b'].named('ab') | alpha['c', 'd'].named('cd')
    >>> r.name
    'ab | cd'
    >>> r.key
    ['a', 'b', 'c', 'd']
    

    numeric axes work too

    >>> num = Axis('num', range(10))
    >>> num[:2] | num[8:]
    num[0, 1, 2, 8, 9]
    >>> num[:2] | num[5]
    num[0, 1, 2, 5]
    

    intersection

    >>> LGroup(['a', 'b', 'c']) & LGroup(['c', 'd'])
    LGroup(['c'])
    

    difference

    >>> LGroup(['a', 'b', 'c']) - LGroup(['c', 'd'])
    LGroup(['a', 'b'])
    >>> LGroup(['a', 'b', 'c']) - 'b'
    LGroup(['a', 'c'])
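
Unlike Python sets, groups keep their labels in order. A minimal sketch of order-preserving set operations on plain lists follows; the function names are illustrative and are not larray's API.

```python
def union(left, right):
    """Labels of left, followed by the labels of right not already present."""
    return list(left) + [label for label in right if label not in left]


def intersection(left, right):
    """Labels of left which are also in right."""
    return [label for label in left if label in right]


def difference(left, right):
    """Labels of left which are not in right."""
    return [label for label in left if label not in right]


abcd = union(['a', 'b', 'c'], ['c', 'd'])
only_c = intersection(['a', 'b', 'c'], ['c', 'd'])
ab = difference(['a', 'b', 'c'], ['c', 'd'])
```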
    
  • fixed loading 1D arrays using open_excel

Viewer

  • added tooltip with the axes labels corresponding to each cell of the array viewer

  • added name and dimensions of the current array to the window title bar in the session viewer

  • added tooltip with each array .info() in the list of arrays of the session viewer

  • fixed eval box throwing an exception when trying to set a new variable (if qtconsole is not present)

  • fixed group aggregates using LGroups defined using axes references (x.), for example:

    >>> arr.sum(x.age[:10])
    
  • fixed group aggregates using anonymous axes

Version 0.14.1

Released on 2016-08-12.

Fixes

  • fixed support for loading arrays without axis names from Excel files (in that case index_col/nb_index are necessary)
  • fixed using a single int for index_col in read_excel() and sheet.load()
  • fixed loading empty Excel sheets via xlwings correctly (ie do not crash)
  • fixed dumping a session loaded from an H5 file to Excel

Version 0.14

Released on 2016-08-10.

Important warning

This version is not compatible with the new version of xlwings that just came out. Consequently, upgrading to this version is different from the usual “conda update larray”. You should rather use:

conda update larray --no-update-deps

To get the most of this release, you should also install the “qtconsole” package via:

conda install qtconsole

Viewer

  • upgraded session viewer/editor to work like a super-calculator. The input box below the array view can be used to type any expression, e.g. array1.sum(x.age) / array2, whose result will be displayed in the viewer. One can also type assignment commands, like array3 = array1.sum(x.age) / array2, in which case the new array will be displayed in the viewer AND added to the session (it appears in the list on the left), so that you can use it in other expressions.

    If you have the “qtconsole” package installed (see above), that input box will be a full ipython console. This means:
    • history of typed commands,
    • tab-completion (for example, type “nd<tab>” and it will change to “ndrange”),
    • syntax highlighting,
    • calltips (show the documentation of functions when typing commands using them),
    • help on functions using “?”. For example, type “ndrange?<enter>” to get the full documentation about ndrange (use <ESC> or <q> to quit that screen),
    • etc.

    When having the “qtconsole” package installed, you might get a warning when starting the viewer:

    WARNING:root:Message signing is disabled.  This is insecure and not recommended!
    

    This is totally harmless and can be safely ignored !

  • made view() and edit() without argument equivalent to view(local_arrays()) and edit(local_arrays()) respectively.

  • made the viewer on large arrays start a lot faster by using a small subset of the array to guess the number of decimals to display and whether or not to use scientific notation.

  • improved compare():
    • added support for comparing sessions. Arrays with differences between sessions are colored in red.

    • use a single array widget instead of 3. This is done by stacking arrays together to create a new dimension. This has the following advantages:

      • the filter and scrollbars are de-facto automatically synchronized.
      • any number of arrays can be compared, not just 2. All arrays are compared to the first one.
      • arrays with different sets of compatible axes can be compared (eg compare an array with its mean along an axis).
    • added label to show maximum absolute difference.

  • implemented edit(session) in addition to view(session).

Excel support

  • added support for copying sheets via wb['x'] = wb['y']. If the 'x' sheet already exists, it is completely overwritten.

Core

  • improved performance. My test models run about 10% faster than with 0.13.

  • made cumsum and cumprod aggregate on the last axis by default so that the axis does not need to be specified when there is only one.

  • implemented much better support for operations using arrays of different types. For example,

    • fixed create_sequential when mult, inc and initial are of different types eg create_sequential(…, initial=1, inc=0.1) had an unexpected integer result because it always used the type of the initial value for the output
    • when appending a string label to an integer axis (eg adding total to an age axis by using with_total()), the resulting axis now has a mixed type instead of suddenly becoming all strings.
    • stack() now supports arrays with different types.
  • made stack support arrays with different axes (the result has the union of all axes)

For completeness

  • use xlwings (ie live Excel instance) by default for all Excel input/output, including read_excel(), session.dump and session.load/Session(filename). This has the advantage of more coherent results among the different ways to load/save data to Excel and that simple sessions correctly survive a round-trip to an .xlsx workbook (ie (named) axes are detected properly). However, given the very different library involved, we lose most options that read_excel used to provide (courtesy of pandas.read_excel) and some bugs were probably introduced in the conversion.
  • fixed creating a new file via open_excel()
  • fixed loading 1D arrays (ranges with height 1 or width 1) via open_excel()
  • fixed sheet['A1'] = array in some cases
  • wb.close() only really closes the workbook if it was not already open in Excel when open_excel was called (so that we do not close a workbook a user is actually viewing).
  • added support for wb.save(filename), or actually for using any relative path, instead of a full absolute path.
  • when dumping a session to Excel, sort sheets alphabetically instead of dumping them in a “random” order.
  • try to convert float to int in more situations
  • added support for using stack() without providing an axis. It creates an anonymous wildcard axis of the correct length.
  • added aslarray() top-level function to translate anything into an LArray if it is not already one
  • made labels_array available via from larray import *
  • fixed binary operations between an array and an axis where the array appeared first (eg array > axis). Confusingly, axis < array already worked.
  • added check in “a[bool_larray_key]” to make sure key.axes are compatible with a.axes
  • made create_sequential a lot faster when mult or inc are constants
  • made axes without name compatible with any name (this is the equivalent of a wildcard name for labels)
  • misc cleanup/docstring improvements/improved tests/improved error messages

Version 0.13

Released on 2016-07-11.

New features

  • implemented a new way to do input/output from/to Excel

    >>> a = ndrange((2, 3))
    >>> wb = open_excel('c:/tmp/y.xlsx')
    # put a at A1 in Sheet1, excluding headers (labels)
    >>> wb['Sheet1'] = a
    # dump a at A1 in Sheet2, including headers (labels)
    >>> wb['Sheet2'] = a.dump()
    # save the file to disk
    >>> wb.save()
    # close it
    >>> wb.close()
    
    >>> wb = open_excel('c:/tmp/y.xlsx')
    # load a from the data starting at A1 in Sheet1, assuming the absence of headers.
    >>> a1 = wb['Sheet1']
    # load a from the data starting at A1 in Sheet2, assuming the presence of (correctly formatted) headers.
    >>> a2 = wb['Sheet2'].load()
    >>> wb.close()
    
    >>> wb = open_excel('c:/tmp/y.xlsx')
    # note that Sheet2 must exist
    >>> sheet2 = wb['Sheet2']
    # write a without labels starting at C5
    >>> sheet2['C5'] = a
    # write a with its labels starting at A10
    >>> sheet2['A10'] = a.dump()
    

    load an array with its axes information from a range. As you might have guessed, we could also use the sheet2 variable here

    >>> b = wb['Sheet2']['A10:D12'].load()
    >>> b
    {0}*\{1}* | 0 | 1 | 2
            0 | 0 | 1 | 2
            1 | 3 | 4 | 5
    

    load an array (raw data) with no axis information from a range.

    >>> c = sheet2['B11:D12']
    >>> # in fact, this is not really an LArray ...
    >>> c
    <larray.excel.Range at 0x1ff1bae22e8>
    >>> # but it can be used as such (this is currently very experimental)
    >>> c.sum(axis=0)
    {0}* |   0 |   1 |   2
         | 3.0 | 5.0 | 7.0
    >>> # ... and it can be used for other stuff, like setting the formula instead of the value:
    >>> c.formula = '=D10+1'
    >>> # in the future, we should also be able to set font name, size, style, etc.
    
  • implemented LArray.rename({axis: new_name}) as well as using kwargs to rename several axes at once

    >>> nat = Axis('nat', ['BE', 'FO'])
    >>> sex = Axis('sex', ['M', 'F'])
    >>> a = ndrange([nat, sex])
    >>> a
    nat\sex | M | F
         BE | 0 | 1
         FO | 2 | 3
    >>> a.rename(nat='nat2', sex='gender')
    nat2\gender | M | F
             BE | 0 | 1
             FO | 2 | 3
    >>> a.rename({'nat': 'nat2', 'sex': 'gender'})
    nat2\gender | M | F
             BE | 0 | 1
             FO | 2 | 3
    
  • made tab-completion of axes names possible in an interactive console

For completeness

  • taking a subset of an array with wildcard axes now returns an array with wildcard axes
  • fixed a case where wildcard axes were considered incompatible when they actually were compatible
  • better support for anonymous axes
  • fix for obscure bugs, better doctests, cleaner implementation for a few functions, …

Version 0.12

Released on 2016-06-21.

New features

  • implemented boolean indexing by using axes objects:

    >>> sex = Axis('sex', 'M,F')
    >>> age = Axis('age', range(5))
    >>> a = ndrange((sex, age))
    >>> a
    sex\age | 0 | 1 | 2 | 3 | 4
          M | 0 | 1 | 2 | 3 | 4
          F | 5 | 6 | 7 | 8 | 9
    
    >>> a[age < 3]
    sex\age | 0 | 1 | 2
          M | 0 | 1 | 2
          F | 5 | 6 | 7
    

    This new syntax is equivalent to (but currently much slower than):

    >>> a[age[:2]]
    sex\age | 0 | 1 | 2
          M | 0 | 1 | 2
          F | 5 | 6 | 7
    

    However, the power of this new syntax comes from the fact that you are not limited to scalar constants

    >>> age_limit = LArray([2, 3], sex)
    >>> age_limit
    sex | M | F
        | 2 | 3
    
    >>> a[age < age_limit]
    sex,age | M,0 | M,1 | F,0 | F,1 | F,2
            |   0 |   1 |   5 |   6 |   7
    

    Notice that the concerned axes are merged, so you cannot do as much with them. For example, a[age < age_limit].sum(x.age) would not work since there is no “age” axis anymore.

    To keep axes intact, one can often set the values of the corresponding cells to 0 or nan instead.

    >>> a[age < age_limit] = 0
    >>> a
    sex\age | 0 | 1 | 2 | 3 | 4
          M | 0 | 0 | 2 | 3 | 4
          F | 0 | 0 | 0 | 8 | 9
    >>> # in this case, the sum *is* valid (but the mean would not -- one should use nan for that)
    >>> a.sum(x.age)
    sex | M |  F
        | 9 | 17
    

    To keep axes intact, this idiom is also often useful:

    >>> b = a * (age >= age_limit)
    >>> b
    sex\age | 0 | 1 | 2 | 3 | 4
          M | 0 | 0 | 2 | 3 | 4
          F | 0 | 0 | 0 | 8 | 9
    

    This also works with axes references (x.axis_name), though this is experimental and the filter value is only computed as late as possible (during []), so you cannot display it before that, like you can with “real” axes.

    Using “real” axes:

    >>> filter1 = age < age_limit
    >>> filter1
    age\sex |     M |     F
          0 |  True |  True
          1 |  True |  True
          2 | False |  True
          3 | False | False
          4 | False | False
    >>> a[filter1]
    sex,age | M,0 | M,1 | F,0 | F,1 | F,2
            |   0 |   1 |   5 |   6 |   7
    

    With axes references:

    >>> filter2 = x.age < age_limit
    >>> filter2
    <larray.core.BinaryOp at 0x1332ae3b588>
    >>> a[filter2]
    sex,age | M,0 | M,1 | F,0 | F,1 | F,2
            |   0 |   1 |   5 |   6 |   7
    >>> a * ~filter2
    sex\age | 0 | 1 | 2 | 3 | 4
          M | 0 | 0 | 2 | 3 | 4
          F | 0 | 0 | 0 | 8 | 9
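
The two behaviours described in this entry (selection, which merges the axes, versus masking, which keeps them) can be mimicked with plain dictionaries. This is only a conceptual sketch: storing cells in a dict keyed by (sex, age) is an assumption made for illustration.

```python
sexes = ['M', 'F']
ages = range(5)
# values numbered like ndrange((sex, age))
a = {(sex, age): i * len(ages) + age
     for i, sex in enumerate(sexes) for age in ages}
age_limit = {'M': 2, 'F': 3}

# selection: only the (sex, age) points passing the filter remain, so the
# result is a flat collection of points without a separate age axis
selected = {key: value for key, value in a.items()
            if key[1] < age_limit[key[0]]}

# masking: every cell is kept, filtered cells are set to 0, so aggregating
# along age is still possible
masked = {key: (0 if key[1] < age_limit[key[0]] else value)
          for key, value in a.items()}
sum_by_sex = {sex: sum(masked[sex, age] for age in ages) for sex in sexes}
```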
    
  • implemented LArray.divnot0

    >>> nat = Axis('nat', ['BE', 'FO'])
    >>> sex = Axis('sex', ['M', 'F'])
    >>> a = ndrange((nat, sex))
    >>> a
    nat\sex | M | F
         BE | 0 | 1
         FO | 2 | 3
    >>> b = ndrange(sex)
    >>> b
    sex | M | F
        | 0 | 1
    >>> a / b
    nat\sex |   M |   F
         BE | nan | 1.0
         FO | inf | 3.0
    >>> a.divnot0(b)
    nat\sex |   M |   F
         BE | 0.0 | 1.0
         FO | 0.0 | 3.0
    
  • implemented .named() on groups to name groups after the fact

    >>> a = ndrange(Axis('age', range(100)))
    >>> a
    age | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99
        | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99
    >>> a.sum((x.age[10:19].named('teens'), x.age[20:29].named('twenties')))
    age | 'teens' (10:19) | 'twenties' (20:29)
        |             145 |                245
    
  • made all array creation functions (ndrange, zeros, ones, full, LArray, …) more flexible:

    They accept a single Axis argument instead of requiring a tuple/list of them

    >>> sex = Axis('sex', 'M,F')
    >>> a = ndrange(sex)
    >>> a
    sex | M | F
        | 0 | 1
    

    Shortcut definition for axes work

    >>> ndrange("a,b,c")
    {0} | a | b | c
        | 0 | 1 | 2
    >>> ndrange(["1:3", "d,e"])
    {0}\{1} | d | e
          1 | 0 | 1
          2 | 2 | 3
          3 | 4 | 5
    >>> LArray([1, 5, 7], "a,b,c")
    {0} | a | b | c
        | 1 | 5 | 7
    

    One can mix Axis objects and ints (for axes without labels)

    >>> sex = Axis('sex', 'M,F')
    >>> ndrange([sex, 3])
    sex\{1}* | 0 | 1 | 2
           M | 0 | 1 | 2
           F | 3 | 4 | 5
    
  • made it possible to iterate on labels of a group (eg a slice of an Axis):

    >>> for year in a.axes.year[2010:]:
    ...     # do stuff
    
  • changed representation of anonymous axes from “axisN” (where N is the position of the axis) to “{N}”. The problem was that “axisN” was not recognizable enough as an anonymous axis, and it was thus misleading. For example “a[x.axis0[…]]” would not work.

  • better overall support for arrays with anonymous axes or several axes with the same name

  • fixed all output functions (to_csv, to_excel, to_hdf, …) when the last axis has no name but other axes have one

  • implemented eye() which creates 2D arrays with ones on the diagonal and zeros elsewhere.

    >>> eye(sex)
    sex\sex |   M |   F
          M | 1.0 | 0.0
          F | 0.0 | 1.0
    
  • implemented the @ operator to do matrix multiplication (Python3.5+ only)

  • implemented inverse() to return the (matrix) inverse of a (square) 2D array

    >>> a = eye(sex) * 2
    >>> a
    sex\sex |   M |   F
          M | 2.0 | 0.0
          F | 0.0 | 2.0
    
    >>> a @ inverse(a)
    sex\sex |   M |   F
          M | 1.0 | 0.0
          F | 0.0 | 1.0
    
  • implemented diag() to extract a diagonal or construct a diagonal array.

    >>> nat = Axis('nat', ['BE', 'FO'])
    >>> sex = Axis('sex', ['M', 'F'])
    >>> a = ndrange([nat, sex], start=1)
    >>> a
    nat\sex | M | F
         BE | 1 | 2
         FO | 3 | 4
    >>> d = diag(a)
    >>> d
    nat,sex | BE,M | FO,F
            |    1 |    4
    >>> diag(d)
    nat\sex | M | F
         BE | 1 | 0
         FO | 0 | 4
    >>> a = ndrange(sex, start=1)
    >>> a
    sex | M | F
        | 1 | 2
    >>> diag(a)
    sex\sex | M | F
          M | 1 | 0
          F | 0 | 2
    

For completeness

  • added Axis.rename method which returns a copy of the axis with a different name and deprecate Axis._rename

  • added labels_array as a generalized version of identity (which is deprecated)

  • implemented LArray.ipoints[…] to do point selection using coordinates instead of labels (aka numpy indexing)

  • raise an error when trying to do a[key_with_more_axes_than_a] = value instead of silently ignoring extra axes.

  • allow using a single int for index_col in read_csv in addition to a list of ints

  • implemented __getitem__ for “x”. You can now write stuff like:

    >>> a = ndrange((3, 4))
    >>> a[x[0][1:]]
    {0}\{1}* | 0 | 1 |  2 |  3
           1 | 4 | 5 |  6 |  7
           2 | 8 | 9 | 10 | 11
    >>> a[x[1][2:]]
    {0}*\{1} |  2 |  3
           0 |  2 |  3
           1 |  6 |  7
           2 | 10 | 11
    >>> a.sum(x[0])
    {0}* |  0 |  1 |  2 |  3
         | 12 | 15 | 18 | 21
    
  • produce normal axes instead of wildcard axes on LArray.points[…]. This is (much) slower but more correct/informative.

  • changed the way we store axes internally, which has several consequences

    • better overall support for anonymous axes
    • better support for arrays with several axes with the same name
    • small performance improvement
    • the same axis object cannot be added twice in an array (one should use axis.copy() if that need arises)
    • changes the way groups with an axis are displayed
  • fixed sum, min, max functions on non-LArray arguments

  • changed __repr__ for wildcard axes to not display their labels but their length

    >>> ndrange(3).axes[0]
    Axis(None, 3)
    
  • fixed aggregates on several groups “forgetting” the name of groups which had been created using axis.all()

  • allow Axis(…, long) in addition to int (Python2 only)

  • better docstrings/tests/comments/error messages/thoughts/…

Version 0.11.1

Released on 2016-05-25.

Fixes

  • fixed new functions full, full_like and create_sequential not being available when using from larray import *

Version 0.11

Released on 2016-05-25.

Viewer

  • implemented “Copy to Excel” in context menu (Ctrl+E), to open the selection in a new Excel sheet directly, without the need to use paste. If nothing is selected, copies the whole array.

  • when nothing is selected, Ctrl+C selects & copies the whole array to the clipboard.

  • when nothing is selected, Ctrl+V pastes at the top-left corner

  • implemented view(dict_with_array_values)

    >>> view({'a': array1, 'b': array2})
    
  • fixed copy (ctrl-C) when viewing a 2D array: it did not include labels from the first axis in that case

Core

  • implemented LArray.growth_rate to compute the growth along an axis

    >>> sex = Axis('sex', ['M', 'F'])
    >>> year = Axis('year', [2015, 2016, 2017])
    >>> a = ndrange([sex, year]).cumsum(x.year)
    >>> a
    sex\year | 2015 | 2016 | 2017
           M |    0 |    1 |    3
           F |    3 |    7 |   12
    >>> a.growth_rate()
    sex\year |          2016 |           2017
           M |           inf |            2.0
           F | 1.33333333333 | 0.714285714286
    >>> a.growth_rate(d=2)
    sex\year | 2017
           M |  inf
           F |  3.0
    
  • implemented LArray.diff (difference along an axis)

    >>> sex = Axis('sex', ['M', 'F'])
    >>> xtype = Axis('type', ['type1', 'type2', 'type3'])
    >>> a = ndrange([sex, xtype]).cumsum(x.type)
    >>> a
    sex\type | type1 | type2 | type3
           M |     0 |     1 |     3
           F |     3 |     7 |    12
    >>> a.diff()
    sex\type | type2 | type3
           M |     1 |     2
           F |     4 |     5
    >>> a.diff(n=2)
    sex\type | type3
           M |     1
           F |     1
    >>> a.diff(x.sex)
    sex\type | type1 | type2 | type3
           F |     3 |     6 |     9
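
The arithmetic behind growth_rate and diff along a single axis can be sketched in plain Python. The functions below are illustrations, not the larray implementations: d is taken to be the lag between compared periods and n the number of times the difference is applied, and division by zero is made to follow the floating point convention (inf or nan) seen in the output above.

```python
def diff(values, d=1, n=1):
    """Difference between each value and the one d positions before it,
    applied n times."""
    for _ in range(n):
        values = [cur - prev for prev, cur in zip(values, values[d:])]
    return values


def growth_rate(values, d=1):
    """Relative change of each value compared to the one d positions before."""
    rates = []
    for prev, cur in zip(values, values[d:]):
        change = cur - prev
        if prev != 0:
            rates.append(change / prev)
        elif change == 0:
            rates.append(float('nan'))
        else:
            rates.append(float('inf') if change > 0 else float('-inf'))
    return rates


males = [0, 1, 3]
females = [3, 7, 12]
female_growth = growth_rate(females)        # 4/3 then 5/7
female_growth_2 = growth_rate(females, d=2)
male_growth = growth_rate(males)
female_diff = diff(females)
male_diff_twice = diff(males, n=2)
```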
    
  • implemented round() (as a nicer alias to around() and round_())

    >>> a = ndrange(5) + 0.5
    >>> a
    axis0 |   0 |   1 |   2 |   3 |   4
          | 0.5 | 1.5 | 2.5 | 3.5 | 4.5
    >>> round(a)
    axis0 |   0 |   1 |   2 |   3 |   4
          | 0.0 | 2.0 | 2.0 | 4.0 | 4.0
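
The result above may look surprising: values exactly halfway between two integers are rounded to the nearest even one ("round half to even"), not always upwards. Python's built-in round() follows the same rule for floats, which the following snippet illustrates without using larray.

```python
values = [i + 0.5 for i in range(5)]   # 0.5, 1.5, 2.5, 3.5, 4.5
rounded = [round(value) for value in values]
```

Here rounded is [0, 2, 2, 4, 4], matching the array output above.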
    
  • implemented Session[['list', 'of', 'str']] to get a subset of a Session

    >>> s = Session({'a': ndrange(3), 'b': ndrange(4), 'c': ndrange(5)})
    >>> s
    Session(a, b, c)
    >>> s['a', 'c']
    Session(a, c)
    
  • implemented LArray.points to do pointwise indexing instead of the default orthogonal indexing when indexing several dimensions at the same time.

    >>> a = Axis('a', ['a1', 'a2', 'a3'])
    >>> b = Axis('b', ['b1', 'b2', 'b3'])
    >>> arr = ndrange((a, b))
    >>> arr
    a\b | b1 | b2 | b3
     a1 |  0 |  1 |  2
     a2 |  3 |  4 |  5
     a3 |  6 |  7 |  8
    >>> arr[['a1', 'a3'], ['b1', 'b2']]
    a\b | b1 | b2
     a1 |  0 |  1
     a3 |  6 |  7
    # this selects the points ('a1', 'b1') and ('a3', 'b2')
    >>> arr.points[['a1', 'a3'], ['b1', 'b2']]
    a,b* | 0 | 1
         | 0 | 7
    

    Note that .ipoints (to do pointwise indexing with positions instead of labels – aka numpy indexing) is planned but not functional yet.
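
The distinction between orthogonal and pointwise indexing is essentially the one between a Cartesian product and a zip of the keys. A plain-Python sketch, where the dictionary layout is an assumption made for illustration:

```python
from itertools import product

a_labels = ['a1', 'a2', 'a3']
b_labels = ['b1', 'b2', 'b3']
# map each (a, b) label pair to its value, numbered like ndrange((a, b))
arr = {key: pos for pos, key in enumerate(product(a_labels, b_labels))}

a_key = ['a1', 'a3']
b_key = ['b1', 'b2']

# orthogonal indexing: every combination of the selected labels
orthogonal = [arr[key] for key in product(a_key, b_key)]
# pointwise indexing: labels are paired position by position
pointwise = [arr[key] for key in zip(a_key, b_key)]
```

Orthogonal indexing yields the four cells 0, 1, 6 and 7, while pointwise indexing yields only the two cells ('a1', 'b1') and ('a3', 'b2'), that is 0 and 7.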

  • made “arr1.drop_labels() * arr2” use the labels from arr2 if any

    >>> a = Axis('a', ['a1', 'a2'])
    >>> b = Axis('b', ['b1', 'b2'])
    >>> b2 = Axis('b', ['b2', 'b3'])
    >>> arr1 = ndrange([a, b])
    >>> arr1
    a\b | b1 | b2
     a1 |  0 |  1
     a2 |  2 |  3
    >>> arr1.drop_labels(b)
    a\b* | 0 | 1
      a1 | 0 | 1
      a2 | 2 | 3
    >>> arr1.drop_labels([a, b])
    a*\b* | 0 | 1
        0 | 0 | 1
        1 | 2 | 3
    >>> arr2 = ndrange([a, b2])
    >>> arr2
    a\b | b2 | b3
     a1 |  0 |  1
     a2 |  2 |  3
    >>> arr1 * arr2
    Traceback (most recent call last):
    ...
    ValueError: incompatible axes:
    Axis('b', ['b2', 'b3'])
    vs
    Axis('b', ['b1', 'b2'])
    >>> arr1 * arr2.drop_labels()
    a\b | b1 | b2
     a1 |  0 |  1
     a2 |  4 |  9
    # in versions < 0.11, it used to return:
    # >>> arr1.drop_labels() * arr2
    # a*\b* | 0 | 1
    #     0 | 0 | 1
    #     1 | 2 | 3
    >>> arr1.drop_labels() * arr2
    a\b | b2 | b3
     a1 |  0 |  1
     a2 |  4 |  9
    >>> arr1.drop_labels('a') * arr2.drop_labels('b')
    a\b | b1 | b2
    a1 |  0 |  1
    a2 |  4 |  9
    
  • made .plot a property, like in Pandas, so that we can do stuff like:

    >>> a.plot.bar()
    # instead of
    >>> a.plot(kind='bar')
    
  • made labels from different types not match against each other even if their value is the same. This might break some code but it is both more efficient and more convenient in some cases, so let us see how it goes:

    >>> a = ndrange(4)
    >>> a
    axis0 | 0 | 1 | 2 | 3
          | 0 | 1 | 2 | 3
    >>> a[1]
    1
    >>> # This used to "work" (and return 1)
    >>> a[True]
    
    ValueError: True is not a valid label for any axis
    
    >>> a[1.0]
    
    ValueError: 1.0 is not a valid label for any axis
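
To see why True and 1.0 used to match the label 1, note that Python considers these three values equal and gives them the same hash. The sketch below contrasts a plain dict lookup with one possible type-aware lookup; the (type, value) key and the find helper are assumptions made for illustration, not larray's actual implementation.

```python
labels = [0, 1, 2, 3]

# A plain dict treats True, 1 and 1.0 as the same key (equal values, equal hashes).
loose = {label: pos for pos, label in enumerate(labels)}
assert loose[True] == 1 and loose[1.0] == 1

# Keying on (type, value) keeps labels of different types apart.
strict = {(type(label), label): pos for pos, label in enumerate(labels)}


def find(label):
    """Return the position of label, matching on both type and value."""
    try:
        return strict[(type(label), label)]
    except KeyError:
        raise ValueError(f"{label!r} is not a valid label for any axis") from None


print(find(1))  # 1
```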
    
  • implemented read_csv(dialect='liam2') to read .csv files formatted like in LIAM2 (with the axes names on a separate line from the last axis labels)

  • implemented Session[boolean LArray]

    >>> a = ndrange(3)
    >>> b = ndrange(4)
    >>> s1 = Session({'a': a, 'b': b})
    >>> s2 = Session({'a': a + 1, 'b': b})
    >>> s1 == s2
    name |     a |    b
         | False | True
    >>> s1[s1 == s2]
    Session(b)
    >>> s1[s1 != s2]
    Session(a)
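
The idea of filtering a session with a per-name boolean result can be sketched with plain dictionaries standing in for sessions and lists standing in for arrays (both are simplifications for the example):

```python
s1 = {'a': [0, 1, 2], 'b': [0, 1, 2, 3]}
s2 = {'a': [1, 2, 3], 'b': [0, 1, 2, 3]}

# One boolean per name, instead of a single overall boolean.
equal = {name: s1[name] == s2[name] for name in s1}

# Keep only the entries flagged True (or False) by the comparison.
same = {name: value for name, value in s1.items() if equal[name]}
different = {name: value for name, value in s1.items() if not equal[name]}

print(equal)            # {'a': False, 'b': True}
print(list(same))       # ['b']
print(list(different))  # ['a']
```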
    
  • implemented experimental support for creating an array sequentially. Comments on the name of the function and syntax (especially compared to ndrange) would be appreciated.

    >>> year = Axis('year', range(2016, 2020))
    >>> sex = Axis('sex', ['M', 'F'])
    >>> create_sequential(year)
    year | 2016 | 2017 | 2018 | 2019
         |    0 |    1 |    2 |    3
    >>> create_sequential(year, 1.0, 0.1)
    year | 2016 | 2017 | 2018 | 2019
         |  1.0 |  1.1 |  1.2 |  1.3
    >>> create_sequential(year, 1.0, mult=1.1)
    year | 2016 | 2017 | 2018 |  2019
         |  1.0 |  1.1 | 1.21 | 1.331
    >>> inc = LArray([1, 2], [sex])
    >>> inc
    sex | M | F
        | 1 | 2
    >>> create_sequential(year, 1.0, inc)
    sex\year | 2016 | 2017 | 2018 | 2019
           M |  1.0 |  2.0 |  3.0 |  4.0
           F |  1.0 |  3.0 |  5.0 |  7.0
    >>> mult = LArray([2, 3], [sex])
    >>> mult
    sex | M | F
        | 2 | 3
    >>> create_sequential(year, 1.0, mult=mult)
    sex\year | 2016 | 2017 | 2018 | 2019
           M |  1.0 |  2.0 |  4.0 |  8.0
           F |  1.0 |  3.0 |  9.0 | 27.0
    >>> initial = LArray([3, 4], [sex])
    >>> initial
    sex | M | F
        | 3 | 4
    >>> create_sequential(year, initial, inc, mult)
    sex\year | 2016 | 2017 | 2018 | 2019
           M |    3 |    7 |   15 |   31
           F |    4 |   14 |   44 |  134
    >>> def modify(prev_value):
    ...     return prev_value / 2
    >>> create_sequential(year, 8, func=modify)
    year | 2016 | 2017 | 2018 | 2019
         |    8 |    4 |    2 |    1
    >>> create_sequential(3)
    axis0* | 0 | 1 | 2
           | 0 | 1 | 2
    >>> create_sequential(x.year, axes=(sex, year))
    sex\year | 2016 | 2017 | 2018 | 2019
           M |    0 |    1 |    2 |    3
           F |    0 |    1 |    2 |    3
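
The outputs above are consistent with the recurrence value[i] = value[i - 1] * mult + inc (or func(value[i - 1]) when a function is given). A minimal scalar sketch of that recurrence follows; the function name sequential and its defaults are assumptions for the example and do not necessarily match those of create_sequential.

```python
def sequential(length, initial=0, inc=0, mult=1, func=None):
    """Build a list where each value is derived from the previous one."""
    values = [initial]
    for _ in range(length - 1):
        prev = values[-1]
        if func is not None:
            values.append(func(prev))
        else:
            values.append(prev * mult + inc)
    return values


print(sequential(4, 0, inc=1))                  # [0, 1, 2, 3]
print(sequential(4, 3, inc=1, mult=2))          # [3, 7, 15, 31]
print(sequential(4, 4, inc=2, mult=3))          # [4, 14, 44, 134]
print(sequential(4, 8, func=lambda p: p / 2))   # [8, 4.0, 2.0, 1.0]
```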
    
  • implemented full and full_like to create arrays initialized to something other than zeros or ones

    >>> nat = Axis('nat', ['BE', 'FO'])
    >>> sex = Axis('sex', ['M', 'F'])
    >>> full([nat, sex], 42.0)
    nat\sex |    M |    F
         BE | 42.0 | 42.0
         FO | 42.0 | 42.0
    >>> initial_value = ndrange([sex])
    >>> initial_value
    sex | M | F
        | 0 | 1
    >>> full([nat, sex], initial_value)
    nat\sex | M | F
         BE | 0 | 1
         FO | 0 | 1
    
  • performance improvements when using label keys: a[key] is faster, especially if key is large

Fixes

  • to_excel(filepath) only closes the file if it was not open before
  • removed code which forced labels from .csv files to be strings (as it caused problems in many cases, e.g. ages in LIAM2 files)

Misc. stuff for completeness

  • made LGroups usable in Python’s builtin range() and convertible to int and float
  • implemented AxisCollection.union (equivalent to AxisCollection | Axis)
  • fixed boolean array keys (boolean filter) in combination with scalar keys (for other dimensions)
  • fixed support for older numpy
  • fixed LArray.shift(n=0)
  • still more work on making arrays with anonymous axes usable (not there yet)
  • added more tests
  • better docstrings/error messages…
  • misc. code cleanup/simplification/improved comments

Version 0.10.1

Released on 2016-03-25.

New features

  • A single change in this release: a much more powerful to_excel function which (by default) uses Excel itself to write files. Additional functionality includes:

    • write in an existing file without overwriting existing data/sheet/…
    • write at a precise position
    • view an array in a live Excel instance (a new OR an existing workbook)

    See to_excel() documentation for details.

Version 0.10

Released on 2016-03-22.

Core

  • implemented dropna argument for to_csv, to_frame and to_series to avoid writing lines with either ‘all’ or ‘any’ NA values.
  • implemented read_sas. Needs pandas >= 0.18 (though it seems still buggy on some files).
  • implemented experimental support for __getattr__ and __setattr__ on LArray. One can use arr.H instead of arr['H']. It only works for single string labels though (not for slices or list of labels nor integer labels). Not sure it is a good idea :).
  • implemented Session +-*/
    E.g. sess1 - sess2 will compute the difference on each array present in either session. If an array is present in one session and not in the other, it is replaced by “NaN”.
  • added .nbytes property to LArray objects (to know how many bytes of memory the array uses)
  • made sort_axis accept a tuple of axes
  • raises an error on a.i[tuple_with_len_greater_than_array_ndim]
  • slightly better support for axes with no name (no, still no complete support yet ;-))
  • improved AxisCollection: implemented __delitem__(slice), __setitem__(list), __setitem__(slice)
  • fixed exception on AxisCollection.index(invalid_index)
  • better docstrings for a few functions
  • misc code cleanups, refactoring & improved tests
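
The Session arithmetic mentioned in the list above (an array missing from either side yields NaN) can be sketched with dictionaries of lists. The function name subtract_sessions is hypothetical, and the real implementation works on LArray objects rather than lists.

```python
import math


def subtract_sessions(sess1, sess2):
    """Subtract matching entries; names present on one side only give NaN."""
    result = {}
    for name in sorted(set(sess1) | set(sess2)):
        if name in sess1 and name in sess2:
            result[name] = [x - y for x, y in zip(sess1[name], sess2[name])]
        else:
            result[name] = math.nan
    return result


diff = subtract_sessions({'a': [1, 2, 3], 'b': [4, 5]},
                         {'a': [1, 1, 1], 'c': [7]})
print(diff)  # {'a': [0, 1, 2], 'b': nan, 'c': nan}
```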

Editor

  • added .dirty property on ArrayEditorWidget
  • fixed viewing arrays with “inf” (infinite)
  • fixed a few edge cases for the ndigit detection code
  • fixed colors in some cases in edit()
  • made copy-paste of large regions faster in some cases

Version 0.9.2

Released on 2016-03-02.

Core

  • much better support for unnamed axes overall. Still a long way to go for full support, but it’s getting there…

Editor

  • fixed edit() for arrays with the same labels on several axes

Version 0.9.1

Released on 2016-03-01.

Core

  • better .info for arrays with groups in axes

    >>> # example using groups without a name
    >>> reg = la.sum((fla, wal, bru, belgium))
    >>> reg.info
    4 x 15
     geo [4]: ['A11' ... 'A73'] ['A25' ... 'A93'] 'A21' ['A11' ... 'A21']
     lipro [15]: 'P01' 'P02' 'P03' ... 'P13' 'P14' 'P15'
    
    >>> # example using groups with a name
    >>> fla = geo.group(fla_str, name='Flanders')
    >>> wal = geo.group(wal_str, name='Wallonia')
    >>> bru = geo.group(bru_str, name='Brussels')
    >>> reg = la.sum((fla, wal, bru))
    >>> reg.info
    3 x 15
     geo [3]: 'Flanders' (['A11' ... 'A73']) 'Wallonia' (['A25' ... 'A93']) 'Brussels' ('A21')
     lipro [15]: 'P01' 'P02' 'P03' ... 'P13' 'P14' 'P15'
    

Editor

  • fixed edit() with non-string labels in axes
  • fixed edit() with filters in some more cases
  • fixed ArrayEditorWidget.reject_changes and accept_changes to update the model & view accordingly (in case the widget is kept open)
  • avoid (harmless) error messages in some cases

Version 0.9

Released on 2016-02-25.

A minor but backward incompatible version (hence the bump in version number)!

Core

  • fixed int_array.mean() to return floats instead of int (regression in 0.8)
  • larray_equal returns False when either value is not an LArray, instead of raising an exception

Session

  • changed Session == Session to return an array of booleans instead of a single boolean, so that we know which array(s) differ. Code like session1 == session2 should be changed to all(session1 == session2).
  • implemented Session != Session
  • implemented Session.get(k, default) (returns default if k does not exist in Session)
  • implemented len() for Session objects to know how many objects are in the Session

Viewer

  • fixed view() (regression in 0.8.1)
  • fixed edit() to actually apply changes on “OK”/accept_changes even when no filter change occurred after the last edit.

Version 0.8.1

Released on 2016-02-24.

Viewer

  • implemented min/maxvalue arguments for edit()
  • do not close the window when pressing Enter
  • allow starting to edit cells by pressing Enter
  • fixed copy of changed cells (copy the changed value)
  • fixed pasted values to not be accepted directly (they go to “changes” like for manual edits)
  • fixed color updates on paste
  • disabled experimental tooltips on headers
  • better error message when entering invalid values

Core

  • implemented indexing by position on several dimensions at once (like numpy)

    >>> # takes the first item in the first and third dimensions, leave the second dimension intact
    >>> arr.i[0, :,  0]
    <some result>
    >>> # sets all the cells corresponding to the first item in the first dimension and the second item in the fourth
    >>> # dimension
    >>> arr.i[0, :,  :, 1] = 42
    
  • added optional ‘readonly’ argument to expand() to produce a readonly view (much faster since no copying is done)

Version 0.8

Released on 2016-02-16.

Core

  • implemented skipna argument for most aggregate functions. It defaults to True.
  • implemented LArray.sort_values(key)
  • implemented percentile and median
  • added isnan and isinf toplevel functions
  • made axis argument optional for argsort & posargsort on 1D arrays
  • fixed a[key] = value when key corresponds to a single cell of the array
  • fixed keepaxes argument for aggregate functions
  • fixed a[int_array] (when the axis needs to be guessed)
  • fixed empty_like
  • fixed aggregates on several axes given as integers e.g. arr.sum(axis=(0, 2))
  • fixed “kind” argument in posargsort
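
What a skipna argument changes for an aggregate can be shown on a plain list of floats. This is a sketch of the concept only; the helper name total is made up for the example.

```python
import math


def total(values, skipna=True):
    """Sum the values, optionally ignoring NaN entries first."""
    if skipna:
        values = [v for v in values if not math.isnan(v)]
    return sum(values)


data = [1.0, math.nan, 3.0]
print(total(data))                # 4.0
print(total(data, skipna=False))  # nan
```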

Viewer

  • added title argument to edit() (set automatically if not provided, like for view())
  • fixed edit() on filtered arrays
  • fixed view(expression). anything which was not stored in a variable was broken in 0.7.1
  • reset background color when setting values if necessary (still buggy in some cases, but much less so ;-))
  • background color for headers is always on
  • view() => array cells are not editable, instead of being editable and ignoring entered values
  • fixed compare() colors when arrays are entirely equal
  • fixed error message for compare() when PyQt is not available

Misc

  • bump numpy requirement to 1.10, implicitly dropping support for python 3.3
  • renamed view module to editor to not collide with view function
  • improved/added a few tests

Version 0.7.1

Released on 2016-01-29.

Viewer

  • implemented paste (ctrl-V)

  • implemented experimental array comparator:

    >>> compare(array1, array2)
    

    Known limitation: the arrays must have exactly the same axes and the background color is buggy when using filters

  • when no title is specified in view(), it is determined automatically by inspecting the local variables of the function where view() is called and using the names of the ones matching the object passed. If there are several matches, up to 3 are displayed.

  • added axes names to copy (ctrl-C)

  • fixed copy (ctrl-C) of 0d array
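
The automatic title detection for view() described above relies on looking at the caller's local variables. A minimal sketch of that technique using the standard inspect module follows; guess_names and demo are hypothetical names, and the actual viewer code may differ.

```python
import inspect


def guess_names(obj, max_names=3):
    """Return the caller's local variable names bound to obj itself."""
    caller_locals = inspect.currentframe().f_back.f_locals
    names = [name for name, value in caller_locals.items() if value is obj]
    return sorted(names)[:max_names]


def demo():
    data = [1, 2, 3]
    alias = data        # same object, second name
    other = [1, 2, 3]   # equal content but a different object: not reported
    return guess_names(data)


print(demo())  # ['alias', 'data']
```

Matching is done by identity (is) rather than equality, so unrelated variables that happen to hold equal values are not picked up.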

Input/Output

  • added ‘dialect’ argument to to_csv. For example, dialect=’classic’ does not include the last (horizontal) axis name.
  • fixed loading .csv files without “\” (ie ‘classic’ .csv files), though one needs to specify nb_index in that case if ndim > 2
  • strip spaces around axes names so that you can use “axis0<space>\<space>axis1” instead of “axis0\axis1” in .csv files
  • fixed 1d arrays I/O
  • more precise parsing of input headers: 1 and 0 come out as int, not bool

Misc

  • nicer error message when using an invalid axis name
  • changed LArray .df property to a to_frame() method so that we can pass options to it

Version 0.7

Released on 2016-01-26.

Viewer

  • implemented view() on Session objects
  • added axes length in window title and add axes info even if title is provided manually (concatenate both)
  • ndecimals are recomputed when toggling the scientific checkbox
  • allow viewing (some) non-ndarray stuff (e.g. python lists)
  • refactored viewer code so that the filter drop downs can be reused too
  • Known regression: the viewer is slow on large arrays (this will be fixed in a later release, obviously)

Session

  • implemented local_arrays() to return all LArray in locals() as a Session

  • implemented Session.__getitem__(int_position)

  • implement Session(filename) to directly load all arrays from a file. Equivalent to:

    >>> s = Session()
    >>> s.load(filename)
    
  • implemented Session.__eq__, so that you can compare two sessions and see if all arrays are equal. Suppose you want to refactor your code and make sure you get the same results.

    >>> # put results in a Session
    >>> res = Session({'array1': array1, 'array2': array2})
    >>> # before refactoring
    >>> res.dump('results.h5')
    >>> # after refactoring
    >>> assert Session('results.h5') == res
    
  • you can load all sheets/arrays of a file (if you do not specify which ones you want, it takes all)

  • loading several sheets from an excel file is now MUCH faster because the same file is kept open (apparently xlrd parses the whole file each time we open it).

  • you can specify a subset of arrays to dump

  • implemented rudimentary session I/O for .csv files, usage is a bit different from .h5 & excel files

    >>> # need to specify format manually
    >>> s.dump('directory_name', fmt='csv')
    >>> # need to specify format manually
    >>> s = Session()
    >>> s.load('directory_name', fmt='csv')
    
  • pass *args and **kwargs to lower level functions in Session.load

  • fail when trying to read a nonexistent H5 file through Session, instead of creating it

Other new features

  • added start argument in ndrange to specify starting value
  • implemented Axis._rename. Not sure it’s a good idea though…
  • implemented identity function which takes an Axis and returns an LArray with the axis labels as values
  • implemented size property on AxisCollection
  • allow a single int in AxisCollection.without

Fixes

  • fixed broadcast_with when other_axes contains 0-len axes
  • fixed a[bool_array] = value when the first axis of a is not in bool_array
  • fixed view() on arrays with unnamed axes
  • fixed view() on arrays of Python objects
  • various other small bugs fixed

Version 0.6.1

Released on 2016-01-13.

New features

  • added dtype argument to all array creation functions to override default data type

  • aggregates can take an explicit “axis” keyword argument which can be used to target an axis by index

    >>> arr.sum(axis=0)
    
  • implemented LGroup.__getitem__ & LGroup.__iter__, so that for list-based groups (ie not slices) you can write:

    >>> for v in my_group:
    ...     # some code
    

    or

    >>> my_group[0]
    

Miscellaneous improvements

  • renamed LabelGroup to LGroup and PositionalKey to PGroup. We might want to rename the latter to IGroup (to be consistent with axis.i[…]).
  • slightly better support for axes without name
  • better docstrings for a few functions
  • misc cleanup

Fixes

  • fixed XXX_like(a) functions to use the same dtype as a instead of always float
  • fixed to_XXX with 1d arrays (e.g. to_clipboard())
  • fixed all() and any() toplevel functions without argument
  • fixed LArray without axes in some cases
  • fixed array creation functions with only shapes on python2

Version 0.6

Released on 2016-01-12.

New features

  • a[bool_array_key] broadcasts missing/differently ordered dimensions and returns an LArray with combined axes

  • a[bool_array_key] = value broadcasts missing/differently ordered dimensions on both key and value

  • implemented argmin, argmax, argsort, posargmin, posargmax, posargsort.

    They perform indirect operations along an axis. E.g. argmin gives the label of the minimum value, argsort gives the labels which would sort the array along that dimension. posargXXX gives the positions/indexes instead of the labels.
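
The label versus position distinction can be illustrated on a single labelled axis with plain lists. This is a sketch of the concept, not the library code; the variable names simply mirror the function names.

```python
labels = ['a1', 'a2', 'a3', 'a4']
values = [30, 10, 40, 20]

# Position-based results (what the posargXXX variants return).
posargmin = min(range(len(values)), key=values.__getitem__)
posargsort = sorted(range(len(values)), key=values.__getitem__)

# Label-based results (what the argXXX variants return).
argmin = labels[posargmin]
argsort = [labels[i] for i in posargsort]

print(posargmin, argmin)  # 1 a2
print(posargsort)         # [1, 3, 0, 2]
print(argsort)            # ['a2', 'a4', 'a1', 'a3']
```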

  • implemented Axis.__iter__ so that one can write:

    >>> for label in an_array.axes.an_axis:
    ...     <some code>
    

    instead of

    >>> for label in an_array.axes.an_axis.labels:
    ...     <some code>
    
  • implemented the .info property on AxisCollection

  • implement all/any top level functions, so that you can use them in with_total.

Miscellaneous improvements

  • renamed ValueGroup to LabelGroup. We might want to rename it to LGroup to be consistent with LArray?

  • allow a single int as argument to LArray creation functions (ndrange et al.)

    e.g. ndrange(10) is now allowed instead of ndrange([10])

  • use display_name in .info (ie add * next to wildcard axes in .info).

  • allow specifying a custom window title in view()

  • viewer displays booleans as True/False instead of 1/0

  • slightly better support for axes with no name (None). There is still a long way to go for full support though.

  • improved a few docstrings

  • nicer errors when test results are different from expected

  • removed debug prints from viewer

  • misc cleanups

Fixes

  • fixed view() on all-negative arrays
  • fixed view() on string arrays

Version 0.5

Released on 2015-12-15.

New features

  • experimental support for indexing an LArray by another (integer) LArray

    >>> array[other_array]
    
  • experimental support for LArray.drop_labels and the concept of wildcard axes

  • added LArray.display_name and AxisCollection.display_names which add ‘*’ next to wildcard axes

  • implemented where(cond, array1, array2)

  • implemented LArray.__iter__ so that this works:

    >>> for value in array:
    ...     <some code>
    
  • implement keepaxes=label or keepaxes=True for aggregate functions on full axes

    >>> array.sum(x.age, keepaxes='total')

  • AxisCollection.replace can replace several axes in one call

  • implemented .expand(out=) to expand into an existing array

Miscellaneous improvements

  • removed Axis.sorted()

  • removed LArray.axes_names & axes_labels. One should use .axes.names & .axes.labels instead.

  • raise an error when trying to convert an array with more than one value to a Boolean. For example, this will fail:

    >>> arr = ndrange([sex])
    >>> if arr:
    ...     <some code>
    
  • convert value to self.dtype in append/prepend

  • faster .extend, .append, .prepend and .expand

  • some code cleanup, better tests, …

Fixes

  • fixed .extend when other has longer axes than self

Version 0.4

Released on 2015-12-09.

New features

  • implemented LArray.expand to add dimensions
  • implemented prepend
  • implemented sort_axis
  • allow creating 0d (scalar) LArrays

Miscellaneous improvements

  • made extend expand its arguments
  • made .append expand its value before appending
  • changed read_* to not sort data by default
  • more minor stuff :)

Fixes

  • fixed loading 1d arrays

Version 0.3

Released on 2015-11-26.

New features

  • implemented LArray.with_total(): appends axes or group aggregates to the array.

    Without argument, it adds totals on all axes. It has optional keyword only arguments:

    • label: specify the label (“total” by default)
    • op: specify the aggregate function (sum by default, all other aggregates should work too)

    With multiple arguments, it adds totals sequentially. There are some tricky cases. For example when, for the same axis, you add group aggregates and axis aggregates:

    >>> # works but "wrong" for x.geo (double what is expected because the total also
    >>> # includes fla wal & bru)
    >>> la.with_total(x.sex, (fla, wal, bru), x.geo, x.lipro)
    
    >>> # correct total but the order is not very nice
    >>> la.with_total(x.sex, x.geo, (fla, wal, bru), x.lipro)
    
    >>> # the correct way to do it, but it is probably not entirely obvious
    >>> la.with_total(x.sex, (fla, wal, bru, x.geo.all()), x.lipro)
    
    >>> # we probably want to display a warning (or even an error?) in that case.
    >>> # If the user really wants that behavior, he can split the operation:
    >>> # .with_total((fla, wal, bru)).with_total(x.geo)
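
The double counting mentioned in the first comment can be reproduced with a plain dictionary standing in for the geo axis. The labels, values and helper names below are made up for illustration; the three groups together cover every label, as in the example above.

```python
values = {'A11': 1, 'A21': 2, 'A25': 4}
groups = {'fla': ['A11'], 'bru': ['A21'], 'wal': ['A25']}


def add_groups(table):
    """Append one aggregate per group, computed from the original values."""
    out = dict(table)
    for name, members in groups.items():
        out[name] = sum(values[m] for m in members)
    return out


def add_total(table):
    """Append a total over everything currently in the table."""
    out = dict(table)
    out['total'] = sum(table.values())
    return out


groups_then_total = add_total(add_groups(values))
total_then_groups = add_groups(add_total(values))

print(groups_then_total['total'])  # 14: the groups were counted again
print(total_then_groups['total'])  # 7: correct
print(list(total_then_groups))     # total sits before the groups
```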
    
  • implemented group aggregates without using keyword arguments. As a consequence of this, one can no longer use axis numbers in aggregates. Eg. a.sum(0) does not sum on the first axis anymore (but you can do a.sum(a.axes[0]) if needed)

  • implemented LArray.percent: equivalent to ratio * 100

  • implemented Session.filter -> returns a new Session with only objects matching the filter

  • implemented Session.dump -> dumps all LArray in the Session to a file

  • implemented Session.load -> load several LArrays from a file to a Session

Version 0.2.6

Released on 2015-11-24.

Fixes

  • fixed LArray.cumsum and cumprod.
  • fixed all doctests just enough so that they run.

Version 0.2.5

Released on 2015-10-29.

Miscellaneous improvements

  • many methods got (improved) docstrings (Thanks to Johan).

Fixes

  • fixed mixing keys without axis (e.g. arr[10:15]) with key with axes (e.g. arr[x.age[10:15]]).

Version 0.2.4

Released on 2015-10-27.

New features

  • includes an experimental (slightly inefficient) version of guess axis, so that one can write:

    >>> arr[10:20]
    

    instead of

    >>> arr[age[10:20]]
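
One straightforward (and, like the original, not very efficient) way to guess an axis is to look for the single axis whose labels contain every label of the key. The sketch below assumes that strategy; the axes dictionary and the guess_axis helper are hypothetical and only illustrate the idea.

```python
axes = {'age': list(range(0, 100)), 'sex': ['M', 'F']}


def guess_axis(key_labels):
    """Return the name of the only axis containing all the given labels."""
    matches = [name for name, labels in axes.items()
               if all(label in labels for label in key_labels)]
    if not matches:
        raise ValueError(f"{key_labels!r} is not a valid label for any axis")
    if len(matches) > 1:
        raise ValueError(f"{key_labels!r} is ambiguous (valid in {matches})")
    return matches[0]


print(guess_axis([10, 20]))  # age
print(guess_axis(['M']))     # sex
```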
    

Version 0.2.3

Released on 2015-10-19.

New features

  • positional slicing via “x.” syntax (x.axis.i[:5])

Fixes

  • view(array) is usable when doing from larray import *
  • fixed a nasty bug for doing “group” aggregates when there is only one dimension

Version 0.2.2

Released on 2015-10-15.

New features

  • implement AxisCollection.replace(old_axis, new_axis)
  • implement positional indexing

Miscellaneous improvements

  • more powerful AxisCollection.pop: added support for .pop(name) and .pop(Axis object)
  • LArray.set_labels returns a new LArray by default. Use inplace=True to get the previous behavior
  • include ndrange and __version__ in __all__

Fixes

  • fixed shift with n <= 0

Version 0.2.1

Released on 2015-10-14.

New features

  • implemented LArray.shift(axis, n=1)

Miscellaneous improvements

  • change set_labels API (axis, new_labels)
  • transform Axis.labels into a property so that _mapping is kept in sync

Fixes

  • hopefully fix build

Version 0.2

Released on 2015-10-13.

New features

  • added to_clipboard.
  • added embryonic documentation.
  • added sort_columns and na arguments to read_hdf.
  • added sort_rows, sort_columns and na arguments to read_excel.
  • added setup.py to install the module.

Miscellaneous improvements

  • IO functions (to_*/read_*) now support unnamed axes. The set of supported operations is very limited with such arrays though.
  • to_excel sheet_name defaults to “Sheet1” like in Pandas.
  • reorganised files.
  • somewhat automated releases (added a rudimentary release script).

Fixes

  • column titles are no longer converted to lowercase.

Version 0.1

Released on 2014-10-22.