Transforming Arrays (Relabeling, Renaming, Reordering, Sorting, …)

Import the LArray library:

[1]:
from larray import *

Import the population array from the demography_eurostat dataset:

[2]:
demography_eurostat = load_example_data('demography_eurostat')
population = demography_eurostat.population

# display the 'population' array
population
[2]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40514123  40697118
Germany       Female  41142770  41210540  41362080  41661561  41824535

Manipulating axes

The Array class offers several methods to manipulate the axes and labels of an array:

  • set_labels: to replace all or some labels of one or several axes.

  • rename: to replace one or several axis names.

  • set_axes: to replace one or several axes.

  • transpose: to modify the order of axes.

  • drop: to remove one or several labels.

  • combine_axes: to combine axes.

  • split_axes: to split one or several axes by splitting their labels and names.

  • reindex: to reorder, add and remove labels of one or several axes.

  • insert: to insert a label at a given position.

Relabeling

Replace some labels of an axis:

[3]:
# replace only one label of the 'gender' axis by passing a dict
population_new_labels = population.set_labels('gender', {'Male': 'Men'})
population_new_labels
[3]:
country  gender\time      2013      2014      2015      2016      2017
Belgium          Men   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France          Men  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany          Men  39380976  39556923  39835457  40514123  40697118
Germany       Female  41142770  41210540  41362080  41661561  41824535
[4]:
# set all labels of the 'country' axis to uppercase by passing the function str.upper (not called)
population_new_labels = population.set_labels('country', str.upper)
population_new_labels
[4]:
country  gender\time      2013      2014      2015      2016      2017
BELGIUM         Male   5472856   5493792   5524068   5569264   5589272
BELGIUM       Female   5665118   5687048   5713206   5741853   5762455
 FRANCE         Male  31772665  32045129  32174258  32247386  32318973
 FRANCE       Female  33827685  34120851  34283895  34391005  34485148
GERMANY         Male  39380976  39556923  39835457  40514123  40697118
GERMANY       Female  41142770  41210540  41362080  41661561  41824535

See set_labels for more details and examples.
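
To make the two relabeling styles above concrete, here is a minimal plain-Python sketch of the idea on bare label lists. The helper `relabel` is hypothetical and only mimics the dict and function forms shown above; it is not how LArray implements set_labels.

```python
def relabel(labels, changes):
    """Return new labels; `changes` is a dict (old -> new) or a callable.

    Hypothetical helper for illustration only, not part of LArray.
    """
    if callable(changes):
        # a function is applied to every label of the axis
        return [changes(label) for label in labels]
    # with a dict, labels that are not keys of the dict are kept unchanged
    return [changes.get(label, label) for label in labels]


gender = ['Male', 'Female']
country = ['Belgium', 'France', 'Germany']

print(relabel(gender, {'Male': 'Men'}))   # ['Men', 'Female']
print(relabel(country, str.upper))        # ['BELGIUM', 'FRANCE', 'GERMANY']
```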

Renaming axes

Rename one axis:

[5]:
# 'rename' returns a copy of the array
population_new_names = population.rename('time', 'year')
population_new_names
[5]:
country  gender\year      2013      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40514123  40697118
Germany       Female  41142770  41210540  41362080  41661561  41824535

Rename several axes at once:

[6]:
population_new_names = population.rename({'gender': 'sex', 'time': 'year'})
population_new_names
[6]:
country  sex\year      2013      2014      2015      2016      2017
Belgium      Male   5472856   5493792   5524068   5569264   5589272
Belgium    Female   5665118   5687048   5713206   5741853   5762455
 France      Male  31772665  32045129  32174258  32247386  32318973
 France    Female  33827685  34120851  34283895  34391005  34485148
Germany      Male  39380976  39556923  39835457  40514123  40697118
Germany    Female  41142770  41210540  41362080  41661561  41824535

See rename for more details and examples.

Replacing Axes

Replace one axis:

[7]:
new_gender = Axis('sex=Men,Women')
population_new_axis = population.set_axes('gender', new_gender)
population_new_axis
[7]:
country  sex\time      2013      2014      2015      2016      2017
Belgium       Men   5472856   5493792   5524068   5569264   5589272
Belgium     Women   5665118   5687048   5713206   5741853   5762455
 France       Men  31772665  32045129  32174258  32247386  32318973
 France     Women  33827685  34120851  34283895  34391005  34485148
Germany       Men  39380976  39556923  39835457  40514123  40697118
Germany     Women  41142770  41210540  41362080  41661561  41824535

Replace several axes at once:

[8]:
new_country = Axis('country_codes=BE,FR,DE')
population_new_axes = population.set_axes({'country': new_country, 'gender': new_gender})
population_new_axes
[8]:
country_codes  sex\time      2013      2014      2015      2016      2017
           BE       Men   5472856   5493792   5524068   5569264   5589272
           BE     Women   5665118   5687048   5713206   5741853   5762455
           FR       Men  31772665  32045129  32174258  32247386  32318973
           FR     Women  33827685  34120851  34283895  34391005  34485148
           DE       Men  39380976  39556923  39835457  40514123  40697118
           DE     Women  41142770  41210540  41362080  41661561  41824535

Reordering axes

Axes can be reordered using the transpose method. Called without arguments, transpose reverses the order of the axes; otherwise it permutes the axes according to the axes given as arguments. Axes that are not mentioned come after those that are mentioned (and keep their relative order). Note that transpose returns a copy of the array.

[9]:
# starting order : country, gender, time
population
[9]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40514123  40697118
Germany       Female  41142770  41210540  41362080  41661561  41824535
[10]:
# no argument --> reverse all axes
population_transposed = population.transpose()

# .T is a shortcut for .transpose()
population_transposed = population.T

population_transposed
[10]:
time  gender\country  Belgium    France   Germany
2013            Male  5472856  31772665  39380976
2013          Female  5665118  33827685  41142770
2014            Male  5493792  32045129  39556923
2014          Female  5687048  34120851  41210540
2015            Male  5524068  32174258  39835457
2015          Female  5713206  34283895  41362080
2016            Male  5569264  32247386  40514123
2016          Female  5741853  34391005  41661561
2017            Male  5589272  32318973  40697118
2017          Female  5762455  34485148  41824535
[11]:
# reorder according to the given axes
population_transposed = population.transpose('gender', 'country', 'time')
population_transposed
[11]:
gender  country\time      2013      2014      2015      2016      2017
  Male       Belgium   5472856   5493792   5524068   5569264   5589272
  Male        France  31772665  32045129  32174258  32247386  32318973
  Male       Germany  39380976  39556923  39835457  40514123  40697118
Female       Belgium   5665118   5687048   5713206   5741853   5762455
Female        France  33827685  34120851  34283895  34391005  34485148
Female       Germany  41142770  41210540  41362080  41661561  41824535
[12]:
# move the 'time' axis to the first position
# axes that are not mentioned come after those that are mentioned (and keep their relative order)
population_transposed = population.transpose('time')
population_transposed
[12]:
time  country\gender      Male    Female
2013         Belgium   5472856   5665118
2013          France  31772665  33827685
2013         Germany  39380976  41142770
2014         Belgium   5493792   5687048
2014          France  32045129  34120851
2014         Germany  39556923  41210540
2015         Belgium   5524068   5713206
2015          France  32174258  34283895
2015         Germany  39835457  41362080
2016         Belgium   5569264   5741853
2016          France  32247386  34391005
2016         Germany  40514123  41661561
2017         Belgium   5589272   5762455
2017          France  32318973  34485148
2017         Germany  40697118  41824535
[13]:
# move the 'gender' axis to the last position
# axes that are not mentioned come before those that are mentioned (and keep their relative order)
population_transposed = population.transpose(..., 'gender')
population_transposed
[13]:
country  time\gender      Male    Female
Belgium         2013   5472856   5665118
Belgium         2014   5493792   5687048
Belgium         2015   5524068   5713206
Belgium         2016   5569264   5741853
Belgium         2017   5589272   5762455
 France         2013  31772665  33827685
 France         2014  32045129  34120851
 France         2015  32174258  34283895
 France         2016  32247386  34391005
 France         2017  32318973  34485148
Germany         2013  39380976  41142770
Germany         2014  39556923  41210540
Germany         2015  39835457  41362080
Germany         2016  40514123  41661561
Germany         2017  40697118  41824535

See transpose for more details and examples.
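
The ordering rule used in the four transpose examples above can be summarized in a few lines of plain Python. This is only a sketch of the rule as described in this section, working on axis names; the function `transposed_order` is a hypothetical helper, not LArray's implementation.

```python
def transposed_order(axes, *args):
    """Return the axis names in the order transpose would produce.

    Hypothetical helper for illustration only, not part of LArray.
    """
    if not args:
        # no argument: reverse all axes
        return list(reversed(axes))
    if Ellipsis in args:
        # '...' stands for all the axes that are not mentioned
        pos = args.index(Ellipsis)
        head, tail = list(args[:pos]), list(args[pos + 1:])
    else:
        # without '...', the axes not mentioned go at the end
        head, tail = list(args), []
    rest = [axis for axis in axes if axis not in head and axis not in tail]
    return head + rest + tail


axes = ['country', 'gender', 'time']

print(transposed_order(axes))                     # ['time', 'gender', 'country']
print(transposed_order(axes, 'time'))             # ['time', 'country', 'gender']
print(transposed_order(axes, ..., 'gender'))      # ['country', 'time', 'gender']
```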

Dropping Labels

[14]:
population_labels_dropped = population.drop([2014, 2016])
population_labels_dropped
[14]:
country  gender\time      2013      2015      2017
Belgium         Male   5472856   5524068   5589272
Belgium       Female   5665118   5713206   5762455
 France         Male  31772665  32174258  32318973
 France       Female  33827685  34283895  34485148
Germany         Male  39380976  39835457  40697118
Germany       Female  41142770  41362080  41824535

See drop for more details and examples.

Combine And Split Axes

Combine two axes:

[15]:
population_combined_axes = population.combine_axes(('country', 'gender'))
population_combined_axes
[15]:
country_gender\time      2013      2014      2015      2016      2017
       Belgium_Male   5472856   5493792   5524068   5569264   5589272
     Belgium_Female   5665118   5687048   5713206   5741853   5762455
        France_Male  31772665  32045129  32174258  32247386  32318973
      France_Female  33827685  34120851  34283895  34391005  34485148
       Germany_Male  39380976  39556923  39835457  40514123  40697118
     Germany_Female  41142770  41210540  41362080  41661561  41824535

Split an axis:

[16]:
population_split_axes = population_combined_axes.split_axes('country_gender')
population_split_axes
[16]:
country  gender\time      2013      2014      2015      2016      2017
Belgium         Male   5472856   5493792   5524068   5569264   5589272
Belgium       Female   5665118   5687048   5713206   5741853   5762455
 France         Male  31772665  32045129  32174258  32247386  32318973
 France       Female  33827685  34120851  34283895  34391005  34485148
Germany         Male  39380976  39556923  39835457  40514123  40697118
Germany       Female  41142770  41210540  41362080  41661561  41824535

See combine_axes and split_axes for more details and examples.
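
As a rough illustration of what happens to the labels in the two cells above, the following plain-Python sketch builds the combined labels and splits them again. It assumes the '_' separator seen in the output and works on label lists only; it is not how LArray implements these methods.

```python
from itertools import product

country = ['Belgium', 'France', 'Germany']
gender = ['Male', 'Female']

# combining: one label per (country, gender) pair, the last axis varying fastest
combined = ['_'.join(pair) for pair in product(country, gender)]
print(combined[:3])   # ['Belgium_Male', 'Belgium_Female', 'France_Male']

# splitting: cut each combined label at the separator and keep
# the distinct parts in their order of first appearance
parts = [label.split('_') for label in combined]
country_back = list(dict.fromkeys(c for c, _ in parts))
gender_back = list(dict.fromkeys(g for _, g in parts))
print(country_back)   # ['Belgium', 'France', 'Germany']
print(gender_back)    # ['Male', 'Female']
```

In this sketch, a label that itself contains the separator would be split ambiguously, so a different separator would be needed in that case.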

Reordering, adding and removing labels

The reindex method lets you reorder, add and remove labels along one axis:

[17]:
# reverse years + remove 2013 + add 2018 + copy data for 2017 to 2018
population_new_time = population.reindex('time', '2018..2014', fill_value=population[2017])
population_new_time
[17]:
country  gender\time      2018      2017      2016      2015      2014
Belgium         Male   5589272   5589272   5569264   5524068   5493792
Belgium       Female   5762455   5762455   5741853   5713206   5687048
 France         Male  32318973  32318973  32247386  32174258  32045129
 France       Female  34485148  34485148  34391005  34283895  34120851
Germany         Male  40697118  40697118  40514123  39835457  39556923
Germany       Female  41824535  41824535  41661561  41362080  41210540

or along several axes at once:

[18]:
population_new = population.reindex({'country': 'country=Luxembourg,Belgium,France,Germany',
                                     'time': 'time=2018..2014'}, fill_value=0)
population_new
[18]:
   country  gender\time  2018      2017      2016      2015      2014
Luxembourg         Male     0         0         0         0         0
Luxembourg       Female     0         0         0         0         0
   Belgium         Male     0   5589272   5569264   5524068   5493792
   Belgium       Female     0   5762455   5741853   5713206   5687048
    France         Male     0  32318973  32247386  32174258  32045129
    France       Female     0  34485148  34391005  34283895  34120851
   Germany         Male     0  40697118  40514123  39835457  39556923
   Germany       Female     0  41824535  41661561  41362080  41210540

See reindex for more details and examples.
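
The effect of reindexing along one axis can be sketched in plain Python on a single row of the array. The helper `reindex_row` is hypothetical and only mirrors the behaviour visible in the first example above (new order, dropped label, added label filled with a given value); the figures are the Belgium / Male row.

```python
def reindex_row(values, new_labels, fill_value):
    """Return the values in the order of `new_labels`.

    Labels missing from `values` get `fill_value`; labels absent from
    `new_labels` are dropped. Hypothetical helper, not part of LArray.
    """
    return {label: values.get(label, fill_value) for label in new_labels}


belgium_male = {2013: 5472856, 2014: 5493792, 2015: 5524068,
                2016: 5569264, 2017: 5589272}

# reverse years + remove 2013 + add 2018, filled with the 2017 value
new_time = list(range(2018, 2013, -1))
result = reindex_row(belgium_male, new_time, fill_value=belgium_male[2017])

print(list(result))    # [2018, 2017, 2016, 2015, 2014]
print(result[2018])    # 5589272
```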

Another way to insert new labels is to use the insert method:

[19]:
# insert a new country before 'France' with all values set to 0
population_new_country = population.insert(0, before='France', label='Luxembourg')
# or equivalently
population_new_country = population.insert(0, after='Belgium', label='Luxembourg')

population_new_country
[19]:
   country  gender\time      2013      2014      2015      2016      2017
   Belgium         Male   5472856   5493792   5524068   5569264   5589272
   Belgium       Female   5665118   5687048   5713206   5741853   5762455
Luxembourg         Male         0         0         0         0         0
Luxembourg       Female         0         0         0         0         0
    France         Male  31772665  32045129  32174258  32247386  32318973
    France       Female  33827685  34120851  34283895  34391005  34485148
   Germany         Male  39380976  39556923  39835457  40514123  40697118
   Germany       Female  41142770  41210540  41362080  41661561  41824535

See insert for more details and examples.

Sorting

[20]:
# get a copy of the 'population_benelux' array
population_benelux = demography_eurostat.population_benelux.copy()
population_benelux
[20]:
    country  gender\time     2013     2014     2015     2016     2017
    Belgium         Male  5472856  5493792  5524068  5569264  5589272
    Belgium       Female  5665118  5687048  5713206  5741853  5762455
 Luxembourg         Male   268412   275117   281972   289193   296641
 Luxembourg       Female   268627   274563   280986   287056   294026
Netherlands         Male  8307339  8334385  8372858  8417135  8475102
Netherlands       Female  8472236  8494904  8527868  8561985  8606405

Sort the labels of an axis (alphabetically if the labels are strings):

[21]:
population_sorted = population_benelux.sort_axes('gender')
population_sorted
[21]:
    country  gender\time     2013     2014     2015     2016     2017
    Belgium       Female  5665118  5687048  5713206  5741853  5762455
    Belgium         Male  5472856  5493792  5524068  5569264  5589272
 Luxembourg       Female   268627   274563   280986   287056   294026
 Luxembourg         Male   268412   275117   281972   289193   296641
Netherlands       Female  8472236  8494904  8527868  8561985  8606405
Netherlands         Male  8307339  8334385  8372858  8417135  8475102

Get the labels that would sort the values along an axis:

[22]:
population_benelux.labelsofsorted('country')
[22]:
country  gender\time         2013  ...         2017
      0         Male   Luxembourg  ...   Luxembourg
      0       Female   Luxembourg  ...   Luxembourg
      1         Male      Belgium  ...      Belgium
      1       Female      Belgium  ...      Belgium
      2         Male  Netherlands  ...  Netherlands
      2       Female  Netherlands  ...  Netherlands

Sort an axis according to the values at a given key:

[23]:
population_sorted = population_benelux.sort_values(('Male', 2017))
population_sorted
[23]:
    country  gender\time     2013     2014     2015     2016     2017
 Luxembourg         Male   268412   275117   281972   289193   296641
 Luxembourg       Female   268627   274563   280986   287056   294026
    Belgium         Male  5472856  5493792  5524068  5569264  5589272
    Belgium       Female  5665118  5687048  5713206  5741853  5762455
Netherlands         Male  8307339  8334385  8372858  8417135  8475102
Netherlands       Female  8472236  8494904  8527868  8561985  8606405
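
The difference between sorting an axis by its labels and sorting it by values can be shown with plain Python on figures taken from the tables above. This is only a sketch of the idea on small dictionaries and lists, not a description of how LArray implements sort_axes or sort_values.

```python
# values of the ('Male', 2017) key for each country of 'population_benelux'
male_2017 = {'Belgium': 5589272, 'Luxembourg': 296641, 'Netherlands': 8475102}

# sorting by labels: plain alphabetical order of the axis labels
gender_sorted = sorted(['Male', 'Female'])
print(gender_sorted)           # ['Female', 'Male']

# sorting by values: the country labels are ordered by their value
country_by_value = sorted(male_2017, key=male_2017.get)
print(country_by_value)        # ['Luxembourg', 'Belgium', 'Netherlands']
```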

Aligning Arrays

The align method aligns two arrays on their axes using a specified join method. In other words, it ensures that all common axes are compatible.

[24]:
# get a copy of the 'births' array
births = demography_eurostat.births.copy()

# align the two arrays with the 'inner' join method
population_aligned, births_aligned = population_benelux.align(births, join='inner')
[25]:
print('population_benelux before align:')
print(population_benelux)
print()
print('population_benelux after align:')
print(population_aligned)
population_benelux before align:
    country  gender\time     2013     2014     2015     2016     2017
    Belgium         Male  5472856  5493792  5524068  5569264  5589272
    Belgium       Female  5665118  5687048  5713206  5741853  5762455
 Luxembourg         Male   268412   275117   281972   289193   296641
 Luxembourg       Female   268627   274563   280986   287056   294026
Netherlands         Male  8307339  8334385  8372858  8417135  8475102
Netherlands       Female  8472236  8494904  8527868  8561985  8606405

population_benelux after align:
country  gender\time       2013       2014       2015       2016       2017
Belgium         Male  5472856.0  5493792.0  5524068.0  5569264.0  5589272.0
Belgium       Female  5665118.0  5687048.0  5713206.0  5741853.0  5762455.0
[26]:
print('births before align:')
print(births)
print()
print('births after align:')
print(births_aligned)
births before align:
country  gender\time    2013    2014    2015    2016    2017
Belgium         Male   64371   64173   62561   62428   61179
Belgium       Female   61235   60841   59713   59468   58511
 France         Male  415762  418721  409145  401388  394058
 France       Female  396581  400607  390526  382937  375987
Germany         Male  349820  366835  378478  405587  402517
Germany       Female  332249  348092  359097  386554  382384

births after align:
country  gender\time     2013     2014     2015     2016     2017
Belgium         Male  64371.0  64173.0  62561.0  62428.0  61179.0
Belgium       Female  61235.0  60841.0  59713.0  59468.0  58511.0

Aligned arrays can then be used in arithmetic operations:

[27]:
population_aligned - births_aligned
[27]:
country  gender\time       2013       2014       2015       2016       2017
Belgium         Male  5408485.0  5429619.0  5461507.0  5506836.0  5528093.0
Belgium       Female  5603883.0  5626207.0  5653493.0  5682385.0  5703944.0

See align for more details and examples.
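
To illustrate what the 'inner' join does to the 'country' axis in the example above, here is a minimal plain-Python sketch. The helper `inner_join` is hypothetical, and keeping the label order of the first array is an assumption of this sketch rather than a documented guarantee of LArray. The figures are the Male / 2017 values from the tables above.

```python
def inner_join(left_labels, right_labels):
    """Keep only the labels present on both sides (order of the left side).

    Hypothetical helper for illustration only, not part of LArray.
    """
    right = set(right_labels)
    return [label for label in left_labels if label in right]


population_male_2017 = {'Belgium': 5589272, 'Luxembourg': 296641,
                        'Netherlands': 8475102}
births_male_2017 = {'Belgium': 61179, 'France': 394058, 'Germany': 402517}

common = inner_join(population_male_2017, births_male_2017)
print(common)        # ['Belgium']

# once both sides share the same labels, arithmetic is well defined
difference = {c: population_male_2017[c] - births_male_2017[c] for c in common}
print(difference)    # {'Belgium': 5528093}
```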