Transforming Arrays (Relabeling, Renaming, Reordering, Sorting, …)
Import the LArray library:
[1]:
from larray import *
Import the population array from the demography_eurostat dataset:
[2]:
demography_eurostat = load_example_data('demography_eurostat')
population = demography_eurostat.population
# display the 'population' array
population
[2]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
Manipulating axes
The Array class offers several methods to manipulate the axes and labels of an array:
set_labels: to replace all or some labels of one or several axes.
rename: to replace one or several axis names.
set_axes: to replace one or several axes.
transpose: to modify the order of axes.
drop: to remove one or several labels.
combine_axes: to combine axes.
split_axes: to split one or several axes by splitting their labels and names.
reindex: to reorder, add and remove labels of one or several axes.
insert: to insert a label at a given position.
Relabeling
Replace some labels of an axis:
[3]:
# replace only one label of the 'gender' axis by passing a dict
population_new_labels = population.set_labels('gender', {'Male': 'Men'})
population_new_labels
[3]:
country gender\time 2013 2014 2015 2016 2017
Belgium Men 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Men 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Men 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
[4]:
# set all labels of the 'country' axis to uppercase by passing the function str.upper (not called)
population_new_labels = population.set_labels('country', str.upper)
population_new_labels
[4]:
country gender\time 2013 2014 2015 2016 2017
BELGIUM Male 5472856 5493792 5524068 5569264 5589272
BELGIUM Female 5665118 5687048 5713206 5741853 5762455
FRANCE Male 31772665 32045129 32174258 32247386 32318973
FRANCE Female 33827685 34120851 34283895 34391005 34485148
GERMANY Male 39380976 39556923 39835457 40514123 40697118
GERMANY Female 41142770 41210540 41362080 41661561 41824535
See set_labels for more details and examples.
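The two call styles used above (a dict for selected labels, a function for all labels) can be modelled in plain Python. The helper relabel below is hypothetical and is not part of LArray; it is only a sketch of the mapping rule, assuming that labels absent from the dict are left unchanged, as the output of cell [3] suggests.

```python
def relabel(labels, changes):
    """Return new labels; `changes` is a dict {old: new} or a function old -> new.

    Hypothetical helper for illustration only, not LArray code.
    """
    if callable(changes):
        # function case: applied to every label
        return [changes(label) for label in labels]
    # dict case: labels not listed in the dict are kept as they are
    return [changes.get(label, label) for label in labels]


gender = ['Male', 'Female']
country = ['Belgium', 'France', 'Germany']

print(relabel(gender, {'Male': 'Men'}))   # ['Men', 'Female']
print(relabel(country, str.upper))        # ['BELGIUM', 'FRANCE', 'GERMANY']
```

Note that str.upper is passed without parentheses: the function itself is handed over and applied to each label in turn.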
Renaming axes
Rename one axis:
[5]:
# 'rename' returns a copy of the array
population_new_names = population.rename('time', 'year')
population_new_names
[5]:
country gender\year 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
Rename several axes at once:
[6]:
population_new_names = population.rename({'gender': 'sex', 'time': 'year'})
population_new_names
[6]:
country sex\year 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
See rename for more details and examples.
Replacing Axes
Replace one axis:
[7]:
new_gender = Axis('sex=Men,Women')
population_new_axis = population.set_axes('gender', new_gender)
population_new_axis
[7]:
country sex\time 2013 2014 2015 2016 2017
Belgium Men 5472856 5493792 5524068 5569264 5589272
Belgium Women 5665118 5687048 5713206 5741853 5762455
France Men 31772665 32045129 32174258 32247386 32318973
France Women 33827685 34120851 34283895 34391005 34485148
Germany Men 39380976 39556923 39835457 40514123 40697118
Germany Women 41142770 41210540 41362080 41661561 41824535
Replace several axes at once:
[8]:
new_country = Axis('country_codes=BE,FR,DE')
population_new_axes = population.set_axes({'country': new_country, 'gender': new_gender})
population_new_axes
[8]:
country_codes sex\time 2013 2014 2015 2016 2017
BE Men 5472856 5493792 5524068 5569264 5589272
BE Women 5665118 5687048 5713206 5741853 5762455
FR Men 31772665 32045129 32174258 32247386 32318973
FR Women 33827685 34120851 34283895 34391005 34485148
DE Men 39380976 39556923 39835457 40514123 40697118
DE Women 41142770 41210540 41362080 41661561 41824535
Reordering axes
Axes can be reordered using the transpose method. By default, transpose reverses the order of the axes; otherwise, it permutes them according to the list given as argument. Axes that are not mentioned come after those that are mentioned (and keep their relative order). Finally, transpose returns a copy of the array.
[9]:
# starting order : country, gender, time
population
[9]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
[10]:
# no argument --> reverse all axes
population_transposed = population.transpose()
# .T is a shortcut for .transpose()
population_transposed = population.T
population_transposed
[10]:
time gender\country Belgium France Germany
2013 Male 5472856 31772665 39380976
2013 Female 5665118 33827685 41142770
2014 Male 5493792 32045129 39556923
2014 Female 5687048 34120851 41210540
2015 Male 5524068 32174258 39835457
2015 Female 5713206 34283895 41362080
2016 Male 5569264 32247386 40514123
2016 Female 5741853 34391005 41661561
2017 Male 5589272 32318973 40697118
2017 Female 5762455 34485148 41824535
[11]:
# reorder according to list
population_transposed = population.transpose('gender', 'country', 'time')
population_transposed
[11]:
gender country\time 2013 2014 2015 2016 2017
Male Belgium 5472856 5493792 5524068 5569264 5589272
Male France 31772665 32045129 32174258 32247386 32318973
Male Germany 39380976 39556923 39835457 40514123 40697118
Female Belgium 5665118 5687048 5713206 5741853 5762455
Female France 33827685 34120851 34283895 34391005 34485148
Female Germany 41142770 41210540 41362080 41661561 41824535
[12]:
# move the 'time' axis to the first position
# axes not mentioned come after those which are mentioned (and keep their relative order)
population_transposed = population.transpose('time')
population_transposed
[12]:
time country\gender Male Female
2013 Belgium 5472856 5665118
2013 France 31772665 33827685
2013 Germany 39380976 41142770
2014 Belgium 5493792 5687048
2014 France 32045129 34120851
2014 Germany 39556923 41210540
2015 Belgium 5524068 5713206
2015 France 32174258 34283895
2015 Germany 39835457 41362080
2016 Belgium 5569264 5741853
2016 France 32247386 34391005
2016 Germany 40514123 41661561
2017 Belgium 5589272 5762455
2017 France 32318973 34485148
2017 Germany 40697118 41824535
[13]:
# move the 'gender' axis to the last position
# axes not mentioned take the place of the ellipsis (and keep their relative order)
population_transposed = population.transpose(..., 'gender')
population_transposed
[13]:
country time\gender Male Female
Belgium 2013 5472856 5665118
Belgium 2014 5493792 5687048
Belgium 2015 5524068 5713206
Belgium 2016 5569264 5741853
Belgium 2017 5589272 5762455
France 2013 31772665 33827685
France 2014 32045129 34120851
France 2015 32174258 34283895
France 2016 32247386 34391005
France 2017 32318973 34485148
Germany 2013 39380976 41142770
Germany 2014 39556923 41210540
Germany 2015 39835457 41362080
Germany 2016 40514123 41661561
Germany 2017 40697118 41824535
See transpose for more details and examples.
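The ordering rules shown in cells [10] to [13] can be summarised by a small plain-Python model. The function transposed_order is hypothetical (it is not LArray's implementation) and only computes the resulting order of axis names, assuming at most one ellipsis is given.

```python
def transposed_order(axes, *requested):
    """Axis order after a transpose-like call (illustrative model, not LArray code)."""
    if not requested:
        # no argument: reverse all axes
        return list(reversed(axes))
    named = [name for name in requested if name is not Ellipsis]
    missing = [name for name in axes if name not in named]
    if Ellipsis in requested:
        # axes not mentioned take the place of the ellipsis
        pos = requested.index(Ellipsis)
        return list(requested[:pos]) + missing + list(requested[pos + 1:])
    # axes not mentioned come last, in their original relative order
    return named + missing


axes = ['country', 'gender', 'time']
print(transposed_order(axes))                  # ['time', 'gender', 'country']
print(transposed_order(axes, 'time'))          # ['time', 'country', 'gender']
print(transposed_order(axes, ..., 'gender'))   # ['country', 'time', 'gender']
```

The three printed orders match the headers of the outputs of cells [10], [12] and [13].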
Dropping Labels
[14]:
population_labels_dropped = population.drop([2014, 2016])
population_labels_dropped
[14]:
country gender\time 2013 2015 2017
Belgium Male 5472856 5524068 5589272
Belgium Female 5665118 5713206 5762455
France Male 31772665 32174258 32318973
France Female 33827685 34283895 34485148
Germany Male 39380976 39835457 40697118
Germany Female 41142770 41362080 41824535
See drop for more details and examples.
Combine And Split Axes
Combine two axes:
[15]:
population_combined_axes = population.combine_axes(('country', 'gender'))
population_combined_axes
[15]:
country_gender\time 2013 2014 2015 2016 2017
Belgium_Male 5472856 5493792 5524068 5569264 5589272
Belgium_Female 5665118 5687048 5713206 5741853 5762455
France_Male 31772665 32045129 32174258 32247386 32318973
France_Female 33827685 34120851 34283895 34391005 34485148
Germany_Male 39380976 39556923 39835457 40514123 40697118
Germany_Female 41142770 41210540 41362080 41661561 41824535
Split an axis:
[16]:
population_split_axes = population_combined_axes.split_axes('country_gender')
population_split_axes
[16]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
See combine_axes and split_axes for more details and examples.
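What happens to the labels when axes are combined and split can be sketched in plain Python. The helpers combine_labels and split_labels are hypothetical and not LArray code; the sketch assumes the '_' separator seen in the output above and labels that do not themselves contain the separator.

```python
from itertools import product


def combine_labels(first, second, sep='_'):
    """Build one combined label per (first, second) pair; `first` varies slowest."""
    return [f'{a}{sep}{b}' for a, b in product(first, second)]


def split_labels(combined, sep='_'):
    """Recover the two original label lists, keeping order of first appearance."""
    firsts, seconds = [], []
    for label in combined:
        a, b = label.split(sep)
        if a not in firsts:
            firsts.append(a)
        if b not in seconds:
            seconds.append(b)
    return firsts, seconds


country = ['Belgium', 'France', 'Germany']
gender = ['Male', 'Female']

combined = combine_labels(country, gender)
print(combined[:3])            # ['Belgium_Male', 'Belgium_Female', 'France_Male']
print(split_labels(combined))  # (['Belgium', 'France', 'Germany'], ['Male', 'Female'])
```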
Reordering, adding and removing labels
The reindex method reorders, adds and removes labels along one axis:
[17]:
# reverse years + remove 2013 + add 2018 + copy data for 2017 to 2018
population_new_time = population.reindex('time', '2018..2014', fill_value=population[2017])
population_new_time
[17]:
country gender\time 2018 2017 2016 2015 2014
Belgium Male 5589272 5589272 5569264 5524068 5493792
Belgium Female 5762455 5762455 5741853 5713206 5687048
France Male 32318973 32318973 32247386 32174258 32045129
France Female 34485148 34485148 34391005 34283895 34120851
Germany Male 40697118 40697118 40514123 39835457 39556923
Germany Female 41824535 41824535 41661561 41362080 41210540
or several axes:
[18]:
population_new = population.reindex({'country': 'country=Luxembourg,Belgium,France,Germany',
                                     'time': 'time=2018..2014'}, fill_value=0)
population_new
[18]:
country gender\time 2018 2017 2016 2015 2014
Luxembourg Male 0 0 0 0 0
Luxembourg Female 0 0 0 0 0
Belgium Male 0 5589272 5569264 5524068 5493792
Belgium Female 0 5762455 5741853 5713206 5687048
France Male 0 32318973 32247386 32174258 32045129
France Female 0 34485148 34391005 34283895 34120851
Germany Male 0 40697118 40514123 39835457 39556923
Germany Female 0 41824535 41661561 41362080 41210540
See reindex for more details and examples.
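The effect of reindexing on a single row can be modelled with a dict, as a rough sketch. The helper reindex_1d is hypothetical and not part of LArray; it uses the Belgium/Male row from the population array and reproduces the first row of the output of cell [17].

```python
def reindex_1d(values, new_labels, fill_value):
    """Return {label: value} in the order of `new_labels` (illustrative model only).

    Labels missing from `values` receive `fill_value`;
    labels absent from `new_labels` are dropped.
    """
    return {label: values.get(label, fill_value) for label in new_labels}


belgium_male = {2013: 5472856, 2014: 5493792, 2015: 5524068,
                2016: 5569264, 2017: 5589272}

# 2018, 2017, ..., 2014: reversed order, 2013 removed, 2018 added
new_time = list(range(2018, 2013, -1))

result = reindex_1d(belgium_male, new_time, fill_value=belgium_male[2017])
print(result)
# {2018: 5589272, 2017: 5589272, 2016: 5569264, 2015: 5524068, 2014: 5493792}
```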
Another way to insert new labels is to use the insert method:
[19]:
# insert a new country before 'France' with all values set to 0
population_new_country = population.insert(0, before='France', label='Luxembourg')
# or equivalently
population_new_country = population.insert(0, after='Belgium', label='Luxembourg')
population_new_country
[19]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
Luxembourg Male 0 0 0 0 0
Luxembourg Female 0 0 0 0 0
France Male 31772665 32045129 32174258 32247386 32318973
France Female 33827685 34120851 34283895 34391005 34485148
Germany Male 39380976 39556923 39835457 40514123 40697118
Germany Female 41142770 41210540 41362080 41661561 41824535
See insert for more details and examples.
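Why before='France' and after='Belgium' are equivalent here can be seen from a plain-Python model of the label positions. The helper insert_label is hypothetical and not LArray code; it only handles the list of labels, not the data.

```python
def insert_label(labels, new_label, before=None, after=None):
    """Return a new label list with `new_label` inserted (illustrative model only)."""
    if (before is None) == (after is None):
        raise ValueError("give exactly one of 'before' or 'after'")
    if before is not None:
        pos = labels.index(before)      # position of the reference label
    else:
        pos = labels.index(after) + 1   # position just past the reference label
    return labels[:pos] + [new_label] + labels[pos:]


country = ['Belgium', 'France', 'Germany']
print(insert_label(country, 'Luxembourg', before='France'))
# ['Belgium', 'Luxembourg', 'France', 'Germany']
print(insert_label(country, 'Luxembourg', after='Belgium'))
# ['Belgium', 'Luxembourg', 'France', 'Germany']
```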
Sorting
sort_labels: to sort the labels of an axis.
labelsofsorted: to get the labels that would sort an axis.
sort_values: to sort an axis according to values.
[20]:
# get a copy of the 'population_benelux' array
population_benelux = demography_eurostat.population_benelux.copy()
population_benelux
[20]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
Luxembourg Male 268412 275117 281972 289193 296641
Luxembourg Female 268627 274563 280986 287056 294026
Netherlands Male 8307339 8334385 8372858 8417135 8475102
Netherlands Female 8472236 8494904 8527868 8561985 8606405
Sort an axis (alphabetically if labels are strings):
[21]:
population_sorted = population_benelux.sort_labels('gender')
population_sorted
[21]:
country gender\time 2013 2014 2015 2016 2017
Belgium Female 5665118 5687048 5713206 5741853 5762455
Belgium Male 5472856 5493792 5524068 5569264 5589272
Luxembourg Female 268627 274563 280986 287056 294026
Luxembourg Male 268412 275117 281972 289193 296641
Netherlands Female 8472236 8494904 8527868 8561985 8606405
Netherlands Male 8307339 8334385 8372858 8417135 8475102
Get the labels that would sort the axis:
[22]:
population_benelux.labelsofsorted('country')
[22]:
country gender\time 2013 ... 2017
0 Male Luxembourg ... Luxembourg
0 Female Luxembourg ... Luxembourg
1 Male Belgium ... Belgium
1 Female Belgium ... Belgium
2 Male Netherlands ... Netherlands
2 Female Netherlands ... Netherlands
Sort according to values:
[23]:
population_sorted = population_benelux.sort_values(('Male', 2017))
population_sorted
[23]:
country gender\time 2013 2014 2015 2016 2017
Luxembourg Male 268412 275117 281972 289193 296641
Luxembourg Female 268627 274563 280986 287056 294026
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
Netherlands Male 8307339 8334385 8372858 8417135 8475102
Netherlands Female 8472236 8494904 8527868 8561985 8606405
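The country order produced in cells [22] and [23] can be checked by hand with a plain-Python sketch. It is not LArray code: it takes the 2017 values for 'Male' from the population_benelux array and sorts the country labels by those values in ascending order.

```python
# 2017 male population per country, copied from the population_benelux output
male_2017 = {
    'Belgium': 5589272,
    'Luxembourg': 296641,
    'Netherlands': 8475102,
}

# labels ordered by their associated value (ascending)
order = sorted(male_2017, key=male_2017.get)
print(order)   # ['Luxembourg', 'Belgium', 'Netherlands']
```

This is the order of the country axis in the output of cell [23], and also the sequence of labels listed in the 2017 column for 'Male' in cell [22].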
Aligning Arrays
The align method aligns two arrays on their axes with a specified join method. In other words, it ensures all common axes are compatible.
[24]:
# get a copy of the 'births' array
births = demography_eurostat.births.copy()
# align the two arrays with the 'inner' join method
population_aligned, births_aligned = population_benelux.align(births, join='inner')
[25]:
print('population_benelux before align:')
print(population_benelux)
print()
print('population_benelux after align:')
print(population_aligned)
population_benelux before align:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856 5493792 5524068 5569264 5589272
Belgium Female 5665118 5687048 5713206 5741853 5762455
Luxembourg Male 268412 275117 281972 289193 296641
Luxembourg Female 268627 274563 280986 287056 294026
Netherlands Male 8307339 8334385 8372858 8417135 8475102
Netherlands Female 8472236 8494904 8527868 8561985 8606405
population_benelux after align:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5472856.0 5493792.0 5524068.0 5569264.0 5589272.0
Belgium Female 5665118.0 5687048.0 5713206.0 5741853.0 5762455.0
[26]:
print('births before align:')
print(births)
print()
print('births after align:')
print(births_aligned)
births before align:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 64371 64173 62561 62428 61179
Belgium Female 61235 60841 59713 59468 58511
France Male 415762 418721 409145 401388 394058
France Female 396581 400607 390526 382937 375987
Germany Male 349820 366835 378478 405587 402517
Germany Female 332249 348092 359097 386554 382384
births after align:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 64371.0 64173.0 62561.0 62428.0 61179.0
Belgium Female 61235.0 60841.0 59713.0 59468.0 58511.0
Aligned arrays can then be used in arithmetic operations:
[27]:
population_aligned - births_aligned
[27]:
country gender\time 2013 2014 2015 2016 2017
Belgium Male 5408485.0 5429619.0 5461507.0 5506836.0 5528093.0
Belgium Female 5603883.0 5626207.0 5653493.0 5682385.0 5703944.0
See align for more details and examples.
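The 'inner' join used above keeps only the labels present in both arrays, which is why only Belgium remains on the country axis. A minimal plain-Python sketch of that rule follows; the helper inner_join is hypothetical, not LArray's implementation, and assumes the order of the left-hand labels is kept.

```python
def inner_join(left_labels, right_labels):
    """Labels present on both sides, in the order of the left side (illustrative only)."""
    return [label for label in left_labels if label in right_labels]


benelux_countries = ['Belgium', 'Luxembourg', 'Netherlands']
births_countries = ['Belgium', 'France', 'Germany']

print(inner_join(benelux_countries, births_countries))   # ['Belgium']

# the 'gender' and 'time' axes are identical in both arrays, so they are unchanged
gender = ['Male', 'Female']
print(inner_join(gender, gender))                        # ['Male', 'Female']
```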