larray.read_excel

larray.read_excel(filepath, sheet=0, nb_axes=None, index_col=None, fill_value=nan, na=nan, sort_rows=False, sort_columns=False, wide=True, engine=None, range=slice(None, None, None), **kwargs) → Array[source]

Read excel file from sheet name and returns an Array with the contents.

Parameters

filepathstr or Path: Path where the Excel file has to be read or use -1 to refer to the currently active workbook.
sheetstr, Group or int, optional: Name or index of the Excel sheet containing the array to be read. By default the array is read from the first sheet.
nb_axesint, optional: Number of axes of output array. The first nb_axes - 1 columns and the header of the Excel sheet will be used to set the axes of the output array. If not specified, the number of axes is given by the position of the first column header including a \ character plus one. If no column header includes a \ character, the array is assumed to have one axis. Defaults to None.
index_collist, optional: Positions of columns for the n-1 first axes (ex. [0, 1, 2, 3]). Defaults to None (see nb_axes above).
fill_valuescalar or Array, optional: Value used to fill cells corresponding to label combinations which are not present in the input. Defaults to NaN.
sort_rowsbool, optional: Whether to sort the rows alphabetically (sorting is more efficient than not sorting). Defaults to False.
sort_columnsbool, optional: Whether to sort the columns alphabetically (sorting is more efficient than not sorting). Defaults to False.
widebool, optional: Whether to assume the array is stored in “wide” format. If False, the array is assumed to be stored in “narrow” format: one column per axis plus one value column. Defaults to True.
engine{‘xlwings’, ‘openpyxl’, ‘xlrd’}, optional: Engine to use to read the Excel file. The ‘xlrd’ engine must be used to read Excel files with the old ‘.xls’ extension. Either ‘xlwings’ or ‘openpyxl’ can be used to read Excel files with the standard ‘.xlsx’ extension. Defaults to ‘xlwings’ if the module is installed, ‘openpyxl’ otherwise.
rangestr, optional: Range to load the array from (only supported for the ‘xlwings’ engine). Defaults to slice(None) which loads the whole sheet, ignoring blank cells in the bottom right corner.
**kwargs

Returns

Array

Examples

>>> fname = get_example_filepath('examples.xlsx')

Read array from first sheet

>>> # The data below is derived from a subset of the demo_pjan table from Eurostat
>>> read_excel(fname)                                                               
country  gender\time      2013      2014      2015
Belgium         Male   5472856   5493792   5524068
Belgium       Female   5665118   5687048   5713206
 France         Male  31772665  32045129  32174258
 France       Female  33827685  34120851  34283895
Germany         Male  39380976  39556923  39835457
Germany       Female  41142770  41210540  41362080

Read array from a specific sheet

>>> # The data below is derived from a subset of the demo_fasec table from Eurostat
>>> read_excel(fname, 'births')                                                     
country  gender\time    2013    2014    2015
Belgium         Male   64371   64173   62561
Belgium       Female   61235   60841   59713
 France         Male  415762  418721  409145
 France       Female  396581  400607  390526
Germany         Male  349820  366835  378478
Germany       Female  332249  348092  359097

Missing label combinations

Let us take a look inside the sheet ‘population_missing_values’. Note the missing label combinations: (Paris, male) and (New York, female):

country  gender\time      2013      2014      2015
Belgium         Male   5472856   5493792   5524068
Belgium       Female   5665118   5687048   5713206
 France       Female  33827685  34120851  34283895
Germany         Male  39380976  39556923  39835457

By default, cells associated with missing label combinations are filled with NaN. In that case, an int array is converted to a float array.

>>> read_excel(fname, sheet='population_missing_values')                            
country  gender\time        2013        2014        2015
Belgium         Male   5472856.0   5493792.0   5524068.0
Belgium       Female   5665118.0   5687048.0   5713206.0
 France         Male         nan         nan         nan
 France       Female  33827685.0  34120851.0  34283895.0
Germany         Male  39380976.0  39556923.0  39835457.0
Germany       Female         nan         nan         nan

Using the fill_value argument, you can choose another value to use to fill missing cells.

>>> read_excel(fname, sheet='population_missing_values', fill_value=0)              
country  gender\time      2013      2014      2015
Belgium         Male   5472856   5493792   5524068
Belgium       Female   5665118   5687048   5713206
 France         Male         0         0         0
 France       Female  33827685  34120851  34283895
Germany         Male  39380976  39556923  39835457
Germany       Female         0         0         0

Specify the number of axes of the output array (useful when the name of the last axis is implicit)

The content of the sheet ‘missing_axis_name’ is:

country  gender      2013      2014      2015
Belgium    Male   5472856   5493792   5524068
Belgium  Female   5665118   5687048   5713206
 France    Male  31772665  32045129  32174258
 France  Female  33827685  34120851  34283895
Germany    Male  39380976  39556923  39835457
Germany  Female  41142770  41210540  41362080

>>> # read the array stored in the sheet 'population_missing_axis_name' as is
>>> arr = read_excel(fname, sheet='population_missing_axis_name')                   
>>> # we expected a 3 x 2 x 3 array with data of type int
>>> # but we got a 6 x 4 array with data of type object
>>> arr.info                                                                        
6 x 4
 country [6]: 'Belgium' 'Belgium' 'France' 'France' 'Germany' 'Germany'
 {1} [4]: 'gender' '2013' '2014' '2015'
dtype: object
memory used: 192 bytes
>>> # using argument 'nb_axes', you can force the number of axes of the output array
>>> arr = read_excel(fname, sheet='population_missing_axis_name', nb_axes=3)        
>>> # as expected, we have a 3 x 2 x 3 array with data of type int
>>> arr.info                                                                        
3 x 2 x 3
 country [3]: 'Belgium' 'France' 'Germany'
 gender [2]: 'Male' 'Female'
 {2} [3]: 2013 2014 2015
dtype: int64
memory used: 144 bytes

Read array saved in “narrow” format (wide=False)

Let us take a look inside the sheet ‘population_narrow’ where the data is stored in a ‘narrow’ format:

country  time     value
Belgium  2013  11137974
Belgium  2014  11180840
Belgium  2015  11237274
 France  2013  65600350
 France  2014  66165980
 France  2015  66458153

>>> # to read arrays stored in 'narrow' format, you must pass wide=False to read_excel
>>> read_excel(fname, 'population_narrow_format', wide=False)                       
country\time      2013      2014      2015
     Belgium  11137974  11180840  11237274
      France  65600350  66165980  66458153

Extract array from a given range (xlwings only)

>>> read_excel(fname, 'population_births_deaths', range='A9:E15')                   
country  gender\time    2013    2014    2015
Belgium         Male   64371   64173   62561
Belgium       Female   61235   60841   59713
 France         Male  415762  418721  409145
 France       Female  396581  400607  390526
Germany         Male  349820  366835  378478
Germany       Female  332249  348092  359097