iris.pandas
Provide conversion to and from Pandas data structures.
See also: http://pandas.pydata.org/
In this module:
- iris.pandas.as_cube(pandas_array, copy=True, calendars=None)
Convert a Pandas Series/DataFrame into a 1D/2D Iris Cube.
Deprecated since version 3.3.0: This function is scheduled for removal in a future release, being replaced by iris.pandas.as_cubes(), which offers richer dimensional intelligence.
- Parameters
pandas_array (pandas.Series or pandas.DataFrame) - The Pandas object to convert.
copy (bool, default=True) - Whether to copy pandas_array, or to create array views where possible. Provided in case of memory limit concerns.
calendars (dict, optional) - A dict mapping a dimension to a calendar. Required to convert datetime indices/columns.
Notes
This function will copy your data by default.
Example usage:
as_cube(series, calendars={0: cf_units.CALENDAR_360_DAY})
as_cube(data_frame, calendars={1: cf_units.CALENDAR_STANDARD})
- iris.pandas.as_cubes(pandas_structure, copy=True, calendars=None, aux_coord_cols=None, cell_measure_cols=None, ancillary_variable_cols=None)
Convert a Pandas Series/DataFrame into n-dimensional Iris Cubes, including dimensional metadata.
The index of pandas_structure will be used for generating the Cube dimension(s) and DimCoords. Other dimensional metadata may span multiple dimensions, based on how the column values vary with the index values.
- Parameters
pandas_structure (pandas.Series or pandas.DataFrame) - The Pandas object to convert.
copy (bool, default=True) - Whether the Cube data is a copy of the pandas_structure column, or a view of the same array. Arrays other than the data (coords etc.) are always copies. This option is provided to help with memory size concerns.
calendars (dict, optional) - Calendar conversions for individual date-time coordinate columns/index-levels, e.g. {"my_column": cf_units.CALENDAR_360_DAY}.
aux_coord_cols, cell_measure_cols, ancillary_variable_cols (list of str, optional) - Names of columns to be converted into AuxCoord, CellMeasure and AncillaryVariable objects, respectively.
- Returns
One Cube for each column not referenced in aux_coord_cols/cell_measure_cols/ancillary_variable_cols.
- Return type
CubeList
Notes
A DataFrame using columns as a second data dimension will need to be "melted" before conversion. See the Examples for how.
Dask DataFrames are not supported.
Examples
>>> from iris.pandas import as_cubes
>>> import numpy as np
>>> from pandas import DataFrame, Series
Converting a simple Series:
>>> my_series = Series([300, 301, 302], name="air_temperature")
>>> converted_cubes = as_cubes(my_series)
>>> print(converted_cubes)
0: air_temperature / (unknown)         (unknown: 3)
>>> print(converted_cubes[0])
air_temperature / (unknown)         (unknown: 3)
    Dimension coordinates:
        unknown                             x
A DataFrame, with a custom index becoming the DimCoord:
>>> my_df = DataFrame({
...     "air_temperature": [300, 301, 302],
...     "longitude": [30, 40, 50]
...     })
>>> my_df = my_df.set_index("longitude")
>>> converted_cubes = as_cubes(my_df)
>>> print(converted_cubes[0])
air_temperature / (unknown)         (longitude: 3)
    Dimension coordinates:
        longitude                             x
A DataFrame representing two 3-dimensional datasets, including a 2-dimensional AuxCoord:
>>> my_df = DataFrame({
...     "air_temperature": np.arange(300, 312, 1),
...     "air_pressure": np.arange(1000, 1012, 1),
...     "longitude": [0, 10] * 6,
...     "latitude": [25, 25, 35, 35] * 3,
...     "height": ([0] * 4) + ([100] * 4) + ([200] * 4),
...     "in_region": [True, False, False, False] * 3
...     })
>>> print(my_df)
    air_temperature  air_pressure  longitude  latitude  height  in_region
0               300          1000          0        25       0       True
1               301          1001         10        25       0      False
2               302          1002          0        35       0      False
3               303          1003         10        35       0      False
4               304          1004          0        25     100       True
5               305          1005         10        25     100      False
6               306          1006          0        35     100      False
7               307          1007         10        35     100      False
8               308          1008          0        25     200       True
9               309          1009         10        25     200      False
10              310          1010          0        35     200      False
11              311          1011         10        35     200      False
>>> my_df = my_df.set_index(["longitude", "latitude", "height"])
>>> my_df = my_df.sort_index()
>>> converted_cubes = as_cubes(my_df, aux_coord_cols=["in_region"])
>>> print(converted_cubes)
0: air_temperature / (unknown)         (longitude: 2; latitude: 2; height: 3)
1: air_pressure / (unknown)            (longitude: 2; latitude: 2; height: 3)
>>> print(converted_cubes[0])
air_temperature / (unknown)         (longitude: 2; latitude: 2; height: 3)
    Dimension coordinates:
        longitude                             x            -          -
        latitude                              -            x          -
        height                                -            -          x
    Auxiliary coordinates:
        in_region                             x            x          -
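The in_region coordinate in the example above spans longitude and latitude but not height, because its values change with the first two index levels only. The sketch below illustrates that idea using pandas alone. It is not Iris's implementation: spanned_levels is a hypothetical helper written for this illustration, and it simply checks, for each index level, whether the column changes while the other levels are held fixed.

```python
import numpy as np
import pandas as pd


def spanned_levels(df, column):
    """Hypothetical helper: list the index levels that `column` varies along."""
    levels = list(df.index.names)
    spanned = []
    for level in levels:
        others = [name for name in levels if name != level]
        if others:
            # Hold the other levels fixed; more than one unique value in any
            # group means the column changes as `level` changes.
            counts = df.groupby(level=others)[column].nunique()
            varies = bool((counts > 1).any())
        else:
            # Single-level index: the column either varies along it or not.
            varies = df[column].nunique() > 1
        if varies:
            spanned.append(level)
    return spanned


my_df = pd.DataFrame({
    "air_temperature": np.arange(300, 312, 1),
    "longitude": [0, 10] * 6,
    "latitude": [25, 25, 35, 35] * 3,
    "height": ([0] * 4) + ([100] * 4) + ([200] * 4),
    "in_region": [True, False, False, False] * 3,
}).set_index(["longitude", "latitude", "height"]).sort_index()

print(spanned_levels(my_df, "in_region"))        # ['longitude', 'latitude']
print(spanned_levels(my_df, "air_temperature"))  # ['longitude', 'latitude', 'height']
```

A check like this can be a quick way to predict, before conversion, which dimensions a column passed via aux_coord_cols is likely to be attached to.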
Pandas uses NaN rather than masking data. Converted Cubes can be masked in downstream user code:
>>> my_series = Series([300, np.nan, 302], name="air_temperature")
>>> converted_cube = as_cubes(my_series)[0]
>>> print(converted_cube.data)
[300.  nan 302.]
>>> converted_cube.data = np.ma.masked_invalid(converted_cube.data)
>>> print(converted_cube.data)
[300.0 -- 302.0]
If the DataFrame uses columns as a second dimension, pandas.melt() should be used to convert the data to the expected n-dimensional format:
>>> my_df = DataFrame({
...     "latitude": [35, 25],
...     0: [300, 301],
...     10: [302, 303],
...     })
>>> print(my_df)
   latitude    0   10
0        35  300  302
1        25  301  303
>>> my_df = my_df.melt(
...     id_vars=["latitude"],
...     value_vars=[0, 10],
...     var_name="longitude",
...     value_name="air_temperature"
...     )
>>> print(my_df)
   latitude longitude  air_temperature
0        35         0              300
1        25         0              301
2        35        10              302
3        25        10              303
>>> my_df = my_df.set_index(["latitude", "longitude"])
>>> my_df = my_df.sort_index()
>>> converted_cube = as_cubes(my_df)[0]
>>> print(converted_cube)
air_temperature / (unknown)         (latitude: 2; longitude: 2)
    Dimension coordinates:
        latitude                             x             -
        longitude                            -             x
- iris.pandas.as_data_frame(cube, copy=True)
Convert a 2D cube to a Pandas DataFrame.
- Parameters
- cube - The cube to convert to a Pandas DataFrame.
Kwargs:
- copy - Whether to make a copy of the data. Defaults to True. Must be True for masked data and some data types (see notes below).
Note
This function will copy your data by default. If you have a large array that cannot be copied, make sure it is not masked and use copy=False.
Note
Pandas will sometimes make a copy of the array, for example when creating from an int32 array. Iris will detect this and raise an exception if copy=False.
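The notes above mean that copy=False is only worth attempting when the cube's data is not a masked array. A minimal pre-check can be sketched with NumPy alone. Here can_try_no_copy is a hypothetical helper name, and the check covers only the masked-data condition, not the "some data types" case, which would still surface as an exception from Iris.

```python
import numpy as np


def can_try_no_copy(data):
    """Hypothetical pre-check: True if `data` is a plain (unmasked) ndarray."""
    return not np.ma.isMaskedArray(data)


plain = np.arange(6, dtype=np.float64).reshape(2, 3)
masked = np.ma.masked_invalid([[1.0, np.nan], [3.0, 4.0]])

print(can_try_no_copy(plain))   # True
print(can_try_no_copy(masked))  # False

# With Iris installed, the result could then drive the copy argument, e.g.:
#     as_data_frame(cube, copy=not can_try_no_copy(cube.data))
```

Note that reading cube.data loads the array into memory, so this check suits data that is already realised.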
- iris.pandas.as_series(cube, copy=True)
Convert a 1D cube to a Pandas Series.
- Parameters
- cube - The cube to convert to a Pandas Series.
Kwargs:
- copy - Whether to make a copy of the data. Defaults to True. Must be True for masked data.
Note
This function will copy your data by default. If you have a large array that cannot be copied, make sure it is not masked and use copy=False.