BUG: when using id_vars in .melt(), the id_vars does not recognize the column name in a multiIndex dataframe #60018

ivan-marroquin · 2024-10-11T00:48:34Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import yfinance as yf
from datetime import date, timedelta

end_date= date.today().strftime("%Y-%m-%d")
start_date= (date.today() - timedelta(days= 365)).strftime("%Y-%m-%d")

tickers= ['RELIANCE.NS', 'TCS.NS', 'INFY.NS', 'HDFCBANK.NS']

data= yf.download(tickers, start= start_date, end= end_date)

data= data.reset_index()

data_melted= data.melt(id_vars= ['Date'], var_name= ['Attribute', 'Ticker'])

Issue Description

KeyError Traceback (most recent call last)
Cell In [19], line 15
11 data= data.reset_index()
13 # melt the DataFrame to make it long format where each row is a
14 # unique combination of Date, Ticker, and attributes
---> 15 data_melted= data.melt(id_vars= ['Date'], var_name= ['Attribute', 'Ticker'])
17 # pivot the melted DataFrame to have the attributes (Open, High, Low, etc.) as columns
18 data_pivoted= data_melted.pivot_table(index= ['Date', 'Ticker'],
19 columns= 'Attribute', values= 'value',
20 aggfunc= 'first')

File ~/python_3.9.0/lib/python3.9/site-packages/pandas/core/frame.py:9942, in DataFrame.melt(self, id_vars, value_vars, var_name, value_name, col_level, ignore_index)
9932 @appender(_shared_docs["melt"] % {"caller": "df.melt(", "other": "melt"})
9933 def melt(
9934 self,
(...)
9940 ignore_index: bool = True,
9941 ) -> DataFrame:
-> 9942 return melt(
9943 self,
9944 id_vars=id_vars,
9945 value_vars=value_vars,
9946 var_name=var_name,
9947 value_name=value_name,
...
77 )
78 if value_vars_was_not_none:
79 frame = frame.iloc[:, algos.unique(idx)]

KeyError: "The following id_vars or value_vars are not present in the DataFrame: ['Date']"

Expected Behavior

The melt function should return a long-format DataFrame in which each row is a unique combination of Date, Ticker, and attribute.

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2e
python : 3.9.0.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-25-generic
Version : #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Feb 20 16:09:15 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 1.23.3
pytz : 2024.2
dateutil : 2.8.2
setuptools : 59.8.0
pip : 22.3.1
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.1
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2024.6.1
scipy : 1.13.1
sqlalchemy : 2.0.31
tables : None
tabulate : 0.9.0
xarray : 2024.6.0
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Python 3.9.0


vamsi-verma-s · 2024-10-13T21:58:29Z

breaking change : #55948

vamsi-verma-s · 2024-10-13T22:25:12Z

@ivan-marroquin

It does not look like this is supported. Two things:

id_vars is no longer checked against a flattened index (changed in REF: melt #55948). For a MultiIndex, it must be a list of tuples.
var_name must be a scalar, as mentioned in the docs (enforced in REF: melt #55948). It looks like, for a MultiIndex, the variable names are taken from the names of the column index levels and cannot be specified.

Therefore, the following might be the correct way to go about this:

data.columns.names = ['Attribute', 'Ticker']

data.melt(id_vars= [('Date', '')]).rename(columns={('Date', ''): 'Date'})

Output:

                          Date  Attribute       Ticker         value
0    2023-10-12 00:00:00+00:00  Adj Close  HDFCBANK.NS  1.528971e+03
1    2023-10-13 00:00:00+00:00  Adj Close  HDFCBANK.NS  1.515061e+03
2    2023-10-16 00:00:00+00:00  Adj Close  HDFCBANK.NS  1.508994e+03
3    2023-10-17 00:00:00+00:00  Adj Close  HDFCBANK.NS  1.520438e+03
4    2023-10-18 00:00:00+00:00  Adj Close  HDFCBANK.NS  1.499277e+03
...                        ...        ...          ...           ...
5851 2024-10-04 00:00:00+00:00     Volume       TCS.NS  2.965463e+06
5852 2024-10-07 00:00:00+00:00     Volume       TCS.NS  1.472619e+06
5853 2024-10-08 00:00:00+00:00     Volume       TCS.NS  1.541867e+06
5854 2024-10-09 00:00:00+00:00     Volume       TCS.NS  1.082504e+06
5855 2024-10-10 00:00:00+00:00     Volume       TCS.NS  2.378875e+06

[5856 rows x 4 columns]
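
For anyone who wants to try the approach above without downloading data, here is a minimal sketch on a small synthetic frame. The tickers AAA and BBB, the dates, and the values are made up for illustration and do not come from yfinance; the frame is only assumed to have the same shape of column MultiIndex as the downloaded data. The final pivot_table call mirrors the step visible in the traceback in the original report.

```python
import pandas as pd

# Synthetic stand-in for the yfinance download (hypothetical tickers and values):
# two column levels, attribute on top and ticker below.
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB"]])
dates = pd.date_range("2024-01-01", periods=3, name="Date")
values = [[float(r * 4 + c) for c in range(4)] for r in range(3)]

# reset_index() turns the Date index into a column whose label is ('Date', '').
data = pd.DataFrame(values, index=dates, columns=columns).reset_index()

# The level names become the names of the melted variable columns.
data.columns.names = ["Attribute", "Ticker"]

# id_vars is given as a list of tuples because the columns are a MultiIndex.
melted = (
    data.melt(id_vars=[("Date", "")])
        .rename(columns={("Date", ""): "Date"})
)

# Pivot back so that each attribute is a column, one row per (Date, Ticker).
pivoted = melted.pivot_table(
    index=["Date", "Ticker"], columns="Attribute", values="value", aggfunc="first"
)

print(melted.head())
print(pivoted.head())
```

Setting the level names before melting is what stands in for the var_name list from the original report, since var_name itself has to be a scalar.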

ivan-marroquin added the Bug and Needs Triage labels on Oct 11, 2024
