The create_final_community_reports.parquet file disappears. #940

Open
yangxue-1 opened this issue Aug 15, 2024 · 11 comments
Labels
triage Default label assignment, indicates new issue needs reviewed by a maintainer

Comments

@yangxue-1

Is there an existing issue for this?

  • I have searched the existing issues
  • I have checked #657 to validate if my issue is covered by community support

Describe the issue

The create_final_community_reports.parquet file is not generated after all processes of the index are executed.

Steps to reproduce

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:
yangxue-1 added the triage label Aug 15, 2024

xgl0626 commented Aug 15, 2024

I've been experiencing this bug for about a week now, and this error appears in the indexing-engine.log:

Traceback (most recent call last):
File "/home/notebook/code/group/rag_reearch/graphrag-0.3.0/graphrag/index/emit/parquet_table_emitter.py", line 40, in emit
await self._storage.set(filename, data.to_parquet())
File "/opt/conda/envs/graphrag/lib/python3.10/site-packages/pandas/util/_decorators.py", line 333, in wrapper
return func(*args, **kwargs)
File "/opt/conda/envs/graphrag/lib/python3.10/site-packages/pandas/core/frame.py", line 3113, in to_parquet
return to_parquet(
File "/opt/conda/envs/graphrag/lib/python3.10/site-packages/pandas/io/parquet.py", line 480, in to_parquet
impl.write(
File "/opt/conda/envs/graphrag/lib/python3.10/site-packages/pandas/io/parquet.py", line 190, in write
table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
File "pyarrow/table.pxi", line 3874, in pyarrow.lib.Table.from_pandas
File "/opt/conda/envs/graphrag/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 624, in dataframe_to_arrays
arrays[i] = maybe_fut.result()
File "/opt/conda/envs/graphrag/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/opt/conda/envs/graphrag/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self.exception
File "/opt/conda/envs/graphrag/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/envs/graphrag/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
raise e
File "/opt/conda/envs/graphrag/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
File "pyarrow/array.pxi", line 339, in pyarrow.lib.array
File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('cannot mix list and non-list, non-null values', 'Conversion failed for column findings with type object')

I've also re-executed on version 0.3.0 and still get errors
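
For what it's worth, the failure mode is easy to reproduce outside GraphRAG. This is a minimal sketch (not GraphRAG code) showing that an object column mixing a list value with a plain string cannot be written to parquet:

# Minimal reproduction sketch: pyarrow cannot infer one type for an object
# column that mixes list and non-list values, which is what the findings
# column ends up containing when a row's report is malformed.
import pandas as pd

df = pd.DataFrame({
    "community": [0, 1],
    # one row holds a list of dicts, the other a bare string
    "findings": [[{"summary": "ok", "explanation": "ok"}], "malformed output"],
})

try:
    df.to_parquet("repro.parquet")
except Exception as e:
    print(type(e).__name__, e)  # ArrowInvalid: cannot mix list and non-list, non-null values ...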

@yangxue-1
Author

(quoting @xgl0626's traceback above)

I found out that my problem might be caused by a slight formatting error in the prompt template.
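
For context, a sketch from memory of what the default community-report prompt expects back, so treat the exact keys as an assumption: the report should be JSON in which findings is a list of objects. If the model drifts from that format and returns findings as plain strings, the resulting column mixes lists and scalars, which is exactly what the Arrow error complains about.

# Expected shape (assumed from the default community report prompt):
expected = {
    "title": "<report title>",
    "summary": "<executive summary>",
    "rating": 7.5,
    "rating_explanation": "<single sentence>",
    "findings": [
        {"summary": "<insight summary>", "explanation": "<insight explanation>"},
    ],
}

# A prompt that drifts from this format can instead yield rows like this,
# making the findings column mix list and non-list values:
malformed = {"findings": "1. <insight written as one numbered string> ..."}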


xgl0626 commented Aug 16, 2024

Thanks, I found that my report prompt and the official report prompt have some format differences; I'm trying it out.



xxll88 commented Aug 16, 2024

Same problem, how to resolve?
21:56:42,374 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_final_community_reports.parquet
21:56:42,376 graphrag.index.emit.parquet_table_emitter ERROR Error while emitting parquet table
Traceback (most recent call last):
File "/home/lile/microsoft/graphrag/graphrag/index/emit/parquet_table_emitter.py", line 40, in emit
await self._storage.set(filename, data.to_parquet())
^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/util/decorators.py", line 333, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/core/frame.py", line 3113, in to_parquet
return to_parquet(
^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/io/parquet.py", line 480, in to_parquet
impl.write(
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/io/parquet.py", line 190, in write
table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/table.pxi", line 3874, in pyarrow.lib.Table.from_pandas
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays
arrays = [convert_column(c, f)
^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 611, in
arrays = [convert_column(c, f)
^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
raise e
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 339, in pyarrow.lib.array
File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('cannot mix struct and non-struct, non-null values', 'Conversion failed for column findings with type object')
21:56:42,381 graphrag.index.reporting.file_workflow_callbacks INFO Error emitting table details=None
21:56:42,677 graphrag.index.run INFO Running workflow: create_final_text_units...
21:56:42,677 graphrag.index.run INFO dependencies for create_final_text_units: ['join_text_units_to_entity_ids', 'create_base_text_units', 'join_text_units_to_relationship_ids']
21:56:42,677 graphrag.index.run INFO read table from storage: join_text_units_to_entity_ids.parquet


xgl0626 commented Aug 16, 2024

(quoting @xxll88's traceback above)

I tried to modify the prompt, but it still reports an error, and I don't know what the problem is. There is no bug on a small dataset, but it fails once I switch to a large dataset.

File "/home/lile/microsoft/graphrag/graphrag/index/emit/parquet_table_emitter.py", line 40, in emit
await self._storage.set(filename, data.to_parquet())

I'm planning to dump the data to a CSV at this line of code to see what the problem is.
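
A minimal sketch of such a debugging helper (the helper name and the /tmp path are mine, not GraphRAG's):

# Sketch of a throwaway debugging helper (not GraphRAG code): dump a DataFrame
# to CSV so the raw column contents can be inspected when the parquet write fails.
import logging
import pandas as pd

log = logging.getLogger(__name__)

def dump_for_inspection(name: str, data: pd.DataFrame, directory: str = "/tmp") -> str:
    """Write `data` to <directory>/debug_<name>.csv and return the path."""
    path = f"{directory}/debug_{name}.csv"
    data.to_csv(path, index=False)
    log.info("Dumped %s rows of %s to %s", len(data), name, path)
    return path

# Inside ParquetTableEmitter.emit one could wrap the existing call, e.g.:
#     try:
#         await self._storage.set(filename, data.to_parquet())
#     except Exception:
#         dump_for_inspection(name, data)
#         raise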


webZW commented Aug 19, 2024

I added error-catching and fallback logic as shown below; after this change GraphRAG indexing works fine for me.

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""ParquetTableEmitter module."""

import logging
import traceback

import pandas as pd
from pyarrow.lib import ArrowInvalid, ArrowTypeError

from graphrag.index.storage import PipelineStorage
from graphrag.index.typing import ErrorHandlerFn

from .table_emitter import TableEmitter

log = logging.getLogger(__name__)

class ParquetTableEmitter(TableEmitter):
    """ParquetTableEmitter class."""

    _storage: PipelineStorage
    _on_error: ErrorHandlerFn

    def __init__(
        self,
        storage: PipelineStorage,
        on_error: ErrorHandlerFn,
    ):
        """Create a new Parquet Table Emitter."""
        self._storage = storage
        self._on_error = on_error

    async def preprocess_and_emit(self, filename: str, data: pd.DataFrame) -> None:
        """Preprocess data and emit to storage."""
        def preprocess_findings_column(df):
            def ensure_struct(x):
                # Keep dicts as-is, map missing scalars to None, and wrap anything
                # else (e.g. a stray string or list) so the whole column converts
                # to a consistent struct type.
                if isinstance(x, dict):
                    return x
                if not isinstance(x, (list, tuple)) and pd.isnull(x):
                    return None
                return {'value': x}

            df['findings'] = df['findings'].apply(ensure_struct)
            return df

        # Apply preprocessing
        data = preprocess_findings_column(data)
        await self._storage.set(filename, data.to_parquet())

    async def emit(self, name: str, data: pd.DataFrame) -> None:
        """Emit a dataframe to storage."""
        filename = f"{name}.parquet"
        log.info("Emitting parquet table %s", filename)
        
        try:
            await self._storage.set(filename, data.to_parquet())
        except (ArrowTypeError, ArrowInvalid) as e:
            log.warning("Initial parquet save failed, preprocessing data and retrying due to error: %s", str(e))
            try:
                await self.preprocess_and_emit(filename, data)
            except Exception as ex:
                log.exception("Error while emitting parquet table after retry")
                self._on_error(
                    ex,
                    traceback.format_exc(),
                    None,
                )
        except Exception as e:
            log.exception("Unexpected error while emitting parquet table")
            self._on_error(
                e,
                traceback.format_exc(),
                None,
            )
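
As a follow-up, if you want to see which rows would trigger the fallback before patching the emitter, a quick check like this may help (a sketch; it only assumes the reports frame has a findings column):

# Sketch (not GraphRAG code): list the rows whose `findings` value is not the
# expected list shape, which is what trips the Arrow conversion.
import pandas as pd

def find_bad_findings(df: pd.DataFrame) -> pd.DataFrame:
    mask = ~df["findings"].apply(lambda x: isinstance(x, list))
    return df.loc[mask]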


xgl0626 commented Aug 19, 2024

(quoting @webZW's workaround above)

Thanks for the reply. I also worked around the problem earlier by converting it to CSV; I'll try your method.


therealcyberlord commented Aug 20, 2024

Thanks everyone for your insights. Converting the pandas data frame to csv, then converting to parquet worked for me. However, I am getting a new issue:

You are trying to merge on int64 and object columns for key 'community'. If you wish to proceed you should use pd.concat

Update: solved the issue by casting the community column to string in pandas.
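
For anyone following along, that cast is a one-liner applied before the merge; a sketch with illustrative frame names (not GraphRAG's actual variable names):

# Sketch: make the merge key dtypes agree before merging on "community".
nodes_df["community"] = nodes_df["community"].astype(str)
reports_df["community"] = reports_df["community"].astype(str)
merged = nodes_df.merge(reports_df, on="community", how="left")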


LingXuanYin commented Aug 26, 2024

Thanks @xgl0626 @therealcyberlord, I ran into the same problem, changed the code as follows, and now everything runs fine:

try:
    # Round-trip through CSV to flatten the problematic object columns, then
    # force the community key to a consistent string dtype before writing parquet.
    data.to_csv('./buf.csv', index=False, encoding='UTF-8')
    data = pd.read_csv('./buf.csv', encoding='UTF-8')
    data['community'] = data['community'].astype(str)

    await self._storage.set(filename, data.to_parquet())
    os.remove('./buf.csv')  # buf.csv is a file, so os.remove (not shutil.rmtree); requires `import os`
except ArrowTypeError as e:
    # (error handling continues as in the original emitter)
except ArrowTypeError as e:


xxll88 commented Sep 2, 2024

(quoting @LingXuanYin's CSV round-trip workaround above)

Thanks for the help, the create_final_community_reports.parquet file is now created.
But when running a local search:

File "/home/lile/graphrag/graphrag/query/api.py", line 272, in local_search_streaming
_entities = read_indexer_entities(nodes, entities, community_level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lile/graphrag/graphrag/query/indexer_adapters.py", line 105, in read_indexer_entities
entity_df["community"] = entity_df["community"].astype(int)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/core/generic.py", line 664 3, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/core/internals/managers.py ", line 430, in astype
return self.apply(
^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/core/internals/managers.py ", line 363, in apply
applied = getattr(b, f)(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 758, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/core/dtypes/astype.py", li ne 237, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/core/dtypes/astype.py", li ne 182, in astype_array
values = _astype_nansafe(values, dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/mambaforge/envs/graphrag/lib/python3.11/site-packages/pandas/core/dtypes/astype.py", li ne 133, in _astype_nansafe
return arr.astype(dtype, copy=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '4.0'
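
That failure looks like a side effect of the earlier string cast: after the CSV round trip the community values are strings like '4.0', which int() rejects. A hedged local workaround (a sketch against the astype call in read_indexer_entities; your line numbers may differ) is to go through float first:

# Sketch: cast via float so strings such as "4.0" still convert to int.
entity_df["community"] = entity_df["community"].astype(float).astype(int)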

@onepointconsulting

(quoting @webZW's ParquetTableEmitter workaround above)
I have tried this out and it fixes the problem. Many thanks. I wish there were a pull request to fix this.
