Create Table Concurrent query Failure handling in Delta Lake #24250

Draft: wants to merge 1 commit into master

vinay-kl (Contributor) commented Nov 25, 2024

Description

Handle failures of concurrent CREATE TABLE [AS SELECT] queries so that a query that fails because of a conflict does not delete the table directory.

Additional context and related issues

Fixes #24153

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Delta Lake
* Prevent a failed concurrent `CREATE TABLE` query from deleting the table's base path. ({issue}`24153`)

github-actions bot added the delta-lake (Delta Lake connector) label Nov 25, 2024
vinay-kl self-assigned this Nov 25, 2024
vinay-kl changed the title from "trino/hive: Create Table Concurrent query Failure handling" to "trino/delta-lake: Create Table Concurrent query Failure handling" Nov 25, 2024
vinay-kl force-pushed the databricks-create-table-concurrent-fix branch from 078d491 to c7628b3, November 25, 2024 17:12
@@ -1263,7 +1263,8 @@ public void createTable(ConnectorSession session, ConnectorTableMetadata tableMe
statisticsAccess.deleteExtendedStatistics(session, schemaTableName, location);
}
else {
setRollback(() -> deleteRecursivelyIfExists(fileSystem, deltaLogDirectory));
// deleteRecursivelyIfNothingExists ensures current CREATE TABLE doesn't delete directory if there's a conflict
findinpath (Contributor):

Why not check in the catch clause whether the failure is a TransactionConflictException instead? That would tell us whether we are dealing with a concurrent write.

vinay-kl (Contributor, Author):

@findinpath As far as I know, the rollback runs in a different thread, so the exception context would have to be passed to it as well. Also, the rollback is registered in beginCreateTable and createTable, well before finishCreateTable runs.
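The state-based approach discussed in this thread, where the rollback inspects the table location instead of the exception that caused the failure, can be sketched in plain Java with `java.nio.file`. This is an illustration under stated assumptions, not the PR's code: `ConditionalRollback` and `deleteRecursivelyIfNothingCommitted` are hypothetical names, a local filesystem stands in for the connector's file system abstraction, and "something exists" is assumed to mean that a Delta commit file (a zero-padded 20-digit version such as `00000000000000000000.json`) is present in `_delta_log`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ConditionalRollback
{
    private ConditionalRollback() {}

    // Delta Lake commit files are named after a zero-padded 20-digit version, e.g. 00000000000000000000.json
    static boolean isCommitFile(Path path)
    {
        return path.getFileName().toString().matches("\\d{20}\\.json");
    }

    // Hypothetical rollback helper (not Trino's API): deletes deltaLogDirectory only if it holds
    // no commit file, i.e. no query, including a concurrent one, has committed a table version.
    // Returns true when the directory was deleted.
    static boolean deleteRecursivelyIfNothingCommitted(Path deltaLogDirectory)
            throws IOException
    {
        if (!Files.isDirectory(deltaLogDirectory)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(deltaLogDirectory)) {
            if (entries.anyMatch(ConditionalRollback::isCommitFile)) {
                // Some writer committed a version: leave the table in place.
                return false;
            }
        }
        // Not atomic with the check above: a commit landing in between would still be deleted.
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(deltaLogDirectory)) {
            // Reverse lexicographic order visits children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : paths) {
            Files.delete(path);
        }
        return true;
    }
}
```

Because the helper only needs the directory state, it can be registered early (as in `beginCreateTable`/`createTable`) without access to the exception. The trade-off is that the check and the delete are not atomic, so this narrows the race window rather than closing it, and a log committed by the failing query itself is also left behind.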

}

@Test
public void testConcurrentCreateTableAsSelect()
ebyhr (Member):

Could you move this test to TestDeltaLakeLocalConcurrentWritesTest?

vinay-kl (Contributor, Author), Nov 26, 2024:

@ebyhr Should it be considered a write? Since the table is being created for the first time, I thought I would keep the test here.

ebyhr (Member):

Yes, we treat CTAS as a write operation. Please merge the test class.
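The core of a concurrent CTAS test like the one discussed here is to release several create attempts at the same moment and then assert that exactly one wins while the winner's transaction log survives. A minimal sketch in plain Java, without Trino's query runner: `ConcurrentCreateHarness`, `createTable`, and `race` are hypothetical names, and the atomic create-if-absent `Files.createFile` call is an assumed stand-in for the mutual exclusion Delta Lake requires when writing commit version 0.

```java
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ConcurrentCreateHarness
{
    private ConcurrentCreateHarness() {}

    // Hypothetical stand-in for CREATE TABLE: writes commit version 0, failing if it already exists.
    static void createTable(Path deltaLog)
            throws Exception
    {
        Files.createDirectories(deltaLog);
        // createFile is an atomic create-if-absent, so at most one caller succeeds.
        Files.createFile(deltaLog.resolve("00000000000000000000.json"));
    }

    // Runs `threads` createTable attempts released together; returns how many succeeded.
    static int race(Path deltaLog, int threads)
            throws Exception
    {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        CyclicBarrier barrier = new CyclicBarrier(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(executor.submit(() -> {
                    barrier.await(); // start all attempts together to maximize contention
                    try {
                        createTable(deltaLog);
                        return true;
                    }
                    catch (FileAlreadyExistsException e) {
                        // Lost the race; a correct rollback must not remove the winner's log.
                        return false;
                    }
                }));
            }
            int successes = 0;
            for (Future<Boolean> future : futures) {
                if (future.get(30, TimeUnit.SECONDS)) {
                    successes++;
                }
            }
            return successes;
        }
        finally {
            executor.shutdownNow();
        }
    }
}
```

A real test would follow the race with a query against the table to confirm it is still readable; here the equivalent check is that the commit file still exists after the losers have finished.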

ebyhr changed the title from "trino/delta-lake: Create Table Concurrent query Failure handling" to "Create Table Concurrent query Failure handling in Delta Lake" Nov 26, 2024
Labels: delta-lake (Delta Lake connector)
3 participants