HCD-481: Add reconnection logic for HDFS2 sink connector #531

Open
wants to merge 2 commits into
base: 10.0.x

@SanchayGupta1197 SanchayGupta1197 commented Dec 1, 2020

Problem

The connector was failing for the Avro and ORC formats when the connection to the cluster was interrupted.

Solution

Fix for Avro

  • The schema was re-initialized to null in order to invalidate the current temp file.
  • The AvroRuntimeException was caught and rethrown as a ConnectException.
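
A minimal sketch of the Avro fix described above, not the connector's actual code. To stay self-contained it uses stand-ins: `SketchAvroRuntimeException` and `SketchConnectException` represent the Avro and Kafka Connect exception types, and `SketchAppender` and `SketchAvroWriter` are hypothetical names. It assumes that a null schema is what makes the writer start a fresh temp file on the next record.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.avro.AvroRuntimeException (assumption: the real class comes from the Avro library).
class SketchAvroRuntimeException extends RuntimeException {
    SketchAvroRuntimeException(String message) { super(message); }
}

// Stand-in for Kafka Connect's ConnectException.
class SketchConnectException extends RuntimeException {
    SketchConnectException(Throwable cause) { super(cause); }
}

// Hypothetical sink that may fail while the cluster is unreachable.
interface SketchAppender {
    void append(String value);
}

class SketchAvroWriter {
    private final SketchAppender appender;
    private String schema;       // null means "no valid temp file is open"
    private int tempFilesOpened; // counts how often a fresh temp file was started

    SketchAvroWriter(SketchAppender appender) { this.appender = appender; }

    void write(String recordSchema, String value) {
        if (schema == null) {
            // First record, or first record after a failure: start a fresh temp file.
            schema = recordSchema;
            tempFilesOpened++;
        }
        try {
            appender.append(value);
        } catch (SketchAvroRuntimeException e) {
            schema = null;                       // invalidate the current temp file
            throw new SketchConnectException(e); // surface the failure to the caller
        }
    }

    boolean hasOpenTempFile() { return schema != null; }

    int tempFilesOpened() { return tempFilesOpened; }

    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        boolean[] connected = {false};
        SketchAvroWriter writer = new SketchAvroWriter(value -> {
            if (!connected[0]) throw new SketchAvroRuntimeException("connection lost");
            written.add(value);
        });
        try {
            writer.write("schema-v1", "record-1");
        } catch (SketchConnectException e) {
            System.out.println("temp file invalidated: " + !writer.hasOpenTempFile());
        }
        connected[0] = true; // the cluster is reachable again
        writer.write("schema-v1", "record-1");
        System.out.println("written=" + written + ", tempFilesOpened=" + writer.tempFilesOpened());
    }
}
```

Resetting the schema before rethrowing means a retried write starts from a clean temp file instead of appending to one left in an unknown state.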

Fix for ORC

  • The schema was re-initialized to null in order to invalidate the current temp file.
  • The AlreadyBeingCreatedException was handled by deleting the temp file from the cluster.
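
The ORC handling can be illustrated with a local-filesystem analogy. This is only a sketch under stated assumptions: `java.nio.file.FileAlreadyExistsException` stands in for Hadoop's `AlreadyBeingCreatedException`, `Files.deleteIfExists` stands in for the `fileSystem.delete(path, true)` call in this PR, and `SketchOrcTempFile` and `createOrCleanUp` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class SketchOrcTempFile {
    // Creates the temp file without overwriting, mirroring a writer that cannot overwrite.
    // Returns true if the file was created. Returns false if a stale file was found and
    // deleted, in which case the caller is expected to retry.
    static boolean createOrCleanUp(Path tmp) throws IOException {
        try {
            Files.createFile(tmp);
            return true;
        } catch (FileAlreadyExistsException e) {
            Files.deleteIfExists(tmp); // remove the leftover temp file so a retry can succeed
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("orc-sketch");
        Path tmp = dir.resolve("part-0.orc.tmp");
        Files.createFile(tmp); // simulate a temp file left behind by an interrupted write
        System.out.println("first attempt created file: " + createOrCleanUp(tmp));
        System.out.println("retry created file: " + createOrCleanUp(tmp));
        Files.deleteIfExists(tmp);
        Files.deleteIfExists(dir);
    }
}
```

The first attempt only cleans up. The retry is what actually creates the file, which matches the idea of invalidating the temp file and letting a later write start over.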

Test Strategy

Testing done:
  • Unit tests
  • Integration tests
  • System tests
  • Manual tests

Issues/Limitation

  • An integration test was not feasible because of dependency conflicts between connect-runtime and the HDFS minicluster.

@SanchayGupta1197 SanchayGupta1197 requested a review from a team as a code owner December 1, 2020 06:49

ghost commented Dec 1, 2020

It looks like @SanchayGupta1197 hasn't signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement here.

Once you've signed, reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

conf.getHadoopConfiguration()
);
fileSystem.delete(path, true);
} catch (IOException ex) {

Can we log this error message as well?

Author

I have added the exception to the log statement below.

URI.create(conf.getUrl()),
conf.getHadoopConfiguration()
);
fileSystem.delete(path, true);

I'm not too familiar with HDFS. Why are we deleting the file for ORC here? Why is setting schema = null not enough, as it is with Avro?

Author

The current version of org.apache.hadoop.hive.ql.io.orc.OrcFile.WriterOptions does not support file overwriting, so the temp file is deleted whenever an AlreadyBeingCreatedException is thrown.

cla-assistant bot commented Aug 27, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ sagar-ab-2702
❌ SanchayGupta1197
You have signed the CLA already but the status is still pending? Let us recheck it.

3 participants